Azure Data Factory: a Universal Pipe
Use Cases and Deployment Scope
We live in a world where half of the data for analytics comes from SAP and half from non-SAP sources. We use Azure Data Factory to load non-SAP data from different source systems into our Azure lakehouse. The project follows a medallion architecture: Azure Data Factory takes data from multiple sources and stores it in the bronze layer, and also helps apply transformations while loading that data. Further modelling in the next layers (silver and gold) is done in Azure Databricks, where the final data product is created.
Because SAP Datasphere cannot connect to non-SAP sources as well or as natively as Azure Data Factory does, we use Azure Data Factory for these scenarios. Datasphere often relies on ODBC/JDBC/OData connectivity, whereas Azure Data Factory provides maintenance-free connectors for our web applications like the partner portal, cloud applications like one crm, on-prem Oracle systems, and NoSQL databases like MongoDB.
To summarize, Azure Data Factory is used in our organisation to ingest non-SAP data from different sources into our bronze layer, so that Databricks can further clean and curate it for data product creation. Without Azure Data Factory, connecting data from these different sources would not have been possible, because of SAP Datasphere's limitations with non-SAP source systems.
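To make the bronze-layer idea concrete, here is a minimal Python sketch of what "land the data as-is" could look like. It is not how Azure Data Factory works internally and not our actual pipeline: the function names, the metadata fields, and the date-partitioned folder layout are all assumptions chosen for illustration.

```python
from datetime import datetime, timezone


def to_bronze_record(source_system, payload, ingested_at=None):
    """Wrap a raw source payload without altering it (bronze = raw copy).

    The envelope fields are hypothetical; they only show the kind of
    lineage metadata that is commonly stored next to raw data.
    """
    ingested_at = ingested_at or datetime.now(timezone.utc)
    return {
        "source_system": source_system,
        "ingested_at": ingested_at.isoformat(),
        "payload": payload,  # stored unchanged; cleaning happens in later layers
    }


def bronze_path(source_system, ingested_at):
    """Build a hypothetical date-partitioned folder path for the bronze layer."""
    return f"bronze/{source_system}/{ingested_at:%Y/%m/%d}/"
```

For example, `to_bronze_record("mongodb", {"id": 1})` keeps the payload untouched and only adds the source name and an ingestion timestamp, which is the property the silver and gold layers rely on.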
Pros
- Connectivity with other cloud environments like Salesforce
- Connectivity with unstructured data and big data systems
- Reduces data islands
- Azure Data Factory handles the huge volume of JSON data from our global apps and services very well
Cons
- When an error occurs while processing files, the error details are not clear
- Connectivity with SAP S/4HANA systems is not as good as with Datasphere
- Since Azure Data Factory just transfers data, it lacks the capacity to identify what is wrong in the data. It is just a dumb data transfer tool from point A to point B
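To illustrate the last point, this is the kind of basic check that has to live somewhere downstream of the transfer (in our case that would be Databricks). It is only a sketch under assumptions: the function names and the "required fields" rule are hypothetical, and real data quality rules would be more involved.

```python
def validate_record(record, required_fields):
    """Return a list of problems found in one record (empty list = no problems)."""
    problems = []
    for field in required_fields:
        value = record.get(field)
        if value is None or value == "":
            problems.append(f"missing {field}")
    return problems


def split_valid_invalid(records, required_fields):
    """Separate records that pass the check from those that need attention."""
    valid, rejected = [], []
    for record in records:
        problems = validate_record(record, required_fields)
        if problems:
            # Keep the original record next to the reasons it was rejected.
            rejected.append({"record": record, "problems": problems})
        else:
            valid.append(record)
    return valid, rejected
```

Calling `split_valid_invalid(rows, ["id", "name"])` returns the clean rows and a list of rejected rows with the reason attached, which is exactly the feedback a plain point A to point B copy does not give.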
Likelihood to Recommend
The best scenario is an ETL process. The flexibility and connectivity are outstanding. In our environment, however, Azure Data Factory's SAP data connectivity offers very limited features compared to SAP Datasphere. Because of the tool's limited modelling capacity, we use Databricks for data modelling and cleaning. We could have avoided using multiple tools if Azure Data Factory had modelling capabilities.
