Ingesting Data Into Azure (Part 2 of 2)
Understanding Tooling for Automated Ingestion & Transformation of Data
Azure Data Factory
Azure Data Factory is an enterprise-ready solution for moving and transforming data in Azure. It connects to hundreds of different sources and lets you create pipelines that transform the integrated data, calling on multiple other Azure services along the way. Its main building blocks are pipelines, datasets, data flows, and Power Query (a short CLI sketch follows the list below):
- Pipelines- The main attraction of Azure Data Factory. Complex pipelines can be built that call multiple services to pull data from a source, transform it, and store it in a sink.
- Datasets- Pipelines use datasets as sources and sinks, so before building a pipeline you specify a connection to the particular data in a datastore that you want to read from or write to.
- Data Flows- These let you do the actual processing or transformation of data within Data Factory itself, instead of calling a different service to do the heavy lifting.
- Power Query- This lets you do data exploration with the M formula language inside Data Factory, which is otherwise generally only possible in Power BI or Excel.
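As a hedged sketch of how these pieces can be stood up from the Azure CLI: the datafactory extension can create a factory and deploy a pipeline from a JSON definition. The resource group, factory, and pipeline names below are placeholders, and pipeline.json is assumed to contain a valid pipeline definition (for example, a single Copy activity from a source dataset to a sink dataset):

```bash
# One-time: install the Data Factory extension for the Azure CLI.
az extension add --name datafactory

# Create a Data Factory instance (all names are placeholders).
az datafactory create \
  --resource-group my-rg \
  --factory-name my-data-factory \
  --location westeurope

# Deploy a pipeline from a local JSON definition, e.g. one Copy activity.
az datafactory pipeline create \
  --resource-group my-rg \
  --factory-name my-data-factory \
  --name CopyMelbData \
  --pipeline @pipeline.json
```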
Azure Synapse Spark Pools
We can also run notebooks from either Azure Data Factory or the integration engine in Azure Synapse, and thus get access to these services automatically. Besides, we can add a Synapse Spark pool as a so-called linked service in the Azure Machine Learning workspace, which lets us use both the ML compute targets and the Spark pool as compute targets via the Azure Machine Learning SDK; this offers another good option for building a clean end-to-end MLOps workflow.
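If you want to provision a Spark pool without the portal, the Azure CLI offers a route as well. This is a sketch with placeholder names; adjust the Spark version and node size to what your subscription supports:

```bash
# Create a small Spark pool in an existing Synapse workspace (names are placeholders).
az synapse spark pool create \
  --name mysparkpool \
  --workspace-name my-synapse-workspace \
  --resource-group my-rg \
  --spark-version 3.3 \
  --node-count 3 \
  --node-size Small
```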
Copying Data to Blob Storage
- Firstly, download the melb_data.csv file from https://www.kaggle.com/dansbecker/melbourne-housing-snapshot, and store it in a suitable folder on your device.
- Now, navigate to that folder and run the following command in the CLI, replacing the storage account name with your own:
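A minimal sketch of the upload (the storage account name is a placeholder; it assumes the mlfiles container already exists and that you are signed in with az login):

```bash
# Upload melb_data.csv from the current folder to the mlfiles container as a blob.
az storage blob upload \
  --account-name <your-storage-account> \
  --container-name mlfiles \
  --name melb_data.csv \
  --file melb_data.csv \
  --auth-mode login
```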
- To verify this, install Azure Storage Explorer and log in to your Azure account in that application; then navigate to your storage account and open the mlfiles container. You will see your file where it should be. You can also just drag and drop a file into the container here, which creates a blob automatically.
- Finally, explore the application itself. For example, when you right-click on the container, you can choose Get Shared Access Signature, which opens a wizard that lets you create a SAS token directly there instead of using the command line (a CLI equivalent is sketched below).
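For comparison, the command-line route looks like the following; the account name and expiry are placeholders, and the token is scoped to read-only access on the container:

```bash
# Generate a read-only SAS token for the mlfiles container (user delegation SAS).
az storage container generate-sas \
  --account-name <your-storage-account> \
  --name mlfiles \
  --permissions r \
  --expiry 2025-12-31T23:59Z \
  --auth-mode login \
  --as-user \
  --output tsv
```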