Ingesting Data Into Azure (Part 1 of 2)
Understanding Tooling for the Manual Ingestion of Data

The following options let you bring data into your datastores or directly into your ML pipelines:
  • Azure Storage Explorer- An interactive application that lets you upload data to, and manage, datastores such as storage accounts and managed disks. It is the easiest tool for managing storage accounts and can be found here- https://azure.microsoft.com/en-us/features/storage-explorer/#overview.

  • Azure CLI- The Azure CLI can handle almost any task, including creating storage accounts and uploading blobs into them. The commands for uploading blobs can be found here- https://docs.microsoft.com/en-us/cli/azure/storage/blob. A minimal upload example is sketched after this list.

  • AzCopy- A command-line utility designed specifically for copying blobs or files to a storage account; in everyday use it feels much like the Azure CLI, though it is purpose-built and optimized for bulk copy operations. Its download link and documentation are here- https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-v10. A second sketch after this list shows the equivalent AzCopy command.

  • The Azure Portal- The Azure portal itself provides a web interface for uploading and changing data. If you navigate to a storage account, its built-in storage browser can be used to upload blobs and files directly from the browser; the database services offer similar web-based tooling.

  • RDBMS Management Tooling- Any typical management tool can be used to configure, create, and change tables and schemas in a relational database. For Azure SQL Database and Synapse, that is SQL Server Management Studio; for PostgreSQL, it's pgAdmin; and for MySQL, it would be MySQL Workbench.

  • Azure Data Studio- It can connect to any Microsoft SQL database, Synapse, a PostgreSQL database in Azure, and Azure Data Explorer. It's a multi-platform alternative to the typical management tooling and can be downloaded here- https://docs.microsoft.com/en-us/sql/azure-data-studio/download-azure-data-studio?view=sql-server-ver15.

  • Azure Machine Learning Designer (Import Data)- You can also use the Import Data component in the Machine Learning designer to add data to your pipeline ad hoc, instead of going through an Azure Machine Learning datastore. Its documentation can be found here- https://docs.microsoft.com/en-us/azure/machine-learning/component-reference/import-data.
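
To make the Azure CLI option concrete, here is a minimal sketch of uploading a local file as a blob. The account name, container name, blob name, and file path are all hypothetical placeholders, and it assumes you have already run az login and have permission to write to the account:

    # Upload a local CSV file as a blob (all names below are placeholders)
    az storage blob upload \
        --account-name mystorageaccount \
        --container-name training-data \
        --name raw/data.csv \
        --file ./data.csv \
        --auth-mode login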

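And here is a rough AzCopy equivalent, again with placeholder names; the SAS token is deliberately elided and would come from your own storage account:

    # Copy a single local file into a container (SAS token elided)
    azcopy copy "./data.csv" "https://mystorageaccount.blob.core.windows.net/training-data/data.csv?<SAS-token>"

    # Copy an entire local folder in one call
    azcopy copy "./dataset" "https://mystorageaccount.blob.core.windows.net/training-data?<SAS-token>" --recursive
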



To read part 2, please click here