Posts

Showing posts from February, 2023

Performing Data Analysis & Visualization (Part 2 of 3)

To read part 1, please click here. To read part 3, please click here.

Exploring & Analyzing Tabular Datasets

Tabular datasets let us use the full spectrum of mathematical and statistical functions to analyze as well as transform our dataset. However, we generally don't have the time or resources to run every dataset through all the possible techniques in our arsenal. Hence, in order to get a good understanding of a dataset, we can start by checking the following aspects of every feature and target vector in the dataset:

Data type- Lets us know whether the content of the vector is continuous, ordinal, nominal, or a text string, whether it is stored in the correct programmatic data type, or whether it requires a data type conversion.

Missing data- Are there any missing values? How do we handle them?

Inconsistent data- Are dates and times stored in different ways? Are the same categories written in different ways? Are there any different categories with the same…
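A quick first pass over these checks can be scripted with pandas. The following is a minimal sketch, not taken from the original post; the file name and column names are hypothetical:

    import pandas as pd

    # Hypothetical input file; substitute your own dataset.
    df = pd.read_csv("training_data.csv")

    # Data types: check whether each column uses a sensible dtype.
    print(df.dtypes)

    # Missing data: count missing values per column.
    print(df.isna().sum())

    # Inconsistent data: inspect the distinct spellings of a categorical column.
    print(df["category"].value_counts())

    # Example fix: convert a date column stored as text into a proper datetime.
    df["date"] = pd.to_datetime(df["date"], errors="coerce")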

Performing Data Analysis & Visualization (Part 1 of 3)

To read part 2, please click here. To read part 3, please click here.

Technical Requirements

The following Python libraries and versions will be used to perform data pre-processing as well as high-dimensional visualizations: azureml-sdk 1.34.0, azureml-widgets 1.34.0, azureml-dataprep 2.20.0, pandas 1.3.2, numpy 1.19.5, scikit-learn 0.24.2, seaborn 0.11.2, plotly 5.3.1, umap-learn 0.5.1, statsmodels 0.13.0, and missingno 0.5.0.

Understanding Data Exploration Techniques

Data exploration is an important analytical step to understand whether your data is at the very least informative enough to build an ML model. The possible tasks we will perform all relate to the different types of datasets (where we can save our data) given below:

TabularDataset- This class provides methods for performing basic transformations on tabular data and converting it into known formats, like pandas (https://docs.microsoft.com/en-us/python/api/azureml-core/azurem…
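To make the TabularDataset-to-pandas hand-off concrete, here is a minimal sketch using azureml-core; the registered dataset name "titanic" is a placeholder, not from the original post:

    from azureml.core import Workspace, Dataset

    # Assumes a config.json downloaded from the Azure ML workspace.
    ws = Workspace.from_config()

    # "titanic" is a hypothetical registered dataset name.
    dataset = Dataset.get_by_name(ws, name="titanic")

    # Convert the TabularDataset into a pandas DataFrame for exploration.
    df = dataset.to_pandas_dataframe()
    print(df.head())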

Using Datasets in Azure Machine Learning

Creating New Datasets

Although there are multiple ways to create new datasets, they are mainly differentiated between tabular and file datasets, which have different constructors according to the type of dataset you would like to create:

Dataset.Tabular.from_* for tabular datasets

Dataset.File.from_* for file-based datasets

A tabular dataset can be further divided into either a direct dataset, where the data is accessed from its original location via a public URL, or one stored on either the default or a custom datastore. A Dataset object can be accessed or passed around in the current environment through its object reference, but it can also be registered under a name and then accessed through that name; this is called a registered dataset.

Exploring Data in Datasets

There are many ways to explore registered datasets in Azure ML. For tabular ones, a dataset can be loaded and analyzed programmatically in an Azure Machine Learning workspace, and after obtaining a reference to the dataset, it can…
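As a minimal sketch of both steps, creating a direct tabular dataset and registering it might look like this (the URL and dataset name are placeholders, not from the original post):

    from azureml.core import Workspace, Dataset

    ws = Workspace.from_config()

    # Create a direct tabular dataset from a public URL (placeholder URL).
    dataset = Dataset.Tabular.from_delimited_files(
        path="https://example.com/data/train.csv"
    )

    # Register it under a name so it becomes a registered dataset.
    registered = dataset.register(
        workspace=ws,
        name="train-data",
        description="Training data ingested from a public URL",
        create_new_version=True,
    )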

Ingesting Data Into Azure (Part 2 of 2)

To read part 1, please click here.

Understanding Tooling for Automated Ingestion & Transformation of Data

There are some services that can help us automatically transform as well as move data, and they also integrate easily with pipelines and MLOps in Azure Machine Learning.

Azure Data Factory

It is an enterprise-ready solution for moving and transforming data in Azure that also allows you to connect to hundreds of different sources and create pipelines to transform the integrated data, calling multiple other services in Azure. It can help you create pipelines, data flows, datasets, and Power Query activities:

Pipelines- They are the main attraction of Azure Data Factory. Complex pipelines can be created by calling multiple services to pull data from a source, transform it, and store it in a sink.

Datasets- As they are used in pipelines as a source or a sink, you have to specify a connection to the particular data in a datastore that you want to read from or write to in the end be…
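To make the source/sink vocabulary concrete, here is a heavily hedged sketch of creating a copy pipeline with the azure-mgmt-datafactory Python SDK; all resource names are placeholders, and the two referenced datasets are assumed to already exist in the factory:

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import (
        BlobSink, BlobSource, CopyActivity, DatasetReference, PipelineResource,
    )

    # Subscription, resource group, and factory names are placeholders.
    adf_client = DataFactoryManagementClient(
        DefaultAzureCredential(), "<subscription-id>"
    )

    # A Copy activity pulls from a source dataset and writes to a sink dataset;
    # "InputBlobDataset" and "OutputBlobDataset" are assumed to exist already.
    copy_activity = CopyActivity(
        name="CopyRawToCurated",
        inputs=[DatasetReference(reference_name="InputBlobDataset")],
        outputs=[DatasetReference(reference_name="OutputBlobDataset")],
        source=BlobSource(),
        sink=BlobSink(),
    )

    pipeline = PipelineResource(activities=[copy_activity])
    adf_client.pipelines.create_or_update(
        "<resource-group>", "<factory-name>", "CopyPipeline", pipeline
    )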

Ingesting Data Into Azure (Part 1 of 2)

To read part 2, please click here.

Understanding Tooling for the Manual Ingestion of Data

This is the list of options for bringing data into your datastores or directly into your ML pipelines:

Azure Storage Explorer- It's an interactive application that lets you upload data to and manage datastores, like storage accounts and managed disks. It is the easiest tool for managing storage accounts and can be found here: https://azure.microsoft.com/en-us/features/storage-explorer/#overview.

Azure CLI- We can do almost anything with the help of the CLI, including creating and uploading blobs into a storage account. The relevant commands for uploading blobs can be found here: https://docs.microsoft.com/en-us/cli/azure/storage/blob.

AzCopy- This one is also designed to copy blobs or files to a storage account, and it is not much different from the Azure CLI in performance. Its download link and description are here: https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-v10. Th…
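Besides these tools, the same upload can also be done programmatically. Here is a minimal sketch with the azure-storage-blob Python package; the connection string, container, and paths are placeholders, and this example is an addition, not from the original post:

    from azure.storage.blob import BlobServiceClient

    # Connection string and names are placeholders for illustration.
    service = BlobServiceClient.from_connection_string("<connection-string>")
    container = service.get_container_client("training-data")

    # Upload a local file as a blob, overwriting any existing blob of that name.
    with open("train.csv", "rb") as data:
        container.upload_blob(name="raw/train.csv", data=data, overwrite=True)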

Ingesting Data & Managing Datasets (Part 2 of 2)

To read part 1, please click here.

Exploring Options for Storing Training Data in Azure

Database systems can be classified according to the type of data and the data access pattern into the following types:

Relational Database Management Systems (RDBMSs)- They are generally used to store normalized transactional data via B-tree-based ordered indices; typical queries filter, group, and aggregate data and join multiple rows with multiple columns. Azure supports various RDBMSs, such as Azure SQL Database, as well as Azure Database for PostgreSQL and MySQL.

NoSQL- These are key-value-based storage systems used for storing de-normalized data with hash-based or ordered indices. Typical queries access a single record via a collection distributed according to a partition key. Azure supports various NoSQL-based services, like Azure Cosmos DB and Azure Table storage.

Hence, both database technologies can be used to store data for machine learning, according to your use case.
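For the RDBMS case, pulling training data into pandas could look like the following sketch; the ODBC driver, server, table, and credentials are all placeholders, and the example is not from the original post:

    import pandas as pd
    import pyodbc

    # Connection details are placeholders; the driver name depends on what
    # is installed locally.
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=<server>.database.windows.net;"
        "DATABASE=<database>;UID=<user>;PWD=<password>"
    )

    # A filter/group/aggregate query, the access pattern RDBMSs are built for.
    df = pd.read_sql(
        "SELECT label, AVG(feature_1) AS mean_f1 "
        "FROM training_samples GROUP BY label",
        conn,
    )
    print(df)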

Ingesting Data & Managing Datasets (Part 1 of 2)

To read part 2, please click here.

Choosing Data Storage Solutions for Azure Machine Learning

If you want to start training an ML model on remote compute targets such as VMs, you have to ensure that all the executables can access the training data efficiently; all the more so when people want to access the data in parallel for experimentation, labeling, and training from multiple environments as well as multiple machines. In order to achieve this, we have to manage the data efficiently for the different types of usage in Azure.

Organizing Data in Azure Machine Learning

In Azure Machine Learning, data is managed as datasets and data storage as datastores. A datastore is an abstraction of a physical data storage system that is used to link an existing storage system to an Azure Machine Learning workspace. You have to provide the connection as well as authentication details to connect the existing storage to the workspace by creating a datastore, after which the data storage can be…
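A minimal sketch of that linking step with azureml-core, assuming an existing blob container (all account, container, and key values are placeholders):

    from azureml.core import Datastore, Workspace

    ws = Workspace.from_config()

    # Link an existing blob container to the workspace as a datastore.
    datastore = Datastore.register_azure_blob_container(
        workspace=ws,
        datastore_name="training_datastore",
        container_name="training-data",
        account_name="<storage-account-name>",
        account_key="<storage-account-key>",
    )

    # Once registered, the datastore can be retrieved by name anywhere.
    same_store = Datastore.get(ws, "training_datastore")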

Understanding the Workspace Interior

User Roles

For the ML workspace, we can assign an identity to one of the Azure predefined base roles (Owner, Contributor, or Reader), along with the two custom roles AzureML Data Scientist and AzureML Metrics Writer. These are described as follows:

Reader- Although this role allows you to look at everything, it cannot change any data or perform any action that could change the state of the resource.

Contributor- This one lets you view as well as change everything except the user roles and rights on the resource.

Owner- This role permits you to perform any action on a specific resource.

AzureML Data Scientist- This one can perform all actions within the workspace except creating or deleting compute resources and modifying the workspace settings.

AzureML Metrics Writer- It can only write metrics to the workspace.

Experiments

The main function of ML is to find a mathematical function, which would be hard to find algorithmically, that, when given a specific input, produces the expected output in as many cases as possible. This function is ty…
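To show the Experiments concept in code, here is a minimal sketch with azureml-core; the experiment name and metric value are hypothetical:

    from azureml.core import Experiment, Workspace

    ws = Workspace.from_config()

    # Group related runs under a named experiment in the workspace.
    experiment = Experiment(workspace=ws, name="demo-experiment")

    # Start an interactive run, log a metric, and complete the run.
    run = experiment.start_logging()
    run.log("accuracy", 0.91)  # hypothetical metric value
    run.complete()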

Exploring the Azure Machine Learning service

Analyzing the Deployed Services

There are three services deployed along with the main mldemows workspace: a Storage account, a Key vault, and Application Insights. An Azure container registry will also be required later on, but it doesn't need to be there during the initial deployment of the workspace. Now, let's have a look at them:

The Storage Account for an ML Workspace

The storage account is generally known as the default storage account, and it's the main datastore for the workspace. It is vital for the operation of the service, as it stores experiment runs, models, snapshots, and source files (such as Jupyter notebooks).

Azure Key Vault for an ML Workspace

It is a cloud-managed service that can store secrets like passwords, API keys, certificates, and cryptographic keys, either in a software vault or in a managed Hardware Security Module (HSM). You can easily access the key vault via a so-called managed identity, which gives the workspace (the app) it…
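Reading a secret from such a key vault could look like this sketch using the azure-identity and azure-keyvault-secrets packages; the vault URL and secret name are placeholders, not from the original post:

    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient

    # DefaultAzureCredential picks up a managed identity when running inside
    # Azure, or your CLI login when running locally.
    client = SecretClient(
        vault_url="https://<vault-name>.vault.azure.net",
        credential=DefaultAzureCredential(),
    )

    secret = client.get_secret("storage-account-key")
    print(secret.name)  # avoid printing secret.value in real code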

Preparing the Azure Machine Learning Workspace (Part 2 of 2)

To read part 1, please click here.

Deploying the Workspace

We can easily set up our first Azure Machine Learning workspace with the help of the CLI, as follows:

1. Log in to your Azure environment via the CLI with $ az login, which will open a website with an AAD login screen; when you return to the console, you will see some information about your AAD tenant, your subscription, and your user. If you want to check which subscription is active (in case you have more than one subscription), you can use the $ az account show --output table command.

2. After that, check the state of the installed extensions via the $ az extension list command. You should remove any old version carefully, so that you won't break a script that is still in use. To do this, you can use the $ az extension remove -n azure-cli-ml or $ az extension remove -n ml commands.

3. Now, you can install the ML extension via the $ az extension add -n ml command. You can look at the help page for the e…
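As a Python SDK alternative to the CLI route, the same workspace can be created with azureml-core. This sketch is an addition, not from the original post; the subscription ID, resource group, and region are placeholders, and "mldemows" is the workspace name used elsewhere in these posts:

    from azureml.core import Workspace

    # Create the workspace, or fetch it if it already exists (exist_ok=True).
    ws = Workspace.create(
        name="mldemows",
        subscription_id="<subscription-id>",
        resource_group="mldemo-rg",
        location="westeurope",
        create_resource_group=True,
        exist_ok=True,
    )

    # Persist connection details for later Workspace.from_config() calls.
    ws.write_config()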

Preparing the Azure Machine Learning Workspace (Part 1 of 2)

To read part 2, please click here.

Deploying an Azure Machine Learning Workspace

First, we will require an Azure subscription, so that you can log in to the Azure portal with your identity, knowing the Azure subscription to which you would like to deploy your ML services. Now, if you want to use your work account, go to portal.azure.com to log in. If it works, it means that your company has already set up an Azure AD instance. After this, you can talk to your Azure Global Administrator to discuss which Azure subscription to use for your purpose. However, if you want to use your private account, go to azure.com and click on Free Account to create an Azure AD for yourself with a free trial subscription containing a certain amount of money to spend within 30 days on Azure services.

Understanding the Available Tooling for Azure Deployments

In Azure, any action that deploys or changes an Azure service goes through ARM (Azur…

Choosing the Right Machine Learning Service in Azure (Part 4 of 4)

To read part 1, please click here. To read part 2, please click here. To read part 3, please click here.

Custom Compute Services for ML

As we all know, there are always some people who would like to have the highest flexibility for building custom applications using only IaaS services (which build the foundation for any other PaaS service in Azure); we can rely on the custom compute services in Azure to achieve flexible ML solutions.

Azure Databricks

It's a managed service in Azure and, as the name suggests, offers the Databricks platform as a completely integrated solution: a user can easily deploy it from Azure Marketplace, letting Azure administrators treat it like any other Microsoft-managed service on the platform. It's named after the company Databricks, founded by the original creators of Spark, which provides this ever-evolving open-source technology as a ready-made product to customers. The platform itself is a big data analytics platform utilizing Apach…

Choosing the Right Machine Learning Service in Azure (Part 3 of 4)

To read part 1, please click here. To read part 2, please click here. To read part 4, please click here.

Custom ML Services

Since platform services are built on top of IaaS services and contain useful abstractions and functionalities for the relevant domain, Azure makes sure to provide as many PaaS services for different specialized domains as possible. The ML domain is likewise covered by various services for building custom ML models. Now we will discuss some of the most popular custom ML PaaS services.

Azure Machine Learning Studio (Classic)

It's Azure's most widely used tool to build, train, optimize, and deploy ML models via a GUI with a drag-and-drop, block-based programming model. It offers a large, robust set of features, algorithms, and extensions through R and Python support, and it is one of the oldest managed cloud services for ML in Azure. It also provides built-in building blocks for clustering, regression, classification, anomaly detect…

Choosing the Right Machine Learning Service in Azure (Part 2 of 4)

To read part 1, please click here. To read part 3, please click here. To read part 4, please click here.

Azure Cognitive Services

Azure Cognitive Services are the most popular choice, as they are easy to use and can be integrated with a single REST API call from within any programming language, adding ML capabilities to existing applications. Some of the popular Cognitive Services are:

Vision- Computer Vision and Face API

Language- Text analytics and translator service

Speech- Speech-to-text, text-to-speech, and speech translation

Decision- Anomaly detection and content moderation

Most of the Cognitive Services APIs work similarly: first, you deploy a specific Cognitive Service or a Cognitive Services multi-service account in Azure; after that, you can retrieve the API endpoint and access key from the service and call the Cognitive Services API with your data and API key. This process will enrich an existing application with API capabil…
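As a sketch of that endpoint-plus-key pattern, here is a call to the Text Analytics v3.0 sentiment API using requests; the endpoint, key, and input text are placeholders, and the example is an addition, not from the original post:

    import requests

    # Endpoint and key come from your deployed Cognitive Services resource.
    endpoint = "https://<resource-name>.cognitiveservices.azure.com"
    key = "<access-key>"

    payload = {
        "documents": [
            {"id": "1", "language": "en", "text": "Azure makes this easy."}
        ]
    }

    # The access key is passed in the Ocp-Apim-Subscription-Key header.
    response = requests.post(
        endpoint + "/text/analytics/v3.0/sentiment",
        headers={"Ocp-Apim-Subscription-Key": key},
        json=payload,
    )
    print(response.json())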

Choosing the Right Machine Learning Service in Azure (Part 1 of 4)

To read part 2, please click here. To read part 3, please click here. To read part 4, please click here.

Choosing an Azure Service for ML

Azure offers a vast number of services, which often makes it difficult for someone new to Azure to select the right one for a specific task. Choosing the right service with the right layer of abstraction could save you months, if not years, of time to market for your ML-based product or feature, while the wrong service may initially allow you to start producing results quickly but make it impossible to improve the model performance for a specific domain or to extend the model for other tasks.

What is the Azure Machine Learning Service?

The term Azure Machine Learning Service refers to the popular Azure service that offers capabilities for building custom ML solutions and contains various components to manage resources (like compute clusters and data storage) and assets (like datasets, experiments, models, pipelines, etc.), as well as access to these…

Understanding End-to-End Machine Learning Process (Part 5 of 5)

To read part 1, please click here. To read part 2, please click here. To read part 3, please click here. To read part 4, please click here.

Deploying Models

This step is also known as inferencing or scoring a model. You can clearly see the deployment and operation of an ML pipeline when the model is tested on live data in production, which is generally done to get deeper insights and data to improve the model continuously. So, if you collect the model's performance over time, it supports the continuous improvement of the model. There are two main architectures for ML-scoring pipelines:

Batch scoring using pipelines- It's an offline process in which you evaluate an ML model against a batch of data. The result of this scoring is usually not time-critical, and the data to be scored is usually larger than the model.

Real-time scoring using a container-based web service endpoint- It's a technique in which you score single data inputs (common in stream processing for scorin…
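For the real-time case, Azure ML web service endpoints expect an entry script exposing an init() and a run() function. A minimal sketch follows; the model name, serialization format, and payload shape are assumptions, not from the original post:

    # score.py: entry script for an Azure ML web service endpoint.
    import json

    import joblib
    from azureml.core.model import Model

    def init():
        # Called once when the container starts; load the model into memory.
        global model
        model_path = Model.get_model_path("demo-model")  # hypothetical name
        model = joblib.load(model_path)

    def run(raw_data):
        # Called per request; parse the JSON payload and return predictions.
        data = json.loads(raw_data)["data"]
        predictions = model.predict(data)
        return predictions.tolist()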

Understanding End-to-End Machine Learning Process (Part 4 of 5)

To read part 1, please click here. To read part 2, please click here. To read part 3, please click here. To read part 5, please click here.

Defining Labels & Engineering Features

Now, we have to create and transform features, typically referred to as feature engineering, and create labels where they are missing.

Labeling

It is also known as annotation, and although it's the least exciting part of an ML project, it's the most important one in the whole process. Labeling data requires deep insight and an understanding of the context of the dataset as well as the prediction process. Proper labels greatly help in improving prediction performance and also help in studying the dataset deeply. Mislabeling might lead to label noise that can affect the performance of every downstream process in the ML pipeline, so it should be avoided. Some other techniques and tooling are also available to make the labeling process faster, due to the fact that ML algorithms can be used both for th…