Performing Data Analysis & Visualization (Part 1 of 3)

By Ashwin Venugopal - February 27, 2023

To read part 2, please click here

To read part 3, please click here

Technical Requirements

The following Python libraries and versions will be used to perform data pre-processing as well as high-dimensional visualizations:

azureml-sdk 1.34.0
azureml-widgets 1.34.0
azureml-dataprep 2.20.0
pandas 1.3.2
numpy 1.19.5
scikit-learn 0.24.2
seaborn 0.11.2
plotly 5.3.1
umap_learn 0.5.1
statsmodels 0.13.0
missingno 0.5.0
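
Before running the examples, it can help to confirm that the installed packages match the versions listed above. The snippet below is a minimal sketch of such a check using only the Python standard library. The PINNED dictionary and the check_versions helper are illustrative names introduced here, not part of any of the listed libraries, and the package names are assumed to be the PyPI distribution names (for example umap-learn rather than umap_learn).

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above; keys are assumed PyPI distribution names.
PINNED = {
    "azureml-sdk": "1.34.0",
    "azureml-widgets": "1.34.0",
    "azureml-dataprep": "2.20.0",
    "pandas": "1.3.2",
    "numpy": "1.19.5",
    "scikit-learn": "0.24.2",
    "seaborn": "0.11.2",
    "plotly": "5.3.1",
    "umap-learn": "0.5.1",
    "statsmodels": "0.13.0",
    "missingno": "0.5.0",
}


def check_versions(pinned):
    """Return {package: (expected, installed)} for every package that differs.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_versions(PINNED).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

An empty result means the environment matches the list. Newer versions may also work, but that is not something this check can tell you.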

Understanding Data Exploration Techniques

Data exploration is an important analytical step for checking whether your data is at least informative enough to build an ML model. The tasks we can perform depend on the type of dataset in which the data is stored:

TabularDataset: This class provides methods for performing basic transformations on tabular data and converting it into known formats, such as pandas (https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.tabulardataset).

FileDataset: This class primarily offers filtering methods on file metadata (https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.filedataset).

Both of the above datasets can be registered to the Azure Machine Learning Dataset Registry for further use after preprocessing.
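
The two dataset types and the registration step can be sketched with the azureml-core SDK (v1) as follows. This is an illustration under several assumptions rather than a definitive recipe: it assumes a workspace config.json is available locally for Workspace.from_config(), and the function names, csv_url, file_glob and dataset_name are hypothetical placeholders you would replace with your own values. The SDK imports sit inside the functions so the file can be loaded even where the SDK is not installed.

```python
def load_and_register_tabular(csv_url, dataset_name):
    """Create a TabularDataset from a delimited file, preview it, and register it.

    `csv_url` and `dataset_name` are hypothetical placeholders.
    """
    # Imported lazily so this module loads without the Azure ML SDK installed.
    from azureml.core import Dataset, Workspace

    # Assumes a config.json describing the workspace is present locally.
    ws = Workspace.from_config()

    # Build a TabularDataset from a CSV file reachable at csv_url.
    tabular = Dataset.Tabular.from_delimited_files(path=csv_url)

    # Convert to pandas for exploration.
    df = tabular.to_pandas_dataframe()

    # Register the dataset in the workspace for later reuse.
    registered = tabular.register(
        workspace=ws,
        name=dataset_name,
        create_new_version=True,
    )
    return df, registered


def list_file_dataset_paths(file_glob):
    """Create a FileDataset from files matching `file_glob` and list their paths.

    `file_glob` is a hypothetical placeholder, e.g. a URL or datastore path.
    """
    from azureml.core import Dataset

    files = Dataset.File.from_files(path=file_glob)
    return files.to_path()
```

Calling load_and_register_tabular returns a pandas DataFrame you can explore right away, together with the registered dataset object. Both functions need network access and valid Azure credentials to run.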

