Choosing the Right Machine Learning Service in Azure (Part 4 of 4)

 




To read part 1, please click here
To read part 2, please click here
To read part 3, please click here







Custom Compute Services for ML

Some teams want maximum flexibility and prefer to build custom applications directly on IaaS services, which form the foundation for every PaaS service in Azure. For them, the custom compute services in Azure offer a route to flexible ML solutions.

Azure Databricks

Azure Databricks is a managed service that, as the name suggests, offers the Databricks platform as a fully integrated Azure solution. You can deploy it from the Azure Marketplace, and Azure administrators can treat it like any other Microsoft-managed service on the platform.

The service is named after Databricks, the company founded by the original creators of Apache Spark, which packages this fast-evolving open source technology as a ready-made product. The platform itself is a big data analytics platform built on Apache Spark, so let's look at Spark first:

Distributed Computing Using Apache Spark

Apache Spark is a distributed in-memory analytics engine that is compatible with the Apache Hadoop ecosystem. It distributes a graph of computations to the cluster's worker nodes, which are controlled and orchestrated by a primary node that handles scheduling, resource availability, and the wiring up of data streams.
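To make the idea of a computation graph spread over worker nodes more concrete, here is a minimal sketch in plain Python. It is not the Spark API: the `LazyDataset` class, its method names, and the use of a thread pool as a stand-in for worker nodes are all assumptions made for illustration. Transformations are only recorded until `collect()` is called, at which point every partition runs the same recorded steps independently.

```python
from concurrent.futures import ThreadPoolExecutor


class LazyDataset:
    """Toy stand-in for a distributed dataset (hypothetical, not the Spark API)."""

    def __init__(self, partitions, steps=()):
        self.partitions = partitions  # list of lists, one list per "worker"
        self.steps = steps            # recorded (kind, function) pairs

    def map(self, fn):
        # Record the step instead of running it; returns a new dataset.
        return LazyDataset(self.partitions, self.steps + (("map", fn),))

    def filter(self, fn):
        return LazyDataset(self.partitions, self.steps + (("filter", fn),))

    def _run_partition(self, partition):
        # Each worker applies the whole recorded graph to its own partition.
        rows = partition
        for kind, fn in self.steps:
            if kind == "map":
                rows = [fn(row) for row in rows]
            else:
                rows = [row for row in rows if fn(row)]
        return rows

    def collect(self):
        # The "primary node": dispatch one task per partition, then merge.
        workers = max(1, len(self.partitions))
        with ThreadPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(self._run_partition, self.partitions))
        return [row for part in results for row in part]


if __name__ == "__main__":
    data = LazyDataset([[1, 2, 3], [4, 5, 6]])
    evens = data.map(lambda x: x * x).filter(lambda x: x % 2 == 0)
    print(evens.collect())
```

Nothing is computed when `map` and `filter` are called; the work happens only in `collect()`. Real Spark adds scheduling, fault tolerance, and data shuffling on top of this basic pattern.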

For storage, Spark supports options ranging from standard local storage and the Hadoop Distributed File System (HDFS) to Azure Data Lake and Amazon S3, and it can also read directly from RDBMSes and from document stores in NoSQL systems.

To define and dispatch jobs and computational graphs, you can use Scala, Python, or R, all executed via Apache Spark. Spark also ships with built-in libraries for data access and manipulation (Spark SQL) and for distributed computations (Spark Streaming, MLlib, and GraphX).

Azure Databricks is a good choice when migrating on-premises Spark-based services to Azure, or when building big data analytics, transformation, or recommendation services. However, its complexity and premium price generally make it a poor choice for ML projects.

Azure Batch

Azure Batch is a mature and flexible batch-processing and scheduling framework for running massively parallel workloads in Azure. It lets you specify custom applications and jobs that are scheduled and executed on a pool of VMs. If you want to build your own custom ML service, Azure Batch is a strong choice, since it also serves as the foundation for Azure Machine Learning training clusters.

Because Azure Batch targets embarrassingly parallel workloads, it is less flexible than Azure Databricks but also less complicated for end users. Typical uses include 3D rendering, video and image processing, compute-intensive simulations, and general batch computations such as computing recommendation results or batch-scoring ML models.
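An embarrassingly parallel workload is one where each task needs nothing from the others, which is why batch scoring fits so well. The sketch below illustrates that property in plain Python and does not use the Azure Batch SDK. The `score` function is a hypothetical stand-in for a trained model, and the thread pool stands in for a pool of VMs.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical model weights, used only for illustration.
WEIGHTS = {"age": 0.02, "visits": 0.1}


def score(record):
    """Stand-in for a trained model: a fixed linear scorer."""
    return sum(weight * record[name] for name, weight in WEIGHTS.items())


def batch_score(records, pool_size=4):
    """Score each record independently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        return list(pool.map(score, records))


if __name__ == "__main__":
    customers = [
        {"age": 50, "visits": 10},
        {"age": 25, "visits": 0},
    ]
    print(batch_score(customers))
```

Because no task waits on another, throughput scales roughly with the size of the pool, and a failed task can be retried without touching the rest.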

It also supports specialized compute instances, such as memory-optimized and GPU-enabled VMs, along with multi-instance workloads that use the Message Passing Interface (MPI) and Remote Direct Memory Access (RDMA).

If you want to build a custom ML solution without the comfort and flexibility of Azure Machine Learning, Azure Batch is a great choice.

Data Science VMs

If you want a VM to serve as your cloud-based ML workstation, whether to take advantage of flexible cloud compute, run ML experiments, or perform on-demand GPU-accelerated training, choose a Data Science VM (DSVM) instead of a standard VM.

A DSVM is a pre-built, pre-configured VM optimized for data science and ML applications. It comes with popular ML libraries and tools, including CUDA and cuDNN, CRAN-R, Julia, Python, and Jupyter.
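On any VM, a quick way to see which of these tools are actually available is to look for their executables on the `PATH`. The following is a minimal sketch using only the Python standard library. The executable names in `TOOLS` are assumptions and may differ between images and operating systems, so adjust them to your environment.

```python
import shutil

# Assumed executable names; these are illustrative and may vary by image.
TOOLS = ["python3", "jupyter", "R", "julia", "nvcc"]


def check_tools(tools=TOOLS):
    """Return a mapping of tool name -> True if it is found on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


if __name__ == "__main__":
    for tool, found in check_tools().items():
        status = "found" if found else "missing"
        print(f"{tool:10s} {status}")
```

The check only confirms that an executable exists. It says nothing about versions or whether a GPU driver is working.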

In short, whenever you need a low-maintenance VM with popular ML tools pre-installed and pre-configured, a DSVM should be your service of choice, and it makes a great alternative ML experimentation environment.


























