Choosing the Right Machine Learning Service in Azure (Part 4 of 4)
Custom Compute Services for ML
Azure Databricks
Azure Databricks takes its name from Databricks, the company founded by the original creators of Apache Spark, which offers this fast-moving open source technology to customers as a ready-made product. The platform itself is a big data analytics platform built on Apache Spark. Let's look into this:
Distributed Computing using Apache Spark
For storage, Spark supports various options ranging from standard local storage and the Hadoop Distributed File System (HDFS) to Azure Data Lake and Amazon S3, while also providing direct access to relational databases (RDBMS) and to documents from NoSQL systems.
Finally, to define and dispatch jobs and computational graphs, you can use programming languages such as Scala, Python, and R, all executed via Apache Spark. Spark also provides built-in libraries for data access and manipulation (Spark SQL) as well as for distributed computations (Spark Streaming, MLlib, and GraphX).
Although Azure Databricks is considered a good choice when migrating on-premises Spark-based services to Azure, or when building big data analytics, transformation, or recommendation services, its complexity and premium price generally make it a poor choice for ML projects.
Azure Batch
Azure Batch is generally used for embarrassingly parallel workloads, which makes it less flexible than Azure Databricks but also less complicated for end users. It can be used for 3D rendering, video and image processing, compute-intensive simulations, or general batch computations such as computing recommendation results or batch-scoring ML models.
It also supports various exotic compute instances, such as memory-optimized and GPU-enabled VMs, along with multi-instance workloads using the Message Passing Interface (MPI) and Remote Direct Memory Access (RDMA).
If you want to build a custom ML solution without the comfort and flexibility of Azure Machine Learning, Azure Batch is a great choice.
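To make the term "embarrassingly parallel" concrete, here is a minimal local sketch of batch scoring in Python. It does not use Azure Batch or any Azure SDK: a thread pool stands in for the separate compute nodes, and `score`, `score_chunk`, `split`, and `batch_score` are hypothetical names invented for this illustration. The fixed linear scorer is only a placeholder for a real trained model.

```python
from concurrent.futures import ThreadPoolExecutor


def score(features):
    """Hypothetical stand-in for a trained model: a fixed linear scorer."""
    weights = [0.5, -0.25, 2.0]
    return sum(w * x for w, x in zip(weights, features))


def score_chunk(chunk):
    """Score one chunk of records; it needs nothing from any other chunk."""
    return [score(row) for row in chunk]


def split(records, chunk_size):
    """Cut the input into independent chunks, one per task."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def batch_score(records, chunk_size=2, workers=4):
    """Score all records by running the chunks as independent tasks."""
    chunks = split(records, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so scores line up with records.
        results = pool.map(score_chunk, chunks)
        return [s for chunk_scores in results for s in chunk_scores]


records = [
    [1.0, 2.0, 3.0],
    [0.0, 4.0, 1.0],
    [2.0, 0.0, 0.5],
    [1.0, 1.0, 1.0],
    [4.0, 8.0, 0.0],
]
scores = batch_score(records)
print(scores)
```

Because no chunk depends on another, the result is the same whatever the chunk size or number of workers. That independence is what lets this kind of workload be spread across many machines without coordination between tasks.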
Data Science VMs
The Data Science Virtual Machine (DSVM) is a pre-built, pre-configured VM optimized for data science and ML applications. It contains various popular ML libraries and tools, including CUDA and cuDNN, CRAN-R, Julia, Python, and Jupyter.
Hence, whenever you need a carefree VM with popular ML tools pre-installed and pre-configured, the DSVM should be your service of choice, as it is a great alternative ML experimentation environment.
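If you want to confirm which tools a given VM actually provides before starting an experiment, a small check like the following can help. It is a sketch using only the Python standard library; `check_environment` is a hypothetical helper, and the module and command names passed to it are assumptions chosen for illustration (the exact contents of a DSVM image depend on its version and operating system).

```python
import importlib.util
import shutil


def check_environment(modules, commands):
    """Report which Python modules and command-line tools are available."""
    report = {}
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        report["module:" + name] = importlib.util.find_spec(name) is not None
    for name in commands:
        # shutil.which returns None when the command is not on PATH.
        report["command:" + name] = shutil.which(name) is not None
    return report


# Assumed names for illustration; adjust to the tools you rely on.
report = check_environment(
    modules=["numpy", "pandas", "sklearn", "torch"],
    commands=["python", "jupyter", "R", "julia", "nvcc"],
)
for item, found in sorted(report.items()):
    print(f"{item:20s} {'found' if found else 'missing'}")
```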