Machine Learning On Cloud
According to Microsoft:
Machine learning is a data science technique that allows computers to use existing data to forecast
future behaviors, outcomes, and trends.
As you move forward, you will learn how to use Azure Machine Learning Studio as an integrated,
end-to-end data science and advanced analytics solution, enabling data scientists to prepare data,
develop experiments, and deploy models at cloud scale.
Tools and frameworks you will come across include:
- Jupyter Notebook
- Apache Spark
- Docker
- Kubernetes
- Python
- Conda
- Microsoft Machine Learning Library for Apache Spark
- Microsoft Cognitive Toolkit
Closely related to Machine Learning Studio, Machine Learning Service is an offering in Azure from
Microsoft which provides functionality similar to Azure ML Studio.
It is currently in preview and provides an environment with much better support for open Python
frameworks such as TensorFlow and scikit-learn.
It supports Jupyter Notebooks, Visual Studio Code Tools for AI, Azure Batch AI and Containerised
Deployment.
Use Azure ML functionality to clean data, create ML models and deploy them as web services.
Azure ML Platform
This topic helps you understand how to use the Azure ML platform on the web to generate predictions
by creating and deploying ML models.
Create a predictive experiment from a training experiment, encapsulating the data transformations and
the trained model.
Use the predictive experiment to create a web service that generates predictions via an API endpoint
and API key.
The Cortana Intelligence Gallery is a collection of resources people have shared on the Azure ML
platform.
We can also share our work on experiments and web services for others to learn and explore.
Datasets
An Azure ML experiment requires at least one dataset on which the model is created. The data can
be imported directly, or staged in Azure Storage and then used to create a model.
Hive or U-SQL jobs are used to clean the data and prepare it for analysis.
This data can be stored on Azure Storage and can be easily imported to Azure ML Workspace. Data
can also be imported from Hive or Azure SQL database.
Further in this topic you will learn about the different types of data used in experiments and ways to use them.
Multiple files can be uploaded to the Azure ML workspace, but each file must be smaller
than 2 GB.
However, importing up to 10 GB of data from other sources is possible.
If you need to work with even larger volumes of data, statistical
sampling techniques must be used to sample ten gigabytes of data for training.
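As a flavour of what such sampling might look like, here is a minimal pandas sketch, assuming the raw data sits in a local CSV (the file name and sample fraction are illustrative):

```python
import pandas as pd

# Read a large CSV in chunks and keep a random fraction of each chunk,
# so the full file never has to fit in memory at once.
SAMPLE_FRACTION = 0.2  # illustrative: tune so the sample lands under the limit
sampled_chunks = []

for chunk in pd.read_csv("big_training_data.csv", chunksize=100_000):
    sampled_chunks.append(chunk.sample(frac=SAMPLE_FRACTION, random_state=42))

sample = pd.concat(sampled_chunks, ignore_index=True)
sample.to_csv("training_sample.csv", index=False)  # upload this to the workspace
```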
Training Data
TRAINING DATA: The data used to train the experiment. It is mandatory and acts as the start of the
dataflow.
Reference Data
REFERENCE DATA is not directly used in the experiment or for training a model. It is only used to
provide additional information.
For example, when a web service is published based on the ML model, a service endpoint is
created to consume the web service.
Based on the client using the service, the reference data can be varied accordingly to
provide customisation and flexibility to the web service.
In this topic, you have learnt ways to import data from various Azure data stores and the types of
data used. A few other notable points are:
Data can also be imported from an on-premises SQL Server or several other online sources using the
Import Data module.
Multiple data formats such as .txt, .csv, .nh.csv, .tsv, .nh.tsv, Excel files, Azure tables, Hive tables, SQL
database tables and .RData are supported.
Data types recognised by ML Studio are String, Integer, Double, Boolean, DateTime and TimeSpan.
Import Data
Prepare Data
Scenarios for advanced analytics.
The Lifecycle
- Data Extraction
- Data Cleaning
- Data Transformation
- Data Visualisation
Adopting ML Studio
Azure Machine Learning Studio helps through the entire lifecycle of a Data
Science solution.
We have already looked at data import from various sources, which can be
considered as Data Extraction.
Data Cleaning and Transformation are done based on our business problem and the
approach we take to provide a solution for it. You can see this in the videos over the
following cards.
Data Visualisation
Exploratory Data Analysis and Data Visualisation are facilitated by Notebooks in
Azure ML Studio. Basic data visualisation is readily available in Azure ML Studio via a
right-click on uploaded datasets.
However, Notebooks can be used to further visualise data in a required manner with
Python/R scripts adding flexibility and functionality.
Watch the following video to learn how to use Notebooks for data visualisation.
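In the meantime, here is a minimal sketch of the kind of cell such a notebook might contain, assuming a pandas DataFrame with hypothetical column names:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical dataset; in an Azure ML Studio notebook you would typically
# load a dataset from the workspace instead of a local CSV.
df = pd.read_csv("experiment_data.csv")

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
df["target"].hist(ax=axes[0])                # distribution of the label
df.plot.scatter(x="feature_1", y="target",   # one feature against the label
                ax=axes[1])
plt.tight_layout()
plt.show()
```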
Machine learning algorithms are divided into 4 major classes for both structured and unstructured data:
- Regression
- Classification
- Anomaly Detection
- Clustering
Go through Machine Learning Axioms and other ML courses for model selection and feature
selection.
Supervised Learning
The Azure ML Platform is equipped with over 20 types of supervised learning methods, built in and ready
to use.
Users can also write their own Python scripts and embed them into the ML workflow for customised
and optimised models, using Notebooks and supported ML libraries like scikit-learn, TensorFlow, etc.
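A minimal sketch of the kind of custom script this refers to, using scikit-learn (the file and column names are illustrative, not part of any Azure API):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Illustrative data: any DataFrame with a label column works the same way.
df = pd.read_csv("prepared_data.csv")
X, y = df.drop(columns=["label"]), df["label"]

# Hold out 20% of the rows to estimate accuracy on unseen data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```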
Move to the next cards to check out examples of various Supervised Learning Algorithms along with
concepts of Data cleaning and Transformation.
Regression Model
Watch this video to understand how to train a regression model from a sample
dataset using Azure ML Experimentation service.
Classification Model
Watch this video to understand how to train a classification model from a sample
dataset using Azure ML Experimentation service.
Moving Further
Supervised learning is the most widely used ML modelling technique for structured data.
Different algorithms have their own pros and cons, so the optimal algorithm must be
selected based on our requirements.
Refer to the following links for a better understanding of supervised learning methods
on the Azure ML platform: Feature Engineering, Algorithm Selection and Evaluating
ML Model.
Unsupervised Learning
Clustering is the most commonly used method: similar data points are grouped by extracting features and
comparing the points based on their feature sets.
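A minimal k-means sketch with scikit-learn, on randomly generated data purely for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustrative feature matrix: 200 items, 3 features each.
rng = np.random.default_rng(0)
X = rng.random((200, 3))

# Group the items into 4 clusters based on their feature sets.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

print(kmeans.labels_[:10])      # cluster assigned to the first ten items
print(kmeans.cluster_centers_)  # one centroid per cluster
```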
Recommenders
Recommenders, as their name suggests, are used in sectors such as e-commerce, ads
and social platforms. They recommend related items of interest based on users'
previous interactions.
What's Next?
In this topic, you learned how to create an ML model from the datasets taken and the different types of ML
models supported on the Azure ML Platform.
Move over to next topic to learn to deploy the ML models as web services.
Refer to the following links for further info on K-Means Clustering and Matchbox Recommender.
In the previous topic, training different ML models using the Azure ML Platform was discussed.
Once the trained models are tested for accuracy and optimised, they need to be deployed for
consumption through an API.
This topic helps you learn to deploy the trained model as a web service using a predictive experiment.
Webservice
Creating a web service generates API endpoints which can be accessed using the primary and
secondary API keys.
The API endpoints take the inputs required by the ML model and return JSON output with predictions.
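As a sketch of what consuming such an endpoint from Python might look like, assuming the classic request-response format (the URL, key and column names are placeholders to be copied from the web service dashboard):

```python
import requests

# Placeholders: copy the real values from the web service dashboard.
URL = "https://<region>.services.azureml.net/.../execute?api-version=2.0"
API_KEY = "<primary-or-secondary-key>"

# The column names must match the web service's input schema.
payload = {
    "Inputs": {
        "input1": {
            "ColumnNames": ["feature_1", "feature_2"],
            "Values": [[1.0, 2.0]],
        }
    },
    "GlobalParameters": {},
}

response = requests.post(URL, json=payload,
                         headers={"Authorization": f"Bearer {API_KEY}"})
print(response.json())  # JSON output with the predictions
```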
Webservice Workflow
The following workflow is adopted for building a web service with predictive
experiment:
Moving Further
For more info, refer to the following links: Preparing Predictive Model and Deploying Web Service.
Now that the basics of creating and deploying a web service are understood, move over to the next
topic to learn more about managing and customising web services.
Consumption and Management
You will also learn about metrics and logging regarding the usage of Web services.
API endpoints and API Keys are used according to the requirement.
The APIs are built as REST APIs and can be consumed by the required client application by passing the
required parameters in an HTTPS request.
Microsoft Excel along with Azure ML Plug-in can also be used to consume the service.
Parameters
Web Service can be customised to take required parameters to access additional
data.
For example, you can specify a database to fetch the data. It will be provided as an
additional parameter apart from the required parameters for a prediction.
This helps in customised client consumption such that each client will use the same
service for prediction but data output/retrieval is different for each of them.
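In terms of the classic request format sketched earlier, such a parameter would typically ride along with the request; the parameter name below is hypothetical:

```python
# Hypothetical "Database Name" web service parameter: each client passes its
# own value, so the same service reads additional data from a different
# database per client.
payload = {
    "Inputs": {
        "input1": {
            "ColumnNames": ["feature_1", "feature_2"],
            "Values": [[1.0, 2.0]],
        }
    },
    "GlobalParameters": {"Database Name": "client_a_db"},
}
```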
Monitoring
Azure ML Platform enables Web service management through the Usage Statistics and Logging.
Dashboards provide an overview of the total number of requests made to the API and their success/failure
rate over a selected period of time. They also give the average compute time and latency associated
with the API.
Logging can be enabled for more detail: JSON response files are then stored automatically on Azure
Storage, providing a detailed report of each request and response.
Moving Further
Refer to the following links for more info on Consuming Webservice, Adding Parameters, Managing
Webservice and Logging.
Move over to the next topic to learn how to use these web services in Big Data scenarios.
Big Data Scenario
The Webservice API-endpoints can be used as a service by clients for predictions with instant
response.
However, if we want to use ML models in scenarios having large sets of data, i.e. for Big Data,
processing must be done in batches at scheduled intervals and at optimal times to reduce latency
and promote an asynchronous way of obtaining predictions for our data.
Azure Data Factory and its pipelines come into play in these kinds of scenarios.
When Big Data batch processing is done by pipeline, predictions can be part of the pipeline with
Azure ML being used as a linked service.
Azure ML batch execution activity is used to call a predictive web service from a pipeline.
The input dataset is passed to the web service input, and the predicted output from the web service
is returned, continuing to the next activity in the pipeline.
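Sketched as a Python dict purely for illustration, such a pipeline activity might look roughly like the following (the dataset and linked-service names are placeholders, and the exact schema should be checked against the Data Factory documentation):

```python
# Rough shape of an Azure ML batch execution activity in a Data Factory
# pipeline definition; all names here are placeholders.
scoring_activity = {
    "name": "MLScoringActivity",
    "type": "AzureMLBatchExecution",
    "linkedServiceName": "AzureMLScoringLinkedService",  # the Azure ML web service
    "inputs": [{"name": "RawDataBlob"}],                 # data to be scored
    "outputs": [{"name": "ScoredDataBlob"}],             # predictions for the next activity
    "typeProperties": {
        "webServiceInput": "RawDataBlob",
        "webServiceOutputs": {"output1": "ScoredDataBlob"},
    },
}
```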
Other linked services are data sources, like Azure Storage or Azure SQL Database, and compute
services, like Azure HDInsight or Azure Data Lake Analytics.
What about Retraining?
In a Big Data scenario, retraining the model might be needed fairly frequently, since the data
generated is of huge volume and may vary over short intervals of time.
We need fresh data to train the ML model to improve the model accuracy with the changing inflow
of data.
However, considering the amount of data to be handled while Big Data processes are
ongoing, retraining manually is neither efficient nor recommended.
In this situation, the Azure Data Factory pipeline and the Azure ML Update Resource service come to the
rescue to automate the retraining of ML models.
Automating Retraining
To automate the process of retraining, Azure ML provides a feature to publish the
training experiment as a retraining web service.
The following activities can be executed in sequence in the Azure Data Factory
pipeline to achieve the automation task (a rough sketch follows the list):
1. Azure ML Batch Execution Activity is used to call the retraining web service to
generate a new model as a file.
2. The model file is passed to an Azure ML Update Resource Activity that
updates the scoring experiment, replacing the existing model.
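Sketched in the same illustrative style as before, the two activities might look roughly like this (all names are placeholders; check the exact schema against the Data Factory documentation):

```python
# Step 1: call the retraining web service; its output is the new model file.
retrain_activity = {
    "name": "RetrainModel",
    "type": "AzureMLBatchExecution",
    "linkedServiceName": "AzureMLRetrainingLinkedService",
    "inputs": [{"name": "FreshTrainingDataBlob"}],
    "outputs": [{"name": "NewModelBlob"}],  # the retrained model as a file
}

# Step 2: push the new model file into the scoring web service,
# replacing the model it currently uses.
update_activity = {
    "name": "UpdateScoringModel",
    "type": "AzureMLUpdateResource",
    "linkedServiceName": "AzureMLScoringLinkedService",
    "inputs": [{"name": "NewModelBlob"}],
}
```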
Moving Further
Refer to the following links for more info on Predictive Pipelines and Update Resource Activity.
Now that Big Data processing for predictions has been covered, move over to the next topic to learn how to
process real-time data for predictions using the web services of the Azure ML Platform.
Streaming Process
Input: Often Event Hubs or IoT Hubs, which are used to ingest real-time data at scale.
Streaming Job: Used to process the data; generally an Azure Stream Analytics query.
Output: The expected result, which could be anything from a database update to a real-time dashboard
with analysis.
Use Azure ML functionality to clean data, create ML models and deploy them as web services.
The course helps you to get started with creating solutions using Azure Machine Learning Platform.
A web service can be published as Classic or [New]. Does this make any
difference?
LogNormal
Logistic
MinMax
Zscore
Getting the required size of a dataset from a large amount of accumulated data
is called ________.
Rational Sampling
Preparation Sampling
Preprocess Sampling
Statistical Sampling
Regression
Classification
Clustering
Mean-variance Normalization is used when the parameter
distribution is __________.
Skew Distribution
Continuous Distribution
Azure Apps
Azure Resources
A 5 GB data file obtained from a U-SQL job can be used for a training
experiment through __________.
Statistical Sampling
Manual Upload
What is the output of an Azure Data Factory pipeline that uses the
AzureMLBatchExecution activity to retrain a model?
prediction file
model.ilearner file
query
transformation
function
job
Jupyter Notebook
Use Import
Poisson Regression
None of the options
Logistic Regression
A workspace
A web service
A storage account
A pricing plan
Select the option that represents the correct order of following tasks: (A)
Predictive Experiment (B) Training Experiment (C) Model Evaluation (D)
Data Preprocessing (E) API Publishing
D, B, C, A, E
B, C, D, A, E
B , D, C, A, E
A, B, C, D, E
Neural Networks
Logistic Regression
Decision Forest
Which of the following can be generally used to clean and prepare Big
Data?
Pandas
U-SQL
Data Warehouse
Data Lake
Select the Azure Data Factory pipeline activity that retrieves predictions from
an Azure Machine Learning web service.
AzureBlob
AzureMLBatchExecution
AzureUpdateResource
AzureMLBatchProcessing
Stored Results
Cached Results
Generated Results
To retrain the predictive model and update the webservice through Azure
Data Factory, which of the datasets are required?
Scored labels for the output of an AzureMLBatchExecution activity.
Which of the following is false about Train Data and Test Data in Azure ML
Studio _________?
Train data and test data split should follow a rule of thumb of 80:20.
When predicting if the patient has cancer or not, which parameter must be
given importance?
Threshold
Accuracy
Recall
Precision