In today's world, vast amounts of raw data are generated every day across almost every IT industry, so there is a need for dedicated teams who can evaluate and plot this data to draw inferences and apply machine learning algorithms to make predictions. Hence there is a huge demand for, and shortage of, Data Scientists.
The Microsoft Azure Data Scientist (DP-100) certification assesses an individual's knowledge of data science and machine learning, specifically the ability to deploy and run machine learning workloads on Microsoft Azure using the Azure Machine Learning service.
If you are preparing for the Microsoft Azure Data Scientist Certification (DP-100) exam, you should check your readiness by attempting these exam questions and answers.
Demand for Data Scientists
According to a report by IBM, demand for data scientists was projected to grow by 28% by 2024 and beyond, making it one of the fastest-growing careers of the future.
Roles and Responsibilities of Data Scientists
- Transforming vast amounts of structured and unstructured data into actionable information.
- Finding the data analytics solutions with the greatest potential to advance business.
- Finding hidden patterns and trends using data analysis methods like text analytics, machine learning, and deep learning.
- Data cleansing and validation to increase data accuracy and efficacy.
Top 20 Free DP-100 Exam Questions
Here's a compiled list of free DP-100 exam questions and answers framed by our experts. Working through them can really help you get thorough on every concept required to clear the real examination.
Also Read: How to Prepare for the Exam DP-100: Designing and Implementing a Data Science Solution on Azure?
Domain: Design and prepare a machine learning solution
Question 1: Which of the following statements is NOT true about the Execute Python Script component of Azure Machine Learning designer?
- The script must contain a function named azureml_main
- The entry point function must have two input arguments
- A zip file should always be connected to the third input port
- Run settings section of Execute Python Script component is used to attach a compute target to execute the step
Correct Answer: C
Explanation:
We use the Execute Python Script component to add custom logic to Azure Machine Learning designer, where we can bring in and use our own custom-written code.
Option A is incorrect because a function named azureml_main is mandatory for the Azure ML designer framework to execute the step.
Option B is incorrect because the entry point function must have two input arguments, both of which must be pandas DataFrames.
Option C is correct because it is not mandatory to attach a zip file to the step; it is only needed if we want to use code from our own custom Python modules/packages.
Option D is incorrect because a default compute target is attached to the pipeline that runs this step; if we want to use a different compute machine, the Run settings section has options to detach and attach compute targets.
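For illustration, here is a minimal sketch of a script that satisfies the component's contract (the cleaning logic is just a placeholder):

import pandas as pd

# Entry point required by the Execute Python Script component.
# The two optional arguments map to the first two input ports and arrive as pandas DataFrames.
def azureml_main(dataframe1=None, dataframe2=None):
    # Placeholder custom logic: drop rows with missing values
    cleaned = dataframe1.dropna()
    # The component expects a sequence of up to two DataFrames to be returned for the output ports
    return cleaned,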
References: https://learn.microsoft.com/en-us/azure/machine-learning/component-reference/execute-python-script , https://learn.microsoft.com/en-us/training/modules/create-classification-model-azure-machine-learning-designer/
Domain: Explore data and train models
Question 2: You are conducting multiple experiments trying to find the best machine learning algorithm that fits your data for a regression problem, and you are using MLflow to keep track of the experiments. You want to log the regression metrics for each experiment using the following dictionary -> metrics = {"r2": 0.1, "mse": 2500.00, "rmse": 50.00}. Which of the following lines of code can fulfill the task?
- mlflow.log_metric(metrics)
- mlflow.log_params(metrics)
- mlflow.log_artifacts(metrics)
- mlflow.log_metrics(metrics)
Correct Answer: D
Explanation:
The MLflow Tracking component is an API and UI for logging parameters, code versions, metrics, and output files when running your machine learning code and for later visualizing the results.
Option A is incorrect because mlflow.log_metric() is used to log a single metric, for example: mlflow.log_metric("mse", 2500.00)
Option B is incorrect because mlflow.log_params() logs a batch of parameters for the current run. Parameters are configuration values (for example, hyperparameters) supplied to the model, not evaluation metrics.
Option C is incorrect because mlflow.log_artifacts() is generally used to log all the contents of a local directory as artifacts of the run. Artifacts could be plot files, model output files, or any other file we would like to store for later reference.
Option D is correct because mlflow.log_metrics() is used to log multiple metrics for the current run.
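As a quick illustration, the following minimal sketch logs the dictionary from the question in a single call:

import mlflow

metrics = {"r2": 0.1, "mse": 2500.00, "rmse": 50.00}

with mlflow.start_run():
    # Logs every key/value pair in the dictionary as a metric of the current run
    mlflow.log_metrics(metrics)
    # A single metric, by contrast, would be logged with:
    # mlflow.log_metric("mse", 2500.00)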
References: https://learn.microsoft.com/en-us/training/modules/use-mlflow-to-track-experiments-azure-databricks/5-exercise-experiment, https://www.mlflow.org/docs/latest/python_api/mlflow.html#mlflow.log_metrics
Domain: Deploy and retrain a model
Question 3: You are building a training pipeline in an Azure Machine Learning studio workspace using the AzureML Python SDK library azureml.pipeline.core. You also want to automate the process of retraining after deployment whenever you receive enough new data. Assume that you plan to upload the new data to a Blob storage location, and you want the pipeline to be triggered as soon as the upload to Blob storage happens. Select the imports needed to accomplish this task. (SELECT TWO)
- from azureml.pipeline.core import Schedule
- from azureml.core.datastore import Datastore
- from azureml.pipeline.core import ScheduleRecurrence
- from azureml.core import Environment
- from azureml.pipeline.core import PipelineRun
Correct Answers: A and B
Explanation:
azureml.pipeline.core contains core functionality to work with Azure Machine Learning pipelines, which are configurable machine learning workflows. Azure Machine Learning pipelines allow you to create reusable machine learning workflows that can be used as templates for your machine learning scenarios, such as retraining, model deployment, and data processing.
Option A is correct because azureml.pipeline.core.Schedule Class is designed to monitor changes to the blob container or location in the blob storage container and create a trigger event that starts the training pipeline with pre-set input parameters.
Option B is correct because azureml.core.Datastore is needed to create an object that refers to the Blob storage location where our input files are placed; this object is used as input to the Schedule object, which monitors the location for any file additions.
Option C is incorrect because the azureml.pipeline.core.ScheduleRecurrence class is used to schedule a pipeline to run at periodic intervals (i.e., every 15 days, weeks, hours, etc.). An instance of this class is used as an input to the Schedule object, but when triggering a pipeline based on monitoring a Blob storage location we don't need to use this class.
Option D is incorrect because the azureml.core.Environment class is used to define the environment in which a pipeline runs. It is attached to the pipeline and used when we submit the pipeline to run.
Option E is incorrect because the azureml.pipeline.core.PipelineRun class represents a run of a submitted pipeline. This class doesn't help to build or trigger a pipeline; rather, it is used after the pipeline has been submitted, for example to monitor the run.
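A minimal sketch of such a reactive schedule is shown below, assuming a published training pipeline object (published_pipeline), a registered datastore named training_data, and a monitored folder new-data/ (all hypothetical names):

from azureml.core import Workspace
from azureml.core.datastore import Datastore
from azureml.pipeline.core import Schedule

ws = Workspace.from_config()
datastore = Datastore.get(ws, datastore_name="training_data")   # Blob-backed datastore (hypothetical name)

# Trigger the published training pipeline whenever new files land in the monitored folder
reactive_schedule = Schedule.create(
    ws,
    name="retrain-on-new-data",
    pipeline_id=published_pipeline.id,       # id of the already published training pipeline
    experiment_name="retraining",
    datastore=datastore,
    path_on_datastore="new-data/",           # folder to monitor for new blobs
    polling_interval=5                        # minutes between checks
)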
Reference: https://learn.microsoft.com/en-us/training/modules/create-pipelines-in-aml/
Domain: Design and prepare a machine learning solution
Question 4: You are working on a binary classification problem and tried all classical machine learning algorithms but none of them resulted in a satisfactory output. So you have turned towards building a deep learning model and you have multiple optimizers to choose from. Identify the option that is not an optimization algorithm of a Deep Neural Network.
- Stochastic Gradient Descent (SGD)
- Adaptive Learning Rate (ADADELTA)
- Adaptive Momentum Estimation (Adam)
- Gradient Clipping (GC)
Correct Answer: D
Explanation:
An optimizer modifies/updates the attributes of the neural network, such as its weights, biases, and learning rate; this updating process is what we call learning. Different algorithms have been proposed to do this, each with its own advantages and disadvantages.
Option A is incorrect because it is one of the basic optimizers which calculates loss and updates weight in the back-propagation.
Option B is incorrect because it is an advanced version of SGD which also adjusts the learning rate during backpropagation.
Option C is incorrect because it is also an extension of SGD combined with the momentum algorithm.
Option D is correct because Gradient Clipping is not an optimizer algorithm; rather, it is a technique in which the error derivatives computed during backpropagation are clipped, which is an effective way to tackle the exploding gradients that can occur while training deep learning models.
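The distinction can be seen in a short illustrative training step, written here in PyTorch purely as an example framework: SGD, Adadelta, and Adam are interchangeable optimizer objects, while gradient clipping is a separate operation applied to the gradients before the optimizer step.

import torch
import torch.nn as nn

model = nn.Linear(10, 1)                                        # toy binary classifier
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)       # could equally be SGD or Adadelta
loss_fn = nn.BCEWithLogitsLoss()

x = torch.randn(32, 10)
y = torch.randint(0, 2, (32, 1)).float()

optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
# Gradient clipping: a training technique applied to the gradients, not an optimizer itself
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()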
http://learn.microsoft.com/en-us/training/modules/train-evaluate-deep-learn-models
Domain: Deploy and retrain a model
Question 5: You have created an Azure Pipeline to test the deployment script and hosted the repository containing the deployment pipeline in a private GitHub repository. Now you want to set up a CI/CD pipeline from GitHub that executes the pipeline in Azure whenever a change is made to the deployment script. The first step in setting up the CI/CD pipeline is to set up authentication with Azure Pipelines for GitHub. Below are the steps required to set up authentication in GitHub. Select the correct sequence to achieve the goal.
- Add a Personal Access Token as a secret of your repository
- Generate a Personal Access token, in your DevOps Organization with an expiry date
- Open your GitHub repository and go to Security Settings
- Sign in to your Azure DevOps organization
Correct Answer: D, B, C and A
Explanation:
GitHub Actions helps us automate software development workflows such as build, test, package, release, or deploy for any project on GitHub, with workflows defined within GitHub itself. With GitHub workflows for Azure, you can create workflows in your repository to build, test, package, release, and deploy to Azure. To set up a workflow that can trigger an Azure pipeline, GitHub needs to authenticate with Azure DevOps, and using a Personal Access Token is one of the simplest ways of doing so. It connects your GitHub account to Azure DevOps through the following steps:
- Sign in to your Azure DevOps organization.
- Generate a Personal Access Token in your DevOps organization with an expiry date.
- Open your GitHub repository and go to Security Settings.
- Add a Personal Access Token as a secret of your repository.
Reference:
https://learn.microsoft.com/en-us/azure/developer/github/github-actions
Domain: Design and prepare a machine learning solution
Question 6: You are designing a machine learning solution using Azure Machine Learning studio. Which of the following is NOT true with regard to using Azure Machine Learning studio?
- The workspace is the top-level resource for Azure Machine Learning
- AzureMl Pipelines are reusable workflows for training and retraining your model
- When you create the workspace, you also need to create associated resources such as an Azure Storage Account, Azure Container Registry, etc. before starting to use the workspace
- You can interact with the workspace using AzureML Python SDK from any python environment remotely
Correct Answer: C
Explanation:
Option A is incorrect because the workspace is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning.
Option B is incorrect because an Azure Machine Learning pipeline is an independently executable workflow of a complete machine learning task that can be executed on a variety of compute machines from AzureML Studio Workspace.
Option C is correct because when you create the workspace, the associated resources are created for you automatically. You may choose not to use these default resources and create your own instead; nonetheless, you will always have default resources created for you so you can start experimenting within the workspace right away.
Option D is incorrect because Azure provides open-source Azure libraries for Python that simplify provisioning, managing, and using Azure resources from any remote Python environment after proper authorization.
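As a small sketch of option D, the workspace can be reached from any remote Python environment with the AzureML SDK (the names below are placeholders):

from azureml.core import Workspace

# Connect from any remote Python environment (values below are placeholders)
ws = Workspace.get(
    name="my-aml-workspace",
    subscription_id="<subscription-id>",
    resource_group="<resource-group>"
)

# Alternatively, with a downloaded config.json in the working directory:
# ws = Workspace.from_config()

print(ws.name, ws.location)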
Reference: https://learn.microsoft.com/en-us/azure/machine-learning/concept-workspace
Domain: Design and prepare a machine learning solution
Question 7: You are tasked with creating a binary prediction model as soon as possible and are provided with a .csv file of size 400 MB. After some quick thinking, you have decided to use Azure ML Designer as it provides a quick out-of-the-box solution. You have to use this .csv file as input to your model. Which is the fastest way, and the one recommended by Microsoft, to ingest the data into your solution in AzureML?
- Use the Import Data component in AzureMl Designer to read data from the local machine using a path on your local computer
- Register the .csv file as a Dataset and access it in the AzureMl Designer
- Upload your file to an HTTPS server like GitHub. Use the Import Data component in AzureML Designer to read the data via a URL on that HTTP/S server
- Upload your data to Azure Blob Storage and access it in the AzureML Designer
Correct Answer: B
Explanation:
Option A is incorrect because the Import Data component of AzureML Designer doesn't support reading data directly from a local computer.
Option B is correct because datasets are part of the AzureML workspace, are managed by Azure, and are the approach recommended by Microsoft when working with AzureML Studio. We can also take full advantage of advanced data features like versioning, tracking, and data monitoring when using datasets. Apart from datasets, we can also use datastores by pointing them toward our files in an Azure Blob container. Once you register the file as a dataset, it appears under AzureML Designer > Asset library > Data, where you can select it and use it inside the designer.
Option C is incorrect because uploading files to public servers causes security issues, as anyone with the link can access them if they are not properly managed, and it is not a preferred way when dealing with confidential data.
Option D is incorrect because files in Azure Blob Storage cannot be accessed directly from the Designer. You need to create a datastore connecting the Azure Blob Storage container to Azure ML Studio, and then you can use this datastore to import data into AzureML Designer.
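For reference, a registered dataset like the one in option B could be created with a short SDK snippet such as the following sketch (the datastore path and dataset name are hypothetical):

from azureml.core import Workspace, Dataset

ws = Workspace.from_config()

# The default datastore is backed by the workspace's Blob storage;
# the .csv file is assumed to have been uploaded there already (path is a placeholder).
datastore = ws.get_default_datastore()
dataset = Dataset.Tabular.from_delimited_files(path=(datastore, "data/input.csv"))

# Registering the dataset makes it appear in the Designer's asset library under Data
dataset = dataset.register(workspace=ws, name="binary-input-data", create_new_version=True)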
References: https://learn.microsoft.com/en-us/azure/machine-learning/concept-azure-machine-learning-v2?tabs=cli#data, https://learn.microsoft.com/en-us/azure/machine-learning/component-reference/import-data
Domain: Explore data and train models
Question 8: You are writing a training script and you wish to run it on a remote compute target with a compute disk size of 30 GB and 4 GB of RAM. Your input data, 40 GB in size, is registered as a file dataset. Which mode should you use to access the input data in your training script?
- as_mount()
- as_download()
- as_upload()
- as_hdfs()
Correct Answer: A
Explanation:
Option A is correct because the data size exceeds the compute disk size, so downloading isn't possible. For this scenario, mounting is recommended, since only the data files used by your script are loaded at the time of processing.
Option B is incorrect because the dataset cannot be downloaded: it is larger than the available compute disk.
Option C is incorrect because the as_upload() method is used to upload model outputs and other artifacts to Azure, not to read data into your script.
Option D is incorrect because as_hdfs() is used with Azure Synapse. HDFS (Hadoop Distributed File System) is normally used to process large datasets on compute clusters with multiple nodes for parallel processing. This mode is used for experiment runs whose solution involves both Azure Synapse and Azure Machine Learning.
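A minimal sketch of mounting the file dataset in a ScriptRunConfig is shown below; the dataset, compute target, and script names are hypothetical:

from azureml.core import Workspace, Dataset, ScriptRunConfig, Experiment

ws = Workspace.from_config()
file_ds = Dataset.get_by_name(ws, name="my-40gb-file-dataset")    # hypothetical dataset name

src = ScriptRunConfig(
    source_directory="./src",
    script="train.py",
    # as_mount() streams files on demand, so the 40 GB dataset
    # never has to fit on the 30 GB compute disk
    arguments=["--data-path", file_ds.as_named_input("training_data").as_mount()],
    compute_target="my-remote-compute"                            # hypothetical compute target name
)
run = Experiment(ws, "mount-demo").submit(src)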
References:
https://learn.microsoft.com/en-us/training/paths/build-ai-solutions-with-azure-ml-service/, https://learn.microsoft.com/en-us/azure/machine-learning/v1/how-to-train-with-datasets
Domain: Deploy and retrain a model
Question 9: You have successfully trained and validated a machine learning model using AzureML Python SDK. You are using Visual Studio Code Editor as it has extensions such as VSCode AzureML which help you to access/control Azure services from your local machine. You want to deploy the trained model to a compute target as a web service for some further testing. Which of the following cannot fulfill the requirement?
- Azure Container Instance
- Azure Kubernetes Service
- Local development environment
- Azure Virtual Machine
Correct Answer: D
Explanation:
Option A is incorrect because Azure Container Instances support deploying Azure machine learning models and are generally the preferred target during testing and development.
Option B is incorrect because Azure Kubernetes Service also supports the deployment of models, and Kubernetes is more suitable for production workloads.
Option C is incorrect because we can deploy models to the local development environment during testing and development. This method is preferred while debugging errors.
Option D is correct because we cannot deploy a model directly to a virtual machine. It is just a compute machine created in the cloud and doesn't necessarily have all the properties and support needed to deploy a machine learning model directly and start consuming the created endpoint.
References:
https://learn.microsoft.com/en-us/azure/machine-learning/v1/concept-model-management-and-deployment?source=recommendations, https://learn.microsoft.com/en-us/training/paths/build-ai-solutions-with-azure-ml-service/
Domain: Design and prepare a machine learning solution
Question 10: You have trained a model using a dataset registered in AzureML Studio which contains historical data from the past 2 years. As time progresses, you will collect new data and you fear that over time there may be trends that change the profile of the data. So you have set up data drift monitoring in Azure Machine Learning Studio to notify the data science team whenever a data drift is detected. From the following, select the one which is not true regarding the setup of data drift in AzureML.
- To monitor data drift using registered datasets, you need to register two datasets
- The data drift monitor gets triggered for every small change in the data and cannot be customized to trigger changes that we think are significant
- You can create dataset monitors using the visual interface in Azure Machine Learning studio
- An alert notification by email can be configured while defining data drift to notify team members of the data drift
Correct Answer: B
Explanation:
Option A is incorrect because, to monitor data drift using registered datasets, you do need to register two datasets: 1) a baseline dataset, usually the original training data, and 2) a target dataset that will be compared to the baseline over time intervals. The target dataset requires a column for each feature you want to compare and a timestamp column so the rate of data drift can be measured.
Option B is correct because data drift is measured using a calculated magnitude of change in the statistical distribution of feature values over time. You can expect some natural random variation between the baseline and target datasets, but you should monitor for large changes that might indicate significant data drift, not for every small change. It is possible to detect only the large changes we are interested in by defining a threshold for the data drift magnitude above which you want to be notified.
Option C is incorrect because you can create dataset monitors using the visual interface in Azure Machine Learning studio: go to the Datasets section and select the dataset monitors option to set up data drift.
Option D is incorrect because we can use the alert_configuration setting while defining the DataDriftDetector to send an email notification when drift is detected.
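Putting options A, B, and D together, a dataset monitor with a drift threshold and an email alert could be set up with a sketch like the following (this assumes the azureml-datadrift package; the dataset, compute, feature, and email names are hypothetical):

from azureml.core import Workspace, Dataset
from azureml.datadrift import DataDriftDetector, AlertConfiguration

ws = Workspace.from_config()
baseline = Dataset.get_by_name(ws, "training-data-baseline")      # original training data
target = Dataset.get_by_name(ws, "scoring-data-target")           # new data with a timestamp column

monitor = DataDriftDetector.create_from_datasets(
    ws, "drift-monitor", baseline, target,
    compute_target="cpu-cluster",
    frequency="Week",
    feature_list=["age", "income"],                                # features to compare
    drift_threshold=0.3,                                           # notify only above this magnitude
    alert_config=AlertConfiguration(email_addresses=["datascience-team@example.com"])
)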
Reference:
Domain: Design and prepare a machine learning solution
Question 11: You have developed a training pipeline using Jupyter notebooks directly in your workspace in Azure Machine Learning studio. You have used the following command in a cell to start the training pipeline.
run = experiment.submit(ScriptRunConfigObject)
Which of the following command lets you retrieve all the metrics logged during the run?
- RunDetails(run).show()
- run.get_detailed_status()
- run.get_metrics()
- run.wait_for_completion()
Correct Answer: C
Explanation:
Option A is incorrect because RunDetails(run).show() is used to monitor the run once the training job is submitted. We can view the messages being logged and the status of the pipeline run in real time.
Option B is incorrect because run.get_detailed_status() fetches the latest status of the run and, depending on the current status, some additional logs.
Option C is correct because get_metrics() fetches the metrics logged for the given run, which is what we need when analyzing the training pipeline run/job results.
Option D is incorrect because wait_for_completion() waits for the completion of the run and returns the status object after the wait.
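Continuing from the run object created in the question, a short sketch of retrieving the logged metrics:

# Block until the submitted run has finished, then fetch everything that was logged
run.wait_for_completion(show_output=True)

metrics = run.get_metrics()        # dictionary: metric name -> logged value(s)
for name, value in metrics.items():
    print(name, value)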
References:
azureml.core.Run class – Azure Machine Learning Python | Microsoft Learn, MLNotebooks/train-with-datasets.ipynb at master · Azure/MachineLearningNotebooks · GitHub
Domain: Explore data and train models
Question 12: A teammate of yours has used Azure AutoML to choose the right algorithm for a forecasting problem. Your team has found the model that gives the best forecast, so your teammate implemented it in the final solution, but they are wondering whether they have followed the Responsible Artificial Intelligence (Responsible AI) guidelines. Help your teammate by identifying where the Responsible AI guidelines have NOT been followed.
- Creation of a Responsible AI Dashboard for the final model created
- The error analysis component of the Responsible AI dashboard shows that the error rate is uniform across different groups of the categorical input features
- Final solution is designed in a way where we could explain if a forecast is higher or lower
- Entire Repository is Committed to your company’s GitHub channel with the config file that has secrets, such as database connection strings or passwords
Correct Answer : D
Explanation:
Option A is incorrect because the creation of a Responsible AI dashboard helps us check one of the principles of the Responsible AI guidelines, fairness and inclusiveness, which says that AI systems should treat everyone fairly and avoid affecting similarly situated groups of people in different ways. So here the Responsible AI guidelines are upheld.
Option B is incorrect because the error rate should be uniform across all groups/regions of the input features, which reflects another principle of the Responsible AI guidelines, reliability and safety. For example, if we have a higher error rate for the group of individuals under 25 years of age, we are essentially introducing bias into the model for people below 25, as the results for them are not reliable compared to others. So here the Responsible AI guidelines are upheld.
Option C is incorrect because organisations take key decisions based on the forecast, such as ordering inventory, hiring employees, and making delivery promises, so it is essential to understand why the output is the way it is. This reflects another principle of the Responsible AI guidelines, transparency. So here the Responsible AI guidelines are upheld.
Option D is correct because storing secrets in source code is impractical and is a huge security risk. It violates another principle of the Responsible AI guidelines, privacy and security. One of the alternatives recommended by Microsoft to storing secrets in source code is to make them available in the application environment by using Azure Key Vault, an Azure service that provides secure storage of generic secrets for applications.
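As a small sketch of the recommended alternative, a secret can be stored in the workspace's default Key Vault and read back inside a submitted run (the secret name and value below are placeholders; run.get_secret() works from within a submitted run):

from azureml.core import Workspace, Run

ws = Workspace.from_config()

# Store the secret once in the workspace's default Key Vault
keyvault = ws.get_default_keyvault()
keyvault.set_secret(name="db-connection-string", value="<connection-string>")

# Inside a submitted training/inference script, read it at runtime instead of hard-coding it
run = Run.get_context()
conn_str = run.get_secret(name="db-connection-string")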
References:
What is Responsible AI – Azure Machine Learning | Microsoft Learn, Authentication secrets – Azure Machine Learning | Microsoft Learn
Domain: Prepare a model for deployment
Question 13: You have developed a pipeline using the AzureML Python SDK, which has three steps -> Step-1: read and preprocess input data, Step-2: train the model on the preprocessed data, Step-3: deploy the trained model to an endpoint. You have successfully published the pipeline, and it is triggered every month to retrain the model with newly collected input data. Suddenly a pipeline retraining run failed. After investigation, you found the problem and made changes to the training script used in Step-2. Choose the code snippet that does the job of republishing the pipeline.
- from azureml.pipeline.core import Pipeline
pipeline = Pipeline(workspace=yourmlworkspace, steps=[Step-1, Step-2, Step-3])
pipeline.publish(name=pipeline_name, version=new_version_no)
- from azureml.pipeline.core import Pipeline
pipeline = Pipeline(workspace=yourmlworkspace)
pipeline_run = experiment.submit(pipeline)
pipeline.publish(name=pipeline_name, version=new_version_no)
- from azureml.pipeline.core import Pipeline
pipeline = Pipeline(workspace=yourmlworkspace, steps=[Step-1, Step-2, Step-3])
pipeline_run = experiment.submit(pipeline)
pipeline.publish(name=pipeline_name, version=new_version_no)
- from azureml.pipeline.core import Pipeline
pipeline = Pipeline(workspace=yourmlworkspace, steps=[Step-2])
pipeline_run = experiment.submit(pipeline)
pipeline.publish(name=pipeline_name, version=new_version_no)
Correct Answer: C
Explanation:
Using the Python SDK for AzureML, we can interact with and use AzureML services. One such service is Azure Machine Learning Pipelines. We can develop pipelines for data processing, training, inferencing, etc. Generally, a machine learning solution contains three components: 1) data processing, 2) training, and 3) deployment. Here the pipeline is defined in three pipeline steps, one for each of these components. We need to publish a pipeline every time a change is made to it.
Since we have made changes to the training step (Step-2), we need to create a pipeline object with all of its steps -> pipeline = Pipeline(workspace=yourmlworkspace, steps=[Step-1, Step-2, Step-3]). Then we need to submit the pipeline as a job using the Azure ML Experiment class to check that the changed pipeline works properly -> pipeline_run = experiment.submit(pipeline). Once the job has run successfully and we are satisfied with the results, we publish it -> pipeline.publish(name=pipeline_name, version=new_version_no).
Option A is incorrect because Pipeline is not submitted before publishing to check the functioning of the changes made.
Option B is incorrect because while creating the pipeline we did not mention the pipeline steps that are part of the pipeline.
Option C is correct because it contains all the required steps: create the pipeline, run it, and then publish it.
Option D is incorrect because we have only mentioned Step-2 and left out all other steps while creating the pipeline.
References:
Azureml.pipeline.core.pipeline.Pipeline class – Azure Machine Learning Python | Microsoft Learn, Create and run ML pipelines – Azure Machine Learning | Microsoft Learn
Domain: Deploy and retrain a model
Question 14: An ML team is planning to deploy a model built using Azure services. You have been asked to identify a compute target to deploy the model for inference for various cases. Which of the following should you recommend for each requirement? Drag the appropriate service to the correct answer area.
Incorrect Matching:
Service Name | Answer Area |
Local web service | Use for low-scale CPU-based workloads that require less than 48 GB of RAM. Doesn’t require you to manage a cluster |
Azure Machine Learning endpoints | Run inferencing workloads on on-premises, cloud, and edge Kubernetes clusters |
Azure Machine Learning Kubernetes | Fully managed computes for real-time (managed online endpoints) and batch scoring (batch endpoints) on serverless compute |
Azure Container Instances | Use for limited testing and troubleshooting. Hardware acceleration depends on the use of libraries in the local system |
Correct Answer: 1-D, 2-C, 3-B and 4-A
Service Name | Answer Area |
Local web service | Use for limited testing and troubleshooting. Hardware acceleration depends on the use of libraries in the local system |
Azure Machine Learning endpoints | Fully managed computes for real-time (managed online endpoints) and batch scoring (batch endpoints) on Serverless Compute |
Azure Machine Learning Kubernetes | Run inferencing workloads on on-premises, cloud, and edge Kubernetes clusters |
Azure Container Instances | Use for low-scale CPU-based workloads that require less than 48 GB of RAM. Doesn’t require you to manage a cluster |
Explanation:
Local web service
This is a solution for limited testing and troubleshooting. Here the model is deployed on the local machine you are working on so Hardware acceleration depends on the use of libraries in the local system.
Azure Machine Learning endpoints
This solution is the Microsoft-recommended way to deploy models for inference since it provides fully managed compute (managed by Azure) for real-time scoring (managed online endpoints) and batch scoring (batch endpoints) on serverless compute.
Azure Machine Learning Kubernetes
This is the traditional option of deploying models in a production environment where we can run inferencing workloads on on-premises, cloud, and edge Kubernetes clusters.
Azure Container Instances
This Solution can be used for low-scale CPU-based workloads that require less than 48 GB of RAM. Doesn’t require you to manage a cluster (Azure Manages it). It only supports Real-time inference and is Recommended for dev/test purposes only.
References:
Deploy machine learning models – Azure Machine Learning | Microsoft Learn, Deploy ML models to Kubernetes Service with v1 – Azure Machine Learning | Microsoft Learn, How to deploy models to Azure Container Instances with CLI (v1) – Azure Machine Learning | Microsoft Learn, Deploy machine learning models to online endpoints – Azure Machine Learning | Microsoft Learn
Domain: Deploy and retrain a model
Question 15: Your colleagues developed an AzureML pipeline using PythonScriptStep, in which your team wrote custom code for training and subsequent deployment, using the AzureML SDK to manage AzureML resources. The deployment target is an Azure Container Instance. Below is the code snippet used for the deployment. Now they want to verify the deployment status, and if the deployment fails, they want to know the reason for the failure.
code snippet used for deployment:
from azureml.core.model import Model
from azureml.core.webservice import AciWebservice

# ws (Workspace), model (registered Model) and inference_config are assumed to be defined earlier
deployment_config = AciWebservice.deploy_configuration(cpu_cores=0.5, memory_gb=1, auth_enabled=True)
service = Model.deploy(
    ws,
    "YourACIServiceName",
    [model],
    inference_config,
    deployment_config,
    overwrite=True,
)
service.wait_for_deployment()
Which of the following can be utilized to achieve the goal? [SELECT TWO]
- Use the following print statement after you deploy -> print(service.get_logs())
- In the AzureML web Studio, use the UI to get deployment logs
- Go to the pipeline job, and get logs from the outputs and logs section which are printed automatically when a deployment fails
- If deployment is not successful, Re-Deploy the model locally and debug the problem
- Raise a Microsoft support request if a deployment fails to get the necessary help
Correct Answers: A and B
Explanation:
The following services can be used to look at the status and logs of the deployment:
By Using Azure ML SDK
- Use the following print statement after you deploy -> print(service.get_logs())
- service.get_logs() retrieves all the logs that were recorded by the AzureML SDK packages, as well as the logs produced by the deployment scripts, so you would know exactly what went wrong with the deployment. These logs can also be found in the output logs of the job in which the pipeline ran.
Code example:
from azureml.core.model import Model
from azureml.core.webservice import AciWebservice

deployment_config = AciWebservice.deploy_configuration(cpu_cores=0.5, memory_gb=1, auth_enabled=True)
service = Model.deploy(
    ws,
    "YourACIServiceName",
    [model],
    inference_config,
    deployment_config,
    overwrite=True,
)
service.wait_for_deployment()
print(service.get_logs())
By Using Azure ML Web Studio UI
- Under the assets section of the AzureML Web Studio > inside endpoints, you can look at the Deployment state and logs of the service you have selected to deploy.
Option C is incorrect because no logs about the deployment will be printed there, except for an exception message saying that the deployment has failed.
Option D is incorrect because deploying the model locally is a way to troubleshoot the failure, mostly for runtime errors. Some issues cannot be found with this method, as there are many elements to a real-time service deployment, including the trained model, the runtime environment configuration, the scoring script, the container image, and the container host; we cannot exactly replicate everything by deploying locally, even though a majority of issues can be found this way. This method is used to troubleshoot the cause of an error and is not helpful for looking at the deployment status or getting immediate error information.
Option E is incorrect because Microsoft support requests are raised when a problem is out of the scope/capacity of the developer; they are not the right way to deal with routine deployment issues.
Reference:
Deploy machine learning models – Azure Machine Learning | Microsoft Learn
Domain: Prepare a model for deployment
Question 16: You have been given a task to quickly check the feasibility of building an image classification model and were given 100k images as input data. You have decided to quickly train multiple models and see if the performance of any of them is acceptable. You decide to use Azure Machine Learning (AzureML) experiments to do the training and want to use MLflow Tracking to record and compare parameters and results. You do not have enough time to write code that logs results for all the models, so you decide to use MLflow's autolog() functionality. Which of the following machine learning libraries does not support autologging with MLflow?
- Theano
- Pytorch
- TensorFlow
- Keras
Correct Answer: A
Explanation:
Option A is correct because Theano is a Python library that allows us to evaluate mathematical operations and it doesn’t support Auto Logging with MLflow.
Option B is incorrect because we can either use mlflow.autolog() or explicitly call mlflow.pytorch.autolog()
Option C is incorrect because we can either use mlflow.autolog() or explicitly call mlflow.tensorflow.autolog()
Option D is incorrect because we can either use mlflow.autolog() or explicitly call mlflow.keras.autolog()
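For illustration, a minimal autologging sketch with Keras (dummy data, hypothetical model):

import numpy as np
import mlflow
from tensorflow import keras

# One call enables automatic logging of parameters, metrics and the model
# for supported frameworks (TensorFlow/Keras, PyTorch, scikit-learn, ...)
mlflow.autolog()

x_train = np.random.rand(100, 10)
y_train = np.random.randint(0, 2, size=(100,))

model = keras.Sequential([keras.layers.Dense(1, activation="sigmoid", input_shape=(10,))])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

with mlflow.start_run():
    model.fit(x_train, y_train, epochs=3)     # parameters and per-epoch metrics are logged automatically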
References:
Theano – Azure Marketplace, Logging MLflow models – Azure Machine Learning | Microsoft Learn, MLflow Tracking for models – Azure Machine Learning | Microsoft Learn
Domain: Design and prepare a machine learning solution
Question 17: In Azure Machine Learning studio, you can manage the compute targets for your data science activities. There are four kinds of compute resources you can create and use. What is each kind of compute resource used for? Drag the appropriate description to the correct answer area.
Incorrect Matching:
Compute Target | Answer Area |
Compute Instances | Links to existing Azure compute resources, such as Virtual Machines or Azure Databricks clusters. |
Compute Clusters | Deployment targets for predictive services that use your trained models |
Inference Clusters | Scalable clusters of virtual machines for on-demand processing of experiment code. |
Attached Compute | Development workstations that data scientists can use to work with data and models. |
Correct Answers: 1-D, 2-C, 3-B and 4-A
Compute Target | Answer Area |
Compute Instances | Development workstations that data scientists can use to work with data and models |
Compute Clusters | Scalable clusters of virtual machines for on-demand processing of experiment code |
Inference Clusters | Deployment targets for predictive services that use your trained models |
Attached Compute | Links to existing Azure compute resources, such as Virtual Machines or Azure Databricks clusters |
Explanation:
Compute Instances
Compute instances can be used as a training compute target, similar to Azure Machine Learning compute training clusters, but a compute instance has only a single node, while a compute cluster can have multiple nodes. They are therefore generally used as development workstations that data scientists can use to work with data and models.
Compute Clusters
Since Compute clusters can have multiple nodes they are Scalable clusters of virtual machines for the on-demand processing of experiment code. They are generally used to run all kinds of Machine learning Jobs and can also be shared with other users in your workspace.
Inference Clusters
These are Deployment targets for predictive services that use your trained models. These are generally Azure Kubernetes Service (AKS) clusters.
Attached Compute
These compute targets are not managed by Azure Machine Learning. You create this type of compute target outside Azure Machine Learning and then attach it to your workspace. These compute resources can require additional steps for you to maintain or to improve performance for machine learning workloads.
Azure Machine Learning supports the following compute types: Remote virtual machines, Azure HDInsight, Azure Databricks, Azure Data Lake Analytics, etc.
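For example, a compute cluster like the one described above could be provisioned from the SDK with a sketch such as this (the VM size and cluster name are illustrative):

from azureml.core import Workspace
from azureml.core.compute import ComputeTarget, AmlCompute

ws = Workspace.from_config()

# Scalable cluster: scales between 0 and 4 nodes on demand and scales back down when idle
config = AmlCompute.provisioning_configuration(
    vm_size="STANDARD_DS3_V2",
    min_nodes=0,
    max_nodes=4,
    idle_seconds_before_scaledown=1800
)
cluster = ComputeTarget.create(ws, "cpu-cluster", config)
cluster.wait_for_completion(show_output=True)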
References:
What are compute targets – Azure Machine Learning | Microsoft Learn, Microsoft Azure AI Fundamentals: Explore visual tools – machine learning | Microsoft Learn
Domain: Explore data and train models
Question 18: You are dealing with an NLP Task where you have to predict the genres of a movie.
Your input dataset (a .txt file) looks like this:
text | labels |
“Avengers End Game” | “Action, Adventure” |
“Exorcist” | “Horror” |
“RRR” | “Action, Historical Fiction” |
You have decided to use Azure AutoML capabilities. Which of the following can be used to train a natural language processing model for this task using AutoML?
- A) job = automl.classification(
compute=my_compute_name,
experiment_name=my_exp_name,
training_data=my_training_data_input,
target_column_name="labels",
primary_metric="accuracy",
n_cross_validations=5,
enable_model_explainability=True)
- B) job = automl.text_classification(
compute=compute_name,
experiment_name=exp_name,
training_data=my_training_data_input,
validation_data=my_validation_data_input,
target_column_name="labels",
primary_metric="accuracy")
- C) job = automl.text_classification_multilabel(
compute=compute_name,
experiment_name=exp_name,
training_data=my_training_data_input,
validation_data=my_validation_data_input,
target_column_name="labels",
primary_metric="accuracy")
- D) text_ner_job = automl.text_ner(
compute=compute_name,
experiment_name=exp_name,
training_data=my_training_data_input,
validation_data=my_validation_data_input)
Correct Answer: C
Explanation:
Option A is incorrect because automl.classification is used to solve normal classification (non-NLP) problems but not text classification problems. Here we are dealing with text data and we are trying to generate an NLP Model.
Option B is incorrect because automl.text_classification is used to solve multi-class classification problems. multi-class classification -> if there are multiple possible classes and each sample (data point) can be classified as exactly one class. The task is to predict the correct class for each sample. For example, classifying a movie as “Comedy” or “Romantic”.
Option C is correct because we are dealing with a multi-label classification problem. If you look at the input data, each data point can have more than one label associated with it. Multi-label classification -> there are multiple possible classes and each sample can be assigned any number of classes. The task is to predict all the classes for each sample. For example, classifying a movie as "Comedy", "Romantic", or "Comedy and Romantic".
Option D is incorrect because automl.text_ner is used for problems related to Named Entity Recognition (NER), which involves identifying key information in the text and classifying it into a set of predefined categories. An entity is basically the thing that is consistently talked about or referred to in the text. At its core, NER is just a two-step process: detecting the entities in the text, and classifying them into different categories. The format (CoNLL format) of the input data required for this task is also different from what is provided.
References:
Set up AutoML for NLP – Azure Machine Learning | Microsoft Learn, azureml-examples/automl-nlp-text-classification-multilabel-task-paper-cat.ipynb at main · Azure/azureml-examples · GitHub
Domain: Deploy and retrain a model
Question 19: You are designing an event-driven workflow where you want to automatically trigger a re-training pipeline published in your AzureML workspace whenever you detect a data drift in the training data. You have set up a dataset monitor to detect data drift in a workspace.
Arrange the following steps you need to perform in the right order to achieve the workflow.
- Subscribe to the event type –> Microsoft.MachineLearningServices.DatasetDriftDetected
- Create an Azure Logic App from your Azure Machine Learning Subscription page and select MachineLearningServices.DatasetDriftDetected event(s) to be notified for
- Use Http Step in Azure Logic Apps to trigger the Retraining pipeline
- Go to your Azure Machine Learning Subscription page, and select Create Event Subscription
Correct Answer: D, A, B and C
Explanation:
Microsoft's Azure cloud platform provides a service called Azure Event Grid, which lets us create workflows based on events: an Event Grid subscription monitors events and triggers workflows that we design. Azure also provides a service called Azure Logic Apps, which lets us create and orchestrate the workflows that are generally triggered by Event Grid.
One way to create a retraining workflow based on data drift is by combining Azure Event Grid and Azure Logic Apps. The first step is to set up an Event Grid subscription so that Event Grid receives DatasetDriftDetected events from our ML workspace. Next, we create a Logic App that listens for data drift events coming from our ML workspace, and we use an HTTP step inside the Logic App to send a POST request to the pipeline endpoint of the retraining pipeline (all published pipelines have endpoints that we can use to trigger them); a sketch of this HTTP call is shown after the ordered steps below.
So the order is ->
- D) Go to your Azure Machine Learning Subscription page, select Create Event Subscription
- A) Subscribe to the event type → Microsoft.MachineLearningServices.DatasetDriftDetected
- B) Create an Azure Logic App from your Azure Machine Learning Subscription page and select MachineLearningServices.DatasetDriftDetected event(s) to be notified for
- C) Use Http Step in Azure Logic Apps to trigger the Retraining pipeline
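For reference, the HTTP call configured in the Logic App's HTTP step is equivalent to the following sketch (the published pipeline id and experiment name are placeholders; in the Logic App the token is supplied through the step's authentication settings):

import requests
from azureml.core import Workspace
from azureml.core.authentication import InteractiveLoginAuthentication
from azureml.pipeline.core import PublishedPipeline

ws = Workspace.from_config()
published_pipeline = PublishedPipeline.get(ws, id="<published-pipeline-id>")   # placeholder id

# POST to the published pipeline's REST endpoint to start a retraining run
auth_header = InteractiveLoginAuthentication().get_authentication_header()
response = requests.post(
    published_pipeline.endpoint,
    headers=auth_header,
    json={"ExperimentName": "retrain-on-drift"}
)
print(response.json().get("Id"))    # id of the triggered pipeline run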
References:
Trigger events in ML workflows (preview) – Azure Machine Learning | Microsoft Learn, Trigger Azure Machine Learning pipelines – Azure Machine Learning | Microsoft Learn
Domain: Deploy and retrain a model
Question 20: In which of the following scenarios should you choose Real-time endpoints over Batch endpoints?
- When you have expensive models that require a longer time to run inference
- When you need to perform inference over large amounts of data, distributed in multiple files
- When your prediction calls to the deployment model are not frequent
- When you have low latency requirements
Correct Answer: D
Explanation:
After you train a machine learning model, you need to deploy the model so that others can use it to do inferencing. In Azure Machine Learning, you can use endpoints and deployments to do so. An endpoint is an HTTPS endpoint that clients can call to receive the inferencing (scoring) output of a trained model. There are two kinds of endpoints 1) online and 2) Batch
Option A is incorrect because online endpoints are designed for fast response times; when we know that inference takes a long time, we should go for a batch endpoint.
Option B is incorrect because it takes time to process large volumes of data, and a batch endpoint suits this scenario: it is not only cheaper but also more efficient, since we can use parallelization.
Option C is incorrect because the endpoint would sit idle most of the time, which results in higher cost and wasted resources. Here we should use a batch endpoint.
Option D is correct because Online endpoints are endpoints that are used for online (real-time) inferencing. Compared to batch endpoints, online endpoints contain deployments that are ready to receive data from clients and can send responses back in real time. So Online endpoints are significantly faster translating to lower latency.
References:
Use batch endpoints for batch scoring – Azure Machine Learning | Microsoft Learn, What are endpoints? – Azure Machine Learning | Microsoft Learn
Summary
To prepare for the DP-100 exam, you can start by reviewing the exam objectives and Azure documentation related to data science solutions. You can also use practice exams and sample questions to test your knowledge and identify areas that require further study.
In addition, it's essential to have hands-on experience with Azure services related to data science, and Whizlabs offers hands-on labs and sandboxes to help you get familiar with real-world settings.
If you want to pass the exam, practice, practice, and practice often, and then get certified.