DP-203 Exam - Free Actual Q&As, Page 5 - ExamTopics

Question #41 Topic 1
DRAG DROP -
You need to build a solution to ensure that users can query specific files in an Azure Data Lake Storage Gen2 account from an Azure Synapse
Analytics serverless SQL pool.
Which three actions should you perform in sequence? To answer, move the appropriate actions from the list of actions to the answer area and
arrange them in the correct order.
NOTE: More than one order of answer choices is correct. You will receive credit for any of the correct orders you select.
Select and Place:
Correct Answer:
Step 1: Create an external data source
You can create external tables in Synapse SQL pools via the following steps:
1. CREATE EXTERNAL DATA SOURCE to reference an external Azure Storage account and specify the credential that should be used to access the storage.
2. CREATE EXTERNAL FILE FORMAT to describe the format of CSV or Parquet files.
3. CREATE EXTERNAL TABLE on top of the files placed on the data source, using the same file format.
Step 2: Create an external file format object
Creating an external file format is a prerequisite for creating an external table.
Step 3: Create an external table
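The NOTE that more than one order is correct follows from the dependencies among the three objects: the external table needs both the data source and the file format, but those two do not depend on each other. The following minimal Python sketch uses hypothetical object labels (it does not call any Synapse API) and enumerates every creation order that respects those dependencies.

```python
from itertools import permutations

# Hypothetical labels for the three objects. Each maps to the objects
# that must already exist before it can be created.
DEPENDS_ON = {
    "EXTERNAL DATA SOURCE": set(),
    "EXTERNAL FILE FORMAT": set(),
    "EXTERNAL TABLE": {"EXTERNAL DATA SOURCE", "EXTERNAL FILE FORMAT"},
}


def is_valid_order(order):
    """Return True if every object appears after all of its prerequisites."""
    created = set()
    for obj in order:
        if not DEPENDS_ON[obj] <= created:
            return False
        created.add(obj)
    return True


def valid_orders():
    """List every creation order that respects the dependencies."""
    return [list(p) for p in permutations(DEPENDS_ON) if is_valid_order(p)]


if __name__ == "__main__":
    for order in valid_orders():
        print(" -> ".join(order))
```

Only two orders survive, and both end with the external table, which is why either placement of the data source and file format earns credit.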
Reference:
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/develop-tables-external-tables
Question #42 Topic 1
You are designing a data mart for the human resources (HR) department at your company. The data mart will contain employee information and
employee transactions.
From a source system, you have a flat extract that has the following fields:
✑ EmployeeID
✑ FirstName
✑ LastName
✑ Recipient
✑ GrossAmount
✑ TransactionID
✑ GovernmentID
✑ NetAmountPaid
✑ TransactionDate
You need to design a star schema data model in an Azure Synapse Analytics dedicated SQL pool for the data mart.
Which two tables should you create? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
A. a dimension table for Transaction
B. a dimension table for EmployeeTransaction
C. a dimension table for Employee (Most Voted)
D. a fact table for Employee
E. a fact table for Transaction (Most Voted)
Correct Answer: CE
C: Dimension tables contain attribute data that might change but usually changes infrequently. For example, a customer's name and address
are stored in a dimension table and updated only when the customer's profile changes. To minimize the size of a large fact table, the customer's
name and address don't need to be in every row of a fact table. Instead, the fact table and the dimension table can share a customer ID. A query
can join the two tables to associate a customer's profile and transactions.
E: Fact tables contain quantitative data that are commonly generated in a transactional system, and then loaded into the dedicated SQL pool.
For example, a retail business generates sales transactions every day, and then loads the data into a dedicated SQL pool fact table for analysis.
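Applied to this extract, one reasonable split puts the descriptive employee fields (EmployeeID, FirstName, LastName, GovernmentID) in the Employee dimension and the transaction fields (TransactionID, Recipient, GrossAmount, NetAmountPaid, TransactionDate) in the Transaction fact table, with EmployeeID as the shared key. The Python sketch below assumes that column split; the function name and sample rows are made up for illustration.

```python
# Assumed column split for the flat HR extract (illustrative, not prescribed).
EMPLOYEE_COLUMNS = ["EmployeeID", "FirstName", "LastName", "GovernmentID"]
TRANSACTION_COLUMNS = ["TransactionID", "EmployeeID", "Recipient",
                       "GrossAmount", "NetAmountPaid", "TransactionDate"]


def split_extract(rows):
    """Split flat extract rows into an Employee dimension and a Transaction fact.

    Each employee is stored once in the dimension; fact rows keep only
    EmployeeID as the key shared with the dimension.
    """
    dim_employee = {}
    fact_transaction = []
    for row in rows:
        # Keep the first set of attributes seen for each EmployeeID.
        dim_employee.setdefault(
            row["EmployeeID"], {c: row[c] for c in EMPLOYEE_COLUMNS})
        fact_transaction.append({c: row[c] for c in TRANSACTION_COLUMNS})
    return list(dim_employee.values()), fact_transaction


if __name__ == "__main__":
    # Made-up sample: one employee with two pay transactions.
    base = {"EmployeeID": 1, "FirstName": "Ana", "LastName": "Diaz",
            "GovernmentID": "X1", "Recipient": "Ana Diaz",
            "GrossAmount": 1000.0, "NetAmountPaid": 800.0}
    extract = [
        {**base, "TransactionID": 100, "TransactionDate": "2024-01-31"},
        {**base, "TransactionID": 101, "TransactionDate": "2024-02-29"},
    ]
    dims, facts = split_extract(extract)
    print(len(dims), len(facts))  # 1 2
```

The employee's name and GovernmentID are stored once, while each transaction row carries only the key, which is the size saving the explanation above describes.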
Reference:
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-overview
Community vote distribution

CE (100%)
Question #43 Topic 1
You are designing a dimension table for a data warehouse. The table will track the value of the dimension attributes over time and preserve the
history of the data by adding new rows as the data changes.
Which type of slowly changing dimension (SCD) should you use?
A. Type 0
B. Type 1
C. Type 2 (Most Voted)
D. Type 3
Correct Answer: C
A Type 2 SCD supports versioning of dimension members. Often the source system doesn't store versions, so the data warehouse load process
detects and manages changes in a dimension table. In this case, the dimension table must use a surrogate key to provide a unique reference to
a version of the dimension member. It also includes columns that define the date range validity of the version (for example, StartDate and
EndDate) and possibly a flag column (for example, IsCurrent) to easily filter by current dimension members.
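A Type 2 update can be sketched in Python as follows. StartDate, EndDate, and IsCurrent follow the examples in the paragraph above; the SurrogateKey and BusinessKey column names, the function name, the in-memory list standing in for the dimension table, and the use of None for an open-ended EndDate are illustrative assumptions.

```python
from datetime import date


def apply_scd2_change(dimension, business_key, new_attrs, change_date):
    """Apply a Type 2 change to an in-memory dimension (a list of dicts).

    If the current version of the member differs from new_attrs, close it
    (set EndDate and IsCurrent = False) and append a new current version
    with a fresh surrogate key. Unchanged members are left alone.
    """
    current = next(
        (r for r in dimension
         if r["BusinessKey"] == business_key and r["IsCurrent"]),
        None,
    )
    if current is not None:
        if all(current.get(k) == v for k, v in new_attrs.items()):
            return dimension  # No change detected; keep the current version.
        # Expire the old version instead of overwriting it (overwriting is Type 1).
        current["EndDate"] = change_date
        current["IsCurrent"] = False
    next_key = max((r["SurrogateKey"] for r in dimension), default=0) + 1
    dimension.append({
        "SurrogateKey": next_key,
        "BusinessKey": business_key,
        **new_attrs,
        "StartDate": change_date,
        "EndDate": None,  # Open-ended validity for the current version.
        "IsCurrent": True,
    })
    return dimension


if __name__ == "__main__":
    dim = []
    apply_scd2_change(dim, "E1", {"City": "Seattle"}, date(2023, 1, 1))
    apply_scd2_change(dim, "E1", {"City": "Denver"}, date(2024, 6, 1))
    for row in dim:
        print(row)
```

After the second call, the dimension holds two rows for E1: the Seattle version with an EndDate and IsCurrent = False, and the Denver version marked current.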
Incorrect Answers:
B: A Type 1 SCD always reflects the latest values, and when changes in source data are detected, the dimension table data is overwritten.
D: A Type 3 SCD supports storing two versions of a dimension member as separate columns. The table includes a column for the current value
of a member plus either the original or previous value of the member. So Type 3 uses additional columns to track one key instance of history,
rather than storing additional rows to track each change like in a Type 2 SCD.
Reference:
https://docs.microsoft.com/en-us/learn/modules/populate-slowly-changing-dimensions-azure-synapse-analytics-pipelines/3-choose-between-dimension-types

Community vote distribution
C (100%)
Question #44 Topic 1
DRAG DROP -
You have data stored in thousands of CSV files in Azure Data Lake Storage Gen2. Each file has a header row followed by a properly formatted
carriage return (\r) and line feed (\n).
You are implementing a pattern that batch loads the files daily into a dedicated SQL pool in Azure Synapse Analytics by using PolyBase.
You need to skip the header row when you import the files into the data warehouse. Before building the loading pattern, you need to prepare the
required database objects in Azure Synapse Analytics.
Which three actions should you perform in sequence? To answer, move the appropriate actions from the list of actions to the answer area and
arrange them in the correct order.
NOTE: Each correct selection is worth one point.
Select and Place:
Correct Answer:
Step 1: Create an external data source that uses the abfs location
Use CREATE EXTERNAL DATA SOURCE to reference Azure Data Lake Storage Gen1 or Gen2.
Step 2: Create an external file format and set the First_Row option.
Use CREATE EXTERNAL FILE FORMAT with FORMAT_OPTIONS (FIRST_ROW = 2) so that the header row of each file is skipped.
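FIRST_ROW is 1-based and applies to every file PolyBase reads, not only the first one. The Python sketch below only mimics that behavior with the standard csv module; the read_delimited function and the sample file contents are hypothetical and are not part of PolyBase.

```python
import csv
import io


def read_delimited(files, first_row=2):
    """Yield data rows from several CSV texts, starting at first_row in each file.

    first_row is 1-based, mirroring FIRST_ROW: first_row=2 skips one
    header row in every file, not just in the first file.
    """
    for text in files:
        # newline="" leaves \r\n intact so the csv module can handle it.
        reader = csv.reader(io.StringIO(text, newline=""))
        for row_number, row in enumerate(reader, start=1):
            if row_number >= first_row:
                yield row


if __name__ == "__main__":
    # Two small hypothetical files, each with a header row and CRLF line endings.
    files = [
        "id,amount\r\n1,10\r\n2,20\r\n",
        "id,amount\r\n3,30\r\n",
    ]
    print(list(read_delimited(files)))
```

Running it prints only the three data rows; both header rows are skipped.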
Step 3: Use CREATE EXTERNAL TABLE AS SELECT (CETAS) and configure the reject options to specify reject values or percentages
To use PolyBase, you must create external tables to reference your external data.
Use reject options.
Note: REJECT options don't apply at the time this CREATE EXTERNAL TABLE AS SELECT statement is run. Instead, they're specified here so that
the database can use them at a later time when it imports data from the external table. Later, when the CREATE TABLE AS SELECT statement
selects data from the external table, the database will use the reject options to determine the number or percentage of rows that can fail to
import before it stops the import.
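Conceptually, REJECT_TYPE = VALUE with REJECT_VALUE = n lets an import tolerate up to n rejected rows and fail once that count is exceeded. The following Python sketch is only a model of that rule: import_rows, the parse callback, and the exception class are hypothetical names. PolyBase decides for itself what counts as a rejected row; here a ValueError from the parser stands in for that.

```python
class RejectThresholdExceeded(Exception):
    """Raised when more rows fail than the reject threshold allows."""


def import_rows(rows, parse, reject_value):
    """Import rows, tolerating at most reject_value failures.

    Models REJECT_TYPE = VALUE: the import stops as soon as the number of
    rejected rows exceeds reject_value.
    """
    imported, rejected = [], 0
    for row in rows:
        try:
            imported.append(parse(row))
        except ValueError:
            rejected += 1
            if rejected > reject_value:
                raise RejectThresholdExceeded(
                    f"{rejected} rows rejected; limit is {reject_value}")
    return imported, rejected


def to_record(row):
    """Hypothetical parser: convert both columns to integers."""
    return int(row[0]), int(row[1])


if __name__ == "__main__":
    raw = [["1", "10"], ["2", "oops"], ["3", "30"]]
    print(import_rows(raw, to_record, reject_value=1))  # ([(1, 10), (3, 30)], 1)
```

With reject_value=0 the same input would raise RejectThresholdExceeded on the second row.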
Reference:
https://docs.microsoft.com/en-us/sql/relational-databases/polybase/polybase-t-sql-objects
https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-table-as-select-transact-sql
Question #45 Topic 1
HOTSPOT -
You are building an Azure Synapse Analytics dedicated SQL pool that will contain a fact table for transactions from the first half of the year 2020.
You need to ensure that the table meets the following requirements:
✑ Minimizes the processing time to delete data that is older than 10 years
✑ Minimizes the I/O for queries that use year-to-date values
How should you complete the Transact-SQL statement? To answer, select the appropriate options in the answer area.
Hot Area:
Correct Answer:
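Box 1: PARTITION -
In a dedicated SQL pool, RANGE RIGHT FOR VALUES is specified inside the PARTITION option in the WITH clause of CREATE TABLE.
Box 2: [TransactionDateID]
Partition on the date column.
Example: Creating a RANGE RIGHT partition function on a datetime column (SQL Server syntax; dedicated SQL pools do not use partition functions, but the RANGE RIGHT boundary semantics are the same)
The following partition function partitions a table or index into 12 partitions, one for each month of a year's worth of values in a datetime
column.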
CREATE PARTITION FUNCTION [myDateRangePF1] (datetime)
AS RANGE RIGHT FOR VALUES ('20030201', '20030301', '20030401',
'20030501', '20030601', '20030701', '20030801',
'20030901', '20031001', '20031101', '20031201');
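The effect of RANGE RIGHT can be checked with a short Python sketch that maps a value to its partition number. It reuses the boundary values from the example above as yyyymmdd strings (which sort in date order); the partition_number function is hypothetical and only models the boundary rule, not the engine.

```python
from bisect import bisect_left, bisect_right

# Boundary values from the example above, as yyyymmdd strings.
BOUNDARIES = ["20030201", "20030301", "20030401", "20030501", "20030601",
              "20030701", "20030801", "20030901", "20031001", "20031101",
              "20031201"]


def partition_number(value, boundaries, range_type="RIGHT"):
    """Return the 1-based partition that a value falls into.

    RANGE RIGHT: each boundary value belongs to the partition on its right.
    RANGE LEFT: each boundary value belongs to the partition on its left.
    n boundaries always produce n + 1 partitions.
    """
    if range_type == "RIGHT":
        return bisect_right(boundaries, value) + 1
    return bisect_left(boundaries, value) + 1


if __name__ == "__main__":
    for d in ["20030115", "20030201", "20031231"]:
        print(d, partition_number(d, BOUNDARIES))
```

Eleven boundaries yield twelve partitions, and 20030201 lands in partition 2 under RANGE RIGHT but partition 1 under RANGE LEFT. Because a date filter maps to a contiguous range of partitions, queries can skip the rest, and the oldest data can be removed a whole partition at a time.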
Reference:
https://docs.microsoft.com/en-us/sql/t-sql/statements/create-partition-function-transact-sql
Question #46 Topic 1
You are performing exploratory analysis of the bus fare data in an Azure Data Lake Storage Gen2 account by using an Azure Synapse Analytics
serverless SQL pool.
You execute the Transact-SQL query shown in the following exhibit.
What do the query results include?
A. Only CSV files in the tripdata_2020 subfolder.
B. All files that have file names that begin with "tripdata_2020".
C. All CSV files that have file names that contain "tripdata_2020".
D. Only CSV files that have file names that begin with "tripdata_2020". (Most Voted)
Correct Answer: D
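The exhibit is not reproduced here. Assuming the query's BULK path ends in a wildcard such as tripdata_2020*.csv (a hypothetical reconstruction consistent with answer D), the Python sketch below uses fnmatchcase to show which names such a pattern selects. The file names are made up.

```python
from fnmatch import fnmatchcase

# Hypothetical file names in the bus fare folder.
FILES = [
    "tripdata_2020_01.csv",
    "tripdata_2020_02.csv",
    "tripdata_2020_01.parquet",
    "old_tripdata_2020.csv",
    "tripdata_2019_12.csv",
]


def matching_files(pattern, names):
    """Return names matching a wildcard pattern (* = any run of characters)."""
    return [n for n in names if fnmatchcase(n, pattern)]


if __name__ == "__main__":
    print(matching_files("tripdata_2020*.csv", FILES))
```

Only names that start with tripdata_2020 and end in .csv match. A name that merely contains the prefix (option C) or a non-CSV file (option B) is excluded.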

Community vote distribution
D (100%)
Question #47 Topic 1
DRAG DROP -
You use PySpark in Azure Databricks to parse the following JSON input.
You need to output the data in the following tabular format.
How should you complete the PySpark code? To answer, drag the appropriate values to the correct targets. Each value may be used once, more
than once, or not at all. You may need to drag the split bar between panes or scroll to view content.
Select and Place:
Correct Answer:
Box 1: select -
Box 2: explode -
Box 3: alias -
pyspark.sql.Column.alias returns this column aliased with a new name or names (in the case of expressions that return more than one column,
such as explode).
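The JSON input and target table are exhibits that are not reproduced here, so the record below is made up. This plain-Python sketch (not PySpark) imitates what select with explode and alias does: each element of an array column becomes its own output row under the aliased name. The function names and the persons, name, and age fields are hypothetical.

```python
import json

# Made-up JSON input; the exam's actual exhibit is not shown here.
RAW = '{"id": 1, "persons": [{"name": "Keith", "age": 30}, {"name": "Anna", "age": 25}]}'


def select_explode(records, key_col, array_col, alias):
    """Plain-Python stand-in for select(key_col, explode(array_col).alias(alias)).

    Emits one row per array element. Like Spark's explode, records whose
    array is null or empty produce no rows.
    """
    rows = []
    for rec in records:
        for element in rec.get(array_col) or []:
            rows.append({key_col: rec[key_col], alias: element})
    return rows


def flatten(rows, struct_col):
    """Promote the fields of a struct column to top-level columns."""
    out = []
    for row in rows:
        flat = {k: v for k, v in row.items() if k != struct_col}
        flat.update(row[struct_col])
        out.append(flat)
    return out


if __name__ == "__main__":
    exploded = select_explode([json.loads(RAW)], "id", "persons", "person")
    for row in flatten(exploded, "person"):
        print(row)
```

In PySpark the equivalent shape is roughly df.select("id", explode("persons").alias("person")) followed by selecting person.name and person.age.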
Reference:
https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.Column.alias.html
https://docs.microsoft.com/en-us/azure/databricks/sql/language-manual/functions/explode
Question #48 Topic 1
HOTSPOT -
You are designing an application that will store petabytes of medical imaging data.
When the data is first created, the data will be accessed frequently during the first week. After one month, the data must be accessible within 30
seconds, but files will be accessed infrequently. After one year, the data will be accessed infrequently but must be accessible within five minutes.
You need to select a storage strategy for the data. The solution must minimize costs.
Which storage tier should you use for each time frame? To answer, select the appropriate options in the answer area.
Hot Area:
Correct Answer:
Box 1: Hot -
Hot tier - An online tier optimized for storing data that is accessed or modified frequently. The Hot tier has the highest storage costs, but the
lowest access costs.
Box 2: Cool -
Cool tier - An online tier optimized for storing data that is infrequently accessed or modified. Data in the Cool tier should be stored for a
minimum of 30 days. The Cool tier has lower storage costs and higher access costs compared to the Hot tier.
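Box 3: Cool -
Not Archive tier - An offline tier optimized for storing data that is rarely accessed, and that has flexible latency requirements, on the order of
hours. Data in the Archive tier should be stored for a minimum of 180 days. Because archived data must be rehydrated before it can be read,
which takes hours, the Archive tier cannot meet the five-minute access requirement, so the data stays in the Cool tier.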
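The three answers follow one rule: use Hot while access is frequent; otherwise pick the cheapest-to-store tier whose retrieval latency meets the requirement. The Python sketch below encodes that rule for the three tiers discussed here only. The cost ranks come from the descriptions above, while the latency figures are coarse assumptions (Hot and Cool are online; Archive rehydration is treated as taking up to about 15 hours).

```python
# Coarse, assumed characteristics of the three tiers discussed above.
# Lower storage_rank = cheaper to store; max_latency_s = assumed worst-case
# time to first read, in seconds.
TIERS = {
    "Hot": {"storage_rank": 3, "max_latency_s": 1},
    "Cool": {"storage_rank": 2, "max_latency_s": 1},
    "Archive": {"storage_rank": 1, "max_latency_s": 15 * 3600},
}


def choose_tier(frequent_access, required_latency_s):
    """Pick Hot for frequently accessed data; otherwise the cheapest-to-store
    tier whose worst-case latency meets the requirement."""
    if frequent_access:
        return "Hot"  # Lowest access costs outweigh higher storage costs.
    candidates = [name for name, t in TIERS.items()
                  if t["max_latency_s"] <= required_latency_s]
    return min(candidates, key=lambda name: TIERS[name]["storage_rank"])


if __name__ == "__main__":
    print(choose_tier(True, 1))        # first week
    print(choose_tier(False, 30))      # after one month: 30-second access
    print(choose_tier(False, 5 * 60))  # after one year: five-minute access
```

The three calls return Hot, Cool, and Cool, matching the boxes above. Archive would only be chosen if the requirement allowed many hours.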
Reference:
https://docs.microsoft.com/en-us/azure/storage/blobs/access-tiers-overview
https://www.altaro.com/hyper-v/azure-archive-storage/
Question #49 Topic 1
You have an Azure Synapse Analytics Apache Spark pool named Pool1.
You plan to load JSON files from an Azure Data Lake Storage Gen2 container into the tables in Pool1. The structure and data types vary by file.
You need to load the files into the tables. The solution must maintain the source data types.
What should you do?
A. Use a Conditional Split transformation in an Azure Synapse data flow.
B. Use a Get Metadata activity in Azure Data Factory.
C. Load the data by using the OPENROWSET Transact-SQL command in an Azure Synapse Analytics serverless SQL pool.
D. Load the data by using PySpark. (Most Voted)
Correct Answer: C
Serverless SQL pool can automatically synchronize metadata from Apache Spark. A serverless SQL pool database will be created for each
database existing in serverless Apache Spark pools.
Serverless SQL pool enables you to query data in your data lake. It offers a T-SQL query surface area that accommodates semi-structured and
unstructured data queries.
To support a smooth experience for in place querying of data that's located in Azure Storage files, serverless SQL pool uses the OPENROWSET
function with additional capabilities.
The easiest way to see the content of your JSON file is to provide the file URL to the OPENROWSET function and specify the csv FORMAT.
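Because the structure and data types vary by file in this scenario, a loader has to infer a schema per file rather than impose one. The plain-Python sketch below (hypothetical function and sample documents; not the inference logic of Spark or OPENROWSET) shows that parsing JSON keeps each value's source type, so the same field can arrive as an integer in one file and a string in another.

```python
import json


def infer_schema(json_text):
    """Map each top-level field of a JSON object to the Python type parsed from it."""
    doc = json.loads(json_text)
    return {field: type(value).__name__ for field, value in doc.items()}


if __name__ == "__main__":
    # Two hypothetical files whose shared field "id" has different source types.
    file_a = '{"id": 7, "price": 9.5, "active": true}'
    file_b = '{"id": "A-7", "tags": ["x", "y"]}'
    print(infer_schema(file_a))  # {'id': 'int', 'price': 'float', 'active': 'bool'}
    print(infer_schema(file_b))  # {'id': 'str', 'tags': 'list'}
```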
Reference:
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/query-json-files
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/query-data-storage

Community vote distribution
D (91%) 9%
Question #50 Topic 1
You have an Azure Databricks workspace named workspace1 in the Standard pricing tier. Workspace1 contains an all-purpose cluster named cluster1.
You need to reduce the time it takes for cluster1 to start and scale up. The solution must minimize costs.
What should you do first?
A. Configure a global init script for workspace1.
B. Create a cluster policy in workspace1.
C. Upgrade workspace1 to the Premium pricing tier.
D. Create a pool in workspace1. (Most Voted)
Correct Answer: D
You can use Databricks pools to speed up your data pipelines and scale clusters quickly.
A Databricks pool is a managed cache of virtual machine instances that enables clusters to start and scale 4 times faster.
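The speedup comes from skipping VM provisioning: a cluster attached to a pool takes ready, idle instances when they exist and only provisions new ones when the pool runs dry. The Python class below is a toy model of that idea, not the Databricks API; the class name, min_idle parameter, and methods are hypothetical. Databricks documents that idle pool instances do not incur DBU charges, although the cloud provider still bills for the VMs.

```python
class InstancePool:
    """Toy model of an instance pool: idle VMs kept warm for clusters to reuse.

    Acquiring an idle instance skips VM provisioning; only when the pool is
    empty does a new instance have to be provisioned (the slow path).
    """

    def __init__(self, min_idle):
        self.idle = min_idle  # Pre-provisioned, ready instances.
        self.provisioned_on_demand = 0

    def acquire(self, count):
        """Give a cluster `count` instances, preferring idle ones.

        Returns (from_pool, newly_provisioned).
        """
        from_pool = min(count, self.idle)
        self.idle -= from_pool
        new = count - from_pool
        self.provisioned_on_demand += new
        return from_pool, new

    def release(self, count):
        """Return instances to the pool when a cluster terminates or scales down."""
        self.idle += count


if __name__ == "__main__":
    pool = InstancePool(min_idle=4)
    print(pool.acquire(3))  # cluster start: all instances come from the pool
    print(pool.acquire(2))  # scale-up: one from the pool, one provisioned
```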
Reference:
https://databricks.com/blog/2019/11/11/databricks-pools-speed-up-data-pipelines.html

Community vote distribution
D (73%) C (27%)