Databricks Certified Data Engineer Associate


Exam Questions Databricks-Certified-Data-Engineer-Associate
Databricks Certified Data Engineer Associate Exam
https://www.2passeasy.com/dumps/Databricks-Certified-Data-Engineer-Associate/


NEW QUESTION 1
A data engineer has created a new database using the following command: CREATE DATABASE IF NOT EXISTS customer360;
In which of the following locations will the customer360 database be located?

A. dbfs:/user/hive/database/customer360
B. dbfs:/user/hive/warehouse
C. dbfs:/user/hive/customer360
D. More information is needed to determine the correct response

Answer: B

Explanation:
dbfs:/user/hive/warehouse is the default location for databases created without an explicit LOCATION clause.
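As a quick, hedged illustration (not part of the original question), the resolved location can be checked with DESCRIBE DATABASE; the snippet below assumes a workspace using the default Hive metastore settings:

# `spark` is the SparkSession that Databricks notebooks provide automatically.
spark.sql("CREATE DATABASE IF NOT EXISTS customer360")
spark.sql("DESCRIBE DATABASE customer360").show(truncate=False)
# Because no LOCATION clause was given, the Location row is expected to point
# under dbfs:/user/hive/warehouse/ (as a customer360.db directory).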

NEW QUESTION 2
Which of the following approaches should be used to send the Databricks Job owner an email in the case that the Job fails?

A. Manually programming in an alert system in each cell of the Notebook


B. Setting up an Alert in the Job page
C. Setting up an Alert in the Notebook
D. There is no way to notify the Job owner in the case of Job failure
E. MLflow Model Registry Webhooks

Answer: B

Explanation:
https://docs.databricks.com/en/workflows/jobs/job-notifications.html

NEW QUESTION 3
A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table.
The code block used by the data engineer is below:

If the data engineer only wants the query to execute a micro-batch to process data every 5 seconds, which of the following lines of code should the data engineer
use to fill in the blank?

A. trigger("5 seconds")
B. trigger()
C. trigger(once="5 seconds")
D. trigger(processingTime="5 seconds")
E. trigger(continuous="5 seconds")

Answer: D

Explanation:
# ProcessingTime trigger with a two-second micro-batch interval
df.writeStream \
    .format("console") \
    .trigger(processingTime='2 seconds') \
    .start()
https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#triggers

NEW QUESTION 4
A data engineer needs to create a table in Databricks using data from their organization’s existing SQLite database.
They run the following command:

Which of the following lines of code fills in the above blank to successfully complete the task?

A. org.apache.spark.sql.jdbc
B. autoloader
C. DELTA
D. sqlite
E. org.apache.spark.sql.sqlite


Answer: A

Explanation:
CREATE TABLE new_employees_table
USING JDBC
OPTIONS (
  url "<jdbc_url>",
  dbtable "<table_name>",
  user "<username>",
  password "<password>"
) AS
SELECT * FROM employees_table_vw;
https://docs.databricks.com/external-data/jdbc.html#language-sql

NEW QUESTION 5
A data analyst has developed a query that runs against a Delta table. They want help from the data engineering team to implement a series of tests to ensure the
data returned by the query is clean. However, the data engineering team uses Python for its tests rather than SQL.
Which of the following operations could the data engineering team use to run the query and operate with the results in PySpark?

A. SELECT * FROM sales


B. spark.delta.table
C. spark.sql
D. There is no way to share data between PySpark and SQL.
E. spark.table

Answer: C

Explanation:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.sql("SELECT * FROM sales")
print(df.count())

NEW QUESTION 6
A data organization leader is upset about the data analysis team’s reports being different from the data engineering team’s reports. The leader believes the siloed
nature of their organization’s data engineering and data analysis architectures is to blame.
Which of the following describes how a data lakehouse could alleviate this issue?

A. Both teams would autoscale their work as data size evolves


B. Both teams would use the same source of truth for their work
C. Both teams would reorganize to report to the same department
D. Both teams would be able to collaborate on projects in real-time
E. Both teams would respond more quickly to ad-hoc requests

Answer: B

Explanation:
A data lakehouse is designed to unify the data engineering and data analysis architectures by integrating features of both data lakes and data warehouses. One of
the key benefits of a data lakehouse is that it provides a common, centralized data repository (the "lake") that serves as a single source of truth for data storage
and analysis. This allows both data engineering and data analysis teams to work with the same consistent data sets, reducing discrepancies and ensuring that the
reports generated by both teams are based on the same underlying data.

NEW QUESTION 7
A data engineer is attempting to drop a Spark SQL table my_table and runs the following command:
DROP TABLE IF EXISTS my_table;
After running this command, the engineer notices that the data files and metadata files have been deleted from the file system.
Which of the following describes why all of these files were deleted?

A. The table was managed


B. The table's data was smaller than 10 GB
C. The table's data was larger than 10 GB
D. The table was external
E. The table did not have a location

Answer: A

Explanation:
For managed tables, both the data files and the metadata are managed by the metastore, so both are deleted when the table is dropped. For external tables, the data is stored in an external location; dropping an external table removes only the metadata, and the data files remain.
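The following hedged sketch (table and path names are hypothetical, not from the question) shows the difference in drop behavior:

# `spark` is the SparkSession provided in a Databricks notebook.
# Managed table: data and metadata live in the metastore-managed warehouse location.
spark.sql("CREATE TABLE IF NOT EXISTS managed_demo (id INT)")
# External table: data lives at a user-supplied LOCATION.
spark.sql("CREATE TABLE IF NOT EXISTS external_demo (id INT) LOCATION 'dbfs:/tmp/external_demo'")

# Dropping the managed table deletes its data files and metadata.
spark.sql("DROP TABLE IF EXISTS managed_demo")
# Dropping the external table removes only the metastore entry; the files under
# dbfs:/tmp/external_demo remain.
spark.sql("DROP TABLE IF EXISTS external_demo")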

NEW QUESTION 8
Which of the following must be specified when creating a new Delta Live Tables pipeline?

A. A key-value pair configuration


B. The preferred DBU/hour cost
C. A path to cloud storage location for the written data
D. A location of a target database for the written data
E. At least one notebook library to be executed

Answer: E

Explanation:
https://docs.databricks.com/en/delta-live-tables/tutorial-pipelines.html


NEW QUESTION 9
A data engineer is working with two tables. Each of these tables is displayed below in its entirety.

The data engineer runs the following query to join these tables together:

Which of the following will be returned by the above query?

A. Option A
B. Option B
C. Option C
D. Option D
E. Option E

Answer: C

NEW QUESTION 10
A data engineer wants to schedule their Databricks SQL dashboard to refresh every hour, but they only want the associated SQL endpoint to be running when it is
necessary. The dashboard has multiple queries on multiple datasets associated with it. The data that feeds the dashboard is automatically processed using a
Databricks Job.
Which of the following approaches can the data engineer use to minimize the total running time of the SQL endpoint used in the refresh schedule of their
dashboard?

A. They can turn on the Auto Stop feature for the SQL endpoint.


B. They can ensure the dashboard's SQL endpoint is not one of the included queries' SQL endpoints.
C. They can reduce the cluster size of the SQL endpoint.
D. They can ensure the dashboard's SQL endpoint matches each of the queries' SQL endpoints.
E. They can set up the dashboard's SQL endpoint to be serverless.

Answer: A

NEW QUESTION 10
Which of the following data lakehouse features results in improved data quality over a traditional data lake?

A. A data lakehouse provides storage solutions for structured and unstructured data.
B. A data lakehouse supports ACID-compliant transactions.
C. A data lakehouse allows the use of SQL queries to examine data.
D. A data lakehouse stores data in open formats.
E. A data lakehouse enables machine learning and artificial Intelligence workloads.

Answer: B

Explanation:
One of the key features of a data lakehouse that results in improved data quality over a traditional data lake is its support for ACID (Atomicity, Consistency,
Isolation, Durability) transactions. ACID transactions provide data integrity and consistency guarantees, ensuring that operations on the data are reliable and that
data is not left in an inconsistent state due to failures or concurrent access. In a traditional data lake, such transactional guarantees are often lacking, making it
challenging to maintain data quality,
especially in scenarios involving multiple data writes, updates, or complex transformations. A data lakehouse, by offering ACID compliance, helps maintain data
quality by providing strong consistency and reliability, which is crucial for data pipelines and analytics.

NEW QUESTION 13
Which of the following benefits is provided by the array functions from Spark SQL?

A. An ability to work with data in a variety of types at once


B. An ability to work with data within certain partitions and windows
C. An ability to work with time-related data in specified intervals
D. An ability to work with complex, nested data ingested from JSON files
E. An ability to work with an array of tables for procedural automation

Answer: D

Explanation:
Array functions in Spark SQL are primarily used for working with arrays and complex, nested data structures, such as those often encountered when ingesting
JSON files. These functions allow you to manipulate and query nested arrays and structures within your data, making it easier to extract and work with specific
elements or values within complex data formats. While some of the other options (such as option A for working with different data types) are features of Spark SQL
or SQL in general, array functions specifically excel at handling complex, nested data structures like those found in JSON files.
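A minimal sketch of a few array functions applied to nested data (the data and names below are hypothetical, not from the question):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("order-1", [10, 20, 30])], ["order_id", "item_prices"])
df.createOrReplaceTempView("orders")

# size, array_contains, and explode operate directly on the nested array column.
spark.sql("""
    SELECT
        order_id,
        size(item_prices) AS n_items,
        array_contains(item_prices, 20) AS has_20,
        explode(item_prices) AS item_price
    FROM orders
""").show()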

NEW QUESTION 14
Which of the following statements regarding the relationship between Silver tables and Bronze tables is always true?

A. Silver tables contain a less refined, less clean view of data than Bronze data.
B. Silver tables contain aggregates while Bronze data is unaggregated.
C. Silver tables contain more data than Bronze tables.
D. Silver tables contain a more refined and cleaner view of data than Bronze tables.
E. Silver tables contain less data than Bronze tables.

Answer: D

Explanation:
https://www.databricks.com/glossary/medallion-architecture

NEW QUESTION 17
Which of the following tools is used by Auto Loader to process data incrementally?

A. Checkpointing
B. Spark Structured Streaming
C. Data Explorer
D. Unity Catalog
E. Databricks SQL

Answer: B

Explanation:
The Auto Loader process in Databricks is typically used in conjunction with Spark Structured Streaming to process data incrementally. Spark Structured
Streaming is a real-time data processing framework that allows you to process data streams incrementally as new data arrives. The Auto Loader is a feature in
Databricks that works with Structured Streaming to automatically detect and process new data files as they are added to a specified data source location. It allows
for incremental data processing without the need for manual intervention.
How does Auto Loader track ingestion progress? As files are discovered, their metadata is persisted in a scalable key-value store (RocksDB) in the checkpoint location of your Auto Loader pipeline. This key-value store ensures that data is processed exactly once. In case of failures, Auto Loader can resume from where it left off using the information stored in the checkpoint location and continue to provide exactly-once guarantees when writing data into Delta Lake. You don't need to maintain or manage any state yourself to achieve fault tolerance or exactly-once semantics.
https://docs.databricks.com/ingestion/auto-loader/index.html
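A hedged sketch of Auto Loader driven by Structured Streaming (the paths, file format, and table name below are hypothetical placeholders):

# `spark` is the SparkSession provided in a Databricks notebook.
df = (spark.readStream
      .format("cloudFiles")                                    # Auto Loader source
      .option("cloudFiles.format", "json")
      .option("cloudFiles.schemaLocation", "dbfs:/tmp/autoloader/schema")
      .load("dbfs:/tmp/autoloader/input"))

(df.writeStream
   .option("checkpointLocation", "dbfs:/tmp/autoloader/checkpoint")  # RocksDB state lives here
   .toTable("bronze_events"))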


NEW QUESTION 19
In order for Structured Streaming to reliably track the exact progress of the processing so that it can handle any kind of failure by restarting and/or reprocessing, which pair of approaches does Spark use to record the offset range of the data being processed in each trigger?

A. Checkpointing and Write-ahead Logs


B. Structured Streaming cannot record the offset range of the data being processed in each trigger.
C. Replayable Sources and Idempotent Sinks
D. Write-ahead Logs and Idempotent Sinks
E. Checkpointing and Idempotent Sinks

Answer: A

Explanation:
The engine uses checkpointing and write-ahead logs to record the offset range of the data being processed in each trigger (search for "The engine uses" in the guide below).
https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html
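A hedged sketch showing where checkpointing plugs into a streaming write (table and path names are hypothetical):

# The checkpoint location is where Structured Streaming persists its write-ahead log
# and offsets, so a restarted query can resume exactly where it left off.
(spark.readStream.table("bronze_events")
      .writeStream
      .option("checkpointLocation", "dbfs:/tmp/checkpoints/silver_events")
      .outputMode("append")
      .toTable("silver_events"))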

NEW QUESTION 23
A data engineer has been using a Databricks SQL dashboard to monitor the cleanliness of the input data to a data analytics dashboard for a retail use case. The
job has a Databricks SQL query that returns the number of store-level records where sales is equal to zero. The data engineer wants their entire team to be
notified via a messaging webhook whenever this value is greater than 0.
Which of the following approaches can the data engineer use to notify their entire team via a messaging webhook whenever the number of stores with $0 in sales
is greater than zero?

A. They can set up an Alert with a custom template.


B. They can set up an Alert with a new email alert destination.
C. They can set up an Alert with one-time notifications.
D. They can set up an Alert with a new webhook alert destination.
E. They can set up an Alert without notifications.

Answer: D

NEW QUESTION 27
Which of the following is hosted completely in the control plane of the classic Databricks architecture?

A. Worker node
B. JDBC data source
C. Databricks web application
D. Databricks Filesystem
E. Driver node

Answer: C

Explanation:
In the classic Databricks architecture, the control plane includes components like the Databricks web application, the Databricks REST API, and the Databricks
Workspace. These components are responsible for managing and controlling the Databricks environment, including cluster provisioning, notebook management,
access control, and job scheduling. The other options, such as worker nodes, JDBC data sources, Databricks Filesystem (DBFS), and driver nodes, are typically
part of the data plane or the execution environment, which is separate from the control plane. Worker nodes are responsible for executing tasks and computations,
JDBC data sources are used to connect to external databases, DBFS is a distributed file system for data storage, and driver nodes are responsible for coordinating
the execution of Spark jobs.

NEW QUESTION 32
A data engineer has three tables in a Delta Live Tables (DLT) pipeline. They have configured the pipeline to drop invalid records at each table. They notice that
some data is being dropped due to quality concerns at some point in the DLT pipeline. They would like to determine at which table in their pipeline the data is
being dropped.
Which of the following approaches can the data engineer take to identify the table that is dropping the records?

A. They can set up separate expectations for each table when developing their DLT pipeline.
B. They cannot determine which table is dropping the records.
C. They can set up DLT to notify them via email when records are dropped.
D. They can navigate to the DLT pipeline page, click on each table, and view the data quality statistics.
E. They can navigate to the DLT pipeline page, click on the “Error” button, and review the present errors.

Answer: D

Explanation:
To identify the table in a Delta Live Tables (DLT) pipeline where data is being dropped due to quality concerns, the data engineer can navigate to the DLT pipeline
page, click on each table in the pipeline, and view the data quality statistics. These statistics often include information about records dropped, violations of
expectations, and other data quality metrics. By examining the data quality statistics for each table in the pipeline, the data engineer can determine at which table
the data is being dropped.
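A hedged sketch of a per-table expectation in the DLT Python API (table, expectation, and column names are hypothetical); records that violate the expectation are dropped and show up in that table's data quality statistics:

import dlt
from pyspark.sql import functions as F

@dlt.table
@dlt.expect_or_drop("valid_id", "id IS NOT NULL")
def silver_events():
    # Read the upstream DLT dataset and stamp each record with an ingestion time.
    return dlt.read("bronze_events").withColumn("ingested_at", F.current_timestamp())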

NEW QUESTION 36
Which of the following commands will return the number of null values in the member_id column?

A. SELECT count(member_id) FROM my_table;


B. SELECT count(member_id) - count_null(member_id) FROM my_table;
C. SELECT count_if(member_id IS NULL) FROM my_table;
D. SELECT null(member_id) FROM my_table;
E. SELECT count_null(member_id) FROM my_table;


Answer: C

Explanation:
https://docs.databricks.com/en/sql/language-manual/functions/count.html
Returns
A BIGINT.
If * is specified, rows containing NULL values are also counted.
If expr is specified, only rows for which all expr are not NULL are counted. If DISTINCT is specified, duplicate rows are not counted.
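A hedged illustration of two equivalent ways to count the NULLs (the table name comes from the question; the data is assumed to exist):

# `spark` is the SparkSession provided in a Databricks notebook.
spark.sql("SELECT count_if(member_id IS NULL) AS null_members FROM my_table").show()
# Equivalent without count_if: count(*) includes NULLs, count(member_id) does not.
spark.sql("SELECT count(*) - count(member_id) AS null_members FROM my_table").show()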

NEW QUESTION 41
A data engineer wants to schedule their Databricks SQL dashboard to refresh once per day, but they only want the associated SQL endpoint to be running when it
is necessary.
Which of the following approaches can the data engineer use to minimize the total running time of the SQL endpoint used in the refresh schedule of their
dashboard?

A. They can ensure the dashboard’s SQL endpoint matches each of the queries’ SQL endpoints.
B. They can set up the dashboard’s SQL endpoint to be serverless.
C. They can turn on the Auto Stop feature for the SQL endpoint.
D. They can reduce the cluster size of the SQL endpoint.
E. They can ensure the dashboard's SQL endpoint is not one of the included queries' SQL endpoints.

Answer: C

NEW QUESTION 46
Which of the following describes the relationship between Gold tables and Silver tables?

A. Gold tables are more likely to contain aggregations than Silver tables.
B. Gold tables are more likely to contain valuable data than Silver tables.
C. Gold tables are more likely to contain a less refined view of data than Silver tables.
D. Gold tables are more likely to contain more data than Silver tables.
E. Gold tables are more likely to contain truthful data than Silver tables.

Answer: A

Explanation:
In some data processing pipelines, especially those following a typical "Bronze-Silver-Gold" data lakehouse architecture, Silver tables are often considered a more
refined version of the raw or Bronze data. Silver tables may include data cleansing, schema enforcement, and some initial transformations. Gold tables, on the
other hand, typically represent a stage where data is further enriched, aggregated, and processed to provide valuable insights for analytical purposes. This could
indeed involve more aggregations compared to Silver tables.

NEW QUESTION 47
A data engineer has a Python notebook in Databricks, but they need to use SQL to accomplish a specific task within a cell. They still want all of the other cells to
use Python without making any changes to those cells.
Which of the following describes how the data engineer can use SQL within a cell of their Python notebook?

A. It is not possible to use SQL in a Python notebook


B. They can attach the cell to a SQL endpoint rather than a Databricks cluster
C. They can simply write SQL syntax in the cell
D. They can add %sql to the first line of the cell
E. They can change the default language of the notebook to SQL

Answer: D
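For illustration, such a cell starts with the %sql magic command on its first line; the query below is a hypothetical placeholder, not from the original question:

%sql
SELECT current_date() AS today, count(*) AS n_rows FROM my_table

All other cells continue to run in the notebook's default language, Python.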

NEW QUESTION 48
A data engineer has been using a Databricks SQL dashboard to monitor the cleanliness of the input data to an ELT job. The ELT job has its Databricks SQL query
that returns the number of input records containing unexpected NULL values. The data engineer wants their entire team to be notified via a messaging webhook
whenever this value reaches 100.
Which of the following approaches can the data engineer use to notify their entire team via a messaging webhook whenever the number of NULL values reaches
100?

A. They can set up an Alert with a custom template.


B. They can set up an Alert with a new email alert destination.
C. They can set up an Alert with a new webhook alert destination.
D. They can set up an Alert with one-time notifications.
E. They can set up an Alert without notifications.

Answer: C

Explanation:
To achieve this, the data engineer can set up an Alert in the Databricks workspace that triggers when the query results exceed the threshold of 100 NULL values.
They can create a new webhook alert destination in the Alert's configuration settings and provide the necessary messaging webhook URL to receive notifications.
When the Alert is triggered, it will send a message to the configured webhook URL, which will then notify the entire team of the issue.

NEW QUESTION 51
A data engineer has realized that they made a mistake when making a daily update to a table. They need to use Delta time travel to restore the table to a version
that is 3 days old. However, when the data engineer attempts to time travel to the older version, they are unable to restore the data because the data files have
been deleted.
Which of the following explains why the data files are no longer present?


A. The VACUUM command was run on the table


B. The TIME TRAVEL command was run on the table
C. The DELETE HISTORY command was run on the table
D. The OPTIMIZE command was run on the table
E. The HISTORY command was run on the table

Answer: A

Explanation:
The VACUUM command in Delta Lake is used to clean up and remove unnecessary data files that are no longer needed for time travel or query purposes. When
you run VACUUM with certain retention settings, it can delete older data files, which might include versions of data that are older than the specified retention period.
If the data engineer is unable to restore the table to a version that is 3 days old because the data files have been deleted, it's likely because the VACUUM
command was run on the table, removing the older data files as part of data cleanup.
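A hedged sketch of the commands involved (the table name is hypothetical, and the retention settings shown are assumptions):

# `spark` is the SparkSession provided in a Databricks notebook.
spark.sql("DESCRIBE HISTORY my_delta_table").show()   # inspect available table versions
spark.sql("VACUUM my_delta_table RETAIN 168 HOURS")   # default retention is 7 days
# Restoring a 3-day-old version only works if its data files survived the last VACUUM:
spark.sql("RESTORE TABLE my_delta_table TO TIMESTAMP AS OF date_sub(current_date(), 3)")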

NEW QUESTION 56
A data engineer has a Python variable table_name that they would like to use in a SQL query. They want to construct a Python code block that will run the query
using table_name.
They have the following incomplete code block:
(f"SELECT customer_id, spend FROM {table_name}")
Which of the following can be used to fill in the blank to successfully complete the task?

A. spark.delta.sql
B. spark.delta.table
C. spark.table
D. dbutils.sql
E. spark.sql

Answer: E
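A hedged completion of the pattern from the question (the table name value is a hypothetical placeholder):

# `spark` is the SparkSession provided in a Databricks notebook.
table_name = "customer_spend"
df = spark.sql(f"SELECT customer_id, spend FROM {table_name}")
df.show()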

NEW QUESTION 58
An engineering manager wants to monitor the performance of a recent project using a Databricks SQL query. For the first week following the project’s release, the
manager wants the query results to be updated every minute. However, the manager is concerned that the compute resources used for the query will be left
running and cost the organization a lot of money beyond the first week of the project’s release.
Which of the following approaches can the engineering team use to ensure the query does not cost the organization any money beyond the first week of the
project’s release?

A. They can set a limit to the number of DBUs that are consumed by the SQL Endpoint.
B. They can set the query’s refresh schedule to end after a certain number of refreshes.
C. They cannot ensure the query does not cost the organization money beyond the first week of the project’s release.
D. They can set a limit to the number of individuals that are able to manage the query’s refresh schedule.
E. They can set the query’s refresh schedule to end on a certain date in the query scheduler.

Answer: E

Explanation:
If a dashboard is configured for automatic updates, it has a Scheduled button at the top, rather than a Schedule button. To stop automatically updating the
dashboard and remove its subscriptions:
Click Scheduled.
In the Refresh every drop-down, select Never.
Click Save. The Scheduled button label changes to Schedule.
Source: https://learn.microsoft.com/en-us/azure/databricks/sql/user/dashboards/

NEW QUESTION 60
A data engineer needs to apply custom logic to string column city in table stores for a specific use case. In order to apply this custom logic at scale, the data
engineer wants to create a SQL user-defined function (UDF).
Which of the following code blocks creates this SQL UDF?
[Options A through E are code-block images in the original document and are not reproduced in this text version.]



Answer: A

Explanation:
https://www.databricks.com/blog/2021/10/20/introducing-sql-user-defined-functions.html
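A hedged sketch of a SQL UDF applied to the city column from the question (the function name and body are hypothetical examples of "custom logic"):

# `spark` is the SparkSession provided in a Databricks notebook.
spark.sql("""
    CREATE OR REPLACE FUNCTION standardize_city(city STRING)
    RETURNS STRING
    RETURN initcap(trim(city))
""")
spark.sql("SELECT city, standardize_city(city) AS city_clean FROM stores").show()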

NEW QUESTION 65
A Delta Live Table pipeline includes two datasets defined using STREAMING LIVE TABLE. Three datasets are defined against Delta Lake table sources using
LIVE TABLE.
The pipeline is configured to run in Production mode using the Continuous Pipeline Mode. Assuming previously unprocessed data exists and all definitions are valid, what is the expected outcome after clicking Start to update the pipeline?

A. All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will persist to allow for additional testing.
B. All datasets will be updated once and the pipeline will persist without any processing. The compute resources will persist but go unused.
C. All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will be deployed for the update and terminated when the pipeline is stopped.
D. All datasets will be updated once and the pipeline will shut down. The compute resources will be terminated.
E. All datasets will be updated once and the pipeline will shut down. The compute resources will persist to allow for additional testing.

Answer: C

Explanation:
In a Delta Live Table pipeline running in Continuous Pipeline Mode, when you click Start to update the pipeline, the following outcome is expected: All datasets
defined using STREAMING LIVE TABLE and LIVE TABLE against Delta Lake table sources will be updated at set intervals. The compute resources will be
deployed for the update process and will be active during the execution of the pipeline. The compute resources will be terminated when the pipeline is stopped or
shut down. This mode allows for continuous and periodic updates to the datasets as new data arrives or changes in the
underlying Delta Lake tables occur. The compute resources are provisioned and utilized during the update intervals to process the data and perform the necessary
operations.

NEW QUESTION 70
A data engineer has left the organization. The data team needs to transfer ownership of the data engineer’s Delta tables to a new data engineer. The new data
engineer is the lead engineer on the data team.
Assuming the original data engineer no longer has access, which of the following individuals must be the one to transfer ownership of the Delta tables in Data
Explorer?

A. Databricks account representative


B. This transfer is not possible
C. Workspace administrator
D. New lead data engineer
E. Original data engineer

Answer: C

Explanation:
https://docs.databricks.com/sql/admin/transfer-ownership.html
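As a hedged illustration, one way a workspace administrator can reassign table ownership is with ALTER TABLE ... OWNER TO; the table name and principal below are hypothetical placeholders, and the same change can also be made through the Data Explorer UI:

# `spark` is the SparkSession provided in a Databricks notebook.
spark.sql("ALTER TABLE sales_delta OWNER TO `new.lead@example.com`")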

NEW QUESTION 74
A data engineer and data analyst are working together on a data pipeline. The data engineer is working on the raw, bronze, and silver layers of the pipeline using
Python, and the data analyst is working on the gold layer of the pipeline using SQL. The raw source of the pipeline is a streaming input. They now want to migrate
their pipeline to use Delta Live Tables.
Which of the following changes will need to be made to the pipeline when migrating to Delta Live Tables?

A. None of these changes will need to be made


B. The pipeline will need to stop using the medallion-based multi-hop architecture
C. The pipeline will need to be written entirely in SQL
D. The pipeline will need to use a batch source in place of a streaming source
E. The pipeline will need to be written entirely in Python

Answer: A

NEW QUESTION 78
A data engineer who is new to Python needs to create a Python function that adds two integers together and returns the sum.
Which of the following code blocks can the data engineer use to complete this task?
[Options A) through E) are code-block images in the original document and are not reproduced in this text version.]

A. Option A
B. Option B
C. Option C
D. Option D
E. Option E

Answer: D

Explanation:
https://www.w3schools.com/python/python_functions.asp
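A minimal version of the kind of function the correct option defines (the function name is a placeholder):

def add_integers(a: int, b: int) -> int:
    # Return the sum of the two integer arguments.
    return a + b

print(add_integers(2, 3))  # prints 5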

NEW QUESTION 82
A data engineer wants to create a data entity from a couple of tables. The data entity must be used by other data engineers in other sessions. It also must be
saved to a physical location.
Which of the following data entities should the data engineer create?

A. Database
B. Function
C. View
D. Temporary view
E. Table

Answer: E

Explanation:
In the context described, creating a "Table" is the most suitable choice. Tables in SQL are data entities that exist independently of any session and are saved in a
physical location. They can be accessed and manipulated by other data engineers in different sessions, which aligns with the requirements stated. A "Database" is
a collection of tables, views, and other database objects. A "Function" is a stored procedure that performs an operation. A "View" is a virtual table based on the
result-set of an SQL statement, but it is not stored physically. A "Temporary view" is a feature that allows you to store the result of a query as a view that
disappears once your session with the database is closed.

NEW QUESTION 87
Which of the following is stored in the Databricks customer's cloud account?

A. Databricks web application


B. Cluster management metadata
C. Repos
D. Data
E. Notebooks

Answer: D

NEW QUESTION 88
A data engineer has a Job with multiple tasks that runs nightly. Each of the tasks runs slowly because the clusters take a long time to start.
Which of the following actions can the data engineer perform to improve the start up time for the clusters used for the Job?

A. They can use endpoints available in Databricks SQL


B. They can use jobs clusters instead of all-purpose clusters
C. They can configure the clusters to be single-node
D. They can use clusters that are from a cluster pool
E. They can configure the clusters to autoscale for larger data sizes

Answer: D


Explanation:
Cluster pools pre-provision idle, ready-to-use instances, so clusters created from a pool do not have to acquire instances from the cloud provider from scratch and therefore start much faster. All-purpose and jobs clusters that are not backed by a pool must provision instances at start time, so they take longer to start; jobs clusters are well suited (and cheaper) for scheduled workloads, but on their own they do not shorten start-up time. Single-node clusters are the smallest option but may not be powerful enough to run the Job's tasks, and autoscaling only adjusts capacity after the cluster is already running, so it does not improve start-up time either.
https://docs.databricks.com/en/clusters/pool-best-practices.html

NEW QUESTION 92
Which of the following describes a benefit of creating an external table from Parquet rather than CSV when using a CREATE TABLE AS SELECT statement?

A. Parquet files can be partitioned


B. CREATE TABLE AS SELECT statements cannot be used on files
C. Parquet files have a well-defined schema
D. Parquet files have the ability to be optimized
E. Parquet files will become Delta tables

Answer: C

Explanation:
https://www.databricks.com/glossary/what-is-parquet
Columnar storage like Apache Parquet is designed to bring efficiency compared to row-based files like CSV. When querying columnar storage, you can skip over the non-relevant data very quickly. As a result, aggregation queries are less time-consuming compared to row-oriented databases.

NEW QUESTION 93
A data engineer is designing a data pipeline. The source system generates files in a shared directory that is also used by other processes. As a result, the files
should be kept as is and will accumulate in the directory. The data engineer needs to identify which files are new since the previous run in the pipeline, and set up
the pipeline to only ingest those new files with each run.
Which of the following tools can the data engineer use to solve this problem?

A. Unity Catalog
B. Delta Lake
C. Databricks SQL
D. Data Explorer
E. Auto Loader

Answer: E

Explanation:
Auto Loader incrementally and efficiently processes new data files as they arrive in cloud storage without any additional setup.
https://docs.databricks.com/en/ingestion/auto-loader/index.html

NEW QUESTION 94
A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table.
The code block used by the data engineer is below:

If the data engineer only wants the query to process all of the available data in as many batches as required, which of the following lines of code should the data
engineer use to fill in the blank?

A. processingTime(1)
B. trigger(availableNow=True)
C. trigger(parallelBatch=True)
D. trigger(processingTime="once")
E. trigger(continuous="once")

Answer: B

Explanation:
https://stackoverflow.com/questions/71061809/trigger-availablenow-for-delta-source-streaming-queries-in-pyspark-databricks
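A hedged sketch of an availableNow trigger (table and path names are hypothetical): it processes everything currently available, in as many micro-batches as required, and then stops.

# `spark` is the SparkSession provided in a Databricks notebook.
(spark.readStream.table("bronze_events")
      .writeStream
      .option("checkpointLocation", "dbfs:/tmp/checkpoints/backfill")
      .trigger(availableNow=True)
      .toTable("silver_events"))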

NEW QUESTION 97
......


THANKS FOR TRYING THE DEMO OF OUR PRODUCT

Visit Our Site to Purchase the Full Set of Actual Databricks-Certified-Data-Engineer-Associate Exam
Questions With Answers.

We Also Provide Practice Exam Software That Simulates the Real Exam Environment And Has Many Self-Assessment Features. Order the
Databricks-Certified-Data-Engineer-Associate Product From:

https://www.2passeasy.com/dumps/Databricks-Certified-Data-Engineer-Associate/

Money Back Guarantee

Databricks-Certified-Data-Engineer-Associate Practice Exam Features:

* Databricks-Certified-Data-Engineer-Associate Questions and Answers Updated Frequently

* Databricks-Certified-Data-Engineer-Associate Practice Questions Verified by Expert Senior Certified Staff

* Databricks-Certified-Data-Engineer-Associate Most Realistic Questions that Guarantee you a Pass on Your First Try

* Databricks-Certified-Data-Engineer-Associate Practice Test Questions in Multiple Choice Formats and Updates for 1 Year
