What Is Data and Why Data Is A Very Important Asset
Data is a collection of facts, such as numbers, descriptions, and observations, that is used in decision making. In a competitive market, data is a valuable asset: when analyzed properly, it can be turned into a wealth of useful information that informs critical business decisions.
12. What tasks are involved in an analytical system?
Data Ingestion: Data ingestion is the process of capturing the raw data. This data could come from control devices measuring environmental information such as temperature and pressure, point-of-sale devices recording the items a customer purchases in a supermarket, financial systems recording the movement of money between bank accounts, or weather stations. Some of this data might come from a separate OLTP system. To process and analyze this data, you must first store it in a repository of some sort, such as a file store, a document database, or even a relational database.
Data Transformation/Data Processing: The raw data might not be in a format that is suitable for querying. It might contain anomalies that should be filtered out, or it may need to be transformed in some way; for example, dates or addresses might need to be converted into a standard format. After data is ingested into a repository, you may want to clean it by removing questionable or invalid data, or perform aggregations such as calculating profit, margin, and other Key Performance Indicators (KPIs). KPIs are the metrics by which businesses measure growth and performance.
Data Querying: After data is ingested and transformed, you can query it to analyze it. You may be looking for trends, or trying to determine the cause of problems in your systems. Many database management systems provide tools that let you run ad-hoc queries against your data and generate regular reports.
Data Visualization: Data represented as rows and columns in tables, or as documents, isn't always intuitive. Visualizing the data is often a useful way to examine it. You can generate bar charts, line charts, and pie charts, plot results on geographical maps, or illustrate how data changes over time. Microsoft offers visualization tools such as Power BI that provide rich graphical representations of your data.
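The four tasks above can be sketched end to end in a few lines of Python. This is a minimal illustration, not an Azure service: the sample records, the two accepted date formats (including the day-first assumption), and the helper names parse_date, transform, and profit_margin are all hypothetical. It ingests raw point-of-sale rows, standardizes the dates, drops invalid rows, and computes one KPI (profit margin).

```python
from datetime import datetime

# Hypothetical raw point-of-sale records, as they might arrive from different devices.
RAW_RECORDS = [
    {"date": "2023-03-01", "revenue": "120.00", "cost": "80.00"},
    {"date": "01/03/2023", "revenue": "200.00", "cost": "150.00"},
    {"date": "not a date", "revenue": "50.00", "cost": "10.00"},  # invalid date: dropped
    {"date": "2023-03-02", "revenue": "", "cost": "30.00"},        # missing revenue: dropped
]

# Assumed source formats: ISO dates and day-first dates. Anything else is invalid.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_date(text):
    """Convert a date string into standard ISO format, or return None if unparseable."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def transform(records):
    """Clean the raw records: standardize dates, convert numbers, drop invalid rows."""
    clean = []
    for rec in records:
        day = parse_date(rec["date"])
        if day is None or not rec["revenue"] or not rec["cost"]:
            continue
        clean.append({"date": day, "revenue": float(rec["revenue"]), "cost": float(rec["cost"])})
    return clean


def profit_margin(records):
    """Aggregate a simple KPI: total profit as a fraction of total revenue."""
    revenue = sum(r["revenue"] for r in records)
    profit = sum(r["revenue"] - r["cost"] for r in records)
    return profit / revenue


clean = transform(RAW_RECORDS)  # ingestion + transformation
print(clean)                    # querying the cleaned data
print(f"Profit margin: {profit_margin(clean):.1%}")
```

In a real system the ingested data would land in a repository such as a file store or database, and the querying and visualization steps would run against that store, for example from Power BI.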
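14. You have a lot of customer data and you have decided to store this
data in a relational database. What is the first thing you should do?
Normalize the data: split it into related tables so that each fact is stored only once, which reduces duplication and the risk of inconsistent updates.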
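A minimal sketch of normalization, using Python's built-in sqlite3 module as a stand-in for any relational database; the table names, column names, and sample rows are hypothetical. Customer details that were repeated on every order row are moved into their own table and stored once, and orders refer to them by key.

```python
import sqlite3

# Hypothetical denormalized rows: customer details are repeated on every order.
FLAT_ORDERS = [
    ("C1", "Ana", "Seattle", "O1", 25.0),
    ("C1", "Ana", "Seattle", "O2", 40.0),
    ("C2", "Ben", "Dublin", "O3", 15.0),
]

conn = sqlite3.connect(":memory:")
# Normalized design: each customer fact is stored once; orders reference it by key.
conn.executescript("""
    CREATE TABLE customers (
        customer_id TEXT PRIMARY KEY,
        name        TEXT NOT NULL,
        city        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    TEXT PRIMARY KEY,
        customer_id TEXT NOT NULL REFERENCES customers(customer_id),
        amount      REAL NOT NULL
    );
""")

for customer_id, name, city, order_id, amount in FLAT_ORDERS:
    # INSERT OR IGNORE keeps only the first copy of each repeated customer.
    conn.execute("INSERT OR IGNORE INTO customers VALUES (?, ?, ?)", (customer_id, name, city))
    conn.execute("INSERT INTO orders VALUES (?, ?, ?)", (order_id, customer_id, amount))

customer_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(customer_count, order_count)
```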
23. What are the other differences between streaming and batch
processing of data?
Data Scope: Batch processing can process all the data in the dataset. Stream processing typically only has access to the most recent data received, or to data within a rolling time window (the last 30 seconds, for example).
Data Size: Batch processing is suitable for handling large datasets efficiently. Stream processing is intended for individual records or micro-batches consisting of a few records.
Performance: Latency is the time taken for data to be received and processed. The latency for batch processing is typically a few hours. Stream processing typically occurs immediately, with latency on the order of seconds or milliseconds.
Analysis: You typically use batch processing for performing complex analytics. Stream processing is used for simple response functions, aggregates, or calculations such as rolling averages.
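The difference in data scope can be sketched in Python: the batch function sees the whole dataset at once, while the streaming class keeps only the readings from the last window_seconds and updates a rolling average as each record arrives. The class name, the sample readings, and the 30-second window are assumptions for illustration; timestamps are plain numbers of seconds.

```python
from collections import deque


def batch_average(readings):
    """Batch: the whole dataset is available, so aggregate over all of it."""
    values = [value for _, value in readings]
    return sum(values) / len(values)


class RollingAverage:
    """Stream: keep only the readings inside a rolling time window."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.window = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0.0

    def add(self, timestamp, value):
        """Add one record as it arrives and return the current window average."""
        self.window.append((timestamp, value))
        self.total += value
        # Evict readings that fell out of the window relative to the newest one.
        while self.window[0][0] <= timestamp - self.window_seconds:
            _, old_value = self.window.popleft()
            self.total -= old_value
        return self.total / len(self.window)


readings = [(0, 20.0), (10, 22.0), (20, 24.0), (40, 30.0)]
stream = RollingAverage(window_seconds=30)
latest = [stream.add(ts, value) for ts, value in readings]
print(batch_average(readings), latest)
```

The batch result covers all four readings, whereas the final streaming value only reflects the readings at 20 and 40 seconds, because the older ones have left the 30-second window.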
32. What are some of the common tools that a data engineer uses?
The sqlcmd utility, Azure Databricks, and Azure HDInsight, among others.
34. What are some of the common tools that a data analyst uses?
Power BI
41. What are the most common use cases of relational databases?
Examples of OLTP applications that use relational databases are banking solutions, online retail
applications, flight reservation systems, and many online purchasing applications.
44. You have a table that is read frequently and is rarely updated or inserted into.
How do you increase the performance of the queries?
Create an index on the columns that the queries filter on.
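A minimal sketch with Python's sqlite3 module (standing in for any relational database; the table, column, and index names are hypothetical) shows the query planner using an index for a filtered query. EXPLAIN QUERY PLAN is SQLite-specific, and the exact wording of its output varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [(f"customer{i}", "Seattle" if i % 2 else "Dublin") for i in range(1000)],
)

# An index on the column used in WHERE lets the engine seek instead of scanning.
conn.execute("CREATE INDEX idx_customers_city ON customers (city)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM customers WHERE city = ?", ("Dublin",)
).fetchall()
# The last column of each plan row is a human-readable description of the step.
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)
```

Indexes speed up reads but add work to every insert and update, which is why they suit tables that are read often and modified rarely.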
57. You are building a system that monitors the temperature throughout
a set of office blocks and sets the air conditioning in each room in each
block to maintain a pleasant ambient temperature. Your system has to
manage the air conditioning in several thousand buildings spread across
the country or region, and each building typically contains at least 100
air-conditioned rooms. What type of NoSQL datastore is most
appropriate for capturing the temperature data to enable it to be
processed quickly?
A key-value store
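A key-value store suits this workload because each reading is written and looked up by a single key, with no joins or scans. The in-memory sketch below only illustrates that access pattern and is not a real datastore; the TemperatureStore class and the building/room key format are hypothetical.

```python
from collections import defaultdict


class TemperatureStore:
    """Toy in-memory key-value store: key -> list of (timestamp, celsius)."""

    def __init__(self):
        self._data = defaultdict(list)

    @staticmethod
    def make_key(building_id, room_id):
        # Composite key: a lookup goes straight to one room's readings.
        return f"{building_id}/{room_id}"

    def put(self, building_id, room_id, timestamp, celsius):
        # Assumes readings for a room arrive in timestamp order.
        self._data[self.make_key(building_id, room_id)].append((timestamp, celsius))

    def latest(self, building_id, room_id):
        readings = self._data.get(self.make_key(building_id, room_id))
        return readings[-1] if readings else None


store = TemperatureStore()
store.put("B17", "R204", 1000, 21.5)
store.put("B17", "R204", 1060, 22.0)
print(store.latest("B17", "R204"))
```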
58. What Is Data Wrangling?
Wrangling is the process by which you transform and map raw data into a more useful format for
analysis. It can involve writing code to capture, filter, clean, combine, and aggregate data from
many sources.
66. Of IaaS, PaaS, and SaaS, which category do Azure data services fall into?
Azure Data Services fall into the PaaS category. These services are a series of DBMSs managed
by Microsoft in the cloud. Each data service takes care of the configuration, day-to-day
management, software updates, and security of the databases that it hosts.
68. Using Azure Data Services reduces the amount of time that you need
to invest to administer a DBMS. Is this true?
True
72. How do you get more control than Azure Data Services allow?
You can install your database management system on a virtual machine that runs in Azure.
75. In which situations should you run Microsoft SQL Server on virtual machines?
This approach is optimized for migrating existing applications to Azure, or for extending existing on-premises applications to the cloud in hybrid deployments. A hybrid deployment is a system where part of the operation runs on-premises and part runs in the cloud. Your database might be part of a larger system that runs on-premises, while the database elements are hosted in the cloud.
77. What are the options available when selecting Azure SQL Database?
Azure SQL Database is a PaaS offering from Microsoft. It is available with several options: Single Database, Elastic Pool, and Managed Instance.
Single Database
This option enables you to quickly set up and run a single SQL Server database. You create and run a database server in the cloud, and you access your database through this server. Microsoft manages the server, so all you have to do is configure the database, create your tables, and populate them with your data. You can scale the database if you need additional storage space, memory, or processing power.
Elastic Pool
This option is similar to Single Database, except that by default multiple databases can share the same resources, such as memory, data storage space, and processing power. The resources are referred to as a pool. You create the pool, and only your databases can use it. This model is useful if you have databases with resource requirements that vary over time, and it can help you reduce costs.
Managed Instance
Managed Instance effectively runs a fully controllable instance of SQL Server in the cloud. You can install multiple databases on the same instance, and you have complete control over it, much as you would for an on-premises server.
78. What are the use cases of Azure SQL Database?
Modern cloud applications that need to use the latest stable SQL Server features.
Applications that require high availability.
Systems with a variable load that need the database server to scale up and down quickly.
79. What are the features of Azure SQL Database?
* Azure SQL Database automatically updates and patches the SQL Server software to ensure that you are always running the latest and most secure version of the service.
* The scalability features of Azure SQL Database ensure that you can increase the resources available to store and process data without having to perform a costly manual upgrade.
* The service provides high availability guarantees, to ensure that your databases are available at least 99.99% of the time.
* Azure SQL Database supports point-in-time restore, enabling you to recover a database to the state it was in at any point within the backup retention period.
* Databases can be replicated to different regions to provide additional assurance and disaster recovery.
* Advanced threat protection provides advanced security capabilities, such as vulnerability assessments, to help detect and remediate potential security problems with your databases.
* Advanced threat protection also continuously monitors your database for suspicious activities, and provides immediate security alerts on potential vulnerabilities, SQL injection attacks, and anomalous database access patterns.
* SQL Database helps secure your data by providing encryption. For data in motion, it uses Transport Layer Security (TLS). For data at rest, it uses Transparent Data Encryption (TDE).
80. What is the use case for Azure SQL Managed Instance?
Consider Azure SQL Database managed instance if you want to lift-and-shift an on-premises SQL
Server instance and all its databases to the cloud, without incurring the management overhead of
running SQL Server on a virtual machine.
82. PostgreSQL has its own query language called pgsql. Is this true?
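True, with a caveat: the language is usually called PL/pgSQL. It is PostgreSQL's procedural extension of SQL, used to write functions and stored procedures that run inside the database; ordinary queries are written in SQL.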
83. What are the deployment options for Azure Database for
PostgreSQL?
Azure Database for PostgreSQL Single Server
The single-server deployment option for PostgreSQL provides benefits similar to those of Azure Database for MySQL. You choose from three pricing tiers: Basic, General Purpose, and Memory Optimized. Each tier supports different numbers of CPUs, amounts of memory, and storage sizes; you select one based on the load you expect to support.
Azure Database for PostgreSQL Hyperscale (Citus)
Hyperscale (Citus) is a deployment option that scales queries across multiple server nodes to support large database loads. Your database is split across nodes: data is divided into chunks based on the value of a partition key, or sharding key. Consider using this deployment option for the largest PostgreSQL deployments in the Azure cloud.
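Splitting data by a sharding key can be sketched in Python. This is a generic illustration, not how Hyperscale (Citus) is implemented: Citus uses its own hash function and shard placement, while this sketch uses SHA-256, an assumed count of 32 shards, and hypothetical node names. The property it shows is that every row with the same key value always lands on the same node.

```python
import hashlib

NODES = ["worker-0", "worker-1", "worker-2"]  # hypothetical node names


def shard_for(key, shard_count=32):
    """Map a sharding-key value to a shard number with a stable hash."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def node_for(key, shard_count=32):
    """Place shards on nodes round-robin, so equal keys always share a node."""
    return NODES[shard_for(key, shard_count) % len(NODES)]


placement = {customer_id: node_for(customer_id) for customer_id in range(10)}
print(placement)
```

Because rows with the same key are co-located, queries that filter or join on that key can be answered by a single node, while queries across many keys run in parallel across nodes.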
84. Scaling up or out will take effect without restarting the SQL database.
Is this true?
True
85. What is the best way to transfer the data in a PostgreSQL database
running on-premises into a database running Azure Database for
PostgreSQL service?
Use the Azure Database Migration Service.
86. When using an Azure SQL Database managed instance, what is the
simplest way to implement backups?
Backups are automatically handled
87. When SQL Server runs on a virtual machine, you're responsible for all software installation and maintenance, and for performing backups. Is this true?
True
90. Consider the following SQL statement. What are the table name and column name?
SELECT *
FROM customers
WHERE username = 'contoso'
Table Name: customers
Column Name: username
91. Consider the following SQL statement. What are the table name and column name?
98. Which clause should you use with the SELECT, UPDATE, and
DELETE statements to apply changes only for specific rows?
WHERE
99. Which clause should you use to sort the data in a SELECT query?
ORDER BY
100. Which clause should you use to retrieve the related data from
multiple tables?
JOIN
A join condition defines the way two tables are related in a query by:
Specifying the column from each table to be used for the join. A typical join condition specifies a foreign key from one table and its associated primary key in the other table.
Specifying a comparison operator (for example, = or <>) to be used in comparing values from the columns.
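A join that pairs a foreign key with its associated primary key, sketched with Python's sqlite3 module; the customers and orders tables are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        amount      REAL
    );
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

# Join condition: the foreign key orders.customer_id is compared with the
# primary key customers.customer_id using the = operator.
rows = conn.execute("""
    SELECT c.name, o.order_id, o.amount
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY o.order_id
""").fetchall()
print(rows)
```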
101. Consider the following SQL query. Which type of statement is this?
102. Consider the following SQL query. Which type of statement is this?
105. You have semi-structured data that you want to store as key-value pairs, where each key is unique, the columns can vary from row to row, and each row holds all the data for a logical entity. Which storage option should you select?
Azure Table Storage
107. Why does Azure Table Storage provide much faster access to the data you need?
Azure Table Storage provides much faster access to data because all the data for an entity is available in a single row, without requiring you to perform joins across relationships. To help ensure fast access, Azure Table Storage also splits a table into partitions.
112. What are point queries and range queries in Azure Table Storage?
In a point query, when an application retrieves a single row, the partition key enables Azure to quickly home in on the correct partition, and the row key lets Azure identify the row in that partition.
In a range query, the application searches for a set of rows in a partition, specifying the start and end points of the set as row keys. This type of query is also very quick, as long as you have designed your row keys according to the requirements of the queries your application performs.
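The two query types can be modeled with a toy in-memory table in Python. This is not the Azure Table Storage API: the PartitionedTable class and the sample entities are hypothetical. Each partition keeps its row keys sorted as strings (which is why the sample row keys are ISO dates), so a point query is a dictionary lookup and a range query is a binary search plus a slice.

```python
from bisect import bisect_left, bisect_right, insort


class PartitionedTable:
    """Toy model: partition key -> sorted row keys plus a row-key lookup dict."""

    def __init__(self):
        self.partitions = {}  # partition_key -> {"keys": [...], "rows": {...}}

    def insert(self, partition_key, row_key, entity):
        part = self.partitions.setdefault(partition_key, {"keys": [], "rows": {}})
        if row_key not in part["rows"]:
            insort(part["keys"], row_key)  # keep row keys ordered for range scans
        part["rows"][row_key] = entity

    def point_query(self, partition_key, row_key):
        # The partition key selects the partition; the row key selects the row.
        part = self.partitions.get(partition_key)
        return None if part is None else part["rows"].get(row_key)

    def range_query(self, partition_key, start_row_key, end_row_key):
        # Inclusive range of row keys within a single partition.
        part = self.partitions.get(partition_key)
        if part is None:
            return []
        lo = bisect_left(part["keys"], start_row_key)
        hi = bisect_right(part["keys"], end_row_key)
        return [part["rows"][k] for k in part["keys"][lo:hi]]


table = PartitionedTable()
table.insert("Seattle", "2023-03-01", {"orders": 12})
table.insert("Seattle", "2023-03-02", {"orders": 7})
table.insert("Seattle", "2023-03-05", {"orders": 9})
table.insert("Dublin", "2023-03-01", {"orders": 4})
print(table.point_query("Seattle", "2023-03-02"))
print(table.range_query("Seattle", "2023-03-01", "2023-03-03"))
```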
113. You need to define a schema for Azure Table Storage. Is this
correct?
False. Azure Table Storage tables are schemaless, so it's easy to adapt your data as the needs of your application evolve.
118. You need to create a storage account before creating an Azure Table
Storage. Is this correct?
True
120. You need to create a storage account before creating an Azure Blob
Storage. Is this correct?
True
121. What are the different types of blob that Azure Blob Storage supports?
Block blobs: A block blob is handled as a set of blocks. Each block can vary in size, up to 100 MB, and a block blob can contain up to 50,000 blocks, giving a maximum size of over 4.7 TB. (These limits come from older service versions; current versions allow blocks of up to 4,000 MiB, and therefore much larger blobs.) The block is the smallest amount of data that can be read or written as an individual unit. Block blobs are best used to store discrete, large, binary objects that change infrequently.
Page blobs: A page blob is organized as a collection of fixed-size 512-byte pages. A page blob is optimized to support random read and write operations; you can fetch and store data for a single page if necessary. A page blob can hold up to 8 TB of data. Azure uses page blobs to implement virtual disk storage for virtual machines.
Append blobs: An append blob is a block blob optimized to support append operations. You can only add blocks to the end of an append blob; updating or deleting existing blocks isn't supported. Each block can vary in size, up to 4 MB. The maximum size of an append blob is just over 195 GB.
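The size limits above follow from simple arithmetic, and the 512-byte page structure determines which pages a page-blob write touches. A short Python sketch, assuming the MB and TB figures quoted for block and append blobs are binary units (MiB, GiB, TiB); page_range is a hypothetical helper, not part of an Azure SDK.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4
PAGE_SIZE = 512      # page blobs are made of fixed-size 512-byte pages
MAX_BLOCKS = 50_000  # maximum number of blocks per block or append blob

# Block blob: 50,000 blocks of up to 100 MiB each.
block_blob_max_tib = MAX_BLOCKS * 100 * MIB / TIB
# Append blob: 50,000 blocks of up to 4 MiB each.
append_blob_max_gib = MAX_BLOCKS * 4 * MIB / GIB


def page_range(offset, length):
    """Return the first and last page index touched by a page-blob write.

    Page-blob writes must be aligned to 512-byte boundaries, so the offset
    must be a multiple of PAGE_SIZE and the length a non-zero multiple of it.
    """
    if offset % PAGE_SIZE or length % PAGE_SIZE or length == 0:
        raise ValueError("offset and length must be aligned to 512 bytes")
    return offset // PAGE_SIZE, (offset + length) // PAGE_SIZE - 1


print(f"block blob max ~{block_blob_max_tib:.2f} TiB")
print(f"append blob max ~{append_blob_max_gib:.1f} GiB")
print(page_range(1024, 1536))
```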
122. You control who can read and write blobs inside a container at the
container level. Is this true?
True
124. You are saving objects to Blob Storage in the Hot tier, and you want to move these files to the Archive tier after six months because you no longer need to access them. How do you handle this scenario?
You can create lifecycle management policies for blobs in a storage account. A lifecycle
management policy can automatically move a blob from Hot to Cool, and then to the Archive tier,
as it ages and is used less frequently (policy is based on the number of days since modification).
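A lifecycle management policy is a JSON document attached to the storage account. The Python sketch below builds one such document and prints it. The rule name, the hypothetical invoices container in prefixMatch, and the 30-day Cool threshold are assumptions; 180 days approximates the six months in the question. Treat it as a sketch of the policy's shape rather than a ready-to-apply configuration.

```python
import json


def hot_to_archive_policy(cool_days, archive_days, prefix):
    """Build a lifecycle rule that moves block blobs Hot -> Cool -> Archive by age."""
    if not 0 < cool_days < archive_days:
        raise ValueError("blobs should move to Cool before they move to Archive")
    return {
        "rules": [
            {
                "enabled": True,
                "name": "hot-to-cool-to-archive",
                "type": "Lifecycle",
                "definition": {
                    # Apply only to block blobs under the given container/prefix.
                    "filters": {"blobTypes": ["blockBlob"], "prefixMatch": [prefix]},
                    "actions": {
                        "baseBlob": {
                            # Conditions count days since the blob was last modified.
                            "tierToCool": {"daysAfterModificationGreaterThan": cool_days},
                            "tierToArchive": {"daysAfterModificationGreaterThan": archive_days},
                        }
                    },
                },
            }
        ]
    }


policy = hot_to_archive_policy(cool_days=30, archive_days=180, prefix="invoices/")
print(json.dumps(policy, indent=2))
```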
126. What should you do if you want to maintain and restore earlier
versions of a blob?
Versioning
127. Which feature should you enable if you want to recover a blob that
has been removed or overwritten by accident?
Soft Delete
134. What are some of the use cases of Azure File Storage?
Migrate existing applications to the cloud.Share server data across on-premises and cloud.Integrate
modern applications with Azure File Storage.Simplify hosting High Availability (HA) workload
data.
135. You shouldn’t use Azure File Storage for files that can be written by
multiple concurrent processes simultaneously. Why?
Multiple writers require careful synchronization; otherwise, the changes made by one process can be overwritten by another. The alternative is to lock the file while it is being written and release the lock when the write operation is complete. However, this approach can severely impact concurrency and limit performance.
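The overwrite problem can be simulated locally in Python. This does not use Azure File Storage or its locking mechanisms; it only illustrates why unsynchronized read-modify-write loses updates, and how holding a lock for the whole update prevents it. The function names and the string "file contents" are hypothetical.

```python
import threading


def unsynchronized_updates():
    """Simulate two writers doing read-modify-write on the same file contents."""
    shared = {"content": "v0"}
    a_copy = shared["content"]         # writer A reads
    b_copy = shared["content"]         # writer B reads the same version
    shared["content"] = a_copy + "+A"  # A writes its change
    shared["content"] = b_copy + "+B"  # B overwrites it: A's change is lost
    return shared["content"]


def locked_updates():
    """Serialize the same updates with a lock so neither change is lost."""
    shared = {"content": "v0"}
    lock = threading.Lock()

    def writer(tag):
        with lock:  # hold the lock for the entire read-modify-write
            current = shared["content"]
            shared["content"] = current + tag

    threads = [threading.Thread(target=writer, args=(tag,)) for tag in ("+A", "+B")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["content"]


print(unsynchronized_updates())
print(locked_updates())
```

The locked version keeps both changes, but the writers now take turns, which is the loss of concurrency the answer above describes.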
138. Cosmos DB guarantees less than 10-ms latencies for both reads
(indexed) and writes at the 99th percentile, all around the world. Is this
true?
True
141. When should you use a block blob, and when should you use a page
blob?
Use a page blob for blobs that require random read and write access. Use a block blob for discrete objects that change infrequently.
142. Why might you use Azure File storage?
To enable users at different sites to share files.
143. You are building a system that monitors the temperature throughout
a set of office blocks and sets the air conditioning in each room in each
block to maintain a pleasant ambient temperature. Your system has to
manage the air conditioning in several thousand buildings spread across
the country/region, and each building typically contains at least 100 air-
conditioned rooms. What type of NoSQL datastore is most appropriate
for capturing the temperature data to enable it to be processed quickly?
Send the data to an Azure Cosmos DB database and use Azure Functions to process the data.
144. What tools can you use to provision services?
The Azure portal
The Azure command-line interface (CLI)
Azure PowerShell
Azure Resource Manager templates
146. The default connectivity for Azure Cosmos DB and Azure Storage is
to enable access to the world at large. You can connect to these services
from an on-premises network, the internet, or from within an Azure
virtual network. Is this correct?
True
147. What are the options for protecting Azure resources such as storage accounts, Azure Cosmos DB, and so on?
Azure Private Endpoint
Firewalls and virtual networks
Configure authentication
Configure access control
Configure advanced security
148. When you configure Cosmos DB for replication, what is the default behavior?
By default, only the region in which you created the account supports write operations; the replicas
are all read-only.
159. What is the operator that you use as part of the SELECT clause to
eliminate duplicates in the result data?
DISTINCT
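DISTINCT in action, sketched with Python's sqlite3 module on a hypothetical orders table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, city TEXT)")
conn.executemany(
    "INSERT INTO orders (city) VALUES (?)",
    [("Seattle",), ("Dublin",), ("Seattle",), ("Dublin",), ("Lisbon",)],
)

all_cities = conn.execute("SELECT city FROM orders").fetchall()
# DISTINCT removes duplicate rows from the result set.
unique_cities = conn.execute("SELECT DISTINCT city FROM orders ORDER BY city").fetchall()
print(len(all_cities), unique_cities)
```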
165. Your data might contain dates and times formatted in different ways
in different data sources. You can use _______to transform these items
into a single uniform structure.
Azure Data Factory
182. Use ________ for smaller volumes of data (a few terabytes) and multiple sources that can be correlated.
Azure Analysis Services
188. What is the primary difference between a data lake and a data
warehouse?
A data lake holds raw data, but a data warehouse holds structured information
191. Which of these services can be used to ingest data into Azure
Synapse Analytics?
Azure Data Factory
192. You have a large amount of data held in files in Azure Data Lake
storage. You want to retrieve the data in these files and use it to populate
tables held in Azure Synapse Analytics. Which processing option is most
appropriate?
Synapse SQL pool
194. In Azure Databricks, how do you change the language a cell uses?
Start the cell with a magic command of the form %<language>, for example %python, %scala, %sql, or %r.
198. Power BI consists of three main elements. What are those elements?
Power BI Desktop
Power BI service
Power BI Mobile