DP-900
JSON or XML
Blob files
Another database
Apache Kafka
Data visualization lets you distill thousands or millions of rows of data into an easily digestible format
● Descriptive
● Diagnostic
● Predictive
● Prescriptive
● Cognitive
Descriptive Analytics
What happened?
● Today’s sales
● Service appointments booked
● Net margins
“Hindsight”
Diagnostic Analytics
Why did it happen?
Data from outside sources is very rarely formatted exactly how you need it
Some transformation might be required to modify column types, formats, or order, and
even to generate new data (e.g. fname + ‘ ‘ + lname = full_name)
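The fname + lname example above can be sketched as a row-level transform; the column names here simply follow the slide's example.

```python
# Minimal sketch of a column-level transformation: derive a new
# full_name column from two existing columns (names from the slide).
def transform(row):
    row = dict(row)  # copy so the source row is left untouched
    row["full_name"] = row["fname"] + " " + row["lname"]
    return row

source = [{"fname": "Ada", "lname": "Lovelace"}]
transformed = [transform(r) for r in source]
print(transformed[0]["full_name"])  # Ada Lovelace
```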
ELT
E - Extract
L - Load
T - Transform
ETL
E - Extract
T - Transform
L - Load
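The difference between the two patterns is only the order of the steps. A hedged sketch, with illustrative stand-in functions (extract, transform, load are not a real API here):

```python
# Stand-in steps for the E, T, and L stages
def extract():
    return [{"fname": "Ada", "lname": "Lovelace"}]

def transform(rows):
    return [{"full_name": r["fname"] + " " + r["lname"]} for r in rows]

warehouse = []  # stand-in destination

def load(rows):
    warehouse.extend(rows)

# ETL: transform the data in flight, then load the shaped result
load(transform(extract()))

# ELT: load the raw data first, transform it later inside the destination
raw_zone = list(extract())
shaped_in_destination = transform(raw_zone)
```

ELT defers the transform so the (usually more powerful) destination system does the heavy lifting.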
Data Processing
Two types...
● Batch processing
● Stream processing
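A toy sketch of the distinction (the summing logic is just an illustration): batch processing works on records accumulated over a period, all at once; stream processing handles each record as it arrives.

```python
# Batch: process the whole accumulated set in one go
def process_batch(records):
    return sum(records)

# Stream: fold each incoming event into a running result
def process_stream(event, running_total):
    return running_total + event

batch_result = process_batch([1, 2, 3])  # all records at once

total = 0
for event in [1, 2, 3]:  # events arriving one at a time
    total = process_stream(event, total)
```

Both arrive at the same answer; the difference is latency and how much data is in flight at once.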
DP-900
Microsoft Certified: Azure Data Fundamentals
Squeeze the maximum performance out of the CPU through expert tweaks
Fully-managed service
4 to 80 vCores
32 GB to 8 TB storage
Azure SQL Database
Close to 100% compatibility with on-premises SQL Server
2 to 80 vCores
5 GB and 4 TB storage
Best practice is for every table to have a “primary key”, which is backed by an index (the clustered index, by default)
FirstName - string
LastName - string
ManagerID - int (FOREIGN KEY) ← Points back to the Employees table (recursive)
The data of the table is physically stored in primary-key sorted order
But you can define additional (non-clustered) indexes on columns it’s common to query on
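The Employees table above can be sketched end to end; this uses SQLite (whose syntax differs slightly from T-SQL), and the EmployeeID column name is an assumption — the slide only implies the key the recursive ManagerID points back to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Employees (
        EmployeeID INTEGER PRIMARY KEY,   -- assumed key name; backed by an index
        FirstName  TEXT,
        LastName   TEXT,
        ManagerID  INTEGER REFERENCES Employees(EmployeeID)  -- recursive FK
    )
""")
# A secondary index for a column that is frequently queried
conn.execute("CREATE INDEX idx_lastname ON Employees(LastName)")

conn.execute("INSERT INTO Employees VALUES (1, 'Grace', 'Hopper', NULL)")
conn.execute("INSERT INTO Employees VALUES (2, 'Ada', 'Lovelace', 1)")

# Join the table to itself to resolve the manager relationship
row = conn.execute("""
    SELECT e.FirstName, m.FirstName
    FROM Employees e JOIN Employees m ON e.ManagerID = m.EmployeeID
""").fetchone()
print(row)  # ('Ada', 'Grace')
```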
No retraining, no retooling
Evolution past DW
Allows you to maintain your existing solution and migrate to the cloud
SQL Managed Instance
Aims to bridge the gap between Azure SQL Database and SQL Server in a VM
Let Microsoft manage the server and the database engine software
Execute queries
Azure Data Studio
Cross-platform software to work with SQL Server relational data
macOS in preview
Most vendor versions of SQL are not 100% compatible with the standard
SQL Server uses T-SQL, Oracle uses PL/SQL, MySQL uses SQL/PSM, PostgreSQL
uses PL/pgSQL
They are often not compatible with each other - an Oracle PL/SQL command would
not work unaltered on SQL Server
Often, the way the data is stored is chosen to better support the application’s use of it
Not only SQL
(NoSQL)
The internet has changed the way we package data to send around
Facebook - 2004
Google - 1998
Twitter - 2006
HTML - 1991
XML - 1998
JSON - 2005+
There are 200 billion tweets per year
Imagine trying to run a search query on a table with 1 trillion rows...
Non-relational databases are optimized for different uses
● Data integrity
● Open source
When data needs further processing (ETL or ELT) before it can be analyzed
When you want to remove historical data from your day-to-day systems (archiving)
When users are confused by the data structures, table names, or column names when building reports in Power BI
Data transformation
Pipelines can perform their tasks sequentially (serial) or perform several tasks in parallel
Pipeline Triggers
Pipeline Run - an instance of a pipeline execution
● Scheduled trigger
● Tumbling window
● Event-based
Manual Pipeline Run
Just as it sounds
You can even set start and end times on a trigger, so they don’t fire after a certain date
Tumbling Window
It’s a type of scheduled trigger that has some interesting properties
Good for when the pipeline is designed to process data for the specific time period
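The idea behind a tumbling window can be sketched outside of Data Factory: time is divided into fixed-size, contiguous, non-overlapping intervals, and each pipeline run processes exactly one of them (the function below is illustrative, not an ADF API).

```python
from datetime import datetime, timedelta

def tumbling_windows(start, end, size):
    """Split [start, end) into fixed-size, non-overlapping windows."""
    windows = []
    cursor = start
    while cursor < end:
        windows.append((cursor, min(cursor + size, end)))
        cursor += size
    return windows

# One pipeline run per one-hour window over a three-hour range
runs = tumbling_windows(datetime(2024, 1, 1, 0),
                        datetime(2024, 1, 1, 3),
                        timedelta(hours=1))
# Three runs, back to back, with no gaps and no overlap
```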
Report is “interactive”