DP-203 Resources

1. Design and Implement Data Storage (40-45%)


1. Design a data storage structure
1. design an Azure Data Lake solution
1. https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-best-practices
2. https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-data-scenarios
2. recommend file types for storage &
3. recommend file types for analytical queries
1. https://docs.microsoft.com/en-us/azure/data-factory/connector-azure-data-lake-storage#dataset-properties
4. design for efficient querying
1. https://docs.microsoft.com/en-us/azure/data-explorer/data-lake-query-data#optimize-your-query-performance
2. https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-query-acceleration
3. https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-query-acceleration-how-to?tabs=azure-powershell%2Cpowershell
5. design for data pruning
1. https://en.wikipedia.org/wiki/Decision_tree_pruning
2. https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-performance-tuning-guidance
3. https://docs.microsoft.com/bs-cyrl-ba/azure/databricks//delta/optimizations/dynamic-file-pruning
4. https://databricks.com/blog/2020/04/30/faster-sql-queries-on-delta-lake-with-dynamic-file-pruning.html
5. https://docs.microsoft.com/en-ca/azure/databricks//delta/optimizations/dynamic-file-pruning
6. design a folder structure that represents the levels of data transformation
1. https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-best-practices#directory-layout-considerations
2. https://techcommunity.microsoft.com/t5/data-architecture-blog/how-to-organize-your-data-lake/ba-p/1182562
3. https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-namespace
7. design a distribution strategy
1. https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute
8. design a data archiving solution
1. https://azure.microsoft.com/en-ca/updates/archive-tier-for-azure-data-lake-storage-now-generally-available/
2. https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blob-storage-tiers?tabs=azure-portal#archive-access-tier
2. Design a partition strategy
1. design a partition strategy for files
2. design a partition strategy for analytical workloads
3. design a partition strategy for efficiency/performance
4. design a partition strategy for Azure Synapse Analytics
5. identify when partitioning is needed in Azure Data Lake Storage Gen2
1. https://docs.microsoft.com/en-us/azure/architecture/best-practices/data-partitioning
2. https://docs.microsoft.com/en-us/azure/architecture/best-practices/data-partitioning-strategies
3. Design the serving layer
1. design star schemas
1. https://docs.microsoft.com/en-us/power-bi/guidance/star-schema
2. https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-overview
2. design slowly changing dimensions
1. https://en.wikipedia.org/wiki/Slowly_changing_dimension
2. https://docs.microsoft.com/en-us/learn/modules/populate-slowly-changing-dimensions-azure-synapse-analytics-pipelines/
3. https://docs.microsoft.com/en-us/learn/modules/populate-slowly-changing-dimensions-azure-synapse-analytics-pipelines/3-choose-between-dimension-types
4. https://docs.microsoft.com/en-us/learn/modules/populate-slowly-changing-dimensions-azure-synapse-analytics-pipelines/2-describe
5. https://www.youtube.com/watch?v=Sg2AAk1vwEs
3. design a dimensional hierarchy
1. https://docs.microsoft.com/en-us/power-bi/guidance/star-schema#snowflake-dimensions
2. https://en.wikipedia.org/wiki/Snowflake_schema
3. https://docs.microsoft.com/en-us/azure/data-factory/connector-snowflake
4. design a solution for temporal data
1. https://docs.microsoft.com/en-us/azure/azure-sql/temporal-tables
2. https://en.wikipedia.org/wiki/Temporal_database
5. design for incremental loading
1. https://docs.microsoft.com/en-us/azure/data-factory/tutorial-incremental-copy-overview
2. https://docs.microsoft.com/en-us/azure/data-factory/tutorial-incremental-copy-change-tracking-feature-portal
3. https://docs.microsoft.com/en-us/azure/data-factory/tutorial-incremental-copy-portal
4. https://www.youtube.com/watch?v=F9cBFnxaSGI
6. design analytical stores
1. https://docs.microsoft.com/en-us/azure/architecture/data-guide/technology-choices/analytical-data-stores
2. https://docs.microsoft.com/en-us/azure/architecture/data-guide/big-data/#lambda-architecture
7. design metastores in Azure Synapse Analytics and Azure Databricks
1. https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-use-external-metadata-stores
2. https://docs.microsoft.com/en-us/azure/databricks/data/metastore/
3. https://docs.microsoft.com/en-us/azure/synapse-analytics/metadata/overview
4. https://docs.microsoft.com/en-us/azure/databricks/data/metastores/external-hive-metastore
5. https://www.youtube.com/watch?v=pBB5zFnhgyE&list=PL7_h0bRfL52oZqAfV_kumYLUH5dbcWm9q
4. Implement physical data storage structures
1. implement compression
1. https://docs.microsoft.com/en-us/azure/data-factory/supported-file-formats-and-compression-codecs
2. https://docs.microsoft.com/en-us/azure/data-factory/format-parquet
3. https://databricks.com/glossary/what-is-parquet
4. https://docs.informatica.com/data-integration/powerexchange-adapters-for-informatica/10-5/powerexchange-for-microsoft-azure-blob-storage-user-guide/microsoft-azure-blob-storage-data-objects/data-compression-in-microsoft-azure-blob-storage-sources-and-tar.html
2. implement partitioning
1. https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-partition
3. implement sharding
1. https://docs.microsoft.com/en-us/azure/architecture/patterns/sharding
2. https://docs.microsoft.com/en-us/azure/azure-sql/database/elastic-scale-introduction
3. https://docs.microsoft.com/en-us/azure/azure-sql/database/elastic-scale-shard-map-management
4. implement different table geometries with Azure Synapse Analytics pools
1. https://docs.microsoft.com/en-us/azure/synapse-analytics/get-started-analyze-sql-pool
2. https://docs.microsoft.com/en-us/azure/synapse-analytics/get-started-analyze-sql-on-demand
3. https://docs.microsoft.com/en-us/azure/synapse-analytics/get-started-analyze-spark
5. implement data redundancy
1. https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/backup-and-restore
2. https://docs.microsoft.com/en-us/azure/cloud-adoption-framework/migrate/azure-best-practices/analytics/azure-synapse
3. https://docs.microsoft.com/en-us/azure/storage/common/storage-redundancy
4. https://docs.microsoft.com/en-us/azure/databricks/scenarios/howto-regional-disaster-recovery
6. implement distributions
1. https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute
7. implement data archiving
1. https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/backup-and-restore
2. https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-supported-blob-storage-features
a. https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blob-storage-tiers
5. Implement logical data structures
1. build a temporal data solution
1. https://docs.microsoft.com/en-us/azure/azure-sql/temporal-tables
2. https://docs.microsoft.com/en-us/azure/architecture/
2. build external tables
1. https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/develop-tables-external-tables?tabs=hadoop
3. implement file and folder structures for efficient querying and data pruning
1. https://docs.microsoft.com/en-us/azure/data-explorer/data-lake-query-data
2. https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-performance-tuning-guidance
6. Implement the serving layer
1. deliver data in a relational star schema
1. https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/develop-tables-overview
2. https://en.wikipedia.org/wiki/Star_schema
2. deliver data in Parquet files
1. https://databricks.com/glossary/what-is-parquet
2. https://docs.microsoft.com/en-us/azure/data-factory/format-parquet
3. implement a dimensional hierarchy
1. https://docs.microsoft.com/en-us/power-bi/guidance/star-schema#snowflake-dimensions
2. https://en.wikipedia.org/wiki/Snowflake_schema
3. https://docs.microsoft.com/en-us/azure/data-factory/connector-snowflake
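Study note for the folder-structure, file-partitioning, and data-pruning topics in the section above: a minimal Python sketch of a date-partitioned folder layout. The zone names (raw, curated), the dataset name, and the year=/month=/day= naming are assumptions chosen for illustration, not a layout prescribed by the linked articles. The point it tries to show is that when the partition key is encoded in the path, a query over a date range only has to list and read the matching folders.

```python
from __future__ import annotations

from datetime import date, timedelta


def partition_path(zone: str, dataset: str, day: date) -> str:
    """Build a zone/dataset/year=/month=/day= folder path (hypothetical layout)."""
    return f"{zone}/{dataset}/year={day:%Y}/month={day:%m}/day={day:%d}"


def folders_for_range(zone: str, dataset: str, start: date, end: date) -> list[str]:
    """Return only the folders a query over [start, end] would need to read."""
    days = (end - start).days
    return [
        partition_path(zone, dataset, start + timedelta(days=offset))
        for offset in range(days + 1)
    ]


if __name__ == "__main__":
    # Three days of data spanning a month boundary.
    for folder in folders_for_range("curated", "sales", date(2021, 3, 30), date(2021, 4, 1)):
        print(folder)
```

Every other folder under curated/sales is skipped without being opened, which is the pruning effect the partitioning links describe.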
2. Design and Develop Data Processing (25-30%)
1. Ingest and transform data
1. transform data by using Apache Spark
1. https://docs.microsoft.com/en-us/azure/databricks/scenarios/databricks-extract-load-sql-data-warehouse
2. transform data by using Transact-SQL
1. https://docs.microsoft.com/en-us/azure/data-factory/connector-azure-sql-data-warehouse
3. transform data by using Data Factory
1. https://docs.microsoft.com/en-us/azure/data-factory/connector-azure-sql-database
2. https://docs.microsoft.com/en-us/azure/data-factory/transform-data-using-spark
4. transform data by using Azure Synapse Pipelines
1. https://docs.microsoft.com/en-us/azure/synapse-analytics/get-started-pipelines
2. https://docs.microsoft.com/en-us/azure/data-factory/concepts-pipelines-activities?toc=/azure/synapse-analytics/toc.json&bc=/azure/synapse-analytics/breadcrumb/toc.json
5. transform data by using Stream Analytics
1. https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-introduction
6. cleanse data
1. https://en.wikipedia.org/wiki/Data_cleansing
2. https://www.sqlshack.com/data-cleansing-in-azure-machine-learning/
3. https://app.pluralsight.com/guides/cleaning-data-with-azure-ml-studio
4. https://docs.microsoft.com/en-us/azure/machine-learning/algorithm-module-reference/clean-missing-data
7. split data
1. https://docs.microsoft.com/en-us/azure/machine-learning/algorithm-module-reference/split-data
8. shred JSON
1. https://docs.microsoft.com/en-us/sql/relational-databases/json/convert-json-data-to-rows-and-columns-with-openjson-sql-server?view=sql-server-ver15
2. https://docs.microsoft.com/en-us/sql/t-sql/functions/openjson-transact-sql?view=sql-server-ver15
9. encode and decode data
1. https://docs.microsoft.com/en-us/answers/questions/129474/azure-data-factory-base64-encoded-secrets.html
10. configure error handling for the transformation
1. https://docs.microsoft.com/en-us/azure/data-factory/how-to-data-flow-error-rows
2. https://techcommunity.microsoft.com/t5/azure-data-factory/understanding-pipeline-failures-and-error-handling/ba-p/1630459
3. https://docs.microsoft.com/en-us/azure/data-factory/data-factory-ux-troubleshoot-guide
4. https://docs.microsoft.com/en-us/azure/data-factory/monitor-using-azure-monitor
11. normalize and denormalize values
1. https://docs.microsoft.com/en-us/azure/machine-learning/algorithm-module-reference/normalize-data
12. transform data by using Scala
1. https://docs.microsoft.com/en-us/azure/databricks/scenarios/databricks-extract-load-sql-data-warehouse
13. perform data exploratory analysis
1. https://azure.microsoft.com/en-us/resources/videos/perform-exploratory-analytics-over-your-data-lake/
2. https://docs.microsoft.com/en-us/learn/modules/perform-machine-learning-with-azure-databricks/
2. Design and develop a batch processing solution
1. develop batch processing solutions by using Data Factory, Data Lake, and Spark
1. https://docs.microsoft.com/en-us/azure/data-factory/v1/data-factory-data-processing-using-batch
2. https://docs.microsoft.com/en-us/azure/architecture/data-guide/technology-choices/batch-processing
2. develop batch processing solutions by using Azure Synapse Pipelines, PolyBase, and Azure Databricks &
3. create data pipelines
1. https://docs.microsoft.com/en-us/sql/relational-databases/polybase/polybase-versioned-feature-summary?view=sql-server-ver15
2. https://docs.microsoft.com/en-us/azure/databricks/clusters/configure
3. https://www.youtube.com/watch?v=JUQXx0R0RfE
4. design and implement incremental data loads
1. https://docs.microsoft.com/en-us/azure/data-factory/tutorial-incremental-copy-overview
5. design and develop slowly changing dimensions
1. https://docs.microsoft.com/en-us/learn/modules/populate-slowly-changing-dimensions-azure-synapse-analytics-pipelines/
2. https://docs.microsoft.com/en-us/learn/modules/populate-slowly-changing-dimensions-azure-synapse-analytics-pipelines/3-choose-between-dimension-types
3. https://docs.microsoft.com/en-us/learn/modules/populate-slowly-changing-dimensions-azure-synapse-analytics-pipelines/2-describe
6. handle security and compliance requirements
1. https://azure.microsoft.com/en-ca/overview/trusted-cloud/compliance/
2. https://docs.microsoft.com/en-ca/azure/compliance/
7. scale resources
1. https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/quickstart-scale-compute-portal
2. https://docs.microsoft.com/en-us/azure/data-factory/copy-activity-performance
8. configure the batch size
1. https://docs.microsoft.com/en-us/azure/batch/batch-automatic-scaling
2. https://docs.microsoft.com/en-us/azure/databricks/delta/delta-batch
9. design and create tests for data pipelines
1. https://docs.microsoft.com/en-us/azure/databricks/dev-tools/ci-cd/ci-cd-azure-devops
10. integrate Jupyter/IPython notebooks into a data pipeline
1. https://docs.microsoft.com/en-us/azure/databricks/notebooks/
2. https://docs.microsoft.com/en-us/azure/databricks/notebooks/notebooks-use
3. https://docs.microsoft.com/en-us/azure/databricks/notebooks/notebooks-manage
11. handle duplicate data
1. https://docs.microsoft.com/en-us/azure/data-factory/how-to-data-flow-dedupe-nulls-snippets
12. handle missing data &
13. handle late-arriving data
1. https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-time-handling
2. https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-solution-patterns
3. https://docs.microsoft.com/en-us/azure/machine-learning/algorithm-module-reference/clean-missing-data
4. https://learning.oreilly.com/library/view/stream-analytics-with/9781788395908/0b61b6d7-d805-42e2-a1cf-24148ce07f47.xhtml
5. https://docs.microsoft.com/en-us/azure/stream-analytics/event-ordering
14. upsert data
1. https://docs.microsoft.com/en-us/azure/data-factory/data-flow-alter-row
15. regress to a previous state
1. https://docs.microsoft.com/en-us/answers/questions/31313/transactions-in-adf.html
2. https://docs.microsoft.com/en-us/azure/data-factory/connector-azure-sql-data-warehouse
16. design and configure exception handling
1. https://docs.microsoft.com/en-us/azure/data-factory/how-to-data-flow-error-rows
17. configure batch retention
1. Configure a simple Azure Batch Job with Azure Data Factory - Microsoft Tech Community
18. design a batch processing solution
1. https://docs.microsoft.com/en-us/azure/data-factory/v1/data-factory-data-processing-using-batch
19. debug Spark jobs by using the Spark UI
1. https://docs.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-job-debugging
3. Design and develop a stream processing solution
1. develop a stream processing solution by using Stream Analytics, Azure Databricks, and Azure Event Hubs
1. https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-introduction
2. https://docs.microsoft.com/en-us/azure/databricks/spark/latest/structured-streaming/
3. https://docs.microsoft.com/en-us/azure/architecture/reference-architectures/data/stream-processing-databricks
2. process data by using Spark structured streaming
1. https://docs.microsoft.com/en-us/azure/databricks/spark/latest/structured-streaming/
3. monitor for performance and functional regressions
1. https://docs.microsoft.com/en-us/azure/databricks/kb/jobs/job-run-dash
2. https://docs.microsoft.com/en-us/azure/data-factory/concepts-data-flow-monitoring
4. design and create windowed aggregates
1. https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-window-functions
5. handle schema drift
1. https://docs.microsoft.com/en-us/azure/data-factory/concepts-data-flow-schema-drift
6. process time series data
1. https://azure-samples.github.io/azureiotlabs/timeseriesinsights/#:~:text=Azure%20Time%20Series%20Insights%20is,over%20the%20world%20in%20seconds.
2. https://docs.microsoft.com/en-ca/azure/time-series-insights/
7. process within one partition
8. process across partitions
1. https://docs.microsoft.com/en-us/azure/architecture/reference-architectures/event-hubs/partitioning-in-event-hubs-and-kafka
2. https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-features#partitions
3. https://docs.microsoft.com/en-us/azure/stream-analytics/repartition
4. https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-parallelization
9. configure checkpoints/watermarking during processing
1. https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-time-handling
10. scale resources
1. https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-scale-jobs
11. handle interruptions
1. https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-job-reliability
2. https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-time-handling
12. design and configure exception handling
1. https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-output-error-policy
2. https://docs.microsoft.com/en-us/azure/stream-analytics/configuration-error-codes
13. upsert data
1. https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-documentdb-output
14. replay archived stream data
1. https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-concepts-checkpoint-replay
15. design a stream processing solution
1. https://docs.microsoft.com/en-us/azure/architecture/reference-architectures/data/stream-processing-stream-analytics
4. Manage batches and pipelines
1. trigger batches
2. handle failed batch loads
1. https://docs.microsoft.com/en-us/azure/batch/error-handling
2. https://docs.microsoft.com/en-us/azure/batch/batch-job-task-error-checking
3. https://docs.microsoft.com/en-us/azure/batch/batch-pool-node-error-checking
4. https://docs.microsoft.com/en-us/azure/batch/best-practices
3. validate batch loads
1. https://docs.microsoft.com/en-us/azure/batch/batch-job-task-error-checking
4. manage data pipelines in Data Factory/Synapse Pipelines
5. schedule data pipelines in Data Factory/Synapse Pipelines
1. https://docs.microsoft.com/en-us/azure/synapse-analytics/get-started-pipelines
2. https://docs.microsoft.com/en-us/azure/data-factory/concepts-pipelines-activities
6. implement version control for pipeline artifacts
1. https://docs.microsoft.com/en-us/azure/data-factory/source-control
7. manage Spark jobs in a pipeline
1. https://docs.microsoft.com/en-us/azure/data-factory/v1/data-factory-spark
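Study note for the "design and create windowed aggregates" topic in the section above: a minimal plain-Python sketch of a tumbling-window count. It is not Stream Analytics query syntax or Spark code. The event shape (epoch seconds plus a key) and the convention that a window covers [start, start + size) are assumptions made for illustration, and a real engine may treat window boundaries differently.

```python
from collections import defaultdict


def tumbling_counts(events, window_seconds):
    """Count events per (key, window_start).

    Windows are fixed-size and do not overlap, so every event
    lands in exactly one window.
    """
    counts = defaultdict(int)
    for timestamp, key in events:
        # Round the timestamp down to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(key, window_start)] += 1
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical sensor readings: (epoch seconds, device id).
    sample = [(0, "a"), (4, "a"), (10, "a"), (12, "b")]
    for (key, start), count in sorted(tumbling_counts(sample, 10).items()):
        print(key, start, count)
```

Hopping and sliding windows differ in that one event can contribute to several windows, so this single-bucket rounding would not be enough for them.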
3. Design and Implement Data Security (10-15%)
1. Design security for data policies and standards
1. design data encryption for data at rest and in transit
1. https://docs.microsoft.com/en-us/azure/storage/common/storage-service-encryption
2. https://docs.microsoft.com/en-us/azure/cosmos-db/database-encryption-at-rest
3. https://docs.microsoft.com/en-us/azure/synapse-analytics/security/workspaces-encryption
4. https://docs.microsoft.com/en-us/azure/security/fundamentals/encryption-atrest
2. design a data auditing strategy
1. https://docs.microsoft.com/en-us/azure/azure-sql/database/auditing-overview
2. https://docs.microsoft.com/en-us/azure/cosmos-db/audit-control-plane-logs
3. design a data masking strategy, design for data privacy
1. https://docs.microsoft.com/en-us/azure/security/fundamentals/protection-customer-data
2. https://docs.microsoft.com/en-us/azure/azure-sql/database/dynamic-data-masking-overview
4. design a data retention policy
1. https://docs.microsoft.com/en-us/azure/storage/blobs/storage-lifecycle-management-concepts?tabs=azure-portal
2. https://docs.microsoft.com/en-us/azure/azure-monitor/logs/manage-cost-storage
3. https://docs.microsoft.com/en-us/azure/azure-monitor/app/data-retention-privacy
4. https://azure.microsoft.com/en-ca/updates/retention-by-type/
5. design to purge data based on business requirements
1. https://docs.microsoft.com/en-us/azure/storage/blobs/soft-delete-blob-overview
2. https://docs.microsoft.com/en-us/rest/api/keyvault/purgedeletedstorageaccount/purgedeletedstorageaccount
3. https://docs.microsoft.com/en-us/azure/data-explorer/kusto/concepts/data-purge
4. https://docs.microsoft.com/en-us/azure/storage/blobs/soft-delete-blob-enable
6. design Azure role-based access control (Azure RBAC) for Data Lake Storage Gen2
1. https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-access-control-model
7. design POSIX-like access control lists (ACLs) for Data Lake Storage Gen2
1. https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-access-control
8. design and implement row-level and column-level security
1. https://docs.microsoft.com/en-us/sql/relational-databases/security/row-level-security?view=sql-server-ver15
2. https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/column-level-security
2. Implement data security
1. implement data masking
1. https://docs.microsoft.com/en-us/azure/azure-sql/database/dynamic-data-masking-overview
2. implement Azure RBAC
1. https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-access-control-model
3. implement POSIX-like ACLs for Data Lake Storage Gen2
1. https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-access-control
4. implement a data retention policy
1. https://azure.microsoft.com/en-ca/updates/lifecycle-management-for-azure-data-lake-storage-is-now-generally-available/
2. https://docs.microsoft.com/en-us/azure/storage/blobs/storage-lifecycle-management-concepts?tabs=azure-portal
5. implement a data auditing strategy
1. https://docs.microsoft.com/en-us/azure/data-lake-analytics/data-lake-analytics-diagnostic-logs
6. manage identities, keys, and secrets across different data platform technologies
1. https://docs.microsoft.com/en-us/rest/api/storageservices/authorize-with-shared-key
2. https://docs.microsoft.com/en-us/azure/storage/common/storage-sas-overview?toc=/azure/storage/blobs/toc.json
3. https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-access-control-model
7. implement secure endpoints (private and public)
1. https://docs.microsoft.com/en-us/azure/private-link/private-endpoint-overview
2. https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-best-practices
3. https://docs.microsoft.com/en-us/azure/data-factory/data-movement-security-considerations
8. implement resource tokens in Azure Databricks
1. https://docs.microsoft.com/en-us/azure/databricks/administration-guide/access-control/tokens
2. https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/aad/service-prin-aad-token
9. load a DataFrame with sensitive information &
10. write encrypted data to tables or Parquet files &
11. manage sensitive information
1. https://databricks.com/blog/2020/11/20/enforcing-column-level-encryption-and-avoiding-data-duplication-with-pii.html
2. https://databricks.com/session_na20/encryption-and-masking-for-sensitive-apache-spark-analytics-addressing-ccpa-and-governance
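Study note for the data-masking topics in the section above: a minimal plain-Python sketch of the idea behind read-time masking, where stored values stay intact and only callers without an unmask permission see the obfuscated form. The function names, the `can_unmask` flag, and the exact mask shape are hypothetical and chosen for illustration; this is not the implementation used by Azure SQL or Synapse.

```python
def mask_email(value: str) -> str:
    """Keep the first character of the local part and hide the rest (illustrative shape)."""
    local, separator, _domain = value.partition("@")
    if not separator or not local:
        # Not recognisable as an email address: hide it entirely.
        return "XXXX"
    return f"{local[0]}XXX@XXXX.com"


def read_row(row: dict, masked_columns: set, can_unmask: bool) -> dict:
    """Return a copy of the row, masking the listed columns unless the caller may unmask."""
    if can_unmask:
        return dict(row)
    return {
        column: mask_email(value) if column in masked_columns else value
        for column, value in row.items()
    }


if __name__ == "__main__":
    stored = {"id": 1, "email": "alice@contoso.org"}
    print(read_row(stored, {"email"}, can_unmask=False))
    print(read_row(stored, {"email"}, can_unmask=True))
```

Because the mask is applied only on read, it limits exposure but is not a substitute for encryption of the stored data.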
4. Monitor and Optimize Data Storage and Data Processing (10-15%)
1. Monitor data storage and data processing
1. implement logging used by Azure Monitor
1. https://docs.microsoft.com/en-us/azure/azure-monitor/logs/data-platform-logs
2. configure monitoring services
1. https://docs.microsoft.com/en-us/azure/azure-monitor/deploy
3. measure performance of data movement
1. https://docs.microsoft.com/en-us/azure/azure-sql/database/monitoring-with-dmvs
4. monitor and update statistics about data across a system
5. monitor data pipeline performance
1. https://docs.microsoft.com/en-us/azure/data-factory/monitor-using-azure-monitor
6. measure query performance
1. https://docs.microsoft.com/en-us/azure/azure-sql/database/query-performance-insight-use
7. monitor cluster performance
1. https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-key-scenarios-to-monitor
2. https://docs.microsoft.com/en-us/azure/synapse-analytics/monitoring/how-to-monitor-using-azure-monitor
3. https://docs.microsoft.com/en-us/azure/architecture/databricks-monitoring/
8. understand custom logging options
1. https://docs.microsoft.com/en-us/azure/azure-monitor/agents/data-sources-custom-logs
9. schedule and monitor pipeline tests
10. interpret Azure Monitor metrics and logs
1. https://docs.microsoft.com/en-us/azure/azure-monitor/essentials/data-platform-metrics
11. interpret a Spark directed acyclic graph (DAG)
2. Optimize and troubleshoot data storage and data processing
1. compact small files
2. rewrite user-defined functions (UDFs)
3. handle skew in data
1. https://en.wikipedia.org/wiki/Skewness
2. https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute#choose-a-distribution-column-with-data-that-distributes-evenly
3. https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute#determine-if-the-table-has-data-skew
4. handle data spill
1. https://en.wikipedia.org/wiki/Data_breach
2. https://docs.microsoft.com/en-us/compliance/regulatory/gdpr-breach-notification
3. https://docs.microsoft.com/en-us/compliance/regulatory/gdpr-breach-azure-dynamics
5. tune shuffle partitions
1. https://docs.microsoft.com/en-us/azure/architecture/databricks-monitoring/performance-troubleshooting
6. find shuffling in a pipeline
7. optimize resource management
8. tune queries by using indexers
1. https://docs.microsoft.com/en-us/azure/azure-sql/database/automatic-tuning-overview
2. https://docs.microsoft.com/en-us/sql/relational-databases/automatic-tuning/automatic-tuning?view=sql-server-ver15
9. tune queries by using cache
1. https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/performance-tuning-result-set-caching
10. optimize pipelines for analytical or transactional purposes
11. optimize pipeline for descriptive versus analytical workloads
12. troubleshoot a failed Spark job
1. https://docs.microsoft.com/en-us/azure/databricks/kb/jobs/
2. https://docs.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-known-issues
3. https://docs.microsoft.com/en-us/azure/data-factory/data-factory-troubleshoot-guide
13. troubleshoot a failed pipeline run
1. https://docs.microsoft.com/en-us/azure/data-factory/data-factory-troubleshoot-guide
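Study note for the "handle skew in data" topic in the section above: a minimal plain-Python sketch of why a low-cardinality distribution column causes skew. A dedicated SQL pool spreads a hash-distributed table over 60 distributions; the SHA-256 based bucketing below is only a stand-in for the engine's internal hash function, and the skew measure (largest bucket divided by the ideal even share) is an assumption chosen for illustration.

```python
import hashlib
from collections import Counter

DISTRIBUTIONS = 60  # number of distributions in a dedicated SQL pool


def distribution_for(key, distributions: int = DISTRIBUTIONS) -> int:
    """Map a key to a distribution with a deterministic stand-in hash."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % distributions


def skew_ratio(keys, distributions: int = DISTRIBUTIONS) -> float:
    """Largest distribution's row count divided by the ideal even share (1.0 = perfectly even)."""
    keys = list(keys)
    if not keys:
        return 0.0
    counts = Counter(distribution_for(key, distributions) for key in keys)
    even_share = len(keys) / distributions
    return max(counts.values()) / even_share


if __name__ == "__main__":
    # One repeated value: every row hashes to the same distribution.
    print(skew_ratio(["unknown"] * 600))
    # Many distinct values: rows spread out much more evenly.
    print(skew_ratio(range(6000)))
```

A ratio far above 1 means one distribution does most of the work while the others sit idle, which is the situation the linked guidance on choosing an evenly distributing column tries to avoid.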
