Azure Data Factory
An activity is a task performed on your data. Activities are used inside Azure
Data Factory pipelines; an ADF pipeline is a group of one or more activities. For example,
when you create an ADF pipeline to perform ETL, you can use multiple activities to
extract data, transform it, and load it into your data warehouse. An activity uses
input and output datasets. A dataset represents your data, whether it is a table, a file, a folder,
etc. The relationship between pipeline, activity, and dataset is straightforward: a pipeline
contains activities, and each activity consumes input datasets and produces output datasets.
An input dataset simply describes the input data and its schema, and an
output dataset describes the output data and its schema. An activity can take
zero or more input datasets and produce one or more output datasets. Activities in Azure
Data Factory can be broadly categorized as:
1- Data Movement Activities
2- Data Transformation Activities
3- Control Activities
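To make the pipeline, activity, and dataset relationship concrete, here is a minimal sketch using the azure-mgmt-datafactory Python SDK. The subscription ID, resource group, and factory name are placeholder assumptions, and the single Wait activity just stands in for real work:

```python
# Minimal sketch: an ADF pipeline is a named group of activities.
# Assumes an existing factory "my-adf" in resource group "my-rg"
# (both hypothetical) and: pip install azure-identity azure-mgmt-datafactory
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import PipelineResource, WaitActivity

SUB_ID, RG, FACTORY = "<subscription-id>", "my-rg", "my-adf"  # placeholders
adf = DataFactoryManagementClient(DefaultAzureCredential(), SUB_ID)

# A pipeline containing a single (trivial) activity.
pipeline = PipelineResource(
    activities=[WaitActivity(name="WarmUp", wait_time_in_seconds=5)])
adf.pipelines.create_or_update(RG, FACTORY, "DemoPipeline", pipeline)

# Trigger a run and check its status.
run = adf.pipelines.create_run(RG, FACTORY, "DemoPipeline", parameters={})
print(adf.pipeline_runs.get(RG, FACTORY, run.run_id).status)
```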
Data Movement Activities:
1- Copy Activity: It simply copies data from a source location to a destination
location. Azure supports many data store locations, such as Azure Storage, Azure
databases, NoSQL stores, files, etc.
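Below is a hedged sketch of a Copy activity wired to input and output datasets; the dataset names are assumptions that would be defined separately in the factory, and deployment follows the pattern in the first sketch:

```python
# Sketch: a Copy activity moving blobs between two pre-defined datasets.
# "InputBlobDS" and "OutputBlobDS" are hypothetical dataset names.
from azure.mgmt.datafactory.models import (
    BlobSink, BlobSource, CopyActivity, DatasetReference, PipelineResource)

copy = CopyActivity(
    name="CopyBlobToBlob",
    inputs=[DatasetReference(type="DatasetReference",
                             reference_name="InputBlobDS")],
    outputs=[DatasetReference(type="DatasetReference",
                              reference_name="OutputBlobDS")],
    source=BlobSource(),  # how to read from the source store
    sink=BlobSink(),      # how to write to the destination store
)
pipeline = PipelineResource(activities=[copy])
```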
To know more about data movement activities, see the link below:
Pipelines and activities in Azure Data Factory - Azure Data Factory | Microsoft Docs
Data Transformation Activities:
1- Data Flow: First, you design a data transformation workflow to transform or move
data; then you call the Data Flow activity inside an ADF pipeline.
It runs on scaled-out Apache Spark clusters. There are two types of data flows:
Mapping and Wrangling.
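Calling a pre-authored mapping data flow from a pipeline might look like the sketch below; the data flow name "CleanCustomersFlow" is a hypothetical assumption:

```python
# Sketch: invoke an existing mapping data flow from a pipeline; at run
# time ADF executes it on a scaled-out Spark cluster it manages.
from azure.mgmt.datafactory.models import (
    DataFlowReference, ExecuteDataFlowActivity, PipelineResource)

run_flow = ExecuteDataFlowActivity(
    name="RunCleanCustomers",
    data_flow=DataFlowReference(type="DataFlowReference",
                                reference_name="CleanCustomersFlow"),
)
pipeline = PipelineResource(activities=[run_flow])
```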
2- Stored Procedure: In a Data Factory pipeline, you can use the Stored Procedure
activity to invoke a SQL Server stored procedure. You can use the
following data stores: Azure SQL Database, Azure Synapse Analytics, SQL Server
database, etc.
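A hedged sketch of invoking a stored procedure on Azure SQL Database; the linked service name, procedure name, and parameter are assumptions:

```python
# Sketch: call a stored procedure through an Azure SQL linked service.
# "AzureSqlLS" and "usp_LoadStaging" are hypothetical names.
from azure.mgmt.datafactory.models import (
    LinkedServiceReference, PipelineResource,
    SqlServerStoredProcedureActivity, StoredProcedureParameter)

call_proc = SqlServerStoredProcedureActivity(
    name="LoadStaging",
    linked_service_name=LinkedServiceReference(
        type="LinkedServiceReference", reference_name="AzureSqlLS"),
    stored_procedure_name="usp_LoadStaging",
    stored_procedure_parameters={
        "BatchDate": StoredProcedureParameter(value="2024-01-01",
                                              type="String")},
)
pipeline = PipelineResource(activities=[call_proc])
```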
3- U-SQL: It executes a U-SQL script on an Azure Data Lake Analytics cluster. U-SQL is a
big data query language that provides the benefits of SQL.
4- Custom Activity: With a custom activity, you can create your own data processing
logic that is not provided by Azure out of the box. You can configure a .NET activity or an R
activity that will run on the Azure Batch service or an Azure HDInsight cluster.
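As a sketch, a custom activity mostly amounts to a command line executed on an Azure Batch pool; the linked service name and script below are hypothetical:

```python
# Sketch: run your own processing logic on Azure Batch via Custom activity.
# "AzureBatchLS" is a hypothetical linked service pointing at a Batch pool.
from azure.mgmt.datafactory.models import (
    CustomActivity, LinkedServiceReference, PipelineResource)

custom = CustomActivity(
    name="RunMyProcessor",
    command="python process.py --date 2024-01-01",  # hypothetical script
    linked_service_name=LinkedServiceReference(
        type="LinkedServiceReference", reference_name="AzureBatchLS"),
)
pipeline = PipelineResource(activities=[custom])
```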
5- Databricks Python Activity: This activity runs your Python files on an Azure
Databricks cluster.
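A minimal sketch, assuming a Databricks linked service and a script already uploaded to DBFS (both names hypothetical):

```python
# Sketch: run a Python file stored in DBFS on an Azure Databricks cluster.
from azure.mgmt.datafactory.models import (
    DatabricksSparkPythonActivity, LinkedServiceReference, PipelineResource)

py_job = DatabricksSparkPythonActivity(
    name="TransformWithSpark",
    python_file="dbfs:/scripts/etl.py",       # hypothetical path
    parameters=["--run-date", "2024-01-01"],  # passed to the script
    linked_service_name=LinkedServiceReference(
        type="LinkedServiceReference", reference_name="DatabricksLS"),
)
pipeline = PipelineResource(activities=[py_job])
```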
6- Azure Functions: Azure Functions is an Azure compute service that lets you write
event-driven code logic without managing any infrastructure. It stores your code
in Azure Storage and keeps the logs in Application Insights. Key points of Azure Functions
are:
1- It is a serverless service.
2- Multiple languages are available: C#, Java, JavaScript, Python, and PowerShell.
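From a pipeline, an HTTP-triggered function is invoked through the Azure Function activity; in this sketch the linked service and function name are assumptions:

```python
# Sketch: call an HTTP-triggered Azure Function from a pipeline.
# "AzureFunctionLS" and "NotifyComplete" are hypothetical names.
from azure.mgmt.datafactory.models import (
    AzureFunctionActivity, LinkedServiceReference, PipelineResource)

notify = AzureFunctionActivity(
    name="Notify",
    method="POST",
    function_name="NotifyComplete",
    body={"status": "done"},
    linked_service_name=LinkedServiceReference(
        type="LinkedServiceReference", reference_name="AzureFunctionLS"),
)
pipeline = PipelineResource(activities=[notify])
```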
Control Activities:
1- Execute Pipeline Activity: It allows a pipeline to invoke another Azure Data Factory pipeline.
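A sketch of a parent pipeline invoking a child pipeline; the child pipeline name and its parameter are hypothetical:

```python
# Sketch: invoke a child pipeline and block until its run finishes.
from azure.mgmt.datafactory.models import (
    ExecutePipelineActivity, PipelineReference, PipelineResource)

call_child = ExecutePipelineActivity(
    name="CallChild",
    pipeline=PipelineReference(type="PipelineReference",
                               reference_name="ChildPipeline"),
    parameters={"RunDate": "2024-01-01"},  # hypothetical child parameter
    wait_on_completion=True,
)
parent = PipelineResource(activities=[call_child])
```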
2- Filter Activity: It allows you to apply a filter expression to an input array. A combined Filter and ForEach sketch follows the next item.
3- For Each Activity: It provides the functionality of a for-each loop, executing a set of
activities once per item of a collection.
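The following sketch chains the two: a Filter activity trims an array parameter, then a ForEach iterates over the survivors. The parameter name, prefix test, and trivial Wait body are illustrative assumptions:

```python
# Sketch: filter an array parameter, then loop over the filtered items.
# Inside the loop, @item() would refer to the current element.
from azure.mgmt.datafactory.models import (
    ActivityDependency, Expression, FilterActivity, ForEachActivity,
    ParameterSpecification, PipelineResource, WaitActivity)

keep_dims = FilterActivity(
    name="KeepDimTables",
    items=Expression(type="Expression",
                     value="@pipeline().parameters.tables"),
    condition=Expression(type="Expression",
                         value="@startswith(item(), 'dim_')"),
)
loop = ForEachActivity(
    name="PerTable",
    items=Expression(type="Expression",
                     value="@activity('KeepDimTables').output.Value"),
    activities=[WaitActivity(name="DoWork", wait_time_in_seconds=1)],
    depends_on=[ActivityDependency(activity="KeepDimTables",
                                   dependency_conditions=["Succeeded"])],
)
pipeline = PipelineResource(
    parameters={"tables": ParameterSpecification(type="Array")},
    activities=[keep_dims, loop],
)
```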
4- Lookup Activity: It reads and returns the content of a data source such as a
file or a table, or the result set of a query or stored procedure. The output can be
referenced by succeeding activities.
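A hedged sketch of a Lookup that fetches a watermark value from Azure SQL; the dataset name and query are assumptions:

```python
# Sketch: look up a single watermark row; later activities could read it
# via @activity('GetWatermark').output.firstRow.wm
from azure.mgmt.datafactory.models import (
    AzureSqlSource, DatasetReference, LookupActivity, PipelineResource)

lookup = LookupActivity(
    name="GetWatermark",
    dataset=DatasetReference(type="DatasetReference",
                             reference_name="WatermarkDS"),
    source=AzureSqlSource(
        sql_reader_query="SELECT MAX(ModifiedDate) AS wm FROM dbo.Src"),
    first_row_only=True,  # return one row rather than the full result set
)
pipeline = PipelineResource(activities=[lookup])
```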
5- Set Variable Activity: It is used to set the value of an existing pipeline variable of type
String, Boolean, or Array.
6- Switch Activity: It works like a switch statement, executing the set of activities that
corresponds to the matching case.
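Together they might look like this sketch: set a variable from a parameter, then branch on it. Names and the trivial case bodies are assumptions; note the expression is passed in its dict form so it is evaluated rather than stored as a literal string:

```python
# Sketch: Set Variable feeding a Switch. The dict value marks an ADF
# expression; a bare string would be stored literally.
from azure.mgmt.datafactory.models import (
    ActivityDependency, Expression, ParameterSpecification, PipelineResource,
    SetVariableActivity, SwitchActivity, SwitchCase, VariableSpecification,
    WaitActivity)

set_env = SetVariableActivity(
    name="SetEnv", variable_name="envVar",
    value={"value": "@pipeline().parameters.env", "type": "Expression"},
)
branch = SwitchActivity(
    name="RouteByEnv",
    on=Expression(type="Expression", value="@variables('envVar')"),
    cases=[SwitchCase(value="prod", activities=[
        WaitActivity(name="ProdPath", wait_time_in_seconds=1)])],
    default_activities=[WaitActivity(name="DevPath", wait_time_in_seconds=1)],
    depends_on=[ActivityDependency(activity="SetEnv",
                                   dependency_conditions=["Succeeded"])],
)
pipeline = PipelineResource(
    parameters={"env": ParameterSpecification(type="String")},
    variables={"envVar": VariableSpecification(type="String")},
    activities=[set_env, branch],
)
```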
7- Until Activity: It is similar to a do-until loop. It executes a set of activities in a loop
until the condition associated with the activity evaluates to true.
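A sketch of a polling loop capped by a timeout; the status variable and backoff body are illustrative stand-ins for real polling work:

```python
# Sketch: loop until a flag reaches "done", or until the 30-minute cap.
from azure.mgmt.datafactory.models import (
    Expression, PipelineResource, UntilActivity, VariableSpecification,
    WaitActivity)

poll = UntilActivity(
    name="PollUntilDone",
    expression=Expression(type="Expression",
                          value="@equals(variables('status'), 'done')"),
    timeout="0.00:30:00",  # d.hh:mm:ss
    activities=[WaitActivity(name="Backoff", wait_time_in_seconds=30)],
)
pipeline = PipelineResource(
    variables={"status": VariableSpecification(type="String",
                                               default_value="pending")},
    activities=[poll],
)
```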
8- Wait Activity: It simply waits for the given time interval before moving on to
the next activity. You specify the number of seconds.
9- Web Activity: It is used to make a call to a REST endpoint. You can use it for different
use cases, such as triggering an ADF pipeline run.
10- Webhook Activity: It is used to call an endpoint URL and pass a callback URL; the
pipeline run waits for the callback to be invoked before moving to the next activity. You
can call external URLs as well, for example to start or stop downstream processes.
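The last three can be chained as in this sketch; both URLs are placeholders, and the webhook endpoint is expected to invoke the callback URL that ADF injects into the request body:

```python
# Sketch: Wait, then call a REST endpoint, then hand off to a webhook
# that must call back before the run proceeds.
from azure.mgmt.datafactory.models import (
    ActivityDependency, PipelineResource, WaitActivity, WebActivity,
    WebHookActivity)

pause = WaitActivity(name="Pause", wait_time_in_seconds=10)
ping = WebActivity(
    name="PingApi", method="POST",
    url="https://example.com/api/refresh",  # placeholder endpoint
    body={"source": "adf"},
    depends_on=[ActivityDependency(activity="Pause",
                                   dependency_conditions=["Succeeded"])],
)
handoff = WebHookActivity(
    name="AwaitExternalJob", method="POST",
    url="https://example.com/api/start-job",  # placeholder endpoint
    timeout="00:10:00",  # how long to wait for the callback
    depends_on=[ActivityDependency(activity="PingApi",
                                   dependency_conditions=["Succeeded"])],
)
pipeline = PipelineResource(activities=[pause, ping, handoff])
```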
Azure Data Factory and Azure Synapse Analytics support the following control flow activities, which
can be added either individually or chained with another activity.
Execute Pipeline Activity: Allows a Data Factory or Synapse pipeline to invoke another pipeline.
For Each Activity: Defines a repeating control flow in your pipeline. This activity is used to iterate over a collection and executes specified activities in a loop. The loop implementation of this activity is similar to the Foreach looping structure in programming languages.
Get Metadata Activity: Can be used to retrieve metadata of any data in a Data Factory or Synapse pipeline.
If Condition Activity: Can be used to branch based on a condition that evaluates to true or false. The If Condition activity provides the same functionality that an if statement provides in programming languages. It evaluates a set of activities when the condition evaluates to true and another set of activities when the condition evaluates to false.
Lookup Activity: Can be used to read or look up a record/table name/value from any external source. This output can further be referenced by succeeding activities.
Until Activity: Implements a Do-Until loop that is similar to the Do-Until looping structure in programming languages. It executes a set of activities in a loop until the condition associated with the activity evaluates to true. You can specify a timeout value for the Until activity.
Validation Activity: Ensures a pipeline only continues execution if a reference dataset exists, meets specified criteria, or a timeout has been reached.
Wait Activity: When you use a Wait activity in a pipeline, the pipeline waits for the specified time before continuing with execution of subsequent activities.
Web Activity: Can be used to call a custom REST endpoint from a pipeline. You can pass datasets and linked services to be consumed and accessed by the activity.
Webhook Activity: Using the webhook activity, call an endpoint and pass a callback URL. The pipeline run waits for the callback to be invoked before proceeding to the next activity.
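Two activities in the list above, Get Metadata and If Condition, were not sketched earlier; here is a hedged example combining them, with a hypothetical folder dataset:

```python
# Sketch: branch on whether a folder dataset contains any files, using
# Get Metadata's childItems output. "LandingFolderDS" is hypothetical.
from azure.mgmt.datafactory.models import (
    ActivityDependency, DatasetReference, Expression, GetMetadataActivity,
    IfConditionActivity, PipelineResource, WaitActivity)

probe = GetMetadataActivity(
    name="ProbeLanding",
    dataset=DatasetReference(type="DatasetReference",
                             reference_name="LandingFolderDS"),
    field_list=["childItems"],  # list the folder's contents
)
gate = IfConditionActivity(
    name="HasFiles",
    expression=Expression(
        type="Expression",
        value="@greater(length(activity('ProbeLanding').output.childItems), 0)"),
    if_true_activities=[WaitActivity(name="Process", wait_time_in_seconds=1)],
    if_false_activities=[WaitActivity(name="Skip", wait_time_in_seconds=1)],
    depends_on=[ActivityDependency(activity="ProbeLanding",
                                   dependency_conditions=["Succeeded"])],
)
pipeline = PipelineResource(activities=[probe, gate])
```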