articles/data-factory/copy-activity-performance.md
15 additions & 6 deletions
@@ -38,7 +38,16 @@ After reading this article, you will be able to answer the following questions:
ADF offers a serverless architecture that allows parallelism at different levels, so you can build pipelines that fully utilize your network bandwidth as well as storage IOPS and bandwidth to maximize data movement throughput for your environment. This means the throughput you can achieve can be estimated as the minimum of the throughput offered by the source data store, the destination data store, and the network bandwidth between them. The table below calculates the copy duration based on data size and the bandwidth limit for your environment.
| Data size \ bandwidth | 50 Mbps | 100 Mbps | 200 Mbps | 500 Mbps | 1 Gbps | 10 Gbps |
| --- | --- | --- | --- | --- | --- | --- |
| 10 TB | 19.4 days | 9.7 days | 4.9 days | 1.9 days | 0.9 days | 0.1 days |
| 100 TB | 194.2 days | 97.1 days | 48.5 days | 19.4 days | 9.5 days | 0.9 days |
| 1 PB | 64.7 mo | 32.4 mo | 16.2 mo | 6.5 mo | 3.2 mo | 0.3 mo |
| 10 PB | 647.3 mo | 323.6 mo | 161.8 mo | 64.7 mo | 31.6 mo | 3.2 mo |
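The arithmetic behind this table is easy to reproduce for your own environment. The following is a minimal sketch (not from the article) that estimates copy duration from a data size and bandwidth limits, taking the effective throughput as the minimum of the source, network, and destination limits as described above; the binary (1024-based) unit conversion is an assumption, chosen because it reproduces the table's figures.

```python
def estimate_copy_duration_days(data_size_tb: float,
                                source_mbps: float,
                                network_mbps: float,
                                sink_mbps: float) -> float:
    """Estimate copy duration in days for a given data size and bandwidth limits."""
    # Effective throughput is the minimum of source, network, and sink limits.
    effective_mbps = min(source_mbps, network_mbps, sink_mbps)
    # Assumption: binary units (1 TB = 2**40 bytes, 1 Mbps = 2**20 bits/s),
    # which reproduces the figures in the table above.
    data_bits = data_size_tb * (2 ** 40) * 8
    seconds = data_bits / (effective_mbps * (2 ** 20))
    return seconds / 86400


# Example: 10 TB over a path limited to 50 Mbps -> ~19.4 days,
# matching the first row of the table.
print(f"{estimate_copy_duration_days(10, 1000, 50, 1000):.1f} days")
```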
ADF copy is scalable at different levels:
@@ -66,8 +75,8 @@ Take these steps to tune the performance of your Azure Data Factory service with
Copy activity should scale almost perfectly linearly as you increase the DIU setting. If doubling the DIU setting does not double the throughput, one of two things could be happening (a sketch of where the DIU setting is specified follows this list):
- The specific copy pattern you are running does not benefit from adding more DIUs. Even though you specified a larger DIU value, the actual number of DIUs used remained the same, and therefore you are getting the same throughput as before. If this is the case, go to step #3.
- By adding more DIUs (more horsepower) and thereby driving a higher rate of data extraction, transfer, and loading, either the source data store, the network in between, or the destination data store has reached its bottleneck and is possibly being throttled. If this is the case, contact your data store administrator or your network administrator to raise the upper limit, or alternatively, reduce the DIU setting until the throttling stops.
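For reference, the DIU and parallel-copy settings live on the copy activity itself. The following is a minimal, hypothetical sketch of a copy activity definition expressed as a Python dict (the dataset names and source/sink types are placeholders); `dataIntegrationUnits` and `parallelCopies` are the settings this section refers to.

```python
# Hypothetical copy activity definition shown as a Python dict; in practice this
# is the JSON in your pipeline definition. Dataset names and source/sink types
# below are placeholders for your own.
copy_activity = {
    "name": "CopyFromBlobToSql",
    "type": "Copy",
    "inputs": [{"referenceName": "SourceBlobDataset", "type": "DatasetReference"}],
    "outputs": [{"referenceName": "SinkSqlDataset", "type": "DatasetReference"}],
    "typeProperties": {
        "source": {"type": "BlobSource"},
        "sink": {"type": "SqlSink"},
        "dataIntegrationUnits": 32,  # raise this and re-test; check the run output
                                     # to see how many DIUs were actually used
        "parallelCopies": 8,         # optional: explicit degree of parallelism
    },
}
```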
**If the copy activity is being executed on a self-hosted Integration Runtime:**
@@ -77,8 +86,8 @@ Take these steps to tune the performance of your Azure Data Factory service with
If you would like to achieve higher throughput, you can either scale up or scale out the self-hosted IR:
- If the CPU and available memory on the self-hosted IR node are not fully utilized, but the execution of concurrent jobs is reaching the limit, you should scale up by increasing the number of concurrent jobs that can run on a node. See [here](create-self-hosted-integration-runtime.md#scale-up) for instructions.
- If, on the other hand, the CPU is high on the self-hosted IR node and available memory is low, you can add a new node to help scale out the load across multiple nodes. See [here](create-self-hosted-integration-runtime.md#high-availability-and-scalability) for instructions.
As you scale up or scale out the capacity of the self-hosted IR, repeat the performance test run to see whether throughput keeps improving. If throughput stops improving, most likely either the source data store, the network in between, or the destination data store has reached its bottleneck and is starting to be throttled. If this is the case, contact your data store administrator or your network administrator to raise the upper limit, or alternatively, go back to your previous scaling setting for the self-hosted IR.
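To compare these test runs side by side, the copy activity's run output (visible in the ADF monitoring view) reports the achieved throughput along with the resources that were actually used. Below is a minimal sketch, assuming you have saved each run's output JSON to a file; field names such as `throughput`, `copyDuration`, `usedDataIntegrationUnits`, and `usedParallelCopies` follow the copy activity run output, but treat them as illustrative if your output differs.

```python
import json

def summarize_run(path: str) -> None:
    """Print the headline metrics from one copy activity run's output JSON file."""
    with open(path) as f:
        output = json.load(f)
    print(f"{path}: throughput={output.get('throughput')} "
          f"copyDuration(s)={output.get('copyDuration')} "
          f"DIUs used={output.get('usedDataIntegrationUnits')} "
          f"parallel copies={output.get('usedParallelCopies')}")

# Hypothetical file names: one saved output per test run.
for run_file in ["run_before_scaling.json", "run_after_scaling.json"]:
    summarize_run(run_file)
```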
@@ -100,7 +109,7 @@ Take these steps to tune the performance of your Azure Data Factory service with
In this sample, during a copy run, Azure Data Factory notices that the sink Azure SQL Database has reached high DTU utilization, which slows down the write operations. The suggestion is to move the Azure SQL Database to a tier with more DTUs.

In addition, the following are some common considerations. A full description of performance diagnosis is beyond the scope of this article.