SQL Server 2008 Replication Technical Case Study
SQL Server 2008 Replication Technical Case Study
SQL Server 2008 Replication Technical Case Study
Wefound out sp_droparticle was not clearing the urReplPeelId attribute when a P2P publication with
conflict detection was dropped while the table was actively being subscribed to by a read-only
subscription.
There are two workarounds to consider:
Workaround #1: Before you drop P2P replication, use SQL Server Management Studio to set
the current P2P subscription option Allow P2P conflict detection to False, apply the
changes, and then drop the p2p replication. Next time you create the P2P replication, the
issue (MSSQLServer, Error number: 8144)does not occur.
Workaround #2:
1. Enable conflict detection again (use a new originator_id):
exec sp_configure_peerconflictdetection @publication = 'tranp2p', @action = 'enable',
@originator_id = 10, @continue_onconflict='true'
2. Disable the conflict:
exec sp_configure_peerconflictdetection @publication = 'tranp2p', @action = 'disable'
3. Drop and add your P2P publication again.
This is a bug that is expected to be fixed in Cumulative Update 4 for SQL Server 2008 SP1, which is
due in September of 2009.
5.1.8 Log Reader Agent Fails After Adding an Article to a Publication
with binary large object (BLOB) Data
The Log Reader may fail in the following situation:
1. One or more tables with BLOB data exist in a concurrent snapshot publication with an active
subscription.
2. You add a new article into this same publication and run the snapshot.
3. During the snapshot of the new article created in step 2, there is update activity to table with the
BLOB data in the existing publication.
4. When the Log Reader processes the newly generated concurrent snapshot for the new article, it will
try to generate reconcile commands for the active articles with BLOB data as well.
Page 25
The Log Reader will report the following error:
__________________________________________________________________________________
The process could not execute 'sp_replcmds' on 'HK-AGMDB-201'. (Source: MSSQL_REPL, Error
number: MSSQL_REPL20011)
Get help: http://help/MSSQL_REPL20011
Could not locate text information records for the column "Comment", ID 21 during command
construction. (Source: MSSQLServer, Error number: 18773)
Get help: http://help/18773
The Log-Scan Process failed to construct a replicated command from log sequence number (LSN)
{00023895:00009d11:0003}. Back up the publication database and contact Customer Support
Services. (Source: MSSQLServer, Error number: 18805)
Get help: http://help/18805
The process could not execute 'sp_replcmds' on 'HK-AGMDB-201'. (Source: MSSQL_REPL, Error
number: MSSQL_REPL22037)
__________________________________________________________________________________
The reason that this error occurs is that when the Log Reader is reconciling the commands, some of
the off-row data may not be available. In order to solve the issue and get replication up and running,
you can try removing the article that creates the issue or modify replication configuration.
To remove the article:
1. Determine the article that is causing the problem. This can be done by taking the column
referenced in the error message and searching the INFORMATION_SCHEMA views for a column
with the same name that also has a BLOB data type.
2. Remove this article from the publication.
3. 3) Stop all activity, and then add the article back into publication. If activity is not stopped, it is
possible that you will encounter the same error on another article with BLOB data.
To modify replication configuration, use one of the following:
Create new publications for any new articles. Do not add new articles to existing publications.
Use the database snapshot sync_method. This is available only in Enterprise Edition.
Database snapshot is preferred over the native sync method because table locks are avoided,
and it is preferred over the concurrent method due to the issue described above. The database
snapshot sync method is not available as an option through the UI. This option is only available
through the replication stored procedures delivery method.
SQL Server Books Online provides further information about the @sync method argument of
sp_addpublication.
Page 26
Figure 6
5.2 System Testing
After it was verified that all issues resulting from Unit Testing were resolved, applications were working
as designed, and replication encountered no issues, the next step was to test the applications as they
would behave in a production system.
It was agreed by the customer that they would allocate resources to help test the applications and to
migrate production jobs to the new server with the same schedules as the production server.
After each iteration of system testing, the following validations were performed:
1) Application able to work properly with no errors.
2) Row counts on all nodes and all subscribers were the same.
3) If row counts were not the same, the tablediff utility was run between the Publisher and
Subscriber to determine discrepancies.
4) Conflict Detection Report from Replication Conflict Viewer was reviewed.
5) Replication Monitor was used to determine whether errors had occurred.
6) Replication Latency was measured.
System testing allowed the team to discover a very interesting trend. The team noticed that there was
substantial delay in the replication of key monetary transactions. From a unit testing perspective, there
was no latency. Transactions were replicated to the various nodes within 10 seconds. However, during
system testing, latency approached a couple of hours.
Through investigation, it was determined that a specific application that performed daily bulk uploads of
textual data, which resulted in millions of replication transactions and commands, was causing backlog.
These test results were reviewed with the customer and a conclusion was made that this delay was
unacceptable. In order to resolve this issue, the project team had to analyse the published tables and
determine whether there was a way to prevent this backlog. After investigation, the project team
decided on creating two separate P2P publications. One publication comprised the tables that were
associated to reference data that was textual in nature, and the other publication comprised
transactional data that was financial in nature. By creating two publications, a new set of distribution
agents was created, which distributed transactions for a specific publication. If there is backlog in one
set, the other set can still continue. This approach was tested and the latency associated to monetary
transactions was eliminated.
Page 27
The following figure shows the modified replication topology.
Asia Core 1
Asia Core 2
U.S. Core 1
U.S Core 2
P2P Reference
P2P
Financial
P2P
Reference
P2P Financial
P2P
Reference
P2P
Financial
Asia Web Asia Web Asia Web
Web Publication
DW
Read Only
Read Only Publication
U.S. Web U.S. Web U.S. Web
Web Publication
P2P
Reference
P2P
Financial
P2P Financial
P2P
Financial
P2P
Reference
P2P Reference
Figure 7
5.3 Performance Testing
After it has been verified that everything is working as designed and is meeting customer functional
requirements, the next step is to performance-test the topology to make sure that all bottlenecks have
been identified and the topology is configured optimally.
Performance testing allowed the team to understand the effect of different replication configurations
based on a controlled dataset. Each test was run multiple times and the results averaged together.
Results were logged and performance improvements/degradation calculated.
5.3.1 Testing Scenarios
Replication is highly configurable. Although data may be replicated successfully, different
configurations may yield better performance. The following test scenarios configurations were executed
as part of the performance testing.
Page 28
Configuration Primary Purpose Test
Push vs. Pull To determine the performance
advantage/disadvantages of push
vs. pull subscriptions
Configure distribution agent to be
push for one set of tests and pull
for another set of tests on the
same dataset.
Windows Server 2003 with SQL
Server 2005 vs. Windows
Server 2008 with SQL Server
2008
To determine whether the
improved TCP/IP stack on
Windows Server 2008 results in
performance gains
Set up a SQL Server 2005
instance on Windows Server 2003
and a separate SQL Server 2008
instance on Windows Server
2008. These instances would then
subscribe to the same publication.
Subscription Streams To determine whether parallel
subscription streams result in
performance gains
Modify the Distribution Agent
startup parameters to include the -
SubscriptionStreams command-
line parameter.
WAN Acceleration To determine how much WAN
Acceleration improves replication
performance
Turn off WAN Acceleration and
run test and then turn on WAN
Acceleration for the exact same
data set.
Multi-Master Peer Distribution To determine the effect on
replication performance if one of
the master peers are down and
one peer needs to handle all of the
distribution
Stop all distribution agents on
Asia Core 2 so that Asia Core 1
needs to distribute data to all
Subscribers.
Table 2
Each one of the scenarios was tested using the same set of transactions. The first transaction
consisted of a set DELETE statement that removed 347,737 transactions from the database. After the
DELETE was performed, the same 347,737 rows were then re-inserted using a customer application. It
is also important to note that the data that was being deleted/inserted included BLOBs that were stored
in a nvarchar(max) column in SQL Server.
5.3.2 Effect of Push Subscription vs. Pull Subscription
A subscription is a request for a copy of the data and database objects in a publication. A subscription
defines which publication will be received, and where and when it will be received. When planning for
subscriptions, consider where you want agent processing to occur. The type of subscription you
choose controls where the agent runs. With a push subscription, the Distribution Agent runs at the
Distributor, whereas with a pull subscription, agents run at the Subscriber. After a subscription is
created, it cannot be changed from push to pull without dropping the existing push subscription and
creating a new pull subscription.
Page 29
Distribution Agent Architecture
Reader Thread
The reader thread reads the commands from the MSrepl_commands table in the distribution
database by using the OLE DB interface. The entire set of rows in the table is read in one batch. For
example, if there are 500,000 pending changes in the MSrepl_commands table, the reader thread will
query all 500,000 rows and pass on the result set.
Command Buffer
The result set from the reader thread is placed into a 40-KB command buffer. There are two such
command buffers. When the first one is full, the writer thread is signaled and the changes in the
command buffer are applied to the destination. Concurrently, the second command buffer is filled with
the remaining results and waits for the writer thread to consume the changes. This enables the writer
thread to be fully active in the case of a high-transaction-volume system.
Writer Thread
The writer thread is responsible for reading the changes from a command buffer and applying it to the
Subscriber. As soon as all the changes from the command buffer are consumed, the writer releases
the buffer and moves on to read changes from the second command buffer, if needed. Because the
buffer size is 40 KB, the size of the data pushed each time over the network is constrained to 40 KB.
Figure 8
With a push subscription, the Publisher propagates changes to a Subscriber without a request from the
Subscriber. Changes can be pushed to the Subscriber on demand, continuously, or on a scheduled
basis. The Distribution Agent runs at the Distributor in a push subscription.
Page 30
With a pull subscription, the Subscriber requests changes made at the Publisher. Pull subscriptions
allow the user at the Subscriber to determine when the data changes are synchronized. The
Distribution Agent runs at the Subscriber in a pull subscription.
Test Results
Push (A) Pull (B) Performance
Improvement
((A-B)/B)*100
Server Asia Secondary Site
Server 1
Asia Secondary Site
Server 1
Operating System Windows Server 2008 Windows Server 2008
SQL Server Version SQL Server 2008
SP1/CU1
SQL Server 2008
SP1/CU1
Subscription Type Push Pull
Delivery Type Stored procedure Stored procedure
Total Commands 695,477 695,477
Delivery Total Work Time
(minutes)
142.23 106.19 33.93%
Delivery Tran/Sec 40.75 78.4
Delivery Cmds/Sec 40.75 78.4
Server Asia Secondary Site
Server 1
Asia Secondary Site
Server 1
Operating System Windows Server 2008 Windows Server 2008
SQL Server Version SQL Server 2008
SP1/CU1
SQL Server 2008
SP1/CU1
Subscription Type Push Pull
Delivery Type SQL SQL
Total Commands 695,477 695,477
Delivery Total Work Time
(minutes)
330.5 211.8 56.04%
Delivery Tran/Sec 17.53 30.21
Delivery Cmds/Sec 17.53 30.21
Table 3
Page 31
These tests expose two critical facts:
Pull subscriptions are significantly faster than push subscriptions. This is primarily due to the
architecture of the distributor agent.
Stored Procedure delivery format is significantly faster than SQL delivery format. This will be
explained in the next section.
5.3.3 Effect of Stored Procedure Delivery vs. SQL Delivery
Replication enables you to configure the format in which commands are delivered. The default format is
stored procedure delivery format. This is the most efficient delivery format. In addition to stored procedure
delivery format, replication can be configured to call a customized stored procedure, SQL statements
without column lists, and SQL statements with column lists.
Due to a P2P conflict detection bug with read-only subscriptions, SQL delivery format had to be utilized
for the read-only subscribers, because it is not possible to subscribe to partial publications. A new
publication had to be created with the complete set of articles to be replicated to the read-only
subscribers using the SQL delivery format. When SQL delivery format is used, the actual DML statement
is constructed and sent to the distribution database. It was not recommended to use customized stored
procedures due to supportability as well as any unforeseen upgrade issues in the future.
With SQL Server 2008, it is possible to compress the size of the packet that goes over the network by
passing in parameters instead of string literals. Parameters are the default for stored procedure delivery
format, but string literals are the default for SQL delivery format. In order to pass in parameters instead of
string literals, the following commands must be run for every article in the publication that uses SQL
delivery format:
http://msdn.microsoft.com/en-us/library/ms175980(SQL.90).aspx
An example of the command that needs to be executed for each article in the publication is:
EXEC sp_changearticle @publication = '<publication>', @article = '<article>', @property = 'status',
@value='parameters'
Test Results
SQL Delivery Format (A) Stored Procedure
Delivery Format (B)
Performance
Improvement
((A-B)/B)*100
Server Asia Secondary Site
Server 1
Asia Secondary Site
Server 1
Operating System Windows Server 2008 Windows Server 2008
SQL Version SQL Server 2008
SP1/CU1
SQL Server 2008
SP1/CU1
Subscription Type Push Push
Delivery Type SQL Stored Procedure
Page 32
Total Commands 695,477 695,477
Delivery Total Work Time
(minutes)
330.5 142.23 132.37%
Total Tran/Sec 17.53 40.75
Total Cmds/Sec 17.53 40.75
Table 4
Based on these tests, you can see the performance degradation associated with the SQL delivery format
across a slow WAN link. This is primarily due to the size of the packets that are going across the network.
In order to get around this problem, a couple of workarounds were considered.
1. Create a separate publication specifically for the Web-only publication.
Because the Web publication is a subset of articles of both P2P publications, it was necessary to
create a separate read-only publication. The P2P bug manifests itself when a read-only
publication is created on top of a table article already published as part of a P2P publication.
Because it is not possible to subscribe to partial P2P publications, one of the options was to
create a separate P2P publication for the Web publication. The problem with this approach is that
the transactional consistency may be compromised, because related tables may be part of
separate publications. Each publication and each subsequent subscription to the publication will
create a Distribution Agent job. These Distribution Agent jobs are independent of each other and
may or may not run on different schedules. Because the agents are independent, there is no
guarantee that all the transactions will be distributed in the right order if the publications are
separated into three separate publications. For this reason, option 1 was eliminated.
2. Create a read-only subscription to the P2P publication.
This option builds off of the currently existing P2P publications. Instead of using the Configure
P2P Topology Wizard, one of the options is to create a read-only subscription by right clicking on
the publication and then clicking Create Subscription. This enables you to create a one way
subscription to the P2P publication. Changes from the Subscriber will not flow back to the
Publisher.
In order to use this topology, the full set of tables must exist in the Subscriber. This is because it
is not possible to subscribe to partial publications. This prerequisite was discussed with the
customer and it was determined that the data warehouse and read-only Subscriber would be able
to have all tables in the subscriber databases, but this was not possible in the Web databases
due to security issues and concerns.
The performance degradation was only associated to Subscribers that were connected to the
WAN. These Subscribers were the data warehouse and the read-only Subscriber. The Web
databases were not affected because they were on the same subnet. Because the data
warehouse and the read-only Subscribers were able to subscribe to the full set of articles in the
P2P publications, it was decided to create a read-only subscription to the P2P publications for
these two Subscribers. The Web databases, because they were not allowed to receive the full set
of articles, would continue to use their own separate Web publication with the SQL delivery
format.
Page 33
The following figure illustrates the new topology.
Asia Core 1
Asia Core 2
U.S. Core 1
U.S. Core 2
P2P Reference
P2P
Financial
P2P
Reference
P2P Financial
P2P
Reference
P2P
Financial
Asia Web Asia Web Asia Web
Web Publication
DW
Read Only
U.S. Web U.S. Web U.S. Web
Web Publication
P2P
Reference
P2P
Financial
P2P Financial
P2P
Financial
P2P
Reference
P2P Reference
P2P
Financial
P2P
Reference
P2P
Financial
P2P
Reference
Figure 9
5.3.4 Effect of Improved Windows Server 2008 TCP/IP Stack
Windows Server 2008 provides an improved TCP/IP stack. Internal tests have proven that SQL Server
2008 running on Windows Server 2008 result in substantial performance improvements over SQL Server
2005 running on Windows Server 2003. For more information, see Geo-Replication Performance Gains
with Microsoft SQL Server 2008 Running on Windows Server 2008 (http://msdn.microsoft.com/en-
us/library/dd263442.aspx).
Per the MSDN Case Study, the main features contributing to improvement in the end-to-end
performance of SQL Server 2008 running on Windows Server 2008 over Windows Server 2003 are:
Receive Window autotuning
Send Buffer scaling
Compound TCP
Due to the results from internal testing, it was worthwhile to perform similar tests on the customer
replication topology with actual data to see if the same performance improvements would be able to be
achieved. The test results are listed in the following table.
Page 34
Test Results
SQL Server 2005 (A) SQL Server 2008 (B) Performance
Improvement
((A-B)/B)*100
Server Asia Secondary Site
Server 1
Asia Secondary Site
Server 2
Operating System Windows Server 2003 Windows Server 2008
Subscription Type SQL SQL
Transaction Type Single Transaction Deleting
Many Records
Single Transaction Deleting
Many Records
Total Commands 347,737 347,737
Delivery Total Work Time
(minutes)
73.48 70.83 3.74%
Total Tran/Sec 0 0
Total Cmds/Sec 78.87 81.81
Server Asia Secondary Site
Server 1
Asia Secondary Site
Server 2
Operating System Windows Server 2003 Windows Server 2008
Subscription Type SQL SQL
Transaction Type 1 Transaction / 1
Command
1 Transaction / 1
Command
Total Commands 347,737 347,737
Delivery Total Work Time
(minutes)
222.98 215.68 6.63%
Total Tran/Sec 25.99 26.87
Total Cmds/Sec 25.99 26.87
Table 5
Although performance gains were achieved, the performance gains were not significant. Greater
improvement was seen with the utilization of a WAN Accelerator between the Asia Secondary Site and
the Asia Primary Site. For more information, see WAN Acceleration section (5.3.5).
Page 35
5.3.5 Effect of WAN Accelerator
All three data centers are connected via a WAN. In between each data center is a third-party WAN
Accelerator appliance that is used to optimize throughput as much as possible. WAN Accelerators employ
data reduction, compression, and quality of service techniques such as priority to improve WAN
efficiency.
It was the request of the customer to test the effectiveness of the WAN Accelerator in between their three
data centers. The results of the tests are detailed in the following table.
Test Results
WAN Accelerator Off (A) WAN Accelerator On (B) Performance
Improvement
((A-B)/B)*100
Server Asia Secondary Site
Server 1
Asia Secondary Site
Server 1
Operating System Windows Server 2008 Windows Server 2008
SQL Server Version SQL Server 2008
SP1/CU1
SQL Server 2008
SP1/CU1
Subscription Type Pull Pull
Delivery Type SQL SQL
Transaction Type Single Transaction
Deleting Many Records
Single Transaction
Deleting Many Records
Total Commands 347,737 347,737
Delivery Total Work Time
(minutes)
70.83 20 254.15%
Total Tran/Sec 0 0
Total Cmds/Sec 81.81 289.78
Server Asia Secondary Site
Server 1
Asia Secondary Site
Server 1
Operating System Windows Server 2008 Windows Server 2008
SQL Server Version SQL Server 2008
SP1/CU1
SQL Server 2008
SP1/CU1
Subscription Type Pull Pull
Delivery Type SQL SQL
Transaction Type 1 Transaction / 1
Command
1 Transaction / 1
Command
Page 36
Total Commands 347,737 347,737
Delivery Total Work Time
(minutes)
215.68 123.28 74.95%
Total Tran/Sec 0 0
Total Cmds/Sec 26.87 47.01
Table 6
As can be seen from these tests, the use of a WAN Accelerator in a replication topology that
communicates via slow WAN speeds can greatly improve performance.
From a WAN perspective, SQL replication is very similar to storage replication. As a result, WAN
acceleration provides similar benefits. For example, by implementing data reduction, compression,
latency mitigation and loss correction, WAN acceleration results in faster replication times, better WAN
bandwidth utilization, and faster host server performance - all across wider geographic distances.
5.3.6 Effect of Subscription Streams
The -SubscriptionStreams parameter can greatly improve aggregate replication throughput. It allows
multiple connections to a Subscriber to apply batches of changes in parallel, while maintaining many of
the transactional characteristics present when using a single thread. If one of the connections fails to
execute or commit, all connections will abort the current batch, and the agent will use a single stream to
retry the failed batches. Before this retry phase completes, there can be temporary transactional
inconsistencies at the Subscriber. After the failed batches are successfully committed, the Subscriber is
brought back to a state of transactional consistency.
Publisher
Asia Core 1 Publisher
Subscriber Subscriber
No Subscription Streams
Subscription Streams
(3 Threads)
Figure 10
Page 37
Test Results
No Streams (A) Streams (B) Performance
Improvement ((A-
B)/B)*100
Server Asia Secondary Site
Server 1
Asia Secondary Site
Server 1
Operating System Windows Server 2008 Windows Server 2008
SQL Server Version SQL Server 2008
SP1/CU1
SQL Server 2008
SP1/CU1
Subscription Type Pull Pull
Delivery Type Stored Procedure Stored Procedure
Subscription Streams 0 4
Total Commands 695,477 695,477
Delivery Total Work Time
(minutes)
106.19 98.8 7.47%
Delivery Tran/Sec 78.4 58.66
Delivery Cmds/Sec 78.4 117.32
Server Asia Secondary Site
Server 1
Asia Secondary Site
Server 1
Operating System Windows Server 2008 Windows Server 2008
SQL Server Version SQL Server 2008
SP1/CU1
SQL Server 2008
SP1/CU1
Subscription Type Pull Pull
Delivery Type Stored Procedure Stored Procedure
Subscription Streams 0 2
Total Commands 695,477 695,477
Delivery Total Work Time
(minutes)
106.19 90.8 16.95%
Delivery Tran/Sec 78.4 63.82
Delivery Cmd/Sec 78.4 127.64
Table 7
Page 38
These tests show that adding too many streams doesnt necessarily result in linear performance
improvements. Testing must be done to determine the optimal number of subscription streams. In
addition, it was discovered that the SQL delivery format cannot utilize the subscription streams feature of
SQL Server 2008. This has been confirmed by the SQL Server Product Team and is due to SQL
deliverys inability to perform rowcount checks. Stored procedure delivery format is able to perform this
rowcount check.
5.3.7 Effect of Removing a Node
The purpose of this test was to determine what the impact of removing a node from the P2P publication
topology would have on replication performance. P2P topology will attempt to distribute transactions in
the most efficient matter.
For example, in the below topology, with transactions originating from Asia Core 1, P2P replication will
determine to find the most efficient way to distribute transactions.
Asia Core 1
Asia Core 2
U.S. Core 1
U.S Core 2
Path 1
Path 1
Path 2
Path 2
Path 3
Path 1
Figure 11
The first set of transactions will attempt to be replicated via Path 1. While Path 1 transactions are being
replicated to United States Core 1 and United States Core 2, the next set of transactions will be
distributed to Asia Core 2, because the transaction delivery is more efficient due to the Asia databases
being on the same subnet. After Asia Core 2 database receives the next set of transactions, it will attempt
to deliver the next batch to United States Core 1 and United States Core 2, but still maintaining
transactional consistency. It is able to do this because each node keeps track of the original publisher
LSN and performs a check on this peer LSN before applying a set of transactions.
In the scenario where Asia Core 2 were down for maintenance or disaster, it was important to understand
whether there would be any performance degradation if a node was moved from the Asia subnet. This
was done by simply removing the subscriptions between Asia Server 2 and United States Server 1 and
Page 39
Asia Server 2 and United States Server 2. By removing the subscriptions on Asia Server 1, it was
possible to simulate server downtime.
The following figure illustrates the tested topology.
Tested Topology
Asia Core 1 U.S. Core 1
U.S. Core 2
P2P Reference
P2P
Financial
P2P
Financial
DW
Read Only
U.S. Web U.S. Web U.S. Web
Web Publication
P2P
Reference
P2P Financial
P2P
Reference
P2P
Financial
P2P
Reference
P2P
Financial
P2P
Reference
Asia Core 2
P2P
Financial
P2P
Reference
Figure 12
Test Results
Four-Node Topology (A)
Three-Node Topology (B)
Performance
Improvement
((A-B)/B)*100
Server Asia Server 2 Asia Server 2
Operating System Windows Server 2008 Windows Server 2008
SQL Server Version SQL Server 2008
SP1/CU1
SQL Server 2008
Subscription Type Push Push
Delivery Type Stored Procedure Stored Procedure
Total Commands 347,737 347,737
Delivery Total Work Time
(minutes)
2.21 36.5 -93.94%
Delivery Tran/Sec 2610.7 158.84
Page 40
Delivery Cmds/Sec 2610.7 158.84
Server United States Server 1 United States Server 1
Operating System Windows Server 2008 Windows Server 2008
SQL Version SQL Server 2008
SP1/CU1
SQL Server 2008
SP1/CU1
Subscription Type Push Push
Delivery Type Stored Procedure Stored Procedure
Total Commands 347,737 347,737
Delivery Work Time
(minutes)
49.5 174.76 -71.67%
Delivery Tran/Sec 1.19 33.16
Delivery Cmds/Sec 116.87 33.16
Server United States Server 2 United States Server 2
Operating System Windows Server 2008 Windows Server 2008
SQL Version SQL Server 2008
SP1/CU1
SQL Server 2008
SP1/CU1
Subscription Type Push Push
Delivery Type Stored Procedure Stored Procedure
Total Commands 347,737 347,737 -71.68%
Delivery Work Time
(minutes)
49.5 174.80
Delivery Tran/Sec 1.19 33.15
Delivery Cmds/Sec 116.71 32.91
Table 8
These tests conclude that having a separate peer on the same subnet will significantly improve the
performance of P2P replication. This is due to the offloading of P2P commands to another master while
the original publisher continues to distribute transactions to other subscribers. P2P replication is able to
keep transactional consistency because replication keeps track of the original publisher database as well
as its associated LSN.
Page 41
Due to the results, it was decided to further distribute the replication to prevent one single node from
being responsible for the distribution of transactions. This resulted in the final replication topology:
Final Topology
Asia Core 1
Asia Core 2
U.S. Core 1
U.S Core 2
P2P Reference
P2P
Financial
P2P
Reference
P2P Financial
P2P
Reference
P2P
Financial
Asia Web
DW
Read Only
U.S. Web U.S. Web U.S. Web
Web Publication
P2P
Reference
P2P
Financial
P2P Financial
P2P
Financial
P2P
Reference
P2P Reference
P2P
Financial
P2P
Reference
Asia Web Asia Web
P2P
Financial
P2P
Reference
Asia Web Asia Web
Web Publication
Web Publication
U.S. Web U.S. Web
Web
Publication
Figure 13
Page 42
6 DEPLOYMENT
The final phase of the governance model is the Deploy Phase. The goals of this phase are to place the
solution into production at the designated environments and to facilitate a smooth transfer of solution from
project team to the operations team as soon as possible.
6.1 Go-Live Planning
Setting up replication in a controlled environment is not difficult. Connected users, back-office
applications, general database availability, and most importantly, customer-facing applications are not
critical when setting up a test/development environment. Down time is not an issue in a test/development
issue. However, application/database downtime is critical when it is time to go live with P2P.
In order to limit downtime, the go-live process needs to be thoroughly analysed. It is critical to work with
the main stakeholders of the database to understand acceptable downtime. It is important to understand
how long it takes for specific go-live tasks to complete. Full/Differential/Transaction Log database
backups and restores, database upgrades, publication/subscription creation, snapshot delivery, and the
FTP of database backups to remote servers are just a few of the many tasks that need to be analysed to
understand the amount of time necessary to complete all go-live tasks. All of these items will dictate how,
what, when, where, why, and in what order these tasks will be performed when migrating to production.
Extensive downtime was not an option. The steps necessary to perform the upgrade, the transfer of
database backups, and replication setup took well over a day due to various infrastructure challenges
associated to the transfer of database backups from the Asia-based data center to the U.S.-based data
center. In an ideal situation, a downtime of a day would have been the suggested configuration, but the
nature of the business did not allow for that.
In order to minimize downtime, we analysed ways to limit downtime yet still provide the stability and time
necessary to complete the migration steps. After discussing all of the options with the customer, the
option to reduce the amount of downtime consisted of the creation of a read-only transactional replication
publication//subscription from their existing SQL Server 2005 database to the SQL Server 2008 database.
The following table lists the final migration time and application availability.
Step Process Application Down Downtime (min)
1 Upgrade SQL Server 2005 to
SQL Server 2008 via
Database Restore script
No N/A
2 Turn off write applications Yes 5
3 Take differential backup Yes 5
4 Create publication Yes 1
5 Create Subscription Yes 1
6 Turn on write apps No N/A
7 Restore differential backup in
SQL Server 2008
No N/A
8 Perform full database
backup on SQL Server 2008
No N/A
Page 43
9 Transfer database to remote
servers
No N/A
10 Setup Replication Topology No N/A
11 Synchronize Replication No N/A
12 Start Replication Agent on
SQL Server 2005
No N/A
Table 9
As illustrated in the table, total application/database downtime consisted of 12 minutes. Furthermore,
these steps were performed during nonstandard work hours, which further reduced the potential
downtime impact. The migration methodology also provided the customer more liberty to migrate their
applications to the new server as data was constantly being synchronized with the transactions
originating from the SQL Server 2005 database. After the customer was ready to cut over to the new
system, the only steps necessary were to stop the replication agent on SQL Server 2005 and then repoint
applications to the SQL Server 2008 instance.
6.2 Go-Live Testing
Testing of the go-live process is just as important as the go-live planning. After the go-live plan is created,
it is important to script out those tasks and run them through as many times as possible to obtain an
understanding of what issues may occur and how long the migration will take.
With proper testing, the actual go-live process should be less risky, prescriptive, and more predictive in
nature.
Page 44
7 CONCLUSIONS
P2P replication is a mature technology, and it is enterprise ready. It is currently being utilized inside
Microsoft and many other global enterprises throughout the world.
The main purpose of this document was to prove that the replication topology would be able to be
replicated to the peer servers within the expected timeframes and to help understand the issues that may
surface as part of the implementation. In addition, the performance tests allowed the team to understand
the effect of different replication configurations based on a controlled dataset.
Replication in general is not difficult to implement. However, properly designing, testing, and optimizing
replication is iterative, and adequate time must be allocated to ensure that all components are extensively
tested.