The Advantages of Microsoft Merge Replication for Mobile and Distributed Applications

Filename: SQLServer2005_MergeComparative.doc1

The Advantages of Microsoft Merge Replication for Mobile and Distributed Applications

SQL Server Technical Article

Writers: Michael Blythe

Technical Reviewer: Philip Vaughn; Vijay Tandra Sistla

Project Editor: Jeannine Nelson-Takaki

Published:February 3 2006

Updated:

Applies To: SQL Server 2005

Summary:This white paper summarizes the advantages of using Microsoft SQL Server merge replicationin mobile and distributed environments.This paper also addresses specific issues raised by a competitor, Progress Software, in a paper titled "Comparing Microsoft SQL Server Replication and DataXtend RE for Mobile and Distributed Applications."

Copyright

White Papers Do NOT Permit Modification and/or Reproduction

The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.

This White Paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

Unless otherwise noted, the example companies, organizations, products, domain names, e-mail addresses, logos, people, places and events depicted herein are fictitious, and no association with any real company, organization, product, domain name, e-mail address, logo, person, place or event is intended or should be inferred.

Microsoft and SQL Server 2005are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.

The names of actual companies and products mentioned herein may be the trademarks of their respective owners.

Filename: SQLServer2005_MergeComparative.doc1

The Advantages of Microsoft Merge Replication for Mobile and Distributed Applications

Table of Contents

Introduction

Supported Replication Topologies

Merge Replication Configuration

Configuration

Encryption

Snapshot Storage and Transfer

Selecting Data for Replication

Synchronizing Data

Capturing Changes in the Database

Maintaining Ongoing Synchronization

Managing Changes to the Replication Network

Support for Complex Architectures

Support for Multiple Database Types

Support for Complex, High-Volume Database Applications

Conclusion

The Advantages of Microsoft Merge Replication for Mobile and Distributed Applications1

Introduction

Microsoft SQL Server 2005 provides three types of replication, each with different capabilities: merge replication, transactional replication, and snapshot replication. This paper principally discusses merge replication, a synchronization technology that lets database applications work while disconnected from the network. Originally released in SQL Server 7.0, merge replication is primarily designed for mobile applications and distributed server applications that have possible data conflicts. The most common uses of merge replication include:

Exchanging data with mobile users: Many applications require data to be available to remote users, including sales people, delivery drivers, and so on. These applications include customer relationship management (CRM), sales force automation (SFA), and field force automation (FFA) applications.
Consumer point of sale (POS) applications: POS applications, such as checkout terminals, typically require data to be replicated from remote sites to a central site.
Integrating data from multiple sites: Applications frequently integrate data from multiple sites. For example an application that supports regional offices might require data to flow between regional offices and a central office.

This paper introducesreplication toplogies, and discusses configuration and maintenance,data synchronization, and support for complex architectures, with occasional comparisons to competing products.

Supported Replication Topologies

A replication topology defines the relationship between servers and copies of data and clarifies the logic that determines how data flows between servers. SQL Server replication supports numeroustypes of topologies.

Merge replication supports both the hub-and-spoke and republishing topologies. The decision to promote the hub-and-spoke topology in merge replication, instead of the peer-to-peer topology used by competitors such as Progress Software, is driven by the following design considerations:

The hub-and-spoke topology creates a clear schema owner (the Publisher), which allows SQL Server to support seamless replication of schema changes. All schema changes are made at the Publisher, and they flow to each Subscriber during synchronization.
The topology enables SQL Server to support consistent conflict resolution schemes in which you can specify, for example, that changes at the Publisher should overwrite changes at any Subscriber if a conflict occurs.

A republishing topology is essentially an extension of a hub-and-spoke topology, with one or more Subscribers also acting as Publishers that replicate data to other Subscibers in a hierarchy.

SQL Server replication also supports a ring or peer-to-peer topology with peer-to-peer transactional replication (available in SQL Server 2005 Enterprise Edition). Peer-to-peer transactional replication is ideal for the situations Progress Software describes: "load balancing, failover, and back-of up databases at the hub level."For more information, see "Selecting the Appropriate Type of Replication" and "Peer-to-Peer Transactional Replication" in SQL Server Books Online.

Merge Replication Configuration

In this section, we discuss a number of areas related to configuring replication: available tools; encryption; snapshot transfer and storage; and selecting data for replication.In each case we show the flexibility and ease of use that merge replication provide.

Configuration

Microsoft provides access to replication features through a combination of a rich user interface, wizards to jumpstart replication setup, and APIs to extend and automate development. The APIs include

Replication Management Objects (RMO), a managed code application programming interface (API)
Transact-SQL stored procedures

Replication is fully integrated with SQL Server Management Studio. This integration means that you can configure and monitor your replication topology by using tools you are already familiar with, without having to leave SQL Server Management Studio. This integrated approach provides flexibility and power to customize configurations and gives you more security options than would be possible with a one-size-fits-all approach.

Wizards provide the easiest way to create publications and subscriptions. Because merge replication is frequently used in mobile scenarios in which synchronization is controlled programmatically by an application on the device, you might use the New Publication Wizard to create the publication, and use RMO to create subscriptions and synchronize each Subscriber. This allows applications to be deployed easily without requiring an administrator to directly configure each device and without requiring device usersto run a configuration wizard. For more information, see "Implementing Replication" and "Replication Samples" in SQL Server Books Online.

SQL Server provides similar flexibility for configuring replication security, enabling fine-grained control over the accounts under which replication agents run and make connections. Each agent can run under a different Windows user and can make connections using Windows Authentication or SQL Server Authentication. If all computers involved in replication are in the same domain or in domains that trust each other, we recommend that you use Windows domain users, but this is not required. For example, if you configure replication outside a Windows domain, you can specify local Windows users on each computer and still use Windows Authentication or SQL Server Authentication for agent connections. For more information, see "Replication Agent Security Model" and "Replication Security Best Practices" in SQL Server Books Online.

Not all replication configuration steps are automated, and the services of a network adminstrator might be required for some scenarios, such as using FTP to transfer snapshot files. However, the network requirements for SQL Server replication are no more complex than should be expected from any distributed application. Sometimes a network administrator might have to provide assistance, but this is not unusual for distributed applications.

Encryption

If encryption is required, several options are available, depending on the transfer protocol that is used to send the replicated data. Replication does not encrypt data stored in tables, but encryption is available at the transport level by usingseveral technologies, including the following industry standard technologies: Virtual Private Networks (VPN), Secure Sockets Layer (SSL), and Internet Protocol Security (IPSEC). We recommend that you use one of these encryption methods for the connections between computers in a replication topology.

For more information about using encryption with replication, see the following topics in SQL Server Books Online:

For information about encrypting connections, see "Encrypting Connections to SQL Server."
For information about using VPN and SSL for replicating data over the Internet, see "Securing Replication Over the Internet."
For information about using database encryption with replication, see "Elements of Replication Security."

Snapshot Storage and Transfer

Merge replication is implemented by using the Snapshot Agent and the Merge Agent. If the publication is unfiltered or uses static filters, the Snapshot Agent creates a single snapshot. If the publication uses parameterized filters, the Snapshot Agent creates a snapshot for each partition of data.The Merge Agent applies the initial snapshots to the Subscribers. It also merges incremental data changes that occurred at the Publisher or Subscribers after the initial snapshot was created, and detects and resolves any conflicts according to rules that you specify.

We recommend that you use a UNC share to store snapshots. In merge replication deployments,snapshots are typically of a manageable size for two reasons: the subscription database is frequently on a device with limited storage capacity; and the Subscriber usually requires a partition of the data available at the Publisher, instead ofthe whole data set.

For cases in which the snapshot is too large to transfer over the available network connection, you can initialize a subscription using an alternative method such as backup and restore, or you can transfer a snapshot on removable media. For more information, see "Initializing a Merge Subscription Without a Snapshot" and "Alternate Snapshot Folder Locations" in SQL Server Books Online.

Contrary to the competitor’s claim that merge replication snapshots are too large to copy over UNC paths, and must be sent by using FTP, we recommend FTP only for cases in which replication agentscannot make connections to a UNC share.We do not recommend "…an alternative transfer protocol, [for which] one would have to set up separate encrypted FTP servers on each site…."

However, SQL Server 2005 provides the option for snapshot transfer through FTP for those cases in which replication agents cannot make connections to a UNC share. Typically this occurs when replicating outside a domain. After an FTP server is configured, snapshot transfer can be configured quickly and easily in the Publication Properties dialog box, and does not require "numerous, manual configuration steps on each of the involved hosts." In general, the use of FTP should be based on your application requirements and has no relation to snapshot size. For more information, see "Transferring Snapshots Through FTP" in SQL Server Books Online.

Selecting Data for Replication

Merge replication lets you publish an entire database or a subset of the tables in a database. When you publish a table, you can apply filters so that only a subset of the data from the table is replicated. Merge replication provides the following filtering options to support your application requirements:

Parameterized row filters, which use a data value supplied by a Subscriber to send Subscribers different data sets (referred to as "dynamic filters" in SQL Server 2000 Books Online). For more information, see "Parameterized Row Filters" in SQL Server Books Online.
Join filters, which extend a row filter from one published table to another. For more information, see "Join Filters" in SQL Server Books Online.
Static row filters, which specify a data set that all Subscribers to a publication receive. For more information, see "Filtering Published Data" in SQL Server Books Online.
Column filters, which specify a subset of columns to be published. For more information, see "Filtering Published Data" in SQL Server Books Online.

As with other replication features, filtering can be configured using SQL Server Management Studio, RMO, or Transact-SQL stored procedures.

Regardless of the type of filtering used, by default, merge replication processes all changes to data on a row-by-row basis, and treats as a conflict any change made to the same row at more than one node. To borrow the example of an address from the Progress Software paper, merge replication would treat as a conflict any change made to the City, State, or ZipCode columns at more than one node, as long as those columns were in the same row. There would be no "silent data corruption."

To handle changes in rows across multiple tables, merge replication introduces logical records in SQL Server 2005. The logical records feature allows you to define a relationship between related rows in different tables so that the rows are processed as a unit. For example, if the tables Order and OrderDetails are related through a logical record, changes to related rows are committed or rolled back as a unit. If an order is being replicated from a Subscriber to the Publisher and there is a network outage, for example, rows that have already been delivered to the Publisher are not committed until all related rows are delivered to the Publisher. The Publisher and Subscribers always see consistent views of the data. For more information, see "Grouping Changes to Related Rows with Logical Records" in SQL Server Books Online.

Synchronizing Data

After replication is configured, data changes are tracked in the publication and subscription databases. During synchronization these changes are applied at the appropriate nodes in the replication topology. This section of the paper provides a brief introduction to change tracking and discusses areas related to synchronization, including validation and reinitialization.

Capturing Changes in the Database

SQL Server Merge replication has a sophisticated change tracking and enumeration system, which has been refined over the last ten years. The system uses metadata tables in the publication and subscription databases together with triggers and uniqueidentifer columns on the published tables. The uniqueidentifier column is used to uniquely identify each row across the topology. The system is robust and flexible, providing scalable change tracking for all scenarios and enabling advanced features like subscription-based filtering, logical records, and custom conflict detection. For more information, see "How Merge Replication Tracks and Enumerates Changes" in SQL Server Books Online.

SQL Server adds a uniqueidentifier column if the published table does not contain one. This can affect applications if column lists are not used in INSERT and UPDATE statements. However, best practices for database application design dictate that column lists should always be used. This means that most customers are not affected by the addition of the column.

Maintaining Ongoing Synchronization

In a mobile application, Subscribers are frequently offline, and must synchronize data whenever a connection becomes available; thereforereplication must be able to operate well in an occasionally connected environement. Replication should also provide mechanisms to reinitialize a Subscriber if required and to validate that data at the Publisher and Subscriber match.

As described in an earlier section of this paper, merge replication uses a snapshot to initialize each subscription. After the snapshot is applied, incremental changes are tracked and applied to Subscribers by the Merge Agent. Changes to published tables are tracked at the level of the row or the column, depending on the needs of your application. Merge replication lets you control how changes are batched; therefore, you can tune replication for the conditions on your network. The Merge Agent includes logic to retry any failed batches, which can occur when replicating over unreliable networks.

Contrary to assertions by Progress Software, merge replication is not "based upon an initial and periodic snapshot-and-restore process to re-synchronize the sites in the network." We refer to re-synchronization of sites as reinitialization, and we do not recommend or require frequent reinitialization of Subscribers. However, reinitialization is available; it is typically used only in the following cases:

A Subscriber has not synchronized for a long period of time. If the subscription has expired, reinitialization is required. If the subscription has not expired but there are a large number of pending changes at the Publisher, it might be quicker to reinitialize the subscription than it is to apply the changes.
One or more properties have changed that require reinitialization. For a complete list of changes that require reinitialization, see "Changing Publication and Article Properties" in SQL Server Books Online.

Validation means verifying that data at the Subscriber matches data at the Publisher. Regardless of the replication vendor, it is always possible that data can be out of synchronization at a given time, especially when frequently disconnected Subscribers receive changes over unreliable networks. Therefore, validation provides a method to guarantee that data matches without resorting to reinitialization.