Extending the Publisher to the Cloud

SQL Azure Data Sync

This weekend I decided to set up a SQL Azure sync group to synchronize a SQL Azure database with an on-premises replication publisher.  I was curious whether this would work, as I've been unable to find any documentation stating whether or not SQL Azure Data Sync can coexist with SQL Server Replication.  Not only was I able to get this working, but I took it a step further and downloaded the Windows Azure and Windows Phone SDKs, then wrote a quick and dirty Windows Phone app which connects to my SQL Azure sync group hub database and pulls down Customer and related Order data to my mobile device, making it available via a pivot control within the application.

I was able to create a sync group in SQL Azure Data Sync to synchronize the Customer, SalesOrderHeader, and SalesOrderDetail tables in the AdventureWorksLTAZ2008R2 database, which also acted as an on-premises Merge publisher in my local network with 3 subscribers.  The solution provided a sync group hub database containing the Merge publication's Customer and related Order data, syncing one-way from on-premises to the cloud.  I settled on synchronizing the sync group every 30 minutes and kept existing subscribers syncing with the on-premises publisher on their normal schedules, which was every 1 minute in my test topology.

Publisher Sync Group

Windows Phone Application

From here, I created 2 projects: a Windows Azure WCF Service Web Role that allows the Windows Phone app to interact with a SQL Azure database using Entity Framework, and a Windows Phone application that uses WCF to connect to the SQL Azure sync group hub database through the service reference and consume the Customer and related Order data through a pivot control.
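
To give a sense of the shape of the service layer, here is a minimal sketch of what the WCF service contract might look like.  The service name, operations, and entity types (assumed to be generated by the Entity Framework model) are hypothetical stand-ins, not the actual sample code:

using System.Collections.Generic;
using System.ServiceModel;

namespace CustomerOrderService
{
    // Hypothetical WCF service contract exposed by the Web Role.
    // Customer and SalesOrderHeader are assumed to be Entity Framework
    // entities mapped to the sync group hub database tables.
    [ServiceContract]
    public interface ICustomerOrderService
    {
        // Returns the customers stored in the sync group hub database.
        [OperationContract]
        List<Customer> GetCustomers();

        // Returns the orders placed by the specified customer.
        [OperationContract]
        List<SalesOrderHeader> GetCustomerOrders(int customerId);
    }
}

The Windows Phone app consumes this through an added service reference and binds the returned collections to the pivot control.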

A couple of screenshots of the app:


View Customer Orders
Customer Order

As you can see, the potential for SQL Azure Data Sync is interesting.  Not only can sync groups be useful in providing an offsite database storage solution, but when synchronized with an on-premises replication publisher, they can also provide a mechanism for exposing enterprise data to the cloud to be consumed by mobile devices.  SQL Azure Data Sync is currently available for preview and can be found in the Windows Azure Management Portal.  More information about SQL Azure Data Sync can be found in SQL Azure Data Sync on MSDN.

How to let non-admin business users run replication agent jobs

This blog post is basic in nature but highlights a very real requirement that may arise when administering replication.  Occasionally we, as Database Administrators, need to grant non-administrator business users the ability to run replication agent jobs due to business needs.  Instead of giving the business user the ability to execute, stop, and start any job by adding them to a SQL Server Agent fixed database role, it is probably wise to limit the business user to executing only the replication agent job in question.

For example, if we want to give a business user the ability to run only the Snapshot Agent job, first we can create a stored procedure RunSnapshotAgent that runs the Snapshot Agent job, using WITH EXECUTE AS to run the stored procedure as the Snapshot Agent job owner:

CREATE PROCEDURE dbo.RunSnapshotAgent
WITH EXECUTE AS 'snapshot_job_owner'
AS
EXEC msdb.dbo.sp_start_job
	@job_name = 'snapshot_agent_job';
GO

Where 'snapshot_job_owner' is the snapshot agent job owner and 'snapshot_agent_job' is the snapshot agent job name.

Next, we can create a role named snapshot_executor, and add the business user to that role:

CREATE ROLE snapshot_executor;
 
EXEC sp_addrolemember
	@rolename = 'snapshot_executor',
	@membername = 'BUSINESSUSER1';
GO

Where ‘BUSINESSUSER1’ is the business user’s database user.

Then, grant execute rights for the RunSnapshotAgent stored procedure to the snapshot_executor role to allow the business user to start only the snapshot agent job:

GRANT EXEC ON dbo.RunSnapshotAgent TO snapshot_executor;
GO

After creating the stored procedure, creating the role, and adding the business user to that role, the business user can then execute the RunSnapshotAgent stored procedure to run only the Snapshot Agent:

EXEC dbo.RunSnapshotAgent;
GO

Undocumented Gotcha in Merge Replication Tracking

In the spirit of Robert Davis's (blog | @SQLSoldier) blog post: Undocumented Gotchas of Transactional Replication, I decided to contribute an undocumented gotcha in Merge Replication to his list.  First, let me start off by saying that if you happen to have a difficult replication scenario that you're looking for advice on, Robert Davis is an excellent resource.  He was kind enough to let me pick his brain at SQL Saturday #92 in Portland, Oregon about a topic that would send most people running for the door.  Not only did he offer a solution to my problem, he offered 3 different solutions, which was very nice of him.

Gotcha

One gotcha that you need to watch out for in Merge Replication is changing an existing article's tracking type from row-level tracking to column-level tracking when using Server subscription types.  When an article is added to a Merge publication, by default it is added using row-level tracking.  Sometimes, in an effort to reduce conflicts, one might decide to change an article's tracking type from row-level to column-level tracking.  However, after doing so, one might realize that conflicts still occur even when the updates were made to different columns of the same row at different nodes in the topology.  Further complicating the issue is the fact that the article tracking property will actually indicate that column-level tracking is enabled when checked via the Article Properties dialog or via the stored procedure sp_helpmergearticle.  It can be a little confusing.
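
To see the property in question, the article can be inspected with sp_helpmergearticle; the publication and article names here are placeholders:

EXEC sp_helpmergearticle
	@publication = N'P1',
	@article = N'Customer';
-- The column_tracking column in the result set reports 1 (enabled),
-- even though conflicts are still being tracked at the row level.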

Example

For example, let's say we have a Merge publication P1 and subscriptions S1 and S2, which are Server subscription types.  If P1 has an article Customer which is set to row-level tracking, and we then change it to column-level tracking using the Article Properties dialog or the stored procedure sp_changemergearticle and reinitialize subscriptions, the following updates will still generate a conflict after syncing both subscriptions:

S1:

UPDATE Customer
SET FirstName = 'Robert'
WHERE CustomerID = 10

Sync

S2:

UPDATE Customer
SET LastName = 'Davis'
WHERE CustomerID = 10

Sync

Solution

The proper way to handle this is to set the article tracking property to column-level from the very start.  Adding articles with column-level tracking using the New Publication Wizard or sp_addmergearticle when creating the publication will yield the correct results, avoiding conflicts when updates are made to different columns of the same row, as sketched below.  Unfortunately, if the article has already been added to the publication, it will need to be dropped and added back to the publication with column-level tracking enabled.
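
If you are scripting the publication, a minimal sketch of adding an article with column-level tracking enabled at creation time might look like this; the publication, article, and owner names are placeholders:

EXEC sp_addmergearticle
	@publication = N'P1',
	@article = N'Customer',
	@source_object = N'Customer',
	@source_owner = N'dbo',
	@column_tracking = N'true'; -- enable column-level tracking from the start
GO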

Counting Merge Replication Article Changes

One of the cool things about Merge Replication is that it can reveal very interesting information about applications and the statements they run against published articles.  We can learn quite a bit about an application just by the sheer number of data changes that are replicated as a result of the statements it executes.  This can reveal not only which tables an application changes the most, but may even help identify flaws in application statements when the amount of data being replicated is more than anticipated.

Merge Replication tracks data changes through triggers and system tables in the publication and subscription databases.  The replication system tables contain metadata that indicates which data changes should be propagated.  One of the Merge Replication system tables, MSmerge_genhistory, contains one row for each generation (a collection of changes) that is delivered to a Publisher or Subscriber.  MSmerge_genhistory contains a column, changecount, which can be used to determine which table articles contain the most data changes.  So to get an idea of which articles contain the most data changes within the retention period, we can do something like this:

USE MyPublicationDB;
GO
 
SELECT
	name AS [Object Name],
	SUM(changecount) AS [Change Count]
FROM dbo.MSmerge_genhistory WITH (NOLOCK)
INNER JOIN dbo.sysmergearticles WITH (NOLOCK)
	ON art_nick = nickname
WHERE genstatus IN (1, 2)
GROUP BY name
ORDER BY [Change Count] DESC;

In this scenario, the articles in my test Merge topology that have been changed within my retention period are Customer, SalesOrderDetail, Address, Product, and SalesOrderHeader, which have had 847, 542, 450, 295, and 32 data changes respectively.  If you find yourself wondering which articles in your Merge topology are exhibiting the most data changes, the changecount column in MSmerge_genhistory is what you're looking for.

Merge Article Count PivotTable And Chart

Replication Bookmarks

I started a list of the resources I frequent the most - Replication Bookmarks - covering MSDN Library SQL Server Books Online for Replication, Technical Articles and Whitepapers, Blogs and Websites, Videos, Forums, and Books.

I find myself visiting these pages often and thought this could be a useful reference when searching for quality content on SQL Server Replication.  The list is subject to change as I intend to add to it over time.  One list I plan to grow is Blogs and Websites, as I'm aware there have been a number of blog posts written on the topic of SQL Server Replication.  If you're interested in being added to my replication bookmarks, feel free to get in touch; I'm always looking for something to read.

The process could not read file due to OS error 3

When configuring a subscription to a publication in SQL Server, sometimes the agent process cannot read from the snapshot:

The process could not read file X:\Program Files\Microsoft SQL Server\<instance>\MSSQL\repldata\<snapshot>\<snapshot_file> due to OS error 3. (Source: MSSQL_REPL, Error number: MSSQL_REPL20016)
Get help: http://help/MSSQL_REPL20016
The system cannot find the path specified.
(Source: MSSQL_REPL, Error number: MSSQL_REPL3)
Get help: http://help/MSSQL_REPL3

This error can be puzzling at first, especially when the path does indeed exist and the replication agent process accounts have sufficient privileges to access the snapshot folder.

The error occurs because, when using pull subscriptions and/or a remote distributor, you must specify a UNC network share such as \\<computername>\snapshot rather than a local path for the snapshot folder.  To alleviate this issue, make the snapshot folder a UNC network share and update the Distributor Properties to reflect the change.
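
If you prefer T-SQL to the Distributor Properties dialog, a sketch of pointing the default snapshot folder at the network share might look like this, assuming the default snapshot folder maps to the Publisher's working directory property at the Distributor; the server and share names are placeholders:

EXEC sp_changedistpublisher
	@publisher = N'MYPUBLISHER',
	@property = N'working_directory',
	@value = N'\\MYPUBLISHER\snapshot';
GO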

More Information

Securing the Snapshot Folder

Merge Business Logic Handler Conflict Resolver

Introduction

This example demonstrates how to implement a business logic handler for a Merge article to perform custom resolution of conflicting update data changes that occur at the Publisher and Subscribers.

Building the Sample

This sample was built using Visual Studio 2010.

Description

With Merge Replication you can utilize the business logic handler framework to write a managed code assembly that gets executed during the merge synchronization process.  The assembly can respond to a variety of replication events including data changes, conflicts, and errors.

For this example I will demonstrate how to implement a business logic handler for a Merge article that handles update conflicts.  In order to do this, we will override the UpdateConflictsHandler method and make use of the ADO.NET data sets that are provided by the merge process to not only mimic, but extend, the functionality included in the Microsoft SQL Server Priority Column Resolver.

The Microsoft SQL Server Priority Column Resolver uses a single column to determine the conflict winner: whichever of the conflicting rows at the Publisher or Subscriber has the highest value in that column wins the conflict.  This is limited in that only one column can be incorporated into the equation.  This example will extend that functionality to incorporate two columns by making use of the business logic handler framework.

Code Snippet

using System;
using System.Collections.Generic;
using System.Data;
using System.Data.Common;
using System.Linq;
using System.Text;
 
using Microsoft.SqlServer.Replication.BusinessLogicSupport;
 
namespace BLH_CustomConflictResolver
{
    public class SalesOrderHeaderBusinessLogicHandler : Microsoft.SqlServer.Replication.BusinessLogicSupport.BusinessLogicModule
    {
        // Variables to hold server names.
        private string publisherName;
        private string subscriberName;
 
        public SalesOrderHeaderBusinessLogicHandler()
        {
        }
 
        // Implement the Initialize method to get publication
        // and subscription information.
        public override void Initialize(string publisher, string subscriber, string distributor,
            string publisherDB, string subscriberDB, string articleName)
        {
            // Set the Publisher and Subscriber names
            publisherName = publisher;
            subscriberName = subscriber;
        }
 
        // Declare what types of row changes, conflicts, or errors to handle
        override public ChangeStates HandledChangeStates
        {
            get
            {
                // We are handling update conflicts for this example
                return ChangeStates.UpdateConflicts;
            }
        }
 
        // This handler mimics the Microsoft SQL Server Priority Column Resolver but extends its functionality to incorporate 2 columns
        // in the resolution process.
        public override ActionOnUpdateConflict UpdateConflictsHandler(DataSet publisherDataSet, DataSet subscriberDataSet, ref DataSet customDataSet,
                ref ConflictLogType conflictLogType, ref string customConflictMessage, ref int historyLogLevel, ref string historyLogMessage)
        {
            // Priority column #1
            int publisherRevisionNumber = Int32.Parse(publisherDataSet.Tables[0].Rows[0]["RevisionNumber"].ToString());
            int subscriberRevisionNumber = Int32.Parse(subscriberDataSet.Tables[0].Rows[0]["RevisionNumber"].ToString());
 
            // Priority column #2
            DateTime publisherModifiedDate = DateTime.Parse(publisherDataSet.Tables[0].Rows[0]["ModifiedDate"].ToString());
            DateTime subscriberModifiedDate = DateTime.Parse(subscriberDataSet.Tables[0].Rows[0]["ModifiedDate"].ToString());
 
            // Check priority for column #1
            if (publisherRevisionNumber > subscriberRevisionNumber)
            {
                customDataSet = publisherDataSet.Copy();
            }
            else if (subscriberRevisionNumber > publisherRevisionNumber)
            {
                customDataSet = subscriberDataSet.Copy();
            }
            else if (publisherRevisionNumber == subscriberRevisionNumber)
            {
                // Priority column #1 for publisher and subscriber are equal
                // Check priority for column #2
                if (publisherModifiedDate > subscriberModifiedDate)
                {
                    customDataSet = publisherDataSet.Copy();
                }
                else
                {
                    customDataSet = subscriberDataSet.Copy();
                }
            }
 
            return ActionOnUpdateConflict.AcceptCustomConflictData;
        }
    }
}

Downloading the Sample

This sample can be downloaded from the MSDN Code Gallery.

More Information

For more information on executing business logic during merge synchronization, see the Execute Business Logic During Merge Synchronization topic in SQL Server Books Online.

Updatable Subscriptions Removed in SQL Server 2012

After spending some time looking at SQL Server 2012 CTP3, it appears that Transactional Publication with Updatable Subscriptions will officially be removed in SQL Server 2012.  According to Books Online, Microsoft is recommending that users switch to Peer-to-Peer Replication instead.  This is a shame considering Peer-to-Peer is an Enterprise-only feature and will cost significantly more than Standard Edition.  On the other hand, we will still be able to achieve updatable subscriptions in Standard Edition via Merge Replication or Bidirectional Transactional Replication, sketched below.
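
For reference, here is a minimal sketch of the subscription side of a Bidirectional Transactional Replication setup, in which each of two servers publishes its database and subscribes to the other's publication with loopback detection enabled so changes are not echoed back to their origin; the publication, server, and database names are placeholders:

-- Run on each of the two Publishers, pointing @subscriber at the opposite server.
EXEC sp_addsubscription
	@publication = N'BiDiPub',
	@subscriber = N'SERVER2',
	@destination_db = N'MyDB',
	@subscription_type = N'push',
	@loopback_detection = N'true'; -- prevent changes from being sent back to their origin
GO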

Here are the screenshots comparing the SQL Server 2008 R2 New Publication Wizard with the SQL Denali CTP3 New Publication Wizard:

SQL Server 2008 R2 vs. SQL Server Denali CTP3

SQL Server 2008 R2 vs. SQL Server Denali CTP3 - New Publication Wizard

SQL Saturday #92 Lunch: Bend Area Gathering

There's nothing I love more than free food and SQL Server, no joke.  I take my free food and SQL Server very seriously and you should too.  Fortunately for me, the folks at PASS know the way to my heart.  It seems as if they've tailored a day of free SQL Server training and food just for me.  I have no idea what's on the menu, but it's free and that's good enough for me to make the trip.  My guess is we'll get an entree of best practices, a side of tips and tricks, and a delectable networking dessert.

With that said, I’ll be attending SQLSaturday #92 in Portland, OR on October 8th, 2011.  I’ll be meeting during lunch with anyone from Central Oregon interested in working together to create a PASS Chapter in the Bend/Redmond area.  Anyone who is interested is welcome to join the conversation.  Box lunches will be served, so pick up a lunch and join us in this discussion.

Replication Agent Logging

To troubleshoot replication agent failures, replication agent profiles can be configured to output verbose agent logs to a text file.  All replication agents accept the parameters -Output output_path_and_file_name and -OutputVerboseLevel [0|1|2], which, when specified, write the agent log to the file output_path_and_file_name with varying degrees of detail depending on the verbose level passed.  In general, the information contained in the verbose agent logs consists of progress report messages, error messages, procedure calls, and various session statistics.

Depending on the verbose level and the agent, some or all of this information will be provided.  For the purpose of troubleshooting replication agent failures, a verbose level of 2 should be specified, which prints all error and progress report messages.  It might also be worth mentioning that some agents can be passed the undocumented output verbose levels 3 and 4, which provide even more information.  The information provided in the verbose agent logs can be useful when trying to diagnose exactly where replication is failing, as well as for getting detailed information on the root cause of the failure.  In addition, the verbose agent logs are likely the first piece of diagnostic data that Microsoft Support will ask for when diagnosing a replication support incident.

Reading the logs takes a little getting used to, but after utilizing them to troubleshoot a few times, it becomes second nature.  I often use them when replication failures are occurring and the error given in Replication Monitor and the View Synchronization Status window isn’t sufficient to diagnose the issue.  An example error message with Merge Replication that occasionally crops up is: The process was successfully stopped. (Source: MSSQL_REPL, Error number: MSSQL_REPL-2147200963), which isn’t very verbose.  Knowing how to output the verbose agent logs is a smart skill to have under these circumstances in order to identify the root cause.

The parameters to output the verbose agent logs can be passed to the Run Agent job step in the Job Properties dialog, the Agent Profile dialog in Replication Monitor, via RMO, or by invoking the agents from the command line.  After adding the parameters, the replication agent must be restarted for the profile changes to take effect.  Once restarted, the replication agent will begin outputting the verbose log to the path and file name specified.  After reproducing the failure, the log can then be examined for further information.
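
As an example, the tail end of a Merge Agent's Run Agent job step command might look like the following once the two logging parameters are appended; the server, database, and path names are placeholders:

-Publisher [MYPUBLISHER] -PublisherDB [MyPublicationDB] -Publication [P1]
-Subscriber [MYSUBSCRIBER] -SubscriberDB [MySubscriptionDB] -Distributor [MYDISTRIBUTOR]
-Output "C:\ReplLogs\MergeAgent.log" -OutputVerboseLevel 2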

The verbose agent log is output as a text file and can be opened with your favorite text editor.  The -OutputVerboseLevel specified for the agent profile will dictate the contents of the log.  Typically, if a replication agent failure occurs, the agent will fail and the process will stop.  That means the failure is likely to be found at the end of the log, in the last 20 or so lines of the output file.

Be sure to remove the agent parameters after reproducing the failure and collecting the log.  Not doing so will result in the log being appended to on subsequent synchronizations, consuming disk space.  Outputting agent logs can also impact replication performance, so it isn’t recommended unless you are troubleshooting replication failures.