Scaling out SQL Azure with Database Sharding

Demo Script

Scaling Out SQL Azure with Database Sharding

Lab version:2.0.0

Last updated:1/22/2019

Contents

Overview

Key Messages

Key Technologies

Time Estimates

Setup and Configuration

Task 1 –Running the Dependency Checker

Task 2 - Configuring the Database Connection String

Task 3 – Creating and Populating Shard Databases with Sample Data

Demo Flow

Opening Statement

Step-by-Step Walkthrough

Load and Partition Data

Review the Sample Application

Query and Insert Partitioned Data

Summary

Overview

This document provides setup documentation, step-by-step instructions, and a written script for showing a demo of SQL Azure. This document can also serve as a tutorial or walkthrough of the technology. In this demo you will see the basics of inserting and querying data from a shardedSQL Azure database. For additional demos of the Windows Azure platform, please visit

Note: In order to run through this demo, you must have a SQL Azure developer account. For more information on how to purchase an account, visit the SQL Azure Portal at

Key Messages

In this demo you will see three key things:

Database sharding adds complexity to your application but it’s managable
You can abstract the partitioning strategy from your individual queries
Running queries in parallel is important for performance

Key Technologies

This demo uses the following technologies:

SQL Azure
.NET Framework 4.0
Microsoft Visual Studio 2010
Windows Azure Tools for Microsoft Visual Studio

Time Estimates

Estimated time for setting up and configuring the demo: 10+mins(depending on how much sample data you generate)
Estimated time to complete the demo: 20 min

Setup and Configuration

The setup and configuration for this demo involves the following tasks:

Run the Dependency Checker
Configure the Database Connection String
Create and PopulatingShard Databases with Sample Data

Task 1 –Running the Dependency Checker

This demo does not have any advanced configuration requirements. You simply need to have the prerequisites installed and have an account for SQL Azure. For more information on how to purchase an account, visit the SQL Azure Portal at

The following steps describe how to run the Dependency Checker utility included with the demo to verify that you have the prerequisite components. You can skip this exercise if you are confident that the prerequisites are properly installed and configured.

Open a Windows Explorer window and browse to the demo’sSource\Scriptfolder.
Double-click theDependencies.depfile in this folder to launch the Dependency Checkertool and install any missing prerequisites.
If the User Account Control dialog is shown, confirm the action to proceed.

Note: This process may require elevation.The .dep extension is associated with the Dependency Checker tool during its installation. For additional information about thesetupprocedure and how to install the Dependency Checker tool, refer to theSetup.docxdocument in theAssetsfolder of the training kit.

Note: As shipped, the connection string template in the configuration file points to a local (default) instance of SQLExpress. To load data into SQL Azure, you will need to update the connection string in the configuration file to use your SQL Azure account settings.

Task 2 - Configuring the Database Connection String

To use SQL Azure during the demo, you will need to update the connection string used by the application.

Open the ServiceConfiguration.cscfg file in the \code\sharding.cloud\ folder and update the connection string in the ShardsDB setting with your SQL Azure account information.
Open the Web.config file in the \code\sharding.web\folder and update the connection string in the ShardsDB setting with your SQL Azure account information.
Open the App.config file in the \assets\ShardDbLoader\ folder and update the connection string in the ShardsDB setting with your SQL Azure account information.

Task 3 –Creating and Populating Shard Databases with Sample Data

The sample application in this demo requires its shard data to be stored in SQL Azure. Populating the necessary databases takes time so you need to ensure that this procedure has completed before you start the demo. Prepare this step well in advance.

The demo assets include an application that will generate random data and populate the shard databases in SQL Azure. You can control the amount of data produced by specifying how many contacts and orders the program needs to generate. In addition, the shard loader supports two separate partitioning strategies, specifically by date and by country. Running the program with one partitioning scheme does not overwrite the data generated with the other, which means that you could have two separate sets of databases, one for each partitioning strategy. You specify program options in its command line. To view available options, type SHARDDB /help.

Note: Actually, a user can create only five databases on SQL Azure. For this reason, you will not be able to create both partitioning strategies (date and country) on SQL Azure at the same time using the same user.

Since, this demo script has been prepared to show how shardingcan be performed on SQL Azure using a countrypartition strategy; you should use the ShardDBcommand line applicationimplementing a countrypartitioning strategy when you run it against SQL Azure.

Figure 1

Shard loader application

To run the application, follow these steps:

Open a command prompt and change the current directory to the assets\ShardDbLoader folder of the demo. If you executed the Configuration Wizard, it should contain a SHARDDB.EXE executable that resulted from the successful compilation of the solution; otherwise, build the solution in the code folder of the demo installation directory.
To create a new set of sample data, run the loader application specifying the /create option and the number of orders and contacts,as well as the partitioning strategy to use. For example, to generate 5000 orders and 300 contacts using the country partitioning strategy, run the shard loader with the following command line.

SHARDDB.EXE /create orders:5000 contacts:300 partition:country

To delete data generated by this application, you need to run the loader with the /delete option and specify which partitioning strategy you wish to delete. For example, to delete the data generated by the country partitioning scheme, specify the following command line.

SHARDDB.EXE /delete partition:country

Demo Flow

The following diagram illustrates the high-level flow for this demo and the steps involved:

Figure 2

Demo Flow

Opening Statement

During this demo, we will examine a sample application that was built to highlight some of the fundamentals of building apps that connected to sharded databases. It’s critical to understand when it’s appropriate to shard your database and that the decision to do so should not be taken lightly as it adds an additional level of complexity to your application. For certain scenarios the benefits of database scale out can be huge. This is particularly true for applications that require massive throughput.

There are many different approaches for scaling out your databases. It’s very important to determine the right strategy as it will directly impact the complexity and performance of your solution. This demo focuses on a scenario where you have a workload for a single application spread across multiple databases with the need for you application to determine dynamically which database to connect to for a given query and the ability combine the result sets. For some scenarios, especially for Independent Software Vendors (ISVs) a much simpler strategy may be to assign each customer their own database, removing the need to allow for queries across multiple databases.

When using a scale out database strategy you get the added resources of each of the machines processing your workload. Having one 10 GB database is not the same as having ten 1 GB databases. When you have ten 1GB databases you have distributed your workload over many more machines which is particularly important for high throughput.

In future versions of SQL Azure we can expect to see more features for helping to managing the partitioning of your data among multiple databases and fan-out queries.

In this demo you will specifically see three key things:

Database sharding adds complexity to your application but it’s managable
You can abstract the partitioning strategy from your individual queries
Running queries in parallel is important for performance

Step-by-Step Walkthrough

This demo consists of the following segments:

Load and Partition Data
Review the Sample Application
Queryand InsertPartitioned Data

Load and Partition Data

In this segment, you load sample data into a local SQLExpress instance to show how the partitioning strategy affects the distribution of data in the shards.

Action / Script / Screenshot

Open a command prompt and change the current directory to the assets\ShardDbLoader folder of the demo installation directory.
Execute SHARDDB.EXE specifying a country partitioning strategy. You can use the default values for the inserted contact and order count parameters.

SHARDDB /create partition:country
Note: If you previously executed this demo script, you will already have created the partitioned databases. To delete them, use the /delete parameter in the loader application and specify the partitioning strategy for which to delete data. For example,
SHARDDB /deletepartition:country
Note: As shipped, the connection string template in the configuration file points to a local (default) instance of SQLExpress. If you previously configured the connection string to load data into SQL Azure, you will need to restore its original value to use local storage. /

To start this demo, we first need to populate the store used by the application with sample data. To do this, we will use a console application that loads random data into the databases. Each database contains a copy of the tables used by the sample application, namely Contacts, Products, SalesOrderHeader, and SalesOrderDetail.
As you will see later in the demo, the sample application uses SQL Azure for storage, but to illustrate how the partitioning strategy affects the distribution of data in each partition, we will initially use a local SQL Server instance to create two separate sets of shards, each partitioned with a different strategy. Developing locally and deploying to SQL Azure is a pattern we expect to see for building applications.
Note that some of the tables contain reference data and its contentis not partitioned; instead, the tables are replicated to each shard. This is the case for both the Contacts and Products tables. On the other hand, the data in theSalesOrderHeadertable is partitionedusing a configurable strategy. In addition, SalesOrderDetailis considered a child of the SalesOrderHeader table. Hence, each row in this table is stored in the shard of its associated row in the parent table. In other words, all detail rows for an order are stored in the same shard as the order header.
We will now create our data set. First, we will partition the rows by country. The shard loader application allows us to specify the partitioning strategy in its command line. We can also specify the number of contacts and orders inserted into each shard but we will use the default values for now. This will allows us to insert a limited number of rows into the local SQL Server instance, enough to show the distribution of data based on a partition strategy.

Open SQL Server Management Studio and connect to the local SQL Server instance.
In the Object Explorer, locate the set of databases that correspond to the country partitioning masscheme. They should be named COUNTRY_00 to COUNTRY_0N.
Pick one of the databases in this set, expand the Tables node, right-click the SalesOrderHeader table, and choose Select Top 1000 Rows.
Repeat the process to show the SalesOrderHeader in one of the other databases in this set.

Now that we have created our sample data, we will use SQL Server Management Studio to examine the data inserted by the loader and see how it has distributed the information in the shards.
First, wewill view the contents for the SalesOrderHeader table in one of the databases in the set. You can see that the Country column shows that the table only contains rows for a single country.
Looking at the contents of the same table in one of the other databases in this set, we can see that it contains data for a different country.

Execute the SHARDDB application once again, this time specifying a date partitioning strategy.

SHARDDB /create partition:date /

Next, we will repeat the loading process but this time specifying that the data be partitioned by date or, to be precise, by quarter.

Switch back to SQL Server Management Studio.
In the Object Explorer, locate the set of databases that correspond to the data partitioning scheme. They should be named QUARTER_01 to QUARTER_4.
Show the contents of the SalesOrderHeader table for one of the databases in this new set.

Once again, we can look at the data inserted by the loader in SQL Server Management Studio. As you can see, it has created a new set of databases, one for each quarter.
Examining the contents of the SalesOrderHeader table, we can see that in this new partitioning scheme, the table now contains orders for a single quarter of the year. Note that in this demo, for simplicity, the date partitioning scheme only takes into account the quarter so that orders for different years could potentially be allocated to the same shard.

Review the SampleApplication

In this segment, you review the sample application and provide a brief description of its implementation.

Action / Script / Screenshot

Start Visual Studio 2010 as an administrator (required to run in the development fabric).
Open the Microsoft.Samples.Sharding.sln solutionin the code folder of this demo.

I will now give you a quick tour of a sample application used in this demo. We’ve built a simple Windows Azure web application designed to highlight some features of database sharding. Let’sstart Visual Studio 2010 and open the solution.
The solution contains three projects. The first one is a sample Web site that allows us to query and insert orders into a SQL Azure data store. Next, we have a cloud service project that we use to host the site as a Web role in Windows Azure. Finally, the third project in the solutioncontains the console application that we used earlier to load and partition our sample data.

Open the IShardPartitionStrategy.cs file in the Partition folder of the Sharding.Webproject and show its methods.

As we discussed previously, data is partitioned based on a given strategy. For this application, the data access class is designed to accept pluggable partitioning strategies. A partitioning strategy is simply a class that implements a special interface named IShardPartitionStrategy. Let me briefly describe how it works.
The GetShardFor method is the core method in this interface. The purpose of this method is to map any row in a table to a specific shard based on the value of its columns.
To implement the GetShardFor method, it requires metadata that specifies whether a table needs to be partitioned and if so, the name of the field to use as the partition key. The interface exposes this metadata through its PartitionMetadata property.
TheShards property returns a list of identifiers for every shard available in a partitioning scheme. To illustrate with an example, if we use a strategy that partitions by country, it would return a list that contains a shard identifier for each country. An application can map each identifier in the list to a connection string that points to the corresponding target database.
Finally, we have a Name property that returns a description of the partitioning strategy which could be used to identify how the data is partitioned.

Open the ByDatePartitionStrategy.cs file in the Partition folder of the Sharding.Web project.
Highlight the EntityPartitionMap field at the top of the class.

To create a partitioning strategy, you need a class that implements the IShardPartitionStrategy interface. In the sample, we provide two different strategies, the first partitions data based on the country where the order is shipped, while the second one partitions based on the quarter of the order. The shard loader application that we used at the start of the demo shares the same partitioning strategies to load its data. Let’s quickly look at the implementation for each of these classes.
First, we willreview the ByDatePartitionStrategy.Here we have the metadata that determines how each table is partitioned. Notice that it contains an entry for each of the tables. Each table has metadata that specifies whether the table needs to be partitioned (or Sharded), and which partition field to use. Tables that are not partitioned are identified as Global.

Briefly describe the implementation of the GetShardFor method.

Next, we will look at the implementation of the GetShardFor method. The method starts by retrieving the metadata for the chosen table to determine whether it needs to be partitioned.
For “Global” tables, it returns a list of every available shard. This is for the benefit of the loader application, which can replicate the data to each shard. For queries, any of the shards in this list can be used to retrieve data since they all contain the same data.
For “Sharded” tables, the method retrieves the value of the partition key field, it ensures that it is of DateTime type, and then extracts its quarter.
Finally, it calls the GetShardName using the month value as a parameter. This last method simply returns an identifier of the form QUARTER_{0}, where the placeholder is replaced by the quarter value based on the month. The shard identifier is then returned to the application.

/
Open the ByCountryPartitionStrategy.cs file in the Partition folder of the Sharding.Webproject to discuss the implementation of the GetShardFor method. /

The ByCountryPartitionStrategy is very similar.Examining the GetShardFor method in this class, we see that it too retrieves the value of the partition key field and searches an array of countries for this value. It then uses the index of the matching entry to call the GetShardName method, which returns a shard identifier of the form COUNTRY_{0}, where the placeholder is replaced by the index of the country.

Open the ServiceConfiguration.cscfgfile in the CloudService project and locate theshardsDB setting in the configuration.
Show that the connection string points to database.windows.net and that the database name (Initial Catalog) is specified as a placeholder.

For this demo, we will run the sample app locally using the development fabric.
Regardless of its execution environment, you can see from the connection string in the configuration file that the application uses storage in SQL Azure. Notice that the Initial Catalog setting in the connection string includes a placeholder for the database name. The application uses this connection string as a template and maps each shard to a different database by replacing the database name with the shard identifier.

Open the Default.aspx page and switch to design view to show that different queries are available.

To complete our review of the sample application, let’s examine its user interface. The application is very simple and contains a main page where you execute different queries against each of the partitions.
Each query exposed in the UI is associated with a method in the data access class.

Queryand InsertPartitioned Data

In this segment, you show how to query and insert data in the partitioned database.