/ Microsoft Deploying 10-Terabyte Enterprise Data Warehouse using SQL Server 2008
Overview
Country or Region:United States
Industry:IT Services
Customer Profile
Based in Redmond, Washington, Microsoft is the worldwide leader in software, services, and Internet technologies for personal and business computing.
Business Situation
Microsoft IT needed to unify information from a number of databases across the enterprise to gain a more integrated view of its customers.
Solution
The IT group created an enterprise data warehouse (EDW) using Microsoft® SQL Server® 2008 Enterprise. The EDW is expected to exceed 10 terabytes within its first year of operation.
Benefits
Integrated view of the customer
Easier data management
Automated capture of data changes
3 times smaller backup storage with compression
Faster data inserts
Extensibility of Integration Services / “Our testing found that the Backup Compression feature in SQL Server 2008 provides 3-to-1 compression, which will correspondingly reduce the disk space we need to deploy to support backups.”
Alexey Yeltsov, Lead Systems Administrator, Microsoft
Microsoft, with more than 60,000 employees worldwide, and with 2006 revenue exceeding U.S.$50 billion, generates a lot of internal data that the company wanted to centralize to provide it with a more integrated view of its customers. To fill this need, Microsoft IT is using the Microsoft® Application Platform to create an enterprise data warehouse (EDW) hosted on Microsoft SQL Server® 2008 database software, and running on the Windows Server® 2008 Enterprise operating system. The EDW, which is already 2.5 terabytes and is expected to exceed 10 terabytes within the first 12 months of deployment, will give the company a better view of its customers. The IT group has found it can use the Backup Compression feature of SQL Server 2008 to gain a 3-to-1 reduction in backup storage space. The company is also enjoying easier data management because of new features in SQL Server 2008.

Situation

With operations in 89 countries, Microsoft is the largest software company in the world, generating 2006 revenues exceeding U.S.$50 billion. Like other large organizations, the company requires a repository of data to help it better track its business and to analyze customer needs.

The company, which has more than 60,000 employees, already had a large collection of data warehouses, data marts, and other repositories to support reporting and analytics across a spectrum of its internal organizations and functions, including its FeedStore database—so named because it pulls data from 39 internal sources and feeds some 500 subscribing internal applications. (To read a case study about the company’s FeedStore database, please see:

But the Microsoft IT group identified a need for creating a centralized data repository that would begin with FeedStore data and go even further, adding information from customer relationship management, finance applications, and a range of other sources to create an enterprise data warehouse (EDW).

“We needed an enterprise data warehouse to provide a true integration of the customer and financial data found in other data warehouses and repositories around the company,” explains Jim Walch, GM of Business Intelligence in Microsoft IT. “Creating an EDW using the Microsoft® Application Platform was the only way we could provide users with a consistent view of information across the whole enterprise. Microsoft IT’s EDW creates the perfect platform to enable our users to analyze data using tools like SQL Server® Reporting Services, SQL Server Analysis Services, Performance Point Server, and Office Excel.”

Solution

The organization created its EDW using Microsoft SQL Server 2008 Enterprise database software, running on the Windows Server® 2008 Enterprise operating system. The company needed a rock solid database that could support the 2.5 terabyte size and provide scalability for what is anticipated to be a 10-terabyte EDW within the first 12 months of deployment.

The EDW includes data from the company’s FeedStore database, while adding information from a number of other sources including the company’s Microsoft Accounting Reporting System, a data warehouse of information used to build Microsoft’s financial statements, and the Resource & Project Management database which stores information used for resource allocation and management, including resource skills and resource roles.

Microsoft IT wanted to build its EDW using SQL Server 2008 in order to take advantage of new features, including:

MERGE SQL Statement. The MERGE SQL statement, new for SQL Server 2008, enhances efficiency of executing common database administration tasks such as checking whether a row exists and then executing an insert or update.

Change Data Capture. New for SQL Server 2008, Change Data Capture collects the complete content of changes, and places the information in change tables. The feature maintains cross-table consistency and also works across schema changes.

Backup compression. With SQL Server 2008 backup compression, the compression is performed in memory before the data is transferred to disk, enabling backups to run faster since less disk I/O is required. Backup compression reduces the storage required to keep backups online, reducing the overall cost of keeping disk-based backups.

Enhanced TABLOCK insert. SQL Server 2008 has enhanced the efficiency of using TABLOCK inserts to gain faster data loads. The table locking of TABLOCK provides faster loading than using row locking.

The EDW solution also makes extensive use of SQL Server 2008 Integration Services. First introduced with SQL Server 2005, and enhanced for SQL Server 2008, Integration Services provides enterprise-grade tools for creating and managing extract, transform, and load (ETL) processes. Microsoft IT used SQL Server 2008 Integration Services to create an internal tool, called METaL, for automating much of the process of creating new ETL packages.

“We’ve used a number of new features in SQL Server 2008 as major building blocks for our solution,” says Alexey Yeltsov, Lead Systems Administrator at Microsoft. “And we are looking to take advantage of several other new features, so we are impressed with the wealth of new functionality and features that have been designed into the product.”

Benefits

The enterprise data warehouse created using SQL Server 2008 is giving the IT group the integrated view of the customer the company needed, as well as easier data management. SQL Server 2008 features are yielding a number of other benefits for the company, including 3 times smaller backup storage with compression, faster data inserts, and the ability to build solutions using the extensibility of SQL Server Integration Services.

Integrated View of the Customer

As with other large organizations with extensive data collections, Microsoft saw the value of creating an EDW to provide an integrated repository that yields a more complete view into the data.

“Why build an EDW?” asks Yeltsov. “We need to gather our data from different sources into one place to resolve any conflicts in definitions and to provide a single view of the truth.”

The EDW, as a comprehensive, integrated, data set, also provides a rich view into customer activity across Microsoft business units and geographies.

“The EDW we are creating with SQL Server 2008 provides our business users at Microsoft with a 360-degree view of the customer,” says Mike Gudyka, Senior Applications Developer at Microsoft. “We can gain a more complete view of the customer and improve the quality and velocity of decisions by integrating data that was otherwise isolated in silos across the organization.”

Easier Data Management

EDW developers benefited from using the MERGE SQL statement in SQL Server 2008, because MERGE SQL combines ability to do an insert, update, or delete off the target table in one statement.

“Prior to SQL Server 2008 and its MERGE SQL statement we had to write custom code which we implemented as stored procedures,” says Gudyka. “We can now use a single MERGE SQL statement to replace two fairly complex stored procedures.”

Replacing stored procedures with the MERGE SQL Statement should also increase update processing speed.

“Using MERGE SQL reduces the amount of code that would otherwise need to be written for table maintenance, and so reduces the chance of making mistakes,” says Yeltsov. “Handling what used to require two or three transactions with a single statement also provides faster update processing.”

Automated Capture of Data Changes

The Change Data Capture feature of SQL Server 2008 has made it easier for developers to create solutions requiring a mechanism to track change history on a table. Prior to SQL Server 2008, capturing data changes required the creation of custom code.

“Before we had the Change Data Capture feature we created our own code to handle changes,” says Gudyka. “This worked, but didn’t scale well. When dealing with tables that have millions of rows, it could take longer than we liked to load all of the row indexes to look for changes. Change Data Capture provides a much more efficient mechanism that captures the rows as they’re changed. This greatly enhances the scalability of tracking changes.”

Change Data Capture also supports a user defined limit on how long change data history is kept.

“The ability to define the period of retention is a big help because previously you would have to code your own job that would go through and selectively prune the table based on the time stamp of the data,” says Ron Ortloff, Technology Architect at Microsoft. “With the Change Data Capture feature, data is held for 3 days by default, but you can simply configure the feature for longer or shorter periods of change data retention.”

3 Times Smaller Backup Storage with Compression

Anticipating its database to rapidly exceed 10 terabytes, and knowing that multiple backup copies are usually kept, the IT group was eager to take advantage of the ability to compress data with SQL Server 2008.

“Our testing found that the Backup Compression feature in SQL Server 2008 provides 3-to-1 compression, which will correspondingly reduce the disk space we need to deploy to support backups,” says Yeltsov. “SQL Server 2008 backup can compress data in-memory – just before it is written to the disk.”

The IT group has found that backup compression can reduce the time required to complete backups. Yeltsov notes, “We’re seeing faster backups because the CPU compresses data faster than it would have taken to write uncompressed data to disk.”

Faster Data Inserts

With table locking enabling faster loading of large data sets than using row locks, the IT group was happy to find that SQL Server 2008 has enhanced the efficiency of TABLOCK inserts.

With earlier versions of SQL Server, INSERT statements were fully logged into the transaction log, which meant that a 1 gigabyte (GB) insert resulted in 1 or more GB of information being written to the transaction log. “With the enhanced TABLOCK insert functionality of SQL Server 2008, the transaction log entries for a 1 gigabyte insert are reduced to just a few megabytes,” says Yeltsov. “This reduces disk usage and enables significantly faster inserts. We’re seeing large inserts processed at least three times faster compared to earlier versions of SQL Server.”

Extensibility of Integration Services

The extensibility of SQL Server 2008 Integration Services helped the IT group to create an internal solution, called METaL, to automate much of the creation of SQL Server Integration Services packages for performing ETL processes on data being brought into the EDW.

The METaL application includes a metadata store and business logic to use the metadata to automate creation and distribution of Integration Services packages customized for a range of ETL needs.

“Integration Services provides an excellent execution platform that we use to perform ETL across a distributed environment,” says Gudyka. “We like the fact that we can use our metadata to create Integration Services packages and then send them out for execution. As the packages are running they report back on their progress, supporting fine-grained monitoring.”

In summary Microsoft IT is using the Microsoft Application Platform, including SQL Server 2008 Enterprise, to gain a more integrated view of its customers by gathering several terabytes of data from across its operations into a unified enterprise data warehouse. The company is taking advantage of several new features in SQL Server 2008, including the ability to reduce the size of its backup footprint by using Backup Compression.


Windows Server 2008, SQL Server 2008, and Visual Studio 2008

Windows Server 2008, SQL Server 2008, and Visual Studio 2008 provide a secure and trusted foundation for creating and running your most demanding applications. Combined, the products offer advanced security technology, developer support for the latest platforms, improved management and Web tools, flexible virtualization technology to optimize your infrastructure, and access to relevant information throughout your organization.

For more information about Windows Server 2008, go to:

For more information about SQL Server 2008, go to:

For more information about Visual Studio 2008, go to: