Loss and Penalties with 99.95 Percent SAP Uptime
“We can now install software security patches on our mission-critical SAP ERP system and accrue no more than one minute of downtime by using SQL Server 2012 AlwaysOn and Windows Failover Clustering.”
Elke Bregler, Principal Service Architect, Microsoft IT
To facilitate continuous operations, Microsoft needs its SAP enterprise resource planning (ERP) system to be available every day, around the clock. By using built-in features in Windows Server 2012 and SQL Server 2012, the company maintains 99.95 percent uptime, avoids spending millions of dollars, reduces risk, cuts disk space requirements threefold, and boosts productivity.
This case study is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.
Document published December 2013
Business Needs
Microsoft is the worldwide leader in software, services, and solutions. To manage people, assets, services, and projects, every employee uses the company’s SAP ERP system. Customers also use it when they access the Microsoft Marketplace. Because SAP ERP is mission-critical, engineers need to keep it continuously available. “The cost of SAP ERP downtime really depends on the time of day and when it occurs on the calendar,” says Elke Bregler, Principal Service Architect at Microsoft IT. “Five minutes might not cost us very much if it happens in the middle of the night during a slow period. However, five minutes of downtime during a busy period could cost us millions of dollars.”
Previously, IT personnel made required system changes, such as installing software patches or adding new business processes, only on weekends in the middle of the night, because any modification elevated risk and required anywhere from 30 minutes to several hours of downtime. Bregler says, “Whenever we made a system change, I would be nervous during the whole process because there was so much business risk involved. I felt like I aged five years each time.”
The system’s size and load also make it harder to keep continuously available. For example, in peak months, SAP ERP processes more than 125 million transactions. In an average month, the system manages 4.6 million batch processes. It also requests about 120 terabytes (TB) of data from its 5.4 TB database, which resides on a storage area network (SAN) with 402 petabytes of data.
To increase uptime and still scale the system to meet evolving requirements, IT personnel sought tools that could boost efficiency and reduce downtime and risk.
Solution
Upgrading the SAP ERP system to run on the Windows Server 2012 Enterprise operating system and Microsoft SQL Server 2012 Enterprise Edition software helped engineers meet these goals. For example, IT personnel eliminate a single point of failure by using Windows Server 2012 Failover Clustering to run two replicated instances of SAP ERP. The cluster for the SAP Central Services Instance (CI), which is the single point of failure for SAP systems, runs on one HP ProLiant DL380 G5 server in an active/passive configuration. The same configuration exists at the disaster-recovery site. If a disaster (real or test) occurs, failover of the SAP CI is expedited by using the same alias on both clusters. As a result, no one has to reconfigure SAP ERP to recognize a different server name as primary when failover occurs. Instead, engineers only need to point the alias in the Windows Server Domain Name System (DNS) to the IP address of whichever cluster is now primary.
At the database level, the IT team uses page-level compression in SQL Server 2012 to save disk space. Engineers also use SQL Server AlwaysOn availability groups to maintain three database replicas, each running on a separate HP ProLiant DL580 G7 server. The primary and secondary replicas, which are synchronous mirrors, each connect to an EMC Symmetrix VMAX enterprise SAN. The two SANs are also mirrors, and each attaches to a different cluster to reduce risk. The third database replica, located at a remote data center for disaster-recovery purposes, is updated through asynchronous replication.
To make a change to the SAP ERP system, engineers take the passive cluster or database replica offline and make the modification. After verifying that the offline cluster or replica is working, an engineer adds it back into production as the passive cluster or replica, and it automatically synchronizes with the active one. To update the active system, an engineer makes it the passive one and then follows the same procedure. As an extra safeguard, engineers also maintain several complete replicas of the SAP ERP environment for test and development.
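The value of a shared alias is that SAP ERP is configured once with a name and never with a server address, so a site failover becomes a single DNS record change. The following is a minimal in-memory sketch of that idea; the alias `sapci`, the IP addresses, and the `DnsZone`/`SapClient` classes are all hypothetical stand-ins, not Windows Server DNS APIs.

```python
from dataclasses import dataclass, field


@dataclass
class DnsZone:
    """Hypothetical stand-in for a DNS zone: host name -> IP address."""
    records: dict[str, str] = field(default_factory=dict)

    def resolve(self, name: str) -> str:
        return self.records[name]

    def repoint(self, name: str, new_ip: str) -> None:
        # Failover is one record change; clients keep using the same name.
        self.records[name] = new_ip


@dataclass
class SapClient:
    """An SAP application server configured with the CI alias, never an IP."""
    ci_alias: str

    def connect(self, dns: DnsZone) -> str:
        return dns.resolve(self.ci_alias)


PRIMARY_SITE_IP = "10.0.0.10"  # hypothetical primary-site cluster address
DR_SITE_IP = "10.1.0.10"       # hypothetical disaster-recovery cluster address

dns = DnsZone({"sapci": PRIMARY_SITE_IP})
app_server = SapClient(ci_alias="sapci")

print(app_server.connect(dns))    # 10.0.0.10
dns.repoint("sapci", DR_SITE_IP)  # disaster (real or test): move the alias
print(app_server.connect(dns))    # 10.1.0.10, SAP configuration unchanged
```

Note that `app_server` itself is never modified; only the record behind the alias moves, which mirrors why no one has to change SAP ERP during a failover.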
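The difference between the synchronous secondary and the asynchronous disaster-recovery replica can be illustrated with a toy model. This is a conceptual sketch only, assuming a simplified log of numbered records; the `Replica` and `AvailabilityGroupModel` names are hypothetical and do not correspond to any SQL Server API, and real availability groups add log hardening, redo, and health detection that are not modeled here.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    synchronous: bool
    log: list[int] = field(default_factory=list)   # hardened log records
    pending: deque = field(default_factory=deque)  # sent but not yet applied


class AvailabilityGroupModel:
    """Toy model: one primary plus synchronous and asynchronous secondaries."""

    def __init__(self, secondaries: list[Replica]) -> None:
        self.primary = Replica("primary", synchronous=True)
        self.secondaries = secondaries
        self.next_lsn = 1

    def commit(self) -> int:
        lsn = self.next_lsn
        self.next_lsn += 1
        self.primary.log.append(lsn)
        for replica in self.secondaries:
            if replica.synchronous:
                # Synchronous mode: the commit is acknowledged only after this
                # secondary also holds the record, so it never falls behind.
                replica.log.append(lsn)
            else:
                # Asynchronous mode: the primary does not wait; the record is
                # queued and applied later, so this replica can lag.
                replica.pending.append(lsn)
        return lsn

    def drain_async(self) -> None:
        for replica in self.secondaries:
            while replica.pending:
                replica.log.append(replica.pending.popleft())

    def lag(self, replica: Replica) -> int:
        return len(self.primary.log) - len(replica.log)


secondary = Replica("secondary", synchronous=True)
dr = Replica("dr-site", synchronous=False)
ag = AvailabilityGroupModel([secondary, dr])
for _ in range(3):
    ag.commit()
print(ag.lag(secondary), ag.lag(dr))  # 0 3
ag.drain_async()
print(ag.lag(dr))  # 0
```

The design trade-off shows up in `commit`: waiting on the nearby synchronous mirror costs a little latency but loses nothing, while not waiting on the remote replica keeps commits fast at the price of a small lag.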
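The rolling procedure above can be sketched for a two-node pair as follows. This is an illustrative model under stated assumptions: the `Node` class, the `verify` callback, and the version strings are hypothetical, and resynchronization is not modeled. The point it shows is that the active node stays online throughout, and the only user-visible event is the single role switch.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Node:
    name: str
    version: str
    role: str           # "active" or "passive"
    online: bool = True


def patch_passive(node: Node, new_version: str,
                  verify: Callable[[Node], bool]) -> None:
    """Take a passive node offline, change it, verify it, and bring it back."""
    if node.role != "passive":
        raise ValueError("only the passive node is ever taken offline")
    node.online = False
    node.version = new_version
    if not verify(node):
        raise RuntimeError(f"{node.name} failed verification; left offline")
    node.online = True  # rejoins as passive and resynchronizes (not modeled)


def rolling_update(a: Node, b: Node, new_version: str,
                   verify: Callable[[Node], bool]) -> Node:
    """Update both nodes with one role switch; return the node now active."""
    active, passive = (a, b) if a.role == "active" else (b, a)
    patch_passive(passive, new_version, verify)
    # Promote the freshly patched node; this switch is the brief interruption.
    active.role, passive.role = "passive", "active"
    patch_passive(active, new_version, verify)  # former active is now passive
    return passive


n1 = Node("node1", "v1", "active")
n2 = Node("node2", "v1", "passive")
now_active = rolling_update(n1, n2, "v2", verify=lambda n: n.version == "v2")
print(now_active.name, n1.version, n2.version)  # node2 v2 v2
```

If verification fails on the first node, the exception fires before any role switch, so the untouched active node keeps serving users.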
Benefits
By using features in SQL Server 2012 and Windows Server 2012, engineers increased the uptime of the mission-critical system; avoided unnecessary fees, risk, and stress; and boosted savings and efficiency.
Avoids Millions in Expenses with 99.95 Percent Uptime
Today, the SAP ERP system is available 99.95 percent of the time. As a result, Microsoft avoids paying millions of dollars in fees and lost productivity. “We can now install software security patches on our mission-critical SAP ERP system and accrue no more than one minute of downtime by using SQL Server 2012 AlwaysOn and Windows Failover Clustering,” says Bregler. “We also replaced some of the CPUs in the SAP ERP database servers and migrated our SAN to new hardware, and in both cases, we did it with just one minute of downtime.” Engineers are using the same high-availability design to achieve similar results in other environments.
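To put 99.95 percent in concrete terms, the permitted downtime per period is simply the unavailable fraction times the period length. The short calculation below assumes a 365-day year and a 30-day month; those period lengths are our simplification, not figures from the case study.

```python
MINUTES_PER_YEAR = 365 * 24 * 60         # 525,600 (leap years ignored)
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200


def downtime_budget(availability_pct: float, period_minutes: int) -> float:
    """Minutes of downtime allowed in a period at the given availability."""
    return (1 - availability_pct / 100) * period_minutes


print(round(downtime_budget(99.95, MINUTES_PER_YEAR), 1))          # 262.8
print(round(downtime_budget(99.95, MINUTES_PER_30_DAY_MONTH), 1))  # 21.6
```

Under these assumptions, a single 30-minute maintenance window, the low end of the old change process, would by itself exceed a 30-day month's 21.6-minute budget, whereas a one-minute change uses under 5 percent of it.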
Minimizes Business Risk
Although unplanned downtime is rare, it can happen. Engineers have dramatically reduced the impact of an unexpected error just by using features in Windows Server 2012 and SQL Server 2012. For example, running SAP ERP on failover clusters means that individual servers, or even an entire cluster, can fail while the application continues to run. In addition, engineers can easily maintain three copies of the production database. On the off chance that the primary and secondary replicas go offline simultaneously, engineers can fail over to the third copy: because it is updated asynchronously, it will not yet reflect the error. IT personnel also reduce risk and streamline the approval process for system modifications by first making changes in the test environment. Engineers have also expedited disaster-recovery processes by requiring ongoing drills in the test environment. “We have reduced our recovery point objective to just a few seconds and our disaster-recovery processes to just four hours,” says Bregler.
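The failover choice described above amounts to picking the first healthy replica in priority order and noting whether it was asynchronous. The sketch below assumes a simple priority list (primary, synchronous secondary, then the remote copy); the `ReplicaStatus` class and both function names are hypothetical, and a real failover decision also involves quorum and health checks that are not shown.

```python
from dataclasses import dataclass


@dataclass
class ReplicaStatus:
    name: str
    online: bool
    synchronous: bool  # False for the remote disaster-recovery copy


def choose_failover_target(replicas: list[ReplicaStatus]) -> ReplicaStatus:
    """Return the first online replica, assuming the list is in priority order."""
    for replica in replicas:
        if replica.online:
            return replica
    raise RuntimeError("no replica is online")


def may_lose_recent_commits(target: ReplicaStatus) -> bool:
    # An asynchronous replica can trail the primary, so promoting it risks
    # losing the last few seconds of work: the recovery point objective window.
    return not target.synchronous


replicas = [
    ReplicaStatus("primary", online=False, synchronous=True),
    ReplicaStatus("secondary", online=False, synchronous=True),
    ReplicaStatus("dr-site", online=True, synchronous=False),
]
target = choose_failover_target(replicas)
print(target.name, may_lose_recent_commits(target))  # dr-site True
```

Keeping the asynchronous copy last in the list reflects the trade-off: it is the option of last resort precisely because it may be a few seconds behind.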
Cuts Disk Space Threefold
Microsoft has minimized its data-center requirements and costs with database compression. Because IT staff have used different levels of compression for years, it is difficult to measure how much space and money engineers have saved with it. However, Bregler says, “We recently created a custom table for a new business process that had 1.5 billion rows and was 1.5 terabytes in size. By using page-level compression in SQL Server 2012, the table now spans 6 billion rows and yet it is only 400 gigabytes in size.”
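Bregler’s figures can be checked with simple arithmetic. The sketch below assumes decimal units (1 TB = 1,000 GB) and treats the before and after rows as comparable, which the quote does not state, so the per-row numbers are only a rough illustration.

```python
GB = 10**9  # decimal units assumed

before_bytes, before_rows = 1_500 * GB, 1_500_000_000  # 1.5 TB, 1.5 billion rows
after_bytes, after_rows = 400 * GB, 6_000_000_000      # 400 GB, 6 billion rows

size_ratio = before_bytes / after_bytes            # table shrank 3.75x overall
bytes_per_row_before = before_bytes / before_rows  # uncompressed bytes per row
bytes_per_row_after = after_bytes / after_rows     # page-compressed bytes per row

print(size_ratio, bytes_per_row_before, round(bytes_per_row_after, 1))
# 3.75 1000.0 66.7
```

Even though the row count quadrupled, the table ended up about 3.75 times smaller, which is consistent with the threefold savings cited above.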
Improves Efficiency and Peace of Mind
By running SAP ERP on Windows Server 2012 and SQL Server 2012, business employees have become more productive because they can dependably access a core tool. IT personnel are also more efficient and less stressed because they can work on the system while the replica application, database, SAN, or server supports operations. As a result, “We can now make changes to our SAP ERP system during regular business hours, which makes us more efficient,” says Bregler. Not only are fewer errors made because people are rested, but more IT personnel are also available for questions, and spare parts arrive faster. Bregler adds, “Windows Failover Clustering and SQL Server 2012 AlwaysOn have really given us much greater freedom and peace of mind. I’m also not aging as fast anymore because we have greatly reduced risk, and we have a lot more options for maintaining our mission-critical systems.”