Deployment Planning, Architecture, and Guidance on System Center Service Manager
Published: October 2010
Microsoft IT implemented System Center Service Manager to improve its IT service management. Service Manager provides built-in processes for incident and problem resolution, change control, and asset lifecycle management. It automatically connects knowledge and information from System Center Operations Manager, System Center Configuration Manager, and Active Directory.
Introduction
The Management Systems Division (MSD) team in Microsoft IT is responsible for planning and deploying System Center Service Manager across the management service support groups, which include Microsoft.com, Windows Update, MSDN, and TechNet. The MSD team used System Center Service Manager to implement an incident-management solution that manages all of the incidents from the more than 5000 servers that run these services. These servers generate approximately 3500 incidents per week and are supported by more than 100 support engineers distributed throughout the support groups and across the globe.
Creating the Incident-Management Solution
The MSD team identified the following goals for the new incident-management solution:
- Provide a single solution for multiple distributed support teams (Microsoft.com, Windows Update, MSDN, and TechNet).
- Replace the in-house tools that had been developed over many years with a single solution. These tools had become expensive to maintain and did not work together seamlessly.
- Reduce the amount of time it takes to process an incident. The goal was to be able to resolve an incident or escalate it to the appropriate support team within 15 minutes.
- Provide the support teams with a distributed reporting capability. The MSD team knew that one type of reporting service could not meet all of the various support group requirements. The team therefore wanted to distribute the reporting capability so that the support groups could produce whatever customized reports they needed.
- Standardize and centralize all of the troubleshooting guides. Over the years, the team had created over 1000 different troubleshooting guides that the support engineers used. The team wanted to bring all of the troubleshooting guides into a central place and automatically get them to the support engineers to quickly resolve incidents. The team expected that this would help reduce the time it takes to process incidents.
Team
A three-person team designed and built the solution over a four-month period. The team included:
- A program manager who did all of the database extensions and form customizations
- A developer who created a custom C# program for the workflow
- An administrator who administered the testing system while the team worked out the solution
The Solution
When an alert comes in to System Center Operations Manager, the alert is sent through a connector to System Center Service Manager. A Tier 1 team evaluates the incident and has 15 minutes to either resolve the incident or get it assigned to the appropriate senior support engineer or application group.
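The 15-minute target can be illustrated with a small check. This is only a sketch of the rule as stated in the text; the function name, its parameters, and the status labels are hypothetical and are not part of System Center Service Manager.

```python
from datetime import datetime, timedelta

TRIAGE_WINDOW = timedelta(minutes=15)  # target stated in the text


def triage_status(created, now, resolved=False, escalated=False):
    """Classify a Tier 1 incident against the 15-minute target (hypothetical helper)."""
    if resolved or escalated:
        return "handled"
    remaining = TRIAGE_WINDOW - (now - created)
    return "overdue" if remaining < timedelta(0) else "pending"


# Made-up timestamps for illustration
created = datetime(2010, 10, 1, 9, 0)
print(triage_status(created, datetime(2010, 10, 1, 9, 10)))                  # pending
print(triage_status(created, datetime(2010, 10, 1, 9, 20)))                  # overdue
print(triage_status(created, datetime(2010, 10, 1, 9, 20), escalated=True))  # handled
```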
The MSD team automated the assignment of an asset map to each incoming incident. An asset map describes the service. For example, Microsoft.com is made up of many applications, and each application has multiple parts. The asset map describes which applications make up the service, with computers as the leaf nodes. When an incident comes in, the incident is automatically associated with a computer and is then assigned to an engineer responsible for that computer. An engineer can use the map to determine exactly where a particular incident is affecting an application in the service.
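One way to picture an asset map is as a tree whose leaves are computers. The sketch below is illustrative only: it is not the team's C# workflow or the Service Manager data model, and the class, the function names, the sample data, and the rule that a computer inherits its engineer from the nearest ancestor that names one are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AssetNode:
    """One node in a hypothetical asset map: service, application, part, or computer."""
    name: str
    owner: Optional[str] = None  # responsible engineer, if named at this level
    parent: Optional["AssetNode"] = field(default=None, repr=False, compare=False)
    children: list = field(default_factory=list)

    def add(self, child: "AssetNode") -> "AssetNode":
        child.parent = self
        self.children.append(child)
        return child

    def path(self) -> list:
        """Names from the service root down to this node."""
        names, node = [], self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names[::-1]


def index_computers(root: AssetNode) -> dict:
    """Map each leaf (computer) name to its node."""
    leaves, stack = {}, [root]
    while stack:
        node = stack.pop()
        if node.children:
            stack.extend(node.children)
        else:
            leaves[node.name] = node
    return leaves


def route_incident(computer: str, computers: dict):
    """Return (asset path, engineer) for an incident raised on `computer`."""
    node = computers[computer]
    path = node.path()
    # Assumption: walk up the map until some level names an owner.
    while node is not None and node.owner is None:
        node = node.parent
    return path, (node.owner if node is not None else None)


# Made-up example data
service = AssetNode("Microsoft.com")
search = service.add(AssetNode("Search", owner="engineer-a"))
web = search.add(AssetNode("Web front end"))
web.add(AssetNode("WEB-01"))
web.add(AssetNode("WEB-02", owner="engineer-b"))
computers = index_computers(service)

print(route_incident("WEB-01", computers))
# (['Microsoft.com', 'Search', 'Web front end', 'WEB-01'], 'engineer-a')
```

The returned path is what lets an engineer see where in the service an incident sits, while the leaf lookup keeps the association from computer to map position a single dictionary access.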
The automated process also assigns the appropriate Knowledge Base guidance for a particular computer or asset map. For example, if there is a SQL Server issue, the Knowledge Base document tells the engineer what diagnostic steps to run. Tier 1 and Tier 2 engineers can then follow pre-written instructions for a given problem.
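A lookup of this kind can be sketched as follows. The dictionary layout, the function name, the guide titles, and the choice to list the most specific guidance first are assumptions for illustration; the source does not describe how the Knowledge Base association is stored.

```python
def guidance_for(asset_path, knowledge_base):
    """Collect guide titles for a computer, most specific asset level first.

    `asset_path` lists asset names from the service down to the computer;
    `knowledge_base` maps an asset name to its guide titles (both hypothetical).
    """
    guides = []
    for asset in reversed(asset_path):
        guides.extend(knowledge_base.get(asset, []))
    return guides


# Made-up example data
knowledge_base = {
    "SQL back end": ["SQL Server diagnostic steps"],
    "SQL-01": ["SQL-01 failover notes"],
}
asset_path = ["Microsoft.com", "Search", "SQL back end", "SQL-01"]

print(guidance_for(asset_path, knowledge_base))
# ['SQL-01 failover notes', 'SQL Server diagnostic steps']
```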
Based on the expected incident volume, the team chose to implement System Center Service Manager in a four-server configuration.
The System Center Service Manager Console and Authoring Tool
The System Center Service Manager console provides a comprehensive look at the whole system. In the console, an engineer can look at computers, views of incidents, work items, administrative tasks, and the data warehouse. The engineer can also run reports from the console. The MSD team took advantage of the powerful views capability in System Center Service Manager to create different views for each of the different support groups. This makes it possible for each of the support groups to look at only the incidents appropriate to their area.
The team used the Authoring Tool that comes with System Center Service Manager to extend the database and forms. For example, the team added Computer, Event ID, and Event Source fields to the database and added columns for those fields to the views. Engineers can sort on the Computer column to quickly see all of the incidents associated with a particular computer. They can use the Event ID column to quickly look at one or more Event IDs to see how many computers are affected. By adding columns to the views, engineers can see what is happening across multiple incidents and across multiple computers and properties.
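The two questions these columns answer (which incidents belong to a computer, and how many computers share an Event ID) amount to a sort and a grouping. The sketch below shows that logic on made-up records; the field names and sample values are hypothetical, and in practice the console views do this work.

```python
from collections import defaultdict

# Made-up incident records carrying the three added fields
incidents = [
    {"id": 101, "computer": "WEB-02", "event_id": 1001, "event_source": "W3SVC"},
    {"id": 102, "computer": "SQL-01", "event_id": 833, "event_source": "MSSQLSERVER"},
    {"id": 103, "computer": "WEB-01", "event_id": 1001, "event_source": "W3SVC"},
    {"id": 104, "computer": "WEB-02", "event_id": 1001, "event_source": "W3SVC"},
]

# Sorting on the Computer column puts each computer's incidents side by side.
by_computer = sorted(incidents, key=lambda inc: inc["computer"])


def computers_per_event(records):
    """Count the distinct computers affected by each Event ID."""
    affected = defaultdict(set)
    for inc in records:
        affected[inc["event_id"]].add(inc["computer"])
    return {event_id: len(names) for event_id, names in affected.items()}


print([inc["id"] for inc in by_computer])  # [102, 103, 101, 104]
print(computers_per_event(incidents))      # {1001: 2, 833: 1}
```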
The team also used the Authoring Tool to create two workflows. The team created a C# program to automate the workflow that processes incidents as they enter the system and fills out the asset map. The team created another workflow, the SendMail workflow, to enable an engineer to send an incident to another engineer or team.
The support teams use the built-in reporting capability in System Center Service Manager to create daily and weekly reports. For example, a support engineer can produce a report that displays all of the incidents that the engineer has worked on in the last two days, or a report that displays all of the incidents from different parts of the asset map or from different computers. Engineers use these reports as planning and troubleshooting guides. The MSD team uses SQL Server Reporting Services to distribute the capability for writing custom reports. The team produced some sample SQL Server Reporting Services reports and gave these to the different support groups. The support groups can customize these reports and run them whenever they want.
Action Log and History
The MSD team makes extensive use of the Action Log and History. Whenever an engineer takes any action on an incident, the engineer adds comments to the Action Log. Every time the incident is touched, either by the automated program or by a support engineer, it is recorded in the History. Absolutely everything is recorded. This gives the team a complete and accurate record if they need to know exactly what happened: who worked on an incident and what actions they took.
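The difference between the two records can be sketched in a few lines: the Action Log holds comments that engineers write, while the History captures every touch, whether automated or human. The class below is a hypothetical model for illustration and does not reflect how Service Manager stores either record.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Incident:
    """Hypothetical incident with an engineer-written log and an automatic history."""
    title: str
    assigned_to: str = ""
    action_log: list = field(default_factory=list)  # (engineer, comment)
    history: list = field(default_factory=list)     # (timestamp, actor, change)

    def _record(self, actor, change):
        self.history.append((datetime.now(timezone.utc), actor, change))

    def assign(self, actor, engineer):
        """Reassign the incident; the change is recorded whoever makes it."""
        self._record(actor, f"assigned_to: {self.assigned_to!r} -> {engineer!r}")
        self.assigned_to = engineer

    def comment(self, engineer, text):
        """Add an engineer comment; this is itself a touch, so History records it too."""
        self.action_log.append((engineer, text))
        self._record(engineer, "added Action Log comment")


# Made-up example data
incident = Incident("Disk full on WEB-01")
incident.assign("workflow", "engineer-a")               # automated program
incident.comment("engineer-a", "Cleared temp files")    # support engineer

print([actor for _, actor, _ in incident.history])  # ['workflow', 'engineer-a']
print(incident.action_log)                          # [('engineer-a', 'Cleared temp files')]
```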
Conclusion
The MSD team in Microsoft IT used System Center Service Manager to quickly create a highly scalable incident-management solution for Microsoft.com, Windows Update, MSDN, and TechNet. The team uses the new solution to manage incidents from more than 5000 servers that generate approximately 3500 incidents per week.
System Center Service Manager enabled the MSD team to create an effective solution for multiple distributed teams. The console provides a comprehensive view of the service, flexible views to show specific information to different support teams, and daily and weekly reports that engineers can use for planning and troubleshooting purposes. The team used the Authoring Tool that comes with System Center Service Manager to extend the database, customize forms, and create workflows to automate their processes. They also used SQL Server Reporting Services to distribute the report-creation ability. The MSD team has found System Center Service Manager to be an easy-to-use and extensible platform.
For More Information
For more information about Microsoft products or services, call the Microsoft Sales Information Center at (800) 426-9400. In Canada, call the Microsoft Canada Order Centre at (800) 933-4750. Outside the 50 United States and Canada, please contact your local Microsoft subsidiary. To access information via the World Wide Web, go to:
© 2010 Microsoft Corporation. All rights reserved.
This document is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY. Microsoft, SQL Server, Windows, and Windows Server are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. The names of actual companies and products mentioned herein may be the trademarks of their respective owners.