Reading: Monitor and optimise network components and servers

Requirements for monitoring network components and servers

Why monitor?

Service level agreements (SLAs)

Performance and optimisation

Benchmarks

Change management

The monitoring process

What to monitor?

Network monitoring

Server monitoring

Benchmarking and documentation

System optimisation

Analysis and interpretation of monitoring results

Network optimisation

Server optimisation

Change management

Monitoring and optimisation tools and utilities

Native tools and utilities

Third party tools and utilities

Summary

Requirements for monitoring network components and servers

Monitoring network components and servers is an important part of a Network/Systems Administrator’s duties. Monitoring may be an automated or a manual task, and it generally forms part of a routine maintenance schedule for servers and the network. A network can consist of many different components, and you’ll need to use appropriate tools and methods to undertake useful monitoring.

Why monitor?

Once a network or server has been installed, how do you know it is working as it should? Just like a car or any appliance, it may need maintenance or parts replaced to keep it in top working order. Network and server monitoring allows the Network Administrator to see how hardware and software are performing. We can look for signs or warnings that the system is not working efficiently and take action to prevent system degradation or failure.

We monitor network components and servers to:

  • gather data on the operational status and performance of servers and network components
  • ensure the integrity and availability of the network and services so that there is no adverse impact on business operations
  • detect any security breaches or unauthorised network activity
  • gather historical data to enable planning for future capacity, required services and performance enhancements of the network and servers
  • measure whether obligations to a client regarding agreed service levels (SLA) or maintenance of the client’s network and servers are being met.

Service level agreements (SLAs)

A service level agreement (SLA) is a formal and binding agreement between organisations and/or individuals. Within the IT environment, an SLA is drafted between an IT service provider/vendor and its client/customer. The SLA clearly states what levels of IT service the IT service provider will provide and maintain. These levels of service may cover points such as software application response times, the amount of file storage space available to users, and acceptable internet access and download times. All parties to the SLA agree to these levels of service.

The SLA should clearly define the following:

  • Services covered by the SLA – lists what will be covered: software, hardware, applications or systems. This may include an audit list of agreed equipment and software. For example, ‘Six Windows 2003 standard servers, two Linux Web Apache servers, switched network environment comprising four HP 2425 managed switches and associated cabling, system, network and web administration.’
  • Responsibilities of the IT service provider – outlines what the service provider is responsible for and will do. For example, the service provider may be responsible for providing a helpdesk contactable by telephone, or for ensuring that at least one service technician is on the client’s site at all times.
  • Responsibilities of the client – lists what the client or customer must do with respect to the SLA. For example, ‘The client must provide a complete list of IT equipment to the service provider and advise 28 days in advance of any proposed network changes’, or ‘The client must check that the software or equipment has been operated correctly before logging a call for assistance.’
  • Clear definition of the service level – lists in detail what is covered by the SLA. The list can be long and detailed, for example:

      – Email services: ‘Email will be delivered anywhere on the corporate network within one hour of dispatch.’
      – File storage: ‘Up to 1 GB of disk storage will be available for every user.’
      – System administration: ‘The service provider will create and maintain network user accounts for the organisation.’

  • Response times – specifies how long the service provider will take to respond to and fix any reported faults or requests for assistance. Several times are usually listed, along with the coverage period (eg 24 hours, or 9am–5pm Monday to Friday). For example, ‘When a telephone call to the helpdesk is made it will be answered within 60 seconds, a technician will respond to the call within 2 hours, and a fix or resolution will be provided within 2 working days.’
  • Criteria and measurement – this is how the service levels are measured. For example, ‘Application response times or network bandwidth availability may be measured using an agreed third party utility. Response times for requests or faults will be logged in a database accessible by both parties.’
  • Procedures – describes how things will be done. This may cover how requests for help will be made, for example by telephone, email or the web.
  • Penalties or consequences – outlines what will happen if the service provider or client fails to meet its obligations under the SLA. For example, the failing party may have to pay a financial penalty.
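Response-time targets like those in the example above can be checked mechanically once calls are logged with timestamps, as the criteria and measurement point suggests. The sketch below is a minimal Python illustration, assuming each logged call records when it was received, answered and responded to. The function name, call IDs and times are hypothetical, and the 60-second and 2-hour targets are simply the example figures from the SLA wording above.

```python
from datetime import datetime, timedelta

# Targets copied from the example SLA wording above; a real SLA sets its own values.
ANSWER_TARGET = timedelta(seconds=60)  # helpdesk call answered within 60 seconds
RESPOND_TARGET = timedelta(hours=2)    # technician responds within 2 hours


def sla_breaches(call_id, received, answered, responded):
    """Return (call_id, target) pairs for each response-time target this call missed.

    Elapsed time is plain wall-clock time; the coverage period (eg 9am-5pm
    Monday to Friday) and the 'working days' fix target are not modelled here.
    """
    missed = []
    if answered - received > ANSWER_TARGET:
        missed.append((call_id, "answer within 60 s"))
    if responded - received > RESPOND_TARGET:
        missed.append((call_id, "respond within 2 h"))
    return missed


# Two hypothetical entries from a call-logging database shared by both parties.
t0 = datetime(2024, 3, 4, 9, 0, 0)
print(sla_breaches("C1", t0, t0 + timedelta(seconds=45), t0 + timedelta(hours=1)))
# []
print(sla_breaches("C2", t0, t0 + timedelta(seconds=90), t0 + timedelta(hours=3)))
# [('C2', 'answer within 60 s'), ('C2', 'respond within 2 h')]
```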

The SLA is designed to protect both service providers and clients with respect to IT support, and it clearly sets out expectations and obligations. Monitoring network components and servers ties into SLA processes because it provides the measurements used to check whether agreed service levels are being met.

Performance and optimisation

Network and server installations vary in requirements and configurations. Once software and equipment have been installed, they may require some tuning or optimisation to work and perform at optimum levels. You will need to undertake some monitoring to determine current performance, and then determine whether it can be improved or optimised. There are various tools and utilities you can use for this process, which are discussed later in this reading.

Benchmarks

Network and server monitoring is required to establish benchmarks.

A benchmark is the result of an objective test used to measure the performance of a computer system relative to some known standard. Benchmarking your network and servers gives you a starting point for performance and optimisation. It provides information such as bandwidth and data throughput that can be compared with manufacturers’ performance specifications. You can then attempt to improve or optimise the performance of the components, using these test results as indicators for improvement. Benchmarking programs are a great way to see the relative performance increase that your tweaks and changes have achieved.

Once you have optimised your network and servers, the results of your final benchmark tests will become your baseline, that is, what you will compare future monitoring and test results against. The baseline is the level at which your system should perform, and any future test results below this level indicate system deterioration.
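Comparing later results with the baseline can be as simple as the following Python sketch. It assumes baseline figures where higher is better (eg throughput) and treats anything more than 5% below baseline as deterioration. The metric names, figures and the 5% tolerance are illustrative assumptions rather than recommended values.

```python
# Hypothetical baseline figures recorded after optimisation (higher is better for both).
BASELINE = {"throughput_mbps": 940.0, "disk_read_mbps": 180.0}


def below_baseline(results, baseline=BASELINE, tolerance=0.05):
    """Return the metrics whose result is more than `tolerance` (a fraction) below baseline.

    The 5% tolerance is an assumption so that normal run-to-run variation
    is not reported as deterioration.
    """
    flagged = {}
    for metric, base in baseline.items():
        value = results.get(metric)
        if value is not None and value < base * (1 - tolerance):
            flagged[metric] = value
    return flagged


print(below_baseline({"throughput_mbps": 870.0, "disk_read_mbps": 178.0}))
# {'throughput_mbps': 870.0}
```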

Change management

Many organisations have strict policies and procedures regarding how changes in the existing IT environment are managed. This reduces the risk of disruption to business operations as a consequence of a failed change in the IT environment. The monitoring of network components and servers is part of the change management process. The impact of a change in the IT environment is determined by monitoring before and after the change activity.

Changes that may require monitoring for adverse effects on an existing IT environment include hardware and software additions, removals or upgrades, and any configuration changes to existing hardware and software.

The monitoring process

Understanding the purpose of monitoring is the first step in developing a monitoring process. The next step is determining what to monitor and how.

What to monitor?

What are we going to monitor? Is it a network or a system?

Users and IT professionals often use the terms ‘network’ and ‘system’ interchangeably, but sometimes there is a distinction. The network is normally the infrastructure that provides users with access to data, information and services. Organisations may install and develop business applications on the network. These applications may also be known as systems. A network operating system, for example Windows Server 2003 Standard Edition, may also be considered a system.

Most applications today will run on a network. When analysing the performance of an application it is necessary to evaluate the way it is configured and used on the network and how network performance will impact the application. This includes the performance of the servers that run or interact with the application.

So that brings us back to our original question: what will we monitor? We will want to monitor everything that can impact network and system performance. So what can impact it?

Reflection

List all the components that make up a typical network.

Feedback

There are many components in a typical network. These include:

  • servers
  • workstations
  • printers
  • users
  • cable
  • hubs and switches
  • routers and bridges
  • server operating systems
  • desktop operating systems
  • server applications such as mail servers
  • database server hardware and software
  • applications
  • disks
  • utilities.

All of these can have an impact on performance.

Network monitoring

In the past, network monitoring was predominantly used to watch the communication paths used to transmit and receive data, to make sure all was working as expected. Nowadays, network monitoring measures and watches performance and activity across a computer network, not just its communication paths.

Network monitoring generally means collecting operating data from network devices. This data can be stored in log files or databases and made available for analysis at some later date. The collected historical data can be reviewed to determine if the network device is operating correctly or to determine capacity needs or trends.

The graph below is an example of data collected from a network device over a 24-hour period.

Figure 1: Network monitoring with data displayed in graphical form

The information relates to data packets in and out of the device. From this information we can see that the busy time for data traffic is between 09:00 and 15:00 and that the device is hardly used outside of these times.
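Reading the busy period off collected data can also be automated. This Python sketch assumes the device’s traffic has already been summarised into 24 hourly totals and, as an arbitrary rule, calls an hour busy if it reaches half of the day’s peak. The numbers are invented to resemble the pattern described above; they are not the data behind Figure 1.

```python
def busy_hours(hourly_packets, fraction=0.5):
    """Return the hours (0-23) whose packet count is at least `fraction` of the daily peak."""
    peak = max(hourly_packets)
    return [hour for hour, count in enumerate(hourly_packets) if count >= fraction * peak]


# Hypothetical 24 hourly totals (packets in + out); not the data behind Figure 1.
samples = [5, 4, 3, 3, 4, 6, 10, 40, 120, 900, 950, 980,
           1000, 970, 940, 300, 200, 60, 20, 10, 8, 6, 5, 5]
hours = busy_hours(samples)
# Reports first to last busy hour, assuming the busy hours form one block.
print(f"busy from {hours[0]:02d}:00 to {hours[-1] + 1:02d}:00")
# busy from 09:00 to 15:00
```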

If the purpose of monitoring is to ensure availability or connectivity between network devices, real-time monitoring without data collection may be used. No data is stored; only a check test or device poll is performed to see if the network device is up and running. Should a network device stop working, the monitoring software generates an alert or notification.
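A device poll of this kind can be sketched in Python as a TCP connection attempt to a port the device is known to listen on. Many monitoring tools use ICMP ping instead, but sending raw ICMP packets usually needs administrator rights, so this sketch swaps in a TCP connect check. The device names, addresses, ports and two-second timeout are assumptions, and printing stands in for a real alert such as an email or SMS.

```python
import socket


def is_up(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, host unreachable or timed out
        return False


def poll(devices):
    """Poll each (name, host, port) once; print an alert and return the names that are down."""
    down = []
    for name, host, port in devices:
        if not is_up(host, port):
            print(f"ALERT: {name} ({host}:{port}) is not responding")
            down.append(name)
    return down
```

For example, `poll([("core-switch", "192.0.2.10", 22), ("web-server", "192.0.2.20", 80)])` would check two hypothetical devices once; a monitoring system would repeat this on a schedule.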

What is monitored on network devices depends upon the type of device. Generally the following should be monitored:

  1. availability – the operational state, up and working
  2. network traffic statistics – utilisation, throughput, errors
  3. connection details – user connections and activity
  4. resource usage – utilisation of device resources like memory, etc
  5. activity and alert logs – local logs on the device.
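Utilisation, listed under network traffic statistics, is usually worked out from byte (octet) counters that the device keeps for each interface; with SNMP these are the IF-MIB counters such as ifInOctets and ifOutOctets. A minimal Python sketch of the arithmetic follows, assuming two readings of a 32-bit counter taken a known interval apart. The readings and link speed are made-up values, and fetching them from a real device is left out.

```python
COUNTER_WRAP_32 = 2**32  # 32-bit octet counters roll over to zero at this value


def octet_delta(previous, current, wrap=COUNTER_WRAP_32):
    """Octets counted between two readings, allowing for at most one counter roll-over."""
    if current >= previous:
        return current - previous
    return wrap - previous + current


def utilisation_percent(prev_octets, curr_octets, interval_s, link_bps):
    """Percentage of link capacity used in one direction over the polling interval."""
    bits = octet_delta(prev_octets, curr_octets) * 8
    return 100.0 * bits / (interval_s * link_bps)


# Hypothetical inbound readings taken 300 s apart on a 100 Mbit/s link.
print(round(utilisation_percent(1_000_000_000, 2_125_000_000, 300, 100_000_000), 1))
# 30.0
```

On fast links a 32-bit counter can wrap more than once between polls, which is why 64-bit counters (ifHCInOctets, ifHCOutOctets) are preferred where the device supports them.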

If monitoring is undertaken for network security reasons, special attention should be given to connection details and user activity on the network device. The types of network traffic involving the device may also need closer examination. For example, repeated and persistent TCP/IP connection requests from an address outside the client network may indicate an attempt to break into the network.
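Spotting the pattern in the example above, many connection requests from one outside address, amounts to counting attempts per source address. The Python sketch below assumes the source addresses have already been extracted from a firewall or device log for a fixed time window (log parsing depends on the device and is left out). The internal range `192.168.0.0/16` and the threshold of 100 attempts are illustrative assumptions.

```python
import ipaddress
from collections import Counter

# Assumed internal address range; replace with the client network's own ranges.
INTERNAL_NETWORKS = [ipaddress.ip_network("192.168.0.0/16")]


def suspicious_sources(source_ips, threshold=100):
    """Return external addresses with at least `threshold` connection attempts in the window."""
    flagged = {}
    for ip, count in Counter(source_ips).items():
        addr = ipaddress.ip_address(ip)
        internal = any(addr in net for net in INTERNAL_NETWORKS)
        if count >= threshold and not internal:
            flagged[ip] = count
    return flagged


# Hypothetical source addresses already extracted from a firewall or device log.
attempts = ["203.0.113.5"] * 150 + ["192.168.1.20"] * 200 + ["198.51.100.7"] * 3
print(suspicious_sources(attempts))
# {'203.0.113.5': 150}
```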

The actual task of monitoring network devices can be a manual or automated process.

Manual monitoring processes involve accessing the network device, inspecting or collecting the device’s local log files, and manually running any native or third party monitoring utilities to collect required data or view the operational status of the device. Manual monitoring processes are repetitive tasks that require scheduled human activity and intervention.

Automated monitoring processes usually involve the use of specific software utilities that collect data from devices without human intervention using standard protocols like SNMP or syslog. The software can be configured to issue notifications or alerts to appropriate people triggered by defined event conditions.
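Syslog gives automated tools a standard starting point: each message begins with a priority value in angle brackets, where PRI = facility × 8 + severity, and severity runs from 0 (emergency) to 7 (debug). A minimal Python sketch of an alert rule built on that follows. Treating severity 3 (error) or worse as alert-worthy is an assumption, and the sample message is modelled on the example in RFC 3164.

```python
import re

SEVERITIES = ["emergency", "alert", "critical", "error",
              "warning", "notice", "informational", "debug"]
PRI_PREFIX = re.compile(r"<(\d{1,3})>")


def parse_pri(message):
    """Return (facility, severity) from a syslog message's <PRI> prefix, or None."""
    match = PRI_PREFIX.match(message)  # match() anchors at the start of the message
    if not match:
        return None
    pri = int(match.group(1))
    if pri > 191:  # highest valid value: facility 23, severity 7
        return None
    return pri // 8, pri % 8  # PRI = facility * 8 + severity


def should_alert(message, alert_at_or_below=3):
    """Assumed rule: alert on severity 3 (error) or anything more severe (lower number)."""
    parsed = parse_pri(message)
    return parsed is not None and parsed[1] <= alert_at_or_below


msg = "<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick on /dev/pts/8"
facility, severity = parse_pri(msg)
print(facility, SEVERITIES[severity], should_alert(msg))
# 4 critical True
```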

Server monitoring

Servers are considered network devices; however, there are additional monitoring considerations. Servers are usually configured to perform a specific role or provide a specific service (file server, application server, etc). Monitoring may need to look at the status of the server with respect to its role. For example, monitoring hard disk performance for a file server may be of value.

For any server, in addition to the network device monitoring options, you need to consider monitoring the following server resources:

  1. processor – the percentage of CPU time taken by various processes on the server
  2. memory – how the available memory is divided up and which processes use it
  3. disk – excessive disk activity, and read, write and paging performance
  4. network – volume of network traffic in and out of the server.
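Some of these figures can be collected with Python’s standard library alone. The sketch below, an illustration rather than a complete tool, records disk usage and a processor-load figure (the 1-minute load average per CPU, which is only available on Unix-like systems) and flags them against warning limits of 90% and 1.0 that are assumptions, not recommendations. Memory and network counters are not covered by the standard library; on Windows they are available in Performance Monitor, and the third-party psutil library offers `psutil.virtual_memory()` and `psutil.net_io_counters()`.

```python
import os
import shutil


def server_snapshot(path="/"):
    """Collect a few server resource figures using only the Python standard library."""
    usage = shutil.disk_usage(path)
    snapshot = {"disk_used_percent": round(100 * usage.used / usage.total, 1)}
    try:
        # 1-minute load average per CPU: a rough indicator of processor pressure.
        load_1min, _, _ = os.getloadavg()
        snapshot["load_per_cpu"] = round(load_1min / (os.cpu_count() or 1), 2)
    except (AttributeError, OSError):
        # os.getloadavg() does not exist on Windows.
        snapshot["load_per_cpu"] = None
    return snapshot


def resource_warnings(snapshot, disk_limit=90.0, load_limit=1.0):
    """Return the resources whose figures exceed the (assumed) warning limits."""
    found = []
    if snapshot["disk_used_percent"] > disk_limit:
        found.append("disk")
    load = snapshot.get("load_per_cpu")
    if load is not None and load > load_limit:
        found.append("processor")
    return found


print(server_snapshot())
```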

The importance of these resources and their impact on server performance are discussed further under ‘System optimisation’ in this reading.

As with network devices, the process of monitoring a server can employ manual methods or automated methods. Monitoring software and utilities can be either native to the server operating system (supplied as part of the OS) or third party utilities.

Again, what you monitor and how you monitor it will be determined by your organisational requirements and policies, and by your reasons for monitoring.

Benchmarking and documentation

As previously discussed, a benchmark is an objective test that can be used to measure the performance of a computer system. When an installation is complete, the baseline benchmark should be documented.

In relation to change management, benchmarking programs are a great way to see the relative performance increase that your tweaks and changes have achieved. Running a benchmark before and after a change will give you a good idea of where you stand.
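A before-and-after comparison reduces to a percentage change. The Python sketch below assumes several benchmark runs are taken on each side of the change and compares their medians, so that one unusual run does not skew the result; the workloads and figures are hypothetical. The `higher_is_better` flag makes a faster response time (a smaller number) count as an improvement.

```python
from statistics import median


def percent_change(before, after, higher_is_better=True):
    """Relative change from `before` to `after`, signed so that positive means an improvement."""
    change = 100.0 * (after - before) / before
    return change if higher_is_better else -change


def compare_runs(before_runs, after_runs, higher_is_better=True):
    """Compare the median of several benchmark runs taken before and after a change."""
    return percent_change(median(before_runs), median(after_runs), higher_is_better)


# Hypothetical results: file-copy throughput (MB/s), then page load time (s).
print(round(compare_runs([78, 80, 83], [99, 100, 104]), 1))  # 25.0
print(round(compare_runs([2.6, 2.5, 2.4], [2.0, 2.1, 1.9], higher_is_better=False), 1))  # 20.0
```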

Along with your benchmark data the monitoring process should be documented and become part of the Network and System Administrators’ procedures manual. This will ensure that all personnel involved in network and system maintenance or administration know what is being monitored and how it should be conducted.

Monitoring with data collection can produce large quantities of information. Your monitoring process should address what to do with this data, where it is stored, and how and when to analyse it. Monitoring should be meaningful and useful. It is possible to collect so much data that no one ever reviews it, at which point monitoring becomes a pointless task.

Your monitoring documentation should address:

  1. purpose of monitoring – why monitoring will be conducted (eg security, performance, network status) and what the outcomes of monitoring will be (eg optimisation, SLA review, ongoing capacity planning)
  2. roles and responsibilities – who will perform what tasks and what role management plays, as well as when and how to review the documentation
  3. what will be monitored – specifically state the monitoring requirements (eg disk space usage and I/O performance on all servers, network bandwidth utilisation)
  4. how monitoring is conducted – detail what utilities will be used and how to use them, including schedules and routines
  5. information management – where collected data will be stored and in what format, how long to keep data and how to archive it if required
  6. analysis process – what will happen with the collected data, how it will be analysed and for what purpose (eg capacity planning, looking for security breaches of the internal network)
  7. change management – what happens if monitoring and data analysis suggest that network changes are needed to achieve required outcomes (eg improved network performance or security), and how those changes will be implemented
  8. baseline data – list all relevant historical monitoring information, such as benchmarks and baselines.
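The information management item can be illustrated with a small Python sketch that appends timestamped samples to a CSV file and periodically moves samples older than a retention period into an archive file. The file format, the 90-day retention default and the file names in the usage note are assumptions; many sites would use a database or the monitoring tool’s own storage instead.

```python
import csv
from datetime import datetime, timedelta, timezone


def append_sample(path, metric, value, when=None):
    """Append one timestamped sample (UTC, ISO 8601) to a CSV file."""
    when = when or datetime.now(timezone.utc)
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([when.isoformat(), metric, value])


def prune(path, keep_days=90, archive_path=None, now=None):
    """Keep samples newer than `keep_days`; move older ones to `archive_path` if given.

    Returns the number of rows removed from the live file.
    """
    cutoff = (now or datetime.now(timezone.utc)) - timedelta(days=keep_days)
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    old = [r for r in rows if datetime.fromisoformat(r[0]) < cutoff]
    kept = [r for r in rows if datetime.fromisoformat(r[0]) >= cutoff]
    if archive_path and old:
        with open(archive_path, "a", newline="") as f:
            csv.writer(f).writerows(old)
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(kept)
    return len(old)
```

For example, `append_sample("cpu_samples.csv", "cpu_percent", 37.5)` could be called after each poll, and `prune("cpu_samples.csv", archive_path="cpu_archive.csv")` run nightly.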

System optimisation

Once we complete the installation of a server and its operating system, or of a network device, the system should be working. System optimisation means improving how well the system operates. This may involve changes to software or hardware configurations. System tests and monitoring will provide the information required to optimise a system. Be aware that optimisation may come at a cost, because some performance improvements may only be achievable by purchasing additional hardware or components.

Analysis and interpretation of monitoring results

Having run monitoring utilities and software you can start analysing the collected data. But what are you looking for? Is the system performing or not? Is the network really slow?

How will you know if the system is performing as it should? We live in an instant world. We want things to happen in seconds, not minutes. It was not that long ago that a user could wait for days to get a report, and that report could take hours of computer time to run and print. Now we expect the same report to flash up on a screen.

To analyse system performance you need to have either performance specifications or some form of benchmark.

Ideally, documentation that includes the specifications for the performance of hardware, applications and the network should be available from vendors and manufacturers. The user requirement statement for installations also states performance criteria. These may be expressed in absolute terms such as:

  • ‘The first screen of application X will appear within five seconds of clicking.’
  • ‘A ten-page report will take no more than ten minutes to create and five minutes to be printed.’
  • ‘Up to 1 GB of disk storage will be available for every user.’
  • ‘Email will be delivered anywhere on the corporate network within one hour of dispatch.’
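Absolute criteria like these can be checked by timing the operation and comparing the result with the stated limit. This Python sketch runs an operation several times and reports the worst time, on the reasoning that a requirement such as ‘within five seconds’ should hold for every attempt, not just on average. The workload is a stand-in; timing something like a first screen appearing would need a hook into the real application, which is assumed here rather than shown.

```python
import time


def meets_target(operation, target_s, runs=5):
    """Time `operation` over several runs; return (worst time in seconds, met target?)."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        timings.append(time.perf_counter() - start)
    worst = max(timings)
    return worst, worst <= target_s


# Stand-in workload; in practice `operation` would drive the real task, eg opening
# the first screen of application X (a hypothetical hook, not a real API).
worst, ok = meets_target(lambda: sum(range(1_000_000)), target_s=5.0)
print(f"worst {worst:.3f} s, meets 5 s target: {ok}")
```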

Alternatively, there may be relative benchmarks such as:

‘The application will work on the network no slower than 70% of the time that it would operate on a stand-alone machine using a P4 processor with 512 MB of RAM.’