An Overview of the NetIQ® XMP™ for Dell OpenManage™

Enterprise Systems Group (ESG)

Dell OpenManage™

Systems Management

Dell White Paper

By Joseph Santandrea

May 2002

Contents

Executive Summary

Key Points

Introduction

Features

Rule Groups

Configuration

Conclusion

Key Customer Benefits

Obtaining NetIQ XMP for Dell OpenManage

Appendix A

Dell OpenManage Server Agent 4.0 and Newer

Dell OpenManage Server Assistant 5.1.0

CI/O Management Software Shared Rules :: NetIQ

Dell OpenManage Remote Assistant Card (DRAC) Shared :: NetIQ

Dell OpenManage Remote Assistant Server Shared :: NetIQ

Dell OpenManage Shared Rules :: NetIQ

PERC - PowerEdge RAID Controller Shared Rules :: NetIQ

Section 1

Executive Summary

Microsoft® Operations Manager 2000 (MOM 2000) is an enterprise console for managing hardware systems. NetIQ® XMP™ for Server Hardware is software that extends the management capabilities of MOM 2000, enabling MOM 2000 environments to view, monitor, and manage server hardware from various vendors running Microsoft Windows®. The NetIQ XMP for Dell OpenManage™ is the XMP designed specifically for managing Dell™ PowerEdge™ servers and PowerVault™ storage.

XMP, short for Extended Management Pack, is a library of pre-defined management rules. The XMP for Dell OpenManage provides the ability to manage Dell hardware through a MOM 2000 console. This means that the MOM 2000 console can now view, monitor, and manage Dell server hardware running Microsoft Windows NT® and Windows 2000, in addition to the performance and application management that MOM 2000 already provides.

With essential health monitoring and analysis functions, this Extended Management Pack solution (XMP solution) helps users optimize performance and availability through automated event detection and correction.

Key Points

  • Embraces and Extends the MOM 2000 Platform
    Provides MOM 2000-based management of Dell PowerEdge servers. NetIQ XMP for Server Hardware extends native MOM 2000 capabilities to provide a solution that leverages NetIQ’s library of pre-defined rules for managing specific server hardware.
  • Provides an Advanced, Application-Specific Solution for Managing Dell Server Hardware
    Proactively monitors the performance and availability of Dell server hardware with MOM 2000 through pre-defined knowledge and rules that are designed to optimize performance and availability of critical server hardware.

Section 2

Introduction

The NetIQ XMP for Dell OpenManage monitors Dell™ PowerEdge™ servers through Dell Server Agents to help ensure they are functioning normally in a Windows 2000 or Windows NT 4.0 environment. The NetIQ XMP for Dell OpenManage helps diagnose system hardware problems, as well as issues with network transmissions. By detecting, alerting on, and automatically responding to critical events, this XMP is designed to indicate, correct, and prevent possible service outages or configuration problems.

With the embedded expertise in the NetIQ XMP for Dell OpenManage, it is possible to proactively manage Dell PowerEdge servers and identify issues before they become critical. This XMP can help reduce costly errors, prevent service outages, and reduce the operations costs of Dell PowerEdge servers.

Figure 1: Microsoft Operations Manager 2000, NetIQ and Dell OpenManage

Features

The NetIQ XMP for Dell OpenManage monitors the status of various Dell PowerEdge server components. Microsoft Operations Manager technology helps ensure that alerts are delivered to administrators, an assurance that is not available with SNMP traps. This XMP highlights events that may indicate possible service outages or configuration problems, so you can quickly take corrective or preventive actions. For example, this XMP monitors the following components and conditions:

  • Adaptec SCSI subsystem
  • Array logical drives and physical drives
  • Individual fans
  • Dell memory devices
  • Network interface transmission errors and network interface failures
  • Power supplies and power redundancy status
  • Temperature and voltage probes

The NetIQ XMP for Dell OpenManage highlights any failures or configuration problems, helping to increase the security, availability, and performance of Dell PowerEdge servers.

Rule Groups

MOM 2000 structures the logic it uses to process incoming events into rule groups. Rule groups are further structured into parent/child relationships making it easy to navigate and group large numbers of rules. The NetIQ XMP for Dell OpenManage is organized with the following parent and child rule groups:

  • Dell OpenManage :: NetIQ XMP
    • Dell OpenManage Server Agent 4.0 and Newer :: NetIQ
    • Dell OpenManage Server Assistant 5.1.0 :: NetIQ
    • CI/O Management Software Shared Rules :: NetIQ
    • Dell OpenManage Remote Assistant Card (DRAC) Shared :: NetIQ
    • Dell OpenManage Remote Assistant Server Shared :: NetIQ
    • Dell OpenManage Shared Rules :: NetIQ
    • PERC - PowerEdge RAID Controller Shared Rules :: NetIQ

A complete listing of the rule groups and the rules within each group can be found in Appendix A of this document.
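
The parent/child organization described above can be pictured as a small tree. The following Python sketch is illustrative only: the RuleGroup class and render function are hypothetical names, not part of MOM 2000 or the NetIQ XMP, and it assumes that every group listed above sits directly under the top-level Dell OpenManage :: NetIQ XMP group.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class RuleGroup:
    """Hypothetical model of a rule group that may hold child rule groups."""
    name: str
    children: List["RuleGroup"] = field(default_factory=list)

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, str]]:
        """Yield (depth, name) pairs, visiting each parent before its children."""
        yield depth, self.name
        for child in self.children:
            yield from child.walk(depth + 1)


# Assumption: all groups named in this paper are direct children of one parent.
xmp = RuleGroup("Dell OpenManage :: NetIQ XMP", [
    RuleGroup("Dell OpenManage Server Agent 4.0 and Newer :: NetIQ"),
    RuleGroup("Dell OpenManage Server Assistant 5.1.0 :: NetIQ"),
    RuleGroup("CI/O Management Software Shared Rules :: NetIQ"),
    RuleGroup("Dell OpenManage Remote Assistant Card (DRAC) Shared :: NetIQ"),
    RuleGroup("Dell OpenManage Remote Assistant Server Shared :: NetIQ"),
    RuleGroup("Dell OpenManage Shared Rules :: NetIQ"),
    RuleGroup("PERC - PowerEdge RAID Controller Shared Rules :: NetIQ"),
])


def render(root: RuleGroup) -> str:
    """Return the tree as text, indenting each level by two spaces."""
    return "\n".join("  " * depth + name for depth, name in root.walk())


print(render(xmp))
```

Walking the tree parent-first mirrors how an operator would navigate from the top-level group down to an individual child group in the console.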

Configuration

The NetIQ XMP for Dell OpenManage monitors Dell PowerEdge servers through the Dell Server Agents to help ensure they are functioning normally. This XMP requires that Dell Server Agents be installed on every server it monitors. The default notification group for processing rule responses within this XMP is Hardware Support. For information about adding operators to this notification group, see the product Help menu or documentation.
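
Each rule group in Appendix A carries an alert rule that sends alerts with a severity of "Error" or higher to the Hardware Support notification group. The Python sketch below illustrates that threshold logic only. The Severity scale and the notification_group function are assumptions made for illustration; they are not MOM 2000's actual severity list or API.

```python
from enum import IntEnum
from typing import Optional


class Severity(IntEnum):
    """Illustrative severity scale; the levels and their order are assumed."""
    INFORMATION = 1
    WARNING = 2
    ERROR = 3
    CRITICAL_ERROR = 4


# The default notification group named in this paper.
DEFAULT_NOTIFICATION_GROUP = "Hardware Support"


def notification_group(severity: Severity,
                       threshold: Severity = Severity.ERROR) -> Optional[str]:
    """Return the group to notify, or None if the alert is below the threshold."""
    if severity >= threshold:
        return DEFAULT_NOTIFICATION_GROUP
    return None


# Show which of the assumed severity levels would reach the group.
for level in Severity:
    print(f"{level.name:<15} -> {notification_group(level)}")
```

Under these assumptions, only alerts at or above the "Error" level reach the operators in Hardware Support, while lower-severity alerts produce no notification.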

Section 4

Conclusion

The NetIQ XMP for Dell OpenManage helps users maximize their investments in both Microsoft Operations Manager and Dell PowerEdge systems. Events are detected within the MOM environment, and corrective action can be prescribed automatically. The goal of the product is to maximize the manageability and reliability of the Dell environment while helping to lower total cost of ownership.

Key Customer Benefits

The NetIQ XMP for Dell OpenManage provides the following benefits to users:

  • Single-point management in heterogeneous environments:
    • Ease of use – users can work with the system with which they are most familiar
    • Integration of technological resources – resource management is available from a single location
    • Faster prevention of and reaction to issues – with only a single place to look and a single interface to configure, management can become easier
  • Low total cost of ownership:
    • Low training costs – users will not need specialized training
    • Low software costs – leverage current investment in enterprise management software
    • Improved uptime over distributed resources – by focusing management resources in a single location, prevention of and reaction to issues can become more efficient
  • Leverage of existing investments:
    • No new enterprise management software to purchase (except the integration and additional partner licenses)
    • Rapid integration of assets – as resources are added to the environment, they are immediately manageable from within the established processes

Obtaining NetIQ XMP for Dell OpenManage

The NetIQ XMP for Dell OpenManage is available for purchase directly from Dell’s Software & Peripherals department. Please contact a Dell representative for details.


THIS WHITE PAPER IS FOR INFORMATIONAL PURPOSES ONLY, AND MAY CONTAIN TYPOGRAPHICAL ERRORS AND TECHNICAL INACCURACIES. THE CONTENT IS PROVIDED AS IS, WITHOUT EXPRESS OR IMPLIED WARRANTIES OF ANY KIND.

Dell and PowerEdge are trademarks of Dell Computer Corporation. Microsoft, Windows NT, and Windows are registered trademarks of Microsoft Corporation. NetIQ is a registered trademark and XMP is a trademark of NetIQ Corporation.

Other trademarks and trade names may be used in this document to refer to either the entities claiming the marks and names or their products. Dell disclaims proprietary interest in the marks and names of others.

©Copyright 2002 Dell Computer Corporation. All rights reserved. Reproduction in any manner whatsoever without the express written permission of Dell Computer Corporation is strictly forbidden. For more information, contact Dell. Dell cannot be responsible for errors in typography or photography.

Information in this document is subject to change without notice.

Appendix A

The following is a complete listing of the rule groups packaged with the NetIQ XMP for Dell OpenManage and the event, alert and performance rules they contain.

Dell OpenManage Server Agent 4.0 and Newer

Event Rules

  • Voltage warning detected
  • Temperature value unknown
  • Temperature returned to a normal value
  • Redundancy degraded
  • Voltage value unknown
  • AC power cord sensor failed
  • Memory device pre-failure failure detected
  • Redundancy lost
  • Fan value unknown
  • Voltage non-recoverable value detected
  • Redundancy not applicable
  • Chassis intrusion value unknown
  • Power supply value unknown
  • Dell OpenManage Server Agent startup complete
  • Redundancy regained
  • Power supply failure detected
  • Fan warning detected
  • AC power restored
  • Chassis intrusion returned to normal
  • Fan sensor failed
  • SMBIOS data absent
  • Power cord not being monitored
  • Power supply returned to normal
  • Memory device pre-failure warning detected
  • Fan enclosure removed
  • Fan non-recoverable value detected
  • AC power lost
  • Memory device pre-failure returned to a normal value
  • Current failure detected
  • AC power lost
  • Redundancy value unknown
  • Thermal shutdown protection initiated
  • Chassis intrusion non-recoverable value detected
  • Temperature sensor failed
  • Power supply non-recoverable value detected
  • Chassis intrusion sensor failed
  • Power supply sensor failed
  • Fan failure detected
  • Redundancy offline
  • Voltage sensor failed
  • Memory device pre-failure sensor disabled
  • System BIOS update scheduled
  • Current sensor failed
  • Fan enclosure removed for an extended amount of time
  • Current returned to a normal value
  • Current value unknown
  • Chassis intrusion in progress
  • Power supply warning detected
  • Chassis intrusion detected
  • Fan enclosure sensor failed
  • Fan enclosure inserted
  • AC power lost
  • Fan enclosure non-recoverable value detected
  • Voltage failure detected
  • Dell OpenManage Server Agent starting
  • Scheduled system BIOS update canceled
  • Temperature non-recoverable value detected
  • Fan returned to a normal value
  • Fan enclosure value unknown
  • Temperature failure detected
  • Voltage returned to a normal value
  • Memory device pre-failure value unknown
  • Current warning detected
  • Temperature warning detected
  • Current non-recoverable value detected
  • Redundancy sensor failed
  • Memory device pre-failure non-recoverable value detected

Alert Rules

  • Severity of "Error" or higher - Hardware Support

Performance Processing Rules

  • Dell OpenManage Server Agent - Private Bytes
  • Dell OpenManage Server Agent - Handle Count

Dell OpenManage Server Assistant 5.1.0

Event Rules

  • None

Alert Rules

  • Severity of "Error" or higher - Hardware Support

Performance Processing Rules

  • Dell OpenManage Server Agents - Private Bytes
  • Dell OpenManage Server Agents - Handle Count

CI/O Management Software Shared Rules :: NetIQ

Event Rules

  • CIODell - Event ID 1
  • CIOArrayManagement - Event ID 1

Alert Rules

  • Severity of "Error" or higher - Hardware Support

Performance Processing Rules

  • Dell OpenManage CI/O Agents - Handle Count
  • Dell OpenManage CI/O Agents - Private Bytes
  • Dell OpenManage CI/O Agents - Handle Count
  • Dell OpenManage CI/O Agents - Handle Count
  • Dell OpenManage CI/O Agents - Handle Count
  • Dell OpenManage CI/O Agents - Private Bytes
  • Dell OpenManage CI/O Agents - Private Bytes
  • Dell OpenManage CI/O Agents - Private Bytes

Dell OpenManage Remote Assistant Card (DRAC) Shared :: NetIQ

Event Rules

  • DRAC battery absent
  • DRAC battery charge failure detected
  • DRAC battery charge low
  • DRAC battery fast charge count approaching limit
  • DRAC battery fast charge count exceeded limit
  • DRAC detected critical status reported by ESM
  • DRAC detected warning status reported by ESM
  • DRAC lost communication with ESM
  • DRAC temperature sensor detected a failure
  • DRAC temperature sensor returned to normal
  • DRAC temperature sensor warning detected
  • DRAC wall adapter voltage sensor detected a failure
  • DRAC wall adapter voltage sensor returned to normal
  • DRAC wall adapter voltage sensor warning detected

Alert Rules

  • Severity of "Error" or higher - Hardware Support

Performance Processing Rules

  • Dell OpenManage DRAC Agents - Handle Count
  • Dell OpenManage DRAC Agents - Private Bytes

Dell OpenManage Remote Assistant Server Shared :: NetIQ

Event Rules

  • Main System Chassis Bottom Power Supply: 12v
  • Voltage Warning detected
  • Main System Chassis 1st Power Supply Fan
  • Dell Remote Assistant Server Event ID 269
  • Main System Chassis Power Supply 1
  • Dell Remote Assistant Server Event ID 272
  • Fan Warning detected
  • Dell Remote Assistant Server Event ID 273
  • Dell Remote Assistant Server Event ID 300
  • Amperage Failure detected
  • Storage System 4 Backplane Location 1
  • Dell Remote Assistant Server Event ID 301
  • Dell Remote Assistant Server Event ID 302
  • Main System Chassis Backplane Location 1
  • Amperage Warning detected
  • Dell Remote Assistant Server Event ID 303
  • Storage System 1 Backplane Location 1
  • Dell Remote Assistant Server Event ID 304
  • Main System Chassis Backplane: 5v
  • Power Supply redundancy failure detected
  • Dell Remote Assistant Server Event ID 305
  • Main System Chassis Backplane: 12v
  • Server: Alert
  • Dell Remote Assistant Server Event ID 306
  • Main System Chassis Backplane: SCSI A Termination
  • Power Supply degraded redundancy failure detected
  • Dell Remote Assistant Server Event ID 307
  • Main System Chassis Backplane: Battery
  • Dell Remote Assistant Server Event ID 308
  • Main System Chassis Control Panel
  • Power Supply failure detected
  • Main System Chassis Backplane Top
  • Dell Remote Assistant Server Event ID 309
  • Storage System 3 Backplane Location 1
  • Main System Chassis Top Power Supply
  • Main System Chassis Backplane Bottom
  • Dell Remote Assistant Server Event ID 310
  • Main System Chassis Fan 1
  • Main System Chassis Bottom Power Supply
  • Chassis Intrusion detected
  • Dell Remote Assistant Server Event ID 311
  • Dell Remote Assistant Server Event ID 312
  • ECC Fault
  • Dell Remote Assistant Server Event ID 313
  • Watchdog Reset
  • Dell Remote Assistant Server Event ID 314
  • Lost connection to storage system
  • Dell Remote Assistant Server Event ID 315
  • Dell Remote Assistant Server Event ID 320
  • Re-established connection to Harrier
  • Dell Remote Assistant Server Event ID 321
  • System Up
  • Storage System 2 Backplane Location 1
  • Dell Remote Assistant Server Event ID 323
  • Main System Chassis Processor 1 heatsink
  • Dell Remote Assistant Server Event ID 324
  • Dell Remote Assistant Server Event ID 325
  • Main System Chassis Ambient location 1
  • Dell Remote Assistant Server Event ID 326
  • Dell Remote Assistant Server Event ID 327
  • Dell Remote Assistant Server Event ID 330
  • Unknown Alert
  • Temperature Failure detected
  • Main System Chassis Motherboard: 12v
  • Fan Failure detected
  • Storage System 5 Backplane Location 1
  • Voltage Failure detected
  • Main System Chassis Chassis Bottom
  • Main System Chassis Chassis Middle
  • Dell Remote Assistant Server Event ID 260
  • Temperature Warning detected
  • Main System Chassis Chassis Top
  • Dell Remote Assistant Server Event ID 261
  • Main System Chassis System Power Supply
  • Dell Remote Assistant Server Event ID 268

Alert Rules

  • Severity of "Error" or higher - Hardware Support

Performance Processing Rules

  • None

Dell OpenManage Shared Rules :: NetIQ

Event Rules

  • Dell Baseboard Agent-Server: Event ID 1
  • Dell Baseboard Agent-Server: Event ID 331
  • Dell Baseboard Agent-Server: Event ID 220
  • Dell Baseboard Agent-Server: Event ID 332
  • Dell Baseboard Agent-Server: Event ID 221
  • Dell Baseboard Agent-Server: Event ID 340
  • Dell Baseboard Agent-Server: Event ID 242
  • Dell Baseboard Agent-Server: Event ID 243
  • Dell Baseboard Agent-Server: Event ID 246
  • Temperature Failure detected
  • Fan Failure detected
  • Voltage Failure detected
  • Temperature Warning detected
  • Voltage Warning detected
  • Fan Warning detected
  • Amperage Failure detected
  • Amperage Warning detected
  • Power Supply redundancy failure detected
  • Power Supply degraded redundancy failure detected
  • Power Supply failure detected
  • Chassis Intrusion detected
  • Dell Baseboard Agent-Server: Event ID 322
  • ECC Fault
  • Lost connection to storage system
  • Dell Server Console - Event ID 1
  • Temperature Sensor detected a warning
  • Voltage Sensor detected a warning
  • Current Sensor detected a warning
  • VxSvc_EnclPro Service detected a warning
  • VxSvc_EnclPro Service detected a warning
  • VxSvc_EnclPro Service turned off
  • Temperature Sensor detected a failure
  • Voltage Sensor detected a failure
  • Current Sensor detected a failure
  • VxSvc_EnclPro Service detected a failure
  • VxSvc_EnclPro Service detected a failure

Alert Rules

  • Severity of "Error" or higher - Hardware Support

Performance Processing Rules

  • Network Interface Transmission Errors Inbound
  • Network Interface Transmission Errors Outbound

PERC - PowerEdge RAID Controller Shared Rules :: NetIQ

Event Rules

  • PERC Battery Backup Alert
  • PERC Message : Event ID 124
  • PERC Message : Event ID 126
  • PERC Message : Event ID 128
  • AFAPORT - Event ID 20
  • AFAPORT - Event ID 1
  • SetAcl operation from the FileArray failed with a status of
  • Read operation would have caused kernel stack overflow.
  • Fatal internal error has occurred in the FileArray filesystem. Please reboot your system to recover!
  • FileArray communication packets could not be allocated. Please contact Adaptec support.
  • The adapter failed to initialize, Status = NN
  • The driver failed to load the FastFsa driver.
  • Could not connect interrupt for adapter.
  • The FSA Filesystem driver failed while attempting to initialize this adapter.
  • The adapter has failed self-test diagnostics.
  • The adapter firmware has encountered a fatal error.
  • Read operation from the FileArray failed with a status of
  • Write operation to the FileArray failed with a status of
  • Cache lookup in the File Array Metadata cache failed unexpectedly.
  • An error has occurred which has caused the NT side and the adapter side to get out of sync.