OPC DA Interface Failover Manual

for OPC DA Interface Version 2.3.11.0

Copyright © 2002-2009 OSIsoft, Inc.

OSIsoft, Inc.
777 Davis St., Suite 250
San Leandro, CA 94577 USA
(01) 510-297-5800 (main phone)
(01) 510-357-8136 (fax)
(01) 510-297-5828 (support phone)


Houston, TX
Johnson City, TN
Longview, TX
Mayfield Heights, OH
Philadelphia, PA
Phoenix, AZ
Savannah, GA
Yardley, PA
OSIsoft Australia
Perth, Australia
Auckland, New Zealand
OSI Software GmbH
Altenstadt, Germany
OSIsoft Asia Pte Ltd.
Singapore
OSIsoft Canada ULC
Montreal, Canada
Calgary, Canada
OSIsoft, Inc. Representative Office
Shanghai, People’s Republic of China
OSIsoft Japan KK
Tokyo, Japan
OSIsoft Mexico S. De R.L. De C.V.
Mexico City, Mexico
OSIsoft do Brasil Sistemas Ltda.
Sao Paulo, Brazil
Sales Outlets/Distributors
Middle East/North Africa
Republic of South Africa
Russia/Central Asia
South America/Caribbean
Southeast Asia
South Korea
Taiwan

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, mechanical, photocopying, recording, or otherwise, without the prior written permission of OSIsoft, Inc.
OSIsoft, the OSIsoft logo and logotype, PI Analytics, PI ProcessBook, PI DataLink, ProcessPoint, Sigmafine, Analysis Framework, PI Datalink, IT Monitor, MCN Health Monitor, PI System, PI ActiveView, PI ACE, PI AlarmView, PI BatchView, PI ManualLogger, PI ProfileView, ProTRAQ, RLINK, RtAnalytics, RtBaseline, RtPortal, RtPM, RtReports and RtWebParts are all trademarks of OSIsoft, Inc. All other trademarks or trade names used herein are the property of their respective owners.

RESTRICTED RIGHTS LEGEND

Use, duplication, or disclosure by the Government is subject to restrictions as set forth in subparagraph (c)(1)(ii) of the Rights in Technical Data and Computer Software clause at DFARS 252.227-7013.

Table of Contents

Introduction

Reference Manuals

Diagram of Hardware Connection

Principles of Operation

Server-Level Failover

Server-Level Failover Configurations

Watchdog Tags

Logging the Current Server

Logfile Messages for Server-Level Failover

Interface-Level Failover Using Microsoft Clustering

Choosing a Cluster Mode

Failover Mode

How It Works

Configuring APIOnline

Checklist for Cluster Configuration

Configuring the Interface for Cluster Failover

Buffering Data on Cluster Nodes

Group and Resource Creation Using Cluster Administrator

Cluster Group Configuration

Installation of the Resources

Logfile Messages for Interface-Level Failover

Using Combination of Server- and Interface-Level Failover

Revision History

Introduction

This is a supplemental document for configuring the OPC DA Interface to the PI System. It covers configuring and managing the interface for redundancy of the OPC server, the OPC DA interface, or both. It is intended to be used in conjunction with the OPC DA Interface Manual.

For server-level failover, no special hardware or software is required. Interface-level failover using Microsoft clustering requires a Microsoft Cluster. Interface-level failover using UniInt does not require any special hardware or software. Configuration of the interface for UniInt failover is not covered in this manual. It is documented in the OPC DA Interface to the PI System Manual.

In this manual, each type of redundancy is addressed separately, followed by a brief look at using them together. Note that all of the command-line parameters discussed in this document can be configured using the PI Interface Configuration Utility (PI ICU). The ICU simplifies configuration and maintenance, and is strongly recommended. PI ICU can only be used with interfaces that collect data for PI Server version 3.3 and higher.

Reference Manuals

OSIsoft
  • OPC DA Interface to the PI System Manual
  • UniInt Interface Users Manual

Diagram of Hardware Connection

Server-level failover configuration

Interface-level failover using Microsoft clustering configuration

Principles of Operation

The OPC DA interface is designed to provide redundancy for both the OPC server and the interface itself. For server-level failover, the interface can be configured to change to another OPC Server when the current server no longer serves data, or when an OPC item changes value or quality, or when the OPC Server changes state. This allows the data collection process to be controlled at the lowest possible level, and ensures that data collection will continue even if the connection to the PI System fails.

For interface-level failover, two copies of the interface run at the same time, with only one sending data to the PI System. There are two types of interface-level failover supported by this interface: one uses Microsoft clustering and the other uses the UniInt failover mechanism. This manual covers configuration using Microsoft clustering.

When using Microsoft clustering, the cluster controls which copy of the interface is actually collecting data at any given time. Since the OPC Server may not be cluster-aware, several modes can be configured to ensure the least possible data loss in the event of a failover, without putting undue stress on the underlying data collection system. This type of failover is not recommended unless the user has other reasons to use Microsoft clustering.

Server-level failover can be combined with either type of interface-level failover to achieve redundancy at both levels of data collection, so that even the loss of both an OPC Server and one OPC DA Interface will not interrupt data collection. However, the two types of interface-level failover cannot be used at the same time.

Server-Level Failover

The basic idea behind server-level failover is that the interface should always be connected to a server that can provide data. The problem comes in how the interface knows when it should try to connect to another server. There are several ways in which an OPC Server may indicate that it is not able to serve data.

  1. It does not accept connections. This is the simplest one to deal with. There is nothing to configure except the name of the alternate server.
  2. It changes state when it is not active, usually to OPC_STATUS_SUSPENDED. The interface can be configured to fail over to another server when the current server leaves the RUNNING state.
  3. It sends BAD quality for all tags. To use this option, an OPC item must be defined which always has GOOD quality except when the server is not serving data.
  4. It has one or more OPC items which have a specific value when the server can serve data and another specific value when it cannot. With this option, it may be necessary to use the Transformation and Scaling ability of the interface, but as long as there is some way to translate the not-active value to zero and the active value to a value greater than zero, these OPC items can be used for watchdog tags. It is possible to specify multiple tags as watchdogs, and to specify a minimum value that defines an active server, so that the loss of some server functionality (for instance, one or two OPC Servers are not working) will not cause failover, but a sum falling below the specified minimum will trigger failover to another server.
  5. It has one or more OPC items which have GOOD quality when the server can serve data and BAD quality when it cannot. One watchdog tag or multiple watchdog tags can be specified, in addition to specifying the maximum number of watchdog tags which can have BAD quality on the active server without triggering failover.
  6. It has an OPC Item which has a specific, known value when a given server can serve data and a different known value when that server cannot serve data. In this case, there is always one Item for each server, and two watchdog tags are used to control which server is active. This configuration is referred to as “server-specific watchdogs”, because the watchdog Item refers to a given server’s current status, regardless of which server the Item value was read from.

Note: Special handling is also included for Honeywell Plantscape servers, as several customers have had difficulty getting server-level failover to work properly with these servers. The /HWPS flag tells the interface to fail over when it receives an error code of 0xE00483FD or 0xE00483FC on any tag, as shown in the example below.
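For example (a sketch; the server and node names are hypothetical), the flag is simply added to the interface startup parameters alongside the backup-server specification:

/SERVER=OSI.DA.1 /BACKUP=othernode::OSI.DA.1 /HWPS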

The following table lists the command-line parameters used to control server-level failover. The next sections explain how to configure the interface for each of the cases above, using these parameters, and how to use the timing parameters to get the least data loss with the most reliability.

Parameter   Description
/BACKUP     The name and location of the backup OPC server.
/CS         The string tag into which the name of the currently active server is written.
/FT         The number of seconds to try to connect before switching to the backup server.
/NI         The number of interfaces running on this node.
/SW         The number of seconds to wait for the RUNNING state before switching to the backup server.
/WD         Watchdog tag specifications.
/WQ         Fail over if a watchdog tag has BAD quality or any error.
/WS         Fail over if the server leaves the RUNNING state.
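As a sketch, a startup line combining several of these parameters might look like the following (the server, node, and tag names are hypothetical, and the timing values are illustrative only):

/SERVER=OSI.DA.1 /BACKUP=othernode::OSI.DA.1 /FT=60 /SW=120 /CS=CurrentServerTag /WS=1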
Server-Level Failover Options using ICU Control

  • Backup OPC Server Node Name – The name or IP address of the backup OPC Server node (/BACKUP);
  • List Servers – When clicked, this button retrieves the list of OPC Server names from the node entered in the Backup OPC Server Node Name field and populates the Backup OPC Server Name dropdown list;
  • Backup OPC Server Name – The registered name of the backup OPC Server on the above node (/BACKUP);
  • Number of Interfaces on this Node – The count of how many instances of the OPC DA interface are running on this node (/NI=#);
  • Switch to Backup Delay (sec) – The number of seconds to try to connect before switching to the backup server (/FT=#);
  • Wait for RUNNING State (sec) – The number of seconds to wait for RUNNING status before switching to the backup server (/SW=#);
  • Current Active Server Tag – The string tag into which the name of the currently active server is written (/CS=tag);
  • Primary Server Watchdog Tag – Watchdog tag for the primary server (/WD1=tag);
  • Backup Server Watchdog Tag – Watchdog tag for the backup server (/WD2=tag);
  • Multiple Watchdog Tag Trigger Sum – When using multiple watchdog tags, failover will be triggered if the sum of the values of these tags drops below the value entered in this box (/WD=#);
  • Maximum Number of Tags which can have Bad Quality or Any Error without triggering Failover – (/WQ=#) Triggers a failover if more than # watchdog tags have BAD quality or any error. If one watchdog tag is configured, set /WQ=0. If more than one watchdog tag is configured, # can be set from 0 to the number of watchdog tags configured minus 1;
  • Failover if Server Leaves RUNNING State – (/WS=1).


Server-Level Failover Configurations

These are the server-level failover options supported by the interface. This section does not deal with timing of failover at all, only with how failover is triggered. Please see the next section for timing considerations.

Inactive Server Does Not Allow Connections

This is the easiest to configure, using the /BACKUP parameter to provide the name of the other OPC server. If the interface cannot connect to one server, it will try the other one. The selection of which server is active is completely managed by the servers.

/SERVER=OSI.DA.1 /BACKUP=othernode::OSI.DA.1

Inactive Server Leaves OPC_STATUS_RUNNING State

This is controlled by using the /WS parameter. Once the interface is connected to a server and collecting data, the server’s state is checked every 30 seconds. With the /WS flag set, if the server leaves the RUNNING state, the interface will disconnect from the current server and try to connect to the other server.

/WS=1

Inactive Server sets Quality to BAD

Some servers indicate that they are not the active server only by setting the quality of some or all of their items to BAD. This can be used to trigger failover of the interface to the other server, but the quality of the tag being used as a watchdog must be BAD only when the interface should fail over.

/WQ=# directs the interface to fail over to the other server if more than # watchdog tags have BAD quality or any error. Note that v1.0a servers do not return error codes for individual items, so for version 1.0a servers this parameter only checks the quality of the value sent from the server.

If one watchdog tag is configured, set /WQ=0. If more than one watchdog tag is configured, then # can be set from 0 to the number of watchdog tags configured minus 1.

/WQ=#   (where # ranges from 0 to the number of watchdog tags configured minus 1)

Watchdog Tags

For server-level failover, a specific PI tag can be defined as a watchdog tag. The OPC item which this tag reads must have a specific, known value when the server is able to serve data and another specific, known value when the server is unable to serve data. It is called a watchdog tag because its value changes to announce a change in the server status.

The remaining configuration options use Watchdog tags. Watchdog tags allow the OPC servers to tell the interface which server is the currently active server. The basic idea is that if the value of the watchdog tag representing a server is greater than zero, that server is the active server. There are two different modes for using watchdog tags: isolated mode and server-specific mode. In isolated mode, each server only knows its own state. The items being used for these watchdog tags represent the current state of the server (such as backup state or active state). These items could have different values for the two servers at any given time. In server-specific mode, both servers know the state of the other server. Because of this, the items being used for the watchdog tags should match. In general, server-specific watchdog tags are a more robust failover model.
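For instance, in server-specific mode each server exposes one Item reporting the primary server's status and one reporting the backup server's status, and the two corresponding watchdog tags are given on the command line (a sketch; the tag names are hypothetical):

/WD1=PrimaryServerStatus /WD2=BackupServerStatus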

Note that watchdog tags are read in the same way as normal data tags, and the values are passed along to PI. The PI tags must be configured as integer tags, but Location2 settings can be used to read other datatypes into the integer tags. Also, the same scaling and transformation formulas are applied to watchdog tags as to ordinary tags, so with an integer PI tag and scaling parameters, the interface can recognize values of -3 and 7 as 0 and 10, respectively. Any transformation that results in an integer value of 0 for backup and greater than 0 for active can be used in a watchdog tag.

The watchdog tags should be configured as Advise tags if the server supports them; otherwise, they should be put into a scan class with a short scan period. Whenever values are received from the server, whether polled or advised, they are checked against the current state of the interface. If the watchdog tags say that the interface should be connected to the other server, the interface will disconnect from the current server and attempt to connect to the other server. If it cannot connect successfully to the other server within the failover time given by the /FT command-line parameter, it will switch back to the original server and try again, in case that server has since become the active server. An illustrative configuration follows.
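As a sketch, a polled watchdog tag might be placed in a dedicated fast scan class with a modest failover time. This assumes the standard UniInt /f scan-class syntax, with the watchdog tag assigned to the second (fast) scan class via its scan class number; the periods and failover time below are illustrative only:

/f=00:00:30 /f=00:00:02 /FT=60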

Isolated Watchdog Tags

With isolated watchdog tags, each server only knows its own state. There are two ways to use this model. The simple version has one tag, which by itself shows whether the server is ready to serve data. Multiple tags can also be used to reflect the server’s connections to its underlying data systems.

One tag

The same Item in each server reflects the state of that server. The interface reads the Item as an ordinary data value, and if the value is not greater than zero, the interface will disconnect from the server and attempt to connect to the other server. At least one of the servers should always return a 1 as the current value for this Item. The watchdog tag is identified to the interface with the /WD1 parameter. With this model, the /WD2 parameter is not used; if /WD2 is specified without /WD1, it will be ignored by the interface.

/WD1=ServerActive

PI tag ServerActive has Instrumenttag = Watchdog1

Multiple Watchdog Tags

Multiple tags can be defined as watchdog tags, with the sum of their values determining whether the server is active. The general idea behind this model is that the server may have access to flags that show whether it is collecting data from various sources: as long as some number of those flags show data collection, the server should continue to be used, but if enough of those flags show a connection loss, the other server should be tried to see if it has access to the underlying devices. A sketch follows.
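For example, suppose three watchdog tags each read a flag that is 1 when its data source is connected and 0 when it is not (a sketch; the threshold is hypothetical). Setting the trigger sum as below means failover is triggered only when the sum of the watchdog values drops below 2, that is, when two or more sources are lost:

/WD=2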