Message Queuing for Network Monitoring

COMP520-12A Interim Report
Anthony James Coddington
Supervisor: Richard Nelson
8 June 2012

Introduction

This project investigates and integrates a messaging system for network monitoring that is robust, reliable and scalable in wide area networks. Network monitoring has the particular requirement that messaging must cope with adverse network conditions (such as intermittent connectivity), as network monitoring is generally most useful when the network is not behaving as expected. The system is intended to replace the existing messaging system used for results reporting and test coordination in the University of Waikato AMP network monitoring system, originally developed by the National Laboratory for Applied Network Research (NLANR). An overview of AMP can be found in the attached project proposal. This report documents progress made so far on the project, as well as changes to the project plan.

Overview of Messaging Systems

The requirements of reliability and scalability in wide area networks entail a number of practical considerations that were used in evaluating systems. In order to be reliable when nodes can be disconnected or networks behave badly, a messaging system for network monitoring needs to use a reliable transport protocol and support some form of client-side persistence (for when the network is not available). Most systems fulfilled the former requirement by at least using TCP to guarantee some form of message integrity, and many could further ensure delivery to the destination without duplication through acknowledgements. Persistence was a major issue with almost all existing systems: many operate with a broker-centric model (where clients send to a central broker that guarantees message delivery and handles delivery to the destination) and implement persistence at the broker side, which does not solve the problem of nodes potentially being disconnected from the network. Another issue with some existing systems is security, as many are designed for internal network or data centre use and are thus not suitable for internet use. Authentication is more important than encryption for AMP (to avoid unauthorized or malicious reports, but not necessarily to hide the contents) (B. Jones, personal communication, April 23, 2012), but many systems also lacked secure authentication methods.

One solution considered for the persistence issue was to have a co-resident broker on each node, but this meant the broker must also be lightweight. The major issue encountered in selecting a system was that many existing systems were very enterprise-oriented and hence complex, and were often written in resource-intensive languages such as Java, which is less of a problem for the server but not suitable for potentially lightweight clients. Overall, almost all were designed to solve a different problem (many of the more minor systems were for load balancing queues of tasks, for example), and hence carried extra complexity in areas not required while lacking needed features such as persistence.

The following is a discussion of the major existing messaging systems. The existing messaging system in AMP, which uses a circular buffer held in memory and on disk for persistence and SSL certificates for authentication, is outlined in the project proposal. The main issues identified with the current system are that it is one-way and non-generic, and that its persistence is occasionally unreliable (B. Jones and R. Nelson, personal communication, April 23, 2012). Extending the existing system was considered, but little advantage was to be gained over adapting an existing lightweight protocol like STOMP.

Advanced Message Queuing Protocol (AMQP)

The Advanced Message Queuing Protocol was developed “to become the standard for interoperability between all messaging middleware,” with a standardised wire protocol (OASIS, AMQP, n.d.). The protocol is thus designed to meet the needs of many applications, and (in versions prior to 1.0) is broker based. According to SpringSource (“AMQP 0-9-1 Model Explained,” n.d.), producers send messages to exchanges (which are entities within the broker), which then route messages to queues that consumers subscribe to. Several types of exchange are provided to support one-of-many queuing, publish-subscribe and fan-out messaging. Queues exist on the server, and the client must know the names of the queues it wants to subscribe to (SpringSource), which can be problematic. SSL security and SASL authentication are supported (although only plain-text SASL authentication is well supported (SpringSource, “SASL Authentication”, n.d.)). The broker-centric approach of AMQP is useful in many situations and allows clients to be simple, but it is not particularly useful for potentially disconnected nodes, as all of the queuing functionality needs to be duplicated on the client. This is not the main issue with AMQP, however.
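The exchange and queue model described above can be illustrated with a small in-memory sketch. This is only a toy under stated assumptions: the class ToyBroker, its method names and the exchange and queue names are hypothetical and are not part of any AMQP client library, and only direct and fan-out exchanges are modelled.

```python
from collections import defaultdict, deque


class ToyBroker:
    """In-memory stand-in for a broker (hypothetical, not a real AMQP API)."""

    def __init__(self):
        self.queues = defaultdict(deque)  # queue name -> pending messages
        self.exchanges = {}               # exchange name -> (type, bindings)

    def declare_exchange(self, name, kind):
        if kind not in ("direct", "fanout"):
            raise ValueError("unsupported exchange type: " + kind)
        self.exchanges[name] = (kind, [])

    def bind(self, exchange, queue, routing_key=""):
        """Attach a queue to an exchange; the queue is created if needed."""
        self.exchanges[exchange][1].append((routing_key, queue))
        self.queues[queue]  # touching the defaultdict creates the queue

    def publish(self, exchange, routing_key, body):
        """Producers publish to an exchange, never directly to a queue."""
        kind, bindings = self.exchanges[exchange]
        for key, queue in bindings:
            # Fan-out ignores the key; direct requires an exact match.
            if kind == "fanout" or key == routing_key:
                self.queues[queue].append(body)

    def consume(self, queue):
        """Return the oldest message on the queue, or None if it is empty."""
        pending = self.queues[queue]
        return pending.popleft() if pending else None


broker = ToyBroker()
broker.declare_exchange("results", "direct")
broker.declare_exchange("control", "fanout")
broker.bind("results", "collector", routing_key="latency")
broker.bind("control", "node-a")
broker.bind("control", "node-b")

broker.publish("results", "latency", "rtt=12ms")
broker.publish("results", "loss", "dropped")  # no matching binding: discarded
broker.publish("control", "", "run-test")     # copied to every bound queue

collector_first = broker.consume("collector")
collector_second = broker.consume("collector")
node_a = broker.consume("node-a")
node_b = broker.consume("node-b")
```

Note that the consumer has to know the queue name ("collector") in advance, and that all queue state lives in the broker object, which is the property that makes the model awkward for nodes that may be disconnected.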

There are three major open implementations of AMQP: RabbitMQ, Apache Qpid and OpenAMQ. RabbitMQ appears to be the most popular and is written in Erlang, supporting AMQP 0.9.1 (with some extensions) (SpringSource, Compatibility and Conformance, n.d.). Apache Qpid, originally developed by Red Hat, currently implements AMQP 0.10 (more recent than 0.9.1) and has C++ and Java variants (Apache, AMQP Compatibility of Qpid releases, n.d.). These protocol versions are not interoperable (especially as the Qpid C++ broker does not support any older versions (Apache, AMQP Compatibility of Qpid releases, n.d.)). This should be less of an issue going forward, as AMQP 1.0, a new version of the standard, is intended to be forwards compatible with future versions of the protocol. However, AMQP 1.0 represents a complete re-architecture of AMQP, building on a lower level system of nodes and links rather than brokers, exchanges and queues (OASIS, 2011, pp. 23-24). This allows more flexibility in broker implementation, but could fragment and confuse the market (although RabbitMQ and Qpid both plan to support 1.0 at some point in the future). OpenAMQ was developed by iMatix and, according to Kramer (2009), was the original AMQP implementation. It was discontinued in 2011 (partly due to this fragmentation) in favour of focusing on developing ZeroMQ (Hintjens, iMatix will end OpenAMQ support by 2011, 2010). Despite its complexity, AMQP (including version 1.0) does not yet define standard broker-to-broker federation (OASIS, 2011, p. 7). RabbitMQ supports federation using AMQP extensions (SpringSource, “Federation Plugin”, n.d.).
Federation in Apache Qpid is interesting as it uses a control protocol to instruct one broker to subscribe to all relevant topics on the other, and to act as a producer in the reverse direction to forward messages (Apache, “Using Broker Federation”, n.d.). This approach, with co-resident C++ brokers on each node, was considered as a solution for network monitoring, but was decided against due to the protocol fragmentation and the complexity of configuration (as a management tool is used to configure such federation).

MQ Telemetry Transport (MQTT)

MQTT was developed by IBM and Eurotech beginning in 1999 and was previously called SCADA (MQTT, n.d.). It is an open specification, and IBM recently open-sourced its clients as part of the Eclipse Paho project (MQTT, Initial Eclipse Paho contributions completed, n.d.). It was originally designed for telemetry over low power, high latency networks and as such has clients written for a variety of embedded systems (MQTT, n.d.). It has been used in a variety of Machine-to-Machine or Internet of Things situations (MQTT, Projects, n.d.), as well as for push notifications in the Facebook Messenger mobile application (Zhang, 2011). One of the main advantages of MQTT being designed for low power, low data rate applications is that it is reasonably simple and has low overhead. This also has disadvantages, notably a lack of useful authentication (only a short plain-text field), although it has been used with SSL (MQTT, n.d.). Unfortunately the main lightweight open source server, Mosquitto, does not support SSL at this time (Light, 2011). However, the most recent version of Apache ActiveMQ and its eventual successor Apollo have recently added support for MQTT and do support SSL (Apache, ActiveMQ 5.6.0 Release, 2012). MQTT is designed as a publish-subscribe system with a micro-broker, and messages can have three levels of quality of service: at most once (best effort), at least once and exactly once, each involving a different amount of traffic (IBM, 2010). According to IBM (2010), clients can also specify a ‘will’ message to be delivered when they become unreachable, to indicate this to other nodes. As MQTT is designed for telemetry, some of the clients support persistence. While the Java Eclipse Paho client supports persistence and SSL, the C client does not (Eclipse Foundation, n.d.). MQTT was a strong candidate and may have been chosen if the open source implementations were more mature, but ultimately STOMP was chosen for its increased flexibility, protocol simplicity and interoperability.
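The practical difference between the three quality of service levels can be shown with a small simulation. This is a sketch of the delivery semantics only, not of the MQTT wire protocol: the function deliver and the scripted outcome labels ("drop", "ack_lost", "ok") are assumptions made for illustration, and the duplicate suppression used for level 2 stands in for the longer handshake the real protocol performs.

```python
def deliver(payload, qos, outcomes):
    """Return the copies of `payload` handed to the receiving application.

    `outcomes` scripts what happens to each transmission attempt in turn:
      "drop"     - the message is lost before reaching the receiver
      "ack_lost" - the message arrives but its acknowledgement is lost
      "ok"       - the message arrives and is acknowledged
    """
    delivered = []
    already_seen = False
    for outcome in outcomes:
        if outcome != "drop":
            # QoS 2 suppresses copies the receiver has already accepted.
            if not (qos == 2 and already_seen):
                delivered.append(payload)
            already_seen = True
        # QoS 0 never retries; QoS 1 and 2 retry until an ack gets through.
        if qos == 0 or outcome == "ok":
            break
    return delivered


lossy = ["drop", "ok"]          # first attempt lost, retry succeeds
lost_ack = ["ack_lost", "ok"]   # delivered, but the sender cannot tell

at_most_once = (deliver("m", 0, lossy), deliver("m", 0, lost_ack))
at_least_once = (deliver("m", 1, lossy), deliver("m", 1, lost_ack))
exactly_once = (deliver("m", 2, lossy), deliver("m", 2, lost_ack))
```

At level 0 the message can vanish, at level 1 a lost acknowledgement produces a duplicate, and at level 2 the application sees exactly one copy at the cost of extra exchanges.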

Facebook Scribe

Facebook Scribe was originally developed by Facebook for internal use as a system for reliably collecting logs from many servers (Johnson, 2008). Essentially each node connects to a local re-sender, which handles reliably transmitting logs to (a possible hierarchy of) servers. It is an extremely simple system, using Remote Procedure Calls over Apache Thrift (a cross-platform, cross-language service framework), with a simple OK or TRY_LATER response from the server as the result of the RPC. The re-sender also provides reasonably reliable (but imperfect) persistence to ensure logs are eventually delivered if nodes become unable to reach servers (Facebook, 2010). Log messages consist of a category and a message (both strings), and servers can store or forward log messages in various ways according to their category (or a part thereof), as set in a server configuration file (Facebook, 2010a). Facebook Scribe was designed for internal use and hence does not support SSL, but another developer has implemented such support using Thrift SSL sockets (Qualtrics Labs Inc., 2011). The main limitation of Facebook Scribe is that it is one-way, and it would have required lateral thinking to use it beyond its intended purpose of collecting logs. The RPC mechanism, while simple, was also unappealing due to potential issues with RPC such as blocking methods and added complexity.
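The re-sender pattern described above can be sketched as follows. This is a minimal illustration and not Scribe's Thrift interface: the Resender and FlakyServer classes and their methods are hypothetical, and the buffer here is held only in memory, whereas the real re-sender also spools to disk.

```python
from collections import deque

OK, TRY_LATER = "OK", "TRY_LATER"


class Resender:
    """Buffers (category, message) entries until the server accepts them."""

    def __init__(self, send):
        self.send = send        # callable(batch) -> OK or TRY_LATER
        self.pending = deque()  # entries not yet accepted by the server

    def log(self, category, message):
        self.pending.append((category, message))

    def flush(self):
        """Attempt delivery; return True if nothing remains buffered."""
        if not self.pending:
            return True
        if self.send(list(self.pending)) == OK:
            self.pending.clear()
            return True
        return False  # keep everything and retry on the next flush


class FlakyServer:
    """Hypothetical server that refuses the first `failures` calls."""

    def __init__(self, failures):
        self.failures = failures
        self.received = []

    def handle(self, batch):
        if self.failures > 0:
            self.failures -= 1
            return TRY_LATER
        self.received.extend(batch)
        return OK


server = FlakyServer(failures=1)
resender = Resender(server.handle)
resender.log("amp", "rtt=12ms")
first_flush = resender.flush()   # server answers TRY_LATER
resender.log("amp", "rtt=15ms")
second_flush = resender.flush()  # server answers OK, buffer drains
```

The sketch also shows the limitation noted above: information flows only from the node to the server, with nothing but the OK or TRY_LATER status coming back.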

ZeroMQ

ZeroMQ was developed by iMatix Corporation to resolve many of the perceived issues with AMQP (Hintjens, iMatix will end OpenAMQ support by 2011, 2010; iMatix Corporation, n.d.). According to iMatix (n.d.), it is designed to be like network sockets for messaging systems, providing sockets for common messaging patterns such as publish-subscribe and request-response on top of ordinary network sockets. These can be used to build a messaging system that fits a variety of patterns, potentially in a brokerless manner. Similar to a TCP stack, the message queues inside sockets are opaque, and little beyond a maximum size can be set. ZeroMQ has an extremely informative and detailed manual, in which complex patterns are described and built from sockets and intermediary devices (iMatix Corporation, n.d.). ZeroMQ is designed to be extremely fast and suitable for use in financial applications (iMatix Corporation, n.d.). A variety of ‘RFCs’ are defined for these patterns, but no implementation is provided beyond the example code in the manual. The main issue with ZeroMQ is that it does not define any authentication or security mechanisms, such as SSL. There has, however, been some discussion of implementing SASL authentication in an RFC that includes a broker (Hintjens, 11/MTL - Message Transfer Layer - 0MQ Request for Comments, 2011). The other issue is that, with opaque message queues, persistence is difficult and would require an intermediate device (which could be within the same application) and re-implementing much of the queuing functionality (iMatix Corporation, n.d., Chapter 4: Reliable Request-Reply). Previous versions of ZeroMQ defined a mechanism for storing large queues on disk, but this did not survive application restarts (iMatix Corporation, n.d., (Semi-)Durable Subscribers and High-Water Marks) and has been removed from the upcoming version 3 to avoid confusion (iMatix Corporation, Long-term Planning, n.d.).
Overall, while usable for many applications, ZeroMQ did not bring many benefits over a completely custom protocol for network monitoring, while complicating reliability and persistence.

Other Systems

A variety of other systems were also evaluated. Many systems (such as beanstalkd by Rarick (n.d.)) were solely suitable for one-of-many message queuing and thus not suitable for the general messaging required by network monitoring systems. Several systems were based on Memcached, a distributed key-value store (Danga Interactive, n.d.), repurposed as a message queuing system (Rarick, n.d.). A similar solution to Memcached is Redis, which also defines a number of other objects such as queues, and has more recently even added a publish-subscribe interface (Redis, 2010); however, it effectively still has the server-side persistence issue while adding system complexity. There were also a number of Java JMS based systems that were avoided due to the requirement that clients be relatively lightweight.

Apache ActiveMQ, one of the most popular messaging systems, and its native OpenWire protocol were not used due to a lack of interoperability with any other messaging system and a relatively heavyweight C/C++ library (Apache, ActiveMQ-CPP, n.d.). However, Apache ActiveMQ also supports STOMP as a first class protocol, which is interoperable with the other protocols it supports and has access to almost all of the features available in ActiveMQ. STOMP also supports SSL, allows the possibility of interoperability with almost any message broker, and has a simple wire format for which clients are straightforward to implement in a variety of languages (Apache, ActiveMQ – STOMP, n.d.).

STOMP

STOMP is a simple text-based protocol for messaging. Its main advantage is its simplicity, which means it is frequently added as an additional protocol to message brokers (a list of which can be found at STOMP (n.d.)). This simplicity also means it is simple to write clients in a variety of languages. According to STOMP (2011), a STOMP message simply consists of a frame type, headers separated by newlines, a blank line and then the message body with a terminating null character. A STOMP SEND frame with a destination is sent to the connected broker (optionally requesting a receipt), and the broker then sends MESSAGE frames to the appropriate subscribers, who then acknowledge the message to the broker. The main headers defined are for (optional) content length and destination, but custom headers can be defined and will be passed on by the broker.
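The frame layout described above is simple enough to encode and decode in a few lines. The following is a minimal sketch rather than a complete client: the function names encode_frame and decode_frame are hypothetical, the "/topic/..." destination follows an ActiveMQ-style naming convention and is an assumption (destination formats are broker-defined), and header escaping and networking are omitted.

```python
def encode_frame(command, headers, body=b""):
    """Build a STOMP frame: command, header lines, blank line, body, NUL."""
    headers = dict(headers)
    if body:
        # content-length is optional but lets the body contain NUL bytes.
        headers["content-length"] = str(len(body))
    lines = [command] + ["%s:%s" % (k, v) for k, v in headers.items()]
    return "\n".join(lines).encode("utf-8") + b"\n\n" + body + b"\x00"


def decode_frame(data):
    """Split one complete frame into (command, headers, body)."""
    head, separator, rest = data.partition(b"\n\n")
    if not separator:
        raise ValueError("frame has no blank line after the headers")
    lines = head.decode("utf-8").split("\n")
    command = lines[0]
    headers = {}
    for line in lines[1:]:
        key, _, value = line.partition(":")
        headers.setdefault(key, value)  # keep the first of repeated headers
    if "content-length" in headers:
        length = int(headers["content-length"])
        if rest[length:length + 1] != b"\x00":
            raise ValueError("body is not terminated by a null character")
        body = rest[:length]
    else:
        body, terminator, _ = rest.partition(b"\x00")
        if not terminator:
            raise ValueError("body is not terminated by a null character")
    return command, headers, body


frame = encode_frame(
    "SEND",
    {"destination": "/topic/amp.results", "receipt": "r-1"},
    b"rtt=12ms",
)
command, headers, body = decode_frame(frame)
```

Because custom headers are ordinary header lines, extending the protocol amounts to adding entries to the headers dictionary.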

One issue is that the destination header format is defined by the broker implementation. Specific error message formats are also not defined (STOMP, 2011). Because destinations are opaque, feature support is dependent on the broker implementation but is generally similar to AMQP. The existing STOMP C library is low level and does not support SSL (LogicBase, 2005), but the protocol is simple enough to make writing a new client feasible. STOMP 1.1 adds heart-beating, negative acknowledgement and support for virtual hosts, and defines the protocol more clearly to allow better interoperability (STOMP, 2011).

Despite having some of the same disadvantages as AMQP, STOMP was chosen as the most suitable system for network monitoring due to its potential for interoperability and its simple, flexible header format, which means the protocol can be extended easily. STOMP was chosen over AMQP primarily due to concerns with the fragmentation of the AMQP standard, and because STOMP retains interoperability between versions and brokers. The proposed architecture section that follows contains a more detailed discussion of STOMP.

Proposed Architecture

The proposed architecture involves implementing a STOMP 1.1 C library with a high level API and built-in persistence. Persistence will use a well-tested embedded database or similar system that stores information both in memory and reliably on disk (using transactions, for example), such as Berkeley DB (Oracle, n.d.) or SQLite (SQLite, n.d.). The library will support both unsecured and SSL STOMP connections mostly transparently using OpenSSL BIO sockets (OpenSSL, n.d.), as well as some form of connection failover. Communication will be via publish-subscribe topics, with communication to specific nodes achieved using individual topics. The reply-to header would be set in messages by the sending node so the receiving node(s) can reply to messages using this topic. To work around STOMP’s lack of end-to-end acknowledgement (message-IDs are not sent back to the publisher), the correlation-id header would be used to match replies to the messages that prompted them. Neither of these headers is part of the STOMP 1.1 specification, but both are used by ActiveMQ to provide better interoperability with JMS (Apache, Apache ActiveMQ – STOMP, n.d.), and any broker must pass the headers to STOMP recipients regardless. According to STOMP (2011), STOMP does, however, support receipts from the broker for messages reaching it, and acknowledgements from clients to the broker, both of which would be used to improve reliability. Currently Apache ActiveMQ is being used as the broker, but the client would be compatible with any STOMP broker (possibly requiring minor changes to topic formats, which would most likely be externally configured in the final application). ActiveMQ, while relatively complex, has the advantage that it supports relatively seamless interoperability with OpenWire and XMPP as well as a REST web service (Apache, “Cross Language Clients”, n.d.).
Currently ActiveMQ, RabbitMQ (SpringSource, RabbitMQ Stomp Adapter, n.d.) and Apache Apollo (the successor to ActiveMQ in development) are the only brokers known to support STOMP 1.1, although the client library should also support STOMP 1.0.
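The client-side persistence proposed above can be sketched as a transactional outbox. The sketch is in Python for brevity, although the proposed library is in C, and rests on several assumptions: the Outbox class, its table layout and the identifier scheme are hypothetical, SQLite is only one of the candidate stores, and the actual sending of SEND frames and handling of RECEIPT frames are left out.

```python
import os
import sqlite3
import tempfile
import uuid


class Outbox:
    """Messages stay on disk until the broker's receipt for them arrives."""

    def __init__(self, path):
        self.db = sqlite3.connect(path)
        with self.db:  # commits on success, rolls back on error
            self.db.execute(
                "CREATE TABLE IF NOT EXISTS outbox ("
                " receipt_id TEXT PRIMARY KEY,"
                " destination TEXT NOT NULL,"
                " correlation_id TEXT NOT NULL,"
                " body BLOB NOT NULL)"
            )

    def enqueue(self, destination, body):
        """Store a message before it is sent; returns the ids to put in the
        receipt and correlation-id headers of the SEND frame."""
        receipt_id = uuid.uuid4().hex
        correlation_id = uuid.uuid4().hex
        with self.db:
            self.db.execute(
                "INSERT INTO outbox VALUES (?, ?, ?, ?)",
                (receipt_id, destination, correlation_id, body),
            )
        return receipt_id, correlation_id

    def pending(self):
        """Messages still awaiting a receipt, oldest first."""
        return self.db.execute(
            "SELECT receipt_id, destination, correlation_id, body"
            " FROM outbox ORDER BY rowid"
        ).fetchall()

    def acknowledge(self, receipt_id):
        """Drop a message once the broker confirms it; False if unknown."""
        with self.db:
            cursor = self.db.execute(
                "DELETE FROM outbox WHERE receipt_id = ?", (receipt_id,)
            )
        return cursor.rowcount == 1

    def close(self):
        self.db.close()


path = os.path.join(tempfile.mkdtemp(), "outbox.db")
box = Outbox(path)
first_receipt, _ = box.enqueue("/topic/amp.results", b"rtt=12ms")
second_receipt, _ = box.enqueue("/topic/amp.results", b"rtt=15ms")
box.close()                      # simulate the node restarting

box = Outbox(path)
bodies_after_restart = [row[3] for row in box.pending()]
acked = box.acknowledge(first_receipt)
acked_again = box.acknowledge(first_receipt)
remaining = [row[0] for row in box.pending()]
box.close()
```

On reconnection a client would resend everything returned by pending(), so a message can reach the broker more than once; recipients could use the correlation-id to discard duplicates as well as to match replies.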