Customer Meeting (3/26/08, 2:02 PM)

Late

  • Hubert (5 mins)
  • Dave (7 mins)

View Client

  • Latency over time
  • Sliding time window that keeps relevant data on the screen (see the sketch after this list)
  • Currently running on random data (our dataset is too small to graph well)
  • The size of the sliding window can be changed
  • Working to make the time-of-day label more useful
  • RD4J restrictions require creating a database for each server being viewed, but the data can be sent from the manager instead
  • Currently up to 25 kB if stored on the client machine
  • Jacy: all those machines have 2 GB of RAM, so not worried about space. Also, there is a use case for keeping old data around (pulling in ad hoc data), so it wouldn’t simply be replaced and the 25 kB limit would be exceeded
  • Brad: may be able to save processing time by tying granularity of data to the size of the window
  • Jacy: we need to know how the system load will scale (will a second graph double the load, or is most of the load overhead?); we cannot overload the client. Test with at least 20 server graphs.
  • Can we trick RD4J into not writing to the disk?
  • This component would fit into the main UI by coming up when a particular server is clicked in the dashboard; the time window could be changed via a slider or button. Should also see whether the sliding-window view could be displayed in the dashboard view itself (by setting the shape to an image); this may make sense for intra-server correlation
  • Aggregation currently occurs on the view manager, with the data pulled from disk
  • Brad: hopes to have something from the view server to show by next week
  • Matt: promises to absolutely, definitely have live data running through an entire connected system (across Mule) by next week
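
A minimal sketch, assuming Java on the view client, of how the sliding window could keep only relevant data in memory (class and method names are illustrative, not from the actual code): samples older than the window are evicted as new ones arrive, and the UI slider or button would call setWindowMillis to resize it.

    import java.util.ArrayDeque;
    import java.util.Deque;

    /** Illustrative sliding-window buffer; evicts samples older than the window. */
    public class SlidingWindow {
        /** One latency measurement with its wall-clock timestamp. */
        public static class Sample {
            final long timestampMillis;
            final double latencyMillis;
            Sample(long timestampMillis, double latencyMillis) {
                this.timestampMillis = timestampMillis;
                this.latencyMillis = latencyMillis;
            }
        }

        private final Deque<Sample> samples = new ArrayDeque<Sample>();
        private long windowMillis;

        public SlidingWindow(long windowMillis) {
            this.windowMillis = windowMillis;
        }

        /** Add a new sample and drop everything that has aged out of the window. */
        public synchronized void add(Sample s) {
            samples.addLast(s);
            evictOlderThan(s.timestampMillis - windowMillis);
        }

        /** Called when the user resizes the window via the slider or button. */
        public synchronized void setWindowMillis(long windowMillis) {
            this.windowMillis = windowMillis;
            if (!samples.isEmpty()) {
                evictOlderThan(samples.getLast().timestampMillis - windowMillis);
            }
        }

        private void evictOlderThan(long cutoff) {
            while (!samples.isEmpty() && samples.getFirst().timestampMillis < cutoff) {
                samples.removeFirst();
            }
        }
    }

Brad’s suggestion of tying granularity to window size would then amount to downsampling this buffer before plotting, rather than drawing every sample.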

Recoverability

  • Camp 1: the parser tags messages (it must be taken offline and restarted to change the rules)
  • Camp 2: the parser passes rules along to the data client, and the data client tags messages (modifiable at runtime)
  • Rules form a tree and can be combined with logical connectives (see the first sketch after this list)
  • Jacy: all the systems have different life cycles, so we don’t want to force everything to die in order to update the parser
  • Chelsea: by the same token, a parser could easily be taken offline and then be brought back up with changes without affecting the message flow
  • Jacy: don’t want to focus too much on making the parser easy to write if it causes us headaches
  • Brad: it probably makes life easier to make the parser easy to write and tag messages on its own, because it makes the data client easier to write (which is our module)
  • Each piece must pass along some knowledge of itself to the piece that is able to recover it
  • Jacy: it seems to couple the pieces too tightly; this is only worthwhile if it follows a standard API. Look at FIX’s recoverability protocol as an example and emulate the uniformity and isolation it uses
  • Jacy: the JPM standard generally states that handshaking upon startup determines what action must be taken to recover. The actual system restart is performed manually by Operators (the run book describes how to bring systems back up). Some systems continually ping their components and send misses along to scripts that can restart them; don’t focus on this. It may be worthwhile to do whatever is necessary to keep the Controller running (or start it); an external script would actually restart it, but we must write each piece so that it automatically restores its last state (see the state-restoration sketch after this list)
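
A minimal sketch of the rule tree in Java (all names illustrative; the real parser and data-client interfaces may differ): leaf rules match message content, and And/Or/Not nodes combine them with logical connectives, so under the Camp 2 design a tree built at runtime could be handed to the data client.

    /** Illustrative rule tree: leaves match messages, inner nodes combine them. */
    interface Rule {
        boolean matches(String message);
    }

    /** Leaf: true if the message contains a token. */
    class ContainsRule implements Rule {
        private final String token;
        ContainsRule(String token) { this.token = token; }
        public boolean matches(String m) { return m.contains(token); }
    }

    /** Logical connectives as inner nodes of the tree. */
    class AndRule implements Rule {
        private final Rule left, right;
        AndRule(Rule left, Rule right) { this.left = left; this.right = right; }
        public boolean matches(String m) { return left.matches(m) && right.matches(m); }
    }

    class OrRule implements Rule {
        private final Rule left, right;
        OrRule(Rule left, Rule right) { this.left = left; this.right = right; }
        public boolean matches(String m) { return left.matches(m) || right.matches(m); }
    }

    class NotRule implements Rule {
        private final Rule inner;
        NotRule(Rule inner) { this.inner = inner; }
        public boolean matches(String m) { return !inner.matches(m); }
    }

For example (tokens made up), new AndRule(new ContainsRule("ORDER"), new NotRule(new ContainsRule("TEST"))) would tag order messages that are not test traffic; because the tree is plain data, it can be replaced at runtime without restarting anything.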
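
And a minimal sketch of the restore-last-state behavior, assuming each piece checkpoints a small properties file (the file name and keys are placeholders): state is written out on every significant change, and when the external script restarts the process, the constructor reloads whatever was last saved.

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.util.Properties;

    /** Illustrative checkpoint/restore sketch for a recoverable piece. */
    public class RecoverableComponent {
        private final File stateFile;
        private final Properties state = new Properties();

        /** On startup, reload the last checkpoint if one exists. */
        public RecoverableComponent(File stateFile) throws IOException {
            this.stateFile = stateFile;
            if (stateFile.exists()) {
                FileInputStream in = new FileInputStream(stateFile);
                try {
                    state.load(in);
                } finally {
                    in.close();
                }
            }
        }

        /** Record a piece of state and checkpoint it immediately. */
        public synchronized void checkpoint(String key, String value) throws IOException {
            state.setProperty(key, value);
            FileOutputStream out = new FileOutputStream(stateFile);
            try {
                state.store(out, "last known state");
            } finally {
                out.close();
            }
        }
    }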

Correlation

  • Concurrency design relies on a dynamically sized thread pool (see the sketch after this list)
  • We’ve shifted to thinking of the system as always being in Learning Mode
  • Administrators should be able to explicitly remove an edge and short-circuit the effort to correlate along that edge
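
A minimal sketch of the dynamically sized pool using java.util.concurrent.ThreadPoolExecutor (the sizes are placeholders, not tuned values): with a SynchronousQueue the pool spawns a new thread, up to the maximum, whenever all current threads are busy, and idle threads above the core are reclaimed after the keep-alive expires, so the pool size tracks the correlation load.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.SynchronousQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class CorrelationPool {
        /** Builds a pool that grows under load and shrinks when idle. */
        public static ExecutorService create() {
            return new ThreadPoolExecutor(
                    2,                        // core threads kept alive (placeholder)
                    16,                       // hard cap on threads (placeholder)
                    60L, TimeUnit.SECONDS,    // idle threads above the core die off
                    new SynchronousQueue<Runnable>());
        }
    }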

Mule

  • Currently at the end of the design phase and running into startup bugs
  • Still have questions about dynamic endpoints and TCP (Jacy hasn’t been able to get in contact with JPM Mule users yet but is trying)

Questions

  • Matt: how does logging in JPM work?
  • Jacy: each system has its own log or set of logs configured to send alerts to Operate; a set of patterns is then set up for Operate to handle. We should create a log for the Controller and provide this set of rules (see the sketch after this list)
  • Matt: is it appropriate to log other modules?
  • Jacy: at appropriate levels, yes
  • Matt: do we need to provide an implementation of “go back and correlate old messages” behavior?
  • Jacy: given the timeframe, more worried about preserving the ability to add that later
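
A minimal sketch of the Controller log using java.util.logging (our assumption; the file name and levels are placeholders): a dedicated file handler whose WARNING-and-above lines would become the pattern set handed to Operate.

    import java.io.IOException;
    import java.util.logging.FileHandler;
    import java.util.logging.Level;
    import java.util.logging.Logger;
    import java.util.logging.SimpleFormatter;

    public class ControllerLog {
        /** Creates a dedicated log file for the Controller. */
        public static Logger create() throws IOException {
            Logger log = Logger.getLogger("controller");
            FileHandler handler = new FileHandler("controller.log", true); // append
            handler.setFormatter(new SimpleFormatter());
            handler.setLevel(Level.INFO);  // INFO+ to file; Operate patterns would match WARNING+
            log.addHandler(handler);
            log.setLevel(Level.INFO);
            return log;
        }
    }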

Concerns

  • We’re still at the proof-of-concept level
  • No individual components are running, and certainly not the whole system end-to-end
  • Integration needs to happen more frequently

Want to see everything working together by next week, plus a discussion of view client performance