DARPA Intrusion Tolerant Systems Workshop

Williamsburg, Virginia

October 5-6, 1999

Tolerance methods group summary presentation

Notes by Carl Landwehr, Mitretek Systems

Introduction

Jay Lala, DARPA Program Manager for Intrusion Tolerant Systems, convened the workshop. Presentations on the first morning of the workshop reviewed concepts and technology of Fault Tolerant Systems (see additional notes), together with a few talks discussing how those concepts might be applied to computer and network security problems. The workshop then split into two groups. One group, chaired by Roy Maxion of Carnegie Mellon University, was to discuss methods for recognizing that an intrusion has occurred. The other, chaired by Carl Landwehr of Mitretek Systems, was to consider how, given that an intrusion had (or could have) occurred, the concepts and mechanisms of fault tolerance might be applied so that the system could sustain its critical operations and perhaps restore its original capability. At the end of the first afternoon, and again at the end of the workshop, each group reported its results.

On the second morning, Jay Lala suggested some particular questions to help focus the deliberations:

1. Can fault tolerance techniques be adapted to intrusion/attack tolerance? If so,

What additional vulnerabilities do these techniques introduce?

How can these added vulnerabilities be countered?

2. What are the functions needed to perform intrusion tolerance?

3. What research will be needed to develop systems with these functions?

4. Can we identify any “challenge” problems that might be used to inspire friendly competition to advance the technology (in the vein of the SIFT/FTMP competition)?

These notes amplify the final presentation of the second group.

After some discussion, the group agreed that fault tolerance techniques could indeed provide a basis for developing intrusion tolerant systems, and proceeded to discuss what new vulnerabilities these techniques might introduce.

1. Vulnerabilities introduced by mechanisms for fault tolerance and possible counters

The group discussed a number of potential vulnerabilities, listed below, but it cautioned that the list should be considered representative, not complete.

Redundancy tends to reduce confidentiality.

As recognized in the charge to the workshop, redundancy is a concept basic to fault tolerance. However, replicating confidential information inevitably increases its exposure to threats.

Possible Counter: Secret Sharing Schemes. Cryptographic mechanisms exist that allow a key to be reconstructed only if at least k out of n holders of the secret combine their shares. As demonstrated in the work on fragmentation, redundancy, and scattering conducted at LAAS-Toulouse, this approach can be used to support replication without necessarily increasing vulnerability to the same extent.
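The k-out-of-n reconstruction described above can be illustrated with Shamir's polynomial secret sharing scheme. The following is a minimal sketch, not the LAAS-Toulouse implementation: the function names split_secret and recover_secret, the choice of prime modulus, and the restriction to integer secrets are all assumptions made for illustration.

```python
import secrets

# Assumed field modulus: 2**127 - 1 is a Mersenne prime, large enough for a demo.
PRIME = 2**127 - 1

def split_secret(secret, k, n):
    """Split an integer secret into n shares; any k of them recover it."""
    if not (1 <= k <= n) or not (0 <= secret < PRIME):
        raise ValueError("need 1 <= k <= n and 0 <= secret < PRIME")
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares

def recover_secret(shares):
    """Lagrange interpolation at x = 0 over the integers modulo PRIME."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

With k = 3 and n = 5, any three shares passed to recover_secret return the original value, while fewer than three shares give no information about it. A holder of a single replica therefore learns nothing, which is the property that lets replication proceed without a matching loss of confidentiality.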

Adaptation - Attacker may exploit to force system into states to his advantage.

One of the techniques of fault tolerance is to design the system to adapt to the occurrence of failures, perhaps by reconfiguring itself to omit a faulty component. An attacker who understands how the system will adapt, and who can intrude upon it, may do so simply to force the system to adapt in a way that is to the attacker’s advantage (e.g., by disconnecting itself from a network that seems under attack).

Possible Counter: Dynamic, perhaps random adaptation methods that cannot be precisely predicted by the attacker could reduce the effectiveness of this kind of attack.

Redundancy management adds complexity, may reduce predictability

In fault tolerant systems with active, redundant components, managing this redundancy properly has proven a difficult task. In some cases, the redundancy management mechanisms have proven complex enough that they themselves have contributed to system unreliability, to the point that it became unclear whether the simplex system might have behaved better than the redundant one. [I believe the AFTI-F16 is a case in point.] Introducing redundancy, and managing it, for purposes of intrusion/attack tolerance could lead to similar results.

Possible Counter: The group didn’t identify a specific counter for this problem, except to be sure that redundancy management schemes are not overly complex. Management schemes simple enough that their behavior can be analyzed rigorously would seem desirable, though see the previous point.

FT mechanisms do not necessarily remove all single points of intrusion failure.

The group expected that fault tolerant mechanisms might be introduced where systems had known single points of vulnerability to attack or intrusion, in order to reinforce those points. However, it is likely that some, possibly unidentified, single failure points will remain. The presence of some FT mechanisms may lull the defender into believing that the system is less vulnerable than it is in fact.

Possible Counter: The only real counter for this seems to be vigilance and understanding on the part of the designers, certifiers, and operators, as well as reinforcing residual single points of failure using other (possibly non-technical) means.

Assumptions behind FT mechanisms may not hold and may not be sufficiently reexamined in new context

Many mechanisms for achieving fault tolerance make some assumptions about the environment – e.g., on communication modes, failure modes, etc. Designers who simply borrow these mechanisms without understanding their assumptions may build systems that are vulnerable to an attacker with a deeper understanding.

Possible Counter: Educate the designers so that they understand the assumptions, and re-examine the assumptions behind the fault tolerant mechanisms.

2. Some functions useful for intrusion tolerance

The group noted that although the primary focus of the workshop is on maintaining system integrity and availability in the face of intrusion, integrity requirements may impose certain confidentiality requirements. Specifically, many integrity mechanisms depend on cryptographic functions to assure that malicious attempts to alter data can be detected. If the confidentiality of the secret key is compromised, however, an attacker may be able to alter data protected by these mechanisms without being detected. Further, direct violations of confidentiality, such as occur when sensitive data are simply copied and transmitted, are very hard to detect, unless there is some physically protected audit of all data read and written.

Detection

The ability to detect damage to system components, to programs, and to data, is clearly important to constructing an intrusion tolerant system. It would also be desirable to detect the route used by the attacker to achieve the damage, if not the attacker’s identity.

Recovery - state restoration

The ability to restore state following an intrusion will be needed, since presumably some damage will have occurred. The ability to rely on backward recovery may be limited because a malicious attacker may have arranged that backup copies will be contaminated (e.g. by introducing an attack that only takes effect after days or weeks of latency), so forward recovery mechanisms will also be needed.

Masking / error correction

A fundamental method of fault tolerance is to triplicate units that may fail, vote the results, and select the majority voted value. This procedure can mask errors and permit them to be corrected. Determining how to apply this approach to intrusion tolerance is complicated by the fact that, for example, triplicating the same software component will mean that if that component is compromised, all three versions may produce the identical sabotaged value. Thus it will be appropriate to consider diversity in combination with voted redundancy.
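The voting step, and the reason diversity matters, can be sketched as follows. This is an illustrative example only: the functions vote and run_voted, and the three replica functions, are hypothetical names invented here, and the "compromised" replica merely stands in for a sabotaged component.

```python
from collections import Counter

def vote(results):
    """Return the strict-majority value, or raise if no majority exists."""
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise RuntimeError("no majority among replicas")
    return value

# Two hypothetical, independently written implementations of one function.
def sumsq_loop(xs):
    total = 0
    for x in xs:
        total += x * x
    return total

def sumsq_builtin(xs):
    return sum(x * x for x in xs)

# Stand-in for a replica whose code has been sabotaged by an intruder.
def sumsq_compromised(xs):
    return 0

def run_voted(replicas, xs):
    """Run every replica on the same input and vote on the results."""
    return vote([replica(xs) for replica in replicas])
```

With two diverse correct replicas and one compromised one, run_voted masks the bad value and returns 14 for the input [1, 2, 3]. If the same compromised component is triplicated, the vote is unanimous and the sabotaged value 0 is returned with no error, which is the failure mode the group had in mind.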

Redundancy management

Redundancy, probably in several forms, will be a basic technique applied in developing intrusion tolerant systems. Managing this redundancy without introducing too much complexity will be a challenge.

Adaptation / Reconfiguration

To counter an intrusion that damages part of a system, that system will need to be able to reconfigure itself and possibly to adapt to a changed environment, perhaps with diminished functions.

Latent attack detection / self test

Latent malicious code represents an intrusion. The ability to detect such code will be useful for intrusion tolerance; self-test functions that run when the system is idle might be one way to find such code.
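One simple form of idle-time self-test compares cryptographic digests of installed code against a trusted baseline. The sketch below assumes that such a baseline exists and is stored where an intruder cannot alter it; the function names and the in-memory representation of code as byte strings are illustrative assumptions. It would detect modified or added code, not malicious code that was already present when the baseline was taken.

```python
import hashlib

def digest(data: bytes) -> str:
    """SHA-256 digest of a component's content, as a hex string."""
    return hashlib.sha256(data).hexdigest()

def self_test(current, baseline):
    """Return the names of components that are altered or unknown.

    current:  dict mapping component name -> current content (bytes)
    baseline: dict mapping component name -> trusted digest (assumed protected)
    """
    suspicious = []
    for name, data in current.items():
        # A missing baseline entry (None) never equals a digest, so
        # components added after the baseline was taken are flagged too.
        if baseline.get(name) != digest(data):
            suspicious.append(name)
    return sorted(suspicious)
```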

System Behavior Models

A model for system behavior is not a function, per se. However, the ability to determine that an intrusion has occurred and that corrective action is needed depends on the system’s ability to recognize that its behavior has changed in some respect. Models of how the system is expected to behave, probably formulated at several different levels of abstraction, could support this capability.

Extent of Compromise - Data Flow Models

After an intrusion is recognized, it is important to determine the extent, or the potential extent of the damage. Models that reveal the flow of data in the system may be useful in determining the extent of the compromise and deciding what needs to be repaired.
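If the data flow model is represented as a directed graph, the potential extent of a compromise is the set of components reachable from those known to be compromised. The following sketch assumes such a graph is available as a dictionary; the function name and the component names used with it are hypothetical.

```python
from collections import deque

def possibly_tainted(flows, compromised):
    """Return every component reachable from the compromised ones.

    flows: dict mapping a component to the components it sends data to.
    compromised: iterable of components known to be compromised.
    """
    tainted = set(compromised)
    queue = deque(tainted)
    while queue:  # breadth-first search along data flows
        node = queue.popleft()
        for downstream in flows.get(node, ()):
            if downstream not in tainted:
                tainted.add(downstream)
                queue.append(downstream)
    return tainted
```

For example, with flows {"web": ["app"], "app": ["db", "log"], "db": ["backup"], "hr": ["payroll"]} and a compromised "app", the result contains "app", "db", "log", and "backup", but not "web", "hr", or "payroll". The result is an upper bound on the damage: it says what may need to be checked or repaired, not what was actually altered.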

3. Potential research directions

The group identified a variety of potential topics for research, including:

Camouflage - changing protocols to disguise behavior

The idea is to explore ways to make it harder for the attacker to determine what the system is doing, perhaps by using different protocols to achieve the same result at different times.

Dynamic Confinement and Authentication

The idea is to explore the use of shifting containment regions and authentication to contain the effects of an intrusion.

Randomness in Algorithms

The idea is to explore the use of probabilistic algorithms that have been developed for use in fault tolerant systems to see if they could prove helpful in containing the effects of intrusions.

Dynamic Reconfiguration and Adaptation

Research topics in a variety of techniques for dynamic reconfiguration and adaptation were discussed, including:

- techniques for dynamically changing network addresses for components, to confuse intruders

- techniques for changing the number of redundant components, to shift defensive posture

- techniques for shifting the location of resources, so a particular computation might be difficult for an attacker to find
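As one illustration of the first technique, legitimate parties that share a secret key could derive a component's current port from the key and a time epoch, so that the address changes on a schedule an intruder cannot predict. This is a sketch under stated assumptions, not a method proposed at the workshop: the function name, the port range, and the use of HMAC-SHA256 are choices made here for illustration.

```python
import hashlib
import hmac

def current_port(shared_key: bytes, epoch: int, low: int = 20000, high: int = 60000) -> int:
    """Derive the port in use during a given epoch from a shared secret.

    epoch is a non-negative counter, for example int(time.time() // 60).
    The result lies in the half-open range [low, high).
    """
    mac = hmac.new(shared_key, epoch.to_bytes(8, "big"), hashlib.sha256).digest()
    return low + int.from_bytes(mac[:4], "big") % (high - low)
```

Both ends compute the same port for the same epoch without exchanging messages, while a party without the key sees what appears to be a random sequence. A practical design would also need to handle clock skew and connections that span an epoch boundary.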

Fragmentation, Redundancy, and Scattering

Research conducted in Europe over roughly the last ten years has developed methods for preserving the confidentiality and integrity of files in a file system by fragmenting them, copying the fragments and encrypting them using secret sharing techniques, and dispersing the encrypted fragments across several different systems. This work is expected to be developed further under European funding of the MAFTIA project, and could have applications to intrusion tolerant systems.
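The fragmentation and scattering steps can be sketched in simplified form. The example below is an assumption-laden illustration, not the European implementation: it omits the encryption and secret sharing steps entirely, uses a simple round-robin placement, and the names scatter and reassemble are invented here.

```python
def scatter(data: bytes, fragment_size: int, nodes, copies: int):
    """Fragment data and place `copies` replicas of each fragment on distinct nodes."""
    if not (1 <= copies <= len(nodes)):
        raise ValueError("need 1 <= copies <= number of nodes")
    fragments = [data[i:i + fragment_size] for i in range(0, len(data), fragment_size)]
    placement = {node: {} for node in nodes}
    for idx, frag in enumerate(fragments):
        for c in range(copies):
            # Round-robin placement: consecutive nodes hold the replicas.
            node = nodes[(idx + c) % len(nodes)]
            placement[node][idx] = frag
    return placement, len(fragments)

def reassemble(placement, n_fragments: int, available_nodes) -> bytes:
    """Rebuild the data from whichever nodes are still available."""
    out = []
    for idx in range(n_fragments):
        for node in available_nodes:
            if idx in placement[node]:
                out.append(placement[node][idx])
                break
        else:
            raise LookupError(f"fragment {idx} is unavailable")
    return b"".join(out)
```

With four nodes and two copies of each fragment, the data can be reassembled after any single node is lost, and no single node holds every fragment. In the actual scheme, encryption keeps an intruder who captures some fragments from reading them.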

Models and analytical techniques

Although it may be an advantage to have a system that is unpredictable from the point of view of an intruder, the users of a system will want predictability in system functions. Research in modeling and analyzing the behavior of system designs that strive for intrusion tolerance will be needed.

Validation/evaluation

The problem of assessing whether the requirements for an intrusion tolerant system have been correctly stated, and the problem of determining how well an intrusion tolerant system meets its requirements both appear to be extremely difficult. Only recently have there been significant efforts to measure the relative effectiveness of intrusion detection systems, and the methodology has been subject to some criticism. Validating and evaluating intrusion tolerant systems is likely to be even more difficult.

Security Policy for Intrusion Tolerant Systems

The security policy for intrusion tolerant systems may have to anticipate the fact that an intrusion succeeds. An intrusion tolerant system may require a policy that takes account of different modes of behavior depending on whether an intrusion is in progress or not, and research may be needed to develop appropriate security policies.

Functional / Analytic Redundancy

One method for tolerating an intrusion may be to have several different means for computing the same result, as a spacecraft or airplane may have alternative means to navigate. If an intrusion disables one way of achieving a function, the redundant means may be used. Research may be needed to identify appropriate ways to implement this concept so that an intruder that is able to damage one redundant component cannot necessarily damage all of the others that back it up.

Massive Redundancy

There is an increasing amount of low cost computing hardware available on our networks, but very little effective use is made of this fact in current architectures. Research to find ways to exploit the massively redundant hardware already in place to provide better intrusion tolerance seems appropriate.

Intrusion Tolerant Transaction Processing Schemes

Transaction processing schemes typically require all participants in a transaction to agree before the transaction can go forward. This property is extremely important in maintaining consistent information in distributed systems, yet it could perhaps be exploited by an attacker to prevent a system from making any progress, since a single sabotaged component could indefinitely delay a transaction. It may be appropriate to investigate the properties of alternative transaction processing models that could function more effectively, perhaps in a degraded mode, in the face of an intrusion. Such models might, for example, draw on application-specific information in order to permit progress without unanimity.
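The difference between a unanimity rule and a rule that permits progress without it can be sketched as follows. The quorum rule shown is only one hypothetical alternative, chosen for illustration; the function names and the representation of votes are assumptions, and whether a given quorum preserves enough consistency is exactly the application-specific question raised above.

```python
def unanimous_commit(votes, participants):
    """Classic rule: commit only if every participant has voted yes."""
    return all(votes.get(p) is True for p in participants)

def quorum_commit(votes, participants, quorum):
    """Degraded-mode rule: commit once at least `quorum` participants vote yes.

    votes maps a participant to True or False; a participant that has not
    responded (for example, because it was sabotaged) is simply absent.
    """
    yes = sum(1 for p in participants if votes.get(p) is True)
    return yes >= quorum
```

With four participants, three yes votes, and one participant that never responds, unanimous_commit returns False and the transaction stalls, while quorum_commit with a quorum of three returns True.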

4. Challenge problems

The group conducted a free-ranging discussion on challenge problems and the sorts of advances that might cause the sponsors of the Intrusion Tolerant Systems program to consider it a notable success, with the following results:

Moles on the design team

The idea here is to consider the possibility that there is an active intruder on the team designing an intrusion tolerant system. This might lead to building systems with the assumption that the design documentation will be known and available to the attacker. An even more difficult problem is posed if one assumes that the mole on the design team may attempt to include hidden weaknesses in the design.

Evaluation of Intrusion tolerant systems

This area has already been discussed as a research topic, but the group felt the difficulty of this problem might merit elevating it to the level of a challenge problem. Developing appropriate metrics for intrusion tolerant systems will be difficult, as will developing methodologies for carrying out measurements. One idea discussed was, in the case of a design competition between two teams, that each team might also serve as a red team for its competitor’s design.

Software architectures that support least privilege or other properties of interest

This item is perhaps more in the nature of another research topic. The idea is that most current software architectures, and particularly those in COTS systems, do not support the concept of least privilege (i.e., that each software component’s privileges should be limited to those required for it to perform its specific function). A consequence is that a compromised component can do damage to the system far beyond its normal functions. Architectures that reduce this kind of exposure could lead to much greater intrusion tolerance than is found in current software systems.

X% of critical functions maintained for n hours following intrusion

Capabilities of fault tolerant systems have often been demonstrated by showing that the system continues to run when a fault is injected into the system or a board is pulled from a chassis. A successful demonstration of technology developed by this program might show that some significant percentage of critical functions are maintained for some period of time after an intrusion occurs, or perhaps that a compromised system operator cannot cause more than a limited amount of damage.
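A demonstration of this kind needs an agreed way to compute the percentage. One possible measure, sketched below, is the worst-case share of critical functions observed working across a series of checks made during the n hours after the intrusion. The function name, the sampling approach, and the choice of the minimum rather than an average are assumptions made for illustration.

```python
def min_percent_maintained(samples, critical):
    """Worst-case percentage of critical functions working after an intrusion.

    samples:  list of sets, each holding the functions observed working
              at one check during the measurement period.
    critical: set of functions designated as critical.
    """
    return min(100.0 * len(critical & working) / len(critical) for working in samples)
```

For example, with four critical functions and three checks in which four, three, and three of them were working, the measure is 75.0. Non-critical functions that appear in a sample are ignored.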