A Fault-Tolerant Secure Distributed File System

(FTSDFS)

By

Arun A Kumar

Fei Lu

Submitted for the course CS 600.442, Cryptography and Network Security

as a Final Project

December 2001

Baltimore

Abstract

This project attempts to create a secure, fault-tolerant environment in which users can store files.

Each file to be stored is first split into n shares; each share is appended with its hash value, encrypted, and then stored on one of n servers. The system can retrieve a file as long as at least t (t < n) of the servers are functional. A malicious user cannot read a file, since it is stored encrypted, and destroying a file requires corrupting more than n - t of the servers. The system is designed to resist both active and passive malicious attacks.

It currently supports most variants of Unix, including, but not limited to, Solaris® and Linux.

The application assumes that no more than n - t of the n servers fail (both t and n are chosen dynamically by the user). Files are dispersed using Rabin's IDA [1]. Since IDA alone is not fully secure, and some information about the original file can be extracted from partial shares, each share is encrypted with AES after dispersal. The system accepts secret keys of 128, 192, or 256 bits, and AES is used in CBC mode. AES encryption is what makes the system secure; hence, the security of the system depends on safe storage of the secret key, which may be kept on a smart card or a similar device. In addition, each share is appended with its MD5 hash value and an update counter (used to resolve update inconsistencies). The interface to the file system is the GNU Midnight Commander. This implementation is analogous to threshold schemes such as the one proposed in Shamir's 'How to Share a Secret' (1979), but it is far more space-efficient: each share requires only m/t bytes of data (where m is the size of the original file) plus a constant overhead consisting of the initialization vector and the hash value.
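The per-share overhead described above can be made concrete with a small calculation. The sketch below assumes one particular on-disk layout consistent with the description (a 16-byte IV header, a 1-byte last-block-length field, a 16-byte MD5 digest, and a 4-byte update counter, with the plaintext padded to whole 16-byte AES blocks); the project's actual layout may differ in detail.

```cpp
#include <cstddef>

// Size of one stored share: the file's m bytes are dispersed into n shares
// of ceil(m / t) bytes each, and every share carries a constant overhead:
// a 1-byte last-block-length field, a 16-byte MD5 digest, and a 4-byte
// update counter inside the ciphertext, plus a 16-byte IV header outside it.
// The plaintext is rounded up to whole 16-byte AES-CBC blocks.
std::size_t shareSize(std::size_t m, std::size_t t) {
    const std::size_t payload = (m + t - 1) / t;                     // ceil(m / t)
    const std::size_t body = payload + 1 /*len byte*/ + 16 /*MD5*/ + 4 /*counter*/;
    const std::size_t padded = ((body + 15) / 16) * 16;              // AES blocks
    return 16 /*IV header*/ + padded;
}
```

For example, a 1000-byte file with t = 3 yields 334-byte payloads and 384-byte shares under these assumptions, so the total storage for n shares stays close to (n/t) times the file size.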

System Description

The application is client-server based. Most of the functionality resides in the client; this is intentional, so that the failure or compromise of a server does not compromise security. The client is responsible for splitting, combining, encrypting, and decrypting files, and for computing hash values.

The Client

The client interface is that of the GNU Midnight Commander, the first file manager used in the GNOME project. The initial code was obtained from the Free Software Foundation and has been modified to access the distributed file system. When a connection between the client and the distributed file system is requested, the client authenticates the user with the server; hence, the user must have an account on each of the servers that form the distributed server network. Once authenticated, the distributed file server gives access to all files belonging to the logged-in user. The files in the current directory are shown in a pane of the user interface.

Whenever the user copies a file from the local file system to the distributed file system, the client obtains a 128-, 192-, or 256-bit key from the user for encrypting the file. The file is passed to the code implementing IDA, which splits it into n shares, each of length m/t, where m is the length of the file; every share is identical in size. The shares are written to the local file system with a random string appended to each filename, since they are all stored in the same location. After dispersal, the client calculates the MD5 hash of each share and appends it to the share, along with an update counter that is incremented every time the file is updated. It then uses AES in CBC mode to encrypt each share with the same key. The initialization vector (IV) for AES, generated with a pseudorandom number generator, is stored as a 16-byte header of the encrypted file. The first byte of the next block stores the length of the last block: since the file size need not be a multiple of 16, the last block may contain between 1 and 16 bytes. Only four bits are needed to represent this length, so it is stored in the lower four bits of the byte, and the four higher-order bits are filled with random bits. The subsequent blocks each hold 16 bytes of data. The encryption is then carried out, each share is updated on the local file system, and the shares are copied to the servers under their original filenames.
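The last-block-length byte described above can be encoded and decoded as follows. This is a minimal sketch; it assumes that a full 16-byte final block is stored as 0 in the low nibble, since four bits cannot hold the value 16 directly.

```cpp
#include <cstdint>
#include <random>

// Encode the size of the final plaintext block (1..16) into one byte:
// the low four bits hold the length (16 wraps to 0), and the high four
// bits are filled with random bits so the byte reveals only the length.
std::uint8_t encodeLastBlockLen(int len) {
    static std::mt19937 rng{std::random_device{}()};
    const std::uint8_t high = static_cast<std::uint8_t>(rng() & 0xF0u);
    return high | static_cast<std::uint8_t>(len & 0x0F);
}

// Recover the length, treating 0 in the low nibble as a full 16-byte block.
int decodeLastBlockLen(std::uint8_t b) {
    const int len = b & 0x0F;
    return len == 0 ? 16 : len;
}
```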

Consider the case in which a request is made to copy a file from the distributed file system to the local file system. The client first obtains the decryption key from the user, then contacts t servers, chosen at random, for shares of the file. If a server is down, or its share is found to be corrupt (determined by decrypting the share, comparing it against its hash value, and checking the update counters), the client switches to the next server. It proceeds in this manner until it has obtained t valid shares, then reconstructs the original file and stores it locally.
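The retrieval loop just described can be sketched as follows, with the network fetch and the decrypt-and-verify steps abstracted behind callbacks. The helper names are hypothetical, not the project's actual interfaces.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Collect t valid shares out of n servers. fetchShare returns false if a
// server is unreachable; verify returns false if the decrypted share fails
// its hash or update-counter check. On failure of either step, the client
// simply moves on to the next server. Returns the indices of t valid
// shares, or an empty vector if fewer than t could be found.
std::vector<std::size_t> collectShares(
        std::size_t n, std::size_t t,
        const std::function<bool(std::size_t)>& fetchShare,
        const std::function<bool(std::size_t)>& verify) {
    std::vector<std::size_t> good;
    for (std::size_t server = 0; server < n && good.size() < t; ++server) {
        if (fetchShare(server) && verify(server))
            good.push_back(server);
    }
    return good.size() == t ? good : std::vector<std::size_t>{};
}
```

In the real client, a successful result would be fed to the IDA reconstruction step; an empty result means more than n - t servers were down or corrupt and the file is unrecoverable.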

The Midnight Commander interface requires that files on the distributed file system also be edited and viewed within the interface. To provide this functionality, the file is first copied to the local host, where it is edited. When the user finishes modifying or viewing the file, the system compares it with the original to detect any changes. If a change is detected, the original file is erased from all servers and replaced with the new copy. This concludes the description of the client; the next section briefly describes the server's capabilities and the mode of communication between the client and the server.

The Server

The server provides minimal functionality in order to keep the system secure. The distributed file server is a daemon process running on each server, and the number of servers participating in the distributed environment can be arbitrarily large. The only requirement is that the client configuration be updated to reflect the presence of additional servers. An important feature of this system is that servers can be added without affecting existing users in any way: clients continue to work even if they are not updated to reflect the new servers.

When the server starts up, it establishes a socket to listen on by requesting a port from the port mapper, which assigns the server daemon a port number below 1024. Note that superuser privileges are required to run a daemon on a port below 1024. Each subsequent client request is preceded by the client contacting the port mapper to find the port associated with the server daemon. This method is preferable because connection requests can be traced, tracked, and logged. Once the client establishes contact with the host, it authenticates itself with the server, which performs authentication using the PAM (Pluggable Authentication Modules) library. Once the user is verified, a TCP connection is established between the client and the server, and the client sends requests to access, retrieve, store, and erase files using RPC (Remote Procedure Calls).

Design Goals and Notes

Design Goals:

  • Confidentiality
  • Fault Tolerance
  • Minimal overhead in terms of file size of the shares
  • Integrity
  • Ease of Use
  • Portability
  • Scalability
  • Generic
  • Anonymity

Confidentiality - The most significant factor affecting the design of this system was that of confidentiality. It was desired to have a system that would prevent even n colluding servers from retrieving the contents of the original file.

Fault Tolerance – A high level of fault tolerance was desired. The level of fault tolerance is kept dynamic so that every user can customize it to suit his/her requirements: it is set by the values of t and n, defined in the client's configuration file.

Minimal Overhead – Minimal overhead is highly desirable in any system. A small file size for each share translates into several benefits: reduced usage of server space, reduced network traffic, and balancing of traffic across the network, since the file is distributed over several different servers.

Integrity – Integrity is very important in the presence of malicious users who have compromised a server, so a mechanism is needed that can detect a share modified by a malicious user and recover from it. This is achieved by calculating the MD5 hash of each share, appending it to the share, and then encrypting the whole share. The system therefore detects any malicious modification of a share; if a share is found to be corrupt, the client simply switches to the next server.

Ease of Use – The user interface is designed to be user-friendly, with all operations taking place transparently. A user would hardly notice a difference between operating on the local and the distributed file system (except, of course, when keying in the secret key).

Portability – The system does not use any machine-dependent code, and the whole code is written in C++ with parts in C. Since C and C++ compilers are available for almost all platforms, porting the code to systems other than those on which it was tested should pose no problem.

Scalability – The existing system is not affected by increasing the number of servers. It is entirely optional for a user to either use or not use the new servers as they get added to the set of distributed file servers.

Generic – The system can very easily be adapted to use a variety of servers. For example, it can even use an FTP server if it is not possible to install a distributed file server on the target server.

Anonymity – Although anonymity wasn’t a primary design goal, this system provides a high level of anonymity. This is because the bulk of the code resides on the client and the server has absolutely no knowledge of the data stored on the system. The user can be made totally anonymous by providing for anonymous login accounts on the server.

Critical Appraisal

Here, we attempt to present the flaws as well as the salient features of the system as candidly as possible.

Features:

1) The system resists attacks on the servers: even if all n servers are compromised, the only possible attacks are deletion of the file and denial of service. The file cannot be recovered from the servers by anyone else (provided the secret key is not compromised, and assuming AES at the chosen key size is computationally secure).

2) Moreover, the file remains recoverable provided no more than n - t servers have suffered malicious attacks.

3) The system gracefully handles file updates, because the hash and the update counter are appended to each share.

Let us consider an example to illustrate this.

Assume a file update occurs while one of the servers is down, so the client cannot update the share on that particular server. When the client detects that the server is down, it skips it but still increments the update counter in every share it writes. Later, suppose the server that was down during the update is operational when the file is retrieved. The system then notices a discrepancy on that server, whose old share carries a different update count. The client immediately erases the stale share and replaces it with a fresh one.

The only case in which an update discrepancy can go undetected is when the file is updated 2^32 times, the server is down for all 2^32 updates (causing the counter to wrap around to its previous value), and the offline server comes back online only at the 2^32nd update. We assume the likelihood of such an event is negligible and can be ignored. A truly paranoid user could modify the code to use more bytes for the update counter; we used four bytes, which we consider adequate for all practical purposes.
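The wraparound behaviour of the 4-byte counter can be demonstrated directly with unsigned 32-bit arithmetic, which wraps modulo 2^32. The function names here are illustrative only.

```cpp
#include <cstdint>

// A stale share is detected when its stored counter differs from the
// current one carried by the up-to-date shares.
bool counterMatches(std::uint32_t stored, std::uint32_t current) {
    return stored == current;
}

// Value of a 32-bit counter after a given number of increments; unsigned
// arithmetic wraps modulo 2^32, so exactly 2^32 missed updates bring the
// counter back to its old value and the stale share becomes undetectable.
std::uint32_t afterUpdates(std::uint32_t start, std::uint64_t updates) {
    return static_cast<std::uint32_t>(start + updates);
}
```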

4) There is no trusted party; no one except the user is trusted.

Flaws

  • The interface is limited to Midnight Commander.
  • The security of the entire system depends on the secrecy of the key stored by the user.
  • The key has to be entered manually by the user for every file. This should be changed so that all keys are stored in a safe repository, which could simply be a file encrypted under a public-key cryptosystem; the user would then have to remember only his/her secret key and could store the rest in the repository.
  • If the user loses the key, it is impossible to recover any file encrypted using that key (this is impossible to avoid if confidentiality is of utmost importance). But, if confidentiality is not so important, the following scheme could be implemented:
    The key could be split into n shares using a threshold scheme such as Shamir's secret sharing, with a hash of the key attached to each share to ensure the key's integrity. The user could then obtain t shares from the servers and reconstruct the secret key. Note, however, that if t servers collude, they can recover the key as well as the file.
  • The system is relatively slower than a distributed file system where no encryption is used.
  • An attacker listening to the conversation between the client and the server can obtain the user’s password and later destroy the user’s data, but confidentiality is still maintained as the attacker cannot read the encrypted file.
  • The code might be somewhat buggy, as this is the initial release; we have not incorporated many checks for invalid states that could arise from bad or invalid input.
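The key-escrow fallback suggested among the flaws above could be built on Shamir's scheme. The following is a hypothetical sketch, not part of the project: it shares a single key byte over the prime field GF(257), whereas a real implementation would share the whole key (or work in GF(2^8)) and attach a hash of the key, as noted above.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

// All arithmetic is over GF(257), large enough for one key byte (0..255).
constexpr std::int64_t P = 257;

// Modular exponentiation; a^(P-2) mod P is the inverse of a (Fermat).
std::int64_t powmod(std::int64_t b, std::int64_t e, std::int64_t m) {
    std::int64_t r = 1;
    for (b %= m; e > 0; e >>= 1, b = b * b % m)
        if (e & 1) r = r * b % m;
    return r;
}

// Split `secret` into n points (x, f(x)) of a random polynomial of degree
// t-1 with f(0) == secret; any t points suffice to reconstruct it.
std::vector<std::pair<std::int64_t, std::int64_t>>
shamirSplit(std::int64_t secret, int t, int n) {
    std::mt19937 rng{std::random_device{}()};
    std::vector<std::int64_t> coef{secret};
    for (int i = 1; i < t; ++i) coef.push_back(rng() % P);
    std::vector<std::pair<std::int64_t, std::int64_t>> shares;
    for (std::int64_t x = 1; x <= n; ++x) {
        std::int64_t y = 0;
        for (int i = t - 1; i >= 0; --i) y = (y * x + coef[i]) % P;  // Horner
        shares.push_back({x, y});
    }
    return shares;
}

// Reconstruct f(0) from any t distinct shares by Lagrange interpolation.
std::int64_t shamirCombine(
        const std::vector<std::pair<std::int64_t, std::int64_t>>& pts) {
    std::int64_t secret = 0;
    for (std::size_t i = 0; i < pts.size(); ++i) {
        std::int64_t num = 1, den = 1;
        for (std::size_t j = 0; j < pts.size(); ++j) {
            if (i == j) continue;
            num = num * ((P - pts[j].first) % P) % P;                 // (0 - x_j)
            den = den * ((pts[i].first - pts[j].first + P) % P) % P;  // (x_i - x_j)
        }
        secret = (secret + pts[i].second * num % P * powmod(den, P - 2, P)) % P;
    }
    return secret;
}
```

As the text notes, this trades confidentiality for recoverability: any t colluding servers holding key shares can recover the key and hence the file.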

Further Work

We intend to enhance this system into a full-fledged file system. This would involve modifying the kernel on Unix-based systems and writing device drivers for Windows-based systems.

Alternatively, an HTML interface could be given to the file system so that it is truly machine independent.

The system could be modified to use UDP instead of TCP, since UDP has lower per-connection overhead.

The communication channel could use SSL to resist eavesdroppers. However, even if the communication channel is not encrypted, confidentiality of the stored files is maintained.

Someday, maybe, we’ll implement a smart card based authentication system for the client.

We intend to give the project a name (we haven't yet found a suitable one!)

Thank You

[1] M. Rabin. Efficient Dispersal of Information for Security, Load Balancing, and Fault Tolerance.
Journal of the ACM, 36(2):335-348, 1989