VM Image Integrity Verification

Background:

More and more virtual machines are being deployed, especially with the emergence of cloud computing. To manage these images, the Mirage image format (MIF) was created and deployed on our VCL. MIF uses an image manifest to map each file, so images can be stored in the Mirage repository at the granularity of individual files rather than whole disk images, and operations can be performed directly on each file rather than on the disk file. However, this creates a new problem: verifying the integrity of an image that is checked out from Mirage storage. For instance, the order of the files on the disk may differ, and the content of the unused space in the disk file may differ. These variations do not affect the integrity of the disk's contents, but they produce a different hash if one simply hashes the whole virtual disk file. I therefore design a new method that verifies the integrity of the files in the image and of several important metadata fields of the image file.
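The problem can be made concrete with a toy model (a sketch only; the file names, contents and the flat "disk" layout are hypothetical, and a real image has far more structure). A "disk" is modelled as file contents laid out in on-disk order followed by free space. Hashing the raw bytes tells two checkouts apart even though they hold identical files, whereas hashing per-file records in sorted path order does not.

```python
import hashlib

def raw_image_hash(extents, free_space):
    """Hash the whole 'disk file': file extents in on-disk order, then free space."""
    h = hashlib.sha256()
    for _path, data in extents:
        h.update(data)
    h.update(free_space)
    return h.hexdigest()

def file_level_hash(extents):
    """Hash (path, content digest) pairs visited in sorted path order."""
    h = hashlib.sha256()
    for path, data in sorted(extents):
        h.update(path.encode() + b"\0" + hashlib.sha256(data).digest())
    return h.hexdigest()

# Two hypothetical checkouts holding the same files in a different on-disk
# order and with different leftover bytes in free space.
disk_a, free_a = [("/etc/hostname", b"vcl-node\n"), ("/etc/motd", b"Welcome\n")], b"\x00" * 16
disk_b, free_b = [("/etc/motd", b"Welcome\n"), ("/etc/hostname", b"vcl-node\n")], b"\xff" * 16

print(raw_image_hash(disk_a, free_a) == raw_image_hash(disk_b, free_b))  # False
print(file_level_hash(disk_a) == file_level_hash(disk_b))                # True
```

Sorting before hashing is what removes the dependence on physical layout, and leaving free space out of the hash removes the dependence on leftover bytes.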

Concept:

To preserve the integrity of an image file, several characteristics of the image must remain unchanged. First, all content in the file system must be the same, including file names, file contents, file paths, directories, and the access permissions of files and directories. Second, the image's metadata must not change, including the UUID, version, CID, etc. (taking the VMDK format as an example).

Method:

Several virtual machine disk formats exist today, for instance VDI, VMDK, IMG, etc. The VMDK format alone has five types, which are used for different desktop and server architectures. To simplify the implementation, I will assume that the virtual machine image is VMDK type 0, that the guest OS is Linux, and that the file system is ext3.

Figure 1

To verify the integrity of the file system, I will use a tool to mount the image on a trusted machine and hash each file in alphabetical order (Figure 1); the hashed content includes the file name, content, path, access permissions (Figure 2), etc. I also need to hash each directory, including its name, path and access permissions, even if the directory is empty.
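A minimal sketch of this file-system hash, assuming the image's file system is already mounted read-only at some directory on the trusted machine (for example with libguestfs's guestmount --ro; the mounting step itself is not shown). The function name hash_file_system and the record layout are illustrative choices, not part of the original design. Every entry, including empty directories, contributes its type, relative path, permission bits and owner; regular files also contribute a content digest, and symlinks their target.

```python
import hashlib
import os
import stat

def _content_digest(path, chunk_size=1 << 20):
    """SHA-256 of a regular file's content, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.digest()

def hash_file_system(root):
    """Return a hex SHA-256 over the tree under root, visited in sorted path order.

    Every entry (including root itself and empty directories) contributes its
    type, its path relative to root, its permission bits and its owner uid/gid.
    Regular files add a digest of their content; symlinks add their target.
    Timestamps, ACLs and extended attributes are deliberately not covered.
    """
    paths = ["."]
    for dirpath, dirnames, filenames in os.walk(root):  # does not follow symlinks
        for name in dirnames + filenames:
            paths.append(os.path.relpath(os.path.join(dirpath, name), root))

    h = hashlib.sha256()
    for rel in sorted(paths):
        full = os.path.join(root, rel)
        st = os.lstat(full)
        if stat.S_ISDIR(st.st_mode):
            kind, extra = b"d", b""
        elif stat.S_ISREG(st.st_mode):
            kind, extra = b"f", _content_digest(full)
        elif stat.S_ISLNK(st.st_mode):
            kind, extra = b"l", os.fsencode(os.readlink(full))
        else:
            kind, extra = b"o", b""  # device, FIFO or socket: metadata only
        meta = b"%o:%d:%d" % (stat.S_IMODE(st.st_mode), st.st_uid, st.st_gid)
        record = b"\0".join([kind, os.fsencode(rel), meta, extra])
        # Length-prefix each record so concatenated records cannot be confused.
        h.update(len(record).to_bytes(8, "big") + record)
    return h.hexdigest()
```

Usage would be along the lines of hash_file_system("/mnt/image"), where /mnt/image is a hypothetical mount point. Timestamps are excluded because merely checking out or mounting an image may change them without changing its contents.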

Figure 2

Next, I have to hash several important components of the VMDK file, such as its metadata (Figure 3), the superblock object and the dentry object. Because I use a VMDK type 0 file, the image metadata is located at the head of the image file. The superblock object and the dentry object are also important to the file system, but their positions may be hard to locate. Hashing the contents of these components yields the second hash result.
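A sketch of the metadata part only, assuming "type 0" corresponds to a monolithic sparse VMDK, whose file starts with a 512-byte little-endian SparseExtentHeader (magic "KDMV") that points to an embedded text descriptor containing lines such as version=1 and CID=.... The choice of hashed keys is a hypothetical one; the UUID is assumed to live in a key starting with ddb.uuid, and the exact key name depends on the tool that created the image. Locating the ext3 superblock and dentries would additionally require walking the grain directory and the partition table, which this sketch does not attempt.

```python
import hashlib
import re
import struct

SECTOR = 512
# Little-endian prefix of the packed 512-byte SparseExtentHeader of a hosted
# sparse VMDK extent: magic, version, flags, capacity, grainSize,
# descriptorOffset, descriptorSize, numGTEsPerGT, rgdOffset, gdOffset, overHead.
HEADER = struct.Struct("<4sIIQQQQIQQQ")

def read_descriptor(path):
    """Return the text descriptor embedded in a monolithic sparse VMDK."""
    with open(path, "rb") as f:
        fields = HEADER.unpack(f.read(HEADER.size))
        magic, desc_offset, desc_size = fields[0], fields[5], fields[6]
        if magic != b"KDMV":
            raise ValueError("not a sparse VMDK extent (bad magic)")
        if desc_offset == 0:
            raise ValueError("no embedded descriptor")
        f.seek(desc_offset * SECTOR)
        raw = f.read(desc_size * SECTOR)
    # The descriptor area is NUL-padded; keep only the text before the padding.
    return raw.split(b"\0", 1)[0].decode("ascii", errors="replace")

def hash_vmdk_metadata(path, keys=("version", "CID"), uuid_prefix="ddb.uuid"):
    """Hash selected key=value descriptor fields in sorted key order."""
    selected = []
    for line in read_descriptor(path).splitlines():
        # Matches 'key=value' and 'key = "value"'; comments and extent lines do not match.
        m = re.match(r'\s*([A-Za-z0-9_.]+)\s*=\s*"?(.*?)"?\s*$', line)
        if m and (m.group(1) in keys or m.group(1).startswith(uuid_prefix)):
            selected.append((m.group(1), m.group(2)))
    h = hashlib.sha256()
    for key, value in sorted(selected):
        h.update(f"{key}={value}\n".encode())
    return h.hexdigest()
```

Hashing parsed fields rather than the raw header bytes keeps the result independent of fields such as grain-directory offsets, which could legitimately differ between two checkouts of the same image.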

After obtaining the two hash results, I can hash them together to get one final result. Then I can encrypt this result with my private (secret) key, which amounts to signing it, and store the signed result somewhere users can retrieve it. The image can then be checked in to the Mirage store. When a user checks out the image, the user hashes the image with the same method, decrypts the original hash result with my public key, and compares it with his or her own result. If the two results match, the integrity of the image is confirmed.
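The combine, sign and verify flow can be sketched as follows. To stay within the Python standard library, the sketch uses a toy textbook RSA key built from two well-known Mersenne primes; this makes the "encrypt with the private key, decrypt with the public key" step visible but is trivially breakable. A real deployment would instead use a standard signature scheme such as RSA-PSS or Ed25519 from a vetted cryptography library. The names combine, sign and verify are illustrative.

```python
import hashlib

# TOY textbook RSA key, for illustration only: P and Q are well-known Mersenne
# primes, so anyone can factor N. Never use this for real signatures.
P = 2**127 - 1
Q = 2**521 - 1
N = P * Q                            # public modulus (~648 bits > 256-bit digest)
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (needs Python 3.8+)

def combine(fs_digest: bytes, meta_digest: bytes) -> bytes:
    """Final result: SHA-256 over the file-system and metadata digests."""
    return hashlib.sha256(fs_digest + meta_digest).digest()

def sign(final_digest: bytes) -> int:
    """'Encrypt' the final digest with the private key (textbook RSA)."""
    return pow(int.from_bytes(final_digest, "big"), D, N)

def verify(signature: int, recomputed_digest: bytes) -> bool:
    """'Decrypt' with the public key and compare against the user's own hash."""
    return pow(signature, E, N) == int.from_bytes(recomputed_digest, "big")

# Placeholder inputs standing in for the two hash results.
fs_digest = hashlib.sha256(b"file-system records").digest()
meta_digest = hashlib.sha256(b"vmdk metadata fields").digest()
signature = sign(combine(fs_digest, meta_digest))          # done before check-in
print(verify(signature, combine(fs_digest, meta_digest)))  # True, at checkout
```

Because verification needs only E and N, any user can check a checked-out image without being able to produce a valid signature for a modified one (given a real key size and a padded signature scheme).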