Cloud Storage Oriented Cipher-text Search Protocol

Catalogue

1. Introduction

1.1 Background

1.2 Purpose

1.3 Application

1.4 Terminology

1.5 Symbol description

1.6 Normative reference

2. Overview

2.1 Protocol overview

2.2 Design Philosophy

2.3 Requirements for Design

3. Data Types

3.1 Definition

3.1.1 File

3.1.2 Array

3.1.3 Index

3.1.4 Token

3.1.5 Proof

3.1.6 Hash table

3.1.7 Merkle Hash Tree

3.2 Implementation

4. Message Types

5. File Storage

5.1 Overview

5.2 Generate index

5.3 Generate keys

5.4 Encryption

5.5 Generate

5.6 Upload file

5.7 Store files

6. File Search

6.1 Overview

6.2 Generate search token

6.3 Send search token

6.4 Search

6.5 Return result

6.6 Decryption

7. Challenge and Proof

7.1 Overview

7.2 Generate challenge

7.3 Send challenge

7.4 Generate proof

7.5 Send proof

7.6 Validate proof

8. File Update

8.1 Overview

8.2 Generate Keys

8.3 Generate update token

8.4 Send token/file

8.5 Update files

8.6 Update index

8.7 Update Search Authentication Token

8.8 Return new DSA

8.9 Update DSA

9 Error Handling

1. Introduction

1.1 Background

In recent years, with the rapid development of cloud computing, cloud storage which is one of the most important parts of cloud computing is becoming a researching hot. Technically speaking, cloud storage refers a system which consists of large numbers of different types of network storage devices working together. These devices use the technology of cluster application, grid and distributed file system to provide storage and business access.

At present, the rapid development of computer technology and Internet application causedata grow exponentially.People have more and more demand for storage. Under this trend, the proposal and development of cloud storage not only bringscheap storage service for people but also challengestraditional data storage service. Cloud storage, as a new storage method, has a big advantageover traditional storage. First of all, in the cloud storage, storage exists as a service.When a user has requirement for storage, heapplies the appropriate size of spacefrom cloud storage service providerto avoid constructing and managing storage platform himself. It ensures the full utilization of storage resources, and reduces the overhead of storage cost of user. Secondly, cloud storage can provide data backup, disaster recovery, load balance and other functions. So when some storage nodes are upgraded or damaged, it can also provide storage service normally to user to avoid the interruption of service. In addition, the authorized users can access the cloud storage service at any place through network. This storage flexibility will play a great role in promoting the development of mobile Internet. The scalability, low cost, no access restrictions and easy management of cloud storage will bring a great challenge to the traditional storage method.

At the same time, cloud storage also has problems. In cloud storage, all the data are delivered to cloud storage provider, and users lose the absolute control of the data, which will inevitably cause users to concern about the data security. Cloud storage has provided security protection measures, such as the use of general SSL in data transmission (Secure Sockets Layer) and TLS (Transport Layer Security) protocol, data encryption and firewall settings, but the data security depends entirely on the cloud storage system security, the quality of data administrator and other controlled factors because of the centralized management of the CSP. In addition, data stored in the cloud has become a primary target for malicious users and hackers. And if the data isolation becomes invalid, the private data of users may be leaked. Although cloud storage providerprovides SLA (Service Level Agreement)protocol to user to illustrate the services grades it provides, various uncontrolled factorsstill cause the concerns ofusers. Data security is a key problem in cloud storage. The survey of the Twinstrata in 2012 shows that only 20% of users are willing to store their private data in the cloud, and about 50% of people are willing to use cloud storage for data backup, archival storage, disaster recovery and so on. Thus, the problem of data security obstacles the cloud storage extension.

The cloud storage vendors, such as Windows, VMware, Amazon, Google allhave launched their own cloud storage services and give a certain assurance of data security, such as a variety of encryption, authentication means, to protect the privacy of users. But there are still various security incidents. In 2005, the encryptedtapes of the America bank were lost, resulting in the disclosure of a large number of customers’ information. In April of 2011, the information of nearly 77000000 online customers of Sony was stolen, including credit card data. In June of the same year, Google was invaded and some important personnel mailbox accountswere theft. These security events have caused users to lose trust in these cloud service providers.

1.2 Purpose

This protocolconstructs adynamic cipher-text search model and a search verification model based on cloud storage from the perspective of protection of data privacy. This model enables users to store their private data at the untrusted party. Even if the data are stolen, it will not disclose any information about the plaintext of the data. The model also supports the search operation based on the keyword and dynamic adding and deleting files.

1.3 Application

The system based on this protocol can be used in some confidential departments and sensitive commercial sectors. These departments and sectors can store a large number of secret information in the cipher-text form in the cloud server and retrieve at any time as necessary.

1.4 Terminology

Keyword. Akeyword isused to indicate the word of file content, which is the generalization and centralization of information. In this protocol it refers to some words selected as the identificationsof the files.

Linked list.The linked list is a storage structure, the physical storage unit of which is non-continuous and non-sequential.The logical sequence of data elements is implemented by the linked order of pointer. The list consists of a series of nodes (each element in the list is called node), which can be dynamically generated at runtime. Each node consists of two parts: one is the data domain which stores the data elements and the other is the pointer domainwhich stores the address of the next node. Compared to the structure of the sequence list, the linked list is more convenient for inserting and deleting operations.

Array. An array is a form of organizing some variables with the same type orderly in order to handle easily.

Pseudo random function.A pseudorandom function is an algorithm of producing random numbers for people.

Token: A token is a kind of special frames which can control the site occupyingmedia, to distinguish from the data frame and the other control frames. In this protocol the token is a kind of data format, used to represent the transmission type of message and the transmission data.

Inverted index.An inverted index is derived from practical applications which need to find records according to the value of the attribute. Each part of this index table includes an attribute value and the address of each record with the attribute values. It determines the record position by attribute value and it doesn’t determine the attribute value by record.

Encryption. Encryption is to make the original plaintext files or data become unreadable code, often referred as "cipher-text", according to a certain algorithm. It can only demonstrate the original content after the corresponding key is input. By this way, the purpose to protect data from being stolen or read by illegally people is achieved.

MD5. Message Digest Algorithm MD5 (message digest algorithm version fifth) is a hash function which is widely used in the computer security field to provide the integrity of the message. The algorithm transforms an arbitrary length byte string into a large length-fixed integerto ensure data integrity.

Cloud storage. Cloud storage develops based on cloud computing. It focuses on providing users with online storage service based on the Internet. Cloud storageorganizes a large number of different types of storage devices to cooperate together by software to provide external data storage services.

Digital signature. Digital signature is some data which is added on the data unit, or the cipher transformation made to the data unit. This data ortransformation allows the recipientof data unit to confirm the data sources and data integrity and to protectdata frombeing forged.

Key. The key is a kind of parameter. It is the data as the input of algorithm converting the plaintext into cipher-text or cipher-text into plaintext.

1.5 Symbol description

Symbol description

symbol / description
/ The file set including n files
#F / The number of files
/ The files including m keywords
W / The set of keywords
#W / The number of keywords
/ The file set including the keyword w
/ The number of files including the keyword w
/ The linked list made up of files including keyword w
/ The linked list made up of all keywords in file f
/ Inverted index
free / An special keyword ,satisfied
/ Search array, used to store keyword linked list
/ Dictionary, used to record the head node of the linked list
/ Deleting array, used to store file linked list
/ Dictionary, used to record the head node of the linked list
γ / Encrypted index, defined as

Function description:

The algorithm used in the search part:

Description of Algorithm Function
: Running on the client, used to generate the key for symmetric encryption algorithm and pseudo-random function.
: Use the key to encrypt the user’s file and keyword information into cipher-text and the encrypted index.
. Use the user’s keyword to generate the corresponding search token.
: Use the search token to perform search operation on the encrypted index.
: Generate the add token according to the files to be added and the corresponding keywords.
: Use the received add token to add files and update the stored encrypted index.
:Generate the deleting token according to the files to be deleted.
:Use the received delete token and the files to be deleted to update the stored encrypted index.

The algorithm used in the authentication part:

Description of Algorithm Function
. To generate the key used in the algorithm and the client is responsible for keeping the key.
: To generate the search authenticator when storing files.
: Run by the Client, and it is used to generate the challenge of searching some keywords.
: Run by the Server, and it is used to generate the authentication path according to a certain search.
: The verification algorithm. Run by the Client, and it is used to verify the proof sent from the Server
: Generate add token according to the files to be added and the corresponding keywords.
: Use the received add token to add files and update the stored encrypted index.
: Generate the deleting token according to the files to be deleted
: Use the received delete token and the files to be deleted to update the stored encrypted index.
: It is used to update the DSA state.

1.6 Normative reference

Kamara S, Lauter K. Cryptographic cloud storage[J]. Financial Cryptography and Data Security, 2010: 136-149.

2. Overview

2.1 Protocol overview

The system uses C/S architecture and it is composed of two entities, client and server. The main function of client is key generation, data encryption/decryption, authenticator generation, token generation and so on. The main function of server is searching, proof generation, update operation and so on.The overall frame is shown in figure 2.1.

figure 2.1 the frame structure of the cloud storage system

Figure 2.1: ①file encryption/decryption at the client; ②generate the search token using keywordsat the client; ③generate add/deleting tokens using the files to be added/deleted at the client; ④store the files which user uploadsat the server; ⑤search on the encrypted index according to the received search token at the server; ⑥update files according to the received add/delete token at the server; ⑦interaction of data between the client and server.

As can be seen from figure 2.1, the main function of the client is obtaining the original data from the users (including the files to be uploaded, the searching keywords, and the files to be updated), processing data and uploading data to the cloud. The main function of the server is receiving the data sent by client and doing the corresponding operations, mainly including storing, searching, and updating.

2.2 Design Philosophy

The protocol analysis the security of the existing cloud storage system, and puts forward a secure framework of cloud storage system.

Fig. 2.2 complete Security Model of Cloud Storage

The construction of the model of secure cloud storage system is based on the Searchable Symmetric Encryption (SSE) algorithm, combining with the secure cloud storage system architecture. Through the SSE algorithm, the user can encrypt the data and index, and send the cipher-text and the secure index to the cloud service provider for storing. When executing the search operation, the cloud service provider searches on the secure index using the search token generated by the user, and returns the cipher-text set to the user. Then the user can decrypt the received result and get the plaintext file corresponding to the search keyword. In addition, users can add files and delete files at any time, and it still can be able to guarantee the correctness of the index. In order to verify the search results returned by the server, a dynamic search authentication (DSA) algorithm is designed. The algorithm is based on the improved Merkle authentication dictionary andcan validate the correctness of the search result. The algorithm also support update operation based on the token and the algorithm can achieve higher efficiency at communication and computations.

Searchable encryption

The model involves only two entities. One is the owner of the confidentialitydata, who hopesto store the data in the cloud andprevent from illegal access to the data. This kind of entity is called the user (Client). The other kind of entity is the cloud storage service provider, who provides storage interface outward, stores the data and performs specific search operation on the data. It is called the server (Server).

According to the mentions above, in order to guarantee the security of the data in the maximum extent, all of the operations processing user data are basically placed at the client, including user’s files encryption, file index encryption and process of keywords. And the server only needs to store the files and do the limited retrieval function.

Fig. 2.3 Searchable Encryption Model of Cloud Environment

As can be seen from the chart, the user uses the computer to select the file setsneeded to be stored, preprocesses the files and then uploadsthem to the cloud. The preprocessing of the files is divided into two parts which execute simultaneously.One part is using symmetric encryption algorithm toencrypt the files set to get the cipher-text set, and then uploading them to the cloud storage server. The other part is constructing the index using the keywords of the file, encryptingthe index using the special encryption method and storing the result which is called the encrypted index in the cloud.The storage of the file and index is managed by the cloud storage service providers. The users only need to upload the files, without caring about the details of file storage. When a user searches some keyword, the client generates the search token corresponding the keyword using the method provided by the algorithm and sends the token to the cloud storage server. Then the serverperforms the search operation and returns the result.

The key of constructing the searchable encryption algorithm lies in the encryption of file index. In order to obtain a better search experience, this protocol uses the form of keyword specified by the user in advance. After obtaining the keyword information,preprocess these keywords. The keyword linked list is constructed by the files containing the same keyword and the file identifier is written in the linked list corresponding to the nodes. All of the keyword linked list form the inverted index. In order to ensure that the server cannot obtain effective information from the index, the pseudo random function (PRFs) is used to encrypt the inverted index. The encrypted index is stored in the random position of the search array, and eachhead node of the list is stored in the dictionary Ts(also called search table). The processed arrays and dictionaries are stored in the server. Because the inner elements are all encrypted data, the server cannot get the plaintext information directly from the search arrays and the search table. When the user search a keyword, processthe keyword to get the search token which contains the information designating the position of the keyword in the encrypted index. After the server receives the search token, it reads the encrypted index of the user, performs the search operation, gets the file identification, and sends the responding cipher-text to the client.

The user of cloud storage users may add or delete the files at any time, so the protocol must be able to support dynamic addition and deletion operation. The previous discussion shows that the key of the search lies in the construction of the encrypted index. In order to ensure that it still can be efficient and correct to perform the search operation after the user adds and deletes the files, the encrypted index must be updated in the process of adding and deleting files. Whenthe user adds files, the keyword that the file contains maybe existed or new. No matter what kind of situation, it only needs perform the corresponding updating operation on the keyword linked list, and the operation is not difficult. When the user deletes a file, the file contains different keywords which may be at any node of the keyword linked list. So every node of the linked list containing the keyword must be traversed. After deleting the node, the continuity of the linked list also needs to be ensured. So the deleting operation is complex and low efficiency.