Parallelizability of CBC-MAC using Round Stage Pipelining Technique

M. Razvi Doomun* and Z. Codabux-Rossan ,

Faculty of Engineering, University of Mauritius

,

Abstract

Cipher Block Chaining Message Authentication Code (CBC-MAC) has beenstandardised as an Advance Encryption Standard (AES) mode of operation. The wireless local area network (WLAN) security protocol, 802.11i, uses AES in Counter Mode with CBC-MAC (CCM) mode to protect data between the wireless clients and access point. However, the standard architecture of CBC-MAC mode is inefficient since the encryption/decryption function of the current block must be completed before the next block can start. In this paper, we analyze the performance of CBC-MAC and propose a round stage pipelining technique to parallelize the execution of CBC-MAC. The speed-up factor is calculated for CBC-MAC underthe proposed pipelining scheme. A reduction of authentication latency is thus achievable with structural optimizations of CBC-MAC by pipelining the AES rounds at intermediate state chaining.

Key words: Cipher Block Chaining, Message Authentication, Pipelining

1.0 Introduction

In data communication, transmitted frames should be confidential and protected from being modified by any unauthenticated or unauthorized devices. In typical wireless networks, data encryption and integrity is achieved by using a symmetric AES block cipher provided on beacon frames, command/control frames, and data frames. Much work in cryptography has been done to achieve privacy and authenticity goals separately: Encryption schemes for confidentiality, message authentication scheme for authenticity and many schemes are accompanied by provable security analyses. However, in practice we want to achieve the goals simultaneously and a natural and common used approach is to combine the above tools. In fact, many popular Internet protocols (SSL, TLS, SSH, IPSEC) rely on authenticated encryption schemes for privacy and authenticity. Many applications on the Internet, such as online banking, online auctions and remote login/ secure file transfer require both privacy and authenticity. A message authentication code (MAC) is computed to protect data from being modified without the key as well as to provide assurance that data come from the sender with the key. As such, MACs are vital cryptographic functionswhich provides data integrity and data origin authentication. The MAC based block cipher AES-CBC-MAC has been included in the international standards, such as IEEE 802.11i, for data integrity and authentication (IEEE Standard Specification, 2004; IEEE Standard Specification, 1999). Hence, CBC (Cipher Block Chaining) encryption has been standardized as an AES mode of operation where each plaintext block is XORed with the ciphertext of the preceding block. This chaining or dependency method creates a resultant ciphertext that eliminates the repetitive patterns that plague other encryption modes such as Electronic CodeBook (ECB) mode. Some hash functions available are parallelizable since each message block can be processed independently while the MAC based on the block ciphers such as AES-CBC-MAC are generally considered slow due to the complexity of the encryption process and CBC-MAC has to wait for a message block to finish processing before going to the next.

There has been a recent surge of research interest with many new schemes and modes of operation proposed (Whiting et al, 2002; Dworkin, 2005).Most security applications combine AES and block ciphers in general with different operation modes because the straightforward electronic code book (ECB) mode is vulnerable to statistical attacks. Because the CBC-MAC and CCM modes involve feedback, we cannot pipeline the AES core. Using AES in feedback mode, the CBC-MAC module generates the MAC value, which is used for authentication. Frames may be encrypted andauthenticated with an AES-CCM mode.This paper investigates AES-CBC-MAC performance and presents a parallelizable CBC mode of operation which increases its processing speed.

The rest of the paper is organized as follows: The relevant literature is discussed in section 2. AES and CBC-MAC operations are reviewed in section 3 and 4, respectively. Parallelized CBC-MAC strategy is explainedin section 5 and its performance is analysed in section 6. Finally, we conclude the whole work in section 7.

2.0 Related Work

The drive for higher data rate and higher throughputsupport for emerging applications in next generation wireless LANs has pushed for, new MAC mechanisms, such as Block Acknowledgment in IEEE802.11e and frame aggregation in IEEE 802.11n being developed. The intervals among MAC protocol data units (MPDUs) are short in the new MAC mechanisms, and thereby requiring faster response time in the cipher core at the link layer.The cipher response time is the interval time period between the start of a block of data processing and output of the processed data block.

Pipelining, sub-pipeliningand loop unrollingare the some of the hardwareimplementations of the AES algorithms that have been used to gain speed and throughtput by using additional hardware for implementing each round (Bae et al, 2006). In previous works, the authors Bae et al.(2006) designed a CCMP architecture that compute message authentication with CBC-MAC mode and data encryption in CTR mode in alternate turns. There are basically two ways to implement the general CCMP structure: sequential and parallel.With sequential CCMP structure, the Message Integrity Code (MIC) Tag is first calculated for the whole payload using CBC-MAC and then the payload and the MIC tag are encrypted in the same AES module. With such an arrangement, cipher response time is linearly proportional to the overall payload size.In parallel CCMP structure, a separate AES module is used for MIC computation in CBC-MAC mode and another AES module is used for Counter (CTR) mode data encryption.Furthermore, the performance and efficiency of CCMP depends on the individual design of the CBC-MAC and CTR-mode components, which in turn depend on the implementation of the AES core cipher unit.A cryptographic procedure can be used to produce different authentication keys, one for each message, and an efficient CBC-MAC is used with this key to authenticate one message.

3.0 Advanced Encryption Standard

In this section, we review the essential operations of AES encryption standard. AES- Rijndael is a cipher with a simple and elegant structure. It shows an adequate security margin against cryptanalytic attacks, such as linear and differential cryptanalysis, as remarked in (Daemen & Rijnmen, 2002).The AES algorithm, illustrated in Figure 1, employs fourtransformations: ShiftRow, SubBytes, AddRoundKey and MixColums. The AES ShiftRow transformation rearrangesthe order of bytes within one row the statearray. AES SubBytes transformation performsa byte-wise substitution for each byte in thestate array using a fixed 256-byte table. The AES AddRoundKey transformation performsa byte-wise Galois field addition usingbytes from the key schedule. Finally, AES MixColumns transformation is apermutation of the state array performed oneach of the four columns in the state array.

Figure 1. AES Transformation and Round Functions

In the Round key addition transformation, a round key generated from the key scheduler is applied to the state by a simple bitwise XOR. This transformation is self inverting. Hence, the key addition is same for both the encryption and decryption processes. In AES algorithm Shift Row and Mix Column operations are responsible for diffusion (Daemen & Rijnmen, 2002). Diffusion seeks to make the statistical relationship between the plaintext and the ciphertext as complex as possible in order to thwart attempts to deduce the key.

3.1 AES Number of Rounds

The strength of the AES cipher depends not only on the key size, but also on the number of operational rounds which perform a routine of data manipulation and diffusion. For example, AES needs at least 6 rounds for a 128-bit key, 7 rounds for a 192-bit key, and 9 rounds for a 256-bit key to provide enough security strength against known attacks. Current state-of-the-art in cryptanalysis demonstrates that resistance of iterative block ciphers against cryptanalytic attacks increases with the number of rounds.The number of rounds in AES-Rijndael has been determined by considering the maximum number of rounds for which short cut attacks have been found that are significantly more efficient than an exhaustive key search. Subsequently, a security margin was added on top of that.

A good principle is that a crypto algorithm should not perform too few or too many operational rounds, as it would make it either vulnerable to cryptanalysis attacks or expensive in terms of energy consumption. However, AES adds extra rounds for providing a security margin, such as 10 rounds for 128-bit key, 12 rounds for 192-bit key and 14 rounds for 256-bit key. For Rijndael with a block length and key length of 128 bits, no shortcut attacks have been reported in the literature for reduced versions with more than six rounds and four more rounds were added as a security margin.

4.0 Cipher Block Chaining Message Authentication Code (CBC-MAC)

CBC can be used to provide encryption of large messages as well as authenticity through CBC-MAC. Authentication and encryption are both provided in CCM using the techniques of CTR mode for encryption and the CBC-MAC for authentication. The CCM is made up of two methods: (a) generation of the MIC first and then the encryption (i.e. generation-encryption), and (b) decryption of ciphertext first and then the verification of the MIC (i.e. decryption-verification) (Whiting et al, 2002).

Standard CBC-MAC operates as follows: Take the first block and encrypt it using AES, XOR result with second block and encrypt it, and XOR result with next block and so on, as depicted in Figure 2. Standard CBC-MAC works sequentially and cannot be parallelized, and it can be only used if the message is an exact number of blocks and hence requires padding.

Figure2.Standard Cipher Block Chaining Mode

Let a message space for M be binary strings whose lengths are a positive multiple of l. Hence the messageM can be broken into blocks such that:

Next, each blockMi is passed through the encryption Ewith key Kand the result is XORed with the next block. If EKrepresents the AES encryption using a key K, then cipher block chaining is given by the following expression:

The CBC-MAC exists in different versions varying in details such as padding, length variability and key search strengthening (Black and Rogaway, 2000; Bellare et al, 1994). The general approach of padding for CBC-MAC is by considering the final input block as a partial block of data, left justified with zeroes appended to form a full block size. In our analysis for CBC-MAC, we assume a block size of 128 bits (as is the case for AES).If the general way of padding described above has been adopted for CBC-MAC-AES, the number of block of the message of packet size b bytes, Nblock_AES, will be(b× 8)/128. Hence, average CBC-MAC calculation time is{Nblock_AES ×[time for a block encryption]}.

Block pipeline instructions allow AES to run in ECB, CBC-MAC, counter, and CCM modes in 11 clock cycles per 128-bit block without loss in throughput compared to an AES without a mode of operation (Walker, 2005).

5.0 Parallelized CBC-MAC

The AES standard, as a symmetric private key block cipher, could be used in many forms, called modes, by high-level security protocols. The mode of operation is an algorithm that features the use of symmetric block cipher algorithm to provide an information service, such as confidentiality or authentication. For example, some kinds of pipelining are not allowed to be used with some modes. The main properties of mode of operation can be listed as: Performance, Parallelizability, Error Expansion, Crypto Synchronization. The optimization methods and the resulting implementation tradeoffs for the implementation of AES Rijndael can be divided into two classes: architectural and algorithmic optimization. Algorithmic optimization exploits algorithmic strength inside each round unit. Architectural optimization exploits design techniques such as pipelining, loop unrolling and sub-pipelining.

The individual rounds of the encryptionalgorithm also display similarity in functionality across the128 input data block. Hence, it is possibleto run AES-CBC-MAC functions in parallel, illustrated in Figure 3, according to the following round stage encryption expression:

C i, Rn1 = EK1 ( Mi C i-1, Rn1)

C i, Rn2 = EK2 (C i, Rn1 C i-1, Rn2)

C i, Rn3 = EK3 (C i, Rn2 C i-1, Rn3)

C i, Rn4 = EK4 (C i, Rn3 C i-1, Rn4)

C i, Rn10 = EK10 (C i, Rn9 C i-1, Rn10)

where, Mi is the ith plaintext block, Ci is the ith ciphertext block, and Rn is the encryption round number.

CBC-MAC can be made more efficient and faster, and the fact that it relies on a block cipher as well minimizes the number of cryptographic primitives we must implement in memory-constrained devices.

Figure 3. Round Stage Pipelining Structures for CBC-MAC

6.0 Performance in Software

The C code which has been used for assessing the speed performance of Rijndael by NIST [xx] was used in CBC mode for CBC-MAC-AES. The plaintext of 128-bit was encrypted 1,000,000 times in CBC mode using a 128-bit key. The average time for encrypting a 128-bit block was 5.3μs.

The cycle time of a AES-CBC-MAC round stage pipeline is the time needed to encrypt 128-bits block of data one round stage through the pipeline. The cycle time can be determined as , where

maximum round stage delay (delay through round stage which experience the largest delay)

number of encryption round stages in the AES-CBC-MAC pipeline.

time delay of a latch, needed to advance encrypted 128-bits data block from one stage to the next.

In general, the time delayis equivalent to a clock pulse. Now suppose that AES-CBC-MAC functions are processed and these functions are executed one after another. The total time required to execute all AES-CBC-MAC is .

A total of cycles are required to complete the execution of the first AES-CBC-MAC, and the remaining AES-CBC-MAC require cycles.The time required to execute AES-CBC-MAC without pipeline isbecause to execute one AES-CBC-MACfunction it will take cycle.

The speed up factor for the AES-CBC-MAC pipeline compared to execution without the pipeline is defined as:

In general, the number of AES-CBC-MAC functions executed is much more higher than the number of round stages in the pipeline. So, the tends to , theni.e. We have a fold speed up, the speed up factor is a function of the number of round stages in the AES-CBC-MAC function pipeline.

Though, it has been seen that the speed up is proportional to number of round stages in the pipeline, but in practice the speed up is less due to some practical reason.

At each stage of the pipeline, there is some overhead involved in moving data from buffer to buffer and in performing various AES-CBC-MAC round functions. This overhead can appreciably lengthen the total execution time of a single round encryption. However, this overhead issue is still under investigation in our future works.

Throughput is calculated as, throughput := (128 Bits * Clock Frequency)=(Cycles Per Encrypted Block)

7.0 Conclusion

In this paper, we analysed pipelined implementation of round execution in parallelized AES-CBC-MAC and determined the speed up which reduce authentication processing time, since it accelerates AES-CCMP for encryption and MIC generation.Most AES implementations will normally use hardware, which allows pipelining of Counter mode and. CBC-MAC. In future research work, we intend to investigate further the proposed implementation to improve low-power encryption-authentication in sensor ad hoc networks.

References

Bae D., Kim G., Kim J., Park S. & Song O. (2006) “An efficient Design of CCMP for Robust Security Network”, ICISC 2005, LNCS 3935, pp. 352-361, 2006.

Bellare M., Killian J., & Rogaway P. (1994). The Security of the Cipher Block Chaining Message Authentication Code. In Advances in Cryptology, CRYPTO ’94.Vol. 839 of Lecture Notes in Computer Science, pp. 341 – 358, Springer 1994.

Black J. & Rogaway P. (2000) CBC MACs for Arbitrary-Length Messages: The Three-Key Constructions,” in proceedings of advances in Cryptology-CRYPTO’00, Lecture Notes in Computer Science, Springer-Verlag vol. 1880, pp. 197-215, 2000.

Daemen J. & Rijnmen V. (2002). The Design of Rijndael: AES-The Advanced Encryption Standard. Information Security and Cryptography. Springer Verlag. 2002.

Dworkin M. (2005). Recommendation for Block Cipher Modes of Operations: Methods and Techniques. In Special Publication 800-38B, US National Institute Standards and Technology, 2005.

IEEE Standard Specification, (2004). Institute of Electrical and Electronics Engineers (IEEE), Inc., IEEE Std. 802.11i-2004, Amendment to Standard for Telecommunications and Information Exchange between Systems – LAN/MAN Specific Requirements – Part 11: “Wireless Medium Access Control (MAC) and Physical Layer (PHY) Specifications: Medium Access Control (MAC) Security Enhancements”, July 2004.

IEEE Standard Specification. (1999). ISO/IEC 8802-11 ANSI/IEEE Std. 802.11 -1999. Information Technology – Telecommunications and information systems – Local and metropolitan area networks – Specific requirements – Part II: Wireless LAN Media Access Control (MAC) and Physical layer (PHY) specifications.

Walker J. (2005). 802.11 Security Series. Part III: AES-based Encapsulations of 802.11 Data. Network Security Architect, Platform Networking Group Intel Corporation.

Whiting D., Housley R., & Ferguson N. (2002). AES Encryption and Authentication Using CTR mode and CBC-MAC. IEEE P802.11 doc 02/001r2 May 2002.