Running head: PRACTICAL RESEARCH

University of Nizwa

College of Economics, Management and Information Systems (CEMIS)

Advanced Computer System Theory (INFS501)

"Technologies to Enhance the Performance of Intel Core i7"

Name: Amal Sultan AlMuqrishi

ID: 14676412

Major: Information System

Submitted to: Dr. Said Younes

Course: Advanced Computer System Theory - INFS501

  1. Introduction

The rapid evolution of technology has opened a field of competition among international corporations to make processors as capable as the market requires. The history of CPUs (central processing units) is broad, and Intel Corporation has acquired the lion's share of processor manufacturing and fabrication. Take the Intel Core i7 processors as an example: they received five stars in Intel's own processor rating, which places them among the best processors currently available. The reasons are that they meet users' needs and support multi-core technologies. In addition, Intel Core i7 processors provide a breakthrough that clearly affects the performance of computers.

There are six generations [1] of the Intel Core i7: the Nehalem, Sandy Bridge, Ivy Bridge, Haswell, Broadwell, and Skylake microarchitectures. Each architecture family has different models with various characteristics. The main features that these generations share are multiple cores on a single die, a 64-bit architecture, and clock speeds of up to roughly 3.3 GHz to 4 GHz. In addition, the fabrication process is small and is measured in nanometres (from 45 nm for Nehalem down to 14 nm for Broadwell and Skylake), which saves power and space and allows the number of transistors to grow to a billion and beyond. The Core i7 also introduces the L3 cache.

Furthermore, various performance-related technologies and features have placed the Intel Core i7 at the top of the list for many developers and organizations. This paper highlights some of the technologies used in the organization of the Intel Core i7 and how they are utilized, namely pipelining, branch prediction, data flow analysis, speculative execution, and on-board L1 and L2 caches. These methods are used to enhance instruction execution within the processor, and the following sections give an overview of each technology.

  1. Pipelining

In computing and processor design, pipelining [2] means overlapping the execution of multiple instructions, so that they proceed in parallel rather than strictly sequentially, by using different hardware resources for different stages. Pipelining allows the processor to do more work at once: for instance, while one instruction is executing, the CPU is already fetching and decoding the next ones. The purpose of pipelining is not to reduce the latency of a single instruction; rather, it increases the throughput of the whole instruction stream. The potential speedup is bounded by the number of pipeline stages. Two important factors reduce the speedup that is actually obtained: the time to fill the pipeline and the time to drain it. The main drawback of simple pipelining is the performance lost to in-order execution, in which one stalled instruction holds up all the instructions behind it. In addition, clock overheads, the longer latency of each instruction, and hazards can make deep pipelines inefficient, as can forcing all instruction types through a single unified pipeline.
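The effect of fill time on the achievable speedup can be illustrated with the usual idealized timing model of a k-stage pipeline. The following Python sketch assumes one clock cycle per stage and no hazards or stalls; the function names are illustrative and do not come from the cited sources.

```python
def unpipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Cycles needed when each instruction passes through all stages alone."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Cycles needed on an ideal pipeline (assumption: no stalls).

    The first instruction needs n_stages cycles to fill the pipeline;
    every later instruction completes one cycle after the previous one.
    """
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


def speedup(n_instructions: int, n_stages: int) -> float:
    """Ratio of unpipelined to pipelined execution time."""
    return (unpipelined_cycles(n_instructions, n_stages)
            / pipelined_cycles(n_instructions, n_stages))


# 100 instructions on 5 stages: 500 cycles versus 104 cycles.
print(round(speedup(100, 5), 2))   # about 4.81, below the bound of 5
```

Under this model the speedup approaches the number of stages only for long instruction streams; for a single instruction it is exactly 1, which reflects the point that pipelining improves throughput and not latency.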

In previous generations of Intel processors, for instance, the Pentium architecture has a five-stage pipeline, comparable to the classic five-stage pipeline [2] of Instruction Fetch (IF), Instruction Decode (ID), Execute (EX), Memory access (MEM) and Register write back (WB). Later, the Prescott family increased the number of stages to a peak of thirty-one. However, this brought Prescott only modest gains, because the performance achieved was disappointing and the design behind the architecture was complex. Adding pipeline stages therefore eventually stops improving performance and yields diminishing returns. In the first generation of the Intel Core i7, the Nehalem architecture, the pipeline has 20 to 24 stages, which is deeper than Penryn's 14-stage pipeline, with a maximum clock speed of 3000 MHz. In the sixth generation, the Skylake architecture, the pipeline has fourteen stages, or sixteen when the fetch and retire stages are counted. The maximum clock speed of the Skylake architecture is approximately 4000 MHz [3]. The pipeline components and related topics described for modern Intel Core i7 microprocessors in [4] include instruction fetch and decoding, the μop cache, the loopback buffer, micro-op fusion, macro-op fusion, the stack engine, register allocation and renaming, execution units, partial register access, cache and memory access, store forwarding stalls, and bottlenecks.

Furthermore, the pipeline in the Core i7 supports an Intel technology called Hyper-Threading [5]. It supports the parallelization of computations by letting a single core handle two threads at the same time. During execution, the hardware interleaves the instructions of the two threads, which optimizes the use of the core's resources.

  1. Branch Prediction

Branch prediction [6] is a technique used to reduce the cost associated with conditional branches. It addresses a problem that pipelining faces with branches in the code: at a conditional jump, the instruction flow can continue in one of two directions. Because the outcome is known only after the branch instruction has executed, the CPU does not know which path to feed into the pipeline and would otherwise have to wait.

Conditional jumps cause several issues that reduce the performance of the processor. When the processor meets a conditional jump, it has to predict both whether the jump is taken and what the target address is. Execution proceeds smoothly in the pipeline when the right target is loaded. However, if the wrong target address has been predicted and loaded, the wrongly fetched instructions must be flushed from the pipeline, and the time spent fetching, decoding and executing them is wasted.
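The cost of these flushes can be estimated with a simple textbook-style model in which every mispredicted branch adds a fixed number of wasted cycles. The Python sketch below is only an illustration: the parameter names and the example values are assumptions and are not measurements of any Intel Core i7 processor.

```python
def effective_cpi(base_cpi: float,
                  branch_fraction: float,
                  mispredict_rate: float,
                  penalty_cycles: float) -> float:
    """Average cycles per instruction including misprediction flushes.

    base_cpi        -- cycles per instruction with perfect prediction
    branch_fraction -- fraction of instructions that are branches
    mispredict_rate -- fraction of branches that are mispredicted
    penalty_cycles  -- cycles wasted per flush (assumed constant)
    """
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


# Hypothetical workload: 20% branches, 5% of them mispredicted,
# 15 cycles lost per flush.
print(effective_cpi(1.0, 0.20, 0.05, 15))
```

The model shows why deeper pipelines, which have a larger flush penalty, depend more heavily on accurate prediction.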

Dynamic execution in Intel processors includes deep branch prediction, which allows the processor to decode instructions beyond branches in order to keep the instruction pipeline as full as possible. Intel Core processors implement highly optimized branch prediction methods for predicting the path of the instruction stream.

The Intel Core i7 supports different mechanisms, including a two-level adaptive predictor to improve prediction accuracy and a Branch Target Buffer to determine the target earlier. Besides that, several aspects of jump handling matter for performance, for example, the misprediction penalty, pattern recognition for conditional jumps, pattern recognition for indirect jumps and calls, and the prediction of function returns.
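The idea behind a two-level adaptive predictor can be sketched in a few lines of Python. This is a generic textbook version under stated assumptions (a single history register and one 2-bit saturating counter per history pattern); it is not the actual predictor of the Intel Core i7, and the class and function names are illustrative.

```python
class TwoLevelPredictor:
    """Level 1: a register holding the last few branch outcomes.
    Level 2: one 2-bit saturating counter per possible history pattern."""

    def __init__(self, history_bits: int = 2):
        self.history_bits = history_bits
        self.history = 0                          # newest outcome in bit 0
        self.counters = [1] * (1 << history_bits)  # 1 = weakly not taken

    def predict(self) -> bool:
        """Predict taken when the selected counter is 2 or 3."""
        return self.counters[self.history] >= 2

    def update(self, taken: bool) -> None:
        """Train the selected counter, then shift the outcome into history."""
        c = self.counters[self.history]
        self.counters[self.history] = min(3, c + 1) if taken else max(0, c - 1)
        mask = (1 << self.history_bits) - 1
        self.history = ((self.history << 1) | int(taken)) & mask


def count_mispredictions(outcomes, predictor) -> int:
    """Run the predictor over a sequence of actual branch outcomes."""
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


alternating = [True, False] * 20  # taken, not taken, taken, ...
print(count_mispredictions(alternating, TwoLevelPredictor(history_bits=2)))
print(count_mispredictions(alternating, TwoLevelPredictor(history_bits=0)))
```

With two bits of history the sketch learns the alternating pattern after a short warm-up, whereas with no history (a single counter) it mispredicts every outcome of this pattern. This is the pattern recognition that the second level provides.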

A branch can behave in different ways. The history of this behavior is stored so that future behavior can be predicted from past behavior. The prediction must determine two things: whether a conditional jump will be taken, and the target address of the conditional or unconditional jump. A mechanism called the Branch Target Buffer (BTB) is responsible for storing the target addresses of jumps. A target address is stored in the buffer the first time an unconditional jump is executed and the first time a conditional jump is taken. The next time the same jump is executed, the stored target address is used to fetch the predicted target into the pipeline. The branch target buffer is not big enough to hold all the jumps in a program, so different jumps may replace each other in the buffer. For that reason, the predicted target is very likely to be correct for unconditional jumps, but it is less certain for conditional jumps, which may or may not be taken. Therefore, we can conclude that conditional jumps carry a higher risk of misprediction.
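The behavior of a branch target buffer with limited capacity can be sketched as a small lookup table. The Python code below is a simplified model: the capacity, the eviction rule (the least recently recorded entry is replaced) and all names are assumptions made for illustration, since the real replacement policy of the Core i7 is not described in this paper.

```python
from collections import OrderedDict


class BranchTargetBuffer:
    """Maps the address of a jump to the target it used last time."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()   # jump address -> target address

    def lookup(self, jump_addr: int):
        """Return the predicted target, or None if the jump is not stored."""
        return self.entries.get(jump_addr)

    def record(self, jump_addr: int, target_addr: int) -> None:
        """Store the target of a jump that has just been taken."""
        if jump_addr in self.entries:
            self.entries.move_to_end(jump_addr)   # refresh existing entry
        elif len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)      # evict the oldest entry
        self.entries[jump_addr] = target_addr


btb = BranchTargetBuffer(capacity=2)
btb.record(0x100, 0x200)
btb.record(0x110, 0x300)
btb.record(0x120, 0x400)          # buffer full: the entry for 0x100 is replaced
print(btb.lookup(0x100))          # None, so no prediction is available
print(hex(btb.lookup(0x120)))     # 0x400
```

The third jump replaces the first one, which illustrates how jumps compete for entries when the buffer is smaller than the number of jumps in the program.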

  1. Data Flow Analysis

In the data flow analysis technique [7], the processor analyzes the dependencies between instructions, where some instructions depend on the results of others, and produces an optimized schedule of instructions. Instructions are scheduled to execute as soon as they are ready, regardless of the original program order. In the Intel Core i7, dynamic execution [8] also includes data flow analysis, which requires a real-time analysis of the flow of data to determine the dependencies and to detect opportunities for out-of-order instruction execution.
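The scheduling idea can be sketched in Python as follows. The model is deliberately simplified and rests on several assumptions: each instruction takes one cycle, there is no limit on the number of execution units, and only true (read-after-write) dependencies are considered. The instruction format and the function name are illustrative.

```python
def dataflow_schedule(program):
    """Group instructions into cycles according to their data dependencies.

    program -- list of (destination_register, [source_registers]) tuples
               in original program order.
    Returns a list of cycles; each cycle lists the indices of the
    instructions that can execute in it.
    """
    # Find, for each instruction, the earlier instructions that produce
    # the values it reads.
    last_writer = {}
    deps = []
    for index, (dest, sources) in enumerate(program):
        deps.append({last_writer[s] for s in sources if s in last_writer})
        last_writer[dest] = index

    done = set()
    remaining = set(range(len(program)))
    schedule = []
    while remaining:
        # An instruction is ready once all of its producers have finished
        # in an earlier cycle.
        ready = sorted(i for i in remaining if deps[i] <= done)
        schedule.append(ready)
        done.update(ready)
        remaining.difference_update(ready)
    return schedule


program = [
    ("r1", []),            # 0: r1 = load a
    ("r2", ["r1"]),        # 1: r2 = r1 + 1
    ("r3", []),            # 2: r3 = load b
    ("r4", ["r3"]),        # 3: r4 = r3 * 2
    ("r5", ["r2", "r4"]),  # 4: r5 = r2 + r4
]
print(dataflow_schedule(program))   # [[0, 2], [1, 3], [4]]
```

Instruction 2 executes in the first cycle, ahead of instruction 1, because it does not depend on any earlier result. The five instructions therefore finish in three cycles instead of five.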

  1. Speculative Execution

Speculative execution [9] is another mechanism used to boost processor performance. It means executing instructions early, before it is known whether their results will be needed. This technique works together with data flow analysis and pipelining. The basic idea is to decode and execute instructions without storing the results in the permanent register file. The results remain pending until the right target address of the branch instruction has been determined. If some of the executed instructions turn out not to be needed, their results are discarded without being retired, and the processor moves on to other work. This technique therefore avoids the delay of waiting for a branch to be resolved and keeps the processor as busy as possible.
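The separation between pending and committed results can be sketched with a toy model in Python. The class below is an illustration under strong simplifications (a single outstanding branch and a dictionary standing in for the register file); it does not describe the actual retirement logic of the Core i7, and all names are hypothetical.

```python
class SpeculativeCore:
    """Toy model: speculative results stay pending until the branch resolves."""

    def __init__(self, registers=None):
        self.registers = dict(registers or {})  # permanent register file
        self.pending = {}                       # results awaiting retirement

    def execute_speculatively(self, dest: str, value: int) -> None:
        """Compute a result early without touching the register file."""
        self.pending[dest] = value

    def resolve_branch(self, prediction_correct: bool) -> None:
        """Retire the pending results or discard them."""
        if prediction_correct:
            self.registers.update(self.pending)  # commit to permanent state
        self.pending.clear()                     # nothing stays pending


core = SpeculativeCore({"r1": 1})
core.execute_speculatively("r1", 5)
core.resolve_branch(prediction_correct=False)
print(core.registers)   # {'r1': 1}: the speculative result was discarded

core.execute_speculatively("r2", 7)
core.resolve_branch(prediction_correct=True)
print(core.registers)   # {'r1': 1, 'r2': 7}: the result was retired
```

In both cases the processor has done useful work while the branch was unresolved; only a wrong prediction causes that work to be thrown away.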

  1. On-board L1 & L2 Cache

Caching is a significant concept for increasing CPU performance. A main characteristic of the multi-core Intel Core i7 families, compared with previous versions, is the introduction of the L3 cache, which was accompanied by a reduction in the size of the L2 cache in each core. In the Intel Core series, the benefit of keeping the L1 and L2 caches small is that they can keep up with a high clock speed without dissipating more heat or consuming more power. Thus, L1 and L2 are considered an essential part of modern chips. Moreover, L1 is the fastest cache because it is the nearest one to the processor core. In the Intel Core i7 [10], each core contains a 32 KB instruction cache and a 32 KB data cache (64 KB of L1 in total), of which the data cache is 8-way set associative, and a dedicated 256 KB L2 cache that is also 8-way set associative. Outside the cores there is another cache called L3, which is shared by all cores and is much larger than L1 and L2, with a size of 8 MB and 16-way set associativity.
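The sizes and associativities quoted above determine how many sets each cache has. The Python sketch below works this out under the assumption of a 64-byte cache line, a value that is not stated in this paper, and with simple modulo indexing, which real hardware may replace with a hash for the L3 cache. The function names are illustrative.

```python
KB = 1024
MB = 1024 * KB
LINE_BYTES = 64   # assumed cache line size


def number_of_sets(size_bytes: int, ways: int, line_bytes: int = LINE_BYTES) -> int:
    """A set-associative cache has size / (ways * line size) sets."""
    return size_bytes // (ways * line_bytes)


def set_index(address: int, size_bytes: int, ways: int,
              line_bytes: int = LINE_BYTES) -> int:
    """Set in which the line containing the given address may be placed."""
    line_number = address // line_bytes
    return line_number % number_of_sets(size_bytes, ways, line_bytes)


print(number_of_sets(32 * KB, 8))     # L1 data cache: 64 sets
print(number_of_sets(256 * KB, 8))    # L2 cache: 512 sets
print(number_of_sets(8 * MB, 16))     # L3 cache: 8192 sets
print(set_index(0x12345, 32 * KB, 8))  # set 13 in the L1 data cache
```

A line may be stored in any of the ways of its set, so higher associativity reduces the chance that two frequently used lines evict each other.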

Finally, enhancing the performance of the Intel Core i7 is not limited to these five technologies; many other technologies contribute to putting the Intel Core i7 at the top nowadays. For example, Intel Turbo Boost technology [11] automatically increases the performance of the CPU when programs need it, by raising the processor's frequency dynamically. Intel's hardware support for the Advanced Encryption Standard (AES) [12] is another such technology. AES is a symmetric-key standard for encrypting and decrypting blocks of data, and the hardware support makes it fast and secure for a variety of uses, for example, disk encryption, internet security, and file storage encryption.

  1. Conclusion

In conclusion, performance is the most important factor that makes a processor popular, because it determines the speed of the computer. This paper introduced some of the technologies used in the Intel Core i7, namely pipelining, branch prediction, data flow analysis, speculative execution, and on-board L1 and L2 caches. These techniques work together to avoid wasted cycles and to improve the performance of the processor. The future of processor fabrication is likely to be rich in technologies that overcome speed limitations and in new architectures that boost the performance of CPUs as well as the efficiency of computers.

  1. References
  1. Anonymous, List of Intel Core i7 microprocessors, October 2016, available at:
  2. Anonymous, Instruction pipelining, October 2016, available at:
  3. Anonymous, List of Intel CPU microarchitectures, November 2016, available at:
  4. A. Fog, The microarchitecture of Intel, AMD and VIA CPUs: An optimization guide for assembly programmers and compiler makers, Technical University of Denmark, 1996-2016.
  5. D. Levinthal, Performance Analysis Guide for Intel® Core™ i7 Processor and Intel® Xeon™ 5500 processors, Version 1.0, Intel Corporation, 2008-2009.
  6. Anonymous, Branch predication, August 2016, available at:
  7. W. Stallings, Computer Organization and Architecture: Designing for Performance, ninth edition, 2013.
  8. Intel Corporation, Intel® 64 and IA-32 Architectures Software Developer's Manual, 1997-2011.
  9. Anonymous, Speculative execution, October 2016, available at:
  10. S. Wasson, Intel's Core i7 processors: Nehalem arrives with a splash, November 2008, available at:
  11. Srikanth and Rangacharulu, Verilog Implementation of Parallel AES Encryption Engines for Multi-Core Processor Arrays, International Journal & Magazine of Engineering, Technology, Management and Research, Volume No: 1 (2014), Issue No: 9 (September).