Lecture Notes s1

Brief History of Computers
Year / Processor / Transistors / Max Address Space / Caches / Speed
1972 / 8008 / 3500 / 16 KB / 0.05 MIPS / Instr set designed by Datapoint in San Antonio / TX Instruments Computers
1978 / 8086 / 29 K / 1 MB / 0.33 - 0.75 MIPS / IBM PC / MS DOS
1982 / 80286 / 134 K / 16 MB / 0.9 - 2.66 MIPS / IBM PC-AT, Compaq suitcase computers / MS Windows, SCO Xenix
1985 / i386 / 275 K / 4 GB / 2.5 - 9.9 MIPS / With the speed of the i386 and the convenience of MS Windows 3.0, Microsoft grew significantly / IBM OS/2, MS Windows 2.0, MS Windows 3.0 (1990)
1989 / i486 / 1.2 M / 4 GB, virtual 1 TB / 8 KB cache on chip / 25-50 MHz; 20-41 MIPS / Floating Pt coprocessor on chip
1993 / Pentium / 3.1 M / 4 GB, virtual 64 TB / 8 KB instruction cache; 8 KB data cache / 60-100 MHz; 100-150 MIPS
1995 / Pentium Pro / 5.5 M / 4 GB, virtual 64 TB / 16 KB L1 cache; 256 KB L2 cache / 166-200 MHz
1997 / Pentium II / 7.5 M / 4 GB, virtual 64 TB / 32 KB L1 cache; 512 KB ext L2 cache / 233-450 MHz
1999 / Pentium III / 9.5 M / " / 256KB-2MB L2 cache / 450-600 MHz
2000 / Pentium 4 / 42 M / " / 1.6 - 1.8 GHz
2006 / Core 2 / 291 M / 64 KB L1 cache per core; 4 MB L2 cache / 1.86 - 3.0 GHz / 2 cores
2008 / Core i7 / 781 M / 64 KB L1 cache per core; 256 KB L2 cache per core; 8 MB L3 cache / 2.8-3.5 GHz / 4 cores; hyper threading
2010 / Itanium Tukwila / 2 B / 24 MB L3 cache / 1.6-1.73 GHz; up to 6 instr per clock cycle / 2-4 cores; instr-level parallelism / HP Servers
2012 / Xeon Phi / 5 B / 240-320 GHz; 1-1.2 teraflops double precision / 62 cores / HP & Cray Servers
Moore's Law (revised, 1975): The complexity for minimum component costs has increased at a rate of roughly a factor of two every two years.
Note: starting from the 8086's 29 K transistors in 1978, the 17 doublings up to 2012 would give about 3.8 B transistors based on Moore's Law.
We will focus on the Intel Architecture 32 bit machines (IA32). There are two different syntaxes for IA32 Assembly Language:
Intel (used by Microsoft)
AT&T (used by GNU; therefore we use this)
The underlying machine code is fundamentally the same.
Suppose we need to add two integer variables (which are externals) and store the result in another integer variable (which is an external).
Intel Asm Syntax / AT&T Asm Syntax / Meaning
mov eax, valx / movl valx, %eax / Load register eax with valx
add eax, valy / addl valy, %eax / Add valy to register eax
mov valz, eax / movl %eax, valz / Store the result in valz
AT&T assembly language syntax orders operands source first, then destination, whereas Intel orders them destination first, then source. AT&T also places % in front of register names and adds a size suffix (such as l) to the operator.
Machine Instructions
The actual machine code (which executes) is binary and is interpreted by the CPU. Assembly Language is a lot easier to read than machine code. IA32 machine instructions vary in size from 1 to 15 bytes. Some other machine architectures use fixed length instructions.
We will discuss the actual IA32 machine instruction format later in the semester.
Most instruction formats include:
Op Code - tells the CPU what needs to be done
Operand Type (if needed) - may include data type, whether the operand is a register, immediate (constant) or memory reference
Operand 1 (if needed) reg value, immediate operand, memory reference info
Operand 2 (if needed) reg value, immediate operand, memory reference info
Note: both operands usually will not be memory references
IA32 Hardware Architecture Overview
· Program Counter is called %eip (extended instruction pointer).
· 8 Integer Registers, each storing 32 bit values.
o These are named: %eax, %ebx, %ecx, %edx,
%esi, %edi, %esp, %ebp
o The lower 2 bytes of the first 4 integer registers can be referenced as: %ax, %bx, %cx, %dx
o Each of those 2-byte registers can be further divided into a low byte and a high byte: %al, %ah, %bl, %bh, %cl, %ch, %dl, %dh
o Registers %esp and %ebp are for runtime stack manipulation
· Based on the history of Intel chips, backward compatibility forced the inclusion of these 2-byte and 1-byte registers
IA32 Hardware Architecture Overview Continued
· Condition Code Registers are single bit flags which are set based on the outcome of the most recent arithmetic or logical instructions.
o OF - overflow flag; set when the result of a signed arithmetic operation is too large or too small to fit in the destination
o CF - carry flag; set when the result of an unsigned arithmetic operation is too large to fit in the destination
o ZF - zero flag; set when the result is zero; it is ON if a comparison shows values are equal
o SF - sign flag; set when the result is a negative value
o PF - parity flag; set when the 8 low-order bits of the result contain an even number of 1 bits (parity even, PE)
Example:
short ix = 21234;
short iy = 20841;
short iresult;
iresult = ix + iy;
printf("Result is %d\n", iresult);
output:
Result is -23461
Since the sum is greater than 32767, it does not fit in a 2-byte short and overflows. Some languages would generate an error, but C does not generate a runtime error; on typical implementations the value simply wraps around.
Note:
21234 + 20841 = 42075
2^16 = 65536
42075 - 65536 = -23461
In addition to arithmetic operations, comparisons set those condition codes. (see the notes on Flow Instructions)
IA32 Hardware Architecture Overview Continued
· Floating Point Registers are used for floating point arithmetic. There are 8 floating point registers, each having 80 bits. These are stack based:
o Top of the stack: register ST0
o Next: register ST1
o Bottom: register ST7
We will discuss floating point in detail after the midterm exam.
IA32 Assembly Language AT&T Syntax
Comments begin with #.
Labels begin in column 1 and end with a colon. They are used to reference an instruction address for JMP and CALL instructions. They are also used to reference external variables (basis), static variables, and string constants.
Dot directives begin with a dot and give the assembler information such as the name of your source file (.file), which symbols have external global basis (.globl), data types, and lengths.
For readability, instruction operators should not begin in column 1.
Instruction operands might reference constants, symbol labels, registers, or memory, depending on the instruction operation.
See sample code below.
Operands
Operands have several different forms to help reference registers, constants, symbols, and memory.
%reg Register references begin with a %. They reference the 4 byte registers (begin with "e" for extended), 2 byte registers or 1 byte registers.
$constant Numeric constants can be base-10 or hexadecimal (begin with 0x).
symbol A symbol can be an external variable, an external function, a static variable or a .label.
$symbol The address of the specified symbol is typically the address of a variable
memRef Memory references can take on many forms:
symbol Memory address based on the symbol's address
off(%reg) Memory address is an offset from the value of %reg
(%reg1,%reg2) Memory address is the sum of the values of %reg1 and %reg2
off(%reg1,%reg2) Memory address is an offset from the sum of the values of %reg1 and %reg2
The offsets can be positive integers, negative integers, symbolics, or symbolics with a positive or negative offset.
Some examples:
%ax register %ax
%eax register %eax
$150 integer constant 150
$0xAFF3 hexadecimal constant AFF3
valx a symbol for an external or static variable
.L5 a label to an address of an instruction which could be used in a jump instruction. It can also be a label for a character string literal.
8(%ebp) the memory address which is 8 + the value of register %ebp
(%ebp,%ebx) the memory address which is the value of register %ebp + value of register %ebx.
studentData+4 the memory address which is 4 + the address of studentData.
There is another form of memory references using a scale which we will discuss later.
Overview of the Machine instruction categories
Move - move from source to destination
movS source, dest
Load Effective Address - load the address instead of the value from an address
leaS source, dest
Arithmetic - 2 byte and 4 byte
addS operand1, operand2 add op1 to op2
subS operand1, operand2 subtract op1 from op2
imulS operand multiply by operand
idivS operand divide by operand
incS operand increment the operand
decS operand decrement the operand
negS operand negate the operand
Shift
salS k,reg shift arithmetic left
sarS k,reg shift arithmetic right
shlS k,reg shift logical left
shrS k,reg shift logical right
Note: S is the size and must be one of
b byte (1 byte)
w word (2 bytes)
l long (4 bytes)
q quad words (8 bytes)
The w for word is based on the old machines where a word was 2 bytes.
Examples:
movl iValA,%edx # Moves the long value of iValA to %edx
movl $iValA,%edx # Moves the address of iValA to %edx
movl %edx,lresult # Moves the long value in %edx to lresult
addl 4(%ebp),%edx # The value at the address computed
# by an offset of 4 plus the value of ebp
# is added to the value of %edx.
# The result is stored in %edx.
incw %dx # Increment the 2 byte value in %dx by 1.
movl lresult,%edx # Moves the long value found at lresult
# to %edx
leal lresult,%edx # Move the address of lresult to %edx
sarl $3, %edx # Arithmetic shift of the long value in
# %edx 3 bits to the right
Overview of the Machine instruction categories
Flow
jmp label unconditional jump to label
cmpS operand1, operand2 compare setting condition code flags
jle label jump less than or equal
jl label jump less than
je label jump equal
jne label jump not equal
jge label jump greater than or equal
jg label jump greater than
call dest using calling convention to invoke the function at dest
ret return to the caller based on the calling convention
Stack - these manipulate the runtime memory stack
pushS operand pushes the operand onto the runtime memory stack
popS operand pops the top of the stack and stores it in operand
leave prepare to leave the subroutine based on calling convention
Note that call and ret also manipulate the stack.
Consider the following C statement snippet:
if (iX > iY)
true part
else
false part
In Assembly Language:
movl iX, %edx # load reg edx with the iX variable
cmpl iY, %edx # compare iX:iY (we are comparing the
# second operand(edx) with the first)
jle .L3 # if <=, jump to .L3
… # code for the true part
jmp .L4 # jump over the false part
.L3:
… # code for the false part
.L4:
… # code following the entire if
C code for calculating the average using the final exam and the higher of the first two exams.
int calculateAverage(int iExam1, int iExam2, int iFinalExam)
{
int iSum;
if (iExam1 > iExam2)
iSum = iExam1 + iFinalExam;
else
iSum = iExam2 + iFinalExam;
return iSum / 2;
}
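Integer division in C truncates toward zero:
3/2 = 1.5, which truncates to 1
-3/2 = -1.5, which truncates to -1
An arithmetic right shift by 1 rounds down instead (-3 would become -2), so 1 is added to a negative sum before shifting:
-3 + 1 = -2, and shifting right by 1 gives -1
-4 + 1 = -3, and shifting right by 1 gives -2
Corresponding assembly language code generated by gcc -O1 -S (comments added by me)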
    .file "calculateAverage.c"
    .text
.globl calculateAverage            # The Linker will need to know this.
    .type calculateAverage, @function
calculateAverage:
    pushl %ebp                     # tbd
    movl %esp, %ebp                # tbd
    movl 8(%ebp), %edx             # load iExam1 in %edx
    movl 12(%ebp), %eax            # load iExam2 in %eax
    cmpl %eax, %edx                # compare iExam1:iExam2
    jle .L2                        # if <=, jump to .L2
    addl 16(%ebp), %edx            # add iFinal to %edx (iExam1)
    jmp .L3                        # jump over false part
.L2:
    movl 16(%ebp), %edx            # move iFinal to %edx
    addl %eax, %edx                # add iExam2 to %edx (iFinal)
.L3:
    movl %edx, %eax                # move sum to %eax for sign
    shrl $31, %eax                 # shift makes this 0 or 1
    addl %edx, %eax                # increase by 0 or 1
    sarl %eax                      # divide by 2 via shifting
    popl %ebp                      # tbd
    ret                            # tbd
    .size calculateAverage, .-calculateAverage
    .ident "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"
    .section .note.GNU-stack,"",@progbits
C code for averageDriver.c
StudentData studentData;
void readStudents();
int calculateAverage(int iExam1, int iExam2, int iFinalExam);
int main(int argc, char *argv[])
{
int i;
studentData.iStudentCnt = 0;
readStudents();
for (i = 0; i < studentData.iStudentCnt; i++)
printf("%s %d\n"
, studentData.studentM[i].szStudentId
, calculateAverage
(studentData.studentM[i].iExam1
, studentData.studentM[i].iExam2
, studentData.studentM[i].iFinalExam)
);
}
Assembly Language for averageDriver.c using gcc -O1 -S
1 .file "averageDriver.c"
2 .section .rodata.str1.1,"aMS",@progbits,1
3 .LC0:
4 .string "%s %d\n"
5 .text
6 .globl main # linker will need to know about this
7 .type main, @function
8 main:
9 leal 4(%esp), %ecx # tbd
10 andl $-16, %esp # tbd
11 pushl -4(%ecx) # tbd
12 pushl %ebp # tbd
13 movl %esp, %ebp # tbd
14 pushl %edi # tbd
15 pushl %esi # tbd
16 pushl %ebx # tbd
17 pushl %ecx # tbd
18 subl $24, %esp # reserve 24 bytes on the stack
19 movl $studentData, %ebx # address of studentData -> %ebx
#
# What is at studentData vs. studentData+4 ?
#
20 movl $0, (%ebx) # set iStudentCnt to 0