Combinatorial Devices
A digital circuit can be in one of two states 0 or 1.
0-1 volts->logic 0
2-5 volts->logic 1
A gate is a simple electronic device that can be used to compute various combinations of logic states.
A transistor can be thought of as a simple switch either closed or open.
Gates are made from transistors:
Invertor or NOT gate
NAND gate
NOR gate
Logic symbols used for these combinations are:
AX
01
10
ABX
001
011
101
110
ABX
001
010
100
110
ABX
000
010
100
111
ABX
000
011
101
111
A NAND or a NOR gate can be built from just two transistors
AND or OR require three, so most logic is built from NAND’s and NOR’s
All logic circuits can be built just from NAND gates (or NOR)
eg an invertor:
Boolean Algebra
Another way to describe circuits is by using Boolean Algebra.
Variables (normally capital letters) can be either 0 or 1 and can be combined by :
AB-> A and B(sometimes written AB)
A + B-> A or B
-> not A
From an arbitrary truth table, we can generate a Boolean expression:
ABC
001
010
101
111
Look at each row that produces a 1 in the C column. Form the expression using AND and NOT that generates a 1. OR all the rows that produce a 1 together:
C= + A + AB
To implement the function as a circuit:
Use standard ANDs and ORs to start with. Use more than two inputs if necessary.
Convert to NANDs and NORs when finished.
Inverting circles can be added to/removed from either end of a line.
3 input devices can be formed from 2 input devices with the output combined with the third input
is the same as:
The following equivalence’s also allow us to simplify our circuits:
which are a result of De Morgan’s Law = +
Cache memories
Large memories DRAM (dynamic ram) are slow~100ns
compared to our pipeline time of ~10ns
Small memories SRAM (static ram) are fast~10ns
but are ~10 times as expensive.
Memory is built using a hierarchy,
disk being the slowest and cheapest
then DRAM
then SRAM the fastest and most expensive.
A library is a good example of a memory hierarchy.
It is very important to have access to all books in a library, but at any one time you will only need a few books on one particular subject, and of those books maybe a few pages in each.
If you leave the books open, on your desk, at these particular pages, you have very fast access to the information.
If you are prepared to turn the pages you get slower access to other information in the book.
If you are prepared to go to the library you get even slower access to lots more information.
The same principle applies to memory.
The Principle of Locality states that programs access a relatively small portion of their address space at any instant of time.
Temporal locality - if an item is referenced, it will tend to be referenced again soon
Spatial locality - if an item is referenced, items whose addresses are close by will tend to be referenced soon.
A Hit is when the processor requests data that is in the faster memory (cache).
Hit Rate is the percentage of accesses that are found in the cache.
Hit Time is the time taken to access the cache.
A Miss is when the processor requests data that is not in the cache.
Miss Rate is (1 - Hit Rate).
Miss Penalty is the time taken to move data from memory to cache.
The Block Size is the amount of data that is transferred from memory to cache when a miss occurs.
Typical values are:
Block size 4 to 128 bytes
Hit time 1 cycle
Miss Penalty 8 cycles
Miss Rate 10%
Cache Size 256KB
When the processor wants to access data x5:
It first decides whether it is already in the cache.
If it isn’t it retrieves x5 from memory and inserts it in the cache.
How does the processor know if the data is in the cache?
How does the processor know its position in the cache?
The cache is ordered so that there is a relation between memory addresses and cache structure.
Direct mapping
The position or index of a data item in the cache is determined directly from its memory address.
Consider a cache of 8 entries each entry 32 bits wide.
Direct mapping will place data from memory into locations in the cache at positions corresponding to the low-order 3 bits of their memory address:
locationdatacache index
00000023000
00000144001
00001013010
00001148011
00010017100
00010156101
00011044110
00011122111
00100039000
..
01010129101
..
11111133111
the 3 low order bits give the position in the cache.
When the processor is first switched on, the cache will not contain valid data. So a valid bit is associated with each cache location, and is set when data is written.
Lots of memory locations are mapped to the same cache location.
We also need to know which location is currently in the cache.
We do this with a cache tags.
The cache tag is the remaining bits of the address:
cachecachecachecorresponding
tagindexdatamemory
00000023000000
10000128100001
01101034011010
00001148000011
00010017000100
01010129010101
10011092100110
11111133111111
To discover whether a data reference hits or misses the cache:
1find the cache index of the address
2.match the cache tag with the cache tag part of the address
100110
cache index is 110 -> cache tag is 100
cache tag part of 100110 is 100
100 == 100-> hit
010011
cache index is 011 -> cache tag is 000
cache tag part of 010011 is 010
000 != 010-> miss
To increase the size of the cache we can either increase the number of positions, or increase the size of the data stored at those positions
Example of a cache with 32 entries each 32 bytes of data:
Increasing the number of bytes for each cache entry involves a penalty when there is a miss. More bytes need to be copied from memory into the cache. Spatial locality means that this is not necessarily a bad thing!
Using the cache index to find the correct data line before comparing the cache tag involves two levels of computation.
A fully associative cache combines the cache index with the cache tag.
To find out whether the data is in the cache:
compare the cache tag part of the memory address with all the cache tags in the cache.
This matching is done in parallel in a fully associative cache.
Cache replacement:
In a directly mapped cache:
If there is a cache miss then the cache is updated and the new entry replaces the old.
In a fully associative cache:
If there is a cache miss then the new entry can be placed anywhere.
Where do we put it?
Using the Least Recently Used algorithm - hardware keeps track of which cache entry has not been accessed for the longest time.
This can be tricky to implement!
A simple pseudo LRU algorithm is:
Keep an index.
If the cache at that value is accessed add one to the index.
If a replacement is needed choose the cache entry at the index.
Cache writes:
Reading cache is straight forward. Writing to cache is not.
How do we keep cache data and memory data the same?
Write Back policy:
Keep an extra bit in each cache entry to mark whether the entry has been written to.
When the cache entry is replaced, write it back to memory if it has changed.
Write Through policy:
Always write to memory as well as to cache.
Use a write buffer between memory to speed up memory access.
Full Adder
A full adder adds two binary numbers (A,B) together and includes provision for a carry in bit (Cin) and a carry out bit (Cout).
The truth table for a full adder is:
ABCinCoutSum
00000
00101
01001
01110
10001
10110
11010
11111
->
Sum=Cin+B+A+ABCin
Cout=BCin+ACin+AB+ABCin
Decoder
A decoder accepts a binary encoded number as input and puts a logic 1 on the corresponding output line.
For2 inputs->4 output lines
3 inputs->8 output lines
eg for 3 inputs with the signal 101 on them:
the 5’th output line will be 1, the rest 0
I1I0ABCD
001000A =
010100B = I0
100010C = I1
110001D = I1 I0
Decoder with output enable
All outputs will be 0 unless (Output Enable) is Low
I1I0ABCD
0001000A = OE
0010100B = I0 OE
0100010C = I1 OE
0110001D = I1 I0 OE
1000000
1010000
1100000
1110000
Design a simple decoder for 4 bit numbers....
Multiplexor
Chooses one input from many and copies it to the output.
Common configurations are:
4 inputs-2 address lines
8 inputs-3 address lines
The Output Enable line is the same as before
The truth table becomes unwieldy with so many variables, and is mostly empty, so we cut out all the uninteresting cases:
C1C0Q
000I0
001I1
010I2
011I3
Q=OE ( I0 + C0I1 + C1I2 + C1C0I3)
Demultiplexor
1 input is copied to one of several outputs
Common configurations are:
1 input-2 address lines-4 outputs
1 input-3 address lines-8 outputs
Consider the decoder. It is equivalent, if we replace the input with OE
Glitches
So far we have assumed that all logic devices are infinitely fast.
Outputs immediately reflect inputs.
In practice this is not the case. There is a short delay between an input logic level changing and the corresponding output change.
Gate delays for TTL are typically 5 nanoseconds.
20 cm of wire will also delay a signal by 1 nanosecond.
A + =TRUE
However consider what happens when the signal A goes from 1 to 0
This spurious 0 is called a glitch.
These glitches may or may not have disastrous effects.
It is possible to design circuits that have no glitches. This requires extra hardware and considerable effort.
Another approach is to design circuits that are insensitive to glitches. Wait a fixed time after a change - during which the glitches will go away.
This idea is the basis for clocked circuits that we will come to later.
Memory
Feedback
If we loop the output of a circuit back into its input, we have a condition called feedback, and some interesting results are obtained.
This circuit will oscillate rapidly
Will eventually settle on one input 0 and the other 1 (either way)
RS Flip-flop
Consider what happens when S and R are both 0
If Q is 0:
the top NOR will have 0 0 as inputs and will output 1
the bottom NOR will have 1 0 as inputs and will output 0
=> the state is stable
if Q is 1:
the top NOR will have 0 1 as inputs and will output 0
the bottom NOR will have 0 0 as inputs and will output 1
=> the state is stable
If S is switched to a 1
the top NOR will have 1 X as inputs and will output 0
the bottom NOR will have 1 0 as inputs and will output 1
=> Q is forced to be a 1 (S stands for Set)
If R is switched to a 1
the bottom NOR will have 1 X as inputs and will output 0
=> Q is forced to be a 0 (R stands for Reset)
RS Flip-flops are useful for switch debouncing.
Switches make and break several times while in close contact.
Q will be Set to 1 if switch makes contact with S
Q will stay at 1 despite make/breaks on S until switch touches R
Circuits can use a clock to provide timing pulses. Logic levels can be read on the clock pulse, which is allowed to go high only when glitches are unlikely.
The RS Flip-flop can be clocked.
The output of the AND gates will only reflect S and R when the Clock is 1
When the Clock is 0 the outputs of the AND gates are always 0
The X output of a RS Flip-flop is normally
One problem with the RS Flip-flop is what happens when R and S are 1.
In this state the only stable state is both Q and 0 !
If both R and S return to 0 simultaneously Q jumps to 1 or 0 at random
otherwise Q will reflect which one returned to 0 last.
Clocked D Flip-flop
The D flip-flop removes this ambiguity
The Clock can now be considered a Strobe. A single clock pulse.
When the strobe is high, the value of D will be transferred to Q
Q will remain at that value until the strobe returns to a high state.
The D Flip-flop acts as a simple 1 bit memory.
As described, this type of flip-flop is level triggered. It depends on the strobe or clock being in a high state.
In practice this is not so desirable. It is better to transfer the value of D to the flip-flop when the strobe changes from a 0 to a 1
This is called edge triggered.
Edge triggered flip-flops are far more common.
They can be either 0 ->1 rising edge, or 1->0 falling edge
Clocked J-K Flip-flop
Rather than removing the ambiguity of the RS flip-flop as in the D flip-flop, the J-K flip-flop makes use of it.
J is equivalent to s.When J = 1=> Q becomes 1
K is equivalnet to R.When K = 1=> Q becomes 0
When J=K=1 => Q becomes
The J-K flip-flop toggles the output when J=K=1.
Given a regular clock and J=K=1 the Q output will be a clock with half the frequency.
Register
We construct a register by connecting together a group of D flip-flops
All the flip-flops load simultaneously from the data in lines
Bus
A bus is a common set of wires connecting multiple devices together.
Buses are used to connect the data and address lines between computers, memory and i/o devices.
One problem with normal logic components is that their outputs are always active - either a 1 or a 0 (5V or Gnd).
If we connect the output of two gates together, where one output is a 0 and the other is a 1, then the result will be somewhere in between
If we want to use a bus architecture we must make sure that only one device at a time outputs a logic signal of 0 or 1
All other devices must go into a high impedance state
ie they look like they are not connected at all.
Tri-state devices have this capability.
A tri-state device is like a normal gate but with the added capability of going into this off-line mode.
In tri-state devices the OE output-enable line drives the outputs into this third state rather than into the 0 logic state as previously.
Not all devices are tri-state. To connect these devices to a bus we need to put a tri-state driver chip in between the device and the bus.
A tri-state driver just sends its input to its output, but also has an OE input to allow the output to go ‘not connected’.
RAM - Random Access Memory
Static memory is an array of D flip-flops.
The access time is the same for all bits in the array.
Most memory chips are 1 bit wide, a byte is 8 chips side-by-side.
(Some memory chips are several bits wide - normally a power of 2)
The desired bit is selected by the use of address lines.
A typical memory device - 1 bit wide
The Chip Select line acts as an Enable line to the chip
(when several banks of chips are being used)
and also to tri-state the Data line depending on how it is being accessed
The R/ determines whether the memory is written to or read from.
When memory is being read the Data line acts as an output
Write access to memory
We need to:select 1 flip-flop
perform the write on a clock pulse
the pulse must arrive after the chip is selected and a write operation specified.
Use a decoder and the address lines to select a particular flip-flop
AND this signal with the , R/ to act as a clock to strobe the data in.
Read access from memory
Place the correct bit on the output line by using a multiplexor
Use the , R/ to enable a tri-state buffer.
1Mb Memories
We can make the above memory larger by making it Byte wide rather than bit wide.
This just means we have to copy 8 times the flip-flop column and its associated data in and data out lines.
The lines from the decoder and CS/RW logic go to each of the flip-flop columns.
For large Bit wide memories, we soon run into large numbers of pins to connect to the chip.
1Mbit needs:
20address lines
1data line
1CS line
1R/W line
2power/ground
The decoder within the chip will need 1 million output lines!
To get round these problems, we arrange the memory as a 2 dimensional grid, 1024 x 1024 elements.
We use 2 decoders and send the address in 2 chunks - the row and column addresses.
1Mbit now needs:
10address lines
1data line
1CAS/RAS strobe (column/row strobe)
1CS line
1R/W line
2power/ground
The 2 decoders now have only 1024 lines each as outputs.
Shift Registers
Contain several flip-flops in a row. One bit is input at one end on each clock pulse. Each other bit moves one place to the right (or left). The outputs of each flip-flop are available simultaneously.
We can use shift registers for serial to parallel conversion. Input 8 bits on 8 pulses, then read data simultaneously.
Division by 2
We can use the above principle to implement a divide by 2 circuit.
In the parallel load register (lecture 4) we can use an extra control line
shift/
to determine whether the input to a flip-flop comes from the data lines or from the previous flip-flop (the left-most flip-flop gets a 0 in shift mode)
(use the control line as the address of a 2 input multiplexor, the output is connected to the input of the flip-flop)
Try this as an exercise.
(design your own 2 input mux using AND/OR gates)
Multiplication by 2
Multiplication can be done by shifting to the left.
Higher powers of two can be done by shifting several times.
An extra flip-flop can be added to the ends of the chain.
In the case of division by 2, this flip-flop will contain the remainder.
In the case of multiplication the flip-flop will flag whether an overflow has occurred.
The 74LS75B is a parallel load/shift right register 4 bits wide, which (with external assistance) can do shift left as well.
Counters
Add one to the current number.
000
001
010
011
100
101
110
111
For the low order bit:
just swap between 0 and 1 - ie it toggles