Combinatorial Devices

A digital circuit can be in one of two states 0 or 1.

0-1 volts->logic 0

2-5 volts->logic 1

A gate is a simple electronic device that can be used to compute various combinations of logic states.

A transistor can be thought of as a simple switch either closed or open.

Gates are made from transistors:

Invertor or NOT gate

NAND gate

NOR gate

Logic symbols used for these combinations are:

AX

01

10

ABX

001

011

101

110

ABX

001

010

100

110

ABX

000

010

100

111

ABX

000

011

101

111

A NAND or a NOR gate can be built from just two transistors

AND or OR require three, so most logic is built from NAND’s and NOR’s

All logic circuits can be built just from NAND gates (or NOR)

eg an invertor:

Boolean Algebra

Another way to describe circuits is by using Boolean Algebra.

Variables (normally capital letters) can be either 0 or 1 and can be combined by :

AB-> A and B(sometimes written AB)

A + B-> A or B

-> not A

From an arbitrary truth table, we can generate a Boolean expression:

ABC

001

010

101

111

Look at each row that produces a 1 in the C column. Form the expression using AND and NOT that generates a 1. OR all the rows that produce a 1 together:

C= + A + AB

To implement the function as a circuit:

Use standard ANDs and ORs to start with. Use more than two inputs if necessary.

Convert to NANDs and NORs when finished.

Inverting circles can be added to/removed from either end of a line.

3 input devices can be formed from 2 input devices with the output combined with the third input

is the same as:

The following equivalence’s also allow us to simplify our circuits:

which are a result of De Morgan’s Law = +

Cache memories

Large memories DRAM (dynamic ram) are slow~100ns

compared to our pipeline time of ~10ns

Small memories SRAM (static ram) are fast~10ns

but are ~10 times as expensive.

Memory is built using a hierarchy,

disk being the slowest and cheapest

then DRAM

then SRAM the fastest and most expensive.

A library is a good example of a memory hierarchy.

It is very important to have access to all books in a library, but at any one time you will only need a few books on one particular subject, and of those books maybe a few pages in each.

If you leave the books open, on your desk, at these particular pages, you have very fast access to the information.

If you are prepared to turn the pages you get slower access to other information in the book.

If you are prepared to go to the library you get even slower access to lots more information.

The same principle applies to memory.

The Principle of Locality states that programs access a relatively small portion of their address space at any instant of time.

Temporal locality - if an item is referenced, it will tend to be referenced again soon

Spatial locality - if an item is referenced, items whose addresses are close by will tend to be referenced soon.

A Hit is when the processor requests data that is in the faster memory (cache).

Hit Rate is the percentage of accesses that are found in the cache.

Hit Time is the time taken to access the cache.

A Miss is when the processor requests data that is not in the cache.

Miss Rate is (1 - Hit Rate).

Miss Penalty is the time taken to move data from memory to cache.

The Block Size is the amount of data that is transferred from memory to cache when a miss occurs.

Typical values are:

Block size 4 to 128 bytes

Hit time 1 cycle

Miss Penalty 8 cycles

Miss Rate 10%

Cache Size 256KB

When the processor wants to access data x5:

It first decides whether it is already in the cache.

If it isn’t it retrieves x5 from memory and inserts it in the cache.

How does the processor know if the data is in the cache?

How does the processor know its position in the cache?

The cache is ordered so that there is a relation between memory addresses and cache structure.

Direct mapping

The position or index of a data item in the cache is determined directly from its memory address.

Consider a cache of 8 entries each entry 32 bits wide.

Direct mapping will place data from memory into locations in the cache at positions corresponding to the low-order 3 bits of their memory address:

locationdatacache index

00000023000

00000144001

00001013010

00001148011

00010017100

00010156101

00011044110

00011122111

00100039000

..

01010129101

..

11111133111

the 3 low order bits give the position in the cache.

When the processor is first switched on, the cache will not contain valid data. So a valid bit is associated with each cache location, and is set when data is written.

Lots of memory locations are mapped to the same cache location.

We also need to know which location is currently in the cache.

We do this with a cache tags.

The cache tag is the remaining bits of the address:

cachecachecachecorresponding

tagindexdatamemory

00000023000000

10000128100001

01101034011010

00001148000011

00010017000100

01010129010101

10011092100110

11111133111111

To discover whether a data reference hits or misses the cache:

1find the cache index of the address

2.match the cache tag with the cache tag part of the address

100110

cache index is 110 -> cache tag is 100

cache tag part of 100110 is 100

100 == 100-> hit

010011

cache index is 011 -> cache tag is 000

cache tag part of 010011 is 010

000 != 010-> miss

To increase the size of the cache we can either increase the number of positions, or increase the size of the data stored at those positions

Example of a cache with 32 entries each 32 bytes of data:

Increasing the number of bytes for each cache entry involves a penalty when there is a miss. More bytes need to be copied from memory into the cache. Spatial locality means that this is not necessarily a bad thing!

Using the cache index to find the correct data line before comparing the cache tag involves two levels of computation.

A fully associative cache combines the cache index with the cache tag.

To find out whether the data is in the cache:

compare the cache tag part of the memory address with all the cache tags in the cache.

This matching is done in parallel in a fully associative cache.

Cache replacement:

In a directly mapped cache:

If there is a cache miss then the cache is updated and the new entry replaces the old.

In a fully associative cache:

If there is a cache miss then the new entry can be placed anywhere.

Where do we put it?

Using the Least Recently Used algorithm - hardware keeps track of which cache entry has not been accessed for the longest time.

This can be tricky to implement!

A simple pseudo LRU algorithm is:

Keep an index.

If the cache at that value is accessed add one to the index.

If a replacement is needed choose the cache entry at the index.

Cache writes:

Reading cache is straight forward. Writing to cache is not.

How do we keep cache data and memory data the same?

Write Back policy:

Keep an extra bit in each cache entry to mark whether the entry has been written to.

When the cache entry is replaced, write it back to memory if it has changed.

Write Through policy:

Always write to memory as well as to cache.

Use a write buffer between memory to speed up memory access.

Full Adder

A full adder adds two binary numbers (A,B) together and includes provision for a carry in bit (Cin) and a carry out bit (Cout).

The truth table for a full adder is:

ABCinCoutSum

00000

00101

01001

01110

10001

10110

11010

11111

->

Sum=Cin+B+A+ABCin

Cout=BCin+ACin+AB+ABCin

Decoder

A decoder accepts a binary encoded number as input and puts a logic 1 on the corresponding output line.

For2 inputs->4 output lines

3 inputs->8 output lines

eg for 3 inputs with the signal 101 on them:

the 5’th output line will be 1, the rest 0

I1I0ABCD

001000A =

010100B = I0

100010C = I1

110001D = I1 I0

Decoder with output enable

All outputs will be 0 unless (Output Enable) is Low

I1I0ABCD

0001000A = OE

0010100B = I0 OE

0100010C = I1 OE

0110001D = I1 I0 OE

1000000

1010000

1100000

1110000

Design a simple decoder for 4 bit numbers....

Multiplexor

Chooses one input from many and copies it to the output.

Common configurations are:

4 inputs-2 address lines

8 inputs-3 address lines

The Output Enable line is the same as before

The truth table becomes unwieldy with so many variables, and is mostly empty, so we cut out all the uninteresting cases:

C1C0Q

000I0

001I1

010I2

011I3

Q=OE ( I0 + C0I1 + C1I2 + C1C0I3)

Demultiplexor

1 input is copied to one of several outputs

Common configurations are:

1 input-2 address lines-4 outputs

1 input-3 address lines-8 outputs

Consider the decoder. It is equivalent, if we replace the input with OE

Glitches

So far we have assumed that all logic devices are infinitely fast.

Outputs immediately reflect inputs.

In practice this is not the case. There is a short delay between an input logic level changing and the corresponding output change.

Gate delays for TTL are typically 5 nanoseconds.

20 cm of wire will also delay a signal by 1 nanosecond.

A + =TRUE

However consider what happens when the signal A goes from 1 to 0

This spurious 0 is called a glitch.

These glitches may or may not have disastrous effects.

It is possible to design circuits that have no glitches. This requires extra hardware and considerable effort.

Another approach is to design circuits that are insensitive to glitches. Wait a fixed time after a change - during which the glitches will go away.

This idea is the basis for clocked circuits that we will come to later.

Memory

Feedback

If we loop the output of a circuit back into its input, we have a condition called feedback, and some interesting results are obtained.

This circuit will oscillate rapidly

Will eventually settle on one input 0 and the other 1 (either way)

RS Flip-flop

Consider what happens when S and R are both 0

If Q is 0:

the top NOR will have 0 0 as inputs and will output 1

the bottom NOR will have 1 0 as inputs and will output 0

=> the state is stable

if Q is 1:

the top NOR will have 0 1 as inputs and will output 0

the bottom NOR will have 0 0 as inputs and will output 1

=> the state is stable

If S is switched to a 1

the top NOR will have 1 X as inputs and will output 0

the bottom NOR will have 1 0 as inputs and will output 1

=> Q is forced to be a 1 (S stands for Set)

If R is switched to a 1

the bottom NOR will have 1 X as inputs and will output 0

=> Q is forced to be a 0 (R stands for Reset)

RS Flip-flops are useful for switch debouncing.

Switches make and break several times while in close contact.

Q will be Set to 1 if switch makes contact with S

Q will stay at 1 despite make/breaks on S until switch touches R

Circuits can use a clock to provide timing pulses. Logic levels can be read on the clock pulse, which is allowed to go high only when glitches are unlikely.

The RS Flip-flop can be clocked.

The output of the AND gates will only reflect S and R when the Clock is 1

When the Clock is 0 the outputs of the AND gates are always 0

The X output of a RS Flip-flop is normally

One problem with the RS Flip-flop is what happens when R and S are 1.

In this state the only stable state is both Q and 0 !

If both R and S return to 0 simultaneously Q jumps to 1 or 0 at random

otherwise Q will reflect which one returned to 0 last.

Clocked D Flip-flop

The D flip-flop removes this ambiguity

The Clock can now be considered a Strobe. A single clock pulse.

When the strobe is high, the value of D will be transferred to Q

Q will remain at that value until the strobe returns to a high state.

The D Flip-flop acts as a simple 1 bit memory.

As described, this type of flip-flop is level triggered. It depends on the strobe or clock being in a high state.

In practice this is not so desirable. It is better to transfer the value of D to the flip-flop when the strobe changes from a 0 to a 1

This is called edge triggered.

Edge triggered flip-flops are far more common.

They can be either 0 ->1 rising edge, or 1->0 falling edge

Clocked J-K Flip-flop

Rather than removing the ambiguity of the RS flip-flop as in the D flip-flop, the J-K flip-flop makes use of it.

J is equivalent to s.When J = 1=> Q becomes 1

K is equivalnet to R.When K = 1=> Q becomes 0

When J=K=1 => Q becomes

The J-K flip-flop toggles the output when J=K=1.

Given a regular clock and J=K=1 the Q output will be a clock with half the frequency.

Register

We construct a register by connecting together a group of D flip-flops

All the flip-flops load simultaneously from the data in lines

Bus

A bus is a common set of wires connecting multiple devices together.

Buses are used to connect the data and address lines between computers, memory and i/o devices.

One problem with normal logic components is that their outputs are always active - either a 1 or a 0 (5V or Gnd).

If we connect the output of two gates together, where one output is a 0 and the other is a 1, then the result will be somewhere in between

If we want to use a bus architecture we must make sure that only one device at a time outputs a logic signal of 0 or 1

All other devices must go into a high impedance state

ie they look like they are not connected at all.

Tri-state devices have this capability.

A tri-state device is like a normal gate but with the added capability of going into this off-line mode.

In tri-state devices the OE output-enable line drives the outputs into this third state rather than into the 0 logic state as previously.

Not all devices are tri-state. To connect these devices to a bus we need to put a tri-state driver chip in between the device and the bus.

A tri-state driver just sends its input to its output, but also has an OE input to allow the output to go ‘not connected’.

RAM - Random Access Memory

Static memory is an array of D flip-flops.

The access time is the same for all bits in the array.

Most memory chips are 1 bit wide, a byte is 8 chips side-by-side.

(Some memory chips are several bits wide - normally a power of 2)

The desired bit is selected by the use of address lines.

A typical memory device - 1 bit wide

The Chip Select line acts as an Enable line to the chip

(when several banks of chips are being used)

and also to tri-state the Data line depending on how it is being accessed

The R/ determines whether the memory is written to or read from.

When memory is being read the Data line acts as an output

Write access to memory

We need to:select 1 flip-flop

perform the write on a clock pulse

the pulse must arrive after the chip is selected and a write operation specified.

Use a decoder and the address lines to select a particular flip-flop

AND this signal with the , R/ to act as a clock to strobe the data in.

Read access from memory

Place the correct bit on the output line by using a multiplexor

Use the , R/ to enable a tri-state buffer.

1Mb Memories

We can make the above memory larger by making it Byte wide rather than bit wide.

This just means we have to copy 8 times the flip-flop column and its associated data in and data out lines.

The lines from the decoder and CS/RW logic go to each of the flip-flop columns.

For large Bit wide memories, we soon run into large numbers of pins to connect to the chip.

1Mbit needs:

20address lines

1data line

1CS line

1R/W line

2power/ground

The decoder within the chip will need 1 million output lines!

To get round these problems, we arrange the memory as a 2 dimensional grid, 1024 x 1024 elements.

We use 2 decoders and send the address in 2 chunks - the row and column addresses.

1Mbit now needs:

10address lines

1data line

1CAS/RAS strobe (column/row strobe)

1CS line

1R/W line

2power/ground

The 2 decoders now have only 1024 lines each as outputs.

Shift Registers

Contain several flip-flops in a row. One bit is input at one end on each clock pulse. Each other bit moves one place to the right (or left). The outputs of each flip-flop are available simultaneously.

We can use shift registers for serial to parallel conversion. Input 8 bits on 8 pulses, then read data simultaneously.

Division by 2

We can use the above principle to implement a divide by 2 circuit.

In the parallel load register (lecture 4) we can use an extra control line

shift/

to determine whether the input to a flip-flop comes from the data lines or from the previous flip-flop (the left-most flip-flop gets a 0 in shift mode)

(use the control line as the address of a 2 input multiplexor, the output is connected to the input of the flip-flop)

Try this as an exercise.

(design your own 2 input mux using AND/OR gates)

Multiplication by 2

Multiplication can be done by shifting to the left.

Higher powers of two can be done by shifting several times.

An extra flip-flop can be added to the ends of the chain.

In the case of division by 2, this flip-flop will contain the remainder.

In the case of multiplication the flip-flop will flag whether an overflow has occurred.

The 74LS75B is a parallel load/shift right register 4 bits wide, which (with external assistance) can do shift left as well.

Counters

Add one to the current number.

000

001

010

011

100

101

110

111

For the low order bit:

just swap between 0 and 1 - ie it toggles