Engineers Are Trained to Design Chips Where Their First Consideration Is Getting Work Done

Seminar

Clockless Chips

Date: October 25, 2005.

Presented by:

K. Subrahmanya Sreshti.

(05IT6004).

ABSTRACT

The clockless approach, which uses a technique known as asynchronous logic, differs from conventional computer circuit design in that the switching on and off of digital circuits is controlled individually by specific pieces of data rather than by a tyrannical clock that forces all of the millions of circuits on a chip to march in unison. It overcomes major disadvantages of clocked circuits, such as slow speed, high power consumption, and high electromagnetic noise. For these reasons, clockless technology is considered likely to drive the majority of electronic chips in the coming years.

Introduction

Over the years, the designers of microprocessors have resorted to all sorts of tricks to make their products run faster. Modern chips, for example, queue up several instructions in a “pipeline” and analyze them to see if switching the order in which they are executed can produce the correct result, only more quickly.

After a point, cranking up the clock speed becomes an exercise in diminishing returns. That's why a one-gigahertz chip doesn't run twice as fast as a 500-megahertz chip. The clock, through the work it must do to coordinate millions of transistors on a chip, generates its own overhead. The faster the clock, the greater the overhead becomes. The clock in a state-of-the-art microprocessor can consume up to 30 percent of the chip's computing capability, with that percentage increasing at an ever faster rate as clock speeds increase.

Faced with these diminishing returns, chip designers are dusting off two technologies, multi-threading and asynchronous logic, that were both invented decades ago. At the time, neither was competitive with conventional designs, but important uses have since emerged for each of them. Multi-threading can increase the performance of database and web servers, while asynchronous logic is ideal for wireless devices and smart cards.

Problems with Synchronous Approach

The synchronous approach predominated, largely because it is easier to design chips in which things happen only when the clock ticks.

As chips get bigger, faster and more complicated, distributing the clock signal around the chip becomes harder. Another drawback with clocked designs is that they waste a lot of energy, since even inactive parts of the chip have to respond to every clock tick. Clocked chips also produce electromagnetic emissions at their clock frequency, which can cause radio interference.

Each tick must be long enough for signals to traverse even a chip’s longest wires in one cycle. However, tasks performed on parts of a chip that are close together finish well before the cycle ends but can’t move on until the next tick. As chips get bigger and more complex, it becomes more difficult for ticks to reach all elements, particularly as clocks get faster.

In today's chips, the clock remains the key part of the action. As a microprocessor performs a given operation, electronic signals travel along microscopic strips of metal, forking, intersecting again, and encountering logic gates, until they finally deposit the results of the computation in a temporary memory bank called a register. Let's say you want to multiply 4 by 6. If you could slow down the chip and peek into the register as this calculation was being completed, you might see the value changing many times, say, from 4 to 12 to 8, before finally settling down into the correct answer. That's because the signals transmitted to perform the operation travel along many different paths before arriving at the register; only after all signals have completed their journey is the correct value assured. The role of the clock is to guarantee that the answer will be ready at a given time. The chip is designed so that even the slowest path through the circuit (the path with the longest wires and the most gates) is guaranteed to reach the register within a single clock tick.

The chip’s clock is an oscillating crystal that vibrates at a regular frequency, depending on the voltage applied. This frequency is measured in gigahertz or megahertz. All the chip’s work is synchronized via the clock, which sends its signals out along all circuits and controls the registers, the data flow, and the order in which the processor performs the necessary tasks. An advantage of synchronous chips is that the order in which signals arrive doesn’t matter. Signals can arrive at different times, but the register waits until the next clock tick before capturing them. As long as they all arrive before the next tick, the system can process them in the proper order. Designers thus don’t have to worry about related issues, such as wire lengths, when working on chips. And it is easier to determine the maximum performance of a clocked system. With these systems, calculating performance simply involves counting the number of clock cycles needed to complete an operation.

Calculating performance is less well defined with asynchronous designs.

The clocks themselves consume power and produce heat. In addition, in synchronous designs, registers use energy to switch so that they are ready to receive new data whenever the clock ticks, whether they have inputs to process or not. In asynchronous designs, gates switch only when they have inputs.
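The difference in switching activity can be illustrated with a small count. This is only a sketch: the activity trace and the function names are invented for illustration, and a real power estimate would depend on much more than the number of activations.

```python
def synchronous_activations(trace):
    """Every register responds on every tick, with or without new input."""
    return sum(len(tick) for tick in trace)

def asynchronous_activations(trace):
    """An element switches only at the steps where it actually has input."""
    return sum(1 for tick in trace for has_input in tick if has_input)

# Hypothetical activity trace: rows are time steps, columns are three elements.
# True means the element has new input at that step.
trace = [
    [True, False, False],
    [False, False, False],
    [True, True, False],
    [False, False, True],
]

print(synchronous_activations(trace))   # 12
print(asynchronous_activations(trace))  # 4
```

In this made-up trace the clocked design switches three times as often as the data-driven one, because most elements are idle at most steps.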

The job of coordinating tens of millions of transistors at a billion ticks per second requires the consumption of a lot of energy, most of which ends up as heat. Patrick Gelsinger, chief technology officer at Intel, referred to the problem in his keynote speech at the International Solid-State Circuits Conference last February. Gelsinger was only half-joking when he said that if microprocessors continue to be run by ever-faster clocks, then by 2005 a chip will run as hot as a nuclear reactor.

Throwing out the clock changes the fundamental way that chips have organized and executed their work. For instance, within every one-gigahertz microprocessor, there lies an oscillating crystal ticking one billion times a second. Engineers are trained to design chips so that their first consideration is getting work done before the next clock tick comes around. For most chip designers, throwing out the clock is difficult to imagine.

The clock establishes a timing constraint within which all chip elements must work, and constraints can make design easier by reducing the number of potential decisions.

Asynchronous logic circuits (Stop the clocks)

As its name suggests, asynchronous logic does away with the cardinal rule of chip design: that everything marches to the beat of an oscillating crystal “clock”. For a 1GHz chip, this clock ticks one billion times a second, and all of the chip’s processing units co-ordinate their actions with these ticks to ensure that they remain in step. Asynchronous, or “clockless”, designs, in contrast, allow different bits of a chip to work at different speeds, sending data to and from each other as and when appropriate.

Clockless processors, also called asynchronous or self-timed, don’t use the oscillating crystal that serves as the regularly “ticking” clock that paces the work done by traditional synchronous processors. Rather than waiting for a clock tick, clockless-chip elements hand off the results of their work as soon as they are finished.

Figure 1.

How clockless chips work

There are no purely asynchronous chips yet. Instead, today’s clockless processors are actually clocked processors with asynchronous elements. Clockless elements use perfect clock gating, in which circuits operate only when they have work to do, not whenever a clock ticks. Instead of clock-based synchronization, local handshaking controls the passing of data between logic modules. The asynchronous processor places the location of the stored data it wants to read onto the address bus and issues a request for the information. The memory reads the address off the bus, finds the information, and places it on the data bus. The memory then acknowledges that it has read the data. Finally, the processor grabs the information from the data bus.
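The read sequence described above can be sketched as a request/acknowledge exchange. This is a minimal software model, not a circuit description: the Memory class, the bus dictionary and its field names are assumptions made for illustration, and the steps run as ordinary sequential function calls rather than as concurrent hardware.

```python
class Memory:
    """Toy memory that answers a request with data and an acknowledge."""

    def __init__(self, contents):
        self.contents = contents

    def handle_request(self, bus):
        # Read the address off the address bus and place the data on the data bus.
        bus["data"] = self.contents[bus["address"]]
        # Signal that the data on the bus is now valid.
        bus["ack"] = True


def read(memory, address, log):
    """Perform one handshaked read and record the order of events in log."""
    bus = {"address": None, "data": None, "req": False, "ack": False}
    bus["address"] = address         # processor places the address on the bus
    bus["req"] = True                # processor issues a request
    log.append("request")
    memory.handle_request(bus)       # memory finds the data and acknowledges
    log.append("acknowledge")
    assert bus["ack"], "data must not be read before the acknowledge"
    value = bus["data"]              # processor grabs the data from the bus
    bus["req"] = bus["ack"] = False  # both signals return to idle
    log.append("idle")
    return value


events = []
memory = Memory({0x10: 42})
print(read(memory, 0x10, events))  # 42
print(events)                      # ['request', 'acknowledge', 'idle']
```

No step waits for a clock tick; each one proceeds as soon as the previous signal has been raised.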

According to Jorgenson, “Data arrives at any rate and leaves at any rate. When the arrival rate exceeds the departure rate, the circuit stalls the input until the output catches up.”
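The stalling behaviour in this quotation can be sketched with a bounded buffer. The Stage class, its capacity and the step-based loop are assumptions made for illustration; they model the idea of stalling the input, not any particular asynchronous circuit.

```python
from collections import deque


class Stage:
    """Toy pipeline stage with a fixed amount of storage."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()

    def offer(self, item):
        """Accept the item if there is room; otherwise refuse (stall the sender)."""
        if len(self.items) >= self.capacity:
            return False
        self.items.append(item)
        return True

    def take(self):
        """Remove and return the oldest item, or None if the stage is empty."""
        return self.items.popleft() if self.items else None


def run(arrivals_per_step, departures_per_step, steps, capacity=2):
    """Return (items delivered, number of times the input was stalled)."""
    stage = Stage(capacity)
    pending = delivered = stalls = next_item = 0
    for _ in range(steps):
        pending += arrivals_per_step
        while pending:
            if not stage.offer(next_item):
                stalls += 1   # input waits until the output catches up
                break
            next_item += 1
            pending -= 1
        for _ in range(departures_per_step):
            if stage.take() is not None:
                delivered += 1
    return delivered, stalls


print(run(arrivals_per_step=2, departures_per_step=1, steps=4))  # (4, 3)
print(run(arrivals_per_step=1, departures_per_step=2, steps=4))  # (4, 0)
```

When arrivals outpace departures the input is stalled but nothing is lost; when departures keep up, no stall occurs.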

The many handshakes themselves require more power than a clock’s operations. However, clockless systems more than offset this because, unlike synchronous chips, each circuit uses power only when it performs work.

Clockless advantages

In synchronous designs, the data moves on every clock edge, causing voltage spikes. In clockless chips, data doesn’t all move at the same time, which spreads out current flow, thereby minimizing the strength and frequency of spikes and emitting less EMI. Less EMI reduces both noise-related errors within circuits and interference with nearby devices.

Power efficiency, responsiveness, and robustness

Because asynchronous chips have no clock and each circuit powers up only when used, asynchronous processors use less energy than synchronous chips by providing only the voltage necessary for a particular operation.

According to Jorgenson, clockless chips are particularly energy-efficient for running video, audio, and other streaming applications, which are data-intensive programs that frequently cause synchronous processors to use considerable power. Streaming data applications have frequent periods of dead time (such as when there is no sound or when video frames change very little from their immediate predecessors) and little need for running error-correction logic. During this inactive time, asynchronous processors don’t use much power.

Clockless processors activate only the circuits needed to handle data, so they leave unused circuits ready to respond quickly to other demands.

Asynchronous chips run cooler and have fewer and lower voltage spikes. Therefore, they are less likely to experience temperature-related problems and are more robust. Because they use handshaking, clockless chips give data time to arrive and stabilize before circuits pass it on. This contributes to reliability because it avoids the rushed data handling that central clocks sometimes necessitate, according to University of Manchester Professor Steve Furber, who runs the Amulet project.

Simple, efficient design

Logic modules could be developed without regard to compatibility with a central clock frequency, which makes the design process easier. Also, because asynchronous processors don’t need specially designed modules that all work at the same clock frequency, they can use standard components. This enables simpler, faster design and assembly.

The recent use of both domino logic and the delay-insensitive mode in asynchronous processors has created a fast approach known as integrated pipelines mode.

Domino logic improves performance because a system can evaluate several lines of data at a time in one cycle, as opposed to the typical approach of handling one line in each cycle. Domino logic is also efficient because it acts only on data that has changed during processing, rather than acting on all data throughout the process. The delay-insensitive mode allows an arbitrary time delay for logic blocks. “Registers communicate at their fastest common speed. If one block is slow, the blocks that it communicates with slow down,” said Jorgenson. This gives a system time to handle and validate data before passing it along, thereby reducing errors.

Advantages of clockless chips

A clocked chip can run no faster than its most slothful piece of logic; the answer isn't guaranteed until every part completes its work. By contrast, the transistors on an asynchronous chip can swap information independently, without needing to wait for everything else. The result? Instead of the entire chip running at the speed of its slowest components, it can run at the average speed of all components. At both Intel and Sun, this approach has led to prototype chips that run two to three times faster than comparable products using conventional circuitry.
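The gap between slowest-component speed and average speed can be made concrete with a little arithmetic. The delay values below are invented for illustration, and the model ignores handshake overhead, so it is a sketch of the argument rather than a performance prediction.

```python
# Hypothetical completion times, in picoseconds, for five operations in sequence.
op_delays_ps = [400, 500, 1000, 300, 800]


def synchronous_total_ps(delays):
    """Each operation is allotted a full clock period set by the slowest one."""
    return len(delays) * max(delays)


def asynchronous_total_ps(delays):
    """Each operation hands off as soon as it finishes (overhead ignored)."""
    return sum(delays)


sync_time = synchronous_total_ps(op_delays_ps)
async_time = asynchronous_total_ps(op_delays_ps)
print(sync_time)                          # 5000
print(async_time)                         # 3000
print(f"{sync_time / async_time:.2f}x")   # 1.67x
```

With these made-up numbers the clocked version pays for the 1000 ps worst case five times, while the self-timed version pays only for the work actually done.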

Clockless chips draw power only when there is useful work to do, enabling huge savings in battery-driven devices; an asynchronous-chip-based pager marketed by Philips Electronics, for example, runs almost twice as long as competitors' products, which use conventional clocked chips.

Asynchronous chips use 10 percent to 50 percent less energy than synchronous chips, in which the clocks are constantly drawing power. That makes them ideal for mobile communications applications, which usually need low power sources. The chips' quiet nature also makes them more secure, as typical hacking techniques involve listening to clock ticks.

Another advantage of clockless chips is that they give off very low levels of electromagnetic noise. The faster the clock, the more difficult it is to prevent a device from interfering with other devices; dispensing with the clock all but eliminates this problem. The combination of low noise and low power consumption makes asynchronous chips a natural choice for mobile devices. "The low-hanging fruit for clockless chips will be in communications devices," starting with cell phones.

Asynchronous logic would offer better security than conventional chips: "The clock is like a big signal that says, Okay, look now," says Fant. "It's like looking for someone in a marching band. Asynchronous is more like a milling crowd. There's no clear signal to watch. Potential hackers don't know where to begin."

Analyzing the power consumed on each clock tick allows details of a chip’s inner workings to be deduced, and this can be used to crack the encryption on existing smart cards. Such an attack would be far more difficult on a smart card based on asynchronous logic.

Asynchronous chips can perform encryption in a way that is harder to identify and to crack. Improved encryption makes asynchronous circuits an obvious choice for smart cards, the chip-endowed plastic cards beginning to be used for such security-sensitive applications as storage of medical records, electronic funds exchange and personal identification.

Ivan Sutherland of Sun Microsystems, who is regarded as the guru of the field, believes that such chips will have twice the power of conventional designs, which will make them ideal for use in high-performance computers. But Dr Furber suggests that the most promising application for asynchronous chips may be in mobile wireless devices and smart cards.

Different styles:

There are several styles of asynchronous design. Conventional chips represent the zeroes and ones of binary digits (“bits”) using low and high voltages on a particular wire.

One clockless approach, called “dual rail”, uses two wires for each bit. Sudden voltage changes on one of the wires represent a zero, and on the other wire a one.
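The dual-rail idea can be sketched as follows. This is a toy model under stated assumptions: the function names are invented, wire levels are represented as 0 and 1, and a "sudden voltage change" is modelled as a toggle of one wire between successive states.

```python
def encode(bits):
    """Return successive (wire0, wire1) levels; exactly one wire toggles per bit."""
    wire0 = wire1 = 0
    states = [(wire0, wire1)]
    for bit in bits:
        if bit == 0:
            wire0 ^= 1   # a transition on wire 0 represents a zero
        else:
            wire1 ^= 1   # a transition on wire 1 represents a one
        states.append((wire0, wire1))
    return states


def decode(states):
    """Recover the bits by checking which wire changed between states."""
    bits = []
    for (a0, a1), (b0, b1) in zip(states, states[1:]):
        if a0 != b0 and a1 == b1:
            bits.append(0)
        elif a1 != b1 and a0 == b0:
            bits.append(1)
        else:
            raise ValueError("exactly one wire must change per bit")
    return bits


states = encode([1, 0, 0, 1])
print(states)          # [(0, 0), (0, 1), (1, 1), (0, 1), (0, 0)]
print(decode(states))  # [1, 0, 0, 1]
```

Because every bit produces a transition on one of the two wires, the receiver can tell when a new bit has arrived without consulting a clock.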