The Basic CALF Algorithm

Verilog Implementation of CORDIC adaptive lattice filter (CALF)

Pranay Koka ()

Abstract

In this project the cordic based algorithm for an adaptive lattice filter was mapped into hardware. It involved applying techniques and algorithms like systolic array mapping, pipelining and retiming to achieve the desired objective of an efficient implementation of a the filter. The basic CALF algorithm was first analysed in-terms of the computation dependence and then the areas of improvement were spotted. The algorithm was then modified so that it could be pipelined and re-timed for speed. Systolic array mapping was also performed. This project also involved implementation of the improved version of the algorithm in verilog HDL. The filter modeled here is a 16-bit 2 – cordic stage, 2-section filter with saturation arithmetic.

Introduction and related work:

Lattice filters are used in various areas and are an important class of filters which need to be optimized for performance. This project aims at an efficient verilog implementation and synthesis of an adaptive lattice filter.

It is important to quote the related work done in this field. The design of lattice filters have been improved with the advent of CORDIC (Co-ordinate rotation digital computer) algorithms[1]. Cordic algorithms are a class of algorithms that use basic shift and add techniques to realize complex trignometric, hyperbolic and logarithmic functions. All trignometric functions can be computed using vector rotations. The cordic algorithms provide efficient methods to perform the vector rotations by arbitrary angles using only shift and adds. Because of the shift and add properties these algorithms have become attractive for an FPGA implementation.

A cordic based algorithm for an adaptive was proposed [2],[3], which used basic cordic iterative units to form a lattice section. The adaptive feedback was also provided in using a simple implementation as discussed in the later sections.

The Basic CALF algorithm:

The calf algorithm proposed in [2] is given below:

While(true)

;

For p = 1 to Pl

;

For i = 0 to n-1 Do

End

Elseif

End

t = t+1

if t > tmax then break loop

End

The important point to note here is that the logic required to generate the ‘d’ vectors can be as simple as an up-down counter. The block diagram of the lattice filter is shown in figure 1

Figure 1 block diagram of cordic adaptive lattice filter

Representation of the CALF algorithm in dependence graph:

The basic algorithm is a 3 nested loop. Analysing the inner loop and the next outer loop P, The dependence graph in Figure 2 shows a dependency between the iterations inside a section and also inter-section dependence. The DG represents a four section lattice filter with four cordic iterations in each section. As shown in the figure ith cordic iteration is dependent on the (i-1)th iteration and the first iteration of the p+1th lattice section is dependent on the last iteration of the pth section.

Figure 2 Dependence graph between iterations

Figure 3 shows the dependence between the sections of the lattice filter with respect to time. The dependence of X and Y tallies with the placement of the extra delay in the case of y as shown in the figure. The iterations in p+1th section depend on the pth section.

Figure 3 Dependence graph for inter-section dependence

Systolic array mapping:

Before deriving the mapping for systolic arrays, let us see the 3 variable form of the above described CALF algorithm.

For t = 0,1,2…..

For P = 1 to PL do

For I = 0 to N-1 do

End do

enddo

From the DG in figure 3 we have the following dependence matrix

We choose a scheduling vector

and a projection vector:

Causality constraint:

This satisfies the causality constraint.

Resource conflict avoidance:

Now

PE assignment (node mapping) :

ARC mapping:

The linear schedule is given by

The systolic array with this mapping is shown in figure 4

Figure 4 Systolic array mapping after inter-section pipelining

The systolic array in Figure4 is after coarse grain pipelining i.e. after pipelining between the sections of the lattice filter.

Piplining and retiming inter-cordic interations.

This being a stochastic gradient filter, we can add more delays to the computation of the d vector in the up-down counter without much changing the characteristicsof the filter.

Hence the algorithm will now change as follows:

For t = 0,1,2…..

For P = 1 to PL do

For I = 0 to N-1 do

End do

end do

The block diagram of a single section of the filter will now look like the one in Figure 5

Figure 5 Block diagram after addition of extra delays

The addition of these delays change the dependence of the iterations. It will be sensible to show the dependence of iterations with respect to time. Figure 6 shows the same

Figure 6 dependence graph after adding delays

Here the output quantities X and Y of the first iteration are used for the computation of d vector for the 5th iteration. This is shown by the cross dotted lines. This was what allowed us to pipeline and hence re-time the iterations. From this DG we can derive the systolic array mapping as shown in the following sub-sections.

The dependence matrix is given by

The -4 is because the first iteration at t+4 depends on the 4th (last) iteration of the at ‘t’.

Now we choose the scheduling vector and the projection vector.

Causality constraint:

This satisfies the causality constraint

Projection vector:

Resource conflict avoidance:

Hence no resource conflict.

Node mapping:

PE mapping:

ARC mapping:

Systolic array mapping:

Figure 7 Systolic array mapping after adding delays

In this mapping the 4D delays are for the d vector generation.

This can be further retimed so that the iterations can be piplined. This retimed filter section is shown in Figure 8

Figure 8 Block diagram of filter after pipelining and re-timing

Simulation insfrastructure:

The verilog HDL simulation [3][4]was done using Mentor Graphics ModelSim tool set. All port-mapping was done manually without using Renoir. The synthesizer used was synopsis design analyser.

Verilog Implementation:

The verilog modelling of this filter consists of three main components:

1)The cordic iteration

2)The up-down counter to model the generation of d vector

3)Integration of iterations and sections.

Cordic Iteration design:

The main components required for each cordic iteration is a multiplier or rather we can call it a divider to divide X and Y in powers of 2. The second component is an adder that adds the multiplied values with the original X and Y.

Since the multiplication/division is in powers of 2 we can use shifting to accomplish this.

Design of shifter:

Since each iteration requires shifting more than one bit location in one cycle, the best choice is to have a barrel shifter to do this. In available barrel shifters we need to specify the number of bits to shift and sometimes even the direction for bi-directional shifters. In this case we need not have such programmability for the shifting as the number of bits and the direction is specified in each iteration.We can thus customize the design of the barrel shifter reducing area on FPGA. The design of the barrel shifter as synthesized by design analyser is shown in the appendix.

Design of adder:

Initially the adder used was from the library of design analyser. But the draw back of these adders are that they do not have the saturation logic built in. Saturation arithmatic is very useful in DSP algorithms. Hence a ripple-carry adder with saturation logic was incorporated. Since the register width is 16bits, maximum range of values is between 7FFFH and 8000H. The logic for the saturation arithmetic is shown in Figure 9

Figure 9 Block diagram of saturation logic

Integration of the iterations and sections:

Figure 10 shows the sythesized hardware block diagram of a single section of the cordic lattice filter.This block diagram is for a two cordic iteration section. The block marked counter is the up-down counter. Iter1 and Iter2 are the hardware blocks for the cordic iterations.One-delays are the pipeline registers. The one-bit delay is the delay required for the di bit.

Figure 10 Block diagram of the 2-section lattice filter

Figure11 shows the hardware block diagram for a two stage lattice filter. The block named P-stage is a single filter section as shown in Figure 11. The one-delays are the pipeline registers.

Figure 11 Block diagram of a single section of the filter

Results:

The cordic based lattice filter was simulated and verified with the following sample calculation:

With a cordic input of 10 and -20 the first section results in the 2 an -5 and this is then propagated to the next section resulting in 10 and 1. This can be verified with the simulation output attached in the appendix. The simulation was carried out for different values and some of the traces are put in the appendix.

Dynamic range:

An important aspect to be figured out in such implementations is the dynamic range. Exhaustive simulations were run to find the input range that could be given as an input to the filter for worst case ‘d’ vectors of ‘0000’ and ‘1111’. The simulated filter was a two stage, two section filter. With a register width of 16bits, it was found that the maximum input that could be given was 15-bit signed values.

Conclusion and possible improvements:

The CALF algorithm was implemented in verilog HDL and also synthesized. The main area that can be improved is the adder. The adder can be made carry-look-ahead or can be pipelined. This though would take up more area on FPGA, it would definitely improve the speed. The scaling factor was not taken into account in this project. This would require a multiplier. Since the constant to be multiplied is pre-determined, the design of the multiplier can be optimized for speed.

References:

[1] Ray Andraka, A survey of CORDIC algorithms for FPGA based computers. Andraka

consulting group.

[2] Yu Hen Hu, On the convergence of CORDIC adaptive lattice filter (CALF) algorithm. IEEE transactions on signal processing, vol 46, No.7

[3] Y.H.Hu, H.E.Liao. A CORDIC adaptive lattice filter. IEEE transactions on signal

processing vol 40, 1992.

[4] K.Parhi. VLSI digital signal processing systems, design and implementation. John

Wiley and sons, 1999

[5] S.Palnitkar. Verilog HDL, a guide to digital design and synthesis. Prentice Hall

[6] M.D.Cilleti. Modeling, Synthesis and rapid prototyping with the Verilog HDL.

Prentice Hall, 1999

APPENDIX A

WAVE FORMS

APPENDIX B

MATLAB CODE

CALF algorithm

P = 2;

T = 2;

d = zeros(T+1,P+1);

N =2;

X = [0 256 512];

Y = [0 0 0];

f = zeros(T+1,P+1);

b = zeros(T+1,P+1);

mu =1;

%vi= zeros(N);

for j = 2:P+1

d(2,j) = 2;

end

for t = 2:T+1

f(t,1) = X(t);

b(t,1) = Y(t);

for p = 2:P+1

x(1) = f(t,p-1);

y(1) = b(t-1,p-1);

d(t,p)

vi = 2*int2bin(d(t,p),N,0)-1

for i = 1:N

x(i+1) = x(i) - vi(i)*y(i)*2^(-i-1);

y(i+1) = y(i) - vi(i)*x(i)*2^(-i-1);

end

f(t,p) = x(N+1);

b(t,p) = y(N+1);

x(N+1)

y(N+1)

if d(t,p)>0 & d(t,p)<2^N - 1

d(t+1,p) = d(t,p) + mu*sign(x(N))*sign(y(N));

elseif d(t,p) ==0 | d(t,p)==2^N - 1

d(t+1,p) = d(t,p);

end

Integer to binary conversiom

[mx,my]=size(x);

if mx > 1 & my > 1, % convert x into a column vector

disp(' x must be a vector, abort!')

break

elseif my > mx,

x=x';

end

if sb == 1

b=-(sign(x)-1)*0.5.*[x~=0]; % if x = 0, sign bit is 0 by default

else

b=[];

end

x=abs(x); % work on magnitude only from now on.

if max(x) >= 2^n,

disp(' x must be smaller than 2^n')

break

end

idx=diag(2^(diag([n-1:-1:0])));

for j=1:n

tmp = x - sign(x)*idx(j).*[x~=0];

% if x > 0, tmp = x - 2^(n-j), if x < 0, tmp=x+2^(n-j)

b= [b [tmp >= 0].*[x~=0]];

x=tmp;

endn

APPENDIX C

SYNTHESIZED

HARDWARE

BLOCK DIAGRAMS

Block diagram of lattice filter

Block diagram of a section of lattice filter

Block diagram of single cordic iteration

Schematic of barrel shifter1

Block diagram of barrel shifter2

Block diagram of ripple carry adder with saturation logic

Block diagram of up-down counter

Block diagram of single register delay

Block diagram of one bit delay

APPENDIX D

VERILOG HDL CODE

Verilog Implementation of
Cordic based Adaptive Lattice Filter
(CALF)

Pranay Koka

ECE 734 – Course Project

Guided By – Dr. Yu Hen Hu