at40k-fft ATMEL Corporation, at40k-fft Datasheet - Page 4

no-image

at40k-fft

Manufacturer Part Number
at40k-fft
Description
Fast Fourier Transform Intellectual Property Core At40k Fpgas
Manufacturer
ATMEL Corporation
Datasheet
Figure 4. Architecture of the butterfly.
Figure 4 shows the internal architecture of the 2 clock cycle
butterfly. From the input the data flow splits into two main
paths. The upper one computes A’=(A+B)/2 and the lower
one B’=(A-B)xW
are then multiplexed onto the output data bus using tristate
buffers.
In upper path the input data is fed into an accumulator. This
sums it every two cycles, divides the result by two and
rounds to 12 bits, giving A’=(A+B)/2. The result is then sent
through a short register based delay to align it with the
result from the slightly longer lower data path used to com-
pute B’.
In the B’ data path the input data is fed into a differencing
circuit which computes –A+B. The real and imaginary 13 bit
results are then fed through a cross bar switch which pre-
sents the difference of the imaginary components (-
Im(A)+Im(B)) to both 13x12 bit signed multipliers in the first
cycle, and the difference of the real components (-
Re(A)+Re(B)) in the second. In the first cycle both multipli-
ers multiply by the imaginary component of the twiddle fac-
tor (Im(WT)) and in the second by the negated real
component (-Re(WT)). (Note that the reason for negating
the real component of the twiddle factor is discussed in
section Twiddle factor ROM.)
The outputs of the two multipliers are therefore Im(-
A+B)xIm(WT), Re(-A+B)x-Re(WT) and Re(-A+B)xIm(WT),
Im(-A+B)x-Re(WT). The 25 bit results from the multipliers
are then truncated to 14 bits. One data stream is fed into an
accumulator which sums the results, divides by two and
truncates the result to 12 bits with rounding, giving,
The other is fed into a differencing circuit which subtracts
the two results, divides by two and truncates the result to
12 bits with rounding, yielding,
4
Re(B’)=(Im(-A+B)xIm(W
Re(W
T
))/2=Re((A-B)xW
T
/2. The results from these two data paths
Re( A ), Re( B )
Im ( A ), Im( B )
AT40K-FFT
T
T
).
)+Re(-A+B)x
1 2
1 2
Im ( W
1 3
1 3
1 2
1 2
Cross Bar
T
), -Re( W
T
)
Delay
Delay
1 3
1 3
To improve performance the butterfly is more heavily pipe-
lined than is shown in Figure 4. This results in an input to
output latency for the butterfly of 9 cycles.
Twiddle factor ROM
The twiddle factor ROM is organized as 256 x 12 bit words.
Each of the 128 complex twiddle factors is stored in the
ROM with alternate words being used for the real and
imaginary components.
To improve the accuracy of the most common twiddle fac-
tor, i.e. W
are negated in the ROM. Thus W
than 0.995; 0.995 being the closest approximation to 1 per-
mitted by a 12 bit two’s complement fractional integer.
The twiddle factor ROM was generated using Atmel’s
macro generator. The input data file for the macro genera-
tor was generated using a MatLab script.
Data RAM
The data RAM is configured as two 256 x 12 bit dual ported
memories. One is used to hold the real components and
the other to hold the imaginary components.
The availability of dual ported, as opposed to single ported,
memory significantly improves the utilization of on chip
RAM. Use of single ported RAM would have required twice
as much memory to permit the concurrent data fetches and
stores required by the butterfly.
To achieve maximum memory efficiency the FFT algorithm
uses “in place” computation, i.e. the data stored after each
butterfly computation is placed back in the same locations
that the input data was read from. This requirement has no
impact on performance, but does somewhat complicate the
design of the data address generators.
Im(B’)=(-Re(-A+B)xIm(W
Re(W
1 4
1 4
T
0
))/2=Im((A-B)xW
=1, the real components of the twiddle factors
1 2
1 2
1 2
1 2
1 2
1 2
T
Re( A' ), Re( B' )
).
Im( A' ), Im( B' )
T
)+Im(-A+B)x-
0
is stored as –1 rather

Related parts for at40k-fft