AN2768 Freescale Semiconductor / Motorola, AN2768 Datasheet

no-image

AN2768

Manufacturer Part Number
AN2768
Description
Implementation of a 128-Point FFT on the MRC6011 Device
Manufacturer
Freescale Semiconductor / Motorola
Datasheet
© Freescale Semiconductor, Inc., 2004. All rights reserved.
Freescale Semiconductor
Application Note
The Fast Fourier Transform (FFT) is an efficient way to compute the
Discrete-time Fourier Transform (DFT) by exploiting symmetry and
periodicity in the DFT. Because it is so efficient, the algorithm is
implemented on many DSPs and hardware platforms for real-time
applications. FFT applications include not only DSP but also spectrum
analysis, speech processing, and filter designs where filter coefficients
are determined according to the frequency response of the filter. The
frequency response of a filter can be obtained by taking the Discrete
Fourier Transform of its impulse response. Conversely, given the
frequency response samples in the frequency domain, the time domain
impulse response can be computed by taking the inverse DFT. For any
discrete time sequences, the frequency components or spectral
components can be obtained by taking the FFT on the discrete time
sequences.
This application note describes how a 128-point FFT is implemented
on the Freescale Semiconductor MRC6011 reconfigurable compute
fabric (RCF ) device, which has an array of processors working in
parallel. FFT butterflies are normally performed sequentially.
However, the MRC6011 device has 16 processors that can perform 16
simultaneous butterflies. The RCF programming techniques to
achieve this parallelism are the subject of this application note. We
begin with a look at the parallel RCF architecture, which is the basis
for the parallel butterfly operations on the RC array. Also, we describe
the effects of finite-precision arithmetic relative to the MRC6011
device and the FFT results.
Implementation of a 128-Point FFT on
the MRC6011 Device
by Zhao Li, Hirokazu Higa, and Ed Martinez
CONTENTS
1
1.1
1.2
1.3
2
2.1
2.2
3
3.1
3.2
3.3
3.4
3.5
3.6
4
4.1
4.2
4.3
5
6
7
Decimation in Time and in Frequency ......................2
Radix-2 and Radix-4 ..................................................4
Bit-reversal .................................................................6
Frame Buffer ..............................................................8
RC Array ....................................................................9
Data Organization for Parallel Computing ................9
Input Data Storage in Frame Buffer ..........................9
Twiddle Factor Storage in Frame Buffer .................10
Transposition for Bit Reversal .................................11
Post-Transpose Data Organization ..........................15
Parallel Butterfly Operations ...................................15
Input Data Analysis .................................................23
Butterfly Output Scaling ..........................................24
Rounding Operation on RC .....................................26
Basics of Fast Fourier Transform ............................. 2
MRC6011 Architecture Overview ............................ 6
FFT on the MRC6011 Device .................................. 9
Fixed Point and Precision Issues ............................ 22
Performance and Error Analysis ............................. 26
Summary ................................................................. 28
References ............................................................... 29
Rev. 0, 7/2004
AN2768

Related parts for AN2768

AN2768 Summary of contents

Page 1

... Input Data Storage in Frame Buffer ..........................9 3.3 Twiddle Factor Storage in Frame Buffer .................10 3.4 Transposition for Bit Reversal .................................11 3.5 Post-Transpose Data Organization ..........................15 3.6 Parallel Butterfly Operations ...................................15 4 Fixed Point and Precision Issues ............................ 22 4.1 Input Data Analysis .................................................23 4.2 Butterfly Output Scaling ..........................................24 4.3 Rounding Operation on RC .....................................26 5 Performance and Error Analysis ............................. 26 6 Summary ................................................................. 28 7 References ............................................................... 29 AN2768 ...

Page 2

Basics of the Fast Fourier Transform 1 Basics of the Fast Fourier Transform The DFT is the basis of the fast Fourier transform. The DFT of a finite-length sequence of length N is defined as follows − π ...

Page 3

N/2-Point DFT x[4] x[6] x[1] x[3] N/2-Point DFT x[5] x[7] Figure 1. Decimation in Time of an N-Point DFT into Two (N/2)-Point DFT ( x[0] x[4] x[2] x[6] Figure 2. Decimation in Time of an (N/2)-Point ...

Page 4

Basics of the Fast Fourier Transform X[0] X[1] X[2] X[3] X[4] X[5] X[6] x[7] Figure 3. Decimation in Frequency of an N-Point DFT into Two (N/2)-Point DFT ( Similar to DIT, the DIF principle is illustrated in the ...

Page 5

Figure 5 shows a radix-4 butterfly the FFT number of points is a power of 2, the successive decomposition results in 2-point butterflies. If the number of ...

Page 6

MRC6011 Architecture Overview If we define = C cos( The 2-point butterfly output in Equation 6 and Equation 7 can be formulated as follows Where This relationship in Equation 12 ...

Page 7

MRC6011 device delivers a peak performance of 24 giga complex correlations per second with a sample resolution of 8 bits for I and Q inputs each– giga complex correlations per second at a resolution of ...

Page 8

MRC6011 Architecture Overview The RC controller executes the main control process of an application and schedules every execution cycle of the processing array. The actual array instructions reside in the context memory. The wide interconnect path between the context memory ...

Page 9

RC Array The 16-element RC array operates using the same clock as the RC controller that schedules array operations. Each RC can complete a MAC operation, so the array performs a maximum of 16 MACs on each core clock ...

Page 10

The FFT on the MRC6011 Device Real Imaginary Figure 9. Input Data Storage in Frame Buffer Since the input data is transposed to produce the bit-reversed input for a decimation in time FFT, the initial data organization in the frame ...

Page 11

The imaginary (or sine) part of the twiddle factors is stored immediately after the real (or cosine) part of the twiddle factors. Each part of the twiddle factors has an associated pointer that is updated during the butterfly operations only ...

Page 12

The FFT on the MRC6011 Device At the end of the real input data loading, the input data are interleaved and are ready for the final shuffling to complete the transpose operation. Figure 11 shows the layout of the data ...

Page 13

The shuffling that brings the data format from that in Figure 11 to that in Figure 12 uses the connectivity of the RCs within quadrants and the available fast lanes between the quadrants of the RCs. Eight cycles are required ...

Page 14

The FFT on the MRC6011 Device MORPHO{//5.cycle CELL{*,0}:R4 CELL{*,1}:R5 CELL{*,2}:R0 CELL{*,3}:R1 CELL{*,4}:R6 CELL{*,5}:R7 CELL{*,6}:R2 CELL{*,7}:R3 } MORPHO{//6.cycle CELL{*,0}:R5 CELL{*,1}:R4 CELL{*,2}:R1 CELL{*,3}:R0 CELL{*,4}:R7 CELL{*,5}:R6 CELL{*,6}:R3 CELL{*,7}:R2 } MORPHO{//7.cycle CELL{*,0}:R2 R12=BYP{R0{*,4}};//Expr lane right->left CELL{*,1}:R3 R13=BYP{R1{*,5}};//Expr lane right->left CELL{*,2}:R6 R8 =BYP{R2{*,0}}; CELL{*,3}:R7 R9 ...

Page 15

Post-Transpose Data Organization The 128 complex data are divided into eight groups, each with 16 complex numbers, as shown in Figure 13. Simultaneous butterfly operations are performed in stages within groups and between groups to reach the final FTT ...

Page 16

The FFT on the MRC6011 Device Stage 1 x[ x[ x[ x[ Figure 14. 8-point DIT FFT Butterfly Diagram The ...

Page 17

Example 3. Parallel Butterfly Operations without Scaling Down _DEC_CIRCULAR_BUFFER_SELECT = 0; _DEC_AUTOINCREMENT0 = (unsigned long)psiFBInputData; /* Stage1 */ /* G1,G2[Re,Im] = k1[R4,R0], k2[R5,R1], tmp = R8,R9 */ CELL{*,*} R13 = MULSIH{FB{psiFBInputTwiddleCos, 0, OMEGA_BR2, COL_BUS, WORD},R5, R14} << 1; CELL{*,*} R12 ...

Page 18

The FFT on the MRC6011 Device As indicated in the code segment of Example 3, the outputs of the butterflies go into R8, R9, R4 all 16 cells of the RC array, which are addressed as R8{*,*}. Stage ...

Page 19

Cell → 1 R8{0,*} G[ 0]r G[ 32]r R9{0,*} G[ 0]i G[ 32]i G1 R4{0,*} G[ 64]r G[ 96]r R0{0,*} G[ 64 G64 G32 G96 G16 G80 G48 G112 G8 G0 G32 G64 G96 G16 ...

Page 20

The FFT on the MRC6011 Device Cell → 1 R8{0,*} H[ 0]r H[ 64]r R9{0,*} H[ 0]i H[ 64]i G1 R4{0,*} H[ 32]r H[ 96]r R0{0,*} H[ 32 H32 H64 H96 H16 H48 H80 H0 ...

Page 21

Cell → 1 R8{0,*} I[ 0]r I[ 64]r R9{0,*} I[ 0]i I[ 64]i G1 R4{0,*} I[ 16]r I[ 80]r R0{0,*} I[ 16 I16 I64 I80 I0 I8 I64 I72 I32 Figure 18. Data Regrouping for ...

Page 22

Fixed-Point and Precision Issues MORPHO{ CELL{*, BYP{R8}; CELL{*, BYP{R8}; CELL{*, BYP{R8}; CELL{*, BYP{R8}; CELL{*, BYP{R0}; CELL{*, BYP{R0}; CELL{*, BYP{R0}; CELL{*, BYP{R0}; } MORPHO{ CELL{*,0} ...

Page 23

Figure 20. Vector Representation of the DIT Butterfly Figure 21 shows that the largest magnitude occurs when the vectors B ´ B and A point in the same direction ´ A and Figure ...

Page 24

Fixed-Point and Precision Issues at any time equal their magnitude (they become either purely real or purely imaginary). Therefore, the magnitude of the FFT input data must be limited by input equals the magnitude and therefore must satisfy We can ...

Page 25

Example 5. Scale Down Approach Applied to Stage 1 Butterfly for Group 1 and 2 Data /* Result of G1,G2[Re,Im] = k1[R8,R9], k2[R4,R0 Stage1 */ /* G1,G2[Re,Im] = k1[R4,R0], k2[R5,R1], tmp = R8,R9 */ CELL{*,*} R13 = MULSIH{FB{psiFBInputTwiddleCos, ...

Page 26

Performance and Error Analysis • The right 1 bit shift at the end of the R13, R12 addition. • Two additional multiplication cycles to scale down R4 and R0 for adjustment. • One NOP cycle as a result of the ...

Page 27

Table 4. RCF Cycle Counts for Separate Runs of the 128-Point FFT Kernel Name FFT128_Transpose FFT128_Stage1_6 FFT128_Stage7 TOTAL Table 5 lists the frame buffer memory requirements of the code. The RCF frame buffer size ( sufficient to ...

Page 28

Summary Where flt indicates a MATLAB® floating-point FFT result and fix indicates an RCF result. The multiplier of 128 is needed because of the scale down by 2 operation at each stage of the butterfly operation. Figure 24 shows the ...

Page 29

Input Sine Wave with Normalized Frequency of 0.1 Hz × –1 – Absolute Error of FFT 128 by RCF as Compared to Matlab Figure 25. Input ...

Page 30

References Implementation of a 128-Point FFT on the MRC6011 Device, Rev Freescale Semiconductor ...

Page 31

NOTES: Implementation of a 128-Point FFT on the MRC6011 Device, Rev. 0 Freescale Semiconductor References 31 ...

Page 32

... Learn More: For more information about Freescale Semiconductor products, please visit http://www.freescale.com AN2768 Rev. 0 6/2004 Information in this document is provided solely to enable system and software implementers to use Freescale Semiconductor products. There are no express or implied copyright licenses granted hereunder to design or fabricate any integrated circuits or integrated circuits based on the information in this document ...

Related keywords