AN2094 Freescale Semiconductor / Motorola, AN2094 Datasheet - Page 29

Manufacturer Part Number: AN2094
Description: ITU-T G.729 Implementation on StarCore SC140
Manufacturer: Freescale Semiconductor / Motorola

The reference C code uses ‘greater or equal’ to compare values in its search for the maximum correlation, but the
SC140 has no such instruction. The ‘greater or equal’ comparison was therefore replaced with ‘greater’ so that the
compiler would generate more efficient code, and the comparison order was reversed to retain the same tie-breaking
bias as the original code. (The original code applied the ‘greater or equal’ comparison from the maximum to the
minimum lag value, so a later, smaller lag with an equal correlation replaced the current one, thus favoring the
minimum lag. Applying the ‘greater’ comparison to the lags in order from minimum to maximum also favors the minimum
lag, because a later, larger lag with an equal correlation never replaces the current one.)
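The equivalence of the two scan orders can be checked with a minimal C sketch. It assumes the correlation for each lag has already been computed into an array indexed by lag; the function names `argmax_ge_descending` and `argmax_gt_ascending` are hypothetical, and this is not the reference Lag_max() code.

```c
#include <assert.h>
#include <stdint.h>

/* Reference-style scan: ">=" from the largest lag down to the smallest.
   An equal value found later (at a smaller lag) replaces the current
   winner, so ties resolve to the smallest lag. */
static int argmax_ge_descending(const int32_t *corr, int lag_min, int lag_max)
{
    assert(lag_min <= lag_max);
    int32_t max = INT32_MIN;
    int best = lag_max;
    for (int lag = lag_max; lag >= lag_min; lag--) {
        if (corr[lag] >= max) {
            max = corr[lag];
            best = lag;
        }
    }
    return best;
}

/* Rewritten scan: strict ">" from the smallest lag up to the largest.
   An equal value found later (at a larger lag) never replaces the
   current winner, so ties again resolve to the smallest lag. */
static int argmax_gt_ascending(const int32_t *corr, int lag_min, int lag_max)
{
    assert(lag_min <= lag_max);
    int32_t max = INT32_MIN;
    int best = lag_min;
    for (int lag = lag_min; lag <= lag_max; lag++) {
        if (corr[lag] > max) {
            max = corr[lag];
            best = lag;
        }
    }
    return best;
}
```

For example, with equal maxima at lags 3, 5 and 7 in the range 1 to 8, both functions return 3.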
The Lag_max() code resulting from these function-level changes is listed in Appendix A. Because four
correlations are computed in parallel, four comparisons must be performed inside the outer loop to determine the
maximum value. The compiler generated efficient code for the inner loop (four MACs and two moves in the same
execution set), but did not generate the best possible code for comparisons, so this became the focus of assembly
optimization.
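The loop structure described above (four correlations per outer iteration, followed by four comparisons) can be sketched in C as follows. This is a hypothetical illustration, not the Appendix A listing: the function name `lag_search4` is invented, it uses plain 32-bit arithmetic instead of the saturating basic operators (such as L_mac) used by the reference code, and it assumes the inputs are small enough that the sums do not overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: correlate sig[0..n-1] with the signal delayed by
   lag, for four consecutive lags per outer iteration. The caller must
   provide at least lag_max samples of history before sig[0]. */
static int lag_search4(const int16_t *sig, int n, int lag_min, int lag_max,
                       int32_t *max_out)
{
    assert((lag_max - lag_min + 1) % 4 == 0);
    int32_t max = INT32_MIN;
    int best = lag_min;

    for (int lag = lag_min; lag <= lag_max; lag += 4) {
        int32_t c0 = 0, c1 = 0, c2 = 0, c3 = 0;

        /* Inner loop: four multiply-accumulates per sample, one per lag. */
        for (int i = 0; i < n; i++) {
            c0 += (int32_t)sig[i] * sig[i - lag];
            c1 += (int32_t)sig[i] * sig[i - lag - 1];
            c2 += (int32_t)sig[i] * sig[i - lag - 2];
            c3 += (int32_t)sig[i] * sig[i - lag - 3];
        }

        /* Four comparisons per outer iteration, in increasing lag order,
           using strict ">" so ties keep the smaller lag. */
        if (c0 > max) { max = c0; best = lag; }
        if (c1 > max) { max = c1; best = lag + 1; }
        if (c2 > max) { max = c2; best = lag + 2; }
        if (c3 > max) { max = c3; best = lag + 3; }
    }

    *max_out = max;
    return best;
}
```

The four comparisons form a serial chain through `max`, since each one depends on the result of the previous one; this dependency is what makes the comparison block, rather than the inner loop, the bottleneck.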
4.3.2 Assembly Implementation
The initial assembly version was developed from the optimized C version, without optimizing the four sequential
comparison blocks. Fewer than 100 cycles were gained, and the code was quite similar to the code generated by the
compiler, which confirmed that the compiler generated nearly optimal code.
The comparison block, which initially used one variable to track the maximum value, required 8 cycles. Using two
variables to track two separate maxima theoretically reduces this to 5 cycles, and software pipelining brings the
effective cycle count closer to 4. One additional cycle is required to compare the two maxima with each other. This
technique does not compile well from C, so it was implemented in assembly, as shown in Code Example 8.
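The bookkeeping behind Code Example 8 can be modeled in plain C as a sketch of the logic only; it does not reproduce the SC140 cycle behavior. The names `running_max`, `update_max` and `argmax_two_maxima` are hypothetical, and the correlations are assumed to sit in an array rather than in registers d4 to d7. Consecutive lags alternate between two running maxima, mirroring the alternation between d0 and d1, so each comparison does not have to wait for the conditional update of the previous one.

```c
#include <assert.h>
#include <stdint.h>

/* One running maximum and the lag at which it was found (hypothetical type). */
typedef struct {
    int32_t value;
    int lag;
} running_max;

/* Strict ">" update. Called in increasing lag order, so among equal values
   the first (smallest) lag is kept. */
static void update_max(running_max *m, int32_t value, int lag)
{
    if (value > m->value) {
        m->value = value;
        m->lag = lag;
    }
}

/* Two-maxima scan: lags at even offsets from lag_min update a, lags at odd
   offsets update b, so consecutive comparisons touch different maxima. */
static int argmax_two_maxima(const int32_t *corr, int lag_min, int lag_max)
{
    assert(lag_min <= lag_max);
    running_max a = { INT32_MIN, lag_min };
    running_max b = { INT32_MIN, lag_min };

    for (int lag = lag_min; lag <= lag_max; lag++) {
        if (((lag - lag_min) & 1) == 0)
            update_max(&a, corr[lag], lag);
        else
            update_max(&b, corr[lag], lag);
    }

    /* The one extra comparison between the two maxima. Ties go to the
       smaller lag so the result matches a single-maximum ">" scan. */
    if (b.value > a.value || (b.value == a.value && b.lag < a.lag))
        return b.lag;
    return a.lag;
}
```

The explicit tie rule in the final comparison is a choice made for this sketch; the assembly fragment does not show how the final comparison resolves ties.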
Code Example 8. Lag_max() Comparison Fragment Using Two Maxima

      ; d7, d6, d5, d4: the four correlations computed in parallel
      ; d0, d1: the two running maxima; d15, d14: d12 + d2, saved when d0 or d1 is updated
            cmpgt  d0,d7        sub    #2,d2
      [ ift tfr    d7,d0        add    d12,d2,d15
        ifa cmpgt  d1,d6        sub    #2,d2        ]
      [ ift tfr    d6,d1        add    d12,d2,d14
        ifa cmpgt  d0,d5        sub    #2,d2        ]
      [ ift tfr    d5,d0        add    d12,d2,d15
        ifa cmpgt  d1,d4        sub    #2,d2        ]
      [ ift tfr    d4,d1        add    d12,d2,d14   ]

The final assembly code for the Lag_max() function is listed in Appendix B. In this version, the final
comparison is moved to just after the doen3 instruction to fulfill the 2-instruction minimum requirement between
doen3 and loopstart3. Also note the use of the fake comparison of d0 and d1 to initialize the T bit to FALSE
(d0 and d1 are not equal), which gains 1 cycle in the outer loop.
4.3.3 Summary
Table 12 lists the Lag_max() cycle count and the code size for each version.

Freescale Semiconductor
ITU-T G.729 Implementation on the StarCore™ SC140/SC1400 Cores, Rev. 1