AN2094 Freescale Semiconductor / Motorola, AN2094 Datasheet - Page 33

Manufacturer Part Number
AN2094
Description
ITU-T G.729 Implementation on StarCore SC140
Manufacturer
Freescale Semiconductor / Motorola
The diamonds in Figure 7 represent a hypothetical variation based on our actual results. In this variation, only
those functions that are not eventually implemented in assembly are optimized in C; this code accounts for
86 percent of the original run time. The ‘BestC-86 percent’ point represents the performance after this
portion of the code is optimized, with no assembly optimizations. The ‘BestC-86 percent+6Asm’ point represents the
performance after an additional six (non-C-optimized) functions are implemented in assembly. These results
confirm that the SC140 has a compiler-friendly architecture and that more emphasis should be placed on C
development and less on assembly. Our project demonstrated the importance of developing code in C and
implementing in assembly only those few functions for which the compiler does not produce optimal code.
6 Conclusions
The approach used and recommended has the following key components:
• C source optimizations for selected functions, with or without algorithmic changes.
• Assembly implementation of a restricted number of time-critical functions.
Selected C functions are optimized at both the project and function level. These optimizations can be performed
without specific knowledge of the code algorithms. Application profiling helps to identify the most
time-consuming functions. Previous experience with a similar application further refines information gathered through
profiling.
The recommended procedure for implementing project-level and function-level C optimization includes profiling
the initially ported code, inlining frequently-called functions, and optimizing the C code so that the compiler
produces code that is better adapted to the SC140 architecture. In the C optimization step we employed several
techniques, including multisample, loop unrolling, split summation, and loop merging. On the G.729 project, these
optimizations reduced the program run time by a factor of 1.76, increasing code size by only 11 percent. The
development time for this phase of the project was 5.5 man-months. The programming team had broad experience
with C but not with the StarCore platform.
Further run-time reduction is achieved by applying algorithmic changes to critical functions grouped in modules.
This activity involves advanced understanding of the algorithms and algorithm design. Four such modules were
chosen for the vocoder project. The algorithm-modified functions were then reoptimized in C code.
Profiler data assists in identifying functions on which to implement algorithmic changes. Each of the functions
initially selected accounted for more than 5 percent of the total execution time of the ported version of the vocoder,
and collectively took more than 60 percent of total execution time of the optimized C version. This initial set of
functions was expanded to include functions that provide inputs to and receive outputs from the functions in the
initial set.
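As an illustration of the C-level techniques named in this section (loop unrolling with split summation, multisample, and loop merging), the following is a minimal sketch in portable C. It is not taken from the G.729 sources: the function names, the 32-bit accumulators, and the restrictions on lengths are assumptions made for the example. Plain integer arithmetic stands in for the saturating ITU-T basic operators; with saturating operators, reordering a summation is guaranteed to be bit-exact only when no intermediate sum saturates.

```c
#include <assert.h>
#include <stdint.h>

/* Baseline dot product: one multiply-accumulate per iteration, and
 * every addition depends on the previous one. Inputs are assumed
 * small enough that the 32-bit accumulator does not overflow. */
static int32_t dot_baseline(const int16_t *x, const int16_t *y, int n)
{
    int32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int32_t)x[i] * y[i];
    return acc;
}

/* Loop unrolling + split summation: the loop body is unrolled by four
 * and each product goes to its own partial sum, so the four
 * accumulations are independent of one another.
 * Assumption for this sketch: n is a multiple of 4. */
static int32_t dot_split4(const int16_t *x, const int16_t *y, int n)
{
    int32_t a0 = 0, a1 = 0, a2 = 0, a3 = 0;
    assert(n % 4 == 0);
    for (int i = 0; i < n; i += 4) {
        a0 += (int32_t)x[i]     * y[i];
        a1 += (int32_t)x[i + 1] * y[i + 1];
        a2 += (int32_t)x[i + 2] * y[i + 2];
        a3 += (int32_t)x[i + 3] * y[i + 3];
    }
    return a0 + a1 + a2 + a3;
}

/* Reference correlation: out[k] = sum_i h[i] * x[i + k].
 * x must hold at least n + lags - 1 samples. */
static void corr_baseline(const int16_t *x, const int16_t *h, int n,
                          int32_t *out, int lags)
{
    for (int k = 0; k < lags; k++)
        out[k] = dot_baseline(x + k, h, n);
}

/* Multisample: two output samples are computed per pass of the inner
 * loop, so each h[i] is loaded once and used twice.
 * Assumption for this sketch: lags is even. */
static void corr_multisample(const int16_t *x, const int16_t *h, int n,
                             int32_t *out, int lags)
{
    assert(lags % 2 == 0);
    for (int k = 0; k < lags; k += 2) {
        int32_t s0 = 0, s1 = 0;
        for (int i = 0; i < n; i++) {
            int16_t hi = h[i];
            s0 += (int32_t)hi * x[i + k];
            s1 += (int32_t)hi * x[i + k + 1];
        }
        out[k] = s0;
        out[k + 1] = s1;
    }
}

/* Loop merging: the energy of y and the correlation of x with y are
 * computed in a single loop instead of two, sharing the load of y[i]
 * and the loop overhead. */
static void energy_and_corr(const int16_t *x, const int16_t *y, int n,
                            int32_t *energy, int32_t *corr)
{
    int32_t e = 0, c = 0;
    for (int i = 0; i < n; i++) {
        int16_t yi = y[i];
        e += (int32_t)yi * yi;
        c += (int32_t)x[i] * yi;
    }
    *energy = e;
    *corr = c;
}
```

Four partial sums are used in dot_split4 because the SC140 core has four ALUs; whether a given compiler actually schedules the four accumulations in parallel depends on the compiler and its options and is not shown here.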
Table 13. Performance Versus Number of Assembly-Implemented Functions

Number of Functions   MCPS    Improvement Factor   Additional Man-Months
0                     12.8    1                    0
3                     10.49   1.22                 1
6                     9.47    1.35                 2
18                    8.44    1.52                 5

ITU-T G.729 Implementation on the StarCore™ SC140/SC1400 Cores, Rev. 1
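The Improvement Factor column of Table 13 appears to be the MCPS figure with no assembly functions (12.8) divided by the MCPS figure of each row. The sketch below checks that reading and also derives the MCPS saved per additional man-month between successive rows, a figure that is not in the original table. The struct and function names are hypothetical, and the man-month column is assumed to be cumulative.

```c
#include <assert.h>

/* One row of Table 13. */
struct asm_step {
    int    asm_functions;  /* functions implemented in assembly */
    double mcps;           /* MCPS reported for this configuration */
    double man_months;     /* additional man-months, assumed cumulative */
};

/* Values copied from Table 13. */
static const struct asm_step table13[] = {
    {  0, 12.80, 0.0 },
    {  3, 10.49, 1.0 },
    {  6,  9.47, 2.0 },
    { 18,  8.44, 5.0 },
};

/* Speedup of a row relative to the baseline row (no assembly). */
static double improvement_factor(const struct asm_step *base,
                                 const struct asm_step *row)
{
    return base->mcps / row->mcps;
}

/* MCPS saved per man-month spent going from one row to the next. */
static double mcps_saved_per_man_month(const struct asm_step *prev,
                                       const struct asm_step *next)
{
    assert(next->man_months > prev->man_months);
    return (prev->mcps - next->mcps)
         / (next->man_months - prev->man_months);
}
```

With the table's values, the saving per man-month falls from 2.31 MCPS for the first three functions to about 0.34 MCPS for the last twelve, which is consistent with the recommendation to implement only a few functions in assembly.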
