AN2094 Freescale Semiconductor / Motorola, AN2094 Datasheet - Page 11


Manufacturer Part Number: AN2094
Description: ITU-T G.729 Implementation on StarCore SC140
Manufacturer: Freescale Semiconductor / Motorola
Datasheet
Profile data was obtained after the project-level optimizations were implemented, focusing on the total number of
cycles per application step. The new profile data served as a baseline for all further analysis.
2.4 Function-Level C Optimization
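The primary focus of this phase is to take full advantage of the parallel architecture of the SC140. The optimization
techniques presented here can be performed without global knowledge of the algorithms and without detailed
analysis of data and control flow. The general approach for optimizing each function has four steps: isolate the
function behind a defined interface, capture its input and output contexts, optimize it while checking the output
against the reference, and integrate it back into the program.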
2.4.1 Selecting the Functions to be Optimized
The functions to optimize, collectively referred to as ‘G1’ functions, were chosen from among the most
time-consuming functions. The main selection criterion was the profiler data obtained after the project-level
optimizations, supplemented by experience with similar applications. The profiler reports the number of calls per
application step, the number of cycles per call, and the total number of cycles per application step.
2.4.2 Predicting Speed Improvement
Optimizing a C function yielded speed improvements ranging from 1.08 to 4 times. The improvement depended
strongly on function characteristics such as the number of variables and pointers, the number and dimension of
loops, data alignment, data dependencies, and the number of internal calls to other functions. For this reason, it was
difficult to estimate the speed improvement in advance. Prediction is much easier when multisample techniques
are employed. On the other hand, the speed improvement for a function containing many calls to other functions
or extensive control code is not only difficult to predict but also difficult to achieve. Typical DSP code without
data dependencies (for example, correlation or energy) easily yields a fourfold improvement, while DSP code with
data dependencies generally yields a two- to threefold improvement.
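The correlation case can be illustrated with a minimal sketch of the multisample idea: two output samples (lags) are computed per pass, so each loaded operand feeds two multiply-accumulates. This is portable C written for illustration, not code from the application note. The function names are hypothetical, plain 32-bit accumulation stands in for the saturating fractional arithmetic of the G.729 reference code, and y is assumed to hold at least n + lags - 1 samples.

```c
#include <assert.h>
#include <stdint.h>

/* Reference form: one output sample (lag) per pass over the data. */
void corr_ref(const int16_t *x, const int16_t *y, int n, int lags, int32_t *r)
{
    for (int k = 0; k < lags; k++) {
        int32_t acc = 0;
        for (int i = 0; i < n; i++)
            acc += (int32_t)x[i] * y[i + k];
        r[k] = acc;
    }
}

/* Multisample form (hypothetical name): two lags per pass.
 * Each x[i] and each y value is loaded once and used by two accumulators.
 * Assumption: lags is even and y has at least n + lags - 1 elements. */
void corr_multisample2(const int16_t *x, const int16_t *y, int n, int lags,
                       int32_t *r)
{
    assert(lags % 2 == 0);
    for (int k = 0; k < lags; k += 2) {
        int32_t acc0 = 0, acc1 = 0;
        int16_t y0 = y[k];                /* operand for lag k */
        for (int i = 0; i < n; i++) {
            int16_t xi = x[i];
            int16_t y1 = y[i + k + 1];    /* operand for lag k + 1 */
            acc0 += (int32_t)xi * y0;
            acc1 += (int32_t)xi * y1;
            y0 = y1;                      /* reused for lag k on the next i */
        }
        r[k] = acc0;
        r[k + 1] = acc1;
    }
}
```

The two accumulators have no dependency on each other, which is what lets a parallel core issue their multiply-accumulates together. Whether a compiler actually does so depends on the toolchain and on data alignment, neither of which is modeled here.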
2.4.3 General Optimization Techniques
Several optimization techniques were employed, including multisample, split summation, loop unrolling, loop
merging, and loop splitting. Most optimizations employing these techniques also require data alignment, but these
alignments rarely cause conflicts or degrade performance.
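As a rough illustration of split summation combined with loop unrolling, the sketch below computes signal energy with four independent partial sums, a count chosen here to match the four ALUs of the SC140. It is portable C with hypothetical function names, not the application note's code. It assumes the length is a multiple of four, uses plain 32-bit accumulation, and does not model data alignment. If the reference code accumulates with saturation, the split form is bit-exact only as long as no partial or final sum saturates.

```c
#include <assert.h>
#include <stdint.h>

/* Reference form: a single accumulator creates a serial dependency chain. */
int32_t energy_ref(const int16_t *x, int n)
{
    int32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int32_t)x[i] * x[i];
    return acc;
}

/* Split summation with 4x unrolling (hypothetical name).
 * Four independent partial sums remove the dependency between
 * consecutive multiply-accumulates. Assumption: n is a multiple of 4. */
int32_t energy_split4(const int16_t *x, int n)
{
    assert(n % 4 == 0);
    int32_t a0 = 0, a1 = 0, a2 = 0, a3 = 0;
    for (int i = 0; i < n; i += 4) {
        a0 += (int32_t)x[i]     * x[i];
        a1 += (int32_t)x[i + 1] * x[i + 1];
        a2 += (int32_t)x[i + 2] * x[i + 2];
        a3 += (int32_t)x[i + 3] * x[i + 3];
    }
    /* Combine the partial sums once, after the loop. */
    return (a0 + a1) + (a2 + a3);
}
```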
The profiler information used to select functions (Section 2.4.1) is as follows:
- Number of calls per application step. Based on this information, the developer decides if a small,
  frequently called function should be inlined. This information is most useful during project-level
  optimization.
- Number of cycles per call. This information is necessary to predict the speed improvement gained by
  optimizing a function. However, the number of cycles per call is not very useful by itself, because there
  are cases in which a function with a small number of cycles per call is called several times per
  application step.
- Total number of cycles per application step (in G.729, a frame). This is the most useful information for
  selecting the functions to optimize.
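The relationship between the three profiler figures can be sketched as follows: total cycles per frame is the product of calls per frame and cycles per call, and candidates are ranked by that total. The record layout, function names, and any numbers fed to it are hypothetical and do not come from the G.729 profile.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical profiler record; field names are illustrative only. */
typedef struct {
    const char *name;
    unsigned long calls_per_frame;   /* calls per application step */
    unsigned long cycles_per_call;
} ProfileEntry;

/* Total cycles per application step (one frame in G.729). */
static unsigned long total_cycles(const ProfileEntry *e)
{
    return e->calls_per_frame * e->cycles_per_call;
}

/* qsort comparator: largest total first. */
static int by_total_desc(const void *a, const void *b)
{
    unsigned long ta = total_cycles((const ProfileEntry *)a);
    unsigned long tb = total_cycles((const ProfileEntry *)b);
    return (ta < tb) - (ta > tb);
}

/* Sort entries so the best optimization candidates come first. */
void rank_by_total_cycles(ProfileEntry *entries, size_t n)
{
    assert(entries != NULL || n == 0);
    qsort(entries, n, sizeof entries[0], by_total_desc);
}
```

With such a ranking, a function costing 50 cycles per call but called 400 times per frame (20,000 cycles) sorts ahead of one costing 9,000 cycles that is called once, which is why cycles per call alone is a weak selection criterion.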
The steps of the general approach for optimizing each function (Section 2.4) are:
1. Establish the function interface and separate the function from the rest of the code to facilitate analysis.
2. Add test code that saves the function input and output contexts before and after each function call.
3. Optimize the function and monitor the output context to ensure that it remains the same as the
   corresponding reference output context.
4. Integrate the optimized function with the rest of the program and run the program to see if the vocoder
   passes all ITU-T test vectors.
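Steps 2 and 3 can be sketched as a small test harness: the reference function runs once while its input and output are saved, and every optimized candidate is then replayed on the saved input and compared bit for bit against the saved output. All names here are hypothetical. The "halve each sample" kernel is a stand-in for a real vocoder function, and the context is assumed to be a single 40-sample buffer, whereas a real function context would also include any state it reads or writes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CTX_LEN 40   /* assumed buffer length (one G.729 subframe) */

/* Hypothetical saved context: function input and reference output. */
typedef struct {
    int16_t in[CTX_LEN];
    int16_t out[CTX_LEN];
} FuncContext;

typedef void (*KernelFn)(const int16_t *in, int16_t *out, int n);

/* Stand-in reference kernel: halve each sample. */
void halve_ref(const int16_t *in, int16_t *out, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = (int16_t)(in[i] / 2);
}

/* Stand-in optimized kernel: same result, loop unrolled by two. */
void halve_unrolled(const int16_t *in, int16_t *out, int n)
{
    assert(n % 2 == 0);
    for (int i = 0; i < n; i += 2) {
        out[i]     = (int16_t)(in[i] / 2);
        out[i + 1] = (int16_t)(in[i + 1] / 2);
    }
}

/* Step 2: save the input context, run the reference, save its output. */
void capture_context(KernelFn ref, const int16_t *in, FuncContext *ctx)
{
    memcpy(ctx->in, in, sizeof ctx->in);
    ref(ctx->in, ctx->out, CTX_LEN);
}

/* Step 3: replay the saved input through a candidate and compare
 * bit-exactly. Returns 1 on a match, 0 otherwise. */
int matches_reference(KernelFn candidate, const FuncContext *ctx)
{
    int16_t out[CTX_LEN];
    candidate(ctx->in, out, CTX_LEN);
    return memcmp(out, ctx->out, sizeof out) == 0;
}
```

A per-function check like this only guards the isolated function. Step 4, running the full ITU-T test vectors on the integrated program, remains the actual acceptance test.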
ITU-T G.729 Implementation on the StarCore™ SC140/SC1400 Cores, Rev. 1
Optimization Process