AN2094 Freescale Semiconductor / Motorola, AN2094 Datasheet - Page 13

Description
ITU-T G.729 Implementation on StarCore SC140
2.4.3.4 Loop Merging
Combining two or more loops into a single loop loads the ALUs more efficiently and reduces the number of AGU
operations, as illustrated in Code Example 7. If a merged loop still does not use all available ALUs, it can be
combined with the other techniques described in this section.
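That merging leaves the results unchanged can be checked on a host machine. The following is a minimal sketch, not code from this application note: mult_r_ref() and L_mac_ref() are simplified stand-ins for the ITU-T basic operators mult_r() and L_mac(), and the window length, the sample data, and the function names are assumptions chosen for illustration. It demonstrates only numerical equivalence; the ALU and AGU savings depend on the SC140 compiler and are not modeled.

```c
#include <assert.h>
#include <stdint.h>

#define L_WINDOW 8 /* assumed length, kept small for illustration */

/* Stand-in for mult_r(): Q15 multiply with rounding and saturation.
   Assumes >> on a negative int32_t is an arithmetic shift. */
static int16_t mult_r_ref(int16_t a, int16_t b)
{
    int32_t p = ((int32_t)a * b + 0x4000) >> 15;
    return (int16_t)(p > INT16_MAX ? INT16_MAX : p);
}

/* Stand-in for L_mac(): acc + 2*a*b, saturating the product and then the sum. */
static int32_t L_mac_ref(int32_t acc, int16_t a, int16_t b)
{
    int64_t prod = 2 * (int64_t)a * b;
    if (prod > INT32_MAX) prod = INT32_MAX; /* only -32768 * -32768 saturates */
    int64_t sum = (int64_t)acc + prod;
    if (sum > INT32_MAX) sum = INT32_MAX;
    if (sum < INT32_MIN) sum = INT32_MIN;
    return (int32_t)sum;
}

/* Two separate loops: window the signal, then accumulate its energy. */
static int32_t energy_separate(const int16_t *x, const int16_t *win, int16_t *y)
{
    int32_t e = 0;
    for (int i = 0; i < L_WINDOW; i++)
        y[i] = mult_r_ref(x[i], win[i]);
    for (int i = 0; i < L_WINDOW; i++)
        e = L_mac_ref(e, y[i], y[i]);
    return e;
}

/* Merged loop: y[i] is consumed right after it is produced. */
static int32_t energy_merged(const int16_t *x, const int16_t *win, int16_t *y)
{
    int32_t e = 0;
    for (int i = 0; i < L_WINDOW; i++) {
        y[i] = mult_r_ref(x[i], win[i]);
        e = L_mac_ref(e, y[i], y[i]);
    }
    return e;
}

/* Runs both variants on arbitrary sample data and checks that they agree. */
static void demo_loop_merging(void)
{
    const int16_t x[L_WINDOW] = {1000, -2000, 3000, -4000, 5000, -6000, 7000, -8000};
    const int16_t win[L_WINDOW] = {2621, 5000, 12000, 20000, 27000, 31000, 32000, 32767};
    int16_t y1[L_WINDOW], y2[L_WINDOW];

    int32_t e1 = energy_separate(x, win, y1);
    int32_t e2 = energy_merged(x, win, y2);

    assert(e1 == e2);
    for (int i = 0; i < L_WINDOW; i++)
        assert(y1[i] == y2[i]);
}
```

Calling demo_loop_merging() from main() exercises both variants. The merge is legal here because each iteration of the second loop reads only the y[i] written by the same iteration of the first loop.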
Example 7. Loop Merging
/* initial loops */
for(i=0; i<L_WINDOW; i++)
{
    y[i] = mult_r(x[i], hamwindow[i]);
}
for(i=0; i<L_WINDOW; i++)
{
    e = L_mac(e, y[i], y[i]);
}
/* loops merged */
for(i=0; i<L_WINDOW; i++)
{
    y[i] = mult_r(x[i], hamwindow[i]);
    e = L_mac(e, y[i], y[i]);
}
2.4.3.5 Loop Splitting
Loop splitting breaks a large loop that uses several variables or pointers into two or more shorter loops and
saves the results of the partial computations in local vectors. This technique enables the compiler to allocate
registers more efficiently, resulting in a substantial performance improvement, especially when combined with
other optimization techniques.
2.4.4 Programming Tips
The following programming tips summarize our experience in optimizing C functions. They are described in
detail in Efficient Programming Techniques for the SC140 [13].
• Declare variables as close as possible to where they are used (using C blocks) to help the compiler identify
  their lifetimes. This improves register allocation but may require more stack memory.
• Use the #pragma loop_count directive to declare that the minimum number of loop iterations is greater
  than zero, which lets the compiler eliminate the loop-entry test.
• Perform loop unrolling by rolling and reusing the values that come from the unaligned vectors.
• Use the >> operator for shifts by a variable amount to prevent the compiler from translating the
  operation into a function call.
• Reverse the iteration order in a loop to obtain a more useful sequence of values.
• Evaluate the effect of multisample without loop unrolling to determine whether the speed improvement in the
  unrolled case is worth the additional memory consumption.
• Add internal pointers to arrays that are already aligned to improve both alignment and clarity.
• When the data alignment property of a vector is not recognized in a code sequence, create a new
  function with that vector as a parameter and use the #pragma align directive to specify the
  alignment.
• If a left shift cannot overflow or underflow, use the << operator instead of the L_shl() function to prevent
  the compiler from inserting a function call.
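The last tip can be illustrated with a small host-side sketch. It is a hedged example under stated assumptions, not code from this application note: L_shl_ref() is a simplified stand-in for the ITU-T L_shl() operator (non-negative shift counts only), and shl_plain() and demo_shift() are hypothetical names. It shows that a plain shift matches the saturating operator whenever the operand has enough headroom, and differs only when saturation would occur. Whether the SC140 compiler actually emits a function call for L_shl() is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for L_shl(): saturating left shift, valid for 0 <= n <= 31.
   The ITU-T operator also accepts negative n; that case is omitted here. */
static int32_t L_shl_ref(int32_t v, int n)
{
    int64_t r = (int64_t)v * ((int64_t)1 << n);
    if (r > INT32_MAX) return INT32_MAX;
    if (r < INT32_MIN) return INT32_MIN;
    return (int32_t)r;
}

/* Plain shift without saturation. Shifting through uint32_t avoids undefined
   behaviour for negative v in ISO C; converting back assumes two's complement.
   The result is only meaningful when the shift cannot overflow. */
static int32_t shl_plain(int32_t v, int n)
{
    return (int32_t)((uint32_t)v << n);
}

/* A 16-bit sample shifted left by at most 15 bits always fits in 32 bits,
   so the plain shift and the saturating shift must agree. */
static void demo_shift(void)
{
    const int16_t samples[] = {-32768, -1, 0, 1, 12345, 32767};
    const int count = (int)(sizeof samples / sizeof samples[0]);

    for (int k = 0; k < count; k++)
        for (int n = 0; n <= 15; n++)
            assert(shl_plain(samples[k], n) == L_shl_ref(samples[k], n));

    /* Without headroom the two differ: the operator saturates, the shift wraps. */
    assert(L_shl_ref(0x40000000, 1) == INT32_MAX);
    assert(shl_plain(0x40000000, 1) != L_shl_ref(0x40000000, 1));
}
```

The replacement is safe only where the value range is known, as with the 16-bit samples above. Where overflow is possible, the saturating operator has to stay.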
ITU-T G.729 Implementation on the StarCore™ SC140/SC1400 Cores, Rev. 1
