AN2094 Freescale Semiconductor / Motorola, AN2094 Datasheet - Page 13

AN2094

Manufacturer Part Number

AN2094

Description

ITU-T G.729 Implementation on StarCore SC140

Manufacturer

Freescale Semiconductor / Motorola

Datasheet

1.AN2094.pdf (52 pages)


2.4.3.4 Loop Merging

Combining two or more loops into a single loop loads the ALUs more efficiently and reduces the number of AGU
operations, as illustrated in Code Example 7. If a merged loop still does not use all available ALUs, it can be
combined with the other techniques described in this section.

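Example 7. Loop Merging

/* initial loops */
for(i=0; i<L_WINDOW; i++)
{
    y[i] = mult_r(x[i], hamwindow[i]);
}
for(i=0; i<L_WINDOW; i++)
{
    e = L_mac(e, y[i], y[i]);
}

/* loops merged */
for(i=0; i<L_WINDOW; i++)
{
    y[i] = mult_r(x[i], hamwindow[i]);
    e = L_mac(e, y[i], y[i]);
}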
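The merged loop in Example 7 can be checked off-target with a small portable sketch. The helpers mult_r_sketch() and l_mac_sketch() below are simplified stand-ins written for this illustration, not the StarCore intrinsics, and the window length of 8 and the function names energy_separate() and energy_merged() are assumptions chosen to keep the example short. The sketch only shows that both loop forms produce the same windowed samples and the same energy; it says nothing about cycle counts.

```c
#include <stdint.h>

#define L_WINDOW 8  /* hypothetical length, chosen small for illustration */

/* Stand-in for mult_r: Q15 multiply with rounding and 16-bit saturation.
   Assumes >> on a negative int32_t is an arithmetic shift. */
static int16_t mult_r_sketch(int16_t a, int16_t b)
{
    int32_t p = ((int32_t)a * b + 0x4000) >> 15;
    return (int16_t)(p > INT16_MAX ? INT16_MAX : p);
}

/* Stand-in for L_mac: acc + 2*a*b, saturating the product and then the sum. */
static int32_t l_mac_sketch(int32_t acc, int16_t a, int16_t b)
{
    int64_t prod = 2 * (int64_t)a * b;
    int64_t sum;
    if (prod > INT32_MAX) prod = INT32_MAX;
    sum = (int64_t)acc + prod;
    if (sum > INT32_MAX) sum = INT32_MAX;
    if (sum < INT32_MIN) sum = INT32_MIN;
    return (int32_t)sum;
}

/* Two separate loops: window the signal, then accumulate its energy. */
int32_t energy_separate(const int16_t *x, const int16_t *win, int16_t *y)
{
    int32_t e = 0;
    int i;
    for (i = 0; i < L_WINDOW; i++) {
        y[i] = mult_r_sketch(x[i], win[i]);
    }
    for (i = 0; i < L_WINDOW; i++) {
        e = l_mac_sketch(e, y[i], y[i]);
    }
    return e;
}

/* Merged loop: each y[i] is consumed as soon as it is produced. */
int32_t energy_merged(const int16_t *x, const int16_t *win, int16_t *y)
{
    int32_t e = 0;
    int i;
    for (i = 0; i < L_WINDOW; i++) {
        y[i] = mult_r_sketch(x[i], win[i]);
        e = l_mac_sketch(e, y[i], y[i]);
    }
    return e;
}
```

Merging is safe here because iteration i of the second loop depends only on y[i], which the first loop body has just written in the same iteration.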

2.4.3.5 Loop Splitting

Loop splitting refers to the process of breaking a large loop with several variables or pointers into two or more
shorter loops and saving the results of the partial computations in local vectors. This technique enables the
compiler to allocate registers more efficiently, resulting in a substantial performance improvement, especially when
combined with other optimization techniques.
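As a rough illustration of loop splitting, the sketch below breaks one loop that walks five pointers into two shorter loops that communicate through a local vector. Everything in it is hypothetical: the function names, the vector length N, and the q15_mul() helper are assumptions made for this example and do not come from the G.729 code. Whether the split form is actually faster depends on the compiler and on register pressure in the real loop.

```c
#include <stdint.h>

#define N 8  /* hypothetical vector length */

/* Simplified Q15 multiply (truncating, saturated to 16 bits).
   Assumes >> on a negative int32_t is an arithmetic shift. */
static int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t p = ((int32_t)a * b) >> 15;
    return (int16_t)(p > INT16_MAX ? INT16_MAX : p);
}

/* Single large loop: a, b, c, d and out are all live at once. */
void combine_single(const int16_t *a, const int16_t *b,
                    const int16_t *c, const int16_t *d, int32_t *out)
{
    int i;
    for (i = 0; i < N; i++) {
        int16_t p = q15_mul(a[i], b[i]);
        int16_t q = q15_mul(c[i], d[i]);
        out[i] = (int32_t)p + q;
    }
}

/* Split version: the first loop stores partial results in a local vector,
   so each loop works with fewer pointers. */
void combine_split(const int16_t *a, const int16_t *b,
                   const int16_t *c, const int16_t *d, int32_t *out)
{
    int16_t partial[N];  /* local vector holding the partial computation */
    int i;
    for (i = 0; i < N; i++) {
        partial[i] = q15_mul(a[i], b[i]);
    }
    for (i = 0; i < N; i++) {
        out[i] = (int32_t)partial[i] + q15_mul(c[i], d[i]);
    }
}
```

The cost of the split is the extra local vector on the stack, so it pays off only when the original loop body is large enough to exhaust the available registers.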

2.4.4 Programming Tips

The following is a summary of programming tips based on our experience in optimizing C functions. They are
described in detail in Efficient Programming Techniques for the SC140 [13].


• Declare variables as close as possible to their area of use (using C blocks) to help the compiler identify
  their lifetimes. This improves register allocation but may require more stack memory.

• Use the #pragma loop_count directive to declare that the minimum number of loop iterations is greater
  than zero, which lets the compiler eliminate the loop-entry test.

• Perform loop unrolling by rolling and reusing the values that come from the unaligned vectors.

• Use the >> operator for shifts by a variable amount to prevent the compiler from translating the
  operation into a function call.

• Reverse the iteration order in a loop to obtain a more useful sequence of values.

• Evaluate the effect of multisample without loop unrolling to determine if the speed improvement in the
  unrolled case is worth the additional memory consumption.

• Add internal pointers to arrays that are already aligned to improve both alignment and clarity.

• When the data alignment property of a vector is not recognized in a code sequence, create a new
  function with that vector as a parameter and use the #pragma align directive to specify the
  alignment.

• When a left shift cannot overflow or underflow, use the << operator instead of the L_shl() function to
  prevent the compiler from inserting a function call.
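The last tip rests on the fact that a saturating left shift and a plain << give the same result whenever the shifted value still fits in 32 bits. The sketch below illustrates this in portable C. l_shl_sketch() is a simplified stand-in for L_shl() that handles only non-negative shift counts, and plain_shl() is a hypothetical wrapper around << that goes through uint32_t so the example avoids undefined behaviour on negative inputs in standard C. Neither is the StarCore implementation.

```c
#include <stdint.h>

/* Simplified saturating left shift (stand-in for L_shl, n >= 0 only):
   doubles x one bit at a time and clamps as soon as the next doubling
   would leave the 32-bit range. */
int32_t l_shl_sketch(int32_t x, int n)
{
    while (n-- > 0) {
        if (x > 0x3fffffff) return INT32_MAX;
        if (x < -0x40000000) return INT32_MIN;
        x *= 2;
    }
    return x;
}

/* Plain shift with no saturation, for 0 <= n <= 31. The cast through
   uint32_t keeps the shift well defined for negative x; converting the
   result back to int32_t wraps on common compilers. */
int32_t plain_shl(int32_t x, int n)
{
    return (int32_t)((uint32_t)x << n);
}
```

For inputs such as 0x1000 shifted by 4, both functions return the same value, so the cheaper operator is a safe substitute. For 0x40000000 shifted by 1, the saturating version clamps to INT32_MAX while the plain shift wraps, which is why the tip applies only where the value range is known in advance.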

ITU-T G.729 Implementation on the StarCore™ SC140/SC1400 Cores, Rev. 1

