AN2094 Freescale Semiconductor / Motorola, AN2094 Datasheet - Page 12

ITU-T G.729 Implementation on StarCore SC140
Optimization Process
2.4.3.1 Multisample
Multisampling is a pipelining technique to process multiple samples simultaneously. It takes full advantage of the
SC140 multiple-ALU architecture to maximize parallel operation of the execution units. In addition, this technique
preserves bit-exactness and reduces the number of memory operations. This technique is most efficient when the
number of output samples is a multiple of four. For a more complete description, refer to the Freescale
Semiconductor application note, StarCore Multisample Programming Technique [12].
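The idea can be sketched in portable C, assuming a simple correlation y[n] = sum over k of x[n+k]*h[k]. The
function names corr_ref and corr_multisample, the argument layout, and the use of plain 32-bit accumulation in
place of the ITU-T basic operators are assumptions made for illustration; they are not taken from the application
note or from reference [12]. Inputs are assumed small enough that the accumulators do not overflow.

```c
#include <stdint.h>

/* Reference form: one output sample per outer iteration.
 * Every multiply-accumulate needs two loads (x[n+k] and h[k]).
 * x must hold n_out + n_taps - 1 samples. */
void corr_ref(const int16_t *x, const int16_t *h, int32_t *y,
              int n_out, int n_taps)
{
    for (int n = 0; n < n_out; n++) {
        int32_t acc = 0;
        for (int k = 0; k < n_taps; k++)
            acc += (int32_t)x[n + k] * h[k];
        y[n] = acc;
    }
}

/* Multisample sketch (hypothetical helper): four output samples per
 * outer iteration. Each inner step loads one coefficient and one new
 * input sample, then feeds four independent accumulators.
 * n_out is assumed to be a multiple of 4. */
void corr_multisample(const int16_t *x, const int16_t *h, int32_t *y,
                      int n_out, int n_taps)
{
    for (int n = 0; n < n_out; n += 4) {
        int32_t a0 = 0, a1 = 0, a2 = 0, a3 = 0;
        /* Sliding window of inputs kept in local variables. */
        int16_t x0 = x[n], x1 = x[n + 1], x2 = x[n + 2];

        for (int k = 0; k < n_taps; k++) {
            int16_t c  = h[k];          /* one coefficient load     */
            int16_t x3 = x[n + k + 3];  /* one new input sample load */

            a0 += (int32_t)x0 * c;      /* contributes to y[n]   */
            a1 += (int32_t)x1 * c;      /* contributes to y[n+1] */
            a2 += (int32_t)x2 * c;      /* contributes to y[n+2] */
            a3 += (int32_t)x3 * c;      /* contributes to y[n+3] */

            x0 = x1; x1 = x2; x2 = x3;  /* slide the window by one */
        }
        y[n] = a0; y[n + 1] = a1; y[n + 2] = a2; y[n + 3] = a3;
    }
}
```

In this sketch each accumulator sums its terms in the same order as the reference loop, which is why the results
match exactly, while the inner loop performs two loads for every four multiply-accumulates.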
2.4.3.2 Split Summation
Split summation involves splitting a sum into four partial sums, using four variables and one-fourth the number of
iterations. A final summation of the four partial sums is performed at the end of the process. This technique
changes the sequence of operations, so special care must be taken when applying it to algorithms in which
bit-exactness must be preserved. Split summation is often used to compute signal energy, as illustrated in Code
Example 5. Bit-exactness of the algorithm is maintained because all of the terms of the sum are non-negative (the
samples are squared): each saturating accumulator can only grow, so the result is the true sum clipped at the
maximum 32-bit value, regardless of how the terms are grouped.
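The bit-exactness argument can be checked with a small self-contained sketch. The helpers sat_add and sat_mac
below are simplified stand-ins written for this illustration; they are assumed to behave like the saturating
L_add and L_mac basic operators but are not the ITU-T reference code. The function names energy_sequential and
energy_split are likewise hypothetical.

```c
#include <stdint.h>

/* Simplified stand-in for a saturating 32-bit add (assumption). */
static int32_t sat_add(int32_t a, int32_t b)
{
    int64_t s = (int64_t)a + b;
    if (s > INT32_MAX) return INT32_MAX;
    if (s < INT32_MIN) return INT32_MIN;
    return (int32_t)s;
}

/* Simplified stand-in for a saturating fractional multiply-accumulate:
 * acc + 2*a*b, with saturation on the product and on the sum. */
static int32_t sat_mac(int32_t acc, int16_t a, int16_t b)
{
    int64_t p = 2 * (int64_t)a * b;
    if (p > INT32_MAX) p = INT32_MAX;   /* only -32768 * -32768 */
    return sat_add(acc, (int32_t)p);
}

/* Energy with a single accumulator, one term per iteration. */
int32_t energy_sequential(const int16_t *sig, int len)
{
    int32_t e = 0;
    for (int i = 0; i < len; i++)
        e = sat_mac(e, sig[i], sig[i]);
    return e;
}

/* Energy with four partial sums; len is assumed to be a multiple of 4. */
int32_t energy_split(const int16_t *sig, int len)
{
    int32_t e0 = 0, e1 = 0, e2 = 0, e3 = 0;
    for (int i = 0; i < len; i += 4) {
        e0 = sat_mac(e0, sig[i + 0], sig[i + 0]);
        e1 = sat_mac(e1, sig[i + 1], sig[i + 1]);
        e2 = sat_mac(e2, sig[i + 2], sig[i + 2]);
        e3 = sat_mac(e3, sig[i + 3], sig[i + 3]);
    }
    e0 = sat_add(e0, e1);               /* final summation of partial sums */
    e2 = sat_add(e2, e3);
    return sat_add(e0, e2);
}
```

Under these assumptions both functions return the same value for any input, including inputs large enough to
saturate the accumulators. The same equivalence would not hold for a sum with terms of mixed sign, where a single
accumulator could saturate and then move back into range.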
Example 5. Split Summation
/* Compute the energy of the signal stored in
 * the signal[] vector of size SIG_LEN (multiple of 4).
 * e0, e1, e2, e3 are partial sums.
 * The final result is stored in e0.
 */
for ( i = 0; i < SIG_LEN; i+=4 )
{
    e0 = L_mac(e0, signal[i+0], signal[i+0]);
    e1 = L_mac(e1, signal[i+1], signal[i+1]);
    e2 = L_mac(e2, signal[i+2], signal[i+2]);
    e3 = L_mac(e3, signal[i+3], signal[i+3]);
}
e0 = L_add(e0, e1); /* final summation of partial sums */
e2 = L_add(e2, e3);
e0 = L_add(e0, e2);
2.4.3.3 Loop Unrolling
Loop unrolling explicitly repeats the body of a loop with corresponding indices. As a stand-alone technique, loop
unrolling is used to increase the ALU usage per loop step, as is illustrated in Code Example 6.
Example 6. Loop Unrolling
/* Scale all the values in the signal[] vector of size SIG_LEN (multiple of 4). */
for ( i = 0; i < SIG_LEN; i+=4 )
{
    signal[i+0] = L_shr(signal[i+0], 2);
    signal[i+1] = L_shr(signal[i+1], 2);
    signal[i+2] = L_shr(signal[i+2], 2);
    signal[i+3] = L_shr(signal[i+3], 2);
}
Loop unrolling is also performed in conjunction with the multisample technique to reuse variables that are already
fetched, thus reducing memory bandwidth and alignment requirements.
ITU-T G.729 Implementation on the StarCore™ SC140/SC1400 Cores, Rev. 1
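As a rough, self-contained illustration of stand-alone loop unrolling, the sketch below scales a vector with a
rolled loop and with a loop unrolled by four. The helper shr2 is a hypothetical portable stand-in for an
arithmetic right shift by 2 and is not the L_shr basic operator. The tail loop for lengths that are not a
multiple of four is an addition made here for generality; the application note assumes a multiple of four.

```c
#include <stdint.h>

/* Portable arithmetic shift right by 2 (rounds toward minus infinity).
 * Hypothetical helper used in place of L_shr for this sketch. */
static int32_t shr2(int32_t v)
{
    return v >= 0 ? v >> 2 : ~((~v) >> 2);
}

/* Rolled form: one element per loop step. */
void scale_rolled(int32_t *sig, int len)
{
    for (int i = 0; i < len; i++)
        sig[i] = shr2(sig[i]);
}

/* Unrolled form: four independent operations per loop step, which a
 * multiple-ALU core could in principle issue in parallel. */
void scale_unrolled(int32_t *sig, int len)
{
    int i = 0;
    for (; i + 3 < len; i += 4) {
        sig[i + 0] = shr2(sig[i + 0]);
        sig[i + 1] = shr2(sig[i + 1]);
        sig[i + 2] = shr2(sig[i + 2]);
        sig[i + 3] = shr2(sig[i + 3]);
    }
    for (; i < len; i++)                /* tail: leftover elements */
        sig[i] = shr2(sig[i]);
}
```

Because the four statements in the unrolled body touch different elements, they carry no dependency on one
another, and the order of operations on each element is unchanged from the rolled loop.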
Freescale Semiconductor
