AN2094 Freescale Semiconductor / Motorola, AN2094 Datasheet - Page 12


Manufacturer Part Number: AN2094
Description: ITU-T G.729 Implementation on StarCore SC140
Manufacturer: Freescale Semiconductor / Motorola


Optimization Process

2.4.3.1 Multisample

Multisampling is a pipelining technique that processes multiple samples simultaneously. It takes full advantage of the SC140 multiple-ALU architecture to maximize parallel operation of the execution units. In addition, the technique preserves bit-exactness and reduces the number of memory operations, because an operand fetched from memory is shared by the computations of several samples. It is most efficient when the number of output samples is a multiple of four, matching the four ALUs of the SC140 core. For a more complete description, refer to the Freescale Semiconductor application note StarCore Multisample Programming Technique [12].
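As a rough host-side illustration of the multisample idea, the following C sketch computes four output samples of a correlation (the inner product used in FIR filtering) per outer iteration, so that each coefficient fetch feeds four multiply-accumulates. It assumes minimal saturating stand-ins for the ITU-T basic operators L_add, L_mult, and L_mac, written here for illustration rather than taken from the reference code (the reference library's Overflow flag is omitted); the function names corr_single and corr_multisample are hypothetical. Because every output still accumulates its terms in the same order as the single-sample loop, the two versions should agree bit for bit, including saturated results.

```c
#include <assert.h>
#include <stdint.h>

typedef int16_t Word16;
typedef int32_t Word32;

/* Hypothetical minimal stand-ins for the ITU-T basic operators
 * (saturating; the reference library's Overflow flag is omitted). */
static Word32 L_add(Word32 a, Word32 b)
{
    int64_t s = (int64_t)a + b;
    if (s > INT32_MAX) return INT32_MAX;
    if (s < INT32_MIN) return INT32_MIN;
    return (Word32)s;
}

static Word32 L_mult(Word16 a, Word16 b)
{
    Word32 p = (Word32)a * b;
    return (p == 0x40000000) ? INT32_MAX : p * 2; /* -32768 * -32768 saturates */
}

static Word32 L_mac(Word32 acc, Word16 a, Word16 b)
{
    return L_add(acc, L_mult(a, b));
}

/* Reference: one output sample per outer iteration.
 * y[n] = sum_k h[k] * x[n + k]; x must hold n_out + taps - 1 samples. */
static void corr_single(const Word16 *x, const Word16 *h, int taps,
                        Word32 *y, int n_out)
{
    for (int n = 0; n < n_out; n++) {
        Word32 acc = 0;
        for (int k = 0; k < taps; k++)
            acc = L_mac(acc, h[k], x[n + k]);
        y[n] = acc;
    }
}

/* Multisample: four output samples per outer iteration. Each h[k] is
 * fetched once and feeds four independent MACs. Every output still sums
 * k = 0..taps-1 in order, so results match corr_single() exactly. */
static void corr_multisample(const Word16 *x, const Word16 *h, int taps,
                             Word32 *y, int n_out)
{
    assert(n_out % 4 == 0);              /* outputs are processed in groups of four */
    for (int n = 0; n < n_out; n += 4) {
        Word32 a0 = 0, a1 = 0, a2 = 0, a3 = 0;
        for (int k = 0; k < taps; k++) {
            Word16 c = h[k];                 /* one coefficient fetch ... */
            a0 = L_mac(a0, c, x[n + k + 0]); /* ... shared by four outputs */
            a1 = L_mac(a1, c, x[n + k + 1]);
            a2 = L_mac(a2, c, x[n + k + 2]);
            a3 = L_mac(a3, c, x[n + k + 3]);
        }
        y[n + 0] = a0;
        y[n + 1] = a1;
        y[n + 2] = a2;
        y[n + 3] = a3;
    }
}
```

Running both functions on the same input and comparing the outputs is a cheap regression check. The group size of four mirrors the four ALUs, but the C sketch by itself does not guarantee any particular instruction schedule on the SC140.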

2.4.3.2 Split Summation

Split summation splits a sum into four partial sums, using four accumulator variables and one-fourth the number of loop iterations. The four partial sums are added together at the end of the process. Because this technique changes the order of operations, special care must be taken when applying it to algorithms in which bit-exactness must be preserved: with saturating arithmetic, a different order of additions can produce a different result. Split summation is often used to compute signal energy, as illustrated in Code Example 5. Bit-exactness of the algorithm is maintained there because every term of the sum is non-negative (each sample is squared).
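The bit-exactness argument can be checked on a host. The sketch below compares a single-accumulator energy loop with the four-way split used in Code Example 5. It assumes hypothetical, minimal saturating stand-ins for L_add, L_mult, and L_mac (not the ITU-T reference code; the Overflow flag is omitted), and the names energy_serial and energy_split are illustrative only. Since each squared term is non-negative, every partial sum only grows, so both orderings saturate to the 32-bit maximum exactly when the sum of the L_mult terms exceeds it, and otherwise return that sum exactly.

```c
#include <assert.h>
#include <stdint.h>

typedef int16_t Word16;
typedef int32_t Word32;

/* Hypothetical minimal stand-ins for the ITU-T basic operators
 * (saturating; the reference library's Overflow flag is omitted). */
static Word32 L_add(Word32 a, Word32 b)
{
    int64_t s = (int64_t)a + b;
    if (s > INT32_MAX) return INT32_MAX;
    if (s < INT32_MIN) return INT32_MIN;
    return (Word32)s;
}

static Word32 L_mult(Word16 a, Word16 b)
{
    Word32 p = (Word32)a * b;
    return (p == 0x40000000) ? INT32_MAX : p * 2; /* -32768 * -32768 saturates */
}

static Word32 L_mac(Word32 acc, Word16 a, Word16 b)
{
    return L_add(acc, L_mult(a, b));
}

/* One accumulator, terms added strictly in order. */
static Word32 energy_serial(const Word16 *sig, int len)
{
    Word32 e = 0;
    for (int i = 0; i < len; i++)
        e = L_mac(e, sig[i], sig[i]);
    return e;
}

/* Split summation: four partial sums, len/4 iterations, final combine. */
static Word32 energy_split(const Word16 *sig, int len)
{
    assert(len % 4 == 0);
    Word32 e0 = 0, e1 = 0, e2 = 0, e3 = 0;
    for (int i = 0; i < len; i += 4) {
        e0 = L_mac(e0, sig[i + 0], sig[i + 0]);
        e1 = L_mac(e1, sig[i + 1], sig[i + 1]);
        e2 = L_mac(e2, sig[i + 2], sig[i + 2]);
        e3 = L_mac(e3, sig[i + 3], sig[i + 3]);
    }
    e0 = L_add(e0, e1); /* final summation of partial sums */
    e2 = L_add(e2, e3);
    return L_add(e0, e2);
}
```

The same comparison would not be valid for a cross-correlation with mixed-sign terms, where an intermediate saturation in one ordering can be avoided in another.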

Code Example 5. Split Summation

/* Compute the energy of the signal stored in
 * the signal[] vector of size SIG_LEN (multiple of 4).
 * e0, e1, e2, e3 are partial sums.
 * The final result is stored in e0.
 */
for ( i = 0; i < SIG_LEN; i+=4 )
{
    e0 = L_mac(e0, signal[i+0], signal[i+0]);
    e1 = L_mac(e1, signal[i+1], signal[i+1]);
    e2 = L_mac(e2, signal[i+2], signal[i+2]);
    e3 = L_mac(e3, signal[i+3], signal[i+3]);
}
e0 = L_add(e0, e1); /* final summation of partial sums */
e2 = L_add(e2, e3);
e0 = L_add(e0, e2);

2.4.3.3 Loop Unrolling

Loop unrolling explicitly repeats the body of a loop, adjusting the indices in each copy. As a stand-alone technique, loop unrolling increases ALU usage per loop iteration: the unrolled copies are independent of one another, so they can execute in parallel on separate ALUs. Code Example 6 illustrates this.

Code Example 6. Loop Unrolling

/* Scale all the values in the signal[] vector of size SIG_LEN (multiple of 4). */
for ( i = 0; i < SIG_LEN; i+=4 )
{
    signal[i+0] = L_shr(signal[i+0], 2);
    signal[i+1] = L_shr(signal[i+1], 2);
    signal[i+2] = L_shr(signal[i+2], 2);
    signal[i+3] = L_shr(signal[i+3], 2);
}

Loop unrolling is also performed in conjunction with the multisample technique to reuse variables that have already been fetched, thereby reducing memory bandwidth and alignment requirements.

ITU-T G.729 Implementation on the StarCore™ SC140/SC1400 Cores, Rev. 1

Freescale Semiconductor
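The combination of loop unrolling and multisampling can be sketched with a correlation y[n] = sum of h[k]·x[n+k] that produces four outputs per outer iteration. The inner loop over filter taps is unrolled by four, and the roles of four sample variables rotate from one copy of the body to the next, so each tap loads only one new sample and reuses the three fetched for earlier taps. This is a minimal sketch under stated assumptions: L_add, L_mult, and L_mac are hypothetical saturating stand-ins for the ITU-T basic operators (Overflow flag omitted), the names corr_single and corr_multisample_unrolled are illustrative, and both the number of outputs and the number of taps are assumed to be multiples of four. The accumulation order per output is unchanged, so the result should match the straightforward loop bit for bit.

```c
#include <assert.h>
#include <stdint.h>

typedef int16_t Word16;
typedef int32_t Word32;

/* Hypothetical minimal stand-ins for the ITU-T basic operators
 * (saturating; the reference library's Overflow flag is omitted). */
static Word32 L_add(Word32 a, Word32 b)
{
    int64_t s = (int64_t)a + b;
    if (s > INT32_MAX) return INT32_MAX;
    if (s < INT32_MIN) return INT32_MIN;
    return (Word32)s;
}

static Word32 L_mult(Word16 a, Word16 b)
{
    Word32 p = (Word32)a * b;
    return (p == 0x40000000) ? INT32_MAX : p * 2; /* -32768 * -32768 saturates */
}

static Word32 L_mac(Word32 acc, Word16 a, Word16 b)
{
    return L_add(acc, L_mult(a, b));
}

/* Reference: y[n] = sum_k h[k] * x[n + k]; x holds n_out + taps - 1 samples. */
static void corr_single(const Word16 *x, const Word16 *h, int taps,
                        Word32 *y, int n_out)
{
    for (int n = 0; n < n_out; n++) {
        Word32 acc = 0;
        for (int k = 0; k < taps; k++)
            acc = L_mac(acc, h[k], x[n + k]);
        y[n] = acc;
    }
}

/* Four outputs per outer iteration, inner loop unrolled by four.
 * At tap j the window x[n+j .. n+j+3] sits in s0..s3 in rotated order,
 * so each tap loads one new sample instead of four. */
static void corr_multisample_unrolled(const Word16 *x, const Word16 *h, int taps,
                                      Word32 *y, int n_out)
{
    assert(n_out % 4 == 0 && taps % 4 == 0);
    for (int n = 0; n < n_out; n += 4) {
        const Word16 *xp = x + n;
        Word32 a0 = 0, a1 = 0, a2 = 0, a3 = 0;
        Word16 s0 = xp[0], s1 = xp[1], s2 = xp[2], s3;

        for (int k = 0; k < taps; k += 4) {
            s3 = xp[k + 3];                       /* tap k:   s0 s1 s2 s3 */
            a0 = L_mac(a0, h[k], s0);     a1 = L_mac(a1, h[k], s1);
            a2 = L_mac(a2, h[k], s2);     a3 = L_mac(a3, h[k], s3);

            s0 = xp[k + 4];                       /* tap k+1: s1 s2 s3 s0 */
            a0 = L_mac(a0, h[k + 1], s1); a1 = L_mac(a1, h[k + 1], s2);
            a2 = L_mac(a2, h[k + 1], s3); a3 = L_mac(a3, h[k + 1], s0);

            s1 = xp[k + 5];                       /* tap k+2: s2 s3 s0 s1 */
            a0 = L_mac(a0, h[k + 2], s2); a1 = L_mac(a1, h[k + 2], s3);
            a2 = L_mac(a2, h[k + 2], s0); a3 = L_mac(a3, h[k + 2], s1);

            s2 = xp[k + 6];                       /* tap k+3: s3 s0 s1 s2 */
            a0 = L_mac(a0, h[k + 3], s3); a1 = L_mac(a1, h[k + 3], s0);
            a2 = L_mac(a2, h[k + 3], s1); a3 = L_mac(a3, h[k + 3], s2);
        }
        y[n + 0] = a0;
        y[n + 1] = a1;
        y[n + 2] = a2;
        y[n + 3] = a3;
    }
}
```

Per group of four taps, the unrolled body performs 16 multiply-accumulates with 4 coefficient loads and 4 sample loads, whereas a multisample loop that refetches its samples needs 16 sample loads for the same work. Unrolling by four is what lets the rotation be expressed without register-to-register moves; whether a given compiler keeps s0 to s3 in registers on the SC140 is not shown by this sketch.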
