AN2094 Freescale Semiconductor / Motorola, AN2094 Datasheet - Page 17



Manufacturer Part Number: AN2094
Description: ITU-T G.729 Implementation on StarCore SC140
Manufacturer: Freescale Semiconductor / Motorola
Datasheet
2.6.1 Selecting Functions for Assembly Implementation
At this stage of the project, speed increase is the highest priority, although code size may also decrease. The entire
project is profiled, and the most time-consuming functions are identified for the G1 group. Estimates of the ideal
execution times of G1 functions are compared with the results of C optimization. Functions whose compiled code is
near-optimal are left in the optimized C version; the remainder are candidates for assembly implementation.
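The selection step can be sketched in C as follows. The sketch assumes a hypothetical profile record that holds two numbers for each function. The first is the cycle count measured on the optimized C build. The second is a hand-estimated ideal cycle count. The `prof_entry` type, the `select_for_assembly` function, and the 1.2 threshold are illustrative assumptions, not values taken from this note.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical profile record for one function in the G1 group. */
typedef struct {
    const char *name;
    unsigned long measured_cycles; /* cycles of the optimized C build */
    unsigned long ideal_cycles;    /* hand estimate of the best achievable */
} prof_entry;

/* qsort comparator: most time-consuming function first. */
static int by_cycles_desc(const void *a, const void *b)
{
    const prof_entry *x = a;
    const prof_entry *y = b;
    if (x->measured_cycles < y->measured_cycles) return 1;
    if (x->measured_cycles > y->measured_cycles) return -1;
    return 0;
}

/* Sorts prof[] by measured cycles (descending), then sets candidate[i] to 1
 * when prof[i] runs more than `threshold` times slower than its ideal
 * estimate (an assembly candidate) and to 0 when the compiled C is judged
 * near-optimal. Returns the number of assembly candidates. */
static size_t select_for_assembly(prof_entry *prof, size_t n,
                                  double threshold, int *candidate)
{
    size_t count = 0;
    assert(threshold >= 1.0);
    qsort(prof, n, sizeof *prof, by_cycles_desc);
    for (size_t i = 0; i < n; i++) {
        assert(prof[i].ideal_cycles > 0);
        double ratio = (double)prof[i].measured_cycles /
                       (double)prof[i].ideal_cycles;
        candidate[i] = ratio > threshold;
        count += (size_t)candidate[i];
    }
    return count;
}
```

Take some hypothetical numbers with a threshold of 1.2. A function measured at 1200 cycles against an ideal of 1100 stays in C, since the ratio is about 1.09. A function measured at 900 cycles against 500 becomes an assembly candidate, since the ratio is 1.8.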
At the assembly implementation stage, function interfaces must be fixed, because changing the interface of an
assembly function is not straightforward: any modification to a C call parameter list requires a corresponding
modification of the assembly code, where parameters are primarily accessed through the stack.
2.6.2 Implementation Approaches
There are two basic approaches to implementing assembly code: modifying the compiler output and coding
directly in assembly. The first approach is most useful for relatively simple functions whose compiled code
is close to optimal but does not take full advantage of the parallel architecture. For example, registers can
be allocated more efficiently, or the number of pointers needed to fetch data can be reduced.
A more frequent approach is to code complex or underperforming functions directly in assembly. The optimized
C code serves as a reference for testing, to ensure that bit-exactness is maintained after platform-dependent
optimizations. In some instances, the optimized C code is not a good basis for assembly implementation because
of compiler behavior. In these cases, the assembly implementation is based on another model, such as another
C version or suitable pseudo-code. Regardless of compiler performance or the techniques employed, it is best to
use the C code as the reference.
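The bit-exactness check against the C reference can be sketched as a small C harness. Every name here is an assumption made for illustration:

- `dot_ref` stands in for a reference routine.
- `dot_opt` stands in for its optimized replacement. In practice the assembly function would be linked in its place.
- `l_mult` and `l_add` are local re-implementations modelled on the ETSI saturating basic operations used by the G.729 reference code. They are not the reference library itself.

```c
#include <assert.h>
#include <stdint.h>

/* Saturating operations modelled on the ETSI basic operators L_mult and
 * L_add (re-implemented locally for this sketch). */
static int32_t l_mult(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * b;
    if (p == 0x40000000) return INT32_MAX; /* -32768 * -32768 saturates */
    return p * 2;                          /* fractional (Q15 x Q15) product */
}

static int32_t l_add(int32_t a, int32_t b)
{
    int64_t s = (int64_t)a + b;
    if (s > INT32_MAX) return INT32_MAX;
    if (s < INT32_MIN) return INT32_MIN;
    return (int32_t)s;
}

/* Reference: plain C dot product with saturating accumulation. */
static int32_t dot_ref(const int16_t *x, const int16_t *y, int n)
{
    int32_t acc = 0;
    for (int i = 0; i < n; i++) acc = l_add(acc, l_mult(x[i], y[i]));
    return acc;
}

/* Stand-in for an optimized version: unrolled by two, same summation order. */
static int32_t dot_opt(const int16_t *x, const int16_t *y, int n)
{
    int32_t acc = 0;
    int i = 0;
    for (; i + 1 < n; i += 2) {
        acc = l_add(acc, l_mult(x[i], y[i]));
        acc = l_add(acc, l_mult(x[i + 1], y[i + 1]));
    }
    if (i < n) acc = l_add(acc, l_mult(x[i], y[i]));
    return acc;
}

/* Small deterministic generator so failing vectors can be reproduced. */
static uint32_t lcg_state = 12345u;
static int16_t rand16(void)
{
    lcg_state = lcg_state * 1103515245u + 12345u;
    return (int16_t)((int32_t)((lcg_state >> 16) & 0xFFFFu) - 32768);
}

/* Runs `trials` vectors of length n (n <= 64) through both versions and
 * returns how many results differ. The first trial is the saturation
 * corner case in which every input is -32768; the rest are pseudo-random. */
static int count_mismatches(int trials, int n)
{
    int16_t x[64], y[64];
    int bad = 0;
    assert(n > 0 && n <= 64);
    for (int t = 0; t < trials; t++) {
        for (int i = 0; i < n; i++) {
            x[i] = (t == 0) ? INT16_MIN : rand16();
            y[i] = (t == 0) ? INT16_MIN : rand16();
        }
        if (dot_ref(x, y, n) != dot_opt(x, y, n)) bad++;
    }
    return bad;
}
```

The stand-in keeps the original summation order on purpose. Saturating addition is not associative. Splitting the sum into independent partial accumulators, a common way to use parallel MAC units, can therefore change results once saturation occurs. A harness of this kind exposes such differences.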
2.6.3 Implementation Details
Details that must be considered when writing assembly code include processor restrictions, data alignment,
hardware loop alignment, and loop nesting order. StarCore restrictions (see the SC140 Core Reference Manual, section
6.4, Instruction Set Restrictions) and function calling conventions (see the SC100 Application Binary Interface
Reference Manual, section 2.3, Function Calling Conventions) must be considered with care. For multiple-register
move operations to and from memory, the memory addresses must meet certain alignment restrictions. There is no
assembly equivalent of the C assert() function, so alignment must be checked manually. If the starting address of a
hardware loop is not aligned (that is, the first execution set is spread over two fetch sets), one stall cycle is added
to every iteration of the loop. In intermediate development phases, loop alignment is specified with the
FALIGN or OPT LPA assembler directives; in the final version of the code, instructions are rearranged or dummy
instructions are inserted manually to ensure alignment. The order in which loops are nested is important because
the processor prioritizes hardware loops (loop3 has the highest priority and loop0 the lowest) and executes the
active loop with the highest priority. Therefore, hardware loops should be nested in reverse order of their
indices, so that the innermost loop uses the highest-numbered hardware loop.
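Alignment must be checked by hand. A C caller can at least verify the buffers it passes to an assembly routine. The same arithmetic also shows whether a loop's first execution set straddles a fetch-set boundary.

The sketch below is hypothetical, and several of its values are assumptions to be checked against the SC140 Core Reference Manual:

- The fetch set is 16 bytes (eight 16-bit words).
- Four-operand moves need 8-byte alignment.
- Addresses are byte addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FETCH_SET_BYTES 16u /* assumption: 128-bit SC140 fetch set */
#define MOVE4_ALIGN      8u /* assumption: alignment for four-operand moves */

/* Nonzero if p is aligned to `align` bytes (align must be a power of two). */
static int is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (uintptr_t)(align - 1)) == 0;
}

/* Nonzero if an execution set of `len` bytes starting at byte address `addr`
 * is spread over two fetch sets of `fetch` bytes each (the stall case). */
static int spans_two_fetch_sets(uint32_t addr, uint32_t len, uint32_t fetch)
{
    assert(len > 0 && len <= fetch);
    return addr % fetch + len > fetch;
}

/* Bytes of filler needed to move `addr` to the next fetch-set boundary
 * (0 if it is already on one). Starting a loop on a boundary is one way
 * to keep its first execution set inside a single fetch set. */
static uint32_t padding_to_boundary(uint32_t addr, uint32_t fetch)
{
    return (fetch - addr % fetch) % fetch;
}
```

In a C test build, place `assert(is_aligned(buf, MOVE4_ALIGN));` before the call into the assembly routine. It catches a misaligned buffer that the assembly code cannot report through assert(). Under the same assumptions, an 8-byte execution set starting 12 bytes into a fetch set straddles the boundary, because 12 + 8 > 16.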
2.6.4 Programming Tips
The following programming tips for assembly optimization are valid for all versions of the development tools;
some of these ideas also apply to C optimizations [13].
• Use multisample processing with fewer than four samples computed in parallel to reduce the number of
variables or inner loops.
• Translate ratio comparisons into product comparisons, because multiplication requires fewer cycles than
division: with positive denominators, a/b < c/d is equivalent to a·d < c·b. Combine 16-bit numerators and
denominators into a single 32-bit word, pairing MPYUS with MPYSU.
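Both tips can be illustrated in portable C.

- **Ratio test.** `ratio_less` replaces a/b < c/d with a·d < c·b. This is valid only when both denominators are positive.
- **Packed variant.** This is a hypothetical model, and it is an assumption about how the MPYUS/MPYSU pairing is used, not a model of those instructions. A signed 16-bit numerator sits in the high half of one 32-bit word and an unsigned 16-bit denominator in the low half. High halves are then multiplied by low halves.
- **Multisample.** The `corr_*` functions (hypothetical names) show a two-sample multisample correlation. Here 64-bit accumulators stand in for the wide DSP accumulators and ignore saturation.

```c
#include <assert.h>
#include <stdint.h>

/* num1/den1 < num2/den2 without division; requires den1, den2 > 0 so that
 * multiplying through by den1 * den2 keeps the direction of the inequality. */
static int ratio_less(int16_t num1, int16_t den1, int16_t num2, int16_t den2)
{
    assert(den1 > 0 && den2 > 0);
    return (int32_t)num1 * den2 < (int32_t)num2 * den1;
}

/* Hypothetical packing: signed numerator in bits 31..16, unsigned
 * denominator in bits 15..0. */
static uint32_t pack_ratio(int16_t num, uint16_t den)
{
    return ((uint32_t)(uint16_t)num << 16) | den;
}

static int32_t packed_num(uint32_t r)
{
    int32_t v = (int32_t)(r >> 16);       /* 0..65535 */
    return v >= 0x8000 ? v - 0x10000 : v; /* sign-extend the high half */
}

static int32_t packed_den(uint32_t r) { return (int32_t)(r & 0xFFFFu); }

/* Cross products of a signed high half by an unsigned low half; each one
 * fits in 32 bits because |num| <= 32768 and den <= 65535. */
static int packed_ratio_less(uint32_t r1, uint32_t r2)
{
    assert(packed_den(r1) > 0 && packed_den(r2) > 0);
    return packed_num(r1) * packed_den(r2) < packed_num(r2) * packed_den(r1);
}

/* Reference: y[k] = sum_i h[i] * x[k + i], one output per pass. */
static void corr_ref(const int16_t *x, const int16_t *h, int taps,
                     int n_out, int64_t *y)
{
    for (int k = 0; k < n_out; k++) {
        int64_t acc = 0;
        for (int i = 0; i < taps; i++) acc += (int32_t)h[i] * x[k + i];
        y[k] = acc;
    }
}

/* Two-sample multisample: each pass computes y[k] and y[k + 1]. Every
 * coefficient load feeds two products, and each input load is reused by
 * y[k] at the next tap. n_out must be even. */
static void corr_ms2(const int16_t *x, const int16_t *h, int taps,
                     int n_out, int64_t *y)
{
    assert(n_out % 2 == 0);
    for (int k = 0; k < n_out; k += 2) {
        int64_t acc0 = 0, acc1 = 0;
        int16_t x0 = x[k];                /* x[k + i] for the current tap */
        for (int i = 0; i < taps; i++) {
            int16_t c = h[i];
            int16_t x1 = x[k + i + 1];    /* x[k + 1 + i] */
            acc0 += (int32_t)c * x0;      /* contribution to y[k]     */
            acc1 += (int32_t)c * x1;      /* contribution to y[k + 1] */
            x0 = x1;                      /* reused by y[k] at tap i + 1 */
        }
        y[k] = acc0;
        y[k + 1] = acc1;
    }
}
```

Computing two outputs per pass instead of four keeps fewer accumulators and loaded values live at the same time. That is the reduction in variables the multisample tip refers to. The cost is less reuse of each loaded coefficient.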
ITU-T G.729 Implementation on the StarCore™ SC140/SC1400 Cores, Rev. 1
