AN2203
MPC7450 RISC Microprocessor Family Software Optimization Guide
Freescale Semiconductor / Motorola
Page 50 of 76


Other Optimizations Worth Investigating

4.4.2 Software Pipelining

With longer pipelines, more functional units, and a higher instruction issue rate, the MPC7450 can provide more instruction-level parallelism (ILP) than previous microprocessors. Loops that have long dependency chains may benefit from software pipelining. On those loops, software pipelining increases ILP by overlapping the execution of several iterations of the loop.
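As a rough illustration of the idea, the following C sketch restructures a loop whose iterations each form a load, multiply, accumulate chain so that the load for the next iteration is issued before the arithmetic of the current one. The function names (`scaled_sum_plain`, `scaled_sum_pipelined`) and the workload are assumptions made for this example and do not come from the application note; a compiler may already perform a similar transformation, and any benefit on the MPC7450 would need to be measured.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical baseline: every iteration is a serial chain
 * load -> multiply -> accumulate. */
uint32_t scaled_sum_plain(const uint32_t *a, size_t n, uint32_t k)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t x = a[i];   /* load */
        sum += x * k;        /* multiply and add depend on the load */
    }
    return sum;
}

/* Hypothetical software-pipelined form: the load for iteration i is
 * issued alongside the arithmetic for iteration i-1, so the two are
 * independent within one pass through the loop body. */
uint32_t scaled_sum_pipelined(const uint32_t *a, size_t n, uint32_t k)
{
    if (n == 0)
        return 0;

    uint32_t sum = 0;
    uint32_t x = a[0];              /* prologue: first load */
    for (size_t i = 1; i < n; i++) {
        uint32_t next = a[i];       /* load for iteration i */
        sum += x * k;               /* arithmetic for iteration i-1 */
        x = next;
    }
    sum += x * k;                   /* epilogue: last element */
    return sum;
}
```

Both functions return the same result for any input; the pipelined form trades a prologue and an epilogue for a loop body in which the load no longer sits at the head of the dependency chain.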

4.4.3 Loop Unrolling for Long Pipelines

Small-body inner loops may benefit from unrolling on the MPC7450 more than on prior microprocessors that implement the PowerPC architecture. Increasing the number of instructions in the loop body and reducing the number of times the loop executes minimizes possible stalls. The drawback of this technique is the larger instruction space required to hold the unrolled code. In some cases, increased code size can result in more instruction cache misses, which may cost more performance than the loop unrolling gained.

The costs of setting up and fixing up code may also affect the loop unrolling trade-off.

To further extend the code example first used in Section 3.1.1, “Fetching,” loop unrolling can be applied. Because every taken branch on the MPC7450 represents at least one cycle of lost fetch opportunity, it can often be more advantageous to unroll loops than it has been in the past. The following code assumes that the loop can be unrolled four times (that is, the loop size is evenly divisible by four) and that a value of loopsize/4 was previously loaded into the CTR (unlike the prior two examples, which assumed the loop size itself was loaded into the CTR).

xxxxxx00  loop:  lwzu r10,0x4(r9)
xxxxxx04         add  r11,r11,r10
xxxxxx08         lwzu r10,0x4(r9)
xxxxxx0C         add  r11,r11,r10
xxxxxx10         lwzu r10,0x4(r9)
xxxxxx14         add  r11,r11,r10
xxxxxx18         lwzu r10,0x4(r9)
xxxxxx1C         add  r11,r11,r10
xxxxxx20         bdnz loop

Table 4-1 shows that the fetch supply is no longer the bottleneck for the above code sequence. At this point, the limiting bottleneck becomes the single cache port. For this code, one effective iteration (lwzu/add) is completing per cycle. Loop unrolling doubles the performance of the aligned example case.

Table 4-1. MPC7450 Execution of One-Two Iterations of Code Loop Example

Instruction: lwzu (1), add (1), lwzu (2), add (2), lwzu (3), add (3), lwzu (4)
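For readers more comfortable with C than with PowerPC assembly, the sketch below is a loose source-level analogue of the unrolled lwzu/add loop. It is not taken from the application note: the function name `sum_unrolled4` is hypothetical, and the trailing remainder loop is an added assumption that illustrates the kind of fix-up code needed when the loop size is not evenly divisible by four.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical four-way unrolled array sum. The main loop runs n/4
 * times, mirroring the assumption that loopsize/4 is loaded into the
 * CTR; each pass performs four load/add pairs and takes one branch. */
uint32_t sum_unrolled4(const uint32_t *a, size_t n)
{
    uint32_t sum = 0;
    size_t i = 0;
    size_t blocks = n / 4;          /* trip count of the unrolled loop */

    for (size_t b = 0; b < blocks; b++) {
        sum += a[i];
        sum += a[i + 1];
        sum += a[i + 2];
        sum += a[i + 3];
        i += 4;
    }

    /* Fix-up code: handles the 0 to 3 elements left over when n is not
     * a multiple of four. This is part of the unrolling cost. */
    for (; i < n; i++)
        sum += a[i];

    return sum;
}
```

When the element count is known to be a multiple of four, as the assembly example assumes, the remainder loop never executes and could be omitted, which removes the fix-up cost at the price of generality.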
