SAA7115HL/V1,518 NXP Semiconductors, SAA7115HL/V1,518 Datasheet - Page 82

IC DIGITAL VIDEO DECODER 100LQFP

SAA7115HL/V1,518

Manufacturer Part Number
SAA7115HL/V1,518
Description
IC DIGITAL VIDEO DECODER 100LQFP
Manufacturer
NXP Semiconductors
Type
Video Decoder
Specifications of SAA7115HL/V1,518

Package / Case
100-LQFP
Applications
Set-Top Boxes
Mounting Type
Surface Mount
Mounting Style
SMD/SMT
Lead Free Status / RoHS Status
Lead free / RoHS Compliant
Voltage - Supply, Analog
-
Voltage - Supply, Digital
-
Other names
935270666518
SAA7115HLBE-T

Available stocks

Part Number:
SAA7115HL/V1,518
Manufacturer:
Sigma Designs Inc
Quantity:
10 000
PNX1300/01/02/11 Data Book
Philips Semiconductors
PRELIMINARY SPECIFICATION

overhead of the inner loop has been eliminated, further increasing the performance advantage.

4.4.2 More Unrolling

The code transformations of the previous section achieved impressive performance improvements, but given the VLIW nature of the PNX1300 CPU, more can be done to exploit PNX1300’s parallelism.

The code in Figure 4-12 has a loop containing only 4 operations (excluding loop overhead). Since PNX1300’s branches have a 3-instruction delay and each instruction can contain up to 5 operations, a fully utilized minimum-sized loop can contain 16 operations (20 minus loop overhead).

The PNX1300 compilation system performs a wide variety of powerful code transformation and scheduling optimizations to ensure that the VLIW capabilities of the CPU are exploited. It is still wise, however, to make program parallelism explicit in source code when possible. Explicit parallelism can only help the compiler produce a fast-running program.

To this end, we can unroll the loop of Figure 4-12 some number of times to create explicit parallelism and help the compiler create a fast-running loop. In this case, where the number of iterations is a power-of-two, it makes sense to unroll by a factor that is a power-of-two to create clean code.

Figure 4-15 shows the loop unrolled by a factor of eight. The compiler can apply common sub-expression elimination and other optimizations to eliminate extraneous operations in the array indexing, but, again, improvements in the source code can only help the compiler produce the best possible code and fastest-running program.

Figure 4-16 shows one way to modify the code for simpler array indexing.

Figure 4-14. The loop of Figure 4-13 recoded with 32-bit array accesses and the ume8uu custom operation.

    unsigned int *IA = (unsigned int *) A;
    unsigned int *IB = (unsigned int *) B;
    for (row = 0; row < 16; row += 1)
    {
        int rowoffset = row * 4;
        for (col4 = 0; col4 < 4; col4 += 1)
            cost += UME8UU(IA[rowoffset + col4], IB[rowoffset + col4]);
    }

Figure 4-15. Unrolled version of Figure 4-12. This code makes good use of PNX1300’s VLIW capabilities.

    unsigned int *IA = (unsigned int *) A;
    unsigned int *IB = (unsigned int *) B;
    for (i = 0; i < 64; i += 8)
    {
        cost0 = UME8UU(IA[i+0], IB[i+0]);
        cost1 = UME8UU(IA[i+1], IB[i+1]);
        cost2 = UME8UU(IA[i+2], IB[i+2]);
        cost3 = UME8UU(IA[i+3], IB[i+3]);
        cost4 = UME8UU(IA[i+4], IB[i+4]);
        cost5 = UME8UU(IA[i+5], IB[i+5]);
        cost6 = UME8UU(IA[i+6], IB[i+6]);
        cost7 = UME8UU(IA[i+7], IB[i+7]);
        cost += cost0 + cost1 + cost2 +
                cost3 + cost4 + cost5 +
                cost6 + cost7;
    }

Figure 4-16. Code from Figure 4-15 with simplified array index calculations.

    unsigned char A[16][16];
    unsigned char B[16][16];
    .
    .
    .
    unsigned int *IA = (unsigned int *) A;
    unsigned int *IB = (unsigned int *) B;
    for (i = 0; i < 64; i += 8, IA += 8, IB += 8)
    {
        cost0 = UME8UU(IA[0], IB[0]);
        cost1 = UME8UU(IA[1], IB[1]);
        cost2 = UME8UU(IA[2], IB[2]);
        cost3 = UME8UU(IA[3], IB[3]);
        cost4 = UME8UU(IA[4], IB[4]);
        cost5 = UME8UU(IA[5], IB[5]);
        cost6 = UME8UU(IA[6], IB[6]);
        cost7 = UME8UU(IA[7], IB[7]);
        cost += cost0 + cost1 + cost2 +
                cost3 + cost4 + cost5 +
                cost6 + cost7;
    }
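
The listings in Figures 4-14 to 4-16 depend on the UME8UU custom operation, which exists only on the target CPU. As a rough way to check off-target that the nested loop and the unrolled, pointer-bumped loop compute the same match cost, the sketch below substitutes a portable stand-in. It assumes UME8UU returns the sum of absolute differences of the four unsigned bytes packed in each 32-bit operand. The names `ume8uu_ref`, `block_t`, `cost_bytes`, `cost_nested`, `cost_unrolled` and `costs_agree` are hypothetical helpers introduced for this illustration; they are not part of the Data Book or the PNX1300 toolchain.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical portable stand-in for the UME8UU custom operation (assumption):
   sum of absolute differences of the four unsigned bytes in a and b. */
static unsigned ume8uu_ref(uint32_t a, uint32_t b)
{
    unsigned sum = 0;
    for (int k = 0; k < 4; k++) {
        int x = (int)((a >> (8 * k)) & 0xFF);
        int y = (int)((b >> (8 * k)) & 0xFF);
        sum += (unsigned)abs(x - y);
    }
    return sum;
}

/* A 16x16 byte block that can also be viewed as 64 32-bit words.
   The union replaces the pointer casts used in the figures. */
typedef union {
    unsigned char bytes[16][16];
    uint32_t      words[64];
} block_t;

/* Byte-at-a-time reference cost. */
static unsigned cost_bytes(const block_t *A, const block_t *B)
{
    unsigned cost = 0;
    for (int row = 0; row < 16; row++)
        for (int col = 0; col < 16; col++)
            cost += (unsigned)abs((int)A->bytes[row][col] - (int)B->bytes[row][col]);
    return cost;
}

/* Nested word loop, in the style of Figure 4-14. */
static unsigned cost_nested(const block_t *A, const block_t *B)
{
    const uint32_t *IA = A->words, *IB = B->words;
    unsigned cost = 0;
    for (int row = 0; row < 16; row++) {
        int rowoffset = row * 4;
        for (int col4 = 0; col4 < 4; col4++)
            cost += ume8uu_ref(IA[rowoffset + col4], IB[rowoffset + col4]);
    }
    return cost;
}

/* Unrolled by eight with pointer bumping, in the style of Figure 4-16. */
static unsigned cost_unrolled(const block_t *A, const block_t *B)
{
    const uint32_t *IA = A->words, *IB = B->words;
    unsigned cost = 0;
    for (int i = 0; i < 64; i += 8, IA += 8, IB += 8) {
        unsigned cost0 = ume8uu_ref(IA[0], IB[0]);
        unsigned cost1 = ume8uu_ref(IA[1], IB[1]);
        unsigned cost2 = ume8uu_ref(IA[2], IB[2]);
        unsigned cost3 = ume8uu_ref(IA[3], IB[3]);
        unsigned cost4 = ume8uu_ref(IA[4], IB[4]);
        unsigned cost5 = ume8uu_ref(IA[5], IB[5]);
        unsigned cost6 = ume8uu_ref(IA[6], IB[6]);
        unsigned cost7 = ume8uu_ref(IA[7], IB[7]);
        cost += cost0 + cost1 + cost2 + cost3 +
                cost4 + cost5 + cost6 + cost7;
    }
    return cost;
}

/* Fills two blocks with a deterministic pattern and returns 1 when the
   three cost functions agree, 0 otherwise. */
static int costs_agree(void)
{
    block_t A, B;
    for (int row = 0; row < 16; row++)
        for (int col = 0; col < 16; col++) {
            A.bytes[row][col] = (unsigned char)(row * 16 + col);
            B.bytes[row][col] = (unsigned char)(255 - (row * 7 + col * 3));
        }
    unsigned ref = cost_bytes(&A, &B);
    return ref == cost_nested(&A, &B) && ref == cost_unrolled(&A, &B);
}
```

On the target, the stand-in would be replaced by the real UME8UU operation. The union is used here only so the sketch does not rely on the alignment and aliasing behaviour of casting `unsigned char` arrays to `unsigned int *` in portable C; the eight independent `costN` temporaries mirror the explicit parallelism the text recommends exposing to the compiler.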
