tm1300 NXP Semiconductors, tm1300 Datasheet - Page 71

Manufacturer Part Number: tm1300
Description: TM-1300 Media Processor
Manufacturer: NXP Semiconductors

Philips Semiconductors
void reconstruct (unsigned char *back,
                  unsigned char *forward,
                  char *idct,
                  unsigned char *destination)
{
   int i, temp;

   for (i = 0; i < 64; i += 1)
   {
      /* Rounded average of the two reference pixels plus the IDCT value. */
      temp = ((back[i] + forward[i] + 1) >> 1) + idct[i];

      /* Saturate the result to the unsigned eight-bit range. */
      if (temp > 255)
         temp = 255;
      else if (temp < 0)
         temp = 0;

      destination[i] = temp;
   }
}

Figure 4-4. Straightforward code for MPEG frame reconstruction.

A straightforward coding of the reconstruction algorithm might look as shown in Figure 4-4. This implementation shares many of the undesirable properties of the first example of byte-matrix transposition. The code accesses memory a byte at a time instead of a word at a time, which wastes 75% of the available bandwidth. Also, in light of the many quad-byte-parallel operations introduced in Section 4.1.2, "Introduction to Custom Operations," it seems inefficient to spend three separate additions and one shift to process a single eight-bit pixel. Perhaps even more unfortunate for a VLIW processor like TM1300 is the branch-intensive code that performs the saturation testing; eliminating these branches could reap a significant performance gain.

Since MPEG decoding is the kind of task for which TM1300 was created, there are two custom operations, quadavg and dspuquadaddui, that exactly fit this important MPEG kernel (and other kernels). These custom operations process four pairs of 8-bit pixel values in parallel. In addition, dspuquadaddui performs saturation tests in hardware, which eliminates any need to execute explicit tests and branches.

For readers familiar with the details of MPEG algorithms, the use of eight-bit IDCT values later in this example may be confusing. The standard MPEG implementation calls for nine-bit IDCT values, but extensive analysis has shown that values outside the range [-128..127] occur so rarely that they can be considered unimportant. Pursuant to this observation, the IDCT values are clipped into the eight-bit range [-128..127] with saturating arithmetic before the frame reconstruction code runs. The assumption that this saturation occurs permits some of TM1300's custom operations to have clean, simple definitions.

The first step in seeing how custom operations can be of value in this case is to unroll the loop by a factor of four. The unrolled code is shown in Figure 4-5. This creates code that is parallel with respect to the four pixel computations. As is easily seen in the code, the four groups of computations (one group per pixel) do not depend on each other.
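To illustrate the unrolling step described above, the following is a minimal sketch of what a loop unrolled by a factor of four could look like. It is not a reproduction of Figure 4-5: the function name reconstruct_unrolled, the helper clip_pixel, and the explicit use of signed char for the IDCT values are assumptions made here so that the example is self-contained.

```c
/* Hypothetical helper: saturate an intermediate value to [0, 255]. */
static int clip_pixel(int v)
{
    if (v > 255)
        return 255;
    if (v < 0)
        return 0;
    return v;
}

/* Sketch only: the byte-at-a-time loop unrolled by four.
 * Each of the four groups below handles one pixel and uses no
 * value produced by any other group. */
void reconstruct_unrolled(const unsigned char *back,
                          const unsigned char *forward,
                          const signed char *idct,
                          unsigned char *destination)
{
    int i;

    for (i = 0; i < 64; i += 4) {
        /* First statement of each group: rounded pixel average
         * plus the IDCT value. */
        int t0 = ((back[i + 0] + forward[i + 0] + 1) >> 1) + idct[i + 0];
        int t1 = ((back[i + 1] + forward[i + 1] + 1) >> 1) + idct[i + 1];
        int t2 = ((back[i + 2] + forward[i + 2] + 1) >> 1) + idct[i + 2];
        int t3 = ((back[i + 3] + forward[i + 3] + 1) >> 1) + idct[i + 3];

        /* Saturation and store, still one pixel at a time. */
        destination[i + 0] = (unsigned char)clip_pixel(t0);
        destination[i + 1] = (unsigned char)clip_pixel(t1);
        destination[i + 2] = (unsigned char)clip_pixel(t2);
        destination[i + 3] = (unsigned char)clip_pixel(t3);
    }
}
```

Written this way, the four averages and the four saturating additions line up as independent byte-wide computations, which is the pattern a four-lane operation can absorb.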
After some experience is gained with custom operations, it is not necessary to unroll loops to discover situations where custom operations are useful. Often, a good programmer with knowledge of the function of the custom operations can spot opportunities to exploit them by simple inspection.

To understand how quadavg and dspuquadaddui can be used in this code, we examine the function of these custom operations.
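As a reading aid for the formal definitions in this section, the two operations can be modeled in portable C. This is a sketch under stated assumptions, not TM1300 code: the names quadavg_model, dspuquadaddui_model, and uclipi_model are hypothetical, and uclipi(x, 255) is assumed to clamp x to the range [0, 255], consistent with the saturation test in Figure 4-4. Because every byte lane is processed independently, the model does not depend on which byte of the register holds which pixel.

```c
#include <stdint.h>

/* Assumed semantics of uclipi(x, hi): clamp x to [0, hi]. */
static int32_t uclipi_model(int32_t x, int32_t hi)
{
    if (x < 0)
        return 0;
    if (x > hi)
        return hi;
    return x;
}

/* Model of quadavg: in each byte lane, (a + w + 1) >> 1 on
 * unsigned eight-bit values. The result always fits in eight bits. */
uint32_t quadavg_model(uint32_t rsrc1, uint32_t rsrc2)
{
    uint32_t rdest = 0;
    int shift;

    for (shift = 0; shift < 32; shift += 8) {
        uint32_t a = (rsrc1 >> shift) & 0xFFu;
        uint32_t w = (rsrc2 >> shift) & 0xFFu;
        rdest |= ((a + w + 1u) >> 1) << shift;
    }
    return rdest;
}

/* Model of dspuquadaddui: in each byte lane, add an unsigned
 * eight-bit value from rsrc1 to a signed eight-bit value from
 * rsrc2 and clamp the sum to [0, 255]. */
uint32_t dspuquadaddui_model(uint32_t rsrc1, uint32_t rsrc2)
{
    uint32_t rdest = 0;
    int shift;

    for (shift = 0; shift < 32; shift += 8) {
        int32_t e = (int32_t)((rsrc1 >> shift) & 0xFFu);
        int32_t s = (int32_t)((rsrc2 >> shift) & 0xFFu);
        if (s > 127)
            s -= 256;   /* reinterpret the lane as signed eight-bit */
        rdest |= (uint32_t)uclipi_model(e + s, 255) << shift;
    }
    return rdest;
}
```

In this model, one word of four reconstructed pixels would be dspuquadaddui_model(quadavg_model(back4, forward4), idct4), where back4, forward4, and idct4 are hypothetical names for words that each hold four packed bytes. For example, quadavg_model(0x01FF0010, 0x02FF0011) yields 0x02FF0011.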
The quadavg custom operation performs pixel averaging on four pairs of pixels in parallel. Formally, the operation of quadavg is as follows:

   quadavg rsrc1 rsrc2 -> rdest

takes arguments in registers rsrc1 and rsrc2, and it computes a result into register rdest. Let rsrc1 = [abcd], rsrc2 = [wxyz], and rdest = [pqrs], where a, b, c, d, w, x, y, z, p, q, r, and s are all unsigned eight-bit values. Then, quadavg computes the output vector [pqrs] as follows:

   p = (a + w + 1) >> 1
   q = (b + x + 1) >> 1
   r = (c + y + 1) >> 1
   s = (d + z + 1) >> 1

The pixel averaging in Figure 4-5 is evident in the first statement of each of the four groups of statements. The rest of the code (adding the idct[i] value and performing the saturation test) can be performed by the dspuquadaddui operation. Formally, its function is as follows:

   dspuquadaddui rsrc1 rsrc2 -> rdest

takes arguments in registers rsrc1 and rsrc2, and it computes a result into register rdest. Let rsrc1 = [efgh], rsrc2 = [stuv], and rdest = [ijkl], where e, f, g, h, i, j, k, and l are unsigned 8-bit values; s, t, u, and v are signed 8-bit values. Then, dspuquadaddui computes the output vector [ijkl] as follows:

   i = uclipi(e + s, 255)
   j = uclipi(f + t, 255)
   k = uclipi(g + u, 255)
   l = uclipi(h + v, 255)

The uclipi operation is defined in this case as it is for the separate TM1300 operation of the same name described in Appendix A, "DSPCPU Operations for TM1300." Its definition is as follows:
