rsc-464 ETC-unknow, rsc-464 Datasheet - Page 4


rsc-464

Manufacturer Part Number: rsc-464
Description: Speech Recognition Processor
Manufacturer: ETC-unknow




RSC-464
Speech Technologies
Speech Recognition
The RSC-464 is designed to operate in tandem with the FluentChip™ technology library, including speaker
independent (SI), speaker dependent (SD), and speaker verification (SV) speech recognition. Combinations of
these technologies may be used to create applications that are rich in features. These are described below:

Speech and Music Synthesis
The RSC-464 provides high-quality speech compression using Sensory SX™ technology. One may select various
data rates from approximately 2.4 to 10.8Kbps to manage speech quality versus allotted memory. The highest data
rates use 16 kHz sample rates to provide high-quality reproduction of high-pitched voices. Speech and sound
effects may also be compressed using 8-bit PCM (64Kbps) or 4-bit ADPCM (32Kbps) technologies.
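
The trade-off between data rate and memory can be estimated with simple arithmetic. The following is a minimal sketch, not a Sensory tool: the function name and the 128 KiB storage budget are assumptions chosen for illustration, and the SX figures are the approximate end points quoted above. It assumes 1 Kbps = 1000 bits per second (consistent with 8-bit PCM at 64Kbps being 8000 samples per second) and ignores any file or framing overhead.

```python
# Illustrative arithmetic only; the memory size below is an assumed example.
RATES_KBPS = {
    "SX (lowest)": 2.4,    # approximate, from the text above
    "SX (highest)": 10.8,  # approximate, from the text above
    "4-bit ADPCM": 32.0,
    "8-bit PCM": 64.0,
}


def playback_seconds(memory_bytes: int, rate_kbps: float) -> float:
    """Seconds of audio that fit in memory_bytes at rate_kbps (1 Kbps = 1000 bit/s)."""
    return memory_bytes * 8 / (rate_kbps * 1000)


if __name__ == "__main__":
    memory_bytes = 128 * 1024  # hypothetical storage budget
    for name, rate in RATES_KBPS.items():
        print(f"{name:>13}: {playback_seconds(memory_bytes, rate):7.1f} s")
```

Under these assumptions, halving the data rate doubles the playback time available in a given memory, which is the quality-versus-memory choice described above.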
The RSC-464 also provides eight-voice wavetable music synthesis, which allows multiple simultaneous
instruments for harmonizing. The RSC-464 uses a MIDI-like system to generate music. One or more of the eight
voices may be speech playback instead of music. One or more of the eight voices may be a drum track comprising
multiple drums. In effect, drum tracks allow the number of simultaneous instruments to exceed 8.
Speech and Music data may be stored in on-chip ROM. Speech data may alternatively be stored in off-chip serial
data ROM or serial data Flash for extended durations.
Easy-to-use tools allow developers to record and compress their own voice talent with the push of a
button, or to create their own MIDI scores and instruments.
Record and Playback
The RSC-464 can perform speech record and playback (sometimes called “voice memo”) using either 8 bits
(64Kbps) or 4 bits (32Kbps) per sample, depending on the quantity and quality of playback desired. The record and
playback technology also optionally performs silence removal to reduce memory requirements.
External serial Flash or SRAM is required to store the compressed speech.
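
The idea behind silence removal can be sketched as follows. This is a generic energy-threshold approach shown for illustration only; the data sheet does not describe how the RSC-464 detects silence, and the function name, frame length, and threshold are all assumptions.

```python
def remove_silence(samples, frame_len=80, threshold=4):
    """Drop whole frames whose mean absolute amplitude is below threshold.

    samples: sequence of signed integer PCM samples (silence is near 0).
    frame_len, threshold: hypothetical tuning values, not device parameters.
    Returns (kept_samples, dropped_frame_count).
    """
    kept = []
    dropped = 0
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        level = sum(abs(s) for s in frame) / len(frame)
        if level < threshold:
            dropped += 1        # quiet frame: do not store it
        else:
            kept.extend(frame)  # audible frame: keep it unchanged
    return kept, dropped
```

A practical scheme would presumably also record the length of each removed gap so that playback timing can be restored; this sketch omits that step.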
Speaker Independent recognition requires no user training. The RSC-464 can recognize up to 15 commands in
an active set (number of sets is limited only by internal ROM size). Text-to-SI (T2SI), based on a hybrid of
Hidden Markov Modeling and Neural Net technologies, allows creation of accurate SI recognition sets in
seconds. SI requires on-chip ROM.
Speaker Dependent recognition allows the user to create names for products or customize recognition sets. SD
is implemented with DTW (dynamic time warping) pattern matching technology. SD requires programmable
memory to store the personalized speech templates (trained patterns); this may be on-chip SRAM, or off-chip
serial EEPROM, Flash memory, or SRAM. Up to 50 templates can be recognized in an active set (the number of
unique sets is limited only by programmable memory capacity). The RSC-464 can store 1 SD template in on-chip
SRAM.
Speaker Verification enables the RSC-464 to authenticate when a previously trained password is spoken by the
target user. SV is also implemented with DTW technology. One SV template can be stored in on-chip SRAM, or
more with external programmable memory such as that described for SD above.
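
Both SD and SV are described above as DTW pattern matching. The following is a minimal sketch of textbook dynamic time warping, not the FluentChip implementation: the feature representation (short tuples of floats), the function names, and the verification threshold are assumptions. SD is modeled as picking the closest template in the active set, and SV as accepting an utterance when its distance to the single trained template falls below a threshold.

```python
import math

MAX_ACTIVE_TEMPLATES = 50  # active-set limit stated in the text above


def dtw_distance(a, b):
    """Textbook DTW cost between two sequences of equal-dimension feature vectors."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(utterance, templates):
    """SD sketch: return (name, distance) of the closest template in the active set."""
    if len(templates) > MAX_ACTIVE_TEMPLATES:
        raise ValueError("active set exceeds 50 templates")
    scored = ((name, dtw_distance(utterance, t)) for name, t in templates.items())
    return min(scored, key=lambda pair: pair[1])


def verify(utterance, template, threshold):
    """SV sketch: accept when the utterance is close enough to the trained template."""
    return dtw_distance(utterance, template) <= threshold
```

Because DTW aligns sequences of different lengths, a word spoken more slowly than its template can still match with a low cost, which is why the technique suits user-trained templates.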
Word Spotting enables the RSC-464 to spot a specific word surrounded by other speech within a phrase. This
can be quite effective when the user's response may vary (e.g., spotting “telephone” in the phrases “ummm
telephone”, or “telephone call”). This option is available for SI and SD.
Continuous Listening allows the chip to continuously listen for a specific word. This may be used as a trigger
word to request a device to listen for a command. This option is available for SI and SD.
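
The trigger-word flow described above can be sketched as a two-state loop. This is an illustrative model only: the function name is hypothetical, and the list of already recognized words stands in for the recognizer's output rather than any real FluentChip interface.

```python
def commands_after_trigger(words, trigger, commands):
    """Yield each command that directly follows the trigger word.

    words: iterable of recognized words (stand-in for recognizer output).
    trigger: the word the device listens for continuously.
    commands: the active command set accepted after the trigger.
    """
    armed = False  # True once the trigger word has been heard
    for word in words:
        if armed:
            if word in commands:
                yield word
            armed = False  # one attempt per trigger, then resume listening
        elif word == trigger:
            armed = True
```

For example, with the trigger "computer" and the commands "lights" and "music", the word "music" is ignored unless it immediately follows "computer".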
P/N 80-0282-A
Preliminary Data Sheet
© 2005 Sensory Inc.
