Home Automation EZine
EMagazine
Volume 6 Issue 6
Dec 2001 / Jan 2002

Home Toys Article
- December 2001 -


DSP Controllers Bring New Level of Functionality to HVAC and Security Systems

by Richard Mensik, System Application Engineer, Czech Systems Laboratories, Motorola Semiconductor Products Sector


As the digital home becomes a reality, control mechanisms for everyday appliances are growing beyond the traditional touch screens, keypads and remote control units to include voice command. Voice recognition systems can now be added easily and inexpensively to practically any new home appliance. While home automation solutions vary broadly in sophistication and cost, HVAC (heating/ventilation/air conditioning) and lighting control for added convenience and security are expected to be some of the most cost-effective and commonplace applications.

Digital signal processing capabilities integrated with microcontroller (MCU) functionality enable voice control capability to be added to embedded systems (computers that are generally a combination of hardware and software designed to perform a dedicated function and are often part of a larger system or product). MCUs are today used in thousands of electronic products and systems in which many decisions or calculations are required. Such systems monitor and control everything from spacecraft to robots and factory equipment, home appliances, security systems, automobiles, VCRs and TVs, cellular telephones and personal digital assistants -- virtually any electronic device used in our everyday lives.

A new breed of digital signal processor (DSP) controllers has emerged that provides the performance required for real-time speech processing as well as the traditional MCU features needed for control functions. It is now quite feasible for original equipment manufacturers (OEMs) to add voice control to any new embedded system that contains a DSP controller. The question now is, "Exactly what features are expected from DSP controllers to perform embedded voice control?"

This article addresses a design solution for enabling voice-activated HVAC and lighting systems based on Motorola's 56805 DSP controller and outlines the features and benefits that can be achieved by adding voice-activation capability to these systems. The solution described allows simple voice control featuring speaker-dependent activation for four users (for basic security), remote control by telephone, and back-up control mechanisms, such as manual switches and keypads (in case of a noisy environment).

"Voice control" usually implies that a system recognizes only a limited command set rather than fluent speech. A limited command set markedly decreases memory and DSP controller performance requirements compared to fluent speech recognition. A key issue to address with regard to embedded voice control is speaker dependence. Generally, speaker independence needs more memory and more processor performance. This article focuses on speaker-dependent systems, which are more suitable for embedding; in addition, speaker dependency is sometimes required to improve the security of the system.

Two basic choices for recognition algorithms are Dynamic Time Warping (DTW) and Hidden Markov Models (HMM). DTW places fewer demands on hardware. HMM is more complex and enables better recognition scores, but needs more speech data in the "training" phase. DTW is used in the design described below.
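To make the DTW idea concrete, here is a minimal sketch in Python. It is an illustration only, not the recognition software used in the design: the function names, the Euclidean frame distance and the unconstrained warping path are all assumptions, and a production version would be written in C for the DSP controller. Each utterance is taken to be a list of feature vectors, one per speech frame.

```python
import math

def frame_distance(a, b):
    """Euclidean distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_distance(template, utterance):
    """Accumulated cost of the cheapest time alignment of two frame sequences.

    Only two rows of the DTW matrix are kept at a time, which limits the
    RAM the algorithm needs.
    """
    n, m = len(template), len(utterance)
    inf = float("inf")
    prev = [inf] * (m + 1)
    prev[0] = 0.0  # empty prefix aligned with empty prefix costs nothing
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = frame_distance(template[i - 1], utterance[j - 1])
            # Extend the cheapest of: step in template, step in utterance, step in both.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]

def recognize(templates, utterance):
    """Return the name of the stored template closest to the utterance."""
    return min(templates, key=lambda name: dtw_distance(templates[name], utterance))
```

Because the alignment is elastic, a command spoken more slowly than during training can still match its template closely. A real system would typically also reject utterances whose best distance exceeds some threshold; that step is omitted here.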

In spite of the simplification from a limited command set and speaker dependence, memory remains a limiting factor. Using common speech parametrization, approximately 1K x 16 per command (assuming about 1 sec length) is usually needed. The speech data is created by the user during the training of the recognition system. Since it must be stored permanently, using Flash memory is recommended. The recognition software could be stored in ROM, but speech recognition algorithms are still under steady development and upgrades should be expected. RAM will be used by the input buffers (with size equal to the speech frame size) and the buffer for the recognition algorithm (DTW matrix). Assuming the system will use 30 commands, one user, and a sampling frequency of 8 kHz, a rough estimate of data memory consumption can be obtained: 32K x 16 of Flash (30 commands at roughly 1K words each, rounded up) and 8K x 16 of RAM. Using the Philips chip, HelloIC, a minimal program memory consumption of about 32K x 16 can be expected; however, this value strongly depends on the processor instruction set and compiler efficiency.

It is difficult to specify requirements for DSP controller performance because of the common problem of measuring processor performance. Traditional units such as MIPS (million instructions per second) or MACS (multiply accumulates per second) do not cover all performance aspects. Generally, a DSP controller delivering 20 MIPS will be sufficient for most such applications. The dependency of recognition scores on processor performance is unambiguous: more performance allows the system to reach better recognition scores. The most suitable method to measure processor performance is algorithm kernel benchmarking. Relevant algorithms for speech recognition include the real Fast Fourier Transform (FFT), computation of Linear Predictive Coding (LPC) coefficients, and FIR filtering. Benchmarking results for these computations determine the chip's capability to perform real-time parametrization of the speech signal. This capability is pivotal for digital speech processing.

Some semiconductor manufacturers have announced new, enhanced cores that will dramatically increase performance of their DSP controller product lines. On-chip peripherals are shared between the existing and new line of DSP controllers so that application software will need minimal or no modifications (assuming it was not written in assembly language) and developers can focus strictly on improving voice recognition algorithms.

In choosing a DSP controller, data width and type of processor arithmetic should be considered. Generally, better precision (greater data width, floating point arithmetic) improves the voice recognition score, but if a more cost-effective solution is required, the optimal data width of the processor will probably be 16 bits. The on-chip peripherals on the DSP controller should include an analog-to-digital converter (ADC), general purpose I/O pins (GPIO) and other communication interfaces, such as a serial communication interface, I2C, IrDA, etc. The minimum precision of the ADC should be 8 bits with a minimum sampling frequency of 8 kHz. The number of required ADC channels depends on the specific system being designed. (DSP controllers for motor control usually have more ADC channels, 8 to 16, and a high sampling frequency relative to speech processing.)

Most speech recognition software is written in C, so the availability of a C compiler for the chosen DSP controller is almost mandatory. Considering the software complexity, an integrated development environment is also highly recommended. Several companies have already released software products for embedded speech recognition. These are optimized for embedded platforms (size, performance), but usually need some adaptation for the specific DSP controller architecture being used in the embedded system. Some systems adapted to multiple platforms already exist (e.g. VoCon by Philips), but they cover only small parts of the DSP controller market, so the adaptation process is faster when the DSP controller has its own software development kit (SDK) or a signal processing library optimized for its specific core.

Incorporation of embedded voice control capability is described below for lighting and HVAC systems. This solution can be re-purposed for other types of embedded systems that require voice control with a limited command set. However, one notable limitation is the voice control of audio/TV systems, where it is difficult to separate the user's spoken commands from the other audio signals.

The Motorola 56805 DSP controller used in the proposed design has 8 ADC channels (time multiplexed) with 12-bit precision and 32K x 16-bit words of integrated Flash program memory, which is sufficient to contain the voice control program. Data memory must be external because there are no DSP controllers available in the market today that have sufficient on-chip data memory for speech recognition.

The consumed performance bandwidth can be estimated by measuring the execution time of the FFT, which is the most time-consuming operation and must be performed in real time. If an 8 kHz sampling frequency is used with 12-bit quantization (stored in 16 bits) and 16 msec segmentation with 50 percent overlap, 125 frames per second will need to be processed. If a common speech parametrization, e.g. cepstral coefficients, is used, we will process one FFT and one inverse FFT on each 128-point frame (corresponding to 16 msec at 8 kHz). This computation consumes about 50,000 clock cycles on the 56805 (using Motorola's DSP56800 Software Development Kit signal processing library). If an additional 50 percent is kept in reserve for other computations, we will need 50,000 * 1.5 * 125 (frames) = 9,375,000 clock cycles per second, which means about 12 percent utilization at an 80 MHz clock frequency. The remaining capacity can be utilized later by more sophisticated speech processing algorithms.
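The framing, the FFT/inverse-FFT parametrization and the cycle budget above can be sketched as follows. This is a hedged illustration in Python, not the SDK routine: the helper names are hypothetical, a plain DFT stands in for the optimized FFT, and the "real cepstrum" shown (inverse transform of the log magnitude spectrum) is only one common form of cepstral parametrization. The figure of 50,000 cycles per frame is taken from the text.

```python
import cmath
import math

SAMPLE_RATE_HZ = 8000
FRAME_MS = 16
FRAME_LEN = SAMPLE_RATE_HZ * FRAME_MS // 1000   # 128 samples per frame
HOP = FRAME_LEN // 2                            # 50 percent overlap -> 64 samples
FRAMES_PER_SECOND = SAMPLE_RATE_HZ // HOP       # 125 frames per second

def split_frames(samples):
    """Cut a sample buffer into overlapping frames of FRAME_LEN samples."""
    return [samples[i:i + FRAME_LEN]
            for i in range(0, len(samples) - FRAME_LEN + 1, HOP)]

def dft(values, inverse=False):
    """Plain O(N^2) discrete Fourier transform (stand-in for an optimized FFT)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(v * cmath.exp(sign * 2j * math.pi * k * t / n)
                  for t, v in enumerate(values))
        out.append(acc / n if inverse else acc)
    return out

def real_cepstrum(frame, n_coeffs=8):
    """First n_coeffs of the inverse transform of the log magnitude spectrum."""
    spectrum = dft(frame)
    # A small offset avoids log(0) for empty spectral bins.
    log_magnitude = [math.log(abs(s) + 1e-12) for s in spectrum]
    return [c.real for c in dft(log_magnitude, inverse=True)[:n_coeffs]]

# Cycle budget, using the figures quoted in the text.
CYCLES_PER_FRAME = 50_000
RESERVE_FACTOR = 1.5
CLOCK_HZ = 80_000_000
cycles_per_second = int(CYCLES_PER_FRAME * RESERVE_FACTOR * FRAMES_PER_SECOND)
utilization = cycles_per_second / CLOCK_HZ      # fraction of the 80 MHz clock
```

With these numbers the budget comes to 9,375,000 cycles per second, or just under 12 percent of the clock.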

It is recommended that OEM system designers use speech recognition software from a reputable independent software vendor due to the complexity of this type of software. OEM designers generally do not want to develop their own speech recognition algorithms; rather, they want the algorithms they use to be noise-resistant so as to achieve a high recognition score in a wide class of environments. A secondary issue is the recognition score itself. If it is necessary to reliably prove that the recognition score is, for example, higher than 99 percent, then it will be necessary to perform several hundred recognitions with different speakers in different environments. This evaluation is usually performed using a speech database stored on disk or CD, so additional hardware is needed.
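A rough calculation shows why "several hundred" recognitions is a sensible lower bound. The sketch below is an assumption-laden illustration, not part of the original design: it treats every recognition as an independent trial and asks how many consecutive error-free trials are needed before a score no better than the target can be ruled out at a chosen confidence level. The function name and the 95 percent default are hypothetical choices.

```python
import math

def trials_needed(target_score, confidence=0.95):
    """Smallest number of error-free trials supporting a score above target_score.

    If the true score were only target_score, n successes in a row would occur
    with probability target_score ** n. We require that probability to be at
    most 1 - confidence.
    """
    return math.ceil(math.log(1 - confidence) / math.log(target_score))

trials_for_99_percent = trials_needed(0.99)
```

Under these assumptions, a 99 percent target needs roughly 300 error-free recognitions; any observed errors, or a requirement to cover several speakers and environments separately, pushes the number higher.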

As mentioned earlier, memory size may restrict the voice command set size. If we use the speech parametrization from the above-mentioned example with 8 coefficients per speech frame, we will need 1,000 data words per second of speech (125 frames per second x 8 coefficients). Assuming a minimal command set of about 30 seconds of speech, we will need 30,000 data words. The 56805 can directly address 2 * 64 K x 16-bit data words.

The command set must be designed with respect to the speaker dependence of the algorithm being used. Assuming that the proposed system will be used by four people, the command set could be the following: "light", "dark", "heat", "cold" (recorded four times, once by each speaker), and the numbers zero through nine, "time," and "temperature" (recorded by one authorized person). Speaker dependence is an advantage when controlling devices by phone.
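The size of this command set can be checked against the memory estimate with a short sketch. The speaker labels and function names below are hypothetical; the figures of 125 frames per second, 8 coefficients per frame and about one second per command come from the text.

```python
SPEAKERS = ["speaker1", "speaker2", "speaker3", "speaker4"]  # hypothetical labels
PER_SPEAKER_COMMANDS = ["light", "dark", "heat", "cold"]
SHARED_COMMANDS = ["zero", "one", "two", "three", "four", "five", "six",
                   "seven", "eight", "nine", "time", "temperature"]

FRAMES_PER_SECOND = 125
COEFFS_PER_FRAME = 8
SECONDS_PER_COMMAND = 1  # assumed average command length

def build_template_index():
    """List every (speaker, word) template the training phase must record."""
    index = [(speaker, word)
             for speaker in SPEAKERS
             for word in PER_SPEAKER_COMMANDS]
    index += [("authorized", word) for word in SHARED_COMMANDS]
    return index

def template_storage_words(index):
    """16-bit data words needed to store all templates."""
    words_per_second = FRAMES_PER_SECOND * COEFFS_PER_FRAME
    return len(index) * SECONDS_PER_COMMAND * words_per_second

template_index = build_template_index()
storage_words = template_storage_words(template_index)
```

This gives 28 templates (16 speaker-specific plus 12 shared) and 28,000 data words, in line with the estimate of roughly 30 seconds of speech and 30,000 words.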

All control functions of the proposed system should be accessible both by voice input and by manual switch or keypad because of the possibility of noisy environments. Each microphone has a corresponding switch or button and a corresponding lamp unit. The arbitration process, which chooses the controlled device, is handled in software. The DSP continuously samples all ADC inputs, and the associated software process decides whether speech is present on an input and assigns the corresponding device. The ADC on the 56805 has two channels, each multiplexed to four pins (8 sources of analog signal can be connected in all). One ADC pin can optionally be connected to a phone line via a subscriber line interface circuit (SLIC), and one pin is connected to a temperature sensor. The remaining six pins can then be connected to microphones. The ADC resolution is 12 bits and the maximum sampling frequency is 800 kHz, which allows time-multiplexed sampling of all 8 ADC inputs. To minimize memory requirements, a sampling frequency of 8 kHz is recommended for all speech channels. Because of this low sampling frequency, low-frequency microphones or anti-aliasing filters (low-pass filters) should be used.
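One simple way to picture the arbitration step is an energy comparison across the microphone inputs. The sketch below is only an assumption about how such a process might work: the text does not specify the detection method, and the function names and threshold are hypothetical. It also checks that sampling all eight inputs at 8 kHz stays well within the ADC's 800 kHz limit.

```python
ADC_INPUTS = 8
SPEECH_RATE_HZ = 8_000
ADC_MAX_RATE_HZ = 800_000
AGGREGATE_RATE_HZ = ADC_INPUTS * SPEECH_RATE_HZ  # 64 kHz across all inputs

def frame_energy(frame):
    """Mean squared amplitude of one non-empty frame of samples."""
    return sum(s * s for s in frame) / len(frame)

def arbitrate(channel_frames, threshold):
    """Pick the microphone channel that appears to carry speech.

    channel_frames holds one frame per microphone, taken at the same time.
    Returns the index of the channel with the highest energy above threshold,
    or None when every channel is below it.
    """
    best_channel, best_energy = None, threshold
    for channel, frame in enumerate(channel_frames):
        energy = frame_energy(frame)
        if energy > best_energy:
            best_channel, best_energy = channel, energy
    return best_channel
```

The returned index would then select which lamp unit or device the recognized command applies to. A practical detector would likely smooth the decision over several frames and adapt the threshold to background noise.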

HVAC systems can also be controlled by phone. The recognition process is the same, with the addition of an answering function. If the spoken command is recognized, the DSP controller sends it back as an audio signal over the phone line to confirm the validity of the recognition. The DSP controller is connected to the phone line via the SLIC; the internal ADC is used on the input side and an external digital-to-analog converter (DAC) is used on the output side. The DAC and DSP controller are connected via SPI. Modem functions are provided by software. All of the software should be programmed in C. Individual software tasks communicate using semaphores (global status variables). (Modem functions and basic DSP algorithms are components of Motorola's DSP56800 SDK.)

The design solution in this article can be demonstrated on the DSP56805EVM (evaluation module). The kit, available from Motorola, contains the required hardware except for the analog input parts (e.g. microphones, preamplifiers, thermal sensors), the SLIC and the output power stage. The DSP56805 evaluation module is fully supported by the Metrowerks CodeWarrior® IDE.