KWS (Keyword Spotting)

Supported ICs

Fixed keyword
- RTL8721Dx
- RTL8726E
- RTL8713E
- RTL8730E
User-defined keyword

Overview

KWS is the module that detects specific wake-up words in audio. It is usually the first step in a voice interaction system: after detecting the keyword, the device enters a state in which it waits for voice commands.

AIVoice provides two KWS solutions: a fixed keyword solution and a user-defined keyword solution. The former can achieve optimal performance on low-resource devices, while the latter allows flexible customization of keywords.

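====================  =================  =====================================================  =================================
Solution              Training data      Available keywords                                     Feature
====================  =================  =====================================================  =================================
Fixed keyword         Specific keywords  Same keywords as in the training data                  Better performance, smaller model
User-defined keyword  Common data        Any keyword in the same language as the training data  More flexible
====================  =================  =====================================================  =================================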

Currently, the SDK provides a fixed keyword model library and a user-defined keyword model.

Fixed Keyword Model

Supports the Chinese keywords xiao-qiang-xiao-qiang and ni-hao-xiao-qiang.
Other keywords or performance optimizations can be provided through customized services.

User-defined Keyword Model

Language Support: Chinese only.
Number of Keywords: Up to 5 keywords simultaneously.
Word Length: Each keyword must contain 3 to 6 Chinese characters; keywords outside this range are invalid.
Keyword Selection Guidelines:
- Avoid characters with zero initials (e.g., yīn, yī).
- Avoid common daily phrases (e.g., put on clothes, eat breakfast).
- Ensure high phonetic distinction between adjacent syllables.
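The count and length limits above can be checked before a keyword list is handed to the engine. The following is a minimal sketch, not part of the AIVoice SDK: every `demo_` name is hypothetical, and it assumes keywords are written as hyphen-separated pinyin with one syllable per Chinese character, as in the xiao-qiang-xiao-qiang example. It does not check the phonetic guidelines.

```c
#include <stdbool.h>
#include <stddef.h>

/* Limits taken from the text above; the names are hypothetical. */
#define DEMO_MAX_KEYWORDS   5
#define DEMO_MIN_SYLLABLES  3
#define DEMO_MAX_SYLLABLES  6

/* Count hyphen-separated syllables in a pinyin keyword.
 * Returns 0 for NULL, empty, or malformed input
 * (leading, trailing, or doubled hyphens). */
static size_t demo_count_syllables(const char *keyword)
{
    if (keyword == NULL || keyword[0] == '\0') {
        return 0;
    }
    size_t count = 1;
    char prev = '-';  /* treat the start as a boundary to reject a leading hyphen */
    for (const char *p = keyword; *p != '\0'; ++p) {
        if (*p == '-') {
            if (prev == '-') {
                return 0;  /* empty syllable */
            }
            ++count;
        }
        prev = *p;
    }
    return (prev == '-') ? 0 : count;  /* reject a trailing hyphen */
}

/* A keyword is accepted when it has 3 to 6 syllables (assumed: one per character). */
static bool demo_keyword_is_valid(const char *keyword)
{
    size_t n = demo_count_syllables(keyword);
    return n >= DEMO_MIN_SYLLABLES && n <= DEMO_MAX_SYLLABLES;
}

/* A keyword set is accepted when it holds 1 to 5 valid keywords. */
static bool demo_keyword_set_is_valid(const char *const *keywords, size_t count)
{
    if (keywords == NULL || count == 0 || count > DEMO_MAX_KEYWORDS) {
        return false;
    }
    for (size_t i = 0; i < count; ++i) {
        if (!demo_keyword_is_valid(keywords[i])) {
            return false;
        }
    }
    return true;
}
```

For example, `demo_keyword_is_valid("ni-hao-xiao-qiang")` accepts a four-syllable keyword, while `demo_keyword_is_valid("ni-hao")` rejects a two-syllable one.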

KWS Mode

Two KWS modes are provided for different use cases. Single-channel mode takes single-channel audio as input, while multi-channel mode takes multi-channel audio as input. Multi-channel mode improves KWS and ASR accuracy compared to single-channel mode, at the cost of higher computational resource consumption and memory usage.

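===================  ======================  ==============================
KWS mode             Config                  Description
===================  ======================  ==============================
Single-channel mode  mode = KWS_SINGLE_MODE  Lower compute and memory usage
Multi-channel mode   mode = KWS_MULTI_MODE   Better KWS and ASR accuracy
===================  ======================  ==============================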

Algorithm Flow

Single-channel Mode

Multi-channel Mode

Configurations

KWS configurable parameters

keywords:: Keywords for wake-up. The available keywords depend on the KWS model. With a fixed keyword model, keywords can only be chosen from the trained words. With a user-defined model, keywords can be any combination of units of the same language (such as pinyin for Chinese). Example: xiao-qiang-xiao-qiang.
thresholds:: Wake-up thresholds, in the range [0, 1]. A higher threshold produces fewer false alarms but makes the device harder to wake. Set to 0 to use the predefined thresholds selected by sensitivity.
sensitivity:: One of three sensitivity levels with predefined thresholds. A higher level makes the device easier to wake but produces more false alarms. Takes effect ONLY when thresholds is set to 0.
mode:: KWS mode: single-channel or multi-channel.
enable_age_gender:: Whether to output the speaker’s age and gender classification on wake-up. Not supported in the current version.

Refer to ${aivoice_lib_dir}/include/aivoice_kws_config.h for details.
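To illustrate how thresholds and sensitivity interact, here is a minimal sketch. It does not reproduce the contents of aivoice_kws_config.h: the struct, enums, and functions below are hypothetical `demo_` stand-ins, and the predefined thresholds are passed in by the caller because this document does not state their values.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real SDK types. */
typedef enum {
    DEMO_SENSITIVITY_LOW = 0,
    DEMO_SENSITIVITY_MID,
    DEMO_SENSITIVITY_HIGH,
    DEMO_SENSITIVITY_COUNT
} demo_sensitivity;

typedef enum {
    DEMO_SINGLE_MODE = 0,  /* mirrors KWS_SINGLE_MODE */
    DEMO_MULTI_MODE        /* mirrors KWS_MULTI_MODE */
} demo_mode;

typedef struct {
    const char *keyword;          /* e.g. "xiao-qiang-xiao-qiang" */
    float threshold;              /* in [0, 1]; 0 means "use the sensitivity preset" */
    demo_sensitivity sensitivity; /* consulted only when threshold is 0 */
    demo_mode mode;
    bool enable_age_gender;       /* not supported in the current version */
} demo_kws_config;

/* Basic range checks that follow the parameter descriptions above. */
static bool demo_config_is_valid(const demo_kws_config *cfg)
{
    if (cfg == NULL || cfg->keyword == NULL) {
        return false;
    }
    if (cfg->threshold < 0.0f || cfg->threshold > 1.0f) {
        return false;
    }
    if ((int)cfg->sensitivity < 0 || (int)cfg->sensitivity >= DEMO_SENSITIVITY_COUNT) {
        return false;
    }
    return !cfg->enable_age_gender;  /* must stay disabled for now */
}

/* Resolve the threshold that would actually apply: an explicit non-zero
 * threshold wins; 0 falls back to the caller-supplied preset for the
 * configured sensitivity level. */
static float demo_effective_threshold(const demo_kws_config *cfg,
                                      const float presets[DEMO_SENSITIVITY_COUNT])
{
    if (cfg->threshold == 0.0f) {
        return presets[cfg->sensitivity];
    }
    return cfg->threshold;
}
```

Keeping 0 as an explicit sentinel means a configuration selects either a hand-tuned threshold or a sensitivity level, never both at once.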

Threshold Adjustment Suggestions

As the threshold increases, both the wake-up rate and the false wake-up rate decrease (i.e., sensitivity shifts from high to low). Users should select an appropriate threshold based on actual needs.
For the fixed keyword model, three sensitivity levels are provided: High, Medium, and Low, corresponding to approximately one false trigger per 12 h, 24 h, and 48 h, respectively. For finer adjustments, users can configure the thresholds parameter to suit their usage scenario, with a suggested step size of 0.02.
For the user-defined keyword model, thresholds are typically lower than for the fixed keyword model, with a suggested adjustment step size of 0.005.
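The stepwise tuning described above can be sketched as a small helper. This is an illustration only: the function and macro names are hypothetical, and the lower clamp is a design assumption made here so that a tuned threshold never reaches 0, the value that switches to the predefined sensitivity thresholds.

```c
/* Suggested step sizes from the text above; the names are hypothetical. */
#define DEMO_STEP_FIXED         0.02f   /* fixed keyword model */
#define DEMO_STEP_USER_DEFINED  0.005f  /* user-defined keyword model */

/* Move the threshold one step.
 *   direction > 0: raise it (fewer false wake-ups, harder to wake)
 *   direction < 0: lower it (easier to wake, more false wake-ups)
 *   direction == 0: leave it unchanged apart from clamping
 * The result is clamped to [step, 1.0] so it never becomes 0. */
static float demo_adjust_threshold(float current, float step, int direction)
{
    float next = current;
    if (direction > 0) {
        next += step;
    } else if (direction < 0) {
        next -= step;
    }
    if (next > 1.0f) {
        next = 1.0f;
    }
    if (next < step) {
        next = step;
    }
    return next;
}
```

A typical loop would raise the threshold by one step after observing too many false wake-ups in a test period, and lower it by one step after observing missed wake-ups, re-measuring after each change.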
