ASR (Automatic Speech Recognition)
Supported ICs: RTL8726E, RTL8713E, RTL8730E
Overview
ASR is the module that converts speech to text.
AIVoice provides an offline ASR algorithm for voice command detection, designed for voice interaction applications that require fast, reliable, and network-independent performance.
Core Features
Continuous Interaction: Supports multi-turn recognition after a single wake-up, with no need for repeated wake-up prompts.
Language Support: Currently supports only Mandarin Chinese.
Command Capacity: Recognizes up to 200 commands.
Resource Composition: Recognition resources include a model file and an FST file (Finite-State Transducer). To update or customize commands, only the FST file needs to be replaced, with no changes required to the model.
Predefined & Customization
The SDK includes a default set of 40 predefined commands for air conditioning scenarios, such as 打开空调 (turn on the air conditioner) and 关闭空调 (turn off the air conditioner).
To customize commands, you can use our online generation tool to create the corresponding FST resource file for quick integration into your application.
Customized Services
If your project needs any of the following, which go beyond the standard capabilities, please contact our business team for a customized solution:
Command vocabulary exceeding 200 entries
Support for additional languages
Deep performance optimization or algorithm customization for specific scenarios
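The limits above (at most 200 commands, Mandarin Chinese only) can be checked on a command list before it is submitted for FST generation. The following is a minimal sketch, not the online tool's actual validation: the function name check_commands is hypothetical, and "Mandarin only" is approximated here as "CJK Unified Ideographs only", which is an assumption.

```python
import re

MAX_COMMANDS = 200  # command capacity stated in Core Features

# Assumption: a valid command consists solely of CJK Unified Ideographs.
CJK_ONLY = re.compile(r"^[\u4e00-\u9fff]+$")


def check_commands(commands):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if len(commands) > MAX_COMMANDS:
        problems.append(
            f"{len(commands)} commands exceed the limit of {MAX_COMMANDS}"
        )
    seen = set()
    for cmd in commands:
        if not CJK_ONLY.match(cmd):
            problems.append(f"not pure Chinese text: {cmd!r}")
        if cmd in seen:
            problems.append(f"duplicate command: {cmd!r}")
        seen.add(cmd)
    return problems


if __name__ == "__main__":
    # The last entry is rejected because it is not Chinese text.
    for problem in check_commands(["打开空调", "关闭空调", "turn on"]):
        print(problem)
```

A list that produces problems here either needs editing or falls under the customized services listed above.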
Configurations
ASR configurable parameters:
- sensitivity:
Three sensitivity levels are provided, each with predefined internal parameters. A higher level makes commands easier to detect but also produces more false alarms.
Refer to ${aivoice_lib_dir}/include/aivoice_asr_config.h for details.
Custom Command Guide
Hardware Requirements
Chip Models
RTL8713ECM-VA4-CG
RTL8730EAM-VA6-CG
Flash Size: ≥ 16 MB
Dual-microphone Spacing: 50 mm
Microphone and Loopback Channel Mapping:
RTL8713ECM-VA4-CG: AMIC1, AMIC2, AMIC3
RTL8730EAM-VA6-CG: AMIC1, AMIC3, AMIC5
One external speaker
Software Requirements
Prepare the ameba-rtos repository (referred to as ${SDK}). Refer to SDK Download for the extended XDK download method.
Command Word Format
Language: Only Chinese is supported currently.
Refer to the Chinese documentation for details.
Operation Instructions
Use the online tool to upload the required Excel files containing the command list and corresponding audio prompts. After backend compilation is completed, download the download.zip file and extract it (the extracted directory is referred to as ${download}).
Copy the modified files into the SDK and apply the patch:
cp -r ${download}/patch/* ${SDK}
cd ${SDK}/
python component/aivoice/tools/patch_custom_cmd.py RTL8713E
Note
Please ensure you are in the SDK root directory before running this script. Otherwise, the script may fail to locate the files to be modified, causing the operation to fail.
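The note above can be turned into a quick pre-check. This is a minimal sketch that only tests whether the patch script exists at the relative path used in the command above; the helper name is_sdk_root is hypothetical and is not part of the SDK.

```python
from pathlib import Path

# Relative path of the patch script, taken from the command above.
PATCH_SCRIPT = Path("component/aivoice/tools/patch_custom_cmd.py")


def is_sdk_root(directory):
    """Return True if the patch script exists under `directory`."""
    return (Path(directory) / PATCH_SCRIPT).is_file()


if __name__ == "__main__":
    if is_sdk_root(Path.cwd()):
        print("OK: the current directory looks like the SDK root")
    else:
        print("Not in the SDK root: change to ${SDK} before running the patch script")
```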
Set up the compilation environment, then run ameba.py menuconfig to enter the configuration interface:
source env.sh
ameba.py soc RTL8726E
ameba.py menuconfig
Enable DSP through menu navigation
--------MENUCONFIG FOR General---------
CONFIG DSP Enable --->
    [*] Enable DSP
Configure the link options through menu navigation
--------MENUCONFIG FOR General---------
CONFIG Link Option --->
    IMG2(Application) running on FLASH or PSRAM?
        (X) FLASH
        ( ) PSRAM
    IMG2 Data and Heap in SRAM or PSRAM? --->
        ( ) SRAM
        (X) PSRAM
Enable VFS LITTLEFS
--------MENUCONFIG FOR General---------
CONFIG VFS --->
    [*] Enable VFS LITTLEFS
Enable AIVoice
--------MENUCONFIG FOR General---------
CONFIG TrustZone --->
...
CONFIG APPLICATION --->
    GUI Config --->
    ...
    AI Config --->
        [ ] Enable TFLITE MICRO
        [*] Enable AIVoice
Enable SpeechMind
--------MENUCONFIG FOR General---------
CONFIG TrustZone --->
...
CONFIG APPLICATION --->
    GUI Config --->
    ...
    AI Config --->
        [ ] Enable TFLITE MICRO
        [*] Enable AIVoice
        [*] Enable SpeechMind
Build
ameba.py build
Download the following images using the Flash Programming Tool:
boot.bin: Default address
app.bin: Default address
tts.bin: Start address 0x083E0000, end address 0x087E0000
dsp_all.bin: Start address 0x087E0000, end address 0x08A00000
aivoice_models.bin: Start address 0x08A00000, end address 0x08E00000
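The address pairs above can be sanity-checked with a few lines of arithmetic. This sketch reads each pair as a start and an end address (an interpretation, though one consistent with the 4 MB default size of tts.bin described under Package Audio) and reports each region's size and any overlap. The names LAYOUT, region_sizes, and find_overlaps are illustrative only.

```python
# (image name, start address, end address); the end is treated as exclusive.
LAYOUT = [
    ("tts.bin", 0x083E0000, 0x087E0000),
    ("dsp_all.bin", 0x087E0000, 0x08A00000),
    ("aivoice_models.bin", 0x08A00000, 0x08E00000),
]


def region_sizes(layout):
    """Map each image name to the size of its flash region in bytes."""
    return {name: end - start for name, start, end in layout}


def find_overlaps(layout):
    """Return neighbouring regions (sorted by start) that overlap.

    An empty result means no two regions overlap at all, because any
    overlap implies an overlap between some pair of neighbours.
    """
    ordered = sorted(layout, key=lambda region: region[1])
    overlaps = []
    for (name_a, _, end_a), (name_b, start_b, _) in zip(ordered, ordered[1:]):
        if start_b < end_a:
            overlaps.append((name_a, name_b))
    return overlaps


if __name__ == "__main__":
    for name, size in region_sizes(LAYOUT).items():
        print(f"{name}: {size // 1024} KiB")
    print("overlaps:", find_overlaps(LAYOUT))
```

Re-running the check is worthwhile whenever the layout is changed, for example after resizing tts.bin.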
For RTL8730E:
Use the online tool to upload the required Excel files containing the command list and corresponding audio prompts. After backend compilation is completed, download the download.zip file and extract it (the extracted directory is referred to as ${download}).
Copy the modified files into the SDK and apply the patch:
cp -r ${download}/patch/* ${SDK}
cd ${SDK}/
python component/aivoice/tools/patch_custom_cmd.py RTL8730E
Note
Please ensure you are in the SDK root directory before running this script. Otherwise, the script may fail to locate the files to be modified, causing the operation to fail.
Set up the compilation environment, then run ameba.py menuconfig to enter the configuration interface:
source env.sh
ameba.py soc RTL8730E
ameba.py menuconfig
Select running the application from Flash through menu navigation
--------MENUCONFIG FOR General---------
CONFIG Link Option --->
    IMG2(Application) running on PSRAM or FLASH? --->
        ( ) PSRAM
        (X) FLASH
Enable VFS LITTLEFS
--------MENUCONFIG FOR General---------
CONFIG VFS --->
    [*] Enable VFS LITTLEFS
Enable AIVoice (select algorithm version as needed)
--------MENUCONFIG FOR General---------
CONFIG TrustZone --->
...
CONFIG APPLICATION --->
    GUI Config --->
    ...
    AI Config --->
        [ ] Enable TFLITE MICRO
        [*] Enable AIVoice
Enable SpeechMind
--------MENUCONFIG FOR General---------
CONFIG TrustZone --->
...
CONFIG APPLICATION --->
    GUI Config --->
    ...
    AI Config --->
        [ ] Enable TFLITE MICRO
        [*] Enable AIVoice
        [*] Enable SpeechMind
Choose single core
MENUCONFIG FOR CA32 CONFIG --->
...
CONFIG SMP --->
    Select Core Num --->
        ( ) DUAL
        (X) SINGLE
Build image
ameba.py build
Download the following images using the Flash Programming Tool:
boot.bin: Default address
app.bin: Start address 0x08040000, end address 0x08600000
tts.bin: Start address 0x08640000, end address 0x08A40000
Running and Expected Results
After downloading the images onto the development board, the following voice interaction features can be tested:
Voice Wake-Up
Wake up the device by saying xiao-qiang-xiao-qiang or ni-hao-xiao-qiang. If successful, logs will be printed, and the response audio "Master, I'm here" will be played.
Timeout
If no interaction occurs after wake-up, the device will announce "Master, I'll step back for now. Wake me up if needed." Further interaction requires waking the device up again. The timeout duration can be adjusted in ${SDK}/component/application/speechmind/src/speech_mind.c via:
aivoice_param.timeout = 10;
Command Recognition
After wake-up, continuous interaction is possible using customized voice commands. If a command is recognized, logs will be printed, and the corresponding response audio "Command x executed" will be played. The supported command list can be found in the boot-up logs.
To modify the response audio, refer to Response Audio Replacement.
Sound Source Localization (SSL)
When SSL is enabled, the device detects the speaker’s direction after each wake-up. The angle will be logged, and the response audio "xx degrees" will be played. SSL can be enabled or disabled in ${SDK}/component/application/speechmind/src/speech_mind.c via:
#define ENABLE_SSL_DOA 1
Response Audio Replacement
To replace TTS response audio, follow the steps below.
Prepare Audio
Format: MP3
Directory structure:
tts_mp3/                 <-- -dir parameter specifies this directory
└── tts/
    ├── 10001.mp3        # Wake-up response audio
    ├── 10002.mp3        # Timeout exit response audio
    ├── x.mp3            # Command response audio (x is the command ID)
    └── angle/           # (Optional) Sound source localization angle response audio
        ├── 1000.mp3
        └── ...
Note
The -dir parameter in the packaging command must use the top-level directory tts_mp3, not the tts directory itself.
Audio naming rule: File names correspond to the second column of the tts_content_name entries in ${download}/patch/component/application/speechmind/src/speech_tts.c.
static const struct tts_content_name tts_names[] = {
    {"I'm here", "10001"},
    {"Goodbye", "10002"},
    {"Cmd 1 done", "1"},
    {"Cmd 2 done", "2"},
    ...
};
If sound source localization is enabled, also copy the angle response audio directory to the tts directory:
cp -r ${SDK}/component/application/speechmind/res/tts/tts/angle tts_mp3/tts
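The naming rule can be checked mechanically before packaging. The sketch below extracts the second column of each table entry with a regular expression and lists the MP3 files that are still missing under tts_mp3/tts. It is an illustration under assumptions: the helper names are hypothetical, the pattern matches any {"text", "id"} pair in the given source text, and strings containing escaped quotes are not handled.

```python
import re
from pathlib import Path

# Matches entries such as {"I'm here", "10001"} and captures both columns.
ENTRY = re.compile(r'\{\s*"([^"]*)"\s*,\s*"([^"]*)"\s*\}')


def expected_mp3_names(c_source):
    """Return the MP3 file names implied by the second column of each entry."""
    return [f"{file_id}.mp3" for _text, file_id in ENTRY.findall(c_source)]


def missing_files(c_source, top_dir):
    """List expected MP3 files that are absent from <top_dir>/tts/."""
    tts_dir = Path(top_dir) / "tts"
    return [
        name for name in expected_mp3_names(c_source)
        if not (tts_dir / name).is_file()
    ]


# Sample table mirroring the excerpt shown above (without the elided rows).
SAMPLE = '''
static const struct tts_content_name tts_names[] = {
    {"I'm here", "10001"},
    {"Goodbye", "10002"},
    {"Cmd 1 done", "1"},
    {"Cmd 2 done", "2"},
};
'''

if __name__ == "__main__":
    print("expected:", expected_mp3_names(SAMPLE))
    print("missing: ", missing_files(SAMPLE, "tts_mp3"))
```

In practice the text of speech_tts.c from the extracted download would be passed in place of SAMPLE.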
Package Audio
python ${SDK}/tools/image_scripts/vfs.py -t LITTLEFS -s 4096 -c 1024 -dir tts_mp3 -out tts.bin
The -dir parameter must use the top-level directory containing the tts subdirectory (e.g., tts_mp3 as shown in the example), not the tts directory itself.
If “size exceed limit” is reported, the total audio size exceeds the 4 MB limit. Reduce the MP3 bitrate to decrease the file size.
For the tts.bin flashing address, refer to the firmware flashing section for each chip. The default size is 4 MB (-c 1024). To adjust the size, remove the -c parameter and refer to AIVoice Developer Guide Example 2 for layout configuration.