ASR (Automatic Speech Recognition)
Supported ICs: RTL8726E, RTL8713E, RTL8730E
Overview
ASR is the module that converts speech into text.
AIVoice provides an offline ASR algorithm for voice command detection, designed for voice interaction applications that require fast, reliable, and network-independent performance.
Core Features
Continuous Interaction: Supports multi-turn recognition after a single wake-up, with no need for repeated wake-up prompts.
Language Support: Currently supports only Mandarin Chinese.
Command Capacity: Recognizes up to 200 commands.
Resource Composition: Recognition resources include a model file and an FST file (Finite-State Transducer). To update or customize commands, only the FST file needs to be replaced, with no changes required to the model.
Predefined & Customization
The SDK includes a default set of 40 predefined commands for air-conditioning scenarios, such as 打开空调 (turn on the air conditioner) and 关闭空调 (turn off the air conditioner).
To customize commands, use the online generation tool to create the corresponding FST resource file for quick integration into your application.
Customized Services
If your project requirements exceed the following standard capabilities, please contact our business team for a customized solution:
Command vocabulary exceeding 200 entries
Support for additional languages
Deep performance optimization or algorithm customization for specific scenarios
Configurations
ASR configurable parameters:
- sensitivity:
Three sensitivity levels are provided, each with predefined internal parameters. A higher level makes commands easier to detect but also produces more false alarms.
Refer to ${aivoice_lib_dir}/include/aivoice_asr_config.h for details.
Custom Command Guide
Hardware Requirements
Chip Models
RTL8713ECM-VA4-CG
RTL8730EAM-VA6-CG
Flash Size: ≥ 16 MB
Dual-microphone Spacing: 50 mm
Microphone and Loopback Channel Mapping:
RTL8713ECM-VA4-CG: AMIC1, AMIC2, AMIC3
RTL8730EAM-VA6-CG: AMIC1, AMIC3, AMIC5
One external speaker
Software Requirements
Prepare the ameba-rtos repository (referred to as ${SDK}).
Master Branch: Refer to SDK Download for the extended XDK download method.
Branch 1.1:
git clone https://github.com/Ameba-AIoT/ameba-rtos.git
cd ameba-rtos
git checkout remotes/origin/release/v1.1 -b release/v1.1
Operation Instructions
Use the Online tool and upload the required Excel files containing the command list and corresponding audio prompts. After backend compilation is completed, download the download.zip file and extract it.
Copy the modified files into the SDK and apply the patch:
cp -r ${download}/patch/* ${SDK}
git apply --reject speechmind_custom_cmd.patch
Note
If git apply reports errors, it indicates a significant version discrepancy between the current SDK and the version used to generate the patch. Please manually integrate the changes from the patch into the corresponding files in the SDK.
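The patch step above can be checked before it changes any files. The following is a minimal sketch, assuming it is run from ${SDK}: `git apply --check` performs a dry run, and `git apply --reject` leaves `*.rej` files next to any source file whose hunks could not be applied. The helper names `try_patch` and `list_rejects` are hypothetical and are not part of the SDK.

```shell
#!/bin/sh
# Sketch only: try_patch and list_rejects are hypothetical helper names.

# Return 0 if the patch would apply cleanly to the current working tree.
try_patch() {
    git apply --check "$1" 2>/dev/null
}

# Print every *.rej file under the given directory (default: current directory).
list_rejects() {
    find "${1:-.}" -name '*.rej' -type f | sort
}

# Typical use from ${SDK}:
#   if try_patch speechmind_custom_cmd.patch; then
#       git apply speechmind_custom_cmd.patch
#   else
#       git apply --reject speechmind_custom_cmd.patch
#       list_rejects .    # hunks listed here must be merged by hand
#   fi
```

Each file printed by `list_rejects` contains the hunks that still need to be integrated manually into the matching SDK source file.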
Switch to the SDK's GCC project directory and run menuconfig.py to enter the configuration interface:
cd ${SDK}/amebalite_gcc_project
./menuconfig.py
Enable DSP through menu navigation
--------MENUCONFIG FOR General---------
CONFIG DSP Enable --->
[*] Enable DSP
Configure the link option through menu navigation
Branch master:
--------MENUCONFIG FOR General---------
CONFIG Link Option --->
IMG2(Application) running on FLASH or PSRAM?
(X) FLASH
( ) PSRAM
IMG2 Data and Heap in SRAM or PSRAM? --->
( ) SRAM
(X) PSRAM
Branch v1.1:
--------MENUCONFIG FOR General---------
CONFIG Link Option --->
IMG2(Application) running on FLASH or PSRAM?
(X) CodeInXip_DataHeapInPsram
( ) CodeInPsram_DataHeapInSram
( ) CodeInPsram_DataHeapInPsram
( ) CodeInXip_DataHeapInSram
Enable VFS LITTLEFS
--------MENUCONFIG FOR General---------
CONFIG VFS --->
[*] Enable VFS LITTLEFS
Enable AIVoice
--------MENUCONFIG FOR General---------
CONFIG TrustZone --->
...
CONFIG APPLICATION --->
GUI Config --->
...
AI Config --->
[ ] Enable TFLITE MICRO
[*] Enable AIVoice
Enable SpeechMind
--------MENUCONFIG FOR General---------
CONFIG TrustZone --->
...
CONFIG APPLICATION --->
GUI Config --->
...
AI Config --->
[ ] Enable TFLITE MICRO
[*] Enable AIVoice
[*] Enable SpeechMind
Build
./build.py
Download the following images using the Flash Programming Tool:
km4_boot_all.bin: Default address
kr4_km4_app.bin: Default address
tts.bin: Address 0x083E0000, 0x087E0000
dsp_all.bin: Address 0x087E0000, 0x08A00000
aivoice_models.bin: Address 0x08A00000, 0x08E00000
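As a quick sanity check on the address pairs above, the size reserved for each image can be computed with shell arithmetic. This sketch assumes that the first address of each pair is the start and the second is the (exclusive) end of the region, which the list does not state explicitly. The helper name `region_kib` is hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes each listed pair is "start, end" of a flash region.
# region_kib is a hypothetical helper name.

# Print the size of the region [start, end) in KiB.
region_kib() {
    echo $(( ($2 - $1) / 1024 ))
}

echo "tts.bin:            $(region_kib 0x083E0000 0x087E0000) KiB"
echo "dsp_all.bin:        $(region_kib 0x087E0000 0x08A00000) KiB"
echo "aivoice_models.bin: $(region_kib 0x08A00000 0x08E00000) KiB"
```

Under that assumption the three regions are 4096 KiB, 2176 KiB, and 4096 KiB, and each region begins where the previous one ends.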
Use the Online tool and upload the required Excel files containing the command list and corresponding audio prompts. After backend compilation is completed, download the download.zip file and extract it.
Copy the modified files into the SDK and apply the patch:
cp -r ${download}/patch/* ${SDK}
git apply --reject speechmind_custom_cmd.patch
Note
If git apply reports errors, it indicates a significant version discrepancy between the current SDK and the version used to generate the patch. Please manually integrate the changes from the patch into the corresponding files in the SDK.
Switch to the SDK's GCC project directory and run menuconfig.py to enter the configuration interface:
cd ${SDK}/amebasmart_gcc_project
./menuconfig.py
Configure the application to run from Flash through menu navigation
Branch master:
--------MENUCONFIG FOR General---------
CONFIG Link Option --->
IMG2(Application) running on PSRAM or FLASH? --->
( ) PSRAM
(X) FLASH
Branch v1.1:
--------MENUCONFIG FOR General---------
CONFIG BOOT OPTION --->
[*] XIP_FLASH
Enable VFS LITTLEFS
--------MENUCONFIG FOR General---------
CONFIG VFS --->
[*] Enable VFS LITTLEFS
Enable AIVoice (select algorithm version as needed)
--------MENUCONFIG FOR General---------
CONFIG TrustZone --->
...
CONFIG APPLICATION --->
GUI Config --->
...
AI Config --->
[ ] Enable TFLITE MICRO
[*] Enable AIVoice
Enable SpeechMind
--------MENUCONFIG FOR General---------
CONFIG TrustZone --->
...
CONFIG APPLICATION --->
GUI Config --->
...
AI Config --->
[ ] Enable TFLITE MICRO
[*] Enable AIVoice
[*] Enable SpeechMind
Choose single core
MENUCONFIG FOR CA32 CONFIG --->
...
CONFIG SMP --->
Select Core Num --->
( ) DUAL
(X) SINGLE
Build image
./build.py
Download the following images using the Flash Programming Tool:
km4_boot_all.bin: Default address
km0_km4_ca32_app.bin: Address 0x08020000, 0x08600000
tts.bin: Address 0x08623000, 0x08A23000
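When image addresses are changed from the values above, it can help to confirm that two regions do not collide before flashing. The sketch below assumes each address pair is a start address followed by an exclusive end address; `regions_overlap` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: assumes each listed pair is "start, end" of a flash region.
# regions_overlap is a hypothetical helper name.

# Return 0 (true) if [start1, end1) and [start2, end2) share any address.
# Arguments: start1 end1 start2 end2
regions_overlap() {
    [ $(( $1 < $4 && $3 < $2 )) -eq 1 ]
}

# km0_km4_ca32_app.bin versus tts.bin, using the addresses listed above.
if regions_overlap 0x08020000 0x08600000 0x08623000 0x08A23000; then
    echo "overlap"
else
    echo "no overlap"
fi
```

With the listed addresses the application image ends at 0x08600000 and tts.bin starts at 0x08623000, so the check reports no overlap.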
Running and Expected Results
After downloading the images onto the development board, the following voice interaction features can be tested:
Voice Wake-Up
Wake up the device by saying xiao-qiang-xiao-qiang or ni-hao-xiao-qiang. If successful, logs will be printed, and the response audio "Master, I'm here" will be played.
Timeout
If no interaction occurs after wake-up, the device will announce "Master, I'll step back for now. Wake me up if needed." Further interaction requires waking the device again. The timeout duration can be adjusted in ${SDK}/component/application/speechmind/src/speech_mind.c via:
aivoice_param.timeout = 10;
Command Recognition
After wake-up, continuous interaction is possible using customized voice commands. If a command is recognized, logs will be printed, and the corresponding response audio "Command x executed" will be played. The supported command list can be found in the boot-up logs.
To modify the response audio, refer to the audio names in tts_content_name in ${download}/patch/component/application/speechmind/src/speech_tts.c, and follow Example 2 in the AIVoice Development Guide to replace the audio and prepare the tts.bin file.
Sound Source Localization (SSL)
When SSL is enabled, the device detects the speaker's direction after each wake-up. The angle will be logged, and the response audio "xx degrees" will be played. SSL can be enabled or disabled in ${SDK}/component/application/speechmind/src/speech_mind.c via:
#define ENABLE_SSL_DOA 1
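Before rebuilding, the current values of the two settings mentioned above (the wake-up timeout and the SSL switch) can be looked up with grep. This is a small sketch; `show_speechmind_settings` is a hypothetical helper name, and it relies only on the identifiers `aivoice_param.timeout` and `ENABLE_SSL_DOA` quoted in this section.

```shell
#!/bin/sh
# Sketch only: show_speechmind_settings is a hypothetical helper name.

# Print, with line numbers, the lines of the given source file that
# mention the wake-up timeout or the SSL enable macro.
show_speechmind_settings() {
    grep -n -E 'aivoice_param\.timeout|ENABLE_SSL_DOA' "$1"
}

# Usage:
#   show_speechmind_settings "${SDK}/component/application/speechmind/src/speech_mind.c"
```

The printed line numbers point to the places to edit; a rebuild and re-download of the application image is needed for the change to take effect.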