ASR (Automatic Speech Recognition)

Supported ICs: RTL8726E, RTL8713E, RTL8730E

Overview

ASR is the module that converts speech to text.

AIVoice provides an offline ASR algorithm for voice command detection, designed for voice interaction applications that require fast, reliable, and network-independent performance.

Core Features

Continuous Interaction: Supports multi-turn recognition after a single wake-up, with no need for repeated wake-up prompts.
Language Support: Currently supports only Mandarin Chinese.
Command Capacity: Recognizes up to 200 commands.
Resource Composition: Recognition resources include a model file and an FST file (Finite-State Transducer). To update or customize commands, only the FST file needs to be replaced, with no changes required to the model.

Predefined & Customization

The SDK includes a default set of 40 predefined commands for air conditioning scenarios, such as 打开空调 (turn on air conditioning) and 关闭空调 (turn off air conditioning).
To customize commands, you can use our online generation tool to create the corresponding FST resource file for quick integration into your application.

Customized Services

If your project requirements exceed the following standard capabilities, please contact our business team for a customized solution:

Command vocabulary exceeding 200 entries
Support for additional languages
Deep performance optimization or algorithm customization for specific scenarios

Configurations

ASR configurable parameters:

sensitivity: Three sensitivity levels are provided, each with predefined internal parameters. A higher level makes commands easier to detect but also produces more false alarms.

Refer to ${aivoice_lib_dir}/include/aivoice_asr_config.h for details.

Custom Command Guide

Hardware Requirements

Chip Models
- RTL8713ECM-VA4-CG
- RTL8730EAM-VA6-CG
Flash Size: ≥ 16 MB
Dual-microphone Spacing: 50 mm
Microphone and Loopback Channel Mapping:
- RTL8713ECM-VA4-CG: AMIC1, AMIC2, AMIC3
- RTL8730EAM-VA6-CG: AMIC1, AMIC3, AMIC5
One external speaker

Software Requirements

Prepare the ameba-rtos repository (referred to as ${SDK}).

Master Branch: Refer to SDK Download for the extended XDK download method.
Branch 1.1:

git clone https://github.com/Ameba-AIoT/ameba-rtos.git
cd ameba-rtos
git checkout remotes/origin/release/v1.1 -b release/v1.1

Operation Instructions

Steps for amebalite_gcc_project:

Use the online tool to upload the required Excel files containing the command list and the corresponding audio prompts. After backend compilation completes, download the download.zip file and extract it.
Copy the modified files into the SDK and apply the patch:

cp -r ${download}/patch/* ${SDK}
git apply --reject speechmind_custom_cmd.patch

Note

If git apply reports errors, it indicates a significant version discrepancy between the current SDK and the version used to generate the patch. Please manually integrate the changes from the patch into the corresponding files in the SDK.
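
With --reject, git apply applies the hunks it can and leaves each failed hunk in a .rej file next to the file it targets, which helps with the manual integration described above. The following is a minimal sketch for a dry run and for listing those leftovers. The function names are hypothetical, and the patch is assumed to sit in the SDK root after the cp step.

```shell
# Dry run: report whether the patch would apply, without touching any files.
# $1 = SDK root, $2 = patch file name inside the SDK root (assumed location).
check_patch() {
    git -C "$1" apply --check "$2"
}

# List the reject files left behind by `git apply --reject`.
# $1 = directory to search (defaults to the current directory).
list_rejects() {
    find "${1:-.}" -type f -name '*.rej' | sort
}

# Example: list_rejects "${SDK}"
```

Each listed .rej file contains only the hunks that failed, so those are the changes to merge by hand.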

Switch to the SDK’s GCC project directory and run menuconfig.py to enter the configuration interface:

cd ${SDK}/amebalite_gcc_project
./menuconfig.py

Enable DSP through menu navigation

--------MENUCONFIG FOR General---------
CONFIG DSP Enable  --->
   [*] Enable DSP

Configure the link option through menu navigation

Branch master:

--------MENUCONFIG FOR General---------
CONFIG Link Option  --->
   IMG2(Application) running on FLASH or PSRAM?
      (X) FLASH
      ( ) PSRAM
   IMG2 Data and Heap in SRAM or PSRAM?  --->
      ( ) SRAM
      (X) PSRAM

Branch v1.1:

--------MENUCONFIG FOR General---------
CONFIG Link Option  --->
   IMG2(Application) running on FLASH or PSRAM?
      (X) CodeInXip_DataHeapInPsram
      ( ) CodeInPsram_DataHeapInSram
      ( ) CodeInPsram_DataHeapInPsram
      ( ) CodeInXip_DataHeapInSram

Enable VFS LITTLEFS

--------MENUCONFIG FOR General---------
CONFIG VFS  --->
   [*] Enable VFS LITTLEFS

Enable AIVoice

--------MENUCONFIG FOR General---------
CONFIG TrustZone  --->
...
CONFIG APPLICATION  --->
   GUI Config  --->
   ...
   AI Config  --->
      [ ] Enable TFLITE MICRO
      [*] Enable AIVoice

Enable SpeechMind

--------MENUCONFIG FOR General---------
CONFIG TrustZone  --->
...
CONFIG APPLICATION  --->
   GUI Config  --->
   ...
   AI Config  --->
      [ ] Enable TFLITE MICRO
      [*] Enable AIVoice
      [*] Enable SpeechMind

Build

./build.py

Download the following images using the Flash Programming Tool:

km4_boot_all.bin: Default address
kr4_km4_app.bin: Default address
tts.bin: Start address 0x083E0000, end address 0x087E0000
dsp_all.bin: Start address 0x087E0000, end address 0x08A00000
aivoice_models.bin: Start address 0x08A00000, end address 0x08E00000
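
As a quick sanity check on the layout above, the size of each region can be derived from its address pair. This sketch assumes each pair is a start and an end address, and that flash is mapped at base 0x08000000; both are assumptions, and region_kib is a hypothetical helper.

```shell
# Size in KiB of a flash region, given its start and end addresses.
region_kib() {
    echo $(( ($2 - $1) / 1024 ))
}

region_kib 0x083E0000 0x087E0000   # tts.bin region
region_kib 0x087E0000 0x08A00000   # dsp_all.bin region
region_kib 0x08A00000 0x08E00000   # aivoice_models.bin region

# Offset of the last end address from the assumed flash base, in MiB.
echo $(( (0x08E00000 - 0x08000000) / 1024 / 1024 ))
```

Under these assumptions the regions are 4096, 2176, and 4096 KiB, and the last region ends 14 MiB into flash, which is consistent with the 16 MB flash requirement.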

Steps for amebasmart_gcc_project:

Use the online tool to upload the required Excel files containing the command list and the corresponding audio prompts. After backend compilation completes, download the download.zip file and extract it.
Copy the modified files into the SDK and apply the patch:

cp -r ${download}/patch/* ${SDK}
git apply --reject speechmind_custom_cmd.patch

Note

If git apply reports errors, it indicates a significant version discrepancy between the current SDK and the version used to generate the patch. Please manually integrate the changes from the patch into the corresponding files in the SDK.

Switch to the SDK’s GCC project directory and run menuconfig.py to enter the configuration interface:

cd ${SDK}/amebasmart_gcc_project
./menuconfig.py

Select running the application from Flash through menu navigation

Branch master:

--------MENUCONFIG FOR General---------
CONFIG Link Option  --->
   IMG2(Application) running on PSRAM or FLASH?  --->
      ( ) PSRAM
      (X) FLASH

Branch v1.1:

--------MENUCONFIG FOR General---------
CONFIG BOOT OPTION --->
   [*] XIP_FLASH

Enable VFS LITTLEFS

--------MENUCONFIG FOR General---------
CONFIG VFS  --->
   [*] Enable VFS LITTLEFS

Enable AIVoice (select algorithm version as needed)

--------MENUCONFIG FOR General---------
CONFIG TrustZone  --->
...
CONFIG APPLICATION  --->
   GUI Config  --->
   ...
   AI Config  --->
      [ ] Enable TFLITE MICRO
      [*] Enable AIVoice

Enable SpeechMind

--------MENUCONFIG FOR General---------
CONFIG TrustZone  --->
...
CONFIG APPLICATION  --->
   GUI Config  --->
   ...
   AI Config  --->
      [ ] Enable TFLITE MICRO
      [*] Enable AIVoice
      [*] Enable SpeechMind

Choose single core

MENUCONFIG FOR CA32 CONFIG  --->
...
CONFIG SMP  --->
   Select Core Num --->
      ( ) DUAL
      (X) SINGLE

Build image

./build.py

Download the following images using the Flash Programming Tool:

km4_boot_all.bin: Default address
km0_km4_ca32_app.bin: Start address 0x08020000, end address 0x08600000
tts.bin: Start address 0x08623000, end address 0x08A23000
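
When these addresses are changed, for example to make room for a larger tts.bin, it is worth confirming that the regions still do not overlap. The check below is a minimal sketch; check_layout is a hypothetical helper, and each address pair is assumed to be a start and an end address.

```shell
# Check that consecutive flash regions do not overlap.
# Arguments: start/end address pairs, listed in ascending order.
check_layout() {
    prev_end=0
    while [ "$#" -ge 2 ]; do
        # A region must not start before the previous region ends.
        if [ $(( $1 )) -lt "$prev_end" ]; then
            echo "overlap at $1"
            return 1
        fi
        prev_end=$(( $2 ))
        shift 2
    done
    echo "layout ok"
}

# km0_km4_ca32_app.bin region followed by the tts.bin region.
check_layout 0x08020000 0x08600000 0x08623000 0x08A23000
```

With the addresses listed above, the check prints "layout ok".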

Running and Expected Results

After downloading the images onto the development board, the following voice interaction features can be tested:

Voice Wake-Up

Wake up the device by saying xiao-qiang-xiao-qiang or ni-hao-xiao-qiang. On success, a log is printed and the response audio "Master, I'm here" is played.

Timeout

If no interaction occurs after wake-up, the device announces "Master, I'll step back for now. Wake me up if needed." Further interaction then requires waking the device again. The timeout duration can be adjusted in ${SDK}/component/application/speechmind/src/speech_mind.c via:
```
aivoice_param.timeout = 10;
```

Command Recognition

After wake-up, continuous interaction is possible using the customized voice commands. If a command is recognized, a log is printed and the corresponding response audio "Command x executed" is played. The supported command list is printed in the boot-up logs.

To modify the response audio, refer to the audio names in tts_content_name in ${download}/patch/component/application/speechmind/src/speech_tts.c, and follow Example 2 in the AIVoice Development Guide to replace the audio and prepare the tts.bin file.

Sound Source Localization (SSL)

When SSL is enabled, the device detects the speaker’s direction after each wake-up. The angle is logged and the response audio "xx degrees" is played. SSL can be enabled or disabled in ${SDK}/component/application/speechmind/src/speech_mind.c via:
```
#define ENABLE_SSL_DOA 1
```
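
Both source-level settings mentioned in this section, the wake-up timeout and the SSL switch, live in speech_mind.c. The sketch below shows one way to locate them and to preview a change to the SSL define. The helper names are hypothetical, and the patterns assume the lines look like the snippets shown above.

```shell
# Print the lines (with line numbers) that set the wake-up timeout
# and the SSL switch in the given source file.
show_speechmind_settings() {
    grep -n -E 'aivoice_param\.timeout|#define[[:space:]]+ENABLE_SSL_DOA' "$1"
}

# Print the file with ENABLE_SSL_DOA set to $2 (0 or 1).
# Output goes to stdout, so the original file is left untouched.
set_ssl_doa() {
    sed -E "s/^(#define[[:space:]]+ENABLE_SSL_DOA[[:space:]]+)[0-9]+/\\1$2/" "$1"
}

# Example:
#   show_speechmind_settings "${SDK}/component/application/speechmind/src/speech_mind.c"
```

Redirect the output of set_ssl_doa to a temporary file and review it before replacing the original, then rebuild with ./build.py.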