ASR (Automatic Speech Recognition)

Supported ICs: RTL8726E, RTL8713E, RTL8730E

Overview

The ASR module converts speech to text.

AIVoice provides an offline ASR algorithm for voice command detection, designed for voice interaction applications that require fast, reliable, and network-independent performance.

Core Features

  • Continuous Interaction: Supports multi-turn recognition after a single wake-up, with no need for repeated wake-up prompts.

  • Language Support: Currently supports only Mandarin Chinese.

  • Command Capacity: Recognizes up to 200 commands.

  • Resource Composition: Recognition resources include a model file and an FST file (Finite-State Transducer). To update or customize commands, only the FST file needs to be replaced, with no changes required to the model.

Predefined & Customization

  • The SDK includes a default set of 40 predefined commands for air conditioning scenarios, such as 打开空调 (turn on air conditioning) and 关闭空调 (turn off air conditioning).

  • To customize commands, you can use our online generation tool to create the corresponding FST resource file for quick integration into your application.

Customized Services

If your project requirements exceed the following standard capabilities, please contact our business team for a customized solution:

  • Command vocabulary exceeding 200 entries

  • Support for additional languages

  • Deep performance optimization or algorithm customization for specific scenarios

Configurations

ASR configurable parameters:

sensitivity:

Three sensitivity levels are provided, each with predefined internal parameters. A higher level makes commands easier to detect but also produces more false alarms.

Refer to ${aivoice_lib_dir}/include/aivoice_asr_config.h for details.
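
As a rough illustration of how a three-level sensitivity setting might be held and validated in application code, here is a minimal sketch. Every name in it (the demo_ prefix, the enum values, the "middle" default) is hypothetical and chosen for this example; the real types are defined in aivoice_asr_config.h and may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for illustration only. The real enum and struct are
 * defined in aivoice_asr_config.h and may be named differently. */
typedef enum {
    DEMO_ASR_SENSITIVITY_LOW = 0,  /* hardest to trigger, fewest false alarms */
    DEMO_ASR_SENSITIVITY_MIDDLE,   /* balanced */
    DEMO_ASR_SENSITIVITY_HIGH,     /* easiest to trigger, most false alarms */
    DEMO_ASR_SENSITIVITY_COUNT     /* number of levels, not a level itself */
} demo_asr_sensitivity_t;

typedef struct {
    demo_asr_sensitivity_t sensitivity;
} demo_asr_config_t;

/* Start from the middle level (an assumed default, not taken from the SDK). */
demo_asr_config_t demo_asr_config_default(void)
{
    demo_asr_config_t cfg = { DEMO_ASR_SENSITIVITY_MIDDLE };
    return cfg;
}

/* Set the level. Returns 0 on success, or -1 if cfg is NULL or the value
 * is not one of the three levels. */
int demo_asr_set_sensitivity(demo_asr_config_t *cfg, int level)
{
    if (cfg == NULL || level < DEMO_ASR_SENSITIVITY_LOW ||
        level >= DEMO_ASR_SENSITIVITY_COUNT) {
        return -1;
    }
    cfg->sensitivity = (demo_asr_sensitivity_t)level;
    return 0;
}

/* Human-readable name of a level, e.g. for boot-up logs. */
const char *demo_asr_sensitivity_name(demo_asr_sensitivity_t level)
{
    static const char *const names[DEMO_ASR_SENSITIVITY_COUNT] = {
        "low", "middle", "high"
    };
    assert((int)level >= 0 && (int)level < DEMO_ASR_SENSITIVITY_COUNT);
    return names[level];
}
```

A product in a noisy room would typically start from a lower level to limit false alarms and raise it only if commands are being missed.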

Custom Command Guide

Hardware Requirements

  • Chip Models

    • RTL8713ECM-VA4-CG

    • RTL8730EAM-VA6-CG

  • Flash Size: ≥ 16 MB

  • Dual-microphone Spacing: 50 mm

  • Microphone and Loopback Channel Mapping:

    • RTL8713ECM-VA4-CG: AMIC1, AMIC2, AMIC3

    • RTL8730EAM-VA6-CG: AMIC1, AMIC3, AMIC5

  • One external speaker

Software Requirements

Prepare the ameba-rtos repository (referred to as ${SDK}).

  • Master Branch: Refer to SDK Download for the extended XDK download method.

  • Branch 1.1:

git clone https://github.com/Ameba-AIoT/ameba-rtos.git
cd ameba-rtos
git checkout remotes/origin/release/v1.1 -b release/v1.1

Operation Instructions

  1. Use the online tool to upload the required Excel files containing the command list and corresponding audio prompts. After backend compilation completes, download the download.zip file and extract it.

  2. Copy the modified files into the SDK and apply the patch:

cp -r ${download}/patch/* ${SDK}
git apply --reject speechmind_custom_cmd.patch

Note

If git apply reports errors, the current SDK differs significantly from the version used to generate the patch. In that case, manually integrate the changes from the patch into the corresponding files in the SDK.

  3. Switch to the SDK’s GCC project directory and run menuconfig.py to enter the configuration interface:

cd ${SDK}/amebalite_gcc_project
./menuconfig.py

  4. Enable DSP through menu navigation:

--------MENUCONFIG FOR General---------
CONFIG DSP Enable  --->
   [*] Enable DSP

  5. Configure the link options through menu navigation:

  • Branch master:

--------MENUCONFIG FOR General---------
CONFIG Link Option  --->
   IMG2(Application) running on FLASH or PSRAM?
      (X) FLASH
      ( ) PSRAM
   IMG2 Data and Heap in SRAM or PSRAM?  --->
      ( ) SRAM
      (X) PSRAM

  • Branch v1.1:

--------MENUCONFIG FOR General---------
CONFIG Link Option  --->
   IMG2(Application) running on FLASH or PSRAM?
      (X) CodeInXip_DataHeapInPsram
      ( ) CodeInPsram_DataHeapInSram
      ( ) CodeInPsram_DataHeapInPsram
      ( ) CodeInXip_DataHeapInSram

  6. Enable VFS LITTLEFS:

--------MENUCONFIG FOR General---------
CONFIG VFS  --->
   [*] Enable VFS LITTLEFS

  7. Enable AIVoice:

--------MENUCONFIG FOR General---------
CONFIG TrustZone  --->
...
CONFIG APPLICATION  --->
   GUI Config  --->
   ...
   AI Config  --->
      [ ] Enable TFLITE MICRO
      [*] Enable AIVoice

  8. Enable SpeechMind:

--------MENUCONFIG FOR General---------
CONFIG TrustZone  --->
...
CONFIG APPLICATION  --->
   GUI Config  --->
   ...
   AI Config  --->
      [ ] Enable TFLITE MICRO
      [*] Enable AIVoice
      [*] Enable SpeechMind

  9. Build:

./build.py

  10. Download the following images using the Flash Programming Tool:

  • km4_boot_all.bin: Default address

  • kr4_km4_app.bin: Default address

  • tts.bin: Start address 0x083E0000, end address 0x087E0000

  • dsp_all.bin: Start address 0x087E0000, end address 0x08A00000

  • aivoice_models.bin: Start address 0x08A00000, end address 0x08E00000
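
The three custom images occupy back-to-back flash regions, so a quick sanity check of the layout can catch a mistyped address before programming. The sketch below assumes that the two addresses listed per image are the start and the (exclusive) end of its region, and that flash is mapped from 0x08000000, so that a 16 MB part ends at 0x09000000; neither assumption is stated explicitly in this guide. All identifiers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One flash region: [start, end). Treating the second address as an
 * exclusive end is an assumption made for this sketch. */
typedef struct {
    const char *name;
    uint32_t start;
    uint32_t end;
} demo_flash_region_t;

/* Addresses copied from the image list above, in ascending order. */
const demo_flash_region_t DEMO_REGIONS[3] = {
    { "tts.bin",            0x083E0000u, 0x087E0000u },
    { "dsp_all.bin",        0x087E0000u, 0x08A00000u },
    { "aivoice_models.bin", 0x08A00000u, 0x08E00000u },
};

/* Size of a region in bytes. */
uint32_t demo_region_size(const demo_flash_region_t *r)
{
    assert(r != NULL && r->end > r->start);
    return r->end - r->start;
}

/* Returns 1 if the regions (sorted by start address) are non-empty, do not
 * overlap, and all end at or below flash_end; returns 0 otherwise. */
int demo_layout_is_valid(const demo_flash_region_t *regions, size_t count,
                         uint32_t flash_end)
{
    for (size_t i = 0; i < count; i++) {
        if (regions[i].end <= regions[i].start) {
            return 0;  /* empty or inverted region */
        }
        if (regions[i].end > flash_end) {
            return 0;  /* runs past the end of flash */
        }
        if (i > 0 && regions[i].start < regions[i - 1].end) {
            return 0;  /* overlaps the previous region */
        }
    }
    return 1;
}
```

Under these assumptions, tts.bin and aivoice_models.bin each get 0x400000 bytes (4 MB) and dsp_all.bin gets 0x220000 bytes, and the last region ends below 0x09000000, which is consistent with the 16 MB flash requirement.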

Running and Expected Results

After downloading the images onto the development board, the following voice interaction features can be tested:

  • Voice Wake-Up

    Wake up the device by saying "xiao-qiang-xiao-qiang" or "ni-hao-xiao-qiang". If wake-up succeeds, logs are printed and the response audio "Master, I'm here" is played.

  • Timeout

    If no interaction occurs after wake-up, the device announces "Master, I'll step back for now. Wake me up if needed." Further interaction then requires waking the device again. The timeout duration can be adjusted in ${SDK}/component/application/speechmind/src/speech_mind.c via:

    aivoice_param.timeout = 10;
    
  • Command Recognition

    After wake-up, continuous interaction is possible using the customized voice commands. When a command is recognized, logs are printed and the corresponding response audio "Command x executed" is played. The supported command list can be found in the boot-up logs.

    To modify the response audio, refer to the audio names in tts_content_name in ${download}/patch/component/application/speechmind/src/speech_tts.c, and follow Example 2 in AIVoice Development Guide to replace the audio and prepare the tts.bin file.

  • Sound Source Localization (SSL)

    When SSL is enabled, the device detects the speaker’s direction after each wake-up. The angle is logged, and the response audio "xx degrees" is played. SSL can be enabled or disabled in ${SDK}/component/application/speechmind/src/speech_mind.c via:

    #define ENABLE_SSL_DOA 1
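
The wake-up, command, and timeout behaviour described in this section can be pictured as a small two-state machine. The sketch below is an illustration only and is not the SpeechMind implementation: all identifiers are hypothetical, the timeout is counted in abstract ticks because the unit of aivoice_param.timeout is not stated here, and it assumes that any recognized command or repeated wake word restarts the timeout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { DEMO_STATE_IDLE, DEMO_STATE_AWAKE } demo_state_t;

typedef enum {
    DEMO_EVT_WAKE_WORD,  /* wake word detected */
    DEMO_EVT_COMMAND,    /* a command was recognized */
    DEMO_EVT_TICK        /* one timeout unit passed with no speech */
} demo_event_t;

typedef enum {
    DEMO_RESP_NONE,         /* nothing to play */
    DEMO_RESP_WAKE_ACK,     /* e.g. "Master, I'm here" */
    DEMO_RESP_COMMAND_ACK,  /* e.g. "Command x executed" */
    DEMO_RESP_TIMEOUT       /* e.g. "Master, I'll step back for now..." */
} demo_response_t;

typedef struct {
    demo_state_t state;
    uint32_t timeout_ticks;  /* plays the role of aivoice_param.timeout */
    uint32_t idle_ticks;     /* ticks since the last interaction */
} demo_session_t;

void demo_session_init(demo_session_t *s, uint32_t timeout_ticks)
{
    assert(s != NULL);
    s->state = DEMO_STATE_IDLE;
    s->timeout_ticks = timeout_ticks;
    s->idle_ticks = 0;
}

/* Feed one event and return which response audio should be played. */
demo_response_t demo_session_step(demo_session_t *s, demo_event_t evt)
{
    assert(s != NULL);
    if (s->state == DEMO_STATE_IDLE) {
        if (evt != DEMO_EVT_WAKE_WORD) {
            return DEMO_RESP_NONE;  /* commands are ignored until wake-up */
        }
        s->state = DEMO_STATE_AWAKE;
        s->idle_ticks = 0;
        return DEMO_RESP_WAKE_ACK;
    }
    switch (evt) {
    case DEMO_EVT_WAKE_WORD:
        s->idle_ticks = 0;  /* assumed: a repeated wake word restarts the timer */
        return DEMO_RESP_WAKE_ACK;
    case DEMO_EVT_COMMAND:
        s->idle_ticks = 0;  /* multi-turn: stay awake for the next command */
        return DEMO_RESP_COMMAND_ACK;
    case DEMO_EVT_TICK:
        if (++s->idle_ticks >= s->timeout_ticks) {
            s->state = DEMO_STATE_IDLE;  /* a new wake-up is required */
            s->idle_ticks = 0;
            return DEMO_RESP_TIMEOUT;
        }
        return DEMO_RESP_NONE;
    }
    return DEMO_RESP_NONE;
}
```

With a timeout of 10, a session that is woken up, receives one command, and then hears nothing stays awake for nine ticks and returns to idle on the tenth, after which commands are ignored until the next wake word.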