ASR (Automatic Speech Recognition)

Supported ICs: RTL8726E, RTL8713E, RTL8730E

Overview

ASR is the module that converts speech to text.

AIVoice provides an offline ASR algorithm for voice command detection, designed for voice interaction applications that require fast, reliable, and network-independent performance.

Core Features

  • Continuous Interaction: Supports multi-turn recognition after a single wake-up, with no need for repeated wake-up prompts.

  • Language Support: Currently supports only Mandarin Chinese.

  • Command Capacity: Recognizes up to 200 commands.

  • Resource Composition: Recognition resources include a model file and an FST file (Finite-State Transducer). To update or customize commands, only the FST file needs to be replaced, with no changes required to the model.

Predefined & Customization

  • The SDK includes a default set of 40 predefined commands for air conditioning scenarios, such as 打开空调 (turn on air conditioning) and 关闭空调 (turn off air conditioning).

  • To customize commands, you can use our online generation tool to create the corresponding FST resource file for quick integration into your application.
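Before uploading a custom command list, it can help to check it against the standard capacity described above. The following is a minimal sketch, assuming the commands are already available as a plain list of strings; `check_command_list` and `MAX_COMMANDS` are hypothetical names used for illustration and are not part of the SDK or the online tool.

```python
MAX_COMMANDS = 200  # standard capacity stated under Core Features


def check_command_list(commands):
    """Return a list of problems found in a candidate command list."""
    problems = []
    cleaned = [c.strip() for c in commands]

    # Blank rows are easy to leave behind when exporting from a spreadsheet.
    if any(not c for c in cleaned):
        problems.append("empty command entry")

    # Report every repeated occurrence of a command.
    seen = set()
    for c in cleaned:
        if c and c in seen:
            problems.append(f"duplicate command: {c}")
        seen.add(c)

    # Compare the number of distinct commands with the standard limit.
    unique = {c for c in cleaned if c}
    if len(unique) > MAX_COMMANDS:
        problems.append(
            f"{len(unique)} commands exceed the limit of {MAX_COMMANDS}"
        )
    return problems


if __name__ == "__main__":
    print(check_command_list(["打开空调", "关闭空调", "打开空调"]))
```

An empty result means the list passes these two checks only; the online tool may apply further rules that this sketch does not cover.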

Customized Services

If your project requirements exceed the following standard capabilities, please contact our business team for a customized solution:

  • Command vocabulary exceeding 200 entries

  • Support for additional languages

  • Deep performance optimization or algorithm customization for specific scenarios

Configurations

ASR configurable parameters:

sensitivity:

Three sensitivity levels are provided, each with predefined internal parameters. A higher level makes commands easier to detect but also produces more false alarms.

Refer to ${aivoice_lib_dir}/include/aivoice_asr_config.h for details.

Custom Command Guide

Hardware Requirements

  • Chip Models

    • RTL8713ECM-VA4-CG

    • RTL8730EAM-VA6-CG

  • Flash Size: ≥ 16 MB

  • Dual-microphone Spacing: 50 mm

  • Microphone and Loopback Channel Mapping:

    • RTL8713ECM-VA4-CG: AMIC1, AMIC2, AMIC3

    • RTL8730EAM-VA6-CG: AMIC1, AMIC3, AMIC5

  • One external speaker

Software Requirements

Prepare the ameba-rtos repository (referred to as ${SDK}). See SDK Download for the extended XDK download method.

Command Word Format

Operation Instructions

RTL8726E:
  1. Use the online tool to upload the required Excel files containing the command list and the corresponding audio prompts. After backend compilation completes, download download.zip and extract it (the extracted directory is referred to as ${download}).

  2. Copy the modified files into the SDK and apply the patch:

cp -r ${download}/patch/* ${SDK}
cd ${SDK}/
python component/aivoice/tools/patch_custom_cmd.py RTL8713E

Note

Please ensure you are in the SDK root directory before running this script. Otherwise, the script may fail to locate the files to be modified, causing the operation to fail.

  3. Set up the build environment and run ameba.py menuconfig to enter the configuration interface:

source env.sh
ameba.py soc RTL8726E
ameba.py menuconfig

  4. Enable DSP through menu navigation:

--------MENUCONFIG FOR General---------
CONFIG DSP Enable  --->
   [*] Enable DSP

  5. Configure the link options through menu navigation:

--------MENUCONFIG FOR General---------
CONFIG Link Option  --->
   IMG2(Application) running on FLASH or PSRAM?
      (X) FLASH
      ( ) PSRAM
   IMG2 Data and Heap in SRAM or PSRAM?  --->
      ( ) SRAM
      (X) PSRAM

  6. Enable VFS LITTLEFS:

--------MENUCONFIG FOR General---------
CONFIG VFS  --->
   [*] Enable VFS LITTLEFS

  7. Enable AIVoice:

--------MENUCONFIG FOR General---------
CONFIG TrustZone  --->
...
CONFIG APPLICATION  --->
   GUI Config  --->
   ...
   AI Config  --->
      [ ] Enable TFLITE MICRO
      [*] Enable AIVoice

  8. Enable SpeechMind:

--------MENUCONFIG FOR General---------
CONFIG TrustZone  --->
...
CONFIG APPLICATION  --->
   GUI Config  --->
   ...
   AI Config  --->
      [ ] Enable TFLITE MICRO
      [*] Enable AIVoice
      [*] Enable SpeechMind

  9. Build:

ameba.py build

  10. Download the following images using the Flash Programming Tool:

  • boot.bin: Default address

  • app.bin: Default address

  • tts.bin: start address 0x083E0000, end address 0x087E0000

  • dsp_all.bin: start address 0x087E0000, end address 0x08A00000

  • aivoice_models.bin: start address 0x08A00000, end address 0x08E00000
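The address pairs above can be sanity-checked with a little arithmetic. The sketch below assumes each pair is a (start, end) range, which is an interpretation rather than something the list states outright; `REGIONS`, `region_sizes`, and `find_overlaps` are hypothetical names introduced for illustration.

```python
# Address pairs copied from the image list above. Reading each pair as
# (start, end) is an assumption of this sketch.
REGIONS = [
    ("tts.bin", 0x083E0000, 0x087E0000),
    ("dsp_all.bin", 0x087E0000, 0x08A00000),
    ("aivoice_models.bin", 0x08A00000, 0x08E00000),
]


def region_sizes(regions):
    """Map each image name to the size in bytes of its flash range."""
    return {name: end - start for name, start, end in regions}


def find_overlaps(regions):
    """Return pairs of images whose ranges overlap, ordered by start address."""
    ordered = sorted(regions, key=lambda r: r[1])
    overlaps = []
    for (a, _, a_end), (b, b_start, _) in zip(ordered, ordered[1:]):
        if b_start < a_end:
            overlaps.append((a, b))
    return overlaps


if __name__ == "__main__":
    for name, size in region_sizes(REGIONS).items():
        print(f"{name}: {size // 1024} KiB")
    print("overlaps:", find_overlaps(REGIONS))
```

Under this reading the three ranges are contiguous and do not overlap, and the tts.bin range works out to 4 MiB. If the layout is changed, re-running a check like this is a quick way to catch overlapping ranges before flashing.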

Running and Expected Results

After downloading the images onto the development board, the following voice interaction features can be tested:

  • Voice Wake-Up

    Wake up the device by saying "xiao-qiang-xiao-qiang" or "ni-hao-xiao-qiang". If wake-up succeeds, logs are printed and the response audio "Master, I'm here" is played.

  • Timeout

    If no interaction occurs after wake-up, the device announces "Master, I'll step back for now. Wake me up if needed." Further interaction requires waking the device again. The timeout duration can be adjusted in ${SDK}/component/application/speechmind/src/speech_mind.c via:

    aivoice_param.timeout = 10;
    
  • Command Recognition

    After wake-up, continuous interaction is possible using the customized voice commands. When a command is recognized, logs are printed and the corresponding response audio "Command x executed" is played. The supported command list is printed in the boot-up logs.

    To modify the response audio, refer to Response Audio Replacement.

  • Sound Source Localization (SSL)

    When SSL is enabled, the device detects the speaker's direction after each wake-up. The angle is logged and the response audio "xx degrees" is played. SSL can be enabled or disabled in ${SDK}/component/application/speechmind/src/speech_mind.c via:

    #define ENABLE_SSL_DOA 1
    

Response Audio Replacement

To replace TTS response audio, follow the steps below.

Prepare Audio

  • Format: MP3

  • Directory structure:

    tts_mp3/              <-- -dir parameter specifies this directory
    └── tts/
        ├── 10001.mp3    # Wake-up response audio
        ├── 10002.mp3    # Timeout exit response audio
        ├── x.mp3        # Command response audio (x is the command ID)
        └── angle/       # (Optional) Sound source localization angle response audio
            ├── 1000.mp3
            └── ...
    

    Note

    The -dir parameter in the packaging command must use the top-level directory tts_mp3, not the tts directory itself.

  • Audio naming rule: Each file name corresponds to the second field of an entry in the tts_names array (of type struct tts_content_name) in ${download}/patch/component/application/speechmind/src/speech_tts.c:

    static const struct tts_content_name tts_names[] = {
        {"I'm here", "10001"},
        {"Goodbye", "10002"},
        {"Cmd 1 done", "1"},
        {"Cmd 2 done", "2"},
        ...
    };
    
  • If sound source localization is enabled, also copy the angle response audio directory to the tts directory:

    cp -r ${SDK}/component/application/speechmind/res/tts/tts/angle tts_mp3/tts
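The naming rule and directory layout above can be cross-checked before packaging. The following is a rough sketch that reads the second field of each tts_names-style entry from C source text and reports which MP3 files are missing from the tts subdirectory. `expected_mp3_names` and `missing_audio` are hypothetical helpers, and the regular expression is a simplification: it matches any two-string initializer in the text, so it should be fed only the tts_names array or a file that contains no other such entries.

```python
import re
from pathlib import Path

# Matches entries such as {"I'm here", "10001"} and captures the second field.
ENTRY_RE = re.compile(r'\{\s*"(?:[^"\\]|\\.)*"\s*,\s*"([^"]+)"\s*\}')


def expected_mp3_names(c_source):
    """Return the MP3 file names implied by the entries in C source text."""
    return [f"{audio_id}.mp3" for audio_id in ENTRY_RE.findall(c_source)]


def missing_audio(c_source, top_dir):
    """List expected MP3 files that are absent from <top_dir>/tts."""
    tts_dir = Path(top_dir) / "tts"
    return [
        name
        for name in expected_mp3_names(c_source)
        if not (tts_dir / name).is_file()
    ]


if __name__ == "__main__":
    sample = """
    static const struct tts_content_name tts_names[] = {
        {"I'm here", "10001"},
        {"Goodbye", "10002"},
        {"Cmd 1 done", "1"},
    };
    """
    print(expected_mp3_names(sample))
    # Pass the top-level directory (tts_mp3), not the tts directory itself.
    print(missing_audio(sample, "tts_mp3"))
```

In practice the source text would be read from speech_tts.c in the patch directory; the optional angle directory is not covered by this check.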
    

Package Audio

python ${SDK}/tools/image_scripts/vfs.py -t LITTLEFS -s 4096 -c 1024 -dir tts_mp3 -out tts.bin

  • The -dir parameter must use the top-level directory containing the tts subdirectory (e.g., tts_mp3 as shown in the example), not the tts directory itself.

  • If “size exceed limit” is reported, the total audio size exceeds the 4 MB limit. Reduce the MP3 bitrate to shrink the files.

  • For the tts.bin flashing address, refer to the firmware flashing section for each chip. Default size is 4 MB (-c 1024). To adjust the size, remove the -c parameter and refer to AIVoice Developer Guide Example 2 for layout configuration.
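The 4 MB figure follows from the packaging parameters: a block size of 4096 bytes (-s) times 1024 blocks (-c). The sketch below compares the total MP3 payload with that raw image size before running the packaging script. `image_capacity` and `total_mp3_bytes` are hypothetical helpers, and the check is only approximate: the file system presumably needs some space for its own metadata, so the usable capacity is likely somewhat lower than the raw size.

```python
from pathlib import Path

BLOCK_SIZE = 4096    # value passed to -s in the packaging command
BLOCK_COUNT = 1024   # value passed to -c in the packaging command


def image_capacity(block_size=BLOCK_SIZE, block_count=BLOCK_COUNT):
    """Raw image size in bytes: block size times block count."""
    return block_size * block_count


def total_mp3_bytes(top_dir):
    """Sum the sizes of all .mp3 files under top_dir, recursively."""
    root = Path(top_dir)
    if not root.is_dir():
        return 0
    return sum(p.stat().st_size for p in root.rglob("*.mp3"))


if __name__ == "__main__":
    used = total_mp3_bytes("tts_mp3")
    cap = image_capacity()
    print(f"MP3 payload: {used} bytes of {cap} raw bytes")
    if used > cap:
        print("Payload alone exceeds the image; lower the MP3 bitrate.")
```

A payload close to the raw size may still fail to package, so leaving some headroom is a sensible precaution.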