LLM-Powered Voice Interaction Solution

Hybrid offline-online AI solution, bridging local efficiency with cloud intelligence

Overview

Realtek provides a hybrid offline-online large model voice interaction solution that combines efficient local chip-level voice processing with cloud-based cognitive capabilities, enhancing human-machine interaction experience.

LLM-Powered Voice Interaction Solution Architecture
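
The hybrid split described above can be pictured as a simple routing decision: utterances that match a known offline command are handled on the device, and everything else is forwarded to a cloud model. The following is a minimal sketch under stated assumptions, not Realtek's implementation. The command table, `route_utterance`, and `fake_cloud` are hypothetical names introduced only for illustration.

```python
from typing import Callable, Dict, Tuple

# Hypothetical offline command table: recognized phrase -> local action name.
LOCAL_COMMANDS: Dict[str, str] = {
    "turn on the light": "light_on",
    "turn off the light": "light_off",
    "volume up": "volume_up",
}


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so matching ignores casing and spacing."""
    return " ".join(text.lower().split())


def route_utterance(
    text: str,
    ask_cloud: Callable[[str], str],
    commands: Dict[str, str] = LOCAL_COMMANDS,
) -> Tuple[str, str]:
    """Return ("local", action) for a known command, else ("cloud", reply)."""
    key = normalize(text)
    if key in commands:
        return ("local", commands[key])
    return ("cloud", ask_cloud(key))


def fake_cloud(prompt: str) -> str:
    # Stand-in for a cloud LLM request (assumption); a real system would
    # send the request over the network here.
    return f"[cloud reply to: {prompt}]"


if __name__ == "__main__":
    print(route_utterance("Turn ON  the light", fake_cloud))
    print(route_utterance("What is the weather", fake_cloud))
```

Keeping the local path free of any network call is what lets device control stay responsive when the connection is slow or unavailable.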

Signal Processing

AEC (Acoustic Echo Cancellation)

Dual-stage linear cancellation + residual suppression for effective echo removal

BF (Beamforming)

Multi-microphone spatial filtering for targeted speech enhancement

NS (Noise Suppression)

Supports two noise-reduction modes: signal processing and neural network

AGC (Automatic Gain Control)

Fixed + adaptive gain adjustment for stable output levels

SSL (Sound Source Localization)

360° directional tracking with microphone arrays
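
To illustrate the "fixed + adaptive gain" idea listed under AGC above, the sketch below applies a constant pre-gain and then a slowly adapting gain that steers each frame's RMS level toward a target. It is a generic textbook-style illustration, not the algorithm used on the chip. All parameter names and default values (`fixed_gain`, `target_rms`, `rate`, `max_adaptive`) are assumptions.

```python
import math
from typing import List


def rms(frame: List[float]) -> float:
    """Root-mean-square level of one frame (0.0 for an empty frame)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def agc(
    frames: List[List[float]],
    fixed_gain: float = 2.0,     # constant pre-gain (assumed value)
    target_rms: float = 0.1,     # desired output level (assumed value)
    rate: float = 0.5,           # smoothing factor for the adaptive gain
    max_adaptive: float = 8.0,   # bounds the adaptive gain to [1/8, 8]
) -> List[List[float]]:
    """Apply a fixed gain, then an adaptive gain that tracks target_rms."""
    adaptive = 1.0
    out = []
    for frame in frames:
        pre = [x * fixed_gain for x in frame]
        level = rms(pre)
        if level > 1e-9:  # skip the update on silence to avoid dividing by zero
            desired = target_rms / level
            desired = min(max(desired, 1.0 / max_adaptive), max_adaptive)
            adaptive += rate * (desired - adaptive)  # move gradually toward desired
        # Clip to the normalized sample range [-1, 1].
        out.append([max(-1.0, min(1.0, x * adaptive)) for x in pre])
    return out


if __name__ == "__main__":
    quiet = [[0.01, -0.01] * 80 for _ in range(20)]
    levels = [rms(f) for f in agc(quiet)]
    print(levels[0], levels[-1])
```

The smoothing factor trades responsiveness for stability: a small `rate` avoids audible pumping, while a large one reacts faster to level changes.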

Automatic Speech Recognition

KWS (Keyword Spotting)

Supports fixed and user-defined keywords with fast, accurate on-device response

VAD (Voice Activity Detection)

Accurate speech/silence detection

ASR (Automatic Speech Recognition)

Offline command recognition and customizable command words for real-time control
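
As a rough illustration of the speech/silence detection listed under VAD above, the sketch below marks a frame as speech when its mean energy exceeds a threshold, and keeps the decision active for a few extra frames (a "hangover") so that short pauses do not cut off a word. Production VADs are typically more sophisticated. This is only a minimal energy-based example, and the threshold and hangover values are assumptions.

```python
from typing import List


def frame_energy(frame: List[float]) -> float:
    """Mean squared amplitude of a non-empty frame."""
    return sum(x * x for x in frame) / len(frame)


def energy_vad(
    frames: List[List[float]],
    threshold: float = 1e-3,  # energy threshold (assumed value)
    hangover: int = 2,        # extra frames kept as speech after energy drops
) -> List[bool]:
    """Return one speech/silence flag per frame."""
    flags = []
    remaining = 0
    for frame in frames:
        if frame and frame_energy(frame) >= threshold:
            remaining = hangover  # re-arm the hangover on every active frame
            flags.append(True)
        elif remaining > 0:
            remaining -= 1        # still inside the hangover window
            flags.append(True)
        else:
            flags.append(False)
    return flags


if __name__ == "__main__":
    quiet = [0.001] * 160
    loud = [0.1] * 160
    print(energy_vad([quiet, loud, quiet, quiet, quiet, quiet]))
```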

Key Advantages

Highly Customizable Local Voice Interaction

Custom Wake-up Words: User-level customization for personalized device naming
Custom Voice Commands: Define offline instructions via a configuration platform for rapid productization
Quick Deployment: One-click configuration to adapt to diverse product forms and scenarios

High-Speed Stable Wi-Fi for Chip-Level Voice Interaction

Supports multiple network protocols and is compatible with cloud service providers
High throughput & low latency for rapid AI response
Enhanced network stability ensures smooth AI conversations

Professional & Flexible Multimedia Framework

Multi-format audio playback support
High-quality audio output for immersive experience
Versatile interfaces for diverse application scenarios

Typical Applications

Smart Home

Local device control (lights, curtains, AC)
Cloud responses for weather, recipes, news

Smart Toys

Local media control (playback, volume)
Cloud-based Q&A and storytelling

Conference Systems

Local signal processing & noise reduction
Cloud transcription & summary generation

Development Resources

Resource Name	Link
SDK Download	Link
AIVoice Development Guide	Link
Custom Command Guide	Link
Audio Hardware Design Requirements	Link
Cloud Platform Reference: Coze	Link
Contact Us	Link

Recommended ICs

Features	RTL8721Dx	RTL8720E	RTL8710E	RTL8726E	RTL8713E	RTL8730E	RTL8721F	RTL872xD	RTL8735B
Application Processor	Cortex-M	Cortex-M	Cortex-M	Cortex-M	Cortex-M	Cortex-A	Cortex-M	Cortex-M	Cortex-M
DSP
ISP
Arm TrustZone
Dual Band
Wi-Fi 6
R-MESH
Ultra-low Power
Ethernet
BT Dual Mode
HMI
Audio ADC
Audio DAC
SDIO Host
SD/EMMC Host
USB
BT Dedicated Antenna
CAN

Feature	RTL8721Dx	RTL8726E	RTL8713E	RTL8730E
AFE Single MIC (Speech Recognition Mode)
AFE Single MIC (Voice Communication Mode)
AFE Dual MIC (Speech Recognition Mode)
AFE Three MIC (Speech Recognition Mode)
AEC (Speech Recognition Mode)
AEC (Voice Communication Mode)
BF (Speech Recognition Mode)
BF (Voice Communication Mode)
NS (Speech Recognition Mode)
NS (Voice Communication Mode)
AGC (Speech Recognition Mode)
AGC (Voice Communication Mode)
SSL (Speech Recognition Mode)
SSL (Voice Communication Mode)
KWS Fixed Keyword
KWS User-defined Keyword
VAD
ASR