LLM-Powered Voice Interaction Solution
Hybrid Offline-Online AI Solution: Bridging Local Efficiency with Cloud Intelligence
Overview
Realtek provides a hybrid offline-online large-model voice interaction solution that combines efficient on-chip local voice processing with cloud-based cognitive capabilities to improve the human-machine interaction experience.
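The hybrid split described above can be pictured as a small router: utterances that match a known offline command are handled on the device, and everything else is forwarded to a cloud model. The sketch below is only an illustration under stated assumptions. `LOCAL_COMMANDS`, `Route`, `route_utterance`, and `fake_cloud` are hypothetical names that do not come from the Realtek SDK, and the cloud call is replaced by a stub.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical offline command table (assumption, not an SDK structure).
# In a real product this would mirror the configured offline command words.
LOCAL_COMMANDS: Dict[str, str] = {
    "turn on the light": "light.on",
    "turn off the light": "light.off",
    "volume up": "media.volume_up",
}


@dataclass
class Route:
    target: str   # "local" or "cloud"
    payload: str  # action id for local routes, reply text for cloud routes


def route_utterance(text: str, ask_cloud: Callable[[str], str]) -> Route:
    """Handle known commands on-device; forward everything else to the cloud."""
    key = " ".join(text.lower().split())  # normalize case and whitespace
    action = LOCAL_COMMANDS.get(key)
    if action is not None:
        return Route("local", action)
    return Route("cloud", ask_cloud(text))


def fake_cloud(query: str) -> str:
    """Stand-in for a cloud LLM request (no network access in this sketch)."""
    return f"[cloud reply to: {query}]"


if __name__ == "__main__":
    print(route_utterance("Turn on the  light", fake_cloud))
    print(route_utterance("What's the weather tomorrow?", fake_cloud))
```

Keeping the lookup local means device control still works when the network is slow or unavailable, while open-ended questions go to the cloud path.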
Signal Processing
AEC (Acoustic Echo Cancellation)
Dual-stage linear cancellation + residual suppression for effective echo removal
BF (Beamforming)
Multi-microphone spatial filtering for targeted speech enhancement
NS (Noise Suppression)
Supports two noise-reduction modes: signal processing and neural network
AGC (Automatic Gain Control)
Fixed + adaptive gain adjustment for stable output levels
SSL (Sound Source Localization)
360° directional tracking with microphone arrays
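As an illustration of one block in this chain, the "fixed + adaptive" gain idea behind AGC can be sketched in a few lines: a constant pre-gain is applied first, then a slowly updated gain steers each frame toward a target level. This is a minimal sketch under assumptions, not Realtek's implementation. The function names, the RMS target, the gain cap, and the smoothing factor are all hypothetical values chosen for the example.

```python
import math
from typing import List, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def agc(
    frames: Sequence[Sequence[float]],
    fixed_gain: float = 2.0,         # constant pre-gain (assumed value)
    target_rms: float = 0.1,         # desired output level (assumed value)
    max_adaptive_gain: float = 8.0,  # cap so silence is not over-amplified
    smoothing: float = 0.9,          # closer to 1.0 means slower adaptation
) -> List[List[float]]:
    """Apply a fixed gain, then a smoothed adaptive gain, frame by frame."""
    out: List[List[float]] = []
    adaptive = 1.0
    for frame in frames:
        boosted = [s * fixed_gain for s in frame]
        level = rms(boosted)
        if level > 1e-6:  # skip the update on (near-)silent frames
            desired = min(target_rms / level, max_adaptive_gain)
            adaptive = smoothing * adaptive + (1.0 - smoothing) * desired
        # Hard-limit to the valid sample range after applying the gain.
        out.append([max(-1.0, min(1.0, s * adaptive)) for s in boosted])
    return out


if __name__ == "__main__":
    # A quiet 4-cycle sine per 160-sample frame, repeated for 200 frames.
    quiet = [0.01 * math.sin(2 * math.pi * 4 * n / 160) for n in range(160)]
    result = agc([quiet] * 200)
    print(round(rms(result[0]), 4), round(rms(result[-1]), 4))
```

The smoothing term is what keeps the output level stable: the gain drifts toward the target instead of jumping on every frame, which avoids audible pumping.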
Automatic Speech Recognition
KWS (Keyword Spotting)
Supports fixed and user-defined keywords for fast, accurate on-device response
VAD (Voice Activity Detection)
Accurate speech/silence detection
ASR (Automatic Speech Recognition)
Offline command recognition and customizable command words for real-time control
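To make the speech/silence decision concrete, here is a minimal energy-threshold VAD with a short hangover so that brief pauses inside a word are not cut off. It is a simplified stand-in for illustration only. The source does not state which VAD algorithm the solution uses, and `frame_energy_db`, `simple_vad`, the threshold, and the hangover length are assumptions made for this sketch.

```python
import math
from typing import List, Sequence


def frame_energy_db(frame: Sequence[float]) -> float:
    """Mean power of a frame in dB relative to full scale (samples in [-1, 1])."""
    if not frame:
        return -120.0
    power = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(max(power, 1e-12))  # floor avoids log10(0)


def simple_vad(
    frames: Sequence[Sequence[float]],
    threshold_db: float = -40.0,  # assumed speech/silence boundary
    hangover: int = 2,            # frames kept active after energy drops
) -> List[bool]:
    """Return one speech flag per frame using an energy threshold plus hangover."""
    flags: List[bool] = []
    hold = 0
    for frame in frames:
        if frame_energy_db(frame) > threshold_db:
            hold = hangover
            flags.append(True)
        elif hold > 0:
            hold -= 1
            flags.append(True)
        else:
            flags.append(False)
    return flags


if __name__ == "__main__":
    silence = [0.0] * 160
    loud = [0.5] * 160
    print(simple_vad([silence, loud, silence, silence, silence, silence]))
```

In a pipeline like the one listed above, the VAD flags would typically gate what is passed on to keyword spotting or command recognition, so that silent frames are not processed.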
Key Advantages
Highly Customizable Local Voice Interaction
- Custom Wake-up Words: User-level customization for personalized device naming
- Custom Voice Commands: Define offline instructions via a configuration platform for rapid productization
- Quick Deployment: One-click configuration to adapt to diverse product forms and scenarios
High-Speed Stable Wi-Fi for Chip-Level Voice Interaction
- Supports multiple network protocols, compatible with cloud service providers
- High throughput & low latency for rapid AI response
- Enhanced network stability ensures smooth AI conversations
Professional & Flexible Multimedia Framework
- Multi-format audio playback support
- High-quality audio output for immersive experience
- Versatile interfaces for diverse application scenarios
Typical Applications
Smart Home
- Local device control (lights, curtains, AC)
- Cloud responses for weather, recipes, news
Smart Toys
- Local media control (playback, volume)
- Cloud-based Q&A and storytelling
Conference Systems
- Local signal processing & noise reduction
- Cloud transcription & summary generation
Development Resources
- SDK Download
- AIVoice Development Guide
- Custom Command Guide
- Audio Hardware Design Requirements
- Cloud Platform Reference: Coze
- Contact Us
Recommended ICs
| Features | RTL8721Dx | RTL8720E | RTL8710E | RTL8726E | RTL8713E | RTL8730E | RTL8721F | RTL872xD | RTL8735B |
|---|---|---|---|---|---|---|---|---|---|
| Application Processor | Cortex-M | Cortex-M | Cortex-M | Cortex-M | Cortex-M | Cortex-A | Cortex-M | Cortex-M | Cortex-M |
| DSP | | | | | | | | | |
| ISP | | | | | | | | | |
| Arm TrustZone | | | | | | | | | |
| Dual Band | | | | | | | | | |
| Wi-Fi 6 | | | | | | | | | |
| R-MESH | | | | | | | | | |
| Ultra-low Power | | | | | | | | | |
| Ethernet | | | | | | | | | |
| BT Dual Mode | | | | | | | | | |
| HMI | | | | | | | | | |
| Audio ADC | | | | | | | | | |
| Audio DAC | | | | | | | | | |
| SDIO Host | | | | | | | | | |
| SD/EMMC Host | | | | | | | | | |
| USB | | | | | | | | | |
| BT Dedicated Antenna | | | | | | | | | |
| A2C | | | | | | | | | |

| Feature | RTL8721Dx | RTL8726E | RTL8713E | RTL8730E |
|---|---|---|---|---|
| AFE Single MIC (Speech Recognition Mode) | ||||
| AFE Single MIC (Voice Communication Mode) | ||||
| AFE Dual MIC (Speech Recognition Mode) | ||||
| AFE Three MIC (Speech Recognition Mode) | ||||
| AEC (Speech Recognition Mode) | ||||
| AEC (Voice Communication Mode) | ||||
| BF (Speech Recognition Mode) | ||||
| BF (Voice Communication Mode) | ||||
| NS (Speech Recognition Mode) | ||||
| NS (Voice Communication Mode) | ||||
| AGC (Speech Recognition Mode) | ||||
| AGC (Voice Communication Mode) | ||||
| SSL (Speech Recognition Mode) | ||||
| SSL (Voice Communication Mode) | ||||
| KWS Fixed Keyword | ||||
| KWS User-defined Keyword | ||||
| VAD | ||||
| ASR | ||||



