LLM-Powered Voice Interaction Solution
Hybrid Offline-Online AI Solution: Bridging Local Efficiency with Cloud Intelligence
Overview
Realtek provides a hybrid offline-online voice interaction solution built on large language models. It combines efficient on-chip voice processing with cloud-based cognitive capabilities to improve the human-machine interaction experience.
Signal Processing
AEC (Acoustic Echo Cancellation)
Dual-stage linear cancellation + residual suppression for effective echo removal
BF (Beamforming)
Multi-microphone spatial filtering for targeted speech enhancement
NS (Noise Suppression)
Supports two noise-reduction modes: signal processing and neural network
AGC (Automatic Gain Control)
Fixed + adaptive gain adjustment for stable output levels
SSL (Sound Source Localization)
360° directional tracking with microphone arrays
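As an illustration of the "fixed + adaptive gain" idea behind AGC, the following is a minimal sketch in plain Python. It is not Realtek's implementation: the function names (`apply_agc`, `frame_rms`), the parameters, and their default values are all assumptions chosen for readability. A fixed pre-gain is applied first, then a slowly adapting gain moves each frame toward a target RMS level.

```python
import math


def frame_rms(frame):
    """Root-mean-square level of one frame of float samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def apply_agc(frames, fixed_gain=2.0, target_rms=0.1, attack=0.5, max_gain=8.0):
    """Apply a fixed pre-gain, then an adaptive gain that tracks target_rms.

    frames: list of frames, each a list of float samples in [-1.0, 1.0].
    All parameter names and defaults are illustrative assumptions.
    """
    adaptive_gain = 1.0
    out = []
    for frame in frames:
        # Stage 1: fixed gain, e.g. to compensate for microphone sensitivity.
        boosted = [s * fixed_gain for s in frame]
        rms = frame_rms(boosted)
        if rms > 1e-6:
            # Stage 2: move the adaptive gain part of the way toward the
            # value that would bring this frame to the target level.
            desired = min(target_rms / rms, max_gain)
            adaptive_gain += attack * (desired - adaptive_gain)
        # Silent frames leave the adaptive gain unchanged.
        # Clip to the valid sample range after applying both gains.
        out.append([max(-1.0, min(1.0, s * adaptive_gain)) for s in boosted])
    return out
```

Feeding a steady quiet tone through `apply_agc` raises its level frame by frame until the output RMS settles near `target_rms`. A smaller `attack` value gives smoother but slower level changes.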
Automatic Speech Recognition
KWS (Keyword Spotting)
Supports fixed and user-defined keywords with fast, accurate on-device response
VAD (Voice Activity Detection)
Accurate speech/silence detection
ASR (Automatic Speech Recognition)
Offline command recognition for real-time control
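To make the speech/silence decision made by VAD concrete, here is a minimal energy-threshold sketch in Python. It is a generic textbook approach, not the detector shipped in the SDK, which may well be model-based. The function name `detect_speech`, the `threshold` value, and the `hangover` parameter are assumptions for illustration.

```python
def detect_speech(frames, threshold=0.02, hangover=2):
    """Return one bool per frame: True for speech, False for silence.

    frames: list of frames, each a list of float samples in [-1.0, 1.0].
    threshold: RMS level at or above which a frame counts as speech.
    hangover: number of quiet frames still reported as speech after the
              last loud frame, so short pauses do not split an utterance.
    """
    flags = []
    remaining = 0
    for frame in frames:
        rms = (sum(s * s for s in frame) / len(frame)) ** 0.5
        if rms >= threshold:
            remaining = hangover
            flags.append(True)
        elif remaining > 0:
            remaining -= 1
            flags.append(True)
        else:
            flags.append(False)
    return flags
```

With `hangover=2`, a single loud frame followed by silence is reported as three consecutive speech frames. That keeps the tail of a word from being cut off before it reaches keyword spotting or command recognition.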
Key Advantages
High-Speed Stable Wi-Fi for Chip-Level Voice Interaction
- Supports multiple network protocols, compatible with cloud service providers
- High throughput & low latency for rapid AI response
- Enhanced network stability ensures smooth AI conversations
Professional & Flexible Multimedia Framework
- Multi-format audio playback support
- High-quality audio output for immersive experience
- Versatile interfaces for diverse application scenarios
Typical Applications
Smart Home
- Local device control (lights, curtains, AC)
- Cloud responses for weather, recipes, news
Smart Toys
- Local media control (playback, volume)
- Cloud-based Q&A and storytelling
Conference Systems
- Local signal processing & noise reduction
- Cloud transcription & summary generation
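The local/cloud split in these applications can be sketched as a simple router: utterances that match a known offline command are handled on the device, and everything else is forwarded to a cloud model. The Python sketch below only illustrates the idea. The command table, the function `route_utterance`, and the `cloud_client` callable are hypothetical and do not correspond to any SDK or cloud platform API.

```python
# Hypothetical offline command table: recognized text -> local action name.
LOCAL_COMMANDS = {
    "turn on the light": "light_on",
    "turn off the light": "light_off",
    "volume up": "volume_up",
}


def route_utterance(text, cloud_client):
    """Handle text locally if it matches a command, else forward to the cloud.

    cloud_client is any callable that takes the utterance text and returns
    a response string; it stands in for a real cloud LLM request.
    """
    # Normalize case and whitespace before the lookup.
    key = " ".join(text.lower().split())
    action = LOCAL_COMMANDS.get(key)
    if action is not None:
        return ("local", action)
    return ("cloud", cloud_client(text))


def fake_cloud(text):
    """Stand-in for a network call, used only for this example."""
    return "cloud answer for: " + text
```

Calling `route_utterance("Turn on the light", fake_cloud)` resolves on the device without a network round trip, while an open question such as a weather query is passed to `cloud_client`.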
Development Resources
- SDK Download
- AIVoice Development Guide
- Audio Hardware Design Requirements
- Cloud Platform Reference: Coze
- Contact Us
Recommended ICs
| Features | RTL872xD | RTL8721Dx | RTL8721F | RTL8720E | RTL8710E | RTL8726E | RTL8713E | RTL8730E | RTL8735B |
|---|---|---|---|---|---|---|---|---|---|
| Application Processor | Cortex-M | Cortex-M | Cortex-M | Cortex-M | Cortex-M | Cortex-M | Cortex-M | Cortex-A | Cortex-M |
| DSP | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
| ISP | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| TrustZone | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 |
| Dual Band | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 |
| Wi-Fi6 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 |
| R-MESH | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| Ultra-low power | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| Ethernet | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| BT Dual Mode | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 |
| HMI | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
| Audio ADC | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
| Audio DAC | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
| USB | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 |
| BT Dedicated Antenna | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |

| Feature | RTL8721Dx | RTL8726E | RTL8713E | RTL8730E |
|---|---|---|---|---|
| AFE Single MIC (Speech Recognition Mode) | ||||
| AFE Single MIC (Voice Communication Mode) | / | |||
| AFE Dual MIC (Speech Recognition Mode) | / | |||
| AFE Three MIC (Speech Recognition Mode) | / | |||
| AEC (Speech Recognition Mode) | ||||
| AEC (Voice Communication Mode) | / | |||
| BF (Speech Recognition Mode) | / | |||
| BF (Voice Communication Mode) | / | / | / | / |
| NS (Speech Recognition Mode) | ||||
| NS (Voice Communication Mode) | / | |||
| AGC (Speech Recognition Mode) | ||||
| AGC (Voice Communication Mode) | / | |||
| SSL (Speech Recognition Mode) | / | |||
| SSL (Voice Communication Mode) | / | / | / | / |
| KWS Fixed Keyword | ||||
| KWS User-defined Keyword | / | |||
| VAD | ||||
| ASR | / | | | |



