LLM-Powered Voice Interaction Solution

Overview

Realtek provides a hybrid offline-online voice interaction solution built on large language models (LLMs). It combines efficient on-chip voice processing with cloud-based cognitive capabilities to improve the human-machine interaction experience.

Key Advantages

Highlights

High-Speed Stable Wi-Fi for Chip-Level Voice Interaction

Supports multiple network protocols and works with cloud service providers
High throughput & low latency for rapid AI response
Enhanced network stability ensures smooth AI conversations

Professional & Flexible Multimedia Framework

Multi-format audio playback support
High-quality audio output for an immersive experience
Versatile interfaces for diverse application scenarios

Comprehensive Local AI Algorithms

AFE (Acoustic Front-End): Acoustic echo cancellation (AEC), beamforming (BF), noise suppression (NS), automatic gain control (AGC), sound source localization (SSL)
KWS (Keyword Spotting): Fixed & user-defined wake words with fast local response
VAD (Voice Activity Detection): Accurate speech/silence detection
ASR (Automatic Speech Recognition): Offline command recognition for real-time control
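
The way these local stages typically hand off to each other can be sketched as a small state machine: KWS arms the device, VAD decides where the utterance ends, and the captured audio is then passed to ASR. This is only an illustrative sketch, not Realtek's implementation. The `Frame` type, the `segment_utterances` function, and the `silence_frames_to_stop` parameter are hypothetical names, and the per-frame `wake` and `speech` flags stand in for the outputs of real KWS and VAD engines.

```python
from enum import Enum, auto
from typing import Iterable, List, NamedTuple


class Frame(NamedTuple):
    """One audio frame with hypothetical detector outputs attached."""
    wake: bool = False    # assumed KWS result: wake word detected in this frame
    speech: bool = False  # assumed VAD result: frame contains speech
    audio: str = ""       # placeholder for the frame's audio payload


class State(Enum):
    IDLE = auto()       # waiting for the wake word
    LISTENING = auto()  # wake word heard, capturing the utterance


def segment_utterances(frames: Iterable[Frame],
                       silence_frames_to_stop: int = 3) -> List[List[str]]:
    """Return the utterances (lists of audio payloads) that would go to ASR."""
    state = State.IDLE
    utterances: List[List[str]] = []
    current: List[str] = []
    silence = 0
    for frame in frames:
        if state is State.IDLE:
            if frame.wake:
                # KWS fired: start a new capture window.
                state, current, silence = State.LISTENING, [], 0
        elif frame.speech:
            current.append(frame.audio)
            silence = 0
        else:
            silence += 1
            if silence >= silence_frames_to_stop:
                # VAD reports enough trailing silence: close the utterance.
                if current:
                    utterances.append(current)
                state, current = State.IDLE, []
    if state is State.LISTENING and current:
        utterances.append(current)  # stream ended mid-utterance
    return utterances
```

Frames that arrive while the device is idle are dropped, which is why speech without a preceding wake word never reaches ASR in this sketch.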

Application Scenarios

Realtek’s AI voice interaction solution is widely used in the following scenarios:

Scenario	Solution
Smart Home	Local: device control (lights, curtains, AC); Cloud: responses for weather, recipes, news
Smart Toys	Local: media control (playback, volume); Cloud: Q&A and storytelling
Conference Systems	Local: signal processing and noise reduction; Cloud: transcription and summary generation
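
The local/cloud split in these scenarios can be illustrated with a minimal routing sketch: utterances that match a fixed offline command are handled on the device, and everything else is forwarded to the cloud when a network connection is available. The command phrases, action names, and the `route` function are all hypothetical and chosen only for illustration. They are not taken from any Realtek SDK.

```python
from typing import Tuple

# Hypothetical offline command table (phrase -> local action name).
LOCAL_COMMANDS = {
    "turn on the light": "light_on",
    "turn off the light": "light_off",
    "open the curtains": "curtain_open",
    "volume up": "volume_up",
}


def route(utterance: str, network_up: bool = True) -> Tuple[str, str]:
    """Decide where a recognized utterance should be handled."""
    # Normalize case and whitespace before matching the command table.
    key = " ".join(utterance.lower().split())
    action = LOCAL_COMMANDS.get(key)
    if action is not None:
        return ("local", action)    # handled on the device, no network needed
    if network_up:
        return ("cloud", key)       # open-ended request goes to the cloud LLM
    return ("unavailable", key)     # no local match and no connectivity
```

For example, `route("Turn on the light")` resolves locally, while `route("What's the weather today?")` is forwarded to the cloud. Local commands keep working when the network is down, which is the main benefit of the hybrid design.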

Software Development Resources

Hardware Development Resources

Audio Hardware Design Requirements

Recommended ICs

In the table below, Y indicates that the feature is supported and - indicates that it is not.

Feature	RTL8721Dx	RTL8726E	RTL8713E	RTL8730E
AFE Single MIC (Speech Recognition Mode)	Y	Y	Y	Y
AFE Single MIC (Voice Communication Mode)	-	Y	Y	Y
AFE Dual MIC (Speech Recognition Mode)	-	Y	Y	Y
AFE Three MIC (Speech Recognition Mode)	-	Y	Y	Y
AEC (Speech Recognition Mode)	Y	Y	Y	Y
AEC (Voice Communication Mode)	-	Y	Y	Y
BF (Speech Recognition Mode)	-	Y	Y	Y
BF (Voice Communication Mode)	-	-	-	-
NS (Speech Recognition Mode)	Y	Y	Y	Y
NS (Voice Communication Mode)	-	Y	Y	Y
AGC (Speech Recognition Mode)	Y	Y	Y	Y
AGC (Voice Communication Mode)	-	Y	Y	Y
SSL (Speech Recognition Mode)	-	Y	Y	Y
SSL (Voice Communication Mode)	-	-	-	-
KWS Fixed Keyword	Y	Y	Y	Y
KWS User-defined Keyword	-	Y	Y	Y
VAD	Y	Y	Y	Y
ASR	-	Y	Y	Y
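
When selecting an IC, the table above can be treated as a lookup structure. The sketch below encodes it as data and returns the chips that support every feature in a requested set. It assumes that Y means supported and - means not supported. The `chips_supporting` helper is a hypothetical name used only for illustration.

```python
from typing import Iterable, List

# Column order matches the table: one character per chip.
CHIPS = ("RTL8721Dx", "RTL8726E", "RTL8713E", "RTL8730E")

# Rows copied from the table; "Y" = supported, "-" = not supported (assumed).
SUPPORT = {
    "AFE Single MIC (Speech Recognition Mode)": "YYYY",
    "AFE Single MIC (Voice Communication Mode)": "-YYY",
    "AFE Dual MIC (Speech Recognition Mode)": "-YYY",
    "AFE Three MIC (Speech Recognition Mode)": "-YYY",
    "AEC (Speech Recognition Mode)": "YYYY",
    "AEC (Voice Communication Mode)": "-YYY",
    "BF (Speech Recognition Mode)": "-YYY",
    "BF (Voice Communication Mode)": "----",
    "NS (Speech Recognition Mode)": "YYYY",
    "NS (Voice Communication Mode)": "-YYY",
    "AGC (Speech Recognition Mode)": "YYYY",
    "AGC (Voice Communication Mode)": "-YYY",
    "SSL (Speech Recognition Mode)": "-YYY",
    "SSL (Voice Communication Mode)": "----",
    "KWS Fixed Keyword": "YYYY",
    "KWS User-defined Keyword": "-YYY",
    "VAD": "YYYY",
    "ASR": "-YYY",
}


def chips_supporting(features: Iterable[str]) -> List[str]:
    """Return the chips that support all of the requested features."""
    wanted = list(features)
    return [
        chip
        for i, chip in enumerate(CHIPS)
        if all(SUPPORT[name][i] == "Y" for name in wanted)
    ]
```

For instance, `chips_supporting(["KWS Fixed Keyword", "VAD"])` returns all four chips, whereas adding `"ASR"` to the list drops the RTL8721Dx.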