Media Example
Supported ICs: [RTL8735C]
Overview
| Category | Example | Entry point / source | Description |
|---|---|---|---|
| Video Example | | | H.264 V1 live streaming over RTSP. |
| Video Example | | | Dual-channel video-only MP4 recording and RTSP streaming. |
| Audio Example | | | Microphone capture looped back to speaker playback. |
| Audio Example | | | AAC encode/decode audio loopback. |
| Audio Example | | | Full-duplex two-way audio over RTSP/RTP. |
| Audio Example | | | Audio VQE test using AEC, AGC, and noise suppression. |
| AI Example | | | SCRFD face detection with UVC or RTSP output and OSD overlay. |
| AI Example | | | YOLO/NanoDet object detection on V5 RGB input. |
| AI Example | | | Cascaded face detection, embedding, recognition, and OSD overlay. |
| Integrated Example | | | Audio, video recording, RTSP streaming, and NN detection together. |
| Integrated Example | | | Full AP-side multimedia joint test with recording, streaming, NN, OSD, and file saving. |
Video Example
The following examples demonstrate how to build video pipelines using the Multimedia Framework (MMF).
Build the Video Example
Before building the video example, you must enable the required option in menuconfig:
./ameba.py menuconfig
Navigate to:
(Top) > CONFIG VIDEO SOFTWARE > BUILD AP MEDIA EXAMPLE
Enable this option. It will automatically enable all modules required by the video examples.
Then run the following command from the SDK root to compile the video AP example:
./ameba.py build --app video_ap
After a successful build, flash the image to the board and open a serial console to run the examples.
Example Source Files
Source files are located at:
component/soc/<soc>/sw/media_example/ap_media_example/video_example_src/
Each example is registered in an application table using VIDEO_APP_TABLE_SECTION and invoked via the serial console:
VIDEO list # list all registered video examples
VIDEO run [n] # run the example at index n (default: 0)
Some examples export more than one application-table entry. For example, the joint test example provides separate UVC and RTSP entries, so choose the desired output path directly from VIDEO list and run that index.
Note
Only one example can run at a time. Reset the board before switching to a different example.
Note
Examples that use network features (e.g. RTSP streaming) require an active Wi-Fi connection before running. Connect to a router first using the AT command:
AT+WLCONN=ssid,<your_ssid>,pw,<your_password>
Wait for the [$]wifi got ip message confirming the IP address has been assigned, then run the video example. Refer to Wi-Fi AT Commands for the full Wi-Fi command reference.
mmf2_video_example_v1_rtsp_init
Source file: ap_media_example/video_example_src/mmf2_video_example_v1_rtsp_init.c
The simplest live streaming example. V1 captures H.264 video and streams it over RTSP. This is the recommended starting point for anyone new to the MMF - it uses only one video_module, one rtsp2_module, and one SISO linker.
Data flow:
[Video V1: H.264 1920x1080 @ 30fps] --> SISO --> [RTSP server]
Connect to rtsp://<device-ip>/stream with VLC or any RTSP-capable player once the board is running.
Compile-time options:
#define AINR_ENA 0 /* AI noise reduction on V1; set 1 to enable */
Key notes:
Requires CONFIG_LWIP_LAYER=1 (a compile-time error will be raised otherwise).
MMQI_FLAG_DYNAMIC is required for the video module queue because H.264 frame sizes vary per frame.
Queue depth 60 absorbs burst encoder output and prevents frame drops.
To force a keyframe when a new client connects, call mm_module_ctrl(video_v1_ctx, CMD_VIDEO_FORCE_IFRAME, 0) from the RTSP connected callback.
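The queue-depth note can be turned into a rough time budget. The sketch below assumes one queue item per encoded frame, which the source does not state explicitly; `queue_buffer_ms` is a hypothetical helper, not an SDK function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: milliseconds of video that a queue holding `depth`
 * frames can buffer at `fps`, assuming one queue item per encoded frame. */
static uint32_t queue_buffer_ms(uint32_t depth, uint32_t fps)
{
    assert(fps > 0);
    return (depth * 1000u) / fps;
}
```

Under that assumption, `queue_buffer_ms(60, 30)` gives 2000, so a depth of 60 at 30 fps corresponds to about two seconds of encoder output.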
mmf2_video_example_v1v2_mp4_rtsp_init
Source file: ap_media_example/video_example_src/mmf2_video_example_v1v2_mp4_rtsp_init.c
A dual-channel video-only example. V1 captures HEVC at full resolution for MP4 recording on the SD card; V2 captures H.264 at a lower resolution for simultaneous RTSP live streaming. Both channels share a single MIMO linker - no audio.
Data flow:
[Video V1: HEVC 2688x1520 @ 15fps] --+
                                     +--> MIMO --> [mp4_module]   (V1 -> SD card)
[Video V2: H.264 1280x720 @ 30fps] --+
                                     +--> MIMO --> [rtsp2_module] (V2 -> RTSP)
Compile-time options:
#define AINR_ENA 0 /* AI noise reduction; set 1 to enable */
Key notes:
Requires CONFIG_LWIP_LAYER=1.
V1 and V2 have independent frame rates and bitrates - V1 is high-quality for archiving, V2 is optimised for streaming bandwidth.
STORAGE_VIDEO (video-only) is set in mp4_params.record_type; change to STORAGE_ALL and add an audio chain if audio recording is needed.
The MIMO dependency mask keeps the two outputs independent: MMIC_DEP_INPUT0 for MP4 and MMIC_DEP_INPUT1 for RTSP.
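One way to picture the dependency mask is as one bit per MIMO input, with each output receiving only frames from the inputs named in its mask. The sketch below illustrates that idea only; `DEP_INPUT0`, `DEP_INPUT1`, and `output_accepts_input` are hypothetical stand-ins, and the real MMIC_DEP_INPUTn values and linker logic live in the SDK.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the SDK's MMIC_DEP_INPUTn masks: one bit per input. */
#define DEP_INPUT0 (1u << 0)  /* V1 (HEVC), feeds MP4 in this example */
#define DEP_INPUT1 (1u << 1)  /* V2 (H.264), feeds RTSP in this example */

/* Returns 1 when an output with the given dependency mask should receive
 * frames arriving on input_index, otherwise 0. */
static int output_accepts_input(uint32_t dep_mask, unsigned input_index)
{
    assert(input_index < 32);
    return (dep_mask & (1u << input_index)) != 0;
}
```

With these masks, the MP4 output (mask `DEP_INPUT0`) ignores V2 frames and the RTSP output (mask `DEP_INPUT1`) ignores V1 frames, which is the independence described above.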
Audio Example
The following examples demonstrate how to build audio pipelines using the Multimedia Framework (MMF). Example source files are located at:
component/soc/<soc>/sw/media_example/ap_media_example/audio_example_src/
Each example is registered in an application table using AUDIO_APP_TABLE_SECTION and invoked via the serial console:
AUDIO list # list all registered audio examples
AUDIO run [n] # run the example at index n (default: 0)
Note
Only one example can run at a time. Reset the board before switching to a different example.
mmf2_example_audioloop_init
Source file: ap_media_example/audio_example_src/mmf2_example_audioloop_init.c
A simple audio loopback example. Audio is captured from the built-in microphone (AMIC) and immediately played back through the speaker using the audio module.
Data flow:
[Audio Module] --> SISO --> [Audio Module]
  (capture)                   (playback)
Key parameters:
audio_params.sample_rate = ASR_48KHZ;
mmf2_example_aacloop_init
Source file: ap_media_example/audio_example_src/mmf2_example_aacloop_init.c
An audio encode/decode loopback example. Audio captured from the microphone is encoded to AAC, then immediately decoded and played back. This example demonstrates the AAC encoder (module_aac) and AAC decoder (module_aad) pipeline.
Data flow:
[Audio] --> SISO --> [AAC Encoder] --> SISO --> [AAC Decoder] --> SISO --> [Audio]
(capture)                                                                (playback)
Key parameters:
aac_params.sample_rate = 16000;
aac_params.channel = 1; // mono
aac_params.trans_type = AAC_TYPE_RAW;
aac_params.object_type = AAC_AOT_LC;
aac_params.bitrate = 32000; // 32 kbps
aad_params.sample_rate = 16000;
aad_params.channel = 1;
aad_params.trans_type = AAD_TYPE_RAW;
aad_params.object_type = AAD_AOT_LC;
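As a rough cross-check of these encoder settings, the frame timing can be worked out by hand. The sketch below assumes the usual AAC-LC frame length of 1024 samples, which is not stated in the source; both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define AAC_LC_FRAME_SAMPLES 1024u  /* assumption: standard AAC-LC frame length */

/* Duration of one AAC frame in milliseconds at the given sample rate. */
static uint32_t aac_frame_ms(uint32_t sample_rate_hz)
{
    assert(sample_rate_hz > 0);
    return (AAC_LC_FRAME_SAMPLES * 1000u) / sample_rate_hz;
}

/* Average encoded bytes per frame at the given bitrate. */
static uint32_t aac_avg_frame_bytes(uint32_t bitrate_bps, uint32_t sample_rate_hz)
{
    assert(sample_rate_hz > 0);
    return (uint32_t)(((uint64_t)bitrate_bps * AAC_LC_FRAME_SAMPLES) /
                      ((uint64_t)sample_rate_hz * 8u));
}
```

For the values listed above (16000 Hz, 32000 bps) this gives 64 ms per frame and an average of 256 bytes per encoded frame.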
mmf2_example_2way_audio_init
Source file: ap_media_example/audio_example_src/mmf2_example_2way_audio_init.c
A two-way audio streaming example using RTSP server and RTP client. Audio captured from the microphone is encoded to AAC and streamed via RTSP, while simultaneously receiving RTP audio streams, decoding them, and playing back through the speaker. This demonstrates full-duplex audio communication.
Data flow:
[Audio] --> SISO --> [aac_module] --> SISO --> [rtsp2_module] -----> (Network)
(capture)            (encode)                  (server)              (stream)
[Audio] <-- SISO <-- [aad_module] <-- SISO <-- [rtp_module] <---- (Network)
(playback)           (decode)                  (receive)          (send)
Key parameters:
// Audio capture/playback
audio_params.sample_rate = ASR_16KHZ;
// AAC Encoder
aac_params.sample_rate = 16000;
aac_params.channel = 1; // mono
aac_params.trans_type = AAC_TYPE_ADTS;
aac_params.object_type = AAC_AOT_LC;
aac_params.bitrate = 32000; // 32 kbps
// RTSP Server
rtsp2_params.codec_id = AV_CODEC_ID_MP4A_LATM;
rtsp2_params.channel = 1;
rtsp2_params.samplerate = 16000;
// RTP Receiver
rtp_params.port = 16384;
// AAC Decoder
aad_params.sample_rate = 16000;
aad_params.channel = 1;
aad_params.trans_type = AAD_TYPE_RTP_RAW;
aad_params.object_type = AAD_AOT_LC;
Usage:
Connect to the RTSP stream via VLC or another RTSP client: rtsp://<device_ip>:554/
The device streams audio at 16 kHz mono AAC by default.
It simultaneously receives RTP audio on port 16384 and plays it back.
Stream Audio from Device to VLC Player
Click Media -> Open Network Stream
Enter rtsp://<device_ip>:554/ where <device_ip> is the Ameba IP address; the default RTSP server port is 554.
Click Play
Stream Audio from VLC Player to Device
Click Media -> Stream
Select File, click Add, choose an audio file, then click Stream
Note
Select an audio file whose format matches the decoder settings (mono, 16 kHz sampling rate).
Check the selected file and click Next
Select RTP Audio/Video Profile, click Add
Enter the device IP address in Address field, set Base port to 16384, click Next
Ensure Activate Transcoding is unchecked, click Next -> Stream
The sound can be heard on the board’s 3.5 mm audio jack.
audio_vqe_test
Source file: ap_media_example/audio_example_src/audio_vqe_test.c
An audio Voice Quality Enhancement (VQE) test example using the ASP (Audio Signal Processing) library. This example demonstrates the audio processing pipeline including Acoustic Echo Cancellation (AEC), Automatic Gain Control (AGC), and Noise Suppression (NS) for both send (speaker) and receive (microphone) paths.
Data flow:
[Speaker] --> [VQE SND: AEC + AGC + NS] --> [Output]
                  ^
                  | (farend reference)
[Microphone] -----+
[Input] --> [VQE RCV: NS + AGC] --> [Microphone TX]
Key components:
VQE_SND (Send path): Processes microphone input with AEC, AGC, and NS to remove echo and noise
VQE_RCV (Receive path): Processes output with NS and AGC for transmit path
Key parameters:
// Frame configuration
#define AUDIO_DMA_PAGE_SIZE 640
#define FRAME_SIZE (AUDIO_DMA_PAGE_SIZE / 2) // 320 samples
// Sample rate: 16000 Hz
// RX Path - Noise Suppression
RX_NS.NS_EN = 1;
RX_NS.NSLevel = 10; // Suppression level when no speech
RX_NS.HPFEnable = 0; // High-pass filter disabled
// RX Path - Automatic Level Control (ALC)
RX_AGC.AGC_EN = 1;
RX_AGC.AGCMode = CT_ALC; // ALC mode (vs. CT_LIMITER)
RX_AGC.ReferenceLvl = 0; // Target level 0 dBFS
RX_AGC.RatioFormat = 1; // 8.8 fix point ratio format
RX_AGC.AttackTime = 10; // 10 ms
RX_AGC.ReleaseTime = 50; // 50 ms
RX_AGC.Ratio[0..2] = 50 * 256; // Compression ratio
RX_AGC.Threshold[0] = 39; // Threshold1 in dB
RX_AGC.Threshold[1] = 70; // Threshold2 in dB
RX_AGC.Threshold[2] = 80; // Noise gate level in dB
RX_AGC.NoiseFloorAdaptEnable = 1;
RX_AGC.RMSDetectorEnable = 1;
RX_AGC.MaxGainLimit = 30; // 30 dB max gain
// RX Path - Acoustic Echo Cancellation
RX_AEC.AEC_EN = 1;
RX_AEC.EchoTailLen = 60; // Echo tail length in ms
RX_AEC.CNGEnable = 1; // Comfort noise generation enabled
RX_AEC.PPLevel = 4; // Post-processing level (1-18)
RX_AEC.DTControl = 1; // Double-talk control type
// TX Path - Noise Suppression
TX_NS.NS_EN = 1;
TX_NS.NSLevel = 10;
// TX Path - Automatic Level Control
TX_AGC.AGC_EN = 1;
TX_AGC.AGCMode = CT_ALC;
TX_AGC.ReferenceLvl = 0;
TX_AGC.RatioFormat = 1;
TX_AGC.AttackTime = 10;
TX_AGC.ReleaseTime = 50;
TX_AGC.Ratio[0..2] = 50 * 256;
TX_AGC.Threshold[0] = 39;
TX_AGC.Threshold[1] = 70;
TX_AGC.Threshold[2] = 80;
TX_AGC.NoiseFloorAdaptEnable = 1;
TX_AGC.RMSDetectorEnable = 1;
TX_AGC.MaxGainLimit = 30;
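The `50 * 256` ratio values follow from RatioFormat = 1, the 8.8 fixed-point format: the upper 8 bits carry the integer part and the lower 8 bits the fraction in 1/256 steps. A minimal sketch of that encoding, with hypothetical helper names that are not part of the ASP library:

```c
#include <assert.h>
#include <stdint.h>

/* Encode a whole-number ratio in 8.8 fixed point (value * 256). */
static int16_t ratio_to_q8_8(int ratio)
{
    assert(ratio >= 0 && ratio <= 127);  /* keep the result inside int16_t */
    return (int16_t)(ratio * 256);
}

/* Split an 8.8 value into its integer part and its fraction in 1/256 steps. */
static void q8_8_split(int16_t q, int *integer_part, int *frac_256)
{
    assert(q >= 0);
    *integer_part = q >> 8;
    *frac_256 = q & 0xFF;
}
```

Here `ratio_to_q8_8(50)` yields 12800, the same value as `50 * 256` in the listing, and a raw value of 0x0180 would decode to 1 plus 128/256, that is 1.5.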
ASP API Functions:
| Function | Description |
|---|---|
| | Initialize send path VQE (AEC + AGC + NS) |
| | Process one frame through send path |
| | Destroy send path VQE context |
| | Initialize receive path noise suppression |
| | Process one frame through receive NS |
| | Destroy receive path NS context |
| | Initialize receive path AGC |
| | Process one frame through receive AGC |
| | Destroy receive path AGC context |
ASP Configuration Structures:
typedef struct CTNS_cfg_s {
    int16_t NS_EN;                  // Enable noise suppression
    int16_t NSLevel;                // Suppression level (0-10)
    int16_t HPFEnable;              // High-pass filter enable
    int16_t NSSlowConvergence;      // Slow convergence time (ms)
    int16_t QuickConvergenceEnable; // Quick convergence enable
} CTNS_cfg_t;

typedef struct CTAGC_cfg_s {
    int16_t AGC_EN;                 // Enable AGC
    CT_AGC_MODE AGCMode;            // CT_ALC or CT_LIMITER
    int16_t ReferenceLvl;           // Reference level in dB
    int16_t RatioFormat;            // Ratio format (0: integer, 1: 8.8 fixed point)
    int16_t AttackTime;             // Attack time in ms
    int16_t ReleaseTime;            // Release time in ms
    int16_t Ratio[3];               // Compression ratios
    int16_t Threshold[3];           // Thresholds (Threshold1, Threshold2, NoiseGateLvl)
    int16_t KneeWidth;              // Knee width
    int16_t NoiseFloorAdaptEnable;  // Noise floor adaptation enable
    int16_t RMSDetectorEnable;      // RMS detector enable
    int16_t MaxGainLimit;           // Maximum gain limit in dB
} CTAGC_cfg_t;

typedef struct CTAEC_cfg_s {
    int16_t AEC_EN;                 // Enable AEC
    int16_t EchoTailLen;            // Echo tail length in ms
    int16_t CNGEnable;              // Comfort noise generation enable
    int16_t PPLevel;                // Post-processing level (1-18)
    int16_t DTControl;              // Double-talk control type
    int16_t ConvergenceTime;        // Convergence time
} CTAEC_cfg_t;
Usage:
Run via serial console: audio run <index> (find the index with audio list)
The example reads pink noise from the SD card (pink_noise.bin) as farend input.
Processed output is saved to asp_rx.bin.
Test duration is 30 seconds (configurable via AUDIO_TEST_DURATION).
Notes:
Requires an SD card with the pink_noise.bin test file for farend input.
Uses a 16 kHz sample rate with 320-sample frames (20 ms).
Microphone input is processed through the AEC, AGC, and NS pipeline.
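The frame figures quoted in the notes follow from the DMA page size. The sketch below redoes that arithmetic, assuming 16-bit mono PCM (2 bytes per sample), which matches the `AUDIO_DMA_PAGE_SIZE / 2` definition; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Samples carried by one DMA page of `page_bytes` bytes. */
static uint32_t page_to_samples(uint32_t page_bytes, uint32_t bytes_per_sample)
{
    assert(bytes_per_sample > 0);
    return page_bytes / bytes_per_sample;
}

/* Duration of `samples` samples in milliseconds. */
static uint32_t samples_to_ms(uint32_t samples, uint32_t sample_rate_hz)
{
    assert(sample_rate_hz > 0);
    return (samples * 1000u) / sample_rate_hz;
}

/* Number of frames processed during a test of `duration_s` seconds. */
static uint32_t frames_in_test(uint32_t duration_s, uint32_t frame_ms)
{
    assert(frame_ms > 0);
    return (duration_s * 1000u) / frame_ms;
}
```

With a 640-byte page this gives 320 samples, 20 ms per frame at 16000 Hz, and 1500 frames over the default 30-second test.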
AI Example
Build the NN Video Examples
The NN video examples are built as part of the AP media example application. Before building, enable the required option in menuconfig:
./ameba.py menuconfig
Navigate to:
(Top) > CONFIG VIDEO SOFTWARE > BUILD AP MEDIA EXAMPLE
Enable this option. It will automatically enable the multimedia framework modules required by the NN video examples, including video_module, vipnn_module, uvcd_module / rtsp2_module when available, array_module, and facerecog_module.
Then run the following command from the SDK root to compile the video AP application:
./ameba.py build --app video_ap
Before running the examples, make sure the model files have been deployed to LittleFS as described in AI NPU User Guide. The SDK NN video examples use vfs:/ model paths by default.
After a successful build, flash the firmware image and the LittleFS model image to the board, then open a serial console. The examples are registered in the video application table and can be listed and run with:
VIDEO list # list all registered video examples
VIDEO run [n] # run the example at index n (default: 0)
Note
Only one example can run at a time. Reset the board before switching to a different example.
Note
NN video examples that use RTSP require an active Wi-Fi connection before running. Connect to a router first using the AT command:
AT+WLCONN=ssid,<your_ssid>,pw,<your_password>
Wait for the [$]wifi got ip message confirming the IP address has been assigned, then run the RTSP example. UVC examples require CONFIG_USBD_UVC=1.
Face Detection Example
Source: ap_media_example/video_example_src/mmf2_video_example_nn_face_detection_init.c
Streams H.264 video (V1) to UVC or RTSP while simultaneously running SCRFD face detection on an RGB stream (V5). V1 is the display stream and OSD coordinate space; V5 is the NN input stream.
Data flow:
[Video V1: H.264 1920x1080@30fps] --> SISO --> [UVC / RTSP]
                                                    |
                                              (OSD overlay)
[Video V5: RGB 576x320@10fps    ] --> SISO --> [VIPNN: SCRFD face detection]
                                                      |
                                               nn_display_cb()
                                          (draws boxes & landmarks)
Compile-time options:
#define FACEDET_STREAM_WIDTH 1920
#define FACEDET_STREAM_HEIGHT 1080
#define FACEDET_STREAM_FPS 30
#define FACEDET_STREAM_GOP 30
#define FACEDET_STREAM_BPS 2000000
#define FACEDET_NN_WIDTH 576
#define FACEDET_NN_HEIGHT 320
#define FACEDET_NN_FPS 10
#define FACEDET_USE_ARRAY_INPUT 0 /* 0: V5 RGB input, 1: array_module input */
Note
Two entry points are exported: mmf2_video_example_nn_face_detection_uvc_init (default UVC output) and mmf2_video_example_nn_face_detection_rtsp_init (RTSP output). UVC requires CONFIG_USBD_UVC=1; RTSP requires CONFIG_LWIP_LAYER=1.
NN model:
#include "model_scrfd.h"
#define FACEDET_MODEL_OBJ scrfd
#define FACEDET_MODEL_NAME "vfs:/scrfd_500m_bnkps_shape576x320.nb"
#define FACEDET_NN_WIDTH 576
#define FACEDET_NN_HEIGHT 320
The NN result callback nn_display_cb() maps SCRFD bounding boxes and 5 facial landmark points back to the V1 stream using NN_OSD_SCALE_LETTERBOX and draws them with nn_osd_group. The callback clears and flushes the group each frame, then calls nn_osd_group_kick() so stale overlays can be managed by the same helper API used by other NN examples.
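To illustrate what a letterbox mapping involves, the sketch below converts a point from NN-input coordinates back to stream coordinates. It assumes the stream image was scaled uniformly and centred inside the NN frame with padding; this is a generic illustration, not the implementation behind NN_OSD_SCALE_LETTERBOX, and every name in it is hypothetical.

```c
#include <assert.h>

typedef struct {
    float scale;  /* stream pixels -> NN pixels (uniform) */
    float pad_x;  /* horizontal padding inside the NN frame, in NN pixels */
    float pad_y;  /* vertical padding inside the NN frame, in NN pixels */
} letterbox_t;

static letterbox_t letterbox_init(int stream_w, int stream_h, int nn_w, int nn_h)
{
    assert(stream_w > 0 && stream_h > 0 && nn_w > 0 && nn_h > 0);
    letterbox_t lb;
    float sx = (float)nn_w / (float)stream_w;
    float sy = (float)nn_h / (float)stream_h;
    lb.scale = (sx < sy) ? sx : sy;  /* fit the whole image, keep aspect ratio */
    lb.pad_x = ((float)nn_w - (float)stream_w * lb.scale) * 0.5f;
    lb.pad_y = ((float)nn_h - (float)stream_h * lb.scale) * 0.5f;
    return lb;
}

static int clamp_int(int v, int lo, int hi)
{
    return (v < lo) ? lo : (v > hi) ? hi : v;
}

/* Map a point in NN-input pixels to stream pixels, clamped to the frame. */
static void letterbox_to_stream(const letterbox_t *lb, float nn_x, float nn_y,
                                int stream_w, int stream_h, int *out_x, int *out_y)
{
    *out_x = clamp_int((int)((nn_x - lb->pad_x) / lb->scale + 0.5f), 0, stream_w - 1);
    *out_y = clamp_int((int)((nn_y - lb->pad_y) / lb->scale + 0.5f), 0, stream_h - 1);
}
```

Removing the padding before dividing by the scale is what distinguishes this from a plain stretch: points that fall inside the padded border are clamped to the edge of the stream frame.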
Object Detection Example
Source: ap_media_example/video_example_src/mmf2_video_example_nn_object_detection_init.c
Runs YOLO/NanoDet object detection on a V5 RGB stream while sending V1 H.264 to UVC or RTSP with OSD overlay. The NN input source is selectable between V5 RGB (OBJDET_USE_ARRAY_INPUT=0) and a synthetic array_module buffer (OBJDET_USE_ARRAY_INPUT=1).
Data flow (V5 input, OBJDET_USE_ARRAY_INPUT=0):
[Video V1: H.264 1920x1080@30fps] --> SISO --> [UVC / RTSP]
                                                    |
                                              (OSD overlay)
[Video V5: RGB 416x416@10fps] --> SISO --> [VIPNN: YOLO object detection]
                                                  |
                                           nn_display_cb()
Data flow (Array input, OBJDET_USE_ARRAY_INPUT=1):
[Array: RGB 416x416@10fps] --> SISO --> [VIPNN: YOLO object detection]
                                               |
                                        nn_display_cb()
Compile-time options:
#define OBJDET_STREAM_WIDTH 1920
#define OBJDET_STREAM_HEIGHT 1080
#define OBJDET_STREAM_FPS 30
#define OBJDET_STREAM_GOP 30
#define OBJDET_STREAM_BPS 2000000
#define OBJDET_NN_WIDTH 416
#define OBJDET_NN_HEIGHT 416
#define OBJDET_NN_FPS 10
#define OBJDET_USE_ARRAY_INPUT 0 /* 0: V5 RGB input, 1: array_module input */
NN model selection:
The example supports the object detection models listed in AI NPU User Guide.
#include "model_yolo.h"
#include "model_yolov9.h"
#define OBJDET_MODEL_OBJ yolov4_tiny
#define OBJDET_MODEL_NAME "vfs:/yolov4_tiny_asymu8.nb"
#define OBJDET_NN_WIDTH 416
#define OBJDET_NN_HEIGHT 416
Model object: Select from the available object detection model types (yolov4_tiny, yolov7_tiny, nanodet_plus_m, yolov9_tiny).
Model file path: Supports both LittleFS (vfs:/) and SD card (sd:) storage.
Resolution: OBJDET_NN_WIDTH and OBJDET_NN_HEIGHT describe the incoming RGB frame. If they differ from the model tensor size, the model pre-processing code resizes the full frame before inference.
Output sink: Two entry points are exported: mmf2_video_example_nn_object_detection_uvc_init and mmf2_video_example_nn_object_detection_rtsp_init.
NN module configuration:
static nn_data_param_t nn_input_params = {
    .img = {
        .width = OBJDET_NN_WIDTH,
        .height = OBJDET_NN_HEIGHT,
    },
    .codec_type = AV_CODEC_ID_RGB888
};
mm_module_ctrl(vipnn_ctx, CMD_VIPNN_SET_MODEL, (int)&OBJDET_MODEL_OBJ);
mm_module_ctrl(vipnn_ctx, CMD_VIPNN_SET_MODEL_FILE_NAME, (int)nn_model_file_name);
mm_module_ctrl(vipnn_ctx, CMD_VIPNN_SET_IN_PARAMS, (int)&nn_input_params);
mm_module_ctrl(vipnn_ctx, CMD_VIPNN_SET_DISPPOST, (int)nn_display_cb);
mm_module_ctrl(vipnn_ctx, CMD_VIPNN_SET_RES_SIZE, sizeof(objdetect_res_t));
mm_module_ctrl(vipnn_ctx, CMD_VIPNN_SET_RES_MAX_CNT, 32);
mm_module_ctrl(vipnn_ctx, MM_CMD_SET_QUEUE_LEN, 1);
mm_module_ctrl(vipnn_ctx, MM_CMD_INIT_QUEUE_ITEMS, MMQI_FLAG_STATIC);
mm_module_ctrl(vipnn_ctx, CMD_VIPNN_APPLY, 0);
The callback draws class labels and confidence scores onto V1 through nn_osd_group. Object detection uses NN_OSD_SCALE_STRETCH when mapping normalised detector coordinates back to the encoded stream.
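A stretch mapping of normalised coordinates can be sketched as below. It assumes detector coordinates in the range 0.0 to 1.0 and scales each axis independently; `norm_to_pixel` is a hypothetical helper, not the SDK's NN_OSD_SCALE_STRETCH code.

```c
#include <assert.h>

/* Map a normalised coordinate (0.0 .. 1.0) to a pixel index in [0, size - 1].
 * Out-of-range inputs are clamped before scaling. */
static int norm_to_pixel(float norm, int size)
{
    assert(size > 0);
    if (norm < 0.0f) {
        norm = 0.0f;
    }
    if (norm > 1.0f) {
        norm = 1.0f;
    }
    return (int)(norm * (float)(size - 1) + 0.5f);
}
```

For a 1920x1080 stream, a box corner at (1.0, 1.0) lands on pixel (1919, 1079). Because width and height are scaled separately, the aspect ratio of the NN input does not need to match the stream.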
Face Recognition Example
Source: ap_media_example/video_example_src/mmf2_video_example_nn_face_recognition_init.c
A two-stage cascaded NN pipeline for face detection and recognition with OSD overlay. SCRFD detects faces in V5 RGB frames; MobileFaceNet extracts 128-dim embeddings; facerecog_module identifies faces by comparing against a stored database. Known faces are highlighted in green, unknown in red using a two-layer nn_osd_group.
Data flow:
[Video V1: H.264 1920x1080 @ 30fps] --> SISO --> [UVC / RTSP]
                                                      |
                                                (OSD overlay)
[Video V5: RGB 576x320 @ 10fps    ] --> SISO --> [VIPNN: SCRFD face detection]
                                                               |
                                                       (cascaded output)
                                                               |
                                                 [VIPNN: MobileFaceNet embedding]
                                                               |
                                                 [facerecog_module: identity match]
                                                               |
                                                  face_recognition_draw_object()
Compile-time options:
#define STREAM_WIDTH 1920 /* encoded output resolution */
#define STREAM_HEIGHT 1080
#define STREAM_FPS 30
#define STREAM_GOP 30
#define STREAM_BPS 2000000
#define NN_WIDTH 576 /* must match SCRFD model input shape */
#define NN_HEIGHT 320
#define NN_FPS 10
NN models used:
#include "model_scrfd.h"
#include "model_mobilefacenet.h"
#define FACEDET_MODEL_OBJ scrfd
#define FACENET_MODEL_OBJ mbfacenet_fwfs
#define FACEDET_MODEL_NAME "vfs:/scrfd_500m_bnkps_shape576x320.nb"
#define FACENET_MODEL_NAME "vfs:/mobilefacenet_pcqsymi8.nb"
#define NN_WIDTH 576 /* SCRFD input width */
#define NN_HEIGHT 320 /* SCRFD input height */
facerecog_module configuration:
facerecog_ctx = mm_module_open(&facerecog_module);
mm_module_ctrl(facerecog_ctx, CMD_FRC_SET_THRES100, 99); /* 0.99 similarity threshold */
mm_module_ctrl(facerecog_ctx, CMD_FRC_SET_OSD_DRAW, (int)face_recognition_draw_object);
mm_module_ctrl(facerecog_ctx, CMD_FRC_LOAD_FEATURES, 0); /* load from vfs:/face_feature.bin */
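The threshold is passed as an integer and interpreted as value / 100. The sketch below shows how such a threshold could be applied to two embeddings, assuming cosine similarity as the metric; the source does not say which metric facerecog_module uses, and `embeddings_match` is a hypothetical function.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 when the cosine similarity of a and b is at least thres100 / 100.
 * Cosine similarity is an assumption here, not a documented SDK behaviour. */
static int embeddings_match(const float *a, const float *b, size_t n, int thres100)
{
    assert(thres100 >= 0 && thres100 <= 100);
    double dot = 0.0, na = 0.0, nb = 0.0;
    for (size_t i = 0; i < n; i++) {
        dot += (double)a[i] * (double)b[i];
        na  += (double)a[i] * (double)a[i];
        nb  += (double)b[i] * (double)b[i];
    }
    if (na == 0.0 || nb == 0.0 || dot < 0.0) {
        return 0;  /* zero vectors or opposite directions never match */
    }
    /* cos >= t  <=>  dot^2 >= t^2 * |a|^2 * |b|^2  (all terms non-negative),
     * which avoids calling sqrt(). */
    double t = (double)thres100 / 100.0;
    return dot * dot >= t * t * na * nb;
}
```

In a real pipeline the vectors would be the 128-dimensional MobileFaceNet embeddings, and a threshold of 99 would accept only near-identical directions.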
OSD group configuration (two layers: green for known, red for unknown):
static const nn_osd_layer_config_t face_recognition_osd_layer[] = {
    { .stream_id = V1_STREAM_ID, .alpha = 128, .y = 0,  .u = 0,  .v = 0 },   /* green */
    { .stream_id = V1_STREAM_ID, .alpha = 128, .y = 76, .u = 85, .v = 255 }, /* red */
};
nn_osd_group_init(&face_recognition_osd, STREAM_WIDTH, STREAM_HEIGHT,
                  face_recognition_osd_layer, 2,
                  FACE_RECOG_OSD_WATCHDOG /* 1 = enable watchdog */);
Runtime shell commands:
The example registers the following console commands for managing the face database:
| Command | Description |
|---|---|
| | Enter register mode. The next detected face is stored under the given name. |
| | Return to recognition mode. |
| | Load registered face features from |
| | Save registered face features to |
| | Reset registered features in RAM (does not affect the file). |
| | List all registered face names in RAM. |
| | Set the recognition similarity threshold (integer, divided by 100). Example: |
Two entry points are exported:
mmf2_video_example_nn_face_recognition_uvc_init - V1 encoded output to UVC (default VIDEO app table entry)
mmf2_video_example_nn_face_recognition_rtsp_init - V1 encoded output to RTSP
Note
The SCRFD stage must enable module output (CMD_VIPNN_SET_OUTPUT) and be configured as a data group start (MM_CMD_SET_DATAGROUP, MM_GROUP_START). MobileFaceNet is configured as the group end (MM_GROUP_END) and uses VIPNN_CMODE_ALL_ROI so it runs once per detected face.
Integrated Example
mmf2_video_example_av_rtsp_mp4_nn_init
Source file: ap_media_example/video_example_src/mmf2_video_example_av_rtsp_mp4_nn_init.c
A comprehensive Audio+Video example combining RTSP live streaming, MP4 SD card recording, and NN object detection simultaneously.
Data flow:
[Audio (16kHz AMIC)] --> SISO --> [AAC Encoder] ---+
                                                   |
[Video V1: HEVC 2688x1520@15fps] ---+              +--> MIMO --> [MP4 Recording]
[Video V2: H.264 1280x720@30fps ] --+              |
                                    +-- MIMO ------+--> [RTSP Streaming (V2 + Audio)]
[Video V5: RGB 416x416@10fps    ] --> SISO --> [VIPNN: YOLO object detection]
                                                      |
                                               nn_display_cb()
Compile-time options:
#define AINR_ENA 1 /* AI noise reduction on video channels */
#define AUDIO_ENA 1 /* enable audio for both RTSP and MP4 */
#define NN_ENA 1 /* enable NN object detection on V5 */
Video parameters (V1 for MP4 recording):
video_v1_params.format = VIDEO_HEVC;
video_v1_params.width = 2688;
video_v1_params.height = 1520;
video_v1_params.fps = 15;
video_v1_params.bps = 2000000;
video_v1_params.rc_mode = ENC_VBR;
Video parameters (V2 for RTSP streaming):
video_v2_params.format = VIDEO_H264;
video_v2_params.width = 1280;
video_v2_params.height = 720;
video_v2_params.fps = 30;
video_v2_params.bps = 500000;
MP4 recording parameters:
mp4_params.record_length = 10; // seconds per file
mp4_params.record_file_num = 3; // rolling 3 files
mp4_params.record_file_name = "AmebaPro_recording";
mp4_params.mp4_audio_format = AUDIO_AAC;
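The rolling-file settings can be used to estimate SD card usage. The sketch below assumes files are reused in round-robin order and that the V1 video bitrate dominates file size; both are assumptions, V1 uses VBR so the sizes are approximate, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Slot (0 .. file_num - 1) used by the n-th recorded segment, assuming
 * round-robin reuse of the rolling files. */
static uint32_t rolling_slot(uint32_t segment_index, uint32_t file_num)
{
    assert(file_num > 0);
    return segment_index % file_num;
}

/* Approximate bytes of video written per file at the target bitrate. */
static uint64_t approx_file_bytes(uint32_t bps, uint32_t seconds)
{
    return ((uint64_t)bps * seconds) / 8u;
}
```

With the parameters above (2000000 bps, 10 seconds, 3 files), each file holds roughly 2.5 MB of video, so the rolling set occupies about 7.5 MB before audio and container overhead.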
NN model:
#define NN_MODEL_OBJ yolov4_tiny
#define NN_MODEL_NAME "vfs:/yolov4_tiny_asymu8.nb"
#define NN_WIDTH 416
#define NN_HEIGHT 416
mmf2_video_example_joint_test_uvc_init / rtsp_init
Source file: ap_media_example/video_example_src/mmf2_video_example_joint_test_init.c
The full joint test example enabling up to 5 simultaneous video channels with optional AI noise reduction, live streaming (UVC or RTSP), MP4 recording, NN detection with OSD overlay, and file saving. This example exercises the full feature set of the AP-side multimedia pipeline.
The SDK now registers two runtime entries for this source file:
mmf2_video_example_joint_test_uvc_init
mmf2_video_example_joint_test_rtsp_init
Both entries share the same internal initialisation path. The selected VIDEO list entry sets the V2 live-stream sink before modules are opened: UVC routes V2 to uvcd_module; RTSP routes V2 to rtsp2_module.
Channel configuration:
| Ch | ID | Default resolution / format | Purpose |
|---|---|---|---|
| V1 | 0 | HEVC 2688x1520 @ 15fps | Video recording (MP4) |
| V2 | 1 | H.264 1280x720 @ 30fps | Live streaming (UVC / RTSP) |
| V3 | 2 | JPEG 1920x1080 @ 1fps | JPEG snapshot |
| V4 | 3 | NV12 640x480 @ 10fps | Motion detection |
| V5 | 4 | RGB 416x416 @ 10fps | NN AI detection |
Compile-time options:
#define V1_ENA 1 /* enable V1 channel */
#define V2_ENA 1 /* enable V2 channel */
#define V3_ENA 1 /* enable V3 channel */
#define V4_ENA 1 /* enable V4 channel */
#define V5_ENA 1 /* enable V5 channel */
#define AINR_ENA 1 /* AI noise reduction */
#define MP4_ENA 1 /* MP4 SD card recording on V1 */
#define NN_ENA 0 /* NN object detection on V5 */
#define NN_OSD_ENA 1 /* draw NN detection result on V2 stream */
#define V5_RGB_WIDTH 416
#define V5_RGB_HEIGHT 416
The V2 output sink is no longer selected by the VAPP command. Use VIDEO list to choose either the UVC entry or the RTSP entry before running the example. The UVC entry requires CONFIG_USBD_UVC=1. The RTSP entry requires CONFIG_LWIP_LAYER=1 and waits until Wi-Fi is connected before opening rtsp2_module.
NN model selection:
/* Supported models: yolov4_tiny, yolov7_tiny, nanodet_plus_m */
#define NN_MODEL_OBJ yolov4_tiny
#define NN_MODEL_NAME "vfs:/yolov4_tiny_asymu8.nb" /* or "sd:yolov4_tiny_asymu8.nb" */
#define NN_WIDTH 416
#define NN_HEIGHT 416
If the selected .nb includes a preprocessing or scaling layer, set V5_RGB_WIDTH / V5_RGB_HEIGHT to the actual V5 RGB output size and keep NN_WIDTH / NN_HEIGHT as the logical detector input size. The example passes both values to VIPNN:
static nn_data_param_t nn_input_params = {
    .img = {
        .width = V5_RGB_WIDTH,
        .height = V5_RGB_HEIGHT,
        .model_width = NN_WIDTH,
        .model_height = NN_HEIGHT,
    },
    .codec_type = AV_CODEC_ID_RGB888
};
NN object class filtering:
The example supports filtering detected objects by class ID:
static int desired_class_list[] = {0}; /* 0: person (COCO dataset) */
Only objects matching the class IDs in desired_class_list[] will be displayed. Refer to the COCO dataset class labels for available class IDs.
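A class filter of this kind amounts to a linear search over the list. The sketch below shows one way to write it; `class_in_list` is a hypothetical helper, and the way the example itself walks `desired_class_list[]` may differ.

```c
#include <assert.h>
#include <stddef.h>

static int desired_class_list[] = {0};  /* 0: person (COCO dataset) */

/* Returns 1 when class_id appears in list[0 .. count - 1], otherwise 0. */
static int class_in_list(int class_id, const int *list, size_t count)
{
    assert(list != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (list[i] == class_id) {
            return 1;
        }
    }
    return 0;
}
```

A display callback could call `class_in_list(id, desired_class_list, sizeof(desired_class_list) / sizeof(desired_class_list[0]))` and skip drawing when it returns 0.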
NN OSD overlay feature:
When NN_OSD_ENA is enabled, the example draws detected object bounding boxes onto the V2 streaming channel using the nn_osd_group API. The group manages one 1bpp bitmap canvas per color layer, assigns hardware OSD indexes automatically, and optionally clears stale overlays via a watchdog when NN callbacks stop.
The OSD group is initialised once (lazy, safe to call every frame) and updated each callback:
static nn_osd_group_t osd_group;
static const nn_osd_layer_config_t osd_layer[] = {
    {
        .stream_id = V2_STREAM_ID,
        .alpha = 128, /* OSD alpha */
        .y = 0,       /* green Y */
        .u = 0,       /* green U */
        .v = 0,       /* green V */
    },
};
/* In the NN display callback: */
nn_osd_group_init(&osd_group, im_w, im_h, osd_layer, 1, 0 /* no watchdog */);
nn_osd_group_lock(&osd_group);
nn_osd_group_clear(&osd_group);
nn_osd_coord_map_t map;
nn_osd_coord_map_init(&map, im_w, im_h, im, NN_OSD_SCALE_STRETCH);
/* for each detection: */
nn_osd_canvas_t *canvas = nn_osd_group_canvas(&osd_group, NN_OSD_LAYER);
nn_osd_coord_map_bbox(&map, rx0, ry0, rx1, ry1, &xmin, &ymin, &xmax, &ymax);
nn_osd_draw_rect(canvas, xmin, ymin, xmax, ymax, 10);
nn_osd_draw_stringf(canvas, xmin, ymin - 24, 3, "%s %d", label, score);
nn_osd_group_flush(&osd_group);
nn_osd_group_unlock(&osd_group);
Runtime test commands:
The joint test example also registers CmdApVideoAppTest(), exposed through the VAPP console command. These commands are intended for dynamic validation without rebuilding firmware:
Encoder-related runtime commands only cover bitrate/QP, GOP, and force-I-frame. They do not change init-only encoder headers, VUI, HRD, or profile-level configuration. Other VAPP commands, such as OSD, AINR, and NN controls, are example-level runtime controls and do not modify encoder init-only settings.
| Command | Description |
|---|---|
| | Change encoder bitrate for channel |
| | Change encoder QP range for channel |
| | Change encoder GOP length for channel |
| | Force an I-frame on channel |
| | Update hardware OSD bitmap parameters manually. Example: |
| | Disable or enable AI noise reduction globally. Example: |
| | Pause or resume VIPNN inference. When paused with |
| | Stop the joint test example and de-initialise linkers/modules in reverse order. |
Note
The VAPP command no longer changes the V2 sink. Stop the current example with VAPP stop if needed, then use VIDEO list and VIDEO run [n] to start the UVC or RTSP joint-test entry.
FileSaver module:
When a channel’s primary output is disabled via its compile-time flag, that channel may fall back to a filesaver_module consumer. V1 uses FileSaver when MP4_ENA=0; V3 (JPEG snapshot) and V4 (NV12 motion detection) always route to FileSaver; V5 routes to FileSaver when NN_ENA=0. V2 no longer has a none / FileSaver sink in this example: the selected entry routes V2 to either UVC or RTSP.
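The routing rules in this paragraph can be summarised as a small decision function. The sketch below restates them for illustration; the enum, the function, and the `use_rtsp` parameter are hypothetical and do not mirror the example's actual code structure.

```c
#include <assert.h>

typedef enum {
    SINK_MP4,
    SINK_UVC,
    SINK_RTSP,
    SINK_VIPNN,
    SINK_FILESAVER
} sink_t;

/* Consumer for channel `ch` (1 = V1 .. 5 = V5) given the compile-time flags
 * and whether the RTSP entry (rather than the UVC entry) was selected. */
static sink_t channel_sink(int ch, int mp4_ena, int nn_ena, int use_rtsp)
{
    assert(ch >= 1 && ch <= 5);
    switch (ch) {
    case 1:
        return mp4_ena ? SINK_MP4 : SINK_FILESAVER;
    case 2:
        return use_rtsp ? SINK_RTSP : SINK_UVC;  /* V2 has no FileSaver sink */
    case 5:
        return nn_ena ? SINK_VIPNN : SINK_FILESAVER;
    default:
        return SINK_FILESAVER;                   /* V3 and V4 always */
    }
}
```

With the default flags listed earlier (MP4_ENA=1, NN_ENA=0), V1 goes to MP4 while V3, V4, and V5 all end up in FileSaver.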