Media Example

Supported ICs: [RTL8735C]

Overview

| Category | Example | Entry point / source | Description |
|---|---|---|---|
| Video Example | mmf2_video_example_v1_rtsp_init | mmf2_video_example_v1_rtsp_init.c | H.264 V1 live streaming over RTSP. |
| Video Example | mmf2_video_example_v1v2_mp4_rtsp_init | mmf2_video_example_v1v2_mp4_rtsp_init.c | Dual-channel video-only MP4 recording and RTSP streaming. |
| Audio Example | mmf2_example_audioloop_init | mmf2_example_audioloop_init.c | Microphone capture looped back to speaker playback. |
| Audio Example | mmf2_example_aacloop_init | mmf2_example_aacloop_init.c | AAC encode/decode audio loopback. |
| Audio Example | mmf2_example_2way_audio_init | mmf2_example_2way_audio_init.c | Full-duplex two-way audio over RTSP/RTP. |
| Audio Example | audio_vqe_test | audio_vqe_test.c | Audio VQE test using AEC, AGC, and noise suppression. |
| AI Example | Face Detection Example | mmf2_video_example_nn_face_detection_init.c | SCRFD face detection with UVC or RTSP output and OSD overlay. |
| AI Example | Object Detection Example | mmf2_video_example_nn_object_detection_init.c | YOLO/NanoDet object detection on V5 RGB input. |
| AI Example | Face Recognition Example | mmf2_video_example_nn_face_recognition_init.c | Cascaded face detection, embedding, recognition, and OSD overlay. |
| Integrated Example | mmf2_video_example_av_rtsp_mp4_nn_init | mmf2_video_example_av_rtsp_mp4_nn_init.c | Audio, video recording, RTSP streaming, and NN detection together. |
| Integrated Example | mmf2_video_example_joint_test_uvc_init / rtsp_init | mmf2_video_example_joint_test_init.c | Full AP-side multimedia joint test with recording, streaming, NN, OSD, and file saving. |

Video Example

The following examples demonstrate how to build video pipelines using the Multimedia Framework (MMF).

Build the Video Example

Before building the video example, you must enable the required option in menuconfig:

./ameba.py menuconfig

Navigate to:

(Top) > CONFIG VIDEO SOFTWARE > BUILD AP MEDIA EXAMPLE

Enable this option. It will automatically enable all modules required by the video examples.

Then run the following command from the SDK root to compile the video AP example:

./ameba.py build --app video_ap

After a successful build, flash the image to the board and open a serial console to run the examples.

Example Source Files

Source files are located at:

component/soc/<soc>/sw/media_example/ap_media_example/video_example_src/

Each example is registered in an application table using VIDEO_APP_TABLE_SECTION and invoked via the serial console:

VIDEO list         # list all registered video examples
VIDEO run [n]      # run the example at index n (default: 0)

Some examples export more than one application-table entry. For example, the joint test example provides separate UVC and RTSP entries, so choose the desired output path directly from VIDEO list and run that index.
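
The table-and-index dispatch described above can be pictured with a small standalone sketch. It does not reproduce the SDK's VIDEO_APP_TABLE_SECTION mechanism; a plain array stands in for the table, and app_entry_t, demo_table, the two demo_*_init functions, and run_app() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one application-table entry. */
typedef struct {
    const char *name;      /* label a "list" command would print */
    int (*init)(void);     /* example entry point */
} app_entry_t;

/* Placeholder entry points; real examples would open modules here. */
static int demo_uvc_init(void)  { return 1; }
static int demo_rtsp_init(void) { return 2; }

/* A plain array replaces the SDK's table in this sketch. */
static const app_entry_t demo_table[] = {
    { "joint_test_uvc",  demo_uvc_init  },
    { "joint_test_rtsp", demo_rtsp_init },
};

#define DEMO_TABLE_LEN (sizeof(demo_table) / sizeof(demo_table[0]))

/* Run entry n; returns -1 when the index is out of range. */
static int run_app(size_t n)
{
    if (n >= DEMO_TABLE_LEN) {
        return -1;
    }
    assert(demo_table[n].init != NULL);
    return demo_table[n].init();
}
```

In this sketch, run_app(0) selects the first entry and run_app(1) the second, which mirrors choosing between the UVC and RTSP entries by index.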

Note

Only one example can run at a time. Reset the board before switching to a different example.

Note

Examples that use network features (e.g. RTSP streaming) require an active Wi-Fi connection before running. Connect to a router first using the AT command:

AT+WLCONN=ssid,<your_ssid>,pw,<your_password>

Wait for the [$]wifi got ip message confirming the IP address has been assigned, then run the video example. Refer to Wi-Fi AT Commands for the full Wi-Fi command reference.

mmf2_video_example_v1_rtsp_init

Source file: ap_media_example/video_example_src/mmf2_video_example_v1_rtsp_init.c

The simplest live streaming example. V1 captures H.264 video and streams it over RTSP. This is the recommended starting point for anyone new to the MMF - it uses only one video_module, one rtsp2_module, and one SISO linker.

Data flow:

[Video V1: H.264 1920x1080 @ 30fps] --> SISO --> [RTSP server]

Connect to rtsp://<device-ip>/stream with VLC or any RTSP-capable player once the board is running.

Compile-time options:

#define AINR_ENA    0   /* AI noise reduction on V1; set 1 to enable */

Key notes:

  • Requires CONFIG_LWIP_LAYER=1 (a compile-time error will be raised otherwise).

  • MMQI_FLAG_DYNAMIC is required for the video module queue because H.264 frame sizes vary per frame.

  • Queue depth 60 absorbs burst encoder output and prevents frame drops.

  • To force a keyframe when a new client connects, call mm_module_ctrl(video_v1_ctx, CMD_VIDEO_FORCE_IFRAME, 0) from the RTSP connected callback.
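
As a rough sizing aid for the queue-depth note above, the following sketch converts a queue depth into the span of video it can buffer and estimates the average encoded frame size. The helper names are hypothetical, and any bitrate passed in is the caller's assumption, since this example's bitrate is not stated here.

```c
#include <assert.h>
#include <stdint.h>

/* Milliseconds of video a queue can hold: depth frames at fps frames per second. */
static uint32_t queue_span_ms(uint32_t depth, uint32_t fps)
{
    assert(fps > 0);
    return (depth * 1000u) / fps;
}

/* Average encoded frame size in bytes for a given bitrate and frame rate. */
static uint32_t avg_frame_bytes(uint32_t bps, uint32_t fps)
{
    assert(fps > 0);
    return bps / fps / 8u;
}
```

With the values from this example, queue_span_ms(60, 30) gives 2000, so a full queue covers about two seconds of video. Individual H.264 frames can deviate widely from the average size, which is why the dynamic queue flag matters.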

mmf2_video_example_v1v2_mp4_rtsp_init

Source file: ap_media_example/video_example_src/mmf2_video_example_v1v2_mp4_rtsp_init.c

A dual-channel video-only example. V1 captures HEVC at full resolution for MP4 recording on the SD card; V2 captures H.264 at a lower resolution for simultaneous RTSP live streaming. Both channels share a single MIMO linker - no audio.

Data flow:

[Video V1: HEVC 2688x1520 @ 15fps] --+
                                      +--> MIMO --> [mp4_module]   (V1 -> SD card)
[Video V2: H.264 1280x720  @ 30fps] --+
                                      +--> MIMO --> [rtsp2_module] (V2 -> RTSP)

Compile-time options:

#define AINR_ENA    0   /* AI noise reduction; set 1 to enable */

Key notes:

  • Requires CONFIG_LWIP_LAYER=1.

  • V1 and V2 have independent frame rates and bitrates - V1 is high-quality for archiving, V2 is optimised for streaming bandwidth.

  • STORAGE_VIDEO (video-only) is set in mp4_params.record_type; change to STORAGE_ALL and add an audio chain if audio recording is needed.

  • The MIMO dependency mask keeps the two outputs independent: MMIC_DEP_INPUT0 for MP4 and MMIC_DEP_INPUT1 for RTSP.
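
One way to picture the dependency mask is as a bit set naming the inputs an output waits for. The sketch below is an interpretation, not the SDK's implementation: DEP_INPUT0 and DEP_INPUT1 are hypothetical stand-ins whose values may differ from MMIC_DEP_INPUT0 and MMIC_DEP_INPUT1, and output_ready() is an illustrative helper.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit assignments; the SDK's MMIC_DEP_INPUT0/1 values may differ. */
#define DEP_INPUT0 (1u << 0)
#define DEP_INPUT1 (1u << 1)

/* An output may run once every input named in its mask has data ready. */
static int output_ready(uint32_t dep_mask, uint32_t ready_inputs)
{
    assert(dep_mask != 0u);
    return (ready_inputs & dep_mask) == dep_mask;
}
```

Under this model an output that depends only on input 0 never waits for input 1, which is the independence the note describes for the MP4 and RTSP outputs.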

Audio Example

The following examples demonstrate how to build audio pipelines using the Multimedia Framework (MMF). Example source files are located at:

component/soc/<soc>/sw/media_example/ap_media_example/audio_example_src/

Each example is registered in an application table using AUDIO_APP_TABLE_SECTION and invoked via the serial console:

AUDIO list         # list all registered audio examples
AUDIO run [n]      # run the example at index n (default: 0)

Note

Only one example can run at a time. Reset the board before switching to a different example.

mmf2_example_audioloop_init

Source file: ap_media_example/audio_example_src/mmf2_example_audioloop_init.c

A simple audio loopback example. Audio is captured from the built-in microphone (AMIC) and immediately played back through the speaker using the audio module.

Data flow:

[Audio Module] --> SISO --> [Audio Module]
  (capture)                   (playback)

Key parameters:

audio_params.sample_rate = ASR_48KHZ;

mmf2_example_aacloop_init

Source file: ap_media_example/audio_example_src/mmf2_example_aacloop_init.c

An audio encode/decode loopback example. Audio captured from the microphone is encoded to AAC, then immediately decoded and played back. This example demonstrates the AAC encoder (module_aac) and AAC decoder (module_aad) pipeline.

Data flow:

[Audio] --> SISO --> [AAC Encoder] --> SISO --> [AAC Decoder] --> SISO --> [Audio]
 (capture)                                                                 (playback)

Key parameters:

aac_params.sample_rate  = 16000;
aac_params.channel      = 1;         // mono
aac_params.trans_type   = AAC_TYPE_RAW;
aac_params.object_type  = AAC_AOT_LC;
aac_params.bitrate      = 32000;     // 32 kbps

aad_params.sample_rate  = 16000;
aad_params.channel      = 1;
aad_params.trans_type   = AAD_TYPE_RAW;
aad_params.object_type  = AAD_AOT_LC;
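
To relate the sample rate and bitrate above to per-frame figures, the sketch below uses the fact that an AAC-LC frame carries 1024 samples per channel. The helper names are hypothetical, and the byte count is an average that assumes the encoder holds its target bitrate.

```c
#include <assert.h>
#include <stdint.h>

#define AAC_LC_FRAME_SAMPLES 1024u  /* samples per channel in one AAC-LC frame */

/* Duration of one AAC-LC frame in milliseconds. */
static uint32_t aac_frame_ms(uint32_t sample_rate)
{
    assert(sample_rate > 0);
    return (AAC_LC_FRAME_SAMPLES * 1000u) / sample_rate;
}

/* Average encoded bytes per frame at a constant target bitrate. */
static uint32_t aac_frame_bytes(uint32_t bitrate, uint32_t sample_rate)
{
    assert(sample_rate > 0);
    return (uint32_t)(((uint64_t)bitrate * AAC_LC_FRAME_SAMPLES) / sample_rate / 8u);
}
```

At 16000 Hz and 32000 bps these give 64 ms and 256 bytes per frame.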

mmf2_example_2way_audio_init

Source file: ap_media_example/audio_example_src/mmf2_example_2way_audio_init.c

A two-way audio streaming example using an RTSP server and an RTP client. Audio captured from the microphone is encoded to AAC and streamed via RTSP; at the same time, incoming RTP audio is decoded and played back through the speaker. This demonstrates full-duplex audio communication.

Data flow:

[Audio] --> SISO --> [aac_module] --> SISO --> [rtsp2_module] -----> (Network)
(capture)              (encode)                   (server)            (stream)

[Audio] <-- SISO <-- [aad_module] <-- SISO <-- [rtp_module] <---- (Network)
(playback)             (decode)                 (receive)            (send)

Key parameters:

// Audio capture/playback
audio_params.sample_rate = ASR_16KHZ;

// AAC Encoder
aac_params.sample_rate  = 16000;
aac_params.channel      = 1;         // mono
aac_params.trans_type   = AAC_TYPE_ADTS;
aac_params.object_type  = AAC_AOT_LC;
aac_params.bitrate      = 32000;     // 32 kbps

// RTSP Server
rtsp2_params.codec_id   = AV_CODEC_ID_MP4A_LATM;
rtsp2_params.channel    = 1;
rtsp2_params.samplerate = 16000;

// RTP Receiver
rtp_params.port         = 16384;

// AAC Decoder
aad_params.sample_rate  = 16000;
aad_params.channel      = 1;
aad_params.trans_type   = AAD_TYPE_RTP_RAW;
aad_params.object_type  = AAD_AOT_LC;

Usage:

  • Connect to the RTSP stream with VLC or another RTSP client: rtsp://<device_ip>:554/

  • The device streams 16 kHz mono AAC audio by default.

  • It simultaneously receives RTP audio on port 16384 and plays it back.

Stream Audio from Device to VLC Player

  1. Click Media -> Open Network Stream

  2. Enter rtsp://<device_ip>:554/, where <device_ip> is the Ameba board's IP address and 554 is the default RTSP server port

  3. Click Play

Stream Audio from VLC Player to Device

  1. Click Media -> Stream

  2. Select File, click Add, choose an audio file, then click Stream

    Note

    Select an audio file whose format matches the decoder settings (mono, 16 kHz sampling rate).

  3. Check the selected file and click Next

  4. Select RTP Audio/Video Profile, click Add

  5. Enter the device IP address in Address field, set Base port to 16384, click Next

  6. Ensure Activate Transcoding is unchecked, click Next -> Stream

  7. The sound can be heard on the board’s 3.5 mm audio jack.

audio_vqe_test

Source file: ap_media_example/audio_example_src/audio_vqe_test.c

An audio Voice Quality Enhancement (VQE) test example using the ASP (Audio Signal Processing) library. This example demonstrates the audio processing pipeline, including Acoustic Echo Cancellation (AEC), Automatic Gain Control (AGC), and Noise Suppression (NS), for both the send path (microphone capture) and the receive path (speaker playback).

Data flow:

[Microphone] --> [VQE SND: AEC + AGC + NS] --> [Output]
                     ^
                     | (farend reference)
[Speaker] -----------+

[Input] --> [VQE RCV: NS + AGC] --> [Microphone TX]

Key components:

  • VQE_SND (Send path): Processes microphone input with AEC, AGC, and NS to remove echo and noise

  • VQE_RCV (Receive path): Processes the playback signal with NS and AGC before it is sent to the speaker

Key parameters:

// Frame configuration
#define AUDIO_DMA_PAGE_SIZE 640
#define FRAME_SIZE (AUDIO_DMA_PAGE_SIZE / 2)  // 320 samples
// Sample rate: 16000 Hz

// RX Path - Noise Suppression
RX_NS.NS_EN = 1;
RX_NS.NSLevel = 10;           // Suppression level when no speech
RX_NS.HPFEnable = 0;          // High-pass filter disabled

// RX Path - Automatic Level Control (ALC)
RX_AGC.AGC_EN = 1;
RX_AGC.AGCMode = CT_ALC;      // ALC mode (vs. CT_LIMITER)
RX_AGC.ReferenceLvl = 0;      // Target level 0 dBFS
RX_AGC.RatioFormat = 1;       // 8.8 fix point ratio format
RX_AGC.AttackTime = 10;       // 10 ms
RX_AGC.ReleaseTime = 50;      // 50 ms
RX_AGC.Ratio[0] = 50 * 256;   // Compression ratio (50 in 8.8 fixed point)
RX_AGC.Ratio[1] = 50 * 256;
RX_AGC.Ratio[2] = 50 * 256;
RX_AGC.Threshold[0] = 39;     // Threshold1 in dB
RX_AGC.Threshold[1] = 70;     // Threshold2 in dB
RX_AGC.Threshold[2] = 80;     // Noise gate level in dB
RX_AGC.NoiseFloorAdaptEnable = 1;
RX_AGC.RMSDetectorEnable = 1;
RX_AGC.MaxGainLimit = 30;     // 30 dB max gain

// RX Path - Acoustic Echo Cancellation
RX_AEC.AEC_EN = 1;
RX_AEC.EchoTailLen = 60;      // Echo tail length in ms
RX_AEC.CNGEnable = 1;         // Comfort noise generation enabled
RX_AEC.PPLevel = 4;           // Post-processing level (1-18)
RX_AEC.DTControl = 1;         // Double-talk control type

// TX Path - Noise Suppression
TX_NS.NS_EN = 1;
TX_NS.NSLevel = 10;

// TX Path - Automatic Level Control
TX_AGC.AGC_EN = 1;
TX_AGC.AGCMode = CT_ALC;
TX_AGC.ReferenceLvl = 0;
TX_AGC.RatioFormat = 1;
TX_AGC.AttackTime = 10;
TX_AGC.ReleaseTime = 50;
TX_AGC.Ratio[0] = 50 * 256;
TX_AGC.Ratio[1] = 50 * 256;
TX_AGC.Ratio[2] = 50 * 256;
TX_AGC.Threshold[0] = 39;
TX_AGC.Threshold[1] = 70;
TX_AGC.Threshold[2] = 80;
TX_AGC.NoiseFloorAdaptEnable = 1;
TX_AGC.RMSDetectorEnable = 1;
TX_AGC.MaxGainLimit = 30;
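
The ratio values above use the 8.8 fixed-point format selected by RatioFormat = 1: the upper 8 bits hold the integer part and the lower 8 bits the fraction, so 50 * 256 encodes 50.0. The following sketch shows the conversion; the helper names are hypothetical, and it assumes non-negative values stored in a signed 16-bit field.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a whole-number ratio to 8.8 fixed point (8 integer bits, 8 fraction bits). */
static int16_t ratio_to_q8_8(int whole)
{
    assert(whole >= 0 && whole <= 127);   /* must fit in a signed 16-bit field */
    return (int16_t)(whole * 256);
}

/* Integer part of a non-negative 8.8 value. */
static int q8_8_whole(int16_t q)
{
    return q / 256;
}

/* Fractional part of a non-negative 8.8 value, in units of 1/256. */
static int q8_8_frac256(int16_t q)
{
    return q % 256;
}
```

For example, ratio_to_q8_8(50) yields 12800, and the raw value 12928 decodes to 50 plus 128/256, that is 50.5.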

ASP API Functions:

| Function | Description |
|---|---|
| VQE_SND_init() | Initialize send path VQE (AEC + AGC + NS) |
| VQE_SND_process() | Process one frame through send path |
| VQE_SND_destroy() | Destroy send path VQE context |
| VQE_RCV_NS_init() | Initialize receive path noise suppression |
| VQE_RCV_NS_process() | Process one frame through receive NS |
| VQE_RCV_NS_destroy() | Destroy receive path NS context |
| VQE_RCV_AGC_init() | Initialize receive path AGC |
| VQE_RCV_AGC_process() | Process one frame through receive AGC |
| VQE_RCV_AGC_destroy() | Destroy receive path AGC context |

ASP Configuration Structures:

typedef struct CTNS_cfg_s {
    int16_t NS_EN;                    // Enable noise suppression
    int16_t NSLevel;                  // Suppression level (0-10)
    int16_t HPFEnable;                // High-pass filter enable
    int16_t NSSlowConvergence;        // Slow convergence time (ms)
    int16_t QuickConvergenceEnable;   // Quick convergence enable
} CTNS_cfg_t;

typedef struct CTAGC_cfg_s {
    int16_t AGC_EN;                   // Enable AGC
    CT_AGC_MODE AGCMode;              // CT_ALC or CT_LIMITER
    int16_t ReferenceLvl;             // Reference level in dB
    int16_t RatioFormat;              // Ratio format (0: integer, 1: 8.8 fix point)
    int16_t AttackTime;               // Attack time in ms
    int16_t ReleaseTime;              // Release time in ms
    int16_t Ratio[3];                 // Compression ratios
    int16_t Threshold[3];             // Thresholds (Threshold1, Threshold2, NoiseGateLvl)
    int16_t KneeWidth;                // Knee width
    int16_t NoiseFloorAdaptEnable;    // Noise floor adaptation enable
    int16_t RMSDetectorEnable;        // RMS detector enable
    int16_t MaxGainLimit;             // Maximum gain limit in dB
} CTAGC_cfg_t;

typedef struct CTAEC_cfg_s {
    int16_t AEC_EN;                   // Enable AEC
    int16_t EchoTailLen;              // Echo tail length in ms
    int16_t CNGEnable;                // Comfort noise generation enable
    int16_t PPLevel;                  // Post-processing level (1-18)
    int16_t DTControl;                // Double-talk control type
    int16_t ConvergenceTime;          // Convergence time
} CTAEC_cfg_t;

Usage:

  1. Run via serial console: AUDIO run <index> (find the index with AUDIO list)

  2. The example reads pink noise from SD card (pink_noise.bin) as farend input

  3. Processed output is saved to asp_rx.bin

  4. Test duration is 30 seconds (configurable via AUDIO_TEST_DURATION)

Notes:

  • Requires SD card with pink_noise.bin test file for farend input

  • Uses 16kHz sample rate with 320-sample frames (20ms)

  • Microphone input is processed through AEC, AGC, and NS pipeline
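
The frame figures quoted in the notes follow from the DMA page size and sample rate. The sketch below reproduces that arithmetic; it assumes 16-bit PCM (2 bytes per sample), and the helper names are illustrative rather than part of the example source.

```c
#include <assert.h>
#include <stdint.h>

#define AUDIO_DMA_PAGE_SIZE 640u   /* bytes per DMA page, from the example */
#define BYTES_PER_SAMPLE    2u     /* 16-bit PCM (assumed) */
#define SAMPLE_RATE_HZ      16000u

/* Samples carried by one DMA page. */
static uint32_t frame_samples(void)
{
    return AUDIO_DMA_PAGE_SIZE / BYTES_PER_SAMPLE;
}

/* Duration of one frame in milliseconds. */
static uint32_t frame_ms(void)
{
    return frame_samples() * 1000u / SAMPLE_RATE_HZ;
}

/* Number of frames processed in a test of the given length. */
static uint32_t frames_in_test(uint32_t seconds)
{
    assert(frame_ms() > 0);
    return seconds * 1000u / frame_ms();
}
```

A 30-second run therefore passes 1500 frames of 320 samples through the pipeline.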

AI Example

Build the NN Video Examples

The NN video examples are built as part of the AP media example application. Before building, enable the required option in menuconfig:

./ameba.py menuconfig

Navigate to:

(Top) > CONFIG VIDEO SOFTWARE > BUILD AP MEDIA EXAMPLE

Enable this option. It will automatically enable the multimedia framework modules required by the NN video examples, including video_module, vipnn_module, uvcd_module / rtsp2_module when available, array_module, and facerecog_module.

Then run the following command from the SDK root to compile the video AP application:

./ameba.py build --app video_ap

Before running the examples, make sure the model files have been deployed to LittleFS as described in the AI NPU User Guide. The SDK NN video examples use vfs:/ model paths by default.

After a successful build, flash the firmware image and the LittleFS model image to the board, then open a serial console. The examples are registered in the video application table and can be listed and run with:

VIDEO list         # list all registered video examples
VIDEO run [n]      # run the example at index n (default: 0)

Note

Only one example can run at a time. Reset the board before switching to a different example.

Note

NN video examples that use RTSP require an active Wi-Fi connection before running. Connect to a router first using the AT command:

AT+WLCONN=ssid,<your_ssid>,pw,<your_password>

Wait for the [$]wifi got ip message confirming the IP address has been assigned, then run the RTSP example. UVC examples require CONFIG_USBD_UVC=1.

Face Detection Example

Source: ap_media_example/video_example_src/mmf2_video_example_nn_face_detection_init.c

Streams H.264 video (V1) to UVC or RTSP while simultaneously running SCRFD face detection on an RGB stream (V5). V1 is the display stream and OSD coordinate space; V5 is the NN input stream.

Data flow:

[Video V1: H.264 1920x1080@30fps] --> SISO --> [UVC / RTSP]
                                               |
                                          (OSD overlay)

[Video V5: RGB  576x320@10fps   ] --> SISO --> [VIPNN: SCRFD face detection]
                                                    |
                                               nn_display_cb()
                                               (draws boxes & landmarks)

Compile-time options:

#define FACEDET_STREAM_WIDTH       1920
#define FACEDET_STREAM_HEIGHT      1080
#define FACEDET_STREAM_FPS         30
#define FACEDET_STREAM_GOP         30
#define FACEDET_STREAM_BPS         2000000

#define FACEDET_NN_WIDTH           576
#define FACEDET_NN_HEIGHT          320
#define FACEDET_NN_FPS             10

#define FACEDET_USE_ARRAY_INPUT    0   /* 0: V5 RGB input, 1: array_module input */

Note

Two entry points are exported: mmf2_video_example_nn_face_detection_uvc_init (default UVC output) and mmf2_video_example_nn_face_detection_rtsp_init (RTSP output). UVC requires CONFIG_USBD_UVC=1; RTSP requires CONFIG_LWIP_LAYER=1.

NN model:

#include "model_scrfd.h"
#define FACEDET_MODEL_OBJ    scrfd
#define FACEDET_MODEL_NAME   "vfs:/scrfd_500m_bnkps_shape576x320.nb"
#define FACEDET_NN_WIDTH     576
#define FACEDET_NN_HEIGHT    320

The NN result callback nn_display_cb() maps SCRFD bounding boxes and 5 facial landmark points back to the V1 stream using NN_OSD_SCALE_LETTERBOX and draws them with nn_osd_group. The callback clears and flushes the group each frame, then calls nn_osd_group_kick() so stale overlays can be managed by the same helper API used by other NN examples.
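
A letterbox mapping of the kind mentioned above can be sketched as follows. This is an illustration of the general technique, not the behaviour of NN_OSD_SCALE_LETTERBOX itself: it assumes the stream image is scaled uniformly to fit inside the NN frame with centred padding, and letterbox_t, letterbox_init(), and letterbox_to_stream() are hypothetical names.

```c
#include <assert.h>

typedef struct {
    float scale;   /* uniform stream-to-NN scale factor */
    float pad_x;   /* horizontal padding inside the NN frame, in NN pixels */
    float pad_y;   /* vertical padding inside the NN frame, in NN pixels */
} letterbox_t;

/* Fit a stream_w x stream_h image inside nn_w x nn_h, keeping aspect ratio. */
static letterbox_t letterbox_init(int stream_w, int stream_h, int nn_w, int nn_h)
{
    letterbox_t lb;
    assert(stream_w > 0 && stream_h > 0 && nn_w > 0 && nn_h > 0);
    float sx = (float)nn_w / (float)stream_w;
    float sy = (float)nn_h / (float)stream_h;
    lb.scale = (sx < sy) ? sx : sy;
    lb.pad_x = ((float)nn_w - (float)stream_w * lb.scale) * 0.5f;
    lb.pad_y = ((float)nn_h - (float)stream_h * lb.scale) * 0.5f;
    return lb;
}

/* Map a point from NN-frame pixels back to stream pixels. */
static void letterbox_to_stream(const letterbox_t *lb, float nx, float ny,
                                float *out_x, float *out_y)
{
    *out_x = (nx - lb->pad_x) / lb->scale;
    *out_y = (ny - lb->pad_y) / lb->scale;
}
```

With a 1920x1080 stream and a 576x320 NN frame, the centre of the NN frame maps to approximately the centre of the stream, and a small horizontal padding appears because the two aspect ratios differ slightly.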

Object Detection Example

Source: ap_media_example/video_example_src/mmf2_video_example_nn_object_detection_init.c

Runs YOLO/NanoDet object detection on a V5 RGB stream while sending V1 H.264 to UVC or RTSP with OSD overlay. The NN input source is selectable between V5 RGB (OBJDET_USE_ARRAY_INPUT=0) and a synthetic array_module buffer (OBJDET_USE_ARRAY_INPUT=1).

Data flow (V5 input, OBJDET_USE_ARRAY_INPUT=0):

[Video V1: H.264 1920x1080@30fps] --> SISO --> [UVC / RTSP]
                                               |
                                          (OSD overlay)

[Video V5: RGB 416x416@10fps] --> SISO --> [VIPNN: YOLO object detection]
                                                    |
                                               nn_display_cb()

Data flow (Array input, OBJDET_USE_ARRAY_INPUT=1):

[Array: RGB 416x416@10fps] --> SISO --> [VIPNN: YOLO object detection]
                                               |
                                          nn_display_cb()

Compile-time options:

#define OBJDET_STREAM_WIDTH       1920
#define OBJDET_STREAM_HEIGHT      1080
#define OBJDET_STREAM_FPS         30
#define OBJDET_STREAM_GOP         30
#define OBJDET_STREAM_BPS         2000000

#define OBJDET_NN_WIDTH           416
#define OBJDET_NN_HEIGHT          416
#define OBJDET_NN_FPS             10
#define OBJDET_USE_ARRAY_INPUT    0   /* 0: V5 RGB input, 1: array_module input */

NN model selection:

The example supports the object detection models listed in the AI NPU User Guide.

#include "model_yolo.h"
#include "model_yolov9.h"
#define OBJDET_MODEL_OBJ    yolov4_tiny
#define OBJDET_MODEL_NAME   "vfs:/yolov4_tiny_asymu8.nb"
#define OBJDET_NN_WIDTH     416
#define OBJDET_NN_HEIGHT    416

  • Model object: Select from available object detection model types (yolov4_tiny, yolov7_tiny, nanodet_plus_m, yolov9_tiny)

  • Model file path: Supports both LittleFS (vfs:/) and SD card (sd:) storage

  • Resolution: OBJDET_NN_WIDTH and OBJDET_NN_HEIGHT describe the incoming RGB frame. If they differ from the model tensor size, the model pre-processing code resizes the full frame before inference.

  • Output sink: Two entry points are exported: mmf2_video_example_nn_object_detection_uvc_init and mmf2_video_example_nn_object_detection_rtsp_init.

NN module configuration:

static nn_data_param_t nn_input_params = {
    .img = {
        .width = OBJDET_NN_WIDTH,
        .height = OBJDET_NN_HEIGHT,
    },
    .codec_type = AV_CODEC_ID_RGB888
};

mm_module_ctrl(vipnn_ctx, CMD_VIPNN_SET_MODEL, (int)&OBJDET_MODEL_OBJ);
mm_module_ctrl(vipnn_ctx, CMD_VIPNN_SET_MODEL_FILE_NAME, (int)nn_model_file_name);
mm_module_ctrl(vipnn_ctx, CMD_VIPNN_SET_IN_PARAMS, (int)&nn_input_params);
mm_module_ctrl(vipnn_ctx, CMD_VIPNN_SET_DISPPOST, (int)nn_display_cb);
mm_module_ctrl(vipnn_ctx, CMD_VIPNN_SET_RES_SIZE, sizeof(objdetect_res_t));
mm_module_ctrl(vipnn_ctx, CMD_VIPNN_SET_RES_MAX_CNT, 32);
mm_module_ctrl(vipnn_ctx, MM_CMD_SET_QUEUE_LEN, 1);
mm_module_ctrl(vipnn_ctx, MM_CMD_INIT_QUEUE_ITEMS, MMQI_FLAG_STATIC);
mm_module_ctrl(vipnn_ctx, CMD_VIPNN_APPLY, 0);

The callback draws class labels and confidence scores onto V1 through nn_osd_group. Object detection uses NN_OSD_SCALE_STRETCH when mapping normalised detector coordinates back to the encoded stream.
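
Stretch mapping scales each axis independently, so a normalised coordinate is simply multiplied by the stream dimension. The sketch below illustrates that idea; stretch_to_pixel() is a hypothetical helper, and the clamping to the frame edge is an assumption of this sketch rather than documented behaviour of NN_OSD_SCALE_STRETCH.

```c
#include <assert.h>

/* Map a normalised coordinate in [0,1] to a pixel index, clamped to the frame. */
static int stretch_to_pixel(float norm, int size)
{
    assert(size > 0);
    int p = (int)(norm * (float)size);
    if (p < 0) {
        p = 0;
    }
    if (p > size - 1) {
        p = size - 1;
    }
    return p;
}
```

For a 1920x1080 stream, a detection centred at (0.5, 0.5) lands at pixel (960, 540).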

Face Recognition Example

Source: ap_media_example/video_example_src/mmf2_video_example_nn_face_recognition_init.c

A two-stage cascaded NN pipeline for face detection and recognition with OSD overlay. SCRFD detects faces in V5 RGB frames; MobileFaceNet extracts 128-dim embeddings; facerecog_module identifies faces by comparing against a stored database. Known faces are highlighted in green, unknown in red using a two-layer nn_osd_group.

Data flow:

[Video V1: H.264 1920x1080 @ 30fps] --> SISO --> [UVC / RTSP]
                                                       |
                                                  (OSD overlay)

[Video V5: RGB 576x320 @ 10fps] --> SISO --> [VIPNN: SCRFD face detection]
                                                    |
                                             (cascaded output)
                                                    |
                                             [VIPNN: MobileFaceNet embedding]
                                                    |
                                             [facerecog_module: identity match]
                                                    |
                                          face_recognition_draw_object()

Compile-time options:

#define STREAM_WIDTH    1920   /* encoded output resolution */
#define STREAM_HEIGHT   1080
#define STREAM_FPS      30
#define STREAM_GOP      30
#define STREAM_BPS      2000000

#define NN_WIDTH        576    /* must match SCRFD model input shape */
#define NN_HEIGHT       320
#define NN_FPS          10

NN models used:

#include "model_scrfd.h"
#include "model_mobilefacenet.h"
#define FACEDET_MODEL_OBJ       scrfd
#define FACENET_MODEL_OBJ       mbfacenet_fwfs
#define FACEDET_MODEL_NAME      "vfs:/scrfd_500m_bnkps_shape576x320.nb"
#define FACENET_MODEL_NAME      "vfs:/mobilefacenet_pcqsymi8.nb"
#define NN_WIDTH        576    /* SCRFD input width */
#define NN_HEIGHT       320    /* SCRFD input height */

facerecog_module configuration:

facerecog_ctx = mm_module_open(&facerecog_module);
mm_module_ctrl(facerecog_ctx, CMD_FRC_SET_THRES100,   99);  /* 0.99 similarity threshold */
mm_module_ctrl(facerecog_ctx, CMD_FRC_SET_OSD_DRAW,   (int)face_recognition_draw_object);
mm_module_ctrl(facerecog_ctx, CMD_FRC_LOAD_FEATURES,  0);   /* load from vfs:/face_feature.bin */
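
The threshold is passed as an integer in hundredths, so 99 corresponds to 0.99. The sketch below shows how such a threshold could be applied to a similarity score. The comparison metric used by facerecog_module is not stated here; this sketch assumes unit-length embeddings compared by dot product (equivalent to cosine similarity), and dot() and is_match() are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product; equals cosine similarity when both vectors have unit length. */
static float dot(const float *a, const float *b, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++) {
        s += a[i] * b[i];
    }
    return s;
}

/* Compare a similarity score against a threshold expressed in hundredths. */
static int is_match(float score, int thres100)
{
    assert(thres100 >= 0 && thres100 <= 100);
    return score * 100.0f >= (float)thres100;
}
```

A score of 0.95 passes a threshold of 90 but fails the default of 99, which shows how lowering the threshold trades false rejections for false accepts.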

OSD group configuration (two layers: green for known, red for unknown):

static const nn_osd_layer_config_t face_recognition_osd_layer[] = {
    { .stream_id = V1_STREAM_ID, .alpha = 128, .y = 0,  .u = 0,  .v = 0   }, /* green */
    { .stream_id = V1_STREAM_ID, .alpha = 128, .y = 76, .u = 85, .v = 255 }, /* red */
};
nn_osd_group_init(&face_recognition_osd, STREAM_WIDTH, STREAM_HEIGHT,
                  face_recognition_osd_layer, 2,
                  FACE_RECOG_OSD_WATCHDOG /* 1 = enable watchdog */);
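
To see why the two Y/U/V triples above read as green and red, they can be converted to RGB. The sketch below uses the full-range BT.601 matrix, which is an assumption; the colour matrix applied by the OSD hardware is not stated here. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Round to nearest and clamp to the 0..255 range. */
static int clamp_u8(float v)
{
    if (v < 0.0f) {
        return 0;
    }
    if (v > 255.0f) {
        return 255;
    }
    return (int)(v + 0.5f);
}

/* Full-range BT.601 YUV to RGB conversion (assumed colour matrix). */
static void yuv_to_rgb(int y, int u, int v, int *r, int *g, int *b)
{
    assert(r != NULL && g != NULL && b != NULL);
    float cb = (float)u - 128.0f;
    float cr = (float)v - 128.0f;
    *r = clamp_u8((float)y + 1.402f * cr);
    *g = clamp_u8((float)y - 0.344136f * cb - 0.714136f * cr);
    *b = clamp_u8((float)y + 1.772f * cb);
}
```

Under this matrix, (0, 0, 0) becomes a green with no red or blue component, and (76, 85, 255) becomes an almost pure red.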

Runtime shell commands:

The example registers the following console commands for managing the face database:

| Command | Description |
|---|---|
| FREG <name> | Enter register mode. The next detected face is stored under the given name. |
| FRRM | Return to recognition mode. |
| FRFL | Load registered face features from vfs:/face_feature.bin. |
| FRFS | Save registered face features to vfs:/face_feature.bin. |
| FRFR | Reset registered features in RAM (does not affect the file). |
| FRLS | List all registered face names in RAM. |
| FRSC <score> | Set the recognition similarity threshold (integer, divided by 100). Example: FRSC 90 sets threshold to 0.90. |

Two entry points are exported:

  • mmf2_video_example_nn_face_recognition_uvc_init - V1 encoded output to UVC (default VIDEO app table entry)

  • mmf2_video_example_nn_face_recognition_rtsp_init - V1 encoded output to RTSP

Note

The SCRFD stage must enable module output (CMD_VIPNN_SET_OUTPUT) and be configured as a data group start (MM_CMD_SET_DATAGROUP, MM_GROUP_START). MobileFaceNet is configured as the group end (MM_GROUP_END) and uses VIPNN_CMODE_ALL_ROI so it runs once per detected face.

Integrated Example

mmf2_video_example_av_rtsp_mp4_nn_init

Source file: ap_media_example/video_example_src/mmf2_video_example_av_rtsp_mp4_nn_init.c

A comprehensive Audio+Video example combining RTSP live streaming, MP4 SD card recording, and NN object detection simultaneously.

Data flow:

[Audio (16kHz AMIC)] --> SISO --> [AAC Encoder] ---+
                                                    |
[Video V1: HEVC 2688x1520@15fps] ---+              +--> MIMO --> [MP4 Recording]
[Video V2: H.264 1280x720@30fps ] --+              |
                                    +-- MIMO ------+--> [RTSP Streaming (V2 + Audio)]

[Video V5: RGB 416x416@10fps    ] --> SISO --> [VIPNN: YOLO object detection]
                                                    |
                                               nn_display_cb()

Compile-time options:

#define AINR_ENA   1  /* AI noise reduction on video channels */
#define AUDIO_ENA  1  /* enable audio for both RTSP and MP4 */
#define NN_ENA     1  /* enable NN object detection on V5 */

Video parameters (V1 for MP4 recording):

video_v1_params.format  = VIDEO_HEVC;
video_v1_params.width   = 2688;
video_v1_params.height  = 1520;
video_v1_params.fps     = 15;
video_v1_params.bps     = 2000000;
video_v1_params.rc_mode = ENC_VBR;

Video parameters (V2 for RTSP streaming):

video_v2_params.format  = VIDEO_H264;
video_v2_params.width   = 1280;
video_v2_params.height  = 720;
video_v2_params.fps     = 30;
video_v2_params.bps     = 500000;

MP4 recording parameters:

mp4_params.record_length    = 10;                  // seconds per file
mp4_params.record_file_num  = 3;                   // rolling 3 files
mp4_params.record_file_name = "AmebaPro_recording";
mp4_params.mp4_audio_format = AUDIO_AAC;
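
For SD card sizing, the recording parameters above can be turned into a rough storage estimate. The sketch below ignores container overhead and treats the bitrate as constant; how the SDK names and rotates the files is not stated here, so rolling_slot() only illustrates the idea of reusing a fixed number of slots. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Approximate payload bytes for one clip; container overhead is ignored. */
static uint64_t clip_bytes(uint32_t total_bps, uint32_t seconds)
{
    return (uint64_t)total_bps * seconds / 8u;
}

/* Slot reused by the n-th clip when file_num files are kept in rotation. */
static uint32_t rolling_slot(uint32_t clip_index, uint32_t file_num)
{
    assert(file_num > 0);
    return clip_index % file_num;
}
```

At the V1 bitrate of 2000000 bps, a 10-second clip holds about 2.5 MB of video, so three rolling files need roughly 7.5 MB plus audio and container overhead.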

NN model:

#define NN_MODEL_OBJ    yolov4_tiny
#define NN_MODEL_NAME   "vfs:/yolov4_tiny_asymu8.nb"
#define NN_WIDTH        416
#define NN_HEIGHT       416

mmf2_video_example_joint_test_uvc_init / rtsp_init

Source file: ap_media_example/video_example_src/mmf2_video_example_joint_test_init.c

The full joint test example enabling up to 5 simultaneous video channels with optional AI noise reduction, live streaming (UVC or RTSP), MP4 recording, NN detection with OSD overlay, and file saving. This example exercises the full feature set of the AP-side multimedia pipeline.

The SDK now registers two runtime entries for this source file:

mmf2_video_example_joint_test_uvc_init
mmf2_video_example_joint_test_rtsp_init

Both entries share the same internal initialisation path. The selected VIDEO list entry sets the V2 live-stream sink before modules are opened: UVC routes V2 to uvcd_module; RTSP routes V2 to rtsp2_module.

Channel configuration:

| Ch | ID | Default resolution / format | Purpose |
|---|---|---|---|
| V1 | 0 | HEVC 2688x1520 @ 15fps | Video recording (MP4) |
| V2 | 1 | H.264 1280x720 @ 30fps | Live streaming (UVC / RTSP) |
| V3 | 2 | JPEG 1920x1080 @ 1fps | JPEG snapshot |
| V4 | 3 | NV12 640x480 @ 10fps | Motion detection |
| V5 | 4 | RGB 416x416 @ 10fps | NN AI detection |

Compile-time options:

#define V1_ENA     1  /* enable V1 channel */
#define V2_ENA     1  /* enable V2 channel */
#define V3_ENA     1  /* enable V3 channel */
#define V4_ENA     1  /* enable V4 channel */
#define V5_ENA     1  /* enable V5 channel */

#define AINR_ENA   1  /* AI noise reduction */
#define MP4_ENA    1  /* MP4 SD card recording on V1 */
#define NN_ENA     0  /* NN object detection on V5 */
#define NN_OSD_ENA 1  /* draw NN detection result on V2 stream */

#define V5_RGB_WIDTH  416
#define V5_RGB_HEIGHT 416

The V2 output sink is no longer selected by the VAPP command. Use VIDEO list to choose either the UVC entry or the RTSP entry before running the example. The UVC entry requires CONFIG_USBD_UVC=1. The RTSP entry requires CONFIG_LWIP_LAYER=1 and waits until Wi-Fi is connected before opening rtsp2_module.

NN model selection:

/* Supported models: yolov4_tiny, yolov7_tiny, nanodet_plus_m */
#define NN_MODEL_OBJ    yolov4_tiny
#define NN_MODEL_NAME   "vfs:/yolov4_tiny_asymu8.nb"  /* or "sd:yolov4_tiny_asymu8.nb" */
#define NN_WIDTH        416
#define NN_HEIGHT       416

If the selected .nb includes a preprocessing or scaling layer, set V5_RGB_WIDTH / V5_RGB_HEIGHT to the actual V5 RGB output size and keep NN_WIDTH / NN_HEIGHT as the logical detector input size. The example passes both values to VIPNN:

static nn_data_param_t nn_input_params = {
    .img = {
        .width = V5_RGB_WIDTH,
        .height = V5_RGB_HEIGHT,
        .model_width = NN_WIDTH,
        .model_height = NN_HEIGHT,
    },
    .codec_type = AV_CODEC_ID_RGB888
};

NN object class filtering:

The example supports filtering detected objects by class ID:

static int desired_class_list[] = {0};  /* 0: person (COCO dataset) */

Only objects matching the class IDs in desired_class_list[] will be displayed. Refer to the COCO dataset class labels for available class IDs.
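
The filtering step amounts to a membership test against the list. A minimal sketch is shown below; class_is_desired() is a hypothetical helper name, and the actual callback in the example may structure the check differently.

```c
#include <assert.h>
#include <stddef.h>

static const int desired_class_list[] = {0};   /* 0: person (COCO dataset) */

#define DESIRED_CLASS_COUNT \
    (sizeof(desired_class_list) / sizeof(desired_class_list[0]))

/* Returns 1 when class_id appears in desired_class_list, otherwise 0. */
static int class_is_desired(int class_id)
{
    assert(DESIRED_CLASS_COUNT > 0);
    for (size_t i = 0; i < DESIRED_CLASS_COUNT; i++) {
        if (desired_class_list[i] == class_id) {
            return 1;
        }
    }
    return 0;
}
```

Adding further class IDs to the array widens the filter without any other code change, because the element count is derived with sizeof.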

NN OSD overlay feature:

When NN_OSD_ENA is enabled, the example draws detected object bounding boxes onto the V2 streaming channel using the nn_osd_group API. The group manages one 1bpp bitmap canvas per color layer, assigns hardware OSD indexes automatically, and optionally clears stale overlays via a watchdog when NN callbacks stop.
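
A 1bpp canvas packs eight pixels into each byte, which keeps a full-frame overlay small. The sketch below estimates the buffer size and sets a single pixel. It assumes rows are padded only to a whole byte and that bits are stored most-significant first; the real nn_osd canvas layout may differ, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes per row when 8 pixels share a byte (no extra alignment assumed). */
static size_t bitmap_stride(int width)
{
    assert(width > 0);
    return ((size_t)width + 7u) / 8u;
}

/* Total bytes needed for a width x height 1bpp canvas. */
static size_t bitmap_bytes(int width, int height)
{
    assert(height > 0);
    return bitmap_stride(width) * (size_t)height;
}

/* Set one pixel; MSB-first bit order is an assumption of this sketch. */
static void bitmap_set(uint8_t *buf, int width, int x, int y)
{
    assert(buf != NULL && x >= 0 && x < width && y >= 0);
    buf[(size_t)y * bitmap_stride(width) + (size_t)x / 8u] |=
        (uint8_t)(0x80u >> (x % 8));
}
```

Under these assumptions a 1280x720 canvas for the V2 stream occupies 115200 bytes per colour layer.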

The OSD group is initialised once (lazy, safe to call every frame) and updated each callback:

static nn_osd_group_t osd_group;
static const nn_osd_layer_config_t osd_layer[] = {
    {
        .stream_id = V2_STREAM_ID,
        .alpha = 128,  /* OSD alpha */
        .y = 0,        /* green Y */
        .u = 0,        /* green U */
        .v = 0,        /* green V */
    },
};

/* In the NN display callback: */
nn_osd_group_init(&osd_group, im_w, im_h, osd_layer, 1, 0 /* no watchdog */);
nn_osd_group_lock(&osd_group);
nn_osd_group_clear(&osd_group);

nn_osd_coord_map_t map;
nn_osd_coord_map_init(&map, im_w, im_h, im, NN_OSD_SCALE_STRETCH);
/* for each detection: */
nn_osd_canvas_t *canvas = nn_osd_group_canvas(&osd_group, NN_OSD_LAYER);
nn_osd_coord_map_bbox(&map, rx0, ry0, rx1, ry1, &xmin, &ymin, &xmax, &ymax);
nn_osd_draw_rect(canvas, xmin, ymin, xmax, ymax, 10);
nn_osd_draw_stringf(canvas, xmin, ymin - 24, 3, "%s %d", label, score);

nn_osd_group_flush(&osd_group);
nn_osd_group_unlock(&osd_group);

Runtime test commands:

The joint test example also registers CmdApVideoAppTest(), exposed through the VAPP console command, for dynamic validation without rebuilding firmware.

Encoder-related runtime commands cover only bitrate/QP, GOP, and force-I-frame. They do not change init-only encoder headers, VUI, HRD, or profile-level configuration. Other VAPP commands, such as OSD, AINR, and NN controls, are example-level runtime controls and do not modify encoder init-only settings. The available commands are:

| Command | Description |
|---|---|
| VAPP bps <ch> <bps> | Change encoder bitrate for channel ch at runtime. Example: VAPP bps 1 4000000 or VAPP bps 1 256000. |
| VAPP qp <ch> <min_qp> <max_qp> | Change encoder QP range for channel ch. Example: VAPP qp 1 26 28 or VAPP qp 1 45 48. |
| VAPP gop <ch> <gop> | Change encoder GOP length for channel ch. |
| VAPP i <ch> | Force an I-frame on channel ch. This is useful after changing bitrate/QP or when a streaming client reconnects. |
| VAPP osd <ch> <idx> <fmt> <x> <y> <w> <h> <alpha> <Y> <U> <V> <bitmap_addr> | Update hardware OSD bitmap parameters manually. Example: VAPP osd 1 0 0 100 100 320 320 128 200 200 200 0x80000000. |
| VAPP ainr <0\|1> | Disable or enable AI noise reduction globally. Example: VAPP ainr 1. |
| VAPP nn <0\|1> | Pause or resume VIPNN inference. When paused with NN_OSD_ENA=1, the example clears the V2 OSD canvas. |
| VAPP stop | Stop the joint test example and de-initialise linkers/modules in reverse order. |

Note

The VAPP command no longer changes the V2 sink. Stop the current example with VAPP stop if needed, then use VIDEO list and VIDEO run [n] to start the UVC or RTSP joint-test entry.

FileSaver module:

When a channel’s primary output is disabled via its compile-time flag, that channel may fall back to a filesaver_module consumer. V1 uses FileSaver when MP4_ENA=0; V3 (JPEG snapshot) and V4 (NV12 motion detection) always route to FileSaver; V5 routes to FileSaver when NN_ENA=0. V2 no longer has a none / FileSaver sink in this example: the selected entry routes V2 to either UVC or RTSP.