AI NPU Module
Supported ICs: [RTL8735C]
Overview
The platform integrates a dedicated Neural Processing Unit (NPU) that offloads Deep Neural Network (DNN) computation from the main CPU, enabling real-time AI inference - such as object detection and face recognition - with low power consumption.
At INT8 precision, the NPU delivers approximately 1 TOPS at the 600 MHz operating frequency, backed by 256 KB on-chip SRAM (VIP_SRAM) for intermediate tensors and a dedicated 12 MB DDR region for model weights and I/O buffers (see NN Memory Layout).
For detailed hardware architecture, supported layer types, quantisation formats, and software stack, see NPU Hardware Reference.
Integration Overview
All NN examples follow the same integration pattern. The application opens a V5 video channel as an RGB (or NV12) source, passes frames to the VIPNN module for NPU inference, and receives structured results through a result callback function.
```
[Video V5: RGB NN_WIDTH x NN_HEIGHT @ fps] --> SISO --> [VIPNN module]
                                                              |
                                                       nn_display_cb()
```
Steps to get NN running:
1. Select a pre-built model from Pre-Built Model Library (or request a custom model - see Custom Model Conversion).
2. Get the .nb model binary onto the device file system - see Deploying Models to Device.
3. Configure the VIPNN module in your application - see VIPNN Module.
4. Implement the result callback to act on inference output.
Pre-Built Model Library
The following pre-compiled .nb model binaries are included in the SDK:
component/soc/<soc>/video/nn/app/nn_model/binary/
To use a model, the .nb file must be present on the device file system before running the application. See Deploying Models to Device for how to flash or copy model files to the device.
Object Detection
YOLO Series
YOLO (You Only Look Once) is a widely used real-time object detection algorithm. The following variants are provided:
| Model object | Binary filename | Input size | Quantised |
|---|---|---|---|
| | | 416 x 416 | uint8 |
| | | 576 x 320 | uint8 |
| | | 640 x 480 | uint8 |
| | | 416 x 416 | int16 DFP |
Include the corresponding header and select the model object at compile time:
```c
#include "model_yolo.h"    // yolov4_tiny, yolov7_tiny
#include "model_yolov9.h"  // yolov9_tiny

#define NN_MODEL_OBJ   yolov7_tiny
#define NN_MODEL_NAME  "vfs:/yolov7_tiny_576x320_asymu8.nb"
#define NN_WIDTH       576
#define NN_HEIGHT      320
```
The output of each detected object is stored as objdetect_res_t:
```c
typedef struct objdetect_res_s {
    union {
        float result[6];   // [class_id, score, top_x, top_y, bot_x, bot_y]
        detobj_t res;
    };
} objdetect_res_t;
```
All coordinates are normalised to [0.0, 1.0] relative to the logical detector input size. In the common case this is the network tensor size. If the .nb model includes an in-graph preprocessing or scaling layer, the application can provide model_width / model_height in nn_data_param_t so YOLO post-processing decodes boxes against the logical detector size. The YOLO models are trained on the COCO dataset (80 classes).
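Because the coordinates are normalised, placing a result on any frame, such as a full-resolution stream used for an on-screen overlay, comes down to multiplying by that frame's width and height. The following is a minimal sketch that assumes the result[6] layout shown above and a plain full-frame resize with no letterbox padding; pixel_box_t, clamp_int() and to_pixel_box() are hypothetical helpers, not SDK APIs.

```c
/* Hypothetical helper type: one detection in pixel units. */
typedef struct {
    int class_id;
    float score;
    int x0, y0, x1, y1;   /* top-left and bottom-right corners, in pixels */
} pixel_box_t;

/* Clamp v to the inclusive range [lo, hi]. */
static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/*
 * Convert one result laid out as [class_id, score, top_x, top_y, bot_x, bot_y],
 * with coordinates normalised to [0.0, 1.0], into pixel coordinates of a
 * w x h frame. Assumes the detector saw the whole frame (no letterbox padding).
 */
static pixel_box_t to_pixel_box(const float result[6], int w, int h)
{
    pixel_box_t b;
    b.class_id = (int)result[0];
    b.score = result[1];
    b.x0 = clamp_int((int)(result[2] * w + 0.5f), 0, w - 1);
    b.y0 = clamp_int((int)(result[3] * h + 0.5f), 0, h - 1);
    b.x1 = clamp_int((int)(result[4] * w + 0.5f), 0, w - 1);
    b.y1 = clamp_int((int)(result[5] * h + 0.5f), 0, h - 1);
    return b;
}
```

Inside the result callback, for example, to_pixel_box(res[i].result, 1280, 720) would give the box in 1280 x 720 pixel coordinates.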
For more information: https://github.com/AlexeyAB/darknet
Face Detection
SCRFD
SCRFD (Sample and Computation Redistribution for Face Detection) is a lightweight, high-accuracy face detector that outputs bounding boxes and 5-point facial landmarks.
| Model object | Binary filename | Input size | Quantised |
|---|---|---|---|
| | | 576 x 320 | uint8 |
```c
#include "model_scrfd.h"

#define NN_MODEL_OBJ   scrfd
#define NN_MODEL_NAME  "vfs:/scrfd_500m_bnkps_shape576x320.nb"
#define NN_WIDTH       576
#define NN_HEIGHT      320
```
The detection result is stored as facedetect_res_t:
```c
typedef struct facedetect_res_s {
    union {
        float result[6];     // [class_id, score, top_x, top_y, bot_x, bot_y]
        detobj_t res;
    };
    landmark_t landmark;     // 5 facial landmark points (x, y) normalised to [0.0, 1.0]
} facedetect_res_t;
```
For more information: https://github.com/deepinsight/insightface/tree/master/detection/scrfd
Face Recognition
MobileFaceNet
MobileFaceNet is a compact face recognition model trained with ArcFace (Additive Angular Margin Loss). It takes a cropped and aligned face image and outputs a 128-dimensional feature embedding for identity matching.
MobileFaceNet is typically used together with SCRFD in a cascaded detect-then-recognise pipeline: SCRFD first detects and localises faces in the full frame, then MobileFaceNet extracts an embedding from each cropped face region for comparison against a stored identity database. See Cascaded Mode for the VIPNN configuration.
| Model object | Binary filename | Input size | Quantised |
|---|---|---|---|
| | | 112 x 112 | int8 sym |
```c
#include "model_mobilefacenet.h"
```
The recognition result is stored as face_feature_res_t:
```c
#define MAX_FACE_FEATURE_DIM 128

typedef struct face_feature_res_s {
    union {
        float result[6];
        detobj_t res;
    };
    float feature[MAX_FACE_FEATURE_DIM];   // 128-dim face embedding
} face_feature_res_t;
```
For more information: https://github.com/deepinsight/insightface/tree/master/recognition
Model Memory and File Size Reference
The following table lists the memory footprint of each SDK model. The DDR memory column represents the NPU runtime memory (model weights + I/O tensor buffers). All models fit within the 12 MB NN_DDR window.
| Category | Model binary | Input size | Quantised | DDR memory usage | File size |
|---|---|---|---|---|---|
| Object detection | | 416 x 416 | uint8 | 6.51 MB | 3.59 MB |
| Object detection | | 576 x 320 | uint8 | 7.25 MB | 3.96 MB |
| Object detection | | 640 x 480 | uint8 | 10.13 MB | 3.83 MB |
| Object detection | | 416 x 416 | int16 DFP | 10.22 MB | 4.73 MB |
| Face detection | | 576 x 320 | uint8 | 2.28 MB | 0.75 MB |
| Face recognition | | 112 x 112 | int8 sym | 2.06 MB | 1.40 MB |
Note
When running two models simultaneously (e.g. SCRFD + MobileFaceNet for detect-then-recognise), ensure the combined DDR memory usage does not exceed the 12 MB NN_DDR budget. The SCRFD + MobileFaceNet combination uses approximately 4.3 MB in total.
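The budget check in the note is simple addition, which can be sketched as follows. The figures come from the DDR memory usage column above; model_mem_t, total_ddr_mb() and fits_nn_ddr() are hypothetical, and the sketch assumes the per-model figures simply add up, with no allowance for alignment or shared buffers, so a total close to 12 MB should be treated as too tight.

```c
#include <stdbool.h>
#include <stddef.h>

#define NN_DDR_BUDGET_MB 12.0   /* NN_DDR window: LENGTH = 12M in ameba_layout.ld */

/* Hypothetical record: one model and its DDR memory usage from the table. */
typedef struct {
    const char *name;
    double ddr_mb;
} model_mem_t;

/* Total NPU DDR needed when all listed models are loaded at the same time. */
static double total_ddr_mb(const model_mem_t *models, size_t count)
{
    double sum = 0.0;
    for (size_t i = 0; i < count; i++)
        sum += models[i].ddr_mb;
    return sum;
}

/* True if the combined usage fits inside the NN_DDR window. */
static bool fits_nn_ddr(const model_mem_t *models, size_t count)
{
    return total_ddr_mb(models, count) <= NN_DDR_BUDGET_MB;
}
```

With these figures, SCRFD + MobileFaceNet totals about 4.34 MB, while the 640 x 480 YOLO model (10.13 MB) plus SCRFD (2.28 MB) would exceed the window.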
Deploying Models to Device
The VIPNN module loads model binaries at runtime using a path prefix:
- vfs:/ - reads from the internal LittleFS flash partition (VFS1)
- sd:/ - reads from an SD card
For quick prototyping, copying the .nb file to the root of an SD card and setting sd:/model.nb as the model path requires no additional build steps. For production or devices without an SD card slot, use the vfs: path described below.
Flashing Models to LittleFS (VFS1)
VFS1 is the LittleFS flash partition defined in the flash layout (component/soc/usrcfg/<soc>/ameba_flashcfg.c). The current SDK stores Wi-Fi, BT, and NN data in this single LittleFS region because only one LittleFS flash region is supported at runtime:
```c
{VFS1, 0x088A3000, 0x08EA2FFF}   /* VFS region 1: wifi/BT/NN data (6 MB) */
{VFS2, 0xFFFFFFFF, 0xFFFFFFFF}
```
Note
The address range shown above is for reference only. Always verify the actual VFS1 partition address and size in component/soc/usrcfg/<soc>/ameba_flashcfg.c before flashing, as the layout may differ depending on your firmware configuration.
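The end address in each entry is inclusive, so a region's size is end - start + 1. The sketch below checks that the example VFS1 range is exactly 0x600000 bytes (6 MB), the value passed to mklittlefs as -s in Step 2, and that the start address sits on a 4 KB boundary. region_size() and block_aligned() are hypothetical helpers, and the 4 KB erase-block size is assumed from the -b 4096 option used when building the image.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLASH_ERASE_BLOCK 0x1000u   /* assumed 4 KB, matching mklittlefs -b 4096 */

/* Size of a flash range whose end address is inclusive, as in ameba_flashcfg.c. */
static uint32_t region_size(uint32_t start, uint32_t end_inclusive)
{
    return end_inclusive - start + 1u;
}

/* True if addr falls on an erase-block boundary. */
static bool block_aligned(uint32_t addr)
{
    return (addr % FLASH_ERASE_BLOCK) == 0u;
}
```

If your flash layout differs, running the same arithmetic on your VFS1 entry gives the -s value to use.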
Step 1 - Prepare the model directory
Create a local directory under tools/littlefs/linux and copy the model files used by the SDK NN video examples into it:
```sh
cd tools/littlefs/linux
mkdir -p nn_model
cp ../../../component/soc/<soc>/video/nn/app/nn_model/binary/mobilefacenet_pcqsymi8.nb nn_model/
cp ../../../component/soc/<soc>/video/nn/app/nn_model/binary/scrfd_500m_bnkps_shape576x320.nb nn_model/
cp ../../../component/soc/<soc>/video/nn/app/nn_model/binary/yolov4_tiny_asymu8.nb nn_model/
```
These three files are the models used by the common SDK NN video examples; Step 2 packs them into a single LittleFS image:
- mobilefacenet_pcqsymi8.nb for MobileFaceNet face embedding
- scrfd_500m_bnkps_shape576x320.nb for SCRFD face detection
- yolov4_tiny_asymu8.nb for YOLO object detection
Step 2 - Build the LittleFS image
Run the following command from tools/littlefs/linux:
```sh
./mklittlefs -b 4096 -p 4096 -s 0x600000 -c nn_model/ nn_model_lfs.bin
```
After the image is generated, list the files inside the LittleFS image to verify that the expected model files were packed:
```sh
./mklittlefs -b 4096 -p 4096 -s 0x600000 -l nn_model_lfs.bin
```
Example output:
```
1466168   /mobilefacenet_pcqsymi8.nb          Mon Nov 17 08:57:48 2025
787568    /scrfd_500m_bnkps_shape576x320.nb   Mon Nov 17 08:57:48 2025
3763536   /yolov4_tiny_asymu8.nb              Mon Nov 17 08:57:48 2025
```
| Option | Description |
|---|---|
| | Block size in bytes (matches the flash erase block size) |
| | Page size in bytes |
| | Image size - must match the VFS1 partition size (6 MB) |
| | Input directory to pack |
| | Output LittleFS image file |
Step 3 - Flash the image
Flash nn_model_lfs.bin to the VFS1 start address (0x088A3000) using the image download tool. After a successful flash, the model files are accessible at runtime via the vfs: prefix:
```c
#define FACEDET_MODEL_NAME  "vfs:/scrfd_500m_bnkps_shape576x320.nb"
#define FACENET_MODEL_NAME  "vfs:/mobilefacenet_pcqsymi8.nb"
#define OBJDET_MODEL_NAME   "vfs:/yolov4_tiny_asymu8.nb"
```
Note
The total size of all packed files must not exceed the VFS1 partition size (6 MB). The three-model example above is approximately 5.74 MB, which fits in the default 6 MB VFS1 image. Refer to Model Memory and File Size Reference for the file size of each pre-compiled model.
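Raw file sizes understate what the image needs: LittleFS stores file data in whole blocks and also uses blocks for the superblock, directory metadata, and per-block skip-list pointers. The sketch below rounds each file from the mklittlefs -l listing up to 4 KB blocks and reports how many blocks of the 6 MB image remain; it ignores metadata, so the result is only a lower bound on real usage. The constants mirror the -b and -s options above, and the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define LFS_BLOCK_SIZE   4096u       /* mklittlefs -b 4096 */
#define VFS1_IMAGE_SIZE  0x600000u   /* mklittlefs -s 0x600000 (6 MB) */

/* Whole blocks needed to hold `bytes` of file data. */
static uint32_t blocks_for(uint32_t bytes)
{
    return (bytes + LFS_BLOCK_SIZE - 1u) / LFS_BLOCK_SIZE;
}

/* Data blocks used by a set of files, ignoring LittleFS metadata. */
static uint32_t data_blocks(const uint32_t *sizes, size_t count)
{
    uint32_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += blocks_for(sizes[i]);
    return total;
}

/* Blocks left in the image after file data; negative means it cannot fit. */
static int32_t spare_blocks(const uint32_t *sizes, size_t count)
{
    return (int32_t)(VFS1_IMAGE_SIZE / LFS_BLOCK_SIZE) - (int32_t)data_blocks(sizes, count);
}
```

For the three example models this gives 1470 data blocks out of 1536, leaving 66 blocks (about 264 KB) for metadata and growth.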
VIPNN Module
The NN MMF module - vipnn - accepts RGB or NV12 frames from the video pipeline, runs inference on the NPU, and delivers structured post-processed results to the application via a callback function.
Pre-processing and post-processing are bundled with each model object (nnmodel_t), so adding a new model requires only providing a new model object - the VIPNN module itself does not need to change.
VIPNN Module Context
The internal context of the VIPNN module:
```c
typedef struct vipnn_ctx_s {
    void *parent;
    vip_network network;                              // NPU network handle
    vip_buffer_create_params_t vip_param_in[MAX_IO_NUM];
    vip_buffer_create_params_t vip_param_out[MAX_IO_NUM];
    vip_buffer input_buffers[MAX_IO_NUM];
    vip_buffer output_buffers[MAX_IO_NUM];
    vipnn_params_t params;                            // module parameters
    vipnn_status_t status;
    char network_name[64];
    int input_count;
    int output_count;
    vipnn_preproc_t pre_process;                      // custom pre-process hook
    vipnn_postproc_t post_process;                    // custom post-process hook
    disp_postprcess_t disp_postproc;                  // result display callback
    vipnn_cascaded_mode_t cas_mode;
    bool module_out_en;
    vipnn_measure_t measure;                          // inference FPS measurement
} vipnn_ctx_t;
```
Module Parameters
The vipnn_params_t structure holds the runtime parameters for the module:
```c
typedef struct vipnn_param_s {
    char model_file[64];        // model file path on file system (e.g. "vfs:/model.nb")
    uint8_t *model_mem;         // pointer to model in memory (alternative to file path)
    uint32_t model_size;        // model size in bytes (when using model_mem)
    int fps;                    // target inference FPS (0 = unlimited)
    int out_res_size;           // sizeof one result structure
    int out_res_max_cnt;        // maximum number of results per frame
    int save_out_tensor;        // set to 1 to dump raw output tensors for offline debugging
    nn_data_param_t *in_param;  // input image parameters
    nnmodel_t *model;           // pointer to the model object
} vipnn_params_t;
```
The image part of nn_data_param_t describes the frame consumed by VIPNN:
```c
typedef struct nn_data_param_s {
    union {
        struct {
            int width, height;
            int model_width, model_height;   // optional logical detector size
            landmarki_t landmark;
        } img;
        /* audio fields omitted */
    };
    uint32_t codec_type;
    void *priv;
    int size_in_byte;
} nn_data_param_t;
```
Note
Set save_out_tensor = 1 to dump raw NPU output tensors to a file. This is useful when developing or verifying custom post-processing logic on a PC. Disable this flag in production builds.
When model_mem is set (non-NULL), the module loads the model from that memory pointer instead of the file system. This is useful for embedding the model binary directly into firmware rather than storing it in a separate file system partition.
Complete Module Initialisation
```c
// Input frame descriptor: full RGB888 frame of NN_WIDTH x NN_HEIGHT from V5
static nn_data_param_t nn_input_params = {
    .img = {
        .width  = NN_WIDTH,
        .height = NN_HEIGHT,
    },
    .codec_type = AV_CODEC_ID_RGB888
};

vipnn_ctx = mm_module_open(&vipnn_module);
if (vipnn_ctx) {
    // Model object (bundles pre/post-processing) and its .nb binary path
    mm_module_ctrl(vipnn_ctx, CMD_VIPNN_SET_MODEL, (int)&NN_MODEL_OBJ);
    mm_module_ctrl(vipnn_ctx, CMD_VIPNN_SET_MODEL_FILE_NAME, (int)NN_MODEL_NAME);
    mm_module_ctrl(vipnn_ctx, CMD_VIPNN_SET_IN_PARAMS, (int)&nn_input_params);
    // Result callback and result buffer sizing (up to 32 detections per frame)
    mm_module_ctrl(vipnn_ctx, CMD_VIPNN_SET_DISPPOST, (int)nn_display_cb);
    mm_module_ctrl(vipnn_ctx, CMD_VIPNN_SET_RES_SIZE, sizeof(objdetect_res_t));
    mm_module_ctrl(vipnn_ctx, CMD_VIPNN_SET_RES_MAX_CNT, 32);
    // Apply the configuration and start the module
    mm_module_ctrl(vipnn_ctx, CMD_VIPNN_APPLY, 0);
}
```
Setting the Input Image Parameters
Use CMD_VIPNN_SET_IN_PARAMS to describe the input frame passed to the VIPNN module:
```c
nn_data_param_t nn_input_params = {
    .img = {
        .width  = NN_WIDTH,    // incoming RGB/NV12 frame width
        .height = NN_HEIGHT,   // incoming RGB/NV12 frame height
    },
    .codec_type = AV_CODEC_ID_RGB888   // or AV_CODEC_ID_NV12
};

mm_module_ctrl(vipnn_ctx, CMD_VIPNN_SET_IN_PARAMS, (int)&nn_input_params);
```
For models that include an in-graph preprocessing or scaling layer, the incoming V5 RGB frame size can differ from the logical detector input size. In that case, keep width / height equal to the real input frame and set model_width / model_height to the detector size used by post-processing:
```c
#define V5_RGB_WIDTH   1280
#define V5_RGB_HEIGHT  720
#define NN_WIDTH       416
#define NN_HEIGHT      416

nn_data_param_t nn_input_params = {
    .img = {
        .width        = V5_RGB_WIDTH,
        .height       = V5_RGB_HEIGHT,
        .model_width  = NN_WIDTH,
        .model_height = NN_HEIGHT,
    },
    .codec_type = AV_CODEC_ID_RGB888
};
```
Note
The codec_type must match the output format of the upstream V5 video module. Use VIDEO_RGB + AV_CODEC_ID_RGB888 for models that require an RGB input. width and height describe the full incoming frame; ROI is no longer configured in nn_data_param_t. If the frame size differs from the network tensor size and the model does not contain its own preprocessing layer, the model preprocessing code resizes the full frame before inference.
Setting the NN Model
Each supported model is represented by an nnmodel_t object that bundles the model binary path, pre-processing, and post-processing functions together.
```c
#include "model_yolo.h"

#define NN_MODEL_OBJ   yolov4_tiny
#define NN_MODEL_NAME  "vfs:/yolov4_tiny_asymu8.nb"

mm_module_ctrl(vipnn_ctx, CMD_VIPNN_SET_MODEL, (int)&NN_MODEL_OBJ);
mm_module_ctrl(vipnn_ctx, CMD_VIPNN_SET_MODEL_FILE_NAME, (int)NN_MODEL_NAME);
```
Setting the Result Callback
Register a callback with CMD_VIPNN_SET_DISPPOST to receive inference results after each frame. The callback runs in the VIPNN task context - keep it short and non-blocking:
```c
static void nn_display_cb(void *p, void *img_param)
{
    vipnn_out_buf_t *out = (vipnn_out_buf_t *)p;
    objdetect_res_t *res = (objdetect_res_t *)&out->res[0];
    int obj_num = out->res_cnt;

    for (int i = 0; i < obj_num; i++) {
        RTK_LOGI(TAG, "class=%d score=%.2f [%.2f %.2f %.2f %.2f]\r\n",
                 (int)res[i].result[0], res[i].result[1],
                 res[i].result[2], res[i].result[3],
                 res[i].result[4], res[i].result[5]);
    }
}

mm_module_ctrl(vipnn_ctx, CMD_VIPNN_SET_DISPPOST, (int)nn_display_cb);
```
Setting Detection Thresholds
For object detection and face detection models, two post-processing thresholds control result filtering:
```c
static float nn_confidence_thresh = 0.5;   // minimum score to keep a detection
static float nn_nms_thresh = 0.3;          // IoU threshold for NMS suppression

mm_module_ctrl(vipnn_ctx, CMD_VIPNN_SET_CONFIDENCE_THRES, (int)&nn_confidence_thresh);
mm_module_ctrl(vipnn_ctx, CMD_VIPNN_SET_NMS_THRES, (int)&nn_nms_thresh);
```
Increasing nn_confidence_thresh reduces false positives but also discards genuine detections that score below the new threshold. Increasing nn_nms_thresh allows detections with higher bounding-box overlap to coexist, which helps in crowded scenes but can leave duplicate boxes on a single object.
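To make the effect of the two thresholds concrete, here is a minimal greedy non-maximum suppression over normalised boxes. It is a generic textbook sketch, not the SDK's post-processing code: det_t, iou() and nms() are hypothetical, and the suppression is class-agnostic, whereas many YOLO decoders suppress per class.

```c
#include <stddef.h>

/* Hypothetical detection record: normalised corners plus score and class. */
typedef struct {
    float x0, y0, x1, y1;
    float score;
    int class_id;
} det_t;

/* Intersection-over-union of two boxes given as normalised corners. */
static float iou(const det_t *a, const det_t *b)
{
    float ix0 = a->x0 > b->x0 ? a->x0 : b->x0;
    float iy0 = a->y0 > b->y0 ? a->y0 : b->y0;
    float ix1 = a->x1 < b->x1 ? a->x1 : b->x1;
    float iy1 = a->y1 < b->y1 ? a->y1 : b->y1;
    float iw = ix1 - ix0, ih = iy1 - iy0;
    if (iw <= 0.0f || ih <= 0.0f)
        return 0.0f;                                   /* no overlap */
    float inter = iw * ih;
    float area_a = (a->x1 - a->x0) * (a->y1 - a->y0);
    float area_b = (b->x1 - b->x0) * (b->y1 - b->y0);
    return inter / (area_a + area_b - inter);
}

/*
 * Greedy NMS, in place. Drops boxes scoring below conf_thresh, then repeatedly
 * keeps the best remaining box and removes every other box whose IoU with it
 * exceeds nms_thresh. Kept boxes end up at the front of dets, best first.
 */
static size_t nms(det_t *dets, size_t n, float conf_thresh, float nms_thresh)
{
    size_t m = 0;
    for (size_t i = 0; i < n; i++)                     /* confidence filter */
        if (dets[i].score >= conf_thresh)
            dets[m++] = dets[i];

    size_t kept = 0;
    while (kept < m) {
        size_t best = kept;                            /* highest remaining score */
        for (size_t i = kept + 1; i < m; i++)
            if (dets[i].score > dets[best].score)
                best = i;
        det_t tmp = dets[kept]; dets[kept] = dets[best]; dets[best] = tmp;

        size_t w = kept + 1;                           /* compact the survivors */
        for (size_t i = kept + 1; i < m; i++)
            if (iou(&dets[kept], &dets[i]) <= nms_thresh)
                dets[w++] = dets[i];
        m = w;
        kept++;
    }
    return kept;
}
```

With the default thresholds (0.5 confidence, 0.3 IoU), two heavily overlapping boxes on the same object collapse to one; raising the IoU threshold to 0.9 would keep both.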
Filtering by Class ID
Use CMD_VIPNN_SET_DESIRED_CLASS to restrict output to a specific object class ID. This is useful when running a multi-class model (such as YOLO trained on COCO) but the application only needs one class - for example, detecting only people (class 0 in COCO):
```c
static int desired_class = 0;   // 0 = person in COCO
mm_module_ctrl(vipnn_ctx, CMD_VIPNN_SET_DESIRED_CLASS, (int)&desired_class);
```
Cascaded Mode
Cascaded mode connects two VIPNN module instances in series. The result of a first-stage model (e.g. SCRFD face detection) is passed directly as input to a second-stage model (e.g. MobileFaceNet face recognition), enabling a detect-then-recognise pipeline without writing custom glue code between the two stages.
Use CMD_VIPNN_SET_CASCADE on the downstream VIPNN module to set its cas_mode and make it a cascaded consumer. The upstream frame size and optional model_width / model_height are propagated to the cascaded input. MobileFaceNet derives its face crop ROI from the preceding SCRFD detection result inside its model preprocessing, while the landmarks are carried in nn_data_param_t.img.landmark for face alignment. Refer to the face recognition example in the SDK for the complete two-module setup.
The SDK face recognition example uses these additional VIPNN controls:
```c
mm_module_ctrl(facedet_ctx, CMD_VIPNN_SET_OUTPUT, 1);
mm_module_ctrl(facedet_ctx, MM_CMD_SET_DATAGROUP, MM_GROUP_START);

mm_module_ctrl(facenet_ctx, CMD_VIPNN_SET_CASCADE, VIPNN_CMODE_ALL_ROI);
mm_module_ctrl(facenet_ctx, CMD_VIPNN_SET_OUTPUT, 1);
mm_module_ctrl(facenet_ctx, MM_CMD_SET_DATAGROUP, MM_GROUP_END);
```
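CMD_VIPNN_SET_OUTPUT enables module output so that each stage's results continue downstream: SCRFD detections feed MobileFaceNet, and MobileFaceNet features feed the next module (facerecog_module in the SDK example). VIPNN_CMODE_ALL_ROI runs MobileFaceNet once for each detected face ROI rather than only for the first ROI.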
facerecog Module
facerecog_module consumes MobileFaceNet feature results, compares them with a registered identity database, and calls an application-provided draw callback with names and bounding boxes. It is compiled into the MMF module list as module_facerecog.c and is used by mmf2_video_example_nn_face_recognition_init.c.
The module stores up to MAX_FRC_REG_NUM (20) registered identities in RAM. CMD_FRC_SAVE_FEATURES writes them to vfs:/face_feature.bin with a CRC; CMD_FRC_LOAD_FEATURES reloads the file at runtime.
```c
facerecog_ctx = mm_module_open(&facerecog_module);
mm_module_ctrl(facerecog_ctx, CMD_FRC_SET_THRES100, 99);
mm_module_ctrl(facerecog_ctx, CMD_FRC_SET_OSD_DRAW, (int)face_recognition_draw_object);
mm_module_ctrl(facerecog_ctx, CMD_FRC_LOAD_FEATURES, 0);
```
| Command | Description |
|---|---|
| | Set similarity threshold as an integer percentage. |
| | Register the draw callback that receives |
| | Register the next single detected face under the supplied name. |
| | Return to recognition mode. |
| | Load registered features from |
| | Save registered features to |
| | Clear registered features in RAM. |
| | Print registered identity names. |
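This page does not state which similarity metric facerecog_module uses. Cosine similarity is a common choice for ArcFace-style embeddings, so the sketch below assumes it: it finds the registered identity closest to a query embedding and accepts the match only if the similarity reaches a threshold given as an integer percentage, mirroring the argument of CMD_FRC_SET_THRES100. The face_registry_t struct, dot() and match_identity() are hypothetical, as is the mapping of the percentage onto cosine similarity. To avoid a square root per comparison, it compares squared cosines, which preserves the ordering for positively correlated vectors.

```c
#define FEATURE_DIM 128   /* MAX_FACE_FEATURE_DIM */
#define MAX_IDS     20    /* MAX_FRC_REG_NUM */

/* Hypothetical in-RAM registry of enrolled embeddings. */
typedef struct {
    const char *name[MAX_IDS];
    float feature[MAX_IDS][FEATURE_DIM];
    int count;
} face_registry_t;

static float dot(const float *a, const float *b, int n)
{
    float s = 0.0f;
    for (int i = 0; i < n; i++)
        s += a[i] * b[i];
    return s;
}

/*
 * Return the index of the registered identity with the highest cosine
 * similarity to `query`, or -1 if none reaches thres100 / 100.
 * Only positively correlated candidates (dot > 0) are considered.
 */
static int match_identity(const face_registry_t *reg, const float *query, int thres100)
{
    float qq = dot(query, query, FEATURE_DIM);
    float t = thres100 / 100.0f;
    int best = -1;
    float best_cos2 = 0.0f;

    if (qq <= 0.0f)
        return -1;                              /* all-zero query has no direction */
    for (int i = 0; i < reg->count; i++) {
        const float *r = reg->feature[i];
        float d = dot(query, r, FEATURE_DIM);
        float rr = dot(r, r, FEATURE_DIM);
        if (d <= 0.0f || rr <= 0.0f)
            continue;                           /* negative or undefined similarity */
        float cos2 = (d * d) / (qq * rr);       /* cos^2 of the angle */
        if (cos2 > best_cos2) {
            best_cos2 = cos2;
            best = i;
        }
    }
    return (best >= 0 && best_cos2 >= t * t) ? best : -1;
}
```

With a threshold of 99, only a near-identical embedding direction is accepted; lowering the percentage trades false rejections for false acceptances.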
Module Command Reference
| Command | Description |
|---|---|
| | Set the model object ( |
| | Set the model binary file path string (e.g. |
| | Set input frame descriptor ( |
| | Register result callback ( |
| | Set sizeof one result structure |
| | Set maximum number of results per frame |
| | Set detection confidence threshold ( |
| | Set NMS IoU threshold ( |
| | Filter output to a specific object class ID ( |
| | Enable module output so downstream MMF modules can consume VIPNN results |
| | Select normal or raw VIPNN output |
| | Enable cascaded mode ( |
| | Store output tensors for debugging custom post-processing |
| | Use an application-provided output buffer |
| | Apply configuration and start the VIPNN module |
For NN media example usage, see Media Example.
NN Memory Layout
The NPU uses a dedicated region in DDR for model weights, input/output tensors, and intermediate computation buffers. The NN DDR window is defined in the linker script ameba_layout.ld:
```
NN_DDR (rwx) : ORIGIN = 0x87400000, LENGTH = 12M
```
Address range: 0x87400000 - 0x87FFFFFF (12 MB)
The complete DDR memory map for reference:
| Region | Base | Size | Usage |
|---|---|---|---|
| | 0x80420000 | ~28 MB | Application code, heap, stack |
| | 0x82000000 | 32 MB | Encoder (H.264/HEVC) working memory |
| | 0x84000000 | 48 MB | Video processor (ISP) working memory |
| | 0x87000000 | 4 MB | Tile scaler / graphics engine |
| | 0x87400000 | 12 MB | NPU model + tensor buffers |
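Each window in the map ends exactly where the next begins, so the layout can be sanity-checked with a few lines of arithmetic, for example after resizing NN_DDR and shrinking a neighbouring region. The sketch encodes the table above; region names are not listed, so the Usage text stands in as a label, and the ~28 MB application region is sized as the gap up to 0x82000000. ddr_region_t and regions_contiguous() are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MB (1024u * 1024u)

typedef struct {
    const char *usage;   /* Usage column of the memory map */
    uint32_t base;
    uint32_t size;
} ddr_region_t;

static const ddr_region_t ddr_map[] = {
    { "Application code, heap, stack", 0x80420000u, 0x82000000u - 0x80420000u }, /* ~28 MB */
    { "Encoder working memory",        0x82000000u, 32u * MB },
    { "ISP working memory",            0x84000000u, 48u * MB },
    { "Tile scaler / graphics engine", 0x87000000u,  4u * MB },
    { "NPU model + tensor buffers",    0x87400000u, 12u * MB },  /* NN_DDR */
};

/* True if every region starts exactly where the previous one ends. */
static bool regions_contiguous(const ddr_region_t *r, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (r[i - 1].base + r[i - 1].size != r[i].base)
            return false;
    return true;
}
```

The last entry also confirms the NN_DDR range quoted earlier: 0x87400000 + 12 MB - 1 = 0x87FFFFFF.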
NPU Hardware Reference
Compute Performance
| Data type | MACs / cycle | Notes |
|---|---|---|
| INT8 | 768 MACs -> ~1 TOPS | Default; best throughput |
| INT16 (DFP) | 192 MACs | Higher numerical precision |
| FP16 / BF16 | 384 MACs | Floating-point; used for PPU-side layers |
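The ~1 TOPS figure follows from the MAC count and the 600 MHz clock quoted in the Overview if each MAC is counted as two operations (one multiply, one add), a common rating convention that is assumed here: 768 x 2 x 600 MHz = 0.92 TOPS. The same arithmetic gives theoretical peaks for the other data types; real models typically reach less, since memory bandwidth and PPU-executed layers also limit throughput. peak_tops() is a hypothetical helper.

```c
/*
 * Theoretical peak throughput in TOPS, counting each multiply-accumulate
 * as two operations.
 */
static double peak_tops(unsigned macs_per_cycle, double clock_hz)
{
    return macs_per_cycle * 2.0 * clock_hz / 1e12;
}
```

At 600 MHz this gives about 0.92 TOPS for INT8 (768 MACs), 0.46 TOPS for FP16 / BF16 (384 MACs), and 0.23 TOPS for INT16 DFP (192 MACs).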
NPU Architecture
The NPU consists of three compute subsystems relevant to software developers:
- Neural Network Engine (NNE)
A parallel MAC array with multiple convolution cores responsible for convolution, depthwise convolution, and GEMM (fully-connected) operations. This is the primary accelerator for standard DNN layers. Supports INT8, INT16, FP16, and BF16.
- Parallel Processing Unit (PPU)
A SIMD programmable execution unit with an IEEE 32-bit floating-point pipeline. It handles:
  - Pre- and post-processing kernels (OpenCL / OpenVX)
  - Custom NN layers not natively supported by the NNE
  - Activation, normalisation, reshape, and other lightweight operators
- Vision Engine (EVIS)
Hardware-accelerated image processing primitives: 3x3 filtering, bilinear interpolation (Lerp), histogram, packed image load/store, and dot products. Used by the runtime driver for input format conversion.
The NPU communicates with the SoC via an AXI bus and supports virtual memory with 32-bit physical addressing.
Supported Quantisation Formats
| Format | Description | When to use |
|---|---|---|
| | Asymmetric unsigned INT8 | Maximum throughput; default for most models |
| | Symmetric INT16 (Dynamic Fixed Point) | Higher accuracy requirements |
| | IEEE 16-bit floating point | Mixed-precision or PPU-executed layers |
| | Brain floating-point 16-bit tensor format | Floating-point outputs decoded by SDK utils |
Supported Input Image Formats
The NPU natively processes the following image formats without extra conversion cost:
| Format | Description |
|---|---|
| | 24-bit RGB888, 3-channel interleaved (BT.709) |
| | YUV 4:2:0 semi-planar - native ISP output format |
| | 32-bit RGBX (R, G, B + don’t-care byte) |
| | Unsigned 8-bit single-channel |
| | Signed 16-bit single-channel |
In this video pipeline, the V5 ISP channel outputs RGB888 frames that feed directly into the VIPNN module without conversion.
Supported NN Layer Types
The NNE accelerates the standard DNN layer types listed below. Custom or unsupported layers fall back to the PPU.
| Category | Operations |
|---|---|
| Convolution | CONV3D, CONV2D, CONV1D, DECONVOLUTION, DECONVOLUTION1D, GROUPED_CONV2D, FCL2 |
| Activation | RELU, LEAKY_RELU, PRELU, SIGMOID, TANH, SOFTMAX, LOG_SOFTMAX, SWISH, MISH, ELU, HARD_SIGMOID, CLIP, EXP, LOG, SQRT, RSQRT, ABS, NEG, LINEAR, SIN, ERF |
| Elementwise | ADD, SUBTRACT, MULTIPLY, DIVIDE, MAXIMUM, MINIMUM, POW, FLOORDIV, MATRIXMUL, RELATIONAL_OPS, LOGICAL_OPS, SELECT, ADDN |
| Normalisation | BATCH_NORM, LAYER_NORM, INSTANCE_NORM, GROUP_NORM, L2_NORMALIZE, MOMENTS |
| Reshape / Tensor | CONCAT, SLICE, SPLIT, RESHAPE, SQUEEZE, PERMUTE, PAD, REVERSE, SPACE2DEPTH, DEPTH2SPACE, BATCH2SPACE, SPACE2BATCH, STRIDED_SLICE, REDUCE, ARGMAX, ARGMIN, SHUFFLECHANNEL, RESIZE, EXPAND_BROADCAST |
| Recurrent (RNN) | LSTMUNIT, GRUCELL, GRU, SVDF |
| Pooling | MAX_POOL, AVG_POOL, ROI_POOL, POOLWITHARGMAX, UPSAMPLE |
| Miscellaneous | PROPOSAL, VARIABLE, DROPOUT, STACK, UNSTACK, REORG, GATHER, SCATTER_ND, ONE_HOT, CAST |
Software Stack
The NPU runtime exposes the following APIs to application software:
- OpenVX 1.3 + OpenVX 1.2 Neural Network Extension - primary NN inference API
- OpenCL 3.0 / 1.2 Full Profile - for custom compute kernels on the PPU
- Proprietary Extensions for CNN - vendor extensions for NN acceleration and custom layers
- VIP Lite API (vip_lite.h) - low-level NPU control, used internally by vipnn_module
Neural network models trained in common AI frameworks (Keras, TensorFlow, TFLite, PyTorch, Caffe, ONNX, Darknet) are converted offline to a compiled network binary (.nb) using the Acuity Toolkit, then deployed at runtime from the LittleFS flash partition (vfs:) or SD card (sd:).
Custom Model Conversion
If you have a custom-trained model and need to convert it to the .nb format for use on the NPU, please contact your Realtek representative or sales contact to request further assistance.