YOLO batch inference on Axon board

I have converted yolov8n with batching,
but when I test it, it says "segmentation fault (core dumped)".

If you want to do batch processing of CNN models on the Axon board, the recommended and efficient way to batch-process across multiple cores is to pass the batch size to the build function of the RKNN toolkit, as: ret = rknn.build(rknn_batch_size=batch_size, ...). Optionally, to use multiple cores, pass core_mask=RKNNLite.NPU_CORE_0_1_2 to the init_runtime function when initializing the model for use on the Axon.
Have you tried it this way? Or, if you are tinkering with the input shape directly, have you exported the ONNX model to accept dynamic input and set the shape in the input_size_list parameter of the load_onnx function? If you have modified the input shape, it is advisable to pass the data_format parameter ("nhwc" or "nchw") to the inference function of the RKNNLite instance when performing inference.
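For reference, the two key calls look like this (a minimal sketch; the model and dataset names are placeholders for your own):

# Host side (rknn-toolkit2): bake the batch size into the .rknn at build time
ret = rknn.build(do_quantization=True, dataset='dataset.txt', rknn_batch_size=8)

# Board side (rknn-toolkit-lite2): run the model on all three NPU cores of the RK3588
ret = rknn_lite.init_runtime(core_mask=RKNNLite.NPU_CORE_0_1_2)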

I just took the input shape directly,
but for a fixed batch size.
Do I need to change it to dynamic? And for the data format I took nchw.
The batch FPS I was getting was 16 with batch 8, and the detection results were also incorrect.

The default value of the data_format parameter is "nhwc", so if you change it, make sure to pass it as a parameter; otherwise you will receive wrong output.
You don't really need to make it dynamic to do batch processing, but the input shapes passed to the RKNN toolkit should match the ONNX model's input requirements. So it is easiest not to tamper with the input shape configuration and just pass the rknn_batch_size parameter in the build function. Also, as mentioned above, the cores to be utilized can be passed to the init_runtime function via the core_mask parameter (use RKNNLite.NPU_CORE_0_1_2 to utilize all of them).
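To make the data_format point concrete, here is a minimal sketch (assuming a batch-1 model and an already-initialized RKNNLite instance rknn_lite; test.jpg is a placeholder):

import cv2

# "nhwc" is the default: pass the image exactly as cv2 reads it, (H, W, C) uint8,
# and let the runtime do the transpose and mean/std normalization internally
img = cv2.imread('test.jpg')
outputs = rknn_lite.inference(inputs=[img])  # data_format='nhwc' implied

# Only if you transpose to (C, H, W) yourself do you need to say so explicitly:
# outputs = rknn_lite.inference(inputs=[img_chw], data_format='nchw')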

If we run yolov8n with batch 8 on 5 streams,
what will the FPS be?

I haven't benchmarked batching on yolov8n myself yet, but I have achieved 90+ net FPS in Python and 100+ FPS in C++ using all cores of the NPU and batch_size=1. And batching often increases the FPS, so it is safe to say it will be more than 90 FPS in Python (which you're using, I assume).

Can you share your scripts for conversion and inference?
Also, for ONNX, do I set dynamic or static shape input?

Are you performing quantization? If your goal is just to try batching, dynamic shape input is not necessary. And if your goal is to process multiple streams efficiently on multiple cores of the NPU in real time, then I would suggest using multi-threading in C++ instead of batching in Python. You can do multi-threading in Python, but up to Python 3.12 it is not truly parallel because of the GIL. What are your requirements?
For export and inference, a quick guide can be found here, and a detailed API reference can be found here.

My end goal is Python only.
But the thing is, I need to stream 5 cams; YOLO will fetch faces in batches, and I want to use RetinaFace in batch to process the faces, as there are more than 20 detections per frame per cam, so batching will help me.

OK, then while exporting the model, try setting the rknn_batch_size parameter; this should work best. If you are performing quantization, make sure to pass a diverse and ample dataset for quantization. For exporting, follow the code below. And let me know if you have any issues exporting the model.
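A full export script would look roughly like this (a sketch, assuming a yolov8n ONNX exported with batch 1 and a dataset.txt of calibration image paths; the file names and mean/std values are placeholders to adjust for your model):

from rknn.api import RKNN

BATCH_SIZE = 8

rknn = RKNN(verbose=True)

# Preprocessing baked into the model: mean 0 and std 255 map uint8 input to [0, 1]
rknn.config(mean_values=[[0, 0, 0]], std_values=[[255, 255, 255]],
            target_platform='rk3588')

# Load the batch-1 ONNX model; do not touch its input shape here
ret = rknn.load_onnx(model='yolov8n.onnx')
assert ret == 0, 'load_onnx failed'

# Quantize with the calibration images and bake in the batch size
ret = rknn.build(do_quantization=True, dataset='dataset.txt',
                 rknn_batch_size=BATCH_SIZE)
assert ret == 0, 'build failed'

ret = rknn.export_rknn(f'yolov8n_b{BATCH_SIZE}.rknn')
assert ret == 0, 'export_rknn failed'
rknn.release()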

Cool, I'll check all of this and export it.
Can you give me the inference code as well, please?
That would be a big help.

Optimized inference code would depend highly on the number of models in your pipeline and their relative inference times. What I can recommend is multi-threading: run the different steps of the pipeline in different threads, scheduled according to how long each step takes. The only changes you can make on the NPU side are which cores to use and which data_format to pass. For the data format, I would recommend passing input in "nhwc" format as read by cv2, and letting the rknnlite toolkit do the re-orienting to "nchw" and the normalization by mean and variance. Or try the different formats and choose the fastest one.
For choosing the core, the core_mask parameter can be set to RKNNLite.NPU_CORE_0_1_2 to use all cores for the same inference task, or to RKNNLite.NPU_CORE_AUTO in the multi-threading case to auto-pick any one available core. A simple code example briefly explaining all the functions to use is present here, and a threading sketch follows below. Multi-threading and batch processing will definitely impact the FPS, but by how much depends on your pipeline.
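As an illustration of the multi-threading idea in Python (a sketch only; the model path is a placeholder, per_stream_frames stands in for your camera feeds, and each thread owns its own RKNNLite instance):

import threading
from rknnlite.api import RKNNLite

per_stream_frames = [...]  # hypothetical: one list of frames per camera stream

def worker(model_path, frames):
    # One RKNNLite instance per thread; NPU_CORE_AUTO lets the driver pick a free core
    lite = RKNNLite()
    lite.load_rknn(model_path)
    lite.init_runtime(core_mask=RKNNLite.NPU_CORE_AUTO)
    for frame in frames:
        outputs = lite.inference(inputs=[frame])  # 'nhwc' input, as read by cv2
        # ... post-process outputs here ...
    lite.release()

threads = [threading.Thread(target=worker, args=('yolov8n.rknn', frames))
           for frames in per_stream_frames]
for t in threads:
    t.start()
for t in threads:
    t.join()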

For export, if I put 100 images in dataset.txt, it takes them one by one
and says an "RKNN failed" error. For batching, how should I send data in dataset.txt?

At which step did it raise the error? Did it not raise the error there earlier, when batching was not used? And what is the complete error?

It was raised before, but I changed the jpgs to npy files for the 8 batches.
But when I run inference, it gives a batch FPS of 17.

E RKNN: [00:44:13.081] rknn_inputs_set, param input size(4915200) < model input size(39321600)
E inference: Traceback (most recent call last):
  File "rknn/api/rknn_log.py", line 344, in rknn.api.rknn_log.error_catch_decorator.error_catch_wrapper
  File "rknn/api/rknn_base.py", line 2772, in rknn.api.rknn_base.RKNNBase.inference
  File "rknn/api/rknn_runtime.py", line 582, in rknn.api.rknn_runtime.RKNNRuntime.set_inputs
Exception: Set inputs failed. error code: RKNN_ERR_PARAM_INVALID

I ===================== WARN(0) =====================
E rknn-toolkit2 version: 2.3.2
Traceback (most recent call last):
  File "rknn/api/rknn_log.py", line 344, in rknn.api.rknn_log.error_catch_decorator.error_catch_wrapper
  File "rknn/api/rknn_base.py", line 2772, in rknn.api.rknn_base.RKNNBase.inference
  File "rknn/api/rknn_runtime.py", line 582, in rknn.api.rknn_runtime.RKNNRuntime.set_inputs
Exception: Set inputs failed. error code: RKNN_ERR_PARAM_INVALID

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/vicharak/Axon-NPU-Guide/examples/yolo_models/test2.py", line 129, in <module>
    outputs = rknn.inference(inputs=[batch_inputs], data_format='nchw')
  File "/home/vicharak/miniforge3/envs/rknn/lib/python3.10/site-packages/rknn/api/rknn.py", line 314, in inference
    return self.rknn_base.inference(inputs=inputs, data_format=data_format,
  File "rknn/api/rknn_log.py", line 349, in rknn.api.rknn_log.error_catch_decorator.error_catch_wrapper
  File "rknn/api/rknn_log.py", line 95, in rknn.api.rknn_log.RKNNLog.e
ValueError: Traceback (most recent call last):
  File "rknn/api/rknn_log.py", line 344, in rknn.api.rknn_log.error_catch_decorator.error_catch_wrapper
  File "rknn/api/rknn_base.py", line 2772, in rknn.api.rknn_base.RKNNBase.inference
  File "rknn/api/rknn_runtime.py", line 582, in rknn.api.rknn_runtime.RKNNRuntime.set_inputs
Exception: Set inputs failed. error code: RKNN_ERR_PARAM_INVALID

FATAL: exception not rethrown
Aborted (core dumped)

When I do it with npy,
it says:
/home/vicharak/miniforge3/envs/rknn/lib/python3.10/site-packages/rknn/api/rknn.py:51: UserWarning: pkg_resources is deprecated as an API. See Package Discovery and Resource Access using pkg_resources - setuptools 80.9.0 documentation. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.

self.rknn_base = RKNNBase(cur_path, verbose)

I rknn-toolkit2 version: 2.3.2

I target set by user is: rk3588

Model expects: batch=8, channels=3, HxW=640x640

FPS: 1.12
FPS: 1.12
FPS: 1.12
FPS: 1.11
FPS: 1.12
FPS: 1.12
FPS: 1.12
FPS: 1.11
FPS: 1.12
FPS: 1.12
FPS: 1.13
FPS: 1.14
FPS: 1.14
FPS: 1.14
FPS: 1.14
FPS: 1.14
FPS: 1.15
FPS: 1.16
FPS: 1.16

Code:

import cv2
import numpy as np
import onnx
from rknn.api import RKNN
from imutils.video import FPS
import time

# -------------------------
# Config
# -------------------------
RKNN_MODEL = 'best-new640.rknn'
ONNX_MODEL = 'best-new640.onnx'
OBJ_THRESH = 0.25
NMS_THRESH = 0.45

CLASSES = (
    "laptop", "Bike", "Car", "cattle", "fire", "Bus", "Smartphone",
    "glasses", "bottle", "Auto", "book", "smoke", "Number_plate",
    "tractor", "Truck", "bag", "Face", "pencilcase", "Person",
    "helmet", "machine"
)
CLASSES = tuple(c.strip() for c in CLASSES)

# -------------------------
# Helper functions (letterbox, post_process_rknn, nms_boxes, draw)
# Use your existing implementations
# -------------------------

# -------------------------
# Load ONNX to get input shape
# -------------------------
onnx_model = onnx.load(ONNX_MODEL)
input_shape = [dim.dim_value for dim in onnx_model.graph.input[0].type.tensor_type.shape.dim]
batch_size, channels, height, width = input_shape
print(f"Model expects: batch={batch_size}, channels={channels}, HxW={height}x{width}")

# -------------------------
# Initialize RKNN
# -------------------------
rknn = RKNN(verbose=True)
rknn.load_rknn(RKNN_MODEL)
rknn.init_runtime(target='rk3588')

# -------------------------
# Initialize Webcam / RTSP
# -------------------------
cap = cv2.VideoCapture('rtsp://admin:tdbtech4189@192.168.1.250:554')

# Start FPS tracker
fps_tracker = FPS().start()

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Preprocess
    img_resized, _, _ = letterbox(frame, new_shape=(height, width))
    img_input = img_resized.astype(np.float32) / 255.0
    img_input = np.transpose(img_input, (2, 0, 1))  # HWC -> CHW
    img_input = np.expand_dims(img_input, axis=0)   # batch=1

    # Replicate to match batch if model requires batch > 1
    if batch_size > 1:
        img_input = np.vstack([img_input for _ in range(batch_size)])

    # RKNN inference
    outputs = rknn.inference(inputs=[img_input], data_format='nchw')
    output_single = outputs[0]  # first in batch
    num_classes = len(CLASSES)

    # Post-process
    boxes, class_ids, scores = post_process_rknn(
        output_single, frame.shape, num_classes, img_size=(height, width),
        obj_thresh=OBJ_THRESH, nms_thresh=NMS_THRESH
    )

    # Draw results
    img_out = draw(frame.copy(), boxes, scores, class_ids)

    # Update FPS
    fps_tracker.update()

    # Print current running average FPS per frame
    print(f"FPS (per-frame): {fps_tracker.fps():.2f}")

# Stop FPS tracker
fps_tracker.stop()
print(f"Average FPS: {fps_tracker.fps():.2f}")

cap.release()
rknn.release()
cv2.destroyAllWindows()

If you are modifying rknn_batch_size only and not the inherent input shape, then don't pass the dataset input shaped as "nhwc" or "nchw" batches; instead pass single images as .npy data, or just pass the list of image paths in the dataset.txt file and let the RKNN toolkit handle it itself. The batch size has no effect on quantization, as the dataset is only used for calibrating the activations, so don't pass input in batches while quantizing.
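For example, dataset.txt stays a plain list of single calibration images, one path per line (hypothetical paths shown), whatever rknn_batch_size you build with:

images/calib_0001.jpg
images/calib_0002.jpg
images/calib_0003.jpg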

Do I need to export the YOLO .pt weights with batch 8 or batch 1?
You know the end goal is batching.

Are you using YOLO models trained on a custom dataset, or a model pretrained on the COCO dataset? If you are using the pretrained one, use the model from the rknn_model_zoo. Export the ONNX model with just batch 1 and configure the batch in the build function. In dataset.txt, just pass the list of image paths; that is enough for any batch size, even if the number of images is not a multiple of the batch size.
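For the .pt-to-ONNX step with custom weights, a plain batch-1 export is enough (a sketch using the standard ultralytics exporter; note the rknn_model_zoo ships its own modified export script for the pretrained models, and best.pt is a placeholder):

from ultralytics import YOLO

# Static batch-1 ONNX export; rknn_batch_size handles the batching at build time
model = YOLO('best.pt')
model.export(format='onnx', imgsz=640)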