While doing inference on the Axon board, use rknnlite instead of rknn. The rknn API is for converting the model and simulating it to get outputs. You may follow this for using rknnlite.api: link.
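For example, a minimal rknnlite sketch (the model path and input here are placeholders, not your actual files):

```python
# Minimal on-board inference sketch with rknnlite (paths/input are placeholders)
from rknnlite.api import RKNNLite
import numpy as np

rknn_lite = RKNNLite()
if rknn_lite.load_rknn('model.rknn') != 0:        # hypothetical model file
    raise SystemExit('load_rknn failed')
if rknn_lite.init_runtime() != 0:
    raise SystemExit('init_runtime failed')

img = np.zeros((1, 640, 640, 3), dtype=np.uint8)  # dummy NHWC input
outputs = rknn_lite.inference(inputs=[img])
rknn_lite.release()
```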
And the FPS you are calculating is the batch FPS, right? So the net FPS should be batch_size * FPS?
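To be clear about the arithmetic I mean (numbers are just illustrative):

```python
# If one batch of 8 frames takes t seconds:
# batch FPS = 1 / t, net (per-frame) FPS = batch_size / t
batch_size = 8
t = 0.5                    # example: 0.5 s per batch
batch_fps = 1.0 / t        # 2 batches per second
net_fps = batch_size / t   # 16 frames per second
print(batch_fps, net_fps)
```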
```
rknn-toolkit2 version: 2.3.2
I Loading : 100%|██████████████████████████████████████████████| 127/127 [00:00<00:00, 25879.44it/s]
E load_onnx: The input shape ['batch', 3, 'height', 'width'] of 'images' is not support!
             Please set the 'inputs' / 'input_size_list' parameters of 'rknn.load_onnx', or set the 'dyanmic_input' parameter of 'rknn.config' to fix the input shape!
I ===================== WARN(0) =====================
E rknn-toolkit2 version: 2.3.2
E load_onnx: Traceback (most recent call last):
  File "rknn/api/rknn_log.py", line 344, in rknn.api.rknn_log.error_catch_decorator.error_catch_wrapper
  File "rknn/api/rknn_base.py", line 1579, in rknn.api.rknn_base.RKNNBase.load_onnx
  File "rknn/api/rknn_base.py", line 708, in rknn.api.rknn_base.RKNNBase._create_ir_and_inputs_meta
  File "rknn/api/rknn_log.py", line 95, in rknn.api.rknn_log.RKNNLog.e
ValueError: The input shape ['batch', 3, 'height', 'width'] of 'images' is not support!
           Please set the 'inputs' / 'input_size_list' parameters of 'rknn.load_onnx', or set the 'dyanmic_input' parameter of 'rknn.config' to fix the input shape!
I ===================== WARN(0) =====================
E rknn-toolkit2 version: 2.3.2
Traceback (most recent call last):
  File "rknn/api/rknn_log.py", line 344, in rknn.api.rknn_log.error_catch_decorator.error_catch_wrapper
  File "rknn/api/rknn_base.py", line 1579, in rknn.api.rknn_base.RKNNBase.load_onnx
  File "rknn/api/rknn_base.py", line 708, in rknn.api.rknn_base.RKNNBase._create_ir_and_inputs_meta
  File "rknn/api/rknn_log.py", line 95, in rknn.api.rknn_log.RKNNLog.e
ValueError: The input shape ['batch', 3, 'height', 'width'] of 'images' is not support!
           Please set the 'inputs' / 'input_size_list' parameters of 'rknn.load_onnx', or set the 'dyanmic_input' parameter of 'rknn.config' to fix the input shape!

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/vicharak/conv.py", line 21, in <module>
    ret = rknn.load_onnx(model=onnx_model_path)
  File "/home/vicharak/miniforge3/envs/rknn/lib/python3.10/site-packages/rknn/api/rknn.py", line 168, in load_onnx
    return self.rknn_base.load_onnx(model, inputs, input_size_list, input_initial_val, outputs)
  File "rknn/api/rknn_log.py", line 349, in rknn.api.rknn_log.error_catch_decorator.error_catch_wrapper
  File "rknn/api/rknn_log.py", line 95, in rknn.api.rknn_log.RKNNLog.e
ValueError: Traceback (most recent call last):
  File "rknn/api/rknn_log.py", line 344, in rknn.api.rknn_log.error_catch_decorator.error_catch_wrapper
  File "rknn/api/rknn_base.py", line 1579, in rknn.api.rknn_base.RKNNBase.load_onnx
  File "rknn/api/rknn_base.py", line 708, in rknn.api.rknn_base.RKNNBase._create_ir_and_inputs_meta
  File "rknn/api/rknn_log.py", line 95, in rknn.api.rknn_log.RKNNLog.e
ValueError: The input shape ['batch', 3, 'height', 'width'] of 'images' is not support!
           Please set the 'inputs' / 'input_size_list' parameters of 'rknn.load_onnx', or set the 'dyanmic_input' parameter of 'rknn.config' to fix the input shape!
```
Please help: this happens when I am converting a dynamic-input ONNX model (YOLOv8) to RKNN.
When I give the model an input of batch 1:
```
(rknn) vicharak@vicharak:~$ python3 conv.py
/home/vicharak/miniforge3/envs/rknn/lib/python3.10/site-packages/rknn/api/rknn.py:51: UserWarning: pkg_resources is deprecated as an API. See Package Discovery and Resource Access using pkg_resources - setuptools 80.9.0 documentation. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  self.rknn_base = RKNNBase(cur_path, verbose)
I rknn-toolkit2 version: 2.3.2
I Loading : 100%|██████████████████████████████████████████████| 127/127 [00:00<00:00, 27750.80it/s]
E build: The 'rknn_batch_size' is conflict with model input!
Build model failed
```
Code:

```python
from rknn.api import RKNN

onnx_model_path = 'best-new640.onnx'
rknn_model_path = 'best_new640+.rknn'

# Initialize RKNN
rknn = RKNN()

# Mean and std: replace these with your model's preprocessing values
mu1, mu2, mu3 = 0, 0, 0
del1, del2, del3 = 1, 1, 1

# Configure RKNN
rknn.config(
    mean_values=[[mu1, mu2, mu3]],
    std_values=[[del1, del2, del3]],
    target_platform='rk3588'
)

# Load ONNX model
ret = rknn.load_onnx(model=onnx_model_path)
if ret != 0:
    print('Could not load ONNX model')
    exit(ret)

# Quantization & batching setup
quantization_flag = True
dataset_path = 'dataset.txt'  # Only required if quantization is True
batch_size = 8
auto_hybrid_flag = True  # Enable hybrid quantization (optional)

# Build RKNN model
ret = rknn.build(
    do_quantization=quantization_flag,
    dataset=dataset_path if quantization_flag else None,
    rknn_batch_size=batch_size,
    auto_hybrid=auto_hybrid_flag
)
if ret != 0:
    print('Build model failed')
    exit(ret)
else:
    print('Build model done')

# Export RKNN model
ret = rknn.export_rknn(rknn_model_path)
if ret != 0:
    print('Export model failed')
    exit(ret)
else:
    print(f'RKNN model exported: {rknn_model_path}')
```
When I set batch size to 1, it gives me this:
```
I OpFusing 0 : 100%|█████████████████████████████████████████████| 100/100 [00:00<00:00, 318.28it/s]
I OpFusing 1 : 100%|█████████████████████████████████████████████| 100/100 [00:00<00:00, 152.47it/s]
I OpFusing 0 : 100%|██████████████████████████████████████████████| 100/100 [00:01<00:00, 89.45it/s]
I OpFusing 1 : 100%|██████████████████████████████████████████████| 100/100 [00:01<00:00, 78.69it/s]
I OpFusing 0 : 100%|██████████████████████████████████████████████| 100/100 [00:01<00:00, 67.30it/s]
I OpFusing 1 : 100%|██████████████████████████████████████████████| 100/100 [00:01<00:00, 64.93it/s]
I OpFusing 2 : 100%|██████████████████████████████████████████████| 100/100 [00:06<00:00, 14.96it/s]
I GraphPreparing : 100%|█████████████████████████████████████████| 176/176 [00:00<00:00, 931.98it/s]
I Quantizating 1/12: 0%| | 0/176 [00:00<?, ?it/s]
E build: The input('/home/vicharak/datas/prealigned_0.png') shape (1, 640, 640, 3) is wrong, expect 'nhwc' like (8, 640, 640, 3)!
I ===================== WARN(0) =====================
E rknn-toolkit2 version: 2.3.2
E build: Traceback (most recent call last):
  File "rknn/api/rknn_log.py", line 344, in rknn.api.rknn_log.error_catch_decorator.error_catch_wrapper
  File "rknn/api/rknn_base.py", line 2041, in rknn.api.rknn_base.RKNNBase.build
  File "rknn/api/rknn_base.py", line 175, in rknn.api.rknn_base.RKNNBase._quantize
  File "rknn/api/quantizer.py", line 1419, in rknn.api.quantizer.Quantizer.run
  File "rknn/api/quantizer.py", line 900, in rknn.api.quantizer.Quantizer._get_layer_range
  File "rknn/api/rknn_utils.py", line 328, in rknn.api.rknn_utils.get_input_img
  File "rknn/api/rknn_log.py", line 95, in rknn.api.rknn_log.RKNNLog.e
ValueError: The input('/home/vicharak/datas/prealigned_0.png') shape (1, 640, 640, 3) is wrong, expect 'nhwc' like (8, 640, 640, 3)!
I ===================== WARN(0) =====================
E rknn-toolkit2 version: 2.3.2
Traceback (most recent call last):
  File "rknn/api/rknn_log.py", line 344, in rknn.api.rknn_log.error_catch_decorator.error_catch_wrapper
  File "rknn/api/rknn_base.py", line 2041, in rknn.api.rknn_base.RKNNBase.build
  File "rknn/api/rknn_base.py", line 175, in rknn.api.rknn_base.RKNNBase._quantize
  File "rknn/api/quantizer.py", line 1419, in rknn.api.quantizer.Quantizer.run
  File "rknn/api/quantizer.py", line 900, in rknn.api.quantizer.Quantizer._get_layer_range
  File "rknn/api/rknn_utils.py", line 328, in rknn.api.rknn_utils.get_input_img
  File "rknn/api/rknn_log.py", line 95, in rknn.api.rknn_log.RKNNLog.e
ValueError: The input('/home/vicharak/datas/prealigned_0.png') shape (1, 640, 640, 3) is wrong, expect 'nhwc' like (8, 640, 640, 3)!

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/vicharak/conv.py", line 33, in <module>
    ret = rknn.build(
  File "/home/vicharak/miniforge3/envs/rknn/lib/python3.10/site-packages/rknn/api/rknn.py", line 198, in build
    return self.rknn_base.build(do_quantization=do_quantization, dataset=dataset, expand_batch_size=rknn_batch_size, auto_hybrid=auto_hybrid)
  File "rknn/api/rknn_log.py", line 349, in rknn.api.rknn_log.error_catch_decorator.error_catch_wrapper
  File "rknn/api/rknn_log.py", line 95, in rknn.api.rknn_log.RKNNLog.e
ValueError: Traceback (most recent call last):
  File "rknn/api/rknn_log.py", line 344, in rknn.api.rknn_log.error_catch_decorator.error_catch_wrapper
  File "rknn/api/rknn_base.py", line 2041, in rknn.api.rknn_base.RKNNBase.build
  File "rknn/api/rknn_base.py", line 175, in rknn.api.rknn_base.RKNNBase._quantize
  File "rknn/api/quantizer.py", line 1419, in rknn.api.quantizer.Quantizer.run
  File "rknn/api/quantizer.py", line 900, in rknn.api.quantizer.Quantizer._get_layer_range
  File "rknn/api/rknn_utils.py", line 328, in rknn.api.rknn_utils.get_input_img
  File "rknn/api/rknn_log.py", line 95, in rknn.api.rknn_log.RKNNLog.e
ValueError: The input('/home/vicharak/datas/prealigned_0.png') shape (1, 640, 640, 3) is wrong, expect 'nhwc' like (8, 640, 640, 3)!
```
When you are converting a dynamic-input ONNX model to an RKNN model, you have to define the shapes that you want. If you want multiple input shapes to be supported, set the dynamic_input parameter of rknn.config(); if you want a single input shape, set the inputs and input_size_list parameters of rknn.load_onnx(). The complete documentation explaining the API and its parameters can be found here.
Since you are trying to convert a dynamic-shape input without providing the set of input shapes to expect, it raises this error. Kindly go through the rknn.load_onnx() and rknn.config() API documentation (link) to understand the parameter requirements.
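For example, a minimal sketch of both options (the shapes and file names here are illustrative, not your actual model):

```python
# Two ways to pin a dynamic ONNX input shape at conversion time
from rknn.api import RKNN

rknn = RKNN()

# Option A: support several fixed shapes via dynamic_input
# (a list of shape sets, one inner list per model input)
rknn.config(
    mean_values=[[0, 0, 0]],
    std_values=[[255, 255, 255]],
    target_platform='rk3588',
    dynamic_input=[[[1, 3, 640, 640]], [[8, 3, 640, 640]]],
)
ret = rknn.load_onnx(model='model.onnx')

# Option B: fix a single shape at load time instead
# ret = rknn.load_onnx(model='model.onnx', inputs=['images'],
#                      input_size_list=[[1, 3, 640, 640]])
```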
I did, and dynamic input is enabled, but:
```
W build: The 'dynamic_input' function is enabled, disable _p_convert_maxpool_to_maxpool_tile!
(the warning above is repeated 9 times during op fusing)
I OpFusing 1 : 100%|██████████████████████████████████████████████| 100/100 [00:01<00:00, 52.05it/s]
I OpFusing 0 : 100%|██████████████████████████████████████████████| 100/100 [00:02<00:00, 49.71it/s]
I OpFusing 1 : 100%|██████████████████████████████████████████████| 100/100 [00:02<00:00, 48.34it/s]
I OpFusing 2 : 100%|██████████████████████████████████████████████| 100/100 [00:02<00:00, 48.03it/s]
I GraphPreparing : 100%|████████████████████████████████████████| 176/176 [00:00<00:00, 3163.93it/s]
I Quantizating 1/12: 0%| | 0/176 [00:00<?, ?it/s]
E build: The input('/home/vicharak/datas/prealigned_0.png') shape (1, 640, 640, 3) is wrong, expect 'nhwc' like (8, 640, 640, 3)!
W build: ===================== WARN(22) =====================
E rknn-toolkit2 version: 2.3.2
E build: Traceback (most recent call last):
  File "rknn/api/rknn_log.py", line 344, in rknn.api.rknn_log.error_catch_decorator.error_catch_wrapper
  File "rknn/api/rknn_base.py", line 2041, in rknn.api.rknn_base.RKNNBase.build
  File "rknn/api/rknn_base.py", line 175, in rknn.api.rknn_base.RKNNBase._quantize
  File "rknn/api/quantizer.py", line 1419, in rknn.api.quantizer.Quantizer.run
  File "rknn/api/quantizer.py", line 900, in rknn.api.quantizer.Quantizer._get_layer_range
  File "rknn/api/rknn_utils.py", line 328, in rknn.api.rknn_utils.get_input_img
  File "rknn/api/rknn_log.py", line 95, in rknn.api.rknn_log.RKNNLog.e
ValueError: The input('/home/vicharak/datas/prealigned_0.png') shape (1, 640, 640, 3) is wrong, expect 'nhwc' like (8, 640, 640, 3)!
I ===================== WARN(0) =====================
E rknn-toolkit2 version: 2.3.2
Traceback (most recent call last):
  File "rknn/api/rknn_log.py", line 344, in rknn.api.rknn_log.error_catch_decorator.error_catch_wrapper
  File "rknn/api/rknn_base.py", line 2041, in rknn.api.rknn_base.RKNNBase.build
  File "rknn/api/rknn_base.py", line 175, in rknn.api.rknn_base.RKNNBase._quantize
  File "rknn/api/quantizer.py", line 1419, in rknn.api.quantizer.Quantizer.run
  File "rknn/api/quantizer.py", line 900, in rknn.api.quantizer.Quantizer._get_layer_range
  File "rknn/api/rknn_utils.py", line 328, in rknn.api.rknn_utils.get_input_img
  File "rknn/api/rknn_log.py", line 95, in rknn.api.rknn_log.RKNNLog.e
ValueError: The input('/home/vicharak/datas/prealigned_0.png') shape (1, 640, 640, 3) is wrong, expect 'nhwc' like (8, 640, 640, 3)!

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/vicharak/conv2.py", line 33, in <module>
    ret = rknn.build(
  File "/home/vicharak/miniforge3/envs/rknn/lib/python3.10/site-packages/rknn/api/rknn.py", line 198, in build
    return self.rknn_base.build(do_quantization=do_quantization, dataset=dataset, expand_batch_size=rknn_batch_size, auto_hybrid=auto_hybrid)
  File "rknn/api/rknn_log.py", line 349, in rknn.api.rknn_log.error_catch_decorator.error_catch_wrapper
  File "rknn/api/rknn_log.py", line 95, in rknn.api.rknn_log.RKNNLog.e
ValueError: Traceback (most recent call last):
  File "rknn/api/rknn_log.py", line 344, in rknn.api.rknn_log.error_catch_decorator.error_catch_wrapper
  File "rknn/api/rknn_base.py", line 2041, in rknn.api.rknn_base.RKNNBase.build
  File "rknn/api/rknn_base.py", line 175, in rknn.api.rknn_base.RKNNBase._quantize
  File "rknn/api/quantizer.py", line 1419, in rknn.api.quantizer.Quantizer.run
  File "rknn/api/quantizer.py", line 900, in rknn.api.quantizer.Quantizer._get_layer_range
  File "rknn/api/rknn_utils.py", line 328, in rknn.api.rknn_utils.get_input_img
  File "rknn/api/rknn_log.py", line 95, in rknn.api.rknn_log.RKNNLog.e
ValueError: The input('/home/vicharak/datas/prealigned_0.png') shape (1, 640, 640, 3) is wrong, expect 'nhwc' like (8, 640, 640, 3)!
```
In this case I guess you are using a statically exported ONNX model with batch_size ('n') = 8. In that case, in the dataset.txt where you provide the quantization dataset, you have to provide preprocessed data of shape (8, 640, 640, 3). This can be achieved by concatenating eight (1, 640, 640, 3) images along axis=0 with np.concatenate((img1, ..., img8), axis=0), saving the result with np.save("dataset_i.npy", batch), and listing the paths to each dataset_i.npy in dataset.txt for quantization.
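For example, a rough sketch of generating such batch-8 .npy files (the folder and file names here are assumptions):

```python
# Build batch-8 calibration .npy files and list them in dataset.txt
import glob
import cv2
import numpy as np

paths = sorted(glob.glob('calib_images/*.png'))  # hypothetical calibration images
with open('dataset.txt', 'w') as f:
    for i in range(0, len(paths) - 7, 8):
        imgs = []
        for p in paths[i:i + 8]:
            img = cv2.imread(p)
            img = cv2.resize(img, (640, 640))
            img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
            imgs.append(img[None, ...])           # (1, 640, 640, 3)
        batch = np.concatenate(imgs, axis=0)      # (8, 640, 640, 3), uint8, NHWC
        np.save(f'dataset_{i // 8}.npy', batch)
        f.write(f'dataset_{i // 8}.npy\n')
```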
It's done, but with dynamic input, not by giving rknn_batch_size to rknn.build(). When I do that:
```python
from rknn.api import RKNN

# Paths
onnx_model_path = 'best-new6408.onnx'
rknn_model_path = 'best-new640d.rknn'

# Mean and std for your model
mu1, mu2, mu3 = 0, 0, 0  # Replace with your actual values
del1, del2, del3 = 1, 1, 1  # Replace with your actual values

rknn = RKNN()

# Configure model for dynamic batch
# Here we support batch sizes 1 to 8 for 640x640 input
rknn.config(
    mean_values=[[mu1, mu2, mu3]],
    std_values=[[del1, del2, del3]],
    target_platform='rk3588',
    dynamic_input=[[[1, 3, 640, 640]], [[8, 3, 640, 640]]]  # list of list of list
)

# Load ONNX
ret = rknn.load_onnx(model=onnx_model_path)
if ret != 0:
    print("Failed to load ONNX model")
    exit(ret)

# Quantization settings
quantization_flag = True
dataset_path = 'dataset.txt'  # images for calibration

# Build model with quantization
ret = rknn.build(
    do_quantization=quantization_flag,
    dataset=dataset_path,
    rknn_batch_size=8
)
if ret != 0:
    print("Build failed")
    exit(ret)

# Export RKNN model
ret = rknn.export_rknn(rknn_model_path)
if ret != 0:
    print("Export failed")
    exit(ret)
print("RKNN model exported successfully!")
```
```
I rknn-toolkit2 version: 2.3.2
W config: Please make sure the model can be dynamic when enable 'config.dynamic_input'!
I The 'dynamic_input' function has been enabled, the MaxShape is dynamic_input[1] = [[8, 3, 640, 640]]!
  The following functions are subject to the MaxShape:
  1. The quantified dataset needs to be configured according to MaxShape
  2. The eval_perf or eval_memory return the results of MaxShape
I Loading : 100%|██████████████████████████████████████████████| 127/127 [00:00<00:00, 13154.46it/s]
E build: The 'rknn_batch_size' is conflict with model input!
Build failed
```
Does it give more FPS with VideoCapture on a video file than on a live feed?
```python
import cv2
import numpy as np
import imutils
import time
from rknnlite.api import RKNNLite
from imutils.video import VideoStream

# -------------------- Config --------------------
rknn_model_path = 'best-new640d.rknn'
video_path = 'rtsp://admin:tdbtech4189@192.168.1.250:554'  # 0 for webcam
BATCH_SIZE = 8
FRAME_WIDTH, FRAME_HEIGHT = 640, 640  # model input
CLASSES = (
    "laptop", "Bike", "Car", "cattle", "fire", "Bus", "Smartphone",
    "glasses", "bottle", "Auto", "book", "smoke", "Number_plate",
    "tractor", "Truck", "bag", "Face", "pencilcase", "Person",
    "helmet", "machine"
)

# -------------------- RKNNLite Setup --------------------
rknn_lite = RKNNLite()
ret = rknn_lite.load_rknn(rknn_model_path)
if ret != 0:
    print("Load RKNN model failed")
    exit(ret)

# Use all 3 NPU cores for batch inference
ret = rknn_lite.init_runtime(core_mask=RKNNLite.NPU_CORE_0_1_2)
if ret != 0:
    print("Init runtime environment failed")
    exit(ret)

# -------------------- Video Capture --------------------
cap = VideoStream(video_path).start()
frame_buffer = []
frame_display_buffer = []

# For FPS calculation
fps_display = 0
fps_count = 0
start_time = time.time()

while True:
    frame = cap.read()
    # Resize frame to model input
    frame_resized = cv2.resize(frame, (FRAME_WIDTH, FRAME_HEIGHT))
    frame_rgb = cv2.cvtColor(frame_resized, cv2.COLOR_BGR2RGB)
    frame_buffer.append(frame_rgb)
    frame_display_buffer.append(frame)

    # When batch is full, run inference
    if len(frame_buffer) == BATCH_SIZE:
        batch_input = np.stack(frame_buffer, axis=0).astype(np.float32)  # nhwc
        outputs = rknn_lite.inference(inputs=[batch_input], data_format="nhwc")
        if outputs is not None:
            batch_out = outputs[0]  # shape: (BATCH_SIZE, num_detections, N)
            for i, dets in enumerate(batch_out):
                for det in dets:
                    if len(det) >= 6:
                        x1, y1, x2, y2, score, class_idx = det[:6]
                        class_idx = int(class_idx)
                        if class_idx >= len(CLASSES):
                            continue
                        class_name = CLASSES[class_idx]
                        # Draw on frame
                        cv2.rectangle(frame_display_buffer[i], (int(x1), int(y1)), (int(x2), int(y2)), (0, 255, 0), 2)
                        cv2.putText(frame_display_buffer[i], f"{class_name}: {score:.2f}",
                                    (int(x1), int(y1) - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

        # Show frames with FPS
        for f in frame_display_buffer:
            fps_count += 1
            if fps_count >= BATCH_SIZE:
                end_time = time.time()
                fps_display = fps_count / (end_time - start_time)
                start_time = end_time
                fps_count = 0
            cv2.putText(f, f"FPS: {fps_display:.2f}", (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
            cv2.imshow("RKNNLite Video", f)

        # Clear buffers
        frame_buffer = []
        frame_display_buffer = []

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Process remaining frames if any
if len(frame_buffer) > 0:
    batch_input = np.stack(frame_buffer, axis=0).astype(np.float32)
    outputs = rknn_lite.inference(inputs=[batch_input], data_format="nhwc")
    if outputs is not None:
        batch_out = outputs[0]
        for i, dets in enumerate(batch_out):
            for det in dets:
                if len(det) >= 6:
                    x1, y1, x2, y2, score, class_idx = det[:6]
                    class_idx = int(class_idx)
                    if class_idx >= len(CLASSES):
                        continue
                    class_name = CLASSES[class_idx]
                    cv2.rectangle(frame_display_buffer[i], (int(x1), int(y1)), (int(x2), int(y2)), (0, 255, 0), 2)
                    cv2.putText(frame_display_buffer[i], f"{class_name}: {score:.2f}",
                                (int(x1), int(y1) - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
            cv2.imshow("RKNNLite Video", frame_display_buffer[i])
            cv2.waitKey(1)

cap.release()
cv2.destroyAllWindows()
rknn_lite.release()
```
I am getting an FPS of just 16, can you check please?
When you are modifying the input shape for batching (via dynamic_input), you don't need to provide the rknn_batch_size parameter as well.
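For example (a sketch; this assumes dynamic_input was already set in rknn.config() as in your script):

```python
# With dynamic_input set at config time, drop rknn_batch_size here;
# the fixed shapes already encode the batch dimension.
ret = rknn.build(
    do_quantization=True,
    dataset='dataset.txt',   # calibration list as before
)
```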
```python
import cv2
import numpy as np
import time
from rknnlite.api import RKNNLite
from imutils.video import VideoStream
from collections import deque

# -------------------- Config --------------------
rknn_model_path = 'best-new640d.rknn'
video_path = 'rtsp://admin:tdbtech4189@192.168.1.250:554'
BATCH_SIZE = 8
MODEL_W, MODEL_H = 640, 640
CLASSES = ("laptop", "Bike", "Car", "cattle", "fire", "Bus", "Smartphone",
           "glasses", "bottle", "Auto", "book", "smoke", "Number_plate",
           "tractor", "Truck", "bag", "Face", "pencilcase", "Person",
           "helmet", "machine")

# -------------------- RKNNLite Setup --------------------
rknn_lite = RKNNLite()
if rknn_lite.load_rknn(rknn_model_path) != 0:
    print("Load RKNN model failed"); exit(1)
if rknn_lite.init_runtime(core_mask=RKNNLite.NPU_CORE_0_1_2) != 0:
    print("Init runtime environment failed"); exit(1)

# -------------------- Letterbox Resize --------------------
def letterbox_image(image, target_size=(MODEL_W, MODEL_H)):
    h, w = image.shape[:2]
    scale = min(target_size[0] / w, target_size[1] / h)
    new_w, new_h = int(w * scale), int(h * scale)
    resized = cv2.resize(image, (new_w, new_h))
    canvas = np.zeros((target_size[1], target_size[0], 3), dtype=np.uint8)
    top = (target_size[1] - new_h) // 2
    left = (target_size[0] - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas, scale, left, top

# -------------------- Video Capture --------------------
cap = VideoStream(video_path).start()
frame_buffer = []
frame_display_buffer = deque(maxlen=30)
fps_display = 0
fps_count = 0
start_time = time.time()

# -------------------- Inference Function --------------------
def run_batch_inference(frames_batch, display_batch, scales_offsets):
    global fps_display, fps_count, start_time
    # Model expects BGR 0-255, NHWC
    batch_input = np.stack(frames_batch, axis=0).astype(np.float32)
    outputs = rknn_lite.inference(inputs=[batch_input], data_format="nhwc")
    if outputs is None:
        return
    batch_out = outputs[0]  # (BATCH_SIZE, num_detections, N)
    for i, dets in enumerate(batch_out):
        scale, dx, dy = scales_offsets[i]
        for det in dets:
            if len(det) >= 6:
                x1, y1, x2, y2, score, class_idx = det[:6]
                class_idx = int(class_idx)
                if class_idx >= len(CLASSES):
                    continue
                class_name = CLASSES[class_idx]
                # Adjust coordinates from letterbox
                x1 = int((x1 - dx) / scale)
                y1 = int((y1 - dy) / scale)
                x2 = int((x2 - dx) / scale)
                y2 = int((y2 - dy) / scale)
                cv2.rectangle(display_batch[i], (x1, y1), (x2, y2), (0, 255, 0), 2)
                cv2.putText(display_batch[i], f"{class_name}: {score:.2f}",
                            (x1, y1 - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
    # FPS display
    for f in display_batch:
        fps_count += 1
        if fps_count >= BATCH_SIZE:
            end_time = time.time()
            fps_display = fps_count / (end_time - start_time)
            start_time = end_time
            fps_count = 0
        cv2.putText(f, f"FPS: {fps_display:.2f}", (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
        cv2.imshow("RKNNLite 640x640 Video", f)

# -------------------- Main Loop --------------------
while True:
    frame = cap.read()
    if frame is None:
        continue
    # Letterbox resize for model input
    frame_resized, scale, dx, dy = letterbox_image(frame, (MODEL_W, MODEL_H))
    frame_buffer.append(frame_resized)
    frame_display_buffer.append(frame.copy())
    if 'scales_offsets_list' not in locals():
        scales_offsets_list = deque(maxlen=30)
    scales_offsets_list.append((scale, dx, dy))

    if len(frame_buffer) == BATCH_SIZE:
        run_batch_inference(
            list(frame_buffer),
            list(frame_display_buffer)[-BATCH_SIZE:],
            list(scales_offsets_list)[-BATCH_SIZE:]
        )
        frame_buffer = []

    # Display last few frames smoothly
    for f in list(frame_display_buffer)[-BATCH_SIZE:]:
        cv2.imshow("RKNNLite 640x640 Video", f)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Process remaining frames
if len(frame_buffer) > 0:
    run_batch_inference(
        list(frame_buffer),
        list(frame_display_buffer)[-len(frame_buffer):],
        list(scales_offsets_list)[-len(frame_buffer):]
    )

cap.release()
cv2.destroyAllWindows()
rknn_lite.release()
```
Can you check this? It gives only 17 FPS.
> Does it give more FPS with VideoCapture on a video file than on a live feed?

What do you mean by this?
Recently I tried modifying the batch size of various types of models to see the change in inference time. What I observed is that for models with a smaller input shape, like 224x224, increasing batch_size increases the net FPS up to a certain batch_size. The same is not true for a fairly large input shape like 640x640. Maybe this has to do with a memory-bandwidth sweet spot: FPS first increases until a certain input size or computation per layer, then after a certain threshold it starts decreasing.
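For reference, the kind of sweep I ran looks roughly like this (model paths are placeholders, not my exact script):

```python
# Rough shape of a batch-size sweep: time N batched inferences per model export
import time
import numpy as np
from rknnlite.api import RKNNLite

for bs, path in [(1, 'model_b1.rknn'), (8, 'model_b8.rknn')]:  # hypothetical exports
    lite = RKNNLite()
    assert lite.load_rknn(path) == 0 and lite.init_runtime() == 0
    batch = np.zeros((bs, 640, 640, 3), dtype=np.uint8)        # dummy NHWC input
    t0 = time.time()
    for _ in range(50):
        lite.inference(inputs=[batch])
    dt = (time.time() - t0) / 50
    print(f'batch={bs}: {bs / dt:.1f} net FPS')
    lite.release()
```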
OK, but after exporting to RKNN, accuracy is the issue: misclassification is occurring.
While providing code, can you please put 3 backticks (```) on a line before the code and on a line after it finishes? This formats the code as an actual code block.
```python
import cv2
import numpy as np
import time
from rknnlite.api import RKNNLite
from imutils.video import VideoStream
from collections import deque

# -------------------- Config --------------------
rknn_model_path = 'best-new640d.rknn'
video_path = 'rtsp://admin:tdbtech4189@192.168.1.250:554'
BATCH_SIZE = 8
FRAME_WIDTH, FRAME_HEIGHT = 640, 640  # model input
CLASSES = ("laptop", "Bike", "Car", "cattle", "fire", "Bus", "Smartphone",
           "glasses", "bottle", "Auto", "book", "smoke", "Number_plate",
           "tractor", "Truck", "bag", "Face", "pencilcase", "Person",
           "helmet", "machine")

# -------------------- RKNNLite Setup --------------------
rknn_lite = RKNNLite()
if rknn_lite.load_rknn(rknn_model_path) != 0:
    print("Load RKNN model failed"); exit(1)
if rknn_lite.init_runtime(core_mask=RKNNLite.NPU_CORE_0_1_2) != 0:
    print("Init runtime environment failed"); exit(1)

# -------------------- Video Capture --------------------
cap = VideoStream(video_path).start()
frame_buffer = []
frame_display_buffer = deque(maxlen=30)
fps_display = 0
fps_count = 0
start_time = time.time()

def run_batch_inference(frames_batch, display_batch):
    global fps_display, start_time, fps_count
    # Normalize to 0-1
    batch_input = np.stack(frames_batch, axis=0).astype(np.float32) / 255.0
    outputs = rknn_lite.inference(inputs=[batch_input], data_format="nhwc")
    if outputs is not None:
        batch_out = outputs[0]  # (BATCH_SIZE, num_detections, N)
        for i, dets in enumerate(batch_out):
            for det in dets:
                if len(det) >= 6:
                    x1, y1, x2, y2, score, class_idx = det[:6]
                    class_idx = int(class_idx)
                    if class_idx >= len(CLASSES):
                        continue
                    class_name = CLASSES[class_idx]
                    # Draw directly on 640x640 resized frame
                    x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)
                    cv2.rectangle(display_batch[i], (x1, y1), (x2, y2), (0, 255, 0), 2)
                    cv2.putText(display_batch[i], f"{class_name}: {score:.2f}",
                                (x1, y1 - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
    # FPS
    for f in display_batch:
        fps_count += 1
        if fps_count >= BATCH_SIZE:
            end_time = time.time()
            fps_display = fps_count / (end_time - start_time)
            start_time = end_time
            fps_count = 0
        cv2.putText(f, f"FPS: {fps_display:.2f}", (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
        cv2.imshow("RKNNLite 640x640 Video", f)

# Main loop
while True:
    frame = cap.read()
    if frame is None:
        continue
    # Resize frame to 640x640 for the model
    frame_resized = cv2.resize(frame, (FRAME_WIDTH, FRAME_HEIGHT))
    frame_rgb = cv2.cvtColor(frame_resized, cv2.COLOR_BGR2RGB)
    frame_buffer.append(frame_rgb)
    frame_display_buffer.append(frame_resized.copy())  # keep a copy for display

    if len(frame_buffer) == BATCH_SIZE:
        run_batch_inference(frame_buffer, list(frame_display_buffer)[-BATCH_SIZE:])
        frame_buffer = []

    # Always display last few frames for smooth playback
    for f in list(frame_display_buffer)[-BATCH_SIZE:]:
        cv2.imshow("RKNNLite 640x640 Video", f)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Process remaining frames
if len(frame_buffer) > 0:
    run_batch_inference(frame_buffer, list(frame_display_buffer)[-len(frame_buffer):])

cap.release()
cv2.destroyAllWindows()
rknn_lite.release()
```
A few common reasons for inaccuracies are (see the sketch after this list):
- The model may expect input in RGB or BGR. cv2 reads images in BGR and PIL reads them in RGB, so you must convert according to the model during preprocessing.
- The model may expect the input to be normalised by a certain mean, std, or scale, but these values are either not passed to rknn.config() and not applied in preprocessing, or they are both passed to config and also applied in preprocessing, so normalisation happens twice.
- While passing the input image, sometimes we pass the input in "nchw" format but forget to change the default "nhwc" data_format parameter accordingly.
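A minimal preprocessing sketch covering the points above (the model path and image are placeholders, and it assumes mean/std were already given to rknn.config() at conversion time, so no normalisation is repeated here):

```python
# Preprocessing sketch: BGR->RGB only if the model expects RGB; raw uint8 NHWC input
import cv2
import numpy as np
from rknnlite.api import RKNNLite

rknn_lite = RKNNLite()
rknn_lite.load_rknn('model.rknn')            # hypothetical model
rknn_lite.init_runtime()

img = cv2.imread('test.png')                 # cv2 gives BGR, uint8
img = cv2.resize(img, (640, 640))
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)   # only if the model expects RGB
batch = img[None, ...]                       # (1, 640, 640, 3), NHWC

# Mean/std were handled at conversion, so pass the raw uint8 array as-is
outputs = rknn_lite.inference(inputs=[batch], data_format='nhwc')
```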
P.S. Please double-check the spelling and grammar of queries before posting; it is sometimes hard to tell what the issue really is.
And for code formatting, use 3 backticks (` is the key in the top-left corner below the Esc key), not the inverted comma (''').
I managed to run batched YOLO on the RK3588 Vicharak board, but the issue is that it lags until a batch fills up.
Also, int8 was not working properly, so I moved to fp16 and it works.
FPS for (8, 3, 640, 640) is around 11, but it lags until it receives 8 frames.
What can we do? I want to run 5 cameras in parallel with batch detection so that processing the bboxes is effective; otherwise there is no point.
```python
import os
import cv2
import time
import numpy as np
from rknnlite.api import RKNNLite
from imutils.video import VideoStream
from collections import deque

# ---------------- CONFIG ----------------
MODEL_PATH = "yolo11s_fp16_dynamic.rknn"
RTSP_URL = "rtsp://admin:tdbtech4189@192.168.1.250:554"
IMG_SIZE = 640
CONF_THRESH = 0.30
IOU_THRESH = 0.45
BATCH_SIZE = 8
FRAME_SKIP = 0  # Process every n-th frame for speed

CLASSES = ("person", "bicycle", "car", "motorbike", "aeroplane", "bus", "train", "truck", "boat", "traffic light",
           "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow", "elephant",
           "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard", "sports ball", "kite",
           "baseball bat", "baseball glove", "skateboard", "surfboard", "tennis racket", "bottle", "wine glass", "cup", "fork", "knife",
           "spoon", "bowl", "banana", "apple", "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "sofa",
           "pottedplant", "bed", "diningtable", "toilet", "tvmonitor", "laptop", "mouse", "remote", "keyboard", "cell phone", "microwave",
           "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors", "teddy bear", "hair drier", "toothbrush")

# ---------------- FUNCTIONS ----------------
def preprocess_image(img, img_size=640):
    h, w = img.shape[:2]
    scale = min(img_size / w, img_size / h)
    new_w, new_h = int(w * scale), int(h * scale)
    resized = cv2.resize(img, (new_w, new_h))
    padded = np.full((img_size, img_size, 3), 114, dtype=np.uint8)
    dw, dh = (img_size - new_w) // 2, (img_size - new_h) // 2
    padded[dh:dh + new_h, dw:dw + new_w] = resized
    return padded, scale, dw, dh

def bbox_iou(box1, boxes):
    x1, y1, x2, y2 = box1
    xx1 = np.maximum(x1, boxes[:, 0])
    yy1 = np.maximum(y1, boxes[:, 1])
    xx2 = np.minimum(x2, boxes[:, 2])
    yy2 = np.minimum(y2, boxes[:, 3])
    inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
    area1 = (x2 - x1) * (y2 - y1)
    area2 = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    union = area1 + area2 - inter
    return inter / (union + 1e-6)

def nms(boxes, scores, iou_thresh):
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        inds = np.where(iou <= iou_thresh)[0]
        order = order[inds + 1]
    return keep

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def decode_yolo(pred, conf_thresh=0.25, iou_thresh=0.45):
    pred = pred.reshape(84, -1).T  # (8400, 84)
    xywh = pred[:, 0:4]
    obj_conf = sigmoid(pred[:, 4])
    cls_scores = sigmoid(pred[:, 5:])
    cls_conf = cls_scores * obj_conf[:, None]
    class_ids = np.argmax(cls_conf, axis=1)
    scores = cls_conf[np.arange(cls_conf.shape[0]), class_ids]
    mask = scores > conf_thresh
    if not np.any(mask):
        return np.empty((0, 4)), np.array([]), np.array([])
    xywh = xywh[mask]
    scores = scores[mask]
    class_ids = class_ids[mask]
    xyxy = np.zeros_like(xywh)
    xyxy[:, 0] = xywh[:, 0] - xywh[:, 2] / 2
    xyxy[:, 1] = xywh[:, 1] - xywh[:, 3] / 2
    xyxy[:, 2] = xywh[:, 0] + xywh[:, 2] / 2
    xyxy[:, 3] = xywh[:, 1] + xywh[:, 3] / 2
    keep = nms(xyxy, scores, iou_thresh)
    return xyxy[keep], scores[keep], class_ids[keep]

def draw_boxes(img, boxes, scores, class_ids):
    for (box, score, cls) in zip(boxes, scores, class_ids):
        x1, y1, x2, y2 = map(int, box)
        color = (0, 255, 0)
        label = f"{CLASSES[cls]} {score:.2f}"
        cv2.rectangle(img, (x1, y1), (x2, y2), color, 2)
        cv2.putText(img, label, (x1, y1 - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, color, 2)
    return img

# ---------------- MAIN ----------------
if __name__ == "__main__":
    print(f"🔹 Loading RKNN model: {MODEL_PATH}")
    rknn = RKNNLite()
    if rknn.load_rknn(MODEL_PATH) != 0:
        raise SystemExit("❌ Failed to load RKNN model")
    print("✅ Model loaded successfully")
    if rknn.init_runtime() != 0:
        raise SystemExit("❌ Failed to initialize runtime")
    print("✅ Runtime initialized")

    print(f"\n🎥 Starting RTSP stream: {RTSP_URL}")
    vs = VideoStream(RTSP_URL).start()
    time.sleep(1.0)

    frame_buffer = deque(maxlen=BATCH_SIZE)
    meta_data = []
    frame_count = 0
    print("🚀 Running inference... Press 'q' to exit.")

    while True:
        frame = vs.read()
        if frame is None:
            print("⚠️ Frame not received, reconnecting...")
            time.sleep(1)
            continue
        frame_count += 1
        processed, scale, dw, dh = preprocess_image(frame, IMG_SIZE)
        frame_buffer.append(processed)
        meta_data.append((frame, scale, dw, dh))

        if len(frame_buffer) >= BATCH_SIZE:
            input_batch = np.array(frame_buffer, dtype=np.uint8)
            t0 = time.time()
            outputs = rknn.inference(inputs=[input_batch])
            t1 = time.time()
            output = outputs[0]
            fps = BATCH_SIZE / (t1 - t0)
            print(f"✅ Batch done in {(t1 - t0) * 1000:.2f} ms ({fps:.2f} FPS)")
            for i in range(len(frame_buffer)):
                pred = output[i]
                boxes, scores, class_ids = decode_yolo(pred, CONF_THRESH, IOU_THRESH)
                img, scale, dw, dh = meta_data[i]
                if len(boxes) > 0:
                    boxes[:, [0, 2]] -= dw
                    boxes[:, [1, 3]] -= dh
                    boxes /= scale
                img_out = draw_boxes(img, boxes, scores, class_ids)
                cv2.imshow("YOLO RKNN RTSP", img_out)
            frame_buffer.clear()
            meta_data.clear()

        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    vs.stop()
    cv2.destroyAllWindows()
    print("\n✅ RTSP inference stopped.")
```
By lag, I suppose you mean the time difference between receiving an input and displaying the final output. To utilise the 3 NPU cores while reducing the lag that comes from waiting for more frames to arrive (and also from preprocessing and postprocessing), I think the best way is to use 3 threads for inference, each inferencing one frame at a time, with preprocessing and postprocessing done in separate threads. This way NPU utilisation increases, NPU idle time decreases, and so turnaround time decreases as well.
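A rough sketch of what I mean (the model path, queue sizes, and batch-1 model are assumptions, not tested code):

```python
# One-frame-at-a-time inference across 3 NPU cores using threads:
# each worker owns its own RKNNLite instance pinned to one core via core_mask.
import queue
import threading
from rknnlite.api import RKNNLite

MODEL = 'yolo11s_fp16_dynamic.rknn'  # assuming a batch-1 model here
in_q, out_q = queue.Queue(maxsize=32), queue.Queue()

def worker(core_mask):
    lite = RKNNLite()
    assert lite.load_rknn(MODEL) == 0
    assert lite.init_runtime(core_mask=core_mask) == 0
    while True:
        item = in_q.get()
        if item is None:          # poison pill to stop the worker
            break
        idx, img = item           # img: preprocessed (1, 640, 640, 3) uint8
        out_q.put((idx, lite.inference(inputs=[img])))
    lite.release()

cores = [RKNNLite.NPU_CORE_0, RKNNLite.NPU_CORE_1, RKNNLite.NPU_CORE_2]
threads = [threading.Thread(target=worker, args=(c,), daemon=True) for c in cores]
for t in threads:
    t.start()
# Feed preprocessed frames from your 5 cameras with in_q.put((idx, img)),
# and drain/postprocess results from out_q in another thread so the NPUs stay busy.
```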
Though a little loss of accuracy is unavoidable with post-training quantization, it can be reduced by passing a sufficient amount of diverse data during quantization calibration, and by using hybrid quantization if accuracy still suffers. If there is a total loss of accuracy, something is wrong in some step of the quantization. While quantizing YOLOv8 or YOLO11 models, you should pass mean_values=[[0, 0, 0]], std_values=[[255, 255, 255]] as parameters in the rknn.config() call, and while inferencing pass a uint8 (0-255 range) image in the right shape (the layout can be set with the data_format parameter of inference(); the default is "nhwc", which means height and width come before channel).
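For example, the conversion-side config would look like this (a sketch, using the values above):

```python
# Conversion-side config for YOLOv8/YOLO11 quantization:
# the toolkit divides the uint8 input by 255 on the NPU, so the
# inference-side code passes raw 0-255 images with no extra normalisation.
rknn.config(
    mean_values=[[0, 0, 0]],
    std_values=[[255, 255, 255]],
    target_platform='rk3588'
)
```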
Thing is, the calibration dataset was too small, so maybe that's an issue.
For batching, I want to know: suppose there are 50 objects I want to process per camera; will this batch-8 FPS remain constant for up to 8 cameras, and with multi-object detection?