While doing inference on the Axon board, use rknnlite instead of rknn. The rknn API is for converting the model and simulating it to get outputs. You may follow this for using rknnlite.api: link.
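For example, a minimal rknnlite sketch (the model path and input here are placeholders, not your actual files):

```python
# Minimal on-board inference sketch with rknnlite (paths/input are placeholders)
from rknnlite.api import RKNNLite
import numpy as np

rknn_lite = RKNNLite()
if rknn_lite.load_rknn('model.rknn') != 0:        # hypothetical model file
    raise SystemExit('load_rknn failed')
if rknn_lite.init_runtime() != 0:
    raise SystemExit('init_runtime failed')

img = np.zeros((1, 640, 640, 3), dtype=np.uint8)  # dummy NHWC input
outputs = rknn_lite.inference(inputs=[img])
rknn_lite.release()
```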
And the FPS you are calculating is the batch FPS, right? So the net FPS should be batch_size * FPS?
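To be clear about the arithmetic I mean (numbers are just illustrative):

```python
# If one batch of 8 frames takes t seconds:
# batch FPS = 1 / t, net (per-frame) FPS = batch_size / t
batch_size = 8
t = 0.5                    # example: 0.5 s per batch
batch_fps = 1.0 / t        # 2 batches per second
net_fps = batch_size / t   # 16 frames per second
print(batch_fps, net_fps)
```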
```
rknn-toolkit2 version: 2.3.2
I Loading : 100%|██████████████████████████████████████████████| 127/127 [00:00<00:00, 25879.44it/s]
E load_onnx: The input shape ['batch', 3, 'height', 'width'] of 'images' is not support!
             Please set the 'inputs' / 'input_size_list' parameters of 'rknn.load_onnx', or set the 'dyanmic_input' parameter of 'rknn.config' to fix the input shape!
I ===================== WARN(0) =====================
E rknn-toolkit2 version: 2.3.2
E load_onnx: Traceback (most recent call last):
  File "rknn/api/rknn_log.py", line 344, in rknn.api.rknn_log.error_catch_decorator.error_catch_wrapper
  File "rknn/api/rknn_base.py", line 1579, in rknn.api.rknn_base.RKNNBase.load_onnx
  File "rknn/api/rknn_base.py", line 708, in rknn.api.rknn_base.RKNNBase._create_ir_and_inputs_meta
  File "rknn/api/rknn_log.py", line 95, in rknn.api.rknn_log.RKNNLog.e
ValueError: The input shape ['batch', 3, 'height', 'width'] of 'images' is not support!
           Please set the 'inputs' / 'input_size_list' parameters of 'rknn.load_onnx', or set the 'dyanmic_input' parameter of 'rknn.config' to fix the input shape!
I ===================== WARN(0) =====================
E rknn-toolkit2 version: 2.3.2
Traceback (most recent call last):
  File "rknn/api/rknn_log.py", line 344, in rknn.api.rknn_log.error_catch_decorator.error_catch_wrapper
  File "rknn/api/rknn_base.py", line 1579, in rknn.api.rknn_base.RKNNBase.load_onnx
  File "rknn/api/rknn_base.py", line 708, in rknn.api.rknn_base.RKNNBase._create_ir_and_inputs_meta
  File "rknn/api/rknn_log.py", line 95, in rknn.api.rknn_log.RKNNLog.e
ValueError: The input shape ['batch', 3, 'height', 'width'] of 'images' is not support!
           Please set the 'inputs' / 'input_size_list' parameters of 'rknn.load_onnx', or set the 'dyanmic_input' parameter of 'rknn.config' to fix the input shape!

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/vicharak/conv.py", line 21, in <module>
    ret = rknn.load_onnx(model=onnx_model_path)
  File "/home/vicharak/miniforge3/envs/rknn/lib/python3.10/site-packages/rknn/api/rknn.py", line 168, in load_onnx
    return self.rknn_base.load_onnx(model, inputs, input_size_list, input_initial_val, outputs)
  File "rknn/api/rknn_log.py", line 349, in rknn.api.rknn_log.error_catch_decorator.error_catch_wrapper
  File "rknn/api/rknn_log.py", line 95, in rknn.api.rknn_log.RKNNLog.e
ValueError: Traceback (most recent call last):
  File "rknn/api/rknn_log.py", line 344, in rknn.api.rknn_log.error_catch_decorator.error_catch_wrapper
  File "rknn/api/rknn_base.py", line 1579, in rknn.api.rknn_base.RKNNBase.load_onnx
  File "rknn/api/rknn_base.py", line 708, in rknn.api.rknn_base.RKNNBase._create_ir_and_inputs_meta
  File "rknn/api/rknn_log.py", line 95, in rknn.api.rknn_log.RKNNLog.e
ValueError: The input shape ['batch', 3, 'height', 'width'] of 'images' is not support!
           Please set the 'inputs' / 'input_size_list' parameters of 'rknn.load_onnx', or set the 'dyanmic_input' parameter of 'rknn.config' to fix the input shape!
```
Please help: this happens when I am converting a dynamic-input ONNX model (YOLOv8) to RKNN.
When I give the model an input of batch 1:
```
(rknn) vicharak@vicharak:~$ python3 conv.py
/home/vicharak/miniforge3/envs/rknn/lib/python3.10/site-packages/rknn/api/rknn.py:51: UserWarning: pkg_resources is deprecated as an API. See Package Discovery and Resource Access using pkg_resources - setuptools 80.9.0 documentation. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  self.rknn_base = RKNNBase(cur_path, verbose)
I rknn-toolkit2 version: 2.3.2
I Loading : 100%|██████████████████████████████████████████████| 127/127 [00:00<00:00, 27750.80it/s]
E build: The 'rknn_batch_size' is conflict with model input!
Build model failed
```
Code:

```python
from rknn.api import RKNN

onnx_model_path = 'best-new640.onnx'
rknn_model_path = 'best_new640+.rknn'

# Initialize RKNN
rknn = RKNN()

# Mean and std: replace these with your model's preprocessing values
mu1, mu2, mu3 = 0, 0, 0
del1, del2, del3 = 1, 1, 1

# Configure RKNN
rknn.config(
    mean_values=[[mu1, mu2, mu3]],
    std_values=[[del1, del2, del3]],
    target_platform='rk3588'
)

# Load ONNX model
ret = rknn.load_onnx(model=onnx_model_path)
if ret != 0:
    print('Could not load ONNX model')
    exit(ret)

# Quantization & batching setup
quantization_flag = True
dataset_path = 'dataset.txt'  # Only required if quantization is True
batch_size = 8
auto_hybrid_flag = True  # Enable hybrid quantization (optional)

# Build RKNN model
ret = rknn.build(
    do_quantization=quantization_flag,
    dataset=dataset_path if quantization_flag else None,
    rknn_batch_size=batch_size,
    auto_hybrid=auto_hybrid_flag
)
if ret != 0:
    print('Build model failed')
    exit(ret)
else:
    print('Build model done')

# Export RKNN model
ret = rknn.export_rknn(rknn_model_path)
if ret != 0:
    print('Export model failed')
    exit(ret)
else:
    print(f'RKNN model exported: {rknn_model_path}')
```
When I set batch size to 1, it gives me this:
```
I OpFusing 0 : 100%|█████████████████████████████████████████████| 100/100 [00:00<00:00, 318.28it/s]
I OpFusing 1 : 100%|█████████████████████████████████████████████| 100/100 [00:00<00:00, 152.47it/s]
I OpFusing 0 : 100%|██████████████████████████████████████████████| 100/100 [00:01<00:00, 89.45it/s]
I OpFusing 1 : 100%|██████████████████████████████████████████████| 100/100 [00:01<00:00, 78.69it/s]
I OpFusing 0 : 100%|██████████████████████████████████████████████| 100/100 [00:01<00:00, 67.30it/s]
I OpFusing 1 : 100%|██████████████████████████████████████████████| 100/100 [00:01<00:00, 64.93it/s]
I OpFusing 2 : 100%|██████████████████████████████████████████████| 100/100 [00:06<00:00, 14.96it/s]
I GraphPreparing : 100%|█████████████████████████████████████████| 176/176 [00:00<00:00, 931.98it/s]
I Quantizating 1/12: 0%| | 0/176 [00:00<?, ?it/s]
E build: The input('/home/vicharak/datas/prealigned_0.png') shape (1, 640, 640, 3) is wrong, expect 'nhwc' like (8, 640, 640, 3)!
I ===================== WARN(0) =====================
E rknn-toolkit2 version: 2.3.2
E build: Traceback (most recent call last):
  File "rknn/api/rknn_log.py", line 344, in rknn.api.rknn_log.error_catch_decorator.error_catch_wrapper
  File "rknn/api/rknn_base.py", line 2041, in rknn.api.rknn_base.RKNNBase.build
  File "rknn/api/rknn_base.py", line 175, in rknn.api.rknn_base.RKNNBase._quantize
  File "rknn/api/quantizer.py", line 1419, in rknn.api.quantizer.Quantizer.run
  File "rknn/api/quantizer.py", line 900, in rknn.api.quantizer.Quantizer._get_layer_range
  File "rknn/api/rknn_utils.py", line 328, in rknn.api.rknn_utils.get_input_img
  File "rknn/api/rknn_log.py", line 95, in rknn.api.rknn_log.RKNNLog.e
ValueError: The input('/home/vicharak/datas/prealigned_0.png') shape (1, 640, 640, 3) is wrong, expect 'nhwc' like (8, 640, 640, 3)!
I ===================== WARN(0) =====================
E rknn-toolkit2 version: 2.3.2
Traceback (most recent call last):
  File "rknn/api/rknn_log.py", line 344, in rknn.api.rknn_log.error_catch_decorator.error_catch_wrapper
  File "rknn/api/rknn_base.py", line 2041, in rknn.api.rknn_base.RKNNBase.build
  File "rknn/api/rknn_base.py", line 175, in rknn.api.rknn_base.RKNNBase._quantize
  File "rknn/api/quantizer.py", line 1419, in rknn.api.quantizer.Quantizer.run
  File "rknn/api/quantizer.py", line 900, in rknn.api.quantizer.Quantizer._get_layer_range
  File "rknn/api/rknn_utils.py", line 328, in rknn.api.rknn_utils.get_input_img
  File "rknn/api/rknn_log.py", line 95, in rknn.api.rknn_log.RKNNLog.e
ValueError: The input('/home/vicharak/datas/prealigned_0.png') shape (1, 640, 640, 3) is wrong, expect 'nhwc' like (8, 640, 640, 3)!

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/vicharak/conv.py", line 33, in <module>
    ret = rknn.build(
  File "/home/vicharak/miniforge3/envs/rknn/lib/python3.10/site-packages/rknn/api/rknn.py", line 198, in build
    return self.rknn_base.build(do_quantization=do_quantization, dataset=dataset, expand_batch_size=rknn_batch_size, auto_hybrid=auto_hybrid)
  File "rknn/api/rknn_log.py", line 349, in rknn.api.rknn_log.error_catch_decorator.error_catch_wrapper
  File "rknn/api/rknn_log.py", line 95, in rknn.api.rknn_log.RKNNLog.e
ValueError: Traceback (most recent call last):
  File "rknn/api/rknn_log.py", line 344, in rknn.api.rknn_log.error_catch_decorator.error_catch_wrapper
  File "rknn/api/rknn_base.py", line 2041, in rknn.api.rknn_base.RKNNBase.build
  File "rknn/api/rknn_base.py", line 175, in rknn.api.rknn_base.RKNNBase._quantize
  File "rknn/api/quantizer.py", line 1419, in rknn.api.quantizer.Quantizer.run
  File "rknn/api/quantizer.py", line 900, in rknn.api.quantizer.Quantizer._get_layer_range
  File "rknn/api/rknn_utils.py", line 328, in rknn.api.rknn_utils.get_input_img
  File "rknn/api/rknn_log.py", line 95, in rknn.api.rknn_log.RKNNLog.e
ValueError: The input('/home/vicharak/datas/prealigned_0.png') shape (1, 640, 640, 3) is wrong, expect 'nhwc' like (8, 640, 640, 3)!
```
When you are converting a dynamic-input ONNX model to an RKNN model, you have to define the shapes that you want. If you want multiple input shapes to be supported, set the dynamic_input parameter of rknn.config(); if you want a single input shape, set the inputs and input_size_list parameters of rknn.load_onnx(). The complete documentation explaining the API and its parameters can be found here.
Since you are trying to convert a dynamic-shape input without providing the set of input shapes to expect, it raises this error. Kindly go through the rknn.load_onnx() and rknn.config() API documentation (link) to understand the parameter requirements.
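For example, a minimal sketch of both options (the shapes and file names here are illustrative, not your actual model):

```python
# Two ways to pin a dynamic ONNX input shape at conversion time
from rknn.api import RKNN

rknn = RKNN()

# Option A: support several fixed shapes via dynamic_input
# (a list of shape sets, one inner list per model input)
rknn.config(
    mean_values=[[0, 0, 0]],
    std_values=[[255, 255, 255]],
    target_platform='rk3588',
    dynamic_input=[[[1, 3, 640, 640]], [[8, 3, 640, 640]]],
)
ret = rknn.load_onnx(model='model.onnx')

# Option B: fix a single shape at load time instead
# ret = rknn.load_onnx(model='model.onnx', inputs=['images'],
#                      input_size_list=[[1, 3, 640, 640]])
```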
I did, and dynamic input is enabled, but:
```
W build: The 'dynamic_input' function is enabled, disable _p_convert_maxpool_to_maxpool_tile!
(the warning above is repeated 9 times during op fusing)
I OpFusing 1 : 100%|██████████████████████████████████████████████| 100/100 [00:01<00:00, 52.05it/s]
I OpFusing 0 : 100%|██████████████████████████████████████████████| 100/100 [00:02<00:00, 49.71it/s]
I OpFusing 1 : 100%|██████████████████████████████████████████████| 100/100 [00:02<00:00, 48.34it/s]
I OpFusing 2 : 100%|██████████████████████████████████████████████| 100/100 [00:02<00:00, 48.03it/s]
I GraphPreparing : 100%|████████████████████████████████████████| 176/176 [00:00<00:00, 3163.93it/s]
I Quantizating 1/12: 0%| | 0/176 [00:00<?, ?it/s]
E build: The input('/home/vicharak/datas/prealigned_0.png') shape (1, 640, 640, 3) is wrong, expect 'nhwc' like (8, 640, 640, 3)!
W build: ===================== WARN(22) =====================
E rknn-toolkit2 version: 2.3.2
E build: Traceback (most recent call last):
  File "rknn/api/rknn_log.py", line 344, in rknn.api.rknn_log.error_catch_decorator.error_catch_wrapper
  File "rknn/api/rknn_base.py", line 2041, in rknn.api.rknn_base.RKNNBase.build
  File "rknn/api/rknn_base.py", line 175, in rknn.api.rknn_base.RKNNBase._quantize
  File "rknn/api/quantizer.py", line 1419, in rknn.api.quantizer.Quantizer.run
  File "rknn/api/quantizer.py", line 900, in rknn.api.quantizer.Quantizer._get_layer_range
  File "rknn/api/rknn_utils.py", line 328, in rknn.api.rknn_utils.get_input_img
  File "rknn/api/rknn_log.py", line 95, in rknn.api.rknn_log.RKNNLog.e
ValueError: The input('/home/vicharak/datas/prealigned_0.png') shape (1, 640, 640, 3) is wrong, expect 'nhwc' like (8, 640, 640, 3)!
I ===================== WARN(0) =====================
E rknn-toolkit2 version: 2.3.2
Traceback (most recent call last):
  File "rknn/api/rknn_log.py", line 344, in rknn.api.rknn_log.error_catch_decorator.error_catch_wrapper
  File "rknn/api/rknn_base.py", line 2041, in rknn.api.rknn_base.RKNNBase.build
  File "rknn/api/rknn_base.py", line 175, in rknn.api.rknn_base.RKNNBase._quantize
  File "rknn/api/quantizer.py", line 1419, in rknn.api.quantizer.Quantizer.run
  File "rknn/api/quantizer.py", line 900, in rknn.api.quantizer.Quantizer._get_layer_range
  File "rknn/api/rknn_utils.py", line 328, in rknn.api.rknn_utils.get_input_img
  File "rknn/api/rknn_log.py", line 95, in rknn.api.rknn_log.RKNNLog.e
ValueError: The input('/home/vicharak/datas/prealigned_0.png') shape (1, 640, 640, 3) is wrong, expect 'nhwc' like (8, 640, 640, 3)!

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/vicharak/conv2.py", line 33, in <module>
    ret = rknn.build(
  File "/home/vicharak/miniforge3/envs/rknn/lib/python3.10/site-packages/rknn/api/rknn.py", line 198, in build
    return self.rknn_base.build(do_quantization=do_quantization, dataset=dataset, expand_batch_size=rknn_batch_size, auto_hybrid=auto_hybrid)
  File "rknn/api/rknn_log.py", line 349, in rknn.api.rknn_log.error_catch_decorator.error_catch_wrapper
  File "rknn/api/rknn_log.py", line 95, in rknn.api.rknn_log.RKNNLog.e
ValueError: Traceback (most recent call last):
  File "rknn/api/rknn_log.py", line 344, in rknn.api.rknn_log.error_catch_decorator.error_catch_wrapper
  File "rknn/api/rknn_base.py", line 2041, in rknn.api.rknn_base.RKNNBase.build
  File "rknn/api/rknn_base.py", line 175, in rknn.api.rknn_base.RKNNBase._quantize
  File "rknn/api/quantizer.py", line 1419, in rknn.api.quantizer.Quantizer.run
  File "rknn/api/quantizer.py", line 900, in rknn.api.quantizer.Quantizer._get_layer_range
  File "rknn/api/rknn_utils.py", line 328, in rknn.api.rknn_utils.get_input_img
  File "rknn/api/rknn_log.py", line 95, in rknn.api.rknn_log.RKNNLog.e
ValueError: The input('/home/vicharak/datas/prealigned_0.png') shape (1, 640, 640, 3) is wrong, expect 'nhwc' like (8, 640, 640, 3)!
```
In this case I guess you are using a statically exported ONNX model with batch_size ('n') = 8. In that case, in the dataset.txt where you provide the quantization dataset, you have to provide preprocessed data of shape (8, 640, 640, 3). This can be achieved by concatenating eight (1, 640, 640, 3) images along axis=0 with np.concatenate((img1, ..., img8), axis=0), saving the result with np.save("dataset_i.npy", batch), and listing the paths to each dataset_i.npy in dataset.txt for quantization.
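For example, a rough sketch of generating such batch-8 .npy files (the folder and file names here are assumptions):

```python
# Build batch-8 calibration .npy files and list them in dataset.txt
import glob
import cv2
import numpy as np

paths = sorted(glob.glob('calib_images/*.png'))  # hypothetical calibration images
with open('dataset.txt', 'w') as f:
    for i in range(0, len(paths) - 7, 8):
        imgs = []
        for p in paths[i:i + 8]:
            img = cv2.imread(p)
            img = cv2.resize(img, (640, 640))
            img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
            imgs.append(img[None, ...])           # (1, 640, 640, 3)
        batch = np.concatenate(imgs, axis=0)      # (8, 640, 640, 3), uint8, NHWC
        np.save(f'dataset_{i // 8}.npy', batch)
        f.write(f'dataset_{i // 8}.npy\n')
```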
It's done, but with dynamic input, not by giving rknn_batch_size to rknn.build(). When I do that:
```python
from rknn.api import RKNN

# Paths
onnx_model_path = 'best-new6408.onnx'
rknn_model_path = 'best-new640d.rknn'

# Mean and std for your model
mu1, mu2, mu3 = 0, 0, 0  # Replace with your actual values
del1, del2, del3 = 1, 1, 1  # Replace with your actual values

rknn = RKNN()

# Configure model for dynamic batch
# Here we support batch sizes 1 to 8 for 640x640 input
rknn.config(
    mean_values=[[mu1, mu2, mu3]],
    std_values=[[del1, del2, del3]],
    target_platform='rk3588',
    dynamic_input=[[[1, 3, 640, 640]], [[8, 3, 640, 640]]]  # list of list of list
)

# Load ONNX
ret = rknn.load_onnx(model=onnx_model_path)
if ret != 0:
    print("Failed to load ONNX model")
    exit(ret)

# Quantization settings
quantization_flag = True
dataset_path = 'dataset.txt'  # images for calibration

# Build model with quantization
ret = rknn.build(
    do_quantization=quantization_flag,
    dataset=dataset_path,
    rknn_batch_size=8
)
if ret != 0:
    print("Build failed")
    exit(ret)

# Export RKNN model
ret = rknn.export_rknn(rknn_model_path)
if ret != 0:
    print("Export failed")
    exit(ret)
print("RKNN model exported successfully!")
```
```
I rknn-toolkit2 version: 2.3.2
W config: Please make sure the model can be dynamic when enable 'config.dynamic_input'!
I The 'dynamic_input' function has been enabled, the MaxShape is dynamic_input[1] = [[8, 3, 640, 640]]!
  The following functions are subject to the MaxShape:
  1. The quantified dataset needs to be configured according to MaxShape
  2. The eval_perf or eval_memory return the results of MaxShape
I Loading : 100%|██████████████████████████████████████████████| 127/127 [00:00<00:00, 13154.46it/s]
E build: The 'rknn_batch_size' is conflict with model input!
Build failed
```
Does it give more FPS with VideoCapture on a video file than on a live feed?
```python
import cv2
import numpy as np
import imutils
import time
from rknnlite.api import RKNNLite
from imutils.video import VideoStream

# -------------------- Config --------------------
rknn_model_path = 'best-new640d.rknn'
video_path = 'rtsp://admin:tdbtech4189@192.168.1.250:554'  # 0 for webcam
BATCH_SIZE = 8
FRAME_WIDTH, FRAME_HEIGHT = 640, 640  # model input
CLASSES = (
    "laptop", "Bike", "Car", "cattle", "fire", "Bus", "Smartphone",
    "glasses", "bottle", "Auto", "book", "smoke", "Number_plate",
    "tractor", "Truck", "bag", "Face", "pencilcase", "Person",
    "helmet", "machine"
)

# -------------------- RKNNLite Setup --------------------
rknn_lite = RKNNLite()
ret = rknn_lite.load_rknn(rknn_model_path)
if ret != 0:
    print("Load RKNN model failed")
    exit(ret)

# Use all 3 NPU cores for batch inference
ret = rknn_lite.init_runtime(core_mask=RKNNLite.NPU_CORE_0_1_2)
if ret != 0:
    print("Init runtime environment failed")
    exit(ret)

# -------------------- Video Capture --------------------
cap = VideoStream(video_path).start()
frame_buffer = []
frame_display_buffer = []

# For FPS calculation
fps_display = 0
fps_count = 0
start_time = time.time()

while True:
    frame = cap.read()
    # Resize frame to model input
    frame_resized = cv2.resize(frame, (FRAME_WIDTH, FRAME_HEIGHT))
    frame_rgb = cv2.cvtColor(frame_resized, cv2.COLOR_BGR2RGB)
    frame_buffer.append(frame_rgb)
    frame_display_buffer.append(frame)

    # When batch is full, run inference
    if len(frame_buffer) == BATCH_SIZE:
        batch_input = np.stack(frame_buffer, axis=0).astype(np.float32)  # nhwc
        outputs = rknn_lite.inference(inputs=[batch_input], data_format="nhwc")
        if outputs is not None:
            batch_out = outputs[0]  # shape: (BATCH_SIZE, num_detections, N)
            for i, dets in enumerate(batch_out):
                for det in dets:
                    if len(det) >= 6:
                        x1, y1, x2, y2, score, class_idx = det[:6]
                        class_idx = int(class_idx)
                        if class_idx >= len(CLASSES):
                            continue
                        class_name = CLASSES[class_idx]
                        # Draw on frame
                        cv2.rectangle(frame_display_buffer[i], (int(x1), int(y1)), (int(x2), int(y2)), (0, 255, 0), 2)
                        cv2.putText(frame_display_buffer[i], f"{class_name}: {score:.2f}",
                                    (int(x1), int(y1) - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

        # Show frames with FPS
        for f in frame_display_buffer:
            fps_count += 1
            if fps_count >= BATCH_SIZE:
                end_time = time.time()
                fps_display = fps_count / (end_time - start_time)
                start_time = end_time
                fps_count = 0
            cv2.putText(f, f"FPS: {fps_display:.2f}", (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
            cv2.imshow("RKNNLite Video", f)

        # Clear buffers
        frame_buffer = []
        frame_display_buffer = []

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Process remaining frames if any
if len(frame_buffer) > 0:
    batch_input = np.stack(frame_buffer, axis=0).astype(np.float32)
    outputs = rknn_lite.inference(inputs=[batch_input], data_format="nhwc")
    if outputs is not None:
        batch_out = outputs[0]
        for i, dets in enumerate(batch_out):
            for det in dets:
                if len(det) >= 6:
                    x1, y1, x2, y2, score, class_idx = det[:6]
                    class_idx = int(class_idx)
                    if class_idx >= len(CLASSES):
                        continue
                    class_name = CLASSES[class_idx]
                    cv2.rectangle(frame_display_buffer[i], (int(x1), int(y1)), (int(x2), int(y2)), (0, 255, 0), 2)
                    cv2.putText(frame_display_buffer[i], f"{class_name}: {score:.2f}",
                                (int(x1), int(y1) - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
            cv2.imshow("RKNNLite Video", frame_display_buffer[i])
            cv2.waitKey(1)

cap.release()
cv2.destroyAllWindows()
rknn_lite.release()
```
I am getting an FPS of just 16, can you check please?
When you are modifying the input shape for batching (via dynamic_input), you don't need to provide the rknn_batch_size parameter as well.
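For example (a sketch; this assumes dynamic_input was already set in rknn.config() as in your script):

```python
# With dynamic_input set at config time, drop rknn_batch_size here;
# the fixed shapes already encode the batch dimension.
ret = rknn.build(
    do_quantization=True,
    dataset='dataset.txt',   # calibration list as before
)
```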
```python
import cv2
import numpy as np
import time
from rknnlite.api import RKNNLite
from imutils.video import VideoStream
from collections import deque

# -------------------- Config --------------------
rknn_model_path = 'best-new640d.rknn'
video_path = 'rtsp://admin:tdbtech4189@192.168.1.250:554'
BATCH_SIZE = 8
MODEL_W, MODEL_H = 640, 640
CLASSES = ("laptop", "Bike", "Car", "cattle", "fire", "Bus", "Smartphone",
           "glasses", "bottle", "Auto", "book", "smoke", "Number_plate",
           "tractor", "Truck", "bag", "Face", "pencilcase", "Person",
           "helmet", "machine")

# -------------------- RKNNLite Setup --------------------
rknn_lite = RKNNLite()
if rknn_lite.load_rknn(rknn_model_path) != 0:
    print("Load RKNN model failed"); exit(1)
if rknn_lite.init_runtime(core_mask=RKNNLite.NPU_CORE_0_1_2) != 0:
    print("Init runtime environment failed"); exit(1)

# -------------------- Letterbox Resize --------------------
def letterbox_image(image, target_size=(MODEL_W, MODEL_H)):
    h, w = image.shape[:2]
    scale = min(target_size[0] / w, target_size[1] / h)
    new_w, new_h = int(w * scale), int(h * scale)
    resized = cv2.resize(image, (new_w, new_h))
    canvas = np.zeros((target_size[1], target_size[0], 3), dtype=np.uint8)
    top = (target_size[1] - new_h) // 2
    left = (target_size[0] - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas, scale, left, top

# -------------------- Video Capture --------------------
cap = VideoStream(video_path).start()
frame_buffer = []
frame_display_buffer = deque(maxlen=30)
fps_display = 0
fps_count = 0
start_time = time.time()

# -------------------- Inference Function --------------------
def run_batch_inference(frames_batch, display_batch, scales_offsets):
    global fps_display, fps_count, start_time
    # Model expects BGR 0-255, NHWC
    batch_input = np.stack(frames_batch, axis=0).astype(np.float32)
    outputs = rknn_lite.inference(inputs=[batch_input], data_format="nhwc")
    if outputs is None:
        return
    batch_out = outputs[0]  # (BATCH_SIZE, num_detections, N)
    for i, dets in enumerate(batch_out):
        scale, dx, dy = scales_offsets[i]
        for det in dets:
            if len(det) >= 6:
                x1, y1, x2, y2, score, class_idx = det[:6]
                class_idx = int(class_idx)
                if class_idx >= len(CLASSES):
                    continue
                class_name = CLASSES[class_idx]
                # Adjust coordinates from letterbox
                x1 = int((x1 - dx) / scale)
                y1 = int((y1 - dy) / scale)
                x2 = int((x2 - dx) / scale)
                y2 = int((y2 - dy) / scale)
                cv2.rectangle(display_batch[i], (x1, y1), (x2, y2), (0, 255, 0), 2)
                cv2.putText(display_batch[i], f"{class_name}: {score:.2f}",
                            (x1, y1 - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
    # FPS display
    for f in display_batch:
        fps_count += 1
        if fps_count >= BATCH_SIZE:
            end_time = time.time()
            fps_display = fps_count / (end_time - start_time)
            start_time = end_time
            fps_count = 0
        cv2.putText(f, f"FPS: {fps_display:.2f}", (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
        cv2.imshow("RKNNLite 640x640 Video", f)

# -------------------- Main Loop --------------------
while True:
    frame = cap.read()
    if frame is None:
        continue
    # Letterbox resize for model input
    frame_resized, scale, dx, dy = letterbox_image(frame, (MODEL_W, MODEL_H))
    frame_buffer.append(frame_resized)
    frame_display_buffer.append(frame.copy())
    if 'scales_offsets_list' not in locals():
        scales_offsets_list = deque(maxlen=30)
    scales_offsets_list.append((scale, dx, dy))

    if len(frame_buffer) == BATCH_SIZE:
        run_batch_inference(
            list(frame_buffer),
            list(frame_display_buffer)[-BATCH_SIZE:],
            list(scales_offsets_list)[-BATCH_SIZE:]
        )
        frame_buffer = []

    # Display last few frames smoothly
    for f in list(frame_display_buffer)[-BATCH_SIZE:]:
        cv2.imshow("RKNNLite 640x640 Video", f)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Process remaining frames
if len(frame_buffer) > 0:
    run_batch_inference(
        list(frame_buffer),
        list(frame_display_buffer)[-len(frame_buffer):],
        list(scales_offsets_list)[-len(frame_buffer):]
    )

cap.release()
cv2.destroyAllWindows()
rknn_lite.release()
```
Can you check this? It gives only 17 FPS.
> Does it give more FPS with VideoCapture on a video file than on a live feed?

What do you mean by this?
Recently I tried modifying the batch size of various types of models to see the change in inference time. What I observed is that for models with a smaller input shape, like 224x224, increasing batch_size increases the net FPS up to a certain batch_size. The same is not true for a fairly large input shape like 640x640. Maybe this has to do with a memory-bandwidth sweet spot: FPS first increases until a certain input size or computation per layer, then after a certain threshold it starts decreasing.
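For reference, the kind of sweep I ran looks roughly like this (model paths are placeholders, not my exact script):

```python
# Rough shape of a batch-size sweep: time N batched inferences per model export
import time
import numpy as np
from rknnlite.api import RKNNLite

for bs, path in [(1, 'model_b1.rknn'), (8, 'model_b8.rknn')]:  # hypothetical exports
    lite = RKNNLite()
    assert lite.load_rknn(path) == 0 and lite.init_runtime() == 0
    batch = np.zeros((bs, 640, 640, 3), dtype=np.uint8)        # dummy NHWC input
    t0 = time.time()
    for _ in range(50):
        lite.inference(inputs=[batch])
    dt = (time.time() - t0) / 50
    print(f'batch={bs}: {bs / dt:.1f} net FPS')
    lite.release()
```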
OK, but after exporting to RKNN, accuracy is the issue: misclassification is occurring.
While providing code, can you please put 3 backticks (```) on a line before the code and on a line after it finishes? This formats the code as an actual code block.
```python
import cv2
import numpy as np
import time
from rknnlite.api import RKNNLite
from imutils.video import VideoStream
from collections import deque

# -------------------- Config --------------------
rknn_model_path = 'best-new640d.rknn'
video_path = 'rtsp://admin:tdbtech4189@192.168.1.250:554'
BATCH_SIZE = 8
FRAME_WIDTH, FRAME_HEIGHT = 640, 640  # model input
CLASSES = ("laptop", "Bike", "Car", "cattle", "fire", "Bus", "Smartphone",
           "glasses", "bottle", "Auto", "book", "smoke", "Number_plate",
           "tractor", "Truck", "bag", "Face", "pencilcase", "Person",
           "helmet", "machine")

# -------------------- RKNNLite Setup --------------------
rknn_lite = RKNNLite()
if rknn_lite.load_rknn(rknn_model_path) != 0:
    print("Load RKNN model failed"); exit(1)
if rknn_lite.init_runtime(core_mask=RKNNLite.NPU_CORE_0_1_2) != 0:
    print("Init runtime environment failed"); exit(1)

# -------------------- Video Capture --------------------
cap = VideoStream(video_path).start()
frame_buffer = []
frame_display_buffer = deque(maxlen=30)
fps_display = 0
fps_count = 0
start_time = time.time()

def run_batch_inference(frames_batch, display_batch):
    global fps_display, start_time, fps_count
    # Normalize to 0-1
    batch_input = np.stack(frames_batch, axis=0).astype(np.float32) / 255.0
    outputs = rknn_lite.inference(inputs=[batch_input], data_format="nhwc")
    if outputs is not None:
        batch_out = outputs[0]  # (BATCH_SIZE, num_detections, N)
        for i, dets in enumerate(batch_out):
            for det in dets:
                if len(det) >= 6:
                    x1, y1, x2, y2, score, class_idx = det[:6]
                    class_idx = int(class_idx)
                    if class_idx >= len(CLASSES):
                        continue
                    class_name = CLASSES[class_idx]
                    # Draw directly on 640x640 resized frame
                    x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)
                    cv2.rectangle(display_batch[i], (x1, y1), (x2, y2), (0, 255, 0), 2)
                    cv2.putText(display_batch[i], f"{class_name}: {score:.2f}",
                                (x1, y1 - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
    # FPS
    for f in display_batch:
        fps_count += 1
        if fps_count >= BATCH_SIZE:
            end_time = time.time()
            fps_display = fps_count / (end_time - start_time)
            start_time = end_time
            fps_count = 0
        cv2.putText(f, f"FPS: {fps_display:.2f}", (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
        cv2.imshow("RKNNLite 640x640 Video", f)

# Main loop
while True:
    frame = cap.read()
    if frame is None:
        continue
    # Resize frame to 640x640 for the model
    frame_resized = cv2.resize(frame, (FRAME_WIDTH, FRAME_HEIGHT))
    frame_rgb = cv2.cvtColor(frame_resized, cv2.COLOR_BGR2RGB)
    frame_buffer.append(frame_rgb)
    frame_display_buffer.append(frame_resized.copy())  # keep a copy for display

    if len(frame_buffer) == BATCH_SIZE:
        run_batch_inference(frame_buffer, list(frame_display_buffer)[-BATCH_SIZE:])
        frame_buffer = []

    # Always display last few frames for smooth playback
    for f in list(frame_display_buffer)[-BATCH_SIZE:]:
        cv2.imshow("RKNNLite 640x640 Video", f)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Process remaining frames
if len(frame_buffer) > 0:
    run_batch_inference(frame_buffer, list(frame_display_buffer)[-len(frame_buffer):])

cap.release()
cv2.destroyAllWindows()
rknn_lite.release()
```
A few common reasons for inaccuracies are (see the sketch after this list):
- The model may expect input in RGB or BGR. cv2 reads images in BGR and PIL reads them in RGB, so you must convert according to the model during preprocessing.
- The model may expect the input to be normalised by a certain mean, std, or scale, but these values are either not passed to rknn.config() and not applied in preprocessing, or they are both passed to config and also applied in preprocessing, so normalisation happens twice.
- While passing the input image, sometimes we pass the input in "nchw" format but forget to change the default "nhwc" data_format parameter accordingly.
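A minimal preprocessing sketch covering the points above (the model path and image are placeholders, and it assumes mean/std were already given to rknn.config() at conversion time, so no normalisation is repeated here):

```python
# Preprocessing sketch: BGR->RGB only if the model expects RGB; raw uint8 NHWC input
import cv2
import numpy as np
from rknnlite.api import RKNNLite

rknn_lite = RKNNLite()
rknn_lite.load_rknn('model.rknn')            # hypothetical model
rknn_lite.init_runtime()

img = cv2.imread('test.png')                 # cv2 gives BGR, uint8
img = cv2.resize(img, (640, 640))
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)   # only if the model expects RGB
batch = img[None, ...]                       # (1, 640, 640, 3), NHWC

# Mean/std were handled at conversion, so pass the raw uint8 array as-is
outputs = rknn_lite.inference(inputs=[batch], data_format='nhwc')
```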
P.S. Please double-check the spelling and grammar of queries before posting; it is sometimes hard to tell what the issue really is.
And for code formatting, use 3 backticks (` is the key in the top-left corner below the Esc key), not the inverted comma (''').
I managed to run batched YOLO on the RK3588 Vicharak board, but the issue is that it lags until a batch fills up.
Also, int8 was not working properly, so I moved to fp16 and it works.
FPS for (8, 3, 640, 640) is around 11, but it lags until it receives 8 frames.
What can we do? I want to run 5 cameras in parallel with batch detection so that processing the bboxes is effective; otherwise there is no point.
```python
import os
import cv2
import time
import numpy as np
from rknnlite.api import RKNNLite
from imutils.video import VideoStream
from collections import deque

# ---------------- CONFIG ----------------
MODEL_PATH = "yolo11s_fp16_dynamic.rknn"
RTSP_URL = "rtsp://admin:tdbtech4189@192.168.1.250:554"
IMG_SIZE = 640
CONF_THRESH = 0.30
IOU_THRESH = 0.45
BATCH_SIZE = 8
FRAME_SKIP = 0  # Process every n-th frame for speed

CLASSES = ("person", "bicycle", "car", "motorbike", "aeroplane", "bus", "train", "truck", "boat", "traffic light",
           "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow", "elephant",
           "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard", "sports ball", "kite",
           "baseball bat", "baseball glove", "skateboard", "surfboard", "tennis racket", "bottle", "wine glass", "cup", "fork", "knife",
           "spoon", "bowl", "banana", "apple", "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "sofa",
           "pottedplant", "bed", "diningtable", "toilet", "tvmonitor", "laptop", "mouse", "remote", "keyboard", "cell phone", "microwave",
           "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors", "teddy bear", "hair drier", "toothbrush")

# ---------------- FUNCTIONS ----------------
def preprocess_image(img, img_size=640):
    h, w = img.shape[:2]
    scale = min(img_size / w, img_size / h)
    new_w, new_h = int(w * scale), int(h * scale)
    resized = cv2.resize(img, (new_w, new_h))
    padded = np.full((img_size, img_size, 3), 114, dtype=np.uint8)
    dw, dh = (img_size - new_w) // 2, (img_size - new_h) // 2
    padded[dh:dh + new_h, dw:dw + new_w] = resized
    return padded, scale, dw, dh

def bbox_iou(box1, boxes):
    x1, y1, x2, y2 = box1
    xx1 = np.maximum(x1, boxes[:, 0])
    yy1 = np.maximum(y1, boxes[:, 1])
    xx2 = np.minimum(x2, boxes[:, 2])
    yy2 = np.minimum(y2, boxes[:, 3])
    inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
    area1 = (x2 - x1) * (y2 - y1)
    area2 = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    union = area1 + area2 - inter
    return inter / (union + 1e-6)

def nms(boxes, scores, iou_thresh):
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        inds = np.where(iou <= iou_thresh)[0]
        order = order[inds + 1]
    return keep

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def decode_yolo(pred, conf_thresh=0.25, iou_thresh=0.45):
    pred = pred.reshape(84, -1).T  # (8400, 84)
    xywh = pred[:, 0:4]
    obj_conf = sigmoid(pred[:, 4])
    cls_scores = sigmoid(pred[:, 5:])
    cls_conf = cls_scores * obj_conf[:, None]
    class_ids = np.argmax(cls_conf, axis=1)
    scores = cls_conf[np.arange(cls_conf.shape[0]), class_ids]
    mask = scores > conf_thresh
    if not np.any(mask):
        return np.empty((0, 4)), np.array([]), np.array([])
    xywh = xywh[mask]
    scores = scores[mask]
    class_ids = class_ids[mask]
    xyxy = np.zeros_like(xywh)
    xyxy[:, 0] = xywh[:, 0] - xywh[:, 2] / 2
    xyxy[:, 1] = xywh[:, 1] - xywh[:, 3] / 2
    xyxy[:, 2] = xywh[:, 0] + xywh[:, 2] / 2
    xyxy[:, 3] = xywh[:, 1] + xywh[:, 3] / 2
    keep = nms(xyxy, scores, iou_thresh)
    return xyxy[keep], scores[keep], class_ids[keep]

def draw_boxes(img, boxes, scores, class_ids):
    for (box, score, cls) in zip(boxes, scores, class_ids):
        x1, y1, x2, y2 = map(int, box)
        color = (0, 255, 0)
        label = f"{CLASSES[cls]} {score:.2f}"
        cv2.rectangle(img, (x1, y1), (x2, y2), color, 2)
        cv2.putText(img, label, (x1, y1 - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, color, 2)
    return img

# ---------------- MAIN ----------------
if __name__ == "__main__":
    print(f"🔹 Loading RKNN model: {MODEL_PATH}")
    rknn = RKNNLite()
    if rknn.load_rknn(MODEL_PATH) != 0:
        raise SystemExit("❌ Failed to load RKNN model")
    print("✅ Model loaded successfully")
    if rknn.init_runtime() != 0:
        raise SystemExit("❌ Failed to initialize runtime")
    print("✅ Runtime initialized")

    print(f"\n🎥 Starting RTSP stream: {RTSP_URL}")
    vs = VideoStream(RTSP_URL).start()
    time.sleep(1.0)

    frame_buffer = deque(maxlen=BATCH_SIZE)
    meta_data = []
    frame_count = 0
    print("🚀 Running inference... Press 'q' to exit.")

    while True:
        frame = vs.read()
        if frame is None:
            print("⚠️ Frame not received, reconnecting...")
            time.sleep(1)
            continue
        frame_count += 1
        processed, scale, dw, dh = preprocess_image(frame, IMG_SIZE)
        frame_buffer.append(processed)
        meta_data.append((frame, scale, dw, dh))

        if len(frame_buffer) >= BATCH_SIZE:
            input_batch = np.array(frame_buffer, dtype=np.uint8)
            t0 = time.time()
            outputs = rknn.inference(inputs=[input_batch])
            t1 = time.time()
            output = outputs[0]
            fps = BATCH_SIZE / (t1 - t0)
            print(f"✅ Batch done in {(t1 - t0) * 1000:.2f} ms ({fps:.2f} FPS)")
            for i in range(len(frame_buffer)):
                pred = output[i]
                boxes, scores, class_ids = decode_yolo(pred, CONF_THRESH, IOU_THRESH)
                img, scale, dw, dh = meta_data[i]
                if len(boxes) > 0:
                    boxes[:, [0, 2]] -= dw
                    boxes[:, [1, 3]] -= dh
                    boxes /= scale
                img_out = draw_boxes(img, boxes, scores, class_ids)
                cv2.imshow("YOLO RKNN RTSP", img_out)
            frame_buffer.clear()
            meta_data.clear()

        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    vs.stop()
    cv2.destroyAllWindows()
    print("\n✅ RTSP inference stopped.")
```
By lag, I suppose you mean the time difference between receiving an input and displaying the final output. To utilise the 3 NPU cores while reducing the lag that comes from waiting for more frames to arrive (and also from preprocessing and postprocessing), I think the best way is to use 3 threads for inference, each inferencing one frame at a time, with preprocessing and postprocessing done in separate threads. This way NPU utilisation increases, NPU idle time decreases, and so turnaround time decreases as well.
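A rough sketch of what I mean (the model path, queue sizes, and batch-1 model are assumptions, not tested code):

```python
# One-frame-at-a-time inference across 3 NPU cores using threads:
# each worker owns its own RKNNLite instance pinned to one core via core_mask.
import queue
import threading
from rknnlite.api import RKNNLite

MODEL = 'yolo11s_fp16_dynamic.rknn'  # assuming a batch-1 model here
in_q, out_q = queue.Queue(maxsize=32), queue.Queue()

def worker(core_mask):
    lite = RKNNLite()
    assert lite.load_rknn(MODEL) == 0
    assert lite.init_runtime(core_mask=core_mask) == 0
    while True:
        item = in_q.get()
        if item is None:          # poison pill to stop the worker
            break
        idx, img = item           # img: preprocessed (1, 640, 640, 3) uint8
        out_q.put((idx, lite.inference(inputs=[img])))
    lite.release()

cores = [RKNNLite.NPU_CORE_0, RKNNLite.NPU_CORE_1, RKNNLite.NPU_CORE_2]
threads = [threading.Thread(target=worker, args=(c,), daemon=True) for c in cores]
for t in threads:
    t.start()
# Feed preprocessed frames from your 5 cameras with in_q.put((idx, img)),
# and drain/postprocess results from out_q in another thread so the NPUs stay busy.
```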
Though a little loss of accuracy is unavoidable with post-training quantization, it can be reduced by passing a sufficient amount of diverse data during quantization calibration, and by using hybrid quantization if accuracy still suffers. If there is a total loss of accuracy, something is wrong in some step of the quantization. While quantizing YOLOv8 or YOLO11 models, you should pass mean_values=[[0, 0, 0]], std_values=[[255, 255, 255]] as parameters in the rknn.config() call, and while inferencing pass a uint8 (0-255 range) image in the right shape (the layout can be set with the data_format parameter of inference(); the default is "nhwc", which means height and width come before channel).
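For example, the conversion-side config would look like this (a sketch, using the values above):

```python
# Conversion-side config for YOLOv8/YOLO11 quantization:
# the toolkit divides the uint8 input by 255 on the NPU, so the
# inference-side code passes raw 0-255 images with no extra normalisation.
rknn.config(
    mean_values=[[0, 0, 0]],
    std_values=[[255, 255, 255]],
    target_platform='rk3588'
)
```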
Thing is, the calibration dataset was too small, so maybe that's an issue.
For batching, I want to know: suppose there are 50 objects I want to process per camera; will this batch-8 FPS remain constant for up to 8 cameras, and with multi-object detection?