GPU/NPU Not Utilized During ML Tasks (YOLO .pt/.rknn) – Clarification on Driver Support

Hello,

While running machine learning tasks such as YOLO models (.pt and .rknn formats), I noticed that the CPU is heavily used, whereas the GPU and NPU are either idle or only minimally active (e.g., one NPU core occasionally shows ~2% usage).

I’m wondering:

  1. Do we need to manually install ARM Mali GPU drivers or any additional components to enable GPU acceleration?
  2. Are there official or recommended GPU/NPU drivers provided by Vicharak for optimal hardware utilization?
  3. Any specific configuration needed to make full use of the NPU/GPU for inference?

Would appreciate any guidance on improving performance for ML workloads.

Thank you!

Which Image are you using?

Is it Ubuntu 24.04 LTS, Kernel 6.1?

To utilize the NPU, you need to run models in .rknn format using rknn-toolkit2 if you are using C++, or its Python wrapper rknn-toolkit-lite2 for doing inference in Python. The quick guide regarding using the NPU on Axon can be found here.
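As a minimal sketch of the Python path described above: the rknn-toolkit-lite2 flow is load → init runtime → inference. The model path, input array, and core-mask value below are placeholder assumptions (your model file and preprocessing will differ); the library import is done inside the function so the sketch loads even on a machine without the NPU stack.

```python
def run_npu_inference(model_path, input_array, core_mask=None):
    """Run one inference with rknn-toolkit-lite2 (sketch, assumes a compiled
    .rknn file and an input already preprocessed to the model's shape)."""
    # Imported here so this file can be read/loaded without the library installed.
    from rknnlite.api import RKNNLite

    rknn = RKNNLite()
    if rknn.load_rknn(model_path) != 0:
        raise RuntimeError("load_rknn failed")

    # core_mask (optional) pins the runtime to specific NPU cores,
    # e.g. RKNNLite.NPU_CORE_0; omit it to let the runtime choose.
    ret = rknn.init_runtime(core_mask=core_mask) if core_mask is not None \
        else rknn.init_runtime()
    if ret != 0:
        raise RuntimeError("init_runtime failed")

    outputs = rknn.inference(inputs=[input_array])
    rknn.release()
    return outputs
```

On the board you would call it as, for example, `run_npu_inference("yolo.rknn", img)` where `img` is your preprocessed frame.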

For utilizing the GPU of Axon, as with any other Arm-based GPU, you would need libraries that can run neural networks on Arm GPUs, for example via OpenCL or other GPU programming frameworks. One repo providing an OpenCL wrapper to run neural networks and AI on the GPU that I could find is this.
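Before picking an OpenCL-based framework, it's worth confirming the GPU is actually visible to OpenCL at all. A hedged check, assuming you install `pyopencl` yourself and that a working OpenCL ICD for the Mali GPU is present (if it isn't, this simply returns an empty list):

```python
def list_opencl_devices():
    """Return (platform, device) name pairs visible to OpenCL, or [] if
    pyopencl or an OpenCL driver is unavailable."""
    try:
        import pyopencl as cl
    except ImportError:
        return []  # pyopencl not installed
    found = []
    for platform in cl.get_platforms():
        for dev in platform.get_devices():
            found.append((platform.name, dev.name))
    return found
```

If the Mali GPU does not appear in the output, no OpenCL-based ML library will be able to use it, regardless of model format.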

The Axon NPU has 3 cores; to utilize it fully, you might need to process input in batches or run inference on the 3 cores from 3 separate threads in a multi-threaded way.
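The three-threads-for-three-cores idea above can be sketched with a simple job queue. The inference call here is a stand-in so the sketch runs anywhere; the comments mark where the per-thread RKNNLite instance pinned to one core would go (core-mask constant names are assumptions based on the RK3588 toolkit):

```python
import queue
import threading

NPU_CORES = 3  # the Axon NPU has three cores

def npu_worker(core_id, jobs, results):
    """One worker per NPU core. In real use, create one RKNNLite instance
    per thread and pin it to a core, roughly:
        rknn = RKNNLite(); rknn.load_rknn("model.rknn")
        rknn.init_runtime(core_mask=[RKNNLite.NPU_CORE_0,
                                     RKNNLite.NPU_CORE_1,
                                     RKNNLite.NPU_CORE_2][core_id])
    Stand-in below so the sketch runs without the hardware:"""
    def infer(frame):
        return (core_id, frame)  # placeholder for rknn.inference(inputs=[frame])

    while True:
        item = jobs.get()
        if item is None:  # sentinel: no more work
            break
        results.put(infer(item))

def run_parallel(frames):
    """Fan frames out across NPU_CORES worker threads and collect results."""
    jobs, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=npu_worker, args=(i, jobs, results))
               for i in range(NPU_CORES)]
    for t in threads:
        t.start()
    for f in frames:
        jobs.put(f)
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return [results.get() for _ in frames]
```

Results come back in completion order, not submission order, so tag each frame with an index if ordering matters for your pipeline.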

The drivers related to the GPU should already be present in the latest OS image for Axon.


I would like to share a quick update and seek further clarification regarding GPU utilization during inference:

  • Initially, my system was running Kernel 5.x, but after an update it is now on Kernel 6.1 with Ubuntu 22.04.
  • I am running inference using models in .rknn format through rknn-toolkit-lite2. However, I’ve observed that during execution, the CPU usage spikes to 300–400%, while both the GPU and NPU show minimal or no usage.

Given this, I wanted to ask:

Are there any additional GPU drivers or components that need to be installed manually to enable GPU acceleration on this setup?

The .rknn model can’t be run on the GPU with any framework or library. To run models on the GPU, you need GPU-supported libraries or frameworks, plus model files in the format they require. The driver for the GPU is already installed; you just need a program that uses the GPU.
The .rknn models run through rknn-toolkit-lite2 use the NPU for the operators it supports (mostly compute-heavy ones such as convolution and matrix multiplication), while other lightweight operators and memory-copy operations are performed by the CPU. CPU usage can spike if the model is small and the pre-processing and post-processing steps are significant compared to the rknn_lite inference itself. If the model is lightweight, it may not be able to use the NPU fully and will finish inference quickly. Also, rknn-toolkit-lite2 will only use the NPU core(s) you specify when initializing the runtime with RKNNLite.init_runtime(). Try running multiple inferences in a loop and comparing the speed of the .rknn model with the same model in another format running on the CPU to see the difference.
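The loop-and-compare suggestion above can be done with a small timing helper. The inference callables you pass in are up to you (e.g. a wrapper around `rknn_lite.inference` versus the same model in a CPU framework); the lambda in the usage note is just a stand-in:

```python
import time

def benchmark(infer_fn, sample, warmup=3, iters=20):
    """Return mean seconds per call for an inference callable.

    warmup runs are discarded because first calls often include one-time
    setup cost (runtime init, memory allocation, caches)."""
    for _ in range(warmup):
        infer_fn(sample)
    start = time.perf_counter()
    for _ in range(iters):
        infer_fn(sample)
    return (time.perf_counter() - start) / iters
```

Usage would look like `benchmark(lambda x: rknn.inference(inputs=[x]), img)` versus the CPU-framework equivalent on the same preprocessed input; a large gap in favor of the .rknn path confirms the NPU is doing the heavy operators.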
