Rockchip Trouble Shooting RKNN Toolkit V1.2.1 EN
Fuzhou Rockchips Electronics Co., Ltd
(All rights reserved)
Content
1. RKNN Toolkit usage related questions
2. Questions related to quantization accuracy
3. Common issues of Caffe model conversion
4. Common issues of TensorFlow model conversion
5. Common issues of PyTorch model conversion
6. RKNN convolution acceleration tips
1.1. What is the meaning of the channel_mean_value parameter in the rknn.config interface?
It includes four values (M0 M1 M2 S0). The first three values are mean parameters and the last value is the scale parameter. If the input data has three channels (Cin0, Cin1, Cin2), the output data after preprocessing will be (Cout0, Cout1, Cout2). The calculation process is as follows:
Cout0 = (Cin0 - M0) / S0
Cout1 = (Cin1 - M1) / S0
Cout2 = (Cin2 - M2) / S0
For example, if you need to normalize the input data into [-1, 1], you can set this parameter to (128 128 128 128). If you need to normalize the input data into [0, 1], you can set it to (0 0 0 255).
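As an illustration, a minimal sketch of this configuration with the RKNN Toolkit Python API (model loading and building omitted) might look like this:

from rknn.api import RKNN

rknn = RKNN()
# (Cin - 128) / 128 maps 8-bit input values into roughly [-1, 1]
rknn.config(channel_mean_value='128 128 128 128', reorder_channel='0 1 2')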
1.2. When the input image is a gray picture with a single channel, how to set the rknn.config interface?
Please refer to the answer of 1.1. When the input image is single channel, only "Cout0 = (Cin0 - M0)/S0" is used, so you can set the parameter to (M0, 0, 0, S0); the values of M1 and M2 are not used.
You don't need to set channel_mean_value or reorder_channel. The default value of mean will be set to 0, and the default value of scale will be set to 1.
1) The speed of the forward inference test is slow; some pictures may take over 0.5s when testing mobilenet-ssd.
2) The time difference between rknn.inference and rknn.eval_perf() is relatively big.
There are two reasons for the slow measured frame rate:
1. Uploading pictures with the PC + adb method is quite slow because of the transmission overhead.
2. In the implementation of version 0.9.8 and earlier, rknn.inference included some extra time (such as data transfer) that rknn.eval_perf() does not count.
For a more realistic measured frame rate, you can use the C/C++ API to test directly on the board.
1.7. The first inference with RKNN Toolkit version 0.9.9 is very slow
RKNN Toolkit version 0.9.9 postpones model loading to the first inference, so the first inference is relatively slow. This issue has been resolved in versions 1.0.0 and later.
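If you have to stay on 0.9.9, a simple workaround when benchmarking is to run one warm-up inference before timing; a minimal sketch (img is assumed to be an already preprocessed numpy array):

import time

rknn.inference(inputs=[img])            # warm-up: this call absorbs the model loading time
start = time.time()
outputs = rknn.inference(inputs=[img])  # subsequent calls reflect the real speed
print('inference time: %.3f s' % (time.time() - start))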
The Arm64 version of RKNN Toolkit doesn't support pre_compile so far; if you need to enable the pre_compile option, please convert the model with the x86_64 PC version of RKNN Toolkit.
1.9. The returned outputs of the YOLO forward test are [array1, array2] with lengths [10140, 40560]; what is the meaning of the returned values?
The outputs returned by rknn.inference are a list of numpy ndarrays. The size and number of the output tensors differ from model to model, so users need to look up the corresponding output definition of the model and do the post-processing accordingly.
⚫ Quantization-aware training
TensorFlow provides a quantization-aware training framework
(https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/quantize), which requires the user to have some experience with re-training and fine-tuning. Use rknn.build(do_quantization=False) after the quantized model is loaded through RKNN Toolkit; RKNN Toolkit will then use the model's own quantization parameters, so there is no loss of quantization accuracy.
⚫ Post-training quantization
When using this method, the user loads the well-trained floating point model, and RKNN Toolkit does the quantization according to the dataset provided by the user. The dataset should cover as many of the model's input types as possible. To keep the examples simple, generally only one picture is provided.
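A minimal sketch of the post-training quantization flow (file names are placeholders; the model is assumed to be already loaded):

# dataset.txt lists the calibration images, one path per line
rknn.build(do_quantization=True, dataset='./dataset.txt')
rknn.export_rknn('./model.rknn')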
✓ asymmetric_quantized-u8 (default)
This is the asymmetric quantization method proposed by Google. According to the description in "Quantizing deep convolutional networks for efficient inference: A whitepaper", the accuracy loss of this quantization method is the smallest for most networks.
The quantization formula is:
quant = round(float_num / scale) + zero_point
Where 'quant' represents the quantized number and 'float_num' represents the floating point number; the data type of 'scale' is float32; the data type of 'zero_point' is int32, and it represents the quantized value corresponding to the real number 0. Finally, 'quant' is saturated to [range_min, range_max].
Currently only the inverse quantization (dequantization) of u8 is supported; the calculation formula is as follows:
float_num = scale * (quant - zero_point)
✓ dynamic_fixed_point-8
For some models, the quantization accuracy of dynamic_fixed_point-8 is higher than
asymmetric_quantized-u8.
The quantization formula is:
quant = round(float_num * 2^fl)
Where 'quant' represents the quantized number, 'float_num' represents the floating point number, and 'fl' is the number of bits shifted to the left. Finally, 'quant' is saturated to [range_min, range_max].
✓ dynamic_fixed_point-16
The quantization formula of dynamic_fixed_point-16 is the same as that of dynamic_fixed_point-8, except that bw = 16. For RK3399Pro/RK1808 there is a 300 GOPS int16 computing unit inside the NPU, so for networks that show relatively high accuracy loss when quantized to 8 bits, you can consider using this quantization method.
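The following minimal Python sketch illustrates the quantization formulas above with made-up values (plain arithmetic for illustration, not RKNN code):

import numpy as np

def asymmetric_quantize_u8(x, scale, zero_point):
    # quant = round(float_num / scale) + zero_point, saturated to [0, 255]
    return int(np.clip(round(x / scale) + zero_point, 0, 255))

def asymmetric_dequantize_u8(q, scale, zero_point):
    # float_num = scale * (quant - zero_point)
    return scale * (q - zero_point)

def dynamic_fixed_point_quantize(x, fl, bw=8):
    # quant = round(float_num * 2^fl), saturated to the signed bw-bit range
    lo, hi = -(2 ** (bw - 1)), 2 ** (bw - 1) - 1
    return int(np.clip(round(x * (2 ** fl)), lo, hi))

q = asymmetric_quantize_u8(0.5, scale=0.02, zero_point=128)
print(q, asymmetric_dequantize_u8(q, scale=0.02, zero_point=128))  # 153 0.5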
There are two scenarios. When the loaded model is a quantized model, do_quantization=False will use the quantization parameters of the model; for more details please refer to the answer of 1.9. When the loaded model is a non-quantized model, do_quantization=False will not do quantization, but will convert the weights from float32 to float16.
This is because there is no data in dataset.txt, or the data format is not supported. It is recommended to use jpg or npy files.
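For reference, dataset.txt is a plain text file with one input path per line, for example (hypothetical paths):

./images/img_0001.jpg
./images/img_0002.jpg
./inputs/sample.npy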
The reason for this error is that the Python environment is not clean; for example, numpy is installed in two different paths. You can rebuild a clean Python environment and try again.
The reason is that there is no root permission. You need to add the '--user' option when installing.
1.16. What is the role of the dataset during RKNN quantization? Why does quantization need a dataset?
During RKNN quantization, appropriate quantization parameters, such as scale and zero point, need to be determined. These quantization parameters should be selected according to inference results on actual inputs, which is what the dataset provides.
The RKNN Toolkit needs to be upgraded to version 1.2.0 or later, and you need to specify the number of input images when building the RKNN model. For detailed usage, refer to the corresponding documentation.
1.18. When will converting PyTorch and MXNet models directly to rknn be supported?
The function of converting PyTorch models directly to rknn is under development. There is no plan for MXNet so far.
1.19. Pre-compiled models generated by RKNN Toolkit 0.9.9 cannot run on an RK3399Pro whose NPU driver version is 0.9.6
Pre-compiled models generated by RKNN Toolkit 1.0.0 cannot run on devices with an old NPU driver (version < 0.9.6), and pre-compiled models generated by old RKNN Toolkit versions (< 1.0.0) cannot run on devices with the new NPU driver (version == 0.9.6). The driver version number can be queried through the get_sdk_version interface.
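A minimal sketch of the query (model path and target are placeholders; the runtime must be initialized first):

from rknn.api import RKNN

rknn = RKNN()
rknn.load_rknn('./model.rknn')
rknn.init_runtime(target='rk1808')   # or 'rk3399pro', depending on the device
print(rknn.get_sdk_version())        # prints the API version and NPU driver version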
1.20. When I load a model, the numpy module raises an error: Object arrays cannot be loaded when allow_pickle=False
This error is caused by the change in the default value of the allow_pickle parameter of the load file interface after numpy was upgraded to 1.16.3. There are two solutions: one is to downgrade numpy to version 1.16.2 or lower; the other is to update RKNN Toolkit to version 1.0.0 or later.
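For example, numpy can be pinned to a compatible version with:

pip install numpy==1.16.2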
1) Make sure that the RKNN Toolkit and the firmware of the devices have been upgraded to the latest and matching versions. The version information can also be queried through the get_sdk_version interface.
2) Make sure the "adb devices" command can find the device, and that the target and device_id parameters are set correctly.
3) If you use RKNN Toolkit 1.1.0 and above, make sure rknn.list_devices() can get the device list.
4) If you are using a compute stick or NTB mode for the RK1808 EVB, make sure you have called update_rk1808_usb_rule.sh (contained in the RKNN Toolkit distribution) to get read and write permissions for the USB device.
5) If you are running the AARCH64 version of the RKNN Toolkit directly on the RK3399/RK3399Pro, make sure the system firmware has been upgraded to the latest version.
rknn.base.ovxconfiggenerator.generate_vx_config_from_files
rknn.base.RKNNlib.app.exporter.ovxlib_case.casegenerator.CaseGenerator.generate
rknn.base.RKNNlib.app.exporter.ovxlib_case.casegenerator.CaseGenerator._gen_special_case
rknn.base.RKNNlib.app.exporter.ovxlib_case.casegenerator.CaseGenerator._gen_nb_file
1.24. After upgrading to RKNN Toolkit 1.2.0, there are 200 pictures in dataset.txt but the accuracy of the rknn model is very low. Are all these pictures used for quantization correction?
RKNN Toolkit 1.2.0 adjusts the default value of batch_size in the config interface. In this version, if you want to use multiple pictures for quantization correction, the value of this parameter should be set to the corresponding number of pictures. If this value is set too large, it may cause program exceptions due to exhaustion of system memory; in this situation, you need to upgrade to version 1.2.1 or later. In version 1.2.1, the default value of batch_size is restored to 100, and quantization correction with multiple batches can be achieved with the epochs parameter. The number of images used for quantization correction is the product of batch_size and epochs. For example, if there are 200 pictures in the dataset file, then setting batch_size to 100 and epochs to 2, or batch_size to 200 and epochs to 1, both achieve quantization correction with 200 pictures, but the peak memory usage of the former is lower than that of the latter. If you only want to use 100 of them, you can set batch_size to 100 and epochs to 1.
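For example, assuming batch_size and epochs are both passed to rknn.config as described above, the lower-memory configuration for 200 pictures looks like:

# 100 images per batch, 2 epochs: 100 * 2 = 200 images used for quantization correction
rknn.config(channel_mean_value='128 128 128 128', reorder_channel='0 1 2',
            batch_size=100, epochs=2)
rknn.build(do_quantization=True, dataset='./dataset.txt')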
1.25. The shape of numpy array in dataset is (4, 640, 480), but when
building quantized rknn model, the log prompts shape (640, 480,
1.26. Does the size of the image used for quantization correction have to be the same as the input size of the model?
It is not required. RKNN Toolkit automatically scales images. However, because scaling can change the image information, it may have some impact on the accuracy, so it is better to use pictures of a similar size.
1.27. When using the RKNN Toolkit, if the logging module is used in the
1.29. What deep learning framework does the RKNN Toolkit support?
The deep learning frameworks supported by the RKNN Toolkit include TensorFlow, TensorFlow Lite, Caffe, ONNX and Darknet.
The corresponding versions of each deep learning framework are as follows:
RKNN Toolkit   TensorFlow           TF Lite              Caffe         ONNX            Darknet
1.0.0          >=1.10.0, <=1.13.2   Schema version = 3   1.0 version   Release 1.3.0   Latest commit: 810d7f7
1.1.0          >=1.10.0, <=1.13.2   Schema version = 3   1.0 version   Release 1.3.0   Latest commit: 810d7f7
1.2.0          >=1.10.0, <=1.13.2   Schema version = 3   1.0 version   Release 1.3.0   Latest commit: 810d7f7
1.2.1          >=1.10.0, <=1.13.2   Schema version = 3   1.0 version   Release 1.3.0   Latest commit: 810d7f7
Note:
1. In compliance with semver, SavedModels written with one version of TensorFlow can
be loaded and evaluated with a later version of TensorFlow with the same major
release. So in theory, pb files generated by TensorFlow versions before 1.14.0 are supported by RKNN Toolkit 1.0.0 and later. For more information on TensorFlow version
compatibility, please refer to the official link:
https://www.tensorflow.org/guide/versions
2. RKNN Toolkit uses the TF Lite schema from the commits at this link:
https://github.com/tensorflow/tensorflow/commits/master/tensorflow/lite/schema/sc
hema.fbs
Commit hash: 0c4f5dfea4ceb3d7c0b46fc04828420a344f7598.
Because TF Lite schema versions may not be compatible with each other, TF Lite models with an older or newer schema may not be loaded successfully.
3. RKNN Toolkit uses two Caffe protocols: one based on the officially modified protocol of berkeley, and one based on a protocol containing the LSTM layer. The protocol based on the official berkeley revision comes from this link: https://github.com/BVLC/caffe/tree/master/src/caffe/proto, commit hash 21d0608; on this basis RKNN Toolkit has added some OPs. The protocol containing the LSTM layer refers to: https://github.com/xmfbit/warpctc-caffe/tree/master/src/caffe/proto, commit hash bd6181b. These two protocols are specified by the proto parameter in the load_caffe interface (see the sketch after this list).
4. The relationship between the ONNX release version, the opset version and the IR version is described on the official website:
https://github.com/microsoft/onnxruntime/blob/master/docs/Versioning.md
ONNX release version   ONNX opset version   Supported ONNX IR version
1.3.0                  8                    3
1.4.1                  9                    3
5. Darknet official Github link: https://github.com/pjreddie/darknet. Our current conversion rules are based on the latest commit of the master branch (commit value: 810d7f7).
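As a sketch of note 3 above, the protocol is selected when loading the model (file names are placeholders; 'lstm_caffe' is the value assumed for the LSTM-capable protocol):

rknn.load_caffe(model='./model.prototxt',
                proto='caffe',               # or 'lstm_caffe' for models containing LSTM layers
                blobs='./model.caffemodel')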
2.1. The accuracy doesn't match the original model after quantization; how to debug?
➢ First make sure the accuracy of the float model is similar to the test result on the original platform:
(1) Call rknn.build(do_quantization=False) when the model is loaded by RKNN Toolkit.
(2) Refer to 1.1 to set the channel_mean_value parameter, which should be the same as the preprocessing used for training.
(3) Make sure the input image channel order is R,G,B while testing: whatever channel order was used for training, the image must be fed in as R,G,B, and the reorder_channel parameter ('0 1 2' stands for RGB, '2 1 0' stands for BGR) must be consistent with the channel order used for training.
➢ Accuracy test after quantization:
(1) Use multiple pictures for quantization to ensure stable quantization accuracy. Set the batch_size parameter in rknn.config (recommended batch_size = 200) and provide the corresponding number of pictures in the dataset file. If the memory is not enough, you can set batch_size = 1 and epochs = 200 instead.
(2) For the accuracy comparison, try to use a relatively big dataset. Compare top-1 and top-5 accuracy for classification networks, and compare mAP and Recall for detection networks.
Currently the PC simulator supports dumping the data of each layer of the network. You need to set an environment variable as follows:
export NN_LAYER_DUMP=1
python xxx.py
After execution, a tensor data file for each layer of the network will be generated in the current directory, and you can then compare them with the data of other frameworks layer by layer.
Note that some layers may be combined; for example, conv+bn+scale may be combined into one conv. In this case, you need to compare with the output of the scale layer of the original model.
The RKNN Toolkit currently supports quantized models from two frameworks: TensorFlow and TensorFlow Lite.
It means this model is an old version caffe model. You need to change the input layer to the following format:
layer {
  name: "data"
  type: "Input"
  top: "data"
  input_param {
    shape {
      dim: 1
      dim: 3
      dim: 224
      dim: 224
    }
  }
}
The round_mode field of the Pool layer cannot be recognized; you can change it to ceil_mode. For example, if it was originally round_mode: CEIL, you can delete the line (ceil_mode is True by default) or change it to ceil_mode: true.
The detectionoutput layer is not supported; you can delete it and implement that part on the CPU instead.
3.5. There should be three output tensors after the Caffe version SSD model deletes detectionoutput, but RKNN inference actually returns only two tensors
The missing tensor is the priori box. It is the same during the training and inference stages and for all inputs, so in order to improve performance, RKNN Toolkit optimized the related layer away in the model. If you want to get the priori box tensor, you can save it in advance, or use Caffe to run the model once and save the priori box output.
Compared with the official code, you need to change the 'proposal' layer of the prototxt as below:
layer {
  name: 'proposal'
  type: 'proposal'
  bottom: 'rpn_cls_prob_reshape'
  bottom: 'rpn_bbox_pred'
  top: 'rois'
  top: 'scores'
  proposal_param {
    ratio: 0.5 ratio: 1.0 ratio: 2.0
    scale: 8 scale: 16 scale: 32
    base_size: 16
    feat_stride: 16
    pre_nms_topn: 6000
    post_nms_topn: 300
    nms_thresh: 0.7
    min_size: 16
  }
}
The main reason is that the version of the caffe model is too old and needs to be upgraded:
1) Download the Caffe source code.
2) Compile Caffe.
3) Use the generated tools (upgrade_net_proto_text for prototxt files, upgrade_net_proto_binary for caffemodel files) to upgrade the old model.
One possible reason is that the input node is not correct. You can modify it as below:
rknn.load_tensorflow(tf_pb='./ssd_mobilenet_v2_coco_2018_03_29/frozen_inference_graph.pb',
inputs=['FeatureExtractor/MobilenetV2/MobilenetV2/input'],
outputs=['concat', 'concat_1'],
input_size_list=[[INPUT_SIZE, INPUT_SIZE, 3]])
4.3. On RKNN Toolkit 1.0.0, is the output shape of an RKNN model converted from TensorFlow changed?
Versions prior to 1.0.0 converted the output shape from "NHWC" to "NCHW". Starting from this version, the shape of the output is consistent with the original model and is no longer converted from "NHWC" to "NCHW". Please pay attention to the location of the channel dimension when doing post-processing.
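If your post-processing code still expects NCHW data, you can transpose the output yourself; a minimal numpy sketch for a 4-dimensional output:

import numpy as np

outputs = rknn.inference(inputs=[img])             # img: preprocessed input
out_nchw = np.transpose(outputs[0], (0, 3, 1, 2))  # NHWC -> NCHW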
Currently RKNN Toolkit supports pytorch indirectly through ONNX, so you need to convert the pytorch model to ONNX first. If an issue occurs during the conversion, please update RKNN Toolkit to the latest version first.
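A minimal sketch of the pytorch-to-ONNX step (model and input shape are placeholders):

import torch

model.eval()                               # your trained torch.nn.Module
dummy_input = torch.randn(1, 3, 224, 224)  # placeholder input shape
torch.onnx.export(model, dummy_input, 'model.onnx')
# afterwards: rknn.load_onnx(model='model.onnx')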
This issue was introduced after pytorch version 0.4.5. In your model, if there is something like
6. RKNN convolution acceleration tips
Convolution cores can support a large range of kernel sizes. The NN Engine performs most optimally when the convolution kernel size is 3x3. Non-square kernels are also supported, but with some computation overhead.
The convolution core can fuse ReLU and MAX pooling operations on the fly to further reduce computation and bandwidth overhead. A ReLU layer following a convolution layer will always be fused, while MAX pooling layer fusion has the following restrictions:
- 3x3 pooling must have an odd input size (not equal to one) and no padding.
- The horizontal input size must be less than 64 (8-bit mode) or 32 (16-bit mode) if the pool size is 3x3.
3. Depthwise Convolutions
Depthwise convolution is not friendly to quantized models. It is recommended to use ordinary 2D convolution whenever possible when designing the network.
If you must use a depthwise convolution, it is recommended to follow the rules below:
- Remove the BN layer and activation layer of the depthwise convolution layer.
It is recommended to set the number of convolution output channels to be a multiple of the number of convolution kernels in the NPU, to ensure that all convolution kernels are better utilized.
Networks usually have some redundancy in their design. Pruning a network to be sparse has been proven to reduce computations and memory fetches on zero values. The sparsity level can be fine-grained down to individual weights. Designing a sparse network to take advantage of this technology could improve performance.