mmdeploy Documentation
Release 1.3.1
MMDeploy Contributors
Contents

1 Get Started
 1.1 Introduction
 1.2 Prerequisites
 1.3 Installation
 1.4 Convert Model
 1.5 Inference Model
 1.6 Evaluate Model
9 Quantize model
 9.1 Why quantization?
 9.2 Post training quantization scheme
 9.3 How to convert model
 9.4 Custom calibration dataset
10 Useful Tools
 10.1 torch2onnx
 10.2 extract
 10.3 onnx2pplnn
 10.4 onnx2tensorrt
 10.5 onnx2ncnn
 10.6 profiler
 10.7 generate_md_table
11 SDK Documentation
 11.1 Setup & Usage
 11.2 API Reference
12 Supported models
 12.1 Note
13 Benchmark
 13.1 Backends
 13.2 Latency benchmark
 13.3 Performance benchmark
15 Test on TVM
 15.1 Supported Models
 15.2 Test
17 MMPretrain Deployment
 17.1 Installation
 17.2 Convert model
 17.3 Model Specification
 17.4 Model inference
 17.5 Supported models
18 MMDetection Deployment
 18.1 Installation
 18.2 Convert model
 18.3 Model specification
 18.4 Model inference
 18.5 Supported models
 18.6 Reminder
19 MMSegmentation Deployment
 19.1 Installation
 19.2 Convert model
 19.3 Model specification
 19.4 Model inference
 19.5 Supported models
 19.6 Reminder
20 MMagic Deployment
 20.1 Installation
 20.2 Convert model
 20.3 Model specification
 20.4 Model inference
 20.5 Supported models
 27.1 Introduction of ONNX Runtime
 27.2 Installation
 27.3 Build custom ops
 27.4 How to convert a model
 27.5 How to add a new custom op
 27.6 Reminder
 27.7 References
 37.8 TRTBatchedRotatedNMS
 37.9 GridPriorsTRT
 37.10 ScaledDotProductAttentionTRT
 37.11 GatherTopk
 37.12 MMCVMultiScaleDeformableAttention
47 Cross compile snpe inference server on Ubuntu 18
 47.1 1. Environment
 47.2 2. Cross compile gRPC with NDK
 47.3 3. (Skipable) Self-test whether NDK gRPC is available
 47.4 4. Cross compile snpe inference server
 47.5 5. Regenerate the proto interface
 47.6 Reference
49 English
51 apis
52 apis/tensorrt
53 apis/onnxruntime
54 apis/ncnn
55 apis/pplnn
Index
CHAPTER ONE: GET STARTED
MMDeploy provides useful tools for deploying OpenMMLab models to various platforms and devices.
With their help, you can not only deploy models using our pre-defined pipelines but also customize your own deployment pipeline.
1.1 Introduction
In MMDeploy, the deployment pipeline can be illustrated as a sequence of modules, i.e., Model Converter, MMDeploy Model and Inference SDK.
Model Converter aims at converting training models from OpenMMLab into backend models that can be run on target devices. It is able to transform a PyTorch model into an IR model, i.e., ONNX or TorchScript, as well as convert an IR model into a backend model. By combining the two steps, we can achieve one-click end-to-end model deployment.
MMDeploy Model is the result package exported by Model Converter. Besides the backend models, it also includes the model meta info, which will be used by Inference SDK.
Inference SDK is developed in C/C++, wrapping the preprocessing, model forward and postprocessing modules of model inference. It provides FFIs such as C, C++, Python, C#, Java and so on.
1.2 Prerequisites
In order to do an end-to-end model deployment, MMDeploy requires Python 3.6+ and PyTorch 1.8+.
Step 0. Download and install Miniconda from the official website.
Step 1. Create a conda environment and activate it.
On CPU platforms:
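A minimal sketch of this step, assuming conda is used (the version placeholders are for you to fill in):

conda create --name mmdeploy python=3.8 -y
conda activate mmdeploy
conda install pytorch=={pytorch_version} torchvision=={torchvision_version} cpuonly -c pytorch

On GPU platforms:

conda install pytorch=={pytorch_version} torchvision=={torchvision_version} cudatoolkit={cudatoolkit_version} -c pytorch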
Note: On GPU platforms, please ensure that {cudatoolkit_version} matches your host CUDA toolkit version. Otherwise, it may cause conflicts when deploying the model with TensorRT.
1.3 Installation
export CUDNN_DIR=$(pwd)/cuda
export LD_LIBRARY_PATH=$CUDNN_DIR/lib64:$LD_LIBRARY_PATH
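Besides the inference-engine environment above, the MMDeploy model converter and SDK runtime are typically installed from PyPI; a sketch, assuming the package versions match this release:

# install the model converter
pip install mmdeploy==1.3.1
# install the SDK runtime for GPU inference (use mmdeploy-runtime for CPU-only inference)
pip install mmdeploy-runtime-gpu==1.3.1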
1.4 Convert Model
After the installation, you can enjoy the model deployment journey starting from converting a PyTorch model to a backend model by running tools/deploy.py.
Based on the above settings, we provide an example to convert the Faster R-CNN in MMDetection to TensorRT as
below:
# clone mmdeploy to get the deployment config. `--recursive` is not necessary
git clone -b main https://github.com/open-mmlab/mmdeploy.git
# clone mmdetection repo. We have to use the config file to build PyTorch nn module
git clone -b 3.x https://github.com/open-mmlab/mmdetection.git
cd mmdetection
mim install -v -e .
cd ..
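With both repositories in place, the conversion command itself might look like the sketch below (the checkpoint path is a placeholder to be downloaded from the MMDetection model zoo, and the MMDetection config name is an assumption):

python mmdeploy/tools/deploy.py \
    mmdeploy/configs/mmdet/detection/detection_tensorrt_dynamic-320x320-1344x1344.py \
    mmdetection/configs/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py \
    /path/to/faster_rcnn_r50_fpn_1x_coco.pth \
    mmdetection/demo/demo.jpg \
    --work-dir mmdeploy_model/faster-rcnn \
    --device cuda \
    --dump-info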
The converted model and its meta info can be found in the path specified by --work-dir. Together they make up the MMDeploy Model, which can be fed to the MMDeploy SDK to do model inference.
For more details about model conversion, you can read how_to_convert_model. If you want to customize the conversion
pipeline, you can edit the config file by following this tutorial.
Tip: You can convert the above model to an ONNX model and perform ONNX Runtime inference just by changing 'detection_tensorrt_dynamic-320x320-1344x1344.py' to 'detection_onnxruntime_dynamic.py' and setting '--device' to 'cpu'.
1.5 Inference Model
After model conversion, we can perform inference not only by Model Converter but also by Inference SDK.
Model Converter provides a unified API named inference_model to do the job, making the APIs of all inference backends transparent to users. Take the previously converted Faster R-CNN TensorRT model for example:
from mmdeploy.apis import inference_model
# model_cfg: the MMDetection model config; deploy_cfg: the deploy config used during conversion
result = inference_model(model_cfg, deploy_cfg,
                         backend_files=['mmdeploy_model/faster-rcnn/end2end.engine'],
                         img='mmdetection/demo/demo.jpg',
                         device='cuda:0')
Note: ‘backend_files’ in this API refers to backend engine file path, which MUST be put in a list, since some inference
engines like OpenVINO and ncnn separate the network structure and its weights into two files.
You can directly run MMDeploy demo programs in the precompiled package to get inference results.
wget https://github.com/open-mmlab/mmdeploy/releases/download/v1.3.1/mmdeploy-1.3.1-linux-x86_64-cuda11.8.tar.gz
tar xf mmdeploy-1.3.1-linux-x86_64-cuda11.8.tar.gz
cd mmdeploy-1.3.1-linux-x86_64-cuda11.8
# run python demo
python example/python/object_detection.py cuda ../mmdeploy_model/faster-rcnn ../mmdetection/demo/demo.jpg
Note: In the above command, the input model is the SDK Model path. It is NOT the engine file path but the path passed to --work-dir. It includes not only the engine files but also meta information such as 'deploy.json' and 'pipeline.json'.
In the next section, we will provide examples of deploying the converted Faster R-CNN model with the SDK's different FFIs (Foreign Function Interfaces).
Python API
from mmdeploy_runtime import Detector
import cv2

img = cv2.imread('mmdetection/demo/demo.jpg')
# create a detector
detector = Detector(model_path='mmdeploy_models/faster-rcnn', device_name='cuda', device_id=0)
# run inference; returns bounding boxes, labels and (optional) masks
bboxes, labels, masks = detector(img)
# draw the detections on img as needed, then save the visualization
cv2.imwrite('output_detection.png', img)
C++ API
Now let’s apply this procedure on the above Faster R-CNN model.
#include <cstdlib>
#include <opencv2/opencv.hpp>
#include "mmdeploy/detector.hpp"
int main() {
const char* device_name = "cuda";
int device_id = 0;
std::string model_path = "mmdeploy_model/faster-rcnn";
std::string image_path = "mmdetection/demo/demo.jpg";
// 1. load model
mmdeploy::Model model(model_path);
// 2. create predictor
mmdeploy::Detector detector(model, mmdeploy::Device{device_name, device_id});
// 3. read image
cv::Mat img = cv::imread(image_path);
// 4. inference
auto dets = detector.Apply(img);
// 5. deal with the result. Here we choose to visualize it
for (int i = 0; i < dets.size(); ++i) {
  const auto& box = dets[i].bbox;
  fprintf(stdout, "box %d, left=%.2f, top=%.2f, right=%.2f, bottom=%.2f, label=%d, score=%.4f\n",
          i, box.left, box.top, box.right, box.bottom, dets[i].label_id, dets[i].score);
}
return 0;
}
When you build this example, add the MMDeploy package to your CMake project as follows. Then pass -DMMDeploy_DIR to cmake, pointing to the directory where MMDeployConfig.cmake is located. You can find it in the prebuilt package.
find_package(MMDeploy REQUIRED)
target_link_libraries(${name} PRIVATE mmdeploy ${OpenCV_LIBS})
For more SDK C++ API usages, please read these samples.
For the rest C, C# and Java API usages, please read C demos, C# demos and Java demos respectively. We'll talk about them more in our SDK tutorials.
Accelerate preprocessing (Experimental)
1.6 Evaluate Model
You can test the performance of the deployed model using tools/test.py. For example,
python ${MMDEPLOY_DIR}/tools/test.py \
${MMDEPLOY_DIR}/configs/detection/detection_tensorrt_dynamic-320x320-1344x1344.py \
${MMDET_DIR}/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py \
--model ${BACKEND_MODEL_FILES} \
--metrics ${METRICS} \
--device cuda:0
Note: Regarding the --model option, it represents the converted engine file path when using Model Converter to do the performance test. But when you test metrics with the Inference SDK, this option refers to the directory path of the MMDeploy Model.
CHAPTER TWO
2.1 Download
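The download step normally clones the repository together with its submodules; a minimal sketch (branch name assumed to be main):

git clone -b main --recursive git@github.com:open-mmlab/mmdeploy.git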
Note:
• If fetching the submodules fails, you can get them manually by following these instructions:
cd mmdeploy
git clone git@github.com:NVIDIA/cub.git third_party/cub
cd third_party/cub
git checkout c3cceac115
cd ..
git clone git@github.com:gabime/spdlog.git spdlog
cd spdlog
git checkout 9e8e52c048
• If it fails when git clone via SSH, you can try the HTTPS protocol like this:
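For example, cloning over HTTPS with submodules instead (branch name assumed to be main):

git clone -b main --recursive https://github.com/open-mmlab/mmdeploy.git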
2.2 Build
Please visit the following links to find out how to build MMDeploy according to the target platform.
• Linux-x86_64
• Windows
• MacOS
• Android-aarch64
• NVIDIA Jetson
• SNPE
• RISC-V
• Rockchip
CHAPTER THREE
MMDeploy provides prebuilt docker images on Docker Hub for the convenience of its users. The docker images are built on the latest and released versions. For instance, the image with tag openmmlab/mmdeploy:ubuntu20.04-cuda11.8-mmdeploy is built on the latest mmdeploy, and the image with tag openmmlab/mmdeploy:ubuntu20.04-cuda11.8-mmdeploy1.2.0 is for mmdeploy==1.2.0. The specifications of the Docker Image are shown below.
You can select a tag and run docker pull to get the docker image:
export TAG=openmmlab/mmdeploy:ubuntu20.04-cuda11.8-mmdeploy
docker pull $TAG
If the prebuilt docker images do not meet your requirements, you can build your own image by running the following script. The docker file is docker/Release/Dockerfile and its build argument is MMDEPLOY_VERSION, which can be a tag or a branch of mmdeploy.
export MMDEPLOY_VERSION=main
export TAG=mmdeploy-${MMDEPLOY_VERSION}
docker build docker/Release/ -t ${TAG} --build-arg MMDEPLOY_VERSION=${MMDEPLOY_VERSION}
After pulling or building the docker image, you can use docker run to launch the docker service:
export TAG=openmmlab/mmdeploy:ubuntu20.04-cuda11.8-mmdeploy
docker run --gpus=all -it --rm $TAG
3.4 FAQs
1. CUDA error: the provided PTX was compiled with an unsupported toolchain:
As described here, update the GPU driver to the latest one for your GPU.
2. docker: Error response from daemon: could not select device driver “” with capabilities: [gpu].
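This error usually means the NVIDIA container runtime is not available to Docker. A commonly used fix on Ubuntu, assuming NVIDIA's apt repository is already configured, is:

sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker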
CHAPTER FOUR
Through user investigation, we know that most users are already familiar with python and torch before using mmdeploy.
Therefore we provide scripts to simplify mmdeploy installation.
Assuming you already have
• python3 -m pip (conda or pyenv)
• nvcc (depends on inference backend)
• torch (not compulsory)
run this script to install mmdeploy with the ncnn backend; the nproc argument is optional.
$ cd /path/to/mmdeploy
$ python3 tools/scripts/build_ubuntu_x64_ncnn.py
..
A sudo password may be required during this time, and the script will try its best to build and install mmdeploy SDK
and demo:
• Detect the host OS version, the number of make jobs, whether root is used, and try to fix python3 -m pip
• Find the necessary basic tools, such as g++-7, cmake, wget, etc.
• Compile necessary dependencies, such as pyncnn and protobuf
The script will also try to avoid affecting host environment:
• The dependencies of source code compilation are placed in the mmdeploy-dep directory at the same level as
mmdeploy
• The script would not modify variables such as PATH, LD_LIBRARY_PATH, PYTHONPATH, etc.
• The environment variables that need to be modified will be printed, please pay attention to the final output
The script will eventually execute python3 tools/check_env.py, the successful installation should display the
version number of the corresponding backend and ops_is_available: True, for example:
$ python3 tools/check_env.py
..
2022-09-13 14:49:13,767 - mmdeploy - INFO - **********Backend information**********
2022-09-13 14:49:14,116 - mmdeploy - INFO - onnxruntime: 1.8.0 ops_is_avaliable : True
Here are the verified installation scripts. If you want mmdeploy to support multiple backends at the same time, you can execute each script once:
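The script names below follow the layout of tools/scripts/ in the repository; treat this as a sketch rather than an exhaustive list:

python3 tools/scripts/build_ubuntu_x64_ort.py $(nproc)          # ONNX Runtime
python3 tools/scripts/build_ubuntu_x64_ncnn.py $(nproc)         # ncnn
python3 tools/scripts/build_ubuntu_x64_pplnn.py $(nproc)        # PPLNN
python3 tools/scripts/build_ubuntu_x64_torchscript.py $(nproc)  # TorchScript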
CHAPTER SIX
This tutorial briefly introduces how to export an OpenMMLab model to a specific backend using MMDeploy tools.
Notes:
• Supported backends are ONNXRuntime, TensorRT, ncnn, PPLNN, OpenVINO.
• Supported codebases are MMPretrain, MMDetection, MMSegmentation, MMOCR, MMagic.
6.1.1 Prerequisite
1. Install and build your target backend. You could refer to ONNXRuntime-install, TensorRT-install, ncnn-install,
PPLNN-install, OpenVINO-install for more information.
2. Install and build your target codebase. You could refer to MMPretrain-install, MMDetection-install,
MMSegmentation-install, MMOCR-install, MMagic-install.
6.1.2 Usage
python ./tools/deploy.py \
${DEPLOY_CFG_PATH} \
${MODEL_CFG_PATH} \
${MODEL_CHECKPOINT_PATH} \
${INPUT_IMG} \
--test-img ${TEST_IMG} \
--work-dir ${WORK_DIR} \
--calib-dataset-cfg ${CALIB_DATA_CFG} \
--device ${DEVICE} \
--log-level INFO \
--show \
--dump-info
• deploy_cfg : The deployment configuration of mmdeploy for the model, including the type of inference framework, whether to quantize, whether the input shape is dynamic, etc. There may be reference relationships between configuration files; mmdeploy/mmpretrain/classification_ncnn_static.py is an example.
• model_cfg : The model configuration of the algorithm library, e.g. mmpretrain/configs/vision_transformer/vit-base-p32_ft-64xb64_in1k-384.py, regardless of the path to mmdeploy.
• checkpoint : torch model path. It can start with http/https, see the implementation of mmcv.FileClient for
details.
• img : The path to the image or point cloud file used for testing during the model conversion.
• --test-img : The path of the image file that is used to test the model. If not specified, it will be set to None.
• --work-dir : The path of the work directory that is used to save logs and models.
• --calib-dataset-cfg : Only valid in int8 mode. The config used for calibration. If not specified, it will be
set to None and use the “val” dataset in the model config for calibration.
• --device : The device used for model conversion. If not specified, it will be set to cpu. For trt, use cuda:0
format.
• --log-level : To set log level which in 'CRITICAL', 'FATAL', 'ERROR', 'WARN', 'WARNING',
'INFO', 'DEBUG', 'NOTSET'. If not specified, it will be set to INFO.
• --show : Whether to show detection outputs.
• --dump-info : Whether to output information for SDK.
1. Find the model's codebase folder in configs/. For converting a yolov3 model, you need to check the configs/mmdet folder.
2. Find the model's task folder in configs/codebase_folder/. For a yolov3 model, you need to check the configs/mmdet/detection folder.
3. Find the deployment config file in configs/codebase_folder/task_folder/. For deploying a yolov3 model to the onnx backend, you could use configs/mmdet/detection/detection_onnxruntime_dynamic.py.
6.1.5 Example
python ./tools/deploy.py \
configs/mmdet/detection/detection_tensorrt_dynamic-320x320-1344x1344.py \
$PATH_TO_MMDET/configs/yolo/yolov3_d53_8xb8-ms-608-273e_coco.py \
$PATH_TO_MMDET/checkpoints/yolo/yolov3_d53_mstrain-608_273e_coco_20210518_115020-
˓→a2c3acb8.pth \
$PATH_TO_MMDET/demo/demo.jpg \
--work-dir work_dir \
--show \
--device cuda:0
CHAPTER SEVEN
This tutorial describes how to write a config for model conversion and deployment. A deployment config includes
onnx config, codebase config, backend config.
• How to write config
– 1. How to write onnx config
∗ Description of onnx config arguments
∗ Example
∗ If you need to use dynamic axes
· Example
– 2. How to write codebase config
∗ Description of codebase config arguments
· Example
– 3. How to write backend config
∗ Example
– 4. A complete example of mmpretrain on TensorRT
– 5. The name rules of our deployment config
∗ Example
– 6. How to write model config
7.1.2 Example
onnx_config = dict(
type='onnx',
export_params=True,
keep_initializers_as_inputs=False,
opset_version=11,
save_file='end2end.onnx',
input_names=['input'],
output_names=['output'],
input_shape=None)
If the dynamic shape of inputs and outputs is required, you need to add dynamic_axes dict in onnx config.
• dynamic_axes: Describe the dimensional information about input and output.
Example
dynamic_axes={
'input': {
0: 'batch',
2: 'height',
3: 'width'
},
'dets': {
0: 'batch',
1: 'num_dets',
},
'labels': {
0: 'batch',
1: 'num_dets',
    }
}
Codebase config part contains information like codebase type and task type.
Example
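A minimal sketch of such a codebase config, here for an MMPretrain classification model (the field values are illustrative):

codebase_config = dict(type='mmpretrain', task='Classification', model_type='end2end')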
The backend config is mainly used to specify the backend on which the model runs and to provide the information needed when the model runs on the backend, referring to ONNX Runtime, TensorRT, ncnn, PPLNN.
• type: Model's backend, including onnxruntime, ncnn, pplnn, tensorrt, openvino.
7.3.1 Example
backend_config = dict(
type='tensorrt',
common_config=dict(
fp16_mode=False, max_workspace_size=1 << 30),
model_inputs=[
dict(
input_shapes=dict(
input=dict(
min_shape=[1, 3, 512, 1024],
opt_shape=[1, 3, 1024, 2048],
max_shape=[1, 3, 2048, 2048])))
])
7.4 A complete example of mmpretrain on TensorRT
backend_config = dict(
type='tensorrt',
common_config=dict(
fp16_mode=False,
max_workspace_size=1 << 30),
model_inputs=[
dict(
input_shapes=dict(
input=dict(
min_shape=[1, 3, 224, 224],
opt_shape=[4, 3, 224, 224],
max_shape=[64, 3, 224, 224])))])
onnx_config = dict(
type='onnx',
dynamic_axes={
'input': {
0: 'batch',
2: 'height',
3: 'width'
},
'output': {
0: 'batch'
}
},
export_params=True,
keep_initializers_as_inputs=False,
opset_version=11,
save_file='end2end.onnx',
input_names=['input'],
output_names=['output'],
input_shape=[224, 224])
There is a specific naming convention for the filename of deployment config files: (task name)_(backend name and settings)_(static, or dynamic with a shape range).py. For a dynamic config, a suffix such as dynamic-512x1024-2048x2048 means that the min input shape is 512x1024 and the max input shape is 2048x2048.
7.5.1 Example
detection_tensorrt-int8_dynamic-320x320-1344x1344.py
7.6 How to write model config
According to the model's codebase, write the model config file. The model config file is used to initialize the model; refer to MMPretrain, MMDetection, MMSegmentation, MMOCR, MMagic.
CHAPTER EIGHT
After converting a PyTorch model to a backend model, you may evaluate backend models with tools/test.py
8.1 Prerequisite
Install MMDeploy according to get-started instructions. And convert the PyTorch model or ONNX model to the
backend model by following the guide.
8.2 Usage
python tools/test.py \
${DEPLOY_CFG} \
${MODEL_CFG} \
--model ${BACKEND_MODEL_FILES} \
[--out ${OUTPUT_PKL_FILE}] \
[--format-only] \
[--metrics ${METRICS}] \
[--show] \
[--show-dir ${OUTPUT_IMAGE_DIR}] \
[--show-score-thr ${SHOW_SCORE_THR}] \
--device ${DEVICE} \
[--cfg-options ${CFG_OPTIONS}] \
[--metric-options ${METRIC_OPTIONS}]
[--log2file work_dirs/output.txt]
[--batch-size ${BATCH_SIZE}]
[--speed-test] \
[--warmup ${WARM_UP}] \
[--log-interval ${LOG_INTERVAL}]
8.4 Example
python tools/test.py \
configs/mmpretrain/classification_onnxruntime_static.py \
{MMPRETRAIN_DIR}/configs/resnet/resnet50_b32x8_imagenet.py \
--model model.onnx \
--out out.pkl \
--device cpu \
--speed-test
8.5 Note
• The performance of each model in OpenMMLab codebases can be found in the document of each codebase.
CHAPTER NINE: QUANTIZE MODEL
9.1 Why quantization?
The fixed-point model has many advantages over the fp32 model:
• Smaller size: an 8-bit model reduces the file size by 75%
• Thanks to the smaller model, the cache hit rate is improved and inference is faster
• Chips tend to have corresponding fixed-point acceleration instructions, which are faster and consume less energy (int8 on a common CPU requires only about 10% of the energy)
APK file size and heat generation are key indicators when evaluating a mobile APP; on the server side, quantization means you can keep the same QPS and serve a larger model in exchange for a little precision.
9.3 How to convert model
cd /path/to/mmdeploy
export MODEL_CONFIG=/home/rg/konghuanjun/mmpretrain/configs/resnet/resnet18_8xb32_in1k.py
export MODEL_PATH=https://download.openmmlab.com/mmclassification/v0/resnet/resnet18_8xb32_in1k_20210831-fbbb1da6.pth
# quantize
python3 tools/deploy.py configs/mmpretrain/classification_ncnn-int8_static.py ${MODEL_CONFIG} ${MODEL_PATH} /path/to/self-test.png --work-dir work_dir --device cpu --quant --quant-image-dir /path/to/imagenet-sample-images
...
9.4 Custom calibration dataset
The calibration set is used to calculate the quantization layer parameters. Some DFQ (Data Free Quantization) methods do not even require a dataset.
• Create a folder and just put some images in it (no directory structure, no negative examples, no special filename format is needed)
• The images need to come from a real scenario, otherwise the accuracy will drop
• Do not use the test dataset to quantize the model
It is highly recommended to verify model precision after quantization. Here are some quantization test results.
CHAPTER TEN: USEFUL TOOLS
Apart from deploy.py, there are other useful tools under the tools/ directory.
10.1 torch2onnx
This tool can be used to convert PyTorch model from OpenMMLab to ONNX.
10.1.1 Usage
python tools/torch2onnx.py \
${DEPLOY_CFG} \
${MODEL_CFG} \
${CHECKPOINT} \
${INPUT_IMG} \
--work-dir ${WORK_DIR} \
--device cpu \
--log-level INFO
10.2 extract
An ONNX model with Mark nodes in it can be partitioned into multiple subgraphs. This tool can be used to extract a subgraph from such an ONNX model.
10.2.1 Usage
python tools/extract.py \
${INPUT_MODEL} \
${OUTPUT_MODEL} \
--start ${PARTITION_START} \
--end ${PARTITION_END} \
--log-level INFO
• input_model : The path of input ONNX model. The output ONNX model will be extracted from this model.
• output_model : The path of output ONNX model.
• --start : The start point of extracted model with format <function_name>:<input/output>. The
function_name comes from the decorator @mark.
• --end : The end point of extracted model with format <function_name>:<input/output>. The
function_name comes from the decorator @mark.
• --log-level : To set log level which in 'CRITICAL', 'FATAL', 'ERROR', 'WARN', 'WARNING',
'INFO', 'DEBUG', 'NOTSET'. If not specified, it will be set to INFO.
10.2.3 Note
To support the model partition, you need to add Mark nodes in the ONNX model. The Mark node comes from the @mark
decorator. For example, if we have marked the multiclass_nms as below, we can set end=multiclass_nms:input
to extract the subgraph before NMS.
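As a sketch of what such a mark looks like (the decorator arguments here are illustrative; in MMDeploy the marked function lives inside the codebase rewriters):

from mmdeploy.core import mark

@mark('multiclass_nms', inputs=['boxes', 'scores'], outputs=['dets', 'labels'])
def multiclass_nms(boxes, scores, max_output_boxes_per_class=200, iou_threshold=0.5):
    # illustrative body only: the real rewriter runs NMS here; the decorator inserts
    # Mark nodes around the function's inputs and outputs during ONNX export
    return boxes, scores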
10.3 onnx2pplnn
10.3.1 Usage
python tools/onnx2pplnn.py \
${ONNX_PATH} \
${OUTPUT_PATH} \
--device cuda:0 \
--opt-shapes [224,224] \
--log-level INFO
10.4 onnx2tensorrt
10.4.1 Usage
python tools/onnx2tensorrt.py \
${DEPLOY_CFG} \
${ONNX_PATH} \
${OUTPUT} \
--device-id 0 \
--log-level INFO \
--calib-file /path/to/file
10.5 onnx2ncnn
10.5.1 Usage
python tools/onnx2ncnn.py \
${ONNX_PATH} \
${NCNN_PARAM} \
${NCNN_BIN} \
--log-level INFO
10.6 profiler
This tool helps to test latency of models with PyTorch, TensorRT and other backends. Note that the pre- and post-
processing is excluded when computing inference latency.
10.6.1 Usage
python tools/profiler.py \
${DEPLOY_CFG} \
${MODEL_CFG} \
${IMAGE_DIR} \
--model ${MODEL} \
--device ${DEVICE} \
--shape ${SHAPE} \
--num-iter ${NUM_ITER} \
--warmup ${WARMUP} \
--cfg-options ${CFG_OPTIONS} \
--batch-size ${BATCH_SIZE} \
--img-ext ${IMG_EXT}
10.6.3 Example:
python tools/profiler.py \
configs/mmpretrain/classification_tensorrt_dynamic-224x224-224x224.py \
../mmpretrain/configs/resnet/resnet18_8xb32_in1k.py \
../mmpretrain/demo/ \
--model work-dirs/mmpretrain/resnet/trt/end2end.engine \
--device cuda \
--shape 224x224 \
--num-iter 100 \
--warmup 10 \
--batch-size 1
----- Settings:
+------------+---------+
| batch size | 1 |
| shape | 224x224 |
| iterations | 100 |
| warmup | 10 |
+------------+---------+
----- Results:
+--------+------------+---------+
| Stats | Latency/ms | FPS |
+--------+------------+---------+
| Mean | 1.535 | 651.656 |
| Median | 1.665 | 600.569 |
| Min | 1.308 | 764.341 |
| Max | 1.689 | 591.983 |
+--------+------------+---------+
10.7 generate_md_table
10.7.1 Usage
python tools/generate_md_table.py \
${YML_FILE} \
${OUTPUT} \
--backends ${BACKENDS}
10.7.3 Example:
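For instance, generating a Markdown support table from one of the regression-test YAML files (the paths and backend list below are illustrative):

python tools/generate_md_table.py \
    tests/regression/mmocr.yml \
    tests/regression/mmocr.md \
    --backends onnxruntime tensorrt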
CHAPTER ELEVEN: SDK DOCUMENTATION
In terms of model deployment, most ML models require some preprocessing steps on the input data and postprocessing steps on the output to get structured results. The MMDeploy SDK provides many pre-processing and post-processing operations. When you convert and deploy a model, you can enjoy the convenience they bring.
Model Conversion
deploy.json
detail.json
pipeline.json
end2end.onnx
end2end.engine
output_pytorch.jpg
output_tensorrt.jpg
SDK Inference
Create a pipeline
mmdeploy_model_t model;
mmdeploy_model_create_by_path(model_path, &model);
mmdeploy_classifier_t classifier{};
mmdeploy_classifier_create(model, "cpu", 0, &classifier);
mmdeploy_model_t model;
mmdeploy_model_create(str.data(), size, &model);
mmdeploy_classifier_t classifier{};
mmdeploy_classifier_create(model, "cpu", 0, &classifier);
Model inference
mmdeploy_classification_t* res{};
int* res_count{};
mmdeploy_classifier_apply(classifier, &mat, 1, &res, &res_count);
11.1.2 profiler
The SDK has the ability to record the time consumption of each module in the pipeline. It is disabled by default. To use this ability, two steps are required:
• Generate profiler data
• Analyze profiler Data
Using the C interface and the classification pipeline as an example: when creating the pipeline, the create API with context information needs to be used, and a profiler handle needs to be added to the context. The detailed code is shown below. Running the demo normally will generate the profiler data file "profiler_data.txt" in the current directory.
#include <fstream>
#include <opencv2/imgcodecs/imgcodecs.hpp>
#include <string>
#include "mmdeploy/classifier.h"
int main(int argc, char* argv[]) {
if (argc != 4) {
fprintf(stderr, "usage: %s device_name sdk_model_path image_path\n", argv[0]);
return 1;
}
auto device_name = argv[1];
auto model_path = argv[2];
auto image_path = argv[3];
cv::Mat img = cv::imread(image_path);
if (!img.data) {
fprintf(stderr, "failed to load image: %s\n", image_path);
return 1;
}
mmdeploy_model_t model{};
mmdeploy_model_create_by_path(model_path, &model);
mmdeploy_context_t context{};
mmdeploy_context_create_by_device(device_name, 0, &context);
// create a profiler and register it with the context so each module's timing is recorded
mmdeploy_profiler_t profiler{};
mmdeploy_profiler_create("profiler_data.txt", &profiler);
mmdeploy_context_add(context, MMDEPLOY_TYPE_PROFILER, nullptr, profiler);
mmdeploy_classifier_t classifier{};
int status{};
status = mmdeploy_classifier_create_v2(model, context, &classifier);
if (status != MMDEPLOY_SUCCESS) {
fprintf(stderr, "failed to create classifier, code: %d\n", (int)status);
return 1;
}
mmdeploy_mat_t mat{
    img.data, img.rows, img.cols, 3, MMDEPLOY_PIXEL_FORMAT_BGR, MMDEPLOY_DATA_TYPE_UINT8};
// inference loop
for (int i = 0; i < 100; i++) {
mmdeploy_classification_t* res{};
int* res_count{};
status = mmdeploy_classifier_apply(classifier, &mat, 1, &res, &res_count);
mmdeploy_classifier_release_result(res, res_count, 1);
}
mmdeploy_classifier_destroy(classifier);
mmdeploy_model_destroy(model);
mmdeploy_profiler_destroy(profiler);
mmdeploy_context_destroy(context);
return 0;
}
The parsing results are as follows: "name" is the name of the node, "n_call" is the number of calls, "t_mean" is the average time consumption, and "t_50%" and "t_90%" are percentiles of the time consumption.
+---------------------------+--------+-------+--------+--------+-------+-------+
| name | occupy | usage | n_call | t_mean | t_50% | t_90% |
+===========================+========+=======+========+========+=======+=======+
| ./Pipeline | - | - | 100 | 4.831 | 1.913 | 1.946 |
+---------------------------+--------+-------+--------+--------+-------+-------+
| Preprocess/Compose | - | - | 100 | 0.125 | 0.118 | 0.144 |
+---------------------------+--------+-------+--------+--------+-------+-------+
| LoadImageFromFile | 0.017 | 0.017 | 100 | 0.081 | 0.077 | 0.098 |
+---------------------------+--------+-------+--------+--------+-------+-------+
common.h
enum mmdeploy_pixel_format_t
Values:
enumerator MMDEPLOY_PIXEL_FORMAT_BGR
enumerator MMDEPLOY_PIXEL_FORMAT_RGB
enumerator MMDEPLOY_PIXEL_FORMAT_GRAYSCALE
enumerator MMDEPLOY_PIXEL_FORMAT_NV12
enumerator MMDEPLOY_PIXEL_FORMAT_NV21
enumerator MMDEPLOY_PIXEL_FORMAT_BGRA
enumerator MMDEPLOY_PIXEL_FORMAT_COUNT
enum mmdeploy_data_type_t
Values:
enumerator MMDEPLOY_DATA_TYPE_FLOAT
enumerator MMDEPLOY_DATA_TYPE_HALF
enumerator MMDEPLOY_DATA_TYPE_UINT8
enumerator MMDEPLOY_DATA_TYPE_INT32
enumerator MMDEPLOY_DATA_TYPE_COUNT
enum mmdeploy_status_t
Values:
enumerator MMDEPLOY_SUCCESS
enumerator MMDEPLOY_E_INVALID_ARG
enumerator MMDEPLOY_E_NOT_SUPPORTED
enumerator MMDEPLOY_E_OUT_OF_RANGE
enumerator MMDEPLOY_E_OUT_OF_MEMORY
enumerator MMDEPLOY_E_FILE_NOT_EXIST
enumerator MMDEPLOY_E_FAIL
enumerator MMDEPLOY_STATUS_COUNT
typedef struct mmdeploy_device *mmdeploy_device_t
typedef struct mmdeploy_profiler *mmdeploy_profiler_t
struct mmdeploy_mat_t
Public Members
uint8_t *data
int height
int width
int channel
mmdeploy_pixel_format_t format
mmdeploy_data_type_t type
mmdeploy_device_t device
struct mmdeploy_rect_t
Public Members
float left
float top
float right
float bottom
struct mmdeploy_point_t
Public Members
float x
float y
typedef struct mmdeploy_value *mmdeploy_value_t
typedef struct mmdeploy_context *mmdeploy_context_t
mmdeploy_value_t mmdeploy_value_copy(mmdeploy_value_t value)
Copy value
Parameters value –
Returns
void mmdeploy_value_destroy(mmdeploy_value_t value)
Destroy value
Parameters value –
int mmdeploy_device_create(const char *device_name, int device_id, mmdeploy_device_t *device)
Create device handle
Parameters
• device_name –
• device_id –
• device –
Returns
void mmdeploy_device_destroy(mmdeploy_device_t device)
Destroy device handle
Parameters device –
int mmdeploy_profiler_create(const char *path, mmdeploy_profiler_t *profiler)
Create profiler
Parameters
• path – path to save the profile data
• profiler – handle for profiler, should be added to context and deleted by mmdeploy_profiler_destroy
Returns status of create
void mmdeploy_profiler_destroy(mmdeploy_profiler_t profiler)
Destroy profiler handle
Parameters profiler – handle for profiler, profile data will be written to disk after this call
int mmdeploy_context_create(mmdeploy_context_t *context)
Create context
Parameters context –
Returns
int mmdeploy_context_create_by_device(const char *device_name, int device_id, mmdeploy_context_t
*context)
Create context
Parameters
• device_name –
• device_id –
• context –
Returns
void mmdeploy_context_destroy(mmdeploy_context_t context)
Destroy context
Parameters context –
int mmdeploy_context_add(mmdeploy_context_t context, mmdeploy_context_type_t type, const char *name, const
void *object)
Add context object
Parameters
• context –
• type –
• name –
• object –
Returns
int mmdeploy_common_create_input(const mmdeploy_mat_t *mats, int mat_count, mmdeploy_value_t *value)
Create input value from array of mats
Parameters
• mats –
• mat_count –
• value –
Returns
executor.h
mmdeploy_scheduler_t mmdeploy_executor_system_pool()
model.h
pipeline.h
classifier.h
struct mmdeploy_classification_t
Public Members
int label_id
float score
typedef struct mmdeploy_classifier *mmdeploy_classifier_t
int mmdeploy_classifier_create(mmdeploy_model_t model, const char *device_name, int device_id,
mmdeploy_classifier_t *classifier)
Create classifier’s handle.
Parameters
• model – [in] an instance of mmclassification sdk model created by mmdeploy_model_create_by_path or mmdeploy_model_create in model.h
• device_name – [in] name of device, such as “cpu”, “cuda”, etc.
• device_id – [in] id of device.
• classifier – [out] instance of a classifier, which must be destroyed by mmdeploy_classifier_destroy
Returns status of creating classifier’s handle
int mmdeploy_classifier_create_by_path(const char *model_path, const char *device_name, int device_id,
mmdeploy_classifier_t *classifier)
Create classifier’s handle.
Parameters
• model_path – [in] path of mmclassification sdk model exported by mmdeploy model converter
• device_name – [in] name of device, such as “cpu”, “cuda”, etc.
• device_id – [in] id of device.
• classifier – [out] instance of a classifier, which must be destroyed by mmdeploy_classifier_destroy
Returns status of creating classifier’s handle
int mmdeploy_classifier_apply(mmdeploy_classifier_t classifier, const mmdeploy_mat_t *mats, int mat_count,
mmdeploy_classification_t **results, int **result_count)
Use classifier created by mmdeploy_classifier_create_by_path to get label information of each image in a batch.
Parameters
• classifier – [in] classifier’s handle created by mmdeploy_classifier_create_by_path
• mats – [in] a batch of images
• mat_count – [in] number of images in the batch
• results – [out] a linear buffer to save classification results of each image, which must be freed by mmdeploy_classifier_release_result
• result_count – [out] a linear buffer with length being mat_count to save the number of classification results of each image. It must be released by mmdeploy_classifier_release_result
Returns status of inference
void mmdeploy_classifier_release_result(mmdeploy_classification_t *results, const int *result_count, int
count)
Release the inference result buffer created mmdeploy_classifier_apply.
Parameters
• results – [in] classification results buffer
• result_count – [in] results size buffer
• count – [in] length of result_count
void mmdeploy_classifier_destroy(mmdeploy_classifier_t classifier)
Destroy classifier’s handle.
Parameters classifier – [in] classifier’s handle created by mmdeploy_classifier_create_by_path
int mmdeploy_classifier_create_v2(mmdeploy_model_t model, mmdeploy_context_t context,
mmdeploy_classifier_t *classifier)
Same as mmdeploy_classifier_create, but allows to control execution context of tasks via context.
int mmdeploy_classifier_create_input(const mmdeploy_mat_t *mats, int mat_count, mmdeploy_value_t
*value)
Pack classifier inputs into mmdeploy_value_t.
Parameters
• mats – [in] a batch of images
• mat_count – [in] number of images in the batch
• value – [out] the packed value
Returns status of the operation
int mmdeploy_classifier_apply_v2(mmdeploy_classifier_t classifier, mmdeploy_value_t input,
mmdeploy_value_t *output)
Same as mmdeploy_classifier_apply, but input and output are packed in mmdeploy_value_t.
int mmdeploy_classifier_apply_async(mmdeploy_classifier_t classifier, mmdeploy_sender_t input,
mmdeploy_sender_t *output)
Apply classifier asynchronously.
Parameters
• classifier – [in] handle of the classifier
• input – [in] input sender that will be consumed by the operation
• output – [out] output sender
Returns status of the operation
int mmdeploy_classifier_get_result(mmdeploy_value_t output, mmdeploy_classification_t **results, int
**result_count)
Parameters
• output – [in] output obtained by applying a classifier
• results – [out] a linear buffer containing classification results of each image, released by
mmdeploy_classifier_release_result
• result_count – [out] a linear buffer containing the number of results for each input image,
released by mmdeploy_classifier_release_result
Returns status of the operation
detector.h
struct mmdeploy_instance_mask_t
Public Members
char *data
int height
int width
struct mmdeploy_detection_t
Public Members
int label_id
float score
mmdeploy_rect_t bbox
mmdeploy_instance_mask_t *mask
typedef struct mmdeploy_detector *mmdeploy_detector_t
int mmdeploy_detector_create(mmdeploy_model_t model, const char *device_name, int device_id,
mmdeploy_detector_t *detector)
Create detector’s handle.
Parameters
• model – [in] an instance of mmdetection sdk model created by mmdeploy_model_create_by_path or mmdeploy_model_create in model.h
• device_name – [in] name of device, such as “cpu”, “cuda”, etc.
• device_id – [in] id of device.
• detector – [out] instance of a detector
Returns status of creating detector’s handle
int mmdeploy_detector_create_by_path(const char *model_path, const char *device_name, int device_id,
mmdeploy_detector_t *detector)
Create detector’s handle.
Parameters
• model_path – [in] path of mmdetection sdk model exported by mmdeploy model converter
• device_name – [in] name of device, such as “cpu”, “cuda”, etc.
pose_detector.h
struct mmdeploy_pose_detection_t
Public Members
mmdeploy_point_t *point
keypoint
float *score
keypoint score
int length
number of keypoint
pose_tracker.h
struct mmdeploy_pose_tracker_param_t
Public Members
int32_t det_interval
int32_t det_label
float det_thr
float det_min_bbox_size
float det_nms_thr
int32_t pose_max_num_bboxes
float pose_kpt_thr
int32_t pose_min_keypoints
float pose_bbox_scale
float pose_min_bbox_size
float pose_nms_thr
float *keypoint_sigmas
int32_t keypoint_sigmas_size
float track_iou_thr
int32_t track_max_missing
int32_t track_history_size
float std_weight_position
float std_weight_velocity
float smooth_params[3]
struct mmdeploy_pose_tracker_target_t
Public Members
mmdeploy_point_t *keypoints
int32_t keypoint_count
float *scores
mmdeploy_rect_t bbox
uint32_t target_id
int mmdeploy_pose_tracker_default_params(mmdeploy_pose_tracker_param_t *params)
Fill params with default parameters.
Parameters params – [inout]
Returns status of the operation
int mmdeploy_pose_tracker_create(mmdeploy_model_t det_model, mmdeploy_model_t pose_model,
mmdeploy_context_t context, mmdeploy_pose_tracker_t *pipeline)
Create pose tracker pipeline.
Parameters
• det_model – [in] detection model object, created by mmdeploy_model_create
• pose_model – [in] pose model object
• context – [in] context object describing execution environment (device, profiler, etc. . . ),
created by mmdeploy_context_create
• pipeline – [out] handle of the created pipeline
Returns status of the operation
void mmdeploy_pose_tracker_destroy(mmdeploy_pose_tracker_t pipeline)
Destroy pose tracker pipeline.
Parameters pipeline – [in]
int mmdeploy_pose_tracker_create_state(mmdeploy_pose_tracker_t pipeline, const
mmdeploy_pose_tracker_param_t *params,
mmdeploy_pose_tracker_state_t *state)
Create a tracker state handle corresponding to a video stream.
Parameters
• pipeline – [in] handle of a pose tracker pipeline
• params – [in] params for creating the tracker state
rotated_detector.h
struct mmdeploy_rotated_detection_t
Public Members
int label_id
float score
float rbbox[5]
typedef struct mmdeploy_rotated_detector *mmdeploy_rotated_detector_t
int mmdeploy_rotated_detector_create(mmdeploy_model_t model, const char *device_name, int device_id,
mmdeploy_rotated_detector_t *detector)
Create rotated detector’s handle.
Parameters
• model – [in] an instance of mmrotate sdk model created by mmdeploy_model_create_by_path or mmdeploy_model_create in model.h
• device_name – [in] name of device, such as “cpu”, “cuda”, etc.
• device_id – [in] id of device.
• detector – [out] instance of a rotated detector
Returns status of creating rotated detector’s handle
int mmdeploy_rotated_detector_create_by_path(const char *model_path, const char *device_name, int
device_id, mmdeploy_rotated_detector_t *detector)
Create rotated detector’s handle.
Parameters
• model_path – [in] path of mmrotate sdk model exported by mmdeploy model converter
• device_name – [in] name of device, such as “cpu”, “cuda”, etc.
• device_id – [in] id of device.
• detector – [out] instance of a rotated detector
Returns status of creating rotated detector’s handle
int mmdeploy_rotated_detector_apply(mmdeploy_rotated_detector_t detector, const mmdeploy_mat_t *mats,
int mat_count, mmdeploy_rotated_detection_t **results, int
**result_count)
Apply rotated detector to batch images and get their inference results.
Parameters
• detector – [in] rotated detector’s handle created by mmdeploy_rotated_detector_create_by_path
• mats – [in] a batch of images
• mat_count – [in] number of images in the batch
• results – [out] a linear buffer to save detection results of each image. It must be released by mmdeploy_rotated_detector_release_result
• result_count – [out] a linear buffer with length being mat_count to save the number of detection results of each image. And it must be released by mmdeploy_rotated_detector_release_result
Returns status of inference
segmentor.h
struct mmdeploy_segmentation_t
Public Members
int height
height of mask that equals to the input image’s height
int width
width of mask that equals to the input image’s width
int classes
the number of labels in mask
int *mask
segmentation mask of the input image, in which mask[i * width + j] indicates the label id of pixel at (i, j),
this field might be null
float *score
segmentation score map of the input image in CHW format, in which score[height * width * k + i * width
+ j] indicates the score of class k at pixel (i, j), this field might be null
text_detector.h
struct mmdeploy_text_detection_t
Public Members
mmdeploy_point_t bbox[4]
a text bounding box whose vertices are ordered clockwise
float score
typedef struct mmdeploy_text_detector *mmdeploy_text_detector_t
int mmdeploy_text_detector_create(mmdeploy_model_t model, const char *device_name, int device_id,
mmdeploy_text_detector_t *detector)
Create text-detector’s handle.
Parameters
• model – [in] an instance of mmocr text detection model created by mmdeploy_model_create_by_path or mmdeploy_model_create in model.h
• device_name – [in] name of device, such as “cpu”, “cuda”, etc.
• device_id – [in] id of device.
• detector – [out] instance of a text-detector, which must be destroyed by mmdeploy_text_detector_destroy
Returns status of creating text-detector’s handle
int mmdeploy_text_detector_create_by_path(const char *model_path, const char *device_name, int
device_id, mmdeploy_text_detector_t *detector)
Create text-detector’s handle.
Parameters
• model_path – [in] path to text detection model
• device_name – [in] name of device, such as “cpu”, “cuda”, etc.
• device_id – [in] id of device
• detector – [out] instance of a text-detector, which must be destroyed by mmdeploy_text_detector_destroy
Returns status of creating text-detector’s handle
int mmdeploy_text_detector_apply(mmdeploy_text_detector_t detector, const mmdeploy_mat_t *mats, int
mat_count, mmdeploy_text_detection_t **results, int **result_count)
Apply text-detector to batch images and get their inference results.
Parameters
• detector – [in] text-detector’s handle created by mmdeploy_text_detector_create_by_path
• mats – [in] a batch of images
• mat_count – [in] number of images in the batch
• results – [out] a linear buffer to save text detection results of each image. It must be
released by calling mmdeploy_text_detector_release_result
• result_count – [out] a linear buffer of length mat_count to save the number of detection
results of each image. It must be released by mmdeploy_detector_release_result
Returns status of inference
void mmdeploy_text_detector_release_result(mmdeploy_text_detection_t *results, const int *result_count,
int count)
Release the inference result buffer returned by mmdeploy_text_detector_apply.
Parameters
• results – [in] text detection result buffer
• result_count – [in] results size buffer
• count – [in] the length of buffer result_count
void mmdeploy_text_detector_destroy(mmdeploy_text_detector_t detector)
Destroy text-detector’s handle.
Parameters detector – [in] text-detector’s handle created by mmdeploy_text_detector_create_by_path or mmdeploy_text_detector_create
int mmdeploy_text_detector_create_v2(mmdeploy_model_t model, mmdeploy_context_t context,
mmdeploy_text_detector_t *detector)
Same as mmdeploy_text_detector_create, but allows to control execution context of tasks via context.
int mmdeploy_text_detector_create_input(const mmdeploy_mat_t *mats, int mat_count, mmdeploy_value_t
*input)
Pack text-detector inputs into mmdeploy_value_t.
Parameters
• mats – [in] a batch of images
• mat_count – [in] number of images in the batch
Returns the created value
int mmdeploy_text_detector_apply_v2(mmdeploy_text_detector_t detector, mmdeploy_value_t input,
mmdeploy_value_t *output)
Same as mmdeploy_text_detector_apply, but input and output are packed in mmdeploy_value_t.
int mmdeploy_text_detector_apply_async(mmdeploy_text_detector_t detector, mmdeploy_sender_t input,
mmdeploy_sender_t *output)
Apply text-detector asynchronously.
Parameters
• detector – [in] handle to the detector
• input – [in] input sender that will be consumed by the operation
Returns output sender
int mmdeploy_text_detector_get_result(mmdeploy_value_t output, mmdeploy_text_detection_t **results, int
**result_count)
Unpack detector output from a mmdeploy_value_t.
Parameters
• output – [in] output sender returned by applying a detector
• results – [out] a linear buffer to save detection results of each image. It must be released
by mmdeploy_text_detector_release_result
text_recognizer.h
struct mmdeploy_text_recognition_t
Public Members
char *text
float *score
int length
typedef struct mmdeploy_text_recognizer *mmdeploy_text_recognizer_t
int mmdeploy_text_recognizer_create(mmdeploy_model_t model, const char *device_name, int device_id,
mmdeploy_text_recognizer_t *recognizer)
Create a text recognizer instance.
Parameters
• model – [in] an instance of mmocr text recognition model created by mmdeploy_model_create_by_path or mmdeploy_model_create in model.h
• device_name – [in] name of device, such as “cpu”, “cuda”, etc.
• device_id – [in] id of device.
• recognizer – [out] handle of the created text recognizer, which must be destroyed by mmdeploy_text_recognizer_destroy
Returns status code of the operation
int mmdeploy_text_recognizer_create_by_path(const char *model_path, const char *device_name, int
device_id, mmdeploy_text_recognizer_t *recognizer)
Create a text recognizer instance.
Parameters
• model_path – [in] path to text recognition model
• device_name – [in] name of device, such as “cpu”, “cuda”, etc.
• device_id – [in] id of device.
• recognizer – [out] handle of the created text recognizer, which must be destroyed by mmdeploy_text_recognizer_destroy
Returns status code of the operation
int mmdeploy_text_recognizer_apply(mmdeploy_text_recognizer_t recognizer, const mmdeploy_mat_t
*images, int count, mmdeploy_text_recognition_t **results)
Apply text recognizer to a batch of text images.
Parameters
• recognizer – [in] text recognizer’s handle created by mmdeploy_text_recognizer_create_by_path
• images – [in] a batch of text images
• count – [in] number of images in the batch
• results – [out] a linear buffer containing the recognized text, which must be released by mmdeploy_text_recognizer_release_result
Returns status code of the operation
int mmdeploy_text_recognizer_apply_bbox(mmdeploy_text_recognizer_t recognizer, const mmdeploy_mat_t
*images, int image_count, const mmdeploy_text_detection_t
*bboxes, const int *bbox_count, mmdeploy_text_recognition_t
**results)
Apply text recognizer to a batch of images supplied with text bboxes.
Parameters
• recognizer – [in] text recognizer’s handle created by mmdeploy_text_recognizer_create_by_path
• images – [in] a batch of text images
• image_count – [in] number of images in the batch
• bboxes – [in] bounding boxes detected by text detector
• bbox_count – [in] number of bboxes of each images, must be same length as images
• results – [out] a linear buffer contains the recognized text, which has the same length as
bboxes, must be release by mmdeploy_text_recognizer_release_result
Returns status code of the operation
void mmdeploy_text_recognizer_release_result(mmdeploy_text_recognition_t *results, int count)
Release result buffer returned by mmdeploy_text_recognizer_apply or mmdeploy_text_recognizer_apply_bbox.
Parameters
• results – [in] result buffer by text recognizer
• count – [in] length of result
void mmdeploy_text_recognizer_destroy(mmdeploy_text_recognizer_t recognizer)
destroy text recognizer
Parameters recognizer – [in] handle of text recognizer created by mmdeploy_text_recognizer_create_by_path or mmdeploy_text_recognizer_create
int mmdeploy_text_recognizer_create_v2(mmdeploy_model_t model, mmdeploy_context_t context,
mmdeploy_text_recognizer_t *recognizer)
Same as mmdeploy_text_recognizer_create, but allows to control execution context of tasks via context.
video_recognizer.h
struct mmdeploy_video_recognition_t
Public Members
int label_id
float score
struct mmdeploy_video_sample_info_t
Public Members
int clip_len
int num_clips
typedef struct mmdeploy_video_recognizer *mmdeploy_video_recognizer_t
int mmdeploy_video_recognizer_create(mmdeploy_model_t model, const char *device_name, int device_id,
mmdeploy_video_recognizer_t *recognizer)
Create video recognizer’s handle.
Parameters
• model – [in] an instance of mmaction sdk model created by mmdeploy_model_create_by_path or mmdeploy_model_create in model.h
• device_name – [in] name of device, such as “cpu”, “cuda”, etc.
• device_id – [in] id of device.
• recognizer – [out] handle of the created video recognizer, which must be destroyed by
mmdeploy_video_recognizer_destroy
Returns status of creating video recognizer’s handle
int mmdeploy_video_recognizer_create_by_path(const char *model_path, const char *device_name, int
device_id, mmdeploy_video_recognizer_t *recognizer)
Create a video recognizer instance.
Parameters
• model_path – [in] path to video recognition model
• device_name – [in] name of device, such as “cpu”, “cuda”, etc.
• device_id – [in] id of device.
• recognizer – [out] handle of the created video recognizer, which must be destroyed by
mmdeploy_video_recognizer_destroy
Returns status code of the operation
int mmdeploy_video_recognizer_apply(mmdeploy_video_recognizer_t recognizer, const mmdeploy_mat_t
*images, const mmdeploy_video_sample_info_t *video_info, int
video_count, mmdeploy_video_recognition_t **results, int
**result_count)
Apply video recognizer to a batch of videos.
Parameters
• recognizer – [in] video recognizer’s handle created by mmdeploy_video_recognizer_create_by_path
• images – [in] a batch of videos
CHAPTER TWELVE: SUPPORTED MODELS
The table below lists the models that are guaranteed to be exportable to other backends.
12.1 Note
• Tag:
– static: This model only supports static export. Please use a static deploy config, e.g. $MMDEPLOY_DIR/configs/mmseg/segmentation_tensorrt_static-1024x2048.py.
• SSD: When you convert an SSD model, you need to use a min-shape deploy config like 300x300-512x512 rather than 320x320-1344x1344, for example $MMDEPLOY_DIR/configs/mmdet/detection/detection_tensorrt_dynamic-300x300-512x512.py.
• YOLOX: YOLOX with ncnn only supports static shape.
• Swin Transformer: For TensorRT, only version 8.4+ is supported.
• SAR: The Chinese text recognition model is not supported, as the ONNX protobuf size is limited.
THIRTEEN
BENCHMARK
13.1 Backends
13.2 Latency benchmark
13.2.1 Platform
• Ubuntu 18.04
• ncnn 20211208
• Cuda 11.3
• TensorRT 7.2.3.4
• Docker 20.10.8
• NVIDIA tesla T4 tensor core GPU for TensorRT
• Static graph
• Batch size 1
• Synchronize devices after each inference.
• We count the average inference performance of 100 images of the dataset.
• Warm up. For ncnn, we warm up 30 iters for all codebases. As for other backends: for classification, we warm
up 1010 iters; for other codebases, we warm up 10 iters.
• Input resolution varies for different datasets of different codebases. All inputs are real images except for mmagic
because the dataset is not large enough.
Users can directly test the speed through model profiling. And here is the benchmark in our environment.
13.3 Performance benchmark
Users can directly test the performance through how_to_evaluate_a_model.md. And here is the benchmark in our environment.
• Since some datasets in codebases like MMDet contain images with various resolutions, the speed benchmark is obtained with static configs in MMDeploy, while the performance benchmark is obtained with dynamic ones.
• Some int8 performance benchmarks of TensorRT require NVIDIA cards with Tensor Cores; otherwise the performance drops heavily.
• DBNet uses the interpolate mode nearest in the neck of the model, for which TensorRT-7 applies a strategy quite different from PyTorch's. To make the repository compatible with TensorRT-7, we rewrite the neck to use the interpolate mode bilinear, which improves final detection performance. To get performance matching PyTorch, TensorRT-8+ is recommended, whose interpolate methods are all the same as PyTorch's.
• Mask AP of Mask R-CNN drops by 1% on other backends. The main reason is that the predicted masks are directly interpolated to the original image in PyTorch, while in other backends they are first interpolated to the preprocessed input image of the model and then to the original image.
• MMPose models are tested with flip_test explicitly set to False in model configs.
• Some models might get low accuracy in fp16 mode. Please adjust the model to avoid value overflow.
FOURTEEN
Here are the test conclusions for our edge devices. You can directly obtain the results of your own environment with model profiling.
14.2 mmpretrain
tips:
1. The ImageNet-1k dataset is too large to test in full, so only part of the dataset is used (8000/50000).
2. Device heating downgrades the clock frequency, so the time consumption fluctuates. The values reported here are the stable ones observed after running for a period of time, which is closer to actual usage.
14.4 mmpose
tips:
• pose_hrnet is tested with AnimalPose's test dataset instead of the val dataset.
14.5 mmseg
tips:
• fcn works fine with 512x1024 inputs. The Cityscapes dataset uses 1024x2048 resolution, which causes the device to reboot.
14.6 Notes
• We need to manually split the mmdet model into two parts, because
– in the snpe source code, onnx_to_ir.py can only parse onnx input, while ir_to_dlc.py does not support the topk operator
– UDO (User Defined Operator) does not work with snpe-onnx-to-dlc
• mmagic models
– srcnn requires cubic resize, which snpe does not support
– esrgan converts fine, but loading the model causes the device to reboot
• mmrotate depends on e2cnn; you need to manually install its Python 3.6 compatible branch
FIFTEEN
TEST ON TVM
The table above lists the models that we have tested. Models not listed in the table might still be convertible. Please give it a try.
15.2 Test
• Ubuntu 20.04
• tvm 0.9.0
*: We only test the ssd model, since dynamic shape is not supported for now.
SIXTEEN
16.1.1 mmpretrain
Note:
• Because the ImageNet-1k dataset is large and ncnn has not released a Vulkan int8 version, only part of the test set (4000/50000) is used.
• The accuracy will vary after quantization; it is normal for a classification model's accuracy to increase by less than 1%.
Note: mmocr uses shapely to compute IoU, which results in a slight difference in accuracy.
Note: MMPose models are tested with flip_test explicitly set to False in model configs.
SEVENTEEN
MMPRETRAIN DEPLOYMENT
• MMPretrain Deployment
– Installation
∗ Install mmpretrain
∗ Install mmdeploy
– Convert model
– Model Specification
– Model inference
∗ Backend model inference
∗ SDK model inference
– Supported models
MMPretrain aka mmpretrain is an open-source image classification toolbox based on PyTorch. It is a part of the
OpenMMLab project.
17.1 Installation
There are several methods to install mmdeploy, among which you can choose an appropriate one according to your
target platform and device.
Method I: Install precompiled package
You can refer to get_started
Method II: Build using scripts
If your target platform is Ubuntu 18.04 or later version, we encourage you to run scripts. For example, the following
commands install mmdeploy as well as inference engine - ONNX Runtime.
You can use tools/deploy.py to convert mmpretrain models to the specified backend models. Its detailed usage can be
learned from here.
The command below shows an example of converting the resnet18 model to an ONNX model that can be inferred by ONNX Runtime.
cd mmdeploy
It is crucial to specify the correct deployment config during model conversion. We’ve already provided builtin deploy-
ment config files of all supported backends for mmpretrain. The config filename pattern is:
classification_{backend}-{precision}_{static | dynamic}_{shape}.py
• {backend}: inference backend, such as onnxruntime, tensorrt, pplnn, ncnn, openvino, coreml and etc.
• {precision}: fp16, int8. When it’s empty, it means fp32
• {static | dynamic}: static shape or dynamic shape
• {shape}: input shape or shape range of a model
Therefore, in the above example, you can also convert resnet18 to other backend models by changing the deploy-
ment config file classification_onnxruntime_dynamic.py to others, e.g., converting to tensorrt-fp16 model by
classification_tensorrt-fp16_dynamic-224x224-224x224.py.
Tip: When converting mmpretrain models to tensorrt models, --device should be set to "cuda".
Before moving on to the model inference chapter, let's learn more about the converted model structure, which is very important for model inference.
The converted model is located in the working directory, mmdeploy_models/mmpretrain/ort in the previous example. It includes:
mmdeploy_models/mmpretrain/ort
deploy.json
detail.json
end2end.onnx
pipeline.json
in which,
• end2end.onnx: backend model which can be inferred by ONNX Runtime
• *.json: the necessary information for mmdeploy SDK
The whole package mmdeploy_models/mmpretrain/ort is defined as mmdeploy SDK model, i.e., mmdeploy SDK
model includes both backend model and inference meta information.
Take the previously converted end2end.onnx model as an example; you can use the following code to run inference with the model.
deploy_cfg = 'configs/mmpretrain/classification_onnxruntime_dynamic.py'
model_cfg = './resnet18_8xb32_in1k.py'
device = 'cpu'
backend_model = ['./mmdeploy_models/mmpretrain/ort/end2end.onnx']
image = 'tests/data/tiger.jpeg'
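# NOTE: the lines below are a sketch of the setup that this example assumes
# (building the task processor, the backend model and the model inputs).
# The helpers load_config, build_task_processor and get_input_shape come from
# mmdeploy's Python API; treat the exact calls as illustrative.
import torch
from mmdeploy.apis.utils import build_task_processor
from mmdeploy.utils import get_input_shape, load_config

deploy_cfg, model_cfg = load_config(deploy_cfg, model_cfg)
task_processor = build_task_processor(model_cfg, deploy_cfg, device)
model = task_processor.build_backend_model(backend_model)
input_shape = get_input_shape(deploy_cfg)
model_inputs, _ = task_processor.create_input(image, input_shape)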
# do model inference
with torch.no_grad():
    result = model.test_step(model_inputs)
# visualize results
task_processor.visualize(
image=image,
model=model,
result=result[0],
window_name='visualize',
output_file='output_classification.png')
from mmdeploy_runtime import Classifier
import cv2

img = cv2.imread('tests/data/tiger.jpeg')
# create a classifier
classifier = Classifier(model_path='./mmdeploy_models/mmpretrain/ort', device_name='cpu', device_id=0)
# perform inference
result = classifier(img)
# show inference result
for label_id, score in result:
    print(label_id, score)
Besides the Python API, the mmdeploy SDK also provides FFIs (Foreign Function Interfaces) in other languages, such as C, C++, C#, Java and so on. You can learn their usage from the demos.
EIGHTEEN
MMDETECTION DEPLOYMENT
• MMDetection Deployment
– Installation
∗ Install mmdet
∗ Install mmdeploy
– Convert model
– Model specification
– Model inference
∗ Backend model inference
∗ SDK model inference
– Supported models
– Reminder
MMDetection aka mmdet is an open source object detection toolbox based on PyTorch. It is a part of the OpenMMLab
project.
18.1 Installation
There are several methods to install mmdeploy, among which you can choose an appropriate one according to your
target platform and device.
Method I: Install precompiled package
You can refer to get_started
Method II: Build using scripts
If your target platform is Ubuntu 18.04 or later version, we encourage you to run scripts. For example, the following
commands install mmdeploy as well as inference engine - ONNX Runtime.
You can use tools/deploy.py to convert mmdet models to the specified backend models. Its detailed usage can be learned
from here.
The command below shows an example of converting the Faster R-CNN model to an ONNX model that can be inferred by ONNX Runtime.
cd mmdeploy
# download faster r-cnn model from mmdet model zoo
mim download mmdet --config faster-rcnn_r50_fpn_1x_coco --dest .
# convert mmdet model to onnxruntime model with dynamic shape
python tools/deploy.py \
configs/mmdet/detection/detection_onnxruntime_dynamic.py \
faster-rcnn_r50_fpn_1x_coco.py \
faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth \
demo/resources/det.jpg \
--work-dir mmdeploy_models/mmdet/ort \
--device cpu \
--show \
--dump-info
It is crucial to specify the correct deployment config during model conversion. We’ve already provided builtin deploy-
ment config files of all supported backends for mmdetection, under which the config file path follows the pattern:
{task}/{task}_{backend}-{precision}_{static | dynamic}_{shape}.py
Therefore, in the above example, you can also convert faster r-cnn to other backend models by changing the de-
ployment config file detection_onnxruntime_dynamic.py to others, e.g., converting to tensorrt-fp16 model by
detection_tensorrt-fp16_dynamic-320x320-1344x1344.py.
Tip: When converting mmdet models to tensorrt models, --device should be set to "cuda".
Before moving on to the model inference chapter, let's learn more about the converted model structure, which is very important for model inference.
The converted model is located in the working directory, mmdeploy_models/mmdet/ort in the previous example. It includes:
mmdeploy_models/mmdet/ort
deploy.json
detail.json
end2end.onnx
pipeline.json
in which,
• end2end.onnx: backend model which can be inferred by ONNX Runtime
• *.json: the necessary information for mmdeploy SDK
The whole package mmdeploy_models/mmdet/ort is defined as mmdeploy SDK model, i.e., mmdeploy SDK model
includes both backend model and inference meta information.
Take the previously converted end2end.onnx model as an example; you can use the following code to run inference with the model and visualize the results.
deploy_cfg = 'configs/mmdet/detection/detection_onnxruntime_dynamic.py'
model_cfg = './faster-rcnn_r50_fpn_1x_coco.py'
device = 'cpu'
backend_model = ['./mmdeploy_models/mmdet/ort/end2end.onnx']
image = './demo/resources/det.jpg'
# do model inference
with torch.no_grad():
result = model.test_step(model_inputs)
# visualize results
task_processor.visualize(
image=image,
model=model,
result=result[0],
window_name='visualize',
output_file='output_detection.png')
from mmdeploy_runtime import Detector
import cv2

img = cv2.imread('./demo/resources/det.jpg')
# create a detector
detector = Detector(model_path='./mmdeploy_models/mmdet/ort', device_name='cpu', device_id=0)
# perform inference
bboxes, labels, masks = detector(img)
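# NOTE: a minimal sketch, not part of the original example: draw the detected
# boxes onto the image before saving it. The bbox layout [left, top, right,
# bottom, score] and the 0.3 score threshold are assumptions for illustration;
# label_id could additionally be mapped to a class name.
for bbox, label_id in zip(bboxes, labels):
    [left, top, right, bottom], score = bbox[0:4].astype(int), bbox[4]
    if score < 0.3:
        continue
    cv2.rectangle(img, (left, top), (right, bottom), (0, 255, 0))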
cv2.imwrite('output_detection.png', img)
Besides the Python API, the mmdeploy SDK also provides FFIs (Foreign Function Interfaces) in other languages, such as C, C++, C#, Java and so on. You can learn their usage from the demos.
18.6 Reminder
NINETEEN
MMSEGMENTATION DEPLOYMENT
• MMSegmentation Deployment
– Installation
∗ Install mmseg
∗ Install mmdeploy
– Convert model
– Model specification
– Model inference
∗ Backend model inference
∗ SDK model inference
– Supported models
– Reminder
MMSegmentation aka mmseg is an open source semantic segmentation toolbox based on PyTorch. It is a part of the
OpenMMLab project.
19.1 Installation
There are several methods to install mmdeploy, among which you can choose an appropriate one according to your
target platform and device.
Method I: Install precompiled package
You can refer to get_started
Method II: Build using scripts
If your target platform is Ubuntu 18.04 or later version, we encourage you to run scripts. For example, the following
commands install mmdeploy as well as inference engine - ONNX Runtime.
NOTE:
• Adding $(pwd)/build/lib to PYTHONPATH is for importing the mmdeploy SDK python module - mmdeploy_runtime, which will be presented in chapter SDK model inference.
• When running an ONNX model with ONNX Runtime, the ONNX Runtime library must be found. Thus, we add its path to LD_LIBRARY_PATH.
Method III: Build from source
If neither I nor II meets your requirements, building mmdeploy from source is the last option.
You can use tools/deploy.py to convert mmseg models to the specified backend models. Its detailed usage can be learned
from here.
The command below shows an example of converting the unet model to an ONNX model that can be inferred by ONNX Runtime.
cd mmdeploy
It is crucial to specify the correct deployment config during model conversion. We’ve already provided builtin deploy-
ment config files of all supported backends for mmsegmentation. The config filename pattern is:
segmentation_{backend}-{precision}_{static | dynamic}_{shape}.py
• {backend}: inference backend, such as onnxruntime, tensorrt, pplnn, ncnn, openvino, coreml etc.
• {precision}: fp16, int8. When it’s empty, it means fp32
• {static | dynamic}: static shape or dynamic shape
• {shape}: input shape or shape range of a model
Therefore, in the above example, you can also convert unet to other backend models by changing the deploy-
ment config file segmentation_onnxruntime_dynamic.py to others, e.g., converting to tensorrt-fp16 model by
segmentation_tensorrt-fp16_dynamic-512x1024-2048x2048.py.
Tip: When converting mmseg models to tensorrt models, --device should be set to "cuda".
Before moving on to the model inference chapter, let's learn more about the converted model structure, which is very important for model inference.
The converted model is located in the working directory, mmdeploy_models/mmseg/ort in the previous example. It includes:
mmdeploy_models/mmseg/ort
deploy.json
detail.json
end2end.onnx
pipeline.json
in which,
• end2end.onnx: backend model which can be inferred by ONNX Runtime
• *.json: the necessary information for mmdeploy SDK
The whole package mmdeploy_models/mmseg/ort is defined as mmdeploy SDK model, i.e., mmdeploy SDK model
includes both backend model and inference meta information.
Take the previously converted end2end.onnx model as an example; you can use the following code to run inference with the model and visualize the results.
deploy_cfg = 'configs/mmseg/segmentation_onnxruntime_dynamic.py'
model_cfg = './unet-s5-d16_fcn_4xb4-160k_cityscapes-512x1024.py'
device = 'cpu'
backend_model = ['./mmdeploy_models/mmseg/ort/end2end.onnx']
image = './demo/resources/cityscapes.png'
# do model inference
with torch.no_grad():
result = model.test_step(model_inputs)
# visualize results
task_processor.visualize(
image=image,
model=model,
result=result[0],
window_name='visualize',
output_file='./output_segmentation.png')
from mmdeploy_runtime import Segmentor
import cv2

img = cv2.imread('./demo/resources/cityscapes.png')
# create a segmentor
segmentor = Segmentor(model_path='./mmdeploy_models/mmseg/ort', device_name='cpu', device_id=0)
# perform inference
seg = segmentor(img)
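# NOTE: a minimal sketch, not part of the original example: colorize the
# predicted label map with a random palette and save it. `seg` is assumed to
# be an (H, W) array of class ids here.
import numpy as np
palette = np.random.randint(0, 256, size=(256, 3), dtype=np.uint8)
seg_map = np.asarray(seg).squeeze().astype(np.uint8)
cv2.imwrite('output_segmentation.png', palette[seg_map])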
Besides the Python API, the mmdeploy SDK also provides FFIs (Foreign Function Interfaces) in other languages, such as C, C++, C#, Java and so on. You can learn their usage from the demos.
19.6 Reminder
TWENTY
MMAGIC DEPLOYMENT
• MMagic Deployment
– Installation
∗ Install mmagic
∗ Install mmdeploy
– Convert model
∗ Convert super resolution model
– Model specification
– Model inference
∗ Backend model inference
∗ SDK model inference
– Supported models
MMagic aka mmagic is an open-source image and video editing toolbox based on PyTorch. It is a part of the Open-
MMLab project.
20.1 Installation
There are several methods to install mmdeploy, among which you can choose an appropriate one according to your
target platform and device.
Method I: Install precompiled package
You can refer to get_started
Method II: Build using scripts
If your target platform is Ubuntu 18.04 or later version, we encourage you to run scripts. For example, the following
commands install mmdeploy as well as inference engine - ONNX Runtime.
You can use tools/deploy.py to convert mmagic models to the specified backend models. Its detailed usage can be
learned from here.
When using tools/deploy.py, it is crucial to specify the correct deployment config. We’ve already provided builtin
deployment config files of all supported backends for mmagic, under which the config file path follows the pattern:
{task}/{task}_{backend}-{precision}_{static | dynamic}_{shape}.py
The command below shows an example of converting the ESRGAN model to an ONNX model that can be inferred by ONNX Runtime.
cd mmdeploy
# download esrgan model from mmagic model zoo
mim download mmagic --config esrgan_psnr-x4c64b23g32_1xb16-1000k_div2k --dest .
# convert esrgan model to onnxruntime model with dynamic shape
python tools/deploy.py \
configs/mmagic/super-resolution/super-resolution_onnxruntime_dynamic.py \
esrgan_psnr-x4c64b23g32_1xb16-1000k_div2k.py \
esrgan_psnr_x4c64b23g32_1x16_1000k_div2k_20200420-bf5c993c.pth \
demo/resources/face.png \
--work-dir mmdeploy_models/mmagic/ort \
--device cpu \
--show \
--dump-info
You can also convert the above model to other backend models by changing the deployment config
file *_onnxruntime_dynamic.py to others, e.g., converting to tensorrt model by super-resolution/
super-resolution_tensorrt-_dynamic-32x32-512x512.py.
Tip: When converting mmagic models to tensorrt models, --device should be set to "cuda".
Before moving on to the model inference chapter, let's learn more about the converted model structure, which is very important for model inference.
The converted model is located in the working directory, mmdeploy_models/mmagic/ort in the previous example. It includes:
mmdeploy_models/mmagic/ort
deploy.json
detail.json
end2end.onnx
pipeline.json
in which,
• end2end.onnx: backend model which can be inferred by ONNX Runtime
• *.json: the necessary information for mmdeploy SDK
The whole package mmdeploy_models/mmagic/ort is defined as mmdeploy SDK model, i.e., mmdeploy SDK
model includes both backend model and inference meta information.
Take the previously converted end2end.onnx model as an example; you can use the following code to run inference with the model and visualize the results.
deploy_cfg = 'configs/mmagic/super-resolution/super-resolution_onnxruntime_dynamic.py'
model_cfg = 'esrgan_psnr-x4c64b23g32_1xb16-1000k_div2k.py'
device = 'cpu'
backend_model = ['./mmdeploy_models/mmagic/ort/end2end.onnx']
image = './demo/resources/face.png'
# do model inference
with torch.no_grad():
result = model.test_step(model_inputs)
# visualize results
task_processor.visualize(
image=image,
model=model,
result=result[0],
window_name='visualize',
output_file='output_restorer.bmp')
from mmdeploy_runtime import Restorer
import cv2

img = cv2.imread('./demo/resources/face.png')
# create a restorer
restorer = Restorer(model_path='./mmdeploy_models/mmagic/ort', device_name='cpu', device_id=0)
# perform inference
result = restorer(img)
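# NOTE: a minimal sketch, not part of the original example: clip the restored
# image to the valid pixel range and save it. Treating `result` as an image
# array and the uint8 conversion are assumptions for illustration.
import numpy as np
cv2.imwrite('output_restorer.bmp', np.clip(result, 0, 255).astype(np.uint8))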
Besides the Python API, the mmdeploy SDK also provides FFIs (Foreign Function Interfaces) in other languages, such as C, C++, C#, Java and so on. You can learn their usage from the demos.
TWENTYONE
MMOCR DEPLOYMENT
• MMOCR Deployment
– Installation
∗ Install mmocr
∗ Install mmdeploy
– Convert model
∗ Convert text detection model
∗ Convert text recognition model
– Model specification
– Model Inference
∗ Backend model inference
∗ SDK model inference
· Text detection SDK model inference
· Text Recognition SDK model inference
– Supported models
– Reminder
MMOCR aka mmocr is an open-source toolbox based on PyTorch and mmdetection for text detection, text recognition,
and the corresponding downstream tasks including key information extraction. It is a part of the OpenMMLab project.
21.1 Installation
There are several methods to install mmdeploy, among which you can choose an appropriate one according to your
target platform and device.
Method I: Install precompiled package
You can refer to get_started
Method II: Build using scripts
If your target platform is Ubuntu 18.04 or later version, we encourage you to run scripts. For example, the following
commands install mmdeploy as well as inference engine - ONNX Runtime.
You can use tools/deploy.py to convert mmocr models to the specified backend models. Its detailed usage can be learned
from here.
When using tools/deploy.py, it is crucial to specify the correct deployment config. We’ve already provided builtin
deployment config files of all supported backends for mmocr, under which the config file path follows the pattern:
{task}/{task}_{backend}-{precision}_{static | dynamic}_{shape}.py
cd mmdeploy
# download dbnet model from mmocr model zoo
mim download mmocr --config dbnet_resnet18_fpnc_1200e_icdar2015 --dest .
# convert mmocr model to onnxruntime model with dynamic shape
python tools/deploy.py \
configs/mmocr/text-detection/text-detection_onnxruntime_dynamic.py \
dbnet_resnet18_fpnc_1200e_icdar2015.py \
dbnet_resnet18_fpnc_1200e_icdar2015_20220825_221614-7c0e94f2.pth \
demo/resources/text_det.jpg \
--work-dir mmdeploy_models/mmocr/dbnet/ort \
--device cpu \
--show \
--dump-info
cd mmdeploy
# download crnn model from mmocr model zoo
mim download mmocr --config crnn_mini-vgg_5e_mj --dest .
# convert mmocr model to onnxruntime model with dynamic shape
python tools/deploy.py \
configs/mmocr/text-recognition/text-recognition_onnxruntime_dynamic.py \
crnn_mini-vgg_5e_mj.py \
crnn_mini-vgg_5e_mj_20220826_224120-8afbedbb.pth \
demo/resources/text_recog.jpg \
--work-dir mmdeploy_models/mmocr/crnn/ort \
--device cpu \
--show \
--dump-info
You can also convert the above models to other backend models by changing the deployment config file
*_onnxruntime_dynamic.py to others, e.g., converting dbnet to tensorrt-fp32 model by text-detection/
text-detection_tensorrt-_dynamic-320x320-2240x2240.py.
Tip: When converting mmocr models to tensorrt models, --device should be set to "cuda".
Before moving on to the model inference chapter, let's learn more about the converted model structure, which is very important for model inference.
The converted model is located in the working directory, mmdeploy_models/mmocr/dbnet/ort in the previous example. It includes:
mmdeploy_models/mmocr/dbnet/ort
deploy.json
detail.json
end2end.onnx
pipeline.json
in which,
• end2end.onnx: backend model which can be inferred by ONNX Runtime
• *.json: the necessary information for mmdeploy SDK
The whole package mmdeploy_models/mmocr/dbnet/ort is defined as mmdeploy SDK model, i.e., mmdeploy SDK
model includes both backend model and inference meta information.
Take the previously converted end2end.onnx model of dbnet as an example; you can use the following code to run inference with the model and visualize the results.
deploy_cfg = 'configs/mmocr/text-detection/text-detection_onnxruntime_dynamic.py'
model_cfg = 'dbnet_resnet18_fpnc_1200e_icdar2015.py'
device = 'cpu'
backend_model = ['./mmdeploy_models/mmocr/dbnet/ort/end2end.onnx']
image = './demo/resources/text_det.jpg'
# do model inference
with torch.no_grad():
result = model.test_step(model_inputs)
# visualize results
task_processor.visualize(
image=image,
model=model,
result=result[0],
window_name='visualize',
output_file='output_ocr.png')
Tip:
Map deploy_cfg, model_cfg, backend_model and image to the corresponding arguments in the chapter Convert text recognition model, and you will get the ONNX Runtime inference results of the crnn ONNX model.
Given the above SDK models of dbnet and crnn, you can also perform SDK model inference as follows.
import cv2
from mmdeploy_runtime import TextDetector
img = cv2.imread('demo/resources/text_det.jpg')
# create text detector
detector = TextDetector(
model_path='mmdeploy_models/mmocr/dbnet/ort',
device_name='cpu',
device_id=0)
# do model inference
bboxes = detector(img)
# draw detected bbox into the input image
if len(bboxes) > 0:
pts = ((bboxes[:, 0:8] + 0.5).reshape(len(bboxes), -1,
2).astype(int))
cv2.polylines(img, pts, True, (0, 255, 0), 2)
cv2.imwrite('output_ocr.png', img)
import cv2
from mmdeploy_runtime import TextRecognizer
img = cv2.imread('demo/resources/text_recog.jpg')
# create text recognizer
recognizer = TextRecognizer(
model_path='mmdeploy_models/mmocr/crnn/ort',
device_name='cpu',
device_id=0
)
# do model inference
texts = recognizer(img)
# print the result
print(texts)
Besides the Python API, the mmdeploy SDK also provides FFIs (Foreign Function Interfaces) in other languages, such as C, C++, C#, Java and so on. You can learn their usage from the demos.
21.6 Reminder
TWENTYTWO
MMPOSE DEPLOYMENT
• MMPose Deployment
– Installation
∗ Install mmpose
∗ Install mmdeploy
– Convert model
– Model specification
– Model inference
∗ Backend model inference
∗ SDK model inference
– Supported models
MMPose aka mmpose is an open-source toolbox for pose estimation based on PyTorch. It is a part of the OpenMMLab
project.
22.1 Installation
There are several methods to install mmdeploy, among which you can choose an appropriate one according to your
target platform and device.
Method I: Install precompiled package
You can refer to get_started
Method II: Build using scripts
If your target platform is Ubuntu 18.04 or later version, we encourage you to run scripts. For example, the following
commands install mmdeploy as well as inference engine - ONNX Runtime.
You can use tools/deploy.py to convert mmpose models to the specified backend models. Its detailed usage can be
learned from here.
The command below shows an example of converting the hrnet model to an ONNX model that can be inferred by ONNX Runtime.
cd mmdeploy
# download hrnet model from mmpose model zoo
mim download mmpose --config td-hm_hrnet-w32_8xb64-210e_coco-256x192 --dest .
# convert mmpose model to onnxruntime model with static shape
python tools/deploy.py \
configs/mmpose/pose-detection_onnxruntime_static.py \
td-hm_hrnet-w32_8xb64-210e_coco-256x192.py \
hrnet_w32_coco_256x192-c78dce93_20200708.pth \
demo/resources/human-pose.jpg \
--work-dir mmdeploy_models/mmpose/ort \
--device cpu \
--show
It is crucial to specify the correct deployment config during model conversion. We’ve already provided builtin deploy-
ment config files of all supported backends for mmpose. The config filename pattern is:
pose-detection_{backend}-{precision}_{static | dynamic}_{shape}.py
• {backend}: inference backend, such as onnxruntime, tensorrt, pplnn, ncnn, openvino, coreml etc.
• {precision}: fp16, int8. When it’s empty, it means fp32
• {static | dynamic}: static shape or dynamic shape
• {shape}: input shape or shape range of a model
Therefore, in the above example, you can also convert hrnet to other backend models by changing the deploy-
ment config file pose-detection_onnxruntime_static.py to others, e.g., converting to tensorrt model by
pose-detection_tensorrt_static-256x192.py.
Tip: When converting mmpose models to tensorrt models, --device should be set to "cuda".
Before moving on to the model inference chapter, let's learn more about the converted model structure, which is very important for model inference.
The converted model is located in the working directory, mmdeploy_models/mmpose/ort in the previous example. It includes:
mmdeploy_models/mmpose/ort
deploy.json
detail.json
end2end.onnx
pipeline.json
in which,
• end2end.onnx: backend model which can be inferred by ONNX Runtime
• *.json: the necessary information for mmdeploy SDK
The whole package mmdeploy_models/mmpose/ort is defined as mmdeploy SDK model, i.e., mmdeploy SDK
model includes both backend model and inference meta information.
Take the previously converted end2end.onnx model as an example; you can use the following code to run inference with the model and visualize the results.
deploy_cfg = 'configs/mmpose/pose-detection_onnxruntime_static.py'
model_cfg = 'td-hm_hrnet-w32_8xb64-210e_coco-256x192.py'
device = 'cpu'
backend_model = ['./mmdeploy_models/mmpose/ort/end2end.onnx']
image = './demo/resources/human-pose.jpg'
# do model inference
with torch.no_grad():
    result = model.test_step(model_inputs)
# visualize results
task_processor.visualize(
image=image,
model=model,
result=result[0],
window_name='visualize',
output_file='output_pose.png')
TODO
TWENTYTHREE
MMDETECTION3D DEPLOYMENT
• MMDetection3d Deployment
– Install mmdet3d
– Convert model
– Model inference
– Supported models
MMDetection3d aka mmdet3d is an open source object detection toolbox based on PyTorch, towards the next-
generation platform for general 3D detection. It is a part of the OpenMMLab project.
We can install mmdet3d through mim. For other installation methods, please refer to here.
export MODEL_CONFIG=centerpoint_pillar02_second_secfpn_head-circlenms_8xb4-cyclic-20e_nus-3d.py
export MODEL_PATH=centerpoint_02pillar_second_secfpn_circlenms_4x8_cyclic_20e_nus_20220811_031844-191a3822.pth
export TEST_DATA=tests/data/n008-2018-08-01-15-16-36-0400__LIDAR_TOP__1533151612397179.pcd.bin
ls -lah centerpoint
..
-rw-rw-r-- 1 rg rg 87M 11 4 19:48 end2end.onnx
At present, the voxelize preprocessing and postprocessing of mmdet3d are not converted into onnx operations, and the C++ SDK has not yet implemented the voxelize calculation. The caller needs to refer to the corresponding Python implementation to complete them.
• Make sure TensorRT >= 8.6 to get fixes for issues such as ScatterND and dynamic-shape crashes.
TWENTYFOUR
MMROTATE DEPLOYMENT
• MMRotate Deployment
– Installation
∗ Install mmrotate
∗ Install mmdeploy
– Convert model
– Model specification
– Model inference
∗ Backend model inference
∗ SDK model inference
– Supported models
MMRotate is an open-source toolbox for rotated object detection based on PyTorch. It is a part of the OpenMMLab
project.
24.1 Installation
There are several methods to install mmdeploy, among which you can choose an appropriate one according to your
target platform and device.
Method I: Install precompiled package
You can refer to get_started
Method II: Build using scripts
If your target platform is Ubuntu 18.04 or later version, we encourage you to run scripts. For example, the following
commands install mmdeploy as well as inference engine - ONNX Runtime.
NOTE:
• Adding $(pwd)/build/lib to PYTHONPATH is for importing the mmdeploy SDK python module - mmdeploy_runtime, which will be presented in chapter SDK model inference.
• When running an ONNX model with ONNX Runtime, the ONNX Runtime library must be found. Thus, we add its path to LD_LIBRARY_PATH.
Method III: Build from source
If neither I nor II meets your requirements, building mmdeploy from source is the last option.
You can use tools/deploy.py to convert mmrotate models to the specified backend models. Its detailed usage can be
learned from here.
The command below shows an example of converting the rotated-faster-rcnn model to an ONNX model that can be inferred by ONNX Runtime.
cd mmdeploy
It is crucial to specify the correct deployment config during model conversion. We’ve already provided builtin deploy-
ment config files of all supported backends for mmrotate. The config filename pattern is:
rotated-detection_{backend}-{precision}_{static | dynamic}_{shape}.py
• {backend}: inference backend, such as onnxruntime, tensorrt, pplnn, ncnn, openvino, coreml etc.
• {precision}: fp16, int8. When it’s empty, it means fp32
• {static | dynamic}: static shape or dynamic shape
• {shape}: input shape or shape range of a model
Therefore, in the above example, you can also convert rotated-faster-rcnn to other backend models by changing
the deployment config file rotated-detection_onnxruntime_dynamic to others, e.g., converting to tensorrt-fp16
model by rotated-detection_tensorrt-fp16_dynamic-320x320-1024x1024.py.
Tip: When converting mmrotate models to tensorrt models, --device should be set to "cuda".
Before moving on to the model inference chapter, let's learn more about the converted model structure, which is very important for model inference.
The converted model is located in the working directory, mmdeploy_models/mmrotate/ort in the previous example. It includes:
mmdeploy_models/mmrotate/ort
deploy.json
detail.json
end2end.onnx
pipeline.json
in which,
• end2end.onnx: backend model which can be inferred by ONNX Runtime
• *.json: the necessary information for mmdeploy SDK
The whole package mmdeploy_models/mmrotate/ort is defined as mmdeploy SDK model, i.e., mmdeploy SDK
model includes both backend model and inference meta information.
Take the previously converted end2end.onnx model as an example; you can use the following code to run inference with the model and visualize the results.
deploy_cfg = 'configs/mmrotate/rotated-detection_onnxruntime_dynamic.py'
model_cfg = './rotated-faster-rcnn-le90_r50_fpn_1x_dota.py'
device = 'cpu'
backend_model = ['./mmdeploy_models/mmrotate/ort/end2end.onnx']
image = './dota_demo.jpg'
# do model inference
with torch.no_grad():
result = model.test_step(model_inputs)
# visualize results
task_processor.visualize(
image=image,
model=model,
result=result[0],
window_name='visualize',
output_file='./output.png')
from mmdeploy_runtime import RotatedDetector
import cv2

img = cv2.imread('./dota_demo.jpg')
# create a detector
detector = RotatedDetector(model_path='./mmdeploy_models/mmrotate/ort', device_name='cpu', device_id=0)
# perform inference
det = detector(img)
Besides the Python API, the mmdeploy SDK also provides FFIs (Foreign Function Interfaces) in other languages, such as C, C++, C#, Java and so on. You can learn their usage from the demos.
TWENTYFIVE
MMACTION2 DEPLOYMENT
• MMAction2 Deployment
– Installation
∗ Install mmaction2
∗ Install mmdeploy
– Convert model
∗ Convert video recognition model
– Model specification
– Model Inference
∗ Backend model inference
∗ SDK model inference
· Video recognition SDK model inference
– Supported models
MMAction2 is an open-source toolbox for video understanding based on PyTorch. It is a part of the OpenMMLab
project.
25.1 Installation
There are several methods to install mmdeploy, among which you can choose an appropriate one according to your
target platform and device.
Method I: Install precompiled package
You can refer to get_started
Method II: Build using scripts
If your target platform is Ubuntu 18.04 or later version, we encourage you to run scripts. For example, the following
commands install mmdeploy as well as inference engine - ONNX Runtime.
You can use tools/deploy.py to convert mmaction2 models to the specified backend models. Its detailed usage can be
learned from here.
When using tools/deploy.py, it is crucial to specify the correct deployment config. We’ve already provided builtin
deployment config files of all supported backends for mmaction2, under which the config file path follows the pattern:
{task}/{task}_{backend}-{precision}_{static | dynamic}_{shape}.py
cd mmdeploy
Before moving on to the model inference chapter, let's learn more about the converted model structure, which is very important for model inference.
The converted model is located in the working directory, mmdeploy_models/mmaction/tsn/ort in the previous example. It includes:
mmdeploy_models/mmaction/tsn/ort
deploy.json
detail.json
end2end.onnx
pipeline.json
in which,
• end2end.onnx: backend model which can be inferred by ONNX Runtime
• *.json: the necessary information for mmdeploy SDK
The whole package mmdeploy_models/mmaction/tsn/ort is defined as mmdeploy SDK model, i.e., mmdeploy SDK
model includes both backend model and inference meta information.
Take the previously converted end2end.onnx model of tsn as an example; you can use the following code to run inference with the model and visualize the results.
import numpy as np
import torch

deploy_cfg = 'configs/mmaction/video-recognition/video-recognition_2d_onnxruntime_static.py'
model_cfg = 'tsn_imagenet-pretrained-r50_8xb32-1x1x3-100e_kinetics400-rgb'
device = 'cpu'
backend_model = ['./mmdeploy_models/mmaction2/tsn/ort/end2end.onnx']
image = 'tests/data/arm_wrestling.mp4'
# do model inference
with torch.no_grad():
result = model.test_step(model_inputs)
# show top5-results
pred_scores = result[0].pred_scores.item.tolist()
top_index = np.argsort(pred_scores)[::-1]
for i in range(5):
index = top_index[i]
print(index, pred_scores[index])
Given the above SDK model of tsn, you can also perform SDK model inference as follows.
from mmdeploy_runtime import VideoRecognizer
import cv2

# refer to demo/python/video_recognition.py
# def SampleFrames(cap, clip_len, frame_interval, num_clips):
# ...
cap = cv2.VideoCapture('tests/data/arm_wrestling.mp4')
# sample frames from the video; the clip settings below are illustrative and
# should match the model config
clips, info = SampleFrames(cap, clip_len=1, frame_interval=1, num_clips=25)
# create a recognizer
recognizer = VideoRecognizer(model_path='./mmdeploy_models/mmaction/tsn/ort', device_name='cpu', device_id=0)
# perform inference
result = recognizer(clips, info)
# show inference result
for label_id, score in result:
print(label_id, score)
Besides the Python API, the mmdeploy SDK also provides FFIs (Foreign Function Interfaces) in other languages, such as C, C++, C#, Java and so on. You can learn their usage from the demos.
MMAction2 only has C, C++ and Python APIs for now.
TWENTYSIX
TWENTYSEVEN
ONNXRUNTIME SUPPORT
ONNX Runtime is a cross-platform inference and training accelerator compatible with many popular ML/DNN frame-
works. Check its github for more information.
27.2 Installation
• CPU Version
• GPU Version
If you want to use float16 precision, install the tool by running the following script:
Download onnxruntime-linux-*.tgz library from ONNX Runtime releases, extract it, expose ONNXRUNTIME_DIR
and finally add the lib path to LD_LIBRARY_PATH as below:
• CPU Version
wget https://github.com/microsoft/onnxruntime/releases/download/v1.8.1/onnxruntime-linux-x64-1.8.1.tgz
• GPU Version
In X64 GPU:
wget https://github.com/microsoft/onnxruntime/releases/download/v1.8.1/onnxruntime-linux-x64-gpu-1.8.1.tgz
In Arm GPU:
You can also go to ONNX Runtime Release to find corresponding release version package.
• CPU Version
• GPU Version
27.6 Reminder
• The custom operators are not included in the supported operator list of ONNX Runtime.
• The custom operators should be exportable to ONNX.
27.7 References
• How to export Pytorch model with custom op to ONNX and run it in ONNX Runtime
• How to add a custom operator/kernel in ONNX Runtime
TWENTYEIGHT
OPENVINO SUPPORT
28.1 Installation
Install OpenVINO. It is recommended to use the installer or to install using pip. Installation example using pip:
If you want to use OpenVINO in the SDK, you need to install OpenVINO following the install guides. Take openvino==2022.3.0 as an example:
wget https://storage.openvinotoolkit.org/repositories/openvino/packages/2022.3/linux/l_openvino_toolkit_ubuntu20_2022.3.0.9052.9752fafe8eb_x86_64.tgz
To work with models from MMDetection, you may need to install it additionally.
28.2 Usage
python tools/deploy.py \
configs/mmdet/detection/detection_openvino_static-300x300.py \
/mmdetection_dir/mmdetection/configs/ssd/ssd300_coco.py \
/tmp/snapshots/ssd300_coco_20210803_015428-d231a06e.pth \
tests/data/tiger.jpeg \
--work-dir ../deploy_result \
--device cpu \
--log-level INFO
The table below lists the models that are guaranteed to be exportable to OpenVINO from MMDetection.
Notes:
• Custom operations from OpenVINO use the domain org.openvinotoolkit.
• For faster work in OpenVINO, in the Faster R-CNN, Mask R-CNN, Cascade R-CNN and Cascade Mask R-CNN models the RoIAlign operation is replaced with the ExperimentalDetectronROIFeatureExtractor operation in the ONNX graph.
• The models "VFNet" and "Faster R-CNN + DCN" use the custom "DeformableConv2D" operation.
With the deployment config, you can specify additional options for the Model Optimizer. To do this, add the necessary
parameters to the backend_config.mo_options in the fields args (for parameters with values) and flags (for flags).
Example:
backend_config = dict(
mo_options=dict(
args=dict({
'--mean_values': [0, 0, 0],
'--scale_values': [255, 255, 255],
'--data_type': 'FP32',
}),
flags=['--disable_fusing'],
)
)
Information about the possible parameters for the Model Optimizer can be found in the documentation.
28.5 Troubleshooting
• ImportError: libpython3.7m.so.1.0: cannot open shared object file: No such file or directory
To resolve missing external dependency on Ubuntu*, execute the following command:
TWENTYNINE
PPLNN SUPPORT
MMDeploy supports ppl.nn v0.8.1 and later. This tutorial is based on Linux systems like Ubuntu-18.04.
29.1 Installation
29.2 Usage
Example:
python tools/deploy.py \
configs/mmdet/detection/detection_pplnn_dynamic-800x1344.py \
/mmdetection_dir/mmdetection/configs/retinanet/retinanet_r50_fpn_1x_coco.py \
/tmp/snapshots/retinanet_r50_fpn_1x_coco_20200130-c2398f9e.pth \
tests/data/tiger.jpeg \
--work-dir ../deploy_result \
--device cuda \
--log-level INFO
THIRTY
Currently mmdeploy integrates the onnx2dlc model conversion and SDK inference, but the following features are not
yet supported:
• GPU_FP16 mode
• DSP/AIP quantization
• Operator internal profiling
• UDO operator
THIRTYONE
TENSORRT SUPPORT
31.1 Installation
Some custom ops are created to support models in OpenMMLab, and they can be built as follows:
If you haven't installed TensorRT in the default path, please add the -DTENSORRT_DIR flag in CMake.
Please follow the tutorial in How to convert model. Note that the device must be a cuda device.
Since TensorRT supports INT8 mode, a custom dataset config can be given to calibrate the model. The following is an example for MMDetection:
# calibration_dataset.py
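# NOTE: a minimal sketch of what calibration_dataset.py might contain; the
# dataset type, paths and pipeline below are illustrative assumptions. The file
# should follow the dataset settings format of the corresponding codebase
# (an MMDetection-style dataloader config in this case).
val_dataloader = dict(
    batch_size=1,
    dataset=dict(
        type='CocoDataset',
        data_root='data/coco/',
        ann_file='annotations/instances_val2017.json',
        data_prefix=dict(img='val2017/')))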
python tools/deploy.py \
...
--calib-dataset-cfg calibration_dataset.py
If the calibration dataset is not given, the model will be calibrated with the dataset in the model config.
31.3 FAQs
backend_config = dict(
# other configs
model_inputs=[
dict(
input_shapes=dict(
input=dict(
min_shape=[1, 3, 320, 320],
opt_shape=[1, 3, 800, 1344],
max_shape=[1, 3, 1344, 1344])))
])
# other configs
The shape of the tensor input must be limited between input_shapes["input"]["min_shape"] and
input_shapes["input"]["max_shape"].
• Error error: [TensorRT] INTERNAL ERROR: Assertion failed: cublasStatus ==
CUBLAS_STATUS_SUCCESS
TRT 7.2.1 switches to use cuBLASLt (previously it was cuBLAS). cuBLASLt is the default choice for SM version
>= 7.0. However, you may need CUDA-10.2 Patch 1 (Released Aug 26, 2020) to resolve some cuBLASLt issues.
Another option is to use the new TacticSource API and disable cuBLASLt tactics if you don’t want to upgrade.
Read this for detail.
• Install mmdeploy on Jetson
We provide a tutorial to get started on Jetson here.
THIRTYTWO
TORCHSCRIPT SUPPORT
TorchScript is a way to create serializable and optimizable models from PyTorch code. Any TorchScript program can
be saved from a Python process and loaded in a process where there is no Python dependency. Check the Introduction
to TorchScript for more details.
32.2.1 Prerequisite
wget https://download.pytorch.org/libtorch/cu111/libtorch-shared-with-deps-1.8.1%2Bcu111.zip
unzip libtorch-shared-with-deps-1.8.1+cu111.zip
cd libtorch
export Torch_DIR=$(pwd)
export LD_LIBRARY_PATH=$Torch_DIR/lib:$LD_LIBRARY_PATH
Note:
• If you want to save libtorch env variables to bashrc, you could run
32.5 FAQs
• Error: projects/thirdparty/libtorch/share/cmake/Caffe2/Caffe2Config.cmake:96
(message):Your installed Caffe2 version uses cuDNN but I cannot find the cuDNN
libraries. Please set the proper cuDNN prefixes and / or install cuDNN.
You may export CUDNN_ROOT=/root/path/to/cudnn to resolve the build error.
THIRTYTHREE
Currently, MMDeploy only tests rk3588 and rv1126 on the Linux platform.
The following features cannot be automatically enabled by mmdeploy; you need to manually modify the configuration in MMDeploy like here (a configuration sketch follows the list below):
• target_platform other than default
• quantization settings
• optimization level other than 1
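A minimal sketch of such manual changes in the deploy config is shown below; the key names follow the backend_config convention used elsewhere in this document, but the exact fields and values here are assumptions for illustration:

backend_config = dict(
    type='rknn',
    common_config=dict(
        target_platform='rk3588',   # non-default target platform
        optimization_level=1),
    quantization_config=dict(do_quantization=True))  # enable quantization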
THIRTYFOUR
MMDeploy has integrated TVM for model conversion and SDK. Features include:
• AutoTVM tuner
• Ansor tuner
• Graph Executor runtime
• Virtual machine runtime
THIRTYFIVE
35.1 Installation
To convert models in mmdet, you need to compile libtorch to support custom operators such as nms (only needed at the conversion stage). For macOS 12 users, please install PyTorch 1.8.0; for macOS 13 users, please install PyTorch 2.0.0+.
cd ${PYTORCH_DIR}
mkdir build && cd build
cmake .. \
-DCMAKE_BUILD_TYPE=Release \
-DPYTHON_EXECUTABLE=`which python` \
-DCMAKE_INSTALL_PREFIX=install \
-DDISABLE_SVE=ON
make install
35.2 Usage
python tools/deploy.py \
configs/mmdet/detection/detection_coreml_static-800x1344.py \
/mmdetection_dir/configs/retinanet/retinanet_r18_fpn_1x_coco.py \
/checkpoint/retinanet_r18_fpn_1x_coco_20220407_171055-614fd399.pth \
/mmdetection_dir/demo/demo.jpg \
--work-dir work_dir/retinanet \
--device cpu \
--dump-info
THIRTYSIX
– Parameters
– Inputs
– Outputs
– Type Constraints
36.1 grid_sampler
36.1.1 Description
36.1.2 Parameters
36.1.3 Inputs
36.1.4 Outputs
• T:tensor(float32, Linear)
36.2 MMCVModulatedDeformConv2d
36.2.1 Description
Perform Modulated Deformable Convolution on input feature, read Deformable ConvNets v2: More Deformable, Better
Results for detail.
36.2.2 Parameters
36.2.3 Inputs
36.2.4 Outputs
• T:tensor(float32, Linear)
36.3 NMSRotated
36.3.1 Description
36.3.2 Parameters
36.3.3 Inputs
36.3.4 Outputs
• T:tensor(float32, Linear)
36.4 RoIAlignRotated
36.4.1 Description
Perform RoIAlignRotated on output feature, used in bbox_head of most two-stage rotated object detectors.
36.4.2 Parameters
36.4.3 Inputs
36.4.4 Outputs
• T:tensor(float32)
36.5 NMSMatch
36.5.1 Description
36.5.2 Parameters
36.5.3 Inputs
36.5.4 Outputs
• T:tensor(float32)
THIRTYSEVEN
TENSORRT OPS
• TensorRT Ops
– TRTBatchedNMS
∗ Description
∗ Parameters
∗ Inputs
∗ Outputs
∗ Type Constraints
– grid_sampler
∗ Description
∗ Parameters
∗ Inputs
∗ Outputs
∗ Type Constraints
– MMCVInstanceNormalization
∗ Description
∗ Parameters
∗ Inputs
∗ Outputs
∗ Type Constraints
– MMCVModulatedDeformConv2d
∗ Description
∗ Parameters
∗ Inputs
∗ Outputs
∗ Type Constraints
– MMCVMultiLevelRoiAlign
∗ Description
∗ Parameters
∗ Inputs
∗ Outputs
∗ Type Constraints
– MMCVRoIAlign
∗ Description
∗ Parameters
∗ Inputs
∗ Outputs
∗ Type Constraints
– ScatterND
∗ Description
∗ Parameters
∗ Inputs
∗ Outputs
∗ Type Constraints
– TRTBatchedRotatedNMS
∗ Description
∗ Parameters
∗ Inputs
∗ Outputs
∗ Type Constraints
– GridPriorsTRT
∗ Description
∗ Parameters
∗ Inputs
∗ Outputs
∗ Type Constraints
– ScaledDotProductAttentionTRT
∗ Description
∗ Parameters
∗ Inputs
∗ Outputs
∗ Type Constraints
– GatherTopk
∗ Description
∗ Parameters
∗ Inputs
∗ Outputs
∗ Type Constraints
– MMCVMultiScaleDeformableAttention
∗ Description
∗ Parameters
∗ Inputs
∗ Outputs
∗ Type Constraints
37.1 TRTBatchedNMS
37.1.1 Description
37.1.2 Parameters
37.1.3 Inputs
37.1.4 Outputs
• T:tensor(float32, Linear)
37.2 grid_sampler
37.2.1 Description
37.2.2 Parameters
37.2.3 Inputs
37.2.4 Outputs
• T:tensor(float32, Linear)
37.3 MMCVInstanceNormalization
37.3.1 Description
37.3.2 Parameters
37.3.3 Inputs
37.3.4 Outputs
• T:tensor(float32, Linear)
37.4 MMCVModulatedDeformConv2d
37.4.1 Description
Perform Modulated Deformable Convolution on input feature. Read Deformable ConvNets v2: More Deformable,
Better Results for detail.
37.4.2 Parameters
37.4.3 Inputs
37.4.4 Outputs
• T:tensor(float32, Linear)
37.5 MMCVMultiLevelRoiAlign
37.5.1 Description
Perform RoIAlign on features from multiple levels. Used in bbox_head of most two-stage detectors.
37.5.2 Parameters
37.5.3 Inputs
37.5.4 Outputs
• T:tensor(float32, Linear)
37.6 MMCVRoIAlign
37.6.1 Description
37.6.2 Parameters
37.6.3 Inputs
37.6.4 Outputs
• T:tensor(float32, Linear)
37.7 ScatterND
37.7.1 Description
ScatterND takes three inputs data tensor of rank r >= 1, indices tensor of rank q >= 1, and updates tensor of rank
q + r - indices.shape[-1] - 1. The output of the operation is produced by creating a copy of the input data, and then
updating its value to values specified by updates at specific index positions specified by indices. Its output shape is
the same as the shape of data. Note that indices should not have duplicate entries. That is, two or more updates for
the same index location are not supported.
The output is calculated via the following equation:
output = np.copy(data)
update_indices = indices.shape[:-1]
for idx in np.ndindex(update_indices):
output[indices[idx]] = updates[idx]
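As a concrete illustration of the equation above, the following small numpy sketch (illustrative only, not part of the plugin implementation) scatters two updates into a 1-D data tensor:

import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0])
indices = np.array([[1], [3]])      # each row is an index coordinate
updates = np.array([9.0, 8.0])

output = np.copy(data)
for idx in np.ndindex(indices.shape[:-1]):
    # tuple() turns the index row into a coordinate, matching the equation
    output[tuple(indices[idx])] = updates[idx]
print(output)  # [1. 9. 3. 8.]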
37.7.2 Parameters
None
37.7.3 Inputs
37.7.4 Outputs
37.8 TRTBatchedRotatedNMS
37.8.1 Description
37.8.2 Parameters
37.8.3 Inputs
37.8.4 Outputs
• T:tensor(float32, Linear)
37.9 GridPriorsTRT
37.9.1 Description
37.9.2 Parameters
37.9.3 Inputs
37.9.4 Outputs
• T:tensor(float32, Linear)
• TAny: Any
37.10 ScaledDotProductAttentionTRT
37.10.1 Description
Dot product attention used to support multihead attention, read Attention Is All You Need for more detail.
37.10.2 Parameters
None
37.10.3 Inputs
37.10.4 Outputs
• T:tensor(float32, Linear)
37.11 GatherTopk
37.11.1 Description
37.11.2 Parameters
None
37.11.3 Inputs
37.11.4 Outputs
37.12 MMCVMultiScaleDeformableAttention
37.12.1 Description
Perform attention computation over a small set of key sampling points around a reference point rather than looking over
all possible spatial locations. Read Deformable DETR: Deformable Transformers for End-to-End Object Detection for
detail.
37.12.2 Parameters
None
37.12.3 Inputs
37.12.4 Outputs
• T:tensor(float32, Linear)
THIRTYEIGHT
NCNN OPS
• ncnn Ops
– Expand
∗ Description
∗ Parameters
∗ Inputs
∗ Outputs
∗ Type Constraints
– Gather
∗ Description
∗ Parameters
∗ Inputs
∗ Outputs
∗ Type Constraints
– Shape
∗ Description
∗ Parameters
∗ Inputs
∗ Outputs
∗ Type Constraints
– TopK
∗ Description
∗ Parameters
∗ Inputs
∗ Outputs
∗ Type Constraints
38.1 Expand
38.1.1 Description
Broadcast the input blob following the given shape and the broadcast rule of ncnn.
38.1.2 Parameters
38.1.3 Inputs
38.1.4 Outputs
• ncnn.Mat: Mat(float32)
38.2 Gather
38.2.1 Description
Given the data and indices blobs, gather entries along the axis dimension of data indexed by indices.
38.2.2 Parameters
38.2.3 Inputs
38.2.4 Outputs
• ncnn.Mat: Mat(float32)
38.3 Shape
38.3.1 Description
38.3.2 Parameters
38.3.3 Inputs
38.3.4 Outputs
• ncnn.Mat: Mat(float32)
38.4 TopK
38.4.1 Description
Get the indices and (optionally) values of the largest or smallest k entries along the given axis. This op maps to the ONNX ops TopK, ArgMax, and ArgMin.
38.4.2 Parameters
38.4.3 Inputs
38.4.4 Outputs
• ncnn.Mat: Mat(float32)
THIRTYNINE
MMDEPLOY ARCHITECTURE
This article mainly introduces the function of each directory of mmdeploy and how it works from model conversion to real inference.
The entire mmdeploy can be seen as two independent parts: model conversion and the SDK.
We introduce the repo's directory structure and the function of each part; you don't have to study the source code, just get an impression.
Peripheral directory features:
$ cd /path/to/mmdeploy
$ tree -L 1
.
CMakeLists.txt  # Compile custom operators and cmake configuration of SDK
configs         # Algorithm library configurations for model conversion
csrc            # SDK and custom operators
demo            # FFI interface examples in various languages, such as csharp, java, python, etc.
tests           # unittest
third_party     # 3rd party dependencies required by SDK and FFI
tools           # Tools, also the entrance to all functions, such as onnx2xx.py, profiler.py, test.py, etc.
It should be clear that:
• Model conversion mainly depends on tools, mmdeploy and a small part of the csrc directory;
• The SDK consists of three directories: csrc, third_party and demo.
Here we take mmpretrain's ViT as the example model and ncnn as the example inference backend. Other models and backends are similar.
Let’s take a look at the mmdeploy/mmdeploy directory structure and get an impression:
.
apis              # The api used by tools is implemented here, such as onnx2ncnn.py
calibration.py    # trt-dedicated collection of quantization data
core              # Software infrastructure
extract_model.py  # Use it to export part of an onnx model
inference.py      # Abstract function, which will actually call torch/ncnn specific inference
..
backend           # Backend wrapper
base              # Because there are multiple backends, there must be an OO design with a base class
When exporting ViT to onnx, it generates some operators that ncnn doesn't support perfectly. mmdeploy's solution is to hijack the forward code and change it so that the output onnx is suitable for ncnn.
For example, rewrite the process conv -> shape -> concat_const -> reshape to conv -> reshape to trim off the redundant shape and concat operators.
All mmpretrain algorithm rewriters are in the mmdeploy/codebase/mmpretrain/models directory.
Operators customized for ncnn are in the csrc/mmdeploy/backend_ops/ncnn/ directory and are loaded together with libncnn.so after compilation. In essence they hotfix ncnn, which currently implements these operators:
• topk
• tensorslice
• shape
• gather
• expand
• constantofshape
We first use the modified mmdeploy_onnx2ncnn to convert the model, then run inference with pyncnn and the custom ops.
When encountering a framework such as snpe that does not support Python well, we use a client/server mode: wrap a server with protocols such as gRPC, and forward the real inference output.
For rendering, mmdeploy directly uses the rendering API of the upstream algorithm codebase.
39.3 SDK
After the model conversion is completed, the SDK, written in C++, can be used to run models on different platforms.
Let’s take a look at the csrc/mmdeploy directory structure:
.
apis           # csharp, java, go, Rust and other FFI interfaces
backend_ops    # Custom operators for each inference framework
CMakeLists.txt
codebase       # The type of results preferred by each algorithm framework, such as the multi-use bbox for the detection task
core           # Abstraction of graph, operator, device and so on
device         # Implementation of CPU/GPU device abstraction
execution      # Implementation of the execution abstraction
graph          # Implementation of graph abstraction
model          # Implements both zip-compressed and uncompressed work directories
net            # Implementation of net, e.g. wrapping the ncnn forward C API
preprocess     # Implements preprocessing
utils          # OCV tools
The essence of the SDK is to design a set of abstractions over the computational graph and to combine each model's
• preprocess
• inference
• postprocess
while providing FFI in multiple languages at the same time.
FORTY
PyTorch neural networks are written in Python, which eases the development of algorithms. But the use of Python control flow and third-party libraries makes it difficult to export the network to an intermediate representation. We provide a 'monkey patch' tool to rewrite an unsupported function into another one that can be exported. Here is an example:
from mmdeploy.core import FUNCTION_REWRITER

@FUNCTION_REWRITER.register_rewriter(
    func_name='torch.Tensor.repeat', backend='tensorrt')
def repeat_static(input, *size):
    ctx = FUNCTION_REWRITER.get_context()
    origin_func = ctx.origin_func
    if input.dim() == 1 and len(size) == 1:
        return origin_func(input.unsqueeze(0), *([1] + list(size))).squeeze(0)
    else:
        return origin_func(input, *size)
It is easy to use the function rewriter. Just add a decorator with arguments:
• func_name is the function to override. It can be either a PyTorch function or a custom function. Methods in modules can also be overridden by this tool.
• backend is the inference engine. The function will be overridden when the model is exported to this engine. If it is not given, this rewrite will be the default rewrite, and the default rewrite is used when no rewrite exists for the given backend.
The arguments of the rewrite are the same as those of the original function. A context object, obtained via FUNCTION_REWRITER.get_context(), provides some useful information such as the deployment config ctx.cfg and the original function (which has been overridden) ctx.origin_func.
If you want to replace a whole module with another one, we have another rewriter as follows:
@MODULE_REWRITER.register_rewrite_module(
    'mmagic.models.backbones.sr_backbones.SRCNN', backend='tensorrt')
class SRCNNWrapper(nn.Module):

    def __init__(self,
                 module,
                 cfg,
                 channels=(3, 64, 32, 3),
                 kernel_sizes=(9, 1, 5),
                 upscale_factor=4):
        super(SRCNNWrapper, self).__init__()

        self._module = module

        module.img_upsampler = nn.Upsample(
            scale_factor=module.upscale_factor,
            mode='bilinear',
            align_corners=False)
The mappings between PyTorch and ONNX are defined in PyTorch with symbolic functions. The custom symbolic
function can help us to bypass some ONNX nodes which are unsupported by inference engine.
@SYMBOLIC_REWRITER.register_symbolic('squeeze', is_pytorch=True)
def squeeze_default(g, self, dim=None):
    # sym_help refers to torch.onnx.symbolic_helper
    if dim is None:
        dims = []
        for i, size in enumerate(self.type().sizes()):
            if size == 1:
                dims.append(i)
    else:
        dims = [sym_help._get_const(dim, 'i', 'dim')]
    return g.op('Squeeze', self, axes_i=dims)
MMDeploy supports a number of backend engines. We welcome the contribution of new backends. In this tutorial, we
will introduce the general procedures to support a new backend in MMDeploy.
41.1 Prerequisites
Before contributing the codes, there are some requirements for the new backend that need to be checked:
• The backend must support ONNX as IR.
• If the backend requires model files or weight files other than a “.onnx” file, a conversion tool that converts the
“.onnx” file to model files and weight files is required. The tool can be a Python API, a script, or an executable
program.
• It is highly recommended that the backend provides a Python interface to load the backend files and inference
for validation.
The backends in MMDeploy must support ONNX. The backend either loads the “.onnx” file directly or converts the
“.onnx” to its own format using the conversion tool. In this section, we will introduce the steps to support backend
conversion.
1. Add backend constant in mmdeploy/utils/constants.py that denotes the name of the backend.
Example:
# mmdeploy/utils/constants.py
class Backend(AdvancedEnum):
    # Take TensorRT as an example
    TENSORRT = 'tensorrt'
2. Add a corresponding package (a folder with __init__.py) in mmdeploy/backend/. For example, mmdeploy/
backend/tensorrt. In the __init__.py, there must be a function named is_available which checks if
users have installed the backend library. If the check is passed, then the remaining files of the package will be
loaded.
Example:
# mmdeploy/backend/tensorrt/__init__.py
import importlib


def is_available():
    return importlib.util.find_spec('tensorrt') is not None


if is_available():
    from .utils import from_onnx, load, save
    from .wrapper import TRTWrapper

    __all__ = [
        'from_onnx', 'save', 'load', 'TRTWrapper'
    ]
3. Create a config file in configs/_base_/backends (e.g., configs/_base_/backends/onnxruntime.py). If the backend
only takes the “.onnx” file as input, the config can consist of a single field denoting the backend name:

backend_config = dict(type='onnxruntime')
If the backend requires other files, then the arguments for the conversion from “.onnx” file to backend files should
be included in the config file.
Example:
backend_config = dict(
    type='tensorrt',
    common_config=dict(
        fp16_mode=False, max_workspace_size=0))
After possessing a base backend config file, you can easily construct a complete deploy config through inheritance.
Please refer to our config tutorial for more details. Here is an example:
_base_ = ['../_base_/backends/onnxruntime.py']
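For reference, a complete deploy config built this way might look like the sketch below; the onnx_config and codebase_config values are illustrative assumptions rather than required settings:

_base_ = ['../_base_/backends/onnxruntime.py']

# illustrative export settings for the ONNX intermediate representation
onnx_config = dict(
    type='onnx',
    opset_version=11,
    save_file='end2end.onnx',
    input_names=['input'],
    output_names=['output'],
    input_shape=None)

# illustrative codebase settings; the task depends on the model being deployed
codebase_config = dict(type='mmpretrain', task='Classification')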
4. If the backend requires model files or weight files other than a “.onnx” file, create an onnx2backend.py file
in the corresponding folder (e.g., create mmdeploy/backend/tensorrt/onnx2tensorrt.py). Then add a
conversion function onnx2backend in the file. The function should convert a given “.onnx” file to the required
backend files in a given work directory. There are no requirements on the other parameters of the function or the
implementation details. You can use any tool for the conversion. Here are some examples:
Use Python script:
def onnx2backend(input_info, output_names, onnx_path, work_dir):
    input_names = ','.join(input_info.keys())
    input_shapes = ','.join(str(list(elem)) for elem in input_info.values())
    output = ','.join(output_names)
    # ... assemble and run the backend's own conversion command here
5. Define APIs in a new package mmdeploy/apis/{backend} (e.g., mmdeploy/apis/ncnn).
Example:

# mmdeploy/apis/ncnn/__init__.py
from mmdeploy.backend.ncnn import is_available

__all__ = ['is_available']

if is_available():
    from mmdeploy.backend.ncnn.onnx2ncnn import (onnx2ncnn,
                                                 get_output_model_file)
    __all__ += ['onnx2ncnn', 'get_output_model_file']

Then create a backend manager class which derives from BaseBackendManager and implement its to_backend static
method.
Example:
@classmethod
def to_backend(cls,
               ir_files: Sequence[str],
               deploy_cfg: Any,
               work_dir: str,
               log_level: int = logging.INFO,
               device: str = 'cpu',
               **kwargs) -> Sequence[str]:
    return ir_files
6. Convert the models of OpenMMLab to backends (if necessary) and inference on backend engine. If you find
some incompatible operators when testing, you can try to rewrite the original model for the backend following
the rewriter tutorial or add custom operators.
7. Add docstring and unit tests for new code :).
Although the backend engines are usually implemented in C/C++, it is convenient for testing and debugging if the
backend provides a Python inference interface. We encourage contributors to support backend inference in the Python
interface of MMDeploy. In this section we will introduce the steps to support backend inference.
1. Add a file named wrapper.py to the corresponding folder in mmdeploy/backend/{backend}. For example,
mmdeploy/backend/tensorrt/wrapper.py. This module should implement and register a wrapper class
that inherits the base class BaseWrapper in mmdeploy/backend/base/base_wrapper.py.
Example:
@BACKEND_WRAPPER.register_module(Backend.TENSORRT.value)
class TRTWrapper(BaseWrapper):
2. The wrapper class can initialize the engine in the __init__ function and run inference in the forward function. Note that
the __init__ function must take a parameter output_names and pass it to the base class to determine the order of the
output tensors. The input and output variables of forward should be dictionaries denoting the names and values
of the tensors.
3. For the convenience of performance testing, the class should define an “execute” function that only calls the
inference interface of the backend engine. The forward function should call the “execute” function after preprocessing
the data.
Example:
Example:
@BACKEND_WRAPPER.register_module(Backend.ONNXRUNTIME.value)
class ORTWrapper(BaseWrapper):

    def __init__(self,
                 onnx_file: str,
                 device: str,
                 output_names: Optional[Sequence[str]] = None):
        # Initialization
        # ...
        super().__init__(output_names)

    def forward(self, inputs):
        # inputs/outputs are dicts mapping tensor names to values
        # Preprocess data
        # ...
        self.__ort_execute(self.io_binding)
        # Postprocess data
        # ...

    @TimeCounter.count_time('onnxruntime')
    def __ort_execute(self, io_binding):
        # Only do the inference
        self.sess.run_with_iobinding(io_binding)
4. Create a backend manager class which derives from BaseBackendManager and implement its build_wrapper static
method.
Example:
@BACKEND_MANAGERS.register('onnxruntime')
class ONNXRuntimeManager(BaseBackendManager):

    @classmethod
    def build_wrapper(cls,
                      backend_files: Sequence[str],
                      device: str = 'cpu',
                      input_names: Optional[Sequence[str]] = None,
                      output_names: Optional[Sequence[str]] = None,
                      deploy_cfg: Optional[Any] = None,
                      **kwargs):
        from .wrapper import ORTWrapper
        return ORTWrapper(
            onnx_file=backend_files[0],
            device=device,
            output_names=output_names)
The previous parts show how to add a new backend in MMDeploy, which requires changing its source code. However,
if we treat MMDeploy as a third party, the methods above are no longer efficient. To this end, adding a new backend
requires us to pre-install a package named aenum, which can be installed with pip install aenum.
After installing aenum successfully, we can use it to add a new backend through:
from aenum import extend_enum

from mmdeploy.utils import Backend

try:
    Backend.get('backend_name')
except Exception:
    extend_enum(Backend, 'BACKEND', 'backend_name')
We can run the codes above before we use the rewrite logic of MMDeploy.
This tutorial introduces how to add unit tests for backend ops. When you add a custom op under backend_ops, you
need to add the corresponding test unit. Test units of ops are included in tests/test_ops/test_ops.py.
42.1 Prerequisite
• Compile new ops: after adding a new custom op, you need to recompile the relevant backend; refer to build.md.
You can put unit tests for ops in tests/test_ops/. Usually, the following program template can be used for your
custom op.

def test_roi_align(backend,
                   pool_h,  # set parameters of op
                   pool_w,
                   spatial_scale,
                   sampling_ratio,
                   input_list=None,
                   save_dir=None):
    backend.check_env()

    if input_list is None:
        input = torch.rand(1, 1, 16, 16, dtype=torch.float32)  # 1.3 op input data initialization
    # ...
    wrapped_model = WrapFunction(wrapped_function).eval()
We provide some helper classes for different backends, such as TestOnnxRTExporter,
TestTensorRTExporter and TestNCNNExporter.
Set some parameters of the op, such as pool_h, pool_w, spatial_scale and sampling_ratio in roi_align. You can set
multiple parameter combinations to test the op.
Call the backend test class run_and_validate to run and verify the result output by the op on the backend.
def run_and_validate(self,
                     model,
                     input_list,
                     model_name='tmp',
                     tolerate_small_mismatch=False,
                     do_constant_folding=True,
                     dynamic_axes=None,
                     output_names=None,
                     input_names=None,
                     expected_result=None,
                     save_dir=None):
Parameter Description
• model: Input model to be tested and it can be torch model or any other backend model.
• input_list: List of test data, which is mapped to the order of input_names.
• model_name: The name of the model.
• tolerate_small_mismatch: Whether to allow small errors in the verification of results.
• do_constant_folding: Whether to use constant folding to optimize the model.
• dynamic_axes: If you need to use dynamic dimensions, enter the dimension information.
• output_names: The names of the output nodes.
• input_names: The names of the input nodes.
• expected_result: Expected ground truth values for verification.
• save_dir: The folder used to save the output files.
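Putting the pieces together, a test body typically ends with a call like the sketch below. The wrapped model and tensors follow the template earlier in this chapter; the concrete names and values (e.g. rois, roi_feat) are illustrative:

# assuming `backend` is e.g. TestOnnxRTExporter() and `wrapped_model`,
# `input`, `rois` were prepared as in the template above
backend.run_and_validate(
    wrapped_model, [input, rois],
    'roi_align',
    input_names=['input', 'rois'],
    output_names=['roi_feat'],
    save_dir=save_dir)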
Run the test of a specific op with pytest:

pytest tests/test_ops/test_ops.py::test_XXXX
After you create a rewritten model using our rewriter, it’s better to write a unit test for the model to validate if the
model rewrite would come into effect. Generally, we need to get outputs of the original model and rewritten model,
then compare them. The outputs of the original model can be acquired directly by calling the forward function of the
model, whereas the way to generate the outputs of the rewritten model depends on the complexity of the rewritten
model.
If the changes to the model are small (e.g., only changing the behavior of one or two variables without introducing
side effects), you can construct the input arguments for the rewritten functions/modules, run the model's inference in
RewriterContext, and check the results.
# mmpretrain.models.classifiers.base.py
class BaseClassifier(BaseModule, metaclass=ABCMeta):

    def forward(self, img, return_loss=True, **kwargs):
        if return_loss:
            return self.forward_train(img, **kwargs)
        else:
            return self.forward_test(img, **kwargs)
In the example, we only change the function that forward calls. We can test this rewritten function by writing the
following test function:
def test_baseclassfier_forward():
    input = torch.rand(1)
    from mmpretrain.models.classifiers import BaseClassifier

    class DummyClassifier(BaseClassifier):
        # stub out forward_train / forward_test so the two code paths
        # return distinguishable values
        ...

    model = DummyClassifier().eval()

    model_output = model(input)
    with RewriterContext(cfg=dict()), torch.no_grad():
        backend_output = model(input)
In this test function, we construct a derived class of BaseClassifier to test whether the rewritten model works in
the rewrite context. We get the outputs of the original model by directly calling model(input) and the outputs of the
rewritten model by calling model(input) inside RewriterContext. Finally, we can check the outputs by asserting their
values.
In the first example, the output is generated in Python. Sometimes we make big changes to the original model functions
(e.g., eliminating branch statements to generate a correct computation graph). Even if the outputs of a rewritten model
running in Python are correct, we cannot be sure that the rewritten model works as expected in the backend. Therefore,
we also need to test the rewritten model in the backend.
    # (excerpt of a rewritten segmentor forward)
    deploy_cfg = ctx.cfg
    is_dynamic_flag = is_dynamic_shape(deploy_cfg)
    img_shape = img.shape[2:]
    if not is_dynamic_flag:
        img_shape = [int(val) for val in img_shape]
    img_metas['img_shape'] = img_shape
    return self.simple_test(img, img_metas, **kwargs)
def test_basesegmentor_forward():
    from mmdeploy.utils.test import (WrapModel, get_model_outputs,
                                     get_rewrite_outputs)

    segmentor = get_model()
    segmentor.cpu().eval()

    # Prepare data
    # ...
We provide some utilities to test rewritten functions. First, you can construct a model and call get_model_outputs
to get the outputs of the original model. Then you can wrap the rewritten function with WrapModel, which serves as a
partial function, and get the results with get_rewrite_outputs. get_rewrite_outputs returns two values: the outputs and a
flag indicating whether the outputs come from the backend. Because we cannot assume that everyone has the backend
installed, we should check whether the results were generated by Python or by the backend engine; the unit test must
cover both conditions. Finally, we compare the original and rewritten outputs, which may be done simply by calling
torch.allclose.
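A minimal sketch of this pattern is shown below. The helper signatures follow mmdeploy.utils.test as described above; the deploy_cfg contents and data preparation (img, img_metas) are illustrative:

model_inputs = {'img': img, 'img_metas': img_metas}

# outputs of the original model
model_outputs = get_model_outputs(segmentor, 'forward', model_inputs)

# outputs of the rewritten model, possibly executed by the backend
wrapped_model = WrapModel(segmentor, 'forward', img_metas=img_metas, return_loss=False)
rewrite_outputs, is_backend_output = get_rewrite_outputs(
    wrapped_model=wrapped_model,
    model_inputs={'img': img},
    deploy_cfg=deploy_cfg)

if is_backend_output:
    # the outputs came from the backend engine; compare them with the originals
    ...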
43.3 Note
To learn the complete usage of the test utilities, please refer to our apis document.
MMDeploy supports exporting PyTorch models to partitioned ONNX models. With this feature, users can define their
partition policy and get partitioned ONNX models with ease. In this tutorial, we will briefly introduce how to support
partitioning a model step by step. In the example, we break the YOLOV3 model into two parts and extract the first part,
without the post-processing (such as anchor generation and NMS), into the ONNX model.
To support the model partition, we need to add Mark nodes in the ONNX model. This could be done with mmdeploy’s
@mark decorator. Note that to make the mark work, the marking operation should be included in a rewriting function.
At first, we would mark the model input, which could be done by marking the input tensor img in the forward
method of BaseDetector class, which is the parent class of all detector classes. Thus we name this marking
point as detector_forward and mark the inputs as input. Since there could be three outputs for detectors
such as Mask RCNN, the outputs are marked as dets, labels, and masks. The following code shows the idea of
adding mark functions and calling them in the rewrite. For the source code, you can refer to
mmdeploy/codebase/mmdet/models/detectors/single_stage.py.
@mark(
    'detector_forward', inputs=['input'], outputs=['dets', 'labels', 'masks'])
def __forward_impl(self, img, img_metas=None, **kwargs):
    ...


@FUNCTION_REWRITER.register_rewriter(
    'mmdet.models.detectors.base.BaseDetector.forward')
def base_detector__forward(self, img, img_metas=None, **kwargs):
    ...
    # call the mark function
    return __forward_impl(...)
Then, we have to mark the output feature of YOLOV3Head, which is the input argument pred_maps in the
get_bboxes method of the YOLOV3Head class. We can add an internal function to mark only the pred_maps inside the
yolov3_head__get_bboxes function, as follows.
@FUNCTION_REWRITER.register_rewriter(
    func_name='mmdet.models.dense_heads.YOLOV3Head.get_bboxes')
def yolov3_head__get_bboxes(self, pred_maps, *args, **kwargs):
    # an internal function decorated with @mark('yolo_head', inputs=['pred_maps'])
    # is applied to `pred_maps` here
    ...
Note that pred_maps is a list of Tensor with three elements. Thus, three Mark nodes with op names
pred_maps.0, pred_maps.1 and pred_maps.2 would be added to the ONNX model.
After marking the necessary nodes that will be used to split the model, we can add a deployment config file
configs/mmdet/detection/yolov3_partition_onnxruntime_static.py. If you are not familiar with how to write a config,
you can check write_config.md.
In the config file, we need to add partition_config. The key part is partition_cfg, which contains a list of dicts
designating the start nodes and end nodes of each model segment. Since we only want to keep YOLOV3 without
post-processing, we can set start to ['detector_forward:input'] and end to ['yolo_head:input'].
Note that start and end can contain multiple marks.
_base_ = ['./detection_onnxruntime_static.py']
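and then extend it with a partition_config along the lines of the sketch below. The field values, such as the saved file name and the output names, are illustrative; the start and end marks follow the explanation above:

partition_config = dict(
    type='yolov3_partition',          # the name of the partition policy
    apply_marks=True,                 # use the mark nodes to split the model
    partition_cfg=[
        dict(
            save_file='yolov3.onnx',  # filename of the partitioned onnx
            start=['detector_forward:input'],  # [mark_name:input/output, ...]
            end=['yolo_head:input'],
            output_names=[f'pred_maps.{i}' for i in range(3)])
    ])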
Once we have the marks of the nodes and the deployment config with partition_config set properly, we can use the
tool torch2onnx to export the model to ONNX and get the partitioned ONNX files.

python tools/torch2onnx.py \
    configs/mmdet/detection/yolov3_partition_onnxruntime_static.py \
    ../mmdetection/configs/yolo/yolov3_d53_8xb8-ms-608-273e_coco.py \
    https://download.openmmlab.com/mmdetection/v2.0/yolo/yolov3_d53_mstrain-608_273e_coco/yolov3_d53_mstrain-608_273e_coco_20210518_115020-a2c3acb8.pth \
    ../mmdetection/demo/demo.jpg \
    --work-dir ./work-dirs/mmdet/yolov3/ort/partition
After running the script above, we get the partitioned ONNX file yolov3.onnx in the work directory. You can use the
visualization tool netron to check the model structure.
With the partitioned ONNX file, you can refer to useful_tools.md for the following procedures, such as
mmdeploy_onnx2ncnn and onnx2tensorrt.
This tutorial describes how to do regression testing. The deployment configuration file contains the codebase config and the
inference config.
45.2 2. Usage
python ./tools/regression_test.py \
--codebase "${CODEBASE_NAME}" \
--backends "${BACKEND}" \
[--models "${MODELS}"] \
--work-dir "${WORK_DIR}" \
--device "${DEVICE}" \
--log-level INFO \
[--performance -p] \
[--checkpoint-dir "$CHECKPOINT_DIR"]
45.2.1 Description
• --codebase : The codebase to test, e.g. mmdet. If you want to test multiple codebases, use mmpretrain mmdet
...
• --backends : The backends to test. By default, all backends are tested. You can use onnxruntime
tensorrt to choose several backends. If you also need to test the SDK, you need to configure the sdk_config
in tests/regression/${codebase}.yml.
• --models : Specify the models to be tested. All models in the yml are tested by default. You can also give some
model names. For the model names, please refer to the relevant yml configuration file, for example ResNet
SE-ResNet "Mask R-CNN". Model names can only contain numbers and letters.
45.2.2 Notes
45.3 Example
1. Test all backends of mmdet and mmpose for model convert and precision
python ./tools/regression_test.py \
--codebase mmdet mmpose \
--work-dir "../mmdeploy_regression_working_dir" \
--device "cuda" \
--log-level INFO \
--performance
2. Test model convert and precision of some backends of mmdet and mmpose
python ./tools/regression_test.py \
--codebase mmdet mmpose \
--backends onnxruntime tensorrt \
--work-dir "../mmdeploy_regression_working_dir" \
--device "cuda" \
--log-level INFO \
-p
3. Test some backends of mmdet and mmpose, only test model convert
python ./tools/regression_test.py \
--codebase mmdet mmpose \
--backends onnxruntime tensorrt \
--work-dir "../mmdeploy_regression_working_dir" \
--device "cuda" \
--log-level INFO
4. Test some models of mmdet and mmpretrain, only test model convert
python ./tools/regression_test.py \
--codebase mmdet mmpose \
--models ResNet SE-ResNet "Mask R-CNN" \
--work-dir "../mmdeploy_regression_working_dir" \
--device "cuda" \
--log-level INFO
globals:
  codebase_dir: ../mmocr # codebase path to test
  checkpoint_force_download: False # whether to redownload the model even if it already exists
  images:
    img_densetext_det: &img_densetext_det ../mmocr/demo/demo_densetext_det.jpg
    img_demo_text_det: &img_demo_text_det ../mmocr/demo/demo_text_det.jpg
    img_demo_text_ocr: &img_demo_text_ocr ../mmocr/demo/demo_text_ocr.jpg
    img_demo_text_recog: &img_demo_text_recog ../mmocr/demo/demo_text_recog.jpg
  metric_info: &metric_info
    hmean-iou: # metafile.Results.Metrics
      eval_name: hmean-iou # test.py --metrics args
      metric_key: 0_hmean-iou:hmean # the key name of the eval log
      tolerance: 0.1 # tolerated threshold interval
      task_name: Text Detection # the name of metafile.Results.Task
      dataset: ICDAR2015 # the name of metafile.Results.Dataset
    word_acc: # same as hmean-iou, also a kind of metric
      eval_name: acc
      metric_key: 0_word_acc_ignore_case
      tolerance: 0.2
      task_name: Text Recognition
      dataset: IIIT5K
  convert_image_det: &convert_image_det # the image used by detection model conversion
    input_img: *img_densetext_det
    test_img: *img_demo_text_det
  convert_image_rec: &convert_image_rec
    input_img: *img_demo_text_recog
    test_img: *img_demo_text_recog
  backend_test: &default_backend_test True # whether to test model precision for the backend
  sdk: # SDK config
    sdk_detection_dynamic: &sdk_detection_dynamic configs/mmocr/text-detection/text-detection_sdk_dynamic.py

onnxruntime:
  pipeline_ort_recognition_static_fp32: &pipeline_ort_recognition_static_fp32
    convert_image: *convert_image_rec # the image used by model conversion
    deploy_config: configs/mmocr/text-recognition/text-recognition_onnxruntime_static.py # the deploy cfg path to use, based on the mmdeploy path

  pipeline_ort_recognition_dynamic_fp32: &pipeline_ort_recognition_dynamic_fp32
    convert_image: *convert_image_rec
    backend_test: *default_backend_test
    sdk_config: *sdk_recognition_dynamic
    deploy_config: configs/mmocr/text-recognition/text-recognition_onnxruntime_dynamic.py

  pipeline_ort_detection_dynamic_fp32: &pipeline_ort_detection_dynamic_fp32
    convert_image: *convert_image_det
    deploy_config: configs/mmocr/text-detection/text-detection_onnxruntime_dynamic.py

tensorrt:
  pipeline_trt_recognition_dynamic_fp16: &pipeline_trt_recognition_dynamic_fp16
    convert_image: *convert_image_rec
    backend_test: *default_backend_test
    sdk_config: *sdk_recognition_dynamic
    deploy_config: configs/mmocr/text-recognition/text-recognition_tensorrt-fp16_dynamic-1x32x32-1x32x640.py

  pipeline_trt_detection_dynamic_fp16: &pipeline_trt_detection_dynamic_fp16
    convert_image: *convert_image_det
    backend_test: *default_backend_test
    sdk_config: *sdk_detection_dynamic
    deploy_config: configs/mmocr/text-detection/text-detection_tensorrt-fp16_dynamic-320x320-2240x2240.py

openvino:
  # same as the onnxruntime backend configuration
ncnn:
  # same as the onnxruntime backend configuration
pplnn:
  # same as the onnxruntime backend configuration
torchscript:
  # same as the onnxruntime backend configuration

models:
  - name: crnn # model name
    metafile: configs/textrecog/crnn/metafile.yml # the path of the model metafile, based on the codebase path
FORTYSIX
46.1 Installation
cmake \
-DTorch_DIR=${Torch_DIR} \
-DMMDEPLOY_TARGET_BACKENDS="${your_backend};torchscript" \
.. # You can also add other build flags if you need
46.2 Usage
mmdeploy provides a prebuilt package; if you want to compile it yourself, or need to modify the .proto file, you
can refer to this document.
Note that the official gRPC documentation does not have complete support for the NDK.
47.1 1. Environment
$ cmake \
-DCMAKE_BUILD_TYPE=Release \
-DgRPC_INSTALL=ON \
-DgRPC_BUILD_TESTS=OFF \
-DgRPC_SSL_PROVIDER=package \
../..
# Install to host
$ make -j
$ sudo make install
2. Download the NDK and cross-compile the static libraries with android aarch64 format
$ wget https://dl.google.com/android/repository/android-ndk-r17c-linux-x86_64.zip
$ unzip android-ndk-r17c-linux-x86_64.zip
$ export ANDROID_NDK=/path/to/android-ndk-r17c
$ cd /path/to/grpc
$ mkdir -p cmake/build_aarch64 && pushd cmake/build_aarch64
$ cmake ../.. \
    ...
$ make -j
$ make install
$ cd /tmp/android_grpc_install
$ tree -L 1
.
├── bin
├── include
├── lib
└── share
$ cd /path/to/grpc/examples/cpp/helloworld/
$ mkdir cmake/build_aarch64 -p && pushd cmake/build_aarch64
$ cmake ../.. \
-DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK}/build/cmake/android.toolchain.cmake \
-DANDROID_ABI=arm64-v8a \
-DANDROID_PLATFORM=android-26 \
-DANDROID_STL=c++_shared \
-DANDROID_TOOLCHAIN=clang \
-DCMAKE_BUILD_TYPE=Release \
-Dabsl_DIR=/tmp/android_grpc_install_shared/lib/cmake/absl \
-DProtobuf_DIR=/tmp/android_grpc_install_shared/lib/cmake/protobuf \
-DgRPC_DIR=/tmp/android_grpc_install_shared/lib/cmake/grpc
$ make -j
$ ls greeter*
greeter_async_client greeter_async_server greeter_callback_server greeter_server
greeter_async_client2 greeter_callback_client greeter_client
/data/local/tmp $ ./greeter_client
Greeter received: Hello world
1. Open the snpe tools website and download version 1.59. Unzip it and set the environment variables.
Note that snpe >= 1.60 starts using clang-8.0, which may cause incompatibility with libc++_shared.so on older devices.
$ export SNPE_ROOT=/path/to/snpe-1.59.0.3230
2. Open the snpe server directory within mmdeploy and build it, using the same options as when cross-compiling gRPC.

$ cd /path/to/mmdeploy
$ cd service/snpe/server
$ make -j
$ file inference_server
inference_server: ELF 64-bit LSB shared object, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /system/bin/linker64, ...

Finally, you get inference_server; adb push it to the device and execute it.
If you have changed inference.proto, you need to regenerate the .cpp and .py interfaces:

$ ln -s `which protoc-gen-grpc`
$ protoc --cpp_out=./ --grpc_out=./ --plugin=protoc-gen-grpc=grpc_cpp_plugin inference.proto
47.6 Reference
48.1 TensorRT
• “WARNING: Half2 support requested on hardware without native FP16 support, performance will be negatively
affected.”
Fp16 mode requires a device with full-rate fp16 support.
• “error: parameter check failed at: engine.cpp::setBindingDimensions::1046, condition: profileMinDims.d[i] <=
dimensions.d[i]”
When building an ICudaEngine from an INetworkDefinition that has dynamically resizable inputs, users
need to specify at least one optimization profile, which can be set in the deploy config:
backend_config = dict(
    common_config=dict(max_workspace_size=1 << 30),
    model_inputs=[
        dict(
            input_shapes=dict(
                input=dict(
                    min_shape=[1, 3, 320, 320],
                    opt_shape=[1, 3, 800, 1344],
                    max_shape=[1, 3, 1344, 1344])))
    ])
The input tensor shape should be limited between min_shape and max_shape.
• “error: [TensorRT] INTERNAL ERROR: Assertion failed: cublasStatus == CUBLAS_STATUS_SUCCESS”
TRT 7.2.1 switches to cuBLASLt (previously cuBLAS). cuBLASLt is the default choice for SM
version >= 7.0. You may need CUDA-10.2 Patch 1 (released Aug 26, 2020) to resolve some cuBLASLt issues.
Another option is to use the new TacticSource API and disable the cuBLASLt tactics if you don't want to upgrade.
48.2 Libtorch
48.3 Windows
• Error similar to: OSError: [WinError 1455] The paging file is too small for this operation to complete.
Error loading "C:\Users\cx\miniconda3\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll" or one of its dependencies
Solution: according to this post, the issue may be caused by NVIDIA and will be fixed in CUDA release 11.7. For now,
one can use the fixNvPe.py script to modify the NVIDIA DLLs in the PyTorch lib directory.

python fixNvPe.py --input=C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\lib\*.dll
You can find your pytorch installation path with:
import torch
print(torch.__file__)
• enable_language(CUDA) error
C:/Software/cmake/cmake-3.23.1-windows-x86_64/share/cmake-3.23/Modules/CMakeDetermineCompilerId.cmake:59 (__determine_compiler_id_test)
C:/Software/cmake/cmake-3.23.1-windows-x86_64/share/cmake-3.23/Modules/CMakeDetermineCUDACompiler.cmake:339 (CMAKE_DETERMINE_COMPILER_ID)
C:/workspace/mmdeploy-0.6.0-windows-amd64-cuda11.1-tensorrt8.2.3.0/sdk/lib/cmake/MMDeploy/MMDeployConfig.cmake:27 (enable_language)
CMakeLists.txt:5 (find_package)
Cause: CUDA Toolkit 11.1 was installed before Visual Studio, so the VS plugin was not installed; or the version
of VS is too new, so the installation of the VS plugin was skipped during the installation of the CUDA Toolkit.
Solution: this problem can be solved by manually copying the four files in C:\Program Files\NVIDIA
GPU Computing Toolkit\CUDA\v11.1\extras\visual_studio_integration\MSBuildExtensions
to C:\Software\Microsoft Visual Studio\2022\Community\Msbuild\Microsoft\VC\v170\BuildCustomizations.
The specific path should be changed according to the actual situation.
• Under Windows, visualizing the model inference result fails with an error.
Cause: on recent Windows systems, there are two onnxruntime.dll files under the system path, and they are
loaded first, causing conflicts.

C:\Windows\SysWOW64\onnxruntime.dll
C:\Windows\System32\onnxruntime.dll
48.5 Pip
$ which pip
# /path/to/.local/bin/pip
/path/to/miniconda3/lib/python3.9/site-packages/pip
APIS
The partitioned model is defined exactly by the names of its input and output tensors.
Examples
Parameters
• model (str | onnx.ModelProto) – Input ONNX model to be extracted.
• start_marker (str | Sequence[str]) – Start marker(s) to extract.
• end_marker (str | Sequence[str]) – End marker(s) to extract.
• start_name_map (Dict[str, str]) – A mapping of start names, defaults to None.
• end_name_map (Dict[str, str]) – A mapping of end names, defaults to None.
• dynamic_axes (Dict[str, Dict[int, str]]) – A dictionary to specify dynamic axes
of input/output, defaults to None.
• save_file (str) – A file to save the extracted model, defaults to None.
Returns The extracted model.
Return type onnx.ModelProto
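Assuming this entry documents mmdeploy.apis.extract_model (as the parameter list and the module reference suggest), a call might look like the sketch below; the file names and marker names are illustrative:

from mmdeploy.apis import extract_model

# extract the sub-graph between two marks and save it
extracted = extract_model(
    'end2end.onnx',
    start_marker='detector_forward:input',
    end_marker='yolo_head:input',
    save_file='partition.onnx')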
Notes
Examples
Parameters
• model_cfg (str | mmengine.Config) – Model config file or Config object.
• deploy_cfg (str | mmengine.Config) – Deployment config file or Config object.
• backend_files (Sequence[str]) – Input backend model file(s).
• img (str | np.ndarray) – Input image file or numpy array for inference.
• device (str) – A string specifying device type.
Returns The inference results
Return type Any
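Assuming this entry documents mmdeploy.apis.inference_model, a hedged usage sketch follows; the config and file paths are illustrative:

from mmdeploy.apis import inference_model

result = inference_model(
    model_cfg='mmpretrain/configs/resnet/resnet18_8xb32_in1k.py',
    deploy_cfg='configs/mmpretrain/classification_onnxruntime_static.py',
    backend_files=['work_dir/end2end.onnx'],
    img='demo.jpg',
    device='cpu')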
Examples
Parameters
• img (str | np.ndarray | torch.Tensor) – Input image used to assist converting
model.
• work_dir (str) – A working directory to save files.
• save_file (str) – Filename to save onnx model.
• deploy_cfg (str | mmengine.Config) – Deployment config file or Config object.
• model_cfg (str | mmengine.Config) – Model config file or Config object.
• model_checkpoint (str) – A checkpoint path of PyTorch model, defaults to None.
• device (str) – A string specifying device type, defaults to ‘cuda:0’.
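Assuming this entry documents mmdeploy.apis.torch2onnx, the parameters above might be used as follows; the paths and checkpoint name are illustrative:

from mmdeploy.apis import torch2onnx

torch2onnx(
    img='demo.jpg',
    work_dir='work_dir',
    save_file='end2end.onnx',
    deploy_cfg='configs/mmpretrain/classification_onnxruntime_static.py',
    model_cfg='mmpretrain/configs/resnet/resnet18_8xb32_in1k.py',
    model_checkpoint='resnet18.pth',
    device='cpu')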
Examples
Parameters
• model_cfg (str | mmengine.Config) – Model config file or Config object.
• deploy_cfg (str | mmengine.Config) – Deployment config file or Config object.
• model (str | Sequence[str]) – Input model or file(s).
• img (str | np.ndarray | Sequence[str]) – Input image file or numpy array for inference.
• device (str) – A string specifying device type.
• backend (Backend) – Specifying backend type, defaults to None.
• output_file (str) – Output file to save visualized image, defaults to None. Only valid if
show_result is set to False.
• show_result (bool) – Whether to show plotted image in windows, defaults to False.
APIS/TENSORRT
Example
Examples
Parameters
• work_dir (str) – A working directory.
• save_file (str) – The base name of the file to save TensorRT engine. E.g. end2end.engine.
• model_id (int) – Index of input model.
• deploy_cfg (str | mmengine.Config) – Deployment config.
• onnx_model (str | onnx.ModelProto) – input onnx model.
• device (str) – A string specifying cuda device, defaults to ‘cuda:0’.
• partition_type (str) – Specifying partition type of a model, defaults to ‘end2end’.
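Assuming these parameters document mmdeploy.apis.tensorrt.onnx2tensorrt, a hedged sketch follows; the deploy config path and file names are illustrative:

from mmdeploy.apis.tensorrt import onnx2tensorrt

onnx2tensorrt(
    work_dir='work_dir',
    save_file='end2end.engine',
    model_id=0,
    deploy_cfg='configs/mmpretrain/classification_tensorrt_static-224x224.py',
    onnx_model='work_dir/end2end.onnx',
    device='cuda:0')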
APIS/ONNXRUNTIME
APIS/NCNN
Example
Parameters
• onnx_path (ModelProto|str) – The path of the onnx model.
• output_file_prefix (str) – The path to save the output ncnn file.
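Assuming this entry documents mmdeploy.apis.ncnn.from_onnx, a hedged sketch follows; the output prefix is illustrative and the call is positional to match the parameter order above:

from mmdeploy.apis.ncnn import from_onnx

# produces the ncnn files (e.g. .param / .bin) next to the given prefix
from_onnx('work_dir/end2end.onnx', 'work_dir/end2end')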
APIS/PPLNN
Python Module Index

mmdeploy.apis, 215
mmdeploy.apis.ncnn, 227
mmdeploy.apis.onnxruntime, 225
mmdeploy.apis.pplnn, 229
mmdeploy.apis.tensorrt, 221