[mobile] Mobile Perf Recipe #1031
Conversation
Deploy preview for pytorch-tutorials-preview ready! Built with commit 7b182fe: https://deploy-preview-1031--pytorch-tutorials-preview.netlify.app
recipes_source/mobile_perf.rst (Outdated)

    ::

       from torch.utils.mobile_optimizer import optimize_for_mobile
       traced_model = torch.jit.load("input_model_path")
Total nitpick: not all TorchScript models are traced (see torch.jit.script() and method decoration). I'd name this variable torchscript_model or similar for clarity and accuracy.
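A minimal sketch of the suggested rename, assuming the same torch.jit.load / optimize_for_mobile flow as the snippet above ("input_model_path" is the recipe's placeholder):

    import torch
    from torch.utils.mobile_optimizer import optimize_for_mobile

    # The loaded module may have been produced by either tracing or scripting,
    # so a neutral name avoids implying one or the other.
    torchscript_model = torch.jit.load("input_model_path")
    optimized_model = optimize_for_mobile(torchscript_model)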
recipes_source/mobile_perf.rst (Outdated)

    from torch.utils.mobile_optimizer import optimize_for_mobile
    traced_model = torch.jit.load("input_model_path")
    optimized_model = optimize_for_mobile(traced_model)
There are valid TorchScript models (googlenet & inception_v3 in torchvision) that segfault on this line. Other than that, the instructions work and there's a modest (<=2.2%) improvement in file size for these models:
mobilenet_v2
resnet18
alexnet
squeezenet1_0
vgg16
densenet161
shufflenet_v2_x1_0
mnasnet1_0
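For reference, a rough sketch of how such a file-size comparison could be run; the model list is from above, but the measurement code itself is an assumption rather than what was actually run:

    import os
    import torch
    import torchvision
    from torch.utils.mobile_optimizer import optimize_for_mobile

    names = ["mobilenet_v2", "resnet18", "alexnet", "squeezenet1_0",
             "vgg16", "densenet161", "shufflenet_v2_x1_0", "mnasnet1_0"]

    for name in names:
        model = getattr(torchvision.models, name)(pretrained=True).eval()
        scripted = torch.jit.script(model)  # assumes these models script cleanly
        scripted.save(f"{name}.pt")
        optimize_for_mobile(scripted).save(f"{name}_opt.pt")
        before = os.path.getsize(f"{name}.pt")
        after = os.path.getsize(f"{name}_opt.pt")
        print(f"{name}: {100 * (before - after) / before:.1f}% smaller")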
recipes_source/mobile_perf.rst (Outdated)

    2. Fuse operators using ``torch.quantization.fuse_modules``

    Do not be confused that fuse_modules is in the quantization package.
    It works for all types of torch script modules.
Two things:
- PMM is going to come back and say that we should write it as "TorchScript".
- The code below does not pass a TorchScript module to fuse_modules(); the MobileNet v2 from TorchVision is not a subclass of ScriptModule. Passing in the original torchvision module itself, or a version of it processed by torch.jit.script(), works, with the latter giving a ~2% file size improvement.
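A minimal sketch of the eager-module path, using a toy ConvBnReLU model of my own rather than the recipe's MobileNet v2:

    import torch
    import torch.nn as nn

    class ConvBnReLU(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv = nn.Conv2d(3, 16, 3)
            self.bn = nn.BatchNorm2d(16)
            self.relu = nn.ReLU()

        def forward(self, x):
            return self.relu(self.bn(self.conv(x)))

    m = ConvBnReLU().eval()  # eval() is required for conv+bn fusion
    # fuse_modules takes a plain nn.Module plus the names of the submodules to fuse
    fused = torch.quantization.fuse_modules(m, [["conv", "bn", "relu"]])
    torch.jit.script(fused).save("convbnrelu-fused.pt")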
recipes_source/mobile_perf.rst (Outdated)

    m = torchvision.models.mobilenet_v2(pretrained=True)
    m.eval()
    fuse_model(m)
    torch.jit.trace(m, torch.rand(1, 3, 224, 224)).save("mobilenetV2-bnfused.pt")
Should we be guiding people to torch.jit.trace()? torch.jit.script() preserves control flow, trace() does not. trace() is still there for cases where script() hits an unsupported op.
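A sketch of what the script()-based version of that save line could look like (the output file name here is my own):

    import torch
    import torchvision

    m = torchvision.models.mobilenet_v2(pretrained=True).eval()
    # script() preserves control flow; trace() records only the path taken
    # for the example input.
    torch.jit.script(m).save("mobilenetV2-scripted.pt")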
recipes_source/mobile_perf.rst
Outdated
model.eval() | ||
script_model = torch.jit.script(model) | ||
x = torch.rand(1, 3 , 224, 224) | ||
y = script_model(x) |
y is never used, which means we could do without x as well.
recipes_source/mobile_perf.rst (Outdated)

    supported_qengines = torch.backends.quantized.supported_engines
    print(supported_qengines)
    model = torchvision.models.quantization.__dict__['mobilenet_v2'](pretrained=True, quantize=True)
Two things:
- Is this the quantization workflow we want to show? This looks like it's just pulling down a pre-quantized version of the model, in which case this recipe is only useful if you want to quantize the torchvision models. In the general case, I'd think people would want to be able to quantize their own trained models for mobile deployment.
- On macOS, this line throws a warning:

      /Users/bradheintz/anaconda2/envs/pyto16pre/lib/python3.8/site-packages/torch/nn/quantized/modules/utils.py:8: UserWarning: 0quantize_tensor_per_tensor_affine current rounding mode is not set to round-to-nearest-ties-to-even (FE_TONEAREST). This will cause accuracy issues in quantized models. (Triggered internally at ../aten/src/ATen/native/quantized/affine_quantizer.cpp:25.)
        qweight = torch.quantize_per_tensor(
      /Users/bradheintz/anaconda2/envs/pyto16pre/lib/python3.8/site-packages/torch/quantization/observer.py:134: UserWarning: must run observer before calling calculate_qparams. Returning default scale and zero point
        warnings.warn(

  (And yes, it really looks like that in my terminal.) The warning came up in this env:

      # torch 1.6.0a0+55bcb5d built from master with USE_CUDA=0
      # torchvision 0.7.0a0+148bac2 built from master with USE_CUDA=0
      # python 3.8.0
      # MacOS 10.15.4
We should show a workflow where we start with a floating point model and then do the quantization. The steps are:

    # Start with a fully trained floating point model.
    # The model code is modified to enable eager mode quantization; for more details
    # please see the quantization tutorial at:
    # https://pytorch.org/tutorials/advanced/static_quantization_tutorial.html
    model = torchvision.models.quantization.__dict__['resnet18'](pretrained=True, quantize=False)
    torch.backends.quantized.engine = 'qnnpack'

    # Convert the float model with the appropriate qconfig
    model.eval()
    model.qconfig = torch.quantization.get_default_qconfig('qnnpack')
    model = torch.quantization.prepare(model)
    # Run the model with representative data for calibration
    # model(calibration_data)
    model = torch.quantization.convert(model)
    script_model = torch.jit.script(model)

    # Export to mobile
    script_model._save_for_lite_interpreter("model.bc")
recipes_source/mobile_perf.rst (Outdated)

    model = torchvision.models.quantization.__dict__['mobilenet_v2'](pretrained=True, quantize=True)
    torch.backends.quantized.engine='qnnpack'
    model.eval()
    script_model = torch.jit.script(model)
I don't actually know the answer to this: Is it preferable to do quantization before or after TorchScript conversion, or does it matter at all?
Currently, PyTorch only supports doing quantization prior to scripting. We are working on adding support for quantization after scripting, but it is not part of the 1.6 release yet.
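To make the supported ordering concrete, a small sketch (reusing the prequantized mobilenet_v2 from the snippet above): quantization happens while the model is still an eager nn.Module, and scripting comes last.

    import torch
    import torchvision

    # Quantize (or load an already-quantized) eager-mode model first...
    model = torchvision.models.quantization.mobilenet_v2(pretrained=True, quantize=True).eval()
    # ...then convert to TorchScript. The reverse order (quantizing an
    # already-scripted module) is not supported in 1.6.
    script_model = torch.jit.script(model)
    script_model.save("mobilenet_v2_quantized_scripted.pt")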
This looks great; I'm still tweaking my Android custom build env for the last couple of recipes.
recipes_source/mobile_perf.rst

    import torch
    from torch.utils.mobile_optimizer import optimize_for_mobile

    class AnnotatedConvBnReLUModel(torch.nn.Module):
Including a sample model here is a great idea - this will help users generalize to their own use case.
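For readers following along, a guess at the rough shape such a sample model might take (this is my sketch based on the class name, not the recipe's actual code):

    import torch

    class AnnotatedConvBnReLUModel(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.conv = torch.nn.Conv2d(3, 5, 3, bias=False)
            self.bn = torch.nn.BatchNorm2d(5)
            self.relu = torch.nn.ReLU(inplace=True)
            self.quant = torch.quantization.QuantStub()      # marks where tensors become quantized
            self.dequant = torch.quantization.DeQuantStub()  # ...and where they go back to float

        def forward(self, x):
            x = self.quant(x)
            x = self.relu(self.bn(self.conv(x)))
            return self.dequant(x)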
Added notes for one bug in the quantization step
recipes_source/mobile_perf.rst (Outdated)

    ::

       model.qconfig = torch.quantization.get_default_qconfig('qnnpack')
       torch.quantization.prepare(model)
prepare() has inplace=False by default, and the same goes for convert(), so this whole method is a no-op except for setting model.qconfig.

We either need to do:

    model = torch.quantization.prepare(model)
    # calibration
    return torch.quantization.convert(model)

or:

    torch.quantization.prepare(model, inplace=True)
    # calibration
    torch.quantization.convert(model, inplace=True)
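Either way, wrapped up as a helper it might look something like this (the function name and the calibration_data argument are placeholders of mine):

    import torch

    def quantize_model(model, calibration_data):
        model.eval()
        model.qconfig = torch.quantization.get_default_qconfig('qnnpack')
        model = torch.quantization.prepare(model)
        model(calibration_data)  # calibration pass with representative data
        return torch.quantization.convert(model)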