Description
Issue type
Performance
Have you reproduced the bug with TensorFlow Nightly?
No
Source
source
TensorFlow version
2.12
Custom code
No
OS platform and distribution
No response
Mobile device
No response
Python version
No response
Bazel version
No response
GCC/compiler version
No response
CUDA/cuDNN version
No response
GPU model and memory
No response
Current behavior?
I have a problem with a customized QAT model. I am using TF 2.12.
Step 1: First, I have an fp32 model and I create a QAT model from it:
quant_aware_model = tfmot.quantization.keras.quantize_model(base_model)
After that I convert quant_aware_model to a TFLite model and inspect the TFLite model with Netron. I can see that BatchNorm is fused into the Conv layer.
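For reference, step 1 roughly looks like the sketch below (base_model stands for my original fp32 Keras model; the converter options shown are just an example, not necessarily the exact ones I used):

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# base_model is the original fp32 Keras model (assumed defined elsewhere)
quant_aware_model = tfmot.quantization.keras.quantize_model(base_model)

# Convert the quantization-aware model to a TFLite flatbuffer
converter = tf.lite.TFLiteConverter.from_keras_model(quant_aware_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("quant_aware_model.tflite", "wb") as f:
    f.write(tflite_model)
```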
Step 2: Then I added a new Multiply layer to the above fp32 model. Because this layer is not supported by QAT by default, I followed the approach described in https://www.tensorflow.org/model_optimization/guide/quantization/training_comprehensive_guide#quantize_some_layers and used tfmot.quantization.keras.quantize_apply (see the sketch below). After that I converted the new QAT model to a TFLite model and inspected it with Netron. I can see that BatchNorm is not fused into the Conv layer, so inference time for the TFLite model increases significantly.
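Roughly, step 2 looks like the following sketch based on the linked guide (the NoOpQuantizeConfig shown here is only an example for handling the unsupported Multiply layer, not necessarily my exact QuantizeConfig):

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

quantize_annotate_layer = tfmot.quantization.keras.quantize_annotate_layer
quantize_scope = tfmot.quantization.keras.quantize_scope


class NoOpQuantizeConfig(tfmot.quantization.keras.QuantizeConfig):
    # Example config that leaves weights and activations untouched;
    # the real config for Multiply may quantize its output instead.
    def get_weights_and_quantizers(self, layer):
        return []

    def get_activations_and_quantizers(self, layer):
        return []

    def set_quantize_weights(self, layer, quantize_weights):
        pass

    def set_quantize_activations(self, layer, quantize_activations):
        pass

    def get_output_quantizers(self, layer):
        return []

    def get_config(self):
        return {}


def apply_annotation(layer):
    # Annotate the unsupported Multiply layer with the custom config,
    # everything else with the default config.
    if isinstance(layer, tf.keras.layers.Multiply):
        return quantize_annotate_layer(layer, quantize_config=NoOpQuantizeConfig())
    return quantize_annotate_layer(layer)


annotated_model = tf.keras.models.clone_model(
    base_model, clone_function=apply_annotation)

with quantize_scope({'NoOpQuantizeConfig': NoOpQuantizeConfig}):
    quant_aware_model = tfmot.quantization.keras.quantize_apply(annotated_model)
```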
As I understand it, BatchNorm can be fused into the Conv layer only when using tfmot.quantization.keras.quantize_model. Is that right?
With step 2, what do I need to do so that BatchNorm is fused into the Conv layer and inference time is reduced?
Thank you.
Standalone code to reproduce the issue
No