Description
Issue type
Performance
Have you reproduced the bug with TensorFlow Nightly?
No
Source
source
TensorFlow version
2.12
Custom code
No
OS platform and distribution
No response
Mobile device
No response
Python version
No response
Bazel version
No response
GCC/compiler version
No response
CUDA/cuDNN version
No response
GPU model and memory
No response
Current behavior?
I have a problem with a customized QAT model. I am using TF 2.12.
Step 1: First, I have an fp32 model and I create a QAT model from it:
quant_aware_model = tfmot.quantization.keras.quantize_model(base_model)
After that I convert quant_aware_model to a TFLite model and inspect the TFLite model with Netron. I can see that BatchNorm is fused into the Conv layer.
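For reference, step 1 roughly looks like the sketch below (base_model stands for my original fp32 Keras model; the converter options shown are just an example, not necessarily the exact ones I used):

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# base_model is the original fp32 Keras model (assumed defined elsewhere)
quant_aware_model = tfmot.quantization.keras.quantize_model(base_model)

# Convert the quantization-aware model to a TFLite flatbuffer
converter = tf.lite.TFLiteConverter.from_keras_model(quant_aware_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("quant_aware_model.tflite", "wb") as f:
    f.write(tflite_model)
```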
Step 2: Then I added a new Multiply layer to the above fp32 model. Because this layer is not supported by QAT by default, I followed the approach described in https://www.tensorflow.org/model_optimization/guide/quantization/training_comprehensive_guide#quantize_some_layers and used tfmot.quantization.keras.quantize_apply (see the sketch below). After that I converted the new QAT model to a TFLite model and inspected it with Netron. I can see that BatchNorm is not fused into the Conv layer, so inference time for the TFLite model increases significantly.
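Roughly, step 2 looks like the following sketch based on the linked guide (the NoOpQuantizeConfig shown here is only an example for handling the unsupported Multiply layer, not necessarily my exact QuantizeConfig):

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

quantize_annotate_layer = tfmot.quantization.keras.quantize_annotate_layer
quantize_scope = tfmot.quantization.keras.quantize_scope


class NoOpQuantizeConfig(tfmot.quantization.keras.QuantizeConfig):
    # Example config that leaves weights and activations untouched;
    # the real config for Multiply may quantize its output instead.
    def get_weights_and_quantizers(self, layer):
        return []

    def get_activations_and_quantizers(self, layer):
        return []

    def set_quantize_weights(self, layer, quantize_weights):
        pass

    def set_quantize_activations(self, layer, quantize_activations):
        pass

    def get_output_quantizers(self, layer):
        return []

    def get_config(self):
        return {}


def apply_annotation(layer):
    # Annotate the unsupported Multiply layer with the custom config,
    # everything else with the default config.
    if isinstance(layer, tf.keras.layers.Multiply):
        return quantize_annotate_layer(layer, quantize_config=NoOpQuantizeConfig())
    return quantize_annotate_layer(layer)


annotated_model = tf.keras.models.clone_model(
    base_model, clone_function=apply_annotation)

with quantize_scope({'NoOpQuantizeConfig': NoOpQuantizeConfig}):
    quant_aware_model = tfmot.quantization.keras.quantize_apply(annotated_model)
```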
As I understand it, BatchNorm can be fused into the Conv layer only when using tfmot.quantization.keras.quantize_model. Is that right?
With step 2, what do I need to do so that BatchNorm is fused into the Conv layer and inference time is reduced?
Thank you.
Standalone code to reproduce the issue
No