Revisiting the Transferability of Few-Shot Image Classification: A Frequency Spectrum Perspective
Abstract
1. Introduction
- From a frequency spectrum perspective, we clarify why performance declines under distribution shift in few-shot image classification and what information is transferred from base to novel classes.
- We adopt a causal perspective on few-shot image classification to demonstrate that non-causal frequencies harm transferability, and we introduce a straightforward yet efficient method, the FRSM, to weight frequencies.
- The experimental results indicate that the proposed FRSM method achieves superior performance compared with representative state-of-the-art methods in the few-shot image classification task.
2. Related Work
2.1. Few-Shot Image Classification
2.2. Frequency Spectrum Learning
3. Methodology
3.1. Few-Shot Image Classification
3.2. A Causal Graph of FSIC
- $F_c \leftarrow X \rightarrow F_{nc}$. The variable $F_c$ signifies the causal frequency that genuinely represents the inherent characteristics of the input data $X$, such as the object details in an image. Conversely, $F_{nc}$ indicates the non-causal frequency, typically resulting from biases in the data or superficial patterns, such as the background details of an image. Given that $F_c$ and $F_{nc}$ coexist within the input data $X$, these causal relationships are established.
- $F_c \rightarrow F \leftarrow F_{nc}$. The variable $F$ corresponds to the frequency of the input data $X$; that is, $F = \mathcal{F}(X)$, with $\mathcal{F}(\cdot)$ denoting the fast Fourier transform. To produce $F$, the traditional learning approach uses both the non-causal frequency $F_{nc}$ and the causal frequency $F_c$ to extract discriminative features.
- $F \rightarrow Y$. The primary aim of learning via the frequency spectrum is to ascertain the attributes of the input data $X$. The classifier determines the prediction $Y$ based on the frequency $F$; specifically, $Y = f(\mathcal{F}^{-1}(F))$, where $\mathcal{F}^{-1}(\cdot)$ stands for the inverse fast Fourier transform and $f$ is the classifier. A minimal sketch of these transforms follows this list.
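For concreteness, here is a minimal NumPy sketch of the two transforms used above, mapping an image to its frequency spectrum $F = \mathcal{F}(X)$ and back; the function names and the 84 × 84 input size are illustrative choices, not taken from the paper.

```python
import numpy as np

def to_frequency(x: np.ndarray) -> np.ndarray:
    """Map an image (C, H, W) to its centered 2D frequency spectrum F = FFT(X)."""
    return np.fft.fftshift(np.fft.fft2(x), axes=(-2, -1))

def to_image(f: np.ndarray) -> np.ndarray:
    """Map a (possibly re-weighted) spectrum back to image space via the inverse FFT."""
    return np.fft.ifft2(np.fft.ifftshift(f, axes=(-2, -1))).real

# Round trip X -> F -> X recovers the image up to floating-point error.
x = np.random.rand(3, 84, 84)   # an 84x84 RGB image, a common few-shot input size
f = to_frequency(x)
x_rec = to_image(f)
assert np.allclose(x, x_rec)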
3.3. Frequency Spectrum Mask
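The FRSM weights frequencies so that the causal components, which genuinely describe the object, receive large weights, while the non-causal ones are suppressed (see Sections 1 and 4.2). The following PyTorch sketch shows one plausible way to realize such a per-frequency mask; the module name, the sigmoid parameterization, and the shapes are assumptions for illustration, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class FrequencySpectrumMask(nn.Module):
    """Illustrative per-frequency weighting: learn a mask in (0, 1) over the 2D spectrum."""

    def __init__(self, height: int, width: int):
        super().__init__()
        # One learnable logit per spatial frequency, shared across channels.
        self.logits = nn.Parameter(torch.zeros(height, width))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) real-valued images.
        f = torch.fft.fft2(x)                    # complex spectrum F = FFT(X)
        mask = torch.sigmoid(self.logits)        # weights in (0, 1), high for kept frequencies
        f_weighted = f * mask                    # down-weight (ideally non-causal) frequencies
        return torch.fft.ifft2(f_weighted).real  # back to image space for the backbone

# Example: re-weight 84x84 inputs before feeding a few-shot backbone.
mask = FrequencySpectrumMask(84, 84)
images = torch.randn(4, 3, 84, 84)
out = mask(images)   # same shape as images, frequencies re-weighted
```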
4. Experiment
4.1. Experimental Set-Up
4.2. Experimental Results
- Q1. The transferability from base to novel classes. The experimental results in Figure 4 answer Q1. In Figure 4, we show the similarity of each testing dataset to the training dataset according to the amplitude ratio. The testing datasets' similarity to the training dataset was ordered as follows: miniImageNet-test > Cars > Plantae > EuroSAT > ISIC > ChestX. This similarity affects how well features learned on the training dataset transfer to the testing datasets. Overall, we found that low dataset similarity and high few-shot difficulty jointly led to performance degradation (a sketch of an amplitude-based similarity measure is given after this list).
- Q2. The information transferred from base to novel classes. We conducted experiments on the testing datasets, with the results shown in Figure 2. We found that the high-frequency components of the different datasets were more similar to one another than the low-frequency components were. Therefore, compared with the low-frequency components, the high-frequency components contributed more to transferability (see the frequency-splitting sketch after this list).
- Q3. The performance of the FRSM. To answer Q3, we conducted experiments on eight datasets. From Table 2, which reports the average classification accuracy, we draw the following findings. (1) Our proposed FRSM achieved the best performance in most settings. This is because the FRSM assigns a large weight to the causal frequency and a low weight to the non-causal one, thereby avoiding the influence of the confounder on transferability. (2) Surprisingly, the FRSM was weaker than the baseline on testing data relatively similar to the training data (e.g., tieredImageNet). A possible reason is that the confounder is generally simpler to learn: for example, learning image backgrounds is easier than learning foregrounds, and on similar data this can even help the transfer from base to novel classes. By suppressing the confounder, the FRSM loses this shortcut and its performance degrades. Nevertheless, testing datasets with larger distribution shifts deserve more attention, because they better reflect real-world environments.
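The similarity analysis in Q1 ranks the testing datasets by an amplitude ratio. The exact metric is not reproduced in this excerpt, so the sketch below uses one plausible instantiation as an assumption: compare the mean amplitude spectra of two datasets via an element-wise min/max ratio, where 1.0 means identical spectra. The synthetic datasets here are purely illustrative.

```python
import numpy as np

def mean_amplitude(images: np.ndarray) -> np.ndarray:
    """Mean amplitude spectrum over a batch of grayscale images (N, H, W)."""
    amp = np.abs(np.fft.fftshift(np.fft.fft2(images), axes=(-2, -1)))
    return amp.mean(axis=0)

def amplitude_ratio_similarity(train: np.ndarray, test: np.ndarray) -> float:
    """Hypothetical similarity: mean element-wise min/max ratio of the two
    mean amplitude spectra; values close to 1.0 indicate similar spectra."""
    a, b = mean_amplitude(train), mean_amplitude(test)
    ratio = np.minimum(a, b) / (np.maximum(a, b) + 1e-12)
    return float(ratio.mean())

# A nearly identical test set should score close to 1; a rescaled one
# has uniformly smaller amplitudes, so the ratio drops.
train = np.random.rand(64, 84, 84)
near = train + 0.01 * np.random.rand(64, 84, 84)
far = 0.2 * np.random.rand(64, 84, 84)
print(amplitude_ratio_similarity(train, near))  # close to 1.0
print(amplitude_ratio_similarity(train, far))   # roughly 0.2
```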
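The low-/high-frequency comparison in Q2 requires decomposing each image into the two kinds of components. Here is a minimal sketch under the assumption of a circular low-pass cutoff in the centered spectrum; the radius is an arbitrary illustrative choice, not the paper's threshold.

```python
import numpy as np

def split_frequencies(x: np.ndarray, radius: int = 10):
    """Split an image (H, W) into low- and high-frequency components using a
    circular mask of the given radius around the spectrum center."""
    h, w = x.shape
    f = np.fft.fftshift(np.fft.fft2(x))
    yy, xx = np.ogrid[:h, :w]
    low_mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    low = np.fft.ifft2(np.fft.ifftshift(f * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(f * ~low_mask)).real
    return low, high

# The two components sum back to the original image.
img = np.random.rand(84, 84)
low, high = split_frequencies(img)
assert np.allclose(low + high, img)
```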
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Wang, Y.; Yao, Q.; Kwok, J.T.; Ni, L.M. Generalizing from a few examples: A survey on few-shot learning. ACM Comput. Surv. 2020, 53, 1–34.
2. Thrun, S.; Pratt, L. Learning to Learn; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2012.
3. Chen, W.Y.; Liu, Y.C.; Kira, Z.; Wang, Y.C.F.; Huang, J.B. A closer look at few-shot classification. In Proceedings of the International Conference on Learning Representations, ICLR, New Orleans, LA, USA, 6–9 May 2019.
4. Chen, Y.; Liu, Z.; Xu, H.; Darrell, T.; Wang, X. Meta-baseline: Exploring simple meta-learning for few-shot learning. In Proceedings of the International Conference on Computer Vision, ICCV, Montreal, QC, Canada, 10–17 October 2021.
5. Tian, Y.; Wang, Y.; Krishnan, D.; Tenenbaum, J.B.; Isola, P. Rethinking few-shot image classification: A good embedding is all you need? In Proceedings of the European Conference on Computer Vision, ECCV, Glasgow, UK, 23–28 August 2020.
6. Zhuang, F.; Qi, Z.; Duan, K.; Xi, D.; Zhu, Y.; Zhu, H.; Xiong, H.; He, Q. A comprehensive survey on transfer learning. Proc. IEEE 2020, 109, 43–76.
7. Pan, S.J.; Yang, Q. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 2010, 22, 1345–1359.
8. Weiss, K.; Khoshgoftaar, T.M.; Wang, D. A survey of transfer learning. J. Big Data 2016, 3, 9.
9. Vinyals, O.; Blundell, C.; Lillicrap, T.; Kavukcuoglu, K.; Wierstra, D. Matching networks for one shot learning. In Proceedings of the Advances in Neural Information Processing Systems, NeurIPS, Barcelona, Spain, 5–10 December 2016.
10. Snell, J.; Swersky, K.; Zemel, R.S. Prototypical networks for few-shot learning. In Proceedings of the Advances in Neural Information Processing Systems, NeurIPS, Long Beach, CA, USA, 4–9 December 2017.
11. Finn, C.; Abbeel, P.; Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the International Conference on Machine Learning, ICML, Sydney, Australia, 6–11 August 2017.
12. Rusu, A.A.; Rao, D.; Sygnowski, J.; Vinyals, O.; Pascanu, R.; Osindero, S.; Hadsell, R. Meta-learning with latent embedding optimization. In Proceedings of the International Conference on Learning Representations, ICLR, New Orleans, LA, USA, 6–9 May 2019.
13. Raghu, A.; Raghu, M.; Bengio, S.; Vinyals, O. Rapid learning or feature reuse? Towards understanding the effectiveness of MAML. In Proceedings of the International Conference on Learning Representations, ICLR, Addis Ababa, Ethiopia, 30 April 2020.
14. Van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605.
15. Tseng, H.Y.; Lee, H.Y.; Huang, J.B.; Yang, M.-H. Cross-domain few-shot classification via learned feature-wise transformation. In Proceedings of the International Conference on Learning Representations, ICLR, Addis Ababa, Ethiopia, 30 April 2020.
16. Wang, H.; Deng, Z.H. Cross-domain few-shot classification via adversarial task augmentation. In Proceedings of the International Joint Conference on Artificial Intelligence, IJCAI, Montreal, QC, Canada, 19–27 August 2021.
17. Sun, J.; Lapuschkin, S.; Samek, W.; Zhao, Y.; Cheung, N.M.; Binder, A. Explanation-guided training for cross-domain few-shot classification. In Proceedings of the International Conference on Pattern Recognition, ICPR, Milan, Italy, 10–15 January 2021.
18. Oh, J.; Kim, S.; Ho, N.; Kim, J.H.; Song, H.; Yun, S.Y. Understanding cross-domain few-shot learning based on domain similarity and few-shot difficulty. In Proceedings of the Advances in Neural Information Processing Systems, NeurIPS, New Orleans, LA, USA, 28 November–9 December 2022.
19. Islam, A.; Chen, C.F.R.; Panda, R.; Karlinsky, L.; Feris, R.; Radke, R.J. Dynamic distillation network for cross-domain few-shot recognition with unlabeled data. In Proceedings of the Advances in Neural Information Processing Systems, NeurIPS, Virtual, 6–14 December 2021.
20. Long, Y.; Zhang, Q.; Zeng, B.; Gao, L.; Liu, X.; Zhang, J.; Song, J. Frequency domain model augmentation for adversarial attack. In Proceedings of the European Conference on Computer Vision, ECCV, Tel Aviv, Israel, 23–27 October 2022.
21. Xu, K.; Qin, M.; Sun, F.; Wang, Y.; Chen, Y.K.; Ren, F. Learning in the frequency domain. In Proceedings of the Computer Vision and Pattern Recognition, CVPR, Seattle, WA, USA, 13–19 June 2020.
22. Yue, Z.; Zhang, H.; Sun, Q.; Hua, X.S. Interventional few-shot learning. In Proceedings of the Advances in Neural Information Processing Systems, NeurIPS, Virtual, 6–12 December 2020.
23. Rajasegaran, J.; Khan, S.; Hayat, M.; Khan, F.S.; Shah, M. Self-supervised knowledge distillation for few-shot learning. In Proceedings of the British Machine Vision Conference, BMVC, Virtual, 22–25 November 2021.
24. Sung, F.; Yang, Y.; Zhang, L.; Xiang, T.; Torr, P.H.S.; Hospedales, T.M. Learning to compare: Relation network for few-shot learning. In Proceedings of the Computer Vision and Pattern Recognition, CVPR, Salt Lake City, UT, USA, 18–22 June 2018.
25. Geirhos, R.; Rubisch, P.; Michaelis, C.; Bethge, M.; Wichmann, F.A.; Brendel, W. ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. In Proceedings of the International Conference on Learning Representations, ICLR, New Orleans, LA, USA, 6–9 May 2019.
26. Xu, Q.; Zhang, R.; Zhang, Y.; Wang, Y.; Tian, Q. A Fourier-based framework for domain generalization. In Proceedings of the Computer Vision and Pattern Recognition, CVPR, Nashville, TN, USA, 20–25 June 2021.
27. Huang, J.; Guan, D.; Xiao, A.; Lu, S. FSDR: Frequency space domain randomization for domain generalization. In Proceedings of the Computer Vision and Pattern Recognition, CVPR, Nashville, TN, USA, 20–25 June 2021.
28. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 2015, 115, 211–252.
29. Guo, Y.; Codella, N.C.; Karlinsky, L.; Codella, J.V.; Smith, J.R.; Saenko, K.; Rosing, T.; Feris, R. A broader study of cross-domain few-shot learning. In Proceedings of the European Conference on Computer Vision, ECCV, Glasgow, UK, 23–28 August 2020.
30. Li, W.; Wang, Z.; Yang, X.; Dong, C.; Tian, P.; Qin, T.; Huo, J.; Shi, Y.; Wang, L.; Gao, Y.; et al. LibFewShot: A comprehensive library for few-shot learning. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 14938–14955.
31. Zhong, X.; Gu, C.; Ye, M.; Huang, W.; Lin, C.W. Graph complemented latent representation for few-shot image classification. IEEE Trans. Multimed. 2022, 25, 1979–1990.
32. Cheng, H.; Zhou, J.T.; Tay, W.P.; Wen, B. Graph neural networks with triple attention for few-shot learning. IEEE Trans. Multimed. 2023, 25, 8225–8239.
Dataset | Number of Classes | Number of Samples
---|---|---
tieredImageNet | 160 | 206,209
CUB-200-2011 | 200 | 11,788
ISIC | 7 | 10,015
CropDisease | 38 | 43,456
Cars | 196 | 16,185
EuroSAT | 10 | 27,000
Plantae | 69 | 26,650
ChestX | 7 | 25,848
Model | tieredImageNet (5-Shot) | tieredImageNet (10-Shot) | CUB-200-2011 (5-Shot) | CUB-200-2011 (10-Shot) | CropDisease (5-Shot) | CropDisease (10-Shot) | EuroSAT (5-Shot) | EuroSAT (10-Shot) | Average (5-Shot) | Average (10-Shot)
---|---|---|---|---|---|---|---|---|---|---
Baseline [3] | 65.49% | 71.72% | 55.41% | 63.80% | 81.21% | 87.84% | 70.12% | 76.99% | 68.06% | 75.09%
Baseline++ [3] | 68.68% | 72.62% | 53.64% | 59.41% | 60.60% | 68.59% | 55.12% | 60.09% | 59.51% | 65.18%
SKD [23] | 66.93% | 72.70% | 55.59% | 64.28% | 80.36% | 87.46% | 70.85% | 77.18% | 68.44% | 75.41%
ProtoNet [10] | 71.81% | 76.17% | 57.65% | 64.34% | 80.15% | 85.76% | 69.03% | 73.10% | 69.66% | 74.84%
RelationNet [24] | 70.16% | 64.86% | 55.31% | 51.18% | 61.57% | 53.50% | 51.56% | 38.57% | 59.65% | 52.03%
GCLR [31] | 70.98% | 73.12% | 55.78% | 60.89% | 68.56% | 70.45% | 60.87% | 64.57% | 64.05% | 67.26%
AGNN [32] | 72.34% | 74.56% | 55.89% | 62.31% | 76.45% | 75.68% | 70.45% | 72.96% | 68.78% | 71.38%
MAML [11] | 68.27% | 72.49% | 53.09% | 59.30% | 76.25% | 82.43% | 65.86% | 70.18% | 65.87% | 71.10%
LEO [12] | 70.47% | 74.61% | 56.79% | 62.11% | 65.76% | 69.24% | 59.16% | 61.67% | 63.04% | 66.91%
FRSM (Ours) | 67.62% | 72.57% | 55.03% | 63.82% | 82.55% | 88.93% | 74.75% | 79.85% | 69.99% | 76.29%
Model | Cars (5-Shot) | Cars (10-Shot) | Plantae (5-Shot) | Plantae (10-Shot) | ISIC (5-Shot) | ISIC (10-Shot) | ChestX (5-Shot) | ChestX (10-Shot) | Average (5-Shot) | Average (10-Shot)
---|---|---|---|---|---|---|---|---|---|---
Baseline [3] | 47.96% | 55.22% | 42.67% | 50.11% | 42.00% | 49.05% | 24.73% | 26.60% | 39.34% | 45.25%
Baseline++ [3] | 43.72% | 49.63% | 36.92% | 42.11% | 41.11% | 45.24% | 23.44% | 24.84% | 36.30% | 40.45%
SKD [23] | 49.59% | 57.05% | 41.78% | 49.23% | 41.11% | 47.10% | 23.22% | 25.56% | 38.92% | 44.74%
ProtoNet [10] | 49.82% | 56.08% | 41.07% | 47.13% | 41.96% | 48.64% | 25.03% | 26.52% | 39.47% | 44.59%
RelationNet [24] | 42.26% | 39.47% | 36.11% | 33.18% | 32.63% | 28.25% | 23.23% | 22.82% | 33.56% | 30.93%
GCLR [31] | 49.23% | 54.67% | 39.87% | 44.09% | 35.68% | 39.01% | 23.45% | 25.78% | 37.06% | 40.89%
AGNN [32] | 50.45% | 55.78% | 40.78% | 48.60% | 23.45% | 46.90% | 24.10% | 27.01% | 34.70% | 44.57%
MAML [11] | 43.85% | 48.82% | 38.54% | 43.09% | 42.38% | 46.18% | 23.34% | 23.78% | 37.03% | 40.47%
LEO [12] | 48.32% | 53.45% | 38.86% | 43.32% | 36.22% | 38.47% | 22.94% | 23.59% | 36.59% | 39.71%
FRSM (Ours) | 51.25% | 58.80% | 43.66% | 51.04% | 43.45% | 49.45% | 25.12% | 27.28% | 40.87% | 46.64%