Multi-view collective tensor decomposition for cross-modal hashing

  • Regular Paper
  • Published in: International Journal of Multimedia Information Retrieval

Abstract

With the development of social media, data often come from a variety of sources and in different modalities. Such data contain complementary information that can be exploited to build better learning algorithms, but they exhibit a dual heterogeneity: on the one hand, data obtained from different modalities are intrinsically different; on the other hand, features obtained from different descriptors are usually heterogeneous as well. Existing methods typically address the first facet while ignoring the second. In this paper, we therefore propose a novel multi-view cross-modal hashing method, named Multi-view Collective Tensor Decomposition (MCTD), that mitigates both forms of heterogeneity simultaneously: it fully exploits the multimodal, multi-view features while discovering multiple separated subspaces by leveraging the data categories as supervision information. The proposed cross-modal retrieval framework consists of three components: (1) two tensors that model the multi-view features of the two modalities, yielding a better representation of the complementary features and a shared latent representation space; (2) a block-diagonal loss that explicitly enforces a more discriminative latent space by leveraging the supervision information; and (3) two feature projection matrices that characterize the data and generate latent representations for incoming queries. We solve the objective function designed for MCTD with an iterative updating optimization algorithm. Extensive experiments demonstrate the effectiveness of MCTD compared with state-of-the-art methods.
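The three components listed above can be illustrated with a toy sketch. The following is not the paper's actual MCTD algorithm; all shapes, names (`V`, `W_img`, `W_txt`, the mask `M`), the gradient updates, and the penalty weight are illustrative assumptions. It only shows the general recipe in NumPy: unfold a per-modality multi-view tensor, learn a shared latent representation together with per-modality projection matrices, and add a block-diagonal-style penalty that shrinks similarities between samples from different classes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two modalities, each with 2 views stacked along the last axis.
n, k = 60, 4                        # samples, latent dimension
labels = np.repeat([0, 1, 2], 20)   # 3 classes as supervision information
X_img = rng.standard_normal((n, 8, 2))   # image modality: 2 views of dim 8
X_txt = rng.standard_normal((n, 6, 2))   # text modality:  2 views of dim 6

def unfold(X):
    """Mode-1 unfolding: an (n, d, v) tensor becomes an (n, d*v) matrix."""
    return X.reshape(X.shape[0], -1)

A, B = unfold(X_img), unfold(X_txt)

# Shared latent representation V; per-modality projections from least squares.
V = rng.standard_normal((n, k))
W_img = np.linalg.lstsq(V, A, rcond=None)[0]
W_txt = np.linalg.lstsq(V, B, rcond=None)[0]

# M[i, j] = 1 exactly when samples i and j carry different labels, so the
# penalty lam * ||M * (V V^T)||_F^2 shrinks only cross-class similarities,
# pushing the similarity matrix V V^T toward a block-diagonal structure.
M = (labels[:, None] != labels[None, :]).astype(float)

lam, lr = 0.02, 0.005
for _ in range(200):
    # Gradient of the two reconstruction terms ||V W - A||^2 + ||V W - B||^2.
    G = (V @ W_img - A) @ W_img.T + (V @ W_txt - B) @ W_txt.T
    # Gradient of the block-diagonal penalty (M is symmetric and binary).
    G += 4 * lam * (M * (V @ V.T)) @ V
    V -= lr * G
    # Alternating exact updates for the projection matrices.
    W_img = np.linalg.lstsq(V, A, rcond=None)[0]
    W_txt = np.linalg.lstsq(V, B, rcond=None)[0]

# Binarize the latent representation into hash codes (median threshold
# per dimension, a common trick for roughly balanced bits).
codes = (V > np.median(V, axis=0)).astype(int)
```

The mask trick is one common way to encourage block-diagonal structure: within-class entries are zeroed out of the penalty, so only similarities between samples of different classes are driven toward zero. A new query would be projected into the latent space via the learned projection matrices and then binarized the same way.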

Notes

  1. http://www.svcl.ucsd.edu/projects/crossmodal/.

  2. http://www.cs.utexas.edu/~grauman/research/datasets.html.

  3. http://ise.thss.tsinghua.edu.cn/MIG/code_data_cm.zip.

  4. http://ise.thss.tsinghua.edu.cn/MIG/LSSH_code.rar.

  5. https://bitbucket.org/linzijia72/.

  6. We thank the authors for kindly providing the codes.

  7. https://github.com/jiangqy/DCMH-CVPR2017.


Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grants 61672313 and 61503253, by the National Science Foundation under Grants IIS-1526499, IIS-1763365, and CNS-1626432, and by the Natural Science Foundation of Guangdong Province under Grant 2017A030313339.

Author information

Corresponding author

Correspondence to Limeng Cui.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article


Cite this article

Cui, L., Zhang, J., He, L. et al. Multi-view collective tensor decomposition for cross-modal hashing. Int J Multimed Info Retr 8, 47–59 (2019). https://doi.org/10.1007/s13735-018-0164-0
