


31st ACM Multimedia 2023: Ottawa, ON, Canada
- Abdulmotaleb El-Saddik, Tao Mei, Rita Cucchiara, Marco Bertini, Diana Patricia Tobon Vallejo, Pradeep K. Atrey, M. Shamim Hossain:
Proceedings of the 31st ACM International Conference on Multimedia, MM 2023, Ottawa, ON, Canada, 29 October 2023 - 3 November 2023. ACM 2023
Keynote Talks
- Chang Wen Chen:
Internet of Video Things: Technical Challenges and Emerging Applications. 1-2
- Alejandro Jaimes:
Multimodal AI & LLMs for Peacekeeping and Emergency Response. 3-4
- Ralf Steinmetz:
Transition and Adaptability: The Cornerstone of Resilience in Future Networked Multimedia Systems and Beyond. 5-6
Oral Session I: Understanding Multimedia Content -- Media Interpretation
- Hao Shen, Zhong-Qiu Zhao, Yulun Zhang, Zhao Zhang:
Mutual Information-driven Triple Interaction Network for Efficient Image Dehazing. 7-16
- Yang Jiao, Zequn Jie, Jingjing Chen, Lin Ma, Yu-Gang Jiang:
Suspected Objects Matter: Rethinking Model's Prediction for One-stage Visual Grounding. 17-26
- Sophyani Banaamwini Yussif, Ning Xie, Yang Yang, Heng Tao Shen:
Self-Relational Graph Convolution Network for Skeleton-Based Action Recognition. 27-36
- Qian Ning, Fangfang Wu, Weisheng Dong, Xin Li, Guangming Shi:
Exploring Correlations in Degraded Spatial Identity Features for Blind Face Restoration. 37-45
- Chuhao Zhou, Jinxing Li, Huafeng Li, Guangming Lu, Yong Xu, Min Zhang:
Video-based Visible-Infrared Person Re-Identification via Style Disturbance Defense and Dual Interaction. 46-55
- Wenmiao Hu, Yichen Zhang, Yuxuan Liang, Xianjing Han, Yifang Yin, Hannes Kruppa, See-Kiong Ng, Roger Zimmermann:
PetalView: Fine-grained Location and Orientation Extraction of Street-view Images via Cross-view Local Search. 56-66
- Haorui Wang, Yibo Hu, Yangfu Zhu, Jinsheng Qi, Bin Wu:
Shifted GCN-GAT and Cumulative-Transformer based Social Relation Recognition for Long Videos. 67-76
- Jilong Wang, Saihui Hou, Yan Huang, Chunshui Cao, Xu Liu, Yongzhen Huang, Liang Wang:
Causal Intervention for Sparse-View Gait Recognition. 77-85
- Digbalay Bose, Rajat Hebbar, Tiantian Feng, Krishna Somandepalli, Anfeng Xu, Shrikanth Narayanan:
MM-AU: Towards Multimodal Understanding of Advertisement Videos. 86-95
- Huiwei Lin, Shanshan Feng, Baoquan Zhang, Hongliang Qiao, Xutao Li, Yunming Ye:
UER: A Heuristic Bias Addressing Approach for Online Continual Learning. 96-104
- Peng Wu, Xiankai Lu, Jianbing Shen, Yilong Yin:
Clip Fusion with Bi-level Optimization for Human Mesh Reconstruction from Monocular Videos. 105-115
- Jinkai Zheng, Xinchen Liu, Shuai Wang, Lihao Wang, Chenggang Yan, Wu Liu:
Parsing is All You Need for Accurate Gait Recognition in the Wild. 116-124
- Dingyi Zhang, Yingming Li, Zhongfei Zhang:
Multi-Scale Similarity Aggregation for Dynamic Metric Learning. 125-134
- Yue Feng, Zhengye Zhang, Rong Quan, Limin Wang, Jie Qin:
RefineTAD: Learning Proposal-free Refinement for Temporal Action Detection. 135-143
- Zhenguang Liu, Xinyang Yu, Ruili Wang, Shuai Ye, Zhe Ma, Jianfeng Dong, Sifeng He, Feng Qian, Xiaobo Zhang, Roger Zimmermann, Lei Yang:
Video Infringement Detection via Feature Disentanglement and Mutual Information Maximization. 144-152
- Dongbao Yang, Yu Zhou, Xiaopeng Hong, Aoting Zhang, Xin Wei, Linchengxi Zeng, Zhi Qiao, Weiping Wang:
Pseudo Object Replay and Mining for Incremental Object Detection. 153-162
- Shiqin Wang, Xin Xu, Xianzheng Ma, Kui Jiang, Zheng Wang:
Informative Classes Matter: Towards Unsupervised Domain Adaptive Nighttime Semantic Segmentation. 163-172
- Ye Tian, Mengyu Yang, Lanshan Zhang, Zhizhen Zhang, Yang Liu, Xiaohui Xie, Xirong Que, Wendong Wang:
View while Moving: Efficient Video Recognition in Long-untrimmed Videos. 173-183
- Yimin Deng, Huaizhen Tang, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao:
PMVC: Data Augmentation-Based Prosody Modeling for Expressive Voice Conversion. 184-192
- Gege Shi, Xueyang Fu, Chengzhi Cao, Zheng-Jun Zha:
Alleviating Spatial Misalignment and Motion Interference for UAV-based Video Recognition. 193-202
- Yang Liu, Zhaoyang Xia, Mengyang Zhao, Donglai Wei, Yuzheng Wang, Siao Liu, Bobo Ju, Gaoyun Fang, Jing Liu, Liang Song:
Learning Causality-inspired Representation Consistency for Video Anomaly Detection. 203-212
- Dongyue Guo, Yi Lin, Xuehang You, Zhongping Yang, Jizhe Zhou, Bo Yang, Jianwei Zhang, Han Shi, Shasha Hu, Zheng Zhang:
M2ATS: A Real-world Multimodal Air Traffic Situation Benchmark Dataset and Beyond. 213-221
- Jianghu Lu, Shikun Li, Kexin Bao, Pengju Wang, Zhenxing Qian, Shiming Ge:
Federated Learning with Label-Masking Distillation. 222-232
- Lingxiao Lu, Jiangtong Li, Junyan Cao, Li Niu, Liqing Zhang:
Painterly Image Harmonization using Diffusion Model. 233-241
- Xingran Xie, Ting Jin, Boxiang Yun, Qingli Li, Yan Wang:
Exploring Hyperspectral Histopathology Image Segmentation from a Deformable Perspective. 242-251
- Runhua Jiang, Yahong Han:
Uncertainty-Aware Variate Decomposition for Self-supervised Blind Image Deblurring. 252-260
Oral Session II: Understanding Multimedia Content -- Multimodal Fusion and Embedding
- Chao Sun, Min Chen, Jialiang Cheng, Han Liang, Chuanbo Zhu, Jincai Chen:
SCLAV: Supervised Cross-modal Contrastive Learning for Audio-Visual Coding. 261-270
- Feng Lin, Kaiqiang Fu, Hao Luo, Ziyue Zhan, Zhibo Wang, Zhenguang Liu, Lorenzo Cavallaro, Kui Ren:
Cross-Modal and Multi-Attribute Face Recognition: A Benchmark. 271-279
- Ye Wang, Junyang Chen, Mengzhu Wang, Hao Li, Wei Wang, Houcheng Su, Zhihui Lai, Wei Wang, Zhenghan Chen:
A Closer Look at Classifier in Adversarial Domain Generalization. 280-289
- Mengzhu Wang, Jianlong Yuan, Zhibin Wang:
Mixture-of-Experts Learner for Single Long-Tailed Domain Generalization. 290-299
- Chao Zhang, Jingwen Wei, Bo Wang, Zechao Li, Chunlin Chen, Huaxiong Li:
Robust Spectral Embedding Completion Based Incomplete Multi-view Clustering. 300-308
- Jinhui Pang, Zixuan Wang, Jiliang Tang, Mingyan Xiao, Nan Yin:
SA-GDA: Spectral Augmentation for Graph Domain Adaptation. 309-318
- Xihong Yang, Cheng Tan, Yue Liu, Ke Liang, Siwei Wang, Sihang Zhou, Jun Xia, Stan Z. Li, Xinwang Liu, En Zhu:
CONVERT: Contrastive Graph Clustering with Reliable Augmentation. 319-327
- Jintian Ji, Songhe Feng:
High-order Complementarity Induced Fast Multi-View Clustering with Enhanced Tensor Rank Minimization. 328-336
- Xihong Yang, Jiaqi Jin, Siwei Wang, Ke Liang, Yue Liu, Yi Wen, Suyuan Liu, Sihang Zhou, Xinwang Liu, En Zhu:
DealMVC: Dual Contrastive Calibration for Multi-view Clustering. 337-346
- Junming Hou, Qi Cao, Ran Ran, Che Liu, Junling Li, Liang-Jian Deng:
Bidomain Modeling Paradigm for Pansharpening. 347-357
- Yingying Wang, Yunlong Lin, Ge Meng, Zhenqi Fu, Yuhang Dong, Linyu Fan, Hedeng Yu, Xinghao Ding, Yue Huang:
Learning High-frequency Feature Enhancement and Alignment for Pan-sharpening. 358-367
- Xingfeng Li, Yinghui Sun, Quansen Sun, Jia Dai, Zhenwen Ren:
Distribution Consistency based Fast Anchor Imputation for Incomplete Multi-view Clustering. 368-376
- Yushen Wei, Yang Liu, Hong Yan, Guanbin Li, Liang Lin:
Visual Causal Scene Refinement for Video Question Answering. 377-386
- Hongye Liu, Xianhai Xie, Yang Gao, Zhou Yu:
Parameter-Efficient Transfer Learning for Audio-Visual-Language Tasks. 387-396
- Xi Chen, Yun Xiong, Siqi Wang, Haofen Wang, Tao Sheng, Yao Zhang, Yu Ye:
ReCo: A Dataset for Residential Community Layout Planning. 397-405
- Runmin Cong, Hongyu Liu, Chen Zhang, Wei Zhang, Feng Zheng, Ran Song, Sam Kwong:
Point-aware Interaction and CNN-induced Refinement Network for RGB-D Salient Object Detection. 406-416
- Jinrong Cui, Yuting Li, Yulu Fu, Jie Wen:
Multi-view Self-Expressive Subspace Clustering Network. 417-425
- Jian Huang, Yanli Ji, Yang Yang, Heng Tao Shen:
Cross-modality Representation Interactive Learning for Multimodal Sentiment Analysis. 426-434
- Yixuan Ma, Xiaolin Zhang, Peng Zhang, Kun Zhan:
Entropy Neural Estimation for Graph Contrastive Learning. 435-443
- Liguo Zhang, Zilin Tian, Yunfei Long, Sizhao Li, Guisheng Yin:
Cross-modal and Cross-medium Adversarial Attack for Audio. 444-453
- Liang Peng, Xin Wang, Xiaofeng Zhu:
Unsupervised Multiplex Graph learning with Complementary and Consistent Information. 454-462
- Yixuan Wu, Jintai Chen, Jiahuan Yan, Yiheng Zhu, Danny Z. Chen, Jian Wu:
GCL: Gradient-Guided Contrastive Learning for Medical Image Segmentation with Multi-Perspective Meta Labels. 463-471
- Zhiying Jiang, Zengxi Zhang, Jinyuan Liu, Xin Fan, Risheng Liu:
Multi-Spectral Image Stitching via Spatial Graph Reasoning. 472-480
- Jiaming Zhuo, Can Cui, Kun Fu, Bingxin Niu, Dongxiao He, Yuanfang Guo, Zhen Wang, Chuan Wang, Xiaochun Cao, Liang Yang:
Propagation is All You Need: A New Framework for Representation Learning and Classifier Training on Graphs. 481-489
- Yao Wu, Mingwei Xing, Yachao Zhang, Yuan Xie, Jianping Fan, Zhongchao Shi, Yanyun Qu:
Cross-modal Unsupervised Domain Adaptation for 3D Semantic Segmentation via Bidirectional Fusion-then-Distillation. 490-498
Oral Session III: Understanding Multimedia Content -- Vision and Language
- Yinjie Zhao, Lichen Zhao, Qian Yu, Lu Sheng, Jing Zhang, Dong Xu:
Distortion-aware Transformer in 360° Salient Object Detection. 499-508
- Zixiao Wang, Hongtao Xie, Yuxin Wang, Jianjun Xu, Boqiang Zhang, Yongdong Zhang:
Symmetrical Linguistic Feature Distillation with CLIP for Scene Text Recognition. 509-518
- Bo Zou, Chao Yang, Chengbin Quan, Youjian Zhao:
SpaceCLIP: A Vision-Language Pretraining Framework With Spatial Reconstruction On Text. 519-528
- Xu Huang, Jin Liu, Zhizhong Zhang, Yuan Xie:
Improving Cross-Modal Recipe Retrieval with Component-Aware Prompted CLIP Embedding. 529-537
- Shuhan Kong, Liang Li, Beichen Zhang, Wenyu Wang, Bin Jiang, Chenggang Yan, Changhao Xu:
Dynamic Contrastive Learning with Pseudo-samples Intervention for Weakly Supervised Joint Video MR and HD. 538-546
- Zheng Yuan, Qiao Jin, Chuanqi Tan, Zhengyun Zhao, Hongyi Yuan, Fei Huang, Songfang Huang:
RAMM: Retrieval-augmented Biomedical Visual Question Answering with Multi-modal Pre-training. 547-556
- Xiao Wang, Yaoyu Li, Tian Gan, Zheng Zhang, Jingjing Lv, Liqiang Nie:
RTQ: Rethinking Video-language Understanding Based on Image-text Model. 557-566
- Shanshan Zhong, Zhongzhan Huang, Wushao Wen, Jinghui Qin, Liang Lin:
SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models. 567-578
- Xin Dong, Rui Wang, Siyuan Liang, Aishan Liu, Lihua Jing:
Face Encryption via Frequency-Restricted Identity-Agnostic Attacks. 579-588
- Peipei Song, Dan Guo, Xun Yang, Shengeng Tang, Erkun Yang, Meng Wang:
Emotion-Prior Awareness Network for Emotional Video Captioning. 589-600
- Dong Liu, Qirong Mao, Lijian Gao, Qinghua Ren, Zhenghan Chen, Ming Dong:
TE-KWS: Text-Informed Speech Enhancement for Noise-Robust Keyword Spotting. 601-610
- Jiancheng Pan, Qing Ma, Cong Bai:
A Prior Instruction Representation Framework for Remote Sensing Image-text Retrieval. 611-620
- Nirmalendu Prakash, Han Wang, Nguyen-Khoi Hoang, Ming Shan Hee, Roy Ka-Wei Lee:
PromptMTopic: Unsupervised Multimodal Topic Modeling of Memes using Large Language Models. 621-631
- Yue Lv, Jinxi Xiang, Jun Zhang, Wenming Yang, Xiao Han, Wei Yang:
Dynamic Low-Rank Instance Adaptation for Universal Neural Image Compression. 632-642
- Leigang Qu, Shengqiong Wu, Hao Fei, Liqiang Nie, Tat-Seng Chua:
LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation. 643-654
- Yue Zhang, Suchen Wang, Shichao Kan, Zhenyu Weng, Yigang Cen, Yap-Peng Tan:
POAR: Towards Open Vocabulary Pedestrian Attribute Recognition. 655-665
- Shengshan Hu, Wei Liu, Minghui Li, Yechao Zhang, Xiaogeng Liu, Xianlong Wang, Leo Yu Zhang, Junhui Hou:
PointCRT: Detecting Backdoor in 3D Point Cloud via Corruption Robustness. 666-675
- Rui Qin, Ming Sun, Fangyuan Zhang, Xing Wen, Bin Wang:
Blind Image Super-resolution with Rich Texture-Aware Codebook. 676-687
- Zizhang Wu, Zhuozheng Li, Zhi-Gang Fan, Yunzhe Wu, Jian Pu, Xianzhi Li:
V2Depth: Monocular Depth Estimation via Feature-Level Virtual-View Simulation and Refinement. 688-697
- Kai Chen, Zhipeng Wei, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang:
GCMA: Generative Cross-Modal Transferable Adversarial Attacks from Images to Videos. 698-708
- Lianyu Hu, Liqing Gao, Zekang Liu, Chi-Man Pun, Wei Feng:
AdaBrowse: Adaptive Video Browser for Efficient Continuous Sign Language Recognition. 709-718
- Lingfeng Li, Gangming Zhao, Yizhou Yu, Jinpeng Li:
Dynamic Triple Reweighting Network for Automatic Femoral Head Necrosis Diagnosis from Computed Tomography. 719-727
- Liu Liu, Jianming Du, Hao Wu, Xun Yang, Zhenguang Liu, Richang Hong, Meng Wang:
Category-Level Articulated Object 9D Pose Estimation via Reinforcement Learning. 728-736
- Qichao Ying, Jiaxin Liu, Sheng Li, Haisheng Xu, Zhenxing Qian, Xinpeng Zhang:
RetouchingFFHQ: A Large-scale Dataset for Fine-grained Face Retouching Detection. 737-746
- Xueyi Zhang, Chengwei Zhang, Tao Wang, Jun Tang, Songyang Lao, Haizhou Li:
Slow-Fast Time Parameter Aggregation Network for Class-Incremental Lip Reading. 747-756
- Yang Bai, Jingyao Wang, Min Cao, Chen Chen, Ziqiang Cao, Liqiang Nie, Min Zhang:
Text-based Person Search without Parallel Image-Text Data. 757-767
- Jiawei Liang, Siyuan Liang, Aishan Liu, Ke Ma, Jingzhi Li, Xiaochun Cao:
Exploring Inconsistent Knowledge Distillation for Object Detection with Data Augmentation. 768-778
- Sun'ao Liu, Yiheng Zhang, Zhaofan Qiu, Hongtao Xie, Yongdong Zhang, Ting Yao:
CARIS: Context-Aware Referring Image Segmentation. 779-788
- Shizhou Zhang, Qingchun Yang, De Cheng, Yinghui Xing, Guoqiang Liang, Peng Wang, Yanning Zhang:
Ground-to-Aerial Person Search: Benchmark Dataset and Approach. 789-799
- Fan Jiang, Zilei Wang:
Sparse Sharing Relation Network for Panoptic Driving Perception. 800-808
Oral Session IV: Engaging Users with Multimedia -- Emotional and Social Signals
- Daoming Zong, Chaoyue Ding, Baoxiang Li, Jiakui Li, Ken Zheng, Qunyan Zhou:
AcFormer: An Aligned and Compact Transformer for Multimodal Sentiment Analysis. 833-842
- Zeng Tao, Yan Wang, Zhaoyu Chen, Boyang Wang, Shaoqi Yan, Kaixun Jiang, Shuyong Gao, Wenqiang Zhang:
Freq-HD: An Interpretable Frequency-based High-Dynamics Affective Clip Selection Method for in-the-Wild Facial Expression Recognition in Videos. 843-852
- Peiguang Jing, Xianyi Liu, Ji Wang, Yinwei Wei, Liqiang Nie, Yuting Su:
StyleEDL: Style-Guided High-order Attention Network for Image Emotion Distribution Learning. 853-861
- Junjie Zhu, Bingjun Luo, Ao Sun, Jinghang Tan, Xibin Zhao, Yue Gao:
Variance-Aware Bi-Attention Expression Transformer for Open-Set Facial Expression Recognition in the Wild. 862-870
- Zixin Zhang, Fan Qi, Shuai Li, Changsheng Xu:
AffectFAL: Federated Active Affective Computing with Non-IID Data. 871-882
- Peiliang Gong, Ziyu Jia, Pengpai Wang, Yueying Zhou, Daoqiang Zhang:
ASTDF-Net: Attention-Based Spatial-Temporal Dual-Stream Fusion Network for EEG-Based Emotion Recognition. 883-892
Oral Session V: Engaging Users with Multimedia -- Multimedia Search and Recommendation
- Yishu Liu, Qingpeng Wu, Zheng Zhang, Jingyi Zhang, Guangming Lu:
Multi-Granularity Interactive Transformer Hashing for Cross-modal Retrieval. 893-902
- Wenjie Wang, Xinyu Lin, Liuhui Wang, Fuli Feng, Yinwei Wei, Tat-Seng Chua:
Equivariant Learning for Out-of-Distribution Cold-start Recommendation. 903-914
- Haokun Wen, Xian Zhang, Xuemeng Song, Yinwei Wei, Liqiang Nie:
Target-Guided Composed Image Retrieval. 915-923
- Haoxuan Li, Yi Bin, Junrong Liao, Yang Yang, Heng Tao Shen:
Your Negative May not Be True Negative: Boosting Image-Text Matching with False Negative Elimination. 924-934
- Xin Zhou, Zhiqi Shen:
A Tale of Two Graphs: Freezing and Denoising Graph Structures for Multimodal Recommendation. 935-943
- Guiwei Zhang, Yongfei Zhang, Zichang Tan:
ProtoHPE: Prototype-guided High-frequency Patch Enhancement for Visible-Infrared Person Re-identification. 944-954
- Wei Ji, Xiangyan Liu, An Zhang, Yinwei Wei, Yongxin Ni, Xiang Wang:
Online Distillation-enhanced Multi-modal Transformer for Sequential Recommendation. 955-965
- Junyang Chen, Jialong Wang, Zhijiang Dai, Huisi Wu, Mengzhu Wang, Qin Zhang, Huan Wang:
Zero-shot Micro-video Classification with Neural Variational Inference in Graph Prototype Network. 966-974
- Zhiguo Chen, Xun Jiang, Xing Xu, Zuo Cao, Yijun Mo, Heng Tao Shen:
Joint Searching and Grounding: Multi-Granularity Video Content Retrieval. 975-983
- Yuyuan Li, Chaochao Chen, Xiaolin Zheng, Yizhao Zhang, Zhongxuan Han, Dan Meng, Jun Wang:
Making Users Indistinguishable: Attribute-wise Unlearning in Recommender Systems. 984-994
- Dugang Liu, Yang Qiao, Xing Tang, Liang Chen, Xiuqiang He, Zhong Ming:
Prior-Guided Accuracy-Bias Tradeoff Learning for CTR Prediction in Multimedia Recommendation. 995-1003
- Haoyue Bai, Min Hou, Le Wu, Yonghui Yang, Kun Zhang, Richang Hong, Meng Wang:
GoRec: A Generative Cold-start Recommendation Framework. 1004-1012
- Jingzhi Li, Fengling Li, Lei Zhu, Hui Cui, Jingjing Li:
Prototype-guided Knowledge Transfer for Federated Unsupervised Cross-modal Hashing. 1013-1022
Oral Session VI: Engaging Users with Multimedia -- Interactions and Quality of Experience
- Shuai He, Anlong Ming, Shuntian Zheng, Haobin Zhong, Huadong Ma:
EAT: An Enhancer for Aesthetics-Oriented Transformers. 1023-1032
- Sicheng Yang, Zilin Wang, Zhiyong Wu, Minglei Li, Zhensong Zhang, Qiaochu Huang, Lei Hao, Songcen Xu, Xiaofei Wu, Changpeng Yang, Zonghong Dai:
UnifiedGesture: A Unified Gesture Synthesis Model for Multiple Skeletons. 1033-1044
- Haoning Wu, Erli Zhang, Liang Liao, Chaofeng Chen, Jingwen Hou, Annan Wang, Wenxiu Sun, Qiong Yan, Weisi Lin:
Towards Explainable In-the-Wild Video Quality Assessment: A Database and a Language-Prompted Approach. 1045-1054
- Guangming Zhu, Siyuan Wang, Qing Cheng, Kelong Wu, Hao Li, Liang Zhang:
Sketch Input Method Editor: A Comprehensive Dataset and Methodology for Systematic Input Recognition. 1055-1065
- Tengchuan Kou, Xiaohong Liu, Wei Sun, Jun Jia, Xiongkuo Min, Guangtao Zhai, Ning Liu:
StableVQA: A Deep No-Reference Quality Assessment Model for Video Stability. 1066-1076
- Jianjun Xiang, Yuanjie Dang, Peng Chen, Ronghua Liang, Ruohong Huan, Zhengyu Zhang:
Spatial-angular Quality-aware Representation Learning for Blind Light Field Image Quality Assessment. 1077-1087
- Yunlong Dong, Xiaohong Liu, Yixuan Gao, Xunchu Zhou, Tao Tan, Guangtao Zhai:
Light-VQA: A Multi-Dimensional Quality Assessment Model for Low-Light Video Enhancement. 1088-1097
- Kun Yuan, Zishang Kong, Chuanchuan Zheng, Ming Sun, Xing Wen:
Capturing Co-existing Distortions in User-Generated Content for No-reference Video Quality Assessment. 1098-1107
- Kaiyuan Hu, Haowen Yang, Yili Jin, Junhua Liu, Yongting Chen, Miao Zhang, Fangxin Wang:
Understanding User Behavior in Volumetric Video Watching: Dataset, Analysis and Prediction. 1108-1116
- Xiangfei Sheng, Leida Li, Pengfei Chen, Jinjian Wu, Weisheng Dong, Yuzhe Yang, Liwu Xu, Yaqian Li, Guangming Shi:
AesCLIP: Multi-Attribute Contrastive Learning for Image Aesthetics Assessment. 1117-1126
Oral Session VII: Engaging Users with Multimedia -- Metaverse, Art and Culture
- Zheng Wei, Xian Xu, Lik-Hang Lee, Wai Tong, Huamin Qu, Pan Hui:
Feeling Present! From Physical to Virtual Cinematography Lighting Education with Metashadow. 1127-1136
- Shao-Kui Zhang, Jia-Hong Liu, Yike Li, Tianyi Xiong, Ke-Xin Ren, Hongbo Fu, Song-Hai Zhang:
Automatic Generation of Commercial Scenes. 1137-1147
- Yang Chen, Yingwei Pan, Yehao Li, Ting Yao, Tao Mei:
Control3D: Towards Controllable Text-to-3D Generation. 1148-1156
- Yuqing Zhang, Zhou Fang, Xinyu Yang, Shengyu Zhang, Baoyi He, Huaiyong Dou, Junchi Yan, Yongquan Zhang, Fei Wu:
Reconnecting the Broken Civilization: Patchwork Integration of Fragments from Ancient Manuscripts. 1157-1166
Oral Session VIII: Engaging Users with Multimedia -- Multimedia Applications
- Zixin Wang, Yadan Luo, Zhi Chen, Sen Wang, Zi Huang:
Cal-SFDA: Source-Free Domain-adaptive Semantic Segmentation with Differentiable Expected Calibration Error. 1167-1178
- Runmin Cong, Mengyao Sun, Sanyi Zhang, Xiaofei Zhou, Wei Zhang, Yao Zhao:
Frequency Perception Network for Camouflaged Object Detection. 1179-1189
- Xiaoshuai Wu, Xin Liao, Bo Ou:
SepMark: Deep Separable Watermarking for Unified Source Tracing and Deepfake Detection. 1190-1201
- Runmin Cong, Yuchen Guan, Jinpeng Chen, Wei Zhang, Yao Zhao, Sam Kwong:
SDDNet: Style-guided Dual-layer Disentanglement Network for Shadow Detection. 1202-1211
- Hao Tan, Weichao Kong, Feng Zhang, Wenjin Qin, Jianjun Wang:
High-Order Tensor Recovery Coupling Multilayer Subspace Priori with Application in Video Restoration. 1212-1220
- Chen Wang, Jiadai Sun, Lina Liu, Chenming Wu, Zhelun Shen, Dayan Wu, Yuchao Dai, Liangjun Zhang:
Digging into Depth Priors for Outdoor Neural Radiance Fields. 1221-1230
- Fanrui Zhang, Jiawei Liu, Qiang Zhang, Esther Sun, Jingyi Xie, Zheng-Jun Zha:
ECENet: Explainable and Context-Enhanced Network for Muti-modal Fact verification. 1231-1240
- Baochen Xiong, Xiaoshan Yang, Yaguang Song, Yaowei Wang, Changsheng Xu:
Client-Adaptive Cross-Model Reconstruction Network for Modality-Incomplete Multimodal Federated Learning. 1241-1249
- Jinpeng Lin, Min Zhou, Ye Ma, Yifan Gao, Chenxi Fei, Yangjian Chen, Zhang Yu, Tiezheng Ge:
AutoPoster: A Highly Automatic and Content-aware Design System for Advertising Poster Generation. 1250-1260
- Gangyan Zeng, Yuan Zhang, Yu Zhou, Bo Fang, Guoqing Zhao, Xin Wei, Weiping Wang:
Filling in the Blank: Rationale-Augmented Prompt Tuning for TextVQA. 1261-1272
- Liuhan Chen, Yirou Wang, Yongyong Chen:
End-to-end XY Separation for Single Image Blind Deblurring. 1273-1282
- Junxian Chen, Ying Liu, Yiqi Liang, Dandan Long, Xiaolin He, Ruihui Li:
SD-Net: Spatially-Disentangled Point Cloud Completion Network. 1283-1293
- Jiawei Jiang, Yuchao Feng, Jiacheng Chen, Dongyan Guo, Jianwei Zheng:
Latent-space Unfolding for MRI Reconstruction. 1294-1302
- Hongpeng Lin, Ludan Ruan, Wenke Xia, Peiyu Liu, Jingyuan Wen, Yixin Xu, Di Hu, Ruihua Song, Wayne Xin Zhao, Qin Jin, Zhiwu Lu:
TikTalk: A Video-Based Dialogue Dataset for Multi-Modal Chitchat in Real World. 1303-1313
- Pengteng Li, Ying He, F. Richard Yu, Pinhao Song, Dongfu Yin, Guang Zhou:
IGG: Improved Graph Generation for Domain Adaptive Object Detection. 1314-1324
- De Cheng, Lingfeng He, Nannan Wang, Shizhou Zhang, Zhen Wang, Xinbo Gao:
Efficient Bilateral Cross-Modality Cluster Matching for Unsupervised Visible-Infrared Person ReID. 1325-1333
- Xun Jiang, Zailei Zhou, Xing Xu, Yang Yang, Guoqing Wang, Heng Tao Shen:
Faster Video Moment Retrieval with Point-Level Supervision. 1334-1342
- Xianliang Huang, Jiajie Gou, Shuhang Chen, Zhizhou Zhong, Jihong Guan, Shuigeng Zhou:
IDDR-NGP: Incorporating Detectors for Distractors Removal with Instant Neural Radiance Field. 1343-1351
- Junzhe Zhang, Tong Chen, Dandan Ding, Zhan Ma:
G-PCC++: Enhanced Geometry-based Point Cloud Compression. 1352-1363
- Zhengcong Fei, Mingyuan Fan, Junshi Huang:
Gradient-Free Textual Inversion. 1364-1373
- Qiaosong Qi, Le Zhuo, Aixi Zhang, Yue Liao, Fei Fang, Si Liu, Shuicheng Yan:
DiffDance: Cascaded Human Motion Diffusion Model for Dance Generation. 1374-1382
- Peihuan Huang, Gaofeng Cao, Fei Zhou, Guoping Qiu:
Video Inverse Tone Mapping Network with Luma and Chroma Mapping. 1383-1391
- Qi Jia, Xiaomei Feng, Yu Liu, Xin Fan, Longin Jan Latecki:
Learning Pixel-wise Alignment for Unsupervised Image Stitching. 1392-1400
- Han Yan, Haijun Zhang, Xiangyu Mu, Jicong Fan, Zhao Zhang:
FashionDiff: A Controllable Diffusion Model Using Pairwise Fashion Elements for Intelligent Design. 1401-1411
- Wei Yu, Qi Zhu, Naishan Zheng, Jie Huang, Man Zhou, Feng Zhao:
Learning Non-Uniform-Sampling for Ultra-High-Definition Image Enhancement. 1412-1421
- Haoxing Chen, Zhangxuan Gu, Yaohui Li, Jun Lan, Changhua Meng, Weiqiang Wang, Huaxiong Li:
Hierarchical Dynamic Image Harmonization. 1422-1430
- Sha Guo, Zhuo Chen, Yang Zhao, Ning Zhang, Xiaotong Li, Lingyu Duan:
Toward Scalable Image Feature Compression: A Content-Adaptive and Diffusion-Based Approach. 1431-1442
- Kaixun Jiang, Zhaoyu Chen, Xinyu Zhou, Jingyu Zhang, Lingyi Hong, Jiafeng Wang, Bo Li, Yan Wang, Wenqiang Zhang:
Towards Decision-based Sparse Attacks on Video Recognition. 1443-1454
- Mingqi Fang, Lingyun Yu, Hongtao Xie, Junqiang Wu, Zezheng Wang, Jiahong Li, Yongdong Zhang:
RAIRNet: Region-Aware Identity Rectification for Face Forgery Detection. 1455-1464
- Xiao He, Chang Tang, Xin Zou, Wei Zhang:
Multispectral Object Detection via Cross-Modal Conflict-Aware Learning. 1465-1474
- Huan Zheng, Zhao Zhang, Jicong Fan, Richang Hong, Yi Yang, Shuicheng Yan:
Decoupled Cross-Scale Cross-View Interaction for Stereo Image Enhancement in the Dark. 1475-1484
- Kexin Li, Zongxin Yang, Lei Chen, Yi Yang, Jun Xiao:
CATR: Combinatorial-Dependence Audio-Queried Transformer for Audio-Visual Video Segmentation. 1485-1494
- Zisong Chen, Chunyu Lin, Lang Nie, Zhijie Shen, Kang Liao, Yuanzhouhan Cao, Yao Zhao:
S-OmniMVS: Incorporating Sphere Geometry into Omnidirectional Stereo Matching. 1495-1503
- Yichen Zhang, Yifang Yin, Ying Zhang, Zhenguang Liu, Zheng Wang, Roger Zimmermann:
Prototypical Cross-domain Knowledge Transfer for Cervical Dysplasia Visual Inspection. 1504-1514
- Yuchen Sun, Qianqian Xu, Zitai Wang, Qingming Huang:
When Measures are Unreliable: Imperceptible Adversarial Perturbations toward Top-k Multi-Label Learning. 1515-1526
- Bowei Xu, Hao Chen, Zhan Ma:
Karma: Adaptive Video Streaming via Causal Sequence Modeling. 1527-1535
- Xinting Liao, Chaochao Chen, Weiming Liu, Pengyang Zhou, Huabin Zhu, Shuheng Shen, Weiqiang Wang, Mengling Hu, Yanchao Tan, Xiaolin Zheng:
Joint Local Relational Augmentation and Global Nash Equilibrium for Federated Learning with Non-IID Data. 1536-1545
- Jin Wang, Jiade Chen, Yunhui Shi, Nam Ling, Baocai Yin:
SSPU-Net: A Structure Sensitive Point Cloud Upsampling Network with Multi-Scale Spatial Refinement. 1546-1555
- Haoyue Wang, Sheng Li, Silu Cao, Rui Yang, Jishen Zeng, Zhenxing Qian, Xinpeng Zhang:
On Physically Occluded Fake Identity Document Detection. 1556-1564
- Deqi Li, Shi-Sheng Huang, Tianyu Shen, Hua Huang:
Dynamic View Synthesis with Spatio-Temporal Feature Warping from Sparse Views. 1565-1576
Oral Session IX: Engaging Users with Multimedia -- Social-good, Fairness and Transparency
- Shengfang Zhai, Yinpeng Dong, Qingni Shen, Shi Pu, Yuejian Fang, Hang Su:
Text-to-Image Diffusion Models can be Easily Backdoored through Multimodal Data Poisoning. 1577-1587
- Jingxuan Tan, Nan Zhong, Zhenxing Qian, Xinpeng Zhang, Sheng Li:
Deep Neural Network Watermarking against Model Extraction Attack. 1588-1597
- Yu Bai, Bo Zhang, Zheng Zhang, Wu Liu, Jinwen Li, Xiangyang Gong, Wendong Wang:
CoCa: A Connectivity-Aware Cascade Framework for Histology Gland Segmentation. 1598-1606
- Bo Zhang, Yunpeng Tan, Zheng Zhang, Wu Liu, Hui Gao, Zhijun Xi, Wendong Wang:
Factorized Omnidirectional Representation based Vision GNN for Anisotropic 3D Multimodal MR Image Segmentation. 1607-1615
- Rui Hu, Yahan Tu, Jitao Sang:
Echoes: Unsupervised Debiasing via Pseudo-bias Labeling in an Echo Chamber. 1616-1624
- Luxin Cai, Naiyue Chen, Yuanzhouhan Cao, Jiahuan He, Yidong Li:
FedCE: Personalized Federated Learning Method based on Clustering Ensembles. 1625-1633
Oral Session X: Multimedia systems -- Data Systems Management and Indexing
- Naoki Ono, Yusuke Matsui:
Relative NN-Descent: A Fast Index Construction for Graph-Based Approximate Nearest Neighbor Search. 1659-1667
- Cheng Xiong, Chuan Qin, Guorui Feng, Xinpeng Zhang:
Flexible and Secure Watermarking for Latent Diffusion Model. 1668-1676
- Rukai Wei, Yu Liu, Jingkuan Song, Heng Cui, Yanzhao Xie, Ke Zhou:
CHAIN: Exploring Global-Local Spatio-Temporal Information for Improved Self-Supervised Video Hashing. 1677-1688
Oral Session XI: Multimedia systems -- Systems and Middleware, Transport and Delivery
- Rui Lu, Lai Wei, Shuntao Zhu, Chuang Hu, Dan Wang:
Pagoda: Privacy Protection for Volumetric Video Streaming through Poisson Diffusion Model. 1689-1697
- Yuyang Leng, Renyuan Liu, Hongpeng Guo, Songqing Chen, Shuochao Yao:
ScaleFlow: Efficient Deep Vision Pipeline with Closed-Loop Scale-Adaptive Inference. 1698-1706
- Tianchi Huang, Rui-Xiao Zhang, Chenglei Wu, Lifeng Sun:
Optimizing Adaptive Video Streaming with Human Feedback. 1707-1718
Poster Session I: Understanding Multimedia Content -- Media Interpretation
- Hao Tang
, Jun Liu
, Shuanglin Yan
, Rui Yan
, Zechao Li
, Jinhui Tang
:
M3Net: Multi-view Encoding, Matching, and Fusion for Few-shot Fine-grained Action Recognition. 1719-1728 - Chen Cheng
, Jingkuan Song
, Xiaosu Zhu
, Junchen Zhu
, Lianli Gao
, Hengtao Shen
:
CUCL: Codebook for Unsupervised Continual Learning. 1729-1737 - Yang Liu
, Chen Chen
, Can Wang
, Xulin King
, Mengyuan Liu
:
Regress Before Construct: Regress Autoencoder for Point Cloud Self-supervised Learning. 1738-1749 - Bo Wang
, Zhao Zhang
, Suiyi Zhao
, Haijun Zhang
, Richang Hong
, Meng Wang
:
CropCap: Embedding Visual Cross-Partition Dependency for Image Captioning. 1750-1758 - Yanqi Wu
, Xue Song
, Jingjing Chen
, Yu-Gang Jiang
:
Generalizing Face Forgery Detection via Uncertainty Learning. 1759-1767 - Bingqing Zhang
, Sen Wang
, Yifan Liu
, Brano Kusy
, Xue Li
, Jiajun Liu
:
Object Detection Difficulty: Suppressing Over-aggregation for Faster and Better Video Object Detection. 1768-1778 - Yuanshen Guan
, Ruikang Xu
, Mingde Yao
, Lizhi Wang
, Zhiwei Xiong
:
Mutual-Guided Dynamic Network for Image Fusion. 1779-1788 - Chenxi Xie
, Changqun Xia
, Tianshu Yu
, Jia Li
:
Frequency Representation Integration for Camouflaged Object Detection. 1789-1797 - Tao Wang
, Lei Jin
, Zhang Wang
, Xiaojin Fan
, Yu Cheng
, Yinglei Teng
, Junliang Xing
, Jian Zhao
:
DecenterNet: Bottom-Up Human Pose Estimation Via Decentralized Pose Representation. 1798-1808 - Jingyi Wang
, Can Zhang
, Jinfa Huang
, Botao Ren
, Zhidong Deng
:
Improving Scene Graph Generation with Superpixel-Based Interaction Learning. 1809-1820 - Shifeng Xia
, Lin Geng
, Ningzhong Liu
, Han Sun
, Jie Qin
:
Lifelong Scene Text Recognizer via Expert Modules. 1821-1830 - Zhen Ye
, Wei Xue
, Xu Tan
, Jie Chen
, Qifeng Liu
, Yike Guo
:
CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model. 1831-1839 - Runhao Zeng
, Qi Deng
, Huixuan Xu
, Shuaicheng Niu
, Jian Chen
:
Exploring Motion Cues for Video Test-Time Adaptation. 1840-1850 - Yan Shu
, Wei Wang
, Yu Zhou
, Shaohui Liu
, Aoting Zhang
, Dongbao Yang
, Weiping Wang:
Perceiving Ambiguity and Semantics without Recognition: An Efficient and Effective Ambiguous Scene Text Detector. 1851-1862 - Jiaming Chu
, Lei Jin
, Xiaojin Fan
, Yinglei Teng
, Yunchao Wei
, Yuqiang Fang
, Junliang Xing
, Jian Zhao
:
Single-Stage Multi-human Parsing via Point Sets and Center-based Offsets. 1863-1873 - Chengxiao Sun
, Yan Xu
, Jialun Pei
, Haopeng Fang
, He Tang
:
Partitioned Saliency Ranking with Dense Pyramid Transformers. 1874-1883 - Jianbiao Mei
, Yu Yang
, Mengmeng Wang
, Zizhang Li
, Xiaojun Hou
, Jongwon Ra
, Laijian Li
, Yong Liu
:
CenterLPS: Segment Instances by Centers for LiDAR Panoptic Segmentation. 1884-1894 - Zhenhua Ning
, Zhuotao Tian
, Guangming Lu
, Wenjie Pei
:
Boosting Few-shot 3D Point Cloud Segmentation via Query-Guided Enhancement. 1895-1904 - Mu Chen
, Zhedong Zheng
, Yi Yang
, Tat-Seng Chua
:
PiPa: Pixel- and Patch-wise Self-supervised Learning for Domain Adaptative Semantic Segmentation. 1905-1914 - Xinyan Zu
, Haiyang Yu
, Bin Li
, Xiangyang Xue
:
Weakly-Supervised Text Instance Segmentation. 1915-1923 - Wenjie Xuan
, Shanshan Zhao
, Yu Yao
, Juhua Liu
, Tongliang Liu
, Yixin Chen
, Bo Du
, Dacheng Tao
:
PNT-Edge: Towards Robust Edge Detection with Noisy Labels by Learning Pixel-level Noise Transitions. 1924-1932 - Pan Gao
, Haoyue Tian
, Jie Qin
:
Video Frame Interpolation with Flow Transformer. 1933-1942 - Xianghao Kong
, Wentao Jiang
, Jinrang Jia
, Yifeng Shi
, Runsheng Xu
, Si Liu
:
DUSA: Decoupled Unsupervised Sim2Real Adaptation for Vehicle-to-Everything Collaborative Perception. 1943-1954 - Ruiqi Zhang
, Jie Chen
, Qiang Wang
:
Explicifying Neural Implicit Fields for Efficient Dynamic Human Avatar Modeling via a Neural Explicit Surface. 1955-1963 - Shili Zhou
, Xuhao Jiang
, Weimin Tan
, Ruian He
, Bo Yan
:
MVFlow: Deep Optical Flow Estimation of Compressed Videos with Motion Vector Prior. 1964-1974 - Ri Cheng
, Xuhao Jiang
, Ruian He
, Shili Zhou
, Weimin Tan
, Bo Yan
:
Uncertainty-Guided Spatial Pruning Architecture for Efficient Frame Interpolation. 1975-1986 - Junshan Hu
, Liansheng Zhuang
, Weisong Dong
, Shiming Ge
, Shafei Wang
:
Learning Generalized Representations for Open-Set Temporal Action Localization. 1987-1996 - Jie Gao
, Bineng Zhong
, Yan Chen
:
Unambiguous Object Tracking by Exploiting Target Cues. 1997-2005 - Keran Wang
, Hongtao Xie
, Yuxin Wang
, Dongming Zhang
, Yadong Qu
, Zuan Gao
, Yongdong Zhang
:
Masked Text Modeling: A Self-Supervised Pre-training Method for Scene Text Detection. 2006-2015 - Jiamin Chen
, Jianlou Si
, Naihao Liu
, Yao Wu
, Li Niu
, Chen Qian
:
Object Part Parsing with Hierarchical Dual Transformer. 2016-2024 - Xugong Qin
, Pengyuan Lyu
, Chengquan Zhang
, Yu Zhou
, Kun Yao
, Peng Zhang
, Hailun Lin
, Weiping Wang
:
Towards Robust Real-Time Scene Text Detection: From Semantic to Instance Representation Learning. 2025-2034 - Xiyao Ma
, Shiqi Liu
, Xiaoliang Xie
, Xiao-Hu Zhou
, Zengguang Hou
, Xinkai Qu
, Wenzheng Han
, Ming Wang
, Meng Song
, Lin-Sen Zhang
:
Towards Flexible and Universal: A Novel Endpoint-based Framework for Vessel Structural Information Extraction. 2035-2044 - Sejin Park
, Taehyung Lee
, Yeejin Lee
, Byeongkeun Kang
:
FDCNet: Feature Drift Compensation Network for Class-Incremental Weakly Supervised Object Localization. 2045-2053 - Meng Shen
, Yanzuo Lu
, Yanxu Hu
, Andy J. Ma
:
Collaborative Learning of Diverse Experts for Source-free Universal Domain Adaptation. 2054-2065 - Wentao Yang
, Zhe Li
, Dezhi Peng
, Lianwen Jin
, Mengchao He
, Cong Yao
:
Read Ten Lines at One Glance: Line-Aware Semi-Autoregressive Transformer for Multi-Line Handwritten Mathematical Expression Recognition. 2066-2077 - Kejun Lin
, Zhixiang Wang
, Zheng Wang
, Yinqiang Zheng
, Shin'ichi Satoh
:
Beyond Domain Gap: Exploiting Subjectivity in Sketch-Based Person Retrieval. 2078-2089 - Ben Sha
, Baopu Li
, Tao Chen
, Jiayuan Fan
, Tao Sheng
:
Rethinking Pseudo-Label-Based Unsupervised Person Re-ID with Hierarchical Prototype-based Graph. 2090-2100 - Kehua Guo
, Rui Ding
, Tian Qiu
, Xiangyuan Zhu
, Zheng Wu
, Liwei Wang
, Hui Fang
:
Single Domain Generalization via Unsupervised Diversity Probe. 2101-2111 - Ruijin Liu
, Ning Lu
, Dapeng Chen
, Cheng Li
, Zejian Yuan
, Wei Peng
:
PBFormer: Capturing Complex Scene Text Shape with Polynomial Band Transformer. 2112-2120 - Houzhang Fang
, Zikai Liao
, Lu Wang
, Qingshan Li
, Yi Chang
, Luxin Yan
, Xuhua Wang
:
DANet: Multi-scale UAV Target Detection with Dynamic Feature Perception and Scale-aware Knowledge Distillation. 2121-2130 - Bo Dong
, Jialun Pei
, Rongrong Gao
, Tian-Zhu Xiang
, Shuo Wang
, Huan Xiong
:
A Unified Query-based Paradigm for Camouflaged Instance Segmentation. 2131-2138 - Jialun Pei
, Zhangjun Zhou
, Yueming Jin
, He Tang
, Pheng-Ann Heng
:
Unite-Divide-Unite: Joint Boosting Trunk and Structure for High-accuracy Dichotomous Image Segmentation. 2139-2147 - Yuxiang Cai
, Meng Xi
, Yongheng Shang
, Jianwei Yin
:
Exploring High-Correlation Source Domain Information for Multi-Source Domain Adaptation in Semantic Segmentation. 2148-2158 - Linfeng Tan
, Jiangtong Li
, Li Niu
, Liqing Zhang
:
Deep Image Harmonization in Dual Color Spaces. 2159-2167 - Wenyu Zhang
, Xin Deng
, Baojun Jia
, Xingtong Yu
, Yifan Chen
, Jin Ma
, Qing Ding
, Xinming Zhang
:
Pixel Adapter: A Graph-Based Post-Processing Approach for Scene Text Image Super-Resolution. 2168-2179 - Yanqi Bao
, Yuxin Li
, Jing Huo
, Tianyu Ding
, Xinyue Liang
, Wenbin Li
, Yang Gao
:
Where and How: Mitigating Confusion in Neural Radiance Fields from Sparse Inputs. 2180-2188 - Hang Guo
, Tao Dai
, Mingyan Zhu
, Guanghao Meng
, Bin Chen
, Zhi Wang
, Shu-Tao Xia
:
One-stage Low-resolution Text Recognition with High-resolution Knowledge Transfer. 2189-2198 - Muxin Liao
, Shishun Tian
, Yuhang Zhang
, Guoguang Hua
, Wenbin Zou
, Xia Li
:
Calibration-based Dual Prototypical Contrastive Learning Approach for Domain Generalization Semantic Segmentation. 2199-2210 - Wentian Xin
, Qiguang Miao
, Yi Liu
, Ruyi Liu
, Chi-Man Pun
, Cheng Shi
:
Skeleton MixFormer: Multivariate Topology Representation for Skeleton-based Action Recognition. 2211-2220 - Xiaojie Li
, Shaowei He
, Jianlong Wu
, Yue Yu
, Liqiang Nie
, Min Zhang
:
Mask Again: Masked Knowledge Distillation for Masked Video Modeling. 2221-2232 - Mingxuan Zhang
, Xiao Wu
, Zhaoquan Yuan
, Qi He
, Xiang Huang
:
Human-Object-Object Interaction: Towards Human-Centric Complex Interaction Detection. 2233-2242 - Yilun Zhang
, Yuqian Fu
, Xingjun Ma
, Lizhe Qi
, Jingjing Chen
, Zuxuan Wu
, Yu-Gang Jiang
:
On the Importance of Spatial Relations for Few-shot Action Recognition. 2243-2251 - Jiarui Yu
, Haoran Li
, Yanbin Hao
, Bin Zhu
, Tong Xu
, Xiangnan He
:
CgT-GAN: CLIP-guided Text GAN for Image Captioning. 2252-2263 - Xiaojie Li
, Jianlong Wu
, Shaowei He
, Shuo Kang
, Yue Yu
, Liqiang Nie
, Min Zhang
:
Fine-grained Key-Value Memory Enhanced Predictor for Video Representation Learning. 2264-2274 - Ziyang Gong
, Fuhao Li
, Yupeng Deng
, Wenjun Shen
, Xianzheng Ma
, Zhenming Ji
, Nan Xia
:
Train One, Generalize to All: Generalizable Semantic Segmentation from Single-Scene to All Adverse Scenes. 2275-2284 - Cheng Zhang
, Yu Zhu
, Qingsen Yan
, Jinqiu Sun
, Yanning Zhang
:
All-in-one Multi-degradation Image Restoration Network via Hierarchical Degradation Representation. 2285-2293 - Ziyu Yang
, Sucheng Ren
, Zongwei Wu
, Nanxuan Zhao
, Junle Wang
, Jing Qin
, Shengfeng He
:
NPF-200: A Multi-Modal Eye Fixation Dataset and Method for Non-Photorealistic Videos. 2294-2304 - Zengbin Wang
, Saihui Hou
, Man Zhang
, Xu Liu
, Chunshui Cao
, Yongzhen Huang
, Shibiao Xu
:
LandmarkGait: Intrinsic Human Parsing for Gait Recognition. 2305-2314 - Wenjia Ren
, Qingmin Liao
, Zhijing Shao
, Xiangru Lin
, Xin Yue
, Yu Zhang
, Zongqing Lu
:
Patchmatch Stereo++: Patchmatch Binocular Stereo with Continuous Disparity Optimization. 2315-2325 - Rui Wang
, Cong Zou
, Weizhong Zhang
, Zixuan Zhu
, Lihua Jing
:
Consistency-aware Feature Learning for Hierarchical Fine-grained Visual Classification. 2326-2334 - Jun Yu
, Peng He
, Ziqi Peng
:
FSR-Net: Deep Fourier Network for Shadow Removal. 2335-2343 - Tianwei Yu
, Peng Chen
, Yuanjie Dang
, Ruohong Huan
, Ronghua Liang
:
Multi-Speed Global Contextual Subspace Matching for Few-Shot Action Recognition. 2344-2352 - Haonan Wang
, Jie Liu
, Jie Tang
, Gangshan Wu
:
Lightweight Super-Resolution Head for Human Pose Estimation. 2353-2361 - Yunkee Chae
, Junghyun Koo
, Sungho Lee
, Kyogu Lee
:
Exploiting Time-Frequency Conformers for Music Audio Enhancement. 2362-2370 - Jiaming Liu
, Yue Wu
, Maoguo Gong
, Qiguang Miao
, Wenping Ma
, Cai Xu
:
Exploring Dual Representations in Large-Scale Point Clouds: A Simple Weakly Supervised Semantic Segmentation Framework. 2371-2380 - Keke Chen
, Xiangbo Shu
, Guo-Sen Xie
, Rui Yan
, Jinhui Tang
:
Foreground/Background-Masked Interaction Learning for Spatio-temporal Action Detection. 2381-2390 - Xin Wang
, Benyuan Meng
, Hong Chen
, Yuan Meng
, Ke Lv
, Wenwu Zhu
:
TIVA-KG: A Multimodal Knowledge Graph with Text, Image, Video and Audio. 2391-2399 - Wanqing Zhao
, Yuta Nakashima
, Haiyuan Chen
, Noboru Babaguchi
:
Enhancing Fake News Detection in Social Media via Label Propagation on Cross-modal Tweet Graph. 2400-2408 - Xingxing Yang
, Jie Chen
, Zaifeng Yang
:
Cooperative Colorization: Exploring Latent Cross-Domain Priors for NIR Image Spectrum Translation. 2409-2417 - Yihao Huang
, Liangru Sun
, Qing Guo
, Felix Juefei-Xu
, Jiayi Zhu
, Jincao Feng
, Yang Liu
, Geguang Pu
:
ALA: Naturalness-aware Adversarial Lightness Attack. 2418-2426 - Liya Ji
, Chan Ho Park
, Zhefan Rao
, Qifeng Chen
:
Neural Image Popularity Assessment with Retrieval-augmented Transformer. 2427-2436 - Yanchao Liu
, Xina Cheng
, Takeshi Ikenaga
:
A Figure Skating Jumping Dataset for Replay-Guided Action Quality Assessment. 2437-2445 - Yeying Jin
, Beibei Lin
, Wending Yan
, Yuan Yuan
, Wei Ye
, Robby T. Tan
:
Enhancing Visibility in Nighttime Haze Images Using Guided APSF and Gradient Adaptive Convolution. 2446-2457 - Xiang Li
, Yandong Wen
, Muqiao Yang
, Jinglu Wang
, Rita Singh
, Bhiksha Raj
:
Rethinking Voice-Face Correlation: A Geometry View. 2458-2467 - Baiang Li
, Huan Zheng
, Zhao Zhang
, Yang Zhao
, Zhongqiu Zhao
, Haijun Zhang
:
Dynamic Grouped Interaction Network for Low-Light Stereo Image Enhancement. 2468-2476 - Jiafu Wu
, Jian Li
, Jiangning Zhang
, Boshen Zhang
, Mingmin Chi
, Yabiao Wang
, Chengjie Wang
:
PVG: Progressive Vision Graph for Vision Recognition. 2477-2486 - Chenyi Zhuang
, Pan Gao
, Aljosa Smolic
:
StylePrompter: All Styles Need Is Attention. 2487-2497 - Pengling Zhang
, Huibin Yan
, Wenhui Wu
, Shuoyao Wang
:
Improving Federated Person Re-Identification through Feature-Aware Proximity and Aggregation. 2498-2506 - Xizhe Xue
, Dongdong Yu
, Lingqiao Liu
, Yu Liu
, Satoshi Tsutsui
, Ying Li
, Zehuan Yuan
, Ping Song
, Mike Zheng Shou
:
Transformer-based Open-world Instance Segmentation with Cross-task Consistency Regularization. 2507-2515 - Dongliang Zhu
, Ruimin Hu
, Shengli Song
, Xiang Guo
, Xixi Li
, Zheng Wang
:
Cross-Illumination Video Anomaly Detection Benchmark. 2516-2525 - Yuanbin Fu
, Xiaojie Guo
:
Practical Edge Detection via Robust Collaborative Learning. 2526-2534 - Haoyi Xiu
, Xin Liu
, Weimin Wang
, Kyoung-Sook Kim
, Masashi Matsuoka
:
MSECNet: Accurate and Robust Normal Estimation for 3D Point Clouds by Multi-Scale Edge Conditioning. 2535-2543 - Xiao Liu
, Xiuya Shi
, Lufei Chen
, Linbo Qing
, Chao Ren
:
Efficient Parallel Multi-Scale Detail and Semantic Encoding Network for Lightweight Semantic Segmentation. 2544-2552 - Jiquan Zhong
, Xiaolin Huang
, Xiao Yu
:
Multi-Frame Self-Supervised Depth Estimation with Multi-Scale Feature Fusion in Dynamic Scenes. 2553-2563 - Yudong Mao
, Peilin Chen
, Shurun Wang
, Shiqi Wang
, Dapeng Wu
:
Peering into The Sketch: Ultra-Low Bitrate Face Compression for Joint Human and Machine Perception. 2564-2572 - Xiaodong Jin
, Taiping Zhang
:
MTSN: Multiscale Temporal Similarity Network for Temporal Action Localization. 2573-2581 - Guanzhou Ke
, Yang Yu
, Guoqing Chao
, Xiaoli Wang
, Chenyang Xu
, Shengfeng He
:
Disentangling Multi-view Representations Beyond Inductive Bias. 2582-2590 - Lei Zhao
, Le Han
, Min Yao
, Nenggan Zheng
:
Implicit Decouple Network for Efficient Pose Estimation. 2591-2599 - Zhenjie Chen
, Hongsong Wang
, Jie Gui
:
Occluded Skeleton-Based Human Action Recognition with Dual Inhibition Training. 2625-2634 - Xujie Kang
, Kanglin Liu
, Jiang Duan
, Yuanhao Gong
, Guoping Qiu
:
P2I-NET: Mapping Camera Pose to Image via Adversarial Learning for New View Synthesis in Real Indoor Environments. 2635-2643 - Wenpeng Xing
, Jie Chen
, Ka Chun Cheung
, Simon See
:
IRCasTRF: Inverse Rendering by Optimizing Cascaded Tensorial Radiance Fields, Lighting, and Materials From Multi-view Images. 2644-2653 - Zhiqi Yu
, Jingjing Li
, Zhekai Du
, Fengling Li
, Lei Zhu
, Yang Yang
:
Noise-Robust Continual Test-Time Domain Adaptation. 2654-2662 - Zeyu Wang
, Fabien Colonnier
, Jinghong Zheng
, Jyotibdha Acharya
, Wenyu Jiang
, Kejie Huang
:
TIRDet: Mono-Modality Thermal InfraRed Object Detection Based on Prior Thermal-To-Visible Translation. 2663-2672 - Junzhe Cai
, Shuiyan Chen
, Heng Li
, Beihao Xia
, Zimin Mao
, Wei Yuan
:
HARP: Let Object Detector Undergo Hyperplasia to Counter Adversarial Patches. 2673-2683 - Lei Xu
, Rei Kawakami
, Nakamasa Inoue
:
Scale-space Tokenization for Improving the Robustness of Vision Transformers. 2684-2693 - Kosuke Mizufune
, Shunsuke Tanaka
, Toshihide Yukitake
, Tatsushi Matsubayashi
:
Margin MCC: Chance-Robust Metric for Video Boundary Detection with Allowed Margin. 2694-2703 - Liangchen Song
, Xuan Gong
, Helong Zhou
, Jiajie Chen
, Qian Zhang
, David S. Doermann
, Junsong Yuan
:
Exploring the Knowledge Transferred by Response-Based Teacher-Student Distillation. 2704-2713 - Feng Gao
, Jiaxu Leng
, Ji Gan
, Xinbo Gao
:
Selecting Learnable Training Samples is All DETRs Need in Crowded Pedestrian Detection. 2714-2722 - Qiankun Li
, Xiaolong Huang
, Zhifan Wan
, Lanqing Hu
, Shuzhe Wu
, Jie Zhang
, Shiguang Shan
, Zengfu Wang
:
Data-Efficient Masked Video Modeling for Self-supervised Action Recognition. 2723-2733 - Teng Fu
, Xiaocong Wang
, Haiyang Yu
, Ke Niu
, Bin Li
, Xiangyang Xue
:
DeNoising-MOT: Towards Multiple Object Tracking with Severe Occlusions. 2734-2743 - Peiran Xu
, Yadong Mu
:
Co-Salient Object Detection with Semantic-Level Consensus Extraction and Dispersion. 2744-2755 - Xuenan Xu
, Zhiling Zhang
, Zelin Zhou
, Pingyue Zhang
, Zeyu Xie
, Mengyue Wu
, Kenny Q. Zhu
:
BLAT: Bootstrapping Language-Audio Pre-training based on AudioSet Tag-guided Synthetic Data. 2756-2764 - Bingyang Wang
, Tanlin Li
, Jiannan Wu
, Yi Jiang
, Huchuan Lu
, You He
:
A Simple Baseline for Open-World Tracking via Self-training. 2765-2774 - Yuxuan Zhao
, Jin Ma
, Zhongang Qi
, Zehua Xie
, Yu Luo
, Qiusheng Kang
, Ying Shan
:
VTLayout: A Multi-Modal Approach for Video Text Layout. 2775-2784 - Rajat Hebbar
, Digbalay Bose
, Shrikanth Narayanan
:
SEAR: Semantically-grounded Audio Representations. 2785-2794 - Zongyuan Yang
, Baolin Liu
, Yongping Xiong
, Lan Yi
, Guibin Wu
, Xiaojun Tang
, Ziqi Liu
, Junjie Zhou
, Xing Zhang
:
DocDiff: Document Enhancement via Residual Diffusion Models. 2795-2806 - Boshen Xu
, Sipeng Zheng
, Qin Jin
:
POV: Prompt-Oriented View-Agnostic Learning for Egocentric Hand-Object Interaction in the Multi-view World. 2807-2816 - Sihan Ma
, Qiong Cao
, Hongwei Yi
, Jing Zhang
, Dacheng Tao
:
GraMMaR: Ground-aware Motion Model for 3D Human Motion Reconstruction. 2817-2828 - Hui Lu
, Xixin Wu
, Zhiyong Wu
, Helen Meng
:
SpeechTripleNet: End-to-End Disentangled Speech Representation Learning for Content, Timbre and Prosody. 2829-2837 - Xiaohan Wang
, Yuehu Liu
, Xinhang Song
, Beibei Wang
, Shuqiang Jiang
:
Generating Explanations for Embodied Action Decision from Visual Observation. 2838-2846 - Jieteng Yao
, Junjie Chen
, Li Niu
, Bin Sheng
:
Scene-aware Human Pose Generation using Transformer. 2847-2855 - Wanying Zhang
, Shen Zhao
, Fanyang Meng
, Songtao Wu
, Mengyuan Liu
:
Dynamic Compositional Graph Convolutional Network for Efficient Composite Human Motion Prediction. 2856-2864 - Jiaqi Li
, Yiran Wang
, Zihao Huang
, Jinghong Zheng
, Ke Xian
, Zhiguo Cao
, Jianming Zhang
:
Diffusion-Augmented Depth Prediction with Sparse Annotations. 2865-2876 - Chunwei Wu
, Guitao Cao
, Yan Li
, Xidong Xi
, Wenming Cao
, Hong Wang
:
Chaos to Order: A Label Propagation Perspective on Source-Free Domain Adaptation. 2877-2887 - Lianggangxu Chen
, Jiale Lu
, Youqi Song
, Changbo Wang
, Gaoqi He
:
Beware of Overcorrection: Scene-induced Commonsense Graph for Scene Graph Generation. 2888-2897 - Haiyang Yu
, Xiaocong Wang
, Ke Niu
, Bin Li
, Xiangyang Xue
:
Scene Text Segmentation with Text-Focused Transformers. 2898-2907 - Liangwei Jiang
, Jiaxin Chen
, Di Huang
, Yunhong Wang
:
MIEP: Channel Pruning with Multi-granular Importance Estimation for Object Detection. 2908-2917
Poster Session II: Understanding Multimedia Content -- Multimodal Fusion and Embedding
- Shanshan Wang
, Yiyang Chen
, Zhenwei He
, Xun Yang
, Mengzhu Wang
, Quanzeng You
, Xingyi Zhang
:
Disentangled Representation Learning with Causality for Unsupervised Domain Adaptation. 2918-2926 - Jie Wen
, Gehui Xu
, Chengliang Liu
, Lunke Fei
, Chao Huang
, Wei Wang
, Yong Xu
:
Localized and Balanced Efficient Incomplete Multi-view Clustering. 2927-2935 - Mengzhu Wang
, Junyang Chen
, Huan Wang
, Huisi Wu
, Zhidan Liu
, Qin Zhang
:
Interpolation Normalization for Contrast Domain Generalization. 2936-2945 - Yujing Liu
, Zongqian Wu
, Zhengyu Lu
, Guoqiu Wen
, Junbo Ma
, Guangquan Lu
, Xiaofeng Zhu
:
Multi-teacher Self-training for Semi-supervised Node Classification with Noisy Labels. 2946-2954 - Liang Yang
, Jiayi Wang
, Tingting Zhang
, Dongxiao He
, Chuan Wang
, Yuanfang Guo
, Xiaochun Cao
, Bingxin Niu
, Zhen Wang
:
Long Short-Term Graph Memory Against Class-imbalanced Over-smoothing. 2955-2963 - Zitan Chen
, Zhuang Qi
, Xiao Cao
, Xiangxian Li
, Xiangxu Meng
, Lei Meng
:
Class-level Structural Relation Modeling and Smoothing for Visual Representation Learning. 2964-2972 - Shengkai Sun
, Daizong Liu
, Jianfeng Dong
, Xiaoye Qu
, Junyu Gao
, Xun Yang
, Xun Wang
, Meng Wang
:
Unified Multi-modal Unsupervised Representation Learning for Skeleton-based Action Understanding. 2973-2984 - Pan Mu
, Zhiying Du
, Jinyuan Liu
, Cong Bai
:
Little Strokes Fell Great Oaks: Boosting the Hierarchical Features for Multi-exposure Image Fusion. 2985-2993 - Jing Wang
, Songhe Feng
, Gengyu Lyu
, Zhibin Gu
:
Triple-Granularity Contrastive Learning for Deep Multi-View Subspace Clustering. 2994-3002 - Zhao Su
, Yong Yang
, Shuying Huang
, Weiguo Wan
, Wei Tu
, Hangyuan Lu
, Changjie Chen
:
CTCP: Cross Transformer and CNN for Pansharpening. 3003-3011 - Yonghua Zhu
, Zhenyun Deng
, Yang Chen
, Robert Amor
, Michael Witbrock
:
Chain of Propagation Prompting for Node Classification. 3012-3020 - Yi Wen
, Suyuan Liu
, Xinhang Wan
, Siwei Wang
, Ke Liang
, Xinwang Liu
, Xihong Yang
, Pei Zhang
:
Efficient Multi-View Graph Clustering with Local and Global Structure Preservation. 3021-3030 - Yi Wen
, Siwei Wang
, Ke Liang
, Weixuan Liang
, Xinhang Wan
, Xinwang Liu
, Suyuan Liu
, Jiyuan Liu
, En Zhu
:
Scalable Incomplete Multi-View Clustering with Structure Alignment. 3031-3040 - Yi Bin
, Haoxuan Li
, Yahui Xu
, Xing Xu
, Yang Yang
, Heng Tao Shen
:
Unifying Two-Stream Encoders with Transformers for Cross-Modal Retrieval. 3041-3050 - Cai Xu
, Zehui Li
, Ziyu Guan
, Wei Zhao
, Xiangyu Song
, Yue Wu
, Jianxin Li
:
Unbalanced Multi-view Deep Learning. 3051-3059 - Shuping Zhao
, Lunke Fei
, Jie Wen
, Bob Zhang
, Pengyang Zhao
:
Incomplete Multi-View Clustering with Regularized Hierarchical Graph. 3060-3068 - Man-Sheng Chen
, Jia-Qi Lin
, Chang-Dong Wang
, Wudong Xi
, Dong Huang
:
On Regularizing Multiple Clusterings for Ensemble Clustering by Graph Tensor Learning. 3069-3077 - Guixu Lin
, Jin Han
, Mingdeng Cao
, Zhihang Zhong
, Yinqiang Zheng
:
Event-guided Frame Interpolation and Dynamic Range Expansion of Single Rolling Shutter Image. 3078-3088 - Peng Zhou
, Liang Du
:
Learnable Graph Filter for Multi-view Clustering. 3089-3098 - Zhuang Qi
, Lei Meng
, Zitan Chen
, Han Hu
, Hui Lin
, Xiangxu Meng
:
Cross-Silo Prototypical Calibration for Federated Learning with Non-IID Data. 3099-3107 - Hai Zhou
, Zhe Xue
, Ying Liu
, Boang Li
, Junping Du
, Meiyu Liang
, Yuankai Qi
:
CALM: An Enhanced Encoding and Confidence Evaluating Framework for Trustworthy Multi-view Learning. 3108-3116 - Houlun Chen
, Xin Wang
, Xiaohan Lan
, Hong Chen
, Xuguang Duan
, Jia Jia
, Wenwu Zhu
:
Curriculum-Listener: Consistency- and Complementarity-Aware Audio-Enhanced Temporal Sentence Grounding. 3117-3128 - Lei Liu
, Chenglong Li
, Yun Xiao
, Jin Tang
:
Quality-Aware RGBT Tracking via Supervised Reliability Learning and Weighted Residual Guidance. 3129-3137 - Yang Wang
, Bo Dong
, Yuji Zhang
, Yunduo Zhou
, Haiyang Mei
, Ziqi Wei
, Xin Yang
:
Event-Enhanced Multi-Modal Spiking Neural Network for Dynamic Obstacle Avoidance. 3138-3148 - Yujun Ma
, Benjia Zhou
, Ruili Wang
, Pichao Wang
:
Multi-stage Factorized Spatio-Temporal Representation for RGB-D Action and Gesture Recognition. 3149-3160 - Peng Zhao
, Qiangchang Wang
, Yilong Yin
:
M3R: Masked Token Mixup and Cross-Modal Reconstruction for Zero-Shot Learning. 3161-3171 - Yicong Li
, Xun Yang
, An Zhang
, Chun Feng
, Xiang Wang
, Tat-Seng Chua
:
Redundancy-aware Transformer for Video Question Answering. 3172-3180 - Wanting Yin
, Hongtao Xie
, Lei Zhang
, Jiannan Ge
, Pandeng Li
, Chuanbin Liu
, Yongdong Zhang
:
Frequency-based Zero-Shot Learning with Phase Augmentation. 3181-3189 - Shiyuan Yang
, Xiaodong Chen
, Jing Liao
:
Uni-paint: A Unified Framework for Multimodal Image Inpainting with Pretrained Diffusion Model. 3190-3199 - Fangjian Lin
, Jianlong Yuan
, Sitong Wu
, Fan Wang
, Zhibin Wang
:
UniNeXt: Exploring A Unified Architecture for Vision Recognition. 3200-3208 - Junjie Wu
, Chen Gong
, Ziqiang Cao
, Guohong Fu
:
MCG-MNER: A Multi-Granularity Cross-Modality Generative Framework for Multimodal NER with Instruction. 3209-3218 - Siran Peng
, Chenhao Guo
, Xiao Wu
, Liang-Jian Deng
:
U2Net: A General Framework with Spatial-Spectral-Integrated Double U-Net for Image Fusion. 3219-3227 - Yansheng Qiu
, Ziyuan Zhao
, Hongdou Yao
, Delin Chen
, Zheng Wang
:
Modal-aware Visual Prompting for Incomplete Multi-modal Brain Tumor Segmentation. 3228-3239 - Hui Tang
, Xun Liang
:
Where to Find Fascinating Inter-Graph Supervision: Imbalanced Graph Classification with Kernel Information Bottleneck. 3240-3249 - Wuyuan Xie
, Kaimin Wang
, Yakun Ju
, Miaohui Wang
:
pmBQA: Projection-based Blind Point Cloud Quality Assessment via Multimodal Learning. 3250-3258 - Zihao Zhang
, Qianqian Wang
, Zhiqiang Tao
, Quanxue Gao
, Wei Feng
:
Dropping Pathways Towards Deep Multi-View Graph Subspace Clustering Networks. 3259-3267 - Penglei Wang
, Danyang Wu
, Rong Wang
, Feiping Nie
:
Multi-view Graph Clustering via Efficient Global-Local Spectral Embedding Fusion. 3268-3276 - Hao Wang
, Zhi-Qi Cheng
, Jingdong Sun
, Xin Yang
, Xiao Wu
, Hongyang Chen
, Yan Yang
:
Debunking Free Fusion Myth: Online Multi-view Anomaly Detection with Disentangled Product-of-Experts Modeling. 3277-3286 - Yunlong Lin
, Zhenqi Fu
, Ge Meng
, Yingying Wang
, Yuhang Dong
, Linyu Fan
, Hedeng Yu
, Xinghao Ding
:
Domain-irrelevant Feature Learning for Generalizable Pan-sharpening. 3287-3296 - Qingwei Wang
, Jinyu Yang
, Xiaosheng Yu
, Fangyi Wang
, Peng Chen
, Feng Zheng
:
Depth-aided Camouflaged Object Detection. 3297-3306 - Wei Ji
, Jingjing Li
, Cheng Bian
, Zhicheng Zhang
, Li Cheng
:
SemanticRT: A Large-Scale Dataset and Method for Robust Semantic Segmentation in Multispectral Images. 3307-3316 - Zhuo Chen
, Jiaoyan Chen
, Wen Zhang
, Lingbing Guo
, Yin Fang
, Yufeng Huang
, Yichi Zhang
, Yuxia Geng
, Jeff Z. Pan
, Wenting Song
, Huajun Chen
:
MEAformer: Multi-modal Entity Alignment Transformer for Meta Modality Hybrid. 3317-3327 - Jiayi Zhang
, Weixin Li
:
Multi-Modal and Multi-Scale Temporal Fusion Architecture Search for Audio-Visual Video Parsing. 3328-3336 - Jiaqi Li
, Guilin Qi
, Chuanyi Zhang
, Yongrui Chen
, Yiming Tan
, Chenlong Xia
, Ye Tian
:
Incorporating Domain Knowledge Graph into Multimodal Movie Genre Classification with Self-Supervised Attention and Contrastive Learning. 3337-3345 - Yong Yang
, Mengzhen Li
, Shuying Huang
, Hangyuan Lu
, Wei Tu
, Weiguo Wan
:
Multi-scale Spatial-Spectral Attention Guided Fusion Network for Pansharpening. 3346-3354 - Xuehao Wang
, Shuai Li
, Chenglizhao Chen
, Aimin Hao
, Hong Qin
:
Modality Profile - A New Critical Aspect to be Considered When Generating RGB-D Salient Object Detection Training Set. 3355-3364 - Meng Liu
, Ke Liang
, Dayu Hu
, Hao Yu
, Yue Liu
, Lingyuan Meng
, Wenxuan Tu
, Sihang Zhou
, Xinwang Liu
:
TMac: Temporal Multi-Modal Graph Learning for Acoustic Event Classification. 3365-3374 - Mufeng Yao
, Jiaqi Wang
, Jinlong Peng
, Mingmin Chi
, Chao Liu
:
FOLT: Fast Multiple Object Tracking from UAV-captured Videos Based on Optical Flow. 3375-3383 - Zihan Li
, Yuan Zheng
, Xiangde Luo
, Dandan Shan
, Qingqi Hong
:
ScribbleVC: Scribble-supervised Medical Image Segmentation with Vision-Class Embedding. 3384-3393 - Jiaqing Fan
, Tiankang Su
, Kaihua Zhang
, Bo Liu
, Qingshan Liu
:
Temporally Efficient Gabor Transformer for Unsupervised Video Object Segmentation. 3394-3402 - Haowei Wang
, Jiji Tang
, Jiayi Ji
, Xiaoshuai Sun
, Rongsheng Zhang
, Yiwei Ma
, Minda Zhao
, Lincheng Li
, Zeng Zhao
, Tangjie Lv
, Rongrong Ji
:
Beyond First Impressions: Integrating Joint Multi-modal Cues for Comprehensive 3D Representation. 3403-3414 - Kongming Liang
, Xinran Wang
, Haiwen Zhang
, Zhanyu Ma
, Jun Guo
:
Hierarchical Visual Attribute Learning in the Wild. 3415-3423 - Qiang Zhang
, Jiawei Liu
, Fanrui Zhang
, Jingyi Xie
, Zheng-Jun Zha
:
Hierarchical Semantic Enhancement Network for Multimodal Fake News Detection. 3424-3433 - Meng Shen
, Yizheng Huang
, Jianxiong Yin
, Heqing Zou
, Deepu Rajan
, Simon See
:
Towards Balanced Active Learning for Multimodal Classification. 3434-3445 - Shiping Ge
, Zhiwei Jiang
, Yafeng Yin
, Cong Wang
, Zifeng Cheng
, Qing Gu
:
Learning Event-Specific Localization Preferences for Audio-Visual Event Localization. 3446-3454 - Zongwei Wu
, Jingjing Wang
, Zhuyun Zhou
, Zhaochong An
, Qiuping Jiang
, Cédric Demonceaux
, Guolei Sun
, Radu Timofte
:
Object Segmentation by Mining Cross-Modal Semantics. 3455-3464 - Wenxin Ni
, Qianqian Xu
, Yangbangyan Jiang
, Zongsheng Cao
, Xiaochun Cao
, Qingming Huang
:
PSNEA: Pseudo-Siamese Network for Entity Alignment between Multi-modal Knowledge Graphs. 3489-3497 - Xinyue Chen
, Jie Xu
, Yazhou Ren
, Xiaorong Pu
, Ce Zhu
, Xiaofeng Zhu
, Zhifeng Hao
, Lifang He
:
Federated Deep Multi-View Clustering with Global Self-Supervision. 3498-3506 - Sung Jin Um
, Dongjin Kim
, Jung Uk Kim
:
Audio-Visual Spatial Integration and Recursive Attention for Robust Sound Source Localization. 3507-3516 - Fangming Zhong
, Chenglong Chu
, Zijie Zhu
, Zhikui Chen
:
Hypergraph-Enhanced Hashing for Unsupervised Cross-Modal Retrieval via Robust Similarity Guidance. 3517-3527 - Yue Liu
, Ke Liang
, Jun Xia
, Xihong Yang
, Sihang Zhou
, Meng Liu
, Xinwang Liu
, Stan Z. Li
:
Reinforcement Graph Clustering with Unknown Cluster Number. 3528-3537 - Jingyu Wu
, Shi Chen
, Shuyu Gan
, Weijun Li
, Changyuan Yang
, Lingyun Sun
:
Cultural Self-Adaptive Multimodal Gesture Generation Based on Multiple Culture Gesture Dataset. 3538-3549 - Xin Zou
, Chang Tang
, Xiao Zheng
, Zhenglai Li
, Xiao He
, Shan An
, Xinwang Liu
:
DPNET: Dynamic Poly-attention Network for Trustworthy Multi-modal Classification. 3550-3559 - Zhihao Zhang
, Yiwei Chen
, Weizhan Zhang
, Caixia Yan
, Qinghua Zheng
, Qi Wang
, Wangdu Chen
:
Tile Classification Based Viewport Prediction with Multi-modal Fusion Transformer. 3560-3568 - Jinda Lu
, Shuo Wang
, Xinyu Zhang
, Yanbin Hao
, Xiangnan He
:
Semantic-based Selection, Synthesis, and Supervision for Few-shot Learning. 3569-3578 - Jinyong Wen
, Shiming Xiang
, Chunhong Pan
:
Exploring Universal Principles for Graph Contrastive Learning: A Statistical Perspective. 3579-3589 - Deepanway Ghosal
, Navonil Majumder
, Ambuj Mehrish
, Soujanya Poria
:
Text-to-Audio Generation using Instruction Guided Latent Diffusion Model. 3590-3598 - Shangyu Xing
, Fei Zhao
, Zhen Wu
, Chunhui Li
, Jianbing Zhang
, Xinyu Dai
:
DRIN: Dynamic Relation Interactive Network for Multimodal Entity Linking. 3599-3608 - Shaokui Gu
, Xu Yuan
, Liang Zhao
, Zhenjiao Liu
, Yan Hu
, Zhikui Chen
:
MVCIR-net: Multi-view Clustering Information Reinforcement Network. 3609-3618 - Yixi Liu
, Yuze Tan
, Hongjie Wu
, Shudong Huang
, Yazhou Ren
, Jiancheng Lv
:
Preserving Local and Global Information: An Effective Metric-based Subspace Clustering. 3619-3627 - Jiaming Gu
, Jingyu Zhang
, Muyang Zhang
, Weiliang Meng
, Shibiao Xu
, Jiguang Zhang
, Xiaopeng Zhang
:
FeaCo: Reaching Robust Feature-Level Consensus in Noisy Pose Conditions. 3628-3636 - Masayasu Muraoka
, Bishwaranjan Bhattacharjee
, Michele Merler
, Graeme Blackwood
, Yulong Li
, Yang Zhao
:
Cross-Lingual Transfer of Large Language Model by Visually-Derived Supervision Toward Low-Resource Languages. 3637-3646 - Jingyang Yuan
, Xiao Luo
, Yifang Qin
, Zhengyang Mao
, Wei Ju
, Ming Zhang
:
ALEX: Towards Effective Graph Transfer Learning with Noisy Labels. 3647-3656 - Chenwei Zhang
, Yuxuan Hu
, Min Yang
, Chengming Li
, Xiping Hu
:
Skeletal Spatial-Temporal Semantics Guided Homogeneous-Heterogeneous Multimodal Network for Action Recognition. 3657-3666 - Zhong Chen
, Zhizhong Zhang
, Xin Tan
, Yanyun Qu
, Yuan Xie
:
Unveiling the Power of CLIP in Unsupervised Visible-Infrared Person Re-Identification. 3667-3675 - Haowen Wang
, Zhipeng Fan
, Zhen Zhao
, Zhengping Che
, Zhiyuan Xu
, Dong Liu
, Feifei Feng
, Yakun Huang
, Xiuquan Qiao
, Jian Tang
:
DTF-Net: Category-Level Pose Estimation and Shape Reconstruction via Deformable Template Field. 3676-3685 - Yuechen Wang
, Wengang Zhou
, Zhenbo Lu
, Houqiang Li
:
Text-Only Training for Visual Storytelling. 3686-3695 - Zihao Zhang
, Jie Wang
, Yahong Han
:
Saliency Prototype for RGB-D and RGB-T Salient Object Detection. 3696-3705 - Zhu Liu
, Jinyuan Liu
, Benzhuang Zhang
, Long Ma
, Xin Fan
, Risheng Liu
:
PAIF: Perception-Aware Infrared-Visible Image Fusion for Attack-Tolerant Semantic Segmentation. 3706-3714 - Baogui Xu
, Chengjin Xu
, Bing Su
:
Cross-Modal Graph Attention Network for Entity Alignment. 3715-3723 - Yuwei Zhou
, Xin Wang
, Hong Chen
, Xuguang Duan
, Wenwu Zhu
:
Intra- and Inter-Modal Curriculum for Multimodal Learning. 3724-3735 - Yaobin Zhang
, Jianming Lv
, Chen Liu
, Hongmin Cai
:
Graph based Spatial-temporal Fusion for Multi-modal Person Re-identification. 3736-3744 - Yuanbin Wang
, Shaofei Huang
, Yulu Gao
, Zhen Wang
, Rui Wang
, Kehua Sheng
, Bo Zhang
, Si Liu
:
Transferring CLIP's Knowledge into Zero-Shot Point Cloud Semantic Segmentation. 3745-3754 - Zhaojian Li
, Bin Zhao
, Yuan Yuan
:
Bio-Inspired Audiovisual Multi-Representation Integration via Self-Supervised Learning. 3755-3764 - Junyin Wang
, Chenghu Du
, Hui Li
, Shengwu Xiong
:
DLFusion: Painting-Depth Augmenting-LiDAR for Multimodal Fusion 3D Object Detection. 3765-3776 - Wenna Wang
, Tao Zhuo
, Xiuwei Zhang
, Mingjun Sun
, Hanlin Yin
, Yinghui Xing
, Yanning Zhang
:
Automatic Network Architecture Search for RGB-D Semantic Segmentation. 3777-3786 - Nuo Chen
, Jin Xie
, Jing Nie
, Jiale Cao
, Zhuang Shao
, Yanwei Pang
:
Attentive Alignment Network for Multispectral Pedestrian Detection. 3787-3795 - Dong Chen
, Siliang Tang
, Zijin Shen
, Guoming Wang
, Jun Xiao
, Yueting Zhuang
, Carl Yang
:
FedAA: Using Non-sensitive Modalities to Improve Federated Learning while Preserving Image Privacy. 3796-3806 - Mengze Li
, Haoyu Zhang
, Juncheng Li
, Zhou Zhao
, Wenqiao Zhang
, Shengyu Zhang
, Shiliang Pu
, Yueting Zhuang
, Fei Wu
:
Unsupervised Domain Adaptation for Video Object Grounding with Cascaded Debiasing Learning. 3807-3816 - Zhengyang Mao
, Wei Ju
, Yifang Qin
, Xiao Luo
, Ming Zhang
:
RAHNet: Retrieval Augmented Hybrid Network for Long-tailed Graph Classification. 3817-3826 - Youngjoon Jang
, Kyeongha Rho
, Jong-Bin Woo
, Hyeongkeun Lee
, Jihwan Park
, Youshin Lim
, Byeong-Yeol Kim
, Joon Son Chung
:
That's What I Said: Fully-Controllable Talking Face Generation. 3827-3836 - Quanmin Liang
, Xiawu Zheng
, Kai Huang
, Yan Zhang
, Jie Chen
, Yonghong Tian
:
Event-Diffusion: Event-Based Image Reconstruction and Restoration with Diffusion Models. 3837-3846 - Han Fang
, Zhifei Yang
, Xianghao Zang
, Chao Ban
, Zhongjiang He
, Hao Sun
, Lanxiang Zhou
:
Mask to Reconstruct: Cooperative Semantics Completion for Video-text Retrieval. 3847-3856 - Yixuan Ma
, Kun Zhan
:
Self-Contrastive Graph Diffusion Network. 3857-3865 - Yiyang Chen
, Shanshan Zhao
, Changxing Ding
, Liyao Tang
, Chaoyue Wang
, Dacheng Tao
:
Cross-modal & Cross-domain Learning for Unsupervised LiDAR Semantic Segmentation. 3866-3875 - Ren Wang
, Haoliang Sun
, Xiushan Nie
, Yuxiu Lin
, Xiaoming Xi
, Yilong Yin
:
Multi-View Representation Learning via View-Aware Modulation. 3876-3886 - Boxiang Yun
, Xingran Xie
, Qingli Li
, Yan Wang
:
Uni-Dual: A Generic Unified Dual-Task Medical Self-Supervised Learning Framework. 3887-3896 - Yifan Dong
, Suhang Wu
, Fandong Meng
, Jie Zhou
, Xiaoli Wang
, Jianxin Lin
, Jinsong Su
:
Towards Better Multi-modal Keyphrase Generation via Visual Entity Enhancement and Multi-granularity Image Noise Filtering. 3897-3907 - Shilong Li
, Boyu Qiao
, Kun Li
, Qianqian Lu
, Meng Lin
, Wei Zhou
:
Multi-modal Social Bot Detection: Learning Homophilic and Heterophilic Connections Adaptively. 3908-3916 - Weibing Zhao
, Haiming Zhang
, Chaoda Zheng
, Xu Yan
, Shuguang Cui
, Zhen Li
:
CPU: Codebook Lookup Transformer with Knowledge Distillation for Point Cloud Upsampling. 3917-3925 - Mohit Tomar
, Abhisek Tiwari
, Tulika Saha
, Sriparna Saha
:
Your tone speaks louder than your face! Modality Order Infused Multi-modal Sarcasm Detection. 3926-3933 - Jieming Wang
, Ziyan Li
, Jianfei Yu
, Li Yang
, Rui Xia
:
Fine-Grained Multimodal Named Entity Recognition and Grounding with a Generative Framework. 3934-3943 - Wei Liu
, Xinlei Yang
, Zhenhua Li
, Feng Qian
:
SkipStreaming: Pinpointing User-Perceived Redundancy in Correlated Web Video Streaming through the Lens of Scenes. 3944-3953 - Zhao Yang
, Bing Su
, Ji-Rong Wen
:
Synthesizing Long-Term Human Motions with Diffusion Models via Coherent Sampling. 3954-3964 - Haichao Zhang
, Yi Xu
, Hongsheng Lu
, Takayuki Shimizu
, Yun Fu
:
Layout Sequence Prediction From Noisy Mobile Modality. 3965-3974 - Chenyang Lyu, Wenxi Li
, Tianbo Ji
, Longyue Wang
, Liting Zhou, Cathal Gurrin
, Linyi Yang
, Yi Yu, Yvette Graham
, Jennifer Foster
:
Graph-Based Video-Language Learning with Multi-Grained Audio-Visual Alignment. 3975-3984 - Meng Liu
, Fenglei Zhang
, Xin Luo
, Fan Liu
, Yinwei Wei
, Liqiang Nie
:
Advancing Video Question Answering with a Multi-modal and Multi-layer Question Enhancement Network. 3985-3993 - Wenrui Li
, Xi-Le Zhao
, Zhengyu Ma
, Xingtao Wang
, Xiaopeng Fan
, Yonghong Tian
:
Motion-Decoupled Spiking Transformer for Audio-Visual Zero-Shot Learning. 3994-4002 - Qianru Qiu
, Xueting Wang
, Mayu Otani
:
Multimodal Color Recommendation in Vector Graphic Documents. 4003-4011 - Hengcan Shi
, Munawar Hayat
, Jianfei Cai
:
Open-Vocabulary Object Detection via Scene Graph Discovery. 4012-4021 - Jushuo Chen
, Feifei Dai
, Xiaoyan Gu
, Jiang Zhou
, Bo Li
, Weiping Wang
:
Universal Domain Adaptive Network Embedding for Node Classification. 4022-4030 - Chenyu Yang
, Mengxi Chen
, Yanfeng Wang
, Yu Wang
:
Uncertainty-Guided End-to-End Audio-Visual Speaker Diarization for Far-Field Recordings. 4031-4041 - Tianyu Liu
, Peng Zhang
, Wei Huang
, Yufei Zha
, Tao You
, Yanning Zhang
:
Induction Network: Audio-Visual Modality Gap-Bridging for Self-Supervised Sound Source Localization. 4042-4052 - Yuhuan Lu
, Bangchao Deng
, Weijian Yu
, Dingqi Yang
:
HELIOS: Hyper-Relational Schema Modeling from Knowledge Graphs. 4053-4064 - Zhongfan Sun
, Yongli Hu
, Qingqing Gao
, Huajie Jiang
, Junbin Gao
, Yanfeng Sun
, Baocai Yin
:
Breaking the Barrier Between Pre-training and Fine-tuning: A Hybrid Prompting Model for Knowledge-Based VQA. 4065-4073 - Ziteng Wen
, Hai Xu
, Chenyu Liu
, Tao Guo
, Jinshui Hu
, Xuming He
, Fengren Wang
, Shun Lou
, Haibo Fan
:
OccluBEV: Occlusion Aware Spatiotemporal Modeling for Multi-view 3D Object Detection. 4074-4083
Poster Session III: Understanding Multimedia Content -- Vision and Language
- Xingyu Shen
, Xiang Zhang
, Xun Yang
, Yibing Zhan
, Long Lan
, Jianfeng Dong
, Hongzhou Wu
:
Semantics-Enriched Cross-Modal Alignment for Complex-Query Video Moment Retrieval. 4109-4118 - Yun Liu
, Zhongsheng Yan
, Sixiang Chen
, Tian Ye
, Wenqi Ren
, Erkang Chen
:
NightHazeFormer: Single Nighttime Haze Removal Using Prior Query Transformer. 4119-4128 - Hua Li
, Junyan Liang
, Wenjie Li
, Wenhui Wu
:
FSNet: Frequency Domain Guided Superpixel Segmentation Network for Complex Scenes. 4129-4137 - Zhi Chen
, Peng-Fei Zhang
, Jingjing Li
, Sen Wang
, Zi Huang
:
Zero-Shot Learning by Harnessing Adversarial Samples. 4138-4146 - Tian Ye
, Sixiang Chen
, Yun Liu
, Wenhao Chai
, Jinbin Bai
, Wenbin Zou
, Yunchen Zhang
, Mingchao Jiang
, Erkang Chen
, Chenghao Xue
:
Sequential Affinity Learning for Video Restoration. 4147-4156 - Yiwei Ma
, Xiaoshuai Sun
, Jiayi Ji
, Guannan Jiang
, Weilin Zhuang
, Rongrong Ji
:
Beat: Bi-directional One-to-Many Embedding Alignment for Text-based Person Retrieval. 4157-4168 - Rui Xu
, Le Hui
, Yuehui Han
, Jianjun Qian
, Jin Xie:
Transformer-based Point Cloud Generation Network. 4169-4177 - Jun Guo
, Xingyu Zheng
, Aishan Liu
, Siyuan Liang
, Yisong Xiao
, Yichao Wu
, Xianglong Liu
:
Isolation and Induction: Training Robust Deep Neural Networks against Model Stealing Attacks. 4178-4189 - Daizong Liu
, Xiaoye Qu
, Jianfeng Dong
, Guoshun Nan
, Pan Zhou
, Zichuan Xu
, Lixing Chen
, He Yan
, Yu Cheng:
Filling the Information Gap between Video and Query for Language-Driven Moment Retrieval. 4190-4199 - Zhibo Tian
, Xiaolin Zhang
, Peng Zhang
, Kun Zhan
:
Improving Semi-Supervised Semantic Segmentation with Dual-Level Siamese Structure Network. 4200-4208 - Jiarui Yang
, Chuan Wang
, Zeming Liu
, Jiahong Wu
, Dongsheng Wang
, Liang Yang
, Xiaochun Cao
:
Focusing on Flexible Masks: A Novel Framework for Panoptic Scene Graph Generation with Relation Constraints. 4209-4218 - Chunyu Xie
, Heng Cai
, Jincheng Li
, Fanjing Kong
, Xiaoyu Wu
, Jianfei Song
, Henrique Morimitsu
, Lin Yao
, Dexin Wang
, Xiangzheng Zhang
, Dawei Leng
, Baochang Zhang
, Xiangyang Ji
, Yafeng Deng
:
CCMB: A Large-scale Chinese Cross-modal Benchmark. 4219-4227 - Sixiang Chen
, Tian Ye
, Yun Liu
, Jinbin Bai
, Haoyu Chen
, Yunlong Lin
, Jun Shi
, Erkang Chen
:
CPLFormer: Cross-scale Prototype Learning Transformer for Image Snow Removal. 4228-4239 - Xuan Yao
, Junyu Gao
, Mengyuan Chen
, Changsheng Xu
:
Video Entailment via Reaching a Structure-Aware Cross-modal Consensus. 4240-4249 - Cheng Chen
, Yunqing Chen
, Shuang Song
, Jianan Wang
, Huansheng Ning
, Ruoxiu Xiao
:
Cerebrovascular Segmentation in TOF-MRA with Topology Regularization Adversarial Model. 4250-4259 - Jiale Yu
, Baopeng Zhang
, Qirui Li
, Haoyang Chen
, Zhu Teng
:
Hierarchical Reasoning Network with Contrastive Learning for Few-Shot Human-Object Interaction Recognition. 4260-4268 - Sixiang Chen
, Tian Ye
, Chenghao Xue
, Haoyu Chen
, Yun Liu
, Erkang Chen
, Lei Zhu
:
Uncertainty-Driven Dynamic Degradation Perceiving and Background Modeling for Efficient Single Image Desnowing. 4269-4280 - Chenpeng Du
, Qi Chen
, Tianyu He
, Xu Tan
, Xie Chen
, Kai Yu
, Sheng Zhao
, Jiang Bian
:
DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder. 4281-4289 - Jiexin Wang
, Yujie Zhou, Wenwen Qiang
, Ying Ba
, Bing Su, Ji-Rong Wen
:
Spatio-Temporal Branching for Motion Prediction using Motion Increments. 4290-4299 - Zhenqian Wu
, Yazhou Ren
, Xiaorong Pu
, Zhifeng Hao
, Lifang He
:
Generative Neutral Features-Disentangled Learning for Facial Expression Recognition. 4300-4308 - Tingting Wang
, Yongxu Ye
, Faming Fang
, Guixu Zhang
, Ming Xu
:
Deep Algorithm Unrolling with Registration Embedding for Pansharpening. 4309-4318 - Huilin Zhu
, Jingling Yuan, Xian Zhong, Zhengwei Yang
, Zheng Wang, Shengfeng He
:
DAOT: Domain-Agnostically Aligned Optimal Transport for Domain-Adaptive Crowd Counting. 4319-4329 - Wei Ji
, Renjie Liang
, Lizi Liao
, Hao Fei
, Fuli Feng
:
Partial Annotation-based Video Moment Retrieval via Iterative Learning. 4330-4339 - Yirui Shen
, Jingxuan Kang
, Shuang Li
, Zhenjie Yu
, Shuigen Wang
:
Style Transfer Meets Super-Resolution: Advancing Unpaired Infrared-to-Visible Image Translation with Detail Enhancement. 4340-4348 - Chongyang Zhao
, Yuankai Qi
, Qi Wu
:
Mind the Gap: Improving Success Rate of Vision-and-Language Navigation by Revisiting Oracle Success Routes. 4349-4358 - Xinda Liu
, Yaohui Zhu
, Linhu Liu
, Jiang Tian
, Lili Wang
:
Feature-Suppressed Contrast for Self-Supervised Food Pre-training. 4359-4367 - Yuchen Zhou
, Guang Tan
, Mengtang Li
, Chao Gou
:
Learning from Easy to Hard Pairs: Multi-step Reasoning Network for Human-Object Interaction Detection. 4368-4377 - Chengyang Fang
, Jiangnan Li
, Liang Li
, Can Ma
, Dayong Hu
:
Separate and Locate: Rethink the Text in Text-based Visual Question Answering. 4378-4388 - Yunshi Lan
, Xiang Li
, Xin Liu
, Yang Li
, Wei Qin
, Weining Qian
:
Improving Zero-shot Visual Question Answering via Large Language Models with Reasoning Question Prompts. 4389-4400 - Jie Xu
, Shanshan Zhang
, Jian Yang
:
Adaptive Decoupled Pose Knowledge Distillation. 4401-4409 - Li Li
, Chenwei Wang
, You Qin
, Wei Ji
, Renjie Liang
:
Biased-Predicate Annotation Identification via Unbiased Visual Predicate Representation. 4410-4420 - Huan Liu
, Lu Zhang
, Jihong Guan
, Shuigeng Zhou
:
Zero-Shot Object Detection by Semantics-Aware DETR with Adaptive Contrastive Loss. 4421-4430 - Tao Jin
, Xize Cheng
, Linjun Li
, Wang Lin
, Ye Wang, Zhou Zhao
:
Rethinking Missing Modality Learning from a Decoding Perspective. 4431-4439 - Zhijin Ge
, Fanhua Shang
, Hongying Liu
, Yuanyuan Liu
, Liang Wan
, Wei Feng
, Xiaosen Wang
:
Improving the Transferability of Adversarial Examples with Arbitrary Style Transfer. 4440-4449 - Xin Wang
, Zihao Wu
, Hong Chen
, Xiaohan Lan
, Wenwu Zhu
:
Mixup-Augmented Temporally Debiased Video Grounding with Content-Location Disentanglement. 4450-4459 - Yaya Shi
, Haowei Liu
, Haiyang Xu
, Zongyang Ma
, Qinghao Ye
, Anwen Hu
, Ming Yan
, Ji Zhang
, Fei Huang
, Chunfeng Yuan
, Bing Li
, Weiming Hu
, Zheng-Jun Zha
:
Learning Semantics-Grounded Vocabulary Representation for Video-Text Retrieval. 4460-4470 - Jiawei Li
, Jiansheng Chen
, Jinyuan Liu
, Huimin Ma
:
Learning a Graph Neural Network with Cross Modality Interaction for Image Fusion. 4471-4479 - Chaoya Jiang
, Haiyang Xu
, Wei Ye
, Qinghao Ye
, Chenliang Li
, Ming Yan
, Bin Bi
, Shikun Zhang
, Fei Huang
, Ji Zhang
:
COPA: Efficient Vision-Language Pre-training through Collaborative Object- and Patch-Text Alignment. 4480-4491 - Shuyu Yang
, Yinan Zhou
, Zhedong Zheng
, Yaxiong Wang
, Li Zhu
, Yujiao Wu
:
Towards Unified Text-based Person Retrieval: A Large-scale Multi-Attribute and Language Search Benchmark. 4492-4501 - Shiwei Gan
, Yafeng Yin
, Zhiwei Jiang
, Lei Xie
, Sanglu Lu
:
Towards Real-Time Sign Language Recognition and Translation on Edge Devices. 4502-4512 - Qiwei Li
, Zuchao Li
, Xiantao Cai
, Bo Du
, Hai Zhao
:
Enhancing Visually-Rich Document Understanding via Layout Structure Modeling. 4513-4523 - Shaokun Wang
, Weiwei Shi
, Yuhang He
, Yifan Yu
, Yihong Gong
:
Non-Exemplar Class-Incremental Learning via Adaptive Old Class Reconstruction. 4524-4534 - Ruixiang Jiang
, Lingbo Liu
, Changwen Chen
:
CLIP-Count: Towards Text-Guided Zero-Shot Object Counting. 4535-4545 - Fuxiang Yang
, Tonghua Su
, Xiang Zhou
, Donglin Di
, Zhongjie Wang
, Songze Li
:
Self-Supervised Cross-Language Scene Text Editing. 4546-4554 - Feng Chen
, Jiajia Liu
, Kaixiang Ji
, Wang Ren
, Jian Wang
, Jingdong Chen
:
Learning Implicit Entity-object Relations by Bidirectional Generative Alignment for Multimodal NER. 4555-4563 - Liang He
, Hongke Wang
, Yongchang Cao
, Zhen Wu
, Jianbing Zhang
, Xinyu Dai
:
MORE: A Multimodal Object-Entity Relation Extraction Dataset with a Benchmark Evaluation. 4564-4573 - Ziyue Wu
, Junyu Gao
, Changsheng Xu
:
Weakly-supervised Video Scene Graph Generation via Unbiased Cross-modal Learning. 4574-4583 - Jiong Yin
, Liang Li
, Jiehua Zhang
, Chenggang Yan
, Lei Zhang
, Zunjie Zhu
:
Reducing Intrinsic and Extrinsic Data Biases for Moment Localization with Natural Language. 4584-4594 - Yaoming Wang
, Yuchen Liu
, Xiaopeng Zhang
, Jin Li
, Bowen Shi
, Chenglin Li
, Wenrui Dai
, Hongkai Xiong
, Qi Tian
:
VioLET: Vision-Language Efficient Tuning with Collaborative Multi-modal Gradients. 4595-4605 - Junyi Zeng
, Chong Bao
, Rui Chen
, Zilong Dong
, Guofeng Zhang
, Hujun Bao
, Zhaopeng Cui
:
Mirror-NeRF: Learning Neural Radiance Fields for Mirrors with Whitted-Style Ray Tracing. 4606-4615 - Hongbin Xu
, Weitao Chen
, Yang Liu
, Zhipeng Zhou
, Haihong Xiao
, Baigui Sun
, Xuansong Xie
, Wenxiong Kang
:
Semi-supervised Deep Multi-view Stereo. 4616-4625 - Chen Jiang
, Hong Liu
, Xuzheng Yu
, Qing Wang
, Yuan Cheng
, Jia Xu
, Zhongyi Liu
, Qingpei Guo
, Wei Chu
, Ming Yang
, Yuan Qi
:
Dual-Modal Attention-Enhanced Text-Video Retrieval with Triplet Partial Margin Contrastive Learning. 4626-4636 - Tian Gan
, Xiao Wang
, Yan Sun
, Jianlong Wu
, Qingpei Guo
, Liqiang Nie
:
Temporal Sentence Grounding in Streaming Videos. 4637-4646 - Decheng Liu
, Weizhao Yang
, Chunlei Peng
, Nannan Wang
, Ruimin Hu
, Xinbo Gao
:
Modality-agnostic Augmented Multi-Collaboration Representation for Semi-supervised Heterogenous Face Recognition. 4647-4656 - Yifan Li
, Yaochen Li
, Wenneng Tang
, Zhifeng Zhu
, Jinhuo Yang
, Yuehu Liu
:
Swin-UNIT: Transformer-based GAN for High-resolution Unpaired Image Translation. 4657-4665 - Xiaoxiong Du
, Jun Peng
, Yiyi Zhou
, Jinlu Zhang
, Siting Chen
, Guannan Jiang
, Xiaoshuai Sun
, Rongrong Ji
:
PixelFace+: Towards Controllable Face Generation and Manipulation with Text Descriptions and Segmentation Masks. 4666-4677 - Jingzheng Li
, Hailong Sun
:
LiFT: Transfer Learning in Vision-Language Models for Downstream Adaptation and Generalization. 4678-4687 - Manman Zhang
, Ge Luo
, Yuchen Ma
, Sheng Li
, Zhenxing Qian
, Xinpeng Zhang
:
VCMaster: Generating Diverse and Fluent Live Video Comments Based on Multimodal Contexts. 4688-4696 - Fulong Ye
, Yuxing Long
, Fangxiang Feng
, Xiaojie Wang
:
Whether you can locate or not? Interactive Referring Expression Generation. 4697-4706 - Yiming Li
, Xiaoshan Yang
, Changsheng Xu
:
Iterative Learning with Extra and Inner Knowledge for Long-tail Dynamic Scene Graph Generation. 4707-4715 - Jing Zhang
, Yingshuai Xie
, Xiaoqiang Liu
:
Improving Image Captioning through Visual and Semantic Mutual Promotion. 4716-4724 - Minghao Zhu
, Xiao Lin
, Ronghao Dang
, Chengju Liu
, Qijun Chen
:
Fine-Grained Spatiotemporal Motion Alignment for Contrastive Video Representation Learning. 4725-4736 - Zhuoling Li
, Yong Wang
:
Better Integrating Vision and Semantics for Improving Few-shot Classification. 4737-4746 - Mingrui Lao
, Nan Pu
, Yu Liu
, Zhun Zhong
, Erwin M. Bakker
, Nicu Sebe
, Michael S. Lew
:
Multi-Domain Lifelong Visual Question Answering via Self-Critical Distillation. 4747-4758 - Xue Song
, Jingjing Chen
, Yu-Gang Jiang
:
Relation Triplet Construction for Cross-modal Text-to-Video Retrieval. 4759-4767 - Shuyi Ouyang
, Hongyi Wang
, Ziwei Niu
, Zhenjia Bai
, Shiao Xie
, Yingying Xu
, Ruofeng Tong
, Yen-Wei Chen
, Lanfen Lin
:
HSVLT: Hierarchical Scale-Aware Vision-Language Transformer for Multi-Label Image Classification. 4768-4777 - Haonan Zhang
, Lianli Gao
, Pengpeng Zeng
, Alan Hanjalic
, Heng Tao Shen
:
Depth-Aware Sparse Transformer for Video-Language Learning. 4778-4787 - Chuanpeng Yang
, Fuqing Zhu
, Jizhong Han
, Songlin Hu
:
Invariant Meets Specific: A Scalable Harmful Memes Detection Framework. 4788-4797 - Wuyuan Xie
, Miaohui Wang
:
A Method of Micro-Geometric Details Preserving in Surface Reconstruction from Gradient. 4798-4806 - Wenhui Li
, Yan Wang
, Yuting Su
, Lanjun Wang
, Weizhi Nie
, An-An Liu
:
Progressive Positive Association Framework for Image and Text Retrieval. 4807-4815 - Fangzheng Tian
, Sungchan Kim
:
Globally-Robust Instance Identification and Locally-Accurate Keypoint Alignment for Multi-Person Pose Estimation. 4816-4827 - Kun Zhang
, Lei Zhang
, Bo Hu
, Mengxiao Zhu
, Zhendong Mao
:
Unlocking the Power of Cross-Dimensional Semantic Dependency for Image-Text Matching. 4828-4837 - Zhiqing Chen
, Yawei Luo
, Jian Shao
, Yi Yang, Chunping Wang, Lei Chen
, Jun Xiao
:
Dark Knowledge Balance Learning for Unbiased Scene Graph Generation. 4838-4847 - Yanbiao Ma
, Licheng Jiao
, Fang Liu
, Shuyuan Yang
, Xu Liu
, Lingling Li
:
Orthogonal Uncertainty Representation of Data Manifold for Robust Long-Tailed Learning. 4848-4857 - Rundong He
, Rongxue Li
, Zhongyi Han
, Xihong Yang
, Yilong Yin
:
Topological Structure Learning for Weakly-Supervised Out-of-Distribution Detection. 4858-4866 - Weikang Wang
, Jing Liu
, Yuting Su
, Weizhi Nie
:
Efficient Spatio-Temporal Video Grounding with Semantic-Guided Feature Decomposition. 4867-4876 - Jiale Lu
, Lianggangxu Chen
, Youqi Song
, Shaohui Lin
, Changbo Wang
, Gaoqi He
:
Prior Knowledge-driven Dynamic Scene Graph Generation with Causal Inference. 4877-4885 - Junwen Chen
, Jie Zhu
, Yu Kong
:
ATM: Action Temporality Modeling for Video Question Answering. 4886-4895 - Shaoxiang Guo
, Qing Cai
, Lin Qi
, Junyu Dong
:
CLIP-Hand3D: Exploiting 3D Hand Pose Estimation via Context-Aware Prompting. 4896-4907 - Ying Yang
, Mulin Chen
, Xuelong Li
:
A Multitask Framework for Graffiti-to-Image Translation. 4908-4916 - Zihao Wang
, Weichen Zhang
, Weihong Bao
, Fei Long
, Chun Yuan
:
Adaptive Contrastive Learning for Learning Robust Representations under Label Noise. 4917-4927 - Yunyi Xuan
, Weijie Chen
, Shicai Yang
, Di Xie
, Luojun Lin
, Yueting Zhuang
:
Distilling Vision-Language Foundation Models: A Data-Free Approach via Prompt Diversification. 4928-4938 - Yanzhe Chen
, Huasong Zhong
, Xiangteng He
, Yuxin Peng
, Lele Cheng
:
Real20M: A Large-scale E-commerce Dataset for Cross-domain Retrieval. 4939-4948 - Dongsheng Xu
, Wenye Zhao
, Yi Cai
, Qingbao Huang
:
Zero-TextCap: Zero-shot Framework for Text-based Image Captioning. 4949-4957 - Zhaoxin Wang
, Handing Wang
, Cong Tian
, Yaochu Jin
:
Adversarial Training of Deep Neural Networks Guided by Texture and Structural Information. 4958-4967 - Xu Gu
, Yuchong Sun
, Feiyue Ni
, Shizhe Chen
, Xihua Wang
, Ruihua Song
, Boyuan Li
, Xiang Cao
:
TeViS: Translating Text Synopses to Video Storyboards. 4968-4979 - Nan Xi
, Jingjing Meng
, Junsong Yuan
:
Chain-of-Look Prompting for Verb-centric Surgical Triplet Recognition in Endoscopic Videos. 5007-5016 - Wencan Huang
, Daizong Liu
, Wei Hu
:
Dense Object Grounding in 3D Scenes. 5017-5026 - Xiaoxuan He
, Siming Fu
, Xinpeng Ding
, Yuchen Cao
, Hualiang Wang
:
Uniformly Distributed Category Prototype-Guided Vision-Language Framework for Long-Tail Recognition. 5027-5037 - Kanzhi Cheng
, Wenpo Song
, Zheng Ma
, Wenhao Zhu
, Zixuan Zhu
, Jianbing Zhang
:
Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model. 5038-5047 - Yue Wang
, Jinlong Peng
, Jiangning Zhang
, Ran Yi
, Liang Liu
, Yabiao Wang
, Chengjie Wang
:
Toward High Quality Facial Representation Learning. 5048-5058 - Zikai Gao
, Peng Qiao
, Yong Dou
:
HAAN: Human Action Aware Network for Multi-label Temporal Action Detection. 5059-5069 - Baoli Sun
, Xinchen Ye
, Zhihui Wang
, Haojie Li
, Zhiyong Wang
:
Exploring Coarse-to-Fine Action Token Localization and Interaction for Fine-grained Video Action Recognition. 5070-5078 - Zhe Wang
, Jiaoyan Guan
, Mengping Yang
, Ting Xiao
, Ziqiu Chi
:
Semantic-Aware Generator and Low-level Feature Augmentation for Few-shot Image Generation. 5079-5088 - Bowen Yuan
, Sisi You
, Bing-Kun Bao
:
Self-PT: Adaptive Self-Prompt Tuning for Low-Resource Visual Question Answering. 5089-5098 - Ping Wang
, Xin Yuan
:
SAUNet: Spatial-Attention Unfolding Network for Image Compressive Sensing. 5099-5108 - Lin Deng
, Yuzhong Zhong
, Maoning Wang
, Jianwei Zhang
:
CONICA: A Contrastive Image Captioning Framework with Robust Similarity Learning. 5109-5119 - Zikang Liu
, Sihan Chen
, Longteng Guo
, Handong Li
, Xingjian He
, Jing Liu
:
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner. 5120-5131 - Jiali Chen
, Zhenjun Guo
, Jiayuan Xie
, Yi Cai
, Qing Li
:
Deconfounded Visual Question Generation with Causal Inference. 5132-5142 - Jing Zhao
, Heliang Zheng
, Chaoyue Wang
, Long Lan
, Wanrong Huang
, Wenjing Yang
:
Null-text Guidance in Diffusion Models is Secretly a Cartoon-style Creator. 5143-5152 - Wenqing Wang
, Kaifeng Gao
, Yawei Luo
, Tao Jiang
, Fei Gao
, Jian Shao
, Jianwen Sun
, Jun Xiao
:
Triple Correlations-Guided Label Supplementation for Unbiased Video Scene Graph Generation. 5153-5163 - Shuo Yang
, Zirui Shang
, Xinxiao Wu
:
Probability Distribution Based Frame-supervised Language-driven Action Localization. 5164-5173 - Yaoyuan Liang
, Zhao Yang
, Yansong Tang
, Jiashuo Fan
, Ziran Li
, Jingang Wang
, Philip H. S. Torr
, Shao-Lun Huang
:
LUNA: Language as Continuing Anchors for Referring Expression Comprehension. 5174-5184 - Xuming Hu
, Junzhe Chen
, Aiwei Liu
, Shiao Meng
, Lijie Wen
, Philip S. Yu
:
Prompt Me Up: Unleashing the Power of Alignments for Multimodal Entity and Relation Extraction. 5185-5194 - Xiao Liang
, Di Wang
, Quan Wang
, Bo Wan
, Lingling An
, Lihuo He
:
Language-Guided Visual Aggregation Network for Video Question Answering. 5195-5203 - Jue Chen
, Huan Yuan
, Jianchao Tan
, Bin Chen
, Chengru Song
, Di Zhang
:
Resource Constrained Model Compression via Minimax Optimization for Spiking Neural Networks. 5204-5213 - Huimin Huang
, Yawen Huang
, Shiao Xie
, Lanfen Lin
, Ruofeng Tong, Yen-Wei Chen
, Yuexiang Li
, Yefeng Zheng
:
Semi-Supervised Convolutional Vision Transformer with Bi-Level Uncertainty Estimation for Medical Image Segmentation. 5214-5222 - Qian Yang
, Qian Chen
, Wen Wang
, Baotian Hu
, Min Zhang
:
Enhancing Multi-modal Multi-hop Question Answering via Structured Knowledge and Unified Retrieval-Generation. 5223-5234 - Linbo Wang
, Jing Wu
, Xianyong Fang
, Zhengyi Liu
, Chenjie Cao
, Yanwei Fu
:
Local Consensus Enhanced Siamese Network with Reciprocal Loss for Two-view Correspondence Learning. 5235-5243 - Rui Cao
, Ming Shan Hee
, Adriel Kuek
, Wen-Haw Chong
, Roy Ka-Wei Lee
, Jing Jiang
:
Pro-Cap: Leveraging a Frozen Vision-Language Model for Hateful Meme Detection. 5244-5252 - Tiantian Gong
, Guodong Du
, Junsheng Wang
, Yongkang Ding
, Liyan Zhang
:
Prototype-guided Cross-modal Completion and Alignment for Incomplete Text-based Person Re-identification. 5253-5261 - Yuanhao Zhai
, Mingzhen Huang
, Tianyu Luan
, Lu Dong
, Ifeoma Nwogu
, Siwei Lyu
, David S. Doermann
, Junsong Yuan
:
Language-guided Human Motion Synthesis with Atomic Actions. 5262-5271 - Yuan Zhang
, Weihua Chen
, Yichen Lu
, Tao Huang
, Xiuyu Sun
, Jian Cao
:
Avatar Knowledge Distillation: Self-ensemble Teacher Paradigm with Uncertainty. 5272-5280 - Yu Zhao
, Hao Fei
, Yixin Cao
, Bobo Li
, Meishan Zhang
, Jianguo Wei
, Min Zhang
, Tat-Seng Chua
:
Constructing Holistic Spatio-Temporal Scene Graph for Video Semantic Role Labeling. 5281-5291 - Ziqiao Peng
, Yihao Luo
, Yue Shi
, Hao Xu
, Xiangyu Zhu
, Hongyan Liu
, Jun He
, Zhaoxin Fan
:
SelfTalk: A Self-Supervised Commutative Training Diagram to Comprehend 3D Talking Faces. 5292-5301 - Yujie Zhou
, Wenwen Qiang
, Anyi Rao
, Ning Lin
, Bing Su
, Jiaqi Wang
:
Zero-shot Skeleton-based Action Recognition via Mutual Information Estimation and Maximization. 5302-5310 - Guojin Zhong
, Jin Yuan
, Pan Wang
, Kailun Yang
, Weili Guan
, Zhiyong Li
:
Contrast-augmented Diffusion Model with Fine-grained Sequence Alignment for Markup-to-Image Generation. 5311-5320 - Jiafeng Mao
, Xueting Wang
, Kiyoharu Aizawa
:
Guided Image Synthesis via Initial Image Editing in Diffusion Model. 5321-5329 - Song Yang
, Qiang Li
, Wenhui Li
, Min Liu
, Xuanya Li
, Anan Liu
:
External Knowledge Dynamic Modeling for Image-text Retrieval. 5330-5338 - Qiang Wang
, Junlong Du
, Ke Yan
, Shouhong Ding
:
Seeing in Flowing: Adapting CLIP for Action Recognition with Motion Prompts Learning. 5339-5347 - Zhou Zhou
, Jiahao Chao
, Jiali Gong
, Hongfan Gao
, Zhenbing Zeng
, Zhengfeng Yang
:
Enhancing Real-Time Super Resolution with Partial Convolution and Efficient Variance Attention. 5348-5357 - Binyi Su
, Hua Zhang
, Zhong Zhou
:
HSIC-based Moving Weight Averaging for Few-Shot Open-Set Object Detection. 5358-5369 - Zhihong Chen
, Zilei Wang
, Yixin Zhang
:
Exploiting Low-confidence Pseudo-labels for Source-free Object Detection. 5370-5379 - Runnan Chen
, Xinge Zhu
, Nenglun Chen
, Wei Li
, Yuexin Ma
, Ruigang Yang
, Wenping Wang
:
Bridging Language and Geometric Primitives for Zero-shot Point Cloud Segmentation. 5380-5388 - Yuehui Han
, Jiaxin Chen
, Jianjun Qian
, Jin Xie:
Graph Spectral Perturbation for 3D Point Cloud Contrastive Learning. 5389-5398 - Jiahua Rao
, Zifei Shan
, Longpo Liu
, Yao Zhou
, Yuedong Yang
:
Retrieval-based Knowledge Augmented Vision Language Pre-training. 5399-5409 - Yulin Jin
, Xiaoyu Zhang
, Jian Lou
, Xiaofeng Chen
:
ACQ: Few-shot Backdoor Defense via Activation Clipping and Quantizing. 5410-5418 - Yi Tang
, Hiroshi Kawasaki
, Takafumi Iwaguchi
:
Underwater Image Enhancement by Transformer-based Diffusion Model with Non-uniform Sampling for Skip Strategy. 5419-5427 - Zhenghan Chen
, Changzeng Fu
, Ruoxue Wu
, Ye Wang
, Xunzhu Tang, Xiaoxuan Liang
:
LGFat-RGCN: Faster Attention with Heterogeneous RGCN for Medical ICD Coding Generation. 5428-5435 - Jianlong Yuan
, Jinchao Ge
, Zhibin Wang
, Yifan Liu
:
Semi-supervised Semantic Segmentation with Mutual Knowledge Distillation. 5436-5444 - Tao Niu
, Yihang Lou
, Yinglei Teng
, Jianzhong He
, Yiding Liu
:
Shift Pruning: Equivalent Weight Pruning for CNN via Differentiable Shift Operator. 5445-5454 - Shuman Fang
, Shuai Liu
, Jie Li
, Guannan Jiang
, Xianming Lin
, Rongrong Ji
:
Improving Human-Object Interaction Detection via Virtual Image Learning. 5455-5463 - Bo Zhang
, Jian Wang
, Hui Ma
, Bo Xu
, Hongfei Lin
:
ZRIGF: An Innovative Multimodal Framework for Zero-Resource Image-Grounded Dialogue Generation. 5464-5473 - Borui Jiang
, Yadong Mu
:
Diffused Fourier Network for Video Action Segmentation. 5474-5483 - Rui Xu
, Yong Luo
, Han Hu
, Bo Du
, Jialie Shen
, Yonggang Wen
:
Rethinking the Localization in Weakly Supervised Object Localization. 5484-5494 - Jiahua Xiao
, Yantao Ji
, Xing Wei
:
Hyperspectral Image Denoising with Spectrum Alignment. 5495-5503 - Zilin Du
, Yunxin Li
, Xu Guo
, Yidan Sun
, Boyang Li
:
Training Multimedia Event Extraction With Generated Images and Captions. 5504-5513 - Xixi Nie
, Bo Hu
, Xinbo Gao
, Leida Li
, Xiaodan Zhang
, Bin Xiao
:
BMI-Net: A Brain-inspired Multimodal Interaction Network for Image Aesthetic Assessment. 5514-5522 - Sindhu B. Hegde
, Rudrabha Mukhopadhyay
, C. V. Jawahar
, Vinay P. Namboodiri
:
Towards Accurate Lip-to-Speech Synthesis in-the-Wild. 5523-5531 - Yicheng Song
, Shuyong Gao
, Haozhe Xing
, Yiting Cheng
, Yan Wang
, Wenqiang Zhang
:
Towards End-to-End Unsupervised Saliency Detection with Self-Supervised Top-Down Context. 5532-5541 - Hanbing Liu
, Jun-Yan He
, Zhi-Qi Cheng
, Wangmeng Xiang
, Qize Yang
, Wenhao Chai
, Gaoang Wang
, Xu Bao
, Bin Luo
, Yifeng Geng
, Xuansong Xie:
PoSynDA: Multi-Hypothesis Pose Synthesis Domain Adaptation for Robust 3D Human Pose Estimation. 5542-5551 - Chunhui Zhang
, Xin Sun
, Yiqian Yang
, Li Liu
, Qiong Liu
, Xi Zhou
, Yanfeng Wang
:
All in One: Exploring Unified Vision-Language Tracking with Multi-Modal Alignment. 5552-5561 - Nan Li
, Pijian Li
, Dongsheng Xu
, Wenye Zhao
, Yi Cai
, Qingbao Huang
:
Scene-text Oriented Visual Entailment: Task, Dataset and Solution. 5562-5571 - Songhe Deng
, Wei Zhuo
, Jinheng Xie
, Linlin Shen
:
QA-CLIMS: Question-Answer Cross Language Image Matching for Weakly Supervised Semantic Segmentation. 5572-5583 - Lele Lv
, Qing Liu
, Shichao Kan
, Yixiong Liang
:
Confidence-Aware Contrastive Learning for Semantic Segmentation. 5584-5593 - Ao Wang
, Hui Chen
, Zijia Lin
, Zixuan Ding
, Pengzhang Liu
, Yongjun Bao
, Weipeng Yan
, Guiguang Ding
:
Hierarchical Prompt Learning Using CLIP for Multi-label Classification with Single Positive Labels. 5594-5604 - Wenrui Li
, Zhengyu Ma
, Liang-Jian Deng
, Penghong Wang
, Jinqiao Shi
, Xiaopeng Fan
:
Reservoir Computing Transformer for Image-Text Retrieval. 5605-5613 - Gege Qi
, Yuefeng Chen
, Xiaofeng Mao
, Binyuan Hui
, Xiaodan Li
, Rong Zhang
, Hui Xue
:
Model Inversion Attack via Dynamic Memory Learning. 5614-5622 - Zhiming Hu
, Angela Ning Ye
, Salar Hosseini Khorasgani
, Iqbal Mohomed
:
AdaCLIP: Towards Pragmatic Multimodal Video Retrieval. 5623-5633 - Zhenyang Li
, Yangyang Guo
, Kejie Wang
, Xiaolin Chen
, Liqiang Nie
, Mohan S. Kankanhalli
:
Do Vision-Language Transformers Exhibit Visual Commonsense? An Empirical Study of VCR. 5634-5644 - Keyu Tu
, Zilei Wang
, Junjie Li
, Yixin Zhang
:
Semi-supervised Domain Adaptation via Joint Contrastive Learning with Sensitivity. 5645-5654 - Xinzi Cao
, Xiawu Zheng
, Yunhang Shen
, Ke Li
, Jie Chen
, Yutong Lu
, Yonghong Tian
:
LocLoc: Low-level Cues and Local-area Guides for Weakly Supervised Object Localization. 5655-5664 - Cong-Duy Nguyen
, The-Anh Vu-Le
, Thong Nguyen
, Tho Quan, Anh Tuan Luu
:
Expand BERT Representation with Visual Information via Grounded Language Learning with Multimodal Partial Alignment. 5665-5673 - Zheng Ma
, Mianzhi Pan
, Wenhan Wu
, Kanzhi Cheng
, Jianbing Zhang
, Shujian Huang
, Jiajun Chen
:
Food-500 Cap: A Fine-Grained Food Caption Benchmark for Evaluating Vision-Language Models. 5674-5685 - Zhe Li
, Laurence T. Yang
, Xin Nie
, Bocheng Ren
, Xianjun Deng
:
Enhancing Sentence Representation with Visually-supervised Multimodal Pre-training. 5686-5695 - Longzheng Wang
, Chuang Zhang
, Hongbo Xu
, Yongxiu Xu
, Xiaohan Xu
, Siqi Wang
:
Cross-modal Contrastive Learning for Multimodal Fake News Detection. 5696-5704 - Dingyi Yang
, Hongyu Chen
, Xinglin Hou
, Tiezheng Ge
, Yuning Jiang
, Qin Jin
:
Visual Captioning at Will: Describing Images and Videos Guided by a Few Stylized Sentences. 5705-5715 - Yinuo Jing
, Chunyu Wang
, Ruxu Zhang
, Kongming Liang
, Zhanyu Ma
:
Category-Specific Prompts for Animal Action Recognition with Pretrained Vision-Language Models. 5716-5724 - Rui Xu
, Le Hui
, Yuehui Han
, Jianjun Qian
, Jin Xie:
Scene Graph Masked Variational Autoencoders for 3D Scene Generation. 5725-5733 - Shuo Huang
, Zongxin Yang
, Liangting Li
, Yi Yang, Jia Jia
:
AvatarFusion: Zero-shot Generation of Clothing-Decoupled 3D Avatars Using 2D Diffusion. 5734-5745 - Xu Bao
, Zhi-Qi Cheng
, Jun-Yan He
, Wangmeng Xiang
, Chenyang Li
, Jingdong Sun
, Hanbing Liu
, Wei Liu
, Bin Luo
, Yifeng Geng
, Xuansong Xie:
KeyPosS: Plug-and-Play Facial Landmark Detection through GPS-Inspired True-Range Multilateration. 5746-5755 - Zhiyu Jin
, Hanyang Yu
, Chen Haul
, Linxiang Wang
, Zuobin Zhu
, Qiu Shen
, Xun Cao
:
WormTrack: Dataset and Benchmark for Multi-Object Tracking in Worm Crowds. 5756-5763 - Jinglei Zhang
, Tiancheng Lin
, Yi Xu, Kai Chen, Rui Zhang:
Relational Contrastive Learning for Scene Text Recognition. 5764-5775 - Yiting Liu
, Liang Li
, Beichen Zhang
, Shan Huang
, Zheng-Jun Zha
, Qingming Huang
:
MaTCR: Modality-Aligned Thought Chain Reasoning for Multimodal Task-Oriented Dialogue Generation. 5776-5785 - Xiaoyu Li
, Xiaoxue Chen
, Zuming Huang
, Lele Xie
, Jingdong Chen
, Ming Yang
:
Fine-grained Pseudo Labels for Scene Text Recognition. 5786-5795 - Jiachen Sun
, Mark Ibrahim
, Melissa Hall
, Ivan Evtimov
, Z. Morley Mao
, Cristian Canton-Ferrer
, Caner Hazirbas
:
VPA: Fully Test-Time Visual Prompt Adaptation. 5796-5806 - Haonan Shi
, Wenwen Pan
, Zhou Zhao
, Mingmin Zhang
, Fei Wu
:
Unsupervised Domain Adaptation for Referring Semantic Segmentation. 5807-5818 - Guangming Shi
, Xuyang Li
, Xuemei Xie
, Mingxuan Yu
, Chengwei Rao
, Jiakai Luo
:
OCSKB: An Object Component Sketch Knowledge Base for Fast 6D Pose Estimation. 5819-5827 - Hongbo Sun
, Xiangteng He
, Jiahuan Zhou
, Yuxin Peng
:
Fine-Grained Visual Prompt Learning of Vision-Language Models for Image Recognition. 5828-5836
Poster Session IV: Engaging Users with Multimedia -- Emotional and Social Signals
- Teng Sun, Juntong Ni, Wenjie Wang, Liqiang Jing, Yinwei Wei, Liqiang Nie:
General Debiasing for Multimodal Sentiment Analysis. 5861-5869 - Tuukka Ruotsalo
, Kalle Mäkelä
, Michiel M. A. Spapé
, Luis A. Leiva
:
Feeling Positive? Predicting Emotional Image Similarity from Brain Signals. 5870-5878 - Tongjie Pan
, Yalan Ye
, Hecheng Cai
, Shudong Huang
, Yang Yang
, Guoqing Wang
:
Multimodal Physiological Signals Fusion for Online Emotion Recognition. 5879-5888 - Hanwei Liu
, Huiling Cai
, Qingcheng Lin
, Xuefeng Li
, Hui Xiao
:
Learning from More: Combating Uncertainty Cross-multidomain for Facial Expression Recognition. 5889-5898 - Yizhuo Lu
, Changde Du
, Qiongyi Zhou
, Dianpeng Wang
, Huiguang He
:
MindDiffuser: Controlled Image Reconstruction from Human Brain Activity with Semantic and Structural Diffusion. 5899-5908 - Jaeho Yoon
, Jaewoo Park
, Kensuke Wagata
, Hojin Park
, Andrew Beng Jin Teoh
:
Pretrained Implicit-Ensemble Transformer for Open-Set Authentication on Multimodal Mobile Biometrics. 5909-5922 - Bobo Li
, Hao Fei
, Lizi Liao
, Yu Zhao
, Chong Teng
, Tat-Seng Chua
, Donghong Ji
, Fei Li
:
Revisiting Disentanglement and Fusion on Modality and Context in Conversational Multimodal Emotion Recognition. 5923-5934 - Yiwei Ru
, Peipei Li
, Muyi Sun
, Yunlong Wang, Kunbo Zhang
, Qi Li
, Zhaofeng He
, Zhenan Sun
:
Sensing Micro-Motion Human Patterns using Multimodal mmRadar and Video Signal for Affective and Psychological Intelligence. 5935-5946 - Yunxiao Wang
, Meng Liu
, Zhe Li
, Yupeng Hu
, Xin Luo
, Liqiang Nie
:
Unlocking the Power of Multimodal Learning for Emotion Recognition in Conversation. 5947-5955 - Jiaxin Ye
, Yujie Wei
, Xin-Cheng Wen
, Chenglong Ma
, Zhizhong Huang
, Kunhong Liu
, Hongming Shan
:
Emo-DNA: Emotion Decoupling and Alignment Learning for Cross-Corpus Speech Emotion Recognition. 5956-5965 - Yuchen Liu
, Haoyu Zhang
, Shichao Liu
, Xiang Yin
, Zejun Ma
, Qin Jin
:
Emotionally Situated Text-to-Speech Synthesis in User-Agent Conversation. 5966-5974 - Wei-Bang Jiang
, Xuan-Hao Liu
, Wei-Long Zheng
, Bao-Liang Lu
:
Multimodal Adaptive Emotion Transformer with Flexible Modality Inputs on A Novel Dataset with Continuous Labels. 5975-5984 - Ming Jin
, Jinpeng Li
:
Graph to Grid: Learning Deep Representations for Multimodal Emotion Recognition. 5985-5993 - Shihao Zou
, Xianying Huang
, Xudong Shen
:
Multimodal Prompt Transformer with Hybrid Contrastive Learning for Emotion Recognition in Conversation. 5994-6003 - Minh Tran
, Yelin Kim
, Che-Chun Su
, Cheng-Hao Kuo
, Mohammad Soleymani
:
SAAML: A Framework for Semi-supervised Affective Adaptation via Metric Learning. 6004-6015 - Tian-Yu Xiang
, Xiao-Hu Zhou
, Xiao-Liang Xie
, Shi-Qi Liu
, Hong-Jun Yang
, Zhen-Qiu Feng
, Mei-Jiang Gui
, Hao Li
, De-Xing Huang
, Zeng-Guang Hou
:
Learning Shared Semantic Information from Multimodal Bio-signals for Brain-Muscle Modulation Analysis. 6016-6024 - Xiaoyu Chen
, Changde Du
, Qiongyi Zhou
, Huiguang He
:
Auditory Attention Decoding with Task-Related Multi-View Contrastive Learning. 6025-6033 - Jicai Pan
, Shangfei Wang
:
Progressive Visual Content Understanding Network for Image Emotion Classification. 6034-6044 - Xiaocui Yang
, Shi Feng
, Daling Wang
, Yifei Zhang
, Soujanya Poria
:
Few-shot Multimodal Sentiment Analysis Based on Multimodal Probabilistic Fusion Prompts. 6045-6053 - Zhouan Zhu
, Chenguang Li
, Jicai Pan
, Xin Li
, Yufei Xiao
, Yanan Chang
, Feiyi Zheng
, Shangfei Wang
:
MEDIC: A Multimodal Empathy Dataset in Counseling. 6054-6062 - Yan Li
, Liang Zhang
, Xiangyuan Lan
, Dongmei Jiang
:
Towards Adaptable Graph Representation Learning: An Adaptive Multi-Graph Contrastive Transformer. 6063-6071 - Luojun Lin
, Zhifeng Shen
, Jia-Li Yin
, Qipeng Liu
, Yuanlong Yu
, Weijie Chen
:
MetaFBP: Learning to Learn High-Order Predictor for Personalized Facial Beauty Prediction. 6072-6080 - Yayue Deng
, Jinlong Xue
, Fengping Wang
, Yingming Gao
, Ya Li
:
CMCU-CSS: Enhancing Naturalness via Commonsense-based Multi-modal Context Understanding in Conversational Speech Synthesis. 6081-6089 - Sidharth Anand
, Naresh Kumar Devulapally
, Sreyasee Das Bhattacharjee
, Junsong Yuan
:
Multi-label Emotion Analysis in Conversation via Multimodal Knowledge Distillation. 6090-6100 - Zhihe Zhao, Dongdong Weng, Hanzhi Guo, Jing Hou, Jixiang Zhou:
Facial Auto Rigging from 4D Expressions via Skinning Decomposition. 6101-6109 - Licai Sun
, Zheng Lian
, Bin Liu
, Jianhua Tao
:
MAE-DFER: Efficient Masked Autoencoder for Self-supervised Dynamic Facial Expression Recognition. 6110-6121 - Yucheng Liu
, Ziyu Jia
, Haichao Wang
:
EmotionKD: A Cross-Modal Knowledge Distillation Framework for Emotion Recognition Based on Physiological Signals. 6122-6131 - Zaijing Li
, Ting-En Lin
, Yuchuan Wu
, Meng Liu
, Fengxiao Tang
, Ming Zhao
, Yongbin Li
:
UniSA: Unified Generative Framework for Sentiment Analysis. 6132-6142 - Yi Wu
, Shangfei Wang
, Yanan Chang
:
Patch-Aware Representation Learning for Facial Expression Recognition. 6143-6151 - Hao Yu
, Danielle A. Allessio
, Will Lee
, William Rebelsky
, Frank Sylvia
, Tom Murray
, John J. Magee
, Ivon Arroyo
, Beverly P. Woolf
, Sarah Adel Bargal
, Margrit Betke
:
COVES: A Cognitive-Affective Deep Model that Personalizes Math Problem Difficulty in Real Time and Improves Student Engagement with an Online Tutor. 6152-6160 - Ravikiran Parameshwara
, Ibrahim Radwan
, Akshay Asthana
, Iman Abbasnejad
, Ramanathan Subramanian
, Roland Goecke
:
Efficient Labelling of Affective Video Datasets via Few-Shot & Multi-Task Contrastive Learning. 6161-6170
Poster Session V: Engaging Users with Multimedia -- Multimedia Search and Recommendation
- Tianyu Chang
, Xun Yang
, Xin Luo
, Wei Ji
, Meng Wang
:
Learning Style-Invariant Robust Representation for Generalizable Visual Instance Retrieval. 6171-6180 - Xiyue Gao
, Zhuoqi Ma
, Jiangtao Cui
, Xiaofang Xia
, Cai Xu
:
Hierarchical Category-Enhanced Prototype Learning for Imbalanced Temporal Recommendation. 6181-6189 - Zhongxuan Han
, Chaochao Chen
, Xiaolin Zheng
, Weiming Liu
, Jun Wang
, Wenjie Cheng
, Yuyuan Li
:
In-processing User Constrained Dominant Sets for User-Oriented Fairness in Recommender Systems. 6190-6201 - Shuanglin Yan
, Neng Dong
, Jun Liu
, Liyan Zhang
, Jinhui Tang
:
Learning Comprehensive Representations with Richer Self for Text-to-Image Person Re-Identification. 6202-6211 - Huafeng Liu
, Mingjie Zhou
, Liping Jing
, Michael K. Ng
:
Doubly Intention Learning for Cold-start Recommendation with Uncertainty-aware Stochastic Meta Process. 6212-6222 - Hongru Liang
, Jingyao Liu
, Yuanxin Xiang
, Jiachen Du
, Lanjun Zhou
, Shushen Pan
, Wenqiang Lei
:
DiVa: An Iterative Framework to Harvest More Diverse and Valid Labels from User Comments for Music. 6223-6233 - Zhenghong Lin
, Yanchao Tan
, Yunfei Zhan
, Weiming Liu
, Fan Wang
, Chaochao Chen
, Shiping Wang
, Carl Yang
:
Contrastive Intra- and Inter-Modality Generation for Enhancing Incomplete Multimedia Recommendation. 6234-6242 - Weiming Liu
, Xiaolin Zheng
, Chaochao Chen
, Mengling Hu
, Xinting Liao
, Fan Wang
, Yanchao Tan
, Dan Meng
, Jun Wang
:
Differentially Private Sparse Mapping for Privacy-Preserving Cross Domain Recommendation. 6243-6252 - Zexian Yang
, Dayan Wu
, Wanqian Zhang
, Bo Li
, Weiping Wang
:
Handling Label Uncertainty for Camera Incremental Person Re-Identification. 6253-6263 - Wenhui Li
, Xinqi Su
, Dan Song
, Lanjun Wang
, Kun Zhang
, An-An Liu
:
Towards Deconfounded Image-Text Matching with Causal Inference. 6264-6273 - Yu Shang
, Chen Gao
, Jiansheng Chen
, Depeng Jin
, Huimin Ma
, Yong Li
:
Enhancing Adversarial Robustness of Multi-modal Recommendation via Modality Balancing. 6274-6282 - Yang Zhang, Songhe Feng
:
Enhancing Domain-Invariant Parts for Generalized Zero-Shot Learning. 6283-6291 - Shenshen Li
, Xing Xu
, Yang Yang
, Fumin Shen
, Yijun Mo
, Yujie Li
, Heng Tao Shen
:
DCEL: Deep Cross-modal Evidential Learning for Text-Based Person Retrieval. 6292-6300 - Ying Li
, Chunming Guan
, Rui Cai
, Ye Erwan
, Ding Yuxiang
, Jiaquan Gao
:
Tran-GCN: Multi-label Pattern Image Retrieval via Transformer Driven Graph Convolutional Network. 6301-6310 - Ziqi Zhou
, Shengshan Hu
, Minghui Li
, Hangtao Zhang
, Yechao Zhang
, Hai Jin
:
AdvCLIP: Downstream-agnostic Adversarial Examples in Multimodal Contrastive Learning. 6311-6320 - Jiajie Su
, Chaochao Chen
, Zibin Lin
, Xi Li
, Weiming Liu
, Xiaolin Zheng
:
Personalized Behavior-Aware Transformer for Multi-Behavior Sequential Recommendation. 6321-6331 - Jinhu Lu
, Guohao Sun
, Xiu Fang
, Jian Yang
, Wei He
:
A Contrastive Learning Framework for Dual-Target Cross-Domain Recommendation. 6332-6339 - Chong-Yu Zhang
, Xin Luo
, Yu-Wei Zhan
, Peng-Fei Zhang
, Zhen-Duo Chen
, Yongxin Wang
, Xun Yang
, Xin-Shun Xu
:
Self-Distillation Dual-Memory Online Hashing with Hash Centers for Streaming Data Retrieval. 6340-6349 - Zhenpeng Song
, Qinliang Su
, Jiayang Chen
:
Unsupervised Hashing with Contrastive Learning by Exploiting Similarity Knowledge and Hidden Structure of Data. 6350-6358 - Xinfeng Dong
, Longfei Han
, Dingwen Zhang
, Li Liu
, Junwei Han
, Huaxiang Zhang
:
Giving Text More Imagination Space for Image-text Matching. 6359-6368 - Wei Yang
, Zhengru Fang
, Tianle Zhang
, Shiguang Wu
, Chi Lu
:
Modal-aware Bias Constrained Contrastive Learning for Multimodal Recommendation. 6369-6378 - Wenshuo Zhao
, Jingkuan Song
, Shengming Yuan
, Lianli Gao
, Yang Yang
, Hengtao Shen
:
Precise Target-Oriented Attack against Deep Hashing-based Retrieval. 6379-6389 - Hao Wei
, Shuhui Wang
, Zhe Xue
, Shengbo Chen
, Qingming Huang
:
Conversational Composed Retrieval with Iterative Sequence Refinement. 6390-6399 - Yulu Wang
, Pengwen Dai
, Xiaojun Jia
, Zhitao Zeng
, Rui Li
, Xiaochun Cao
:
Hi-SIGIR: Hierachical Semantic-Guided Image-to-image Retrieval via Scene Graph. 6400-6409 - Shanshan Huang
, Haoxuan Li
, Qingsong Li
, Chunyuan Zheng
, Li Liu
:
Pareto Invariant Representation Learning for Multimedia Recommendation. 6410-6419 - Jiaguo Yu
, Yuming Shen
, Haofeng Zhang
:
Hashing One With All. 6420-6431 - Aozhu Chen
, Ziyuan Wang
, Chengbo Dong
, Kaibin Tian
, Ruixiang Zhao
, Xun Liang
, Zhanhui Kang
, Xirong Li
:
ChinaOpen: A Dataset for Open-world Multimodal Learning. 6432-6440 - Panwen Hu
, Nan Xiao
, Feifei Li
, Yongquan Chen
, Rui Huang
:
A Reinforcement Learning-Based Automatic Video Editing Method Using Pre-trained Vision-Language Model. 6441-6450 - Jianyang Zhai
, Xiawu Zheng
, Chang-Dong Wang
, Hui Li
, Yonghong Tian
:
Knowledge Prompt-tuning for Sequential Recommendation. 6451-6461 - Wenfeng Liu
, Xudong Wang
, Lei Tan
, Yan Zhang
, Pingyang Dai
, Yongjian Wu
, Rongrong Ji
:
Learning Occlusion Disentanglement with Fine-grained Localization for Occluded Person Re-identification. 6462-6471 - He Zhang
, Ying Sun
, Weiyu Guo
, Yafei Liu
, Haonan Lu
, Xiaodong Lin
, Hui Xiong
:
Interactive Interior Design Recommendation via Coarse-to-fine Multimodal Reinforcement Learning. 6472-6480 - Tinghui Zhu
, Jingping Liu
, Jiaqing Liang
, Haiyun Jiang
, Yanghua Xiao
, Zongyu Wang
, Rui Xie
, Yunsen Xian
:
Towards Visual Taxonomy Expansion. 6481-6490 - Wenzhe Du
, Haoyang Su
, Cam-Tu Nguyen
, Jian Sun
:
Enhancing Product Representation with Multi-form Interactions for Multimodal Conversational Recommendation. 6491-6500 - Yuan Sun
, Dezhong Peng
, Jian Dai
, Zhenwen Ren
:
Stepwise Refinement Short Hashing for Image Retrieval. 6501-6509 - Rui Yang
, Shuang Wang
, Huan Zhang
, Siyuan Xu
, Yanhe Guo
, Xiutiao Ye
, Biao Hou
, Licheng Jiao
:
Knowledge Decomposition and Replay: A Novel Cross-modal Image-Text Retrieval Continual Learning Method. 6510-6519 - Haiyang Xie, Zhengwei Yang
, Huilin Zhu
, Zheng Wang:
Striking a Balance: Unsupervised Cross-Domain Crowd Counting via Knowledge Diffusion. 6520-6529 - Hongzu Su
, Jingjing Li
, Fengling Li
, Lei Zhu
, Ke Lu
, Yang Yang
:
Task-Adversarial Adaptation for Multi-modal Recommendation. 6530-6538 - Zezhong Lv
, Bing Su
, Ji-Rong Wen
:
Counterfactual Cross-modality Reasoning for Weakly Supervised Video Moment Localization. 6539-6547 - Jinpeng Wang
, Ziyun Zeng
, Yunxiao Wang
, Yuting Wang, Xingyu Lu, Tianxiang Li, Jun Yuan, Rui Zhang
, Hai-Tao Zheng
, Shu-Tao Xia
:
MISSRec: Pre-training and Transferring Multi-modal Interest-aware Sequence Representation for Recommendation. 6548-6557 - Xin Lu
, Shikun Chen
, Yichao Cao
, Xin Zhou
, Xiaobo Lu
:
Attributes Grouping and Mining Hashing for Fine-Grained Image Retrieval. 6558-6566 - Fan Liu
, Huilin Chen
, Zhiyong Cheng
, Liqiang Nie
, Mohan S. Kankanhalli
:
Semantic-Guided Feature Distillation for Multimodal Recommendation. 6567-6575 - Penghang Yu
, Zhiyi Tan
, Guanming Lu
, Bing-Kun Bao
:
Multi-View Graph Convolutional Network for Multimedia Recommendation. 6576-6585
Poster Session V: Engaging Users with Multimedia -- Summarization, Analytics, and Storytelling
- Yutong Wang
, Hongteng Xu
, Dixin Luo
:
Self-supervised Video Summarization Guided by Semantic Inverse Optimal Transport. 6611-6622 - Zhuo Zhou
, Wenxuan Liu
, Danni Xu
, Zheng Wang
, Jian Zhao
:
Uncovering the Unseen: Discover Hidden Intentions by Micro-Behavior Graph Reasoning. 6623-6633 - Jingqiu Li
, Lanjun Wang
, Jianlin He
, Yongdong Zhang
, Anan Liu
:
Improving Rumor Detection by Class-based Adversarial Domain Adaptation. 6634-6642 - Peggy Tang
, Kun Hu
, Lei Zhang
, Junbin Gao
, Jiebo Luo
, Zhiyong Wang
:
TopicCAT: Unsupervised Topic-Guided Co-Attention Transformer for Extreme Multimodal Summarisation. 6643-6652 - Tao Yang
, Fan Wang
, Junfan Lin
, Zhongang Qi
, Yang Wu
, Jing Xu
, Ying Shan
, Changwen Chen
:
Toward Human Perception-Centric Video Thumbnail Generation. 6653-6664
Poster Session VI: Engaging Users with Multimedia -- Interactions and Quality of Experience
- Huimin Zeng
, Weinong Wang
, Xin Tao
, Zhiwei Xiong
, Yu-Wing Tai
, Wenjie Pei
:
Feature Decoupling-Recycling Network for Fast Interactive Segmentation. 6665-6675 - Shima Mohammadi
, João Ascenso
:
Predictive Sampling for Efficient Pairwise Subjective Image Quality Assessment. 6676-6684 - Quan Wang
, Yanli Ren
, Xinpeng Zhang
, Guorui Feng
:
Interactive Image Style Transfer Guided by Graffiti. 6685-6694 - Hongbo Liu
, Mingda Wu
, Kun Yuan
, Ming Sun
, Yansong Tang
, Chuanchuan Zheng
, Xing Wen
, Xiu Li
:
Ada-DQA: Adaptive Diverse Quality-aware Feature Acquisition for Video Quality Assessment. 6695-6704 - Lanjun Wang
, Xinran Qiao
, Yanwei Xie
, Weizhi Nie
, Yongdong Zhang
, Anan Liu
:
My Brother Helps Me: Node Injection Based Adversarial Attack on Social Bot Detection. 6705-6714 - Michela Testolina
, Davi Lazzarotto
, Rafael Rodrigues
, Shima Mohammadi
, João Ascenso
, António M. G. Pinheiro
, Touradj Ebrahimi
:
On the Performance of Subjective Visual Quality Assessment Protocols for Nearly Visually Lossless Image Compression. 6715-6723 - Zan Gao
, Xinglei Cui
, Yibo Zhao
, Tao Zhuo
, Weili Guan
, Meng Wang
:
A Novel Temporal Channel Enhancement and Contextual Excavation Network for Temporal Action Localization. 6724-6733 - Jin Liu
, Xi Wang
, Xiaomeng Fu
, Yesheng Chai
, Cai Yu
, Jiao Dai
, Jizhong Han
:
MFR-Net: Multi-faceted Responsive Listening Head Generation via Denoising Diffusion Model. 6734-6743 - Songlin Tang
, Wenjie Pei
, Xin Tao
, Tanghui Jia
, Guangming Lu
, Yu-Wing Tai
:
Scene-Generalizable Interactive Segmentation of Radiance Fields. 6744-6755 - Zhonghao Lin
, Haihan Duan
, Jiaye Li
, Xinyao Sun
, Wei Cai
:
MetaCast: A Self-Driven Metaverse Announcer Architecture Based on Quality of Experience Evaluation Model. 6756-6764 - Wuyuan Xie
, Shukang Wang
, Rong Zhang
, Miaohui Wang
:
Visual Redundancy Removal of Composite Images via Multimodal Learning. 6765-6773 - Xiaoyu Ma
, Chenxi Feng
, Jiaojiao Wang
, Qiang Lin
, Suiyu Zhang
, Jinchi Zhu
, Xiaodiao Chen
, Chang Liu
, Dingguo Yu
:
A Model-Agnostic Semantic-Quality Compatible Framework based on Self-Supervised Semantic Decoupling. 6774-6784 - Wei Xie
, Haobo Jiang
, Shuo Gu
, Jin Xie:
Implicit Obstacle Map-driven Indoor Navigation Model for Robust Obstacle Avoidance. 6785-6793 - Hancheng Zhu
, Zhiwen Shao
, Yong Zhou
, Guangcheng Wang
, Pengfei Chen
, Leida Li
:
Personalized Image Aesthetics Assessment with Attribute-guided Fine-grained Feature Representation. 6794-6802 - Songtao Wang
, Xiaoqi Wang
, Hao Gao
, Jian Xiong
:
Non-Local Geometry and Color Gradient Aggregation Graph Model for No-Reference Point Cloud Quality Assessment. 6803-6810
Poster Session VII: Engaging Users with Multimedia -- Metaverse, Art and Culture
- Jionghao Wang
, Ziyu Chen
, Jun Ling
, Rong Xie
, Li Song
:
360-Degree Panorama Generation from Few Unregistered NFoV Images. 6811-6821 - Haozhe Wu
, Songtao Zhou
, Jia Jia
, Junliang Xing
, Qi Wen
, Xiang Wen
:
Speech-Driven 3D Face Animation with Composite and Regional Facial Movements. 6822-6830 - Huiguo He
, Tianfu Wang
, Huan Yang
, Jianlong Fu
, Nicholas Jing Yuan
, Jian Yin
, Hongyang Chao
, Qi Zhang
:
Learning Profitable NFT Image Diffusions via Multiple Visual-Policy Guided Reinforcement Learning. 6831-6840 - Chaohui Yu
, Qiang Zhou
, Jingliang Li
, Zhe Zhang
, Zhibin Wang
, Fan Wang
:
Points-to-3D: Bridging the Gap between Sparse Points and Shape-Controllable Text-to-3D Generation. 6841-6850 - Ye Pan
, Ruisi Zhang
, Jingying Wang
, Yu Ding
, Kenny Mitchell
:
Real-time Facial Animation for 3D Stylized Character with Emotion Dynamics. 6851-6859 - Haibo Yang
, Yang Chen
, Yingwei Pan
, Ting Yao
, Zhineng Chen
, Tao Mei
:
3DStyle-Diffusion: Pursuing Fine-grained Text-driven 3D Stylization with 2D Diffusion Models. 6860-6868 - Xiaolei Diao
, Daqian Shi
, Jian Li
, Lida Shi
, Mingzhe Yue
, Ruihua Qi
, Chuntao Li
, Hao Xu
:
Toward Zero-shot Character Recognition: A Gold Standard Dataset with Radical-level Annotations. 6869-6877 - Haibo Chen
, Lei Zhao
, Jun Li
, Jian Yang
:
TSSAT: Two-Stage Statistics-Aware Transformation for Artistic Style Transfer. 6878-6887 - Jian-Jun Qiao
, Jie Zhang
, Xiao Wu
, Yu-Pei Song
, Wei Li
:
CPNet: Cartoon Parsing with Pixel and Part Correlation. 6888-6897 - Liangchen Song
, Liangliang Cao
, Hongyu Xu
, Kai Kang
, Feng Tang
, Junsong Yuan
, Zhao Yang
:
RoomDreamer: Text-Driven 3D Indoor Scene Synthesis with Coherent Geometry and Texture. 6898-6906 - Fengyuan Liu
, Lingyun Yu
, Hongtao Xie
, Chuanbin Liu
, Zhiguo Ding
, Quanwei Yang
, Yongdong Zhang
:
High Fidelity Face Swapping via Semantics Disentanglement and Structure Enhancement. 6907-6917 - Zihao Huang
, Min Shi
, Chengxin Liu
, Ke Xian
, Zhiguo Cao
:
SimHMR: A Simple Query-based Framework for Parameterized Human Mesh Reconstruction. 6918-6927 - Quan Wang
, Sheng Li
, Xinpeng Zhang
, Guorui Feng
:
Rethinking Neural Style Transfer: Generating Personalized and Watermarked Stylized Images. 6928-6937 - Xin Jin
, Wu Zhou
, Jinyu Wang
, Duo Xu
, Yongsen Zheng
:
An Order-Complexity Aesthetic Assessment Model for Aesthetic-aware Music Recommendation. 6938-6947 - Jianwei Hu
, Ningna Wang
, Baorong Yang
, Gang Chen
, Xiaohu Guo
, Bin Wang
:
S3DS: Self-supervised Learning of 3D Skeletons from Single View Images. 6948-6958 - Kun Cheng
, Mingrui Zhu
, Nannan Wang
, Guozhang Li
, Xiaoyu Wang
, Xinbo Gao
:
Controllable Face Sketch-Photo Synthesis with Flexible Generative Priors. 6959-6968 - Xiaomeng Fu
, Xi Wang
, Jin Liu
, Shuhui Wang
, Jiao Dai
, Jizhong Han
:
CoP: Chain-of-Pose for Image Animation in Large Pose Changes. 6969-6977 - Sebin Lee
, Daye Kim
, Jungjin Lee
:
The Effects of Viewing Formats and Song Genres on Audience Experiences in Virtual Avatar Concerts. 6978-6988 - Ray LC
, Sijia Liu
, Qiaosheng Lyu
:
IN/ACTive: A Distance-Technology-Mediated Stage for Performer-Audience Telepresence and Environmental Control. 6989-6997 - Ruizhao Chen
, Ye Pan
, Zhigang Deng
, Lili Wang
, Lizhuang Ma
:
Double Doodles: Sketching Animation in Immersive Environment With 3+6 DOFs Motion Gestures. 6998-7006 - Zhong Li
, Liangchen Song
, Zhang Chen
, Xiangyu Du
, Lele Chen
, Junsong Yuan
, Yi Xu
:
Relit-NeuLF: Efficient Relighting and Novel View Synthesis via Neural 4D Light Field. 7007-7016
Poster Session VIII: Engaging Users with Multimedia -- Multimedia Applications
- Junbao Zhuo
, Xingyu Zhao
, Shuhao Cui
, Qingming Huang
, Shuhui Wang
:
Adaptive Feature Swapping for Unsupervised Domain Adaptation. 7017-7028 - Xiaobo Shen
, Yinfan Chen
, Shirui Pan
, Weiwei Liu
, Yuhui Zheng
:
Graph Convolutional Incomplete Multi-modal Hashing. 7029-7037 - Ruitao Chen
, Guoyang Xie
, Jiaqi Liu
, Jinbao Wang
, Ziqi Luo
, Jinfan Wang
, Feng Zheng
:
EasyNet: An Easy Network for 3D Industrial Anomaly Detection. 7038-7046 - Yujuan Ding
, P. Y. Mok
, Yi Bin
, Xun Yang
, Zhiyong Cheng
:
Modeling Multi-Relational Connectivity for Personalized Fashion Matching. 7047-7055 - Guancheng Chen
, Xin Liu
, Xing Xu
, Yiu-Ming Cheung
, Taihao Li
:
Taking a Part for the Whole: An Archetype-agnostic Framework for Voice-Face Association. 7056-7064 - Kui Jiang
, Wenxuan Liu
, Zheng Wang
, Xian Zhong
, Junjun Jiang
, Chia-Wen Lin
:
DAWN: Direction-aware Attention Wavelet Network for Image Deraining. 7065-7074 - Honggu Liu
, Xiaodan Li
, Wenbo Zhou
, Han Fang
, Paolo Bestagini
, Weiming Zhang
, Yuefeng Chen
, Stefano Tubaro
, Nenghai Yu
, Yuan He
, Hui Xue
:
BiFPro: A Bidirectional Facial-data Protection Framework against DeepFake. 7075-7084 - De Cheng
, Xiaojian Huang
, Nannan Wang
, Lingfeng He
, Zhihui Li
, Xinbo Gao
:
Unsupervised Visible-Infrared Person ReID by Collaborative Learning with Neighbor-Guided Label Refinement. 7085-7093 - Jinkang Guo
, Zhibo Wan
, Zhihan Lv
:
Digital Twins Fuzzy System Based on Time Series Forecasting Model LFTformer. 7094-7100 - Harry Cheng
, Yangyang Guo
, Liqiang Nie
, Zhiyong Cheng
, Mohan S. Kankanhalli
:
Sample Less, Learn More: Efficient Action Recognition via Frame Feature Restoration. 7101-7110 - Pan Mu
, Hanning Xu
, Zheyuan Liu
, Zheng Wang
, Sixian Chan
, Cong Bai
:
A Generalized Physical-knowledge-guided Dynamic Model for Underwater Image Enhancement. 7111-7120 - Yubin Wang
, Huimin Yu
, Yuming Yan
, Shuyi Song
, Biyang Liu
, Yichong Lu
:
Exploring Shape Embedding for Cloth-Changing Person Re-Identification via 2D-3D Correspondences. 7121-7130 - Chao Shuai
, Jieming Zhong
, Shuang Wu
, Feng Lin
, Zhibo Wang
, Zhongjie Ba
, Zhenguang Liu
, Lorenzo Cavallaro
, Kui Ren
:
Locate and Verify: A Two-Stream Network for Improved Deepfake Detection. 7131-7142 - Yinyin Peng
, Donghui Hu
, Yaofei Wang
, Kejiang Chen
, Gang Pei
, Weiming Zhang
:
StegaDDPM: Generative Image Steganography based on Denoising Diffusion Probabilistic Model. 7143-7151 - Jianyang Shi
, Haijun Zhang
, Dongliang Zhou
, Zhao Zhang
:
Toward Intelligent Interactive Design: A Generation Framework Based on Cross-domain Fashion Elements. 7152-7163 - Danni Yang
, Jiayi Ji
, Xiaoshuai Sun
, Haowei Wang
, Yinan Li
, Yiwei Ma
, Rongrong Ji
:
Semi-Supervised Panoptic Narrative Grounding. 7164-7174 - Jiahang Zhang
, Lilang Lin
, Jiaying Liu
:
Prompted Contrast with Masked Motion Modeling: Towards Versatile 3D Action Representation Learning. 7175-7183 - Shuting Dong
, Zhe Wu
, Feng Lu
, Chun Yuan
:
Enhanced Image Deblurring: An Efficient Frequency Exploitation and Preservation Network. 7184-7193 - Han Yan
, Haijun Zhang
, Jie Hou
, Jicong Fan
, Zhao Zhang
:
InspirNET: An Unsupervised Generative Adversarial Network with Controllable Fine-grained Texture Disentanglement for Fashion Generation. 7194-7204 - Yiwen Xu
, Ruoyu Guo
, Maurice Pagnucco
, Yang Song
:
Draw2Edit: Mask-Free Sketch-Guided Image Manipulation. 7205-7215 - Hongtao Wu
, Yijun Yang
, Haoyu Chen
, Jingjing Ren
, Lei Zhu
:
Mask-Guided Progressive Network for Joint Raindrop and Rain Streak Removal in Videos. 7216-7225 - Ce Wang
, Kun Shang
, Haimiao Zhang
, Shang Zhao, Dong Liang
, S. Kevin Zhou
:
Active CT Reconstruction with a Learned Sampling Policy. 7226-7235 - Yifan Gao
, Jinpeng Lin
, Min Zhou
, Chuanbin Liu
, Hongtao Xie
, Tiezheng Ge
, Yuning Jiang
:
TextPainter: Multimodal Text Image Generation with Visual-harmony and Text-comprehension for Poster Design. 7236-7246 - Chang Liu
, Lichen Wang
, Yun Fu
:
Rethinking Neighborhood Consistency Learning on Unsupervised Domain Adaptation. 7247-7254 - Zijun Deng
, Xiangteng He
, Yuxin Peng
, Xiongwei Zhu
, Lele Cheng
:
MV-Diffusion: Motion-aware Video Diffusion Model. 7255-7263 - Yanglin Feng
, Hongyuan Zhu
, Dezhong Peng
, Xi Peng
, Peng Hu
:
ROAD: Robust Unsupervised Domain Adaptation with Noisy Labels. 7264-7273 - Gaozhi Liu
, Yichao Si
, Zhenxing Qian
, Xinpeng Zhang
, Sheng Li
, Wanli Peng
:
WRAP: Watermarking Approach Robust Against Film-coating upon Printed Photographs. 7274-7282 - Baolong Liu
, Tianyi Zheng
, Peng Zheng
, Daizong Liu
, Xiaoye Qu
, Junyu Gao
, Jianfeng Dong
, Xun Wang
:
Lite-MKD: A Multi-modal Knowledge Distillation Framework for Lightweight Few-shot Action Recognition. 7283-7294 - Zijun Deng
, Xiangteng He
, Yuxin Peng
:
Efficiency-optimized Video Diffusion Models. 7295-7303 - Gehui Li
, Jinyuan Liu
, Long Ma
, Zhiying Jiang
, Xin Fan
, Risheng Liu
:
Fearless Luminance Adaptation: A Macro-Micro-Hierarchical Transformer for Exposure Correction. 7304-7313 - Zengxi Zhang
, Zhiying Jiang
, Jinyuan Liu
, Xin Fan
, Risheng Liu
:
WaterFlow: Heuristic Normalizing Flow for Underwater Image Enhancement and Beyond. 7314-7323 - Jinpu Zhang
, Ziwen Li
, Ruonan Wei
, Yuehuan Wang
:
Progressive Domain-style Translation for Nighttime Tracking. 7324-7334 - Guangyuan Li
, Wei Xing
, Lei Zhao
, Zehua Lan
, Zhanjie Zhang
, Jiakai Sun
, Haolin Yin
, Huaizhong Lin
, Zhijie Lin
:
DuDoINet: Dual-Domain Implicit Network for Multi-Modality MR Image Arbitrary-scale Super-Resolution. 7335-7344 - Han Fang
, Kejiang Chen
, Yupeng Qiu
, Jiayang Liu
, Ke Xu
, Chengfang Fang
, Weiming Zhang
, Ee-Chien Chang
:
DeNoL: A Few-Shot-Sample-Based Decoupling Noise Layer for Cross-channel Watermarking Robustness. 7345-7353 - Luojun Lin
, Zhifeng Shen
, Zhishu Sun
, Yuanlong Yu
, Lei Zhang
, Weijie Chen
:
Parameter Exchange for Robust Dynamic Domain Generalization. 7354-7362 - Zichun Wang
, Yulun Zhang
, Debing Zhang
, Ying Fu
:
Recurrent Self-Supervised Video Denoising with Denser Receptive Field. 7363-7372 - Junxue Yang
, Xin Liao
:
Exploiting Fine-Grained DCT Representations for Hiding Image-Level Messages within JPEG Images. 7373-7382 - Qingshan Hou
, Peng Cao
, Jiaqi Wang
, Xiaoli Liu
, Jinzhu Yang
, Osmar R. Zaïane
:
A Reference-free Self-supervised Domain Adaptation Framework for Low-quality Fundus Image Enhancement. 7383-7393 - Wei Wan
, Shengshan Hu
, Minghui Li
, Jianrong Lu
, Longling Zhang
, Leo Yu Zhang
, Hai Jin
:
A Four-Pronged Defense Against Byzantine Attacks in Federated Learning. 7394-7402 - Boyang Wang
, Yan Wang
, Qing Zhao
, Junxiong Lin
, Zeng Tao
, Pinxue Guo
, Zhaoyu Chen
, Kaixun Jiang
, Shaoqi Yan
, Shuyong Gao
, Wenqiang Zhang
:
A Capture to Registration Framework for Realistic Image Super-Resolution in the Industry Environment. 7403-7412 - Xinhao Deng
, Pingping Zhang, Wei Liu
, Huchuan Lu
:
Recurrent Multi-scale Transformer for High-Resolution Salient Object Detection. 7413-7423 - Yanru He
, Kejiang Chen
, Guoqiang Chen
, Zehua Ma
, Kui Zhang
, Jie Zhang
, Huanyu Bian
, Han Fang
, Weiming Zhang
, Nenghai Yu
:
ProTegO: Protect Text Content against OCR Extraction Attack. 7424-7434 - Chenxi Wang
, Hongjun Wu
, Zhi Jin
:
FourLLIE: Boosting Low-Light Image Enhancement by Fourier Frequency Information. 7459-7469 - Teng Hu
, Ran Yi
, Haokun Zhu
, Liang Liu
, Jinlong Peng
, Yabiao Wang
, Chengjie Wang
, Lizhuang Ma
:
Stroke-based Neural Painting and Stylization with Dynamically Predicted Painting Region. 7470-7480 - Lingyi Hong
, Wei Zhang
, Shuyong Gao
, Hong Lu
, Wenqiang Zhang
:
SimulFlow: Simultaneously Extracting Feature and Identifying Target for Unsupervised Video Object Segmentation. 7481-7490 - Yan Zhu, Junbao Zhuo, Bin Ma, Jiajia Geng, Xiaoming Wei, Xiaolin Wei, Shuhui Wang:
Orthogonal Temporal Interpolation for Zero-Shot Video Recognition. 7491-7501 - Jingcan Duan
, Pei Zhang
, Siwei Wang
, Jingtao Hu
, Hu Jin
, Jiaxin Zhang
, Haifang Zhou
, Xinwang Liu
:
Normality Learning-based Graph Anomaly Detection via Multi-Scale Contrastive Learning. 7502-7511 - Kangkang Zhou
, Lijun Zhang
, Feng Lu
, Xiang-Dong Zhou
, Yu Shi
:
Efficient Hierarchical Multi-view Fusion Transformer for 3D Human Pose Estimation. 7512-7520 - Jiawei Wang
, Zhanchang Ma
, Da Cao
, Yuquan Le
, Junbin Xiao
, Tat-Seng Chua
:
Deconfounded Multimodal Learning for Spatio-temporal Video Grounding. 7521-7529 - Minyi Zhao
, Shijie Xuyang
, Jihong Guan
, Shuigeng Zhou
:
STIRER: A Unified Model for Low-Resolution Scene Text Image Recovery and Recognition. 7530-7539 - Jingwen Chen
, Yingwei Pan
, Ting Yao
, Tao Mei
:
ControlStyle: Text-Driven Stylized Image Generation Using Diffusion Priors. 7540-7548 - Guangzhao Dai
, Xiangbo Shu
, Rui Yan
, Peng Huang
, Jinhui Tang
:
Slowfast Diversity-aware Prototype Learning for Egocentric Action Recognition. 7549-7558 - Jiacheng Deng
, Li Dong
, Jiahao Chen
, Diqun Yan
, Rangding Wang
, Dengpan Ye
, Lingchen Zhao
, Jinyu Tian
:
Universal Defensive Underpainting Patch: Making Your Text Invisible to Optical Character Recognition. 7559-7568 - Zhiqing Hong
, Chenye Cui
, Rongjie Huang
, Lichao Zhang
, Jinglin Liu
, Jinzheng He
, Zhou Zhao
:
UniSinger: Unified End-to-End Singing Voice Synthesis With Cross-Modality Information Matching. 7569-7579 - Yuqi Jiang
, Chune Zhang
, Shuo Jin
, Jiao Liu
, Jiapeng Wang
:
CLG-INet: Coupled Local-Global Interactive Network for Image Restoration. 7580-7589 - Chen Liu
, Peike Patrick Li
, Xingqun Qi
, Hu Zhang
, Lincheng Li
, Dadong Wang
, Xin Yu
:
Audio-Visual Segmentation by Exploring Cross-Modal Mutual Semantics. 7590-7598 - Junhong Gou
, Siyu Sun
, Jianfu Zhang
, Jianlou Si
, Chen Qian
, Liqing Zhang
:
Taming the Power of Diffusion Models for High-Quality Virtual Try-On with Appearance Flow. 7599-7607 - Xian Wang
, Xiaoyu Mo
, Lik-Hang Lee
, Xiaoying Wei
, Xiaofu Jin
, Mingming Fan
, Pan Hui
:
Designing Loving-Kindness Meditation in Virtual Reality for Long-Distance Romantic Relationships. 7608-7617 - Wei Jiang
, Jiayu Yang
, Yongqi Zhai
, Peirong Ning
, Feng Gao
, Ronggang Wang
:
MLIC: Multi-Reference Entropy Model for Learned Image Compression. 7618-7627 - Mingrui Zhang
, Ming Chen
, Yan Zhou
, Li Chen
, Weihua Jian
, Pengfei Wan
:
Automatic Human Scene Interaction through Contact Estimation and Motion Adaptation. 7628-7637 - Gang Li
, Xianzheng Ma
, Zhao Wang
, Hao Li
, Qifei Zhang
, Chao Wu
:
When Masked Image Modeling Meets Source-free Unsupervised Domain Adaptation: Dual-Level Masked Network for Semantic Segmentation. 7638-7647 - Chuanming Wang
, Huiyuan Fu
, Huadong Ma
:
Multi-Part Token Transformer with Dual Contrastive Learning for Fine-grained Image Classification. 7648-7656 - Junwei Zhao
, Jianming Ye
, Shiliang Zhang
, Zhaofei Yu
, Tiejun Huang
:
Recognizing High-Speed Moving Objects with Spike Camera. 7657-7665 - Meiqi Cao
, Rui Yan
, Xiangbo Shu
, Jiachao Zhang
, Jinpeng Wang
, Guo-Sen Xie
:
MUP: Multi-granularity Unified Perception for Panoramic Activity Recognition. 7666-7675 - Hongyuan Wang
, Lizhi Wang
, Chang Chen
, Xue Hu
, Fenglong Song
, Hua Huang
:
Learning Spectral-wise Correlation for Spectral Super-Resolution: Where Similarity Meets Particularity. 7676-7685 - Kun Yang
, Dingkang Yang
, Jingyu Zhang
, Hanqi Wang
, Peng Sun
, Liang Song
:
What2comm: Towards Communication-efficient Collaborative Perception via Feature Decoupling. 7686-7695 - Hongjie Zhang
, Yi Liu
, Yali Wang
, Limin Wang
, Yu Qiao
:
Learning Discriminative Feature Representation for Open Set Action Recognition. 7696-7705 - Shenghai Yuan
, Jijia Chen
, Jiaqi Li
, Wenchao Jiang
, Song Guo
:
LHNet: A Low-cost Hybrid Network for Single Image Dehazing. 7706-7717 - Songlin Yang
, Wei Wang
, Jun Ling
, Bo Peng
, Xu Tan
, Jing Dong
:
Context-Aware Talking-Head Video Editing. 7718-7727 - Euihyeok Lee
, Chulhong Min
, Jaeseung Lee
, Jin Yu
, Seungwoo Kang
:
GrooveMeter: Enabling Music Engagement-aware Apps by Detecting Reactions to Daily Music Listening via Earable Sensing. 7728-7736 - Xianghao Zang
, Wei Gao
, Ge Li
, Han Fang
, Chao Ban
, Zhongjiang He
, Hao Sun
:
A Baseline Investigation: Transformer-based Cross-view Baseline for Text-based Person Search. 7737-7746 - Pengyuan Lyu
, Weihong Ma
, Hongyi Wang
, Yuechen Yu
, Chengquan Zhang
, Kun Yao
, Yang Xue
, Jingdong Wang
:
GridFormer: Towards Accurate Table Structure Recognition via Grid Prediction. 7747-7757 - Yingchi Liu
, Zhu Liu
, Long Ma
, Jinyuan Liu
, Xin Fan
, Zhongxuan Luo
, Risheng Liu
:
Bilevel Generative Learning for Low-Light Vision. 7758-7766 - Maizhen Ning
, Qiu-Feng Wang
, Kaizhu Huang
, Xiaowei Huang
:
A Symbolic Characters Aware Model for Solving Geometry Problems. 7767-7775 - Haoyu Wang
, Haozhe Wu
, Junliang Xing
, Jia Jia
:
Versatile Face Animator: Driving Arbitrary 3D Facial Avatar in RGBD Space. 7776-7784 - Xinmin Qiu
, Congying Han
, Zicheng Zhang
, Bonan Li
, Tiande Guo
, Xuecheng Nie
:
DiffBFR: Bootstrapping Diffusion Model for Blind Face Restoration. 7785-7795 - Mingrui Lao
, Nan Pu
, Zhun Zhong
, Nicu Sebe
, Michael S. Lew
:
FedVQA: Personalized Federated Visual Question Answering over Heterogeneous Scenes. 7796-7807 - Guangyao Li
, Wenxuan Hou
, Di Hu
:
Progressive Spatio-temporal Perception for Audio-Visual Question Answering. 7808-7816 - Yusheng Guo
, Nan Zhong
, Zhenxing Qian
, Xinpeng Zhang
:
Physical Invisible Backdoor Based on Camera Imaging. 7817-7825 - Zhihao Li
, Kexue Fu
, Haoran Wang
, Manning Wang
:
PI-NeRF: A Partial-Invertible Neural Radiance Fields for Pose Estimation. 7826-7836 - Mengping Yang
, Zhe Wang
, Wenyi Feng
, Qian Zhang, Ting Xiao
:
Improving Few-shot Image Generation by Structural Discrimination and Textural Modulation. 7837-7848 - Zhicong Zheng
, Xinfeng Li
, Chen Yan
, Xiaoyu Ji
, Wenyuan Xu
:
The Silent Manipulator: A Practical and Inaudible Backdoor Attack against Speech Recognition Systems. 7849-7858 - Joo Chan Lee
, Daniel Rho
, Jong Hwan Ko
, Eunbyung Park
:
FFNeRV: Flow-Guided Frame-Wise Neural Representations for Videos. 7859-7870 - Hongming Luo
, Fei Zhou
, Zehong Zhou
, Kin-Man Lam
, Guoping Qiu
:
Restoration of Multiple Image Distortions using a Semi-dynamic Deep Neural Network. 7871-7880 - Dongliang Zhou
, Haijun Zhang
, Jianghong Ma
, Jicong Fan
, Zhao Zhang
:
FCBoost-Net: A Generative Network for Synthesizing Multiple Collocated Outfits via Fashion Compatibility Boosting. 7881-7889 - Fanda Fan
, Chaoxu Guo
, Litong Gong
, Biao Wang
, Tiezheng Ge
, Yuning Jiang
, Chunjie Luo
, Jianfeng Zhan
:
Hierarchical Masked 3D Diffusion Model for Video Outpainting. 7890-7900 - Runhua Jiang
, Yongge Liu
, Boyuan Zhang
, Xu Chen
, Deng Li
, Yahong Han
:
OraclePoints: A Hybrid Neural Representation for Oracle Character. 7901-7911 - Yuke Li
, Jingkuan Song
, Hao Ni
, Heng Tao Shen
:
Style-Controllable Generalized Person Re-identification. 7912-7921 - Hengchang Guo
, Qilong Zhang
, Junwei Luo
, Feng Guo
, Wenbin Zhang
, Xiaodong Su
, Minglei Li
:
Practical Deep Dispersed Watermarking with Synchronization and Fusion. 7922-7932 - Wenxue Cui
, Xingtao Wang
, Xiaopeng Fan
, Shaohui Liu
, Chen Ma
, Debin Zhao
:
G2-DUN: Gradient Guided Deep Unfolding Network for Image Compressive Sensing. 7933-7942 - Zicong Luo
, Sheng Li
, Guobiao Li
, Zhenxing Qian
, Xinpeng Zhang:
Securing Fixed Neural Network Steganography. 7943-7951 - Yong Liu, Hang Dong, Boyang Liang, Songwei Liu, Qingji Dong, Kai Chen, Fangmin Chen, Lean Fu, Fei Wang:
Unfolding Once is Enough: A Deployment-Friendly Transformer Unit for Super-Resolution. 7952-7960 - Kuan Tian
, Yonghang Guan
, Jinxi Xiang
, Jun Zhang
, Xiao Han
, Wei Yang
:
Towards Real-Time Neural Video Codec for Cross-Platform Application Using Calibration Information. 7961-7970 - Chen Tang
, Kai Ouyang
, Zenghao Chai
, Yunpeng Bai
, Yuan Meng
, Zhi Wang
, Wenwu Zhu
:
SEAM: Searching Transferable Mixed-Precision Quantization Policy through Large Margin Regularization. 7971-7980 - Guangyuan Li
, Wei Xing
, Lei Zhao
, Zehua Lan
, Jiakai Sun
, Zhanjie Zhang
, Quanwei Zhang
, Huaizhong Lin
, Zhijie Lin
:
Self-Reference Image Super-Resolution via Pre-trained Diffusion Large Model and Window Adjustable Transformer. 7981-7992 - Shuting Xia
, Tingyu Fan
, Yiling Xu
, Jenq-Neng Hwang
, Zhu Li
:
Learning Dynamic Point Cloud Compression via Hierarchical Inter-frame Block Matching. 7993-8003 - Bingchen Gong
, Yuehao Wang
, Xiaoguang Han
, Qi Dou
:
RecolorNeRF: Layer Decomposed Radiance Fields for Efficient Color Editing of 3D Scenes. 8004-8015 - Jianwen Sun
, Fenghua Yu
, Sannyuya Liu
, Yawei Luo
, Ruxia Liang
, Xiaoxuan Shen
:
Adversarial Bootstrapped Question Representation Learning for Knowledge Tracing. 8016-8025 - Hongwei Ren
, Yue Zhou
, Haotian Fu
, Yulong Huang
, Renjing Xu
, Bojun Cheng
:
TTPOINT: A Tensorized Point Cloud Network for Lightweight Action Recognition with Event Cameras. 8026-8034 - Kun Pan
, Yifang Yin
, Yao Wei
, Feng Lin
, Zhongjie Ba
, Zhenguang Liu
, Zhibo Wang
, Lorenzo Cavallaro
, Kui Ren
:
DFIL: Deepfake Incremental Learning by Exploiting Domain-invariant Forgery Clues. 8035-8046 - Lei Liu
, Zhihao Hu
, Zhenghao Chen
, Dong Xu
:
ICMH-Net: Neural Image Compression Towards both Machine Vision and Human Vision. 8047-8056 - Meng Li
, Yibo Shi
, Jing Wang
, Yunqi Huang
:
High Visual-Fidelity Learned Video Compression. 8057-8066 - Guanghui Zhang
, Ke Liu
, Mengbai Xiao
, Bingshu Wang
, Vaneet Aggarwal
:
An Intelligent Learning Approach to Achieve Near-Second Low-Latency Live Video Streaming under Highly Fluctuating Networks. 8067-8075 - Weisong Zhao
, Xiangyu Zhu
, Zhixiang He
, Xiaoyu Zhang
, Zhen Lei
:
Cross-Architecture Distillation for Face Recognition. 8076-8085 - Zhijian Wu
, Jun Li
, Dingjiang Huang
:
Separable Modulation Network for Efficient Image Super-Resolution. 8086-8094 - Yulin Zhang
, Jiangqun Ni
, Wenkang Su
, Xin Liao
:
A Novel Deep Video Watermarking Framework with Enhanced Robustness to H.264/AVC Compression. 8095-8104 - Yue Yuan
, Rundong He
, Zhongyi Han
, Yilong Yin
:
LHAct: Rectifying Extremely Low and High Activations for Out-of-Distribution Detection. 8105-8113 - Jinshui Hu
, Hao Wu
, Mingjun Chen
, Chenyu Liu
, Jiajia Wu
, Shi Yin
, Baocai Yin
, Bing Yin
, Cong Liu
, Jun Du
, Lirong Dai
:
Handwritten Chemical Structure Image to Structure-Specific Markup Using Random Conditional Guided Decoder. 8114-8124 - Ting Zhang
, Nanfeng Jiang
, Hongxin Wu
, Keke Zhang
, Yuzhen Niu
, Tiesong Zhao
:
HCSD-Net: Single Image Desnowing with Color Space Transformation. 8125-8133 - Wenhao Li
, Guangyang Wu
, Wenyi Wang
, Peiran Ren
, Xiaohong Liu
:
FastLLVE: Real-Time Low-Light Video Enhancement with Intensity-Aware Look-Up Table. 8134-8144 - Yuyang Yin
, Dejia Xu
, Chuangchuang Tan
, Ping Liu
, Yao Zhao
, Yunchao Wei
:
CLE Diffusion: Controllable Light Enhancement Diffusion Model. 8145-8156 - Pengfei Zhou
, Weiqing Min
, Yang Zhang
, Jiajun Song
, Ying Jin
, Shuqiang Jiang
:
SeeDS: Semantic Separable Diffusion Synthesizer for Zero-shot Food Detection. 8157-8166 - Liao Shen
, Xingyi Li
, Huiqiang Sun
, Juewen Peng
, Ke Xian
, Zhiguo Cao
, Guosheng Lin
:
Make-It-4D: Synthesizing a Consistent Long-Term Dynamic Scene Video from a Single Image. 8167-8175 - Yang Xiao
, Bo Duan
, Mingwei Sun
, Jingwei Huang
:
LocalPose: Object Pose Estimation with Local Geometry Guidance. 8176-8184 - Xianghao Jiao
, Yaohua Liu
, Jiaxin Gao
, Xinyuan Chu
, Xin Fan
, Risheng Liu
:
PEARL: Preprocessing Enhanced Adversarial Robust Learning of Image Deraining for Semantic Segmentation. 8185-8194 - Yuchao Feng
, Yanyan Shao
, Honghui Xu
, Jinshan Xu
, Jianwei Zheng
:
A Lightweight Collective-attention Network for Change Detection. 8195-8203 - Mengyi Wang
, Xinxin Zhang
, Yongshun Gong
, Yilong Yin
:
Personalized Single Image Reflection Removal Network through Adaptive Cascade Refinement. 8204-8213 - Weiyu Sun
, Xinyu Zhang
, Hao Lu
, Ying Chen
, Yun Ge
, Xiaolin Huang
, Jie Yuan
, Yingcong Chen
:
Resolve Domain Conflicts for Generalizable Remote Physiological Measurement. 8214-8224 - Yang Wei, Bin Xiao, Xiuli Bi
, Zhuoran Ma, Yang Liu, Zhuo Ma:
Secondary Labeling: A Novel Labeling Strategy for Image Manipulation Detection. 8225-8232 - Qingliang Liu
, Jiangqun Ni
, Xianglei Hu
:
Robust Image Steganography against General Scaling Attacks. 8233-8241 - Han Kim
, Chunggi Lee
, Junsoo Lee
, Dohyun Kim
, Kwangjin Lee
, Moohyun Oh
, Daesik Kim
:
FlatGAN: A Holistic Approach for Robust Flat-Coloring in High-Definition with Understanding Line Discontinuity. 8242-8250 - Lin Zhu
, Yunlong Zheng
, Mengyue Geng
, Lizhi Wang
, Hua Huang
:
Recurrent Spike-based Image Restoration under General Illumination. 8251-8260 - Shiao Xie
, Ziwei Niu
, Huimin Huang
, Hao Sun
, Rui Qin
, Yen-Wei Chen
, Lanfen Lin
:
IS2Net: Intra-domain Semantic and Inter-domain Style Enhancement for Semi-supervised Medical Domain Generalization. 8285-8293 - Junbao Zhuo
, Xingyu Zhao
, Shuhui Wang
, Huimin Ma
, Qingming Huang
:
Synthesizing Videos from Images for Image-to-Video Adaptation. 8294-8303 - Didi Zhu
, Yinchuan Li
, Yunfeng Shao
, Jianye Hao
, Fei Wu
, Kun Kuang
, Jun Xiao
, Chao Wu
:
Generalized Universal Domain Adaptation with Generative Flow Networks. 8304-8315 - Dongxia Huang
, Weiqi Luo
, Peijia Zheng
, Jiwu Huang
:
Automatic Asymmetric Embedding Cost Learning via Generative Adversarial Networks. 8316-8326 - Wen Yang
, Jinjian Wu
, Leida Li
, Weisheng Dong
, Guangming Shi
:
Event-based Motion Deblurring with Modality-Aware Decomposition and Recomposition. 8327-8335 - Cong Huang
, Jiahao Li
, Lei Chu
, Dong Liu
, Yan Lu
:
Disentangle Propagation and Restoration for Efficient Video Recovery. 8336-8345 - Zhen Zhao
, Meng Zhao
, Ye Liu
, Di Yin
, Luping Zhou
:
Entropy-based Optimization on Individual and Global Predictions for Semi-Supervised Learning. 8346-8355 - Chenxi Wang
, Zhi Jin
:
Brighten-and-Colorize: A Decoupled Network for Customized Low-Light Image Enhancement. 8356-8366 - Panda Pan
, Yang Zhao
, Yuan Chen
, Wei Jia
, Zhao Zhang
, Ronggang Wang
:
Cross-view Resolution and Frame Rate Joint Enhancement for Binocular Video. 8367-8375 - Fanfan Ye
, Bingyi Lu
, Liang Ma
, Qiaoyong Zhong
, Di Xie
:
Up to Thousands-fold Storage Saving: Towards Efficient Data-Free Distillation of Large-Scale Visual Classifiers. 8376-8386 - Menglin Wang
, Xiaojin Gong
:
Learning Intra and Inter-Camera Invariance for Isolated Camera Supervised Person Re-identification. 8387-8395 - Mingzhi Lyu
, Yi Huang
, Adams Wai-Kin Kong
:
Adversarial Attack for Robust Watermark Protection Against Inpainting-based and Blind Watermark Removers. 8396-8405 - Junhong Lin
, Shufan Pei
, Bing Chen
, Nanfeng Jiang
, Wei Gao
, Tiesong Zhao
:
LDRM: Degradation Rectify Model for Low-light Imaging via Color-Monochrome Cameras. 8406-8414 - Shang Chai
, Liansheng Zhuang
, Fengying Yan
, Zihan Zhou
:
Two-stage Content-Aware Layout Generation for Poster Designs. 8415-8423 - Zhenbo Shi
, Wei Yang
, Zhenbo Xu
, Zhidong Yu, Liusheng Huang
:
Reinforcement Learning-based Adversarial Attacks on Object Detectors using Reward Shaping. 8424-8432 - Zhengwentai Sun
, Yanghong Zhou
, Honghong He
, P. Y. Mok
:
SGDiff: A Style Guided Diffusion Model for Fashion Synthesis. 8433-8442 - Zhengyan Sheng
, Yang Ai
, Yan-Nian Chen
, Zhen-Hua Ling
:
Face-Driven Zero-Shot Voice Conversion with Memory-based Face-Voice Alignment. 8443-8452 - Zeyu Ma
, Ziqiang Zheng
, Jiwei Wei
, Xiaoyong Wei
, Yang Yang
, Heng Tao Shen
:
Open-Scenario Domain Adaptive Object Detection in Autonomous Driving. 8453-8462 - Run Wang
, Jixing Ren
, Boheng Li
, Tianyi She
, Wenhui Zhang
, Liming Fang
, Jing Chen
, Lina Wang
:
Free Fine-tuning: A Plug-and-Play Watermarking Scheme for Deep Neural Networks. 8463-8474 - Dan Zeng
, Shanchuan Hong
, Shuiwang Li
, Qiaomu Shen
, Bo Tang
:
Data-Scarce Animal Face Alignment via Bi-Directional Cross-Species Knowledge Transfer. 8475-8485 - Chaoning Zhang
, Philipp Benz
, Adil Karjauv
, In So Kweon
, Choong Seon Hong
:
Simple Techniques are Sufficient for Boosting Adversarial Transferability. 8486-8494 - Yuhang Zhao
, Shanchen Pang
, Zhihan Lv
, Sheng Miao
:
Augmented Digital Twins for Predictive Automatic Regulation and Fault Alarm in Sewage Plan. 8495-8503 - Siyue Yao
, Mingjie Sun
, Bingliang Li
, Fengyu Yang
, Junle Wang
, Ruimao Zhang
:
Dance with You: The Diversity Controllable Dancer Generation via Diffusion Models. 8504-8514 - Ji Zhang
, Xiao Wu
, Zhi-Qi Cheng
, Qi He
, Wei Li
:
Improving Anomaly Segmentation with Multi-Granularity Cross-Domain Alignment. 8515-8524 - Qiutang Qi
, Haonan Cheng
, Yang Wang
, Long Ye
, Shaobin Li
:
RD-FGFS: A Rule-Data Hybrid Framework for Fine-Grained Footstep Sound Synthesis from Visual Guidance. 8525-8533 - Lihua Lu
, Hui Wei
, Xin Jin
, Yihao Zhang
, Boyan Dong
, Longteng Jiang
, Xiaohui Zhang
, Ruyang Li
, Yaqian Zhao
:
Aesthetics-Driven Virtual Time-Lapse Photography Generation. 8534-8542 - Zhenghao Chen
, Lucas Relic
, Roberto Azevedo
, Yang Zhang
, Markus Gross
, Dong Xu
, Luping Zhou
, Christopher Schroers
:
Neural Video Compression with Spatio-Temporal Cross-Covariance Transformers. 8543-8551 - Xuan Hai
, Xin Liu
, Yuan Tan
, Qingguo Zhou
:
SiFDetectCracker: An Adversarial Attack Against Fake Voice Detection Based on Speaker-Irrelative Features. 8552-8560 - Leo Shan
, Wenzhang Zhou
, Grace Zhao
:
Incremental Few Shot Semantic Segmentation via Class-agnostic Mask Proposal and Language-driven Classifier. 8561-8570 - Raghav Jain
, Apoorva Singh
, Vivek Kumar Gangwar, Sriparna Saha
:
AbCoRD: Exploiting multimodal generative approach for Aspect-based Complaint and Rationale Detection. 8571-8579 - Davide Morelli
, Alberto Baldrati
, Giuseppe Cartella
, Marcella Cornia
, Marco Bertini
, Rita Cucchiara
:
LaDI-VTON: Latent Diffusion Textual-Inversion Enhanced Virtual Try-On. 8580-8589 - Yanli Ji, Lingfeng Ye
, Huili Huang
, Lijing Mao
, Yang Zhou, Lingling Gao:
Localization-assisted Uncertainty Score Disentanglement Network for Action Quality Assessment. 8590-8597 - Kaixun Jiang
, Lingyi Hong
, Zhaoyu Chen
, Pinxue Guo
, Zeng Tao
, Yan Wang
, Wenqiang Zhang
:
Exploring the Adversarial Robustness of Video Object Segmentation via One-shot Adversarial Attacks. 8598-8607 - Chris Lenart
, Pegah Ahadian
, Yuxin Yang
, Simon Suo
, Ashton Corsello
, Karl W. Kosko
, Qiang Guan
:
Gaze Analysis System for Immersive 360° Video for Preservice Teacher Education. 8608-8616 - Shuo Jin
, Meiqin Liu
, Chao Yao, Chunyu Lin
, Yao Zhao
:
Kernel Dimension Matters: To Activate Available Kernels for Real-time Video Super-Resolution. 8617-8625 - Jinlong Fan
, Jing Zhang
, Zhi Hou
, Dacheng Tao
:
AniPixel: Towards Animatable Pixel-Aligned Human Avatar. 8626-8634 - Qifeng Lin
, Luojun Lin
, Yuanlong Yu
, Gang Fu
:
A Multiple Prediction Mechanisms Ensemble for Complex Remote Sensing Scenes. 8635-8643 - Siyuan Huang
, Bo Zhang
, Botian Shi
, Hongsheng Li
, Yikang Li
, Peng Gao
:
SUG: Single-dataset Unified Generalization for 3D Point Cloud Classification. 8644-8652 - Ziyu Feng
, Zheming Xu
, Haina Qin
, Congyan Lang
, Bing Li
, Weihua Xiong
:
SMM: Self-supervised Multi-Illumination Color Constancy Model with Multiple Pretext Tasks. 8653-8661 - Shukai Wu
, Yuhang Yang
, Shuchang Xu
, Weiming Liu
, Xiao Yan
, Sanyuan Zhang
:
FlexIcon: Flexible Icon Colorization via Guided Images and Palettes. 8662-8673
Poster Session IX: Engaging Users with Multimedia -- Social-good, Fairness and Transparency
- Yanzhen Ren
, Hongcheng Zhu
, Liming Zhai
, Zongkun Sun
, Rubing Shen
, Lina Wang
:
Who is Speaking Actually? Robust and Versatile Speaker Traceability for Voice Conversion. 8674-8685 - Liping Yi
, Gang Wang
, Xiaoguang Liu
, Zhuan Shi
, Han Yu
:
FedGH: Heterogeneous Federated Learning with Generalized Global Header. 8686-8696 - Jinqian Chen
, Jihua Zhu
, Qinghai Zheng
:
Towards Fast and Stable Federated Learning: Confronting Heterogeneity via Knowledge Anchor. 8697-8706 - Maosen Li
, Xurong Li
, Kun Yu
, Cheng Deng
, Heng Huang
, Feng Mao
, Hui Xue
, Minghao Li
:
Spatio-Temporal Catcher: A Self-Supervised Transformer for Deepfake Video Detection. 8707-8718 - Shide Du
, Zihan Fang
, Shiyang Lan
, Yanchao Tan
, Manuel Günther
, Shiping Wang
, Wenzhong Guo
:
Bridging Trustworthiness and Open-World Learning: An Exploratory Neural Approach for Enhancing Interpretability, Generalization, and Robustness. 8719-8729 - Peini Guo
, Hong Liu
, Jianbing Wu
, Guoquan Wang
, Tao Wang
:
Semantic-aware Consistency Network for Cloth-changing Person Re-Identification. 8730-8739 - Xiaotian Wang
, Shuo Liang
, Zhifu Zhao
, Xinyu Cui
, Kai Chen
, Xuanhang Xu
:
Adaptive Spatio-Temporal Directed Graph Neural Network for Parkinson's Detection using Vertical Ground Reaction Force. 8740-8748 - Rui Zhang
, Hongxia Wang, Mingshan Du
, Hanqing Liu
, Yang Zhou
, Qiang Zeng
:
UMMAFormer: A Universal Multimodal-adaptive Transformer Framework for Temporal Forgery Localization. 8749-8759 - Xiangming Gu
, Wei Zeng
, Ye Wang
:
Elucidate Gender Fairness in Singing Voice Transcription. 8760-8769 - Yuyan Bu
, Qiang Sheng
, Juan Cao
, Peng Qi
, Danding Wang
, Jintao Li
:
Combating Online Misinformation Videos: Characterization, Detection, and Future Directions. 8770-8780 - Yuxuan Zhang, Lei Liu, Li Liu:
Cuing Without Sharing: A Federated Cued Speech Recognition Framework via Mutual Knowledge Distillation. 8781-8789 - Yuxuan Tan
, Yuanman Li
, Limin Zeng
, Jiaxiong Ye
, Wei Wang
, Xia Li
:
Multi-scale Target-Aware Framework for Constrained Splicing Detection and Localization. 8790-8798 - Muyao Niu
, Zhuoxiao Li
, Yifan Zhan
, Huy H. Nguyen
, Isao Echizen
, Yinqiang Zheng
:
Physics-Based Adversarial Attack on Near-Infrared Human Detector for Nighttime Surveillance Camera Systems. 8799-8807 - Shengtao Lou
, Buyu Liu
, Jun Bao
, Jiajun Ding
, Jun Yu
:
Follow-me: Deceiving Trackers with Fabricated Paths. 8808-8818 - Lu Wei
, Bin Liu
, Jiujun He
, Manxue Zhang
, Yi Huang
:
Autistic Spectrum Disorders Diagnose with Graph Neural Networks. 8819-8827 - Hui Wei
, Hanxun Yu
, Kewei Zhang
, Zhixiang Wang
, Jianke Zhu
, Zheng Wang
:
Moiré Backdoor Attack (MBA): A Novel Trigger for Pedestrian Detectors in the Physical World. 8828-8838 - Wenxuan Liu
, Tianyao He
, Chen Gong
, Ning Zhang
, Hua Yang
, Junchi Yan
:
Fine-Grained Music Plagiarism Detection: Revealing Plagiarists through Bipartite Graph Matching and a Comprehensive Large-Scale Dataset. 8839-8848 - Kui Zhang
, Hang Zhou
, Jie Zhang
, Qidong Huang
, Weiming Zhang
, Nenghai Yu
:
Ada3Diff: Defending against 3D Adversarial Point Clouds via Adaptive Diffusion. 8849-8859 - Yi Zhang
, Jitao Sang
, Junyang Wang
, Dongmei Jiang
, Yaowei Wang
:
Benign Shortcut for Debiasing: Fair Visual Recognition via Intervention with Shortcut Features. 8860-8868 - Zhihao Yue
, Jun Xia
, Zhiwei Ling
, Ming Hu
, Ting Wang
, Xian Wei, Mingsong Chen
:
Model-Contrastive Learning for Backdoor Elimination. 8869-8880 - Shucheng Li
, Runchuan Wang
, Hao Wu
, Sheng Zhong
, Fengyuan Xu
:
SIEGE: Self-Supervised Incremental Deep Graph Learning for Ethereum Phishing Scam Detection. 8881-8890 - Jinzhang Hu
, Ruimin Hu
, Zheng Wang
, Dengshi Li
, Junhang Wu
, Lingfei Ren
, Yilong Zang
, Zijun Huang
, Mei Wang
:
Collaborative Fraud Detection: How Collaboration Impacts Fraud Detection. 8891-8899 - Zixuan Ni
, Longhui Wei
, Jiacheng Li
, Siliang Tang
, Yueting Zhuang
, Qi Tian
:
Degeneration-Tuning: Using Scrambled Grid shield Unwanted Concepts from Stable Diffusion. 8900-8909 - Wan Jiang
, Yunfeng Diao
, He Wang
, Jianxin Sun
, Meng Wang
, Richang Hong
:
Unlearnable Examples Give a False Sense of Security: Piercing through Unexploitable Data with Learnable Examples. 8910-8921
Poster Session X: Multimedia systems -- Data Systems Management and Indexing
- Fei Shen
, Xiangbo Shu
, Xiaoyu Du
, Jinhui Tang
:
Pedestrian-specific Bipartite-aware Similarity Learning for Text-based Person Retrieval. 8922-8931 - Xingyu Zhao
, Lei Qi
, Yuexuan An
, Xin Geng
:
Generalizable Label Distribution Learning. 8932-8941 - Yan Jiang
, Hongtao Xie
, Lei Zhang
, Pandeng Li
, Dongming Zhang
, Yongdong Zhang
:
Dual Dynamic Proxy Hashing Network for Long-tailed Image Retrieval. 8942-8953 - Sensen Zhang
, Xun Liang
, Hui Tang
, Zhenyu Guan
:
Hybrid Interaction Temporal Knowledge Graph Embedding Based on Householder Transformations. 8954-8962 - Huaiwen Zhang
, Yang Yang
, Fan Qi
, Shengsheng Qian
, Changsheng Xu
:
C2MR: Continual Cross-Modal Retrieval for Streaming Multi-modal Data. 8963-8974 - Yuanding Zhou
, Xinran Li
, Yaodong Fang
, Chuan Qin
:
When Perceptual Authentication Hashing Meets Neural Architecture Search. 8975-8983 - Xinbiao Gan
, Jiaqi Guo
, Peilin Guo
, Guang Wu
, Jiaqi Si
, Songzhu Mei
, Cong Liu
, Tiejun Li
:
GraphMedia: Communication-balanced Graph Searching for Billion-scale Social Media Access. 8984-8993
Poster Session XI: Multimedia systems -- Systems and Middleware
- Yunfei Long
, Zhe Xue
, Lingyang Chu
, Tianlong Zhang
, Junjiang Wu
, Yu Zang
, Junping Du
:
FedCD: A Classifier Debiased Federated Learning Framework for Non-IID Data. 8994-9002 - Jangho Kim
, Jayeon Yoo
, Yeji Song
, KiYoon Yoo
, Nojun Kwak
:
Finding Efficient Pruned Network via Refined Gradients for Pruned Weights. 9003-9011 - Jingzong Li
, Yik Hong Cai
, Libin Liu
, Yu Mao
, Chun Jason Xue
, Hong Xu
:
Moby: Empowering 2D Models for Efficient Point Cloud Analytics on the Edge. 9012-9021 - Lorenzo Catania
, Dario Allegra
:
NIF: A Fast Implicit Image Compression with Bottleneck Layers and Modulated Sinusoidal Activations. 9022-9031 - Wanting Li
, Yongcai Wang
, Yongyu Guo
, Shuo Wang
, Yu Shao
, Xuewei Bai
, Xudong Cai
, Qiang Ye
, Deying Li
:
ColSLAM: A Versatile Collaborative SLAM System for Mobile Phones Using Point-Line Features and Map Caching. 9032-9041 - Xizhong Zhu
, Guoqing Xiang
, Peng Zhang
, Huizhu Jia
, Xiaodong Xie
:
A Hardware-efficient Unified Motion Estimation for Video Coding. 9042-9050 - Yuxin Kong
, Peng Yang
, Yan Cheng
:
Edge-Assisted On-Device Model Update for Video Analytics in Adverse Environments. 9051-9060 - Fangchen Ye
, Jin Lin
, Hongzhan Huang
, Jianping Fan
, Zhongchao Shi
, Yuan Xie
, Yanyun Qu
:
Hardware-friendly Scalable Image Super Resolution with Progressive Structured Sparsity. 9061-9069 - Junteng Zhang
, Tong Chen
, Dandan Ding
, Zhan Ma
:
YOGA: Yet Another Geometry-based Point Cloud Compressor. 9070-9081 - Zhixiang Ye
, Qinghao Hu
, Tianli Zhao
, Wangping Zhou
, Jian Cheng
:
MCUNeRF: Packing NeRF into an MCU with 1MB Memory. 9082-9092 - Jianwei Zheng
, Changnan Xiao
, Mingliang Li
, Zhenhua Li
, Feng Qian
, Wei Liu
, Xudong Wu
:
ParliRobo: Participant Lightweight AI Robots for Massively Multiplayer Online Games (MMOGs). 9093-9102 - Guanyu Xu
, Jiawei Hao
, Li Shen
, Han Hu
, Yong Luo
, Hui Lin
, Jialie Shen
:
LGViT: Dynamic Early Exiting for Accelerating Vision Transformer. 9103-9114 - Seyeon Kim
, Kyungmin Bin
, Donggyu Yang
, Sangtae Ha
, Song Chong
, Kyunghan Lee
:
ENTRO: Tackling the Encoding and Networking Trade-off in Offloaded Video Analytics. 9115-9123 - Sheng-Ming Tang
, Yuan-Chun Sun
, Cheng-Hsin Hsu
:
A Blind Streaming System for Multi-client Online 6-DoF View Touring. 9124-9133 - Yizhen Yuan
, Rui Kong
, Shenghao Xie
, Yuanchun Li
, Yunxin Liu
:
PatchBackdoor: Backdoor Attack against Deep Neural Networks without Model Modification. 9134-9142 - Shuoqian Wang
, Mufeng Zhu
, Na Li
, Mengbai Xiao
, Yao Liu
:
VQBA: Visual-Quality-Driven Bit Allocation for Low-Latency Point Cloud Streaming. 9143-9151 - Zheming Yang
, Wen Ji
, Qi Guo
, Zhi Wang
:
JAVP: Joint-Aware Video Processing with Edge-Cloud Collaboration for DNN Inference. 9152-9160
Poster Session XII: Multimedia systems -- Transport and Delivery
- Yizong Wang
, Dong Zhao
, Huanhuan Zhang
, Chenghao Huang
, Teng Gao
, Zixuan Guo
, Liming Pang
, Huadong Ma
:
Hermes: Leveraging Implicit Inter-Frame Correlation for Bandwidth-Efficient Mobile Volumetric Video Streaming. 9185-9193 - Fulin Wang
, Qing Li
, Wanxin Shi
, Gareth Tyson
, Yong Jiang
, Lianbo Ma
, Peng Zhang
, Yulong Lan
, Zhicheng Li
:
Reparo: QoE-Aware Live Video Streaming in Low-Rate Networks by Intelligent Frame Recovery. 9194-9204 - Hongbin Lin
, Bolin Chen
, Zhichen Zhang
, Jielian Lin
, Xu Wang
, Tiesong Zhao
:
DeepSVC: Deep Scalable Video Coding for Both Machine and Human Vision. 9205-9214 - Chaoyang Li
, Rui-Xiao Zhang
, Tianchi Huang
, Lianchen Jia
, Lifeng Sun
:
Concerto: Client-server Orchestration for Real-Time Video Analytics. 9215-9223 - Mingxuan Yan
, Yi Wang
, Xuedou Xiao
, Zhiqing Luo
, Jianhua He
, Wei Wang
:
Think before You Leap: Content-Aware Low-Cost Edge-Assisted Video Semantic Segmentation. 9224-9233 - Haiping Wang
, Zhenhua Yu
, Ruixiao Zhang
, Siping Tao
, Hebin Yu
, Shu Shi
:
TwinStar: A Practical Multi-path Transmission Framework for Ultra-Low Latency Video Delivery. 9234-9242 - Sergi Fernández
, Mario Montagud
, David Rincón
, Juame Moragues
, Gianluca Cernigliaro
:
Addressing Scalability for Real-time Multiuser Holo-portation: Introducing and Assessing a Multipoint Control Unit (MCU) for Volumetric Video. 9243-9251 - Zhichen Zhang
, Bolin Chen
, Hongbin Lin
, Jielian Lin
, Xu Wang
, Tiesong Zhao
:
ELFIC: A Learning-based Flexible Image Codec with Rate-Distortion-Complexity Optimization. 9252-9261 - Yueheng Li
, Zicheng Zhang
, Hao Chen
, Zhan Ma
:
Mamba: Bringing Multi-Dimensional ABR to WebRTC. 9262-9270 - Xiaodong Yang
, Yiting Shao
, Shan Liu
, Thomas H. Li
, Ge Li
:
PDE-based Progressive Prediction Framework for Attribute Compression of 3D Point Clouds. 9271-9281
Brave New Ideas Session
- Zijie Ye
, Jia Jia
, Junliang Xing
:
Semantics2Hands: Transferring Hand Motion Semantics between Avatars. 9282-9290 - Danni Xu
, Shaojing Fan
, Mohan S. Kankanhalli
:
Combating Misinformation in the Era of Generative AI Models. 9291-9298 - Qi Yang
, Marlo Ongpin
, Sergey I. Nikolenko
, Alfred Huang
, Aleksandr Farseev
:
Against Opacity: Explainable AI and Large Language Models for Effective Digital Advertising. 9299-9305 - Federico Betti, Jacopo Staiano, Lorenzo Baraldi, Lorenzo Baraldi, Rita Cucchiara
, Nicu Sebe
:
Let's ViCE! Mimicking Human Cognitive Behavior in Image Generation Evaluation. 9306-9312 - Junchen Zhu
, Huan Yang
, Huiguo He
, Wenjing Wang
, Zixi Tuo
, Wen-Huang Cheng
, Lianli Gao
, Jingkuan Song
, Jianlong Fu
:
MovieFactory: Automatic Movie Creation from Text using Large Generative Models for Language and Images. 9313-9319 - Alexander Martin
, Haitian Zheng
, Jie An
, Jiebo Luo
:
Jurassic World Remake: Bringing Ancient Fossils Back to Life via Zero-Shot Long Image-to-Image Translation. 9320-9328 - Zihao Wu
, Xin Wang
, Hong Chen
, Kaidong Li
, Yi Han
, Lifeng Sun
, Wenwu Zhu
:
Diff4Rec: Sequential Recommendation with Curriculum-scheduled Diffusion Augmentation. 9329-9335
Doctoral Symposium
- Ahmed Elhagry
:
Text-to-Metaverse: Towards a Digital Twin-Enabled Multimodal Conditional Generative Metaverse. 9336-9339 - Tao Pu
:
Video Scene Graph Generation with Spatial-Temporal Knowledge. 9340-9344 - Keke Zhang
:
Limited-Reference Image Quality Assessment: Paradigms and Discussions. 9345-9349 - Ying Fang
:
Haptic-aware Interaction: Design and Evaluation. 9350-9354 - Yuchen Yang
:
Encoding and Decoding Narratives: Datafication and Alternative Access Models for Audiovisual Archives. 9355-9359 - Sandipan Sarma
:
Zero-Shot Learning for Computer Vision Applications. 9360-9364
Technical Demonstrations
- Qinghao Ye
, Haiyang Xu
, Ming Yan
, Chenlin Zhao
, Junyang Wang
, Xiaoshan Yang
, Ji Zhang
, Fei Huang
, Jitao Sang
, Changsheng Xu
:
mPLUG-Octopus: The Versatile Assistant Empowered by A Modularized End-to-End Multimodal LLM. 9365-9367 - Zheng Zhang
, Songling Chen
, Mixiao Hou
, Guangming Lu
:
Multimodal Emotion Interaction and Visualization Platform. 9368-9370 - Junchen Zhu
, Huan Yang
, Wenjing Wang
, Huiguo He
, Zixi Tuo
, Yongsheng Yu
, Wen-Huang Cheng
, Lianli Gao
, Jingkuan Song
, Jianlong Fu
, Jiebo Luo
:
MobileVidFactory: Automatic Diffusion-Based Social Media Video Generation for Mobile Devices from Text. 9371-9373 - Djamahl Etchegaray
, Yadan Luo
, Zachary FitzChance
, Anthony Southon
, Jinjiang Zhong
:
Open-RoadAtlas: Leveraging VLMs for Road Condition Survey with Real-Time Mobile Auditing. 9374-9375 - Yi Han
, Kaidong Li
, Zihan Song
, Wei Feng
, Xiang Cao
, Shida Guo
, Xin Wang
, Xuguang Duan
, Wenwu Zhu
:
H2V4Sports: Real-Time Horizontal-to-Vertical Video Converter for Sports Lives via Fast Object Detection and Tracking. 9376-9378 - Mizuki Takenawa
, Naoki Sugimoto
, Leslie Wöhler
, Satoshi Ikehata
, Kiyoharu Aizawa
:
360RVW: Fusing Real 360° Videos and Interactive Virtual Worlds. 9379-9381 - Yu-Hsuan Chen
, Chen-Wei Fu
, Wei-Lun Huang
, Ming-Cong Su
, Hsin-Yu Huang
, Andrew Chen
, Tse-Yu Pan
:
SetterVision: Motion-based Tactical Training System for Volleyball Setters in Virtual Reality. 9382-9384 - Hao Wu
, Yueyao Li
, Yan Zhuang
, Xinyao Sun
, Wei Cai
:
BranchClash: A Fully On-Chain Tower Defense Blockchain Game with New Collaboration Mechanism. 9385-9387 - Yuki Konishi
, Panote Siriaraya
, Da Li
, Katsumi Tanaka
, Yukiko Kawai
, Shinsuke Nakajima
:
Development of an Online Marathon System using Acoustic AR. 9388-9389 - Zhanbin Hu
, Jianwu Wu
, Danyang Gao
, Yixu Zhou
, Qiang Zhu
:
CFTF: Controllable Fine-grained Text2Face and Its Human-in-the-loop Suspect Portraits Application. 9390-9392 - Zeyu Jin
, Zixuan Wang
, Qixin Wang
, Jia Jia
, Ye Bai
, Yi Zhao
, Hao Li
, Xiaorui Wang
:
HoloSinger: Semantics and Music Driven Motion Generation with Octahedral Holographic Projection. 9393-9395 - Dongkai Wang
, Shiliang Zhang
, Yaowei Wang
, Yonghong Tian
, Tiejun Huang
, Wen Gao
:
HumVis: Human-Centric Visual Analysis System. 9396-9398 - Yuya Moroto
, Rintaro Yanagi
, Naoki Ogawa
, Kyohei Kamikawa
, Keigo Sakurai
, Ren Togo
, Keisuke Maeda
, Takahiro Ogawa
, Miki Haseyama
:
Personalized Content Recommender System via Non-verbal Interaction Using Face Mesh and Facial Expression. 9399-9401 - Ming Feng
, Kele Xu
, Hengxing Cai
:
IFS-SED: Incremental Few-Shot Sound Event Detection Using Explicit Learning and Calibration. 9402-9404 - Qiuyun Zhang
, Bin Guo
, Lina Yao
, Han Wang
, Ying Zhang
, Zhiwen Yu
:
ALDA: An Adaptive Layout Design Assistant for Diverse Posters throughout the Design Process. 9405-9407 - Yang Chen
, Jingwen Chen
, Yingwei Pan
, Xinmei Tian
, Tao Mei
:
3D Creation at Your Fingertips: From Text or Image to 3D Assets. 9408-9410 - Rintaro Yanagi
, Atsushi Hashimoto
, Naoya Chiba
, Yoshitaka Ushiku
:
Reference-based Dense Pose Estimation via Partial 3D Point Cloud Matching. 9411-9413 - Shanghua Gao
, Zhijie Lin
, Xingyu Xie
, Pan Zhou
, Ming-Ming Cheng
, Shuicheng Yan
:
EditAnything: Empowering Unparalleled Flexibility in Image Editing and Generation. 9414-9416 - Lorenzo Agnolucci
, Alberto Baldrati
, Marco Bertini
, Alberto Del Bimbo
:
Zero-Shot Image Retrieval with Human Feedback. 9417-9419
Grand Challenges
- Xin Zhang
, Wen Xie
, Ziqi Dai
, Jun Rao
, Haokun Wen
, Xuan Luo
, Meishan Zhang
, Min Zhang
:
Finetuning Language Models for Multimodal Question Answering. 9420-9424 - Ruizhe Li
, Jiahao Guo
, Mingxi Li
, Zhengqian Wu
, Chao Liang
:
A Hierarchical Deep Video Understanding Method with Shot-Based Instance Search and Large Language Model. 9425-9429 - Shijian Mao
, Wudong Xi
, Lei Yu
, Gaotian Lü
, Xingxing Xing
, Xingchen Zhou
, Wei Wan
:
Enhanced CatBoost with Stacking Features for Social Media Prediction. 9430-9435 - Zebang Cheng, Yuxiang Lin, Zhaoru Chen, Xiang Li, Shuyi Mao, Fan Zhang, Daijun Ding, Bowen Zhang, Xiaojiang Peng:
Semi-Supervised Multimodal Emotion Recognition with Expression MAE. 9436-9440 - Meng Liu
, Yongqiang Li
, Shuyan Zhai
, Weili Guan
, Liqiang Nie
:
Towards Realistic Conversational Head Generation: A Comprehensive Framework for Lifelike Video Synthesis. 9441-9445 - Kangshuai Guo
, Zhijian Xu
, Shichao Luo
, Feigao Wei
, Yan Wang
, Yanru Zhang
:
Invisible Video Watermark Method Based on Maximum Voting and Probabilistic Superposition. 9446-9450 - Chih-Chung Hsu
, Chia-Ming Lee
, Xiu-Yu Hou
, Chi-Han Tsai
:
Gradient Boost Tree Network based on Extensive Feature Analysis for Popularity Prediction of Social Posts. 9451-9455 - Haoru Chen
, Tianjiao Wan
, Zhimin Lin
, Kele Xu
, Jin Wang
, Huaimin Wang
:
VTQAGen: BART-based Generative Model For Visual Text Question Answering. 9456-9461 - Xiaolu Chen
, Weilong Chen
, Chenghao Huang
, Zhongjian Zhang
, Lixin Duan
, Yanru Zhang
:
Double-Fine-Tuning Multi-Objective Vision-and-Language Transformer for Social Media Popularity Prediction. 9462-9466 - Nicolae-Catalin Ristea
, Radu Tudor Ionescu
:
Cascaded Cross-Modal Transformer for Request and Complaint Detection. 9467-9471 - Qiya Song
, Renwei Dian
, Bin Sun
, Jie Xie
, Shutao Li
:
Multi-scale Conformer Fusion Network for Multi-participant Behavior Analysis. 9472-9476 - Dejan Porjazovski
, Yaroslav Getman
, Tamás Grósz
, Mikko Kurimo
:
Advancing Audio Emotion and Intent Recognition with Large Pre-Trained Models and Bayesian Inference. 9477-9481 - Yanjie Sun
, Kele Xu
, Chaorun Liu
, Yong Dou
, Kun Qian
:
Automatic Audio Augmentation for Requests Sub-Challenge. 9482-9486 - Jun Yu
, Mohan Jing
, Weihao Liu
, Tongxu Luo
, Bingyuan Zhang
, Keda Lu
, Fangyu Lei
, Jianqing Sun
, Jiaen Liang
:
Answer-Based Entity Extraction and Alignment for Visual Text Question Answering. 9487-9491 - Siddhant R. Viksit
, Vinayak Abrol
:
Multi-Layer Acoustic & Linguistic Feature Fusion for ComParE-23 Emotion and Requests Challenge. 9492-9495 - Jun Yu
, Keda Lu
, Mohan Jing
, Ziqi Liang
, Bingyuan Zhang
, Jianqing Sun
, Jiaen Liang
:
Sliding Window Seq2seq Modeling for Engagement Estimation. 9496-9500 - Wenfeng Qin
, Bochao Zou
, Xin Li
, Weiping Wang
, Huimin Ma
:
Micro-Expression Spotting with Face Alignment and Optical Flow. 9501-9505 - Cong Liang
, Jiahe Wang
, Haofan Zhang
, Bing Tang
, Junshan Huang
, Shangfei Wang
, Xiaoping Chen
:
UniFaRN: Unified Transformer for Facial Reaction Generation. 9506-9510 - Payal Mohapatra
, Akash Pandey
, Yueyuan Sui
, Qi Zhu
:
Effect of Attention and Self-Supervised Speech Embeddings on Non-Semantic Speech Tasks. 9511-9515 - Kun Li
, Dan Guo
, Guoliang Chen
, Feiyang Liu
, Meng Wang
:
Data Augmentation for Human Behavior Analysis in Multi-Person Conversations. 9516-9520 - Vu Ngoc Tu
, Van Thong Huynh
, Hyung-Jeong Yang
, Soo-Hyung Kim
, Shah Nawaz
, Karthik Nandakumar
, Muhammad Zaigham Zaheer
:
DCTM: Dilated Convolutional Transformer Model for Multimodal Engagement Estimation in Conversation. 9521-9525 - Surbhi Madan
, Rishabh Jain
, Gulshan Sharma
, Ramanathan Subramanian
, Abhinav Dhall
:
MAGIC-TBR: Multiview Attention Fusion for Transformer-based Bodily Behavior Recognition in Group Settings. 9526-9530 - Haotian Wang
, Yuxuan Xi
, Hang Chen
, Jun Du
, Yan Song
, Qing Wang
, Hengshun Zhou
, Chenxi Wang
, Jiefeng Ma
, Pengfei Hu
, Ya Jiang
, Shi Cheng
, Jie Zhang
, Yuzhe Weng
:
Hierarchical Audio-Visual Information Fusion with Multi-label Joint Decoding for MER 2023. 9531-9535 - Ximi Hoque
, Adamay Mann
, Gulshan Sharma
, Abhinav Dhall
:
BEAMER: Behavioral Encoder to Generate Multiple Appropriate Facial Reactions. 9536-9540 - Jun Yu
, Zhongpeng Cai
, Shenshen Du
, Xiaxin Shen
, Lei Wang
, Fang Gao
:
Efficient Micro-Expression Spotting Based on Main Directional Mean Optical Flow Feature. 9541-9545 - Qifei Li
, Yingming Gao
, Ya Li
:
Mining High-quality Samples from Raw Data and Majority Voting Method for Multimodal Emotion Recognition. 9546-9550 - Runze Liu
, Yaqun Fang
, Fan Yu
, Ruiqi Tian
, Tongwei Ren
, Gangshan Wu
:
Deep Video Understanding with Video-Language Model. 9551-9555 - Haifeng Chen
, Chujia Guo
, Yan Li
, Peng Zhang
, Dongmei Jiang
:
Semi-Supervised Multimodal Emotion Recognition with Class-Balanced Pseudo-labeling. 9556-9560 - Jun Yu
, Ji Zhao
, Guochen Xie
, Fengxin Chen
, Ye Yu
, Liang Peng
, Minglei Li
, Zonghong Dai
:
Leveraging the Latent Diffusion Models for Offline Facial Multiple Appropriate Reactions Generation. 9561-9565 - Wei Dai
:
Improvements on SadTalker-based Approach for ViCo Conversational Head Generation Challenge. 9566-9570 - Sunan Li
, Hailun Lian
, Cheng Lu
, Yan Zhao
, Chuangao Tang
, Yuan Zong
, Wenming Zheng
:
Multimodal Emotion Recognition in Noisy Environment Based on Progressive Label Revision. 9571-9575 - Ke Xu
, Kang Chen
, Licai Sun
, Zheng Lian
, Bin Liu
, Gong Chen
, Haiyang Sun
, Mingyu Xu
, Jianhua Tao
:
Integrating VideoMAE based model and Optical Flow for Micro- and Macro-expression Spotting. 9576-9580 - Zhigang Chang
, Weitai Hu
, Qing Yang
, Shibao Zheng
:
Hierarchical Semantic Perceptual Listener Head Video Generation: A High-performance Pipeline. 9581-9585 - Kangzhong Wang
, MK Michael Cheung
, Youqian Zhang
, Chunxi Yang
, Peter Q. Chen
, Eugene Yujun Fu
, Grace Ngai
:
Unveiling Subtle Cues: Backchannel Detection Using Temporal Multimodal Attention Networks. 9586-9590 - Yuanxing Xu
, Yuting Wei
, Bin Wu
:
Query-aware Long Video Localization and Relation Discrimination for Deep Video Understanding. 9591-9595 - Daoming Zong
, Chaoyue Ding
, Baoxiang Li
, Dinghao Zhou
, Jiakui Li
, Ken Zheng, Qunyan Zhou
:
Building Robust Multimodal Sentiment Recognition via a Simple yet Effective Multimodal Transformer. 9596-9600 - Chunxi Yang
, Kangzhong Wang
, Peter Q. Chen
, MK Michael Cheung
, Youqian Zhang
, Eugene Yujun Fu
, Grace Ngai
:
MultiMediate 2023: Engagement Level Detection using Audio and Video Features. 9601-9605 - Keith Curtis
, George Awad
, Afzal Godil, Ian Soboroff
:
The ACM Multimedia 2023 Deep Video Understanding Grand Challenge. 9606-9609 - Zheng Lian
, Haiyang Sun
, Licai Sun
, Kang Chen, Mingyu Xu, Kexin Wang, Ke Xu, Yu He, Ying Li, Jinming Zhao, Ye Liu, Bin Liu, Jiangyan Yi, Meng Wang, Erik Cambria
, Guoying Zhao
, Björn W. Schuller
, Jianhua Tao
:
MER 2023: Multi-label Learning, Modality Robustness, and Semi-Supervised Learning. 9610-9614 - Mohan Zhou
, Yalong Bai
, Wei Zhang
, Ting Yao
, Tiejun Zhao
, Tao Mei
:
Learning and Evaluating Human Preferences for Conversational Head Generation. 9615-9619 - Siyang Song
, Micol Spitale
, Cheng Luo
, Germán Barquero
, Cristina Palmero
, Sergio Escalera
, Michel F. Valstar
, Tobias Baur
, Fabien Ringeval
, Elisabeth André
, Hatice Gunes
:
REACT2023: The First Multiple Appropriate Facial Reaction Generation Challenge. 9620-9624 - Adrian K. Davison
, Jingting Li
, Moi Hoon Yap
, John See
, Wen-Huang Cheng
, Xiaobai Li
, Xiaopeng Hong
, Su-Jing Wang
:
MEGC2023: ACM Multimedia 2023 ME Grand Challenge. 9625-9629 - Jin Chen
, Yi Yu
, Shien Song
, Xinying Wang
, Jie Yang
, Yifei Xue
, Yizhen Lao
:
ACM Multimedia 2023 Grand Challenge Report: Invisible Video Watermark. 9630-9634 - Björn W. Schuller
, Anton Batliner
, Shahin Amiriparian
, Alexander Barnhill
, Maurice Gerczuk
, Andreas Triantafyllopoulos
, Alice E. Baird
, Panagiotis Tzirakis
, Chris Gagne
, Alan S. Cowen
, Nikola Lackovic
, Marie-José Caraty
, Claude Montacié
:
The ACM Multimedia 2023 Computational Paralinguistics Challenge: Emotion Share & Requests. 9635-9639 - Philipp Müller
, Michal Balazia
, Tobias Baur
, Michael Dietz
, Alexander Heimerl
, Dominik Schiller
, Mohammed Guermal
, Dominike Thomas
, François Brémond
, Jan Alexandersson
, Elisabeth André
, Andreas Bulling
:
MultiMediate '23: Engagement Estimation and Bodily Behaviour Recognition in Social Interactions. 9640-9645 - Kang Chen
, Tianli Zhao
, Xiangqian Wu
:
VTQA2023: ACM Multimedia 2023 Visual Text Question Answering Challenge. 9646-9650 - Bo Wu
, Peiye Liu
, Wen-Huang Cheng
, Bei Liu
, Zhaoyang Zeng
, Jia Wang
, Qiushi Huang
, Jiebo Luo
:
SMP Challenge: An Overview and Analysis of Social Media Prediction Challenge. 9651-9655
Open Source Session
- Jiabei He
, Yang Shen
, Xiu-Shen Wei
, Ye Wu
:
Hawkeye: A PyTorch-based Library for Fine-Grained Image Recognition with Deep Learning. 9656-9659 - Hang Yuan
, Wei Gao
:
OpenFastVC: An Open Source Library for Video Coding Fast Algorithm Implementation. 9660-9663 - Lingxiao He
, Xingyu Liao
, Wu Liu
, Xinchen Liu
, Peng Cheng
, Tao Mei
:
FastReID: A Pytorch Toolbox for General Instance Re-identification. 9664-9667 - Daniele Malitesta
, Giuseppe Gassi
, Claudio Pomo
, Tommaso Di Noia
:
Ducho: A Unified Framework for the Extraction of Multimodal Features in Recommendation. 9668-9671 - Songlin Fan
, Wei Gao
:
Screen-based 3D Subjective Experiment Software. 9672-9675 - Max van Spengler
, Philipp Wirth
, Pascal Mettes
:
HypLL: The Hyperbolic Learning Library. 9676-9679 - Gustavo Leticio
, Lucas Pascotti Valem
, Leonardo Tadeu Lopes
, Daniel Carlos Guimarães Pedronette
:
pyUDLF: A Python Framework for Unsupervised Distance Learning Tasks. 9680-9684 - Wei Gao
, Shangkun Sun
, Huiming Zheng
, Yuyang Wu
, Hua Ye
, Yongchi Zhang
:
OpenDMC: An Open-Source Library and Performance Evaluation for Deep-learning-based Multi-frame Compression. 9685-9688 - Ming Shan Hee
, Aditi Kumaresan
, Nguyen-Khoi Hoang
, Nirmalendu Prakash
, Rui Cao
, Roy Ka-Wei Lee
:
MATK: The Meme Analytical Tool Kit. 9689-9692 - Aaron Keesing
, Yun Sing Koh
, Vithya Yogarajan
, Michael Witbrock
:
Emotion Recognition ToolKit (ERTK): Standardising Tools For Emotion Recognition Research. 9693-9696
Tutorial Summaries
- Xu Tan
:
Revisiting Learning Paradigms for Multimedia Data Generation. 9697-9699 - Debanjan Datta
, Gerald Friedland
:
Efficient Multimedia Computing: Unleashing the Power of AutoML. 9700-9701 - Xin Wang
, Hong Chen
, Wenwu Zhu
:
Disentangled Representation Learning for Multimedia. 9702-9704 - Cem Sazara
:
Diffusion Models in Generative AI. 9705-9706
Panel Summaries
- Irene Viola
, Maria Torres Vega
:
On the Impact of Interactive eXtended Reality: Challenges and Opportunities for Multimedia Research. 9707-9708 - Mohan S. Kankanhalli
, Marcel Worring
:
Panel: Multimodal Large Foundation Models. 9709
Workshop Summaries
- Hideo Saito
, Thomas B. Moeslund
, Rainer Lienhart
:
MMSports '23: 6th International Workshop on Multimedia Content Analysis in Sports. 9710-9712 - Zheng Lian
, Erik Cambria
, Guoying Zhao
, Björn W. Schuller
, Jianhua Tao
:
MRAC'23: 1st International Workshop on Multimodal and Responsible Affective Computing. 9713-9714 - Zhedong Zheng
, Yujiao Shi
, Tingyu Wang
, Jun Liu
, Jianwu Fang
, Yunchao Wei
, Tat-Seng Chua
:
UAVM '23: 2023 Workshop on UAVs in Multimedia: Capturing the World from a New Perspective. 9715-9717 - Valérie Gouet-Brunet
, Ronak Kosti
, Li Weng
:
SUMAC '23: 5th Workshop on the analySis, Understanding and proMotion of heritAge Contents: Advances in Machine Learning, Signal Processing, Multimodal Techniques and Human-machine Interaction. 9718-9720 - Cheng Jin
, Liang He
, Mingli Song
, Rui Wang
:
McGE '23: 1st International Workshop on Multimedia Content Generation and Evaluation: New Methods and Practice. 9721-9722 - Shahin Amiriparian
, Lukas Christ
, Andreas König
, Alan Cowen
, Eva-Maria Meßner
, Erik Cambria
, Björn W. Schuller
:
MuSe 2023 Challenge: Multimodal Prediction of Mimicked Emotions, Cross-Cultural Humour, and Personalised Recognition of Affects. 9723-9725 - Stavroula G. Mougiakakou
, Keiji Yanai
, Dario Allegra
:
MADiMa '23: 8th International Workshop on Multimedia Assisted Dietary Management. 9726-9727 - Irene Viola
, Hadi Amirpour
, Stephanie Arévalo Arboleda
, Maria Torres Vega
:
IXR '23: 2nd International Workshop on Interactive eXtended Reality. 9728-9730 - Mohan S. Kankanhalli
, Ioannis (Yiannis) Patras
, Jianquan Liu
, Yongkang Wong
, Takahiro Komamizu
, Satoshi Yamazaki
, Karen Stephen
, Kajal Kansal
:
NarSUM '23: The 2nd Workshop on User-Centric Narrative Summarization of Long Videos. 9731-9733 - Jingkuan Song
, Wu Liu
, Xinchen Liu
, Dingwen Zhang
, Chaowei Fang
, Hongyuan Zhu
, Wenbing Huang
, John Smith
, Xin Wang
:
HCMA '23: 4th International Workshop on Human-Centric Multimedia Analysis. 9734-9735 - Adrian K. Davison
, Jingting Li
, Moi Hoon Yap
, John See
, Wen-Huang Cheng
, Xiaobai Li
, Xiaopeng Hong
, Su-Jing Wang
:
FME '23: 3rd Facial Micro-Expression Workshop. 9736-9738 - Wei Ji
, Yinwei Wei
, Zhedong Zheng
, Hao Fei
, Tat-Seng Chua
:
Deep Multimodal Learning for Information Retrieval. 9739-9741 - Junxin Chen
, Wei Wang
, Gwanggil Jeon
:
AMC-SME '23: 2023 Workshop on Advanced Multimedia Computing for Smart Manufacturing and Engineering. 9742-9743 - Zheng Wang
, Cheng Long
, Shihao Xu
, Bingzheng Gan
, Wei Shi
, Zhao Cao
, Tat-Seng Chua
:
LGM3A '23: 1st Workshop on Large Generative Models Meet Multimodal Applications. 9744-9745
