


32nd MM 2024: Melbourne, VIC, Australia
- Jianfei Cai, Mohan S. Kankanhalli, Balakrishnan Prabhakaran, Susanne Boll, Ramanathan Subramanian, Liang Zheng, Vivek K. Singh, Pablo César, Lexing Xie, Dong Xu: Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024 - 1 November 2024. ACM 2024, ISBN 979-8-4007-0686-8
Keynote Talks
- Pascale Fung: From Assistants to Agents in the LLM Era. 1
- Benoit Huet: Revolutionizing Lung Cancer Diagnostics with eyonis TM LCS: Cutting-edge AI/ML Technology-based SaMD for Enhanced Patient Care. 2-3
- Judy Kay: Empowering People to Harness and Control their Multimodal Data in Scrutable User models. 4-5
- Jiebo Luo: Large Multimodal Models as Social Multimedia Analysis Engines. 6-7
Oral Session 1: Large Language Models & Applications 1
- Haicheng Liao, Yongkang Li, Chengyue Wang, Yanchen Guan, Kahou Tam, Chunlin Tian, Li Li, Chengzhong Xu, Zhenning Li: When, Where, and What? A Benchmark for Accident Anticipation and Localization with Large Language Models. 8-17
- Haonan Zheng, Xinyang Deng, Wen Jiang, Wenrui Li: A Unified Understanding of Adversarial Vulnerability Regarding Unimodal Models and Vision-Language Pre-training Models. 18-27
- Xiang Fang, Wanlong Fang, Daizong Liu, Xiaoye Qu, Jianfeng Dong, Pan Zhou, Renfu Li, Zichuan Xu, Lixing Chen, Panpan Zheng, Yu Cheng: Not All Inputs Are Valid: Towards Open-Set Video Moment Retrieval using Language. 28-37
- Huishan Ji, Qingyi Si, Zheng Lin, Weiping Wang: Towards Flexible Evaluation for Generative Visual Question Answering. 38-47
- Jiaqi Zhu, Shaofeng Cai, Fang Deng, Beng Chin Ooi, Junran Wu: Do LLMs Understand Visual Anomalies? Uncovering LLM's Capabilities in Zero-shot Anomaly Detection. 48-57
- Yudong Li, Xianxu Hou, Dezhi Zheng, Linlin Shen, Zhe Zhao: FLIP-80M: 80 Million Visual-Linguistic Pairs for Facial Language-Image Pre-Training. 58-67
Oral Session 2: Large Language Models & Applications 2
- Esmée Henrieke Anne de Haas, Lik-Hang Lee, Yiming Huang, Carlos Bermejo, Pan Hui, Zijun Lin: Towards Trustworthy MetaShopping: Studying Manipulative Audiovisual Designs in Virtual-Physical Commercial Platforms. 68-77
- Weiqi Li, Shijie Zhao, Bin Chen, Xinhua Cheng, Junlin Li, Li Zhang, Jian Zhang: ResVR: Joint Rescaling and Viewport Rendering of Omnidirectional Images. 78-87
- Yunqiang Pei, Kaiyue Zhang, Hongrong Yang, Yong Tao, Qihang Tang, Jialei Tang, Guoqing Wang, Zhitao Liu, Ning Xie, Peng Wang, Yang Yang, Hengtao Shen: Improving Interaction Comfort in Authoring Task in AR-HRI through Dynamic Dual-Layer Interaction Adjustment. 88-97
- Yang Lu, Junxian Li, Zhitong Cui, Jiapeng Hu, Yanna Lin, Shijian Luo: Designing Spatial Visualization and Interactions of Immersive Sankey Diagram in Virtual Reality. 98-107
- Zhang Wan, Sheng Tang, Jiawei Wei, Ruize Zhang, Juan Cao: DragEntity: Trajectory Guided Video Generation using Entity and Positional Relationships. 108-116
- Kento Shigyo, Yifan Cao, Kentaro Takahira, Mingming Fan, Huamin Qu: VR-Mediated Cognitive Defusion: A Comparative Study for Managing Negative Thoughts. 117-126
Oral Session 3: Novel Multimedia Applications 1
- Yinxuan Gui, Bin Zhu, Jingjing Chen, Chong Wah Ngo, Yu-Gang Jiang: Navigating Weight Prediction with Diet Diary. 127-136
- Feiyu Chen, Cong Xu, Qi Jia, Yihua Wang, Yuhan Liu, Haotian Zhang, Endong Wang: Egocentric Vehicle Dense Video Captioning. 137-146
- Jinyue Chen, Lingyu Kong, Haoran Wei, Chenglong Liu, Zheng Ge, Liang Zhao, Jianjian Sun, Chunrui Han, Xiangyu Zhang: OneChart: Purify the Chart Structural Extraction via One Auxiliary Token. 147-155
- Jiawei Lin, Zhaoyun Jiang, Jiaqi Guo, Shizhao Sun, Ting Liu, Zijiang Yang, Jian-Guang Lou, Dongmei Zhang: IconDM: Text-Guided Icon Set Expansion Using Diffusion Models. 156-165
- Haipeng Zhou, Hongqiu Wang, Tian Ye, Zhaohu Xing, Jun Ma, Ping Li, Qiong Wang, Lei Zhu: Timeline and Boundary Guided Diffusion Network for Video Shadow Detection. 166-175
- Yichang Qu, Bing Li, Jie Huang, Feng Zhao: Training Pansharpening Networks at Full Resolution Using Degenerate Invariance. 176-185
Oral Session 4: Graph and Diffusion Models
- Jielong Lu, Zhihao Wu, Zhaoliang Chen, Zhiling Cai, Shiping Wang: Towards Multi-view Consistent Graph Diffusion. 186-195
- Liyuan Ma, Xueji Fang, Guo-Jun Qi: Equilibrated Diffusion: Frequency-aware Textual Embedding for Equilibrated Image Customization. 196-204
- Weilun Feng, Chuanguang Yang, Zhulin An, Libo Huang, Boyu Diao, Fei Wang, Yongjun Xu: Relational Diffusion Distillation for Efficient Image Generation. 205-213
- Hongjie Wu, Linchao He, Mingqin Zhang, Dongdong Chen, Kunming Luo, Mengting Luo, Jizhe Zhou, Hu Chen, Jiancheng Lv: Diffusion Posterior Proximal Sampling for Image Restoration. 214-223
- Yiheng Huang, Hui Yang, Chuanchen Luo, Yuxi Wang, Shibiao Xu, Zhaoxiang Zhang, Man Zhang, Junran Peng: StableMoFusion: Towards Robust and Efficient Diffusion-based Motion Generation Framework. 224-232
- Yichi Zhang, Zhuo Chen, Lingbing Guo, Yajing Xu, Wen Zhang, Huajun Chen: Making Large Language Models Perform Better in Knowledge Graph Completion. 233-242
Oral Session 5: Multimodal Models and Applications
- Rishikesh Devanathan, Apoorva Singh, A. S. Poornash, Sriparna Saha: Seeing Beyond Words: Multimodal Aspect-Level Complaint Detection in Ecommerce Videos. 243-252
- Hsiang-Hui Hung, Huu-Phu Do, Yung-Hui Li, Ching-Chun Huang: TimeNeRF: Building Generalizable Neural Radiance Fields across Time from Few-Shot Input Views. 253-262
- Xiaoxuan Shen, Fenghua Yu, Yaqi Liu, Ruxia Liang, Qian Wan, Kai Yang, Jianwen Sun: Revisiting Knowledge Tracing: A Simple and Powerful Model. 263-272
- Peiming Li, Ziyi Wang, Mengyuan Liu, Hong Liu, Chen Chen: ClickDiff: Click to Induce Semantic Contact Map for Controllable Grasp Generation with Diffusion Models. 273-281
- Bochao Liu, Pengju Wang, Weijia Guo, Yong Li, Liansheng Zhuang, Weiping Wang, Shiming Ge: Private Gradient Estimation is Useful for Generative Modeling. 282-290
- Ke Zhu, Liang Zhao, Zheng Ge, Xiangyu Zhang: Self-Supervised Visual Preference Alignment. 291-300
Oral Session 6: Innovations in Medical Imaging and Physiological Measurement
- Yuxin Hong, Xiao Zhang, Xin Zhang, Joey Tianyi Zhou: Evolution-aware VAriance (EVA) Coreset Selection for Medical Image Classification. 301-310
- Ruiqi Wang, Jinyang Huang, Jie Zhang, Xin Liu, Xiang Zhang, Zhi Liu, Peng Zhao, Sigui Chen, Xiao Sun: FacialPulse: An Efficient RNN-based Depression Detection via Temporal Facial Landmarks. 311-320
- Wei Zhang, En Zhu, Juan Chen, YunPeng Li: MDDR: Multi-modal Dual-Attention aggregation for Depression Recognition. 321-329
- Wei Qian, Kun Li, Dan Guo, Bin Hu, Meng Wang: Cluster-Phys: Facial Clues Clustering Towards Efficient Remote Physiological Measurement. 330-339
- Zhenxi Song, Ruihan Qin, Huixia Ren, Zhen Liang, Yi Guo, Min Zhang, Zhiguo Zhang: EEG-MACS: Manifold Attention and Confidence Stratification for EEG-based Cross-Center Brain Disease Diagnosis under Unreliable Annotations. 340-349
- Xueyuan Xu, Li Zhuo, Jinxin Lu, Xia Wu: WSEL: EEG Feature Selection with Weighted Self-expression Learning for Incomplete Multi-dimensional Emotion Recognition. 350-359
Oral Session 7: Imaging, Computer Vision & Graphics
- Yuanbo Wen, Tao Gao, Ting Chen: Unpaired Photo-realistic Image Deraining with Energy-informed Diffusion Model. 360-369
- Zeyu Li, Ruitong Gan, Chuanchen Luo, Yuxi Wang, Jiaheng Liu, Ziwei Zhu, Qing Li, Xucheng Yin, Man Zhang, Zhaoxiang Zhang, Junran Peng: MaterialSeg3D: Segmenting Dense Materials from 2D Priors for 3D Assets. 370-379
- Xiao Han, Yiming Ren, Peishan Cong, Yujing Sun, Jingya Wang, Lan Xu, Yuexin Ma: Gait Recognition in Large-scale Free Environment via Single LiDAR. 380-389
- Tang Tao, Longfei Gao, Guangrun Wang, Yixing Lao, Peng Chen, Hengshuang Zhao, Dayang Hao, Xiaodan Liang, Mathieu Salzmann, Kaicheng Yu: LiDAR-NeRF: Novel LiDAR View Synthesis via Neural Radiance Fields. 390-398
- Mu Chen, Zhedong Zheng, Yi Yang: Transferring to Real-World Layouts: A Depth-aware Framework for Scene Adaptation. 399-408
- Yujian Mo, Yan Wu, Junqiao Zhao, Zhenjie Hou, Weiquan Huang, Yinghao Hu, Jijun Wang, Jun Yan: Sparse Query Dense: Enhancing 3D Object Detection with Pseudo Points. 409-418
Oral Session 8: Multimodal Reasoning & Inference
- Changmeng Zheng, Dayong Liang, Wengyu Zhang, Xiaoyong Wei, Tat-Seng Chua, Qing Li: A Picture Is Worth a Graph: A Blueprint Debate Paradigm for Multimodal Reasoning. 419-428
- Qian Guo, Xinyan Liang, Yuhua Qian, Zhihua Cui, Jie Wen: A Progressive Skip Reasoning Fusion Method for Multi-Modal Classification. 429-437
- Wenxin Xu, Hexin Jiang, Xuefeng Liang: Leveraging Knowledge of Modality Experts for Incomplete Multimodal Learning. 438-446
- Bo Xu, Junzhe Zheng, Jiayuan He, Yuxuan Sun, Hongfei Lin, Liang Zhao, Feng Xia: Generating Multimodal Metaphorical Features for Meme Understanding. 447-455
- Junjie Shi, Caozhi Shang, Zhaobin Sun, Li Yu, Xin Yang, Zengqiang Yan: PASSION: Towards Effective Incomplete Multi-Modal Medical Image Segmentation with Imbalanced Missing Rates. 456-465
- Mengze Li, Kairong Han, Jiahe Xu, Yueying Li, Tao Wu, Zhou Zhao, Jiaxu Miao, Shengyu Zhang, Jingyuan Chen: Cross-modal Observation Hypothesis Inference. 466-475
Oral Session 9: Image, Video, and Multimedia Processing
- Jiyang Li, Lechao Cheng, Zhangye Wang, Tingting Mu, Jingxuan He: LoopGaussian: Creating 3D Cinemagraph with Multi-view Images via Eulerian Motion Field. 476-485
- Chaofeng Chen, Sensen Yang, Haoning Wu, Liang Liao, Zicheng Zhang, Annan Wang, Wenxiu Sun, Qiong Yan, Weisi Lin: Q-Ground: Image Quality Grounding with Large Multi-modality Models. 486-495
- Cheng Ye, Weidong Chen, Jingyu Li, Lei Zhang, Zhendong Mao: Dual-path Collaborative Generation Network for Emotional Video Captioning. 496-505
- Hu Lin, Chengjiang Long, Yifeng Fei, Qianchen Xia, Erwei Yin, Baocai Yin, Xin Yang: Exploring Matching Rates: From Keypoint Selection to Camera Relocalization. 506-514
- Zhihong Zhu, Xuxin Cheng, Zhaorun Chen, Yuyan Chen, Yunyan Zhang, Xian Wu, Yefeng Zheng, Bowen Xing: InMu-Net: Advancing Multi-modal Intent Detection via Information Bottleneck and Multi-sensory Processing. 515-524
- Chaoya Jiang, Hongrui Jia, Mengfan Dong, Wei Ye, Haiyang Xu, Ming Yan, Ji Zhang, Shikun Zhang: Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models. 525-534
Oral Session 10: Speech and Audio in Multimedia Processing
- Zhongxu Wang, Yujia Wang, Mingzhu Li, Hua Huang: ArtSpeech: Adaptive Text-to-Speech Synthesis with Articulatory Representations. 535-544
- Shuai Yu, Xiaoliang He, Ke Chen, Yi Yu: HKDSME: Heterogeneous Knowledge Distillation for Semi-supervised Singing Melody Extraction Using Harmonic Supervision. 545-553
- Yixuan Zhou, Xiaoyu Qin, Zeyu Jin, Shuoyi Zhou, Shun Lei, Songtao Zhou, Zhiyong Wu, Jia Jia: VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling. 554-563
- Navonil Majumder, Chia-Yu Hung, Deepanway Ghosal, Wei-Ning Hsu, Rada Mihalcea, Soujanya Poria: Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization. 564-572
- Xihua Wang, Yuyue Wang, Yihan Wu, Ruihua Song, Xu Tan, Zehua Chen, Hongteng Xu, Guodong Sui: TiVA: Time-Aligned Video-to-Audio Generation. 573-582
- Alejandro Galán-Cuenca, Jose J. Valero-Mas, Juan C. Martinez-Sevilla, Antonio Hidalgo-Centeno, Antonio Pertusa, Jorge Calvo-Zaragoza: MUSCAT: A Multimodal mUSic Collection for Automatic Transcription of Real Recordings and Image Scores. 583-591
Oral Session 11: Emotion & Sentiment
- Jianing Zhao, Jingjing Wang, Yujie Jin, Jiamin Luo, Guodong Zhou: Hawkeye: Discovering and Grounding Implicit Anomalous Sentiment in Recon-videos via Scene-enhanced Video Large Language Model. 592-601
- Daiqing Wu, Dongbao Yang, Yu Zhou, Can Ma: Bridging Visual Affective Gap: Borrowing Textual Knowledge by Learning from Noisy Image-Text Pairs. 602-611
- Tan Yu, Jingjing Wang, Jiawen Wang, Jiamin Luo, Guodong Zhou: Towards Emotion-enriched Text-to-Motion Generation via LLM-guided Limb-level Emotion Manipulating. 612-621
- Wenjie Zheng, Jianfei Yu, Rui Xia: A Unimodal Valence-Arousal Driven Contrastive Learning Framework for Multimodal Multi-Label Emotion Recognition. 622-631
- Xinji Mai, Junxiong Lin, Haoran Wang, Zeng Tao, Yan Wang, Shaoqi Yan, Xuan Tong, Jiawen Yu, Boyang Wang, Ziheng Zhou, Qing Zhao, Shuyong Gao, Wenqiang Zhang: All rivers run into the sea: Unified Modality Brain-Inspired Emotional Central Mechanism. 632-641
- Xin Li, Shangfei Wang, Xuandong Huang: Temporal Enhancement for Video Affective Content Analysis. 642-650
Poster Session 1
- Pei He, Licheng Jiao, Lingling Li, Xu Liu, Fang Liu, Wenping Ma, Shuyuan Yang, Ronghua Shang: Domain Generalization-Aware Uncertainty Introspective Learning for 3D Point Clouds Segmentation. 651-660
- Yi Ma, Peiqi Duan, Yuchen Hong, Chu Zhou, Yu Zhang, Jimmy S. J. Ren, Boxin Shi: Color4E: Event Demosaicing for Full-color Event Guided Image Deblurring. 661-670
- Jiajie Zhu, Xia Du, Jizhe Zhou, Chi-Man Pun, Qizhen Xu, Xiaoyuan Liu: DP-RAE: A Dual-Phase Merging Reversible Adversarial Example for Image Privacy Protection. 671-680
- Xinyi Zhang, Qinpeng Cui, Qiqi Bao, Wenming Yang, Qingmin Liao: Geometry-Guided Diffusion Model with Masked Transformer for Robust Multi-View 3D Human Pose Estimation. 681-690
- Meiqi Cao, Rui Yan, Xiangbo Shu, Guangzhao Dai, Yazhou Yao, Guo-Sen Xie: AdaFPP: Adapt-Focused Bi-Propagating Prototype Learning for Panoramic Activity Recognition. 691-700
- Junsheng Wang, Tiantian Gong, Yan Yan: Partially Aligned Cross-modal Retrieval via Optimal Transport-based Prototype Alignment Learning. 701-709
- Hu Gao, Bowen Ma, Ying Zhang, Jingfan Yang, Jing Yang, Depeng Dang: Learning Enriched Features via Selective State Spaces Model for Efficient Image Deblurring. 710-718
- Hangjun Che, Xinyu Pu, Deqiang Ouyang, Beibei Li: Enhanced Tensorial Self-representation Subspace Learning for Incomplete Multi-view Clustering. 719-728
- Jian-Jun Qiao, Meng-Yu Duan, Xiao Wu, Yu-Pei Song: CartoonNet: Cartoon Parsing with Semantic Consistency and Structure Correlation. 729-737
- Qianyu Guo, Jieji Ren, Haofen Wang, Tianxing Wu, Weifeng Ge, Wenqiang Zhang: Visual-Language Collaborative Representation Network for Broad-Domain Few-Shot Image Classification. 738-747
- Wenzhuo Xu, Kai Chen, Ziyi Gao, Zhipeng Wei, Jingjing Chen, Yu-Gang Jiang: Highly Transferable Diffusion-based Unrestricted Adversarial Attack on Pre-trained Vision-Language Models. 748-757
- Hongzhi Wang, Xiubo Liang, Tao Zhang, Yue Gu, Weidong Geng: PSSD-Transformer: Powerful Sparse Spike-Driven Transformer for Image Semantic Segmentation. 758-767
- Zengsheng Kuang, Changxing Ding, Huan Yao: Learning Context with Priors for 3D Interacting Hand-Object Pose Estimation. 768-777
- Yang Chen, Jingcai Guo, Tian He, Xiaocheng Lu, Ling Wang: Fine-Grained Side Information Guided Dual-Prompts for Zero-Shot Skeleton Action Recognition. 778-786
- Shuo Zhang, Yupeng Zhai, Jilin Mei, Yu Hu: FusionOcc: Multi-Modal Fusion for 3D Occupancy Prediction. 787-796
- Shaokun Wang, Yifan Yu, Yuhang He, Yihong Gong: Enhancing Pre-trained ViTs for Downstream Task Adaptation: A Locality-Aware Prompt Learning Method. 797-806
- Fangming Cui, Xun Yang, Chao Wu, Liang Xiao, Xinmei Tian: Advancing Prompt Learning through an External Layer. 807-816
- Hanzi Wang, Jiamin Ren, Yifeng Ding, Lei Ren, Huixing Jiang, Wei Chen, Fangxiang Feng, Xiaojie Wang: Q-MoE: Connector for MLLMs with Text-Driven Routing. 817-825
- Guozhen Peng, Yunhong Wang, Yuwei Zhao, Shaoxiong Zhang, Annan Li: GLGait: A Global-Local Temporal Receptive Field Network for Gait Recognition in the Wild. 826-835
- Qiang Wang, Yuning Cui, Yawen Li, Yaping Ruan, Ben Zhu, Wenqi Ren: RFFNet: Towards Robust and Flexible Fusion for Low-Light Image Denoising. 836-845
- Minghe Gao, Shuang Chen, Liang Pang, Yuan Yao, Jisheng Dang, Wenqiao Zhang, Juncheng Li, Siliang Tang, Yueting Zhuang, Tat-Seng Chua: Fact : Teaching MLLMs with Faithful, Concise and Transferable Rationales. 846-855
- Yue Zhang, Parisa Kordjamshidi: Narrowing the Gap between Vision and Action in Navigation. 856-865
- Zequn Zeng, Jianqiao Sun, Hao Zhang, Tiansheng Wen, Yudi Su, Yan Xie, Zhengjue Wang, Bo Chen: HICEScore: A Hierarchical Metric for Image Captioning Evaluation. 866-875
- Chen Feng, Georgios Tzimiropoulos, Ioannis Patras: CLIPCleaner: Cleaning Noisy Labels with CLIP. 876-885
- Haochen Zhao, Hui Meng, Deqian Yang, Xiaozheng Xie, Xiaoze Wu, Qingfeng Li, Jianwei Niu: GuidedNet: Semi-Supervised Multi-Organ Segmentation via Labeled Data Guide Unlabeled Data. 886-895
- Kin-Chung Chan, Jun Xiao, Hana Lebeta Goshu, Kin-Man Lam: Point Cloud Densification for 3D Gaussian Splatting from Sparse Input Views. 896-904
- Xiaorui Huang, Gen Luo, Chaoyang Zhu, Bo Tong, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji: Deep Instruction Tuning for Segment Anything Model. 905-914
- Ziyi Wang, Yiming Rong, Deyang Jiang, Haoran Wu, Shiyu Zhou, Bo Xu: CIEASR: Contextual Image-Enhanced Automatic Speech Recognition for Improved Homophone Discrimination. 915-924
- Jinxu Zhang, Yongqi Yu, Yu Zhang: CREAM: Coarse-to-Fine Retrieval and Multi-modal Efficient Tuning for Document VQA. 925-934
- Hebaixu Wang, Hao Zhang, Xunpeng Yi, Xinyu Xiang, Leyuan Fang, Jiayi Ma: TeRF: Text-driven and Region-aware Flexible Visible and Infrared Image Fusion. 935-944
- Ruonan Zhang, Ziwei Shang, Fengjuan Wang, Zhaoqilin Yang, Shan Cao, Yigang Cen, Gaoyun An: Synergetic Prototype Learning Network for Unbiased Scene Graph Generation. 945-954
- Jiawei Zhu, Yishu Liu, Huanjia Zhu, Hui Lin, Yuncheng Jiang, Zheng Zhang, Bingzhi Chen: Combating Visual Question Answering Hallucinations via Robust Multi-Space Co-Debias Learning. 955-964
- Qian Cao, Xu Chen, Ruihua Song, Xiting Wang, Xinting Huang, Yuchen Ren: See or Guess: Counterfactually Regularized Image Captioning. 965-974
- Shuai Li, Fan Qi, Zixin Zhang, Changsheng Xu: Cross-Modal Meta Consensus for Heterogeneous Federated Learning. 975-984
- Xiang He, Xiangxi Liu, Yang Li, Dongcheng Zhao, Guobin Shen, Qingqun Kong, Xin Yang, Yi Zeng: CACE-Net: Co-guidance Attention and Contrastive Enhancement for Effective Audio-Visual Event Localization. 985-993
- Jiabao Guo, Huan Liu, Yizhi Luo, Xueli Hu, Hang Zou, Yuan Zhang, Hui Liu, Bo Zhao: Style-conditional Prompt Token Learning for Generalizable Face Anti-spoofing. 994-1003
- Bowen Chen, Yun Sing Koh, Gillian Dobbie: SSAT-Adapter: Enhancing Vision-Language Model Few-shot Learning with Auxiliary Tasks. 1004-1013
- Haoyu Tong, Xiaoyu Zhang, Yulin Jin, Jian Lou, Kai Wu, Xiaofeng Chen: Balancing Generalization and Robustness in Adversarial Training via Steering through Clean and Adversarial Gradient Directions. 1014-1023
- Shuo Zheng, Yuanjie Dang, Peng Chen, Ruohong Huan, Dongdong Zhao, Ronghua Liang: Saliency-Guided Fine-Grained Temporal Mask Learning for Few-Shot Action Recognition. 1024-1033
- Mengyin Liu, Chao Zhu, Shiqi Ren, Xu-Cheng Yin: Unsupervised Multi-view Pedestrian Detection. 1034-1042
- Zhilin Huang, Yijie Yu, Ling Yang, Chujun Qin, Bing Zheng, Xiawu Zheng, Zikun Zhou, Yaowei Wang, Wenming Yang: Motion-aware Latent Diffusion Models for Video Frame Interpolation. 1043-1052
- Zongxin Ye, Wenyu Li, Sidun Liu, Peng Qiao, Yong Dou: AbsGS: Recovering Fine Details in 3D Gaussian Splatting. 1053-1061
- Ziming Wang, Boxiang Zhang, Ming Ma, Yue Wang, Taoli Du, Wenhui Li: Multi-fineness Boundaries and the Shifted Ensemble-aware Encoding for Point Cloud Semantic Segmentation. 1062-1071
- Yubo Wang, Chaohu Liu, Yanqiu Qu, Haoyu Cao, Deqiang Jiang, Linli Xu: Break the Visual Perception: Adversarial Attacks Targeting Encoded Visual Tokens of Large Vision-Language Models. 1072-1081
- Wenhao Li, Qiangchang Wang, Peng Zhao, Yilong Yin: KNN Transformer with Pyramid Prompts for Few-Shot Learning. 1082-1091
- Lu Zhang, Ke Yan, Shouhong Ding: AlignCLIP: Align Multi Domains of Texts Input for CLIP models with Object-IoU Loss. 1092-1100
- Pengfei Yue, Jianghang Lin, Shengchuan Zhang, Jie Hu, Yilin Lu, Hongwei Niu, Haixin Ding, Yan Zhang, Guannan Jiang, Liujuan Cao, Rongrong Ji: Adaptive Selection based Referring Image Segmentation. 1101-1110
- Shanshan Wang, ALuSi, Xun Yang, Ke Xu, Huibin Tan, Xingyi Zhang: Dual-stream Feature Augmentation for Domain Generalization. 1111-1119
- Yang Liu, Xiang Huang, Minghan Qin, Qinwei Lin, Haoqian Wang: Animatable 3D Gaussian: Fast and High-Quality Reconstruction of Multiple Human Avatars. 1120-1129
- Wei Feng, Dongyuan Wei, Qianqian Wang, Bo Dong, Quanxue Gao: Multi-View Clustering Based on Deep Non-negative Tensor Factorization. 1130-1138
- Aoqi Li, Saihui Hou, Chenye Wang, Qingyuan Cai, Yongzhen Huang: AerialGait: Bridging Aerial and Ground Views for Gait Recognition. 1139-1147
- Zefan Zhang, Weiqi Zhang, Yanhui Li, Tian Bai: Caption-Aware Multimodal Relation Extraction with Mutual Information Maximization. 1148-1157
- Xiaochen Li, Jian Cheng, Ziying Xia, Zichong Chen, Junhao Shi, Zhicheng Dong, Nyima Tashi: TS-ILM: Class Incremental Learning for Online Action Detection. 1158-1167
- Yuxiang Cai, Yongheng Shang, Jianwei Yin: MultiDAN: Unsupervised, Multistage, Multisource and Multitarget Domain Adaptation for Semantic Segmentation of Remote Sensing Images. 1168-1177
- Yu Tong, Weihai Lu, Zhe Zhao, Song Lai, Tong Shi: MMDFND: Multi-modal Multi-Domain Fake News Detection. 1178-1186
- Minghang Zheng, Jiahua Zhang, Qingchao Chen, Yuxin Peng, Yang Liu: ResVG: Enhancing Relation and Semantic Understanding in Multiple Instances for Visual Grounding. 1187-1196
- Shilong Jia, Tingting Wu, Yingying Fang, Tieyong Zeng, Guixu Zhang, Zhi Li: Purified Distillation: Bridging Domain Shift and Category Gap in Incremental Object Detection. 1197-1205
- Haonan Zhang, Pengpeng Zeng, Lianli Gao, Jingkuan Song, Heng Tao Shen: MPT: Multi-grained Prompt Tuning for Text-Video Retrieval. 1206-1214
- Ziwei Zheng, Zechuan Zhang, Yulin Wang, Shiji Song, Gao Huang, Le Yang: Rethinking the Architecture Design for Efficient Generic Event Boundary Detection. 1215-1224
- Jinglun Li, Xinyu Zhou, Kaixun Jiang, Lingyi Hong, Pinxue Guo, Zhaoyu Chen, Weifeng Ge, Wenqiang Zhang: TagOOD: A Novel Approach to Out-of-Distribution Detection via Vision-Language Representations and Class Center Learning. 1225-1234
- Zihan Cao, Xiao Wu, Liang-Jian Deng, Yu Zhong: A Novel State Space Model with Local Enhancement and State Sharing for Image Fusion. 1235-1244
- Zhenyu Yang, Shengsheng Qian, Dizhan Xue, Jiahong Wu, Fan Yang, Weiming Dong, Changsheng Xu: Semantic Editing Increment Benefits Zero-Shot Composed Image Retrieval. 1245-1254
- Zeyu Jin, Jia Jia, Qixin Wang, Kehan Li, Shuoyi Zhou, Songtao Zhou, Xiaoyu Qin, Zhiyong Wu: SpeechCraft: A Fine-Grained Expressive Speech Dataset with Natural Language Description. 1255-1264
- Lihao Liu, Yanqi Cheng, Zhongying Deng, Shujun Wang, Dongdong Chen, Xiaowei Hu, Pietro Liò, Carola-Bibiane Schönlieb, Angelica E. Avilés-Rivero: TrafficMOT: A Challenging Dataset for Multi-Object Tracking in Complex Traffic Scenarios. 1265-1273
- Jing Yang, Xiaowen Jiang, Yuan Gao, Laurence T. Yang, Jieming Yang: Generalize to Fully Unseen Graphs: Learn Transferable Hyper-Relation Structures for Inductive Link Prediction. 1274-1282
- Panjun Liu, Jiacheng Li, Lizhi Wang, Zheng-Jun Zha, Zhiwei Xiong: MLP Embedded Inverse Tone Mapping. 1283-1291
- Mingkai Lin, Wenzhong Li, Xiaobin Hong, Sanglu Lu: Scalable Multi-Source Pre-training for Graph Neural Networks. 1292-1301
- Xiaole Zhao, Linze Li, Chengxing Xie, Xiaoming Zhang, Ting Jiang, Wenjie Lin, Shuaicheng Liu, Tianrui Li: Efficient Single Image Super-Resolution with Entropy Attention and Receptive Field Augmentation. 1302-1310
- Minsu Kim, Jeong Hun Yeo, Se Jin Park, Hyeongseop Rha, Yong Man Ro: Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation. 1311-1320
- Shoutong Luo, Zhengxing Sun, Yi Wang, Yunhan Sun, Chendi Zhu: LDCNet: Long-Distance Context Modeling for Large-Scale 3D Point Cloud Scene Semantic Segmentation. 1321-1330
- Yiming Cui, Liang Li, Jiehua Zhang, Chenggang Yan, Hongkui Wang, Shuai Wang, Heng Jin, Li Wu: Stochastic Context Consistency Reasoning for Domain Adaptive Object Detection. 1331-1340
- Zhuoling Li, Yong Wang, Kaitong Li: FewVS: A Vision-Semantics Integration Framework for Few-Shot Image Classification. 1341-1350
- Yuyan Bu, Qiang Sheng, Juan Cao, Peng Qi, Danding Wang, Jintao Li: FakingRecipe: Detecting Fake News on Short Video Platforms from the Perspective of Creative Process. 1351-1360
- Subash Khanal, Eric Xing, Srikumar Sastry, Aayush Dhakal, Zhexiao Xiong, Adeel Ahmad, Nathan Jacobs: PSM: Learning Probabilistic Embeddings for Multi-scale Zero-Shot Soundscape Mapping. 1361-1369
- Zizhao Wu, Haohan Li, Gongyi Chen, Zhou Yu, Xiaoling Gu, Yigang Wang: 3D Question Answering with Scene Graph Reasoning. 1370-1378
- Liang He, Hongke Wang, Zhen Wu, Jianbing Zhang, Xinyu Dai, Jiajun Chen: Focus & Gating: A Multimodal Approach for Unveiling Relations in Noisy Social Media. 1379-1388
- Yuanchen Wu, Xiaoqiang Li, Jide Li, Kequan Yang, Pinpin Zhu, Shaohua Zhang: DINO is Also a Semantic Guider: Exploiting Class-aware Affinity for Weakly Supervised Semantic Segmentation. 1389-1397
- Dongshuo Yin, Xueting Han, Bin Li, Hao Feng, Jing Bai: Parameter-efficient is not Sufficient: Exploring Parameter, Memory, and Time Efficient Adapter Tuning for Dense Predictions. 1398-1406
- Rongwen Li, Haiyang Hu, Liang Du, Jiarong Chen, Bingbing Jiang, Peng Zhou: One-Stage Fair Multi-View Spectral Clustering. 1407-1416
- Jingfan Tan, Hyunhee Park, Ying Zhang, Tao Wang, Kaihao Zhang, Xiangyu Kong, Pengwen Dai, Zikun Liu, Wenhan Luo: Blind Face Video Restoration with Temporal Consistent Generative Prior and Degradation-Aware Prompt. 1417-1426
- Yinghui Sun, Xingfeng Li, Quansen Sun, Min-Ling Zhang, Zhenwen Ren: Improved Weighted Tensor Schatten p-Norm for Fast Multi-view Graph Clustering. 1427-1436
- Xinjie Jiang, Chenxi Zheng, Xuemiao Xu, Bangzhen Liu, Weiying Zheng, Huaidong Zhang, Shengfeng He: VrdONE: One-stage Video Visual Relation Detection. 1437-1446
- Chenxi Ma, Weimin Tan, Shili Zhou, Bo Yan: Learning Cross-Spectral Prior for Image Super-Resolution. 1447-1455
- Dayu Hu, Suyuan Liu, Jun Wang, Junpu Zhang, Siwei Wang, Xingchen Hu, Xinzhong Zhu, Chang Tang, Xinwang Liu: Reliable Attribute-missing Multi-view Clustering with Instance-level and feature-level Cooperative Imputation. 1456-1466
- Duc Dang Trung Tran, Byeongkeun Kang, Yeejin Lee: MSTA3D: Multi-scale Twin-attention for 3D Instance Segmentation. 1467-1475
- Jingjing Hu, Dan Guo, Kun Li, Zhan Si, Xun Yang, Meng Wang: Maskable Retentive Network for Video Moment Retrieval. 1476-1485
- Junming Hou, Zihan Cao, Naishan Zheng, Xuan Li, Xiaoyu Chen, Xinyang Liu, Xiaofeng Cong, Danfeng Hong, Man Zhou: Linearly-evolved Transformer for Pan-sharpening. 1486-1494
- Zhenhao Yang, Xin Liu, Deqiang Ouyang, Guiduo Duan, Dongyang Zhang, Tao He, Yuan-Fang Li: Towards Open-vocabulary HOI Detection with Calibrated Vision-language Models and Locality-aware Queries. 1495-1504
- Kang Zeng, Hao Shi, Jiacheng Lin, Siyu Li, Jintao Cheng, Kaiwei Wang, Zhiyong Li, Kailun Yang: MambaMOS: LiDAR-based 3D Moving Object Segmentation with Motion-aware State Space Model. 1505-1513
- Tao Tang, Hong Liu, Yingxuan You, Ti Wang, Wenhao Li: ARTS: Semi-Analytical Regressor using Disentangled Skeletal Representations for Human Mesh Recovery from Videos. 1514-1523
- Xudong Lu, Yuqi Jiang, Haiwen Hong, Qi Sun, Cheng Zhuo: DCAFuse: Dual-Branch Diffusion-CNN Complementary Feature Aggregation Network for Multi-Modality Image Fusion. 1524-1533
- Wenbin Zou, Hongxia Gao, Weipeng Yang, Tongtong Liu: Wave-Mamba: Wavelet State Space Model for Ultra-High-Definition Low-Light Image Enhancement. 1534-1543
- Junwei He, Qianqian Xu, Yangbangyan Jiang, Zitai Wang, Yuchen Sun, Qingming Huang: HGOE: Hybrid External and Internal Graph Outlier Exposure for Graph Out-of-Distribution Detection. 1544-1553
- Ke Liang, Lingyuan Meng, Yue Liu, Meng Liu, Wei Wei, Suyuan Liu, Wenxuan Tu, Siwei Wang, Sihang Zhou, Xinwang Liu: Simple Yet Effective: Structure Guided Pre-trained Transformer for Multi-modal Knowledge Graph Reasoning. 1554-1563
- Yuning Ding, Sifan Zhang, Shenglan Liu, Jinrong Zhang, Wenyue Chen, Haifei Duan, Bingcheng Dong, Tao Sun: 2M-AF: A Strong Multi-Modality Framework For Human Action Quality Assessment with Self-supervised Representation Learning. 1564-1572
- Liqiu Chen, Yuqing Huang, Hengyu Li, Zikun Zhou, Zhenyu He: Simplifying Cross-modal Interaction via Modality-Shared Features for RGBT Tracking. 1573-1582
- Can Cui, Siteng Huang, Wenxuan Song, Pengxiang Ding, Min Zhang, Donglin Wang: ProFD: Prompt-Guided Feature Disentangling for Occluded Person Re-Identification. 1583-1592
- Tianqi Wei, Zhi Chen, Zi Huang, Xin Yu: Benchmarking In-the-Wild Multimodal Disease Recognition and A Versatile Baseline. 1593-1601
- Jiaming Lei, Lin Li, Chunping Wang, Jun Xiao, Long Chen: Seeing Beyond Classes: Zero-Shot Grounded Situation Recognition via Language Explainer. 1602-1611
- Jinyong Wen: Gaussian Mutual Information Maximization for Efficient Graph Self-Supervised Learning: Bridging Contrastive-based to Decorrelation-based. 1612-1621
- Haowei Kuang, Yiyang Ma, Wenhan Yang, Zongming Guo, Jiaying Liu: Consistency Guided Diffusion Model with Neural Syntax for Perceptual Image Compression. 1622-1631
- Zhangchi Feng, Richong Zhang, Zhijie Nie: Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives. 1632-1641
- Guanchen Ding, Lingbo Liu, Zhenzhong Chen, Changwen Chen: Domain-Agnostic Crowd Counting via Uncertainty-Guided Style Diversity Augmentation. 1642-1651
- Cunhang Fan, Jingjing Zhang, Hongyu Zhang, Wang Xiang, Jianhua Tao, Xinhui Li, Jiangyan Yi, Dianbo Sui, Zhao Lv: MSFNet: Multi-Scale Fusion Network for Brain-Controlled Speaker Extraction. 1652-1661
- Zhong Ji, Changxu Meng, Yan Zhang, Haoran Wang, Yanwei Pang, Jungong Han: Eliminate Before Align: A Remote Sensing Image-Text Retrieval Framework with Keyword Explicit Reasoning. 1662-1671
- Jinyan Zhang, Mengyuan Liu, Hong Liu, Guoquan Wang, Wenhao Li: APP: Adaptive Pose Pooling for 3D Human Pose Estimation from Videos. 1672-1681
- Jing Bi, Yunlong Tang, Luchuan Song, Ali Vosoughi, Nguyen Nguyen, Chenliang Xu: EAGLE: Egocentric AGgregated Language-video Engine. 1682-1691
- Kai Yin, Jie Shen: Expanded Convolutional Neural Network Based Look-Up Tables for High Efficient Single-Image Super-Resolution. 1692-1700
- Zheng Han, Xiaobin Zhu, Chun Yang, Hongyang Zhou, Jingyan Qin, Xu-Cheng Yin: Exploring Stable Meta-Optimization Patterns via Differentiable Reinforcement Learning for Few-Shot Classification. 1701-1710
- Yixin Guo, Yu Liu, Jianghao Li, Weimin Wang, Qi Jia: Unseen No More: Unlocking the Potential of CLIP for Generative Zero-shot HOI Detection. 1711-1720
- Jiangbin Zheng, Han Zhang, Qianqing Xu, An-Ping Zeng, Stan Z. Li: MetaEnzyme: Meta Pan-Enzyme Learning for Task-Adaptive Redesign. 1721-1730
- Yiming Zhong, Xiaolin Zhang, Yao Zhao, Yunchao Wei: DreamLCM: Towards High Quality Text-to-3D Generation via Latent Consistency Model. 1731-1740
- Anna Zhu, Ke Xiao, Bo Zhou, Runmin Wang: Trust Prophet or Not? Taking a Further Verification Step toward Accurate Scene Text Recognition. 1741-1750
- Gongli Xi, Ye Tian, Mengyu Yang, Lanshan Zhang, Xirong Que, Wendong Wang: Global Patch-wise Attention is Masterful Facilitator for Masked Image Modeling. 1751-1760
- Chenghao Deng, Haote Xu, Xiaolu Chen, Haodi Xu, Xiaotong Tu, Xinghao Ding, Yue Huang: SimCLIP: Refining Image-Text Alignment with Simple Prompts for Zero-/Few-shot Anomaly Detection. 1761-1770
- Yuanhe Tian, Fei Xia, Yan Song: Diffusion Networks with Task-Specific Noise Control for Radiology Report Generation. 1771-1780
- Yun Xing, Qing Guo, Xiaofeng Cao, Ivor W. Tsang, Lei Ma: MetaRepair: Learning to Repair Deep Neural Networks from Repairing Experiences. 1781-1790
- Xingtao Wang, Xianqi Zhang, Wenxue Cui, Ruiqin Xiong, Xiaopeng Fan, Debin Zhao: Mesh Denoising Using Filtering Coefficients Jointly Aware of Noise and Geometry. 1791-1799
- Yan Zhuang, Yanru Zhang, Zheng Hu, Xiaoyue Zhang, Jiawen Deng, Fuji Ren: GLoMo: Global-Local Modal Fusion for Multimodal Sentiment Analysis. 1800-1809
- Yuhui Wu, Guoqing Wang, Zhiwen Wang, Yang Yang, Tianyu Li, Malu Zhang, Chongyi Li, Heng Tao Shen: JoReS-Diff: Joint Retinex and Semantic Priors in Diffusion Model for Low-light Image Enhancement. 1810-1818
- Zichen Wen, Tianyi Wu, Yazhou Ren, Yawen Ling, Chenhang Cui, Xiaorong Pu, Lifang He: Dual-Optimized Adaptive Graph Reconstruction for Multi-View Graph Clustering. 1819-1828
- Xiaobin Lu, Xiaobin Hu, Jun Luo, Ben Zhu, Yaping Ruan, Wenqi Ren: 3D Priors-Guided Diffusion for Blind Face Restoration. 1829-1838
- Hao Wu, Likun Zhang, Shucheng Li, Fengyuan Xu
, Sheng Zhong
:
CoAst: Validation-Free Contribution Assessment for Federated Learning based on Cross-Round Valuation. 1839-1847 - Kang Xia
, Wenzhong Li
, Yimiao Shao
, Sanglu Lu
:
Vi2ACT: Video-enhanced Cross-modal Co-learning with Representation Conditional Discriminator for Few-shot Human Activity Recognition. 1848-1856 - Seonggwan Ko
, Yeong Jun Koh
, Donghyeon Cho
:
Reference-based Burst Super-resolution. 1857-1865 - Yi Zhang, Zhefeng Wang, Rui Hu, Xinyu Duan, Yi Zheng, Baoxing Huai, Jiarun Han, Jitao Sang:
Poisoning for Debiasing: Fair Recognition via Eliminating Bias Uncovered in Data Poisoning. 1866-1874 - Dizhan Xue
, Shengsheng Qian
, Changsheng Xu
:
Few-Shot Multimodal Explanation for Visual Question Answering. 1875-1884 - Jingtao Wang
, Zechao Li
:
3DPCP-Net: A Lightweight Progressive 3D Correspondence Pruning Network for Accurate and Efficient Point Cloud Registration. 1885-1894 - Jiawei Ge
, Jiuxin Cao
, Xuelin Zhu
, Xinyu Zhang
, Chang Liu
, Kun Wang
, Bo Liu
:
Consistencies are All You Need for Semi-supervised Vision-Language Tracking. 1895-1904 - Zhen Zou
, Hu Yu
, Jie Huang
, Feng Zhao
:
FreqMamba: Viewing Mamba from a Frequency Perspective for Image Deraining. 1905-1914 - Zhida Zhao
, Jia Li
, Lijun Wang
, Yifan Wang
, Huchuan Lu
:
MaskMentor: Unlocking the Potential of Masked Self-Teaching for Missing Modality RGB-D Semantic Segmentation. 1915-1923 - Linli Yao
, Yuanmeng Zhang
, Ziheng Wang
, Xinglin Hou
, Tiezheng Ge
, Yuning Jiang
, Xu Sun
, Qin Jin
:
Edit As You Wish: Video Caption Editing with Multi-grained User Control. 1924-1933 - Wenlin Li
, Yucheng Xu
, Xiaoqing Zheng
, Suoya Han
, Jun Wang
, Xiaobo Sun
:
Dual Advancement of Representation Learning and Clustering for Sparse and Noisy Images. 1934-1942 - Zhiwei Hao
, Zhongyu Xiao
, Yong Luo
, Jianyuan Guo
, Jing Wang
, Li Shen
, Han Hu
:
PrimKD: Primary Modality Guided Multimodal Fusion for RGB-D Semantic Segmentation. 1943-1951 - Kaixin Shen
, Ruijie Quan
, Linchao Zhu
, Jun Xiao
, Yi Yang:
Neural Interaction Energy for Multi-Agent Trajectory Prediction. 1952-1960 - Hao Gu
, Jiangyan Yi
, Chenglong Wang
, Yong Ren
, Jianhua Tao
, Xinrui Yan
, Yujie Chen
, Xiaohui Zhang
:
Utilizing Speaker Profiles for Impersonation Audio Detection. 1961-1970 - Zejun Li
, Ye Wang
, Mengfei Du
, Qingwen Liu
, Binhao Wu
, Jiwen Zhang
, Chengxing Zhou
, Zhihao Fan
, Jie Fu
, Jingjing Chen, Zhongyu Wei
, Xuanjing Huang
:
ReForm-Eval: Evaluating Large Vision Language Models via Unified Re-Formulation of Task-Oriented Benchmarks. 1971-1980 - Jiankang Chen
, Ling Deng
, Zhiyong Gan
, Wei-Shi Zheng
, Ruixuan Wang
:
FodFoM: Fake Outlier Data by Foundation Models Creates Stronger Visual Out-of-Distribution Detector. 1981-1990 - Xudong Wang
, Weihong Ren
, Xi'ai Chen
, Huijie Fan
, Yandong Tang
, Zhi Han
:
Uni-YOLO: Vision-Language Model-Guided YOLO for Robust and Fast Universal Detection in the Open World. 1991-2000 - Junliu Zhong
, Zhiyi Li
, Dan Xiang
, Maotang Han
, Changsheng Li
, Yanfen Gan
:
A Lightweight Multi-domain Multi-attention Progressive Network for Single Image Deraining. 2001-2010 - Weijia Zhang
, Dongnan Liu
, Weidong Cai
, Chao Ma
:
Cross-View Consistency Regularisation for Knowledge Distillation. 2011-2020 - Zikai Song
, Ying Tang
, Run Luo
, Lintao Ma
, Junqing Yu
, Yi-Ping Phoebe Chen
, Wei Yang
:
Autogenic Language Embedding for Coherent Point Tracking. 2021-2030 - Yuwen Pan
, Rui Sun
, Yuan Wang
, Tianzhu Zhang
, Yongdong Zhang
:
Rethinking the Implicit Optimization Paradigm with Dual Alignments for Referring Remote Sensing Image Segmentation. 2031-2040 - Zhaopeng Gu
, Bingke Zhu
, Guibo Zhu
, Yingying Chen
, Hao Li
, Ming Tang
, Jinqiao Wang
:
FiLo: Zero-Shot Anomaly Detection by Fine-Grained Description and High-Quality Localization. 2041-2049 - Yi Lei
, Huilin Zhu
, Jingling Yuan
, Guangli Xiang
, Xian Zhong
, Shengfeng He
:
DenseTrack: Drone-Based Crowd Tracking via Density-Aware Motion-Appearance Synergy. 2050-2058 - Fengze Jiang
, Shuling Wang
, Xiaojin Gong
:
Task-Conditional Adapter for Multi-Task Dense Prediction. 2059-2068 - Yitai Lin, Zhijie Wei, Wanfa Zhang, Xiping Lin, Yudi Dai, Chenglu Wen, Siqi Shen, Lan Xu, Cheng Wang:
HmPEAR: A Dataset for Human Pose Estimation and Action Recognition. 2069-2078 - Deji Zhao
, Donghong Han
, Ye Yuan
, Bo Ning
, Mengxiang Li
, Zhongjiang He
, Shuangyong Song
:
AutoGraph: Enabling Visual Context via Graph Alignment in Open Domain Multi-Modal Dialogue Generation. 2079-2088 - Jiaxin Zhang
, Yiqi Wang
, Xihong Yang
, Siwei Wang
, Yu Feng
, Yu Shi
, Ruichao Ren
, En Zhu
, Xinwang Liu
:
Test-Time Training on Graphs with Large Language Models (LLMs). 2089-2098 - Yujia Xiao
, Xi Wang
, Xu Tan
, Lei He
, Xinfa Zhu
, Sheng Zhao
, Tan Lee
:
Contrastive Context-Speech Pretraining for Expressive Text-to-Speech Synthesis. 2099-2107 - Junyu Lin
, Yan Zheng
, Xinyue Chen
, Yazhou Ren
, Xiaorong Pu
, Jing He
:
Cross-view Contrastive Unification Guides Generative Pretraining for Molecular Property Prediction. 2108-2116 - Bo Yuan
, Danpei Zhao
, Zhuoran Liu
, Wentao Li
, Tian Li
:
Continual Panoptic Perception: Towards Multi-modal Incremental Interpretation of Remote Sensing Images. 2117-2126 - Shidi Chen
, Lili Wei
, Liqian Liang
, Congyan Lang
:
Joint Homophily and Heterophily Relational Knowledge Distillation for Efficient and Compact 3D Object Detection. 2127-2135 - Zhiwen Wang
, Yuhui Wu
, Zheng Wang
, Jiwei Wei
, Tianyu Li
, Guoqing Wang
, Yang Yang
, Hengtao Shen
:
Cascaded Adversarial Attack: Simultaneously Fooling Rain Removal and Semantic Segmentation Networks. 2136-2145 - Jiexuan Yan
, Sheng Huang
, Nankun Mu
, Luwen Huangfu
, Bo Liu
:
Category-Prompt Refined Feature Learning for Long-Tailed Multi-Label Image Classification. 2146-2155 - Penglei Sun
, Yaoxian Song
, Xiang Liu
, Xiaofei Yang
, Qiang Wang
, Tiefeng Li
, Yang Yang
, Xiaowen Chu
:
3D Question Answering for City Scene Understanding. 2156-2165 - Qiuyu Kong
, Jiangming Chen
, Jie Jiang
, Zanxi Ruan
, Lai Kang
:
Dual-Branch Fusion with Style Modulation for Cross-Domain Few-Shot Semantic Segmentation. 2166-2174 - Jiaqi Wang
, Lu Lu
, Mingmin Chi
, Jian Chen
:
MDR: Multi-stage Decoupled Relational Knowledge Distillation with Adaptive Stage Selection. 2175-2183 - Xiongjun Zhao
, Zhengyu Liu
, Fen Liu
, Guanting Li
, Yutao Dou
, Shaoliang Peng
:
Report-Concept Textual-Prompt Learning for Enhancing X-ray Diagnosis. 2184-2193 - Jianzhi Lu
, Ruian He
, Shili Zhou
, Weimin Tan
, Bo Yan
:
FacialFlowNet: Advancing Facial Optical Flow Estimation with a Diverse Dataset and a Decomposed Model. 2194-2203 - Wei-Bang Jiang
, Yu-Ting Lan
, Bao-Liang Lu
:
REmoNet: Reducing Emotional Label Noise via Multi-regularized Self-supervision. 2204-2213 - Shuxun Wang, Yunfei Lei, Ziqi Zhang, Wei Liu, Haowei Liu, Li Yang, Bing Li, Wenjuan Li, Jin Gao, Weiming Hu:
NFT1000: A Cross-Modal Dataset For Non-Fungible Token Retrieval. 2214-2222 - Haoyang Su, Wenzhe Du, Xiaoliang Wang, Cam-Tu Nguyen:
Sample Efficiency Matters: Training Multimodal Conversational Recommendation Systems in a Small Data Setting. 2223-2232 - Xincheng Ju
, Dong Zhang
, Suyang Zhu
, Junhui Li
, Shoushan Li
, Guodong Zhou
:
ECFCON: Emotion Consequence Forecasting in Conversations. 2233-2241 - Xiangbo Yin
, Jiangming Shi
, Yachao Zhang
, Yang Lu
, Zhizhong Zhang
, Yuan Xie
, Yanyun Qu
:
Robust Pseudo-label Learning with Neighbor Relation for Unsupervised Visible-Infrared Person Re-Identification. 2242-2251 - Yubo Li
, De Cheng
, Chaowei Fang
, Changzhe Jiao
, Nannan Wang
, Xinbo Gao
:
Disentangling Identity Features from Interference Factors for Cloth-Changing Person Re-identification. 2252-2261 - Bing Wang, Shengsheng Wang, Changchun Li, Renchu Guan, Ximing Li:
Harmfully Manipulated Images Matter in Multimodal Misinformation Detection. 2262-2271 - Wuliang Huang
, Yiqiang Chen
, Xinlong Jiang
, Chenlong Gao
, Qian Chen
, Teng Zhang
, Bingjie Yan
, Yifan Wang
, Jianrong Yang
:
Correlation-Driven Multi-Modality Graph Decomposition for Cross-Subject Emotion Recognition. 2272-2281 - Wenbin Wang
, Liang Ding
, Li Shen
, Yong Luo
, Han Hu
, Dacheng Tao
:
WisdoM: Improving Multimodal Sentiment Analysis by Fusing Contextual World Knowledge. 2282-2291 - Zhanpeng Chen
, Zhihong Zhu
, Wanshi Xu
, Yunyan Zhang
, Xian Wu
, Yefeng Zheng
:
Aspects are Anchors: Towards Multimodal Aspect-based Sentiment Analysis via Aspect-driven Alignment and Refinement. 2292-2300 - Haodong Chen
, Haojian Huang
, Junhao Dong
, Mingzhe Zheng
, Dian Shao
:
FineCLIPER: Multi-modal Fine-grained CLIP for Dynamic Facial Expression Recognition with AdaptERs. 2301-2310 - Honghao Li
, Lei Sang
, Yi Zhang
, Yiwen Zhang
:
SimCEN: Simple Contrast-enhanced Network for CTR Prediction. 2311-2320 - Yuanyuan Shi
, Yunan Li
, Siyu Liang
, Huizhou Chen
, Qiguang Miao
:
MGR-Dark: A Large Multimodal Video Dataset and RGB-IR Benchmark for Gesture Recognition in Darkness. 2321-2330 - Shuanglin Yan
, Jun Liu
, Neng Dong
, Liyan Zhang
, Jinhui Tang
:
Prototypical Prompting for Text-to-image Person Re-identification. 2331-2340 - Kexiang Feng
, Chuanmin Jia
, Siwei Ma
, Wen Gao
:
Unifying Spike Perception and Prediction: A Compact Spike Representation Model Using Multi-scale Correlation. 2341-2349 - Feifei Zhang
, Sijia Qu
, Fan Shi
, Changsheng Xu
:
Overcoming the Pitfalls of Vision-Language Model for Image-Text Retrieval. 2350-2359 - Francesco Tonini
, Nicola Dall'Asen
, Lorenzo Vaquero
, Cigdem Beyan
, Elisa Ricci
:
AL-GTD: Deep Active Learning for Gaze Target Detection. 2360-2369 - Yuxiang Zhou
, Zhe Sun
, Rui Liu
, Yong Chen
, Dell Zhang
:
AVHash: Joint Audio-Visual Hashing for Video Retrieval. 2370-2378 - Xin Jiang
, Hao Tang
, Rui Yan
, Jinhui Tang
, Zechao Li
:
DVF: Advancing Robust and Accurate Fine-Grained Image Retrieval with Retrieval Guidelines. 2379-2388 - Qian Li
, Yucheng Zhou
, Cheng Ji
, Feihong Lu
, Jianian Gong
, Shangguang Wang
, Jianxin Li
:
Multi-Modal Inductive Framework for Text-Video Retrieval. 2389-2398 - Hancheng Zhu
, Ju Shi
, Zhiwen Shao
, Rui Yao
, Yong Zhou
, Jiaqi Zhao
, Leida Li
:
Attribute-Driven Multimodal Hierarchical Prompts for Image Aesthetic Quality Assessment. 2399-2408 - Zeyu Xiao
, Dachun Kai
, Yueyi Zhang
, Xiaoyan Sun
, Zhiwei Xiong
:
Asymmetric Event-Guided Video Super-Resolution. 2409-2418 - Yuanfeng Pan
, Wenkang Su
, Jiangqun Ni
, Qingliang Liu
, Yulin Zhang
, Donghua Jiang
:
Model-Based Non-Independent Distortion Cost Design for Effective JPEG Steganography. 2419-2427 - Xianghu Yue
, Xueyi Zhang
, Yiming Chen
, Chengwei Zhang
, Mingrui Lao
, Huiping Zhuang
, Xinyuan Qian
, Haizhou Li
:
MMAL: Multi-Modal Analytic Learning for Exemplar-Free Audio-Visual Class Incremental Tasks. 2428-2437 - Yuzheng Wang
, Zhaoyu Chen
, Jie Zhang
, Dingkang Yang
, Zuhao Ge
, Yang Liu
, Siao Liu
, Yunquan Sun
, Wenqiang Zhang
, Lizhe Qi
:
Sampling to Distill: Knowledge Transfer from Open-World Data. 2438-2447 - Xi Wu
, Chuang Huang
, Xinliu Liu
, Fei Zhou
, Zhenwen Ren
:
Multiple Kernel Clustering with Shifted Laplacian on Grassmann Manifold. 2448-2456 - Guangyao Li
, Yajun Jian, Yan Yan, Hanzi Wang:
GLATrack: Global and Local Awareness for Open-Vocabulary Multiple Object Tracking. 2457-2466 - Xuze Hao
, Wenqian Ni
, Xuhao Jiang
, Weimin Tan
, Bo Yan
:
Addressing Imbalance for Class Incremental Learning in Medical Image Classification. 2467-2476 - Qiwei Li
, Yuxin Peng
, Jiahuan Zhou
:
Progressive Prototype Evolving for Dual-Forgetting Mitigation in Non-Exemplar Online Continual Learning. 2477-2486 - Fengfan Zhou
, Qianyu Zhou
, Bangjie Yin
, Hui Zheng
, Xuequan Lu
, Lizhuang Ma
, Hefei Ling
:
Rethinking Impersonation and Dodging Attacks on Face Recognition Systems. 2487-2496 - Xin Chen
, Bin Wang
, Jinzheng Jiang
, Kunkun Zhang
, Yongsheng Gao
:
SDePR: Fine-Grained Leaf Image Retrieval with Structural Deep Patch Representation. 2497-2505 - Yuhan Liu
, Qianxin Huang
, Siqi Hui
, Jingwen Fu
, Sanping Zhou
, Kangyi Wu
, Pengna Li
, Jinjun Wang
:
Semantic-aware Representation Learning for Homography Estimation. 2506-2514 - Chen Hui
, Haiqi Zhu
, Shuya Yan
, Shaohui Liu
, Feng Jiang
, Debin Zhao
:
S2-CSNet: Scale-Aware Scalable Sampling Network for Image Compressive Sensing. 2515-2524 - Gangyan Zeng
, Yuan Zhang
, Jin Wei
, Dongbao Yang
, Peng Zhang
, Yiwen Gao
, Xugong Qin
, Yu Zhou
:
Focus, Distinguish, and Prompt: Unleashing CLIP for Efficient and Flexible Scene Text Retrieval. 2525-2534 - Hua Yu
, Weiming Liu
, Jiapeng Bai
, Xu Gui
, Yaqing Hou
, Yew-Soon Ong
, Qiang Zhang
:
Towards Efficient and Diverse Generative Model for Unconditional Human Motion Synthesis. 2535-2544 - Dan Zeng
, Yu Zhu
, Shuiwang Li
, Qijun Zhao
, Qiaomu Shen
, Bo Tang
:
Towards Labeling-free Fine-grained Animal Pose Estimation. 2545-2553 - Rui Xie
, Anlong Ming
, Shuai He
, Yi Xiao
, Huadong Ma
:
"Special Relativity" of Image Aesthetics Assessment: a Preliminary Empirical Perspective. 2554-2563 - Zhengwei Yin
, Mingze Ma
, Guixu Lin
, Yinqiang Zheng
:
Exploring Data Efficiency in Image Restoration: A Gaussian Denoising Case Study. 2564-2573 - Yuntao Wang
, Jinpu Zhang
, Ruonan Wei
, Wenbo Gao
, Yuehuan Wang
:
MFRGN: Multi-scale Feature Representation Generalization Network for Ground-to-Aerial Geo-localization. 2574-2583 - Chang Wu
, Guancheng Quan
, Gang He
, Xin-Quan Lai
, Yunsong Li
, Wenxin Yu
, Xianmeng Lin
, Cheng Yang
:
QS-NeRV: Real-Time Quality-Scalable Decoding with Neural Representation for Videos. 2584-2592 - Xiaoyu Han
, Shunyuan Zheng
, Zonglin Li
, Chenyang Wang
, Xin Sun
, Quanling Meng
:
Shape-Guided Clothing Warping for Virtual Try-On. 2593-2602 - Richen Liu
, Hansheng Wang
, Hailong Wang
, Siru Chen
, Chufan Lai
, Ayush Kumar
, Siming Chen
:
ScaleTraversal: Creating Multi-Scale Biomedical Animation with Limited Hardware Resources. 2603-2612 - Chenrui Wu
, Haishuai Wang
, Xiang Zhang
, Zhen Fang
, Jiajun Bu
:
Spatio-temporal Heterogeneous Federated Learning for Time Series Classification with Multi-view Orthogonal Training. 2613-2622 - Yaopeng Peng
, Milan Sonka
, Danny Z. Chen
:
Group Vision Transformer. 2623-2631 - Zhichao Yang
, Leida Li
, Pengfei Chen
, Jinjian Wu
, Weisheng Dong
:
Semantics-Aware Image Aesthetics Assessment using Tag Matching and Contrastive Ranking. 2632-2641 - Pengcheng Zhang
, Xiaohan Yu
, Xiao Bai
, Jin Zheng
, Xin Ning
:
Prompting Continual Person Search. 2642-2651 - Xiao Zhao
, Xukun Zhang
, Dingkang Yang
, Mingyang Sun
, Mingcheng Li
, Shunli Wang
, Lihua Zhang
:
MaskBEV: Towards A Unified Framework for BEV Detection and Map Segmentation. 2652-2661 - Yong Yang
, Aoqi Zhao
, Shuying Huang
, Xiaozheng Wang
, Yajing Fan
:
SCPSN: Spectral Clustering-based Pyramid Super-resolution Network for Hyperspectral Images. 2662-2670 - Xiangyu Chen
, Yihao Liu
, Yuandong Pu
, Wenlong Zhang
, Jiantao Zhou
, Yu Qiao
, Chao Dong
:
Learning A Low-Level Vision Generalist via Visual Task Prompt. 2671-2680 - Wenxu Shi
, Bochuan Zheng
:
Alleviating the Equilibrium Challenge with Sample Virtual Labeling for Adversarial Domain Adaptation. 2681-2689 - Federico Espositi
, Andrea Bonarini
:
The Room: Design and Embodiment of Spaces as Social Beings. 2690-2699 - Chunjie Ma
, Lina Du
, Zan Gao
, Li Zhuo
, Meng Wang
:
A Coarse to Fine Detection Method for Prohibited Object in X-ray Images Based on Progressive Transformer Decoder. 2700-2708 - Qizhi Xie
, Kun Yuan
, Yunpeng Qu
, Mingda Wu
, Ming Sun
, Chao Zhou
, Jihong Zhu
:
QPT-V2: Masked Image Modeling Advances Visual Scoring. 2709-2718 - Shengguang Wu
, Zhenglun Chen
, Qi Su
:
Knowledge-Aware Artifact Image Synthesis with LLM-Enhanced Prompting and Multi-Source Supervision. 2719-2728 - Yu Feng
, Zhen Tian
, Yifan Zhu
, Zongfu Han
, Haoran Luo
, Guangwei Zhang
, Meina Song
:
CP-Prompt: Composition-Based Cross-modal Prompting for Domain-Incremental Continual Learning. 2729-2738 - Huixiang Wen
, Shizong Yan
, Shan Chang
, Jie Xu
, Hongzi Zhu
, Yanting Zhang
, Bo Li
:
DepthCloak: Projecting Optical Camouflage Patches for Erroneous Monocular Depth Estimation of Vehicles. 2739-2747 - Keming Wu
, Man Yao
, Yuhong Chou
, Xuerui Qiu
, Rui Yang
, Bo Xu
, Guoqi Li
:
RSC-SNN: Exploring the Trade-off Between Adversarial Robustness and Accuracy in Spiking Neural Networks via Randomized Smoothing Coding. 2748-2756 - Xueying Mao
, Xiaoxiao Hu
, Wanli Peng
, Zhenliang Gan
, Zhenxing Qian
, Xinpeng Zhang
, Sheng Li
:
From Covert Hiding To Visual Editing: Robust Generative Video Steganography. 2757-2765 - Wu Ran
, Peirong Ma
, Zhiquan He
, Hong Lu
:
Rainmer: Learning Multi-view Representations for Comprehensive Image Deraining and Beyond. 2766-2775 - Haoxuan Li
, Zhengmao Yang
, Yunshan Ma
, Yi Bin
, Yang Yang
, Tat-Seng Chua
:
MM-Forecast: A Multimodal Approach to Temporal Event Forecasting with Large Language Models. 2776-2785 - Shuyuan Wen
, Bingrui Hu
, Wenchao Li
:
CDEA: Context- and Detail-Enhanced Unsupervised Learning for Domain Adaptive Semantic Segmentation. 2786-2794 - Xitong Ling
, Minxi Ouyang
, Yizhi Wang
, Xinrui Chen
, Renao Yan
, Hongbo Chu
, Junru Cheng
, Tian Guan
, Sufang Tian
, Xiaoping Liu
, Yonghong He
:
Agent Aggregator with Mask Denoise Mechanism for Histopathology Whole Slide Image Analysis. 2795-2803 - Kepeng Xu
, Zijia Ma
, Li Xu
, Gang He
, Yunsong Li
, Wenxin Yu
, Taichu Han
, Cheng Yang
:
An End-to-End Real-World Camera Imaging Pipeline. 2804-2813 - Lijian Yang
, Weisheng Li
, Yucheng Shu
, Jian-Xun Mi
, Yuping Huang
, Bin Xiao
:
ShiftMorph: A Fast and Robust Convolutional Neural Network for 3D Deformable Medical Image Registration. 2814-2823 - Ximing Wu
, Kongyange Zhao
, Xu Chen
, Teng Liang
:
Edge-assisted Real-time Dynamic 3D Point Cloud Rendering for Multi-party Mobile Virtual Reality. 2824-2832 - Nannan Yu
, Tao Ma
, Jiqing Zhang
, Yuji Zhang
, Qirui Bao
, Xiaopeng Wei
, Xin Yang:
Adaptive Vision Transformer for Event-Based Human Pose Estimation. 2833-2841 - Litian Zhang
, Xiaoming Zhang
, Chaozhuo Li
, Ziyi Zhou
, Jiacheng Liu
, Feiran Huang
, Xi Zhang
:
Mitigating Social Hazards: Early Detection of Fake News via Diffusion-Guided Propagation Path Generation. 2842-2851 - Yuzhen Du
, Teng Hu
, Ran Yi
, Lizhuang Ma
:
LD-BFR: Vector-Quantization-Based Face Restoration Model with Latent Diffusion Enhancement. 2852-2860 - Jie Huang
, Zhao-Min Chen, Xiaoqin Zhang, Yisu Ge, Lusi Ye, Guodao Zhang, Huiling Chen:
Label Decoupling and Reconstruction: A Two-Stage Training Framework for Long-tailed Multi-label Medical Image Recognition. 2861-2869 - Chengpei Xu
, Hao Fu
, Long Ma
, Wenjing Jia
, Chengqi Zhang
, Feng Xia
, Xiaoyu Ai
, Binghao Li
, Wenjie Zhang
:
Seeing Text in the Dark: Algorithm and Benchmark. 2870-2878 - Ye Tian
, Zhe Wang
, Jianguo Sun
, Liguo Zhang
:
Time-Frequency Domain Fusion Enhancement for Audio Super-Resolution. 2879-2887 - Lei Liu, Li Liu, Yawen Cui:
Prior-free Balanced Replay: Uncertainty-guided Reservoir Sampling for Long-Tailed Continual Learning. 2888-2897 - Tianjiao Xu
, Aoxuan Chen
, Yuxi Zhao
, Jinfei Gao
, Tian Gan
:
A Chinese Multimodal Social Video Dataset for Controversy Detection. 2898-2907 - Zhe Ji
, Qiansiqi Hu
, Yicheng Zheng
, Liyao Xiang
, Xinbing Wang
:
A Principled Approach to Natural Language Watermarking. 2908-2916 - Hao Wu
, Fan Xu
, Chong Chen
, Xian-Sheng Hua
, Xiao Luo
, Haixin Wang
:
PastNet: Introducing Physical Inductive Biases for Spatio-temporal Video Prediction. 2917-2926 - Jiawei Yao
, Yingxin Lai
, Hongrui Kou
, Tong Wu
, Ruixi Liu
:
QE-BEV: Query Evolution for Bird's Eye View Object Detection in Varied Contexts. 2927-2935 - Xiangrui Liu
, Xinju Wu
, Pingping Zhang
, Shiqi Wang
, Zhu Li
, Sam Kwong
:
CompGS: Efficient 3D Scene Representation via Compressed Gaussian Splatting. 2936-2944 - Shengyu Hao
, Wenhao Chai
, Zhonghan Zhao
, Meiqi Sun
, Wendi Hu
, Jieyang Zhou
, Yixian Zhao
, Qi Li
, Yizhou Wang
, Xi Li
, Gaoang Wang
:
Ego3DT: Tracking Every 3D Object in Ego-centric Videos. 2945-2954 - Junkang Liu
, Fanhua Shang
, Yuanyuan Liu
, Hongying Liu
, Yuangang Li
, YunXiang Gong
:
FedBCGD: Communication-Efficient Accelerated Block Coordinate Gradient Descent for Federated Learning. 2955-2963 - Yiran Cheng
, Bintao He
, Fa Zhang
, Renmin Han
:
Serial Section Microscopy Image Inpainting Guided by Axial Optical Flow. 2964-2972 - Han Fang
, Kejiang Chen
, Yupeng Qiu
, Zehua Ma
, Weiming Zhang
, Ee-Chien Chang
:
DERO: Diffusion-Model-Erasure Robust Watermarking. 2973-2981 - Yin Wang
, Hao Lu
, Ying-Cong Chen
, Li Kuang
, Mengchu Zhou
, Shuiguang Deng
:
rPPG-HiBa: Hierarchical Balanced Framework for Remote Physiological Measurement. 2982-2991 - Huan Chen
, Tingfa Xu
, Zhenxiang Chen
, Peifu Liu
, Huiyan Bai
, Jianan Li
:
Multi-scale Change-Aware Transformer for Remote Sensing Image Change Detection. 2992-3000 - Yinyin Peng
, Yaofei Wang
, Donghui Hu
, Kejiang Chen
, Xianjin Rong
, Weiming Zhang
:
LDStega: Practical and Robust Generative Image Steganography based on Latent Diffusion Models. 3001-3009 - Lei Lu, Yanyue Xie, Wei Jiang, Wei Wang, Xue Lin, Yanzhi Wang:
HybridFlow: Infusing Continuity into Masked Codebook for Extreme Low-Bitrate Image Compression. 3010-3018 - Linfei Li
, Lin Zhang
, Zhong Wang
, Ying Shen
:
GS3LAM: Gaussian Semantic Splatting SLAM. 3019-3027 - Shuang Wang
, Pengyi Hao
, Fuli Wu
, Cong Bai
:
Live on the Hump: Self Knowledge Distillation via Virtual Teacher-Students Mutual Learning. 3028-3036 - Xuhan Zhu
, Yifei Xing
, Ruiping Wang
, Yaowei Wang
, Xiangyuan Lan
:
Calibration for Long-tailed Scene Graph Generation. 3037-3046 - Minjing Yu
, Lingzhi Zeng
, Xinxin Du
, Jenny Sheng
, Qiantian Liao
, Yong-Jin Liu
:
VisHanfu: An Interactive System for the Promotion of Hanfu Knowledge via Cross-Shaped Flat Structure. 3047-3055 - Xiuquan Du
, Jiajia Chen
, Xuejun Zhang
:
CBNet: Cooperation-Based Weakly Supervised Polyp Detection. 3056-3064 - Zeyu Xiao
, Zhihe Lu
, Michael Bi Mi
, Zhiwei Xiong
, Xinchao Wang
:
Unraveling Motion Uncertainty for Local Motion Deblurring. 3065-3074 - Yi Wang
, Ningze Zhong
, Minglin Chen
, Longguang Wang
, Yulan Guo
:
Tangram-Splatting: Optimizing 3D Gaussian Splatting Through Tangram-inspired Shape Priors. 3075-3083 - Jiali Chen
, Yi Cai
, Ruohang Xu
, Jiexin Wang
, Jiayuan Xie
, Qing Li
:
Deconfounded Emotion Guidance Sticker Selection with Causal Inference. 3084-3093 - Zhijian Wu
, Jun Li
, Yang Hu
, Dingjiang Huang
:
Compacter: A Lightweight Transformer for Image Restoration. 3094-3103 - Xiuli Bi
, Yang Hu
, Bo Liu
, Weisheng Li
, Pamela C. Cosman
, Bin Xiao:
PriFU: Capturing Task-Relevant Information Without Adversarial Learning. 3104-3112 - Zan Chen
, Xiao Yu
, Yuanjing Feng
:
Connectivity-based Cerebrovascular Segmentation in Time-of-Flight Magnetic Resonance Angiography. 3113-3121 - Jiawei Chen
, Dingkang Yang
, Yue Jiang
, Mingcheng Li
, Jinjie Wei
, Xiaolu Hou
, Lihua Zhang
:
Efficiency in Focus: LayerNorm as a Catalyst for Fine-tuning Medical Visual Language Models. 3122-3130 - Keke Tang
, Zhensu Wang
, Weilong Peng
, Lujie Huang
, Le Wang
, Peican Zhu
, Wenping Wang
, Zhihong Tian
:
SymAttack: Symmetry-aware Imperceptible Adversarial Attacks on 3D Point Clouds. 3131-3140 - Jie Liang
, Rongjie Wang
, Rui Peng
, Zhe Zhang
, Kaiqiang Xiong
, Ronggang Wang
:
High Fidelity Aggregated Planar Prior Assisted PatchMatch Multi-View Stereo. 3141-3150 - Tao Huang
, Xinjia Ou
, Huali Yang
, Shengze Hu
, Jing Geng
, Junjie Hu
, Zhuoran Xu
:
Remembering is Not Applying: Interpretable Knowledge Tracing for Problem-solving Processes. 3151-3159 - Kien T. Pham
, Jingye Chen
, Qifeng Chen
:
TALE: Training-free Cross-domain Image Composition via Adaptive Latent Manipulation and Energy-guided Optimization. 3160-3169 - Lingyu Xiong
, Xize Cheng
, Jintao Tan
, Xianjia Wu
, Xiandong Li
, Lei Zhu
, Fei Ma
, Minglei Li
, Huang Xu
, Zhihui Hu
:
SegTalker: Segmentation-based Talking Face Generation with Mask-guided Local Editing. 3170-3179 - Changshuo Wang
, Mingzhe Yu
, Lei Wu
, Lei Meng
, Xiang Li
, Xiangxu Meng
:
InstantAS: Minimum Coverage Sampling for Arbitrary-Size Image Generation. 3180-3188 - Du Chen
, Zhengqiang Zhang
, Jie Liang
, Lei Zhang
:
SSL: A Self-similarity Loss for Improving Generative Image Super-resolution. 3189-3198 - Zhengze Xu
, Mengting Chen
, Zhao Wang
, Linyu Xing
, Zhonghua Zhai
, Nong Sang
, Jinsong Lan
, Shuai Xiao
, Changxin Gao
:
Tunnel Try-on: Excavating Spatial-temporal Tunnels for High-quality Virtual Try-on in Videos. 3199-3208 - Lixing Tan
, Shuang Song
, Kangneng Zhou
, Chengbo Duan
, Lanying Wang
, Huayang Ren
, Linlin Liu
, Wei Zhang
, Ruoxiu Xiao
:
Multi-view X-ray Image Synthesis with Multiple Domain Disentanglement from CT Scans. 3209-3218 - Zecheng Wang
, Xinye Li
, Zhanyue Qin
, Chunshan Li
, Zhiying Tu
, Dianhui Chu
, Dianbo Sui
:
Can We Debias Multimodal Large Language Models via Model Editing? 3219-3228 - Shuqi Dai
, Ming-Yu Liu
, Rafael Valle
, Siddharth Gururani
:
ExpressiveSinger: Multilingual and Multi-Style Score-based Singing Voice Synthesis with Expressive Performance Control. 3229-3238 - Dehao Ying
, Fengchang Yu
, Haihua Chen
, Wei Lu
:
DIG: Complex Layout Document Image Generation with Authentic-looking Text for Enhancing Layout Analysis. 3239-3247 - Shibo Hong
, Xuhong Zhang
, Tianyu Du
, Sheng Cheng
, Xun Wang
, Jianwei Yin
:
Cons2Plan: Vector Floorplan Generation from Various Conditions via a Learning Framework based on Conditional Diffusion Models. 3248-3256 - Qihe Pan
, Zhen Zhao
, Zicheng Wang
, Sifan Long
, Yiming Wu
, Wei Ji
, Haoran Liang
, Ronghua Liang
:
Towards Small Object Editing: A Benchmark Dataset and A Training-Free Approach. 3257-3265 - Xiaofeng Mao
, Zhengkai Jiang
, Qilin Wang
, Chencan Fu
, Jiangning Zhang
, Jiafu Wu
, Yabiao Wang
, Chengjie Wang
, Wei Li
, Mingmin Chi
:
MDT-A2G: Exploring Masked Diffusion Transformers for Co-Speech Gesture Generation. 3266-3274 - Jihoon Lee
, Yunhong Min
, Hwidong Kim
, Sangtae Ahn
:
DAFT-GAN: Dual Affine Transformation Generative Adversarial Network for Text-Guided Image Inpainting. 3275-3283 - Boyong He
, Yuxiang Ji
, Zhuoyue Tan
, Liaoni Wu
:
Diffusion Domain Teacher: Diffusion Guided Domain Adaptive Object Detector. 3284-3293 - Weizhi Liu
, Yue Li
, Dongdong Lin
, Hui Tian
, Haizhou Li
:
GROOT: Generating Robust Watermark for Diffusion-Model-Based Audio Synthesis. 3294-3302 - Feihong Lu
, Weiqi Wang
, Yangyifei Luo
, Ziqin Zhu
, Qingyun Sun
, Baixuan Xu
, Haochen Shi
, Shiqi Gao
, Qian Li
, Yangqiu Song
, Jianxin Li
:
Miko: Multimodal Intention Knowledge Distillation from Large Language Models for Social-Media Commonsense Discovery. 3303-3312 - Guojin Zhong
, Yihu Guo
, Jin Yuan
, Qianjun Zhang
, Weili Guan
, Long Chen
:
PROMOTE: Prior-Guided Diffusion Model with Global-Local Contrastive Learning for Exemplar-Based Image Translation. 3313-3322 - Xiangcheng Zhai
, Yingqi Jie
, Xueguang Xie
, Aimin Hao
, Na Jiang
, Yang Gao
:
ANFluid: Animate Natural Fluid Photos base on Physics-Aware Simulation and Dual-Flow Texture Learning. 3323-3331 - Shoubin Yu
, Jacob Zhiyuan Fang
, Jian Zheng
, Gunnar A. Sigurdsson
, Vicente Ordonez
, Robinson Piramuthu
, Mohit Bansal
:
Zero-Shot Controllable Image-to-Video Animation via Motion Decomposition. 3332-3341 - Goirik Chakrabarty
, Aditya Chandrasekar
, Ramya Hebbalaguppe
, Prathosh AP
:
LoMOE: Localized Multi-Object Editing via Multi-Diffusion. 3342-3351 - Yuyan Chen
, Songzhou Yan
, Zhihong Zhu
, Zhixu Li
, Yanghua Xiao
:
XMeCap: Meme Caption Generation with Sub-Image Adaptability. 3352-3361 - Zhenqiang Li
, Jie Li
, Yangjie Cao
, Jiayi Wang
, Runfeng Lv
:
ImageBind3D: Image as Binding Step for Controllable 3D Generation. 3362-3371 - Pengxiang Cai
, Zhiwei Liu
, Guibo Zhu
, Yunfang Niu
, Jinqiao Wang
:
Auto DragGAN: Editing the Generative Image Manifold in an Autoregressive Manner. 3372-3380 - Chengwei Zhang
, Xueyi Zhang
, Xianghu Yue
, Mingrui Lao
, Tao Jiang
, Jiawei Wang
, Fubo Zhang
, Longyong Chen
:
PD-Refiner: An Underlying Surface Inheritance Refiner with Adaptive Edge-Aware Supervision for Point Cloud Denoising. 3381-3390 - Yue Jiang
, Yueming Lyu
, Ziwen He
, Bo Peng
, Jing Dong
:
Mitigating Social Biases in Text-to-Image Diffusion Models via Linguistic-Aligned Attention Guidance. 3391-3400 - Peng Zhou
, Dunbo Cai
, Yujian Du
, Runqing Zhang
, Bingbing Ni
, Jie Qin
, Ling Qian
:
Edit3D: Elevating 3D Scene Editing with Attention-Driven Multi-Turn Interactivity. 3401-3410 - Ziyu Yao
, Xuxin Cheng
, Zhiqi Huang
:
FD2Talk: Towards Generalized Talking Head Generation with Facial Decoupled Diffusion Model. 3411-3420 - Xiaomin Li
, Xu Jia
, Qinghe Wang
, Haiwen Diao
, Mengmeng Ge
, Pengxiang Li
, You He
, Huchuan Lu
:
MoTrans: Customized Motion Transfer with Text-driven Video Diffusion Models. 3421-3430 - Qi Xu
, Yaxin Li
, Xuanye Fang
, Jiangrong Shen
, Qiang Zhang
, Gang Pan
:
Reversing Structural Pattern Learning with Biologically Inspired Knowledge Distillation for Spiking Neural Networks. 3431-3439 - Xiaogang Wang
, Yuhang Cheng
, Ziyang Fan
, Kai Xu
:
Learning to Transfer Heterogeneous Translucent Materials from a 2D Image to 3D Models. 3440-3448 - Zonglin Lyu
, Ming Li
, Jianbo Jiao
, Chen Chen
:
Frame Interpolation with Consecutive Brownian Bridge Diffusion. 3449-3458 - Teng Hu
, Jiangning Zhang
, Ran Yi
, Yating Wang
, Jieyu Weng
, Hongrui Huang
, Yabiao Wang
, Lizhuang Ma
:
COMD: Training-free Video Motion Transfer With Camera-Object Motion Disentanglement. 3459-3468 - Yihao Liu
, Feng Xue
, Anlong Ming
, Mingshuai Zhao
, Huadong Ma
, Nicu Sebe
:
SM4Depth: Seamless Monocular Metric Depth Estimation across Multiple Cameras and Scenes by One Model. 3469-3478 - Qinfeng Li
, Zhiqiang Shen
, Zhenghan Qin
, Yangfan Xie
, Xuhong Zhang
, Tianyu Du
, Sheng Cheng
, Xun Wang
, Jianwei Yin
:
TransLinkGuard: Safeguarding Transformer Models Against Model Stealing in Edge Deployment. 3479-3488 - Tao Wu
, Mengze Li
, Jingyuan Chen
, Wei Ji
, Wang Lin
, Jinyang Gao
, Kun Kuang
, Zhou Zhao
, Fei Wu
:
Semantic Alignment for Multimodal Large Language Models. 3489-3498 - Wenxuan Yang
, Weimin Tan
, Yuqi Sun
, Bo Yan
:
A Medical Data-Effective Learning Benchmark for Highly Efficient Pre-training of Foundation Models. 3499-3508 - Jin Liu
, Huaibo Huang
, Jie Cao
, Ran He
:
ZePo: Zero-Shot Portrait Stylization with Faster Sampling. 3509-3518 - Yiding Li
, Lingyun Yu
, Li Wang
, Hongtao Xie
:
Control-Talker: A Rapid-Customization Talking Head Generation Method for Multi-Condition Control and High-Texture Enhancement. 3519-3527 - Zhaoyang Li
, Zhu Teng
, Baopeng Zhang
, Jianping Fan
:
Boosting Non-causal Semantic Elimination: An Unconventional Harnessing of LVM for Open-World Deepfake Interpretation. 3528-3537 - Zhihao Sun
, Haipeng Fang
, Juan Cao
, Xinying Zhao
, Danding Wang
:
Rethinking Image Editing Detection in the Era of Generative AI Revolution. 3538-3547 - Hongyun Yu
, Zhan Qu
, Qihang Yu
, Jianchuan Chen
, Zhonghua Jiang
, Zhiwen Chen
, Shengyu Zhang
, Jimin Xu
, Fei Wu
, Chengfei Lv
, Gang Yu
:
GaussianTalker: Speaker-specific Talking Head Synthesis via 3D Gaussian Splatting. 3548-3557 - Xingqi Wang
, Xiaoyuan Yi
, Xing Xie
, Jia Jia
:
Embedding an Ethical Mind: Aligning Text-to-Image Synthesis via Lightweight Value Optimization. 3558-3567 - Weili Zeng
, Yichao Yan
, Qi Zhu
, Zhuo Chen
, Pengzhi Chu
, Weiming Zhao
, Xiaokang Yang
:
Infusion: Preventing Customized Text-to-Image Diffusion from Overfitting. 3568-3577 - Yi Liu
, Chengjun Cai
, Xiaoli Zhang
, Xingliang Yuan
, Cong Wang
:
Arondight: Red Teaming Large Vision Language Models with Auto-generated Multi-modal Jailbreak Prompts. 3578-3586 - Yisu Liu
, Jinyang An
, Wanqian Zhang
, Dayan Wu
, Jingzi Gu
, Zheng Lin
, Weiping Wang
:
Disrupting Diffusion: Token-Level Attention Erasure Attack against Diffusion-based Customization. 3587-3596 - Yiren Lu
, Jing Ma
, Yu Yin
:
View-consistent Object Removal in Radiance Fields. 3597-3606 - Shaocong Long
, Qianyu Zhou
, Xiangtai Li
, Xuequan Lu
, Chenhao Ying
, Yuan Luo
, Lizhuang Ma
, Shuicheng Yan
:
DGMamba: Domain Generalization via Generalized State Space Model. 3607-3616 - Wangguandong Zheng
, Haifeng Xia
, Rui Chen
, Libo Sun
, Ming Shao
, Siyu Xia
, Zhengming Ding
:
Sketch3D: Style-Consistent Guidance for Sketch-to-3D Generation. 3617-3626 - Ziyin Zhou
, Ke Sun
, Zhongxi Chen
, Huafeng Kuang
, Xiaoshuai Sun
, Rongrong Ji
:
StealthDiffusion: Towards Evading Diffusion Forensic Detection through Diffusion Model. 3627-3636 - Hong Chen
, Xin Wang
, Yipeng Zhang
, Yuwei Zhou
, Zeyang Zhang
, Siao Tang
, Wenwu Zhu
:
DisenStudio: Customized Multi-Subject Text-to-Video Generation with Disentangled Spatial Control. 3637-3646 - Ziqi Yu
, Jing Zhou
, Zhongyun Bao
, Gang Fu
, Weilei He
, Chao Liang
, Chunxia Xiao
:
CFDiffusion: Controllable Foreground Relighting in Image Compositing via Diffusion Model. 3647-3656 - Hao Wang
, Shangwei Guo
, Jialing He
, Kangjie Chen
, Shudong Zhang
, Tianwei Zhang
, Tao Xiang
:
EvilEdit: Backdooring Text-to-Image Diffusion Models in One Second. 3657-3665 - Haiyan Jiang
, Leiyu Song
, Dongdong Weng
, Zhe Sun
, Huiying Li
, Xiaonuo Dongye
, Zhenliang Zhang
:
In Situ 3D Scene Synthesis for Ubiquitous Embodied Interfaces. 3666-3675 - Haoning Wu
, Xiele Wu
, Chunyi Li
, Zicheng Zhang
, Chaofeng Chen
, Xiaohong Liu
, Guangtao Zhai
, Weisi Lin
:
T2I-Scorer: Quantitative Evaluation on Text-to-Image Generation via Fine-Tuned Large Multi-Modal Models. 3676-3685 - Shiwei Li
, Yingyi Cheng
, Haozhao Wang
, Xing Tang
, Shijie Xu
, Weihong Luo
, Yuhua Li
, Dugang Liu
, Xiuqiang He
, Ruixuan Li
:
Masked Random Noise for Communication-Efficient Federated Learning. 3686-3694 - Sa Yan
, Nuowen Kan
, Chenglin Li
, Wenrui Dai
, Junni Zou
, Hongkai Xiong
:
Task-Oriented Multi-Bitstream Optimization for Image Compression and Transmission via Optimal Transport. 3695-3703 - Tingting Li
, Ziming Zhao
, Jianwei Yin
:
Minerva: Enhancing Quantum Network Performance for High-Fidelity Multimedia Transmission. 3704-3712 - Xiaotong Yu
, Chang-Wen Chen
:
Semantic-aware Next-Best-View for Multi-DoFs Mobile System in Search-and-Acquisition based Visual Perception. 3713-3721 - Yu Chen
, Yanan Wu
, Na Han
, Xiaozhao Fang
, Bingzhi Chen
, Jie Wen
:
Partial Multi-label Learning Based On Near-Far Neighborhood Label Enhancement And Nonlinear Guidance. 3722-3731 - Ruofan Jia
, Weiying Xie
, Jie Lei
, Yunsong Li
:
Adaptive Hierarchical Aggregation for Federated Object Detection. 3732-3740 - Liang Xie
, Wei Gao
, Huiming Zheng
, Ge Li
:
ROI-Guided Point Cloud Geometry Compression Towards Human and Machine Vision. 3741-3750
Oral Session 12: Human-centric and Interactive Multimedia
- Xiyu Wang
, Yufei Wang
, Satoshi Tsutsui
, Weisi Lin
, Bihan Wen
, Alex C. Kot
:
Evolving Storytelling: Benchmarks and Methods for New Character Customization with Diffusion Models. 3751-3760 - Shiyu Liu
, Zibo Zhao
, Yihao Zhi
, Yiqun Zhao
, Binbin Huang
, Shuo Wang
, Ruoyu Wang
, Michael Xuan
, Zhengxin Li
, Shenghua Gao
:
HeroMaker: Human-centric Video Editing with Motion Priors. 3761-3770 - Yunze Liu
, Changxi Chen
, Chenjing Ding
, Li Yi
:
PhysReaction: Physically Plausible Real-Time Humanoid Reaction Synthesis via Forward Dynamics Guided 4D Imitation. 3771-3780 - Wenxuan Wang
, Haonan Bai
, Jen-tse Huang
, Yuxuan Wan
, Youliang Yuan
, Haoyi Qiu
, Nanyun Peng
, Michael R. Lyu
:
New Job, New Gender? Measuring the Social Bias in Image Generation Models. 3781-3789 - Mengzhen Liu
, Mengyu Wang
, Henghui Ding
, Yilong Xu
, Yao Zhao
, Yunchao Wei
:
Segment Anything with Precise Interaction. 3790-3799 - Zhihua Xu
, Tianshui Chen
, Zhijing Yang
, Chunmei Qing
, Yukai Shi
, Liang Lin
:
Self-Supervised Emotion Representation Disentanglement for Speech-Preserving Facial Expression Manipulation. 3800-3808
Oral Session 13: Machine Learning for Multimedia
- Dongyu Xie
, Chaofan Qiao
, Lanyue Liang
, Zhiwen Wang
, Tianyu Li
, Qiao Liu
, Chongyi Li
, Guoqing Wang
, Yang Yang
:
Generalizing ISP Model by Unsupervised Raw-to-raw Mapping. 3809-3817 - Yang Liu
, Daizong Liu
, Zongming Guo
, Wei Hu
:
Cross-Task Knowledge Transfer for Semi-supervised Joint 3D Grounding and Captioning. 3818-3827 - Yang Liu
, Qianqian Xu
, Peisong Wen
, Siran Dai
, Qingming Huang
:
Not All Pairs are Equal: Hierarchical Learning for Average-Precision-Oriented Video Retrieval. 3828-3837 - Dongjie Fu
, Xize Cheng
, Xiaoda Yang
, Hanting Wang
, Zhou Zhao
, Tao Jin
:
Boosting Speech Recognition Robustness to Modality-Distortion with Contrast-Augmented Prompts. 3838-3847 - Xingyu Zhu
, Beier Zhu
, Yi Tan
, Shuo Wang
, Yanbin Hao
, Hanwang Zhang
:
Selective Vision-Language Subspace Projection for Few-shot CLIP. 3848-3857 - Jin Liu
, Bo Wang
, Chuanming Wang
, Huiyuan Fu
, Huadong Ma
:
Learning Exposure Correction in Dynamic Scenes. 3858-3866
Oral Session 14: Multimodal Datasets, Models & Analytics
- Fuqiang Niu
, Zebang Cheng
, Xianghua Fu
, Xiaojiang Peng
, Genan Dai
, Yin Chen
, Hu Huang
, Bowen Zhang
:
Multimodal Multi-turn Conversation Stance Detection: A Challenge Dataset and Effective Model. 3867-3876 - Ruilin Yao
, Shengwu Xiong
, Yichen Zhao
, Yi Rong
:
Visual Grounding with Multi-modal Conditional Adaptation. 3877-3886 - Junhao Xu
, Jingjing Chen, Xue Song
, Feng Han
, Haijun Shan
, Yu-Gang Jiang
:
Identity-Driven Multimedia Forgery Detection via Reference Assistance. 3887-3896 - Bowen Zhao
, Tianhao Cheng
, Yuejie Zhang
, Ying Cheng
, Rui Feng
, Xiaobo Zhang
:
CT2C-QA: Multimodal Question Answering over Chinese Text, Table and Chart. 3897-3906 - Zhanyu Wang
, Longyue Wang
, Zhen Zhao
, Minghao Wu
, Chenyang Lyu
, Huayang Li
, Deng Cai
, Luping Zhou
, Shuming Shi
, Zhaopeng Tu
:
GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation. 3907-3916 - Linmei Hu
, Duokang Wang
, Yiming Pan
, Jifan Yu
, Yingxia Shao
, Chong Feng
, Liqiang Nie
:
NovaChart: A Large-scale Dataset towards Chart Understanding and Generation of Multimodal Large Language Models. 3917-3925
Oral Session 15: Video Applications
- Jiaxu Li
, Songsong Yu
, Yifan Wang
, Lijun Wang
, Huchuan Lu
:
SelM: Selective Mechanism based Audio-Visual Segmentation. 3926-3935 - Yuqing Wang, Lei Meng, Haokai Ma, Yuqing Wang, Haibei Huang, Xiangxu Meng:
Modeling Event-level Causal Representation for Video Classification. 3936-3944 - Te Yang
, Jian Jia
, Bo Wang
, Yanhua Cheng
, Yan Li
, Dongze Hao
, Xipeng Cao
, Quan Chen
, Han Li
, Peng Jiang
, Xiangyu Zhu
, Zhen Lei
:
Spatiotemporal Fine-grained Video Description for Short Videos. 3945-3954 - Yili Li
, Jing Yu
, Keke Gai
, Bang Liu
, Gang Xiong
, Qi Wu
:
T2VIndexer: A Generative Video Indexer for Efficient Text-Video Retrieval. 3955-3963 - Haijie Yang
, Zhenyu Zhang
, Hao Tang
, Jianjun Qian
, Jian Yang
:
ConsistentAvatar: Learning to Diffuse Fully Consistent Talking Head Avatar with Temporal Guidance. 3964-3973 - Zhiyu Zhang
, Guo Lu
, Huanxiong Liang
, Zhengxue Cheng
, Anni Tang
, Li Song
:
Rate-aware Compression for NeRF-based Volumetric Video. 3974-3983
Oral Session 16: Biological and Health Applications
- Jingxiong Li
, Sunyi Zheng
, Chenglu Zhu
, Yuxuan Sun
, Pingyi Chen
, Zhongyi Shui
, Yunlong Zhang
, Honglin Li
, Lin Yang
:
PathUp: Patch-wise Timestep Tracking for Multi-class Large Pathology Image Synthesising Diffusion Model. 3984-3993 - Dian Xie
, Peiang Zhao
, Jiarui Zhang
, Kangqi Wei
, Xiaobao Ni
, Jiong Xia
:
BrainRAM: Cross-Modality Retrieval-Augmented Image Reconstruction from Human Brain Activity. 3994-4003 - Shuo Ma
, Yingwei Zhang
, Qiqi Zhang
, Yiqiang Chen
, Haoran Wang
, Ziyu Jia
:
SleepMG: Multimodal Generalizable Sleep Staging with Inter-modal Balance of Classification and Domain Discrimination. 4004-4013 - Zixuan Gong
, Qi Zhang
, Guangyin Bao
, Lei Zhu
, Yu Zhang
, Ke Liu
, Liang Hu
, Duoqian Miao
:
Lite-Mind: Towards Efficient and Robust Brain Representation Learning. 4014-4023 - Kun Dong
, Jian Xue
, Zehai Niu
, Xing Lan
, Ke Lu
, Qingyuan Liu
, Xiaoyu Qin
:
Realistic Full-Body Motion Generation from Sparse Tracking with State Space Model. 4024-4033 - Usman Naseem
, Adam G. Dunn
, Matloob Khushi
, Jinman Kim
:
Vaccine Misinformation Detection in X using Cooperative Multimodal Framework. 4034-4042
Oral Session 17: Person Modeling and Tracking
- Shizong Yan
, Huixiang Wen
, Shan Chang
, Hongzi Zhu
, Luo Zhou
:
Fooling 3D Face Recognition with One Single 2D Image. 4043-4052 - Fangyi Liu
, Mang Ye
, Bo Du
:
Cloth-aware Augmentation for Cloth-generalized Person Re-identification. 4053-4062 - Zhiqi Pang
, Lingling Zhao
, Chunyu Wang
:
Dual-Resolution Fusion Modeling for Unsupervised Cross-Resolution Person Re-Identification. 4063-4072 - Huilin Tian
, Jingke Meng
, Wei-Shi Zheng
, Yuan-Ming Li
, Junkai Yan
, Yunong Zhang
:
Loc4Plan: Locating Before Planning for Outdoor Vision and Language Navigation. 4073-4081 - Changcheng Xiao
, Qiong Cao
, Zhigang Luo
, Long Lan
:
MambaTrack: A Simple Baseline for Multiple Object Tracking with State Space Model. 4082-4091 - Ling Li
, WenRui Yang
, Xinchun Yu
, Junliang Xing
, Xiao-Ping Zhang
:
Translating Motion to Notation: Hand Labanotation for Intuitive and Comprehensive Hand Movement Documentation. 4092-4100
Poster Session 2
- Xiang Gao
, Jiaying Liu
:
FBSDiff: Plug-and-Play Frequency Band Substitution of Diffusion Features for Highly Controllable Text-Driven Image Translation. 4101-4109 - Wen Yin
, Bin Benjamin Zhu
, Yulai Xie
, Pan Zhou
, Dan Feng
:
Backdoor Attacks on Bimodal Salient Object Detection with RGB-Thermal Data. 4110-4119 - Zhixiang Shen
, Haolan He
, Zhao Kang
:
Balanced Multi-Relational Graph Clustering. 4120-4128 - Jiyuan Wang
, Chunyu Lin
, Lang Nie
, Kang Liao
, Shuwei Shao
, Yao Zhao
:
Digging into Contrastive Learning for Robust Depth Estimation with Diffusion Models. 4129-4137 - Zhuoxiao Chen
, Zixin Wang
, Yadan Luo
, Sen Wang
, Zi Huang
:
DPO: Dual-Perturbation Optimization for Test-time Adaptation in 3D Object Detection. 4138-4147 - Xian Zhang
, Haokun Wen
, Jianlong Wu
, Pengda Qin
, Hui Xue'
, Liqiang Nie
:
Differential-Perceptive and Retrieval-Augmented MLLM for Change Captioning. 4148-4157 - Bingyan Liu
, Chengyu Wang
, Jun Huang
, Kui Jia
:
Attentive Linguistic Tracking in Diffusion Models for Training-free Text-guided Image Editing. 4158-4166 - Changhao He
, Hongyuan Zhu
, Peng Hu
, Xi Peng
:
Robust Variational Contrastive Learning for Partially View-unaligned Clustering. 4167-4176 - Shengxin Chen
, Gen Luo
, Yiyi Zhou
, Xiaoshuai Sun
, Guannan Jiang
, Rongrong Ji
:
QueryMatch: A Query-based Contrastive Learning Framework for Weakly Supervised Visual Grounding. 4177-4186 - Rui Liu
, Yifan Hu
, Yi Ren
, Xiang Yin
, Haizhou Li
:
Generative Expressive Conversational Speech Synthesis. 4187-4196 - Zhien Dai
, Zhaohui Tang
, Hu Zhang
, Can Tian
, Mingjun Pan
, Yongfang Xie
:
Eglcr: Edge Structure Guidance and Scale Adaptive Attention for Iterative Stereo Matching. 4197-4206 - Humen Zhong
, Zhibo Yang
, Zhaohai Li
, Peng Wang
, Jun Tang
, Wenqing Cheng
, Cong Yao
:
VL-Reader: Vision and Language Reconstructor is an Effective Scene Text Recognizer. 4207-4216 - Chaofan Gan
, Yuanpeng Tu
, Yuxi Li
, Weiyao Lin
:
DAC: 2D-3D Retrieval with Noisy Labels via Divide-and-Conquer Alignment and Correction. 4217-4226 - Zhenyu Hou
, Junjun Guo
:
Virtual Visual-Guided Domain-Shadow Fusion via Modal Exchanging for Domain-Specific Multi-Modal Neural Machine Translation. 4227-4235 - Yuxiang Yang
, Lu Wen
, Xinyi Zeng
, Yuanyuan Xu
, Xi Wu
, Jiliu Zhou
, Yan Wang
:
Learning with Alignments: Tackling the Inter- and Intra-domain Shifts for Cross-multidomain Facial Expression Recognition. 4236-4245 - Shuhuang Chen
, Dingjie Fu
, Shiming Chen
, Shuo Ye
, Wenjin Hou
, Xinge You
:
Causal Visual-semantic Correlation for Zero-shot Learning. 4246-4255 - Patrick Steinert
, Stefan Wagenpfeil
, Ingo Frommholz
, Matthias L. Hemmje
:
256 Metaverse Records Dataset. 4256-4263 - Yifeng Xie
, Zhihong Zhu
, Xin Chen
, Zhanpeng Chen
, Zhiqi Huang
:
MoBA: Mixture of Bi-directional Adapter for Multi-modal Sarcasm Detection. 4264-4272 - Jiulin Li
, Mengyu Yang
, Ye Tian
, Lanshan Zhang
, Yongchun Lu
, Jice Liu
, Wendong Wang
:
WaveDN: A Wavelet-based Training-free Zero-shot Enhancement for Vision-Language Models. 4273-4282 - Runkai Zhao
, Heng Wang
, Weidong Cai
:
LaneCMKT: Boosting Monocular 3D Lane Detection with Cross-Modal Knowledge Transfer. 4283-4291 - Wenju Sun
, Qingyong Li
, Siyu Zhang
, Wen Wang
, Yangli-ao Geng
:
Incremental Learning via Robust Parameter Posterior Fusion. 4292-4301 - Tao Jin
, Weicai Yan
, Ye Wang
, Sihang Cai
, Qifan Shuai
, Zhou Zhao
:
Calibrating Prompt from History for Continual Vision-Language Retrieval and Grounding. 4302-4311 - Pengyue Lin
, Ruifan Li
, Yuzhe Ji
, Zhihan Yu
, Fangxiang Feng
, Zhanyu Ma
, Xiaojie Wang
:
Triple Alignment Strategies for Zero-shot Phrase Grounding under Weak Supervision. 4312-4321 - Zhenni Yu
, Xiaoqin Zhang
, Li Zhao
, Yi Bin
, Guobao Xiao
:
Exploring Deeper! Segment Anything Model with Depth Perception for Camouflaged Object Detection. 4322-4330 - Jiawei Wang
, Da Cao
, Shaofei Lu
, Zhanchang Ma
, Junbin Xiao
, Tat-Seng Chua
:
Causal-driven Large Language Models with Faithful Reasoning for Knowledge Question Answering. 4331-4340 - Zijian Yi
, Ziming Zhao
, Zhishu Shen
, Tiehua Zhang
:
Multimodal Fusion via Hypergraph Autoencoder and Contrastive Learning for Emotion Recognition in Conversation. 4341-4348 - Cheng Shen
, Liquan Shen
, Mengyao Li
, Meng Yu
:
EPL-UFLSID: Efficient Pseudo Labels-Driven Underwater Forward-Looking Sonar Images Object Detection. 4349-4357 - Shuiping Gou
, Xin Wang
, Xinlin Wang
, Yunzhi Chen
:
Interpretable Matching of Optical-SAR Image via Dynamically Conditioned Diffusion Models. 4358-4367 - Xiaohuan Ding
, Yangrui Gong
, Tianyi Shi
, Zihang Huang
, Gangwei Xu
, Xin Yang
:
Masked Snake Attention for Fundus Image Restoration with Vessel Preservation. 4368-4376 - Yajie Zhang
, Zhi-An Huang
, Zhiliang Hong
, Songsong Wu
, Jibin Wu
, Kay Chen Tan
:
Mixed Prototype Correction for Causal Inference in Medical Image Classification. 4377-4386 - Yi Zhang
, Ke Yu
, Angelica I. Avilés-Rivero
, Jiyuan Jia
, Yushun Tang
, Zhihai He
:
Training-Free Feature Reconstruction with Sparse Optimization for Vision-Language Models. 4387-4396 - Nan Wang
, Zonglin Di
, Houlin He
, Qingchao Jiang
, Xiaoxiao Li
:
A Simple and Provable Approach for Learning on Noisy Labeled Medical Images. 4397-4405 - Mengmeng Sheng
, Zeren Sun
, Gensheng Pei
, Tao Chen
, Haonan Luo
, Yazhou Yao
:
Enhancing Robustness in Learning with Noisy Labels: An Asymmetric Co-Training Approach. 4406-4415 - Muquan Li
, Dongyang Zhang
, Tao He
, Xiurui Xie
, Yuan-Fang Li
, Ke Qin
:
Towards Effective Data-Free Knowledge Distillation via Diverse Diffusion Augmentation. 4416-4425 - Qiuhui Chen
, Yi Hong
:
SMART: Self-Weighted Multimodal Fusion for Diagnostics of Neurodegenerative Disorders. 4426-4435 - Taoyu Su
, Jiawei Sheng
, Shicheng Wang
, Xinghua Zhang
, Hongbo Xu
, Tingwen Liu
:
IBMEA: Exploring Variational Information Bottleneck for Multi-modal Entity Alignment. 4436-4445 - Zhijun Jia
, Huaying Xue
, Xiulian Peng
, Yan Lu
:
Convert and Speak: Zero-shot Accent Conversion with Minimum Supervision. 4446-4454 - Yihan Zhao
, Wei Xi
, Yuhang Cui
, Gairui Bai
, Xinhui Liu
, Jizhong Zhao
:
CoPL: Parameter-Efficient Collaborative Prompt Learning for Audio-Visual Tasks. 4455-4464 - Junbo Hu
, Zhixin Li
:
Distilled Cross-Combination Transformer for Image Captioning with Dual Refined Visual Features. 4465-4474 - Siyuan Xu
, Guannan Li
, Haofei Song
, Jiansheng Wang
, Yan Wang
, Qingli Li
:
GeNSeg-Net: A General Segmentation Framework for Any Nucleus in Immunohistochemistry Images. 4475-4484 - Ziyi Gao
, Kai Chen
, Zhipeng Wei
, Tingshu Mou
, Jingjing Chen, Zhiyu Tan
, Hao Li
, Yu-Gang Jiang
:
ReToMe-VA: Recursive Token Merging for Video Diffusion-based Unrestricted Adversarial Attack. 4485-4494 - Kunyu Peng
, David Schneider
, Alina Roitberg
, Kailun Yang
, Jiaming Zhang
, Chen Deng
, Kaiyu Zhang
, M. Saquib Sarfraz
, Rainer Stiefelhagen
:
Towards Video-based Activated Muscle Group Estimation in the Wild. 4495-4504 - Rui Xu
, Gaolei Li
, Changze Li
, Zhaohui Yang
, Yuchen Liu
, Mingzhe Chen
:
OSNeRF: On-demand Semantic Neural Radiance Fields for Fast and Robust 3D Object Reconstruction. 4505-4514 - Wenjie Li
, Heng Guo
, Xuannan Liu
, Kongming Liang
, Jiani Hu
, Zhanyu Ma
, Jun Guo
:
Efficient Face Super-Resolution via Wavelet-based Feature Enhancement Network. 4515-4523 - Ruoxi Deng
, Bin Yu
, Jinxuan Lu
, Caixia Zhou
, Zhao-Min Chen
, Jie Hu
:
Advancing Semantic Edge Detection through Cross-Modal Knowledge Learning. 4524-4532 - Jiacheng Zhang
, Jie Wu
, Huafeng Kuang
, Haiming Zhang
, Yuxi Ren
, Weifeng Chen
, Manlin Zhang
, Xuefeng Xiao
, Guanbin Li
:
TreeReward: Improve Diffusion Model via Tree-Structured Feedback Learning. 4533-4542 - Chaomin Shen
, Yaomin Huang
, Haokun Zhu
, Jinsong Fan
, Guixu Zhang
:
Student-Oriented Teacher Knowledge Refinement for Knowledge Distillation. 4543-4552 - Yanshan Zhou
, Pingrui Lai
, Jiaqi Yu
, Yingjie Xiong
, Hua Yang
:
Hydrodynamics-Informed Neural Network for Simulating Dense Crowd Motion Patterns. 4553-4561 - Zhidong Yu
, Zhenbo Shi
, Xiaoman Liu
, Wei Yang
:
PFFAA: Prototype-based Feature and Frequency Alteration Attack for Semantic Segmentation. 4562-4571 - Wenbo Huang
, Jinghui Zhang
, Xuwei Qian
, Zhen Wu
, Meng Wang
, Lei Zhang
:
SOAP: Enhancing Spatio-Temporal Relation and Motion Information Capturing for Few-Shot Action Recognition. 4572-4580 - Xiangyan Qu
, Jing Yu
, Keke Gai
, Jiamin Zhuang
, Yuanmin Tang
, Gang Xiong
, Gaopeng Gou
, Qi Wu
:
Visual-Semantic Decomposition and Partial Alignment for Document-based Zero-Shot Learning. 4581-4590 - Weixiang Han
, Chengjun Cai
, Yu Guo
, Jialiang Peng
:
ERL-MR: Harnessing the Power of Euler Feature Representations for Balanced Multi-modal Learning. 4591-4600 - Luca Rossetto
, Cristina Sarasua
, Abraham Bernstein
:
Estimating the Semantic Density of Visual Media. 4601-4609 - Shaokun Zhang
, Yiran Wu
, Zhonghua Zheng
, Qingyun Wu
, Chi Wang
:
HyperTime: Hyperparameter Optimization for Combating Temporal Distribution Shifts. 4610-4619 - Xiaomeng Chu
, Jiajun Deng
, Guoliang You
, Yifan Duan
, Yao Li
, Yanyong Zhang
:
RayFormer: Improving Query-Based Multi-Camera 3D Object Detection via Ray-Centric Strategies. 4620-4629 - Yi Bin
, Junrong Liao
, Yujuan Ding
, Haoxuan Li
, Yang Yang
, See-Kiong Ng
, Heng Tao Shen
:
Leveraging Weak Cross-Modal Guidance for Coherence Modelling via Iterative Learning. 4630-4639 - Chengyou Jia
, Minnan Luo
, Xiaojun Chang
, Zhuohang Dang
, Mingfei Han
, Mengmeng Wang
, Guang Dai
, Sizhe Dang
, Jingdong Wang
:
Generating Action-conditioned Prompts for Open-vocabulary Video Action Recognition. 4640-4649 - Jialu Zhang
, Xinyi Wang
, Chenglin Yao
, Jianfeng Ren
, Xudong Jiang
:
Visual-linguistic Cross-domain Feature Learning with Group Attention and Gamma-correct Gated Fusion for Extracting Commonsense Knowledge. 4650-4659 - Wenhan Wu
, Ce Zheng
, Zihao Yang
, Chen Chen
, Srijan Das
, Aidong Lu
:
Frequency Guidance Matters: Skeletal Action Recognition by Frequency-Aware Mixed Transformer. 4660-4669 - Xianwei Zhuang
, Xuxin Cheng
, Zhihong Zhu
, Zhanpeng Chen
, Hongxiang Li
, Yuexian Zou
:
Towards Multimodal-augmented Pre-trained Language Models via Self-balanced Expectation-Maximization Iteration. 4670-4679 - Hongze Zhu
, Guoyang Xie
, Chengbin Hou
, Tao Dai
, Can Gao
, Jinbao Wang
, Linlin Shen
:
Towards High-resolution 3D Anomaly Detection via Group-Level Feature Contrastive Learning. 4680-4689 - Kaixiang Wang
, Xiaojian Ding
, Fan Yang
:
Non-Overlapped Multi-View Weak-Label Learning Guided by Multiple Correlations. 4690-4698 - Xin Mei
, Rui Mao
, Xiaoyan Cai
, Libin Yang
, Erik Cambria
:
Medical Report Generation via Multimodal Spatio-Temporal Fusion. 4699-4708 - Guofan Fan
, Zekun Qi
, Wenkai Shi
, Kaisheng Ma
:
Point-GCC: Universal Self-supervised 3D Scene Pre-training via Geometry-Color Contrast. 4709-4718 - Menghao Zhang
, Jingyu Wang
, Qi Qi
, Pengfei Ren
, Haifeng Sun
, Zirui Zhuang
, Huazheng Wang
, Lei Zhang
, Jianxin Liao
:
Video Anomaly Detection via Progressive Learning of Multiple Proxy Tasks. 4719-4728 - Xingyu Zhang
, Siyu Zhao
, Zeen Song
, Huijie Guo
, Jianqi Zhang
, Changwen Zheng
, Wenwen Qiang
:
Not All Frequencies Are Created Equal: Towards a Dynamic Fusion of Frequencies in Time-Series Forecasting. 4729-4737 - Shijie Chen
, Junbao Zhuo
, Xin Li
, Haizhuang Liu
, Rongquan Wang
, Jiansheng Chen
, Huimin Ma
:
CMT: Co-training Mean-Teacher for Unsupervised Domain Adaptation on 3D Object Detection. 4738-4747 - Tianrui Pan
, Jie Liu
, Bohan Wang
, Jie Tang
, Gangshan Wu
:
RAVSS: Robust Audio-Visual Speech Separation in Multi-Speaker Scenarios with Missing Visual Cues. 4748-4756 - Siqi Wang
, Chao Liang
, Yunfan Gao
, Yang Liu
, Jing Li
, Haofen Wang
:
Decoding Urban Industrial Complexity: Enhancing Knowledge-Driven Insights via IndustryScopeGPT. 4757-4765 - Yuanbin Fu
, Jie Ying
, Houlei Lv
, Xiaojie Guo
:
Semi-supervised Camouflaged Object Detection from Noisy Data. 4766-4775 - Bolei Chen
, Jiaxu Kang
, Ping Zhong
, Yixiong Liang
, Yu Sheng
, Jianxin Wang
:
Embodied Contrastive Learning with Geometric Consistency and Behavioral Awareness for Object Navigation. 4776-4785 - Jia-Li Yin
, Menghao Chen
, Jin Han
, Bo-Hao Chen
, Ximeng Liu:
Adversarial Example Quality Assessment: A Large-scale Dataset and Strong Baseline. 4786-4794 - Ye Jing
, Xinpei Zhao
:
DQ-Former: Querying Transformer with Dynamic Modality Priority for Cognitive-aligned Multimodal Emotion Recognition in Conversation. 4795-4804 - Xicong Wang
, Huiyuan Fu
, Jiaxuan Wang
, Xin Wang
, Heng Zhang
, Huadong Ma
:
Exploring in Extremely Dark: Low-Light Video Enhancement with Real Events. 4805-4813 - Qing Zhang
, Haocheng Lv
, Jie Liu
, Zhiyun Chen
, Jianyong Duan
, Hao Wang
, Li He
, Mingying Xu
:
An Entailment Tree Generation Approach for Multimodal Multi-Hop Question Answering with Mixture-of-Experts and Iterative Feedback Mechanism. 4814-4822 - Kangpeng Hu
, Quansen Sun
, Yinghui Sun
, Tao Wang
:
Interactive Segmentation by Considering First-Click Intentional Ambiguity. 4823-4831 - Leqi Shen
, Sicheng Zhao
, Yifeng Zhang
, Hui Chen
, Jundong Zhou
, Pengzhang Liu
, Yongjun Bao
, Guiguang Ding
:
Multi-Label Learning with Block Diagonal Labels. 4832-4840 - Wentao He
, Jianfeng Ren
, Ruibin Bai
, Xudong Jiang
:
Hierarchical Perceptual and Predictive Analogy-Inference Network for Abstract Visual Reasoning. 4841-4850 - Wenxi Li
, Yuchen Guo
, Jilai Zheng
, Haozhe Lin
, Chao Ma
, Lu Fang
, Xiaokang Yang
:
SparseFormer: Detecting Objects in HRW Shots via Sparse Vision Transformer. 4851-4860 - Bo Liu
, Zexin Lu
, Yan Wang
:
Towards Medical Vision-Language Contrastive Pre-training via Study-Oriented Semantic Exploration. 4861-4870 - Zihao Liu
, Xiaoyu Wu
, Shengjin Wang
, Jiayao Qian
:
Adaptively Building a Video-language Model for Video Captioning and Retrieval without Massive Video Pretraining. 4871-4880 - Wenhao Guo
, Peng Lu
, Xujun Peng
, Zhaoran Zhao
, Ji Qiu
, Xiangtao Dong
:
BCSCN: Reducing Domain Gap through Bézier Curve basis-based Sparse Coding Network for Single-Image Super-Resolution. 4881-4889 - Yi Tu
, Chong Zhang
, Ya Guo
, Huan Chen
, Jinyang Tang
, Huijia Zhu
, Qi Zhang
:
UNER: A Unified Prediction Head for Named Entity Recognition in Visually-rich Documents. 4890-4898 - Tao Ling
, Siping Shi
, Hao Wang
, Chuang Hu
, Dan Wang
:
Federated Morozov Regularization for Shortcut Learning in Privacy Preserving Learning with Watermarked Image Data. 4899-4908 - Jinfu Liu
, Chen Chen
, Mengyuan Liu
:
Multi-Modality Co-Learning for Efficient Skeleton-based Action Recognition. 4909-4918 - Zewen Du
, Zhenjiang Hu
, Guiyu Zhao
, Ying Jin
, Hongbin Ma
:
LDA-AQU: Adaptive Query-guided Upsampling via Local Deformable Attention. 4919-4927 - Shichen Lu
, Longteng Guo
, Wenxuan Wang
, Zijia Zhao
, Tongtian Yue
, Jing Liu
, Si Liu
:
Collaborative Training of Tiny-Large Vision Language Models. 4928-4937 - Xudong Zhou
, Tianxiang Chen
:
BSBP-RWKV: Background Suppression with Boundary Preservation for Efficient Medical Image Segmentation. 4938-4946 - Yuxing Zhang
, Siyuan Meng
, Chunchun Chen
, Mengyao Peng
, Hongyan Gu
, Xinli Huang
:
LinkThief: Combining Generalized Structure Knowledge with Node Similarity for Link Stealing Attack against GNN. 4947-4956 - Yeqing Shen
, Shang Li
, Kun Song
:
Restoring Real-World Degraded Events Improves Deblurring Quality. 4957-4966 - Xiao Liang
, Yanlei Zhang
, Di Wang
, Haodi Zhong
, Ronghan Li
, Quan Wang
:
Divide and Conquer: Isolating Normal-Abnormal Attributes in Knowledge Graph-Enhanced Radiology Report Generation. 4967-4975 - Zhen Wang
, Dongyuan Li
, Guang Li
, Ziqing Zhang
, Renhe Jiang
:
Multimodal Low-light Image Enhancement with Depth Information. 4976-4985 - Zishuo Wang
, Wenhao Zhou
, Jinglin Xu
, Yuxin Peng
:
SIA-OVD: Shape-Invariant Adapter for Bridging the Image-Region Gap in Open-Vocabulary Detection. 4986-4994 - Xu Han
, Yuan Tang
, Zhaoxuan Wang
, Xianzhi Li
:
Mamba3D: Enhancing Local Features for 3D Point Cloud Analysis via State Space Model. 4995-5004 - Wenqi Ren
, Ruihao Xia
, Meng Zheng
, Ziyan Wu
, Yang Tang
, Nicu Sebe
:
Cross-Class Domain Adaptive Semantic Segmentation with Visual Language Models. 5005-5014 - Xuefeng Yin
, Chenyang Zhu
, Shanglai Qu
, Yuqi Li
, Kai Xu
, Baocai Yin
, Xin Yang:
CSO: Constraint-Guided Space Optimization for Active Scene Mapping. 5015-5024 - Luoyi Sun
, Xuenan Xu
, Mengyue Wu
, Weidi Xie
:
Auto-ACD: A Large-scale Dataset for Audio-Language Representation Learning. 5025-5034 - Xinyue Liu
, Jianyuan Wang
, Biao Leng
, Shuo Zhang
:
Dual-Modeling Decouple Distillation for Unsupervised Anomaly Detection. 5035-5044 - Huimin Ma
, Siwei Wang
, Shengju Yu
, Suyuan Liu
, Junjie Huang
, Huijun Wu
, Xinwang Liu
, En Zhu
:
Automatic and Aligned Anchor Learning Strategy for Multi-View Clustering. 5045-5054 - Shengyang Sun
, Jiashen Hua
, Junyi Feng
, Dongxu Wei
, Baisheng Lai
, Xiaojin Gong
:
TDSD: Text-Driven Scene-Decoupled Weakly Supervised Video Anomaly Detection. 5055-5064 - Yang Xin
, Yu Zhou
, Jianmin Jiang
:
RobustFace: Adaptive Mining of Noise and Hard Samples for Robust Face Recognitions. 5065-5073 - Xiang Ma
, Xuemei Li
, Lexin Fang
, Caiming Zhang
:
Bridging the Modality Gap: Dimension Information Alignment and Sparse Spatial Constraint for Image-Text Matching. 5074-5082 - Chunli Peng
, Xuan Dong
, Tiantian Cao
, Zhengqing Li
, Kun Dong
, Weixin Li
:
ReWiTe: Realistic Wide-angle and Telephoto Dual Camera Fusion Dataset via Beam Splitter Camera Rig. 5083-5091 - Yang Fang
, Xuefeng Rao
, Xinbo Gao
, Weisheng Li
, Zijian Min
:
MTSNet: Joint Feature Adaptation and Enhancement for Text-Guided Multi-view Martian Terrain Segmentation. 5092-5101 - Le Jiang
, Yan Huang
, Lianxin Xie
, Wen Xue
, Cheng Liu, Si Wu
, Hau-San Wong
:
Hunting Blemishes: Language-guided High-fidelity Face Retouching Transformer with Limited Paired Data. 5102-5111 - Yijia Guo
, Yuanxi Bai
, Liwen Hu
, Ziyi Guo
, Mianzhi Liu
, Yu Cai
, Tiejun Huang
, Lei Ma
:
PRTGS: Precomputed Radiance Transfer of Gaussian Splats for Real-Time High-Quality Relighting. 5112-5120 - Mingcan Xiang
, Jiaxun Tang
, Qizheng Yang
, Hui Guan
, Tongping Liu
:
AdapMTL: Adaptive Pruning Framework for Multitask Learning Model. 5121-5130 - Xinwei Zhang
, Aishan Liu
, Tianyuan Zhang
, Siyuan Liang
, Xianglong Liu
:
Towards Robust Physical-world Backdoor Attacks on Lane Detection. 5131-5140 - Longtao Jiang
, Min Wang
, Zecheng Li
, Yao Fang
, Wengang Zhou
, Houqiang Li
:
SEDS: Semantically Enhanced Dual-Stream Encoder for Sign Language Retrieval. 5141-5150 - Pinxue Guo
, Wanyun Li
, Hao Huang
, Lingyi Hong
, Xinyu Zhou
, Zhaoyu Chen
, Jinglun Li
, Kaixun Jiang
, Wei Zhang
, Wenqiang Zhang
:
X-Prompt: Multi-modal Visual Prompt for Video Object Segmentation. 5151-5160 - Ling Huang
, Wenqian Dong
, Song Xiao
, Jiahui Qu
, Yuanbo Yang
, Yunsong Li
:
Language-Guided Visual Prompt Compensation for Multi-Modal Remote Sensing Image Classification with Modality Absence. 5161-5170 - Zening Lin
, Jiapeng Wang
, Teng Li
, Wenhui Liao
, Dayi Huang
, Longfei Xiong
, Lianwen Jin
:
PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking for End-to-end Document Pair Extraction. 5171-5180 - Haojian Huang
, Xiaozhen Qiao
, Zhuo Chen
, Haodong Chen
, Bingyu Li
, Zhe Sun
, Mulin Chen
, Xuelong Li
:
CREST: Cross-modal Resonance through Evidential Deep Learning for Enhanced Zero-Shot Learning. 5181-5190 - Shuai Zhao
, Yongkun Du
, Zhineng Chen
, Yu-Gang Jiang
:
Decoder Pre-Training with only Text for Scene Text Recognition. 5191-5200 - Naibo Wang
, Yuchen Deng
, Wenjie Feng
, Shichen Fan
, Jianwei Yin
, See-Kiong Ng
:
One-Shot Sequential Federated Learning for Non-IID Data by Enhancing Local Model Diversity. 5201-5210 - Wendong Huang
, Jinwu Hu
, Xiuli Bi
, Bin Xiao:
Anatomical Prior Guided Spatial Contrastive Learning for Few-Shot Medical Image Segmentation. 5211-5220 - Libo Long
, Xiao Hu
, Jochen Lang
:
Learning to Handle Large Obstructions in Video Frame Interpolation. 5221-5229 - Hefei Huang
, Xu Jia
, Xinyu Zhang
, Shengming Li
, Huchuan Lu
:
Event-Guided Rolling Shutter Correction with Time-Aware Cross-Attentions. 5230-5239 - Xibiao Wang
, Hang Gao
, Xindian Wei
, Liang Peng
, Rui Li
, Cheng Liu, Si Wu
, Hau-San Wong
:
Contrastive Graph Distribution Alignment for Partially View-Aligned Clustering. 5240-5249 - Xudong Cai
, Yongcai Wang
, Lun Luo
, Minhang Wang
, Deying Li
, Jintao Xu
, Weihao Gu
, Rui Ai
:
PRISM: PRogressive dependency maxImization for Scale-invariant image Matching. 5250-5259 - Yang Du
, Yuqi Liu
, Qin Jin
:
Reversed in Time: A Novel Temporal-Emphasized Benchmark for Cross-Modal Video-Text Retrieval. 5260-5269 - Wen Luo
, Yu Xia
, Tianshu Shen
, Sujian Li
:
Shapley Value-based Contrastive Alignment for Multimodal Information Extraction. 5270-5279 - Hao Yu
, Xin Yang
, Xin Gao
, Yihui Feng
, Hao Wang
, Yan Kang
, Tianrui Li
:
Overcoming Spatial-Temporal Catastrophic Forgetting for Federated Class-Incremental Learning. 5280-5288 - Haibo Wang
, Chenghang Lai
, Yixuan Sun
, Weifeng Ge
:
Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering. 5289-5298 - Shudong Huang
, Hecheng Cai
, Hao Dai
, Wentao Feng
, Jiancheng Lv
:
Adaptive Instance-wise Multi-view Clustering. 5299-5307 - Ze Yuan
, Jinyang Guo
, Dakai An
, Junran Wu
, He Zhu
, Jianhao Li
, Xueyuan Chen
, Ke Xu
, Jiaheng Liu
:
VRDistill: Vote Refinement Distillation for Efficient Indoor 3D Object Detection. 5308-5317 - Sunoh Kim
, Daeho Um
, Hyunjun Choi
, Jin Young Choi
:
Learnable Negative Proposals Using Dual-Signed Cross-Entropy Loss for Weakly Supervised Video Moment Localization. 5318-5327 - Yansong Qu
, Shaohui Dai
, Xinyang Li
, Jianghang Lin
, Liujuan Cao
, Shengchuan Zhang
, Rongrong Ji
:
GOI: Find 3D Gaussians of Interest with an Optimizable Open-vocabulary Semantic-space Hyperplane. 5328-5337 - Huan Yao
, Changxing Ding
, Xuanda Xu
, Zhifeng Lin
:
Decoupling Heterogeneous Features for Robust 3D Interacting Hand Poses Estimation. 5338-5346 - Zhiyu Zhu
, Zhibo Jin
, Jiayu Zhang
, Huaming Chen
:
Enhancing Model Interpretability with Local Attribution over Global Exploration. 5347-5355 - Ruxue Yan
, Wenya Guo
, Xubo Liu
, Xumeng Liu
, Ying Zhang
, Xiaojie Yuan
:
Tracking-forced Referring Video Object Segmentation. 5356-5364 - Xin Zhang
, Shenghua Zhong
, Jianmin Jiang
:
Effective Optimization of Root Selection Towards Improved Explanation of Deep Classifiers. 5365-5373 - Guangchen Shi
, Wei Zhu
, Yirui Wu
, Danhuai Zhao
, Kang Zheng
, Tong Lu
:
Few-shot Semantic Segmentation via Perceptual Attention and Spatial Control. 5374-5383 - Zibo Ma
, Bo Zhang
, Zheng Zhang
, Wu Liu
, Wufan Wang
, Hui Gao
, Wendong Wang
:
ADDG: An Adaptive Domain Generalization Framework for Cross-Plane MRI Segmentation. 5384-5392 - Lixiang Ru
, Xin Guo
, Lei Yu
, Yingying Zhang
, Jiangwei Lao
, Jian Wang
, Jingdong Chen
, Yansheng Li
, Ming Yang
:
Parameter-Efficient Complementary Expert Learning for Long-Tailed Visual Recognition. 5393-5402 - Tianyuan Zhang
, Lu Wang
, Hainan Li
, Yisong Xiao
, Siyuan Liang
, Aishan Liu
, Xianglong Liu
, Dacheng Tao
:
LanEvil: Benchmarking the Robustness of Lane Detection to Environmental Illusions. 5403-5412 - Xinyue Zhang
, Tingjin Luo
, Yueying Liu
, Chenping Hou
:
Imbalanced Multi-instance Multi-label Learning via Coding Ensemble and Adaptive Thresholds. 5413-5422 - Pengxu Chen
, Huazhong Liu
, Jihong Ding
, Jiawen Luo
, Peng Tan
, Laurence T. Yang
:
Holistic-CAM: Ultra-lucid and Sanity Preserving Visual Interpretation in Holistic Stage of CNNs. 5423-5431 - Yihao Wang
, Meng Yang
, Rui Cao
:
Fine-grained Semantic Alignment with Transferred Person-SAM for Text-based Person Retrieval. 5432-5441 - Qijie Wang
, Guandu Liu
, Bin Wang
:
CapS-Adapter: Caption-based MultiModal Adapter in Zero-Shot Classification. 5442-5450 - Rongyu Zhang
, Zefan Cai
, Huanrui Yang
, Zidong Liu
, Denis A. Gudovskiy
, Tomoyuki Okuno
, Yohei Nakata
, Kurt Keutzer
, Baobao Chang
, Yuan Du
, Li Du
, Shanghang Zhang
:
VeCAF: Vision-language Collaborative Active Finetuning with Training Objective Awareness. 5451-5459 - Linhui Xiao
, Xiaoshan Yang
, Fang Peng
, Yaowei Wang
, Changsheng Xu
:
HiVG: Hierarchical Multimodal Fine-grained Modulation for Visual Grounding. 5460-5469 - Yunfeng Fan
, Wenchao Xu
, Haozhao Wang
, Junhong Liu
, Song Guo
:
Detached and Interactive Multimodal Learning. 5470-5478 - Chenglong Zhang
, Xinyan Liang
, Peng Zhou
, Zhaolong Ling
, Yingwei Zhang
, Xingyu Wu
, Weiguo Sheng
, Bingbing Jiang
:
Scalable Multi-view Unsupervised Feature Selection with Structure Learning and Fusion. 5479-5488 - Chengyi Yang
, Mingda Dong
, Xiaoyue Zhang
, Jiayin Qi
, Aimin Zhou
:
Introducing Common Null Space of Gradients for Gradient Projection Methods in Continual Learning. 5489-5497 - Masoumeh Zareapoor
, Pourya Shamsolmoali
, Huiyu Zhou
, Yue Lu
, Salvador García
:
Fractional Correspondence Framework in Detection Transformer. 5498-5506 - Geuntaek Lim
, Hyunwoo Kim
, Joonsoo Kim
, Yukyung Choi
:
Probabilistic Vision-Language Representation for Weakly Supervised Temporal Action Localization. 5507-5516 - Xihong Yang
, Erxue Min
, Ke Liang
, Yue Liu
, Siwei Wang
, Sihang Zhou
, Huijun Wu
, Xinwang Liu
, En Zhu
:
GraphLearner: Graph Node Clustering with Fully Learnable Augmentation. 5517-5526 - Hongqiu Wang
, Wei Wang
, Haipeng Zhou
, Huihui Xu
, Shaozhi Wu
, Lei Zhu
:
Language-Driven Interactive Shadow Detection. 5527-5536 - Jinyu Cai
, Yunhe Zhang
, Zhoumin Lu
, Wenzhong Guo
, See-Kiong Ng
:
Towards Effective Federated Graph Anomaly Detection via Self-boosted Knowledge Distillation. 5537-5546 - Chaofan Huo
, Ye Shi
, Jingya Wang
:
Monocular Human-Object Reconstruction in the Wild. 5547-5555 - Baoqi Gao
, Daoxu Sheng
, Lei Zhang
, Qi Qi
, Bo He
, Zirui Zhuang
, Jingyu Wang
:
STAR-VP: Improving Long-term Viewport Prediction in 360° Videos via Space-aligned and Time-varying Fusion. 5556-5565 - Hu Gao, Jing Yang, Ying Zhang, Jingfan Yang, Bowen Ma, Depeng Dang:
Learning Optimal Combination Patterns for Lightweight Stereo Image Super-Resolution. 5566-5574 - Yifan Wang
, Wuliang Huang
, Lei Li
, Chun Yuan
:
Semantic Distillation from Neighborhood for Composed Image Retrieval. 5575-5583 - Zhentao He
, Changqun Xia
, Shengye Qiao
, Jia Li
:
Text-prompt Camouflaged Instance Segmentation with Graduated Camouflage Learning. 5584-5593 - Zuyu Zhang
, Yan Li
, Byung-Seok Shin
:
Embracing Domain Gradient Conflicts: Domain Generalization Using Domain Gradient Equilibrium. 5594-5603 - Ting Zhe
, Jing Zhang
, Yongqian Li
, Yong Luo
, Han Hu
, Dacheng Tao
:
Multi-Granularity Hand Action Detection. 5604-5613 - Xingyuan Mao
, Yuwen Liu
, Lianyong Qi
, Li Duan
, Xiaolong Xu
, Xuyun Zhang
, Wanchun Dou
, Amin Beheshti
, Xiaokang Zhou
:
Cluster-driven Personalized Federated Recommendation with Interest-aware Graph Convolution Network for Multimedia. 5614-5622 - Yuan Sun
, Kaiming Liu
, Yongxiang Li
, Zhenwen Ren
, Jian Dai
, Dezhong Peng
:
Distribution Consistency Guided Hashing for Cross-Modal Retrieval. 5623-5632 - Luanyuan Dai
, Xiaoyu Du
, Jinhui Tang
:
TrGa: Reconsidering the Application of Graph Neural Networks in Two-View Correspondence Pruning. 5633-5642 - Han Jiang
, Haoyu Tang
, Ming Yan
, Ji Zhang
, Mingzhu Xu
, Yupeng Hu
, Jihua Zhu
, Liqiang Nie
:
Revisiting Unsupervised Temporal Action Localization: The Primacy of High-Quality Actionness and Pseudolabels. 5643-5652 - Yu Liao
, Xinfeng Zhang
, Rui Yang
, Jianwei Tao
, Bai Liu
, Zhipeng Hu
, Shuang Wang
, Zeng Zhao
:
Selection and Reconstruction of Key Locals: A Novel Specific Domain Image-Text Retrieval Method. 5653-5662 - Wei Yang
, Qingchen Yang
:
Multimodal-aware Multi-intention Learning for Recommendation. 5663-5672 - Liupeng Li
, Yuhua Zheng
, Shupeng Liu
, Xiaoyin Xu
, Taihao Li
:
Domain Knowledge Enhanced Vision-Language Pretrained Model for Dynamic Facial Expression Recognition. 5673-5682 - Yuting Zhang
, Zhao Zhang
, Yiqing Wu
, Ying Sun
, Fuzhen Zhuang
, Wenhui Yu
, Lantao Hu
, Han Li
, Kun Gai
, Zhulin An
, Yongjun Xu
:
Tag Tree-Guided Multi-grained Alignment for Multi-Domain Short Video Recommendation. 5683-5691 - Kai Shao
, Rui Wang
, Yixue Hao
, Long Hu
, Min Chen
, Hans-Arno Jacobsen
:
Multimodal Physiological Signals Representation Learning via Multiscale Contrasting for Depression Recognition. 5692-5701 - Xinyu Li
, Wenqing Ye
, Yueyi Zhang
, Xiaoyan Sun
:
GRACE: GRadient-based Active Learning with Curriculum Enhancement for Multimodal Sentiment Analysis. 5702-5711 - Yuchen Pan
, Junjun Jiang
, Kui Jiang
, Xianming Liu
:
Disentangled-Multimodal Privileged Knowledge Distillation for Depression Recognition with Incomplete Multimodal Data. 5712-5721 - Yuanyuan Liu
, Yuxuan Huang
, Shuyang Liu
, Yibing Zhan
, Zijing Chen
, Zhe Chen
:
Open-Set Video-based Facial Expression Recognition with Human Expression-sensitive Prompting. 5722-5731 - Aoqiang Zhu
, Min Hu
, Xiaohua Wang
, Jiaoyun Yang
, Yiming Tang
, Fuji Ren
:
KEBR: Knowledge Enhanced Self-Supervised Balanced Representation for Multimodal Sentiment Analysis. 5732-5741 - Zining Wang
, Jinyang Guo
, Ruihao Gong
, Yang Yong
, Aishan Liu
, Yushi Huang
, Jiaheng Liu
, Xianglong Liu
:
PTSBench: A Comprehensive Post-Training Sparsity Benchmark Towards Algorithms and Models. 5742-5751 - Longan Wang
, Yang Qin
, Yuan Sun
, Dezhong Peng
, Xi Peng
, Peng Hu
:
Robust Contrastive Cross-modal Hashing with Noisy Labels. 5752-5760 - Xiying Zheng
, Yukang Zhang
, Yang Lu
, Hanzi Wang
:
Semi-supervised Visible-Infrared Person Re-identification via Modality Unification and Confidence Guidance. 5761-5770 - Ziyang Zhou
, Pinghui Wang
, Zi Liang
, Ruofei Zhang
, Haitao Bai
:
PAIR: Pre-denosing Augmented Image Retrieval Model for Defending Adversarial Patches. 5771-5779 - Daiqing Wu
, Dongbao Yang
, Yu Zhou
, Can Ma
:
Robust Multimodal Sentiment Analysis of Image-Text Pairs by Distribution-Based Feature Recovery and Fusion. 5780-5789 - Kunlun Xu
, Haozhuo Zhang
, Yu Li
, Yuxin Peng
, Jiahuan Zhou
:
Mitigate Catastrophic Remembering via Continual Knowledge Purification for Noisy Lifelong Person Re-Identification. 5790-5799 - Wei Shen
, Mang Ye
, Wenke Huang
:
Resisting Over-Smoothing in Graph Neural Networks via Dual-Dimensional Decoupling. 5800-5809 - Junlin Fang
, Wenya Wang
, Guosheng Lin
, Fengmao Lv
:
Sentiment-oriented Sarcasm Integration for Video Sentiment Analysis Enhancement with Sarcasm Assistance. 5810-5819 - Fanfan Wang
, Heqing Ma
, Xiangqing Shen
, Jianfei Yu
, Rui Xia
:
Observe before Generate: Emotion-Cause aware Video Caption for Multimodal Emotion Cause Generation in Conversations. 5820-5828 - Yang Yang
, Liyuan Cao
, Haoyu Shi
, Huaiwen Zhang
:
Multi-Instance Multi-Label Learning for Text-motion Retrieval. 5829-5837 - Hongzu Su
, Jingjing Li
, Fengling Li
, Ke Lu
, Lei Zhu
:
SOIL: Contrastive Second-Order Interest Learning for Multimodal Recommendation. 5838-5846 - Jiansong Qi
, Yaping Huang
, Ying Zhang
, Sihui Zhang
, Mei Tian
, Yi Tian
, Fanchao Meng
, Lin Guan
, Tianyi Chang
:
Visual Question Answering Driven Eye Tracking Paradigm for Identifying Children with Autism Spectrum Disorder. 5847-5855 - Dongxiao He
, Jinghan Zhang
, Xiaobao Wang
, Meng Ge
, Zhiyong Feng
, Longbiao Wang
, Xiaoke Ma
:
TUT4CRS: Time-aware User-preference Tracking for Conversational Recommendation System. 5856-5864 - Guoqing Yang
, Zhiming Luo
, Jianzhe Gao
, Yingxin Lai
, Kun Yang
, Yifan He
, Shaozi Li
:
A Multilevel Guidance-Exploration Network and Behavior-Scene Matching Method for Human Behavior Anomaly Detection. 5865-5873 - Zekun Ai
, Xiaotong Luo
, Yanyun Qu
, Yuan Xie
:
SkipVSR: Adaptive Patch Routing for Video Super-Resolution with Inter-Frame Mask. 5874-5882 - Qianxin Huang
, Siyao Peng
, Xiaobo Shen
, Yunhao Yuan
, Shirui Pan
:
Similarity Preserving Transformer Cross-Modal Hashing for Video-Text Retrieval. 5883-5891 - Wenxiao Zhang
, Hossein Rahmani
, Xun Yang
, Jun Liu
:
Reverse2Complete: Unpaired Multimodal Point Cloud Completion via Guided Diffusion. 5892-5901 - Yitong Sun
, Yao Huang
, Xingxing Wei
:
Embodied Laser Attack: Leveraging Scene Priors to Achieve Agent-based Robust Non-contact Attacks. 5902-5910 - Yipo Huang, Xiangfei Sheng, Zhichao Yang, Quan Yuan, Zhichao Duan, Pengfei Chen, Leida Li, Weisi Lin, Guangming Shi:
AesExpert: Towards Multi-modality Foundation Model for Image Aesthetics Perception. 5911-5920 - Ji Qiu
, Peng Lu
, Xujun Peng
, Wenhao Guo
, Zhaoran Zhao
, Xiangtao Dong
:
Learning Realistic Sketching: A Dual-agent Reinforcement Learning Approach. 5921-5929 - Xiaobo Shen
, Gaoyao Yu
, Yinfan Chen
, Xichen Yang
, Yuhui Zheng
:
Graph Convolutional Semi-Supervised Cross-Modal Hashing. 5930-5938 - Harry Cheng
, Yangyang Guo
, Tianyi Wang
, Liqiang Nie
, Mohan S. Kankanhalli
:
Diffusion Facial Forgery Detection. 5939-5948 - Hengxing Liu
, Mingjia Li
, Xiaojie Guo
:
Regional Attention For Shadow Removal. 5949-5957 - Hao Fang
, Haoyuan Zhao
, Jianxin Shi
, Miao Zhang
, Guanzhen Wu
, Yi Ching Chou
, Feng Wang
, Jiangchuan Liu
:
Robust Live Streaming over LEO Satellite Constellations: Measurement, Analysis, and Handover-Aware Adaptation. 5958-5966 - Qi Zang
, Shuang Wang
, Dong Zhao
, Yang Hu
, Dou Quan
, Jinlong Li
, Nicu Sebe
, Zhun Zhong
:
Generalized Source-Free Domain-adaptive Segmentation via Reliable Knowledge Propagation. 5967-5976 - Yunqiang Pei
, Jialei Tang
, Qihang Tang
, Mingfeng Zha
, Dongyu Xie
, Guoqing Wang
, Zhitao Liu
, Ning Xie
, Peng Wang
, Yang Yang
, Hengtao Shen
:
Emotion Recognition in HMDs: A Multi-task Approach Using Physiological Signals and Occluded Faces. 5977-5986 - Xiaochao Pan
, Jiawei Yao
, Hongrui Kou
, Tong Wu
, Canran Xiao
:
HarmonicNeRF: Geometry-Informed Synthetic View Augmentation for 3D Scene Reconstruction in Driving Scenarios. 5987-5996 - Guangyao Li
, Henghui Du
, Di Hu
:
Boosting Audio Visual Question Answering via Key Semantic-Aware Cues. 5997-6005 - Jiongming Qin
, Fei Luo
, Tuo Cao
, Wenju Xu
, Chunxia Xiao
:
HS-Surf: A Novel High-Frequency Surface Shell Radiance Field to Improve Large-Scale Scene Rendering. 6006-6014 - Gang Wu
, Junjun Jiang
, Kui Jiang
, Xianming Liu
:
Harmony in Diversity: Improving All-in-One Image Restoration via Multi-Task Collaboration. 6015-6023 - Meichen Liu
, Shuting He
, Songnan Lin
, Bihan Wen
:
Dual-head Genre-instance Transformer Network for Arbitrary Style Transfer. 6024-6032 - Yingjie Zhou
, Zicheng Zhang
, Wei Sun
, Xiaohong Liu
, Xiongkuo Min
, Guangtao Zhai
:
Subjective and Objective Quality-of-Experience Assessment for 3D Talking Heads. 6033-6042 - Zhi Zhou
, Junke Zhu
, Zhangjin Huang
:
Gaussian Splatting with Neural Basis Extension. 6043-6052 - Zhenyu Zhang
, Guangyao Chen
, Yixiong Zou
, Yuhua Li
, Ruixuan Li
:
Learning Unknowns from Unknowns: Diversified Negative Prototypes Generator for Few-shot Open-Set Recognition. 6053-6062 - Jinxiao Zhang
, Runmin Dong
, Juepeng Zheng
, Mengxuan Chen
, Lixian Zhang
, Yi Zhao
, Haohuan Fu
:
Spatial-Temporal Context Model for Remote Sensing Imagery Compression. 6063-6072 - Weiying Xie
, Mei Yuan
, Jitao Ma
, Yunsong Li
:
Adaptive Pruning of Channel Spatial Dependability in Convolutional Neural Networks. 6073-6082 - Heng Fang
, Sheng Huang
, Wenhao Tang
, Luwen Huangfu
, Bo Liu
:
SAM-MIL: A Spatial Contextual Aware Multiple Instance Learning Approach for Whole Slide Image Classification. 6083-6092 - Wenhao Shen
, Wanqi Yin
, Hao Wang
, Chen Wei
, Zhongang Cai
, Lei Yang
, Guosheng Lin
:
HMR-Adapter: A Lightweight Adapter with Dual-Path Cross Augmentation for Expressive Human Mesh Recovery. 6093-6102 - Shalayiding Sirejiding
, Bayram Bayramli
, Yuxiang Lu
, Yuwen Yang
, Tamam Alsarhan
, Hongtao Lu
, Yue Ding
:
Task-Interaction-Free Multi-Task Learning with Efficient Hierarchical Feature Representation. 6103-6112 - Yiyong Xiao
, Kai Shu
, Haoyi Zhang
, Baohua Yin
, Wai Seng Cheang
, Haoyang Wang
, Jiechao Gao
:
EGGesture: Entropy-Guided Vector Quantized Variational AutoEncoder for Co-Speech Gesture Generation. 6113-6122 - Yuqi Sun
, Qing Lin
, Weimin Tan
, Bo Yan
:
Audio-Driven Identity Manipulation for Face Inpainting. 6123-6132 - Leilei Ma
, Hongxing Xie
, Lei Wang
, Yanping Fu
, Dengdi Sun
, Haifeng Zhao
:
Text-Region Matching for Multi-Label Image Recognition with Missing Labels. 6133-6142 - Zhengwei Yin
, Guixu Lin
, Mengshun Hu
, Hao Zhang
, Yinqiang Zheng
:
FlexIR: Towards Flexible and Manipulable Image Restoration. 6143-6152 - Hamed Alimohammadzadeh
, Shahram Ghandeharizadeh
:
Swarical: An Integrated Hierarchical Approach to Localizing Flying Light Specks. 6153-6161 - Xiaowen Cai
, Yunbo Tao
, Daizong Liu
, Pan Zhou
, Xiaoye Qu
, Jianfeng Dong
, Keke Tang
, Lichao Sun
:
Frequency-Aware GAN for Imperceptible Transfer Attack on 3D Point Clouds. 6162-6171 - Mingjin Zhang
, Shilong Liu
, Yuanjun Ouyang
, Jie Guo
, Zhihong Tang
, Yunsong Li
:
Explore Hybrid Modeling for Moving Infrared Small Target Detection. 6172-6181 - Yuhui Quan
, Xiaoheng Tan
, Yan Huang
, Yong Xu
, Hui Ji
:
Enhancing Underwater Images via Asymmetric Multi-Scale Invertible Networks. 6182-6191 - Lishuang Zhan
, Enting Ying
, Jiabao Gan
, Shihui Guo
, Boyu Gao
, Yipeng Qin
:
SATPose: Improving Monocular 3D Pose Estimation with Spatial-aware Ground Tactility. 6192-6201 - Hongjian Zhan
, Yangfu Li
, Yu-Jie Xiong
, Umapada Pal
, Yue Lu
:
Free Lunch: Frame-level Contrastive Learning with Text Perceiver for Robust Scene Text Recognition in Lightweight Models. 6202-6211 - Xin Wang
, Kai Chen
, Xingjun Ma
, Zhineng Chen
, Jingjing Chen
, Yu-Gang Jiang
:
AdvQDet: Detecting Query-Based Adversarial Attacks with Adversarial Contrastive Prompt Tuning. 6212-6221 - Xudong Lv
, Zhiwei He
, Yuxiang Yang
, Jiahao Nie
, Jing Zhang
:
SAR-SLAM: Self-Attentive Rendering-based SLAM with Neural Point Cloud Encoding. 6222-6231 - Shao-Kui Zhang
, Junkai Huang
, Liang Yue
, Jia-Tong Zhang
, Jia-Hong Liu
, Yu-Kun Lai
, Song-Hai Zhang
:
SceneExpander: Real-Time Scene Synthesis for Interactive Floor Plan Editing. 6232-6240 - Long Tian
, Hongyi Zhao
, Ruiying Lu
, Rongrong Wang
, Yujie Wu
, Liming Wang
, Xiongpeng He
, Xiyang Liu
:
FOCT: Few-shot Industrial Anomaly Detection with Foreground-aware Online Conditional Transport. 6241-6249 - Chuang Liu
, Yichao Cao
, Xiu Su
, Haogang Zhu
:
Universal Frequency Domain Perturbation for Single-Source Domain Generalization. 6250-6259 - Yushun Tang
, Shuoshuo Chen
, Jiyuan Jia
, Yi Zhang
, Zhihai He
:
Domain-Conditioned Transformer for Fully Test-time Adaptation. 6260-6269 - Zhiru Wang
, Shiyun Xie
, Chengwei Pan
, Guoping Wang
:
SpecGaussian with Latent Features: A High-quality Modeling of the View-dependent Appearance for 3D Gaussian Splatting. 6270-6278 - Wencheng Han
, Chen Zhang
, Yang Zhou
, Wentao Liu
, Chen Qian
, Chengzhong Xu
, Jianbing Shen
:
Prior Metadata-Driven RAW Reconstruction: Eliminating the Need for Per-Image Metadata. 6279-6287 - Fulin Luo
, Yi Liu
, Xiuwen Gong
, Zhixiong Nan
, Tan Guo
:
EMVCC: Enhanced Multi-View Contrastive Clustering for Hyperspectral Images. 6288-6296 - Fan Nie
, Jiangqun Ni
, Jian Zhang
, Bin Zhang
, Weizhe Zhang
:
FRADE: Forgery-aware Audio-distilled Multimodal Learning for Deepfake Detection. 6297-6306 - Siru Zhong
, Xixuan Hao
, Yibo Yan
, Ying Zhang
, Yangqiu Song
, Yuxuan Liang
:
UrbanCross: Enhancing Satellite Image-Text Retrieval with Cross-Domain Adaptation. 6307-6315 - Yuzhen Niu
, Lifen Yang
, Rui Xu
, Yuezhou Li
, Yuzhong Chen
:
MiNet: Weakly-Supervised Camouflaged Object Detection through Mutual Interaction between Region and Edge Cues. 6316-6325 - Delong Zhang
, Yi-Xing Peng
, Xiao-Ming Wu
, Ancong Wu
, Weishi Zheng
:
PixelFade: Privacy-preserving Person Re-identification with Noise-guided Progressive Replacement. 6326-6334 - Wei He
, Xiang Li
, Shengtian Xu
, Yuzheng Chen
, Chan-In Sio
, Ge Lin Kan
, Lik-Hang Lee
:
MetaDragonBoat: Exploring Paddling Techniques of Virtual Dragon Boating in a Metaverse Campus. 6335-6344 - Yuxuan Lu
, Jiahao Nie
, Zhiwei He
, Hongjie Gu
, Xudong Lv
:
VoxelTrack: Exploring Multi-level Voxel Representation for 3D Point Cloud Object Tracking. 6345-6354 - Yu Liu
, Longhan Feng
, Qi Jia
, Zezheng Liu
, Zi-Huang Cao
:
Two Teachers Are Better Than One: Semi-supervised Elliptical Object Detection by Dual-Teacher Collaborative Guidance. 6355-6363 - Yao Luo
, Ming Yang
, Jinhui Tang
:
Dual-view Pyramid Network for Video Frame Interpolation. 6364-6373 - Junxiong Lin
, Zen Tao
, Xuan Tong
, Xinji Mai
, Haoran Wang
, Boyang Wang
, Yan Wang
, Qing Zhao
, Jiawen Yu
, Yuxuan Lin
, Shaoqi Yan
, Shuyong Gao
, Wenqiang Zhang
:
Suppressing Uncertainties in Degradation Estimation for Blind Super-Resolution. 6374-6383 - Wenxiao Zhang
, Ziqi Wang
, Li Xu
, Xun Yang
, Jun Liu
:
Informative Point cloud Dataset Extraction for Classification via Gradient-based Points Moving. 6384-6393 - Jia-Hong Liu
, Shao-Kui Zhang
, Chuyue Zhang
, Song-Hai Zhang
:
Controllable Procedural Generation of Landscapes. 6394-6403 - Fangjian Liao
, Xingxing Zou
, Waikeung Wong
:
Uni-DlLoRA: Style Fine-Tuning for Fashion Image Translation. 6404-6413 - Yusen Wang
, Kaixuan Zhou
, Wenxiao Zhang
, Chunxia Xiao
:
MegaSurf: Scalable Large Scene Neural Surface Reconstruction. 6414-6423 - Zherui Qiu
, Chenqu Ren
, Kaiwen Song
, Xiaoyi Zeng
, Leyuan Yang
, Juyong Zhang
:
Deformable NeRF using Recursively Subdivided Tetrahedra. 6424-6432 - Mamta
, Gopendra Vikram Singh
, Deepak Raju Kori
, Asif Ekbal
:
Aspect-Based Multimodal Mining: Unveiling Sentiments, Complaints, and Beyond in User-Generated Content. 6433-6442 - Zichen Liu
, Yuxin Peng
, Jiahuan Zhou
:
InsVP: Efficient Instance Visual Prompting from Image Itself. 6443-6452 - Zidu Wang
, Xiangyu Zhu
, Jiang Yu
, Tianshuo Zhang
, Zhen Lei
:
S2TD-Face: Reconstruct a Detailed 3D Face with Controllable Texture from a Single Sketch. 6453-6462 - Satoshi Kosugi
:
Prompt-Guided Image-Adaptive Neural Implicit Lookup Tables for Interpretable Image Enhancement. 6463-6471 - Xun Jiang
, Zhuoyuan Wei
, Shenshen Li
, Xing Xu
, Jingkuan Song
, Heng Tao Shen
:
Counterfactually Augmented Event Matching for De-biased Temporal Sentence Grounding. 6472-6481 - Bingzhi Chen
, Ruihan Liu
, Yishu Liu
, Xiaozhao Fang
, Jiahui Pan
, Guangming Lu
, Zheng Zhang
:
Stay Focused is All You Need for Adversarial Robustness. 6482-6491 - Zhi Zeng
, Minnan Luo
, Xiangzheng Kong
, Huan Liu
, Hao Guo
, Hao Yang
, Zihan Ma
, Xiang Zhao
:
Mitigating World Biases: A Multimodal Multi-View Debiasing Framework for Fake News Video Detection. 6492-6500 - Zibin Liu
, Banglei Guan
, Yang Shang
, Shunkun Liang
, Zhenbao Yu
, Qifeng Yu
:
Optical Flow-Guided 6DoF Object Pose Tracking with an Event Camera. 6501-6509 - Junran Wu
, Xueyuan Chen
, Shangzhe Li
:
Uncovering Capabilities of Model Pruning in Graph Contrastive Learning. 6510-6519 - Zheng Wei
, Yuzheng Chen
, Wai Tong
, Xuan Zong
, Huamin Qu
, Xian Xu
, Lik-Hang Lee
:
Hearing the Moment with MetaEcho! From Physical to Virtual in Synchronized Sound Recording. 6520-6529 - Cong Wang
, Chengjin Yu
, Jie Mu
, Wei Wang
:
PercepLIE: A New Path to Perceptual Low-Light Image Enhancement. 6530-6539 - Xin Cheng
, Hao Wang
, Jinwei Wang
, Xiangyang Luo
, Bin Ma
:
Advancing Quantization Steps Estimation: A Two-Stream Network Approach for Enhancing Robustness. 6540-6548 - Mingjin Zhang
, Longyi Li
, Wenxuan Shi
, Jie Guo
, Yunsong Li
, Xinbo Gao
:
VmambaSCI: Dynamic Deep Unfolding Network with Mamba for Compressive Spectral Imaging. 6549-6558 - Rui-Chen Zheng
, Yang Ai
, Zhen-Hua Ling
:
Speech Reconstruction from Silent Lip and Tongue Articulation by Diffusion Models and Text-Guided Pseudo Target Generation. 6559-6568 - Junyuan Guo
, Hao Tang
, Teng Wang
, Chao Wang
:
R4D-planes: Remapping Planes For Novel View Synthesis and Self-Supervised Decoupling of Monocular Videos. 6569-6577 - Wu Chen
, Hehe Fan
, Qiuping Jiang
, Chao Huang
, Yi Yang
:
Progressive Point Cloud Denoising with Cross-Stage Cross-Coder Adaptive Edge Graph Convolution Network. 6578-6587 - Mingyang Sun
, Qipeng Yan
, Zhuoer Liang
, Dongliang Kou
, Dingkang Yang
, Ruisheng Yuan
, Xiao Zhao
, Mingcheng Li
, Lihua Zhang
:
IF-Garments: Reconstructing Your Intersection-Free Multi-Layered Garments from Monocular Videos. 6588-6597 - Bo Dong
, Pichao Wang
, Hao Luo
, Fan Wang
:
Adaptive Query Selection for Camouflaged Instance Segmentation. 6598-6606 - Yuxin Mao
, Xuyang Shen
, Jing Zhang
, Zhen Qin
, Jinxing Zhou
, Mochu Xiang
, Yiran Zhong
, Yuchao Dai
:
TAVGBench: Benchmarking Text to Audible-Video Generation. 6607-6616 - Yuan Tang
, Xu Han
, Xianzhi Li
, Qiao Yu
, Yixue Hao
, Long Hu
, Min Chen
:
MiniGPT-3D: Efficiently Aligning 3D Point Clouds with Large Language Models using 2D Priors. 6617-6626 - Guan Luo
, Tian-Xing Xu
, Ying-Tian Liu
, Xiaoxiong Fan
, Fang-Lue Zhang
, Song-Hai Zhang
:
3D Gaussian Editing with A Single Image. 6627-6636 - Zhenhong Sun
, Junyan Wang
, Zhiyu Tan
, Daoyi Dong
, Hailan Ma
, Hao Li
, Dong Gong
:
EGGen: Image Generation with Multi-entity Prior Learning through Entity Guidance. 6637-6645 - Zhengzhong Kuang
, Jianan Lu
, Chenhui Hong
, Haobin Huang
, Suguo Zhu
, Xiaowei Zhao
, Jun Yu
, Jianping Fan
:
Latent Representation Reorganization for Face Privacy Protection. 6646-6655 - Wulin Xie
, Xiaohuan Lu
, Yadong Liu
, Jiang Long
, Bob Zhang
, Shuping Zhao
, Jie Wen
:
Uncertainty-Aware Pseudo-Labeling and Dual Graph Driven Network for Incomplete Multi-View Multi-Label Classification. 6656-6665 - Mingzhao Yang
, Shangchao Su
, Bin Li
, Xiangyang Xue
:
FedDEO: Description-Enhanced One-Shot Federated Learning with Diffusion Models. 6666-6675 - Ruiyang Xia
, Dawei Zhou
, Decheng Liu
, Lin Yuan
, Shuodi Wang
, Jie Li
, Nannan Wang
, Xinbo Gao
:
Advancing Generalized Deepfake Detector with Forgery Perception Guidance. 6676-6685 - Hongye Hou
, Xuehao Gao
, Zhan Liu
, Yang Yang
:
Dig into Detailed Structures: Key Context Encoding and Semantic-based Decoding for Point Cloud Completion. 6686-6695 - Tao Liu
, Feilong Chen
, Shuai Fan
, Chenpeng Du
, Qi Chen
, Xie Chen
, Kai Yu
:
AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding. 6696-6705 - Qi Chen
, Wenjie Liu
, Hu Ding
:
A Novel Confidence Guided Training Method for Conditional GANs with Auxiliary Classifier. 6706-6714 - Yukang Lin
, Haonan Han
, Chaoqun Gong
, Zunnan Xu
, Yachao Zhang
, Xiu Li
:
Consistent123: One Image to Highly Consistent 3D Asset Using Case-Aware Diffusion Priors. 6715-6724 - Zhaoyu Zhang
, Yang Hua
, Guanxiong Sun
, Hui Wang
, Seán F. McLoone
:
Improving the Training of the GANs with Limited Data via Dual Adaptive Noise Injection. 6725-6734 - Changgu Chen
, Libing Yang
, Xiaoyan Yang
, Lianggangxu Chen
, Gaoqi He
, Changbo Wang
, Yang Li
:
FIND: Fine-tuning Initial Noise Distribution with Policy Optimization for Diffusion Models. 6735-6744 - Tianyi Lu
, Xing Zhang
, Jiaxi Gu
, Renjing Pei
, Songcen Xu
, Xingjun Ma
, Hang Xu
, Zuxuan Wu
:
Fuse Your Latents: Video Editing with Multi-source Latent Diffusion Models. 6745-6754 - Zhichao Liao
, Fengyuan Piao
, Di Huang
, Xinghui Li
, Yue Ma
, Pingfa Feng
, Heming Fang
, Long Zeng
:
Freehand Sketch Generation from Mechanical Components. 6755-6764 - Qishan Zhang
, Shuangbing Wen
, Tao Hu
:
Audio Deepfake Detection with Self-Supervised XLS-R and SLS Classifier. 6765-6773 - Bohong Chen
, Yumeng Li
, Yao-Xiang Ding
, Tianjia Shao
, Kun Zhou
:
Enabling Synergistic Full-Body Control in Prompt-Based Co-Speech Motion Generation. 6774-6783 - Xiangcheng Du
, Zhao Zhou
, Xingjiao Wu
, Yanlong Wang
, Zhuoyao Wang
, Yingbin Zheng
, Cheng Jin
:
MultiColor: Image Colorization by Learning from Multiple Color Spaces. 6784-6792 - Haozhe Jia
, Yan Li
, Hengfei Cui
, Di Xu
, Yuwang Wang
, Tao Yu
:
DisControlFace: Adding Disentangled Control to Diffusion Autoencoder for One-shot Explicit Facial Image Editing. 6793-6802 - Lutao Jiang
, Hangyu Li
, Lin Wang
:
A General Framework to Boost 3D GS Initialization for Text-to-3D Generation by Lexical Richness. 6803-6812 - Yiluo Wei
, Gareth Tyson
:
Understanding the Impact of AI-Generated Content on Social Media: The Pixiv Case. 6813-6822 - Ruiqi Zhang
, Jie Chen
:
Mesh-Centric Gaussian Splatting for Human Avatar Modelling with Real-time Dynamic Mesh Reconstruction. 6823-6832 - Bo Xiong
, Changqing Su
, Zihan Lin
, Yanqin Chen
, You Zhou
, Zhen Cheng
, Zhaofei Yu
, Tiejun Huang
:
Real-time Parameter Evaluation of High-speed Microfluidic Droplets using Continuous Spike Streams. 6833-6841 - Qi Mao
, Lan Chen
, Yuchao Gu
, Zhen Fang
, Mike Zheng Shou
:
MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance. 6842-6850 - Guan-Yuan Chen
, Von-Wun Soo
:
Controllable Music Loops Generation with MIDI and Text via Multi-Stage Cross Attention and Instrument-Aware Reinforcement Learning. 6851-6859 - Weitian Zhang
, Yichao Yan
, Yunhui Liu
, Xingdong Sheng
, Xiaokang Yang
:
E3Gen: Efficient, Expressive and Editable Avatars Generation. 6860-6869 - Haibo Yang
, Yang Chen
, Yingwei Pan
, Ting Yao
, Zhineng Chen
, Chong-Wah Ngo
, Tao Mei
:
Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models. 6870-6879 - Shuo Huang
, Shikun Sun
, Zixuan Wang
, Xiaoyu Qin
, Yanmin Xiong
, Yuan Zhang
, Pengfei Wan
, Di Zhang
, Jia Jia
:
PlacidDreamer: Advancing Harmony in Text-to-3D Generation. 6880-6889 - Xiaodi Li
:
Streamable Portrait Video Editing with Probabilistic Pixel Correspondence. 6890-6899 - Xuan Hai
, Xin Liu
, Yuan Tan
, Gang Liu
, Song Li
, Weina Niu
, Rui Zhou
, Xiaokang Zhou
:
What's the Real: A Novel Design Philosophy for Robust AI-Synthesized Voice Detection. 6900-6909 - Xiangyang Luo
, Xin Zhang
, Yifan Xie
, Xinyi Tong
, Weijiang Yu
, Heng Chang
, Fei Ma
, Fei Richard Yu
:
CodeSwap: Symmetrically Face Swapping Based on Prior Codebook. 6910-6919 - Ruofan Wang
, Xingjun Ma
, Hanxu Zhou
, Chuanjun Ji
, Guangnan Ye
, Yu-Gang Jiang
:
White-box Multimodal Jailbreaks Against Large Vision-Language Models. 6920-6928 - Anwen Hu
, Yaya Shi
, Haiyang Xu
, Jiabo Ye
, Qinghao Ye
, Ming Yan
, Chenliang Li
, Qi Qian
, Ji Zhang
, Fei Huang
:
mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model. 6929-6938 - Weifeng Chen
, Tao Gu
, Yuhao Xu
, Arlene Chen
:
Magic Clothing: Controllable Garment-Driven Image Synthesis. 6939-6948 - Yiluo Wei
, Yiming Zhu
, Pan Hui
, Gareth Tyson
:
Exploring the Use of Abusive Generative AI Models on Civitai. 6949-6958 - Xiuliang Duan
, Dating Tan
, Liangda Fang
, Yuyu Zhou
, Chaobo He
, Ziliang Chen
, Lusheng Wu
, Guanliang Chen
, Zhiguo Gong
, Weiqi Luo
, Quanlong Guan
:
Reason-and-Execute Prompting: Enhancing Multi-Modal Large Language Models for Solving Geometry Questions. 6959-6968 - Weiye Xu
, Min Wang
, Wengang Zhou
, Houqiang Li
:
P-RAG: Progressive Retrieval Augmented Generation For Planning on Embodied Everyday Task. 6969-6978 - Wenjie Xuan
, Yufei Xu
, Shanshan Zhao
, Chaoyue Wang
, Juhua Liu
, Bo Du
, Dacheng Tao
:
When ControlNet Meets Inexplicit Masks: A Case Study of ControlNet on its Contour-following Ability. 6979-6988 - Wenshuo Chen
, Hongru Xiao
, Erhang Zhang
, Lijie Hu
, Lei Wang
, Mengyuan Liu
, Chen Chen
:
SATO: Stable Text-to-Motion Framework. 6989-6997 - Zhen Ye
, Zeqian Ju
, Haohe Liu
, Xu Tan
, Jianyi Chen
, Yiwen Lu
, Peiwen Sun
, Jiahao Pan
, Weizhen Bian
, Shulin He
, Wei Xue
, Qifeng Liu
, Yike Guo
:
FlashSpeech: Efficient Zero-Shot Speech Synthesis. 6998-7007 - Huadai Liu
, Rongjie Huang
, Yang Liu
, Hengyuan Cao
, Jialei Wang
, Xize Cheng
, Siqi Zheng
, Zhou Zhao
:
AudioLCM: Efficient and High-Quality Text-to-Audio Generation with Minimal Inference Steps. 7008-7017 - Jiaxu Zhang
, Xin Chen
, Gang Yu
, Zhigang Tu
:
Generative Motion Stylization of Cross-structure Characters within Canonical Motion Space. 7018-7026 - Fengqi Liu
, Hexiang Wang
, Jingyu Gong
, Ran Yi
, Qianyu Zhou
, Xuequan Lu
, Jiangbo Lu
, Lizhuang Ma
:
Emphasizing Semantic Consistency of Salient Posture for Speech-Driven Gesture Generation. 7027-7035 - Tianyi Zheng
, Cong Geng
, Peng-Tao Jiang
, Ben Wan
, Hao Zhang
, Jinwei Chen
, Jia Wang
, Bo Li
:
Non-uniform Timestep Sampling: Towards Faster Diffusion Model Training. 7036-7045 - Miaoxin Ye
, Saixing Zhou
, Weiqi Luo
, Shunquan Tan
, Jiwu Huang
:
GAN-based Symmetric Embedding Costs Adjustment for Enhancing Image Steganographic Security. 7046-7054 - Yaqi Li
, Han Fang
, Zerun Feng
, Kaijing Ma
, Chao Ban
, Xianghao Zang
, Lanxiang Zhou
, Zhongjiang He
, Jingyan Chen
, Jiani Hu
, Hao Sun
, Huayu Zhang
:
GOAL: Grounded text-to-image Synthesis with Joint Layout Alignment Tuning. 7055-7064 - Jinfeng Wei
, Xiaofeng Zhang
:
DOPRA: Decoding Over-accumulation Penalization and Re-allocation in Specific Weighting Layer. 7065-7074 - Yang Luo
, Yiheng Zhang
, Zhaofan Qiu
, Ting Yao
, Zhineng Chen
, Yu-Gang Jiang
, Tao Mei
:
FreeEnhance: Tuning-Free Image Enhancement via Content-Consistent Noising-and-Denoising Process. 7075-7084 - Wenquan Lu
, Yufei Xu
, Jing Zhang
, Chaoyue Wang
, Dacheng Tao
:
HandRefiner: Refining Malformed Hands in Generated Images by Diffusion-based Conditional Inpainting. 7085-7093 - Miao Liu
, Jing Wang
, Xinyuan Qian
, Haizhou Li
:
ListenFormer: Responsive Listening Head Generation with Non-autoregressive Transformers. 7094-7103 - Jie Hu
, Jie Li
, Yue Ma
, Liujuan Cao
, Songan Zhang
, Wei Zhang
, Guannan Jiang
, Rongrong Ji
:
Prompting to Adapt Foundational Segmentation Models. 7104-7112 - Zhiyuan Ma
, Guoli Jia
, Biqing Qi
, Bowen Zhou
:
Safe-SD: Safe and Traceable Stable Diffusion with Text Prompt Trigger for Invisible Generative Watermarking. 7113-7122 - Jin Sun
, Xiaoshuang Shi
, Zhiyuan Wang
, Kaidi Xu
, Heng Tao Shen
, Xiaofeng Zhu
:
Caterpillar: A Pure-MLP Architecture with Shifted-Pillars-Concatenation. 7123-7132 - Yuanbin Wang
, Weilun Dai
, Long Chan
, Huanyu Zhou
, Aixi Zhang
, Si Liu
:
GPD-VVTO: Preserving Garment Details in Video Virtual Try-On. 7133-7142 - Hengfei Wang
, Zhongqun Zhang
, Yihua Cheng
, Hyung Jin Chang
:
TextGaze: Gaze-Controllable Face Generation with Natural Language. 7143-7151 - Huiming Zheng
, Wei Gao
, Zhuozhen Yu
, Tiesong Zhao
, Ge Li
:
ViewPCGC: View-Guided Learned Point Cloud Geometry Compression. 7152-7161 - Liyang He
, Zhenya Huang
, Chenglong Liu
, Rui Li
, Runze Wu
, Qi Liu
, Enhong Chen
:
One-bit Deep Hashing: Towards Resource-Efficient Hashing Model with Binary Neural Network. 7162-7171 - Xinghao Wu
, Xuefeng Liu
, Jianwei Niu
, Haolin Wang
, Shaojie Tang
, Guogang Zhu
, Hao Su
:
Decoupling General and Personalized Knowledge in Federated Learning via Additive and Low-rank Decomposition. 7172-7181 - Hengyi Wang
, Weiying Xie
, Jitao Ma
, Daixun Li
, Yunsong Li
:
FedSLS: Exploring Federated Aggregation in Saliency Latent Space. 7182-7190 - Zhongchi Wang
, Hailong Sun
, Zhengyang Zhao
:
FedEvalFair: A Privacy-Preserving and Statistically Grounded Federated Fairness Evaluation Framework. 7191-7199 - Weitao Tang
, Jianqiang Li
, Meijie Du
, Die Hu
, Qingyun Liu
:
Zenith: Real-time Identification of DASH Encrypted Video Traffic with Distortion. 7200-7209 - Beizhang Guo
, Juntao Bao
, Baili Chai
, Di Wu
, Miao Hu
:
Lumos: Optimizing Live 360-degree Video Upstreaming via Spatial-Temporal Integrated Neural Enhancement. 7210-7219 - Zhongnian Li
, Meng Wei
, Peng Ying
, Tongfeng Sun
, Xinzheng Xu
:
Learning from Concealed Labels. 7220-7228 - Xiangxiang Dai
, Zeyu Zhang
, Peng Yang
, Yuedong Xu
, Xutong Liu
, John C. S. Lui
:
AxiomVision: Accuracy-Guaranteed Adaptive Visual Model Selection for Perspective-Aware Video Analytics. 7229-7238 - Shuo Wang
, Yongcai Wang
, Zhimin Xu
, Yongyu Guo
, Wanting Li
, Zhe Huang
, Xuewei Bai
, Deying Li
:
GSLAMOT: A Tracklet and Query Graph-based Simultaneous Locating, Mapping, and Multiple Object Tracking System. 7239-7248 - Yiyang Jiang
, Wengyu Zhang
, Xulu Zhang
, Xiaoyong Wei
, Chang Wen Chen
, Qing Li
:
Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval. 7249-7258
Oral Session 18: Fairness, Trust, Explainability & Interpretability in Multimedia
- Peiwen Sun, Honggang Zhang
, Di Hu
:
Unveiling and Mitigating Bias in Audio Visual Segmentation. 7259-7268 - Ying Liu
, Lihong Liu
, Cai Xu
, Xiangyu Song
, Ziyu Guan
, Wei Zhao
:
Dynamic Evidence Decoupling for Trusted Multi-view Learning. 7269-7277 - Wei Liu
, Yufei Chen
, Xiaodong Yue
:
Building Trust in Decision with Conformalized Multi-view Deep Classification. 7278-7287 - Daoming Zong
, Chaoyue Ding
, Kaitao Chen
:
Toward Explainable Physical Audiovisual Commonsense Reasoning. 7288-7297 - Jingjie Zeng
, Zhihao Yang
, Qi Yang
, Liang Yang
, Hongfei Lin
:
Peeling Back the Layers: Interpreting the Storytelling of ViT. 7298-7306 - Chihaya Matsuhira
, Marc A. Kastner
, Takahiro Komamizu
, Takatsugu Hirayama
, Ichiro Ide
:
Investigating Conceptual Blending of a Diffusion Model for Improving Nonword-to-Image Generation. 7307-7315
Oral Session 19: Multimodal Applications
- Minghui Wu
, Chenxu Zhao
, Anyang Su
, Donglin Di
, Tianyu Fu
, Da An
, Min He
, Ya Gao
, Meng Ma
, Kun Yan
, Ping Wang
:
Hypergraph Multi-modal Large Language Model: Exploiting EEG and Eye-tracking Modalities to Evaluate Heterogeneous Responses for Video Understanding. 7316-7325 - Yanglin Deng
, Tianyang Xu
, Chunyang Cheng
, Xiao-Jun Wu
, Josef Kittler
:
MMDRFuse: Distilled Mini-Model with Dynamic Refresh for Multi-Modality Image Fusion. 7326-7335 - Ziyan Li
, Jianfei Yu
, Jia Yang
, Wenya Wang
, Li Yang
, Rui Xia
:
Generative Multimodal Data Augmentation for Low-Resource Multimodal Named Entity Recognition. 7336-7345 - Zhiqi Ge
, Hongzhe Huang
, Mingze Zhou
, Juncheng Li
, Guoming Wang
, Siliang Tang
, Yueting Zhuang
:
WorldGPT: Empowering LLM as Multimodal World Model. 7346-7355 - Yiming Li
, Zhifang Guo
, Xiangdong Wang
, Hong Liu
:
Advancing Multi-grained Alignment for Contrastive Language-Audio Pre-training. 7356-7365 - Yingxuan Li
, Ryota Hinami
, Kiyoharu Aizawa
, Yusuke Matsui
:
Zero-Shot Character Identification and Speaker Prediction in Comics via Iterative Multimodal Fusion. 7366-7374
Oral Session 20: Datasets & Algorithms for Multimedia Analysis
- Chunyi Li
, Haoning Wu
, Hongkun Hao
, Zicheng Zhang
, Tengchuan Kou
, Chaofeng Chen
, Lei Bai
, Xiaohong Liu
, Weisi Lin
, Guangtao Zhai
:
G-Refine: A General Quality Refiner for Text-to-Image Generation. 7375-7384 - Wenqiang Xu
, Wenrui Dai
, Ziyang Zheng
, Chenglin Li
, Junni Zou
, Hongkai Xiong
:
Point Cloud Upsampling with Geometric Algebra Driven Inverse Heat Dissipation. 7385-7394 - Junyan Wu
, Wei Lu
, Xiangyang Luo
, Rui Yang
, Qian Wang
, Xiaochun Cao
:
Coarse-to-Fine Proposal Refinement Framework for Audio Temporal Forgery Detection and Localization. 7395-7403 - Fujun Han
, Peng Ye
, Shukai Duan
, Lidan Wang
:
Ada-iD: Active Domain Adaptation for Intrusion Detection. 7404-7413 - Zhixi Cai
, Shreya Ghosh
, Aman Pankaj Adatia
, Munawar Hayat
, Abhinav Dhall
, Tom Gedeon
, Kalin Stefanov
:
AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset. 7414-7423 - Rintaro Yanagi
, Ren Togo
, Takahiro Ogawa
, Miki Haseyama
:
DQG: Database Question Generation for Exact Text-based Image Retrieval. 7424-7433
Oral Session 21: Image Enhancement and Super-Resolution
- Tongshun Zhang
, Pingping Liu
, Ming Zhao
, Haotian Lv
:
DMFourLLIE: Dual-Stage and Multi-Branch Fourier Network for Low-Light Image Enhancement. 7434-7443 - Fei Gao
, Yuhao Lin
, Jiaqi Shi
, Maoying Qiao
, Nannan Wang
:
AesMamba: Universal Image Aesthetic Assessment with State Space Models. 7444-7453 - Yi Dong
, Yuxi Wang
, Zheng Fang
, Wenqi Ouyang
, Xianhui Lin
, Zhiqi Shen
, Peiran Ren
, Xuansong Xie
, Qingming Huang
:
MovingColor: Seamless Fusion of Fine-grained Video Color Enhancement. 7454-7463 - Ruibin Li
, Jingcai Guo
, Qihua Zhou
, Song Guo
:
FreePIH: Training-Free Painterly Image Harmonization with Diffusion Model. 7464-7473 - Qian Huang
, Cheng Xu
, Guiqing Li
, Ziheng Wu
, Shengxin Liu
, Shengfeng He
:
Portrait Shadow Removal via Self-Exemplar Illumination Equalization. 7474-7482 - Qiwen Zhu
, Yanjie Wang
, Shilv Cai
, Liqun Chen
, Jiahuan Zhou
, Luxin Yan
, Sheng Zhong
, Xu Zou
:
Perceptual-Distortion Balanced Image Super-Resolution is a Multi-Objective Optimization Problem. 7483-7492
Oral Session 22: Audio-visual Datasets and Applications
- Han Wang
, Tan Rui Yang
, Usman Naseem
, Roy Ka-Wei Lee
:
MultiHateClip: A Multilingual Benchmark Dataset for Hateful Video Detection on YouTube and Bilibili. 7493-7502 - Jiale Yu
, Baopeng Zhang
, Zhu Teng
, Jianping Fan
:
OpenAVE: Moving towards Open Set Audio-Visual Event Localization. 7503-7512 - Xinfa Zhu
, Wenjie Tian
, Xinsheng Wang
, Lei He
, Yujia Xiao
, Xi Wang
, Xu Tan
, Sheng Zhao
, Lei Xie
:
UniStyle: Unified Style Modeling for Speaking Style Captioning and Stylistic Speech Synthesis. 7513-7522 - Zhedong Zhang
, Liang Li
, Gaoxiang Cong
, Haibing Yin
, Yuhan Gao
, Chenggang Yan
, Anton van den Hengel
, Yuankai Qi
:
From Speaker to Dubber: Movie Dubbing with Prosody and Duration Consistency Learning. 7523-7532 - Ruohao Guo
, Liao Qu
, Dantong Niu
, Yanyu Qi
, Wenzhen Yue
, Ji Shi
, Bowei Xing
, Xianghua Ying
:
Open-Vocabulary Audio-Visual Semantic Segmentation. 7533-7541
Oral Session 23: Multimodal Learning and Recommendation Systems
- Hongcheng Li
, Yucan Zhou
, Xiaoyan Gu
, Bo Li
, Weiping Wang
:
Diversified Semantic Distribution Matching for Dataset Distillation. 7542-7550 - Jinghao Zhang
, Guofan Liu
, Qiang Liu
, Shu Wu
, Liang Wang
:
Modality-Balanced Learning for Multimedia Recommendation. 7551-7560 - Ziyi Ye
, Jingtao Zhan
, Qingyao Ai
, Yiqun Liu
, Maarten de Rijke
, Christina Lioma
, Tuukka Ruotsalo
:
Query Augmentation with Brain Signals. 7561-7570 - Lei Shi
, Jiapeng Yang
, Pengtao Lv
, Lu Yuan
, Feifei Kou
, Jia Luo
, Mingying Xu
:
Self-derived Knowledge Graph Contrastive Learning for Recommendation. 7571-7580 - Jiaye Lin
, Qing Li
, Guorui Xie
, Zhongxu Guan
, Yong Jiang
, Ting Xu
, Zhong Zhang
, Peilin Zhao
:
Mitigating Sample Selection Bias with Robust Domain Adaption in Multimedia Recommendation. 7581-7590 - Yangqin Jiang
, Lianghao Xia
, Wei Wei
, Da Luo
, Kangyi Lin
, Chao Huang
:
DiffMM: Multi-Modal Diffusion Model for Recommendation. 7591-7599
Oral Session 24: Novel Multimedia Applications 2
- Tongtong Feng
, Xin Wang
, Feilin Han
, Leping Zhang
, Wenwu Zhu
:
U2UData: A Large-scale Cooperative Perception Dataset for Swarm UAVs Autonomous Flight. 7600-7608 - Chaoqun Niu
, Dongdong Chen
, Jizhe Zhou
, Jian Wang
, Xiang Luo
, Quan-Hui Liu
, Yuan Li
, Jiancheng Lv
:
Neural Boneprint: Person Identification from Bones Using Generative Contrastive Deep Learning. 7609-7618 - Xueli Hu
, Huan Liu
, Haocheng Yuan
, Zhiyang Fu
, Yizhi Luo
, Ning Zhang
, Hang Zou
, Jianwen Gan
, Yuan Zhang
:
Fine-Grained Prompt Learning for Face Anti-Spoofing. 7619-7628 - Xiao Han
, Yiming Ren
, Yichen Yao
, Yujing Sun
, Yuexin Ma
:
Towards Practical Human Motion Prediction with LiDAR Point Clouds. 7629-7638 - Haodong Hong
, Sen Wang
, Zi Huang
, Qi Wu
, Jiajun Liu
:
Navigating Beyond Instructions: Vision-and-Language Navigation in Obstructed Environments. 7639-7648 - Minghe Gao
, Juncheng Li
, Hao Fei
, Liang Pang
, Wei Ji
, Guoming Wang
, Zheqi Lv
, Wenqiao Zhang
, Siliang Tang
, Yueting Zhuang
:
De-fine: Decomposing and Refining Visual Programs with Auto-Feedback. 7649-7657
Oral Session 25: Media and Communication Technologies
- Jingjing Liu
, Youyi Zheng
, Kun Zhou
:
Virtual Agent Positioning Driven by Personal Characteristics. 7658-7666 - Meng Luo
, Hao Fei
, Bobo Li
, Shengqiong Wu
, Qian Liu
, Soujanya Poria
, Erik Cambria
, Mong-Li Lee
, Wynne Hsu
:
PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis. 7667-7676 - Yawen Luo
, Min Shi
, Liao Shen
, Yachuan Huang
, Zixuan Ye
, Juewen Peng
, Zhiguo Cao
:
Video Bokeh Rendering: Make Casual Videography Cinematic. 7677-7685 - Zhenyu Zhang
, Guangyao Chen
, Yixiong Zou
, Zhimeng Huang
, Yuhua Li
, Ruixuan Li
:
MICM: Rethinking Unsupervised Pretraining for Enhanced Few-shot Learning. 7686-7695 - Zejun Zhang
, Xiao Zhu
, Anlan Zhang
, Feng Qian
:
An In-depth Study of Bandwidth Allocation across Media Sources in Video Conferencing. 7696-7704 - Zixuan Yang
, Yushu Zhang
, Tao Wang
, Zhongyun Hua
, Zhihua Xia
, Jian Weng
:
Once-for-all: Efficient Visual Face Privacy Protection via Person-specific Veils. 7705-7713
Oral Session 26: Cultural Heritage & Media Analysis
- Shipeng Zhu
, Hui Xue
, Na Nie
, Chenjie Zhu
, Haiyue Liu
, Pengfei Fang
:
Reproducing the Past: A Dataset for Benchmarking Inscription Restoration. 7714-7723 - Jiao Pan
, Liang Li
, Hiroshi Yamaguchi
, Kyoko Hasegawa
, Fadjar Ibnu Thufail
, Brahmantara
, Xiaojuan Ban
, Satoshi Tanaka
:
Reconstructing, Understanding, and Analyzing Relief Type Cultural Heritage from a Single Old Photo. 7724-7733 - Yi Bin
, Wenhao Shi
, Yujuan Ding
, Zhiqiang Hu
, Zheng Wang
, Yang Yang
, See-Kiong Ng
, Heng Tao Shen
:
GalleryGPT: Analyzing Paintings with Large Multimodal Models. 7734-7743 - Jun Ma
, Tuukka Ruotsalo
:
Cognition-Supervised Saliency Detection: Contrasting EEG Signals and Visual Stimuli. 7744-7753 - Yizhang Liu
, Weiwei Zhou
, Yanping Li
, Shengjie Zhao
:
RoSe: Rotation-Invariant Sequence-Aware Consensus for Robust Correspondence Pruning. 7754-7763 - Yujia Wang
, Fang-Lue Zhang
, Neil A. Dodgson
:
ScanTD: 360° Scanpath Prediction based on Time-Series Diffusion. 7764-7773
Oral Session 27: Security & Quality in Multimedia Systems
- Dunyun Chen
, Xin Liao
, Xiaoshuai Wu
, Shiwei Chen
:
SafePaint: Anti-forensic Image Inpainting with Domain Adaptation. 7774-7782 - Zicheng Zhang
, Haoning Wu
, Yingjie Zhou
, Chunyi Li
, Wei Sun
, Chaofeng Chen
, Xiongkuo Min
, Xiaohong Liu
, Weisi Lin
, Guangtao Zhai
:
LMM-PCQA: Assisting Point Cloud Quality Assessment with LMM. 7783-7792 - Tengchuan Kou
, Xiaohong Liu
, Zicheng Zhang
, Chunyi Li
, Haoning Wu
, Xiongkuo Min
, Guangtao Zhai
, Ning Liu
:
Subjective-Aligned Dataset and Metric for Text-to-Video Quality Assessment. 7793-7802 - Puyi Wang
, Wei Sun
, Zicheng Zhang
, Jun Jia
, Yanwei Jiang
, Zhichao Zhang
, Xiongkuo Min
, Guangtao Zhai
:
Large Multi-modality Model Assisted AI-Generated Image Quality Assessment. 7803-7812 - Xuemei Zhou
, Irene Viola
, Yunlu Chen
, Jiahuan Pei
, Pablo César
:
Deciphering Perceptual Quality in Colored Point Cloud: Prioritizing Geometry or Texture Distortion? 7813-7822 - Desen Yuan
, Lei Wang
:
Dual-Criterion Quality Loss for Blind Image Quality Assessment. 7823-7832
Oral Session 28: Complex Scene Processing
- Zhe Huang
, Shuo Wang
, Yongcai Wang
, Wanting Li
, Deying Li
, Lei Wang
:
RoCo: Robust Cooperative Perception By Iterative Object Matching and Pose Adjustment. 7833-7842 - Shao-Kui Zhang
, Hanxi Zhu
, Xuebin Chen
, Jinghuan Chen
, Zhike Peng
, Ziyang Chen
, Yong-Liang Yang
, Song-Hai Zhang
:
ScenePhotographer: Object-Oriented Photography for Residential Scenes. 7843-7851 - Changli Wu
, Yihang Liu
, Jiayi Ji
, Yiwei Ma
, Haowei Wang
, Gen Luo
, Henghui Ding
, Xiaoshuai Sun
, Rongrong Ji
:
3D-GRES: Generalized 3D Referring Expression Segmentation. 7852-7861 - Xuan Han
, Yihao Zhao
, Mingyu You
:
Scene Diffusion: Text-driven Scene Image Synthesis Conditioning on a Single 3D Model. 7862-7870 - Jinbo Yan
, Rui Peng
, Luyang Tang
, Ronggang Wang
:
4D Gaussian Splatting with Scale-aware Residual Field and Adaptive Optimization for Real-time Rendering of Temporally Complex Dynamic Scenes. 7871-7880 - Hongtao Wu
, Yijun Yang
, Huihui Xu
, Weiming Wang
, Jinni Zhou
, Lei Zhu
:
RainMamba: Enhanced Locality Learning with State Space Models for Video Deraining. 7881-7890
Oral Session 29: Enhancements in Video Streaming and Compression
- Bo Wu
, Tong Li
, Cheng Luo
, Xu Yan
, Fuyu Wang
, Xinle Du
, Ke Xu
:
Toward Timeliness-Enhanced Loss Recovery for Large-Scale Live Streaming. 7891-7899 - Fangtao Zhou
, Xiaofeng Huang
, Peng Zhang
, Meng Wang
, Zhao Wang
, Yang Zhou
, Haibing Yin
:
Enhanced Screen Content Image Compression: A Synergistic Approach for Structural Fidelity and Text Integrity Preservation. 7900-7908 - Miao Zhang
, Jiaxing Li
, Haoyuan Zhao
, Linfeng Shen
, Jiangchuan Liu
:
StarStream: Live Video Analytics over Space Networking. 7909-7917 - Pengqiang Bi
, Yifei Zou
, Mengbai Xiao
, Dongxiao Yu
, Yijun Li
, Zhixiong Liu
, Qun Xie
:
LiteQUIC: Improving QoE of Video Streams by Reducing CPU Overhead of QUIC. 7918-7927 - Yili Jin
, Xize Duan
, Fangxin Wang
, Xue Liu
:
HeadsetOff: Enabling Photorealistic Video Conferencing on Economical VR Headsets. 7928-7936 - Zihan Zheng
, Houqiang Zhong
, Qiang Hu
, Xiaoyun Zhang
, Li Song
, Ya Zhang
, Yanfeng Wang
:
HPC: Hierarchical Progressive Coding Framework for Volumetric Video. 7937-7946
Poster Session 3
- Lianghui Zhu
, Junwei Zhou
, Yan Liu
, Xin Hao
, Wenyu Liu
, Xinggang Wang
:
WeakSAM: Segment Anything Meets Weakly-supervised Instance-level Recognition. 7947-7956 - Xiangyu Sun
, Joo Chan Lee
, Daniel Rho
, Jong Hwan Ko
, Usman Ali
, Eunbyung Park
:
F-3DGS: Factorized Coordinates and Representations for 3D Gaussian Splatting. 7957-7965 - Sijing Wu
, Yunhao Li
, Yichao Yan
, Huiyu Duan
, Ziwei Liu
, Guangtao Zhai
:
MMHead: Towards Fine-grained Multi-modal 3D Facial Animation. 7966-7975 - Chunxiao Li
, Shuyang Wang
, Xuejing Kang
, Anlong Ming
:
Thinking Temporal Automatic White Balance: Datasets, Models and Benchmarks. 7976-7984 - Zhe Luo
, Weina Fu
, Shuai Liu
, Saeed Anwar
, Muhammad Saqib
, Sambit Bakshi
, Khan Muhammad
:
Cefdet: Cognitive Effectiveness Network Based on Fuzzy Inference for Action Detection. 7985-7994 - Wencan Huang
, Daizong Liu
, Wei Hu
:
Advancing 3D Object Grounding Beyond a Single 3D Scene. 7995-8004 - Bin Huang
, Feng He
, Qi Wang
, Hong Chen
, Guohao Li
, Zhifan Feng
, Xin Wang
, Wenwu Zhu
:
Neighbor Does Matter: Curriculum Global Positive-Negative Sampling for Vision-Language Pre-training. 8005-8014 - Haoyuan Jin
, Xuesong Nie
, Yunfeng Yan
, Xi Chen
, Zhihang Zhu
, Donglian Qi
:
Object-Level Pseudo-3D Lifting for Distance-Aware Tracking. 8015-8023 - Xinwei Liu
, Xiaojun Jia
, Yuan Xun
, Siyuan Liang
, Xiaochun Cao
:
Multimodal Unlearnable Examples: Protecting Data against Multimodal Contrastive Learning. 8024-8033 - Ge Luo
, Yuchen Ma
, Manman Zhang
, Junqiang Huang
, Sheng Li
, Zhenxing Qian
, Xinpeng Zhang
:
Engaging Live Video Comments Generation. 8034-8042 - Lu Chen
, Qiangchang Wang
, Zhaohui Li
, Yilong Yin
:
Hypergraph-guided Intra- and Inter-category Relation Modeling for Fine-grained Visual Recognition. 8043-8052 - Yuan Xie
, Yichen Zhang
, Yifang Yin
, Sheng Zhang
, Ying Zhang
, Rajiv Ratn Shah
, Roger Zimmermann
, Guoqing Xiao
:
Traj2Former: A Local Context-aware Snapshot and Sequential Dual Fusion Transformer for Trajectory Classification. 8053-8061 - Guilin Li
, Mengdan Zhang
, Xiawu Zheng
, Peixian Chen
, Zihan Wang
, Yunhang Shen
, Mingchen Zhuge
, Chenglin Wu
, Fei Chao
, Ke Li
, Xing Sun
, Rongrong Ji
:
Multimodal Inplace Prompt Tuning for Open-set Object Detection. 8062-8071 - Shengran Cheng
, Chuhang Ma
, Ye Pan
:
StylizedFacePoint: Facial Landmark Detection for Stylized Characters. 8072-8080 - Sheng Zhang
, Xi Yang
:
Information Fusion with Knowledge Distillation for Fine-grained Remote Sensing Object Detection. 8081-8089 - Bowen Zhao
, Qianqian Wang
, Zhiqiang Tao
, Wei Feng
, Quanxue Gao
:
DFMVC: Deep Fair Multi-view Clustering. 8090-8099 - Ruyu Liu
, Zhengzhe Liu
, Haoyu Zhang
, Guodao Zhang
, Jianhua Zhang
, Sunbo
, Weiguo Sheng
, Xiufeng Liu
, Yaochu Jin
:
ColVO: Colonoscopic Visual Odometry Considering Geometric and Photometric Consistency. 8100-8109 - Xun Lin
, Yi Yu
, Zitong Yu
, Ruohan Meng
, Jiale Zhou
, Ajian Liu
, Yizhong Liu
, Shuai Wang
, Wenzhong Tang
, Zhen Lei
, Alex C. Kot
:
HideMIA: Hidden Wavelet Mining for Privacy-Enhancing Medical Image Analysis. 8110-8119 - Shuyuan Liu
, Jiawei Chen
, Shouwei Ruan
, Hang Su
, Zhaoxia Yin
:
Exploring the Robustness of Decision-Level Through Adversarial Attacks on LLM-Based Embodied Models. 8120-8128 - Jiahe Tian
, Cai Yu
, Xi Wang
, Peng Chen
, Zihao Xiao
, Jizhong Han
, Yesheng Chai
:
Dynamic Mixed-Prototype Model for Incremental Deepfake Detection. 8129-8138 - Tianshan Liu
, Kin-Man Lam
, Bing-Kun Bao
:
Label Text-aided Hierarchical Semantics Mining for Panoramic Activity Recognition. 8139-8148 - Xiaoda Yang
, Xize Cheng
, Dongjie Fu
, Minghui Fang
, Jialong Zuo
, Shengpeng Ji
, Zhou Zhao
, Tao Jin:
SyncTalklip: Highly Synchronized Lip-Readable Speaker Generation with Multi-Task Learning. 8149-8158 - Jingjun Yi
, Qi Bi
, Hao Zheng
, Haolan Zhan
, Wei Ji
, Yawen Huang
, Yuexiang Li
, Yefeng Zheng
:
Learning Spectral-Decomposited Tokens for Domain Generalized Semantic Segmentation. 8159-8168 - Peng Yin
, Xiaosu Zhu
, Jingkuan Song
, Lianli Gao
, Heng Tao Shen
:
SI-BiViT: Binarizing Vision Transformers with Spatial Interaction. 8169-8178 - Ao Li
, Huijun Liu
, Jinrong Sheng
, Zhongming Chen
, Yongxin Ge
:
Efficient Dual-Confounding Eliminating for Weakly-supervised Temporal Action Localization. 8179-8188 - Xuri Ge
, Junchen Fu
, Fuhai Chen
, Shan An
, Nicu Sebe
, Joemon M. Jose
:
Towards End-to-End Explainable Facial Action Unit Recognition via Vision-Language Joint Learning. 8189-8198 - Jongbhin Woo
, Hyeonggon Ryu
, Youngjoon Jang
, Jae-Won Cho
, Joon Son Chung
:
Let Me Finish My Sentence: Video Temporal Grounding with Holistic Text Understanding. 8199-8208 - Jiali Chen
, Xusen Hei
, Yuqi Xue
, Yuancheng Wei
, Jiayuan Xie
, Yi Cai
, Qing Li
:
Learning to Correction: Explainable Feedback Generation for Visual Commonsense Reasoning Distractor. 8209-8218 - Yu-Pei Song
, Yuantong Liu
, Xiao Wu
, Qi He
, Zhaoquan Yuan
, Ao Luo
:
MagicCartoon: 3D Pose and Shape Estimation for Bipedal Cartoon Characters. 8219-8227 - Ajian Liu
, Hui Ma
, Junze Zheng
, Haocheng Yuan
, Xiaoyuan Yu
, Yanyan Liang
, Sergio Escalera
, Jun Wan
, Zhen Lei
:
FM-CLIP: Flexible Modal CLIP for Face Anti-Spoofing. 8228-8237 - Jiaqi Guo
, Lianli Gao
, Junchen Zhu
, Jiaxin Zhang
, Siyang Li
, Jingkuan Song
:
MagicVFX: Visual Effects Synthesis in Just Minutes. 8238-8246 - Kangzheng Liu
, Feng Zhao
, Yu Yang
, Guandong Xu
:
DySarl: Dynamic Structure-Aware Representation Learning for Multimodal Knowledge Graph Reasoning. 8247-8256 - Weicai Yan
, Ye Wang
, Wang Lin
, Zirun Guo
, Zhou Zhao
, Tao Jin
:
Low-rank Prompt Interaction for Continual Vision-Language Retrieval. 8257-8266 - Jing Zhou
, Ziqi Yu
, Zhongyun Bao
, Gang Fu
, Weilei He
, Chao Liang
, Chunxia Xiao
:
Foreground Harmonization and Shadow Generation for Composite Image. 8267-8276 - Zhen-Xiang Ma
, Zhen-Duo Chen
, Li-Jun Zhao
, Zi-Chao Zhang
, Tai Zheng
, Xin Luo
, Xin-Shun Xu
:
Bi-directional Task-Guided Network for Few-Shot Fine-Grained Image Classification. 8277-8286 - Xiao He
, Chang Tang
, Xinwang Liu
, Chuankun Li
, Shan An
, Zhenglai Li
:
Heterogeneous Graph Guided Contrastive Learning for Spatially Resolved Transcriptomics Data. 8287-8295 - Yabing Wang
, Le Wang
, Qiang Zhou
, Zhibin Wang
, Hao Li
, Gang Hua
, Wei Tang
:
Multimodal LLM Enhanced Cross-lingual Cross-modal Retrieval. 8296-8305 - Zhiwen Yang
, Liang Li
, Jiehua Zhang
, Tingyu Wang
, Yaoqi Sun
, Chenggang Yan
:
Domain Shared and Specific Prompt Learning for Incremental Monocular Depth Estimation. 8306-8315 - Shuting He
, Henghui Ding
:
RefMask3D: Language-Guided Transformer for 3D Referring Segmentation. 8316-8325 - Yunwei Bai
, Bill Yang Cai
, Ying Kiat Tan
, Zangwei Zheng
, Shiming Chen
, Tsuhan Chen
:
FSL-QuickBoost: Minimal-Cost Ensemble for Few-Shot Learning. 8326-8335 - Jinhui Pang
, Changqing Lin
, Xiaoshuai Hao
, Rong Yin
, Zixuan Wang
, Zhihui Zhang
, Jinglin He
, Huang Tai Sheng
:
FTF-ER: Feature-Topology Fusion-Based Experience Replay Method for Continual Graph Learning. 8336-8344 - Fengmao Lv
, Changru Nie
, Jianyang Zhang
, Guowu Yang
, Guosheng Lin
, Xiao Wu
, Tianrui Li
:
Rethinking the Effect of Uninformative Class Name in Prompt Learning. 8345-8354 - Yuhan Wang
, Mofei Song
:
UniL: Point Cloud Novelty Detection through Multimodal Pre-training. 8355-8364 - Zeyu Xiao
, Zhihe Lu
, Xinchao Wang
:
P-BiC: Ultra-High-Definition Image Moiré Patterns Removal via Patch Bilateral Compensation. 8365-8373 - Jing Yang, Shundong Yang, Yuan Gao, Jieming Yang, Laurence T. Yang:
Multimodal Contextual Interactions of Entities: A Modality Circular Fusion Approach for Link Prediction. 8374-8382 - Chaolei Tan
, Zihang Lin
, Junfu Pu
, Zhongang Qi
, Wei-Yi Pei
, Zhi Qu
, Yexin Wang
, Ying Shan
, Wei-Shi Zheng
, Jian-Fang Hu
:
SynopGround: A Large-Scale Dataset for Multi-Paragraph Video Grounding from TV Dramas and Synopses. 8383-8392 - Buyu Liu
, Kai Wang
, Yansong Liu
, Jun Bao
, Tingting Han
, Jun Yu
:
MVPbev: Multi-view Perspective Image Generation from BEV with Test-time Controllability and Generalizability. 8393-8401 - Junzhang Liu
, Zhecan Wang
, Hammad A. Ayyubi
, Haoxuan You
, Chris Thomas
, Rui Sun
, Shih-Fu Chang
, Kai-Wei Chang
:
Detecting Multimodal Situations with Insufficient Context and Abstaining from Baseless Predictions. 8402-8411 - Yingchun Wang
, Jingcai Guo
, Song Guo
, Yi Liu
, Jie Zhang
, Weizhan Zhang
:
SFP: Spurious Feature-Targeted Pruning for Out-of-Distribution Generalization. 8412-8420 - Yao Li
, Jiajun Deng
, Yuxuan Xiao
, Yingjie Wang
, Xiaomeng Chu
, Jianmin Ji
, Yanyong Zhang
:
FARFusion V2: A Geometry-based Radar-Camera Fusion Method on the Ground for Roadside Far-Range 3D Object Detection. 8421-8430 - Fangdi Wang
, Jiaqi Jin
, Zhibin Dong
, Xihong Yang
, Yu Feng
, Xinwang Liu
, Xinzhong Zhu
, Siwei Wang
, Tianrui Liu
, En Zhu
:
View Gap Matters: Cross-view Topology and Information Decoupling for Multi-view Clustering. 8431-8440 - Wenjie Wei
, Yu Liang
, Ammar Belatreche
, Yichen Xiao
, Honglin Cao
, Zhenbang Ren
, Guoqing Wang
, Malu Zhang
, Yang Yang
:
Q-SNNs: Quantized Spiking Neural Networks. 8441-8450 - Shihua Zhang
, Jiayi Ma
:
DiffGlue: Diffusion-Aided Image Feature Matching. 8451-8460 - Xueyang Li
, Yu Song
, Yunzhong Lou
, Xiangdong Zhou
:
CAD Translator: An Effective Drive for Text to 3D Parametric Computer-Aided Design Generative Modeling. 8461-8470 - Weichen Xu
, Jian Cao
, Tianhao Fu
, Ruilong Ren, Zicong Hu
, Xixin Cao
, Xing Zhang
:
Point Cloud Reconstruction Is Insufficient to Learn 3D Representations. 8471-8479 - Xiao Yu
, Kejiang Chen
, Kai Zeng
, Han Fang
, Zijin Yang
, Xiuwei Shang
, Yuang Qi
, Weiming Zhang
, Nenghai Yu
:
SemGIR: Semantic-Guided Image Regeneration Based Method for AI-generated Image Detection and Attribution. 8480-8488 - Jiahua Xiao
, Yang Liu
, Shizhou Zhang
, Xing Wei
:
Bridging Fourier and Spatial-Spectral Domains for Hyperspectral Image Denoising. 8489-8497 - Heng Jia
, Yunqiu Xu
, Linchao Zhu
, Guang Chen
, Yufei Wang
, Yi Yang:
MoS2: Mixture of Scale and Shift Experts for Text-Only Video Captioning. 8498-8507 - Qi Zhang
, Chi Huang
, Qian Zhang
, Nan Li
, Wei Feng
:
Learning Geometry Consistent Neural Radiance Fields from Sparse and Unposed Views. 8508-8517 - Zihan Fang
, Shide Du
, Yuhong Chen
, Shiping Wang
:
Beyond the Known: Ambiguity-Aware Multi-view Learning. 8518-8526 - Jingchao Wang
, Zhengnan Deng
, Tongxu Lin
, Wenyuan Li
, Shaobin Ling
, Junyu Lin
:
Beyond Direct Relationships: Exploring Multi-Order Label Pair Dependencies for Knowledge Distillation. 8527-8535 - Yuhang Li
, Jincen Jiang
, Xiaosong Yang
, Youdong Ding
, Jian Jun Zhang
:
Harmony Everything! Masked Autoencoders for Video Harmonization. 8536-8545 - Linfeng Tang
, Yuxin Deng
, Xunpeng Yi
, Qinglong Yan
, Yixuan Yuan
, Jiayi Ma
:
DRMF: Degradation-Robust Multi-Modal Image Fusion via Composable Diffusion Prior. 8546-8555 - Jintao Chen
, Fan Wang
, Shengye Pang
, Siwei Tan
, Mingshuai Chen
, Tiancheng Zhao
, Meng Xi
, Jianwei Yin
:
UniGM: Unifying Multiple Pre-trained Graph Models via Adaptive Knowledge Aggregation. 8556-8565 - Ziyue Wu
, Junyu Gao
, Changsheng Xu
:
Open-Vocabulary Video Scene Graph Generation via Union-aware Semantic Alignment. 8566-8575 - Li Zheng
, Boyu Chen
, Hao Fei
, Fei Li
, Shengqiong Wu
, Lizi Liao
, Donghong Ji
, Chong Teng
:
Self-Adaptive Fine-grained Multi-modal Data Augmentation for Semi-supervised Muti-modal Coreference Resolution. 8576-8585 - Daqin Luo
, Chengjian Feng
, Yuxuan Nong
, Yiqing Shen
:
AutoM3L: An Automated Multimodal Machine Learning Framework with Large Language Models. 8586-8594 - Xu Zhang
, Zhipeng Xie
, Haiyang Yu
, Qitong Wang
, Peng Wang
, Wei Wang
:
Enhancing Adaptive Deep Networks for Image Classification via Uncertainty-aware Decision Fusion. 8595-8603 - Ran Wang
, Hua Zuo
, Zhen Fang
, Jie Lu
:
Towards Robustness Prompt Tuning with Fully Test-Time Adaptation for CLIP's Zero-Shot Generalization. 8604-8612 - Lijun Zhang
, Wei Suo
, Peng Wang
, Yanning Zhang
:
A Plug-and-Play Method for Rare Human-Object Interactions Detection by Bridging Domain Gap. 8613-8622 - Haojie Wei
, Jun Yuan
, Rui Zhang
, Quanyu Dai
, Yueguo Chen
:
MAJL: A Model-Agnostic Joint Learning Framework for Music Source Separation and Pitch Estimation. 8623-8632 - Binbin Xu
, Jun Yin
, Nan Zhang
:
Graph based Consistency Learning for Contrastive Multi-View Clustering. 8633-8641 - Jiaxin Gao
, Yaohua Liu
:
Enhancing Images with Coupled Low-Resolution and Ultra-Dark Degradations: A Tri-level Learning Framework. 8642-8651 - Qian Qu
, Xinhang Wan
, Weixuan Liang
, Jiyuan Liu
, Yu Feng
, Huiying Xu
, Xinwang Liu
, En Zhu
:
A Lightweight Anchor-Based Incremental Framework for Multi-view Clustering. 8652-8661 - Yao Wu
, Mingwei Xing
, Yachao Zhang
, Yuan Xie
, Yanyun Qu
:
CLIP2UDA: Making Frozen CLIP Reward Unsupervised Domain Adaptation in 3D Semantic Segmentation. 8662-8671 - Zongqian Wu
, Yujing Liu
, Mengmeng Zhan
, Ping Hu
, Xiaofeng Zhu
:
Adaptive Multi-Modality Prompt Learning. 8672-8680 - Shiwei Zhang
, Wei Ke
, Shuai Liu
, Xiaopeng Hong
, Tong Zhang
:
Boosting Semi-supervised Crowd Counting with Scale-based Active Learning. 8681-8690 - Yingjie Gao
, Yanan Zhang
, Ziyue Huang
, Nanqing Liu
, Di Huang
:
PS-TTL: Prototype-based Soft-labels and Test-Time Learning for Few-shot Object Detection. 8691-8700 - Li Yuan
, Yi Cai
, Junsheng Huang
:
Few-Shot Joint Multimodal Entity-Relation Extraction via Knowledge-Enhanced Cross-modal Prompt Model. 8701-8710 - Yijia Wang
, Qianqian Xu
, Yangbangyan Jiang
, Siran Dai
, Qingming Huang
:
Regularized Contrastive Partial Multi-view Outlier Detection. 8711-8720 - Rui Liu
, Mingjie Li
, Shen Zhao
, Ling Chen
, Xiaojun Chang
, Lina Yao
:
In-Context Learning for Zero-shot Medical Report Generation. 8721-8730 - Guoliang Zou
, Yangdong Ye
, Tongji Chen
, Shizhe Hu
:
Learning Dual Enhanced Representation for Contrastive Multi-view Clustering. 8731-8739 - Yang Zhao, Gangwei Xu, Gang Wu:
Hybrid Cost Volume for Memory-Efficient Optical Flow. 8740-8749 - Xiao-Qian Liu
, Minghui Liu
, Zhen-Duo Chen
, Xin Luo
, Xin-Shun Xu
:
Hierarchical Multi-label Learning for Incremental Multilingual Text Recognition. 8750-8758 - Yuzhuo Wang
, Junwei He
, Hongzhi Wang
:
RHKH: Relational Hypergraph Neural Network for Link Prediction on N-ary Knowledge Hypergraph. 8759-8767 - Fengbo Lan
, Chang Wen Chen
:
Understanding and Tackling Scattering and Reflective Flare for Mobile Camera Systems. 8768-8776 - Ziyu Zhao
, Pingping Cai
, Canyu Zhang
, Xiaoguang Li
, Song Wang
:
Crossmodal Few-shot 3D Point Cloud Semantic Segmentation via View Synthesis. 8777-8785 - Jinkai Zheng
, Xinchen Liu
, Boyue Zhang
, Chenggang Yan
, Jiyong Zhang
, Wu Liu
, Yongdong Zhang
:
It Takes Two: Accurate Gait Recognition in the Wild via Cross-granularity Alignment. 8786-8794 - Kenan Huang
, Junbao Zhuo
, Shuhui Wang
, Chi Su
, Qingming Huang
, Huimin Ma
:
Unsupervised Image-to-Video Adaptation via Category-aware Flow Memory Bank and Realistic Video Generation. 8795-8804 - Lv Tang
, Peng-Tao Jiang
, Zhihao Shen
, Hao Zhang
, Jin-Wei Chen
, Bo Li
:
Chain of Visual Perception: Harnessing Multimodal Large Language Models for Zero-shot Camouflaged Object Detection. 8805-8814 - Xinyao Liao
, Wei Wei
, Dangyang Chen
, Yuanyuan Fu
:
UniQ: Unified Decoder with Task-specific Queries for Efficient Scene Graph Generation. 8815-8824 - Siyang Wang
, Jinghao Zhang
, Jie Huang
, Feng Zhao
:
Image-free Pre-training for Low-Level Vision. 8825-8834 - Jiacheng Ruan
, Jingsheng Gao
, Mingye Xie
, Suncheng Xiang
, Zefang Yu
, Ting Liu
, Yuzhuo Fu
, Xiaoye Qu
:
GIST: Improving Parameter Efficient Fine-Tuning via Knowledge Interaction. 8835-8844 - Xuechen Guo
, Wenhao Chai
, Shiyan Li
, Gaoang Wang
:
LLaVA-Ultra: Large Chinese Language and Vision Assistant for Ultrasound. 8845-8854 - Xiao Han
, Zhenduo Zhang
, Yiling Wu
, Xinfeng Zhang
, Zhe Wu
:
Event Traffic Forecasting with Sparse Multimodal Data. 8855-8864 - Wanru Xu
, Zhenjiang Miao
, Yi Tian
, Yigang Cen
, Lili Wan
, Xiaole Ma
:
Probabilistic Distillation Transformer: Modelling Uncertainties for Visual Abductive Reasoning. 8865-8873 - Shiye Wang
, Changsheng Li
, Jialin Tang
, Xing Gong
, Ye Yuan
, Guoren Wang
:
Importance-aware Shared Parameter Subspace Learning for Domain Incremental Learning. 8874-8883 - Chengshun Wang
, Na Zhao
:
GS2-GNeSF: Geometry-Semantics Synergy for Generalizable Neural Semantic Fields. 8884-8892 - Liang Du
, Yukai Shi
, Yan Chen
, Peng Zhou
, Yuhua Qian
:
Fast and Scalable Incomplete Multi-View Clustering with Duality Optimal Graph Filtering. 8893-8902
- Zhilin He, Yawei Zhang, Jingchang Mu, Xiaoyue Gu, Tianhao Gu:
LiteGfm: A Lightweight Self-supervised Monocular Depth Estimation Framework for Artifacts Reduction via Guided Image Filtering. 8903-8912
- Chengyi Yang, Wentao Liu, Shisong Chen, Jiayin Qi, Aimin Zhou:
Generating Prompts in Latent Space for Rehearsal-free Continual Learning. 8913-8922
- Choubo Ding, Guansong Pang:
Improving Out-of-Distribution Detection with Disentangled Foreground and Background Features. 8923-8931
- Yi Lu, Shenghao Ren, Qiu Shen, Xun Cao:
Leveraging RGB-Pressure for Whole-body Human-to-Humanoid Motion Imitation. 8932-8941
- Li Zhang, Zean Han, Yan Zhong, Qiaojun Yu, Xingyu Wu, Xue Wang, Rujing Wang:
VoCAPTER: Voting-based Pose Tracking for Category-level Articulated Object via Inter-frame Priors. 8942-8951
- Jinpeng Yu, Binbin Huang, Yuxuan Zhang, Huaxia Li, Xu Tang, Shenghua Gao:
GeoFormer: Learning Point Cloud Completion with Tri-Plane Integrated Transformer. 8952-8961
- Sifan Wu, Haipeng Chen, Yifang Yin, Sihao Hu, Runyang Feng, Yingying Jiao, Ziqi Yang, Zhenguang Liu:
Joint-Motion Mutual Learning for Pose Estimation in Video. 8962-8971
- Jiaqi Wang, Pichao Wang, Yi Feng, Huafeng Liu, Chang Gao, Liping Jing:
Align2Concept: Language Guided Interpretable Image Recognition by Visual Prototype and Textual Concept Alignment. 8972-8981
- Siying Xiao, Mao Ye, Qichen He, Shuaifeng Li, Song Tang, Xiatian Zhu:
Adversarial Experts Model for Black-box Domain Adaptation. 8982-8991
- Yayun Wei, Lei Cao, Hao Li, Yilin Dong:
MB2C: Multimodal Bidirectional Cycle Consistency for Learning Robust Visual Neural Representations. 8992-9000
- Qiang Wang, Ke Yan, Shouhong Ding:
Bilateral Adaptive Cross-Modal Fusion Prompt Learning for CLIP. 9001-9009
- Yifei Gao, Jiaqi Wang, Zhiyu Lin, Jitao Sang:
AIGCs Confuse AI Too: Investigating and Explaining Synthetic Image-induced Hallucinations in Large Vision-Language Models. 9010-9018
- Haizhuang Liu, Junbao Zhuo, Chen Liang, Jiansheng Chen, Huimin Ma:
Affinity3D: Propagating Instance-Level Semantic Affinity for Zero-Shot Point Cloud Semantic Segmentation. 9019-9028
- Zhaojian Li, Bin Zhao, Yuan Yuan:
TAS: Personalized Text-guided Audio Spatialization. 9029-9037
- Congqi Cao, Yueran Zhang, Yating Yu, Qinyi Lv, Lingtong Min, Yanning Zhang:
Task-Adapter: Task-specific Adaptation of Image Models for Few-shot Action Recognition. 9038-9047
- Quanjiang Li, Tingjin Luo, Mingdie Jiang, Jiahui Liao, Zhangqi Jiang:
Deep Incomplete Multi-View Network Semi-Supervised Multi-Label Learning with Unbiased Loss. 9048-9056
- Xinyue Liu, Jiahui Wan, Linlin Zong, Bo Xu:
Conditional Diffusion Model for Open-ended Video Question Answering. 9057-9066
- Yulin He, Siqi Wang, Wei Chen, Tianci Xun, Yusong Tan:
Sniffing Threatening Open-World Objects in Autonomous Driving by Open-Vocabulary Models. 9067-9076
- Haosen Sun, Yiming Li, Xixiang Lyu, Jing Ma:
Learning from Distinction: Mitigating Backdoors Using a Low-Capacity Model. 9077-9086
- Shen Lin, Xiaoyu Zhang, Willy Susilo, Xiaofeng Chen, Jun Liu:
GDR-GMA: Machine Unlearning via Direction-Rectified and Magnitude-Adjusted Gradients. 9087-9095
- Timin Gao, Peixian Chen, Mengdan Zhang, Chaoyou Fu, Yunhang Shen, Yan Zhang, Shengchuan Zhang, Xiawu Zheng, Xing Sun, Liujuan Cao, Rongrong Ji:
Cantor: Inspiring Multimodal Chain-of-Thought of MLLM. 9096-9105
- Shijie Li, Yunbin Tu, Qingyuan Xiang, Zheng Li:
MAGIC: Rethinking Dynamic Convolution Design for Medical Image Segmentation. 9106-9115
- Chao Wang, Yang Zhou, Liangtian He, Fenglai Lin, Hongming Chen, Liang-Jian Deng:
Illumination Distribution Prior for Low-light Image Enhancement. 9116-9125
- Pinhan Fu, Xinyan Liang, Yuhua Qian, Qian Guo, Zhifang Wei, Wen Li:
CoMO-NAS: Core-Structures-Guided Multi-Objective Neural Architecture Search for Multi-Modal Classification. 9126-9135
- Yi Liu, Jiachen Li, Yanchun Ma, Qing Xie, Yongjian Liu:
HcaNet: Haze-concentration-aware Network for Real-scene Dehazing with Codebook Priors. 9136-9144
- Wenlong Liao, Sunyuan Qiang, Xianfei Li, Xiaolei Chen, Haoyu Wang, Yanyan Liang, Junchi Yan, Tao He, Pai Peng:
CalibRBEV: Multi-Camera Calibration via Reversed Bird's-eye-view Representations for Autonomous Driving. 9145-9154
- Md. Tanvir Islam, Nasir Rahim, Saeed Anwar, Muhammad Saqib, Sambit Bakshi, Khan Muhammad:
HazeSpace2M: A Dataset for Haze Aware Single Image Dehazing. 9155-9164
- Xiaojun Chen, Jimeng Lou, Wenxi Huang, Ting Wan, Qin Zhang, Min Yang:
ReCoS: A Novel Benchmark for Cross-Modal Image-Text Retrieval in Complex Real-Life Scenarios. 9165-9174
- Shicheng Yang, Xiaoxu Li, Dongliang Chang, Zhanyu Ma, Jing-Hao Xue:
Channel-Spatial Support-Query Cross-Attention for Fine-Grained Few-Shot Image Classification. 9175-9183
- Xiaorui Jiang, Zhongyi Ma, Yulin Fu, Yong Liao, Pengyuan Zhou:
Heterogeneity-Aware Federated Deep Multi-View Clustering towards Diverse Feature Representations. 9184-9193
- Jiyuan Zhang, Kang Chen, Shiyan Chen, Yajing Zheng, Tiejun Huang, Zhaofei Yu:
SpikeGS: 3D Gaussian Splatting from Spike Streams with High-Speed Camera Motion. 9194-9203
- Jiangyi Wang, Zhongyao Cheng, Na Zhao, Jun Cheng, Xulei Yang:
On-the-fly Point Feature Representation for Point Clouds Analysis. 9204-9213
- Kun Wang, Hao Liu, Lirong Jie, Zixu Li, Yupeng Hu, Liqiang Nie:
Explicit Granularity and Implicit Scale Correspondence Learning for Point-Supervised Video Moment Localization. 9214-9223
- Shaoqing Xu, Shengyin Jiang, Fang Li, Li Liu, Ziying Song, Bo Yang, Zhixin Yang:
SparseInteraction: Sparse Semantic Guidance for Radar and Camera 3D Object Detection. 9224-9233
- Mahiro Ukai, Shuhei Kurita, Atsushi Hashimoto, Yoshitaka Ushiku, Nakamasa Inoue:
AdaCoder: Adaptive Prompt Compression for Programmatic Visual Question Answering. 9234-9243
- Shengwei Zhao, Linhai Xu, Yuying Liu, Shaoyi Du:
Multi-grained Correspondence Learning of Audio-language Models for Few-shot Audio Recognition. 9244-9252
- Song Wu, Xiaoyu Wei, Xinyue Chen, Yazhou Ren, Jing He, Xiaorong Pu:
Cross-View Mutual Learning for Semi-Supervised Medical Image Segmentation. 9253-9261
- Yunshan Qi, Lin Zhu, Yifan Zhao, Nan Bao, Jia Li:
Deblurring Neural Radiance Fields with Event-driven Bundle Adjustment. 9262-9270
- Jingqiao Xiu, Mengze Li, Wei Ji, Jingyuan Chen, Hanbin Zhao, Shin'ichi Satoh, Roger Zimmermann:
Hierarchical Debiasing and Noisy Correction for Cross-domain Video Tube Retrieval. 9271-9280
- Wenyu Yin, Shuyuan Lin, Yang Lu, Hanzi Wang:
Diverse Consensuses Paired with Motion Estimation-Based Multi-Model Fitting. 9281-9290
- Andong Lu, Jiacong Zhao, Chenglong Li, Yun Xiao, Bin Luo:
Breaking Modality Gap in RGBT Tracking: Coupled Knowledge Distillation. 9291-9300
- Peng Wu, Xuerong Zhou, Guansong Pang, Zhiwei Yang, Qingsen Yan, Peng Wang, Yanning Zhang:
Weakly Supervised Video Anomaly Detection and Localization with Spatio-Temporal Prompts. 9301-9310
- Pengfei Luo, Tong Xu, Che Liu, Suojuan Zhang, Linli Xu, Minglei Li, Enhong Chen:
Bridging Gaps in Content and Knowledge for Multimodal Entity Linking. 9311-9320
- Shiyu Tang, Zhaofan Luo, Yifan Wang, Lijun Wang, Huchuan Lu, Weibo Su, Libo Liu:
LOVD: Large-and-Open Vocabulary Object Detection. 9321-9329
- Cam-Van Thi Nguyen, The-Son Le, Anh-Tuan Mai, Duc-Trong Le:
Ada2I: Enhancing Modality Balance for Multimodal Conversational Emotion Recognition. 9330-9339
- Xinpeng Li, Teng Wang, Jian Zhao, Shuyi Mao, Jinbao Wang, Feng Zheng, Xiaojiang Peng, Xuelong Li:
Two in One Go: Single-stage Emotion Recognition with Decoupled Subject-context Transformer. 9340-9349
- Jingjia Huang, Jingyan Tu, Ge Meng, Yingying Wang, Yuhang Dong, Xiaotong Tu, Xinghao Ding, Yue Huang:
Efficient Perceiving Local Details via Adaptive Spatial-Frequency Information Integration for Multi-focus Image Fusion. 9350-9359
- Wonwoo Cho, Kangyeol Kim, Saemee Choi, Jaegul Choo:
Training Spatial-Frequency Visual Prompts and Probabilistic Clusters for Accurate Black-Box Transfer Learning. 9360-9368
- Ning Xu, Yifei Gao, Ting-Ting Zhang, Hongshuo Tian, An-An Liu:
Cross-Modal Coherence-Enhanced Feedback Prompting for News Captioning. 9369-9377
- Yuzhen Li, Zehang Deng, Yuxin Cao, Lihua Liu:
GRFormer: Grouped Residual Self-Attention for Lightweight Single Image Super-Resolution. 9378-9386
- Muxin Pu, Mei Kuan Lim, Chun Yong Chong:
Siformer: Feature-isolated Transformer for Efficient Skeleton-based Sign Language Recognition. 9387-9396
- Yue Duan, Zhangxuan Gu, Zhenzhe Ying, Lei Qi, Changhua Meng, Yinghuan Shi:
PC2: Pseudo-Classification Based Pseudo-Captioning for Noisy Correspondence Learning in Cross-Modal Retrieval. 9397-9406
- Wei Feng, Zhenwei Wu, Qianqian Wang, Bo Dong, Quanxue Gao:
Federated Fuzzy C-means with Schatten-p Norm Minimization. 9407-9416
- Tianjiao Wan, Kele Xu, Long Lan, Zijian Gao, Dawei Feng, Bo Ding, Huaimin Wang:
Tracing Training Progress: Dynamic Influence Based Selection for Active Learning. 9417-9425
- Ruohao Guo, Dantong Niu, Liao Qu, Yanyu Qi, Ji Shi, Wenzhen Yue, Bowei Xing, Taiyan Chen, Xianghua Ying:
Instance-Level Panoramic Audio-Visual Saliency Detection and Ranking. 9426-9434
- Shenglin Yin, Kelu Yao, Zhen Xiao, Jieyi Long:
Embracing Adaptation: An Effective Dynamic Defense Strategy Against Adversarial Examples. 9435-9444
- Zitong Huang, Ze Chen, Yuanze Li, Bowen Dong, Erjin Zhou, Yong Liu, Rick Siow Mong Goh, Chun-Mei Feng, Wangmeng Zuo:
Class Balance Matters to Active Class-Incremental Learning. 9445-9454
- Hao Zhang, Ee Yeo Keat, Basura Fernando:
RCA: Region Conditioned Adaptation for Visual Abductive Reasoning. 9455-9464
- Jian-Yu Jiang-Lin, Kang-Yang Huang, Ling Lo, Yi-Ning Huang, Terence Lin, Jhih-Ciang Wu, Hong-Han Shuai, Wen-Huang Cheng:
ReCorD: Reasoning and Correcting Diffusion for HOI Generation. 9465-9474
- Xiaze Zhang, Ziheng Ding, Qi Jing, Ying Cheng, Wenchao Ding, Rui Feng:
DeepPointMap2: Accurate and Robust LiDAR-Visual SLAM with Neural Descriptors. 9475-9484
- Hongyu Li, Tianrui Hui, Zihan Ding, Jing Zhang, Bin Ma, Xiaoming Wei, Jizhong Han, Si Liu:
Dynamic Prompting of Frozen Text-to-Image Diffusion Models for Panoptic Narrative Grounding. 9485-9494
- Hengde Zhu, Xiangyu Kong, Weicheng Xie, Xin Huang, Linlin Shen, Lu Liu, Hatice Gunes, Siyang Song:
PerFRDiff: Personalised Weight Editing for Multiple Appropriate Facial Reaction Generation. 9495-9504
- Shiqin Liu, Chaozhuo Li, Xi Zhang, Minjun Zhao, Yuanbo Xu, Jiajun Bu:
Deeply Fusing Semantics and Interactions for Item Representation Learning via Topology-driven Pre-training. 9505-9514
- Yongsen Zheng, Guohua Wang, Yang Liu, Liang Lin:
Diversity Matters: User-Centric Multi-Interest Learning for Conversational Movie Recommendation. 9515-9524
- Yuanchen Shi, Fang Kong:
Integrating Stickers into Multimodal Dialogue Summarization: A Novel Dataset and Approach for Enhancing Social Media Interaction. 9525-9534
- Andreea-Maria Oncescu, João F. Henriques, A. Sophia Koepke:
Dissecting Temporal Understanding in Text-to-Audio Retrieval. 9535-9543
- Yuhang Su, Wei Hu, Fan Zhang, Qiming Xu:
AMG-Embedding: A Self-Supervised Embedding Approach for Audio Identification. 9544-9553
- Xue Li, Jiong Yu, Ziyang Li, Hongchun Lu, Ruifeng Yuan:
Dr. CLIP: CLIP-Driven Universal Framework for Zero-Shot Sketch Image Retrieval. 9554-9562
- Yan Zhuang, Yanlu Cai, Weizhong Zhang, Cheng Jin:
Future Motion Dynamic Modeling via Hybrid Supervision for Multi-Person Motion Prediction Uncertainty Reduction. 9563-9572
- Yupeng Zhang, Shuqi Zheng, Ruize Han, Yuzhong Feng, Junhui Hou, Linqi Song, Wei Feng, Liang Wan:
Rethinking the One-shot Object Detection: Cross-Domain Object Search. 9573-9581
- Yuhan Wu, Xiyu Meng, Yang He, Junru Zhang, Haowen Zhang, Yabo Dong, Dongming Lu:
Multi-view Self-Supervised Contrastive Learning for Multivariate Time Series. 9582-9590
- Dongding Lin, Jian Wang, Chak Tou Leong, Wenjie Li:
SCREEN: A Benchmark for Situated Conversational Recommendation. 9591-9600
- Xiaowan Hu, Yiyi Chen, Yan Li, Minquan Wang, Haoqian Wang, Quan Chen, Han Li, Peng Jiang:
Spatiotemporal Graph Guided Multi-modal Network for Livestreaming Product Retrieval. 9601-9610
- Zheqi Lv, Shaoxuan He, Tianyu Zhan, Shengyu Zhang, Wenqiao Zhang, Jingyuan Chen, Zhou Zhao, Fei Wu:
Semantic Codebook Learning for Dynamic Recommendation Models. 9611-9620
- Geng Tu, Feng Xiong, Bin Liang, Hui Wang, Xi Zeng, Ruifeng Xu:
Multimodal Emotion Recognition Calibration in Conversations. 9621-9630
- Wuyou Xia, Shengzhe Liu, Rong Qin, Guoli Jia, Eunil Park, Jufeng Yang:
Perceive before Respond: Improving Sticker Response Selection by Emotion Distillation and Hard Mining. 9631-9640
- Yunshan Ma, Yingzhi He, Wenjun Zhong, Xiang Wang, Roger Zimmermann, Tat-Seng Chua:
CIRP: Cross-Item Relational Pre-training for Multimodal Product Bundling. 9641-9649
- Zixian Gao, Disen Hu, Xun Jiang, Huimin Lu, Heng Tao Shen, Xing Xu:
Enhanced Experts with Uncertainty-Aware Routing for Multimodal Sentiment Analysis. 9650-9659
- Zhenyang Li, Fan Liu, Yinwei Wei, Zhiyong Cheng, Liqiang Nie, Mohan S. Kankanhalli:
Attribute-driven Disentangled Representation Learning for Multimodal Recommendation. 9660-9669
- Ting Fu, Yu-Wei Zhan, Chong-Yu Zhang, Xin Luo, Zhen-Duo Chen, Yongxin Wang, Xun Yang, Xin-Shun Xu:
FedCAFE: Federated Cross-Modal Hashing with Adaptive Feature Enhancement. 9670-9679
- Feng Zhu, Xinxing Yang, Longfei Li, Jun Zhou:
An Active Masked Attention Framework for Many-to-Many Cross-Domain Recommendations. 9680-9689
- Zehao Qi, Ruixu Zhang, Xinyi Hu, Wenxuan Liu, Zheng Wang:
Predicting the Unseen: A Novel Dataset for Hidden Intention Localization in Pre-abnormal Analysis. 9690-9698
- Ding Wang, Wei Zhou, Songlin Hu:
Information Diffusion Prediction with Graph Neural Ordinary Differential Equation Network. 9699-9708
- Jian Chen, Wei Wang, Yuzhu Hu, Junxin Chen, Han Liu, Xiping Hu:
TGCA-PVT: Topic-Guided Context-Aware Pyramid Vision Transformer for Sticker Emotion Recognition. 9709-9718
- Rui Yang, Shuang Wang, Jianwei Tao, Yingping Han, Qiaoling Lin, Yanhe Guo, Biao Hou, Licheng Jiao:
Accurate and Lightweight Learning for Specific Domain Image-Text Retrieval. 9719-9728
- Xianbing Zhao, Lizhen Qu, Tao Feng, Jianfei Cai, Buzhou Tang:
Learning in Order! A Sequential Strategy to Learn Invariant Features for Multimodal Sentiment Analysis. 9729-9738
- Yutong Wang, Sidan Zhu, Hongteng Xu, Dixin Luo:
An Inverse Partial Optimal Transport Framework for Music-guided Trailer Generation. 9739-9748
- Haonan Zheng, Wen Jiang, Xinyang Deng, Wenrui Li:
Sample-agnostic Adversarial Perturbation for Vision-Language Pre-training Models. 9749-9758
- Jiade Chen, Jin Wang, Yunhui Shi, Nam Ling, Baocai Yin:
MVP-Net: Multi-View Depth Image Guided Cross-Modal Distillation Network for Point Cloud Upsampling. 9759-9768
- Zuoyan Zhao, Hui Xue, Pengfei Fang, Shipeng Zhu:
PEAN: A Diffusion-Based Prior-Enhanced Attention Network for Scene Text Image Super-Resolution. 9769-9778
- Yuzhi Huang, Chenxin Li, Zixu Lin, Hengyu Liu, Haote Xu, Yifan Liu, Yue Huang, Xinghao Ding, Xiaotong Tu, Yixuan Yuan:
P2SAM: Probabilistically Prompted SAMs Are Efficient Segmentator for Ambiguous Medical Images. 9779-9788
- Ran Yi, Haokun Zhu, Teng Hu, Yu-Kun Lai, Paul L. Rosin:
AesStyler: Aesthetic Guided Universal Style Transfer. 9789-9798
- Wenxuan Wang, Chenglei Wang, Huihui Qi, Menghao Ye, Xuelin Qian, Peng Wang, Yanning Zhang:
Sustainable Self-evolution Adversarial Training. 9799-9808
- Jian-Jun Qiao, Meng-Yu Duan, Xiao Wu, Wei Li:
CAPNet: Cartoon Animal Parsing with Spatial Learning and Structural Modeling. 9809-9817
- Xuanyu Zhang, Youmin Xu, Runyi Li, Jiwen Yu, Weiqi Li, Zhipei Xu, Jian Zhang:
V2A-Mark: Versatile Deep Visual-Audio Watermarking for Manipulation Localization and Copyright Protection. 9818-9827
- Xian Zhong, Shengwang Hu, Wenxuan Liu, Wenxin Huang, Jianhao Ding, Zhaofei Yu, Tiejun Huang:
Towards Low-latency Event-based Visual Recognition with Hybrid Step-wise Distillation Spiking Neural Networks. 9828-9836
- Junqi Shi, Mingyi Jiang, Ming Lu, Tong Chen, Xun Cao, Zhan Ma:
HINER: Neural Representation for Hyperspectral Image. 9837-9846
- Yaqiang Wu, Zhen Xu, Yong Duan, Yanlai Wu, Qinghua Zheng, Hui Li, Xiaochen Hu, Lianwen Jin:
RDLNet: A Novel and Accurate Real-world Document Localization Method. 9847-9855
- Xiao Teng, Xingyu Shen, Kele Xu, Long Lan:
Enhancing Unsupervised Visible-Infrared Person Re-Identification with Bidirectional-Consistency Gradual Matching. 9856-9865
- Zhen Zhang, Jing Xiao, Liang Liao, Mi Wang:
RefScale: Multi-temporal Assisted Image Rescaling in Repetitive Observation Scenarios. 9866-9874
- Chaoxiang He, Xiaofan Bai, Xiaojing Ma, Bin B. Zhu, Pingyi Hu, Jiayun Fu, Hai Jin, Dongmei Zhang:
Towards Stricter Black-box Integrity Verification of Deep Neural Network Models. 9875-9884
- Peibin Chen, Xijin Zhang, Daniel Kang Du:
SimpliGuard: Robust Mesh Simplification In the Wild. 9885-9893
- Shixuan Gao, Pingping Zhang, Tianyu Yan, Huchuan Lu:
Multi-Scale and Detail-Enhanced Segment Anything Model for Salient Object Detection. 9894-9903
- Panjun Duan, Yang Zhao, Yuan Chen, Wei Jia, Zhao Zhang, Ronggang Wang:
Blind Video Bit-Depth Expansion. 9904-9912
- Xiaoheng Tan, Jiabin Zhang, Yuhui Quan, Jing Li, Yajing Wu, Zilin Bian:
Highly Efficient No-reference 4K Video Quality Assessment with Full-Pixel Covering Sampling and Training Strategy. 9913-9922
- Yujia Wang, Zhongxu Wang, Hua Huang:
AutoSFX: Automatic Sound Effect Generation for Videos. 9923-9932
- Weiguang Zhang, Qiufeng Wang, Kaizhu Huang, Xiaowei Huang, Fengjun Guo, Xiaomeng Gu:
Document Registration: Towards Automated Labeling of Pixel-Level Alignment Between Warped-Flat Documents. 9933-9942
- Hao Yang, Min Wang, Zhengfei Yu, Zhi Zeng, Mingrui Lao, Yun Zhou:
Maximizing Feature Distribution Variance for Robust Neural Networks. 9943-9951
- Kai Han, Jin Wang, Yunhui Shi, Nam Ling, Baocai Yin:
D3U-Net: Dual-Domain Collaborative Optimization Deep Unfolding Network for Image Compressive Sensing. 9952-9960
- Jiangtong Zhu, Zhao Yang, Yinan Shi, Jianwu Fang, Jianru Xue:
IC-Mapper: Instance-Centric Spatio-Temporal Modeling for Online Vectorized Map Construction. 9961-9969
- Jianjun Xiang, Yuanjie Dang, Peng Chen, Ronghua Liang, Ruohong Huan, Nan Gao:
Semantic-Aware and Quality-Aware Interaction Network for Blind Video Quality Assessment. 9970-9979
- Zerui Zhang, Jun Yu, Liangxian Cui, Qiang Ling, Tianyu Liu:
Part-level Reconstruction for Self-Supervised Category-level 6D Object Pose Estimation with Coarse-to-Fine Correspondence Optimization. 9980-9988
- Yachun Mi, Yan Shu, Yu Li, Chen Hui, Puchao Zhou, Shaohui Liu:
CLiF-VQA: Enhancing Video Quality Assessment by Incorporating High-Level Semantic Information related to Human Feelings. 9989-9998
- Xuntao Liu, Yuzhou Yang, Haoyue Wang, Qichao Ying, Zhenxing Qian, Xinpeng Zhang, Sheng Li:
Multi-view Feature Extraction via Tunable Prompts is Enough for Image Manipulation Localization. 9999-10007
- Junfeng Yang, Jing Fu, Zhen Zhang, Limei Liu, Qin Li, Wei Zhang, Wenzhi Cao:
Align-IQA: Aligning Image Quality Assessment Models with Diverse Human Preferences via Customizable Guidance. 10008-10017
- Zehang Lin, Jiayuan Xie, Zhenguo Yang, Yi Yu, Qing Li:
Generalized News Event Discovery via Dynamic Augmentation and Entropy Optimization. 10018-10026
- Jiahao Cui, Wei Jiang, Zhan Peng, Zhiyu Pan, Zhiguo Cao:
Exposure Completing for Temporally Consistent Neural High Dynamic Range Video Rendering. 10027-10035
- Lei Han, Xuesong Zhang:
Scalable Super-Resolution Neural Operator. 10036-10045
- Ling Zhang, Yidong Ma, Zhi Jiang, Weilei He, Zhongyun Bao, Gang Fu, Wenju Xu, Chunxia Xiao:
HighlightRemover: Spatially Valid Pixel Learning for Image Specular Highlight Removal. 10046-10054
- Yuhang Zhou, Yushu Zhang, Leo Yu Zhang, Zhongyun Hua:
DERD: Data-free Adversarial Robustness Distillation through Self-adversarial Teacher Group. 10055-10064
- Shuman Zhuang, Sujia Huang, Wei Huang, Yuhong Chen, Zhihao Wu, Ximeng Liu:
Enhancing Multi-view Graph Neural Network with Cross-view Confluent Message Passing. 10065-10074
- Fu Rong, Wenjin Peng, Meng Lan, Qian Zhang, Lefei Zhang:
Driving Scene Understanding with Traffic Scene-Assisted Topology Graph Transformer. 10075-10084
- Chang'an Yi, Haotian Chen, Yifan Zhang, Yonghui Xu, Yan Zhou, Lizhen Cui:
From Question to Exploration: Can Classic Test-Time Adaptation Strategies Be Effectively Applied in Semantic Segmentation? 10085-10094
- Zehao Chen, Zhan Lu, De Ma, Huajin Tang, Xudong Jiang, Qian Zheng, Gang Pan:
Event-ID: Intrinsic Decomposition Using an Event Camera. 10095-10104
- Xu Zhang, Fan Ni, Guannan Dong, Aichun Zhu, Jianhui Wu, Mingcheng Ni, Hui Liu:
TVPR: Text-to-Video Person Retrieval and a New Benchmark. 10105-10113
- Haoyu Shi, Huaiwen Zhang:
Modal-Enhanced Semantic Modeling for Fine-Grained 3D Human Motion Retrieval. 10114-10123
- Hongyu Zhu, Sichu Liang, Wentao Hu, Fangqi Li, Ju Jia, Shi-Lin Wang:
Reliable Model Watermarking: Defending against Theft without Compromising on Evasion. 10124-10133
- Qian Qiao, Yu Xie, Jun Gao, Tianxiang Wu, Shaoyao Huang, Jiaqing Fan, Ziqiang Cao, Zili Wang, Yue Zhang:
DNTextSpotter: Arbitrary-Shaped Scene Text Spotting via Improved Denoising Training. 10134-10143
- Yi Liu, Xinyi Li, Wenjing Shuai:
3D Scene De-occlusion in Neural Radiance Fields: A Framework for Obstacle Removal and Realistic Inpainting. 10144-10153
- Xuannan Liu, Peipei Li, Huaibo Huang, Zekun Li, Xing Cui, Jiahao Liang, Lixiong Qin, Weihong Deng, Zhaofeng He:
FKA-Owl: Advancing Multimodal Fake News Detection through Knowledge-Augmented LVLMs. 10154-10163
- Yalan Qin, Li Qian:
Fast Elastic-Net Multi-view Clustering: A Geometric Interpretation Perspective. 10164-10172
- Xiaojiao Guo, Xuhang Chen, Shenghong Luo, Shuqiang Wang, Chi-Man Pun:
Dual-Hybrid Attention Network for Specular Highlight Removal. 10173-10181
- Yiyang Luo, Ke Lin, Chao Gu:
Context-Aware Indoor Point Cloud Object Generation through User Instructions. 10182-10190
- Zhangli Hu, Ye Chen, Zhongyin Zhao, Jinfan Liu, Bilian Ke, Bingbing Ni:
Towards Artist-Like Painting Agents with Multi-Granularity Semantic Alignment. 10191-10199
- Zixuan Wang, Jiayi Li, Xiaoyu Qin, Shikun Sun, Songtao Zhou, Jia Jia, Jiebo Luo:
DanceCamAnimator: Keyframe-Based Controllable 3D Dance Camera Synthesis. 10200-10209
- Sooho Kim, Soyeon Hong, Kyungsoo Park, Hyunsouk Cho, Kyung-Ah Sohn:
OmniStitch: Depth-Aware Stitching Framework for Omnidirectional Vision with Multiple Cameras. 10210-10219
- Kaijiang Li, Hao Li, Haining Li, Peisen Wang, Chunyi Guo, Wenfeng Jiang:
SIRLUT: Simulated Infrared Fusion Guided Image-adaptive 3D Lookup Tables for Lightweight Image Enhancement. 10220-10228
- Bolin Jiang, Yuqiu Xie, Jiawei Li, Naiqi Li, Bin Chen, Shu-Tao Xia:
IGSPAD: Inverting 3D Gaussian Splatting for Pose-agnostic Anomaly Detection. 10229-10237
- Guobiao Li, Sheng Li, Zhenxing Qian, Xinpeng Zhang:
Cover-separable Fixed Neural Network Steganography via Deep Generative Models. 10238-10247
- Baorui Ma, Yu-Shen Liu, Matthias Zwicker, Zhizhong Han:
Inferring 3D Occupancy Fields through Implicit Reasoning on Silhouette Images. 10248-10257
- Rui Li, Yishu Liu, Huafeng Li, Jinxing Li, Guangming Lu:
Prototype-Guided Dual-Transformer Reasoning for Video Individual Counting. 10258-10267
- Tao Wang, Yushu Zhang, Xiangli Xiao, Lin Yuan, Zhihua Xia, Jian Weng:
Make Privacy Renewable! Generating Privacy-Preserving Faces Supporting Cancelable Biometric Recognition. 10268-10276
- Green Rosh K. S, B. H. Pawan Prasad, Lokesh R. Boregowda, Kaushik Mitra:
R2SFD: Improving Single Image Reflection Removal using Semantic Feature Dictionary. 10277-10286
- Jiaming Shen, Kun Hu, Wei Bao, Chang Wen Chen, Zhiyong Wang:
Bridging the Gap: Sketch-Aware Interpolation Network for High-Quality Animation Sketch Inbetweening. 10287-10295
- Yanghao Su, Jie Zhang, Ting Xu, Tianwei Zhang, Weiming Zhang, Nenghai Yu:
Model X-ray: Detecting Backdoored Models via Decision Boundary. 10296-10305
- Lize Zhou, Xiaoqi Wang, Jian Xiong, Xianzhong Long, Hao Gao:
Towards Distortion-Debiased Blind Image Quality Assessment. 10306-10315
- Benhui Zhang, Junyu Gao, Yuan Yuan:
A Descriptive Basketball Highlight Dataset for Automatic Commentary Generation. 10316-10325
- Cong Wang, Liyan Wang, Jie Mu, Chengjin Yu, Wei Wang:
Progressive Local and Non-Local Interactive Networks with Deeply Discriminative Training for Image Deraining. 10326-10335
- Kaifang Yang, Xinrong Zhao, Yanchao Gong:
Semantic Aware Just Noticeable Differences for VVC Compressed Text Screen Content Images. 10336-10344
- Jiaxuan Wu, Zhengxian Wu, Yiming Xue, Juan Wen, Wanli Peng:
Generative Text Steganography with Large Language Model. 10345-10353
- Yuchen Wang, Xingyu Zhu, Guanhui Ye, Shiyao Zhang, Xuetao Wei:
Achieving Resolution-Agnostic DNN-based Image Watermarking: A Novel Perspective of Implicit Neural Representation. 10354-10362
- Renshu Gu, Jiajun Zhu, Yixuan Si, Fei Gao, Jiamin Xu, Gang Xu:
3D Human Pose Estimation from Multiple Dynamic Views via Single-view Pretraining with Procrustes Alignment. 10363-10372
- Yang Ding, Yi Dai, Xin Wang, Ling Feng, Lei Cao, Huijun Zhang:
Integrating Content-Semantics-World Knowledge to Detect Stress from Videos. 10373-10381
- Xintian Mao, Jiansheng Wang, Xingran Xie, Qingli Li, Yan Wang:
LoFormer: Local Frequency Transformer for Image Deblurring. 10382-10391
- Mingjin Zhang, Chi Zhang, Qiming Zhang, Yunsong Li, Xinbo Gao, Jing Zhang:
Unleashing the Power of Generic Segmentation Model: A Simple Baseline for Infrared Small Target Detection. 10392-10401
- Honglin Yuan, Shiyun Lai, Xingfeng Li, Jian Dai, Yuan Sun, Zhenwen Ren:
Robust Prototype Completion for Incomplete Multi-view Clustering. 10402-10411
- Changhao Peng, Wei Gao:
Laplacian Matrix Learning for Point Cloud Attribute Compression with Ternary Search-Based Adaptive Block Partition. 10412-10420
- Zhongwei Xuan, Zunjie Zhu, Shuai Wang, Haibing Yin, Hongkui Wang, Ming Lu:
Superpixel-based Efficient Sampling for Learning Neural Fields from Large Input. 10421-10430
- Zhaolin Wan, Qiushuang Yang, Zhiyang Li, Xiaopeng Fan, Wangmeng Zuo, Debin Zhao:
Dual-stream Perception-driven Blind Quality Assessment for Stereoscopic Omnidirectional Images. 10431-10439
- Weixuan Tang, Haoyu Yang, Yuan Rao, Zhili Zhou, Fei Peng:
Dig a Hole and Fill in Sand: Adversary and Hiding Decoupled Steganography. 10440-10448
- Bin Wang, Meishan Zhang, Hao Fei, Yu Zhao, Bobo Li, Shengqiong Wu, Wei Ji, Min Zhang:
SpeechEE: A Novel Benchmark for Speech Event Extraction. 10449-10458
- Shouyu Chen, Liang Hu, Tangwei Ye, Zhongyuan Lai, Qi Zhang, Ke Liu, Usman Naseem, Ke Sun, Nengjun Zhu:
VR-DiagNet: Medical Volumetric and Radiomic Diagnosis Networks with Interpretable Clinician-like Optimizing Visual Inspection. 10459-10467
- Minjing Yu, Delong Pang, Ziwen Kang, Zhiyao Sun, Tian Lv, Jenny Sheng, Ran Yi, Yu-Hui Wen, Yong-Jin Liu:
ECAvatar: 3D Avatar Facial Animation with Controllable Identity and Emotion. 10468-10476
- Zhenyu Bao, Guibiao Liao, Zhongyuan Zhao, Kanglin Liu, Qing Li, Guoping Qiu:
3D Reconstruction and Novel View Synthesis of Indoor Environments Based on a Dual Neural Radiance Field. 10477-10486
- Zimo Liu, Kangjun Liu, Mingyue Guo, Shiliang Zhang, Yaowei Wang:
CoTuning: A Large-Small Model Collaborating Distillation Framework for Better Model Generalization. 10487-10496
- Yanbin Deng, Zheng Li, Ning Xie, Wei Zhang:
PIMT: Physics-Based Interactive Motion Transition for Hybrid Character Animation. 10497-10505
- Kang Shen, Haifeng Xia, Guangxing Geng, Guangyue Geng, Siyu Xia, Zhengming Ding:
DEITalk: Speech-Driven 3D Facial Animation with Dynamic Emotional Intensity Modeling. 10506-10514
- Tianyi Wang, Mengxiao Huang, Harry Cheng, Xiao Zhang, Zhiqi Shen:
LampMark: Proactive Deepfake Detection via Training-Free Landmark Perceptual Watermarks. 10515-10524
- Lintao Dong, Wei Zhai, Zheng-Jun Zha:
UniDense: Unleashing Diffusion Models with Meta-Routers for Universal Few-Shot Dense Prediction. 10525-10534
- Henglei Lv, Jiayu Xiao, Liang Li:
Pick-and-Draw: Training-free Semantic Guidance for Text-to-Image Personalization. 10535-10543
- Guoqing Zhu, Honghu Pan, Qiang Wang, Chao Tian, Chao Yang, Zhenyu He:
Data Generation Scheme for Thermal Modality with Edge-Guided Adversarial Conditional Diffusion Model. 10544-10553
- Qiao Li, Xiaomeng Fu, Xi Wang, Jin Liu, Xingyu Gao, Jiao Dai, Jizhong Han:
Unveiling Structural Memorization: Structural Membership Inference Attack for Text-to-Image Diffusion Models. 10554-10562
- Zhaoda Ye, Xinhan Zheng, Yang Liu, Yuxin Peng:
RelScene: A Benchmark and baseline for Spatial Relations in text-driven 3D Scene Generation. 10563-10571
- Shilong Tian, Hong Chen, Chengtao Lv, Yu Liu, Jinyang Guo, Xianglong Liu, Shengxi Li, Hao Yang, Tao Xie:
QVD: Post-training Quantization for Video Diffusion Models. 10572-10581
- Jingjing Xie, Yuxin Zhang, Mingbao Lin, Liujuan Cao, Rongrong Ji:
Advancing Multimodal Large Language Models with Quantization-Aware Scale Learning for Efficient Adaptation. 10582-10591
- Pengfei Zhou, Fangxiang Feng, Guang Liu, Ruifan Li, Xiaojie Wang:
DiffHarmony++: Enhancing Image Harmonization with Harmony-VAE and Inverse Harmonization Model. 10592-10601
- Qi Xu, Xuanye Fang, Yaxin Li, Jiangrong Shen, De Ma, Yi Xu, Gang Pan:
RSNN: Recurrent Spiking Neural Networks for Dynamic Spatial-Temporal Information Processing. 10602-10610
- Wei Yang, Tengfei Huo, Zhiqiang Liu:
Enhancing Transformer-based Semantic Matching for Few-shot Learning through Weakly Contrastive Pre-training. 10611-10620
- Stanislav Frolov, Brian B. Moser, Sebastian Palacio, Andreas Dengel:
ObjBlur: A Curriculum Learning Approach With Progressive Object-Level Blurring for Improved Layout-to-Image Generation. 10621-10629
- Rongjie Huang, Yongqi Wang, Ruofan Hu, Xiaoshan Xu, Zhiqing Hong, Dongchao Yang, Xize Cheng, Zehan Wang, Ziyue Jiang, Zhenhui Ye, Luping Liu, Siqi Zheng, Zhou Zhao:
VoiceTuner: Self-Supervised Pre-training and Efficient Fine-tuning For Voice Generation. 10630-10639
- Yuran Wang, Zhijing Wan, Yansheng Qiu, Zheng Wang:
Devil is in Details: Locality-Aware 3D Abdominal CT Volume Generation for Self-Supervised Organ Segmentation. 10640-10648
- Minghui Li, Jiangxiong Wang, Hao Zhang, Ziqi Zhou, Shengshan Hu, Xiaobing Pei:
Transferable Adversarial Facial Images for Privacy Protection. 10649-10658
- Ming Tao, Bing-Kun Bao, Hao Tang, Yaowei Wang, Changsheng Xu:
CoIn: A Lightweight and Effective Framework for Story Visualization and Continuation. 10659-10668
- Xulu Zhang, Wengyu Zhang, Xiaoyong Wei, Jinlin Wu, Zhaoxiang Zhang, Zhen Lei, Qing Li:
Generative Active Learning for Image Synthesis Personalization. 10669-10677
- Zhijun Zhai, Zengmao Wang, Xiaoxiao Long, Kaixuan Zhou, Bo Du:
SAT3D: Image-driven Semantic Attribute Transfer in 3D. 10678-10687
- Zihan Huang, Xinyu Shi, Zecheng Hao, Tong Bu, Jianhao Ding, Zhaofei Yu, Tiejun Huang:
Towards High-performance Spiking Transformers from ANN to SNN Conversion. 10688-10697
- Jialiang Li, Haoyue Wang, Sheng Li, Zhenxing Qian, Xinpeng Zhang, Athanasios V. Vasilakos:
Are handcrafted filters helpful for attributing AI-generated images? 10698-10706
- Peng Ding, Jingyu Wu, Jun Kuang, Dan Ma, Xuezhi Cao, Xunliang Cai, Shi Chen, Jiajun Chen, Shujian Huang:
Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed Inputs. 10707-10715 - Shaodong Wang
, Yunyang Ge
, Liuhan Chen
, Haiyang Zhou
, Qian Wang
, Xinhua Cheng
, Li Yuan
:
Prompt2Poster: Automatically Artistic Chinese Poster Creation from Prompt Only. 10716-10724 - Weijie Wang
, Jichao Zhang
, Chang Liu
, Xia Li
, Xingqian Xu
, Humphrey Shi
, Nicu Sebe
, Bruno Lepri
:
UVMap-ID: A Controllable and Personalized UV Map Generative Model. 10725-10734 - Tianshuo Peng
, Zuchao Li
, Lefei Zhang
, Hai Zhao
, Ping Wang
, Bo Du
:
Multi-modal Auto-regressive Modeling via Visual Tokens. 10735-10744 - Haining Wang
, Na Li
, Huijie Zhao
, Yan Wen
, Yi Su
, Yuqiang Fang
:
MappingFormer: Learning Cross-modal Feature Mapping for Visible-to-infrared Image Translation. 10745-10754 - Xiangping Zheng
, Xiuxin Hao
, Bo Wu
, Xigang Bao
, Xuan Zhang
, Wei Li
, Xun Liang
:
A Sample-driven Selection Framework: Towards Graph Contrastive Networks with Reinforcement Learning. 10755-10764 - Peiyong Wang
, Bohan Xiao
, Qisheng He
, Carri Glide-Hurst
, Ming Dong
:
Score-Based Image-to-Image Brownian Bridge. 10765-10773 - Tingfeng Cao
, Junsheng Kong
, Xue Zhao
, Wenqing Yao
, Junwei Ding
, Jinhui Zhu
, Jiandong Zhang
:
Product2IMG: Prompt-Free E-commerce Product Background Generation with Diffusion Model and Self-Improved LMM. 10774-10783 - Zhenyu Xie
, Haoye Dong
, Yufei Gao
, Zehua Ma
, Xiaodan Liang
:
DreamVTON: Customizing 3D Virtual Try-on with Personalized Diffusion Models. 10784-10793 - Chencan Fu
, Yabiao Wang
, Jiangning Zhang
, Zhengkai Jiang
, Xiaofeng Mao
, Jiafu Wu
, Weijian Cao
, Chengjie Wang
, Yanhao Ge
, Yong Liu
:
MambaGesture: Enhancing Co-Speech Gesture Generation with Mamba and Disentangled Multi-Modality Fusion. 10794-10803 - Wei Lou
, Guanbin Li
, Xiang Wan
, Haofeng Li
:
Multi-modal Denoising Diffusion Pre-training for Whole-Slide Image Classification. 10804-10813 - Xingyi Li
, Yizheng Wu
, Jun Cen
, Juewen Peng
, Kewei Wang
, Ke Xian
, Zhe Wang
, Zhiguo Cao
, Guosheng Lin
:
iControl3D: An Interactive System for Controllable 3D Scene Generation. 10814-10823 - Yibin Wang
, Weizhong Zhang
, Jianwei Zheng
, Cheng Jin
:
PrimeComposer: Faster Progressively Combined Diffusion for Image Composition with Attention Steering. 10824-10832 - Jiancheng Huang
, Mingfu Yan
, Songyan Chen
, Yi Huang
, Shifeng Chen
:
MagicFight: Personalized Martial Arts Combat Video Generation. 10833-10842 - Longfei Lu
, Huachen Gao
, Tao Dai
, Yaohua Zha
, Zhi Hou
, Junta Wu
, Shu-Tao Xia
:
Large Point-to-Gaussian Model for Image-to-3D Generation. 10843-10852 - Mingzhen Sun
, Weining Wang
, Yanyuan Qiao
, Jiahui Sun
, Zihan Qin
, Longteng Guo
, Xinxin Zhu
, Jing Liu
:
MM-LDM: Multi-Modal Latent Diffusion Model for Sounding Video Generation. 10853-10861 - Ruowei Wang
, Jiaqi Li
, Dan Zeng
, Xueqi Ma
, Zixiang Xu
, Jianwei Zhang
, Qijun Zhao
:
GenUDC: High Quality 3D Mesh Generation With Unsigned Dual Contouring Representation. 10862-10871 - Xiaopei Zhu, Peiyang Xu, Guanning Zeng, Yinpeng Dong, Xiaolin Hu:
Natural Language Induced Adversarial Images. 10872-10881 - Xin Lu
, Chuanqing Zhuang
, Zhengda Lu
, Yiqun Wang
, Jun Xiao
:
FC-4DFS: Frequency-controlled Flexible 4D Facial Expression Synthesizing. 10882-10890 - Jiaxing Li, Hongbo Zhao, Yijun Wang, Jianxin Lin:
Towards Photorealistic Video Colorization via Gated Color-Guided Image Diffusion Models. 10891-10900 - Mengmeng Ge
, Xu Jia
, Takashi Isobe
, Xiaomin Li
, Qinghe Wang
, Jing Mu
, Dong Zhou
, Li Wang
, Huchuan Lu
, Lu Tian
, Ashish Sirasao
, Emad Barsoum
:
Customizing Text-to-Image Generation with Inverted Interaction. 10901-10909 - Yunqiu Xu
, Linchao Zhu
, Yi Yang:
GG-Editor: Locally Editing 3D Avatars with Multimodal Large Language Model Guidance. 10910-10919 - Xianqiang Lyu
, Hui Liu
, Junhui Hou
:
RainyScape: Unsupervised Rainy Scene Reconstruction using Decoupled Neural Rendering. 10920-10929 - Jingyu Lin
, Guiqin Zhao
, Jing Xu
, Guoli Wang
, Zejin Wang
, Antitza Dantcheva
, Lan Du
, Cunjian Chen
:
DiffTV: Identity-Preserved Thermal-to-Visible Face Translation via Feature Alignment and Dual-Stage Conditions. 10930-10938 - Yifan Li
, Yuhang Bai
, Shuai Yang
, Jiaying Liu
:
COCO-LC: Colorfulness Controllable Language-based Colorization. 10939-10947 - Yiying Bao
, Hao Zhou
, Chao Peng
, Chenyang Xu
, Shuo Shi
, Kecheng Cai
:
Boundary-Aware Periodicity-based Sparsification Strategy for Ultra-Long Time Series Forecasting. 10948-10956 - Ziyi Dong
, Yao Xiao
, Pengxu Wei
, Liang Lin
:
Decoder-Only LLMs are Better Controllers for Diffusion Models. 10957-10965 - Zhenqi Dai
, Ting Liu
, Xingxing Zhang
, Yunchao Wei
, Yanning Zhang
:
One-shot In-context Part Segmentation. 10966-10975 - Ziyang Yuan
, Mingdeng Cao
, Xintao Wang
, Zhongang Qi
, Chun Yuan
, Ying Shan
:
CustomNet: Object Customization with Variable-Viewpoints in Text-to-Image Diffusion Models. 10976-10984 - Kyusun Cho
, Joungbin Lee
, Heeji Yoon
, Yeobin Hong
, Jaehoon Ko
, Sangjun Ahn
, Seungryong Kim
:
GaussianTalker: Real-Time Talking Head Synthesis with 3D Gaussian Splatting. 10985-10994 - Huanpeng Chu
, Wei Wu
, Chengjie Zang
, Kun Yuan
:
QNCD: Quantization Noise Correction for Diffusion Models. 10995-11003 - Dan Wang
, Xinrui Cui
:
InNeRF: Learning Interpretable Radiance Fields for Generalizable 3D Scene Representation and Rendering. 11004-11012 - Zhongyi Fan
, Zixin Yin
, Gang Li
, Yibing Zhan
, Heliang Zheng
:
DreamBooth++: Boosting Subject-Driven Generation via Region-Level References Packing. 11013-11021 - Zhenghao Chen
, Luping Zhou
, Zhihao Hu
, Dong Xu
:
Group-aware Parameter-efficient Updating for Content-Adaptive Neural Video Compression. 11022-11031 - Lingfei Ren
, Ruimin Hu
, Zheng Wang
, Yilin Xiao
, Dengshi Li
, Junhang Wu
, Yilong Zang
, Jinzhang Hu
, Zijun Huang
:
Heterophilic Graph Invariant Learning for Out-of-Distribution of Fraud Detection. 11032-11040 - Haicheng Liao
, Haoyu Sun
, Huanming Shen
, Chengyue Wang
, Chunlin Tian
, KaHou Tam
, Li Li
, Chengzhong Xu
, Zhenning Li
:
CRASH: Crash Recognition and Anticipation System Harnessing with Context-Aware and Temporal Focus Attentions. 11041-11050 - Lehao Lin
, Hong Kang
, Xinyao Sun
, Wei Cai
:
SemNFT: A Semantically Enhanced Decentralized Middleware for Digital Asset Immortality. 11051-11059 - Guogang Zhu
, Xuefeng Liu
, Jianwei Niu
, Shaojie Tang
, Xinghao Wu
, Jiayuan Zhang
:
DualFed: Enjoying both Generalization and Personalization in Federated Learning via Hierarchical Representations. 11060-11069 - Hui Zeng
, Minrui Xu
, Tongqing Zhou
, Xinyi Wu
, Jiawen Kang
, Zhiping Cai
, Dusit Niyato
:
One-shot-but-not-degraded Federated Learning. 11070-11079 - Miao Cao
, Lishun Wang
, Huan Wang
, Guoqing Wang
, Xin Yuan
:
Towards Real-time Video Compressive Sensing on Mobile Devices. 11080-11088 - Daheng Yin
, Jianxin Shi
, Miao Zhang
, Zhaowu Huang
, Jiangchuan Liu
, Fang Dong
:
FSVFG: Towards Immersive Full-Scene Volumetric Video Streaming with Adaptive Feature Grid. 11089-11098 - Huanhuan Zhang
, Liu Zhuo
, Haotian Li
, Anfu Zhou
, Chuanming Wang
, Huadong Ma
:
AraLive: Automatic Reward Adaption for Learning-based Live Video Streaming. 11099-11108 - Jun Dan
, Weiming Liu
, Mushui Liu
, Chunfeng Xie
, Shunjie Dong
, Guofang Ma
, Yanchao Tan
, Jiazheng Xing
:
HOGDA: Boosting Semi-supervised Graph Domain Adaptation via High-Order Structure-Guided Adaptive Feature Alignment. 11109-11118
Reproducibility
- Xin Jin
, Longteng Jiang
, Yihao Zhang
, Lihua Lu
, Xiaobo Gao
, Boyan Dong
:
Reproducibility Companion Paper: Aesthetics-Driven Virtual Time-Lapse Photography Generation. 11119-11122
Panel
- Zi Helen Huang
, Phoebe Chen
, Shuicheng Yan
:
Generative AI in Multimedia: Challenges and Opportunities for Academic and Industrial Impact. 11123-11124
Industry Session
- Jianquan Liu
, Balu Adsumilli
, Yukiko Yanagawa
, Haiwei Dong
:
An Innovative Industry Program in A New Era of Multimedia with Generative AI. 11125-11126
Doctoral Symposium
- Wenmiao Hu
:
Utilizing Very High-resolution Optical RGB Satellite Imagery in Geo-information Extraction for Fine-scale Map-making. 11127-11131 - Cheng Zhang
:
Practical Deep Learning Models for QIM-based VoIP Steganalysis. 11132-11136
Brave New Ideas
- Jie An
, Zhengyuan Yang
, Linjie Li
, Jianfeng Wang
, Kevin Lin
, Zicheng Liu
, Lijuan Wang
, Jiebo Luo
:
OpenLEAF: A Novel Benchmark for Open-Domain Interleaved Image-Text Generation. 11137-11145 - Carlos de la Torre-Ortiz
, Tuukka Ruotsalo
:
Perceptual Visual Similarity from EEG: Prediction and Image Generation. 11146-11155 - Yifeng Gao
, Yuhua Sun
, Xingjun Ma
, Zuxuan Wu
, Yu-Gang Jiang
:
ModelLock: Locking Your Model With a Spell. 11156-11165 - Jiyi Zhang
, Han Fang
, Ee-Chien Chang
:
Finding Input Data Domains of Image Classification Models with Hard-Label Black-Box Access. 11166-11174 - Yudong Zhang
, Ruobing Xie
, Jiansheng Chen
, Xingwu Sun
, Yu Wang
:
PIP: Detecting Adversarial Examples in Large Vision-Language Models via Attention Patterns of Irrelevant Probe Questions. 11175-11183 - Taotao Zhou
, Teng Xu
, Dong Zhang
, Yuyang Jiao
, Peijun Xu
, Yaoyu He
, Lan Xu
, Jingyi Yu
:
Sophia-in-Audition: Virtual Production with a Robot Performer. 11184-11193
Open-Source
- Xiaodong Chen
, Kunlang He
, Wu Liu
, Xinchen Liu
, Zheng-Jun Zha
, Tao Mei
:
CLaM: An Open-Source Library for Performance Evaluation of Text-driven Human Motion Generation. 11194-11197 - Haodong Duan
, Junming Yang
, Yuxuan Qiao
, Xinyu Fang
, Lin Chen
, Yuan Liu
, Xiaoyi Dong
, Yuhang Zang
, Pan Zhang
, Jiaqi Wang
, Dahua Lin
, Kai Chen
:
VLMEvalKit: An Open-Source ToolKit for Evaluating Large Multi-Modality Models. 11198-11201 - Wei Gao
, Huiming Zheng
, Chenhao Zhang
, Kaiyu Zheng
, Zhuozhen Yu
, Yuan Li
, Hua Ye
, Yongchi Zhang
:
OpenDIC: An Open-Source Library and Performance Evaluation for Deep-learning-based Image Compression. 11202-11205 - Hung-Jui Guo
, Hiranya Garbha Kumar
, Minhas Kamal
, Balakrishnan Prabhakaran
:
Room2XR: Virtual Interactive Collaboration in Real-world Scenes. 11206-11209 - Jack Jansen
, Thomas Röggla
, Silvia Rossi
, Irene Viola
, Pablo César
:
Open-Sourcing VR2Gather: A Collaborative Social VR System for Adaptive Multi-Party Real Time Communication. 11210-11213 - Joni Räsänen
, Heikki Tampio
, Alexandre Mercat
, Jarno Vanne
:
uvgComm: Open Software for Low-Latency Multi-party Video Communication. 11214-11217 - Tomás Soucek
, Jakub Lokoc
:
TransNet V2: An Effective Deep Network Architecture for Fast Shot Transition Detection. 11218-11221 - Jingyuan Tang
, Yangang Cai
, Xuesong Gao
, Songlin Sun
:
Generalized Sampling of Non-Local Textural Clues Multi-View Stereo Framework. 11222-11225 - Yuan Tong
, Mengshun Hu
, Zheng Wang
:
NNVISR: Bring Neural Network Video Interpolation and Super Resolution into Video Processing Framework. 11226-11229 - Marko Viitanen
, Joose Sainio
, Kari Siivonen
, Alexandre Mercat
, Jarno Vanne
:
uvg266: Open-Source VVC Intra Encoder. 11230-11233 - Liang Xie
, Wei Gao
:
LearningPCC: A PyTorch Library for Learning-Based Point Cloud Compression. 11234-11238 - Liang Xie
, Wei Gao
:
PCHMVision: An Open-Source Library of Point Cloud Compression for Human and Machine Vision. 11239-11243 - Feng Ye
, Li Zhang
, Chuanmin Jia
:
Deep Video Compression with Scaled Hierarchical Bi-directional Motion Model. 11244-11247 - Hang Yuan
, Wei Gao
, Wenxu Gao
:
OpenSEP: An Open Source Subjective Experiment Platform. 11248-11251
Technical Demonstrations
- Ansel Blume
, Khanh Duy Nguyen
, Zhenhailong Wang
, Yangyi Chen
, Michal Shlapentokh-Rothman
, Xiaomeng Jin
, Jeonghwan Kim
, Zhen Zhu
, Jiateng Liu
, Kuan-Hao Huang
, Mankeerat Sidhu
, Xuanming Zhang
, Vivian Liu
, Raunak Sinha
, Te-Lin Wu
, Abhay Zala
, Elias Stengel-Eskin
, Da Yin
, Yao Xiao
, Utkarsh Mall
, Zhou Yu
, Kai-Wei Chang
, Camille Cobb
, Karrie Karahalios
, Lydia B. Chilton
, Mohit Bansal
, Nanyun Peng
, Carl Vondrick
, Derek Hoiem
, Heng Ji
:
MIRACLE: An Online, Explainable Multimodal Interactive Concept Learning System. 11252-11254 - Difei Gao
, Siyuan Hu
, Zechen Bai
, Qinghong Lin
, Mike Zheng Shou
:
AssistEditor: Multi-Agent Collaboration for GUI Workflow Automation in Video Creation. 11255-11257 - Feilin Han
, Leping Zhang
, Xin Wang
, Ke-Ao Zhao
, Ying Zhong
, Ziyi Su
, Tongtong Feng
, Wenwu Zhu
:
U2USim - A UAV Telepresence Simulation Platform with Multi-agent Sensing and Dynamic Environment. 11258-11260 - Zhanbin Hu
, Xiaodong He
, Renzhou Pan
, Xianzhou Zeng
, Chenming Fan
, Qiang Zhu
:
MAF-ID: Multi-Agent Framework for Interactive Dubbing through Deep Video Understanding. 11261-11263 - Xin Jin
, Liaoruxing Zhang
, Longteng Jiang
, Dandan Li
:
Unlimited Vision: Professional Composition by Yourself. 11264-11266 - Seongjean Kim, Jungwoo Huh, Yeseung Park, Jungsu Kim, Sanghoon Lee:
DanceMimic: Awaken Your Dancing Instinct through a Real-time Dance Imitation Capture System. 11267-11269 - Ying Ma
, Xinyan Yang
, Aiqi Wang
, Jianglin Zeng
, Shaofei Liu
:
Video Editing Chatbot: Language-Driven Video Compositing System. 11270-11272 - Liangyu Wang
, Yoko Yamakata
, Ryoma Maeda
, Kiyoharu Aizawa
:
Measure and Improve Your Food: Ingredient Estimation Based Nutrition Calculator. 11273-11275 - Mingyuan Wu
, Ruifan Ji
, Haozhen Zheng
, Jiaxi Li
, Beitong Tian
, Bo Chen
, Ruixiao Zhang
, Jacob Chakareski
, Michael Zink
, Ramesh K. Sitaraman
, Klara Nahrstedt
:
Scene Graph Driven Hybrid Interactive VR Teleconferencing. 11276-11278 - Yuning Wu
, Jiatong Shi
, Yifeng Yu
, Yuxun Tang
, Tao Qian
, Yueqian Lin
, Jionghao Han
, Xinyi Bai
, Shinji Watanabe
, Qin Jin
:
Muskits-ESPnet: A Comprehensive Toolkit for Singing Voice Synthesis in New Paradigm. 11279-11281 - Shengzhou Yi
, Junichiro Matsugami
, Takuya Yamamoto
, Toshihiko Yamasaki
:
Enhancing Speaking and Slide Design Skills with Deep Learning: An Online Presentation Assessment System. 11282-11284
Tutorial Presentations
- Rahel Arnold
, Werner Bailer
, Ralph Gasser
, Björn Þór Jónsson
, Omar Shahbaz Khan
, Heiko Schuldt
, Florian Spiess
, Lucia Vadicamo
:
Multimedia Information Retrieval in XR. 11285-11286 - Niccolo Biondi
, Simone Ricci
, Federico Pernici
, Alberto Del Bimbo
:
Learning Backward Compatible Representations. 11287-11288 - Hao Fei
, Xiangtai Li
, Haotian Liu
, Fuxiao Liu
, Zhuosheng Zhang
, Hanwang Zhang
, Shuicheng Yan
:
From Multimodal LLM to Human-level AI: Modality, Instruction, Reasoning and Beyond. 11289-11291 - Wei Gao
, Ge Li
:
Point Cloud Compression, Enhancement and Applications: From 3D Perception to Large Models. 11292-11293 - Soyeon Caren Han
, Feiqi Cao
, Josiah Poon
, Roberto Navigli
:
Multimodal Large Language Models and Tunings: Vision, Language, Sensors, Audio, and Beyond. 11294-11295 - Xin Wang
, Yuwei Zhou
, Hong Chen
, Wenwu Zhu
:
Curriculum Learning for Multimedia in the Era of Large Language Models. 11296-11297 - Kaicheng Yu
, Zhuang Shao
, Siyuan Qi
, Dongfang Liu
:
Tutorial: Large Language-Vision Model in Society. 11298-11299 - Sicheng Zhao
, Guoli Jia
, Xiaopeng Hong
, Yanyan Zhao
, Jianhua Tao
:
Label-Efficient Emotion and Sentiment Analysis. 11300-11301
Grand Challenges
- Yicheng Wu
, Yutong Xie
, Xiangde Luo
, Qi Wu
, Jianfei Cai
:
Dataset, Challenge, and Evaluation for Tumor Segmentation Variability. 11302-11303 - Dan Guo
, Xiaobai Li
, Kun Li
, Haoyu Chen
, Jingjing Hu
, Guoying Zhao
, Yi Yang, Meng Wang
:
MAC 2024: Micro-Action Analysis Grand Challenge. 11304-11305 - Jun Yu
, Mohan Jing
, Guopeng Zhao
, Keda Lu
, Yifan Wang
, Feng Zhao
, Jiaqing Sun
, Qingsong Liu
, Jiaen Liang
:
End-to-end Spatio-Temporal Information Aggregation For Micro-Action Detection. 11306-11312 - Qiankun Li
, Xiaolong Huang
, Huabao Chen
, Feng He
, Qiupu Chen
, Zengfu Wang
:
Advancing Micro-Action Recognition with Multi-Auxiliary Heads and Hybrid Loss Optimization. 11313-11319 - Chen Wang
, Xun Mei
, Feng Zhang
:
Instance-aware Fine-grained Micro-action Recognition. 11320-11326 - Fan Gong
, Jialiang Chen
, Jiajun Zhu
, Qijian Bao
, Fei Gao
, Renshu Gu
, Gang Xu
:
Micro-Action Recognition via Hierarchical Fusion and Inference. 11327-11332 - Muhammad Saad Saeed, Shah Nawaz, Marta Moscati, Rohan Kumar Das, Muhammad Salman Tahir, Muhammad Zaigham Zaheer, Muhammad Irzam Liaqat, Muhammad Haris Khan, Karthik Nandakumar, Muhammad Haroon Yousaf, Markus Schedl:
A Synopsis of FAME 2024 Challenge: Associating Faces with Voices in Multilingual Environments. 11333-11334 - Jiehui Tang
, Xiaofei Wang
, Zhen Xiao
, Jiayi Liu
, Xueliang Liu
, Richang Hong
:
Exploring Robust Face-Voice Matching in Multilingual Environments. 11335-11341 - Ruijie Tao
, Zhan Shi
, Yidi Jiang
, Duc-Tuan Truong
, Eng Siong Chng
, Massimo Alioto
, Haizhou Li
:
Multi-Stage Face-Voice Association Learning with Keynote Speaker Diarization. 11342-11347 - Wuyang Chen
, Yanjie Sun
, Kele Xu
, Yong Dou
:
Contrastive Learning-based Chaining-Cluster for Multilingual Voice-Face Association. 11348-11354 - Zhixi Cai
, Abhinav Dhall
, Shreya Ghosh
, Munawar Hayat
, Dimitrios Kollias
, Kalin Stefanov
, Usman Tariq
:
1M-Deepfakes Detection Challenge. 11355-11359 - Diego Pérez-Vieites
, Juan José Moreira-Pérez
, Ángel Aragón-Kifute
, Raquel Román-Sarmiento
, Rubén Castro-González
:
Vigo: Audiovisual Fake Detection and Segment Localization. 11360-11364 - Yi Zhang
, Changtao Miao
, Man Luo
, Jianshu Li
, Wenzhong Deng
, Weibin Yao
, Zhe Li
, Bingyu Hu
, Weiwei Feng
, Tao Gong
, Qi Chu
:
MFMS: Learning Modality-Fused and Modality-Specific Features for Deepfake Detection and Localization Tasks. 11365-11369 - Yifan Wang
, Xuecheng Wu
, Jia Zhang
, Mohan Jing
, Keda Lu
, Jun Yu
, Wen Su
, Fang Gao
, Qingsong Liu
, Jianqing Sun
, Jiaen Liang
:
Building Robust Video-Level Deepfake Detection via Audio-Visual Local-Global Interactions. 11370-11376 - Philipp Müller
, Michal Balazia
, Tobias Baur
, Michael Dietz
, Alexander Heimerl
, Anna Penzkofer
, Dominik Schiller
, François Brémond
, Jan Alexandersson
, Elisabeth André
, Andreas Bulling
:
MultiMediate'24: Multi-Domain Engagement Estimation. 11377-11382 - Deepak Kumar
, Surbhi Madan
, Pradeep Singh
, Abhinav Dhall
, Balasubramanian Raman
:
Towards Engagement Prediction: A Cross-Modality Dual-Pipeline Approach using Visual and Audio Features. 11383-11389 - Fuyan Ma
, Yiran He
, Bin Sun
, Shutao Li
:
Less is More: Adaptive Feature Selection and Fusion for Eye Contact Detection. 11390-11396 - Jia Li
, Yangchen Yu
, Yin Chen
, Yu Zhang
, Peng Jia
, Yunbo Xu
, Ziqiang Li
, Meng Wang
, Richang Hong
:
DAT: Dialogue-Aware Transformer with Modality-Group Fusion for Human Engagement Estimation. 11397-11403 - Yu Zhao
, Hao Fei
, Bobo Li
, Meishan Zhang
, Min Zhang
:
The ACM Multimedia 2024 Visual Spatial Description Grand Challenge. 11404-11406 - Jun Yu
, Yunxiang Zhang
, Zerui Zhang
, Zhao Yang
, Gongpeng Zhao
, Fengzhao Sun
, Fanrui Zhang
, Qingsong Liu
, Jianqing Sun
, Jiaen Liang
, Yaohui Zhang
:
RAG-Guided Large Language Models for Visual Spatial Description with Adaptive Hallucination Corrector. 11407-11413 - Jiabao Wang, Fang Gao, Jingfeng Tang
, Shaodong Li
, Hanbo Zheng
, Shengheng Ma
, Feng Shuang
, Jun Yu
:
A Method for Visual Spatial Description Based on Large Language Model Fine-tuning. 11414-11419 - Yizhang Jin
, Jian Li
, Jiangning Zhang
, Jianlong Hu
, Zhenye Gan
, Xin Tan
, Yong Liu
, Yabiao Wang
, Chengjie Wang
, Lizhuang Ma
:
LLaVA-VSD: Large Language-and-Vision Assistant for Visual Spatial Description. 11420-11425 - Zhiqi Ge
, Juncheng Li
, Qifan Yu
, Wei Zhou
, Siliang Tang
, Yueting Zhuang
:
DEMON24: ACM MM24 Demonstrative Instruction Following Challenge. 11426-11428 - Xian Fu
:
Enhancing Multimodal Large Language Models on Demonstrative Multi-Image Instructions. 11429-11434 - Jingyu Wei
, Yi Su
, Kele Xu
, Lingbin Zeng
, Bo Liu
, Huaimin Wang
:
Demonstrative Instruction Following in Multimodal LLMs via Integrating Low-Rank Adaptation with Ensemble Learning. 11435-11441 - Bo Wu
, Peiye Liu
, Qiushi Huang
, Zhaoyang Zeng
, Jia Wang
, Bei Liu
, Jiebo Luo
, Wen-Huang Cheng
:
SMP Challenge Summary: Social Media Prediction Challenge. 11442-11444 - Yu-Shi Lin
, Anthony J. T. Lee
:
MMF: Winning Solution to Social Media Popularity Prediction Challenge 2024. 11445-11449 - Wenhao Hu
, Weilong Chen
, Weimin Yuan
, Yan Wang
, Shimin Cai
, Yanru Zhang
:
Dual-Stream Pre-Training Transformer to Enhance Multimodal Learning for Social Media Prediction. 11450-11456 - Mingsheng Tu
, Tianjiao Wan
, Qisheng Xu
, Xinhao Jiang
, Kele Xu
, Cheng Yang
:
Higher-Order Vision-Language Alignment for Social Media Prediction. 11457-11463 - Chih-Chung Hsu
, Chia-Ming Lee
, Yu-Fan Lin
, Yi-Shiuan Chou
, Chih-Yu Jian
, Chi-Han Tsai
:
Revisiting Vision-Language Features Adaptation and Inconsistency for Social Media Popularity Prediction. 11464-11469 - Shien Song
, Jie Yang
, Jin Chen
, Han Qi
, Yifei Xue
, Yizhen Lao
, Yi Yu
:
ACM Multimedia 2024 Grand Challenge Report for Artificial Intelligence Generated Image Detection. 11470-11471 - Huihui Fu
:
Optimizing AIGC Image Detection: Strategies in Data Augmentation and Model Architecture. 11472-11474 - ShiHang Li
, Haishan Wu
, Biao Wang
:
A Solution to ACMMM 2024 on Artificial Intelligence Generated Image Detection. 11475-11477 - Jin Chen
:
Optimizing the Baseline Approach for the 2024 ACM Multimedia Grand Challenge in Artificial Intelligence Generated Image Detection. 11478-11481 - John See
, Jingting Li
, Adrian K. Davison
, Gen-Bing Liong
, Moi Hoon Yap
, Wen-Huang Cheng
, Xiaobai Li
, Xiaopeng Hong
, Su-Jing Wang
:
MEGC2024: ACM Multimedia 2024 Facial Micro-Expression Grand Challenge. 11482-11483 - Jun Yu
, Gongpeng Zhao
, Yaohui Zhang
, Peng He
, Zerui Zhang
, Zhao Yang
, Qingsong Liu
, Jianqing Sun
, Jiaen Liang
:
Temporal-Informative Adapters in VideoMAE V2 and Multi-Scale Feature Fusion for Micro-Expression Spotting-then-Recognize. 11484-11489 - Jun Yu
, Yaohui Zhang
, Gongpeng Zhao
, Peng He
, Zerui Zhang
, Zhongpeng Cai
, Qingsong Liu
, Jianqing Sun
, Jiaen Liang
:
Micro-Expression Spotting Based on Optical Flow Feature with Boundary Calibration. 11490-11496 - Zhengye Zhang
, Sirui Zhao
, Xinglong Mao
, Shifeng Liu
, Hao Wang
, Tong Xu
, Enhong Chen
:
A Multi-scale Feature Learning Network with Optical Flow Correction for Micro- and Macro-expression Spotting. 11497-11502 - Yuhong He
, Wenchao Liu
, Guangyu Wang
, Lin Ma
, Haifeng Li
:
Enhancing Micro-Expression Analysis Performance by Effectively Addressing Data Imbalance. 11503-11507