ICASSP-2024-Papers

Multimodal Processing of Language

Title Repo Paper Video
Cooking-Clip: Context-Aware Language-Image Pretraining for Zero-Shot Recipe Generation ➖ ➖
Exploring Object-Centered External Knowledge for Fine-Grained Video Paragraph Captioning ➖ ➖
Relational Graph-Bridged Image-Text Interaction: A Novel Method for Multi-Modal Relation Extraction ➖ ➖
DialCLIP: Empowering CLIP as Multi-Modal Dialog Retriever ➖ ➖
Vector Quantization Knowledge Transfer for End-to-End Text Image Machine Translation ➖ ➖
EmoRED: A Dataset for Relation Extraction in Texts with Emoticons ➖ ➖
MSG-BART: Multi-granularity Scene Graph-Enhanced Encoder-Decoder Language Model for Video-grounded Dialogue Generation ➖ ➖
CausalME: Balancing bi-modalities in Visual Question Answering ➖ ➖
MHPS: Multimodality-Guided Hierarchical Policy Search for Knowledge Graph Reasoning ➖ ➖
Empowering Vision-Language Models for Reasoning Ability through Large Language Models ➖ ➖
PVCG: Prompt-Based Vision-Aware Classification and Generation for Multi-Modal Rumor Detection ➖ ➖
LabCLIP: Label-Enhanced Clip for Improving Zero-Shot Text Classification ➖
Context-Aware Dual Attention Network for Multimodal Sarcasm Detection ➖ ➖