Shih-fu Chang

110 papers · 2012–2025 · 14 top CS/AI conferences

Achievements

🐣 Hot Topic Early Bird 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (6) 🌍 Conference Polyglot (14) 🌟 Keyword Trendsetter Combo (4) 🏠 Conference Loyalist (35) 🤝 Dynamic Duo (27) 🏆 Grand Slam 👥 Mega-Team (34) 🌱 Topic Pioneer 🔬 Deep Specialist (24) 🧬 Topic Evolution 🏆 Keyword Champion (3) ❓ The Questioner (3) 💎 Century Club (110) 📈 Trend Setter 🚀 Conference Pioneer 🔥 Unstoppable (14) 🗃️ Keyword Collector (425) ⚡ Prolific Year (5)

Conferences

CVPR (35) EMNLP (14) ACL (11) ECCV (10) ICCV (8) AAAI (7) NIPS (7) NAACL (6) ICLR (4) ICML (2) IJCNLP (2) JMLR (2) IJCAI (1) WACV (1)

Top co-authors

Heng Ji (27) Xudong Lin (24) Manling Li (14) Haoxuan You (12) Alireza Zareian (12) Guangxing Han (11) Jiawei Ma (11) Long Chen (11) Zhecan Wang (11) Shiyuan Huang (10)

Keywords

multimodal learning (21) video understanding (10) zero-shot learning (9) few-shot learning (9) self-supervised learning (8) weakly supervised learning (8) object detection (7) representation learning (6) event extraction (6) contrastive learning (6) knowledge graph (6) visual grounding (5) vision-language model (5) video captioning (5) visual question answering (5) relation extraction (4) metric learning (4) image retrieval (4) action recognition (4) image captioning (4)

Papers

M2-TabFact: Multi-Document Multi-Modal Fact Verification with Visual and Textual Representations of Tabular Data ACL 2025
PuzzleGPT: Emulating Human Puzzle-Solving Ability for Time and Location Prediction NAACL 2025
What When and Where? Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions CVPR 2024
RAP: Retrieval-Augmented Planner for Adaptive Procedure Planning in Instructional Videos ECCV 2024
MoDE: CLIP Data Experts via Clustering CVPR 2024
JourneyBench: A Challenging One-Stop Vision-Language Understanding Benchmark of Generated Images NIPS 2024
Personalized Video Comment Generation EMNLP 2024
Ferret: Refer and Ground Anything Anywhere at Any Granularity ICLR 2024
SCHEMA: State CHangEs MAtter for Procedure Planning in Instructional Videos ICLR 2024
Beyond Grounding: Extracting Fine-Grained Event Hierarchies across Modalities AAAI 2024
Training-free Deep Concept Injection Enables Language Models for Video Question Answering EMNLP 2024
VIEWS: Entity-Aware News Video Captioning EMNLP 2024
Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning ACL 2024
Enhanced Chart Understanding via Visual Language Pre-training on Plot Table Pairs ACL 2023
Learning from Children: Improving Image-Caption Pretraining via Curriculum ACL 2023
Towards Fast Adaptation of Pretrained Contrastive Models for Multi-Channel Video-Language Retrieval CVPR 2023
IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models EMNLP 2023
DiGeo: Discriminative Geometry-Aware Learning for Generalized Few-Shot Object Detection CVPR 2023
Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and Beyond EMNLP 2023
TempCLR: Temporal Alignment Representation with Contrastive Learning ICLR 2023
PreViTS: Contrastive Pretraining With Video Tracking Supervision WACV 2023
Supervised Masked Knowledge Distillation for Few-Shot Transformers CVPR 2023
Non-Sequential Graph Script Induction via Multimedia Grounding ACL 2023
UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding ACL 2023
Video Event Extraction via Tracking Visual States of Arguments AAAI 2023
Fine-Grained Visual Entailment ECCV 2022
Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners NIPS 2022
Meta Faster R-CNN: Towards Accurate Few-Shot Object Detection with Attentive Feature Alignment AAAI 2022
SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning AAAI 2022
MuMuQA: Multimedia Multi-Hop News Question Answering via Cross-Media Knowledge Extraction and Grounding AAAI 2022
Bridging the Gap between Recognition-level Pre-training and Commonsensical Vision-language Tasks ACL 2022
Task-Adaptive Negative Envision for Few-Shot Open-Set Recognition CVPR 2022
Learning To Recognize Procedural Activities With Distant Supervision CVPR 2022
Few-Shot Object Detection With Fully Cross-Transformer CVPR 2022
CLIP-Event: Connecting Text and Images With Event Structures CVPR 2022
Few-Shot End-to-End Object Detection via Constantly Concentrated Encoding across Heads ECCV 2022
Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training ECCV 2022
Understanding ME? Multimodal Evaluation for Fine-grained Visual Commonsense EMNLP 2022
Weakly-Supervised Temporal Article Grounding EMNLP 2022
Find Someone Who: Visual Commonsense Understanding in Human-Centric Grounding EMNLP 2022
RESIN-11: Schema-guided Event Prediction for 11 Newsworthy Scenarios NAACL 2022
Multimodal Clustering Networks for Self-Supervised Learning From Unlabeled Videos ICCV 2021
Partner-Assisted Learning for Few-Shot Image Classification ICCV 2021
Query Adaptive Few-Shot Object Detection With Heterogeneous Graph Convolutional Networks ICCV 2021
Coreference by Appearance: Visually Grounded Event Coreference Resolution EMNLP 2021
RESIN: A Dockerized Schema-Guided Cross-document Cross-lingual Cross-media Information Extraction and Event Tracking System NAACL 2021
Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding AAAI 2021
InfoSurgeon: Cross-Media Fine-grained Information Consistency Checking for Fake News Detection ACL 2021
Open-Vocabulary Object Detection Using Captions CVPR 2021
Vx2Text: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs CVPR 2021
Co-Grounding Networks With Semantic Attention for Referring Expression Comprehension in Videos CVPR 2021
InfoSurgeon: Cross-Media Fine-grained Information Consistency Checking for Fake News Detection IJCNLP 2021
COVID-19 Literature Knowledge Graph Construction and Drug Repurposing Report Generation NAACL 2021
Unsupervised Vision-and-Language Pre-training Without Parallel Images and Captions NAACL 2021
Uncertainty-Aware Few-Shot Image Classification IJCAI 2021
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text NIPS 2021
Joint Multimedia Event Extraction from Video and Article EMNLP 2021
Weakly Supervised Visual Semantic Parsing CVPR 2020
GAIA: A Fine-grained Multimedia Knowledge Extraction System ACL 2020
General Partial Label Learning via Dual Bipartite Graph Autoencoder AAAI 2020
Context-Gated Convolution ECCV 2020
Bridging Knowledge Graphs to Generate Scene Graphs ECCV 2020
Learning Visual Commonsense for Robust Scene Graph Generation ECCV 2020
Learning to Learn Words from Visual Scenes ECCV 2020
Cross-media Structured Common Space for Multimedia Event Extraction ACL 2020
Multi-Level Multimodal Common Semantic Space for Image-Phrase Grounding CVPR 2019
DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition CVPR 2019
Multi-Granularity Generator for Temporal Action Proposal CVPR 2019
Unsupervised Embedding Learning via Invariant and Spreading Instance Feature CVPR 2019
Cross-lingual Structure Transfer for Relation and Event Extraction IJCNLP 2019
Counterfactual Critic Multi-Agent Training for Scene Graph Generation ICCV 2019
Cross-lingual Structure Transfer for Relation and Event Extraction EMNLP 2019
Zero-Shot Visual Recognition Using Semantics-Preserving Adversarial Embedding Networks CVPR 2018
Grounding Referring Expressions in Images by Variational Context CVPR 2018
On Binary Embedding using Circulant Matrices JMLR 2018
Online Detection of Action Start in Untrimmed, Streaming Videos ECCV 2018
AutoLoc: Weakly-supervised Temporal Action Localization in Untrimmed Videos ECCV 2018
Incorporating Background Knowledge into Video Description Generation EMNLP 2018
Entity-aware Image Caption Generation EMNLP 2018
Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks ICLR 2018
Low-shot Learning via Covariance-Preserving Adversarial Augmentation Networks NIPS 2018
CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos CVPR 2017
Visual Translation Embedding Network for Visual Relation Detection CVPR 2017
PPR-FCN: Weakly Supervised Visual Relation Detection via Parallel Pairwise R-FCN ICCV 2017
Learning Spread-Out Local Feature Descriptors ICCV 2017
Learning Discriminative and Transformation Covariant Local Feature Detectors CVPR 2017
A Multi-media Approach to Cross-lingual Entity Knowledge Transfer ACL 2016
Cross-media Event Extraction and Recommendation NAACL 2016
Temporal Action Localization in Untrimmed Videos via Multi-Stage CNNs CVPR 2016
Interactive Segmentation on RGBD Images via Cue Selection CVPR 2016
Cross-document Event Coreference Resolution based on Cross-media Features EMNLP 2015
Attributes and Categories for Generic Instance Search From One Example CVPR 2015
New Insights Into Laplacian Similarity Search CVPR 2015
Discrete Graph Hashing NIPS 2014
Circulant Binary Embedding ICML 2014
Hash-SVM: Scalable Kernel Machines for Large-Scale Visual Classification CVPR 2014
Locally Linear Hashing for Extracting Non-Linear Manifolds CVPR 2014
Video Event Detection by Inferring Temporal Instance Labels CVPR 2014
Robust Object Co-detection CVPR 2013
Sample-Specific Late Fusion for Visual Category Recognition CVPR 2013
Designing Category-Level Attributes for Discriminative Visual Recognition CVPR 2013
∝SVM for Learning with Label Proportions ICML 2013
Semi-Supervised Learning Using Greedy Max-Cut JMLR 2013
A Bayesian Approach to Multimodal Visual Dictionary Learning CVPR 2013
Analyzing the Harmonic Structure in Graph-Based Learning NIPS 2013
Distributed Low-Rank Subspace Segmentation ICCV 2013
Large-Scale Video Hashing via Structure Learning ICCV 2013
Hash Bit Selection: A Unified Solution for Selection Problems in Hashing CVPR 2013
Label Propagation from ImageNet to 3D Point Clouds CVPR 2013
Learning with Partially Absorbing Random Walks NIPS 2012