Shih-fu Chang
110 papers · 2012–2025 · 14 conferences · across top CS/AI conferences
Achievements
🐣 Hot Topic Early Bird
🌈 Renaissance Researcher (6)
🌉 Interdisciplinary Bridge
🌍 Conference Polyglot (14)
🧭 Keyword Pioneer
🌟 Keyword Trendsetter Combo (4)
🏠 Conference Loyalist (35)
🤝 Dynamic Duo (27)
🏆 Grand Slam
👥 Mega-Team (34)
🌱 Topic Pioneer
🔬 Deep Specialist (24)
🧬 Topic Evolution
🏆 Keyword Champion (3)
❓ The Questioner (3)
💎 Century Club (110)
📈 Trend Setter
🚀 Conference Pioneer
🔥 Unstoppable (14)
🗃️ Keyword Collector (425)
⚡ Prolific Year (5)
Conferences
CVPR (35)
EMNLP (14)
ACL (11)
ECCV (10)
ICCV (8)
AAAI (7)
NIPS (7)
NAACL (6)
ICLR (4)
ICML (2)
IJCNLP (2)
JMLR (2)
IJCAI (1)
WACV (1)
Keywords
multimodal learning (21)
video understanding (10)
zero-shot learning (9)
few-shot learning (9)
self-supervised learning (8)
weakly supervised learning (8)
object detection (7)
representation learning (6)
event extraction (6)
contrastive learning (6)
knowledge graph (6)
visual grounding (5)
vision-language model (5)
video captioning (5)
visual question answering (5)
relation extraction (4)
metric learning (4)
image retrieval (4)
action recognition (4)
image captioning (4)
Papers
M2-TabFact: Multi-Document Multi-Modal Fact Verification with Visual and Textual Representations of Tabular Data
ACL 2025
PuzzleGPT: Emulating Human Puzzle-Solving Ability for Time and Location Prediction
NAACL 2025
What When and Where? Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions
CVPR 2024
RAP: Retrieval-Augmented Planner for Adaptive Procedure Planning in Instructional Videos
ECCV 2024
MoDE: CLIP Data Experts via Clustering
CVPR 2024
JourneyBench: A Challenging One-Stop Vision-Language Understanding Benchmark of Generated Images
NIPS 2024
Personalized Video Comment Generation
EMNLP 2024
Ferret: Refer and Ground Anything Anywhere at Any Granularity
ICLR 2024
SCHEMA: State CHangEs MAtter for Procedure Planning in Instructional Videos
ICLR 2024
Beyond Grounding: Extracting Fine-Grained Event Hierarchies across Modalities
AAAI 2024
Training-free Deep Concept Injection Enables Language Models for Video Question Answering
EMNLP 2024
VIEWS: Entity-Aware News Video Captioning
EMNLP 2024
Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning
ACL 2024
Enhanced Chart Understanding via Visual Language Pre-training on Plot Table Pairs
ACL 2023
Learning from Children: Improving Image-Caption Pretraining via Curriculum
ACL 2023
Towards Fast Adaptation of Pretrained Contrastive Models for Multi-Channel Video-Language Retrieval
CVPR 2023
IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models
EMNLP 2023
DiGeo: Discriminative Geometry-Aware Learning for Generalized Few-Shot Object Detection
CVPR 2023
Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and Beyond
EMNLP 2023
TempCLR: Temporal Alignment Representation with Contrastive Learning
ICLR 2023
PreViTS: Contrastive Pretraining With Video Tracking Supervision
WACV 2023
Supervised Masked Knowledge Distillation for Few-Shot Transformers
CVPR 2023
Non-Sequential Graph Script Induction via Multimedia Grounding
ACL 2023
UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding
ACL 2023
Video Event Extraction via Tracking Visual States of Arguments
AAAI 2023
Fine-Grained Visual Entailment
ECCV 2022
Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners
NIPS 2022
Meta Faster R-CNN: Towards Accurate Few-Shot Object Detection with Attentive Feature Alignment
AAAI 2022
SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning
AAAI 2022
MuMuQA: Multimedia Multi-Hop News Question Answering via Cross-Media Knowledge Extraction and Grounding
AAAI 2022
Bridging the Gap between Recognition-level Pre-training and Commonsensical Vision-language Tasks
ACL 2022
Task-Adaptive Negative Envision for Few-Shot Open-Set Recognition
CVPR 2022
Learning To Recognize Procedural Activities With Distant Supervision
CVPR 2022
Few-Shot Object Detection With Fully Cross-Transformer
CVPR 2022
CLIP-Event: Connecting Text and Images With Event Structures
CVPR 2022
Few-Shot End-to-End Object Detection via Constantly Concentrated Encoding across Heads
ECCV 2022
Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training
ECCV 2022
Understanding ME? Multimodal Evaluation for Fine-grained Visual Commonsense
EMNLP 2022
Weakly-Supervised Temporal Article Grounding
EMNLP 2022
Find Someone Who: Visual Commonsense Understanding in Human-Centric Grounding
EMNLP 2022
RESIN-11: Schema-guided Event Prediction for 11 Newsworthy Scenarios
NAACL 2022
Multimodal Clustering Networks for Self-Supervised Learning From Unlabeled Videos
ICCV 2021
Partner-Assisted Learning for Few-Shot Image Classification
ICCV 2021
Query Adaptive Few-Shot Object Detection With Heterogeneous Graph Convolutional Networks
ICCV 2021
Coreference by Appearance: Visually Grounded Event Coreference Resolution
EMNLP 2021
RESIN: A Dockerized Schema-Guided Cross-document Cross-lingual Cross-media Information Extraction and Event Tracking System
NAACL 2021
Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding
AAAI 2021
InfoSurgeon: Cross-Media Fine-grained Information Consistency Checking for Fake News Detection
ACL 2021
Open-Vocabulary Object Detection Using Captions
CVPR 2021
Vx2Text: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs
CVPR 2021
Co-Grounding Networks With Semantic Attention for Referring Expression Comprehension in Videos
CVPR 2021
InfoSurgeon: Cross-Media Fine-grained Information Consistency Checking for Fake News Detection
IJCNLP 2021
COVID-19 Literature Knowledge Graph Construction and Drug Repurposing Report Generation
NAACL 2021
Unsupervised Vision-and-Language Pre-training Without Parallel Images and Captions
NAACL 2021
Uncertainty-Aware Few-Shot Image Classification
IJCAI 2021
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
NIPS 2021
Joint Multimedia Event Extraction from Video and Article
EMNLP 2021
Weakly Supervised Visual Semantic Parsing
CVPR 2020
GAIA: A Fine-grained Multimedia Knowledge Extraction System
ACL 2020
General Partial Label Learning via Dual Bipartite Graph Autoencoder
AAAI 2020
Context-Gated Convolution
ECCV 2020
Bridging Knowledge Graphs to Generate Scene Graphs
ECCV 2020
Learning Visual Commonsense for Robust Scene Graph Generation
ECCV 2020
Learning to Learn Words from Visual Scenes
ECCV 2020
Cross-media Structured Common Space for Multimedia Event Extraction
ACL 2020
Multi-Level Multimodal Common Semantic Space for Image-Phrase Grounding
CVPR 2019
DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition
CVPR 2019
Multi-Granularity Generator for Temporal Action Proposal
CVPR 2019
Unsupervised Embedding Learning via Invariant and Spreading Instance Feature
CVPR 2019
Cross-lingual Structure Transfer for Relation and Event Extraction
IJCNLP 2019
Counterfactual Critic Multi-Agent Training for Scene Graph Generation
ICCV 2019
Cross-lingual Structure Transfer for Relation and Event Extraction
EMNLP 2019
Zero-Shot Visual Recognition Using Semantics-Preserving Adversarial Embedding Networks
CVPR 2018
Grounding Referring Expressions in Images by Variational Context
CVPR 2018
On Binary Embedding using Circulant Matrices
JMLR 2018
Online Detection of Action Start in Untrimmed, Streaming Videos
ECCV 2018
AutoLoc: Weakly-supervised Temporal Action Localization in Untrimmed Videos
ECCV 2018
Incorporating Background Knowledge into Video Description Generation
EMNLP 2018
Entity-aware Image Caption Generation
EMNLP 2018
Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks
ICLR 2018
Low-shot Learning via Covariance-Preserving Adversarial Augmentation Networks
NIPS 2018
CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos
CVPR 2017
Visual Translation Embedding Network for Visual Relation Detection
CVPR 2017
PPR-FCN: Weakly Supervised Visual Relation Detection via Parallel Pairwise R-FCN
ICCV 2017
Learning Spread-Out Local Feature Descriptors
ICCV 2017
Learning Discriminative and Transformation Covariant Local Feature Detectors
CVPR 2017
A Multi-media Approach to Cross-lingual Entity Knowledge Transfer
ACL 2016
Cross-media Event Extraction and Recommendation
NAACL 2016
Temporal Action Localization in Untrimmed Videos via Multi-Stage CNNs
CVPR 2016
Interactive Segmentation on RGBD Images via Cue Selection
CVPR 2016
Cross-document Event Coreference Resolution based on Cross-media Features
EMNLP 2015
Attributes and Categories for Generic Instance Search From One Example
CVPR 2015
New Insights Into Laplacian Similarity Search
CVPR 2015
Discrete Graph Hashing
NIPS 2014
Circulant Binary Embedding
ICML 2014
Hash-SVM: Scalable Kernel Machines for Large-Scale Visual Classification
CVPR 2014
Locally Linear Hashing for Extracting Non-Linear Manifolds
CVPR 2014
Video Event Detection by Inferring Temporal Instance Labels
CVPR 2014
Robust Object Co-detection
CVPR 2013
Sample-Specific Late Fusion for Visual Category Recognition
CVPR 2013
Designing Category-Level Attributes for Discriminative Visual Recognition
CVPR 2013
∝SVM for Learning with Label Proportions
ICML 2013
Semi-Supervised Learning Using Greedy Max-Cut
JMLR 2013
A Bayesian Approach to Multimodal Visual Dictionary Learning
CVPR 2013
Analyzing the Harmonic Structure in Graph-Based Learning
NIPS 2013
Distributed Low-Rank Subspace Segmentation
ICCV 2013
Large-Scale Video Hashing via Structure Learning
ICCV 2013
Hash Bit Selection: A Unified Solution for Selection Problems in Hashing
CVPR 2013
Label Propagation from ImageNet to 3D Point Clouds
CVPR 2013
Learning with Partially Absorbing Random Walks
NIPS 2012