Zhenfang Chen
29 papers · 2019–2025 · 10 top CS/AI conferences
Achievements
Interdisciplinary Bridge
Academic Marathon (6)
Renaissance Researcher (5)
Conference Polyglot (10)
Taxonomy Completionist (39)
Keyword Pioneer
Hot Topic Early Bird
Grand Slam
Dynamic Duo (22)
Prolific Year (9)
Century Club (29)
Keyword Collector (96)
Unstoppable (7)
Conferences
CVPR (7)
ICLR (7)
NIPS (5)
ICML (3)
ECCV (2)
AAAI (1)
ACL (1)
CORL (1)
EMNLP (1)
ICCV (1)
Keywords
large language model (3)
visual reasoning (3)
multimodal learning (2)
multi-modal learning (2)
weakly-supervised learning (2)
question answering (2)
vision-language model (2)
weakly supervised learning (2)
visual question answering (1)
video prediction (1)
multi-task learning (1)
image captioning (1)
depth estimation (1)
trajectory prediction (1)
in-context learning (1)
object detection (1)
scene reconstruction (1)
video understanding (1)
referring expression (1)
visual localization (1)
Papers
Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search
ICML 2025
Scene-agnostic Pose Regression for Visual Localization
CVPR 2025
Scaling Autonomous Agents via Automatic Reward Modeling And Planning
ICLR 2025
Visual and Domain Knowledge for Professional-level Graph-of-Thought Medical Reasoning
ICML 2025
Visual Chain-of-Thought Prompting for Knowledge-Based Visual Reasoning
AAAI 2024
CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding
ICLR 2024
SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge
CVPR 2024
ContPhy: Continuum Physical Concept Learning and Reasoning from Videos
ICML 2024
GENOME: Generative Neuro-Symbolic Visual Reasoning by Growing and Reusing Modules
ICLR 2024
FlexAttention for Efficient High-Resolution Vision-Language Models
ECCV 2024
SALMON: Self-Alignment with Instructable Reward Models
ICLR 2024
3D-LLM: Injecting the 3D World into Large Language Models
NIPS 2023
Mod-Squad: Designing Mixtures of Experts As Modular Multi-Task Learners
CVPR 2023
Visual Dependency Transformers: Dependency Tree Emerges From Reversed Attention
CVPR 2023
3D Concept Learning and Reasoning From Multi-View Images
CVPR 2023
Sparse Universal Transformer
EMNLP 2023
TextPSG: Panoptic Scene Graph Generation from Textual Descriptions
ICCV 2023
Planning with Large Language Models for Code Generation
ICLR 2023
Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
NIPS 2023
Physion++: Evaluating Physical Scene Understanding that Requires Online Inference of Different Physical Properties
NIPS 2023
ComPhy: Compositional Physical Reasoning of Objects and Events from Videos
ICLR 2022
S³-NeRF: Neural Reflectance Field from Shading and Shadow under a Single Viewpoint
NIPS 2022
PS-NeRF: Neural Inverse Rendering for Multi-View Photometric Stereo
ECCV 2022
Embodied Concept Learner: Self-supervised Learning of Concepts and Mapping through Instruction Following
CORL 2022
Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language
NIPS 2021
The Blessings of Unlabeled Background in Untrimmed Videos
CVPR 2021
Grounding Physical Concepts of Objects and Events Through Dynamic Visual Reasoning
ICLR 2021
Cops-Ref: A New Dataset and Task on Compositional Referring Expression Comprehension
CVPR 2020
Weakly-Supervised Spatio-Temporally Grounding Natural Sentence in Video
ACL 2019