Zhenfang Chen

29 papers · 2019–2025 · 10 conferences · across top CS/AI conferences

Achievements

🌉 Interdisciplinary Bridge 🏃 Academic Marathon (6) 🌈 Renaissance Researcher (5) 🌍 Conference Polyglot (10) 🗺️ Taxonomy Completionist (39)

🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 🏆 Grand Slam 🤝 Dynamic Duo (22) ⚡ Prolific Year (9) 💎 Century Club (29) 🗃️ Keyword Collector (96) 🔥 Unstoppable (7)

Conferences

CVPR (7) ICLR (7) NIPS (5) ICML (3) ECCV (2) AAAI (1) ACL (1) CORL (1) EMNLP (1) ICCV (1)

Top co-authors

Chuang Gan (22) Yikang Shen (10) Mingyu Ding (8) Joshua B. Tenenbaum (7) Yining Hong (6) Zhiqing Sun (3) Peihao Chen (3) Qinhong Zhou (3) Ping Luo (3) Kwan-Yee K. Wong (3)

Keywords

large language model (3) visual reasoning (3) multimodal learning (2) multi-modal learning (2) weakly-supervised learning (2) question answering (2) vision-language model (2) weakly supervised learning (2) visual question answering (1) video prediction (1) multi-task learning (1) image captioning (1) depth estimation (1) trajectory prediction (1) in-context learning (1) object detection (1) scene reconstruction (1) video understanding (1) referring expression (1) visual localization (1)

Papers

Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search · ICML 2025
Scene-agnostic Pose Regression for Visual Localization · CVPR 2025
Scaling Autonomous Agents via Automatic Reward Modeling And Planning · ICLR 2025
Visual and Domain Knowledge for Professional-level Graph-of-Thought Medical Reasoning · ICML 2025
Visual Chain-of-Thought Prompting for Knowledge-Based Visual Reasoning · AAAI 2024
CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding · ICLR 2024
SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge · CVPR 2024
ContPhy: Continuum Physical Concept Learning and Reasoning from Videos · ICML 2024
GENOME: Generative Neuro-Symbolic Visual Reasoning by Growing and Reusing Modules · ICLR 2024
FlexAttention for Efficient High-Resolution Vision-Language Models · ECCV 2024
SALMON: Self-Alignment with Instructable Reward Models · ICLR 2024
3D-LLM: Injecting the 3D World into Large Language Models · NIPS 2023
Mod-Squad: Designing Mixtures of Experts As Modular Multi-Task Learners · CVPR 2023
Visual Dependency Transformers: Dependency Tree Emerges From Reversed Attention · CVPR 2023
3D Concept Learning and Reasoning From Multi-View Images · CVPR 2023
Sparse Universal Transformer · EMNLP 2023
TextPSG: Panoptic Scene Graph Generation from Textual Descriptions · ICCV 2023
Planning with Large Language Models for Code Generation · ICLR 2023
Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision · NIPS 2023
Physion++: Evaluating Physical Scene Understanding that Requires Online Inference of Different Physical Properties · NIPS 2023
ComPhy: Compositional Physical Reasoning of Objects and Events from Videos · ICLR 2022
S$^3$-NeRF: Neural Reflectance Field from Shading and Shadow under a Single Viewpoint · NIPS 2022
PS-NeRF: Neural Inverse Rendering for Multi-View Photometric Stereo · ECCV 2022
Embodied Concept Learner: Self-supervised Learning of Concepts and Mapping through Instruction Following · CORL 2022
Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language · NIPS 2021
The Blessings of Unlabeled Background in Untrimmed Videos · CVPR 2021
Grounding Physical Concepts of Objects and Events Through Dynamic Visual Reasoning · ICLR 2021
Cops-Ref: A New Dataset and Task on Compositional Referring Expression Comprehension · CVPR 2020
Weakly-Supervised Spatio-Temporally Grounding Natural Sentence in Video · ACL 2019