Ruohan Gao

32 papers · 2017–2025 · 7 top CS/AI conferences

Achievements

🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (8) 🏃 Academic Marathon (8) 🌍 Conference Polyglot (7) 🗺️ Taxonomy Completionist (36) 🔬 Deep Specialist (13) 🤝 Dynamic Duo (11) 🧬 Topic Evolution 🏆 Keyword Champion (2) ⚡ Prolific Year (5) 💎 Century Club (32) 🗃️ Keyword Collector (118) 🚀 Conference Pioneer 🔥 Unstoppable (9)

Conferences

CVPR (12) ICCV (7) ECCV (5) CORL (4) ICLR (2) AAAI (1) NIPS (1)

Top co-authors

Kristen Grauman (11) Jiajun Wu (10) Li Fei-Fei (5) Samuel Clarke (5) Vamsi Krishna Ithapu (4) Sanjoy Chowdhury (4) Sayan Nag (4) Dinesh Manocha (4) Ishwarya Ananthabhotla (4) Changan Chen (3)

Keywords

multimodal learning (10) audio-visual learning (5) differentiable rendering (3) object recognition (3) room acoustics (3) action recognition (3) room impulse response (3) 3d reconstruction (2) multisensory learning (2) neural network (2) impact sound (2) tactile sensing (2) multisensory perception (2) video classification (1) source separation (1) robotic manipulation (1) speech separation (1) sim-to-real transfer (1) human detection (1) cross-modal learning (1)

Papers

Learning to Highlight Audio by Watching Movies · CVPR 2025
AURELIA: Test-time Reasoning Distillation in Audio-Visual LLMs · ICCV 2025
GenFlowRL: Shaping Rewards with Generative Object-Centric Flow in Visual Reinforcement Learning · ICCV 2025
EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception · ICCV 2025
Hearing Anywhere in Any Environment · CVPR 2025
AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs · ICCV 2025
Differentiable Room Acoustic Rendering with Multi-View Vision Priors · ICCV 2025
Multisensory Machine Intelligence · AAAI 2025
The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective · CVPR 2024
Hearing Anything Anywhere · CVPR 2024
Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time · ECCV 2024
Spherical World-Locking for Audio-Visual Localization in Egocentric Videos · ECCV 2024
An Extensible Multi-modal Multi-task Object Dataset with Materials · ICLR 2023
NOIR: Neural Signal Operated Intelligent Robots for Everyday Activities · CORL 2023
RealImpact: A Dataset of Impact Sound Fields for Real Objects · CVPR 2023
The ObjectFolder Benchmark: Multisensory Learning With Neural and Real Objects · CVPR 2023
SoundCam: A Dataset for Finding Humans Using Room Acoustics · NIPS 2023
ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer · CVPR 2022
See, Hear, and Feel: Smart Sensory Fusion for Robotic Manipulation · CORL 2022
Visual Acoustic Matching · CVPR 2022
DiffImpact: Differentiable Rendering and Identification of Impact Sounds · CORL 2021
ObjectFolder: A Dataset of Objects with Implicit Visual, Auditory, and Tactile Representations · CORL 2021
VisualVoice: Audio-Visual Speech Separation With Cross-Modal Consistency · CVPR 2021
Learning to Set Waypoints for Audio-Visual Navigation · ICLR 2021
Listen to Look: Action Recognition by Previewing Audio · CVPR 2020
VisualEchoes: Spatial Image Representation Learning through Echolocation · ECCV 2020
Co-Separating Sounds of Visual Objects · ICCV 2019
2.5D Visual Sound · CVPR 2019
ShapeCodes: Self-Supervised Feature Learning by Lifting Views to Viewgrids · ECCV 2018
Im2Flow: Motion Hallucination From Static Images for Action Recognition · CVPR 2018
Learning to Separate Object Sounds by Watching Unlabeled Video · ECCV 2018
On-Demand Learning for Deep Image Restoration · ICCV 2017