Yong Man Ro
45 papers · 2018–2026 · 9 top CS/AI conferences
Achievements
Conference Polyglot (9)
Keyword Pioneer
Interdisciplinary Bridge
Renaissance Researcher (5)
Academic Marathon (7)
Cross-Pollinator (4)
Taxonomy Completionist (85)
Dynamic Duo (14)
Deep Specialist (18)
Keyword Champion (3)
Topic Evolution
Conference Pioneer
Keyword Collector (223)
Trend Setter
Prolific Year (10)
Unstoppable (8)
The Questioner (2)
Century Club (44)
Conferences
AAAI (11)
CVPR (10)
ECCV (6)
ICCV (6)
NIPS (4)
EMNLP (3)
ACL (2)
INTERSPEECH (2)
ICML (1)
Keywords
multimodal learning (7)
lip reading (7)
large language model (5)
audio-visual speech recognition (4)
memory network (4)
visual speech recognition (3)
model compression (3)
vision language model (3)
pedestrian detection (3)
representation learning (3)
visual instruction tuning (3)
adversarial robustness (3)
object detection (3)
causal inference (3)
diffusion model (2)
large multimodal model (2)
visual context (2)
audio-visual learning (2)
multispectral imaging (2)
speech synthesis (2)
Papers
Emotion-Coherent Reasoning for Multimodal LLMs via Emotional Rationale Verifier
AAAI 2026
MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens
ACL 2025
Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language
AAAI 2025
Long-Form Speech Generation with Spoken Language Models
ICML 2025
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models
CVPR 2025
SALOVA: Segment-Augmented Long Video Assistant for Targeted Retrieval and Routing in Long-Form Video Analysis
CVPR 2025
Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations
ICCV 2025
What if...?: Thinking Counterfactual Keywords Helps to Mitigate Hallucination in Large Multi-modal Models
EMNLP 2024
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
NIPS 2024
CODE: Contrasting Self-generated Description to Combat Hallucination in Large Multi-modal Models
NIPS 2024
Improving Open Set Recognition via Visual Prompts Distilled from Common-Sense Knowledge
AAAI 2024
CoLLaVO: Crayon Large Language and Vision mOdel
ACL 2024
AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
CVPR 2024
Causal Mode Multiplexer: A Novel Framework for Unbiased Multispectral Pedestrian Detection
CVPR 2024
MoAI: Mixture of All Intelligence for Large Language and Vision Models
ECCV 2024
TroL: Traversal of Layers for Large Language and Vision Models
EMNLP 2024
Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing
EMNLP 2024
Intelligible Lip-to-Speech Synthesis with Speech Units
INTERSPEECH 2023
Watch or Listen: Robust Audio-Visual Speech Recognition With Visual Corruption Modeling and Reliability Scoring
CVPR 2023
Demystifying Causal Features on Adversarial Examples and Causal Inoculation for Robust Network by Adversarial Instrumental Variable Regression
CVPR 2023
Mitigating Adversarial Vulnerability through Causal Parameter Estimation by Adversarial Double Machine Learning
ICCV 2023
Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge
ICCV 2023
Multispectral Invisible Coating: Laminated Visible-Thermal Physical Attack against Multispectral Object Detectors Using Transparent Low-E Films
AAAI 2023
Deep Visual Forced Alignment: Learning to Align Transcription with Talking Face Video
AAAI 2023
DiffV2S: Diffusion-Based Video-to-Speech Synthesis with Vision-Guided Speaker Embedding
ICCV 2023
SyncTalkFace: Talking Face Generation with Precise Lip-Syncing via Audio-Lip Memory
AAAI 2022
Visual Context-driven Audio Feature Enhancement for Robust End-to-End Audio-Visual Speech Recognition
INTERSPEECH 2022
Weakly Paired Associative Learning for Sound and Image Representations via Bimodal Associative Memory
CVPR 2022
Masking Adversarial Damage: Finding Adversarial Saliency for Robust and Sparse Network
CVPR 2022
Audio-Visual Mismatch-Aware Video Retrieval via Association and Adjustment
ECCV 2022
Distinguishing Homophenes Using Multi-Head Visual-Audio Memory for Lip Reading
AAAI 2022
VisageSynTalk: Unseen Speaker Video-to-Speech Synthesis via Speech-Visage Feature Selection
ECCV 2022
Speaker-Adaptive Lip Reading with User-Dependent Padding
ECCV 2022
Towards Versatile Pedestrian Detector with Multisensory-Matching and Multispectral Recalling Memory
AAAI 2022
Video Prediction Recalling Long-Term Motion Context via Memory Alignment Learning
CVPR 2021
Distilling Robust and Non-Robust Features in Adversarial Examples by Information Bottleneck
NIPS 2021
Towards a Better Understanding of VR Sickness: Physical Symptom Prediction for VR Contents
AAAI 2021
Visual Comfort Aware-Reinforcement Learning for Depth Adjustment of Stereoscopic 3D Images
AAAI 2021
Lip to Speech Synthesis with Visual Context Attentional GAN
NIPS 2021
Multi-Modality Associative Bridging Through Memory: Speech Sound Recollected From Face Video
ICCV 2021
Robust Small-Scale Pedestrian Detection With Cued Recall via Memory Learning
ICCV 2021
SACA Net: Cybersickness Assessment of Individual Viewers for VR Content via Graph-based Symptom Relation Embedding
ECCV 2020
Structure Boundary Preserving Segmentation for Medical Image With Ambiguous Boundary
CVPR 2020
Mode Variational LSTM Robust to Unseen Modes of Variation: Application to Facial Expression Recognition
AAAI 2019
Facial Dynamics Interpreter Network: What are the Important Relations between Local Dynamics for Facial Trait Estimation?
ECCV 2018