Yifan Peng

48 papers · 2013–2026 · 10 top CS/AI conferences

Achievements


🗺️ Taxonomy Completionist (16) 🧭 Keyword Pioneer 🌈 Renaissance Researcher (6) 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (10)

🐝 Cross-Pollinator (10) 🤝 Dynamic Duo (23) 🧬 Topic Evolution 👥 Mega-Team (21) 🏆 Keyword Champion (3) 🗃️ Keyword Collector (211) 🚀 Conference Pioneer ⚡ Prolific Year (10) 📈 Trend Setter 💎 Century Club (45) 🔥 Unstoppable (9)

Conferences

ACL (11) CVPR (10) INTERSPEECH (10) NAACL (6) AAAI (2) EMNLP (2) ICCV (2) ICML (2) WACV (2) ICLR (1)

Top co-authors

Shinji Watanabe (23) William Chen (9) Siddhant Arora (8) Jiatong Shi (7) Brian Yan (7) Zhiyong Lu (6) Jinchuan Tian (6) Yui Sudo (5) Soumi Maiti (4) Ying Ding (4)

Keywords

automatic speech recognition (6) medical imaging (6) speech recognition (6) contrastive learning (4) large language model (4) speech translation (4) clinical text (3) multi-label classification (3) named entity recognition (3) speech processing (3) disease classification (3) radiology report (3) spoken language understanding (3) self-supervised learning (3) end-to-end asr (3) multi-task learning (2) language model (2) model compression (2) multimodal learning (2) zero-shot learning (2)

Papers

A Disease-Aware Dual-Stage Framework for Chest X-ray Report Generation AAAI 2026
MARCH: Multi-Agent Radiology Clinical Hierarchy for CT Report Generation ACL 2026
Improving Retrieval-Augmented Generation without Taxonomy-based Error Categorization ACL 2026
ESPnet-SpeechLM: An Open Speech Language Model Toolkit NAACL 2025
Natural Language Processing in Support of Evidence-based Medicine: A Scoping Review ACL 2025
Glossy Object Reconstruction with Cost-effective Polarized Acquisition CVPR 2025
Seeing Far and Clearly: Mitigating Hallucinations in MLLMs with Attention Causal Decoding CVPR 2025
Learned Binocular-Encoding Optics for RGBD Imaging Using Joint Stereo and Focus Cues CVPR 2025
Context-aware Dynamic Pruning for Speech Foundation Models ICLR 2025
OWLS: Scaling Laws for Multilingual Speech Recognition and Translation Models ICML 2025
VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning NAACL 2025
Enhancing Audiovisual Speech Recognition Through Bifocal Preference Optimization AAAI 2025
ESPnet-SDS: Unified Toolkit and Demo for Spoken Dialogue Systems NAACL 2025
Towards Robust Speech Representation Learning for Thousands of Languages EMNLP 2024
OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification ACL 2024
On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models INTERSPEECH 2024
Learned Scanpaths Aid Blind Panoramic Video Quality Assessment CVPR 2024
Contextualized End-to-end Automatic Speech Recognition with Intermediate Biasing Loss INTERSPEECH 2024
OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer INTERSPEECH 2024
MULTI-CONVFORMER: Extending Conformer with Multiple Convolution Kernels INTERSPEECH 2024
UniverSLU: Universal Spoken Language Understanding for Diverse Tasks with Natural Language Instructions NAACL 2024
CMU’s IWSLT 2023 Simultaneous Speech Translation System ACL 2023
Learning a Room with the Occ-SDF Hybrid: Signed Distance Function Mingled with Occupancy Aids Scene Representation ICCV 2023
Attend Who Is Weak: Pruning-Assisted Medical Image Localization Under Sophisticated and Implicit Imbalances WACV 2023
DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models INTERSPEECH 2023
Tensor decomposition for minimization of E2E SLU model toward on-device processing INTERSPEECH 2023
A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks INTERSPEECH 2023
Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute INTERSPEECH 2023
Time-synchronous one-pass Beam Search for Parallel Online and Offline Transducers with Dynamic Block Training INTERSPEECH 2023
ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit ACL 2023
Less Likely Brainstorming: Using Language Models to Generate Alternative Hypotheses ACL 2023
CMU’s IWSLT 2022 Dialect Speech Translation System ACL 2022
Knowledge-Augmented Contrastive Learning for Abnormality Classification and Localization in Chest X-Rays With Radiomics Using a Feedback Loop WACV 2022
Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding ICML 2022
EchoGen: Generating Conclusions from Echocardiogram Notes ACL 2022
Attention Weight Smoothing Using Prior Distributions for Transformer-Based End-to-End ASR INTERSPEECH 2022
Leveraging Deep Representations of Radiology Reports in Survival Analysis for Predicting Heart Failure Patient Mortality NAACL 2021
Improving BERT Model Using Contrastive Learning for Biomedical Relation Extraction NAACL 2021
Automatic recognition of abdominal lymph nodes from clinical text EMNLP 2020
An Empirical Study of Multi-Task Learning on BERT for Biomedical Text Mining ACL 2020
Deep Optics for Single-Shot High-Dynamic-Range Imaging CVPR 2020
Holistic and Comprehensive Annotation of Clinically Significant Findings on Diverse CT Images: Learning From Radiology Reports and Label Ontology CVPR 2019
Transfer Learning in Biomedical Natural Language Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets ACL 2019
Depth and Transient Imaging With Compressive SPAD Array Cameras CVPR 2018
TieNet: Text-Image Embedding Network for Common Thorax Disease Classification and Reporting in Chest X-Rays CVPR 2018
ChestX-ray8: Hospital-Scale Chest X-Ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases CVPR 2017
Revisiting Cross-Channel Information Transfer for Chromatic Aberration Correction ICCV 2017
Studying Relationships between Human Gaze, Description, and Computer Vision CVPR 2013