Ruoming Pang

33 papers · 2018–2025 · 10 top CS/AI conferences

Achievements

🧭 Keyword Pioneer 🗺️ Taxonomy Completionist (11) 🌈 Renaissance Researcher (5) 🌉 Interdisciplinary Bridge

🌍 Conference Polyglot (10) 🏃 Academic Marathon (7) 🐝 Cross-Pollinator (12) 🤝 Dynamic Duo (10) 👥 Mega-Team (29) 🧬 Topic Evolution ⚡ Prolific Year (5) 🔥 Unstoppable (8) 📈 Trend Setter 💎 Century Club (33) 🗃️ Keyword Collector (115) ❓ The Questioner

Conferences

INTERSPEECH (13) ICLR (6) ACL (3) CVPR (3) ECCV (2) NAACL (2) EMNLP (1) ICCV (1) ICML (1) NIPS (1)

Top co-authors

Chung-Cheng Chiu (10) Yonghui Wu (8) Jiahui Yu (6) Tara N. Sainath (5) Mingxing Tan (5) Yu Zhang (5) James Qin (5) Bowen Zhang (4) Wei Han (4) Bo Li (4)

Keywords

speech recognition (4) end-to-end model (4) automatic speech recognition (4) convolutional neural network (4) word error rate (3) latency optimization (3) large language model (3) on-device speech recognition (3) language model (3) neural architecture search (3) shallow fusion (2) recurrent neural network transducer (2) object detection (2) end-to-end speech recognition (2) tool use (2) domain adaptation (2) model scaling (2) efficient computing (2) knowledge distillation (2) attention mechanism (2)

Papers

MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains NAACL 2025
Improve Vision Language Model Chain-of-thought Reasoning ACL 2025
Can External Validation Tools Improve Annotation Quality for LLM-as-a-Judge? ACL 2025
Talking Turns: Benchmarking Audio Foundation Models on Turn-Taking Dynamics ICLR 2025
EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing ICLR 2025
Step-by-Step Reasoning for Math Problems via Twisted Sequential Monte Carlo ICLR 2025
Instruction-Following Pruning for Large Language Models ICML 2025
ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities NAACL 2025
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training ECCV 2024
STAIR: Learning Sparse Text and Image Representation in Grounded Tokens EMNLP 2023
Vector-quantized Image Modeling with Improved VQGAN ICLR 2022
Sentence-Select: Large-Scale Language Model Data Selection for Rare-Word Speech Recognition INTERSPEECH 2022
A Language Agnostic Multilingual Streaming On-Device ASR System INTERSPEECH 2022
Bridging the Gap Between Streaming and Non-Streaming ASR Systems by Distilling Ensembles of CTC and RNN-T Models INTERSPEECH 2021
Searching for Fast Model Families on Datacenter Accelerators CVPR 2021
Dual-mode ASR: Unify and Improve Streaming ASR with Full-context Modeling ICLR 2021
Unsupervised Learning of Disentangled Speech Content and Style Representation INTERSPEECH 2021
An Efficient Streaming Non-Recurrent On-Device End-to-End Model with Improvements to Rare-Word Modeling INTERSPEECH 2021
BigNAS: Scaling Up Neural Architecture Search with Big Single-Stage Models ECCV 2020
Parallel Rescoring with Transformer for Streaming On-Device Speech Recognition INTERSPEECH 2020
ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context INTERSPEECH 2020
Emitting Word Timings with End-to-End Models INTERSPEECH 2020
Improving Tail Performance of a Deliberation E2E ASR Model Using a Large Text Corpus INTERSPEECH 2020
Conformer: Convolution-augmented Transformer for Speech Recognition INTERSPEECH 2020
EfficientDet: Scalable and Efficient Object Detection CVPR 2020
Hierarchical Generative Modeling for Controllable Speech Synthesis ICLR 2019
Monotonic Infinite Lookback Attention for Simultaneous Machine Translation ACL 2019
Shallow-Fusion End-to-End Contextual Biasing INTERSPEECH 2019
Two-Pass End-to-End Speech Recognition INTERSPEECH 2019
MnasNet: Platform-Aware Neural Architecture Search for Mobile CVPR 2019
Searching for MobileNetV3 ICCV 2019
Compression of End-to-End Models INTERSPEECH 2018
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis NIPS 2018