Bryan Catanzaro

64 papers · 2013–2026 · 12 top CS/AI conferences

Achievements

🌍 Conference Polyglot (12) 🏃 Academic Marathon (12) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🐝 Cross-Pollinator (11)

🤝 Dynamic Duo (29) 👑 Triple Crown 👥 Mega-Team (69) 🔬 Deep Specialist (10) 🏆 Keyword Champion (2) 🚀 Conference Pioneer 🗃️ Keyword Collector (209) 📈 Trend Setter ⚡ Prolific Year (10) 🔥 Unstoppable (8) ❓ The Questioner (2) 💎 Century Club (63)

Conferences

ICLR (13) NIPS (11) ICML (10) EMNLP (7) CVPR (6) ACL (5) EACL (3) ICCV (3) ECCV (2) INTERSPEECH (2) IJCNLP (1) WACV (1)

Top co-authors

Mohammad Shoeybi (30) Wei Ping (23) Andrew Tao (18) Mostofa Patwary (17) Guilin Liu (11) Peng Xu (10) Rafael Valle (9) Zihan Liu (8) Shrimai Prabhumoye (8) Kevin J. Shih (8)

Keywords

large language model (8) language model (7) text generation (4) instruction tuning (4) neural network (4) retrieval-augmented generation (4) question answering (3) video generation (3) knowledge distillation (3) zero-shot learning (2) speech synthesis (2) image generation (2) domain adaptation (2) unsupervised pretraining (2) dialogue generation (2) few-shot learning (2) reward modeling (2) contrastive learning (2) factual accuracy (2) benchmark evaluation (2)

Papers

Nemotron-CrossThink: Scaling Self-Learning beyond Math Reasoning · EACL 2026
MIND: Math Informed syNthetic Dialogues for Pretraining LLMs · ICLR 2025
MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs · ICLR 2025
RADIOv2.5: Improved Baselines for Agglomerative Vision Foundation Models · CVPR 2025
ETTA: Elucidating the Design Space of Text-to-Audio Models · ICML 2025
Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities · ICML 2025
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders · ICLR 2025
Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data · ICLR 2025
Nemotron-CORTEXA: Enhancing LLM Agents for Software Engineering Tasks via Improved Localization and Solution Diversity · ICML 2025
FeatSharp: Your Vision Model Features, Sharper · ICML 2025
UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation · ICLR 2025
ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities · ICLR 2025
NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models · ICLR 2025
Nemotron-CC: Transforming Common Crawl into a Refined Long-Horizon Pretraining Dataset · ACL 2025
Fugatto 1: Foundational Generative Audio Transformer Opus 1 · ICLR 2025
AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling · ACL 2025
Retrieval meets Long Context Large Language Models · ICLR 2024
ChatQA: Surpassing GPT-4 on Conversational QA and RAG · NIPS 2024
Compact Language Models via Pruning and Knowledge Distillation · NIPS 2024
RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs · NIPS 2024
Data, Data Everywhere: A Guide for Pretraining Dataset Construction · EMNLP 2024
LLM-Evolve: Evaluation for LLM’s Evolving Capability on Benchmarks · EMNLP 2024
ODIN: Disentangled Reward Mitigates Hacking in RLHF · ICML 2024
Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities · ICML 2024
InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining · ICML 2024
Leveraging Bitstream Metadata for Fast, Accurate, Generalized Compressed Video Quality Enhancement · WACV 2024
P-Flow: A Fast and Data-Efficient Zero-Shot TTS through Speech Prompting · NIPS 2023
CleanUNet 2: A Hybrid Speech Denoising Model on Waveform and Spectrogram · INTERSPEECH 2023
RAD-MMM: Multilingual Multiaccented Multispeaker Text To Speech · INTERSPEECH 2023
Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning · EMNLP 2023
Context Generation Improves Open Domain Question Answering · EACL 2023
Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models · ICCV 2023
BigVGAN: A Universal Neural Vocoder with Large-Scale Training · ICLR 2023
Adding Instructions during Pretraining: Effective way of Controlling Toxicity in Language Models · EACL 2023
Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study · EMNLP 2023
Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models · NIPS 2022
Evaluating Parameter Efficient Learning for Generation · EMNLP 2022
Multi-Stage Prompting for Knowledgeable Dialogue Generation · ACL 2022
Efficient Token Mixing for Transformers via Adaptive Fourier Neural Operators · ICLR 2022
Factuality Enhanced Language Models for Open-Ended Text Generation · NIPS 2022
End-to-End Training of Neural Retrievers for Open-Domain Question Answering · IJCNLP 2021
Long-Short Transformer: Efficient Transformers for Language and Vision · NIPS 2021
Dual Contrastive Loss and Attention for GANs · ICCV 2021
Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis · ICLR 2021
DiffWave: A Versatile Diffusion Model for Audio Synthesis · ICLR 2021
End-to-End Training of Neural Retrievers for Open-Domain Question Answering · ACL 2021
View Generalization for Single Image Textured 3D Models · CVPR 2021
MEGATRON-CNTRL: Controllable Story Generation with External Knowledge Using Large-Scale Language Models · EMNLP 2020
Training Question Answering Models From Synthetic Data · EMNLP 2020
Can Q-Learning with Graph Networks Learn a Generalizable Branching Heuristic for a SAT Solver? · NIPS 2020
Large Scale Multi-Actor Generative Dialog Modeling · ACL 2020
Neural FFTs for Universal Texture Image Synthesis · NIPS 2020
Panoptic-Based Image Synthesis · CVPR 2020
Few-shot Video-to-Video Synthesis · NIPS 2019
Graphical Contrastive Losses for Scene Graph Parsing · CVPR 2019
Improving Semantic Segmentation via Video Propagation and Label Relaxation · CVPR 2019
Unsupervised Video Interpolation Using Cycle Consistency · ICCV 2019
Video-to-Video Synthesis · NIPS 2018
High-Resolution Image Synthesis and Semantic Manipulation With Conditional GANs · CVPR 2018
SDC-Net: Video prediction using spatially-displaced convolution · ECCV 2018
Image Inpainting for Irregular Holes Using Partial Convolutions · ECCV 2018
Deep Speech 2: End-to-End Speech Recognition in English and Mandarin · ICML 2016
Persistent RNNs: Stashing Recurrent Weights On-Chip · ICML 2016
Deep learning with COTS HPC systems · ICML 2013