Dinesh Manocha

128 papers · 2005–2026 · 19 top CS/AI conferences

Achievements

🗺️ Taxonomy Completionist (22) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (6) 🐣 Hot Topic Early Bird 🤝 Dynamic Duo (25) 👑 Triple Crown 🏆 Grand Slam 👥 Mega-Team (34) 🔬 Deep Specialist (32) 🏆 Keyword Champion (2) 🚀 Conference Pioneer ⚡ Prolific Year (15) 🔥 Unstoppable (8) 🗃️ Keyword Collector (56) 📈 Trend Setter 💎 Century Club (121) ❓ The Questioner (4)

Conferences

EMNLP (16) CVPR (13) ICCV (11) AAAI (10) ACL (9) ICML (9) NAACL (9) ECCV (8) ICLR (8) RSS (7) WACV (7) INTERSPEECH (6) COLING (4) CORL (3) IJCAI (3) IJCNLP (2) EACL (1) AACL (1) NIPS (1)

Top co-authors

Sreyan Ghosh (27) Sonal Kumar (23) Puneet Mathur (22) Utkarsh Tyagi (17) Franck Dernoncourt (14) Ashish Seth (11) Furong Huang (10) Souradip Chakraborty (10) Aniket Bera (10) Ramani Duraiswami (9)

Research topics

Synthesis (1)

Keywords

multimodal learning (22) data augmentation (9) vision-language model (6) 3d reconstruction (5) contrastive learning (5) benchmark evaluation (5) automatic speech recognition (4) multimodal large language model (4) visual question answering (4) hallucination detection (4) few-shot learning (4) multi-task learning (4) novel view synthesis (4) in-context learning (3) video understanding (3) question answering (3) multi-modal learning (3) point cloud (3) emotion recognition (3) depth estimation (3)

Papers

DIAGRAMS: A Review Framework for Reasoning-Level Attribution in Diagram QA · ACL 2026
UAV4D: Dynamic Neural Rendering of Human-Centric UAV Imagery Using Gaussian Splatting · AAAI 2026
Bi-VLM: Binary Post-Training Quantization for Vision-Language Models · AAAI 2026
MMAU-Pro: A Challenging and Comprehensive Benchmark for Holistic Evaluation of Audio General Intelligence · AAAI 2026
MoRe: Monocular Geometry Refinement via Graph Optimization for Cross-View Consistency · WACV 2026
FIGMA: Towards FIne-Grained Music retrievAl · ACL 2026
Engagement Undermines Safety: How Stereotypes and Toxicity Shape Humor in Language Models · EACL 2026
MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark · ICLR 2025
Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data · ICLR 2025
ChartLens: Fine-grained Visual Attribution in Charts · ACL 2025
Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation · ACL 2025
IM360: Large-scale Indoor Mapping with 360 Cameras · ICCV 2025
AURELIA: Test-time Reasoning Distillation in Audio-Visual LLMs · ICCV 2025
AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs · ICCV 2025
Towards Optimal Multi-draft Speculative Decoding · ICLR 2025
Collab: Controlled Decoding using Mixture of Agents for LLM Alignment · ICLR 2025
How Learnable Grids Recover Fine Detail in Low Dimensions: A Neural Tangent Kernel Analysis of Multigrid Parametric Encodings · ICLR 2025
Visual Description Grounding Reduces Hallucinations and Boosts Reasoning in LVLMs · ICLR 2025
Imposter: Text and Frequency Guidance for Subject Driven Action Personalization using Diffusion Models · COLING 2025
Do Audio-Language Models Understand Linguistic Variations? · NAACL 2025
ProSE: Diffusion Priors for Speech Enhancement · NAACL 2025
PAT: Parameter-Free Audio-Text Aligner to Boost Zero-Shot Audio Classification · NAACL 2025
ChartEval: LLM-Driven Chart Generation Evaluation Using Scene Graph Parsing · AACL 2025
EDM: Equirectangular Projection-Oriented Dense Kernelized Feature Matching · CVPR 2025
Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment · CVPR 2025
VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation · NAACL 2025
PromptRefine: Enhancing Few-Shot Performance on Low-Resource Indic Languages with Example Selection from related Example Banks · NAACL 2025
ChartEval: LLM-Driven Chart Generation Evaluation Using Scene Graph Parsing · IJCNLP 2025
Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities · ICML 2025
Follow the Flow: Fine-grained Flowchart Attribution with Neurosymbolic Agents · EMNLP 2025
EGOILLUSION: Benchmarking Hallucinations in Egocentric Video Understanding · EMNLP 2025
MULTIVOX: A Benchmark for Evaluating Voice Assistants for Multimodal Interactions · EMNLP 2025
RELIC: Enhancing Reward Model Generalization for Low-Resource Indic Languages with Few-Shot Examples · EMNLP 2025
HALLUCINOGEN: Benchmarking Hallucination in Implicit Reasoning within Large Vision Language Models · EMNLP 2025
Bounded Rationality for LLMs: Satisficing Alignment at Inference-Time · ICML 2025
HALO: Human Preference Aligned Offline Reward Learning for Robot Navigation · CORL 2025
EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception · ICCV 2025
CoDa: Constrained Generation based Data Augmentation for Low-Resource NLP · NAACL 2024
Do Vision-Language Models Understand Compound Nouns? · NAACL 2024
Can LLM’s Generate Human-Like Wayfinding Instructions? Towards Platform-Agnostic Embodied Instruction Synthesis · NAACL 2024
LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition · INTERSPEECH 2024
Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time · ECCV 2024
V-Trans4Style: Visual Transition Recommendation for Video Production Style Adaptation · ECCV 2024
MaxMin-RLHF: Alignment with Diverse Human Preferences · ICML 2024
CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models · ICLR 2024
ASPIRE: Language-Guided Data Augmentation for Improving Robustness Against Spurious Correlations · ACL 2024
Transfer Q-star: Principled Decoding for LLM Alignment · NIPS 2024
ABEX: Data Augmentation for Low-Resource NLU via Expanding Abstract Descriptions · ACL 2024
DOC-RAG: ASR Language Model Personalization with Domain-Distributed Co-occurrence Retrieval Augmentation · COLING 2024
DocScript: Document-level Script Event Prediction · COLING 2024
Saliency-Aware Interpolative Augmentation for Multimodal Financial Prediction · COLING 2024
Position: On the Possibilities of AI-Generated Text Detection · ICML 2024
TAME-RD: Text Assisted Replication of Image Multi-Adjustments for Reverse Designing · ACL 2024
PMI Sampler: Patch Similarity Guided Frame Selection for Aerial Action Recognition · WACV 2024
MITFAS: Mutual Information Based Temporal Feature Alignment and Sampling for Aerial Video Action Recognition · WACV 2024
GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities · EMNLP 2024
EH-MAM: Easy-to-Hard Masked Acoustic Modeling for Self-Supervised Speech Representation Learning · EMNLP 2024
DocEdit-v2: Document Structure Editing Via Multimodal LLM Grounding · EMNLP 2024
IntCoOp: Interpretability-Aware Vision-Language Prompt Tuning · EMNLP 2024
AV-RIR: Audio-Visual Room Impulse Response Estimation · CVPR 2024
LTM: Lightweight Textured Mesh Extraction and Refinement of Large Unbounded Scenes for Efficient Storage and Real-time Rendering · CVPR 2024
HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models · CVPR 2024
MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models · CVPR 2024
AutoHallusion: Automatic Generation of Hallucination Benchmarks for Vision-Language Models · EMNLP 2024
PARL: A Unified Framework for Policy Alignment in Reinforcement Learning from Human Feedback · ICLR 2024
Towards Global Optimality for Practical Average Reward Reinforcement Learning without Mixing Time Oracles · ICML 2024
A Closer Look at the Limitations of Instruction Tuning · ICML 2024
Beyond Exponentially Fast Mixing in Average-Reward Reinforcement Learning via Multi-Level Monte Carlo Actor-Critic · ICML 2023
Intent-Aware Planning in Heterogeneous Traffic via Distributed Multi-Agent Reinforcement Learning · CORL 2023
DocEdit: Language-Guided Document Editing · AAAI 2023
Posterior Coreset Construction with Kernelized Stein Discrepancy for Model-Based Reinforcement Learning · AAAI 2023
ACLM: A Selective-Denoising based Generative Data Augmentation Approach for Low-Resource Complex NER · ACL 2023
TMO: Textured Mesh Acquisition of Objects With a Mobile Device by Using Differentiable Rendering · CVPR 2023
CoSyn: Detecting Implicit Hate Speech in Online Conversations Using a Context Synergized Hyperbolic Network · EMNLP 2023
DALE: Generative Data Augmentation for Low-Resource Legal NLP · EMNLP 2023
APoLLo: Unified Adapter and Prompt Learning for Vision Language Models · EMNLP 2023
PersonaLM: Language Model Personalization via Domain-distributed Span Aggregated K-Nearest N-gram Retrieval Augmentation · EMNLP 2023
LoLep: Single-View View Synthesis with Locally-Learned Planes and Self-Attention Occlusion Inference · ICCV 2023
CrossLoc3D: Aerial-Ground Cross-Source 3D Place Recognition · ICCV 2023
AdVerb: Visually Guided Audio Dereverberation · ICCV 2023
STEERING: Stein Information Directed Exploration for Model-Based Reinforcement Learning · ICML 2023
MMER: Multimodal Multi-task Learning for Speech Emotion Recognition · INTERSPEECH 2023
LayerDoc: Layer-Wise Extraction of Spatial Hierarchical Structure in Visually-Rich Documents · WACV 2023
Placing Human Animations Into 3D Scenes by Learning Interaction- and Geometry-Driven Keyframes · WACV 2023
SALAD: Source-Free Active Label-Agnostic Domain Adaptation for Classification, Segmentation and Detection · WACV 2023
D2-TPred: Discontinuous Dependency for Trajectory Prediction under Traffic Lights · ECCV 2022
A Repulsive Force Unit for Garment Collision Handling in Neural Networks · ECCV 2022
STCrowd: A Multimodal Dataset for Pedestrian Perception in Crowded Scenes · CVPR 2022
3MASSIV: Multilingual, Multimodal and Multi-Aspect Dataset of Social Media Short Videos · CVPR 2022
DocFin: Multimodal Financial Prediction and Bias Mitigation using Semi-structured Documents · EMNLP 2022
N-Penetrate: Active Learning of Neural Collision Handler for Complex 3D Mesh Deformations · ICML 2022
DocInfer: Document-level Natural Language Inference using Optimal Evidence Selection · EMNLP 2022
M3DETR: Multi-Representation, Multi-Scale, Mutual-Relation 3D Object Detection With Transformers · WACV 2022
HTRON: Efficient Outdoor Navigation with Sparse Rewards via Heavy Tailed Adaptive Reinforce Algorithm · CORL 2022
DocLayoutTTS: Dataset and Baselines for Layout-informed Document-level Neural Speech Synthesis · INTERSPEECH 2022
PISA: PoIncaré Saliency-Aware Interpolative Augmentation · INTERSPEECH 2022
TNS: Terrain Traversability Mapping and Navigation System for Autonomous Excavators · RSS 2022
FAR: Fourier Aerial Video Recognition · ECCV 2022
DocTime: A Document-level Temporal Dependency Graph Parser · NAACL 2022
Human Trajectory Prediction via Neural Social Physics · ECCV 2022
HighlightMe: Detecting Highlights From Human-Centric Videos · ICCV 2021
Point-based Acoustic Scattering for Interactive Sound Propagation via Surface Encoding · IJCAI 2021
TIMERS: Document-level Temporal Relation Extraction · IJCNLP 2021
IR-GAN: Room Impulse Response Generator for Far-Field Speech Recognition · INTERSPEECH 2021
Affect2MM: Affective Analysis of Multimedia Content Using Emotion Causality · CVPR 2021
TIMERS: Document-level Temporal Relation Extraction · ACL 2021
LCollision: Fast Generation of Collision-Free Human Poses using Learned Non-Penetration Constraints · AAAI 2021
Robust 2D/3D Vehicle Parsing in Arbitrary Camera Views for CVIS · ICCV 2021
DnD: Dense Depth Estimation in Crowded Dynamic Indoor Scenes · ICCV 2021
AutoTrajectory: Label-free Trajectory Extraction and Prediction from Videos using Dynamic Points · ECCV 2020
Deep Differentiable Grasp Planner for High-DOF Grippers · RSS 2020
M3ER: Multiplicative Multimodal Emotion Recognition using Facial, Textual, and Speech Cues · AAAI 2020
STEP: Spatial Temporal Graph Convolutional Networks for Emotion Perception from Gaits · AAAI 2020
Crowd-Steer: Realtime Smooth and Collision-Free Robot Navigation in Densely Crowded Scenarios Trained using High-Fidelity Simulation · IJCAI 2020
EmotiCon: Context-Aware Multimodal Emotion Recognition Using Frege's Principle · CVPR 2020
Take an Emotion Walk: Perceiving Emotions from Gaits Using Hierarchical Attention Pooling and Affective Mapping · ECCV 2020
NeoNav: Improving the Generalization of Visual Navigation via Generating Next Expected Observations · AAAI 2020
HMPO: Human Motion Prediction in Occluded Environments for Safe Motion Planning · RSS 2020
TraPHic: Trajectory Prediction in Dense and Heterogeneous Traffic Using Weighted Interactions · CVPR 2019
VV-Net: Voxel VAE Net With Group Convolutions for Point Cloud Segmentation · ICCV 2019
Regression and Classification for Direction-of-Arrival Estimation with Convolutional Recurrent Neural Networks · INTERSPEECH 2019
TrafficPredict: Trajectory Prediction for Heterogeneous Traffic-Agents · AAAI 2019
Aggressive, Tense or Shy? Identifying Personality Traits from Crowd Videos · IJCAI 2017
Intention-Aware Motion Planning Using Learning Based Human Motion Prediction · RSS 2017
3D Reconstruction in the Presence of Glasses by Acoustic and Stereo Fusion · CVPR 2015
Collision-Free and Curvature-Continuous Path Smoothing In Cluttered Environments · RSS 2011
Star-shaped Roadmaps - A Deterministic Sampling Approach for Complete Motion Planning · RSS 2005
Path Planning for Deformable Robots in Complex Environments · RSS 2005