Dinesh Manocha
128 papers · 2005–2026 · 19 top CS/AI conferences
Achievements
Taxonomy Completionist (22)
Keyword Pioneer
Interdisciplinary Bridge
Renaissance Researcher (6)
Hot Topic Early Bird
Dynamic Duo (25)
Triple Crown
Grand Slam
Mega-Team (34)
Deep Specialist (32)
Keyword Champion (2)
Conference Pioneer
Prolific Year (15)
Unstoppable (8)
Keyword Collector (56)
Trend Setter
Century Club (121)
The Questioner (4)
Conferences
EMNLP (16)
CVPR (13)
ICCV (11)
AAAI (10)
ACL (9)
ICML (9)
NAACL (9)
ECCV (8)
ICLR (8)
RSS (7)
WACV (7)
INTERSPEECH (6)
COLING (4)
CORL (3)
IJCAI (3)
IJCNLP (2)
EACL (1)
AACL (1)
NIPS (1)
Keywords
multimodal learning (22)
data augmentation (9)
vision-language model (6)
3d reconstruction (5)
contrastive learning (5)
benchmark evaluation (5)
automatic speech recognition (4)
multimodal large language model (4)
visual question answering (4)
hallucination detection (4)
few-shot learning (4)
multi-task learning (4)
novel view synthesis (4)
in-context learning (3)
video understanding (3)
question answering (3)
multi-modal learning (3)
point cloud (3)
emotion recognition (3)
depth estimation (3)
Papers
DIAGRAMS : A Review Framework for Reasoning-Level Attribution in Diagram QA
ACL 2026
UAV4D: Dynamic Neural Rendering of Human-Centric UAV Imagery Using Gaussian Splatting
AAAI 2026
Bi-VLM: Binary Post-Training Quantization for Vision-Language Models
AAAI 2026
MMAU-Pro: A Challenging and Comprehensive Benchmark for Holistic Evaluation of Audio General Intelligence
AAAI 2026
MoRe: Monocular Geometry Refinement via Graph Optimization for Cross-View Consistency
WACV 2026
FIGMA: Towards FIne-Grained Music retrievAl
ACL 2026
Engagement Undermines Safety: How Stereotypes and Toxicity Shape Humor in Language Models
EACL 2026
MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark
ICLR 2025
Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data
ICLR 2025
ChartLens: Fine-grained Visual Attribution in Charts
ACL 2025
Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation
ACL 2025
IM360: Large-scale Indoor Mapping with 360 Cameras
ICCV 2025
AURELIA: Test-time Reasoning Distillation in Audio-Visual LLMs
ICCV 2025
AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs
ICCV 2025
Towards Optimal Multi-draft Speculative Decoding
ICLR 2025
Collab: Controlled Decoding using Mixture of Agents for LLM Alignment
ICLR 2025
How Learnable Grids Recover Fine Detail in Low Dimensions: A Neural Tangent Kernel Analysis of Multigrid Parametric Encodings
ICLR 2025
Visual Description Grounding Reduces Hallucinations and Boosts Reasoning in LVLMs
ICLR 2025
Imposter: Text and Frequency Guidance for Subject Driven Action Personalization using Diffusion Models
COLING 2025
Do Audio-Language Models Understand Linguistic Variations?
NAACL 2025
ProSE: Diffusion Priors for Speech Enhancement
NAACL 2025
PAT: Parameter-Free Audio-Text Aligner to Boost Zero-Shot Audio Classification
NAACL 2025
ChartEval: LLM-Driven Chart Generation Evaluation Using Scene Graph Parsing
AACL 2025
EDM: Equirectangular Projection-Oriented Dense Kernelized Feature Matching
CVPR 2025
Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment
CVPR 2025
VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation
NAACL 2025
PromptRefine: Enhancing Few-Shot Performance on Low-Resource Indic Languages with Example Selection from related Example Banks
NAACL 2025
ChartEval: LLM-Driven Chart Generation Evaluation Using Scene Graph Parsing
IJCNLP 2025
Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities
ICML 2025
Follow the Flow: Fine-grained Flowchart Attribution with Neurosymbolic Agents
EMNLP 2025
EGOILLUSION: Benchmarking Hallucinations in Egocentric Video Understanding
EMNLP 2025
MULTIVOX: A Benchmark for Evaluating Voice Assistants for Multimodal Interactions
EMNLP 2025
RELIC: Enhancing Reward Model Generalization for Low-Resource Indic Languages with Few-Shot Examples
EMNLP 2025
HALLUCINOGEN: Benchmarking Hallucination in Implicit Reasoning within Large Vision Language Models
EMNLP 2025
Bounded Rationality for LLMs: Satisficing Alignment at Inference-Time
ICML 2025
HALO : Human Preference Aligned Offline Reward Learning for Robot Navigation
CORL 2025
EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception
ICCV 2025
CoDa: Constrained Generation based Data Augmentation for Low-Resource NLP
NAACL 2024
Do Vision-Language Models Understand Compound Nouns?
NAACL 2024
Can LLM's Generate Human-Like Wayfinding Instructions? Towards Platform-Agnostic Embodied Instruction Synthesis
NAACL 2024
LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition
INTERSPEECH 2024
Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time
ECCV 2024
V-Trans4Style: Visual Transition Recommendation for Video Production Style Adaptation
ECCV 2024
MaxMin-RLHF: Alignment with Diverse Human Preferences
ICML 2024
CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models
ICLR 2024
ASPIRE: Language-Guided Data Augmentation for Improving Robustness Against Spurious Correlations
ACL 2024
Transfer Q-star : Principled Decoding for LLM Alignment
NIPS 2024
ABEX: Data Augmentation for Low-Resource NLU via Expanding Abstract Descriptions
ACL 2024
DOC-RAG: ASR Language Model Personalization with Domain-Distributed Co-occurrence Retrieval Augmentation
COLING 2024
DocScript: Document-level Script Event Prediction
COLING 2024
Saliency-Aware Interpolative Augmentation for Multimodal Financial Prediction
COLING 2024
Position: On the Possibilities of AI-Generated Text Detection
ICML 2024
TAME-RD: Text Assisted Replication of Image Multi-Adjustments for Reverse Designing
ACL 2024
PMI Sampler: Patch Similarity Guided Frame Selection for Aerial Action Recognition
WACV 2024
MITFAS: Mutual Information Based Temporal Feature Alignment and Sampling for Aerial Video Action Recognition
WACV 2024
GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities
EMNLP 2024
EH-MAM: Easy-to-Hard Masked Acoustic Modeling for Self-Supervised Speech Representation Learning
EMNLP 2024
DocEdit-v2: Document Structure Editing Via Multimodal LLM Grounding
EMNLP 2024
IntCoOp: Interpretability-Aware Vision-Language Prompt Tuning
EMNLP 2024
AV-RIR: Audio-Visual Room Impulse Response Estimation
CVPR 2024
LTM: Lightweight Textured Mesh Extraction and Refinement of Large Unbounded Scenes for Efficient Storage and Real-time Rendering
CVPR 2024
HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models
CVPR 2024
MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models
CVPR 2024
AutoHallusion: Automatic Generation of Hallucination Benchmarks for Vision-Language Models
EMNLP 2024
PARL: A Unified Framework for Policy Alignment in Reinforcement Learning from Human Feedback
ICLR 2024
Towards Global Optimality for Practical Average Reward Reinforcement Learning without Mixing Time Oracles
ICML 2024
A Closer Look at the Limitations of Instruction Tuning
ICML 2024
Beyond Exponentially Fast Mixing in Average-Reward Reinforcement Learning via Multi-Level Monte Carlo Actor-Critic
ICML 2023
Intent-Aware Planning in Heterogeneous Traffic via Distributed Multi-Agent Reinforcement Learning
CORL 2023
DocEdit: Language-Guided Document Editing
AAAI 2023
Posterior Coreset Construction with Kernelized Stein Discrepancy for Model-Based Reinforcement Learning
AAAI 2023
ACLM: A Selective-Denoising based Generative Data Augmentation Approach for Low-Resource Complex NER
ACL 2023
TMO: Textured Mesh Acquisition of Objects With a Mobile Device by Using Differentiable Rendering
CVPR 2023
CoSyn: Detecting Implicit Hate Speech in Online Conversations Using a Context Synergized Hyperbolic Network
EMNLP 2023
DALE: Generative Data Augmentation for Low-Resource Legal NLP
EMNLP 2023
APoLLo : Unified Adapter and Prompt Learning for Vision Language Models
EMNLP 2023
PersonaLM: Language Model Personalization via Domain-distributed Span Aggregated K-Nearest N-gram Retrieval Augmentation
EMNLP 2023
LoLep: Single-View View Synthesis with Locally-Learned Planes and Self-Attention Occlusion Inference
ICCV 2023
CrossLoc3D: Aerial-Ground Cross-Source 3D Place Recognition
ICCV 2023
AdVerb: Visually Guided Audio Dereverberation
ICCV 2023
STEERING : Stein Information Directed Exploration for Model-Based Reinforcement Learning
ICML 2023
MMER: Multimodal Multi-task Learning for Speech Emotion Recognition
INTERSPEECH 2023
LayerDoc: Layer-Wise Extraction of Spatial Hierarchical Structure in Visually-Rich Documents
WACV 2023
Placing Human Animations Into 3D Scenes by Learning Interaction- and Geometry-Driven Keyframes
WACV 2023
SALAD: Source-Free Active Label-Agnostic Domain Adaptation for Classification, Segmentation and Detection
WACV 2023
D2-TPred: Discontinuous Dependency for Trajectory Prediction under Traffic Lights
ECCV 2022
A Repulsive Force Unit for Garment Collision Handling in Neural Networks
ECCV 2022
STCrowd: A Multimodal Dataset for Pedestrian Perception in Crowded Scenes
CVPR 2022
3MASSIV: Multilingual, Multimodal and Multi-Aspect Dataset of Social Media Short Videos
CVPR 2022
DocFin: Multimodal Financial Prediction and Bias Mitigation using Semi-structured Documents
EMNLP 2022
N-Penetrate: Active Learning of Neural Collision Handler for Complex 3D Mesh Deformations
ICML 2022
DocInfer: Document-level Natural Language Inference using Optimal Evidence Selection
EMNLP 2022
M3DETR: Multi-Representation, Multi-Scale, Mutual-Relation 3D Object Detection With Transformers
WACV 2022
HTRON: Efficient Outdoor Navigation with Sparse Rewards via Heavy Tailed Adaptive Reinforce Algorithm
CORL 2022
DocLayoutTTS: Dataset and Baselines for Layout-informed Document-level Neural Speech Synthesis
INTERSPEECH 2022
PISA: PoIncaré Saliency-Aware Interpolative Augmentation
INTERSPEECH 2022
TNS: Terrain Traversability Mapping and Navigation System for Autonomous Excavators
RSS 2022
FAR: Fourier Aerial Video Recognition
ECCV 2022
DocTime: A Document-level Temporal Dependency Graph Parser
NAACL 2022
Human Trajectory Prediction via Neural Social Physics
ECCV 2022
HighlightMe: Detecting Highlights From Human-Centric Videos
ICCV 2021
Point-based Acoustic Scattering for Interactive Sound Propagation via Surface Encoding
IJCAI 2021
TIMERS: Document-level Temporal Relation Extraction
IJCNLP 2021
IR-GAN: Room Impulse Response Generator for Far-Field Speech Recognition
INTERSPEECH 2021
Affect2MM: Affective Analysis of Multimedia Content Using Emotion Causality
CVPR 2021
TIMERS: Document-level Temporal Relation Extraction
ACL 2021
LCollision: Fast Generation of Collision-Free Human Poses using Learned Non-Penetration Constraints
AAAI 2021
Robust 2D/3D Vehicle Parsing in Arbitrary Camera Views for CVIS
ICCV 2021
DnD: Dense Depth Estimation in Crowded Dynamic Indoor Scenes
ICCV 2021
AutoTrajectory: Label-free Trajectory Extraction and Prediction from Videos using Dynamic Points
ECCV 2020
Deep Differentiable Grasp Planner for High-DOF Grippers
RSS 2020
M3ER: Multiplicative Multimodal Emotion Recognition using Facial, Textual, and Speech Cues
AAAI 2020
STEP: Spatial Temporal Graph Convolutional Networks for Emotion Perception from Gaits
AAAI 2020
Crowd-Steer: Realtime Smooth and Collision-Free Robot Navigation in Densely Crowded Scenarios Trained using High-Fidelity Simulation
IJCAI 2020
EmotiCon: Context-Aware Multimodal Emotion Recognition Using Frege's Principle
CVPR 2020
Take an Emotion Walk: Perceiving Emotions from Gaits Using Hierarchical Attention Pooling and Affective Mapping
ECCV 2020
NeoNav: Improving the Generalization of Visual Navigation via Generating Next Expected Observations
AAAI 2020
HMPO: Human Motion Prediction in Occluded Environments for Safe Motion Planning
RSS 2020
TraPHic: Trajectory Prediction in Dense and Heterogeneous Traffic Using Weighted Interactions
CVPR 2019
VV-Net: Voxel VAE Net With Group Convolutions for Point Cloud Segmentation
ICCV 2019
Regression and Classification for Direction-of-Arrival Estimation with Convolutional Recurrent Neural Networks
INTERSPEECH 2019
TrafficPredict: Trajectory Prediction for Heterogeneous Traffic-Agents
AAAI 2019
Aggressive, Tense or Shy? Identifying Personality Traits from Crowd Videos
IJCAI 2017
Intention-Aware Motion Planning Using Learning Based Human Motion Prediction
RSS 2017
3D Reconstruction in the Presence of Glasses by Acoustic and Stereo Fusion
CVPR 2015
Collision-Free and Curvature-Continuous Path Smoothing In Cluttered Environments
RSS 2011
Star-shaped Roadmaps - A Deterministic Sampling Approach for Complete Motion Planning
RSS 2005
Path Planning for Deformable Robots in Complex Environments
RSS 2005