Papers
Benchmarking Multimodal Knowledge Conflict for Large Multimodal Models
Yifan Jia, Yuntao Du, Kailin Jiang et al.
Benchmarking Offensive Language Detection in Persian and Pashto
Zahra Bokaei, Bonnie Webber, Walid Magdy
Benchmarking Temporal Reasoning and Alignment Across Chinese Dynasties
Zhenglin Wang, Jialong Wu, Pengfei Li et al.
Benchmarking the Energy Savings with Speculative Decoding Strategies
Rohit Dutta, Paramita Koley, Soham Poddar et al.
Benchmarking Trustworthiness in Multimodal LLMs for Video Understanding
Youze Wang, Zijun Chen, Ruoyu Chen et al.
Benchmarking Visual LLMs Resilience to Unanswerable Questions on Visually Rich Documents
Davide Napolitano, Luca Cagliero, Fabrizio Battiloro
Benchmarking XAI Explanations with Human-Aligned Evaluations
Rémi Kazmierczak, Steve Azzolin, Eloïse Berthier et al.
BERT, are you paying attention? Attention regularization with human-annotated rationales
Elize Herrewijnen, Dong Nguyen, Floris Bex et al.
Best Arm Identification with Biased Contexts
James Cheshire, Stephan Clémençon
Best-Effort Policies for Robust Markov Decision Processes
Alessandro Abate, Thom Badings, Giuseppe De Giacomo et al.
Best of Both Worlds Guarantees for Equitable Allocations
Umang Bhaskar, Vishwa Prakash HV, Aditi Sethia et al.
Best-of-L: Cross-Lingual Reward Modeling for Mathematical Reasoning
Sara Rajaee, Rochelle Choenni, Ekaterina Shutova et al.
Beta Distribution Learning for Reliable Roadway Crash Risk Assessment
Ahmad Elallaf, Nathan Jacobs, Xinyue Ye et al.
Better as Generators Than Classifiers: Leveraging LLMs and Synthetic Data for Low-Resource Multilingual Classification
Branislav Pecher, Jan Cegin, Robert Belanec et al.
Better Call CLAUSE: A Discrepancy Benchmark for Auditing LLMs Legal Reasoning Capabilities
Manan Roy Choudhury, Adithya Chandramouli, Mannan Anand et al.
Better Datasets Start from RefineLab: Automatic Optimization for High-Quality Dataset Refinement
Xiaonan Luo, Yue Huang, Ping He et al.
Better Generalizing to Unseen Concepts: An Evaluation Framework and An LLM-Based Auto-Labeled Pipeline for Biomedical Concept Recognition
Shanshan Liu, Noriki Nishida, Fei Cheng et al.
Better Matching, Less Forgetting: A Quality-Guided Matcher for Transformer-based Incremental Object Detection
Qirui Wu, Shizhou Zhang, De Cheng et al.
Better Safe Than Sorry? Overreaction Problem of Vision Language Models in Visual Emergency Recognition
Dasol Choi, Seunghyun Lee, Youngsook Song
BEVDilation: LiDAR-Centric Multi-Modal Fusion for 3D Object Detection
Guowen Zhang, Chenhang He, Liyi Chen et al.
Beware of Reasoning Overconfidence: Pitfalls in the Reasoning Process for Multi-solution Tasks
Jiannan Guan, Qiguang Chen, Libo Qin et al.
Beyond Accuracy: A Cognitive Load Framework for Mapping the Capability Boundaries of Tool-use Agents
Qihao Wang, Yue Hu, Mingzhe Lu et al.
Beyond Accuracy: Alignment and Error Detection across Languages in the Bi-GSM8K Math-Teaching Benchmark
Jieun Park, KyungTae Lim, Joon-ho Lim
Beyond Adapter Retrieval: Latent Geometry-Preserving Composition via Sparse Task Projection
Pengfei Jin, Peng Shu, Sifan Song et al.