Papers
Blind Men and the Elephant: Diverse Perspectives on Gender Stereotypes in Benchmark Datasets
Mahdi Zakizadeh, Mohammad Taher Pilehvar
BLiSS: Evaluating Bilingual Learner Competence in Second Language Small Language Models
Yuan Gao, Suchir Salhan, Andrew Caines et al.
BloomWise: Enhancing Problem-Solving capabilities of Large Language Models using Bloom’s-Taxonomy-Inspired Prompts
Maria-Eleni Zoumpoulidi, Georgios Paraskevopoulos, Alexandros Potamianos
Bold Claims or Self-Doubt? Factuality Hallucination Type Detection via Belief State
Dongyu Zhang, Qingqing Hong, Bingxuan Hou et al.
BoN Appetit Team at LeWiDi-2025: Best-of-N Test-time Scaling Can Not Stomach Annotation Disagreements (Yet)
Tomas Ruiz, Siyao Peng, Barbara Plank et al.
Boosting Data Utilization for Multilingual Dense Retrieval
Chao Huang, Fengran Mo, Yufeng Chen et al.
Boosting Multi-modal Keyphrase Prediction with Dynamic Chain-of-Thought in Vision-Language Models
Qihang Ma, Shengyu Li, Jie Tang et al.
Both Text and Images Leaked! A Systematic Analysis of Data Contamination in Multimodal LLM
Dingjie Song, Sicheng Lai, Mingxuan Wang et al.
Boundary Matters: Leveraging Structured Text Plots for Long Text Outline Generation
Yuanchi Ma, Jiamou Liu, Hui He et al.
BOUQuET: dataset, Benchmark and Open initiative for Universal Quality Evaluation in Translation
Pierre Andrews, Mikel Artetxe, Mariano Coria Meglioli et al.
BrailleLLM: Braille Instruction Tuning with Large Language Models for Braille Domain Tasks
Tianyuan Huang, Zepeng Zhu, Hangdi Xing et al.
BrainLoc: Brain Signal-Based Object Detection with Multi-modal Alignment
Jiaqi Duan, Xiaoda Yang, Kaixuan Luan et al.
Bratly: A Python Extension for BRAT Functionalities
Jamil Zaghir, Jean-Philippe Goldman, Nikola Bjelogrlic et al.
Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification
Boyang Zhang, Yicong Tan, Yun Shen et al.
Breaking Bad Tokens: Detoxification of LLMs Using Sparse Autoencoders
Agam Goyal, Vedant Rathi, William Yeh et al.
Breaking the Attention Trap in Code LLMs: A Rejection Sampling Approach to Enhance Code Execution Prediction
Xingcheng Ruan, Haoxiang Geng, Yunhui Xia et al.
Breaking the Noise Barrier: LLM-Guided Semantic Filtering and Enhancement for Multi-Modal Entity Alignment
Chenglong Lu, Chenxiao Li, Jingwei Cheng et al.
Breaking the Reviewer: Assessing the Vulnerability of Large Language Models in Automated Peer Review Under Textual Adversarial Attacks
Tzu-Ling Lin, Wei-Chih Chen, Teng-Fang Hsiao et al.
Break the Checkbox: Challenging Closed-Style Evaluations of Cultural Alignment in LLMs
Mohsinul Kabir, Ajwad Abrar, Sophia Ananiadou
B-REASO: A Multi-Level Multi-Faceted Bengali Evaluation Suite for Foundation Models
Md Tanzib Hosain, Md Kishor Morol
Bridging Dialectal Gaps in Arabic Medical LLMs through Model Merging
Ahmed Ibrahim, Abdullah Hosseini, Hoda Helmy et al.
Bridging External and Parametric Knowledge: Mitigating Hallucination of LLMs with Shared-Private Semantic Synergy in Dual-Stream Knowledge
Yi Sui, Chaozhuo Li, Chen Zhang et al.
Bridging Information Gaps with Comprehensive Answers: Improving the Diversity and Informativeness of Follow-Up Questions
Zhe Liu, Taekyu Kang, Haoyu Wang et al.