Papers
An Information-Theoretic Approach to Reducing Fertility in LLMs for Manipuri Machine Translation
Telem Joyson Singh, Ranbir Singh Sanasam, Priyankoo Sarmah
Agent-based Automated Claim Matching with Instruction-following LLMs
Dina Pisarevskaya, Arkaitz Zubiaga
Human–LLM Benchmarks for Bangla Dialect Translation: Sylheti and Chittagonian on the BanglaCHQ-Summ Corpus
Nowshin Mahjabin, Ahmed Shafin Ruhan, Mehreen Chowdhury et al.
A Comparative Analysis of Retrieval-Augmented Generation Techniques for Bengali Standard-to-Dialect Machine Translation Using LLMs
K. M. Jubair Sami, Dipto Sumit, Ariyan Hossain et al.
Robustness of LLMs to Transliteration Perturbations in Bangla
Fabiha Haider, Md Farhan Ishmam, Fariha Tanjim Shifat et al.
Computational Story Lab at BLP-2025 Task 1: HateSense: A Multi-Task Learning Framework for Comprehensive Hate Speech Identification using LLMs
Tabia Tanzin Prama, Christopher M. Danforth, Peter Dodds
Barrier Breakers at BLP-2025 Task 2: Enhancing LLM Code Generation Capabilities through Test-Driven Development and Code Interpreter
Sajed Jalil, Shuvo Saha, Hossain Mohammad Seym
CUET_Expelliarmus at BLP2025 Task 2: Leveraging Instruction Translation and Refinement for Bangla-to-Python Code Generation with Open-Source LLMs
Md Kaf Shahrier, Suhana Binta Rashid, Hasan Mesbaul Ali Taher et al.
TeamB2B at BLP-2025 Task 2: BanglaForge: LLM Collaboration with Self-Refinement for Bangla Code Generation
Mahir Labib Dihan, Sadif Ahmed, Md Nafiu Rahman
Benchmarking Hindi LLMs: A New Suite of Datasets and a Comparative Analysis
Anusha Kamath, Kanishk Singla, Rakesh Paul et al.
SmurfCat at SHROOM-CAP: Factual but Awkward? Fluent but Wrong? Tackling Both in LLM Scientific QA
Timur Ionov, Evgenii Nikolaev, Artem Vazhentsev et al.
Simulating Training Data Leakage in Multiple-Choice Benchmarks for LLM Evaluation
Naila Shafirni Hidayat, Muhammad Dehan Al Kautsar, Alfan Farizki Wicaksono et al.
Reliable Inline Code Documentation with LLMs: Fine-Grained Evaluation of Comment Quality and Coverage
Rohan Patil, Gaurav Tirodkar, Shubham Gatfane
Beyond the Rubric: Cultural Misalignment in LLM Benchmarks for Sexual and Reproductive Health
Sumon Kanti Dey, Manvi S, Zeel Mehta et al.
Non-Determinism of “Deterministic” LLM System Settings in Hosted Environments
Berk Atıl, Sarp Aykent, Alexa Chittams et al.
Test Set Quality in Multilingual LLM Evaluation
Chalamalasetti Kranti, Gabriel Bernier-Colborne, Yvan Gauthier et al.
LLM Driven Legal Text Analytics: A Case Study For Food Safety Violation Cases
Suyog Joshi, Soumyajit Basu, Lipika Dey et al.
MEDEQUALQA: Evaluating Biases in LLMs with Counterfactual Reasoning
Rajarshi Ghosh, Abhay Gupta, Hudson McBride et al.
Reasoning-Enhanced Retrieval for Misconception Prediction: A RAG-Inspired Approach with LLMs
Chaudhary Divya, Chang Xue, Shaorui Sun
A benchmark for end-to-end zero-shot biomedical relation extraction with LLMs: experiments with OpenAI models
Aviv Brokman, Xuguang Ai, Yuhang Jiang et al.
Bridging the Gap: Instruction-Tuned LLMs for Scientific Named Entity Recognition
Necva Bölücü, Maciej Rybinski, Stephen Wan
A Hybrid LLM and Supervised Model Pipeline for Polymer Property Extraction from Tables in Scientific Literature
Van-Thuy Phi, Dinh-Truong Do, Hoang-An Trieu et al.
Structured Outputs in Prompt Engineering: Enhancing LLM Adaptability on Counterintuitive Instructions
Jingjing Ye, Song Bai, Zhenyang Li et al.