Research Explorer

AraEval: An Arabic Multi-Task Evaluation Suite for Large Language Models

Alhanoof Althnian, Norah A. Alzahrani, Shaykhah Z. Alsubaie et al.

2025 EMNLP

AraHalluEval: A Fine-grained Hallucination Evaluation Framework for Arabic LLMs

Aisha Alansari, Hamzah Luqman

2025 EMNLP

AraHealthQA 2025: The First Shared Task on Arabic Health Question Answering

Hassan Alhuzali, Walid Al-Eisawi, Muhammad Abdul-Mageed et al.

2025 EMNLP

AraMinds at AraHealthQA 2025: A Retrieval-Augmented Generation System for Fine-Grained Classification and Answer Generation of Arabic Mental Health Q&A

Mohamed Zaytoon, Ahmed Mahmoud Salem, Ahmed Sakr et al.

2025 EMNLP

AraMinds at MAHED 2025: Leveraging Vision-Language Models and Contrastive Multi-task Learning for Multimodal Hate Speech Detection

Mohamed Zaytoon, Ahmed Mahmoud Salem, Ahmed Sakr et al.

2025 EMNLP

AraNLP at MAHED 2025 Shared Task: Using AraBERT for Text-based Hate and Hope Speech Classification

Wafaa S. El-Kassas, Enas A. Hakim Khalil

2025 EMNLP

AraReasoner: Evaluating Reasoning-Based LLMs for Arabic NLP

Ahmed Abul Hasanaath, Aisha Alansari, Ahmed Ashraf et al.

2025 EMNLP

AraS2P: Arabic Speech-to-Phonemes System

Bassam Mattar, Mohamed Fayed, Ayman Khalafallah

2025 EMNLP

AraSafe: Benchmarking Safety in Arabic LLMs

Hamdy Mubarak, Abubakr Mohamed, Majd Hawasly

2025 EMNLP

Archaeology at TSAR 2025 Shared Task: Teaching Small Models to do CEFR Simplifications

Rareş-Alexandru Roşcan, Sergiu Nisioi

2025 EMNLP

A Reasoner for Real-World Event Detection: Scaling Reinforcement Learning via Adaptive Perplexity-Aware Sampling Strategy

Xiaoyun Zhang, Jingqing Ruan, Xing Ma et al.

2025 EMNLP

Are BabyLMs Deaf to Gricean Maxims? A Pragmatic Evaluation of Sample-efficient Language Models

Raha Askari, Sina Zarrieß, Özge Alacam et al.

2025 EMNLP

Are Checklists Really Useful for Automatic Evaluation of Generative Tasks?

Momoka Furuhashi, Kouta Nakayama, Takashi Kodama et al.

2025 EMNLP

Are Economists Always More Introverted? Analyzing Consistency in Persona-Assigned LLMs

Manon Reusens, Bart Baesens, David Jurgens

2025 EMNLP

Are Generative Models Underconfident? Better Quality Estimation with Boosted Model Probability

Tu Anh Dinh, Jan Niehues

2025 EMNLP

Are Knowledge and Reference in Multilingual Language Models Cross-Lingually Consistent?

Xi Ai, Mahardika Krisna Ihsani, Min-Yen Kan

2025 EMNLP

Are Language Models Consequentialist or Deontological Moral Reasoners?

Keenan Samway, Max Kleiman-Weiner, David Guzman Piedrahita et al.

2025 EMNLP

Are Large Language Models Chronically Online Surfers? A Dataset for Chinese Internet Meme Explanation

Yubo Xie, Chenkai Wang, Zongyang Ma et al.

2025 EMNLP

Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance

Omer Nahum, Nitay Calderon, Orgad Keller et al.

2025 EMNLP

Are LLMs Court-Ready? Evaluating Frontier Models on Indian Legal Reasoning

Kush Juvekar, Arghya Bhattacharya, Sai Khadloya et al.

2025 EMNLP

Are LLMs Empathetic to All? Investigating the Influence of Multi-Demographic Personas on a Model’s Empathy

Ananya Malik, Nazanin Sabri, Melissa M. Karnaze et al.

2025 EMNLP

Arena-lite: Efficient and Reliable Large Language Model Evaluation via Tournament-Based Direct Comparisons

Seonil Son, Ju-Min Oh, Heegon Jin et al.

2025 EMNLP

Are Stereotypes Leading LLMs’ Zero-Shot Stance Detection?

Anthony Dubreuil, Antoine Gourru, Christine Largeron et al.

2025 EMNLP

Are the Reasoning Models Good at Automated Essay Scoring?

Lui Yoshida

2025 EMNLP

Are Vision-Language Models Safe in the Wild? A Meme-Based Benchmark Study

DongGeon Lee, Joonwon Jang, Jihae Jeong et al.

2025 EMNLP
