Large Language Models

6405 directly classified papers

Papers

The Fellowship of the LLMs: Multi-Model Workflows for Synthetic Preference Optimization Dataset Generation ACL 2025

Does Biomedical Training Lead to Better Medical Performance? ACL 2025

Are LLMs (Really) Ideological? An IRT-based Analysis and Alignment Tool for Perceived Socio-Economic Bias in LLMs ACL 2025

Knockout LLM Assessment: Using Large Language Models for Evaluations through Iterative Pairwise Comparisons ACL 2025

Evaluating Calibration of Arabic Pre-trained Language Models on Dialectal Text COLING 2025

PMPO: A Self-Optimizing Framework for Creating High-Fidelity Measurement Tools for Social Bias in Large Language Models IJCNLP 2025

Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction CVPR 2025

AraSim: Optimizing Arabic Dialect Translation in Children’s Literature with LLMs and Similarity Scores COLING 2025

Chain-of-MetaWriting: Linguistic and Textual Analysis of How Small Language Models Write Young Students Texts COLING 2025

Towards Omni-RAG: Comprehensive Retrieval-Augmented Generation for Large Language Models in Medical Applications ACL 2025

Semantic Masking in a Needle-in-a-haystack Test for Evaluating Large Language Model Long-Text Capabilities COLING 2025

Crypto-LLM: Two-Stage Language Model Pre-training with Ciphered and Natural Language Data IJCNLP 2025

ParaRev : Building a dataset for Scientific Paragraph Revision annotated with revision instruction COLING 2025

Chinese SafetyQA: A Safety Short-form Factuality Benchmark for Large Language Models ACL 2025

Towards an operative definition of creative writing: a preliminary assessment of creativeness in AI and human texts COLING 2025

Hallucinations in Code Change to Natural Language Generation: Prevalence and Evaluation of Detection Metrics IJCNLP 2025

IberoBench: A Benchmark for LLM Evaluation in Iberian Languages COLING 2025

STAND-Guard: A Small Task-Adaptive Content Moderation Model COLING 2025

Know Your RAG: Dataset Taxonomy and Generation Strategies for Evaluating RAG Systems COLING 2025

RED-CT: A Systems Design Methodology for Using LLM-labeled Data to Train and Deploy Edge Linguistic Classifiers COLING 2025

Beyond Visual Understanding: Introducing PARROT-360V for Vision Language Model Benchmarking COLING 2025

OKG: On-the-Fly Keyword Generation in Sponsored Search Advertising COLING 2025

From Templates to Natural Language: Generalization Challenges in Instruction-Tuned LLMs for Spatial Reasoning IJCNLP 2025

MasRouter: Learning to Route LLMs for Multi-Agent Systems ACL 2025

Rationale Behind Essay Scores: Enhancing S-LLM’s Multi-Trait Essay Scoring with Rationale Generated by LLMs NAACL 2025