Large Language Models

6,405 papers

Papers

Hypothesis Generation for Materials Discovery and Design Using Goal-Driven and Constraint-Guided LLM Agents NAACL 2025

Aligning to What? Limits to RLHF Based Alignment NAACL 2025

Beyond Words: Exploring Cultural Value Sensitivity in Multimodal Models NAACL 2025

Tooling or Not Tooling? The Impact of Tools on Language Agents for Chemistry Problem Solving NAACL 2025

Evaluation of LLMs-based Hidden States as Author Representations for Psychological Human-Centered NLP Tasks NAACL 2025

ThoughtSculpt: Reasoning with Intermediate Revision and Search NAACL 2025

Using Linguistic Entrainment to Evaluate Large Language Models for Use in Cognitive Behavioral Therapy NAACL 2025

On A Scale From 1 to 5: Quantifying Hallucination in Faithfulness Evaluation NAACL 2025

LITERA: An LLM Based Approach to Latin-to-English Translation NAACL 2025

Investigating the Shortcomings of LLMs in Step-by-Step Legal Reasoning NAACL 2025

Towards Long Context Hallucination Detection NAACL 2025

Accounting for Sycophancy in Language Model Uncertainty Estimation NAACL 2025

Zero-Shot Keyphrase Generation: Investigating Specialized Instructions and Multi-sample Aggregation on Large Language Models NAACL 2025

Meta-Reasoning Improves Tool Use in Large Language Models NAACL 2025

GAIfE: Using GenAI to Improve Literacy in Low-resourced Settings NAACL 2025

Hard Emotion Test Evaluation Sets for Language Models NAACL 2025

UCL-Bench: A Chinese User-Centric Legal Benchmark for Large Language Models NAACL 2025

LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models NAACL 2025

AssertionBench: A Benchmark to Evaluate Large-Language Models for Assertion Generation NAACL 2025

DHP Benchmark: Are LLMs Good NLG Evaluators? NAACL 2025

GraphEval36K: Benchmarking Coding and Reasoning Capabilities of Large Language Models on Graph Datasets NAACL 2025

SimulBench: Evaluating Language Models with Creative Simulation Tasks NAACL 2025

ReasoningRec: Bridging Personalized Recommendations and Human-Interpretable Explanations through LLM Reasoning NAACL 2025

2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision NAACL 2025

Demystifying the Power of Large Language Models in Graph Generation NAACL 2025