conftrace_

Papers

5,914 papers found · incl. 435 without abstracts

Persistent Instability in LLM’s Personality Measurements: Effects of Scale, Reasoning, and Conversation History

Tommaso Tosato, Saskia Helbling, Yorguin-Jose Mantilla-Ramos et al.

2026 AAAI

Benchmarking Trustworthiness in Multimodal LLMs for Video Understanding

Youze Wang, Zijun Chen, Ruoyu Chen et al.

2026 AAAI

STAR-1: Safer Alignment of Reasoning LLMs with 1K Data

Zijun Wang, Haoqin Tu, Yuhan Wang et al.

2026 AAAI

CluCERT: Certifying LLM Robustness via Clustering-Guided Denoising Smoothing

Zixia Wang, Gaojie Jin, Jia Hu et al.

2026 AAAI

HumorReject: Decoupling LLM Safety from Refusal Prefix via a Little Humor

Zihui Wu, Haichang Gao, Jiacheng Luo et al.

2026 AAAI

MedAtlas: Evaluating LLMs for Multi-Round, Multi-Task Medical Reasoning Across Diverse Imaging Modalities and Clinical Text

Ronghao Xu, Zhen Huang, Yangbo Wei et al.

2026 AAAI

Differentiated Directional Intervention: A Framework for Evading LLM Safety Alignment

Peng Zhang, Peijie Sun

2026 AAAI

GEM: Generative Entropy-Guided Preference Modeling for Few-Shot Alignment of LLMs

Yiyang Zhao, Huiyu Bai, Xuejiao Zhao

2026 AAAI

Can LLMs Detect Their Confabulations? Estimating Reliability in Uncertainty-Aware Language Models

Tianyi Zhou, Johanne Medina, Sanjay Chawla

2026 AAAI

On the Feasibility of Using MultiModal LLMs to Execute AR Social Engineering Attacks

Ting Bi, Chenghang Ye, Zheyu Yang et al.

2026 AAAI

Can LLMs Identify Tax Abuse?

Andrew Blair-Stanek, Nils Holzenberger, Benjamin Van Durme

2026 AAAI

CO2-Meter: A Comprehensive Carbon Footprint Estimator for LLMs on Edge Devices

Zhenxiao Fu, Fan Chen, Lei Jiang

2026 AAAI

LocalBench: Benchmarking LLMs on County-Level Local Knowledge and Reasoning

Zihan Gao, Yifei Xu, Jacob Thebault-Spieker

2026 AAAI

Can LLMs Truly Embody Human Personality? Analyzing AI and Human Behavior Alignment in Dispute Resolution

Deuksin Kwon, Kaleen Shrestha, Bin Han et al.

2026 AAAI

Evaluating LLMs for Police Decision-Making: A Framework Based on Police Action Scenarios

Sangyub Lee, Heedou Kim, Hyeoncheol Kim

2026 AAAI

Should You Use LLMs to Simulate Opinions? Quality Checks for Early-Stage Deliberation

Terrence Neumann, Maria De-Arteaga, Sina Fazelpour

2026 AAAI

LLM Targeted Underperformance Disproportionately Impacts Vulnerable Users

Elinor Poole-Dayan, Deb Roy, Jad Kabbara

2026 AAAI

The Confidence Trap: Gender Bias and Predictive Certainty in LLMs

Ahmed Sabir, Markus Kängsepp, Rajesh Sharma

2026 AAAI

CARE-Bench: A Benchmark of Diverse Client Simulations Guided by Expert Principles for Evaluating LLMs in Psychological Counseling

Bichen Wang, Yixin Sun, Junzhe Wang et al.

2026 AAAI

LLM Safety in Judicial AI: A Stress Test of Social Media Influence on Real-World Judgments

Yixuan Xie, Yang He, Xiaoyu Yang et al.

2026 AAAI

Assessing Automated Fact-Checking for Medical LLM Responses with Knowledge Graphs

Shasha Zhou, Mingyu Huang, Jack Cole et al.

2026 AAAI

Democratizing LLM Efficiency: From Hyperscale Optimizations to Universal Deployability

Hen-Hsen Huang

2026 AAAI

Is Word Sense Disambiguation Dead in the LLM Era?

Roberto Navigli

2026 AAAI

Beyond Neuron-Level Sparsity: Achieving Faithful and Interpretable LLMs with Mixture of Decoders

Grigorios Chrysos

2026 AAAI

Breaking the Resource Monopoly: LLM Post-Training and Serving with Modest Data and Compute

Jiaxin Huang

2026 AAAI