Papers
RuleR: Improving LLM Controllability by Rule-based Data Recycling
Ming Li, Han Chen, Chenguang Wang et al.
RusCode: Russian Cultural Code Benchmark for Text-to-Image Generation
Viacheslav Vasilev, Julia Agafonova, Nikolai Gerasimenko et al.
RxLens: Multi-Agent LLM-powered Scan and Order for Pharmacy
Akshay Jagatap, Srujana Merugu, Prakash Mandayam Comar
𝒮²IT: Stepwise Syntax Integration Tuning for Large Language Models in Aspect Sentiment Quad Prediction
Bingfeng Chen, Chenjie Qiu, Yifeng Xie et al.
S2-MAD: Breaking the Token Barrier to Enhance Multi-Agent Debate Efficiency
Yuting Zeng, Weizhe Huang, Lei Jiang et al.
Safe Inputs but Unsafe Output: Benchmarking Cross-modality Safety Alignment of Large Vision-Language Models
Siyin Wang, Xingsong Ye, Qinyuan Cheng et al.
SafeQuant: LLM Safety Analysis via Quantized Gradient Inspection
Sindhu Padakandla, Sadbhavana Babar, Rathod Darshan D et al.
SafeSpeech: A Comprehensive and Interactive Tool for Analysing Sexist and Abusive Language in Conversations
Xingwei Tan, Chen Lyu, Hafiz Muhammad Umer et al.
SafetyQuizzer: Timely and Dynamic Evaluation on the Safety of LLMs
Zhichao Shi, Shaoling Jing, Yi Cheng et al.
SAFR: Neuron Redistribution for Interpretability
Ruidi Chang, Chunyuan Deng, Hanjie Chen
SALAD: Improving Robustness and Generalization through Contrastive Learning with Structure-Aware and LLM-Driven Augmented Data
Suyoung Bae, YunSeok Choi, Hyojun Kim et al.
SANDWiCH: Semantical Analysis of Neighbours for Disambiguating Words in Context ad Hoc
Daniel Guzman Olivares, Lara Quijano, Federico Liberatore
SAPIENT: Mastering Multi-turn Conversational Recommendation with Strategic Planning and Monte Carlo Tree Search
Hanwen Du, Bo Peng, Xia Ning
Scaling Graph-Based Dependency Parsing with Arc Vectorization and Attention-Based Refinement
Nicolas Floquet, Joseph Le Roux, Nadi Tomeh et al.
Scaling LLM Inference Efficiently with Optimized Sample Compute Allocation
Kexun Zhang, Shang Zhou, Danqing Wang et al.
Scaling Multi-Document Event Summarization: Evaluating Compression vs. Full-Text Approaches
Adithya Pratapa, Teruko Mitamura
Scaling Up Membership Inference: When and How Attacks Succeed on Large Language Models
Haritz Puerto, Martin Gubri, Sangdoo Yun et al.
Schema and Natural Language Aware In-Context Learning for Improved GraphQL Query Generation
Nitin Gupta, Manish Kesarwani, Sambit Ghosh et al.
SciAssess: Benchmarking LLM Proficiency in Scientific Literature Analysis
Hengxing Cai, Xiaochen Cai, Junhan Chang et al.
SCIURus: Shared Circuits for Interpretable Uncertainty Representations in Language Models
Carter Teplica, Yixin Liu, Arman Cohan et al.
SCORE: Systematic COnsistency and Robustness Evaluation for Large Language Models
Grigor Nalbandyan, Rima Shahbazyan, Evelina Bakhturina
ScratchEval: Are GPT-4o Smarter than My Child? Evaluating Large Multimodal Models with Visual Programming Challenges
Rao Fu, Ziyang Luo, Hongzhan Lin et al.
ScreenQA: Large-Scale Question-Answer Pairs Over Mobile App Screenshots
Yu-Chung Hsiao, Fedir Zubach, Gilles Baechler et al.
Script-Agnosticism and its Impact on Language Identification for Dravidian Languages
Milind Agarwal, Joshua Otten, Antonios Anastasopoulos
SeaExam and SeaBench: Benchmarking LLMs with Local Multilingual Questions in Southeast Asia
Chaoqun Liu, Wenxuan Zhang, Jiahao Ying et al.