Theory

4950 directly classified papers

Papers

REGen: A Reliable Evaluation Framework for Generative Event Argument Extraction EMNLP 2025

Semantic Causality-Aware Vision-Based 3D Occupancy Prediction ICCV 2025

LLMs Are Not Intelligent Thinkers: Introducing Mathematical Topic Tree Benchmark for Comprehensive Evaluation of LLMs NAACL 2025

An Inversion-based Measure of Memorization for Diffusion Models ICCV 2025

Reliability of Topic Modeling NAACL 2025

Self-Improvement Towards Pareto Optimality: Mitigating Preference Conflicts in Multi-Objective Alignment ACL 2025

Benchmarking Language Model Creativity: A Case Study on Code Generation NAACL 2025

What Do Machine Learning Researchers Mean by “Reproducible”? AAAI 2025

NovAScore: A New Automated Metric for Evaluating Document Level Novelty COLING 2025

Efficient Solutions For An Intriguing Failure of LLMs: Long Context Window Does Not Mean LLMs Can Analyze Long Sequences Flawlessly COLING 2025

Large Language Models as an Indirect Reasoner: Contrapositive and Contradiction for Automated Reasoning COLING 2025

Arena-lite: Efficient and Reliable Large Language Model Evaluation via Tournament-Based Direct Comparisons EMNLP 2025

Unstable Grounds for Beautiful Trees? Testing the Robustness of Concept Translations in the Compilation of Multilingual Wordlists ACL 2025

When Audio and Text Disagree: Revealing Text Bias in Large Audio-Language Models EMNLP 2025

Benchmarking Large Language Models Under Data Contamination: A Survey from Static to Dynamic Evaluation EMNLP 2025

Is your benchmark truly adversarial? AdvScore: Evaluating Human-Grounded Adversarialness NAACL 2025

On the Distortion of Committee Election with 1-Euclidean Preferences and Few Distance Queries AAAI 2025

Stress-Testing the Reasoning Competence of Language Models With Formal Proofs EMNLP 2025

On the Interplay between Positional Encodings, Morphological Complexity, and Word Order Flexibility IJCNLP 2025

Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities? ACL 2025

AssoCiAm: A Benchmark for Evaluating Association Thinking while Circumventing Ambiguity EMNLP 2025

Complete Symmetry Breaking for Finite Models AAAI 2025

Language Models Encode the Value of Numbers Linearly COLING 2025

Exploring Backdoor Vulnerabilities of Chat Models COLING 2025

Something’s Fishy in the Data Lake: A Critical Re-evaluation of Table Union Search Benchmarks ACL 2025