José Hernández-Orallo

19 papers · 2012–2026 · 6 conferences · across top CS/AI conferences

Achievements

🏃 Academic Marathon (13) 🌍 Conference Polyglot (6) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🐝 Cross-Pollinator (6)

🌈 Renaissance Researcher (8) 🗺️ Taxonomy Completionist (36) 🏆 Keyword Champion (5) 👥 Mega-Team (27) 🤝 Dynamic Duo (10) ❓ The Questioner (2) 💎 Century Club (18) ⚡ Prolific Year (5) 🗃️ Keyword Collector (87)

Conferences

IJCAI (7) AAAI (4) ACL (3) EACL (2) NIPS (2) JMLR (1)

Top co-authors

Fernando Martínez-Plumed (10) Wout Schellaert (4) John Burden (4) Bao Sheng Loe (3) Cesar Ferri (3) Marko Tesic (2) Konstantinos Voudouris (2) Behzad Mehrbakhsh (2) Sean Ó hÉigeartaigh (2) Peter Flach (2)

Keywords

ai evaluation (5) large language model (4) concept learning (2) instance difficulty (2) ai benchmark (2) prompt engineering (2) foundation model (2) few-shot learning (2) human-ai interaction (2) machine teaching (2) item response theory (2) zero-shot learning (2) model evaluation (2) language model (2) data contamination (2) predictive accuracy (1) evaluation framework (1) cognitive modeling (1) explainable ai (1) bayesian inference (1)

Papers

TRACE: A Corpus of Team Creative Discussions ACL 2026
Paradigms of AI Evaluation: Mapping Goals, Methodologies and Culture IJCAI 2025
PredictaBoard: Benchmarking LLM Score Predictability ACL 2025
Contamination Budget: Trade-offs Between Breadth, Depth and Difficulty IJCAI 2025
Item Response Theory for Natural Language Processing EACL 2024
A Proposal for Scaling the Scaling Laws EACL 2024
Melting Pot Contest: Charting the Future of Generalized Cooperative Intelligence NIPS 2024
Your Prompt Is My Command: On Assessing the Human-Centred Generality of Multimodal Models (Abstract Reprint) AAAI 2024
Confounders in Instance Variation for the Analysis of Data Contamination ACL 2024
How General-Purpose Is a Language Model? Usefulness and Safety with Human Prompters in the Wild AAAI 2022
Non-Cheating Teaching Revisited: A New Probabilistic Machine Teaching Model IJCAI 2022
Measuring the Occupational Impact of AI: Tasks, Cognitive Abilities and AI Benchmarks (Extended Abstract)* IJCAI 2022
Training on the Test Set: Mapping the System-Problem Space in AI AAAI 2022
When AI Difficulty Is Easy: The Explanatory Power of Predicting IRT Difficulty AAAI 2022
Not a Number: Identifying Instance Features for Capability-Oriented Evaluation IJCAI 2022
Think Big, Teach Small: Do Language Models Distil Occam’s Razor? NIPS 2021
The Facets of Artificial Intelligence: A Framework to Track the Evolution of AI IJCAI 2018
Computer Models Solving Intelligence Test Problems: Progress and Implications (Extended Abstract) IJCAI 2017
A Unified View of Performance Metrics: Translating Threshold Choice into Expected Classification Loss JMLR 2012