Chenghao Xiao

20 papers · 2022–2026 · 7 conferences · across top CS/AI conferences

Achievements

🐝 Cross-Pollinator (13) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (7) 🌈 Renaissance Researcher (7)

🗺️ Taxonomy Completionist (46) 👥 Mega-Team (82) ⚡ Prolific Year (9) 🗃️ Keyword Collector (90) 💎 Century Club (18)

Conferences

EMNLP (7) ACL (6) ICCV (2) ICLR (2) COLING (1) CONLL (1) NAACL (1)

Top co-authors

Chenghua Lin (9) Noura Al Moubayed (9) Yizhi Li (3) Bohao Yang (3) Kun Zhao (3) Yu Rong (3) Kenneth Enevoldsen (2) Siwei Wu (2) Márton Kardos (2) Jie Fu (2)

Research topics

Digital Humanities (1)

Keywords

large language model (6) text classification (2) representation learning (2) semantic analysis (2) reinforcement learning (2) contrastive learning (2) cross-modal retrieval (2) unsupervised learning (2) sentence representation (2) question answering (1) natural language generation (1) cross-lingual transfer (1) embedding space (1) topic modeling (1) machine translation (1) multimodal learning (1) knowledge distillation (1) information retrieval (1) text analysis (1) text generation (1)

Papers

Translation or Recitation? Calibrating Evaluation Scores for Machine Translation of Extremely Low-Resource Languages · ACL 2026
Understanding the Behaviors of Environment-aware Information Retrieval · ACL 2026
Overview of the BioLaySumm 2025 Shared Task on Lay Summarization of Biomedical Research Articles and Radiology Reports · ACL 2025
Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth · EMNLP 2025
Analyzing LLMs’ Knowledge Boundary Cognition Across Languages Through the Lens of Internal Representations · ACL 2025
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning · EMNLP 2025
Crafting Customisable Characters with LLMs: A Persona-Driven Role-Playing Agent Framework · EMNLP 2025
Everything is a Video: Unifying Modalities through Next-Frame Prediction · ICCV 2025
MIEB: Massive Image Embedding Benchmark · ICCV 2025
MMTEB: Massive Multilingual Text Embedding Benchmark · ICLR 2025
CAST: Corpus-Aware Self-similarity Enhanced Topic modelling · NAACL 2025
SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval · ACL 2024
Effective Distillation of Table-based Reasoning Ability from LLMs · COLING 2024
MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training · ICLR 2024
On the Rigour of Scientific Writing: Criteria, Analysis, and Insights · EMNLP 2024
On Isotropy, Contextualization and Learning Dynamics of Contrastive-based Sentence Representation Learning · ACL 2023
Towards more Human-like Language Models based on Contextualizer Pretraining Strategy · CONLL 2023
Length is a Curse and a Blessing for Document-level Semantics · EMNLP 2023
Towards more Human-like Language Models based on Contextualizer Pretraining Strategy · EMNLP 2023
Breaking through Inequality of Information Acquisition among Social Classes: A Modest Effort on Measuring “Fun” · EMNLP 2022