Yuxia Wang
47 papers · 2020–2026 · 8 conferences · across top CS/AI conferences
Achievements
🏃 Academic Marathon (5)
🌍 Conference Polyglot (7)
🌉 Interdisciplinary Bridge
🧭 Keyword Pioneer
🐣 Hot Topic Early Bird
🐝 Cross-Pollinator (11)
🤝 Dynamic Duo (24)
👥 Mega-Team (35)
🔬 Deep Specialist (15)
🧬 Topic Evolution
❓ The Questioner (2)
🗃️ Keyword Collector (175)
💎 Century Club (40)
⚡ Prolific Year (5)
Conferences
ACL (20)
EMNLP (10)
EACL (5)
NAACL (5)
COLING (4)
AACL (1)
IJCNLP (1)
SEMEVAL (1)
Top co-authors
Keywords
large language model (23)
binary classification (7)
machine-generated text detection (6)
text classification (6)
claim verification (5)
low-resource language (5)
benchmark evaluation (4)
instruction tuning (4)
multilingual detection (3)
model safety (3)
harmful content detection (3)
chain-of-thought reasoning (3)
evidence retrieval (3)
automatic speech recognition (3)
machine translation (3)
multilingual nlp (3)
data augmentation (3)
safety evaluation (3)
semantic textual similarity (3)
multimodal learning (2)
Papers
AICD Bench: A Challenging Benchmark for AI-Generated Code Detection
EACL 2026
FAID: Fine-grained AI-generated Text Detection using Multi-task Auxiliary and Multi-level Contrastive Learning
EACL 2026
Is Human-Like Text Liked by Humans? Multilingual Human Detection and Preference Against AI
ACL 2026
Stereotype Bias in a Bilingual Setting: A Culturally Grounded Evaluation in Kazakhstan
ACL 2026
SAHM: A Benchmark for Arabic Financial and Shari’ah-Compliant Reasoning
ACL 2026
FinChain: A Symbolic Benchmark for Verifiable Chain-of-Thought Financial Reasoning
ACL 2026
HD-NDEs: Neural Differential Equations for Hallucination Detection in LLMs
ACL 2025
Explicit and Implicit Data Augmentation for Social Event Detection
ACL 2025
KazMMLU: Evaluating Language Models on Kazakh, Russian, and Regional Knowledge of Kazakhstan
ACL 2025
UrduFactCheck: An Agentic Fact-Checking Framework for Urdu with Evidence Boosting and Benchmarking
EMNLP 2025
UnsafeChain: Enhancing Reasoning Model Safety via Hard Cases
IJCNLP 2025
Instruction Tuning on Public Government and Cultural Data for Low-Resource Language: a Case Study in Kazakh
ACL 2025
VSCBench: Bridging the Gap in Vision-Language Model Safety Calibration
ACL 2025
Arabic Dataset for LLM Safeguard Evaluation
NAACL 2025
Qorǵau: Evaluating Safety in Kazakh-Russian Bilingual Contexts
ACL 2025
Conversational SimulMT: Efficient Simultaneous Translation with Large Language Models
ACL 2025
UnsafeChain: Enhancing Reasoning Model Safety via Hard Cases
AACL 2025
OpenFactCheck: Building, Benchmarking Customized Fact-Checking Systems and Evaluating the Factuality of Claims and LLMs
COLING 2025
Loki: An Open-Source Tool for Fact Verification
COLING 2025
FIRE: Fact-checking with Iterative Retrieval and Verification
NAACL 2025
GenAI Content Detection Task 1: English and Multilingual Machine-Generated Text Detection: AI vs. Human
COLING 2025
Detection of Human and Machine-Authored Fake News in Urdu
ACL 2025
Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability
NAACL 2025
LLM-DetectAIve: a Tool for Fine-Grained Machine-Generated Text Detection
EMNLP 2024
M4GT-Bench: Evaluation Benchmark for Black-Box Machine-Generated Text Detection
ACL 2024
Demystifying Instruction Mixing for Fine-tuning Large Language Models
ACL 2024
A Chinese Dataset for Evaluating the Safeguards in Large Language Models
ACL 2024
M4: Multi-generator, Multi-domain, and Multi-lingual Black-Box Machine-Generated Text Detection
EACL 2024
Do-Not-Answer: Evaluating Safeguards in LLMs
EACL 2024
Rethinking STS and NLI in Large Language Models
EACL 2024
Factuality of Large Language Models: A Survey
EMNLP 2024
OpenFactCheck: A Unified Framework for Factuality Evaluation of LLMs
EMNLP 2024
Exploring the Potential of Multimodal LLM with Knowledge-Intensive Multimodal ASR
EMNLP 2024
Factcheck-Bench: Fine-Grained Evaluation Benchmark for Automatic Fact-checkers
EMNLP 2024
Can Machines Resonate with Humans? Evaluating the Emotional and Empathic Comprehension of LMs
EMNLP 2024
A Survey of Confidence Estimation and Calibration in Large Language Models
NAACL 2024
SemEval-2024 Task 8: Multidomain, Multimodel and Multilingual Machine-Generated Text Detection
NAACL 2024
SemEval-2024 Task 8: Multidomain, Multimodel and Multilingual Machine-Generated Text Detection
SEMEVAL 2024
The HW-TSC’s Speech to Speech Translation System for IWSLT 2022 Evaluation
ACL 2022
The HW-TSC’s Simultaneous Speech Translation System for IWSLT 2022 Evaluation
ACL 2022
Capture Human Disagreement Distributions by Calibrated Networks for Natural Language Inference
ACL 2022
The HW-TSC’s Offline Speech Translation System for IWSLT 2022 Evaluation
ACL 2022
Noisy Label Regularisation for Textual Regression
COLING 2022
HW-TSC’s Participation at WMT 2021 Quality Estimation Shared Task
EMNLP 2021
How Length Prediction Influence the Performance of Non-Autoregressive Translation?
EMNLP 2021
Learning from Unlabelled Data for Clinical Semantic Textual Similarity
EMNLP 2020
Evaluating the Utility of Model Configurations and Data Augmentation on Clinical Semantic Textual Similarity
ACL 2020