Tongshuang Wu

28 papers · 2019–2026 · 7 top CS/AI conferences

Achievements

🏃 Academic Marathon (6) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🌍 Conference Polyglot (7) 🐝 Cross-Pollinator (12)

🐣 Hot Topic Early Bird 🏆 Keyword Champion (2) 🧬 Topic Evolution 🔥 Unstoppable (7) ⚡ Prolific Year (8) 💎 Century Club (27) ❓ The Questioner (2) 🗃️ Keyword Collector (118)

Conferences

ACL (12) EMNLP (8) IJCAI (2) IJCNLP (2) NAACL (2) AAAI (1) AACL (1)

Top co-authors

Vijay Viswanathan (6) Marco Tulio Ribeiro (5) Graham Neubig (4) Xinran Zhao (4) Daniel Weld (3) Diyi Yang (3) Qianou Ma (3) Jeffrey Heer (3) Chenyang Yang (3) Yuanchen Bai (2)

Research topics

Education (1)

Keywords

large language model (6) data augmentation (4) model evaluation (4) question answering (4) language model (4) text perturbation (3) nlp model (3) sentiment analysis (3) information retrieval (3) question generation (3) counterfactual generation (2) natural language processing (2) human-ai interaction (2) retrieval-augmented generation (2) behavioral testing (2) human-computer interaction (2) synthetic data generation (2) software engineering (2) code generation (2) retrieval augmentation (2)

Papers

RECAP: An End-to-End Platform for Capturing, Replaying, and Analyzing AI-Assisted Programming Interactions · ACL 2026
Evaluating Mathematical Reasoning Beyond Accuracy · AAAI 2025
SPHERE: An Evaluation Card for Human-AI Systems · ACL 2025
MoR: Better Handling Diverse Queries with a Mixture of Sparse, Dense, and Human Retrievers · EMNLP 2025
cAST: Enhancing Code Retrieval-Augmented Generation with Structural Chunking via Abstract Syntax Tree · EMNLP 2025
How to Teach Programming in the AI Era? Using LLMs as a Teachable Agent for Debugging (Extended Abstract) · IJCAI 2025
SOTOPIA-S4: a user-friendly system for flexible, customizable, and large-scale social simulation · NAACL 2025
Large Language Models Help Humans Verify Truthfulness – Except When They Are Convincingly Wrong · NAACL 2024
Better Synthetic Data by Retrieving and Transforming Existing Datasets · ACL 2024
Fact-and-Reflection (FaR) Improves Confidence Calibration of Large Language Models · ACL 2024
Synthetic Multimodal Question Generation · EMNLP 2024
Designing, Evaluating, and Learning from Humans Interacting with NLP Models · EMNLP 2023
Prompt2Model: Generating Deployable Models from Natural Language Instructions · EMNLP 2023
NewsSense: Reference-free Verification via Cross-document Comparison · EMNLP 2023
Beyond Testers’ Biases: Guiding Model Testing with Knowledge Bases using LLMs · EMNLP 2023
BiasX: “Thinking Slow” in Toxic Content Moderation with Explanations of Implied Social Biases · EMNLP 2023
DataFinder: Scientific Dataset Recommendation from Natural Language Descriptions · ACL 2023
Measuring Adversarial Datasets · AACL 2023
Measuring Adversarial Datasets · IJCNLP 2023
It is AI’s Turn to Ask Humans a Question: Question-Answer Pair Generation for Children’s Story Books · ACL 2022
Are Shortest Rationales the Best Explanations for Human Understanding? · ACL 2022
Fantastic Questions and Where to Find Them: FairytaleQA – An Authentic Dataset for Narrative Comprehension · ACL 2022
Tailor: Generating and Perturbing Text with Semantic Controls · ACL 2022
Polyjuice: Generating Counterfactuals for Explaining, Evaluating, and Improving Models · ACL 2021
Beyond Accuracy: Behavioral Testing of NLP Models with Checklist (Extended Abstract) · IJCAI 2021
Polyjuice: Generating Counterfactuals for Explaining, Evaluating, and Improving Models · IJCNLP 2021
Beyond Accuracy: Behavioral Testing of NLP Models with CheckList · ACL 2020
Errudite: Scalable, Reproducible, and Testable Error Analysis · ACL 2019