Pinjia He

14 papers · 2021–2026 · 6 conferences · across top CS/AI conferences

Achievements

🐝 Cross-Pollinator (13) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🌍 Conference Polyglot (5) 🏃 Academic Marathon (5) 🏆 Keyword Champion (2) 🗃️ Keyword Collector (55) ⚡ Prolific Year (6) 💎 Century Club (11) ❓ The Questioner (2)

Conferences

ACL (6) EMNLP (3) ICLR (2) AAAI (1) COLING (1) OSDI (1)

Top co-authors

Youliang Yuan (10) Wenxuan Wang (8) Jen-tse Huang (6) Zhaopeng Tu (4) Wenxiang Jiao (4) Xiaoyuan Liu (3) Junjielong Xu (3) Shuai Wang (2) Sihang Zhao (2) Fan Mo (1)

Keywords

large language model (7) benchmark evaluation (4) multimodal large language model (3) visual question answering (2) software testing (2) prompt engineering (1) in-context learning (1) logical reasoning (1) model safety (1) code generation (1) ai safety (1) automated reasoning (1) harmful content (1) model alignment (1) adversarial defense (1) multimodal learning (1) software engineering (1) reward hacking (1) static analysis (1) commonsense knowledge (1)

Papers

SHAPE: Unifying Safety, Helpfulness and Pedagogy for Educational LLMs · ACL 2026
MicLog: Towards Accurate and Efficient LLM-based Log Parsing via Progressive Meta In-Context Learning · AAAI 2026
Curing Miracle Steps in LLM Mathematical Reasoning with Rubric Rewards · ACL 2026
Insight Over Sight: Exploring the Vision-Knowledge Conflicts in Multimodal LLMs · ACL 2025
Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training · ACL 2025
ToolSafety: A Comprehensive Dataset for Enhancing Safety in LLM-Based Agent Tool Invocations · EMNLP 2025
UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench · ACL 2025
Can’t See the Forest for the Trees: Benchmarking Multimodal Safety Awareness for Multimodal LLMs · ACL 2025
OpenRCA: Can Large Language Models Locate the Root Cause of Software Failures? · ICLR 2025
Does ChatGPT Know That It Does Not Know? Evaluating the Black-Box Calibration of ChatGPT · COLING 2024
Difficult Task Yes but Simple Task No: Unveiling the Laziness in Multimodal LLMs · EMNLP 2024
LogicAsker: Evaluating and Improving the Logical Reasoning Ability of Large Language Models · EMNLP 2024
GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher · ICLR 2024
SANRAZOR: Reducing Redundant Sanitizer Checks in C/C++ Programs · OSDI 2021