conftrace_

Bochuan Cao

14 papers · 2022–2025 · 6 top CS/AI conferences

Achievements



🐝 Cross-Pollinator (6) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🌍 Conference Polyglot (6) 🌈 Renaissance Researcher (6)

🤝 Dynamic Duo (10) ⚡ Prolific Year (5) 🗃️ Keyword Collector (52) 💎 Century Club (14)

Conferences

ACL (4) NIPS (4) ICML (2) NAACL (2) AACL (1) IJCNLP (1)

Top co-authors

Jinghui Chen (10) Lu Lin (7) Yuanpu Cao (6) Yurui Chang (3) Jinyuan Jia (3) Rongrong Wang (2) Haitao Mao (2) Zhiyu Xue (2) Bo Li (2) Kristen Johnson (2)

Research topics

Keywords

large language model (7) model alignment (3) ai safety (3) moral reasoning (2) hallucination mitigation (2) adversarial attack (2) jailbreak attack (2) intrinsic self-correction (2) ensemble learning (1) convergence analysis (1) domain generalization (1) language model alignment (1) continual learning (1) harmful content (1) preference optimization (1) text generation (1) model security (1) distribution shift (1) backdoor attack (1) factual accuracy (1)

Papers

WordGame: Efficient & Effective LLM Jailbreak via Simultaneous Obfuscation in Query and Response · NAACL 2025
On the Convergence of Moral Self-Correction in Large Language Models · AACL 2025
JoPA: Explaining Large Language Model’s Generation via Joint Prompt Attribution · ACL 2025
Monitoring Decoding: Mitigating Hallucination via Evaluating the Factuality of Partial Response during Generation · ACL 2025
TruthFlow: Truthful LLM Generation via Representation Flow Correction · ICML 2025
AdvI2I: Adversarial Image Attack on Image-to-Image Diffusion Models · ICML 2025
On the Convergence of Moral Self-Correction in Large Language Models · IJCNLP 2025
Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization · NIPS 2024
Jailbreak Open-Sourced Large Language Models via Enforced Decoding · ACL 2024
Defending Against Alignment-Breaking Attacks via Robustly Aligned LLM · ACL 2024
Data Free Backdoor Attacks · NIPS 2024
Stealthy and Persistent Unalignment on Large Language Models via Backdoor Injections · NAACL 2024
IMPRESS: Evaluating the Resilience of Imperceptible Perturbations Against Unauthorized Data Usage in Diffusion-Based Generative AI · NIPS 2023
Wild-Time: A Benchmark of in-the-Wild Distribution Shift over Time · NIPS 2022