Bill Yuchen Lin

65 papers · 2018–2026 · 11 conferences · across top CS/AI conferences

Achievements

🧭 Keyword Pioneer 🌍 Conference Polyglot (11) 🌉 Interdisciplinary Bridge 🗺️ Taxonomy Completionist (11) 🏃 Academic Marathon (7)

🐝 Cross-Pollinator (12) 🌈 Renaissance Researcher (8) 🏠 Conference Loyalist (20) 🏆 Grand Slam 🏆 Keyword Champion (3) 👥 Mega-Team (32) 🔬 Deep Specialist (19) 🧬 Topic Evolution 🤝 Dynamic Duo (31) 💎 Century Club (64) 🔥 Unstoppable (8) ❓ The Questioner ⚡ Prolific Year (6) 🗃️ Keyword Collector (260)

Conferences

ACL (21) EMNLP (14) NAACL (10) NIPS (5) ICLR (4) IJCNLP (4) AAAI (3) AACL (1) CVPR (1) EACL (1) ICML (1)

Top co-authors

Xiang Ren (31) Yejin Choi (15) Radha Poovendran (8) Dong-Ho Lee (7) Zhangchen Xu (7) Luyao Niu (7) Fengqing Jiang (7) Khyathi Chandu (6) Seyeon Lee (5) Nouha Dziri (5)

Keywords

large language model (17) commonsense reasoning (10) named entity recognition (7) language model (7) benchmark evaluation (5) pre-trained language model (5) knowledge distillation (4) knowledge graph (4) transfer learning (3) few-shot learning (3) contrastive learning (3) chain-of-thought reasoning (3) sequence labeling (3) vision-language model (3) commonsense knowledge (3) continual learning (2) unsupervised learning (2) text generation (2) question answering (2) relation extraction (2)

Papers

Temporal Sampling for Forgotten Reasoning in LLMs ACL 2026
VL-RewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models CVPR 2025
WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild ICLR 2025
ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates AAAI 2025
On Memorization of Large Language Models in Logical Reasoning AACL 2025
SimulBench: Evaluating Language Models with Creative Simulation Tasks NAACL 2025
CulturalBench: A Robust, Diverse and Challenging Benchmark for Measuring LMs’ Cultural Knowledge Through Human-AI Red-Teaming ACL 2025
SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities ACL 2025
Small Models Struggle to Learn from Strong Reasoners ACL 2025
RewardBench: Evaluating Reward Models for Language Modeling NAACL 2025
L3GO: Language Agents with Chain-of-3D-Thoughts for Generating Unconventional Objects NAACL 2025
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models NAACL 2025
Stronger Models are Not Always Stronger Teachers for Instruction Tuning NAACL 2025
The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism NAACL 2025
Information-Guided Identification of Training Data Imprint in (Proprietary) Large Language Models NAACL 2025
On Memorization of Large Language Models in Logical Reasoning IJCNLP 2025
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing ICLR 2025
ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning ICML 2025
Latent Action Pretraining from Videos ICLR 2025
The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning ICLR 2024
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models EMNLP 2024
VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation EMNLP 2024
WildGuard: Open One-stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs NIPS 2024
WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences NIPS 2024
SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding ACL 2024
Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents ACL 2024
Agent Lumos: Unified and Modular Training for Open-Source Language Agents ACL 2024
OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement ACL 2024
Selective “Selective Prediction”: Reducing Unnecessary Abstention in Vision-Language Reasoning ACL 2024
Complex Reasoning in Natural Language ACL 2023
AutoTriggER: Label-Efficient and Robust Named Entity Recognition with Auxiliary Trigger Extraction EACL 2023
LLM-Blender: Ensembling Large Language Models with Pairwise Ranking and Generative Fusion ACL 2023
Faith and Fate: Limits of Transformers on Compositionality NIPS 2023
SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks NIPS 2023
On Grounded Planning for Embodied Tasks with Language Models AAAI 2023
Unsupervised Cross-Task Generalization via Retrieval Augmentation NIPS 2022
On Continual Model Refinement in Out-of-Distribution Data Streams ACL 2022
Knowledge-Augmented Methods for Natural Language Processing ACL 2022
Reflect, Not Reflex: Inference-Based Common Ground Improves Dialogue Response Quality EMNLP 2022
On the Robustness of Reading Comprehension Models to Entity Renaming NAACL 2022
FedNLP: Benchmarking Federated Learning Methods for Natural Language Processing Tasks NAACL 2022
Learn Continually, Generalize Rapidly: Lifelong Knowledge Accumulation for Few-shot Learning EMNLP 2021
RockNER: A Simple Method to Create Adversarial Examples for Evaluating the Robustness of Named Entity Recognition Models EMNLP 2021
Common Sense Beyond English: Evaluating and Improving Multilingual Language Models for Commonsense Reasoning IJCNLP 2021
RiddleSense: Reasoning about Riddle Questions Featuring Linguistic Creativity and Commonsense Knowledge IJCNLP 2021
RiddleSense: Reasoning about Riddle Questions Featuring Linguistic Creativity and Commonsense Knowledge ACL 2021
Differentiable Open-Ended Commonsense Reasoning NAACL 2021
IsoBN: Fine-Tuning BERT with Isotropic Batch Normalization AAAI 2021
Common Sense Beyond English: Evaluating and Improving Multilingual Language Models for Commonsense Reasoning ACL 2021
CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP EMNLP 2021
RICA: Evaluating Robust Inference Capabilities Based on Commonsense Axioms EMNLP 2021
Probing Commonsense Explanation in Dialogue Response Generation EMNLP 2021
Learning to Contextually Aggregate Multi-Source Supervision for Sequence Labeling ACL 2020
LEAN-LIFE: A Label-Efficient Annotation Framework Towards Learning from Explanation ACL 2020
TriggerNER: Learning with Entity Triggers as Explanations for Named Entity Recognition ACL 2020
CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning EMNLP 2020
Birds have four legs?! NumerSense: Probing Numerical Commonsense Knowledge of Pre-Trained Language Models EMNLP 2020
Scalable Multi-Hop Relational Reasoning for Knowledge-Aware Question Answering EMNLP 2020
KagNet: Knowledge-Aware Graph Networks for Commonsense Reasoning IJCNLP 2019
AlpacaTag: An Active Learning-based Crowd Annotation Framework for Sequence Tagging ACL 2019
KagNet: Knowledge-Aware Graph Networks for Commonsense Reasoning EMNLP 2019
ExtRA: Extracting Prominent Review Aspects from Customer Feedback EMNLP 2018
Automatic Extraction of Commonsense LocatedNear Knowledge ACL 2018
Mining Cross-Cultural Differences and Similarities in Social Media ACL 2018
Neural Adaptation Layers for Cross-domain Named Entity Recognition EMNLP 2018