Arman Cohan
126 papers · 2015–2026 · 12 conferences · across top CS/AI conferences
Achievements
Conference Polyglot (12)
Academic Marathon (10)
Interdisciplinary Bridge
Keyword Pioneer
Cross-Pollinator (10)
Renaissance Researcher (11)
Hot Topic Early Bird
Conference Loyalist (42)
Dynamic Duo (41)
Grand Slam
Mega-Team (43)
Deep Specialist (23)
Topic Evolution
Keyword Champion (14)
Trend Setter
Keyword Collector (435)
Prolific Year (5)
Unstoppable (11)
Century Club (115)
The Questioner (8)
Conferences
EMNLP (42)
ACL (38)
NAACL (24)
EACL (7)
COLING (3)
SEMEVAL (3)
ICLR (2)
ICML (2)
IJCNLP (2)
AAAI (1)
CVPR (1)
NIPS (1)
Research topics
Keywords
large language model (39)
benchmark evaluation (14)
information retrieval (13)
question answering (13)
retrieval-augmented generation (11)
text classification (9)
text summarization (9)
scientific literature (8)
instruction following (7)
zero-shot learning (7)
scientific document (6)
language model (6)
text generation (6)
few-shot learning (5)
pretrained language model (5)
scientific text (5)
document summarization (4)
transfer learning (4)
foundation model (4)
natural language processing (4)
Papers
MultiFinBen: Benchmarking Large Language Models for Multilingual and Multimodal Financial Application
ACL 2026
Evaluating Legal Reasoning Traces with Legal Issue Tree Rubrics
ACL 2026
Can AI Be a Good Peer Reviewer? A Survey of Peer Review Process, Evaluation, and the Future
ACL 2026
CPTCoder: A Reliable LLM System for Medical Procedure Code Prediction
ACL 2026
Anchor: Branch-Point Data Generation for GUI Agents
ACL 2026
SciRAG: Adaptive, Citation-Aware, and Outline-Guided Retrieval and Synthesis for Scientific Literature
EACL 2026
Patient-Similarity Cohort Reasoning in Clinical Text-to-SQL
EACL 2026
SciMDR: Advancing Scientific Multimodal Document Reasoning
ACL 2026
A Survey of Multimodal Mathematical Reasoning: From Perception, Alignment to Reasoning
ACL 2026
Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems
ACL 2026
MMSciCode: Real-world Evaluation of Multilingual Multi-Discipline Scientific Research Coding
ACL 2026
LocAgent: Graph-Guided LLM Agents for Code Localization
ACL 2025
AbGen: Evaluating Large Language Models in Ablation Study Design and Evaluation for Scientific Research
ACL 2025
Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective
EMNLP 2025
SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature
EMNLP 2025
Can LLMs Identify Critical Limitations within Scientific Research? A Systematic Evaluation on AI Research Papers
ACL 2025
TESS 2: A Large-Scale Generalist Diffusion Language Model
ACL 2025
Ref-Long: Benchmarking the Long-context Referencing Capability of Long-context Language Models
ACL 2025
MIR: Methodology Inspiration Retrieval for Scientific Research Problems
ACL 2025
MDCure: A Scalable Pipeline for Multi-Document Instruction-Following
ACL 2025
IRIS: Interactive Research Ideation System for Accelerating Scientific Discovery
ACL 2025
Physics: Benchmarking Foundation Models on University-Level Physics Problem Solving
ACL 2025
HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation Task
ACL 2025
Can Multimodal Foundation Models Understand Schematic Diagrams? An Empirical Study on Information-Seeking QA over Scientific Papers
ACL 2025
YaleNLP @ PerAnsSumm 2025: Multi-Perspective Integration via Mixture-of-Agents for Enhanced Healthcare QA Summarization
NAACL 2025
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding
CVPR 2025
Understanding Reference Policies in Direct Preference Optimization
NAACL 2025
Re-evaluating Automatic LLM System Ranking for Alignment with Human Preference
NAACL 2025
RouterRetriever: Routing over a Mixture of Expert Embedding Models
AAAI 2025
SCIURus: Shared Circuits for Interpretable Uncertainty Representations in Language Models
NAACL 2025
ReIFE: Re-evaluating Instruction-Following Evaluation
NAACL 2025
FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions
NAACL 2025
Are Multimodal LLMs Robust Against Adversarial Perturbations? RoMMath: A Systematic Evaluation on Multimodal Math Reasoning
NAACL 2025
ChemAgent: Self-updating Memories in Large Language Models Improves Chemical Reasoning
ICLR 2025
TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models
ICLR 2025
Judging with Many Minds: Do More Perspectives Mean Less Prejudice? On Bias Amplification and Resistance in Multi-Agent Based LLM-as-Judge
EMNLP 2025
MCTS-RAG: Enhancing Retrieval-Augmented Generation with Monte Carlo Tree Search
EMNLP 2025
FinLFQA: Evaluating Attributed Text Generation of LLMs in Financial Long-Form Question Answering
EMNLP 2025
Z1: Efficient Test-time Scaling with Code
EMNLP 2025
SciSketch: An Open-source Framework for Automated Schematic Diagram Generation in Scientific Papers
EMNLP 2025
MedTutor: A Retrieval-Augmented LLM System for Case-Based Medical Education
EMNLP 2025
CourtReasoner: Can LLM Agents Reason Like Judges?
EMNLP 2025
MetaFaith: Faithful Natural Language Uncertainty Expression in LLMs
EMNLP 2025
LimRank: Less is More for Reasoning-Intensive Information Reranking
EMNLP 2025
Table-R1: Inference-Time Scaling for Table Reasoning Tasks
EMNLP 2025
From Scores to Steps: Diagnosing and Improving LLM Performance in Evidence-Based Medical Calculations
EMNLP 2025
FinTrust: A Comprehensive Benchmark of Trustworthiness Evaluation in Finance Domain
EMNLP 2025
SciVer: Evaluating Foundation Models for Multimodal Scientific Claim Verification
ACL 2025
MIMIR: A Customizable Agent Tuning Platform for Enhanced Scientific Applications
EMNLP 2024
TaPERA: Enhancing Faithfulness and Interpretability in Long-Form Table QA by Content Planning and Execution-based Reasoning
ACL 2024
FinanceMATH: Knowledge-Intensive Math Reasoning in Finance Domains
ACL 2024
Quantifying Contamination in Evaluating Code Generation Capabilities of Language Models
ACL 2024
OLMo: Accelerating the Science of Language Models
ACL 2024
DocMath-Eval: Evaluating Math Reasoning Capabilities of LLMs in Understanding Long and Specialized Documents
ACL 2024
MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning
ACL 2024
Rethinking Efficient Multilingual Text Summarization Meta-Evaluation
ACL 2024
Unveiling the Spectrum of Data Contamination in Language Model: A Survey from Detection to Remediation
ACL 2024
TESS: Text-to-Text Self-Conditioned Simplex Diffusion
EACL 2024
On the Benefits of Fine-Grained Loss Truncation: A Case Study on Factuality in Summarization
EACL 2024
When do Generative Query and Document Expansions Fail? A Comprehensive Study Across Methods, Retrievers, and Datasets
EACL 2024
Bayesian Calibration of Win Rate Estimation with LLM Evaluators
EMNLP 2024
FinDVer: Explainable Claim Verification over Long and Hybrid-content Financial Documents
EMNLP 2024
SciDQA: A Deep Reading Comprehension Dataset over Scientific Papers
EMNLP 2024
FOLIO: Natural Language Reasoning with First-Order Logic
EMNLP 2024
TAIL: A Toolkit for Automatic and Realistic Long-Context Large Language Model Evaluation
EMNLP 2024
OpenT2T: An Open-Source Toolkit for Table-to-Text Generation
EMNLP 2024
OMG-QA: Building Open-Domain Multi-Modal Generative Question Answering Systems
EMNLP 2024
Calibrating Long-form Generations From Large Language Models
EMNLP 2024
M3SciQA: A Multi-Modal Multi-Document Scientific QA Benchmark for Evaluating Foundation Models
EMNLP 2024
P-FOLIO: Evaluating and Improving Logical Reasoning with Abundant Human-Written Reasoning Chains
EMNLP 2024
Observable Propagation: Uncovering Feature Vectors in Transformers
ICML 2024
NExT: Teaching Large Language Models to Reason about Code Execution
ICML 2024
On Learning to Summarize with Large Language Models as References
NAACL 2024
Investigating Data Contamination in Modern Benchmarks for Large Language Models
NAACL 2024
Struc-Bench: Are Large Language Models Good at Generating Complex Structured Tabular Data?
NAACL 2024
Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization
NAACL 2024
On Evaluating the Integration of Reasoning and Action in LLM Agents with Database Question Answering
NAACL 2024
OpenRT: An Open-source Framework for Reasoning Over Tabular Data
ACL 2023
Aligning Factual Consistency for Clinical Studies Summarization through Reinforcement Learning
ACL 2023
Embedding Recycling for Language Models
EACL 2023
Investigating Table-to-Text Generation Capabilities of Large Language Models in Real-World Information Seeking Scenarios
EMNLP 2023
Medical Text Simplification: Optimizing for Readability with Unlikelihood Training and Reranked Beam Search Decoding
EMNLP 2023
A Question Answering Framework for Decontextualizing User-facing Snippets from Scientific Documents
EMNLP 2023
SciRepEval: A Multi-Format Benchmark for Scientific Document Representations
EMNLP 2023
Peek Across: Improving Multi-Document Modeling via Cross-Document Question-Answering
ACL 2023
LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization
EACL 2023
Open Domain Multi-document Summarization: A Comprehensive Study of Model Brittleness under Retrieval
EMNLP 2023
Enhancing Text-to-SQL Capabilities of Large Language Models: A Study on Prompt Design Strategies
EMNLP 2023
QTSumm: Query-Focused Summarization over Tabular Data
EMNLP 2023
MultiVerS: Improving scientific claim verification with weak supervision and full-document context
NAACL 2022
PRIMERA: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization
ACL 2022
Generating Scientific Claims for Zero-Shot Scientific Fact Checking
ACL 2022
Zero- and Few-Shot NLP with Pretrained Language Models
ACL 2022
Overview of the Third Workshop on Scholarly Document Processing
COLING 2022
Overview of the First Shared Task on Multi Perspective Scientific Document Summarization (MuP)
COLING 2022
SciFact-Open: Towards open-domain scientific claim verification
EMNLP 2022
MultiCite: Modeling realistic citations requires moving beyond the single-sentence single-label setting
NAACL 2022
Long Context Question Answering via Supervised Contrastive Learning
NAACL 2022
Multi-Vector Models with Textual Guidance for Fine-Grained Scientific Document Similarity
NAACL 2022
Improving the Generalizability of Depression Detection by Leveraging Clinical Questionnaires
ACL 2022
FLEX: Unifying Evaluation for Few-Shot NLP
NIPS 2021
CDLM: Cross-Document Language Modeling
EMNLP 2021
A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers
NAACL 2021
Beyond Paragraphs: NLP for Long Sequences
NAACL 2021
Overview of the Second Workshop on Scholarly Document Processing
NAACL 2021
GUIR @ LongSumm 2020: Learning to Generate Long Summaries from Scientific Documents
EMNLP 2020
Fact or Fiction: Verifying Scientific Claims
EMNLP 2020
SLEDGE-Z: A Zero-Shot Baseline for COVID-19 Literature Search
EMNLP 2020
TLDR: Extreme Summarization of Scientific Documents
EMNLP 2020
SPECTER: Document-level Representation Learning using Citation-informed Transformers
ACL 2020
SUPP.AI: finding evidence for supplement-drug interactions
ACL 2020
Pretrained Language Models for Sequential Sentence Classification
EMNLP 2019
SciBERT: A Pretrained Language Model for Scientific Text
EMNLP 2019
Structural Scaffolds for Citation Intent Classification in Scientific Publications
NAACL 2019
Pretrained Language Models for Sequential Sentence Classification
IJCNLP 2019
SciBERT: A Pretrained Language Model for Scientific Text
IJCNLP 2019
SMHD: a Large-Scale Resource for Exploring Online Language Usage for Multiple Mental Health Conditions
COLING 2018
RSDD-Time: Temporal Annotation of Self-Reported Mental Health Diagnoses
NAACL 2018
A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents
NAACL 2018
GU IRLAB at SemEval-2018 Task 7: Tree-LSTMs for Scientific Relation Classification
SEMEVAL 2018
Helping or Hurting? Predicting Changes in Users' Risk of Self-Harm Through Online Community Interactions
NAACL 2018
Depression and Self-Harm Risk Assessment in Online Forums
EMNLP 2017
GUIR at SemEval-2017 Task 12: A Framework for Cross-Domain Clinical Temporal Information Extraction
SEMEVAL 2017
GUIR at SemEval-2016 task 12: Temporal Information Processing for Clinical Narratives
SEMEVAL 2016
Matching Citation Text and Cited Spans in Biomedical Literature: a Search-Oriented Approach
NAACL 2015
Scientific Article Summarization Using Citation-Context and Article's Discourse Structure
EMNLP 2015