Text Representation

2246 directly classified papers

Papers

Do Syntactic Categories Help in Developmentally Motivated Curriculum Learning for Language Models? EMNLP 2025

MEXMA: Token-level objectives improve sentence representations ACL 2025

Length-Induced Embedding Collapse in PLM-based Models ACL 2025

UWBa at SemEval-2025 Task 7: Multilingual and Crosslingual Fact-Checked Claim Retrieval SEMEVAL 2025

A Large and Balanced Corpus for Fine-grained Arabic Readability Assessment ACL 2025

PRISM: A Framework for Producing Interpretable Political Bias Embeddings with Political-Aware Cross-Encoder ACL 2025

Learning to Look at the Other Side: A Semantic Probing Study of Word Embeddings in LLMs with Enabled Bidirectional Attention ACL 2025

ALGEN: Few-shot Inversion Attacks on Textual Embeddings via Cross-Model Alignment and Generation ACL 2025

Employing Discourse Coherence Enhancement to Improve Cross-Document Event and Entity Coreference Resolution ACL 2025

TeCoFeS: Text Column Featurization using Semantic Analysis NAACL 2025

Detecting Legal Citations in United Kingdom Court Judgments EMNLP 2025

Redundancy, Isotropy, and Intrinsic Dimensionality of Prompt-based Text Embeddings ACL 2025

Zero-Shot Cross-Sentential Scientific Relation Extraction via Entity-Guided Summarization IJCNLP 2025

Mind the Query: A Benchmark Dataset towards Text2Cypher Task EMNLP 2025

GigaEmbeddings — Efficient Russian Language Embedding Model ACL 2025

An Annotation Protocol for Diachronic Evaluation of Semantic Drift in Disability Sources ACL 2025

Bidirectional Topic Matching: Quantifying Thematic Intersections Between Climate Change and Climate Mitigation News Corpora Through Topic Modelling ACL 2025

Standardizing Heterogeneous Corpora with DUUR: A Dual Data- and Process-Oriented Approach to Enhancing NLP Pipeline Integration IJCNLP 2025

Beyond Benchmarks: Building a Richer Cross-Document Event Coreference Dataset with Decontextualization NAACL 2025

Adapting Multilingual Embedding Models to Historical Luxembourgish NAACL 2025

Fact Recall, Heuristics or Pure Guesswork? Precise Interpretations of Language Models for Fact Completion ACL 2025

Harmonizing Divergent Lemmatization and Part-of-Speech Tagging Practices for Latin Participles through the LiLa Knowledge Base ACL 2025

TokenSmith: Streamlining Data Editing, Search, and Inspection for Large-Scale Language Model Training and Interpretability EMNLP 2025

On the Correspondence between the Squared Norm and Information Content in Text Embeddings EMNLP 2025

Byte Pair Encoding Is All You Need For Automatic Bengali Speech Recognition AACL 2025