Papers
CSSWiki: A Chinese Sentence Simplification Dataset with Linguistic and Content Operations
Fengkai Liu, John S. Y. Lee
CTSM: Combining Trait and State Emotions for Empathetic Response Model
Yufeng Wang, Chao Chen, Zhou Yang et al.
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
Thuat Nguyen, Chien Van Nguyen, Viet Dac Lai et al.
Curation of Benchmark Templates for Measuring Gender Bias in Named Entity Recognition Models
Ana Cimitan, Ana Alves Pinto, Michaela Geierhos
CuRIAM: Corpus Re Interpretation and Metalanguage in U.S. Supreme Court Opinions
Michael Kranzlein, Nathan Schneider, Kevin Tobia
Curriculum Learning Meets Directed Acyclic Graph for Multimodal Emotion Recognition
Cam-Van Thi Nguyen, Cao-Bach Nguyen, Duc-Trong Le et al.
CuSINeS: Curriculum-driven Structure Induced Negative Sampling for Statutory Article Retrieval
Santosh T.Y.S.S., Kristina Kaiser, Matthias Grabmair
CWTM: Leveraging Contextualized Word Embeddings from BERT for Neural Topic Modeling
Zheng Fang, Yulan He, Rob Procter
Czech Dataset for Complex Aspect-Based Sentiment Analysis Tasks
Jakub Šmíd, Pavel Přibáň, Ondrej Prazak et al.
DACL: Disfluency Augmented Curriculum Learning for Fluent Text Generation
Rohan Chaudhury, Maria Teleki, Xiangjue Dong et al.
DADIT: A Dataset for Demographic Classification of Italian Twitter Users and a Comparison of Prediction Methods
Lorenzo Lupo, Paul Bose, Mahyar Habibi et al.
DANCER: Entity Description Augmented Named Entity Corrector for Automatic Speech Recognition
Yi-Cheng Wang, Hsin-Wei Wang, Bi-Cheng Yan et al.
DanteLLM: Let’s Push Italian LLM Research Forward!
Andrea Bacciu, Cesare Campagnano, Giovanni Trappolini et al.
DARES: Dataset for Arabic Readability Estimation of School Materials
Mo El-Haj, Sultan Almujaiwel, Damith Premasiri et al.
DARIUS: A Comprehensive Learner Corpus for Argument Mining in German-Language Essays
Nils-Jonathan Schaller, Andrea Horbach, Lars Ingver Höft et al.
Data Collection Pipeline for Low-Resource Languages: A Case Study on Constructing a Tetun Text Corpus
Gabriel de Jesus, Sérgio Sobral Nunes
Data Drift in Clinical Outcome Prediction from Admission Notes
Paul Grundmann, Jens-Michalis Papaioannou, Tom Oberhauser et al.
Data Driven Approach for Mathematical Problem Solving
Byungju Kim, Wonseok Lee, Jaehong Kim et al.
Data-Envelopes for Cultural Heritage: Going beyond Datasheets
Mrinalini Luthra, Maria Eskevich
Data-Informed Global Sparseness in Attention Mechanisms for Deep Neural Networks
Ileana Rugina, Rumen Dangovski, Li Jing et al.
Dataset for Identification of Homophobia and Transphobia for Telugu, Kannada, and Gujarati
Prasanna Kumar Kumaresan, Rahul Ponnusamy, Dhruv Sharma et al.
Dataset of Quotation Attribution in German News Articles
Fynn Petersen-Frey, Chris Biemann
Dates and places as points of attachment for memorial contents in the ISW corpus: 1938 as a turning point
Carolina Flinz, Simona Leonardi
DC-MBR: Distributional Cooling for Minimum Bayesian Risk Decoding
Jianhao Yan, Jin Xu, Fandong Meng et al.