Research Explorer
Papers
Trends
Conferences
Explore
Authors
Topics
Keywords
Data Augmentation
3622 directly classified papers
Papers per year
2002: 2
2006: 1
2008: 2
2009: 1
2011: 3
2012: 3
2013: 9
2014: 8
2015: 7
2016: 35
2017: 45
2018: 108
2019: 239
2020: 329
2021: 477
2022: 518
2023: 607
2024: 561
2025: 546
2026: 121
Papers
FlexDoc: Parameterized Sampling for Diverse Multilingual Synthetic Documents for Training Document Understanding Models
EMNLP 2025
BOOTPLACE: Bootstrapped Object Placement with Detection Transformers
CVPR 2025
Paired by the Teacher: Turning Unpaired Data into High-Fidelity Pairs for Low-Resource Text Generation
EMNLP 2025
SOMD2025: A Challenging Shared Tasks for Software Related Information Extraction
ACL 2025
RPDR: A Round-trip Prediction-Based Data Augmentation Framework for Long-Tail Question Answering
EMNLP 2025
Randomly Projected Convex Clustering Model: Motivation, Realization, and Cluster Recovery Guarantees
JMLR 2025
MegaPairs: Massive Data Synthesis for Universal Multimodal Retrieval
ACL 2025
Language Models as Continuous Self-Evolving Data Engineers
EMNLP 2025
KCS: Diversify Multi-hop Question Generation with Knowledge Composition Sampling
EMNLP 2025
Instruction-Tuning Data Synthesis from Scratch via Web Reconstruction
ACL 2025
TRATES: Trait-Specific Rubric-Assisted Cross-Prompt Essay Scoring
ACL 2025
FIRE: Flexible Integration of Data Quality Ratings for Effective Pretraining
EMNLP 2025
Fossils at SemEval-2025 Task 9: Tasting Loss Functions for Food Hazard Detection in Text Reports
ACL 2025
Data-Constrained Synthesis of Training Data for De-Identification
ACL 2025
Transplant Then Regenerate: A New Paradigm for Text Data Augmentation
EMNLP 2025
Evaluating the Effectiveness and Scalability of LLM-Based Data Augmentation for Retrieval
EMNLP 2025
Corrupted but Not Broken: Understanding and Mitigating the Negative Impacts of Corrupted Data in Visual Instruction Tuning
EMNLP 2025
Mat-Instructions: A Large-Scale Inorganic Material Instruction Dataset for Large Language Models
IJCAI 2025
KGCL: Knowledge-Enhanced Graph Contrastive Learning for Retrosynthesis Prediction Based on Molecular Graph Editing
IJCAI 2025
Doubling Your Data in Minutes: Ultra-fast Tabular Data Generation via LLM-Induced Dependency Graphs
EMNLP 2025
EDGE: Efficient Data Selection for LLM Agents via Guideline Effectiveness
IJCAI 2025
Dynamic and Adaptive Feature Generation with LLM
IJCAI 2025
We Need to Measure Data Diversity in NLP — Better and Broader
EMNLP 2025
CMHG: A Dataset and Benchmark for Headline Generation of Minority Languages in China
EMNLP 2025
UnitCoder: Scalable Code Synthesis from Pre-training Corpora
EMNLP 2025