Research Explorer
Data Augmentation
3622 directly classified papers
Papers per year
2002: 2
2006: 1
2008: 2
2009: 1
2011: 3
2012: 3
2013: 9
2014: 8
2015: 7
2016: 35
2017: 45
2018: 108
2019: 239
2020: 329
2021: 477
2022: 518
2023: 607
2024: 561
2025: 546
2026: 121
Papers
CalligraphicOCR for Chinese Calligraphy Recognition
EMNLP 2025
Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation
ACL 2025
We Need to Measure Data Diversity in NLP — Better and Broader
EMNLP 2025
V-Oracle: Making Progressive Reasoning in Deciphering Oracle Bones for You and Me
ACL 2025
AQuilt: Weaving Logic and Self-Inspection into Low-Cost, High-Relevance Data Synthesis for Specialist LLMs
EMNLP 2025
Diversity-oriented Data Augmentation with Large Language Models
ACL 2025
TdAttenMix: Top-Down Attention Guided Mixup
AAAI 2025
CoreEval: Automatically Building Contamination-Resilient Datasets with Real-World Knowledge toward Reliable LLM Evaluation
ACL 2025
Abacus-SQL: A Text-to-SQL System Empowering Cross-Domain and Open-Domain Database Retrieval
ACL 2025
Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement
ACL 2025
Randomly Projected Convex Clustering Model: Motivation, Realization, and Cluster Recovery Guarantees
JMLR 2025
Synthesizing Post-Training Data for LLMs through Multi-Agent Simulation
ACL 2025
Synthetic Data in the Era of Large Language Models
ACL 2025
QualiSpeech: A Speech Quality Assessment Dataset with Natural Language Reasoning and Descriptions
ACL 2025
Target Scanpath-Guided 360-Degree Image Enhancement
AAAI 2025
Revisiting Scaling Laws for Language Models: The Role of Data Quality and Training Strategies
ACL 2025
MAIN: Mutual Alignment Is Necessary for instruction tuning
EMNLP 2025
Automated Structured Radiology Report Generation
ACL 2025
Unlocking Speech Instruction Data Potential with Query Rewriting
ACL 2025
Is linguistically-motivated data augmentation worth it?
ACL 2025
Enhanced Data Synthesis for LLM through Reasoning Structures Generated by Hierarchical GFlowNet
ACL 2025
What are the Essential Factors in Crafting Effective Long Context Multi-Hop Instruction Datasets? Insights and Best Practices
ACL 2025
Corrupted but Not Broken: Understanding and Mitigating the Negative Impacts of Corrupted Data in Visual Instruction Tuning
EMNLP 2025
Data-Constrained Synthesis of Training Data for De-Identification
ACL 2025
CaricatureBooth: Data-Free Interactive Caricature Generation in a Photo Booth
CVPR 2025