Research Explorer
Data Augmentation
3622 directly classified papers
Papers per year
2002: 2
2006: 1
2008: 2
2009: 1
2011: 3
2012: 3
2013: 9
2014: 8
2015: 7
2016: 35
2017: 45
2018: 108
2019: 239
2020: 329
2021: 477
2022: 518
2023: 607
2024: 561
2025: 546
2026: 121
Papers
CR4-NarrEmote: An Open Vocabulary Dataset of Narrative Emotions Derived Using Citizen Science
EMNLP 2025
CAARMA: Class Augmentation with Adversarial Mixup Regularization
EMNLP 2025
SynC-LLM: Generation of Large-Scale Synthetic Circuit Code with Hierarchical Language Models
EMNLP 2025
VERITAS: Leveraging Vision Priors and Expert Fusion to Improve Multimodal Data
EMNLP 2025
Skeletons Matter: Dynamic Data Augmentation for Text-to-Query
EMNLP 2025
CondenseLM: LLMs-driven Text Dataset Condensation via Reward Matching
EMNLP 2025
InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption
CVPR 2025
Promptable Representation Distribution Learning and Data Augmentation for Gigapixel Histopathology WSI Analysis
AAAI 2025
IRIS-VIS: A New Dataset for Visibility Estimation in an Industrial Environment
WACV 2025
Judging Quality Across Languages: A Multilingual Approach to Pretraining Data Filtering with Language Models
EMNLP 2025
CalligraphicOCR for Chinese Calligraphy Recognition
EMNLP 2025
From Tens of Hours to Tens of Thousands: Scaling Back-Translation for Speech Recognition
EMNLP 2025
Use Random Selection for Now: Investigation of Few-Shot Selection Strategies in LLM-based Text Augmentation
EMNLP 2025
CharacterCraft: Bridging the Literature-Reality Dialogue Gap for Practical Role-Playing Agents
EMNLP 2025
Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance
EMNLP 2025
OVFact: Measuring and Improving Open-Vocabulary Factuality for Long Caption Models
EMNLP 2025
TopXGen: Topic-Diverse Parallel Data Generation for Low-Resource Machine Translation
EMNLP 2025
Assessing the Role of Data Quality in Training Bilingual Language Models
EMNLP 2025
LLM-Driven Completeness and Consistency Evaluation for Cultural Heritage Data Augmentation in Cross-Modal Retrieval
EMNLP 2025
Exploring Quality and Diversity in Synthetic Data Generation for Argument Mining
EMNLP 2025
Priority on High-Quality: Selecting Instruction Data via Consistency Verification of Noise Injection
EMNLP 2025
PromotionGo at LeWiDi-2025: Enhancing Multilingual Irony Detection with Data-Augmented Ensembles and L1 Loss
EMNLP 2025
GTA-HDR: A Large-Scale Synthetic Dataset for HDR Image Reconstruction
WACV 2025
AkibaNLP-TUT: Injecting Language-Specific Word-Level Noise for Low-Resource Language Translation
EMNLP 2025
Rethinking KenLM: Good and Bad Model Ensembles for Efficient Text Quality Filtering in Large Web Corpora
ACL 2025