Research Explorer
Data Augmentation
3622 directly classified papers
Papers per year
2002: 2
2006: 1
2008: 2
2009: 1
2011: 3
2012: 3
2013: 9
2014: 8
2015: 7
2016: 35
2017: 45
2018: 108
2019: 239
2020: 329
2021: 477
2022: 518
2023: 607
2024: 561
2025: 546
2026: 121
Papers
EDGE: Efficient Data Selection for LLM Agents via Guideline Effectiveness
IJCAI 2025
Emphasizing Discriminative Features for Dataset Distillation in Complex Scenarios
CVPR 2025
KGCL: Knowledge-Enhanced Graph Contrastive Learning for Retrosynthesis Prediction Based on Molecular Graph Editing
IJCAI 2025
Introducing OmniGEC: A Silver Multilingual Dataset for Grammatical Error Correction
ACL 2025
LADM: Long-context Training Data Selection with Attention-based Dependency Measurement for LLMs
ACL 2025
Enhancing Unsupervised Sentence Embeddings via Knowledge-Driven Data Augmentation and Gaussian-Decayed Contrastive Learning
ACL 2025
Dually Self-Improved Counterfactual Data Augmentation Using Large Language Model
ACL 2025
MathFusion: Enhancing Mathematical Problem-solving of LLM through Instruction Fusion
ACL 2025
Explicit and Implicit Data Augmentation for Social Event Detection
ACL 2025
From Real to Synthetic: Synthesizing Millions of Diversified and Complicated User Instructions with Attributed Grounding
ACL 2025
SCAR: Data Selection via Style Consistency-Aware Response Ranking for Efficient Instruction-Tuning of Large Language Models
ACL 2025
Global Eye: Breaking the “Fixed Thinking Pattern” during the Instruction Expansion Process
ACL 2025
Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation
ACL 2025
V-Oracle: Making Progressive Reasoning in Deciphering Oracle Bones for You and Me
ACL 2025
Diversity-oriented Data Augmentation with Large Language Models
ACL 2025
CoreEval: Automatically Building Contamination-Resilient Datasets with Real-World Knowledge toward Reliable LLM Evaluation
ACL 2025
Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement
ACL 2025
Synthesizing Post-Training Data for LLMs through Multi-Agent Simulation
ACL 2025
QualiSpeech: A Speech Quality Assessment Dataset with Natural Language Reasoning and Descriptions
ACL 2025
Revisiting Scaling Laws for Language Models: The Role of Data Quality and Training Strategies
ACL 2025
Automated Structured Radiology Report Generation
ACL 2025
Is linguistically-motivated data augmentation worth it?
ACL 2025
What are the Essential Factors in Crafting Effective Long Context Multi-Hop Instruction Datasets? Insights and Best Practices
ACL 2025
Data-Constrained Synthesis of Training Data for De-Identification
ACL 2025
ADD: Attribution-Driven Data Augmentation Framework for Boosting Image Super-Resolution
CVPR 2025