Data Augmentation

3622 directly classified papers

Papers

EDGE: Efficient Data Selection for LLM Agents via Guideline Effectiveness IJCAI 2025

Emphasizing Discriminative Features for Dataset Distillation in Complex Scenarios CVPR 2025

KGCL: Knowledge-Enhanced Graph Contrastive Learning for Retrosynthesis Prediction Based on Molecular Graph Editing IJCAI 2025

Introducing OmniGEC: A Silver Multilingual Dataset for Grammatical Error Correction ACL 2025

LADM: Long-context Training Data Selection with Attention-based Dependency Measurement for LLMs ACL 2025

Enhancing Unsupervised Sentence Embeddings via Knowledge-Driven Data Augmentation and Gaussian-Decayed Contrastive Learning ACL 2025

Dually Self-Improved Counterfactual Data Augmentation Using Large Language Model ACL 2025

MathFusion: Enhancing Mathematical Problem-solving of LLM through Instruction Fusion ACL 2025

Explicit and Implicit Data Augmentation for Social Event Detection ACL 2025

From Real to Synthetic: Synthesizing Millions of Diversified and Complicated User Instructions with Attributed Grounding ACL 2025

SCAR: Data Selection via Style Consistency-Aware Response Ranking for Efficient Instruction-Tuning of Large Language Models ACL 2025

Global Eye: Breaking the “Fixed Thinking Pattern” during the Instruction Expansion Process ACL 2025

Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation ACL 2025

V-Oracle: Making Progressive Reasoning in Deciphering Oracle Bones for You and Me ACL 2025

Diversity-oriented Data Augmentation with Large Language Models ACL 2025

CoreEval: Automatically Building Contamination-Resilient Datasets with Real-World Knowledge toward Reliable LLM Evaluation ACL 2025

Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement ACL 2025

Synthesizing Post-Training Data for LLMs through Multi-Agent Simulation ACL 2025

QualiSpeech: A Speech Quality Assessment Dataset with Natural Language Reasoning and Descriptions ACL 2025

Revisiting Scaling Laws for Language Models: The Role of Data Quality and Training Strategies ACL 2025

Automated Structured Radiology Report Generation ACL 2025

Is linguistically-motivated data augmentation worth it? ACL 2025

What are the Essential Factors in Crafting Effective Long Context Multi-Hop Instruction Datasets? Insights and Best Practices ACL 2025

Data-Constrained Synthesis of Training Data for De-Identification ACL 2025

ADD: Attribution-Driven Data Augmentation Framework for Boosting Image Super-Resolution CVPR 2025