Data Augmentation

3622 directly classified papers

Papers

CalligraphicOCR for Chinese Calligraphy Recognition EMNLP 2025

Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation ACL 2025

We Need to Measure Data Diversity in NLP — Better and Broader EMNLP 2025

V-Oracle: Making Progressive Reasoning in Deciphering Oracle Bones for You and Me ACL 2025

AQuilt: Weaving Logic and Self-Inspection into Low-Cost, High-Relevance Data Synthesis for Specialist LLMs EMNLP 2025

Diversity-oriented Data Augmentation with Large Language Models ACL 2025

TdAttenMix: Top-Down Attention Guided Mixup AAAI 2025

CoreEval: Automatically Building Contamination-Resilient Datasets with Real-World Knowledge toward Reliable LLM Evaluation ACL 2025

Abacus-SQL: A Text-to-SQL System Empowering Cross-Domain and Open-Domain Database Retrieval ACL 2025

Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement ACL 2025

Randomly Projected Convex Clustering Model: Motivation, Realization, and Cluster Recovery Guarantees JMLR 2025

Synthesizing Post-Training Data for LLMs through Multi-Agent Simulation ACL 2025

Synthetic Data in the Era of Large Language Models ACL 2025

QualiSpeech: A Speech Quality Assessment Dataset with Natural Language Reasoning and Descriptions ACL 2025

Target Scanpath-Guided 360-Degree Image Enhancement AAAI 2025

Revisiting Scaling Laws for Language Models: The Role of Data Quality and Training Strategies ACL 2025

MAIN: Mutual Alignment Is Necessary for instruction tuning EMNLP 2025

Automated Structured Radiology Report Generation ACL 2025

Unlocking Speech Instruction Data Potential with Query Rewriting ACL 2025

Is linguistically-motivated data augmentation worth it? ACL 2025

Enhanced Data Synthesis for LLM through Reasoning Structures Generated by Hierarchical GFlowNet ACL 2025

What are the Essential Factors in Crafting Effective Long Context Multi-Hop Instruction Datasets? Insights and Best Practices ACL 2025

Corrupted but Not Broken: Understanding and Mitigating the Negative Impacts of Corrupted Data in Visual Instruction Tuning EMNLP 2025

Data-Constrained Synthesis of Training Data for De-Identification ACL 2025

CaricatureBooth: Data-Free Interactive Caricature Generation in a Photo Booth CVPR 2025