Tong Sun

28 papers · 2019–2026 · 11 top CS/AI conferences

Achievements

🌍 Conference Polyglot (11) 🏃 Academic Marathon (6) 🌉 Interdisciplinary Bridge 🐣 Hot Topic Early Bird 🐝 Cross-Pollinator (14)

🧭 Keyword Pioneer 🤝 Dynamic Duo (15) 🏆 Grand Slam 🧬 Topic Evolution 📈 Trend Setter 🗃️ Keyword Collector (132) 💎 Century Club (27) ⚡ Prolific Year (9) 🔥 Unstoppable (7)

Conferences

NAACL (5) CVPR (4) AAAI (3) ACL (3) EMNLP (3) ICLR (3) NIPS (3) COLING (1) ICML (1) IJCAI (1) WACV (1)

Top co-authors

Jiuxiang Gu (16) Ruiyi Zhang (14) Jason Kuen (7) Ani Nenkova (7) Yufan Zhou (7) Handong Zhao (6) Tong Yu (6) Changyou Chen (5) Nikolaos Barmpalios (4) Alexa Siu (4)

Research topics

Differential Privacy (1)

Keywords

multimodal learning (3) large language model (3) diffusion model (3) image generation (2) self-supervised learning (2) contrastive learning (2) text-to-image generation (2) question answering (2) generative model (2) multi-modal learning (2) document understanding (2) local attention (1) stochastic gradient descent (1) domain adaptation (1) adversarial robustness (1) shortcut learning (1) transformer architecture (1) certified robustness (1) information extraction (1) adversarial learning (1)

Papers

OIDA-QA: A Multimodal Benchmark for Analyzing the Opioid Industry Documents Archive AAAI 2026
MoDS: Moderating a Mixture of Document Speakers to Summarize Debatable Queries in Document Collections NAACL 2025
Persona-SQ: A Personalized Suggested Question Generation Framework For Real-world Documents NAACL 2025
ARTIST: Improving the Generation of Text-Rich Images with Disentangled Diffusion Models and Large Language Models WACV 2025
SV-RAG: LoRA-Contextualizing Adaptation of MLLMs for Long Document Understanding ICLR 2025
Numerical Pruning for Efficient Autoregressive Models AAAI 2025
Customization Assistant for Text-to-Image Generation CVPR 2024
TRINS: Towards Multimodal Language Models that Can Read CVPR 2024
MATSA: Multi-Agent Table Structure Attribution EMNLP 2024
ADOPD: A Large-Scale Document Page Decomposition Dataset ICLR 2024
SOHES: Self-supervised Open-world Hierarchical Entity Segmentation ICLR 2024
DocPilot: Copilot for Automating PDF Edit Workflows in Documents ACL 2024
Automatic Layout Planning for Visually-Rich Documents with Instruction-Following Models ACL 2024
Adaptive Simultaneous Sign Language Translation with Confident Translation Length Estimation COLING 2024
ATLAS: A System for PDF-centric Human Interaction Data Collection NAACL 2024
A Critical Analysis of Document Out-of-Distribution Detection EMNLP 2023
Label-Retrieval-Augmented Diffusion Models for Learning from Noisy Labels NIPS 2023
MGDoc: Pre-training with Multi-granular Hierarchy for Document Image Understanding EMNLP 2022
TiGAN: Text-Based Interactive Image Generation and Manipulation AAAI 2022
Towards Language-Free Training for Text-to-Image Generation CVPR 2022
Learning Adaptive Axis Attentions in Fine-tuning: Beyond Fixed Sparse Attention Patterns ACL 2022
UniDoc: Unified Pretraining Framework for Document Understanding NIPS 2021
Towards Interpreting and Mitigating Shortcut Learning Behavior of NLU models NAACL 2021
Open-Domain Question Answering with Pre-Constructed Question Spaces NAACL 2021
Cross-Domain Document Object Detection: Benchmark Suite and Method CVPR 2020
Scalable Differential Privacy with Certified Robustness in Adversarial Learning ICML 2020
Self-Supervised Relationship Probing NIPS 2020
CLVSA: A Convolutional LSTM Based Variational Sequence-to-Sequence Model with Attention for Predicting Trends of Financial Markets IJCAI 2019