Songyang Zhang

52 papers · 2017–2026 · 13 top CS/AI conferences

Achievements


🧭 Keyword Pioneer 🗺️ Taxonomy Completionist (13) 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (6) 🐣 Hot Topic Early Bird

🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (13) 🐝 Cross-Pollinator (12) 👥 Mega-Team (24) 🏆 Grand Slam 🔬 Deep Specialist (10) 🧬 Topic Evolution 🤝 Dynamic Duo (19) 🗃️ Keyword Collector (220) ❓ The Questioner (3) ⚡ Prolific Year (7) 🚀 Conference Pioneer 📈 Trend Setter 💎 Century Club (49) 🔥 Unstoppable (7)

Conferences

ACL (10) CVPR (7) AAAI (6) ECCV (6) EMNLP (4) ICCV (4) NAACL (4) NIPS (4) ICML (2) IJCAI (2) COLING (1) ICLR (1) INTERSPEECH (1)

Top co-authors

Kai Chen (19) Dahua Lin (13) Xuming He (12) Haodong Duan (8) Wenwei Zhang (6) Conghui He (5) Hongwei Liu (5) Jiebo Luo (4) Shipeng Yan (4) Wei Li (3)

Research topics

Mathematics (1)

Keywords

large language model (14) benchmark evaluation (7) evaluation benchmark (5) scene graph generation (4) vision transformer (3) few-shot learning (3) vision language model (3) image classification (3) multi-modal learning (3) graph neural network (3) visual recognition (2) unsupervised learning (2) game theory (2) semantic segmentation (2) reinforcement learning (2) class imbalance (2) grammar induction (2) automatic speech recognition (2) mathematical reasoning (2) natural language understanding (2)

Papers

RouteMoA: Dynamic Routing without Pre-Inference Boosts Efficient Mixture-of-Agents ACL 2026
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination AAAI 2026
Knowledge-to-Verification: Exploring RLVR for LLMs in Knowledge-Intensive Domains ACL 2026
OpenHuEval: Evaluating Large Language Model on Hungarian Specifics ACL 2025
CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward EMNLP 2025
LiT: Delving into a Simple Linear Diffusion Transformer for Image Generation ICCV 2025
UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios AAAI 2025
DualGFL: Federated Learning with a Dual-Level Coalition-Auction Game AAAI 2025
Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement ACL 2025
Capability Salience Vector: Fine-grained Alignment of Loss and Capabilities for Downstream Task Scaling Law ACL 2025
Are Your LLMs Capable of Stable Reasoning? ACL 2025
InternLM-Law: An Open-Sourced Chinese Legal Large Language Model COLING 2025
MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark ACL 2024
Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs NIPS 2024
ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs EMNLP 2024
FedSC: Provable Federated Self-supervised Learning with Spectral Contrastive Objective over Non-i.i.d. Data ICML 2024
T-Eval: Evaluating the Tool Utilization Capability of Large Language Models Step by Step ACL 2024
Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations ACL 2024
LLaST: Improved End-to-end Speech Translation System Leveraged by Large Language Models ACL 2024
LawBench: Benchmarking Legal Knowledge of Large Language Models EMNLP 2024
MMBENCH: Is Your Multi-Modal Model an All-around Player? ECCV 2024
BotChat: Evaluating LLMs’ Capabilities of Having Multi-Turn Dialogues NAACL 2024
Fake Alignment: Are LLMs Really Aligned Well? NAACL 2024
Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks NAACL 2024
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD NIPS 2024
GTA: A Benchmark for General Tool Agents NIPS 2024
From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models CVPR 2024
Make-A-Video: Text-to-Video Generation without Text-Video Data ICLR 2023
Improving Pixel-based MIM by Reducing Wasted Modeling Capability ICCV 2023
RIFormer: Keep Your Vision Backbone Effective but Removing Token Mixer CVPR 2023
TG-VQA: Ternary Game of Video Question Answering IJCAI 2023
Expanding Language-Image Pretrained Models for General Video Recognition ECCV 2022
The Devil Is in the Labels: Noisy Label Correction for Robust Scene Graph Generation CVPR 2022
SGTR: End-to-End Scene Graph Generation With Transformer CVPR 2022
Action Quality Assessment with Temporal Parsing Transformer ECCV 2022
MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration ECCV 2022
Learning Semantic Correspondence with Sparse Annotations ECCV 2022
Learning a Grammar Inducer from Massive Uncurated Instructional Videos EMNLP 2022
SAT: 2D Semantics Assisted Training for 3D Visual Grounding ICCV 2021
Dynamic Grained Encoder for Vision Transformers NIPS 2021
Learning Implicit Temporal Alignment for Few-shot Video Classification IJCAI 2021
Video-aided Unsupervised Grammar Induction NAACL 2021
Bipartite Graph Network With Adaptive Message Passing for Unbiased Scene Graph Generation CVPR 2021
Distribution Alignment: A Unified Framework for Long-Tail Visual Recognition CVPR 2021
Boundary Proposal Network for Two-stage Natural Language Video Localization AAAI 2021
Transformer with Bidirectional Decoder for Speech Recognition INTERSPEECH 2020
Learning 2D Temporal Adjacent Networks for Moment Localization with Natural Language AAAI 2020
Part-aware Prototype Network for Few-shot Semantic Segmentation ECCV 2020
A Dual Attention Network with Semantic Embedding for Few-Shot Learning AAAI 2019
Dynamic Context Correspondence Network for Semantic Alignment ICCV 2019
LatentGNN: Learning Efficient Non-local Relations for Visual Recognition ICML 2019
Predicting Salient Face in Multiple-Face Videos CVPR 2017