Lin Ma
96 papers · 2015–2026 · 14 top CS/AI conferences
Achievements
🌍 Conference Polyglot (14)
🏃 Academic Marathon (10)
🌉 Interdisciplinary Bridge
🧭 Keyword Pioneer
🐝 Cross-Pollinator (12)
🌈 Renaissance Researcher (12)
🐣 Hot Topic Early Bird
🏠 Conference Loyalist (23)
🤝 Dynamic Duo (24)
🏆 Grand Slam
🔬 Deep Specialist (21)
🧬 Topic Evolution
🏆 Keyword Champion (2)
📈 Trend Setter
🗃️ Keyword Collector (403)
🚀 Conference Pioneer
🔥 Unstoppable (9)
💎 Century Club (93)
⚡ Prolific Year (15)
Conferences
CVPR (23)
AAAI (16)
ECCV (13)
ICCV (13)
NIPS (10)
IJCAI (7)
ACL (4)
EMNLP (2)
ICLR (2)
ICML (2)
AACL (1)
COLING (1)
IJCNLP (1)
WACV (1)
Keywords
video understanding (9)
multimodal learning (8)
convolutional neural network (6)
representation learning (6)
semantic segmentation (6)
multi-modal learning (5)
zero-shot learning (4)
foundation model (4)
image segmentation (4)
autonomous driving (4)
vision-language model (4)
image captioning (4)
contrastive learning (4)
generative adversarial network (4)
video localization (4)
few-shot learning (3)
instance segmentation (3)
domain adaptation (3)
weakly supervised learning (3)
object detection (3)
Papers
X-SAM: From Segment Anything to Any Segmentation
AAAI 2026
DBGroup: Dual-Branch Point Grouping for Weakly Supervised 3D Semantic Instance Segmentation
AAAI 2026
Leveraging Visual Blur Perception Characteristics for EEG Decoding
AAAI 2026
ROSE: A Reward-Oriented Data Selection Framework for LLM Task-Specific Instruction Tuning
EMNLP 2025
Affordances-Oriented Planning Using Foundation Models for Continuous Vision-Language Navigation
AAAI 2025
RoboTron-Drive: All-in-One Large Multimodal Model for Autonomous Driving
ICCV 2025
RoboTron-Nav: A Unified Framework for Embodied Navigation Integrating Perception, Planning, and Prediction
ICCV 2025
RoboTron-Mani: All-in-One Multimodal Large Model for Robotic Manipulation
ICCV 2025
MLLM-Tool: A Multimodal Large Language Model for Tool Agent Learning
WACV 2025
SSPNet: Leveraging Robust Medication Recommendation with History and Knowledge
IJCAI 2025
Learning Dynamical Coupled Operator For High-dimensional Black-box Partial Differential Equations
IJCAI 2025
MCF-Spouse: A Multi-Label Causal Feature Selection Method with Optimal Spouses Discovery
IJCAI 2025
TimeStacker: A Novel Framework with Multilevel Observation for Capturing Nonstationary Patterns in Time Series Forecasting
ICML 2025
CO-MOT: Boosting End-to-end Transformer-based Multi-Object Tracking via Coopetition Label Assignment and Shadow Sets
ICLR 2025
DisTime: Distribution-based Time Representation for Video Large Language Models
ICCV 2025
RoboTron-Sim: Improving Real-World Driving via Simulated Hard-Case
ICCV 2025
Towards Efficient Foundation Model for Zero-shot Amodal Segmentation
CVPR 2025
InstaGen: Enhancing Object Detection by Training on Synthetic Dataset
CVPR 2024
LESS: Label-Efficient and Single-Stage Referring 3D Segmentation
NIPS 2024
EEGPT: Pretrained Transformer for Universal and Reliable Representation of EEG Signals
NIPS 2024
Splatter a Video: Video Gaussian Representation for Versatile Processing
NIPS 2024
Lumen: Unleashing Versatile Vision-Centric Capabilities of Large Multimodal Models
NIPS 2024
Aux-NAS: Exploiting Auxiliary Labels with Negligibly Extra Inference Cost
ICLR 2024
Instance-Aware Multi-Camera 3D Object Detection with Structural Priors Mining and Self-Boosting Learning
AAAI 2024
ColNeRF: Collaboration for Generalizable Sparse Input Neural Radiance Field
AAAI 2024
Tables as Texts or Images: Evaluating the Table Reasoning Ability of LLMs and MLLMs
ACL 2024
A Multimodal In-Context Tuning Approach for E-Commerce Product Description Generation
COLING 2024
3D Weakly Supervised Semantic Segmentation with 2D Vision-Language Guidance
ECCV 2024
UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection
ECCV 2024
Making Large Language Models Better Planners with Reasoning-Decision Alignment
ECCV 2024
Misalignment-Robust Frequency Distribution Loss for Image Transformation
CVPR 2024
AlignSAM: Aligning Segment Anything Model to Open Context via Reinforcement Learning
CVPR 2024
MSMDFusion: Fusing LiDAR and Camera at Multiple Scales With Multi-Depth Seeds for 3D Object Detection
CVPR 2023
Tri-MipRF: Tri-Mip Representation for Efficient Anti-Aliasing Neural Radiance Fields
ICCV 2023
Open-Vocabulary Semantic Segmentation with Decoupled One-Pass Network
ICCV 2023
E2E-LOAD: End-to-End Long-form Online Action Detection
ICCV 2023
A Multi-Modal Context Reasoning Approach for Conditional Inference on Joint Textual and Visual Clues
ACL 2023
A Neural Divide-and-Conquer Reasoning Framework for Image Retrieval from Linguistically Complex Text
ACL 2023
AeDet: Azimuth-Invariant Multi-View 3D Object Detection
CVPR 2023
Adaptive Sparse Pairwise Loss for Object Re-Identification
CVPR 2023
Curriculum Multi-Negative Augmentation for Debiased Video Grounding
AAAI 2023
Punctuation-level Attack: Single-shot and Single Punctuation Can Fool Text Models
NIPS 2023
TriDet: Temporal Action Detection With Relative Boundary Modeling
CVPR 2023
ReAct: Temporal Action Detection with Relational Queries
ECCV 2022
Expansion and Shrinkage of Localization for Weakly-Supervised Semantic Segmentation
NIPS 2022
Visual Consensus Modeling for Video-Text Retrieval
AAAI 2022
Explore Inter-contrast between Videos via Composition for Weakly Supervised Temporal Sentence Grounding
AAAI 2022
Contrastive Video-Language Learning with Fine-grained Frame Sampling
AACL 2022
PromptDet: Towards Open-Vocabulary Detection Using Uncurated Images
ECCV 2022
MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes
ECCV 2022
Contrastive Video-Language Learning with Fine-grained Frame Sampling
IJCNLP 2022
Similarity Reasoning and Filtration for Image-Text Matching
AAAI 2021
Relation-aware Instance Refinement for Weakly Supervised Visual Grounding
CVPR 2021
Context-Gated Convolution
ECCV 2020
Recurrent Nested Model for Sequence Generation
AAAI 2020
Consensus-Aware Visual-Semantic Embedding for Image-Text Matching
ECCV 2020
Temporally Grounding Language Queries in Videos by Contextual Boundary-Aware Prediction
AAAI 2020
Beyond Monocular Deraining: Stereo Image Deraining via Semantic Understanding
ECCV 2020
Fine-Grained Image-to-Image Transformation Towards Visual Recognition
CVPR 2020
Feature Deformation Meta-Networks in Image Captioning of Novel Objects
AAAI 2020
Cops-Ref: A New Dataset and Task on Compositional Referring Expression Comprehension
CVPR 2020
Deblurring by Realistic Blurring
CVPR 2020
Hierarchical Photo-Scene Encoder for Album Storytelling
AAAI 2019
Exploiting Local and Global Structure for Point Cloud Semantic Segmentation with Contextual Point Representations
NIPS 2019
Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos
NIPS 2019
Image Deformation Meta-Networks for One-Shot Learning
CVPR 2019
Learning Joint Gait Representation via Quintuplet Loss Minimization
CVPR 2019
Unsupervised Image Captioning
CVPR 2019
Multi-Granularity Generator for Temporal Action Proposal
CVPR 2019
Controllable Video Captioning With POS Sequence Guidance Based on Gated Fusion Network
ICCV 2019
Liquid Warping GAN: A Unified Framework for Human Motion Imitation, Appearance Transfer and Novel View Synthesis
ICCV 2019
Spatio-Temporal Video Re-Localization by Warp LSTM
CVPR 2019
Weakly-Supervised Spatio-Temporally Grounding Natural Sentence in Video
ACL 2019
Hallucinating Optical Flow Features for Video Classification
IJCAI 2019
Position Focused Attention Network for Image-Text Matching
IJCAI 2019
Cousin Network Guided Sketch Recognition via Latent Attribute Warehouse
AAAI 2019
Localizing Natural Language in Videos
AAAI 2019
Gated Fusion Network for Single Image Dehazing
CVPR 2018
Regularizing RNNs for Caption Generation by Reconstructing the Past With the Present
CVPR 2018
Reconstruction Network for Video Captioning
CVPR 2018
Bidirectional Attentive Fusion With Context Gating for Dense Video Captioning
CVPR 2018
Learning to Generate Time-Lapse Videos Using Multi-Stage Dynamic Generative Adversarial Networks
CVPR 2018
Parsimonious Quantile Regression of Financial Asset Tail Dynamics via Sequential Learning
NIPS 2018
Safe Element Screening for Submodular Function Minimization
ICML 2018
Temporally Grounding Natural Sentence in Video
EMNLP 2018
Image-level to Pixel-wise Labeling: From Theory to Practice
IJCAI 2018
Long-Term Human Motion Prediction by Modeling Motion Context and Enhancing Motion Dynamics
IJCAI 2018
Recurrent Fusion Network for Image Captioning
ECCV 2018
Video Re-localization
ECCV 2018
Deep Non-Blind Deconvolution via Generalized Low-Rank Approximation
NIPS 2018
Neural Stereoscopic Image Style Transfer
ECCV 2018
Unsupervised Image-to-Image Translation with Stacked Cycle-Consistent Adversarial Networks
ECCV 2018
Real-Time Neural Style Transfer for Videos
CVPR 2017
Local Subspace Collaborative Tracking
ICCV 2015
Multiple Feature Fusion via Weighted Entropy for Visual Tracking
ICCV 2015
Multimodal Convolutional Neural Networks for Matching Image and Sentence
ICCV 2015