Papers
EgoGen: An Egocentric Synthetic Data Generator
Gen Li, Kaifeng Zhao, Siwei Zhang et al.
EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models
Sijie Cheng, Zhicheng Guo, Jingwen Wu et al.
E-GPS: Explainable Geometry Problem Solving via Top-Down Solver and Bottom-Up Generator
Wenjun Wu, Lingling Zhang, Jun Liu et al.
EGTR: Extracting Graph from Transformer for Scene Graph Generation
Jinbae Im, JeongYeon Nam, Nokyung Park et al.
ElasticDiffusion: Training-free Arbitrary Size Image Generation through Global-Local Content Separation
Moayed Haji-Ali, Guha Balakrishnan, Vicente Ordonez
EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture Modeling
Haiyang Liu, Zihao Zhu, Giorgio Becherini et al.
Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld
Yijun Yang, Tianyi Zhou, Kanxue Li et al.
EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
Tai Wang, Xiaohan Mao, Chenming Zhu et al.
Embracing Unimodal Aleatoric Uncertainty for Robust Multimodal Fusion
Zixian Gao, Xun Jiang, Xing Xu et al.
EMCAD: Efficient Multi-scale Convolutional Attention Decoding for Medical Image Segmentation
Md Mostafijur Rahman, Mustafa Munir, Radu Marculescu
Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf Vision-Language Models
Jiayun Luo, Siddhesh Khandelwal, Leonid Sigal et al.
EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models
Jingyuan Yang, Jiawei Feng, Hui Huang
EMOPortraits: Emotion-enhanced Multimodal One-shot Head Avatars
Nikita Drobyshev, Antoni Bigata Casademunt, Konstantinos Vougioukas et al.
Emotional Speech-driven 3D Body Animation via Disentangled Latent Diffusion
Kiran Chhatre, Radek Daněček, Nikos Athanasiou et al.
EmoVIT: Revolutionizing Emotion Insights with Visual Instruction Tuning
Hongxia Xie, Chu-Jun Peng, Yu-Wen Tseng et al.
Empowering Resampling Operation for Ultra-High-Definition Image Enhancement with Model-Aware Guidance
Wei Yu, Jie Huang, Bing Li et al.
Emu Edit: Precise Image Editing via Recognition and Generation Tasks
Shelly Sheynin, Adam Polyak, Uriel Singer et al.
En3D: An Enhanced Generative Model for Sculpting 3D Humans from 2D Synthetic Data
Yifang Men, Biwen Lei, Yuan Yao et al.
Endow SAM with Keen Eyes: Temporal-spatial Prompt Learning for Video Camouflaged Object Detection
Wenjun Hui, Zhenfeng Zhu, Shuai Zheng et al.
End-to-End Spatio-Temporal Action Localisation with Video Transformers
Alexey A. Gritsenko, Xuehan Xiong, Josip Djolonga et al.
End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames
Shuming Liu, Chen-Lin Zhang, Chen Zhao et al.
Enhanced Motion-Text Alignment for Image-to-Video Transfer Learning
Wei Zhang, Chaoqun Wan, Tongliang Liu et al.
Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model
Zhicai Wang, Longhui Wei, Tan Wang et al.
Enhancing 3D Fidelity of Text-to-3D using Cross-View Correspondences
Seungwook Kim, Kejie Li, Xueqing Deng et al.