Papers
MicroEvoEval: A Systematic Evaluation Framework for Image-Based Microstructure Evolution Prediction
AAAI 2026
MULTIBENCH++: A Unified and Comprehensive Multimodal Fusion Benchmarking Across Specialized Domains
AAAI 2026
Complex Mathematical Expression Recognition: Benchmark, Large-Scale Dataset and Strong Baseline
AAAI 2026
UQ-Bench: A Benchmark for Evaluating Multimodal LLMs on Underwater Image Quality Assessment
AAAI 2026
TIME: Temporal-Sensitive Multi-Dimensional Instruction Tuning and Robust Benchmarking for Video-LLMs
AAAI 2026
VP-Bench: A Comprehensive Benchmark for Visual Prompting in Multimodal Large Language Models
AAAI 2026
BOP-Distrib: Revisiting 6D Pose Estimation Benchmarks for Better Evaluation under Visual Ambiguities
WACV 2026
HumanBench: Two Heads, No Legs, But Mostly Human, the State of Generative Capabilities in T2I Models
WACV 2026
When Can We Trust LLMs in Mental Health? Large-Scale Benchmarks for Reliable LLM Evaluation
EACL 2026