News

2026.03: ResearchClawBench is released: an end-to-end benchmark with 40 tasks across 10 domains, evaluating whether AI coding agents can independently conduct research, from Re-Discovery to New-Discovery.
2026.01: Four papers were accepted at ICLR 2026, including two first-author papers: Eigen-1 and EarthSE 🎉.
2026.01: Introducing Iris, a desktop GUI Agent that is simple to start but powerful enough to execute any workflow you need.
2025.12: Our large-scale benchmark SGI-Bench is released 👏. The accompanying report runs to over 150 pages and is co-authored by more than 100 researchers, providing the most extensive evaluation to date of LLMs and Agents on deep research, idea generation, code generation, multimodal reasoning, and more. SGI-Bench offers a unified and rigorous framework for measuring AI systems’ automated research capabilities, marking a major milestone toward building truly automated research agents.
2025.10: Our new paper on multi-agent reasoning, Eigen-1: Adaptive Multi-Agent Refinement with Monitor-Based RAG for Scientific Reasoning, reached 360,000 views on Bilibili!
2025.10: Our new paper on multi-agent reasoning, Eigen-1: Adaptive Multi-Agent Refinement with Monitor-Based RAG for Scientific Reasoning, achieves a score above 60% on the Humanity’s Last Exam (HLE) benchmark, establishing a new state of the art.
2025.09: Two papers were accepted at NeurIPS 2025.
2025.08: During the first year of my PhD, my Google Scholar citations exceeded 100 🎉.
2025.06: InfGen: A Resolution-Agnostic Paradigm for Scalable Image Synthesis was accepted at ICCV 2025.
2025.06: Our website PrismaX, an evaluation-driven platform for AI scientific discovery, is launched 🎉.
2024.09: Generalizing Weather Forecast to Fine-grained Temporal Scales via Physics-AI Hybrid Modeling (first author) was accepted at NeurIPS 2024.
2024.05: CasCast: Skillful High-resolution Precipitation Nowcasting via Cascaded Modelling was accepted at ICML 2024.

Wanghan Xu (徐望瀚)
