Meta-analysis is a systematic research methodology that synthesizes data from multiple existing studies to derive comprehensive conclusions. This approach not only mitigates limitations inherent in individual studies but also facilitates novel discoveries through integrated data analysis. Traditional meta-analysis involves a complex multi-stage pipeline, including literature retrieval, paper screening, and data extraction, which demands substantial human effort and time. While LLM-based methods can accelerate certain stages, they still face significant challenges, such as hallucinations in paper screening and data extraction. In this paper, we propose a multi-agent system, Manalyzer, which achieves end-to-end automated meta-analysis through tool calls. The hybrid review, hierarchical extraction, self-proving, and feedback checking strategies implemented in Manalyzer significantly alleviate both types of hallucination. To comprehensively evaluate performance on meta-analysis, we construct a new benchmark comprising 729 papers across 3 domains, encompassing text, image, and table modalities, with over 10,000 data points. Extensive experiments demonstrate that Manalyzer achieves significant performance improvements over the LLM baselines on multiple meta-analysis tasks.
(a) Manual meta-analysis is inherently time-consuming. This traditional approach relies heavily on human effort at every step, from paper screening to data extraction and synthesis, making it a lengthy and resource-intensive process. (b) In contrast, LLM-based methods offer some automation but are often limited to specific steps, and consequently fail to achieve true end-to-end automation. A significant drawback of these approaches is their propensity for hallucinations during critical stages such as paper screening and data extraction, which can compromise the reliability of the analysis. (c) Our proposed system, Manalyzer, addresses these limitations directly: it provides end-to-end automation for meta-analysis, and its workflow design incorporates mechanisms that significantly reduce hallucinations, thereby enhancing the accuracy and reliability of the entire meta-analysis process.
Manalyzer is a multi-agent system incorporating tool calling and feedback mechanisms, enabling end-to-end automated meta-analysis in real scientific research scenarios. We divide the meta-analysis process into three stages. The first stage receives user input, searches for and downloads papers, and then screens them to retain the relevant and valuable ones. The second stage extracts data from these selected papers and integrates it into tables. The third stage analyzes the integrated data and outputs the final meta-analysis report.
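The three-stage workflow above can be sketched as a simple pipeline. Note that all class names, function names, and data fields below are hypothetical illustrations for exposition; they do not reflect Manalyzer's actual implementation or API.

```python
from dataclasses import dataclass, field

@dataclass
class Paper:
    """Toy stand-in for a downloaded paper (hypothetical structure)."""
    title: str
    relevant: bool = True          # screening decision
    data: dict = field(default_factory=dict)  # extracted data points

def stage1_retrieve_and_screen(query, corpus):
    """Stage 1: search/download papers, then screen for relevant ones."""
    # A real system would query literature databases; here we filter a toy corpus.
    return [p for p in corpus if p.relevant]

def stage2_extract(papers):
    """Stage 2: extract data from each selected paper into one table (list of rows)."""
    return [{"paper": p.title, **p.data} for p in papers]

def stage3_analyze(table, metric):
    """Stage 3: synthesize the pooled table, e.g. an unweighted mean effect size."""
    values = [row[metric] for row in table if metric in row]
    return sum(values) / len(values) if values else None

# Usage with toy data
corpus = [
    Paper("A", relevant=True,  data={"effect": 0.4}),
    Paper("B", relevant=False, data={"effect": 0.9}),
    Paper("C", relevant=True,  data={"effect": 0.6}),
]
screened = stage1_retrieve_and_screen("example query", corpus)
table = stage2_extract(screened)
pooled = stage3_analyze(table, "effect")  # 0.5
```

In the real system each stage would be driven by agents with tool calls and feedback checking rather than the deterministic toy functions shown here.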
To comprehensively and objectively evaluate the performance of Manalyzer and LLM baselines on meta-analysis, we introduce the first meta-analysis benchmark derived from real-world, large-scale scientific papers. The benchmark includes 729 papers with 10,000+ data points across three fields, and assesses a model's ability to extract research-relevant data from multimodal content (tables, images, text) and consolidate it into structured tables.
* This work was primarily conducted during the author's internship at the Shanghai Artificial Intelligence Laboratory.
† Corresponding author.