A^3-Bench: Benchmarking Memory-Driven Scientific Reasoning via Anchor and Attractor Activation
Abstract
Scientific reasoning relies not only on logical inference but also on activating prior knowledge and experiential structures. Memory enables efficient reuse of knowledge and enhances the consistency and stability of reasoning. However, existing benchmarks mainly evaluate final answers or step-by-step coherence, overlooking the memory-driven mechanisms that underlie human reasoning, which involve activating anchors and attractors and then integrating them into multi-step inference. To address this gap, we propose A^3-Bench (https://a3-bench.github.io), a benchmark designed to evaluate scientific reasoning through dual-scale memory-driven activation, grounded in Anchor and Attractor Activation. First, we annotate 2,198 science reasoning problems across domains using the SAPM process (Subject, Anchor & attractor, Problem, and Memory developing). Second, we introduce a dual-scale memory evaluation framework built on anchors and attractors, along with the AAUI (Anchor-Attractor Utilization Index) metric to measure memory activation rates. Finally, through experiments with various base models and paradigms, we validate A^3-Bench and analyze how memory activation impacts reasoning performance, providing insights into memory-driven scientific reasoning.
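To make the abstract's two components concrete, below is a minimal, hypothetical Python sketch of an SAPM-annotated item and an AAUI-style score. The abstract does not give the metric's formula, so the `SAPMItem` schema, the substring-matching activation test, and the equal anchor/attractor weighting are illustrative assumptions, not the benchmark's actual definitions.

```python
# Hypothetical sketch of a dual-scale, AAUI-style computation. Assumes the
# metric is a weighted mean of two activation rates: the share of annotated
# anchors and the share of annotated attractors that surface in a model's
# reasoning trace. The real AAUI may use soft matching and different weights.
from dataclasses import dataclass, field

@dataclass
class SAPMItem:
    """One benchmark problem annotated with the SAPM fields (assumed schema)."""
    subject: str                                          # S: scientific domain
    anchors: list[str] = field(default_factory=list)      # A: concrete knowledge cues
    attractors: list[str] = field(default_factory=list)   # A: broader experiential structures
    problem: str = ""                                     # P: the reasoning question
    memory: str = ""                                      # M: reference memory-development trace

def activation_rate(cues: list[str], trace: str) -> float:
    """Fraction of annotated cues that appear (case-insensitively) in the trace."""
    if not cues:
        return 0.0
    trace_lower = trace.lower()
    hits = sum(cue.lower() in trace_lower for cue in cues)
    return hits / len(cues)

def aaui(item: SAPMItem, reasoning_trace: str, weight: float = 0.5) -> float:
    """Assumed dual-scale score: weighted mean of anchor- and attractor-level rates."""
    anchor_rate = activation_rate(item.anchors, reasoning_trace)
    attractor_rate = activation_rate(item.attractors, reasoning_trace)
    return weight * anchor_rate + (1.0 - weight) * attractor_rate

# Toy example: one of two anchors and the single attractor appear in the trace.
item = SAPMItem(
    subject="physics",
    anchors=["conservation of momentum", "elastic collision"],
    attractors=["symmetry"],
    problem="Two carts collide elastically on a frictionless track...",
)
trace = "By conservation of momentum and the symmetry of the setup, ..."
print(f"AAUI = {aaui(item, trace):.2f}")  # 0.75 = 0.5 * (1/2) + 0.5 * (1/1)
```

In this toy run the anchor-level rate is 0.5 and the attractor-level rate is 1.0, yielding an AAUI of 0.75; the benchmark's actual metric presumably relies on a more robust matching criterion than literal substrings.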
Community
I've created a podcast to explain the key concepts behind this paper:
https://researchpod-share.vercel.app/episode/53c50aba-2e51-422c-a452-d7ef7ac63ead
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- ReasonTabQA: A Comprehensive Benchmark for Table Question Answering from Real World Industrial Scenarios (2026)
- TRACE: A Framework for Analyzing and Enhancing Stepwise Reasoning in Vision-Language Models (2025)
- Agentic Learner with Grow-and-Refine Multimodal Semantic Memory (2025)
- LIR^3AG: A Lightweight Rerank Reasoning Strategy Framework for Retrieval-Augmented Generation (2025)
- CogMem: A Cognitive Memory Architecture for Sustained Multi-Turn Reasoning in Large Language Models (2025)
- From Chains to Graphs: Self-Structured Reasoning for General-Domain LLMs (2026)
- From Hypothesis to Premises: LLM-based Backward Logical Reasoning with Selective Symbolic Translation (2025)
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend