Paper Lantern – improving Autoresearch with research knowledge

by paperlantern·Apr 21, 2026·2 points·2 comments

AI Analysis

●●● Banger · Big Brain · Dark Horse

Coding agents miss research knowledge; this surfaces 2M+ papers with benchmarks.

Strengths
  • Concrete 3.2% val loss improvement on autoresearch benchmarks
  • MCP server distribution integrates directly with existing coding agents
  • Surfaces implementation details, hyperparameters, and failure modes
Weaknesses
  • Narrow focus on CS research limits broader applicability
  • Requires MCP client setup that adds friction to adoption
Target Audience

ML engineers running autoresearch workflows

Similar To

Elicit · Semantic Scholar · Consensus

Post Description

Hi, we've been working on Paper Lantern, an MCP server that searches 2M+ CS research papers for coding agents. The coding agent describes its problem, and Paper Lantern returns ranked techniques with implementation steps, hyperparameters, and failure modes.

We tested it on Karpathy's autoresearch framework, where the task is to find better LLM architectures and training configs. In autoresearch, the agent proposes an optimization, runs a 5-minute training run, and measures the val loss. It keeps the change if the val loss went down and discards it if the val loss went up.
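
The keep/discard loop described above can be sketched roughly as follows. This is a minimal illustration, not the autoresearch code: `keep_or_discard_loop`, `toy_val_loss`, and `toy_propose` are hypothetical names, and the quadratic "val loss" is a toy stand-in for a real 5-minute training run.

```python
import random
from typing import Callable, Dict, Tuple

Config = Dict[str, float]


def keep_or_discard_loop(
    start: Config,
    propose: Callable[[Config], Config],
    val_loss: Callable[[Config], float],
    steps: int,
) -> Tuple[Config, float]:
    """Greedy search: accept a proposal only if it lowers validation loss."""
    best_cfg = dict(start)
    best_loss = val_loss(best_cfg)
    for _ in range(steps):
        candidate = propose(best_cfg)
        loss = val_loss(candidate)  # stands in for one short training run
        if loss < best_loss:
            # Keep: the candidate becomes the new baseline.
            best_cfg, best_loss = candidate, loss
        # Otherwise discard and propose again from the current best.
    return best_cfg, best_loss


# Toy stand-ins (assumptions for illustration only).
rng = random.Random(0)


def toy_val_loss(cfg: Config) -> float:
    # Pretend the best learning rate is 0.03.
    return (cfg["lr"] - 0.03) ** 2 + 1.0


def toy_propose(cfg: Config) -> Config:
    # Randomly shrink or grow the learning rate; returns a new dict.
    return {**cfg, "lr": cfg["lr"] * rng.uniform(0.5, 1.5)}


start_cfg = {"lr": 0.1}
final_cfg, final_loss = keep_or_discard_loop(
    start_cfg, toy_propose, toy_val_loss, steps=20
)
print(final_cfg, final_loss)
```

Because a change is only kept when the loss improves, the final loss can never be worse than the starting one. The quality of the result therefore depends mostly on what the agent proposes, which is the part the experiment varies.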

We compared a strong baseline agent (Opus 4.6 + web search) vs that same agent + Paper Lantern.

- agent + Paper Lantern iterated to a config that got a much lower val loss on 5-min runs

- we trained the two final configs for 2 hours: the config from Paper Lantern got a 3.2% lower val loss

Two concrete examples:

1. Both agents tried halving the batch size. The Paper Lantern agent pulled a 2022 paper and scaled the learning rate by 1/sqrt(2), as the paper prescribed. It worked, and further halving kept working. The web-search agent made the same batch change, got a worse loss, and moved on without revisiting the learning rate.

2. The Paper Lantern agent also implemented AdaGC (adaptive gradient clipping, arxiv 2502.11034, published Feb 2025), which worked on the first try with no tuning. The baseline agent did not try it at all.
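
For the first example, the 1/sqrt(2) factor is what a square-root scaling rule gives when the batch size is halved. A minimal sketch, assuming that rule; `scale_lr_sqrt` is a hypothetical helper, and the base learning rate and batch sizes are made-up values, not taken from the experiment:

```python
import math


def scale_lr_sqrt(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Scale the learning rate by sqrt(new_batch / base_batch)."""
    if base_batch <= 0 or new_batch <= 0:
        raise ValueError("batch sizes must be positive")
    return base_lr * math.sqrt(new_batch / base_batch)


# Hypothetical starting point: lr = 1e-3 at batch size 64.
lr_64 = 1e-3
lr_32 = scale_lr_sqrt(lr_64, 64, 32)  # one halving: lr * 1/sqrt(2)
lr_16 = scale_lr_sqrt(lr_32, 32, 16)  # second halving: lr * 1/2 overall

print(lr_32 / lr_64)  # ratio after one halving
print(lr_16 / lr_64)  # ratio after two halvings
```

Each halving multiplies the learning rate by the same factor, so repeated halvings compound: two halvings leave the learning rate at half its original value.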

If you want to deep-dive:

- (code) https://github.com/paperlantern-ai/autoresearch-experiment

- (blog) https://www.paperlantern.ai/blog/autoresearch

If you want to try Paper Lantern yourself:

- Quick setup: `npx paperlantern@latest`
