Agent-evals – Claude skill to build your own evals
Claude Skill for agent evals, but LangSmith and Arize already own this.
Research automation skills for Claude Code. Idea to submittable paper in one session.
A useful wrapper around Claude Code skills, but the five-minutes-per-paper claim is marketing hyperbole.
ML researchers and PhD students
Elicit · Consensus · Scite
So I built R-stack, a Claude Code setup tailor-made for ML and other computational research.
You just give it your idea and R-stack will:
• check novelty
• refine the research idea
• write code for the experiments
• run them on cloud GPUs
• generate the paper
Human time investment = 5 mins per project + 30 sec initial setup.
Just clone the repo and run the setup script.
Then open Claude Code and type: /research {your idea}
Binary JSON with table reuse, but CBOR and MessagePack already own this space.
Rust-powered BeautifulSoup with 10x speed and full API compatibility.
Naur's 1985 theory applied to AI agents, but it's just a prompt template.
Multi-wave code review with 20+ specialists reading each other's findings before final analysis.
Code-reformatting skill to read AI output faster, but narrow scope and unproven impact.