Sostactic – polynomial inequalities using sums-of-squares in Lean
Beats Lean's `nlinarith` on nonlinear inequalities using Python-backed SOS decompositions.
A python decorator that autoformalizes and formally verifies python code using Lean 4 and LLMs
Formal verification via Python decorator—Lean proofs generated by LLMs on the fly.
Python developers interested in formal verification
Lean 4 · Coq · Dafny
I have recently been playing around with AI and formal methods and built Provepy. It's an experiment to see if we can use frontier LLMs to bridge the gap between everyday Python code and formal verification in Lean.
The idea is to make formal methods more accessible by burying the complex Lean syntax behind a standard Python decorator. You add @provable to your function, give it a plain English claim, and run your code.
from provepy import provable
@provable(claim="This function returns the sum of its inputs") def add(a: int, b: int) -> int: return a + b
Under the hood, when the script runs, Provepy grabs the function and your claim, and passes them to an LLM (defaults to Gemma, but supports OpenRouter or any custom OpenAI-compatible endpoint). The LLM attempts to translate the claim into a Lean 4 theorem and generate the proof. Using Frontier models is recommended for best results.Generation of the theorem, code and proof is done separately to avoid the LLM proving the wrong claim or function just to succeed in the proof.
If Lean compiles and accepts the proof, your Python function executes normally. If the proof fails, your program halts with a VerificationError.
You can also pass context to the decorator if your function relies on other Python functions in your codebase. I also added a fallback mechanism—you can configure a smaller, cheaper model to try first, and if it fails to generate a valid proof, it automatically retries with a heavier model.
To be clear: this is very experimental alpha software.
It currently only works reliably on simple functions.
There's a lot left to do, like improving how context is pulled from the AST and resolving name clashes with Lean's mathlib.
Contributions are most welcome!
I'd love to hear your thoughts.
Beats Lean's `nlinarith` on nonlinear inequalities using Python-backed SOS decompositions.
Drop @ui_enabled on a function and you get an instant Flask-backed web form with inputs inferred from type hints, a sidebar for global variables, and a searchable collapsible JSON result viewer. It's a deliberate tiny alternative to Streamlit/Gradio — no React or npm, just vanilla HTML/JS with grouping and theming; excellent for converting one-off scripts into a control panel, but expect limited auth, validation, and scaling features.
LLM-generated Lean 4 theorems that Mathlib actually proves — genuinely novel architecture.
LLM generates Lean 4 theorems from your code—formal verification without the PhD.
Spring-style config injection for Python when pydantic-settings already handles this.
Solid Lean tutorial, but implementing insertion sort proofs is a standard exercise in the field.