Seamless: content-addressed computation and caching
Content-addressed caching for Python and shell with checksum-based result sharing.
Wrap Python functions and shell commands as content-addressed transformations. Cache results, run them locally or on a cluster, and share them by checksum.
Checksum-based computation identity beats Make and DVC for reproducible pipelines.
Data scientists and research engineers building reproducible pipelines
Nix · DVC · Make
It started as a hobby project — I had an itch about programming not being at-your-fingertips enough. Then I applied it to my work as a bioinformatics research engineer. The early versions focused on interactive workflows. After a year or two I realized that to make interactivity work properly, you need really good DAG tracking, so checksums were added everywhere. My lab built a collaborative web server with it that we published. More recently I've rebuilt it around the command line, persistent caching, and remote deployment.
It's still in alpha, but the core is usable.
Core idea: same code + same inputs = same result, identified by checksum. If you've already computed it, you don't compute it again.
Two entry points:
Python:
from seamless.transformer import direct
@direct def add(a, b): import time time.sleep(5) return a + b
add(2, 3) # runs, caches result add(2, 3) # cache hit — instant
Bash:seamless-run 'seq 1 10 | tac && sleep 5' # runs, caches result seamless-run 'seq 1 10 | tac && sleep 5' # cache hit — instant
With persistent caching enabled, results are stored as checksum-to-checksum mappings in a small SQLite database that can be shared with collaborators, so that they get cache hits too.Execution scales by changing config, not code: in-process, spawned workers, or a Dask-backed HPC cluster.
Remote execution also doubles as a reproducibility test. If your code produces the same result on a clean worker, it's reproducible. If not, Seamless helped you find the problem — whether it's a missing dependency, an undeclared input, or a platform sensitivity.
Built for scientific computing and data pipelines, but works for anything pipeline-shaped.
Content-addressed caching for Python and shell with checksum-based result sharing.
Tool result caching for agents when GPTCache and LangChain already do semantic caching.
SIEVE cache beats LRU with one-line swap, but only matters if you're bottlenecked on cache.
Byte-stable prefix organization beats naive message concatenation for cache hits.
Caches bloated MCP responses and lets agents query with jq, saving real tokens.
Scheduled AI agents with per-run cost tracking beat generic chat wrappers.