Profine – Automated profiling and code rewrites for ML training loops
3.11x speedup on minGPT with automated LLM-suggested rewrites.
Profine automatically profiles and optimizes PyTorch training jobs on real GPUs, delivering measurable speedups and lower GPU costs so teams do not spend days tuning configs by hand.
Automates the painful torch.compile and mixed-precision tuning loop, with a measured 3.11x speedup on minGPT.
ML engineers and researchers optimizing PyTorch training
PyTorch Profiler · TensorBoard · CodeTuner
Automated optimization for PyTorch training loops, delivering roughly 3x speedups before you waste cloud credits.
Custom CPU kernels for sparse training when everyone else chases GPU.
GPU-vectorized PPO arena with thousands of agents, but emergent behavior research remains niche.
QuillBot alternative that builds a style profile from your past writing samples.
Xcode GPU frame debugging for WebGL/WebGPU—finally fills the native profiler gap.