#1 · 2025 – 2026 · most recent
Faultline
ML training continuity & crash-to-resume
Full-stack platform for ML engineers who lose GPU time to crashes and preemptions. It comprises a Python SDK on PyPI (faultline-sdk), a FastAPI cloud API, a Next.js dashboard, and Postgres plus object storage, deployed across Vercel, Render, Neon, and Cloudflare R2. The same repo includes a Rust persistence runtime (async checkpoint queue, gRPC) wired to the SDK for local training and benchmarks.




