GitHub Repository

Free & open source AI video platform — Clip Generator, AI Shorts (UGC with AI actors) & YouTube Studio. Self-hosted, no watermarks.

2,342 stars · JavaScript

Open-source AI SaaS to turn long videos into viral shorts

by mutonbini·Mar 8, 2026·1 point·0 comments

AI Analysis

●●● Banger · Wizardry · Solve My Problem

Vertical reframing with MediaPipe+YOLOv8 face tracking beats naive cropping; Opus Clip exists but this is free/open.

Strengths
  • The dual-mode cropping strategy (TRACK for a single subject with stabilization, GENERAL for groups) shows thoughtful scene analysis, not just naive centering.
  • The Heavy Tripod stabilization engine specifically addresses the jitter problem in face-tracked reframing: this is the hard part, done right.
  • Full stack included: transcription, moment detection (Gemini), cropping, dubbing, S3 backup, direct social posting. Single command from URL to TikTok-ready clip.
Weaknesses
  • Opus Clip (paid, closed) and shorter-form tools (Captions.ai) already solve the core use case; being open source doesn't guarantee adoption.
  • Accuracy of viral moment detection depends entirely on Gemini 2.0 Flash's understanding of platform trends—unproven at scale.
Target Audience

Content creators, YouTube channel owners, social media automation builders.

Similar To

Opus Clip · Captions.ai · Runway

Post Description

Hey HN,

I built OpenShorts, an open-source tool that takes a long YouTube video (or local file) and automatically generates vertical short clips ready for TikTok, Instagram Reels, and YouTube Shorts.

How it works:

1. Transcribes the video using faster-whisper (CPU-optimized, word-level timestamps)
2. Sends the transcript to Gemini 2.0 Flash, which identifies the 3–15 most "viral-worthy" moments (15–60s each)
3. FFmpeg extracts the clips precisely
4. AI-powered vertical reframing with two modes:
   - TRACK mode: MediaPipe face detection + YOLOv8 fallback with stabilization ("Heavy Tripod" engine) for single-subject scenes
   - GENERAL mode: Blurred background layout for groups/landscapes
5. Optional: AI subtitles, hook text overlays, voice dubbing (ElevenLabs, 30+ languages), and direct social posting
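To make steps 2 and 3 concrete, here is a rough sketch of how candidate moments might be filtered and turned into FFmpeg cut commands. This is not taken from the OpenShorts code: the `Moment` type, its `score` field, the overlap rule, and both function names are assumptions made for illustration. The only details drawn from the post are the 15–60s clip length and the upper bound of 15 clips.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Moment:
    start: float  # seconds from the beginning of the source video
    end: float    # seconds from the beginning of the source video
    score: float  # hypothetical "virality" score returned by the model


def select_moments(candidates, min_len=15.0, max_len=60.0, max_clips=15):
    """Keep clips within the length bounds, best score first, no overlaps."""
    valid = [m for m in candidates if min_len <= m.end - m.start <= max_len]
    chosen = []
    for m in sorted(valid, key=lambda m: m.score, reverse=True):
        # Two intervals overlap when each one starts before the other ends.
        overlaps = any(m.start < c.end and c.start < m.end for c in chosen)
        if not overlaps:
            chosen.append(m)
        if len(chosen) == max_clips:
            break
    # Return the surviving clips in playback order.
    return sorted(chosen, key=lambda m: m.start)


def ffmpeg_cut_command(src, dst, moment):
    """Build an ffmpeg argument list that re-encodes one clip.

    Seeking with -ss before -i and re-encoding (rather than stream-copying)
    keeps the cut frame-accurate instead of snapping to a keyframe.
    """
    duration = moment.end - moment.start
    return [
        "ffmpeg", "-y",
        "-ss", f"{moment.start:.3f}",
        "-i", src,
        "-t", f"{duration:.3f}",
        "-c:v", "libx264", "-c:a", "aac",
        dst,
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. The sketch does not enforce the lower bound of 3 clips; a real pipeline would have to decide what to do when fewer valid moments come back.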

The reframing engine was the hardest part. Naive face tracking produces jittery, unwatchable output. I built a SmoothedCameraman class with safe-zone logic and a SpeakerTracker that prevents rapid switching between detected faces. The system pre-scans every scene to decide TRACK vs. GENERAL before processing.
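For readers unfamiliar with the idea, a minimal sketch of safe-zone smoothing and anti-flicker speaker selection might look like the following. It is not the project's SmoothedCameraman or SpeakerTracker: the class names `DeadZoneCamera` and `StickySpeakerPicker`, the exponential smoothing, and all parameter values are assumptions chosen to illustrate the general technique.

```python
class DeadZoneCamera:
    """Crop-window center that ignores small subject movements."""

    def __init__(self, start_x, safe_zone=80.0, smoothing=0.15):
        self.x = float(start_x)       # current crop center, in pixels
        self.target = float(start_x)  # where the camera is heading
        self.safe_zone = safe_zone    # half-width of the no-move band, in pixels
        self.smoothing = smoothing    # fraction of the remaining gap closed per frame

    def update(self, face_x):
        # Retarget only when the face leaves the safe zone around the target.
        if abs(face_x - self.target) > self.safe_zone:
            self.target = float(face_x)
        # Ease toward the target instead of jumping (exponential smoothing).
        self.x += self.smoothing * (self.target - self.x)
        return self.x


class StickySpeakerPicker:
    """Switch the followed face only after a rival wins several frames in a row."""

    def __init__(self, hold_frames=12):
        self.hold_frames = hold_frames
        self.current = None    # face id currently being followed
        self.rival = None      # face id challenging the current one
        self.rival_streak = 0  # consecutive frames the rival has won

    def update(self, best_face_id):
        if self.current is None:
            self.current = best_face_id
        elif best_face_id == self.current:
            # The current face won this frame, so any challenge is reset.
            self.rival, self.rival_streak = None, 0
        else:
            if best_face_id == self.rival:
                self.rival_streak += 1
            else:
                self.rival, self.rival_streak = best_face_id, 1
            if self.rival_streak >= self.hold_frames:
                self.current = best_face_id
                self.rival, self.rival_streak = None, 0
        return self.current
```

Per frame, the picker would be fed the id of the most prominent detected face, and the camera would be fed that face's horizontal position. A wider safe zone and a lower smoothing factor give a steadier but slower-reacting crop.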

Stack: Python/FastAPI backend, React/Vite dashboard, Docker Compose for one-command setup. All API keys (Gemini, ElevenLabs) stay client-side, encrypted in localStorage — never stored on the server.

Try it: git clone ... && docker compose up --build

Then open localhost:5173 and paste a Gemini API key and a YouTube URL.

MIT licensed. Feedback and PRs welcome.

Similar Projects