replaces Videotto, OpusClip, Vizard, Spikes

Upload a video.
Get 4 viral clips.
Free. In your browser.

Whisper transcribes it. Llama picks the best moments. FFmpeg renders them in 9:16. Your video never leaves your machine; the extracted audio and its transcript are the only things we send anywhere.

1
Pick a video
Under 5 minutes works best on the free tier. MP4 / WebM / MOV.
📼
Drop a video here
or
2
Analyze
Extract audio → transcribe → pick clips. Takes 30–90s.
1 Extract audio
2 Transcribe
3 Pick clips
4 Ready
3
Your clips
Click export on any card — renders 9:16 in your browser. No upload.
🎬
No clips yet
Upload a video and hit Analyze.

How this costs $0

OpusClip is a $30M company. Videotto is a clone of it. This is the same stack, rearranged so your machine does most of the work.

Browser
Audio extraction

The Web Audio API decodes your video's audio track, which is then downmixed to 16 kHz mono and packed into a WAV file. This happens in your tab. No upload.
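The WAV-packing step can be sketched as a plain function. A minimal sketch, assuming a `Float32Array` of samples already decoded and downmixed via `decodeAudioData` / `OfflineAudioContext`; the `encodeWav` helper is ours, not a browser API:

```typescript
// Encode 16 kHz mono Float32 samples as a 16-bit PCM WAV file.
// (Hypothetical helper; decoding and downmixing happen before this step.)
function encodeWav(samples: Float32Array, sampleRate = 16000): Uint8Array {
  const dataSize = samples.length * 2; // 16-bit = 2 bytes per sample
  const buf = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buf);
  const writeStr = (off: number, s: string) => {
    for (let i = 0; i < s.length; i++) view.setUint8(off + i, s.charCodeAt(i));
  };
  writeStr(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // RIFF chunk size
  writeStr(8, "WAVE");
  writeStr(12, "fmt ");
  view.setUint32(16, 16, true);           // fmt chunk size
  view.setUint16(20, 1, true);            // PCM
  view.setUint16(22, 1, true);            // mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * 2, true); // byte rate
  view.setUint16(32, 2, true);            // block align
  view.setUint16(34, 16, true);           // bits per sample
  writeStr(36, "data");
  view.setUint32(40, dataSize, true);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return new Uint8Array(buf);
}
```

A 44-byte header plus raw PCM is all Whisper needs, which is why this stays small enough to ship from the tab.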

Cloudflare free tier
Whisper transcription

Just the audio (not the video) gets sent to Workers AI's Whisper endpoint. Word-level timestamps come back.
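On the way back, the word timestamps get flattened into tuples the clip picker can use. A sketch, with the caveat that the response shape here is our assumption about the payload, not a documented contract:

```typescript
// Assumed shape of the word-timestamp payload from the Whisper endpoint.
interface WhisperWord { word: string; start: number; end: number }
interface WhisperResponse { text: string; words?: WhisperWord[] }

// Flatten the response into (word, start, end) tuples,
// dropping any malformed entries.
function extractWords(resp: WhisperResponse): WhisperWord[] {
  return (resp.words ?? []).filter(
    (w) => typeof w.word === "string" && w.start >= 0 && w.end >= w.start
  );
}
```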

Cloudflare free tier
Clip selection

The transcript is sent to Llama 3.3 70B with a prompt: "return the 4 most engaging 15–55s segments as JSON."
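LLM output can be malformed or out of bounds even when you ask for JSON, so the reply has to be validated before anything gets rendered. A minimal sketch of that check, mirroring the prompt's constraints (4 clips, 15–55 s each); the `validateClips` helper and `Clip` shape are our names for illustration:

```typescript
interface Clip { start: number; end: number; title?: string }

// Keep at most 4 well-formed clips that fit the prompt's bounds
// and the video's actual duration.
function validateClips(raw: unknown, videoDuration: number): Clip[] {
  if (!Array.isArray(raw)) return [];
  const clips: Clip[] = [];
  for (const c of raw as Array<Partial<Clip>>) {
    if (typeof c?.start !== "number" || typeof c?.end !== "number") continue;
    const len = c.end - c.start;
    if (len < 15 || len > 55 || c.start < 0 || c.end > videoDuration) continue;
    clips.push({ start: c.start, end: c.end, title: c.title });
    if (clips.length === 4) break;
  }
  return clips;
}
```

Anything the model hallucinates outside the video, or a segment that is too short to stand alone, simply gets dropped.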

Browser
Rendering

FFmpeg compiled to WebAssembly runs inside your browser. It cuts, crops to 9:16, and encodes. Your video never touches our server. The paid clones upload your file to their backend, run FFmpeg on a GPU they rent, and charge you for it.
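The render for one clip boils down to an FFmpeg argument list: seek, cut, center-crop to 9:16, re-encode. A sketch of how that list could be built before handing it to ffmpeg.wasm's exec call; the helper name and surrounding load/exec plumbing are assumptions, only the flag semantics are standard FFmpeg:

```typescript
// Build FFmpeg args for one clip. crop's x/y default to center,
// so only the output geometry needs computing.
function buildClipArgs(
  input: string, output: string,
  start: number, end: number,
  width: number, height: number,
): string[] {
  // Widest even crop width that keeps a 9:16 portrait frame.
  const cropW = Math.min(width, Math.round((height * 9) / 16 / 2) * 2);
  return [
    "-ss", String(start),            // seek to clip start (seconds)
    "-to", String(end),              // stop at clip end
    "-i", input,
    "-vf", `crop=${cropW}:${height}`,
    "-c:v", "libx264", "-preset", "ultrafast", // favor speed in WASM
    "-c:a", "aac",
    output,
  ];
}
```

For a 1920×1080 source this yields a `crop=608:1080` filter: a centered portrait slice of the original frame, no scaling needed.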

FAQ

Does the video leave my browser?

No. Only the extracted audio (a few MB) goes to Whisper. Only the transcript text goes to Llama. The video file itself is never uploaded.

Why is it free?

The heavy parts (transcription, rendering) run on free tiers (Cloudflare Workers AI) and your machine. There's no paid GPU rental because there's no server-side render.

Is it as good as OpusClip?

No. OpusClip has animated captions, better viral-moment scoring trained on their own engagement data, and face-tracking that reframes on the speaker. This has none of that. But for a raw "give me the 4 best 30-second moments" request, it's shockingly close.

Why 5 minutes max?

Workers AI has a payload limit on Whisper. At 16 kHz mono, that caps audio at roughly 5 minutes. Longer videos would need chunked transcription: doable, not done yet.
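The chunking that would lift the cap is mostly bookkeeping: split the 16 kHz sample buffer into pieces under the limit, transcribe each, then shift every word's timestamps by its chunk's offset. A sketch under those assumptions (all names hypothetical):

```typescript
const SAMPLE_RATE = 16000;
const CHUNK_SECONDS = 5 * 60; // stay under the per-request payload cap

// Split the decoded sample buffer into fixed-length chunks,
// remembering where each one starts in the original timeline.
function chunkSamples(samples: Float32Array, chunkSeconds = CHUNK_SECONDS) {
  const size = SAMPLE_RATE * chunkSeconds;
  const chunks: { offset: number; samples: Float32Array }[] = [];
  for (let i = 0; i < samples.length; i += size) {
    chunks.push({ offset: i / SAMPLE_RATE, samples: samples.subarray(i, i + size) });
  }
  return chunks;
}

// After each chunk's Whisper call, re-base its word timestamps
// so the stitched transcript uses whole-video time.
function shiftWords(
  words: { word: string; start: number; end: number }[],
  offset: number,
) {
  return words.map((w) => ({ ...w, start: w.start + offset, end: w.end + offset }));
}
```

Concatenate the shifted word lists in chunk order and the rest of the pipeline doesn't need to know the audio was ever split.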

Rendering is slow.

FFmpeg in WASM is single-threaded and your browser isn't built for this. A 30-second clip takes roughly 30–90 seconds to render. Trade-off for $0.