New v3.2.0 - Local Voice Cloning & Guided Settings!

From Script to Video: Integrated AI Video Creation

Stay in VS Code and transform your transcripts into professional demo videos. Speechify 3.2.0 combines Azure Speech, local CosyVoice and Qwen3-TTS voice cloning, and guided configuration to automate alignment, edit timing, and generate synced audio-visual content in one workflow.

200+ Voices
60+ Languages
2 UI Languages

World-Class Multi-language Support

Experience 200+ natural voices powered by Azure

Speechify Voice Demos
🇺🇸
English
Azure AI Voice
🇨🇳
Chinese
Azure AI Voice
🇯🇵
Japanese
Azure AI Voice
🇪🇸
Spanish
Azure AI Voice
🇫🇷
French
Azure AI Voice
High-Quality Audio
60+ Languages
Azure + Local Backends

Next-Gen Video & Speech Features

The complete toolkit for AI-driven demo video production and high-quality synthesis

Vision-Aware Alignment

AI automatically analyzes your screen recordings to identify key moments and perfectly syncs your voiceover with visual changes.

Interactive Timeline Editor

Fine-tune your audio-visual sync with a built-in drag-and-drop editor. Achieve millisecond precision without leaving VS Code.

One-Click Video Muxing

Automatically combine processed video, AI speech, and precisely timed subtitles (SRT) into a ready-to-share MP4 file.
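
For readers unfamiliar with the SRT subtitle format mentioned above, here is a minimal Python sketch of how timed cues can be serialized to SRT text. The Cue class and both function names are hypothetical and written for illustration only; this is not Speechify's own code.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One subtitle entry (hypothetical structure for this sketch)."""
    start_ms: int  # cue start, in milliseconds
    end_ms: int    # cue end, in milliseconds
    text: str


def format_timestamp(ms: int) -> str:
    """Render milliseconds as an SRT timestamp: HH:MM:SS,mmm."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(cues: list[Cue]) -> str:
    """Serialize cues as SRT blocks: index, time range, text, blank line."""
    blocks = []
    for index, cue in enumerate(cues, start=1):
        time_range = f"{format_timestamp(cue.start_ms)} --> {format_timestamp(cue.end_ms)}"
        blocks.append(f"{index}\n{time_range}\n{cue.text}\n")
    return "\n".join(blocks)
```

Each block carries a 1-based index, a start and end time separated by " --> ", and the subtitle text, which is why millisecond-level timing from the timeline can be written out without loss.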

Smart Script Refinement

AI helps adjust your script's length and pacing based on the actual video clip duration for a natural-sounding narration.
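
To make the idea of fitting narration to clip length concrete, the Python sketch below estimates how long a script takes to speak and compares it with the clip duration. The 150 words-per-minute rate, the 10% tolerance, and both function names are assumptions chosen for illustration; Speechify's actual refinement is AI-driven and is not shown here. Whitespace word counting also does not suit languages such as Chinese.

```python
def estimate_narration_seconds(text: str, words_per_minute: float = 150.0) -> float:
    """Estimate speaking time from a whitespace word count (assumed rate)."""
    words = len(text.split())
    return words * 60.0 / words_per_minute


def pacing_advice(
    text: str,
    clip_seconds: float,
    words_per_minute: float = 150.0,
    tolerance: float = 0.1,
) -> str:
    """Return 'shorten', 'lengthen', or 'ok' for a script and a clip length."""
    estimated = estimate_narration_seconds(text, words_per_minute)
    if estimated > clip_seconds * (1 + tolerance):
        return "shorten"   # narration would overrun the clip
    if estimated < clip_seconds * (1 - tolerance):
        return "lengthen"  # narration would leave noticeable silence
    return "ok"
```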

Azure + Local Voice Cloning

Choose between Azure neural voices, a fully local CosyVoice backend, or local Qwen3-TTS + MLX-Audio for reference-audio voice cloning directly inside VS Code.

Native i18n Workflow

Optimized for multilingual creators with built-in English/Chinese UI and intelligent language detection for scripts.

Why Speechify?

How we compare to traditional video editors

Script-as-Code

Stop fighting with timelines and keyframes. Your video is defined by your JSON script. Update the text, and the entire video updates. Perfect for agile technical content.
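
One way to picture a script-driven video is as a list of text segments where an edit only affects the segments whose text changed. The Python sketch below assumes a hypothetical JSON shape with a "segments" array of "id" and "text" fields; Speechify's real script schema is not described on this page and may differ.

```python
import json


def changed_segments(old_script: str, new_script: str) -> list[str]:
    """Return ids of segments that are new or whose text differs.

    Both arguments are JSON strings in a hypothetical shape:
    {"segments": [{"id": "...", "text": "..."}, ...]}
    """
    old = {seg["id"]: seg["text"] for seg in json.loads(old_script)["segments"]}
    new = {seg["id"]: seg["text"] for seg in json.loads(new_script)["segments"]}
    # Only these segments would need their narration regenerated.
    return [seg_id for seg_id, text in new.items() if old.get(seg_id) != text]
```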

AI-Native Workflow

Instead of manual drag-and-drop, our Vision AI "watches" your recording to automatically align your narrative with visual changes. It's automated precision.

Zero Context Switching

Stay in the tool you love. From writing documentation to producing the final demo video, everything happens within VS Code. No heavy video editors required.

Copilot Multilingual

Seamlessly integrate with GitHub Copilot to translate scripts. Generate localized versions of the same video content for global markets in minutes.

Designed For Efficiency

Speechify 3.2.0 scales from simple audio conversion to complex video production

App Demo Production

Record your app, write a script, and let AI handle the heavy lifting of syncing your explanation with UI interactions.

Tutorial Creation

Generate high-quality video tutorials with precise subtitles and professional voiceovers directly from your documentation.

Social Content

Fast-track your social media presence by generating voiced-over clips and product demos with diverse AI personas.

Accessible Docs

Convert complex technical documentation into audio-visual guides that make learning more inclusive and engaging.

Code Walkthroughs

Explain your architecture or new features through voiced demos that highlight specific code blocks as you speak.

Script-Driven Workflows

Build transcript-based video production pipelines that allow you to iterate on your content as fast as you write your documentation.

Quick Installation

Choose your speech backend and get started quickly inside VS Code

1. Install from VS Code Marketplace

Search for "Speechify" in VS Code Extensions (Ctrl+Shift+X) or click the install button above.

2. Choose a Speech Backend

Open Speechify settings in VS Code and choose Azure Speech, CosyVoice (Local), or Qwen3-TTS + MLX-Audio (Local) as your generation backend.

3. Generate Voiceover

Select text, then use the context menu or the Command Palette (Ctrl+Shift+P) to run Azure: Generate Voiceover, Local CosyVoice: Generate Voiceover, or Local Qwen3-TTS: Generate Voiceover.

Local Voice Cloning Workflows

Voice cloning, reference recording, and local transcription without leaving VS Code

1. Choose the local engine

CosyVoice runs through a local FastAPI backend at speechify.cosyVoice.baseUrl (default: http://127.0.0.1:50000).

Qwen3-TTS runs directly through mlx-audio and a local Python environment configured by speechify.qwenTts.pythonPath, so no long-running server is required.
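
As an illustration of the CosyVoice setting described above, the Python sketch below reads speechify.cosyVoice.baseUrl from a dictionary of settings and falls back to the stated default. The function name and the validation rules are assumptions for illustration, not part of Speechify.

```python
from urllib.parse import urlsplit

# Default taken from the text above.
DEFAULT_COSYVOICE_BASE_URL = "http://127.0.0.1:50000"


def resolve_base_url(settings: dict) -> str:
    """Return the CosyVoice base URL, or the default when it is unset."""
    raw = settings.get("speechify.cosyVoice.baseUrl") or DEFAULT_COSYVOICE_BASE_URL
    parts = urlsplit(raw)
    # Reject values that are not an http(s) URL with a host.
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"Not a valid HTTP base URL: {raw!r}")
    return raw.rstrip("/")
```

A check like this surfaces a mistyped address before any synthesis request is sent to the local backend.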

2. Set a reference voice

Use Local CosyVoice: Set Reference Voice or Local Qwen3-TTS: Set Reference Voice to record a sample in VS Code or select an existing audio or video file as reference media.

When you choose a video, Speechify extracts the audio automatically and can open workspace settings so you can review and edit the reference transcript.
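
For context on what extracting audio from a video involves, the Python sketch below builds an ffmpeg command that writes a mono WAV file next to the source video. That Speechify uses ffmpeg, the 16 kHz sample rate, and the function name are all assumptions for illustration; the page does not say which tool performs the extraction.

```python
from pathlib import Path


def build_extract_command(video_path: str, sample_rate: int = 16000) -> list[str]:
    """Return an ffmpeg argument list that extracts mono WAV audio."""
    source = Path(video_path)
    target = source.with_suffix(".wav")  # same name, .wav extension
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", str(source),        # input video
        "-vn",                    # drop the video stream
        "-ac", "1",               # downmix to a single channel
        "-ar", str(sample_rate),  # resample to the assumed rate
        str(target),
    ]
```

The resulting list can be passed to subprocess.run on a machine where ffmpeg is installed.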

3. Generate cloned speech

Right-click your text and choose Local CosyVoice: Generate Voiceover or Local Qwen3-TTS: Generate Voiceover. Speechify uses the configured reference audio for cloning and saves the generated result into your project workflow.

Reference text is optional, but recommended. CosyVoice and Qwen3-TTS both produce better cloning quality when the reference audio also has a matching transcript.

Local setup notes

Reference-media transcription runs locally. On macOS, Speechify tries Whisper MLX first for faster Apple Silicon inference, then falls back to a local Python Whisper runtime when MLX is unavailable.
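
The fallback order described above can be sketched in Python as a preference list that is walked until an installed runtime is found. The module names mlx_whisper and whisper are assumptions about which Python packages back each runtime, and both function names are hypothetical.

```python
import importlib.util
import platform
from typing import Optional


def transcriber_order(system: str) -> list[str]:
    """Return candidate modules in preference order (names are assumptions)."""
    if system == "Darwin":  # platform.system() reports "Darwin" on macOS
        return ["mlx_whisper", "whisper"]
    return ["whisper"]


def first_available(candidates: list[str]) -> Optional[str]:
    """Return the first importable top-level module name, or None."""
    for name in candidates:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


if __name__ == "__main__":
    print(first_available(transcriber_order(platform.system())))
```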

In your workspace settings, make sure you configure the local backend address or Python path, the reference audio path, and the reference transcript for the provider you actually use.
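
A simple pre-flight check along these lines can be sketched in Python. The field names below are placeholders, not real Speechify setting keys (the actual keys are listed in the project README), and the transcript is treated as recommended rather than required, as noted earlier.

```python
# Placeholder field names for illustration; see the README for real keys.
REQUIRED = {
    "cosyvoice": ["backend_address", "reference_audio_path"],
    "qwen3-tts": ["python_path", "reference_audio_path"],
}
RECOMMENDED = ["reference_transcript"]


def check_provider_settings(provider: str, settings: dict) -> tuple[list[str], list[str]]:
    """Return (missing required fields, missing recommended fields)."""
    if provider not in REQUIRED:
        raise ValueError(f"Unknown provider: {provider!r}")
    missing = [key for key in REQUIRED[provider] if not settings.get(key)]
    advisories = [key for key in RECOMMENDED if not settings.get(key)]
    return missing, advisories
```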

The full list of CosyVoice and Qwen3-TTS configuration keys, what each one does, and complete settings.json examples are documented in the project README.


Support & Community

Get help and contribute to the Speechify project

Report Issues

Found a bug or have a feature request? Create an issue on our GitHub repository and we'll help you resolve it quickly.


Documentation

Comprehensive guides covering Azure Speech, local CosyVoice and Qwen3-TTS setup, reference-media configuration, and development, available in multiple languages.


Contribute

Help improve Speechify by contributing code, translations, documentation, or reporting bugs and suggesting new features.
