How AI Lip Sync Works, and How to Make a Lip-Synced Music Video

How does AI lip sync work?

AI lip sync analyzes the sounds in an audio track, maps each sound to the matching mouth shape, then redraws a character's lips frame by frame so the mouth moves in time with the voice. For indie and vocal artists, that means a singing performance video without a camera, a crew, or a studio. In Atlabs, the Music Video workflow builds a lip-synced performance video straight from your uploaded track, and the standalone Lip Sync app syncs any existing clip to an audio file. Here is how the process works, and how to make one yourself.


AI lip sync reads your audio and drives the mouth movement in the video, so the two stay locked together.

How AI lip sync works under the hood

Every vocal is a stream of sounds, and each sound is made with a particular mouth shape. Speech researchers call those visible shapes visemes, and several sounds can share one: p, b, and m all start from closed lips. AI lip sync listens to your track, breaks the vocal into its individual sounds across time, and matches each one to the viseme a real mouth would form. It then renders the character's lips and jaw frame by frame so that each movement lands on the exact moment its sound plays.
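To make the sound-to-shape idea concrete, here is a toy Python sketch. It is not how Atlabs or Kling 3.0 work internally, and many modern models learn the mapping from audio to mouth movement directly without an explicit table. It assumes the vocal has already been split into timed phonemes (for example by a forced aligner) using ARPAbet-style labels; the `PHONEME_TO_VISEME` table, the `Phoneme` type, and the `visemes_per_frame` helper are hypothetical simplifications.

```python
from dataclasses import dataclass

# Toy phoneme-to-viseme table (hypothetical; real systems use richer sets
# or learn the mapping without a table). P, B and M share one shape.
PHONEME_TO_VISEME = {
    "P": "closed", "B": "closed", "M": "closed",
    "F": "lip_teeth", "V": "lip_teeth",
    "AA": "open", "AE": "open", "AH": "open",
    "IY": "wide", "EH": "wide",
    "UW": "round", "OW": "round", "W": "round",
    "SIL": "rest",
}


@dataclass
class Phoneme:
    symbol: str   # phoneme label, assumed to come from a forced aligner
    start: float  # start time in seconds
    end: float    # end time in seconds


def visemes_per_frame(phonemes: list[Phoneme], fps: int = 24) -> list[str]:
    """Pick the mouth shape to draw on each video frame."""
    if not phonemes:
        return []
    n_frames = round(max(p.end for p in phonemes) * fps)
    frames = []
    for i in range(n_frames):
        t = (i + 0.5) / fps  # sample the audio timeline at the frame's midpoint
        shape = "rest"       # default when no phoneme covers this instant
        for p in phonemes:
            if p.start <= t < p.end:
                shape = PHONEME_TO_VISEME.get(p.symbol, "rest")
                break
        frames.append(shape)
    return frames


if __name__ == "__main__":
    # "ma" sung with a held vowel, then a breath.
    sung = [Phoneme("M", 0.0, 0.1), Phoneme("AA", 0.1, 0.5), Phoneme("SIL", 0.5, 0.6)]
    print(visemes_per_frame(sung, fps=10))
    # ['closed', 'open', 'open', 'open', 'open', 'rest']
```

In the "ma" example the lips close for one frame and then stay open for as long as the vowel is held, because the frame shapes come from the audio timings rather than from the written lyric.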

Because the system follows the audio itself rather than a typed transcript, the timing tracks your phrasing, your held notes, and your breaths. On Atlabs, models such as Kling 3.0 carry the motion and realism, so the mouth movement reads as a live performance instead of a flat overlay. The result is a singing artist whose lips match the recording you uploaded, shot after shot.

What you'll need

You'll need an Atlabs account, your finished track as an mp3 up to 200MB or a Suno music URL, a rough idea of the look you want, and about 10 to 15 minutes. No camera, no studio, no editing rig. The Music Video workflow handles the mouth movement, the scene generation, and the cuts. If you already have a clip of your artist and only want the mouth matched to a vocal, the Lip Sync app does that on its own in a single step.
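If you want to catch upload problems before opening the workflow, a small local pre-flight check can help. The Python sketch below runs only on your own machine and is not an Atlabs API; `check_music_upload` is a hypothetical helper. It assumes "200MB" means 200,000,000 bytes, the stricter of the two common readings, since the exact byte limit isn't stated.

```python
import os

# Assumption: "200MB" read as 200,000,000 bytes (the stricter of the two
# common readings); the exact byte limit is not stated in the article.
MAX_MP3_BYTES = 200 * 1000 * 1000


def check_music_upload(path: str) -> list[str]:
    """Hypothetical local pre-flight check for a Music Video upload.

    Returns a list of problems; an empty list means the file looks ready.
    """
    problems = []
    if not path.lower().endswith(".mp3"):
        problems.append("Music Video expects an .mp3 file (or a Suno music URL).")
    if not os.path.isfile(path):
        problems.append("File not found.")
    elif os.path.getsize(path) > MAX_MP3_BYTES:
        problems.append("File is larger than 200MB; export a smaller mp3.")
    return problems
```

Calling `check_music_upload("my_single.mp3")` on an existing file under the limit returns an empty list; anything else comes back as a short list of reasons to fix before uploading.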

How to make a lip-synced performance video in Atlabs

The Music Video workflow runs in five steps. The Performance video type is the one that produces a lip-synced singing video, so this walkthrough uses that path from start to finish.

1. Upload your track. Open the Music Video workflow, and on the Add Music screen upload an mp3 up to 200MB or paste a Suno music URL, then click EXTRACT MUSIC. Atlabs reads the track and auto-detects its properties.


Step 1: upload your track or paste a Suno URL, then click EXTRACT MUSIC.

2. Pick your segment and choose Performance. In the "Pick the best part of your track" modal, drag the green selection window across the waveform to set the section you want, usually up to about 25 seconds. Then choose the Performance video type, which builds a lip-synced performance video of the artist rather than a narrative scene.


Step 2: drag the green window to pick your hook, then select Performance.

3. Set your style. On the Set Style step, pick an Aspect Ratio: 9:16 for TikTok and Instagram, 16:9 for YouTube, or 1:1 for square feeds. Keep Video Style on AI Video, and choose a Visual Style such as Realistic for a live performance look that suits most indie tracks.


Step 3: choose your aspect ratio and a Realistic visual style for a live-performance feel.

4. Choose a concept. Atlabs shows 6 scene concepts generated from your track's tempo, mood, and genre. Pick the card that fits your song, or click "+ DESCRIBE YOUR CONCEPT" to write your own direction for the shoot.


Step 4: pick one of the 6 generated concepts or describe your own.

5. Cast your performer. On the Cast step, define your Characters so your artist's face stays consistent across every shot. Each character card shows a reference sheet with multiple angles, and the "Click to edit" overlay opens the character editor. Confirm your cast, generate, and the workflow renders the performance with lips synced to your vocal.


Step 5: cast your performer so the same face carries across every shot.
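For step 2, if you have noted where your hook starts (say, from your DAW's timeline), placing the green window is simple arithmetic. This hypothetical `segment_window` helper is only a planning aid and assumes a 25-second cap; the actual limit is whatever the "Pick the best part of your track" modal enforces.

```python
# Assumption: a 25-second cap, matching "usually up to about 25 seconds".
MAX_SEGMENT_SECONDS = 25.0


def segment_window(hook_start: float, track_length: float,
                   length: float = MAX_SEGMENT_SECONDS) -> tuple[float, float]:
    """Hypothetical planning helper: return (start, end) in seconds for a window
    of at most MAX_SEGMENT_SECONDS that starts at the hook and stays in the track."""
    if track_length <= 0:
        raise ValueError("track_length must be positive")
    length = min(length, MAX_SEGMENT_SECONDS, track_length)
    # Clamp the start so the window never runs past either end of the track.
    start = min(max(hook_start, 0.0), track_length - length)
    return start, start + length


if __name__ == "__main__":
    # A chorus starting at 1:02 in a 3:30 track.
    print(segment_window(62.0, 210.0))  # (62.0, 87.0)
```

When the hook sits near the end of the song, the helper slides the window back so it still covers a full 25 seconds instead of running off the end.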

To sync a clip you already have instead of generating a new one, open the Lip Sync app, upload your image or video together with an audio file from 2 to 120 seconds, and it matches the mouth movement to the voice in one pass.
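The 2 to 120 second range is easy to verify before uploading. The article doesn't say which audio formats the Lip Sync app accepts, so this sketch assumes a WAV stem because Python's built-in `wave` module can read its length; an mp3 would need a third-party library. The helper names are hypothetical.

```python
import wave

# Audio length range for the Lip Sync app, as stated in the article.
MIN_SECONDS, MAX_SECONDS = 2.0, 120.0


def wav_duration(path: str) -> float:
    """Length of a WAV file in seconds, read with the standard library."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def fits_lip_sync(duration_seconds: float) -> bool:
    """True if the audio length is inside the 2 to 120 second range."""
    return MIN_SECONDS <= duration_seconds <= MAX_SECONDS
```

For example, `fits_lip_sync(wav_duration("vocal_stem.wav"))` tells you whether a stem needs trimming before it goes into the app.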


A lip-synced performance video gives an indie release a face on camera without ever filming one.

Tips for better lip sync

Start with a clean vocal. The clearer the voice sits in your mix, the more precise the timing, so a track with a prominent or isolated vocal syncs tighter than a dense wall of sound. Pick the Realistic visual style for a live feel, and reach for a stylized style only when your indie aesthetic calls for it. Frame for the platform first: choose 9:16 when the clip is heading to Reels or TikTok, so the face fills the frame and the lip movement reads clearly. Keep your segment short, because a 20 to 25 second hook syncs and renders faster than a full song and works better as a teaser for a new indie release.
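When you prepare a reference image or an existing clip in the same shape as your output, it helps to know the pixel dimensions each ratio implies. This sketch assumes a 1080-pixel short side, a common delivery size; the article doesn't state the resolution Atlabs renders at, and `frame_size` is a hypothetical helper.

```python
# Aspect ratios named in step 3, keyed by where the clip is headed.
ASPECT_RATIOS = {
    "TikTok / Instagram": (9, 16),
    "YouTube": (16, 9),
    "Square feeds": (1, 1),
}


def frame_size(ratio: tuple[int, int], short_side: int = 1080) -> tuple[int, int]:
    """Return (width, height) in pixels for a ratio, given the shorter edge.

    Assumption: a 1080 px short side; integer division rounds down for ratios
    that don't divide evenly.
    """
    w, h = ratio
    if w <= h:  # portrait or square: width is the short side
        return short_side, short_side * h // w
    return short_side * w // h, short_side  # landscape: height is the short side


if __name__ == "__main__":
    for platform, ratio in ASPECT_RATIOS.items():
        print(platform, ratio, frame_size(ratio))
```

A 9:16 frame at this size is 1080 by 1920 pixels, which is why the face fills so much more of the screen on Reels and TikTok than in a 16:9 frame of the same height budget.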

Example prompts to try

Use this concept prompt on the Concepts step for a moody indie performance look:

Moody indie performance in a dim warehouse, one warm key light on the artist, slow handheld push-in, shallow depth of field, soft film grain, cuts landing on the beat. The artist sings directly to camera through the chorus.

Use this prompt in the Lip Sync app when you already have a portrait clip of your artist:

Sync this close-up portrait clip to the uploaded vocal stem. Keep the mouth movement natural, add subtle head motion, and hold eye contact with the camera throughout the take.

FAQ

How long does it take to make a lip-synced music video?

Most indie artists get a 20 to 25 second performance clip in about 10 to 15 minutes. Uploading the track and setting up the five steps takes a couple of minutes, and the generation runs once you confirm the Cast step.

What audio files does AI lip sync support?

The Music Video workflow accepts an mp3 up to 200MB or a Suno music URL. The standalone Lip Sync app takes an audio file from 2 to 120 seconds paired with an image or a video.

Do I need editing or animation skills?

No. The workflow handles the mouth timing, the scene generation, and the shot cuts for you. You upload a track, pick a segment, choose a style and a concept, and cast your performer.

Does AI lip sync work for languages other than English?

Yes. Because the sync follows the sounds in your audio rather than a typed transcript, it tracks your vocal in many languages, so the same workflow covers non-English indie tracks.

Get started

When your track is ready, open the Music Video workflow to build a lip-synced performance video, or use the Lip Sync app to match a clip you already have to your vocal.

Ready to tell your story?