AI Development · 2026

Build a Real-Time Translation App with Gemini Live API, LiveKit and Cloud Run in 2026

One speaker, many languages, all in real time. Here is how to build a live voice translation app on the Gemini Live API and LiveKit, and ship it on Google Cloud Run.

Distk Editorial · June 2026 · 13 min read

This 2026 build guide walks through a real-time voice translation app where one host speaks and listeners worldwide hear translated audio in their own language. LiveKit handles ultra-low-latency WebRTC audio, the Gemini Live API does the live translation with natural voice, and Google Cloud Run hosts it serverlessly. You will set up the LiveKit Room, spawn a Translation Bridge bot per language, tune audio to 100ms chunks for stability, containerize with Docker, and deploy with one gcloud command. For brands, this is the blueprint behind multilingual webinars, launches and global livestreams.

What Are You Building in 2026?

You are building a production-ready web app where a speaker broadcasts in one language and listeners anywhere choose a language and hear real-time translated audio with natural vocal delivery. The system uses WebRTC for ultra-low-latency delivery, so the translation feels live rather than like a delayed dub. Until recently, this kind of capability required expensive human interpreters and dedicated hardware.

The stack is deliberately lean: Next.js for the frontend, LiveKit Cloud for WebRTC audio transport, the Gemini Live API as the translation engine, and Google Cloud Run for serverless deployment, with Google Secret Manager holding the keys. This guide follows the official Google AI reference build and adds the marketing context that makes it worth your time in 2026.

How Does the Architecture Work?

The architecture centers on a single LiveKit Room. The host publishes their voice into the room. When a listener joins and selects a language, the backend spins up a dedicated Translation Bridge worker for that language. The worker joins the room as a bot, subscribes to the host audio, streams it to Gemini, and publishes the translated audio back for the listener to hear.

The end-to-end flow

  1. The host streams vocal audio into the LiveKit Room
  2. A listener joins and selects a target language
  3. The backend spawns a Translation Bridge worker thread for that language
  4. The worker joins LiveKit as a bot and subscribes to the host audio track
  5. The worker forwards raw PCM audio frames over WebSocket to the Gemini Live API
  6. Gemini returns translated audio, and the bot publishes it back to the room for playback
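
The spawning logic in steps 2 to 4 can be sketched as a small registry that keeps at most one bridge per language. This is a minimal illustration, not the reference build's code: `TranslationBridge` and `BridgeRegistry` are hypothetical names, and the sketch assumes listeners of the same language share one bot. The LiveKit and Gemini calls are left out and marked in comments.

```typescript
// Hypothetical sketch: one translation bridge per target language.
// TranslationBridge and BridgeRegistry are illustrative names, not LiveKit or Gemini APIs.
interface TranslationBridge {
  language: string;
  listeners: number;
}

class BridgeRegistry {
  private bridges = new Map<string, TranslationBridge>();

  // Called when a listener picks a language (steps 2 and 3).
  // Reuses an existing bridge so each language has at most one bot.
  join(language: string): TranslationBridge {
    let bridge = this.bridges.get(language);
    if (!bridge) {
      // A real bridge would join the room as a bot, subscribe to the host
      // track, and open the Gemini session here (steps 4 and 5).
      bridge = { language, listeners: 0 };
      this.bridges.set(language, bridge);
    }
    bridge.listeners += 1;
    return bridge;
  }

  // Called when a listener leaves; removes the bridge once nobody needs it.
  leave(language: string): void {
    const bridge = this.bridges.get(language);
    if (!bridge) return;
    bridge.listeners -= 1;
    if (bridge.listeners <= 0) this.bridges.delete(language);
  }

  activeLanguages(): string[] {
    return Array.from(this.bridges.keys());
  }
}

const registry = new BridgeRegistry();
registry.join("es");
registry.join("es"); // second Spanish listener reuses the same bridge
registry.join("hi");
registry.leave("hi"); // last Hindi listener leaves, so that bridge is removed
```

Keeping the registry in one process is also why the deployment later pins the service to a single instance: two instances would each hold their own map and could spawn duplicate bots.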

What Do You Need Before You Start?

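Before building the translation app you need four things in place: Node.js with npm, a LiveKit Cloud project (or a local LiveKit dev server run through Docker), a Gemini API key, and a Google Cloud project with the gcloud CLI. None of them requires a paid tier to begin, since LiveKit and Google Cloud both offer free starting points.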

How Do You Run It Locally?

You run the translation app locally by installing dependencies, optionally starting a local LiveKit server, setting environment variables, and launching the dev server. This local loop lets you test the full speaker-to-listener path in two browser tabs before you touch the cloud.

Install and optional local LiveKit

npm install

# optional: run a local LiveKit dev server
docker run --rm -p 7880:7880 -p 7881:7881 -p 7882:7882/udp \
  -e LIVEKIT_KEYS="devkey: secret" \
  livekit/livekit:latest \
  --dev

Configure environment variables

Create a .env.local file with your LiveKit and Gemini credentials:

LIVEKIT_API_KEY=devkey
LIVEKIT_API_SECRET=secret
NEXT_PUBLIC_LIVEKIT_URL=ws://localhost:7880
LIVEKIT_URL=ws://localhost:7880
GEMINI_API_KEY=your-gemini-api-key-here
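
A missing or blank variable tends to surface later as a confusing connection error, so it can help to fail fast at startup. The helper below is a hypothetical sketch (`missingEnv` is not part of the reference build); the variable names are the ones listed above.

```typescript
// Names mirror the .env.local keys above; the helper itself is hypothetical.
const REQUIRED_ENV = [
  "LIVEKIT_API_KEY",
  "LIVEKIT_API_SECRET",
  "LIVEKIT_URL",
  "NEXT_PUBLIC_LIVEKIT_URL",
  "GEMINI_API_KEY",
] as const;

// Returns the names of required variables that are unset or blank.
function missingEnv(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Example with a literal object; in the app you would pass process.env instead.
const missing = missingEnv({
  LIVEKIT_API_KEY: "devkey",
  LIVEKIT_API_SECRET: "secret",
  LIVEKIT_URL: "ws://localhost:7880",
  NEXT_PUBLIC_LIVEKIT_URL: "ws://localhost:7880",
  GEMINI_API_KEY: "", // blank on purpose to show the failure case
});
```

In this example `missing` contains only `GEMINI_API_KEY`, which is the point at which a server would log the names and refuse to start.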

Launch and test

npm run dev

Open http://localhost:3000, use one tab as the Broadcast (host) page and another as the Watch (attendee) page, and confirm that speaking in one tab produces translated audio in the other.

Why Tune Audio to 100ms Chunks?

You tune the audio stream to 100ms chunks because the default 20ms WebRTC frame interval means 50 transmissions per second for every stream, which creates heavy network and CPU serialization overhead when several translation bots run at once. Moving to 100ms frames drops that to 10 per second, a fivefold reduction, at the cost of about 80ms of extra buffering latency (100ms minus 20ms) that listeners rarely notice.

const audioStream = new AudioStream(track, {
  sampleRate: this.inputSampleRate,
  numChannels: this.channels,
  frameSizeMs: 100, // 100ms frames cut transmission frequency to 10 Hz
});
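
The numbers behind that setting are easy to check. The sketch below works through the frame arithmetic, assuming 16 kHz, mono, 16-bit PCM purely for illustration; substitute whatever `inputSampleRate` and `channels` your bridge uses. `frameStats` and `sendsPerSecond` are hypothetical helper names.

```typescript
// Frame arithmetic for PCM audio. The 16 kHz, mono, 16-bit format is an
// assumption for illustration, not a value taken from the reference build.
interface FrameStats {
  framesPerSecond: number;
  samplesPerFrame: number; // per channel
  bytesPerFrame: number;
}

function frameStats(frameSizeMs: number, sampleRate = 16000, channels = 1): FrameStats {
  const framesPerSecond = 1000 / frameSizeMs;
  const samplesPerFrame = (sampleRate * frameSizeMs) / 1000;
  const bytesPerFrame = samplesPerFrame * channels * 2; // 2 bytes per 16-bit sample
  return { framesPerSecond, samplesPerFrame, bytesPerFrame };
}

const webrtcDefault = frameStats(20); // 50 frames/s, 320 samples, 640 bytes
const tuned = frameStats(100); // 10 frames/s, 1600 samples, 3200 bytes

// Total sends per second across a room with several language bots.
function sendsPerSecond(languages: number, frameSizeMs: number): number {
  return languages * frameStats(frameSizeMs).framesPerSecond;
}

const before = sendsPerSecond(15, 20); // 750 sends per second
const after = sendsPerSecond(15, 100); // 150 sends per second
```

The byte rate is identical in both cases (32,000 bytes per second per stream under these assumptions); what changes is how many separate messages carry it.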

Why this matters

The 100ms chunk decision is the single most important stability tuning in this build. Each listener language spins up its own bot, and at 50 Hz the per-frame overhead multiplies fast. Trading 80ms of latency for a fivefold drop in transmission frequency is what keeps a multi-language room smooth on a single instance.
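
The SDK's `frameSizeMs` option does the buffering for you. For intuition only, the idea can be sketched as an accumulator that collects small frames and emits one larger chunk when enough samples have arrived. `FrameAccumulator` is a hypothetical class written for this explanation, not how the LiveKit SDK is implemented, and the 1600-sample chunk size assumes 16 kHz mono audio.

```typescript
// Conceptual sketch: merge small PCM frames into fixed-size chunks.
class FrameAccumulator {
  private readonly samplesPerChunk: number;
  private readonly buffer: Int16Array;
  private filled = 0;

  constructor(samplesPerChunk: number) {
    this.samplesPerChunk = samplesPerChunk;
    this.buffer = new Int16Array(samplesPerChunk);
  }

  // Appends one incoming frame and returns every chunk it completes.
  push(frame: Int16Array): Int16Array[] {
    const chunks: Int16Array[] = [];
    let offset = 0;
    while (offset < frame.length) {
      const room = this.samplesPerChunk - this.filled;
      const take = Math.min(room, frame.length - offset);
      this.buffer.set(frame.subarray(offset, offset + take), this.filled);
      this.filled += take;
      offset += take;
      if (this.filled === this.samplesPerChunk) {
        chunks.push(this.buffer.slice()); // copy so the caller owns the chunk
        this.filled = 0;
      }
    }
    return chunks;
  }
}

// Ten 20ms frames (320 samples each at an assumed 16 kHz) yield two 100ms chunks.
const accumulator = new FrameAccumulator(1600);
let emitted = 0;
for (let i = 0; i < 10; i++) {
  emitted += accumulator.push(new Int16Array(320)).length;
}
```

The first chunk cannot leave until its last sample has arrived, which is where the roughly 80ms of added latency comes from.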

How Do You Containerize It With Docker?

You containerize the app with a multi-stage Dockerfile, and the one detail you cannot skip is installing CA certificates in the production image. Minimal base images like node:slim ship without the system CA root certificates, which makes the native LiveKit SDK fail its secure connection silently, a bug that is painful to diagnose if you miss it.

# --- Build stage ---
FROM node:22-slim AS builder
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
RUN npm run build

# --- Production stage ---
FROM node:22-slim AS runner
RUN apt-get update && apt-get install -y ca-certificates && rm -rf /var/lib/apt/lists/*
WORKDIR /app
ENV NODE_ENV=production
ENV PORT=8080
COPY --from=builder /app/.next/standalone ./
COPY --from=builder /app/.next/static ./.next/static
COPY --from=builder /app/public ./public
EXPOSE 8080
CMD ["node", "server.js"]
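
The production stage above copies `.next/standalone` and starts `server.js`, which Next.js only generates when standalone output is enabled. If the build fails to find that directory, a config along these lines is the likely missing piece. This is a minimal sketch that assumes a `next.config.ts` file; merge the `output` setting into your existing config rather than replacing it.

```typescript
// next.config.ts (sketch): enables the standalone build output that the
// Dockerfile's production stage copies from /app/.next/standalone.
const nextConfig = {
  output: "standalone" as const,
};

export default nextConfig;
```

The generated `server.js` reads the `PORT` environment variable, which is why the Dockerfile sets `PORT=8080` to match the exposed port.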

How Do You Deploy to Google Cloud Run?

You deploy to Google Cloud Run in three moves: store secrets in Secret Manager, grant the compute service account access to them, then deploy from source. Cloud Run is a good fit here because it scales to zero when idle and runs your container without server management.

1. Store secrets in Secret Manager

# export every KEY=value line from .env.local, skipping comments and blank lines
source <(grep -Ev '^[[:space:]]*(#|$)' .env.local | sed 's/^/export /')

echo -n "$GEMINI_API_KEY" | gcloud secrets create gemini-api-key --data-file=-
echo -n "$LIVEKIT_API_KEY" | gcloud secrets create livekit-api-key --data-file=-
echo -n "$LIVEKIT_API_SECRET" | gcloud secrets create livekit-api-secret --data-file=-

2. Grant access to the compute service account

PROJECT_NUMBER=$(gcloud projects describe $(gcloud config get-value project) --format="value(projectNumber)")

gcloud secrets add-iam-policy-binding gemini-api-key \
  --member="serviceAccount:${PROJECT_NUMBER}-compute@developer.gserviceaccount.com" \
  --role="roles/secretmanager.secretAccessor"

Repeat the binding for livekit-api-key and livekit-api-secret so the running service can read all three secrets.

3. Deploy the service

gcloud run deploy live-translate \
  --source . \
  --region us-central1 \
  --allow-unauthenticated \
  --min-instances 0 \
  --max-instances 1 \
  --timeout 3600 \
  --no-cpu-throttling \
  --set-secrets "GEMINI_API_KEY=gemini-api-key:latest,LIVEKIT_API_KEY=livekit-api-key:latest,LIVEKIT_API_SECRET=livekit-api-secret:latest" \
  --set-env-vars "LIVEKIT_URL=wss://your-project.livekit.cloud"

Each deploy flag does real work. The table below explains the ones that matter most.

| Flag | Why it is set |
| --- | --- |
| --min-instances 0 | Scales to zero so idle cost is effectively nothing |
| --max-instances 1 | The session manager is a singleton, so one instance per room |
| --timeout 3600 | Allows sessions up to one hour long |
| --no-cpu-throttling | Keeps CPU allocated between requests for lag-free audio |

Future updates are a one-liner, since secrets, env vars and scaling settings persist: gcloud run deploy live-translate --source . --region us-central1.

How Far Does This Scale?

The reference architecture scales to roughly 15 to 20 simultaneous languages and around 200 to 300 attendees on LiveKit Cloud, with a single Cloud Run instance handling every translator bot for one room. The singleton session manager is what caps it at one instance per room, so going bigger means sharding rooms across multiple services rather than scaling a single instance.

| Dimension | Reference limit (2026) |
| --- | --- |
| Simultaneous languages | ~15 to 20 |
| Attendees (LiveKit Cloud) | ~200 to 300 |
| Attendees (dev server) | ~50 |
| Cloud Run instances per room | 1 (singleton) |
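
One way to shard rooms across services is to hash the room name to a fixed list of deployments, so every participant in a room reaches the same singleton. The sketch below is an illustration under assumptions: the service URLs are hypothetical placeholders, and `hashRoom` and `shardForRoom` are names invented here, not part of the reference build.

```typescript
// Hypothetical service URLs; each would be a separate Cloud Run service
// deployed with --max-instances 1.
const SHARDS = [
  "https://live-translate-0.example.com",
  "https://live-translate-1.example.com",
  "https://live-translate-2.example.com",
];

// FNV-1a style 32-bit hash over the room name's character codes.
// It is deterministic, so a room always maps to the same shard.
function hashRoom(roomName: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < roomName.length; i++) {
    hash ^= roomName.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep it an unsigned 32-bit value
  }
  return hash;
}

// Picks the service that owns a given room.
function shardForRoom(roomName: string, shards: string[] = SHARDS): string {
  return shards[hashRoom(roomName) % shards.length];
}

const target = shardForRoom("launch-livestream");
```

Simple modulo routing remaps most rooms when the shard count changes, so resize between events rather than during one.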

What Does Real-Time Translation Unlock for Brands in 2026?

For brands in 2026, real-time translation collapses the language barrier that used to limit the reach of every live event. A product launch, an investor update, a creator livestream or a training webinar can now be heard natively by a global audience without hiring interpreters or running separate regional broadcasts. The marketing implication is simple: one event, many markets, no translation tax.

Distk Field Note

For an India D2C brand selling into the Gulf and Southeast Asia in 2026, a single founder-led launch livestream translated live into Arabic, Hindi and Bahasa is worth more than three separate regional events. It is cheaper, it carries the founder's energy through Gemini's natural delivery, and it turns one moment into simultaneous reach across markets. The build above is the practical path to that outcome.

Common Mistakes to Avoid in 2026

The technical win in 2026 is real-time translated voice on a serverless budget. The marketing win is that a single live moment now reaches every language your customers speak, delivered in a natural-sounding voice.

Gemini Live Translation App: FAQs

What does this app do in 2026?

It lets a speaker broadcast in one language while listeners worldwide hear real-time translated audio in their chosen language with natural voice. LiveKit handles WebRTC audio, the Gemini Live API does the translation, and Google Cloud Run hosts it.

What is the core architecture?

One LiveKit Room. The host publishes audio, and when a listener picks a language the backend spawns a Translation Bridge bot that subscribes to the host audio, streams PCM frames to Gemini over WebSocket, and publishes translated audio back.

Why use 100ms audio chunks?

Moving from 20ms to 100ms frames cuts transmission frequency from 50 Hz to 10 Hz, sharply reducing network and CPU overhead. The trade-off is about 80ms of extra latency, which is usually fine for live translation and improves stability.

How do you deploy it to Cloud Run?

Store keys in Secret Manager, grant the compute service account access, then run gcloud run deploy with --source. Key flags: --min-instances 0, --max-instances 1, --timeout 3600 and --no-cpu-throttling.

How many languages and listeners does it handle?

Around 15 to 20 simultaneous languages and 200 to 300 attendees on LiveKit Cloud, with one Cloud Run instance per room. Scaling beyond that means sharding rooms across services, since the session manager is a singleton.

Why does CA certificate installation matter?

Minimal base images like node:slim ship without the system CA root certificates, which makes the native LiveKit SDK fail its secure connection silently. Installing ca-certificates in the production stage prevents this hard-to-diagnose failure.
