We've all experienced it. You're chatting with someone who speaks another language, and every single message comes with an agonizing two-second delay. You send a text, wait. They reply, you wait. It completely kills the natural rhythm of a conversation. I keep thinking about how we've accepted this artificial lag as the cost of doing business globally.
When Google released Gemini 3.1 Flash-Lite, they made specific claims about its optimization for high-volume, latency-sensitive tasks, with translation called out by name. I wanted to see if it actually delivered on those claims, rather than just reading the press release.
Here is how you can build a streaming translation API using the new model, keeping latency low enough that conversations actually feel like conversations.
Getting your API key
Before writing any code, you need access to the model. Google AI Studio handles the provisioning.
If you don't have an account, head to Google AI Studio and sign in. Click the "Get API key" button in the left sidebar. Create a new key in a new or existing Google Cloud project. It takes about thirty seconds.
Copy that key and set it as an environment variable in your terminal. You'll need it for the Python SDK.
export GEMINI_API_KEY="your_api_key_here"
Setting up the environment
We'll use the official google-genai SDK. It handles the API requests and streaming logic smoothly.
Create a new directory for your project, set up a virtual environment, and install the required package:
mkdir gemini-translator
cd gemini-translator
python -m venv venv
source venv/bin/activate
pip install google-genai
Writing the streaming translation script
The secret to a real-time feel isn't just a fast model. It's streaming the response. If you wait for the entire paragraph to translate before showing it to the user, you've already lost. You need to push tokens to the screen the millisecond they are generated.
Create a file named translate.py. We are going to build a simple function that takes a target language and the input text, then streams the output back to the console.
import os
import sys
from google import genai
from google.genai import types
def stream_translation(target_language: str, text_to_translate: str):
    # The SDK automatically picks up the GEMINI_API_KEY environment variable
    client = genai.Client()

    prompt = f"Translate the following text to {target_language}. Only output the translation, nothing else.\n\nText: {text_to_translate}"

    print(f"Translating to {target_language}...\n")
    print("Output: ", end="", flush=True)

    try:
        # We explicitly call the 3.1 Flash-Lite model
        response = client.models.generate_content_stream(
            model='gemini-3.1-flash-lite',
            contents=prompt,
            config=types.GenerateContentConfig(
                temperature=0.1,  # Keep it low for accurate translation
            ),
        )
        for chunk in response:
            # Print each chunk as it arrives without adding newlines.
            # Some chunks carry no text, so guard against None.
            if chunk.text:
                print(chunk.text, end="", flush=True)
    except Exception as e:
        print(f"\nError during translation: {e}")

    print("\n")

if __name__ == "__main__":
    if len(sys.argv) < 3:
        print("Usage: python translate.py <target_language> <text>")
        sys.exit(1)

    target_lang = sys.argv[1]
    input_text = " ".join(sys.argv[2:])
    stream_translation(target_lang, input_text)
Notice the temperature setting. I knocked it down to 0.1. You generally want translation to be deterministic and precise, not creative. We also use generate_content_stream rather than generate_content, so the text chunks are yielded as the model produces them instead of arriving as one final blob.
Running the latency test
Let's see if the Flash-Lite model is actually fast. I ran a quick terminal test translating a block of English text into Spanish.
python translate.py Spanish "The architecture of the new model allows for parallel processing of tokens, significantly reducing the time to first byte. This is especially useful for applications where users expect immediate feedback."
The results were noticeably fast. The first byte hit my terminal in under 400 milliseconds. The entire sentence finished streaming within a second. It feels instantaneous. If you plug this logic into a WebSocket connection for a web app, the user on the other end would see the words appearing almost as fast as the sender types them.
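If you want numbers rather than eyeballed impressions, you can time the stream yourself. Here's a minimal sketch: the `first_chunk_latency` helper is my own addition (not part of the SDK), and it works on any iterable of text chunks, so you can wrap the chunk texts from generate_content_stream or, as below, a simulated stream that makes no API call.

```python
import time
from typing import Iterable, Tuple

def first_chunk_latency(chunks: Iterable[str]) -> Tuple[float, str]:
    """Consume a stream; return (seconds until the first chunk, full text).

    Accepts any iterable of strings, so it can wrap the chunk texts
    yielded by a streaming API response -- or a fake stream in a test.
    """
    start = time.perf_counter()
    first = None
    parts = []
    for text in chunks:
        if first is None:
            # Record time-to-first-chunk the moment anything arrives
            first = time.perf_counter() - start
        parts.append(text)
    return (first if first is not None else float("inf")), "".join(parts)

# Example with a simulated stream (no API call):
latency, text = first_chunk_latency(iter(["La ", "arquitectura ", "nueva"]))
print(f"time to first chunk: {latency * 1000:.3f} ms")
print(text)  # -> "La arquitectura nueva"
```

To measure the real thing, pass in a generator that yields `chunk.text` from the response object in translate.py; the first-chunk number is your effective time to first byte as seen by the user.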
Wrapping up
Building a translation API that doesn't annoy your users comes down to model choice and streaming implementation. Gemini 3.1 Flash-Lite handles the latency side of the equation well.
If you want to take this further, try wrapping this script in a FastAPI endpoint and serving it via WebSockets to a frontend chat interface. Let me know what you build.
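As a starting point for the WebSocket idea, here's a framework-agnostic sketch of the relay logic. Everything here (`iterate_blocking`, `relay_stream`) is my own illustration, not an official API: in a real FastAPI app you'd pass the WebSocket's `send_text` coroutine as `send` and a generator of chunk texts from generate_content_stream as `chunks`.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, Iterable

async def iterate_blocking(chunks: Iterable[str]) -> AsyncIterator[str]:
    """Adapt a blocking chunk iterator (like the SDK's stream) for asyncio.

    Each next() call runs in a worker thread, so the event loop stays
    free to serve other connections while we wait on the model.
    """
    it = iter(chunks)
    sentinel = object()
    while True:
        item = await asyncio.to_thread(next, it, sentinel)
        if item is sentinel:
            return
        yield item

async def relay_stream(chunks: Iterable[str],
                       send: Callable[[str], Awaitable[None]]) -> None:
    """Forward each chunk to the client the moment it arrives."""
    async for text in iterate_blocking(chunks):
        await send(text)

# Example with a fake stream and a collecting "socket" (no API call):
received = []

async def fake_send(text: str) -> None:
    received.append(text)

asyncio.run(relay_stream(["Hola, ", "mundo"], fake_send))
print("".join(received))  # -> "Hola, mundo"
```

The thread hand-off matters because the SDK's stream blocks between chunks; pushing that wait off the event loop is what lets one server process fan out many concurrent translation streams.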