Why Asian languages deserve first-class treatment

A small choice we made early on quietly determined most of what came afterward in Sageio's translation pipeline.

By Ming · May 23, 2026 · 3 min read

When we started building Sageio, we made one decision early that shaped most of what came afterward: Asian languages would be first-class citizens, not afterthoughts.

That sounds like a marketing line. It isn't. Here's what "first-class" actually means in our architecture, and why I think the rest of the meeting-translation industry has been getting this wrong.

What "first-class" looks like

Most multilingual meeting tools treat language support as a checkbox list. They start with English, then bolt on French, Spanish, German, and Portuguese. Asian languages (Traditional Chinese, Japanese, Korean, Vietnamese, Thai, Cantonese) get added later, often through the same generic engine, often with worse quality, often with no consideration for the specific failure modes of each language and script.

We did the opposite. The very first prototype of Sageio was built specifically to handle a meeting where:

If you build for that meeting, English-to-French is trivial in comparison. If you build for English-to-French first, you almost never close the gap on Cantonese.

Where the industry is wrong

Three specific failures we see in every other tool we've tested:

1. Simplified vs Traditional Chinese conflation. Most engines treat zh as one language. Simplified and Traditional Chinese aren't interchangeable: the vocabulary, the idiom, and even the punctuation differ. A Taiwanese reader who gets Simplified output reads it as broken Mandarin from the mainland, which is culturally jarring at best and untrustworthy at worst.

2. Cantonese as a "Mandarin dialect". Cantonese has its own grammar, its own particles, its own colloquialisms. Routing Cantonese audio through a Mandarin speech-to-text (STT) model produces text that's technically wrong on every line. We use a dedicated Cantonese path with separate post-processing.

3. Thai and Vietnamese diacritics getting stripped. Thai and Vietnamese writing encodes tone and meaning in diacritics. Stripping them, which a lot of cheap translation pipelines silently do during normalization, turns "Việt Nam" into "Viet Nam" and can change meaning, because words that differ only by a tone mark collapse into the same spelling. We carry diacritics through every stage of the pipeline.
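
To make the three failures concrete, here is a minimal Python sketch. It is not Sageio's production code: the routing table, the engine names, and the function names are all hypothetical, made up for illustration. It shows two ideas from the list above: picking an engine from a script-aware language tag instead of a bare zh, and normalizing text in a way that keeps diacritics (Unicode NFC) rather than the lossy "decompose and drop combining marks" pattern.

```python
import unicodedata

# Hypothetical routing table: language tags mapped to illustrative engine names.
# Note there is deliberately no entry for a bare "zh".
ENGINES = {
    "zh-Hans": "mandarin-simplified",
    "zh-Hant": "mandarin-traditional",
    "yue": "cantonese",
    "vi": "vietnamese",
    "th": "thai",
}


def pick_engine(tag: str) -> str:
    """Return the engine for the longest matching tag prefix.

    Raises ValueError instead of guessing, so an ambiguous tag such as a
    bare "zh" never falls through to a generic engine. Region-only tags
    like "zh-TW" would need an explicit entry in this simplified sketch.
    """
    parts = tag.split("-")
    while parts:
        candidate = "-".join(parts)
        if candidate in ENGINES:
            return ENGINES[candidate]
        parts.pop()  # drop the last subtag and retry with a shorter prefix
    raise ValueError(f"no dedicated engine for {tag!r}")


def normalize_keep_marks(text: str) -> str:
    """NFC normalization: composes characters, keeps every diacritic."""
    return unicodedata.normalize("NFC", text)


def strip_marks(text: str) -> str:
    """The lossy pattern to avoid: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(pick_engine("zh-Hant-TW"))         # mandarin-traditional
    print(pick_engine("yue-HK"))             # cantonese
    print(normalize_keep_marks("Việt Nam"))  # Việt Nam
    print(strip_marks("Việt Nam"))           # Viet Nam
```

Refusing to guess on an unknown or ambiguous tag is a design choice in this sketch: a loud error during setup is easier to catch than captions that silently come out in the wrong variant.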

Why this matters for buyers

If your team includes anyone whose first language is one of these, the difference between "we technically support that language" and "we actually treat it as primary" shows up the moment they read the captions. The wrong word choice in Taiwanese Chinese is the same as a comma splice in English — readable, but it telegraphs that the writer doesn't really know the language.

That signal — "this product wasn't really built for me" — is the death of trust in any tool used in a high-stakes meeting.

What's next

We're extending the first-class treatment to more languages as we grow. Hindi, Bengali, and Arabic are next on the roadmap, each with the same scrutiny: hire native testers, build language-specific post-processing, never ship until a native speaker says the captions read naturally.

If you have a language we don't yet support to your satisfaction, let us know. We prioritize by where real customers ask, not by world-language ranking lists.

— Ming