Mongolian meeting translation: two scripts and vowel harmony

Mongolian is written in two scripts and runs on vowel harmony, and most tools handle neither. Why that breaks transcription — plus suffix-stacking and loan-mixing — and how to translate a Mongolian meeting correctly.

By Ming · 6 min read

Most tools mishandle Mongolian because they were never built around two of its core features: the language is written in two completely different scripts, and its grammar runs on vowel harmony, a rule under which a single mis-heard vowel attaches the wrong form of a suffix and changes the word. One script is Cyrillic, the everyday script in the state of Mongolia since the mid-twentieth century; the other is the traditional Mongolian script, written top-to-bottom, used in China's Inner Mongolia and under official revival in Mongolia. Add agglutinative suffix-stacking and the Russian-and-English loan-mixing of modern business speech, and "supports Mongolian" on a feature list tells you very little. Here's what actually decides whether a Mongolian meeting comes back usable.

One language, two scripts

Mongolian is written two ways, and they don't look or flow alike. In the state of Mongolia the everyday script is Cyrillic, adopted in the mid-twentieth century: left-to-right, with two letters (Ө and Ү) added to the Russian alphabet. Alongside it is the traditional Mongolian script, a vertical alphabet written top-to-bottom in columns that run left-to-right, used in China's Inner Mongolia and under official revival in Mongolia. A transcript serving an Ulaanbaatar team and an Inner Mongolia office may need either, and a tool that renders only one fails the readers who need the other. The vertical layout is the harder half, since most caption engines simply can't lay it out at all. Most tools quietly pick Cyrillic and hope your team only reads that one.
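To make the script question concrete, here is a minimal sketch of how you might check which script a caption line actually came back in. It relies only on Unicode block ranges (Cyrillic at U+0400 to U+04FF, Mongolian at U+1800 to U+18AF); the function name `detect_script` is made up for this example and is not part of any tool's API. The check is coarse: the Mongolian block also covers related scripts such as Manchu, and Cyrillic text could just as well be Russian.

```python
def detect_script(text: str) -> str:
    """Classify a caption line by the Unicode blocks its characters fall in."""
    # Count characters in the Cyrillic block (U+0400 to U+04FF).
    cyrillic = sum(1 for ch in text if "\u0400" <= ch <= "\u04FF")
    # Count characters in the Mongolian block (U+1800 to U+18AF).
    traditional = sum(1 for ch in text if "\u1800" <= ch <= "\u18AF")

    if cyrillic and traditional:
        return "mixed"
    if cyrillic:
        return "cyrillic"
    if traditional:
        return "traditional"
    return "other"


# "Mongol khel" in Cyrillic, then "Mongol" in the traditional script
# (written with escapes so the source stays readable in any editor).
print(detect_script("Монгол хэл"))
print(detect_script("\u182E\u1823\u1829\u182D\u1823\u182F"))
```

Running this over a saved transcript tells you which script a tool emitted, but not whether it can lay out the vertical script on screen; that still needs a human look at the rendered captions.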

Vowel harmony decides the suffix

Mongolian grammar runs on vowel harmony: the vowels in a word's suffixes have to agree with the vowel class of the root (broadly, front vowels with front, back vowels with back, with rounding also playing a part). The same grammatical ending therefore has more than one spoken form, and which one is correct depends on the root. The ablative "from" ending, for example, is -аас on гар ("hand", giving гараас) but -ээс on гэр ("home", giving гэрээс). A recognizer that mis-hears one vowel doesn't just get a sound slightly wrong; it attaches the wrong harmonic form of the suffix, and the resulting word is wrong or incoherent. In a clean dictionary sentence that rarely shows; in a fast meeting with reduced vowels and overlapping speech, a model not tuned for the harmony rule produces suffixes that no Mongolian speaker would say. That's a wrong word in the transcript that a summary then builds on.
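The rule can be sketched in a few lines. This is a deliberately simplified model of Cyrillic-spelled Khalkha Mongolian, not a morphological analyzer: it assumes the last harmonic vowel in the stem picks the suffix vowel, treats и as neutral, and ignores hidden-n stems, vowel deletion, vowel-final stems, and loanwords. The names `SUFFIX_VOWEL`, `harmonic_vowel`, and `ablative` are invented for this illustration.

```python
# Simplified mapping (an assumption of this sketch): each harmonic stem vowel
# selects one of the four suffix vowels used in endings like the ablative.
SUFFIX_VOWEL = {
    "а": "а", "у": "а",   # back, unrounded suffix form
    "о": "о",             # back, rounded suffix form
    "э": "э", "ү": "э",   # front, unrounded suffix form
    "ө": "ө",             # front, rounded suffix form
}


def harmonic_vowel(stem: str) -> str:
    """Return the suffix vowel chosen by the last harmonic vowel in the stem."""
    for ch in reversed(stem.lower()):
        if ch in SUFFIX_VOWEL:
            return SUFFIX_VOWEL[ch]
    # Assumption: stems with only neutral vowels take the э-form.
    return "э"


def ablative(stem: str) -> str:
    """Attach the ablative ("from") ending: long harmonic vowel plus с."""
    v = harmonic_vowel(stem)
    return f"{stem}{v}{v}с"


for stem in ["гар", "гэр", "ном", "хөл"]:
    print(stem, "->", ablative(stem))
```

The four stems (hand, home, book, foot) each take a different form of the same ending, which is why one mis-heard stem vowel is enough to produce a suffix no speaker would use.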

Agglutination and loanword mixing

Mongolian is agglutinative: it builds long words by stacking suffix after suffix onto a root, chaining case, plurality, possession, tense, and more. A tool that segments words incorrectly, or drops a suffix in the noise, changes who did what to whom. On top of that, modern business Mongolian mixes in Russian and English freely (Russian loans from decades of contact, English for newer technical and corporate terms), often with Mongolian suffixes attached to the borrowed word. A tool that detects "Mongolian" may leave the borrowed terms untranslated; one that detects "English" or "Russian" mishandles the Mongolian frame around them. Each reader needs a complete sentence rebuilt in their own language, not a half-translated line with the foreign words left raw. (For the same two-scripts problem in another Asian language, see Punjabi meeting translation.)
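One way to see why mixed words trip up one-language-per-sentence tools is to look for them in a transcript. The sketch below assumes a written convention in which a Latin-script loan carries its Cyrillic suffix after a hyphen, as in IT-ийн; real transcripts may spell loans differently, including fully in Cyrillic, and this pattern would miss those. `MIXED_TOKEN` and `split_loan` are hypothetical names, and the sample tokens are illustrative rather than a vetted sentence.

```python
import re
from typing import Optional, Tuple

# Latin-script stem, a hyphen, then a suffix written in Cyrillic letters.
MIXED_TOKEN = re.compile(r"^([A-Za-z][A-Za-z0-9]*)-([\u0400-\u04FF]+)$")


def split_loan(token: str) -> Optional[Tuple[str, str]]:
    """Return (loan_stem, mongolian_suffix) for a mixed token, else None."""
    match = MIXED_TOKEN.match(token.strip(".,!?;:"))
    if match is None:
        return None
    return match.group(1), match.group(2)


# Illustrative tokens: a suffixed loan, a plain Mongolian word, a bare loan.
for token in ["IT-ийн", "баг", "meeting"]:
    print(token, "->", split_loan(token))
```

A token that splits this way has to be handled as one unit: the stem needs translating as a foreign term while the suffix carries the Mongolian grammar that places the word in the sentence.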

Why "supports Mongolian" isn't enough

A tool can list Mongolian, transcribe a clean dictionary sentence, and still fall apart on the second script, the vowel-harmony suffixes, and the Russian-and-English mixing your team actually speaks. The feature list won't tell you which. One real call will: does a native speaker read the captions and transcript and recognize how the room actually talked — the right suffix on each root, the script their office reads, the loanwords kept whole? For why this pattern repeats across Asian languages, see real-time translation for remote teams.

How to do it with Sageio

  1. Add bot@sageio.net to your Google Meet calendar invite. It joins on its own — no extension, nothing to install.
  2. Each participant picks their caption language. The Ulaanbaatar team reads clean Mongolian, a colleague abroad reads clean English — both from the same spoken Mongolian, at the same time. (Sageio translates into 20+ languages.)
  3. Everyone speaks naturally — stacked suffixes, Russian and English loans, all of it. Translated captions appear in about two seconds.
  4. Afterward, a searchable transcript and an AI summary arrive within about five minutes, shared at the host's discretion.

(Today this runs on Google Meet; Zoom and Microsoft Teams support is coming soon.)

How to test any tool in five minutes

Say a suffix-stacked word in a sentence — a root with several endings chained on, like a noun carrying plural, possession, and a case ending at once — and check whether the captions land the whole word, not a truncated stem. Then say a vowel-harmony pair: the same ending on a front-vowel root and a back-vowel root, where the suffix vowel has to flip to match. If the tool produces the same suffix form on both, or mangles the long word, it wasn't built for spoken Mongolian. While you're at it, check which script the captions render — and whether your Inner Mongolia readers get the one they actually use.
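If you save the captions from that test call, the comparison can be scripted. The following is a rough sketch, assuming you know the stem and the full word you said; `classify_caption` is a made-up helper, and the sample caption strings are illustrative. It matches on spelling alone, so an unrelated word that happens to begin with the same letters as the stem would be misreported, and a native speaker's read remains the real test.

```python
def classify_caption(stem: str, expected: str, caption: str) -> str:
    """Report how a caption line rendered one expected suffixed word.

    Returns "ok", "truncated", "wrong_suffix", or "missing".
    """
    stem, expected = stem.lower(), expected.lower()
    # Split on whitespace and trim common punctuation from each token.
    tokens = [t.strip(".,!?;:\"'()").lower() for t in caption.split()]

    if expected in tokens:
        return "ok"
    for token in tokens:
        if token.startswith(stem):
            # A prefix of the expected word means endings were dropped;
            # anything else built on the stem carries a different suffix.
            return "truncated" if expected.startswith(token) else "wrong_suffix"
    return "missing"


# Expected word: гэрээс ("from home"), stem гэр.
print(classify_caption("гэр", "гэрээс", "гэрээс ирлээ"))
print(classify_caption("гэр", "гэрээс", "гэраас ирлээ"))
print(classify_caption("гэр", "гэрээс", "гэр ирлээ"))
```

Run it once for the front-vowel word and once for the back-vowel word in your harmony pair; a "wrong_suffix" on either one is the failure the test is designed to surface.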

Is it private?

For anything that joins your meetings: Sageio doesn't use your meeting content to train AI models, and its AI vendors are contractually restricted from doing the same. Audio is processed in memory and discarded — only the text transcript and summary are kept, encrypted, in the region you choose (US, EU, or APAC). Enterprise customers can self-host the entire stack.

Frequently asked questions

Why does Mongolian have two scripts? In the state of Mongolia the everyday script is Cyrillic, adopted in the mid-twentieth century. The traditional Mongolian script — a vertical alphabet written top-to-bottom — remains in use in China's Inner Mongolia and is under official revival in Mongolia. A transcript may need either, and a tool that renders only one fails the readers who use the other, especially the vertical layout.

What is vowel harmony and why does it matter for transcription? Mongolian suffix vowels must agree with the vowel class of the root — front with front, back with back — so the same grammatical ending has more than one spoken form. A recognizer that mis-hears a vowel attaches the wrong harmonic form, and the word comes out wrong. It's a quiet failure that a summary then carries forward.

Does business Mongolian really mix in Russian and English? Yes. Decades of contact left Russian loanwords in everyday and technical speech, and English supplies newer corporate and technical terms — often with Mongolian suffixes attached. Tools that assume one language per sentence translate only part; correct handling rebuilds a full sentence in each target language.

How fast are the translated captions? About two seconds, fast enough to keep a live conversation moving, with a searchable transcript and summary within about five minutes after the call.

What does it cost to try? Every plan starts with a free 60-minute trial, no credit card required. After that, Professional is $49/month and Teams is $99 per seat/month (annual billing includes 2 months free); Enterprise is custom-priced.


If your team works in Mongolian, the honest test is whether a native speaker reads the live captions and transcript and hears the actual meeting — the right suffix on each root, the script their office reads, the Russian and English kept whole. Add the bot to your next call and let them judge.