Sinhala meeting translation: spoken Sinhala drops what written Sinhala requires

Spoken Sinhala leaves out the agreement and inflections written Sinhala demands, so tools trained on text mishear meetings. Why that breaks transcription, plus script and code-mixing — and how to get it right.

By Ming · 5 min read

The reason most tools mishear a Sinhala meeting is that spoken Sinhala drops grammar that written Sinhala insists on, and the models were trained mostly on the written form. Sinhala has one of the widest written-versus-spoken gaps in South Asia: in writing, verbs agree with the subject in person and number (and, in some forms, gender); in speech, people drop that agreement and use a single plain verb form. A recognizer tuned to the textbook expects the inflected ending and stumbles when the room uses the bare spoken one. If your team has a Colombo office, here's what actually decides whether the captions and transcript are usable.

Diglossia: the meeting is in a register the model never read

Linguists call this diglossia: two forms of the same language side by side, one literary and one spoken, with a wide gap between them. Written Sinhala conjugates the verb to match the subject (mama karami, "I do," with the first-person ending -mi). Spoken Sinhala uses one invariant form for everyone (mama karanawa, "I do," and eyaa karanawa, "he does" or "she does," same verb). The spoken version is not sloppy Sinhala; it's the Sinhala people actually speak, all day, in every meeting. A model trained mostly on news, books, and subtitles, which are overwhelmingly written Sinhala, meets the spoken register and transcribes it as a chain of near-misses that drift further with every sentence.

The script carries the vowels inside the letters

Sinhala is an abugida: each consonant carries an inherent vowel, and other vowels are marked with signs attached above, below, before, or after the letter. The rounded letterforms are distinctive, but the rendering is unforgiving — a vowel sign placed wrong, or dropped because the font lacks the glyph, changes the syllable. A transcript that's going to be read back by the Colombo team has to render those combining signs correctly, not approximate them or fall back to boxes.
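To make the "vowels inside the letters" point concrete, here is a minimal Python sketch that looks at how a Sinhala syllable is encoded. It relies only on the standard unicodedata module and the Unicode Sinhala block (U+0D80 to U+0DFF). The helper names are hypothetical, and the check is an illustration of where a transcript can go wrong, not a description of how Sageio or any other tool works. It tests encoding order only; whether a font actually draws the sign correctly still has to be judged by eye.

```python
import unicodedata

SINHALA_START, SINHALA_END = 0x0D80, 0x0DFF  # Unicode Sinhala block


def is_sinhala(ch: str) -> bool:
    return SINHALA_START <= ord(ch) <= SINHALA_END


def is_dependent_sign(ch: str) -> bool:
    # Vowel signs and the virama are combining marks (category Mn or Mc).
    return is_sinhala(ch) and unicodedata.category(ch) in ("Mn", "Mc")


def has_base(prev: str) -> bool:
    # A sign needs a Sinhala letter (Lo) or another sign directly before it.
    return is_sinhala(prev) and unicodedata.category(prev) in ("Lo", "Mn", "Mc")


def find_dangling_signs(text: str) -> list[int]:
    """Return indexes of dependent signs that have nothing to attach to."""
    bad = []
    for i, ch in enumerate(text):
        if is_dependent_sign(ch) and (i == 0 or not has_base(text[i - 1])):
            bad.append(i)
    return bad


# The syllable "ki" is two code points: the consonant ka plus a vowel sign.
syllable = "\u0d9a\u0dd2"
for ch in syllable:
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
```

In this sketch, find_dangling_signs("\u0d9a\u0dd2") returns an empty list, while find_dangling_signs("\u0dd2\u0d9a") returns [0] because the vowel sign arrives before any consonant it could attach to.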

Singlish-style mixing is the corporate register

In Colombo tech and business, professional Sinhala runs heavily mixed with English: English nouns and verbs in a Sinhala frame, often with Sinhala endings attached. "Mē feature eka next release ekata deploy karanna ōnə" ("this feature needs to be deployed to the next release") is one normal sentence: English content words, Sinhala grammar, and the Sinhala obligation word ōnə. A tool that detects "Sinhala" may leave the English untranslated; one that detects "English" may leave the Sinhala untranslated. Each reader needs a complete sentence rebuilt in their own language, not a half-translated line.
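A rough way to see why one language label per sentence falls short is to tag each word by script. The Python sketch below assumes the transcript writes Sinhala words in Sinhala script and English words in Latin letters; the function names are hypothetical and this is only an illustration, not the method any particular product uses. Script is also a crude proxy for language, since an English loanword can be written in Sinhala script.

```python
def token_script(token: str) -> str:
    """Classify one whitespace-separated token by the script it uses."""
    has_sinhala = any(0x0D80 <= ord(c) <= 0x0DFF for c in token)
    has_latin = any(c.isascii() and c.isalpha() for c in token)
    if has_sinhala and has_latin:
        return "mixed"  # e.g. an English stem with a Sinhala ending attached
    if has_sinhala:
        return "sinhala"
    if has_latin:
        return "latin"
    return "other"  # digits, punctuation, anything else


def tag_tokens(sentence: str) -> list[tuple[str, str]]:
    return [(tok, token_script(tok)) for tok in sentence.split()]


def is_code_mixed(sentence: str) -> bool:
    scripts = {script for _, script in tag_tokens(sentence)}
    return "mixed" in scripts or {"sinhala", "latin"} <= scripts


# "me feature eka", with the two Sinhala words written in Sinhala script.
sample = "\u0db8\u0dda feature \u0d91\u0d9a"
print(tag_tokens(sample))
print(is_code_mixed(sample))
```

For the sample, the first and third tokens are tagged "sinhala" and the middle one "latin", so is_code_mixed returns True. A per-sentence detector would have to pick one label and lose the other half.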

Why "supports Sinhala" isn't enough

A tool can list Sinhala, transcribe a clean written-Sinhala demo sentence perfectly, and still fall apart on the spoken register, the combining vowel signs, and the English mixing your team actually speaks. The feature list won't tell you which. One real call will: does a native speaker read the captions and transcript and recognize how the room actually talked? For why this pattern repeats across Asian languages, see real-time translation for remote teams.

How to do it with Sageio

  1. Add bot@sageio.net to your Google Meet calendar invite. It joins on its own — no extension, nothing to install.
  2. Each participant picks their caption language. The Colombo team reads clean Sinhala, a colleague abroad reads clean English — both from the same spoken Sinhala, at the same time. (Sageio translates into 20+ languages.)
  3. Everyone speaks naturally — spoken register, English mixing, all of it. Translated captions appear in about two seconds.
  4. Afterward, a searchable transcript and an AI summary arrive within about five minutes, shared at the host's discretion.

(Today this runs on Google Meet; Zoom and Microsoft Teams support is coming soon.)

How to test any tool in five minutes

Say a normal spoken sentence — "mama heta enawa" ("I'll come tomorrow," spoken form) — and check whether the captions catch the invariant spoken verb or stumble because they expected the inflected written one. Then say a mixed line ("mē eka ada finish karanna ōnə" — "this must be finished today") and see whether the English word stays whole and the Sinhala renders with the right vowel signs. If the spoken forms trip it up, the tool learned textbook Sinhala, not meeting Sinhala.
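If you paste the resulting caption text into a script, part of this check can be automated. The sketch below is a hypothetical helper, assuming you have the caption as a plain Python string: it flags U+FFFD replacement characters (the usual sign that text was damaged in encoding) and confirms that each English word you said survived as a whole word. It cannot judge whether the Sinhala itself is right; that still needs a native speaker.

```python
import re

REPLACEMENT_CHAR = "\ufffd"  # what decoders emit when bytes cannot be decoded


def check_caption(caption: str, english_words: list[str]) -> list[str]:
    """Return a list of human-readable problems found in one caption line."""
    problems = []
    if REPLACEMENT_CHAR in caption:
        problems.append("caption contains U+FFFD replacement characters")
    for word in english_words:
        # \b keeps "finish" from matching inside a longer or split-up word.
        pattern = rf"\b{re.escape(word)}\b"
        if not re.search(pattern, caption, flags=re.IGNORECASE):
            problems.append(f"English word not kept whole: {word}")
    return problems


print(check_caption("fin ish", ["finish"]))
```

An empty list means the caption passed these two mechanical checks; anything else is worth showing to the person who spoke the sentence.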

Is it private?

For anything that joins your meetings: Sageio doesn't use your meeting content to train AI models, and its AI vendors are contractually restricted from doing the same. Audio is processed in memory and discarded — only the text transcript and summary are kept, encrypted, in the region you choose (US, EU, or APAC). Enterprise customers can self-host the entire stack.

Frequently asked questions

Why is spoken Sinhala harder to transcribe than written Sinhala? Because Sinhala is diglossic and the gap is large. Written Sinhala conjugates verbs to agree with the subject; spoken Sinhala drops that and uses one plain form for everyone. Most models are trained on written Sinhala, so a meeting held in the spoken register comes back as a chain of near-misses. A tool has to be built for how Sinhala is actually spoken.

Why does the Sinhala script matter for accuracy? Sinhala is an abugida — vowels attach to consonants as signs above, below, or beside the letter. If a tool renders those combining signs wrong, or the font lacks a glyph, the syllable changes or shows as boxes. Reliable rendering matters as much as reliable recognition.

Does Sinhala-English code-mixing affect meetings? Yes. Professional Sinhala in Colombo mixes English nouns and verbs into a Sinhala frame ("deploy karanna ōnə"). Tools that detect one language per sentence translate only half; correct handling rebuilds a complete sentence in each target language.

How fast are the translated captions? About two seconds, fast enough to keep a live conversation moving, with a searchable transcript and summary within about five minutes after the call.

What does it cost to try? Every plan starts with a free 60-minute trial, no credit card required. After that, Professional is $49/month and Teams is $99 per seat/month (annual billing includes 2 months free); Enterprise is custom-priced.


If your team works in Sinhala, the honest test is whether a native speaker reads the live captions and transcript and hears the actual meeting — the spoken register caught, the vowel signs right, the English kept whole. Add the bot to your next call and let them judge.