Igbo meeting translation: the tone is the word, and it isn't written down

Igbo has two tones plus downstep that change a word's meaning — but everyday Igbo text leaves them off. Here's how to translate an Igbo meeting right.

By Ming · 8 min read

In Igbo, pitch is part of the word: two tones — high and low — plus a downstep that lowers a high tone, decide which word you actually said, and the same string of letters can mean four different things depending on the melody. The catch is that everyday Igbo writing leaves all of that off, so a model trained on text never really learned to hear tone and ends up guessing from context. Add vowel harmony that governs spelling and affixes, real dialect variation, and Igbo-English code-switching in professional speech, and a line like "supports Igbo" on a feature list tells you almost nothing. Here's what actually decides whether an Igbo meeting comes back usable.

The tone is the word, not the emphasis

Igbo is a tone language with two level tones, high and low, and the pitch is lexical: change the tone and you get a different word, not the same word with different emphasis. The textbook set is akwa: with a high then a low tone, ákwà is "cloth"; low then high, àkwá is "egg"; high then high, ákwá is "crying"; low then low, àkwà is "bed." Same four letters, four unrelated words, told apart only by the melody you say them with. So when someone says akwa in a meeting, the tool isn't choosing between a right and a misspelled version. It's choosing between several real words, and the only thing separating them is pitch it has to have actually heard. Get the tone wrong and you don't get gibberish; you get a different, perfectly valid word sitting in the wrong place. On top of the two tones, Igbo has downstep: a high tone after another high can be realized lower, and that small drop is meaningful, so the tool can't just sort syllables into "high" and "low." It has to track the pitch relative to what came before.
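To make the akwa set concrete, here is a minimal Python sketch that reads the tone pattern off the written tone marks and looks up the matching word. It only illustrates the lexical contrast on text; it says nothing about how any recognizer hears pitch in audio. The names `tone_pattern` and `AKWA_BY_TONE` are invented for this example.

```python
import unicodedata

ACUTE = "\u0301"  # combining acute accent: high tone
GRAVE = "\u0300"  # combining grave accent: low tone

# Tone pattern -> gloss for the akwa set described above.
AKWA_BY_TONE = {
    ("H", "L"): "cloth",
    ("L", "H"): "egg",
    ("H", "H"): "crying",
    ("L", "L"): "bed",
}

def tone_pattern(word: str) -> tuple:
    """Return the sequence of marked tones, e.g. ('H', 'L')."""
    # NFD splits each accented vowel into base letter + combining mark,
    # so the tone marks can be read regardless of how the text was typed.
    decomposed = unicodedata.normalize("NFD", word)
    marks = {ACUTE: "H", GRAVE: "L"}
    return tuple(marks[ch] for ch in decomposed if ch in marks)

for word in ["\u00e1kw\u00e0", "\u00e0kw\u00e1", "\u00e1kw\u00e1", "\u00e0kw\u00e0"]:
    pattern = tone_pattern(word)
    print(word, pattern, AKWA_BY_TONE[pattern])
```

The lookup key is the melody alone. The letters contribute nothing to telling the four words apart.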

The tones aren't in the writing

Here's the part that quietly breaks tools. Igbo can be written with tone marks — the acute for high, the grave for low — but in everyday typing, chat, email, quick notes, Igbo speakers overwhelmingly drop them. People write akwa and let you work out from context whether they mean cloth, egg, crying, or a bed. That's normal, and a human reader handles it. But it means most of the Igbo text in the world is untoned, and a model trained mostly on that text has never been forced to learn what the tones sound like — it learned to recover the word from surrounding words instead. That's a fine strategy for reading; it's the wrong strategy for live audio, where the speaker actually produced the tone and the tool should be hearing it rather than inferring it. A tool built for Igbo has to listen to pitch — and to downstep — as a first-class signal, not reconstruct it after the fact from untoned text patterns. This is the same under-served-language problem behind Yoruba meeting translation, with the added twist that even fewer Igbo training texts carry the tones at all.
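A small Python sketch shows what dropping the marks does to the data. It strips the acute and grave tone marks the way everyday typing does and confirms that the four akwa words collapse into one string. The helper name `strip_tone_marks` is invented for this example, and the dotted vowels (such as ị) are deliberately kept because they are letters, not tone marks.

```python
import unicodedata

TONE_MARKS = {"\u0300", "\u0301"}  # combining grave (low) and acute (high)

def strip_tone_marks(text: str) -> str:
    """Remove tone marks but keep dotted vowels such as ị, ọ, ụ."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    # Recompose so a base letter + dot below becomes one character again.
    return unicodedata.normalize("NFC", kept)

toned = ["\u00e1kw\u00e0", "\u00e0kw\u00e1", "\u00e1kw\u00e1", "\u00e0kw\u00e0"]
untoned = {strip_tone_marks(word) for word in toned}
print(untoned)  # {'akwa'}
```

Four distinct words go in and one spelling comes out, which is why untoned text alone cannot show a model which word was meant.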

Vowel harmony, dialects, and the bilingual office

Three more things stack on top of the tones. First, Igbo has vowel harmony: its vowels split into two sets, and within a word, and across the prefixes and suffixes Igbo attaches to verbs, the vowels normally come from one set. That harmony governs spelling and affix shape, so a tool that ignores it produces forms a native speaker reads as wrong even when every consonant is right. Second, Igbo has significant dialect variation: Standard Igbo is one variety, and the regional varieties people grew up speaking differ from it in vocabulary and pronunciation, so a recognizer tuned to one variety can mishear another. Third, professional speech among Igbo speakers in Nigeria is often Igbo and English woven together in one sentence: "Anyị ga-deploy feature a na sprint na-abịa" ("we'll deploy this feature next sprint") is one ordinary line, English content words carried inside Igbo grammar. A tool that decides the sentence is "Igbo" may mangle the English; one that decides it's "English" drops the Igbo. Each reader needs a complete sentence rebuilt in their own language, not a half-translated lump with the code-switched half left as it was spoken. The harmony, the dialects, and the code-switching are how the room actually talks, and handling them cleanly is part of the job.
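The vowel-harmony rule can be sketched as a simple check. This Python example assumes the usual grouping of the eight Igbo vowels into a dotted set (a, ị, ọ, ụ) and a plain set (e, i, o, u), and tests whether all of a word's vowels come from one set. The function names are invented for this example, and real Igbo has exceptions (compounds and loanwords, for instance), so treat it as an illustration and not as a spelling checker.

```python
import unicodedata

TONE_MARKS = {"\u0300", "\u0301"}  # grave (low) and acute (high)

# Assumed grouping of the eight Igbo vowels into two harmony sets.
DOTTED_SET = {"a", "\u1ecb", "\u1ecd", "\u1ee5"}  # a, ị, ọ, ụ
PLAIN_SET = {"e", "i", "o", "u"}

def harmony_sets(word: str) -> set:
    """Return which harmony sets ('dotted', 'plain') the word's vowels use."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    untoned = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    composed = unicodedata.normalize("NFC", untoned)  # rejoin dot-below vowels
    found = set()
    for ch in composed:
        if ch in DOTTED_SET:
            found.add("dotted")
        elif ch in PLAIN_SET:
            found.add("plain")
    return found

def is_harmonic(word: str) -> bool:
    """True when every vowel in the word comes from a single set."""
    return len(harmony_sets(word)) <= 1

print(is_harmonic("Any\u1ecb"))    # True: a and ị are both in the dotted set
print(is_harmonic("ab\u1ecba"))    # True
print(is_harmonic("feature"))      # False: an English word mixing both sets
```

The English word failing the check hints at why code-switched lines need separate handling: borrowed words follow no Igbo harmony pattern at all.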

Why this specifically stresses real-time captioning

Live translation lives on a tension between latency and committing too early, and tone makes that tension sharper. The faster a tool shows you a caption, the less of the word and its context it has heard — and if it's guessing the tone from context rather than hearing it, the early guess is exactly where it goes wrong. Worse, once it prints a caption it has committed to a tonal reading; print akwa as "cloth" and the next clause reveals it meant "crying," and now the line has to be retracted and re-rendered, or left wrong. Downstep makes it harder still, because the right reading can depend on a pitch drop the tool only resolves once it has heard the syllable in context. A tool built for Igbo has to take the tone from the audio, hold the caption until the pitch resolves, and land it once. A fluent caption that quietly substitutes one real word for another is more dangerous than an obvious error, because nobody stops to question a sentence that reads perfectly well. For why these distinctions are easy to lose at speed, see how accurate is AI meeting translation.
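The "hold the caption, then land it once" idea can be sketched as a small gate between the recognizer and the screen. This Python example is a hypothetical illustration, not a description of how Sageio or any other product works: the class name `CaptionGate`, the confidence threshold, and the stability count are all invented here. The gate shows a caption only after the same reading has arrived on consecutive updates with enough confidence.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class CaptionGate:
    """Hold a caption until its reading is stable, then commit it once."""
    min_confidence: float = 0.8   # hypothetical threshold
    stable_updates: int = 2       # hypothetical: same reading twice in a row
    committed: List[str] = field(default_factory=list)
    _last: Optional[str] = None
    _streak: int = 0

    def offer(self, hypothesis: str, confidence: float) -> Optional[str]:
        """Feed one recognizer update; return text only when it is committed."""
        if hypothesis == self._last:
            self._streak += 1
        else:
            # The reading changed, so the stability count starts over.
            self._last = hypothesis
            self._streak = 1
        if confidence >= self.min_confidence and self._streak >= self.stable_updates:
            self.committed.append(hypothesis)
            self._last, self._streak = None, 0
            return hypothesis
        return None  # keep holding; nothing is shown yet

gate = CaptionGate()
print(gate.offer("cloth", 0.55))   # None: early, low-confidence guess is held
print(gate.offer("crying", 0.70))  # None: reading changed, still not confident
print(gate.offer("crying", 0.90))  # crying: stable and confident, shown once
```

The trade-off is the one the paragraph describes: raising either threshold adds delay, while lowering them brings back flashed guesses that have to be retracted.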

How to do it with Sageio

  1. Add bot@sageio.net to your Google Meet calendar invite. It joins on its own — no extension, nothing to install.
  2. Each participant picks their caption language. The Igbo-speaking team reads clean Igbo, a colleague elsewhere reads clean English — both from the same spoken Igbo, at the same time. (Sageio translates into 20+ languages.)
  3. Everyone speaks naturally — full tone, vowel harmony, dialect, Igbo-English code-switching and all. Translated captions appear in about two seconds.
  4. Afterward, a searchable transcript and an AI summary arrive within about five minutes, shared at the host's discretion.

(Today this runs on Google Meet; Zoom and Microsoft Teams support is coming soon.)

How to test any tool in five minutes

Say the akwa set in context — ákwà ("cloth"), àkwá ("egg"), ákwá ("crying"), àkwà ("bed") — in four short sentences and check the captions pick the right word each time instead of repeating one reading. Then say a normal code-switched line ("Anyị ga-deploy feature a na sprint na-abịa" — "we'll deploy this feature next sprint") and see whether it keeps the English words whole while rendering the Igbo correctly. Finally, watch whether the captions land once or flash a guess and revise it after the next clause. If it collapses the tones to one word, garbles the English, or keeps retracting lines, the tool wasn't built for spoken Igbo.
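If you want to keep score while running that test, a few lines of Python are enough. This sketch assumes you note down by hand which word each caption showed and how each caption line changed on screen; the function names and the data layout are invented for this example.

```python
def score_tone_test(results):
    """results: list of (expected_word, captioned_word) pairs, one per sentence."""
    correct = sum(1 for expected, shown in results if expected == shown)
    # "Collapsed" means every caption showed the same reading.
    collapsed = len(results) > 1 and len({shown for _, shown in results}) == 1
    return {"correct": correct, "total": len(results), "collapsed": collapsed}

def count_retractions(caption_events):
    """caption_events: (line_id, text) pairs in the order they appeared on screen.

    A retraction is counted each time a line's text changes after it was shown.
    """
    seen = {}
    retractions = 0
    for line_id, text in caption_events:
        if line_id in seen and seen[line_id] != text:
            retractions += 1
        seen[line_id] = text
    return retractions

# Example notes from a tool that repeats one reading and revises a line.
notes = [("cloth", "cloth"), ("egg", "cloth"), ("crying", "cloth"), ("bed", "cloth")]
print(score_tone_test(notes))   # {'correct': 1, 'total': 4, 'collapsed': True}
print(count_retractions([(1, "cloth"), (1, "crying"), (2, "next sprint")]))  # 1
```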

Is it private?

It's a fair question for anything that joins your meetings. Sageio doesn't use your meeting content to train AI models, and its AI vendors are contractually restricted from doing so. Audio is processed in memory and discarded; only the text transcript and summary are kept, encrypted, in the region you choose (US, EU, or APAC). Enterprise customers can self-host the entire stack.

Frequently asked questions

Why would an Igbo caption show the wrong word that still reads fine? Igbo is a tone language: pitch is part of the word, not emphasis. Ákwà ("cloth"), àkwá ("egg"), ákwá ("crying"), and àkwà ("bed") are the same letters distinguished only by tone. A tool that doesn't hear the pitch picks one real word over the others and produces a fluent, grammatical sentence that simply means something different.

Why do the missing tone marks matter? In everyday typing, Igbo tone marks are usually left off, so most written Igbo is untoned. A model trained mainly on that text learned to guess the word from context rather than hear the tone — which is the wrong instinct for live audio, where the speaker actually produced the pitch, including downstep, and the tool should be listening for it.

Does it handle Igbo-English code-switching? Yes, and a real call is the place to check it. Professional speech among Igbo speakers often mixes English content words into Igbo grammar, like "Anyị ga-deploy feature a." A tool that assumes one language per sentence translates only half; correct handling keeps the English whole and rebuilds a full sentence in each target language.

How fast are the translated captions? About two seconds, fast enough to keep a live conversation moving, with a searchable transcript and summary within about five minutes after the call.

What does it cost to try? Every plan starts with a free 60-minute trial, no credit card required. After that, Professional is $49/month and Teams is $99 per seat/month (annual billing includes 2 months free); Enterprise is custom-priced.


If your team works in Igbo, the honest test is whether a native speaker reads the live captions and hears the actual meeting — the tones landing on the right words, the code-switching kept whole. Add the bot to your next call and let them judge.