The Ministry of Digital Transformation, together with Kyivstar, has selected Google's Gemma 3 as the base model for training a national large language model.
Technical capabilities of Gemma 3
Gemma 3 supports about 140 languages, including Ukrainian. The model is designed to work with long contexts of up to 128,000 tokens, and it is multimodal, so it can process images as well as text.
Tuning for Ukrainian
The plan is to adapt the model to the specifics of Ukrainian: update the tokenizer so that it handles Ukrainian words better, fine-tune the model on unique Ukrainian-language corpora, and create dedicated test sets to evaluate quality.
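One common way to check how well a tokenizer fits a language is to count how many tokens it produces per word, a figure often called fertility. Fewer tokens per word generally means text fits more easily into the context window. The announcement does not say which metrics the team will use, so the snippet below is only a minimal sketch of that idea. The function names and the toy chunk_tokenizer are hypothetical stand-ins, not the Gemma tokenizer or the project's actual tooling.

```python
from typing import Callable, Iterable, List


def fertility(tokenize: Callable[[str], List[str]], sentences: Iterable[str]) -> float:
    """Return the average number of tokens per whitespace-separated word."""
    total_tokens = 0
    total_words = 0
    for sentence in sentences:
        words = sentence.split()
        total_words += len(words)
        # Tokenize word by word so the ratio is tokens per word.
        total_tokens += sum(len(tokenize(word)) for word in words)
    if total_words == 0:
        raise ValueError("the sample contains no words")
    return total_tokens / total_words


def chunk_tokenizer(size: int) -> Callable[[str], List[str]]:
    """Hypothetical stand-in for a subword tokenizer (assumption, illustration only).

    It cuts each word into fixed-size character chunks; a real tokenizer
    would use a learned vocabulary instead.
    """
    def tokenize(word: str) -> List[str]:
        return [word[i:i + size] for i in range(0, len(word), size)]
    return tokenize


# Illustrative sample sentences, not taken from any real evaluation set.
sample = ["Привіт, світе!", "Національна мовна модель"]

coarse = fertility(chunk_tokenizer(2), sample)  # short chunks: more tokens per word
fine = fertility(chunk_tokenizer(6), sample)    # longer chunks: fewer tokens per word
print(f"chunk size 2: {coarse:.2f} tokens per word")
print(f"chunk size 6: {fine:.2f} tokens per word")
```

In a real comparison, the tokenize argument would wrap the original and the updated tokenizer in turn, and the sample would be a representative Ukrainian corpus rather than two sentences.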
Gemma has previously been used in Ukrainian projects, including Lapa LLM and MamayLM, as well as in the development of the Bulgarian model BgGPT.
In addition, there are plans to migrate the Diia.AI chatbot from Gemini to the national language model. A team has also been formed in Ukraine to build a homegrown large language model.