Google's Gemma 3 to serve as the foundation for a Ukrainian LLM

The Ministry of Digital Transformation and Kyivstar have chosen the open Gemma 3 model as the basis for a Ukrainian large language model. The model will be adapted to Ukrainian, further trained on national data, and evaluated on benchmarks built for the project.

The Ministry of Digital Transformation, together with Kyivstar, has identified Google's Gemma 3 as the base model for training a national large language model.

Technical capabilities of Gemma 3

Gemma 3 supports about 140 languages, including Ukrainian. The model is designed to work with long contexts — up to 128,000 tokens — and has multimodal capabilities that allow it to process not only text but also images.

Tuning for Ukrainian

The developers plan to adapt the model to the specifics of Ukrainian: update the tokenizer so that it handles Ukrainian words better, fine-tune the model on unique Ukrainian-language corpora, and create dedicated test sets to evaluate quality.
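
To illustrate why the tokenizer matters, one common rough measure is "fertility": the average number of tokens a tokenizer spends per word. The sketch below is only an illustration under stated assumptions. The `fertility` and `byte_tokenize` helpers are hypothetical names, and `byte_tokenize` is a worst-case stand-in that emits one token per UTF-8 byte. It is not Gemma 3's tokenizer and says nothing about the project's actual method.

```python
def fertility(tokenize, text):
    """Average number of tokens per whitespace-separated word."""
    words = text.split()
    if not words:
        return 0.0
    return sum(len(tokenize(word)) for word in words) / len(words)


def byte_tokenize(word):
    # Hypothetical stand-in tokenizer: one token per UTF-8 byte.
    # Cyrillic letters take two bytes each, Latin letters take one.
    return list(word.encode("utf-8"))


english = "digital transformation"
ukrainian = "цифрова трансформація"

print("English fertility:  ", fertility(byte_tokenize, english))
print("Ukrainian fertility:", fertility(byte_tokenize, ukrainian))
```

With this stand-in, the Ukrainian phrase costs more tokens per word than the English one. A tokenizer whose vocabulary covers Ukrainian words and word pieces lowers that number, which leaves more text fitting in the same context window. Any real tokenizer's encode function could be passed in place of `byte_tokenize` to run the same comparison.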

Gemma has previously been used in Ukrainian projects, including Lapa LLM and MamayLM, as well as in the development of the Bulgarian model BgGPT.

In addition, there are plans to migrate the Diia.AI chatbot from Gemini to the national language model. A team has also been formed in Ukraine to build a homegrown large language model.
