The State Archives of Ukraine (Ukrderzharkhiv) has transferred approximately 10 terabytes of data, equivalent to roughly 70,000 books, to train the AI model "Siayvo". According to Acting Minister of Digital Transformation Oleksandr Bornyakov, a significant portion of these materials has never been used in similar projects before. To put this in perspective: the entire English-language Wikipedia weighs approximately 21 GB, so the archive transferred nearly 500 times as much data.
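As a rough check on the size comparison, here is a minimal sketch using only the approximate figures quoted above. The decimal unit conversion (1 TB = 1,000 GB) and the variable names are assumptions made for illustration, not details from the source.

```python
# Approximate figures quoted in the article.
ARCHIVE_TB = 10        # data transferred by the State Archives
WIKIPEDIA_GB = 21      # quoted size of the English-language Wikipedia
BOOKS = 70_000         # quoted book equivalent of the archive

# Assumption: decimal units (1 TB = 1,000 GB, 1 GB = 1,000 MB).
GB_PER_TB = 1_000
MB_PER_GB = 1_000

archive_gb = ARCHIVE_TB * GB_PER_TB
ratio = archive_gb / WIKIPEDIA_GB            # archive size relative to Wikipedia
mb_per_book = archive_gb * MB_PER_GB / BOOKS  # average size implied per book

print(f"Archive vs. Wikipedia: about {ratio:.0f}x")
print(f"Implied average per book: about {mb_per_book:.0f} MB")
```

With these inputs the ratio comes out at roughly 476, so the archive is hundreds of times larger than the quoted Wikipedia figure.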
A State Project Without State Funding
The most unusual detail about "Siayvo" is its financing scheme. Kyivstar is covering all development costs, after which the model will be transferred to the state. As Bornyakov explains in a column for AIN, the logic is straightforward: "in the conditions of war, every budget hryvnia must go to defense". In return, the operator receives a reputational and commercial asset — and priority access to the model.
An open technical foundation was chosen: Gemma 3 from Google, which the Digital Transformation Ministry team will further train together with Kyivstar using Ukrainian data. Models from the same Gemma family already served as the basis for the first Ukrainian LLMs, MamayLM and Lapa LLM, as well as the Bulgarian BgGPT. In other words, "Siayvo" is not built from scratch; it is a deep adaptation of an existing open model to the Ukrainian language and context.
50+ Organizations and the Paper Problem
Over 50 organizations have already joined the initiative: businesses, media, universities, and research institutions. The Digital Transformation Ministry continues its open call for partners and is seeking news, textbooks, scientific literature, fiction, and archival materials.
"The most important part of the work is data preparation. For an effective Ukrainian model, we need not just internet texts, but also historical archives and other written sources."
Sud.ua, on preparing the "Siayvo" dataset
However, there is a specific problem: a significant portion of materials still exists only on paper. Digitizing archives, which in peacetime would have been a matter of convenience, has suddenly become critical for the model's quality.
The Name Was Chosen by 136,000 People
"Siayvo" won the vote in the "Diia" app, in which over 136,000 people took part. It received 22,601 votes, leading a field of ten finalists selected from over 3,000 proposals, and finished approximately three thousand votes ahead of second place.
Open beta testing is planned for the end of spring 2026. The long-term goal is more ambitious: by 2030, Ukraine wants to rank among the global top three in AI development.
The real question will be answered during beta testing: will 10 TB of archival texts, combined with the rest of the dataset, give "Siayvo" a good enough understanding of context to surpass publicly available models precisely where they traditionally fail, namely in the nuances of the Soviet bureaucratic legacy, in dialects, and in documents that never made it to the internet?