Training sources
Materials come from more than 90 government institutions, ranging from court registries and educational publishers to regional archives, and include documents related to Russia's actions during the full-scale invasion. These corpora will serve as the training data for a national large language model built on Google's open Gemma architecture.
Training location and security
Training will take place abroad on secure graphics processing units (GPUs) provided by Google. Once training is complete, the model is to be deployed in Ukrainian data centers. Kyivstar is among the project's technology partners. An exact launch date has not yet been set.
The development team is preparing for possible cyberattacks. The Ministry of Digital Transformation warns that the system may become a target immediately after its public launch, as has happened with other AI services. The team is considering measures against 'prompt injection' (attempts to insert malicious instructions into user queries).
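To illustrate what such a measure can look like, here is a minimal sketch in Python. It is not the project's actual defense: the pattern list and the names `SUSPICIOUS_PATTERNS`, `looks_like_injection`, and `build_prompt` are hypothetical, chosen only for this example. It shows two common ideas: screening a query for phrases typical of injection attempts, and keeping trusted rules separate from untrusted user text.

```python
import re

# Hypothetical patterns for illustration only; not the project's actual rules.
SUSPICIOUS_PATTERNS = [
    r"ignore (all |any )?(previous|prior|above) instructions",
    r"disregard (the )?(system|previous) (prompt|instructions)",
    r"reveal (your|the) (system prompt|hidden instructions)",
]


def looks_like_injection(user_query: str) -> bool:
    """Return True if the query matches any of the suspicious phrases."""
    text = user_query.lower()
    return any(re.search(pattern, text) for pattern in SUSPICIOUS_PATTERNS)


def build_prompt(system_rules: str, user_query: str) -> str:
    """Reject suspicious queries; otherwise wrap the user text in a marked section."""
    if looks_like_injection(user_query):
        raise ValueError("query rejected: possible prompt injection")
    # Trusted rules come first; untrusted text is clearly delimited.
    return f"{system_rules}\n\n<user_query>\n{user_query}\n</user_query>"
```

Phrase matching of this kind is easy to bypass with rewording, so in practice it would be only one layer among several, alongside output checks and restricted model permissions.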
It was recently announced that the Ukrainian large language model will be trained using Gemma. A team to develop the national LLM has already been formed in Ukraine.