Technical Details
The Yandex team has improved its archive search service by introducing a new document recognition model, Alice AI VLM. The service now not only recognizes the text of an archival document but also structures the information, identifying the roles of event participants and the connections between people. As a result, users can immediately see the name of the person they are looking for and find data about their ancestors more quickly.
Background and Context
The Yandex archive search service helps users quickly find mentions of people, places, and events in handwritten documents from the 18th to 20th centuries. The service's database contains over 20 million pages of historical documents from archives in various regions of Russia, as well as information from over 200 pre-revolutionary and Soviet newspapers and directories.
Industry Impact
The updated service is based on Yandex's multimodal model Alice AI VLM, which has a deep understanding of both the Russian language and images. According to the developers, this has made it possible to achieve high search accuracy: 90.5% on average, and up to 92.7% for birth records. The new model also lets users filter results by event and role, such as 'born', 'father', 'mother' for birth documents or 'groom', 'bride', 'witness' for marriage certificates.
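To illustrate what filtering by event and role could look like on structured records, here is a minimal Python sketch. It is not Yandex's implementation or API: the `Participant` and `Record` classes, the `filter_records` function, and the sample names are all hypothetical, chosen only to show how a role label attached to each person makes a query like "find this surname, but only as a father in a birth record" possible.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Participant:
    name: str
    role: str  # hypothetical role label, e.g. 'born', 'father', 'groom'


@dataclass
class Record:
    event: str  # hypothetical event label, e.g. 'birth' or 'marriage'
    participants: List[Participant] = field(default_factory=list)


def filter_records(records: List[Record],
                   event: Optional[str] = None,
                   role: Optional[str] = None,
                   name: Optional[str] = None) -> List[Record]:
    """Keep records of the given event type in which at least one
    participant has the given role and a name containing `name`."""
    results = []
    for rec in records:
        if event is not None and rec.event != event:
            continue
        for p in rec.participants:
            if role is not None and p.role != role:
                continue
            if name is not None and name.lower() not in p.name.lower():
                continue
            results.append(rec)
            break  # one matching participant is enough
    return results


# Made-up sample data, for illustration only.
RECORDS = [
    Record("birth", [Participant("Ivan Petrov", "born"),
                     Participant("Pyotr Petrov", "father"),
                     Participant("Anna Petrova", "mother")]),
    Record("marriage", [Participant("Pyotr Petrov", "groom"),
                        Participant("Anna Sidorova", "bride"),
                        Participant("Nikolai Ivanov", "witness")]),
]

# Records where a Petrov appears specifically as a father in a birth entry.
fathers = filter_records(RECORDS, event="birth", role="father", name="Petrov")
```

Without role labels, a plain text search for "Petrov" would return both records; with them, the same person can be told apart as a father in one document and a groom in another.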