The evolution of the lexical richness of the Russian language (corpus research based on diachronic datasets from the national corpus of the Russian language)

Keywords: lexical dynamics, lexical evolution, Russian lexis, Russian National Corpus, diachronic datasets, pre-revolutionary Russian, Soviet-period Russian, post-Soviet Russian

Abstract

Background. The study addresses the need to investigate lexical change in Russian with modern quantitative methods. Its scientific novelty lies in the development and application of a set of statistical models and indices for a systematic quantitative analysis of Russian lexis, based on previously unexplored material: frequency dictionaries for the periods 1700–1916, 1918–1991, and 1992–2016, with a total size of 250 million tokens. This made it possible to identify and quantitatively describe the dynamics of lexical richness and the structure of the vocabulary from a diachronic perspective.

Purpose. To determine the specific features of the dynamics of lexical richness in Russian on the basis of frequency dictionaries for the periods 1700–1916, 1918–1991, and 1992–2016.

Materials and methods. The material consists of the diachronic datasets of the Russian National Corpus for the periods 1700–1916, 1918–1991, and 1992–2016. The methods include computational corpus processing, testing for compliance with Zipf’s law, and calculation of the Herfindahl–Hirschman Index (HHI), Simpson’s index, the Berger–Parker index, Shannon entropy, and the Type–Token Ratio (TTR) as a coefficient of lexical diversity, together with chi-square tests of statistical significance and related measures.
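As an illustration of the measures listed above, the following is a minimal sketch of how they might be computed from a frequency dictionary. It is not the author's implementation. The function names, the toy word counts, the use of the Gini–Simpson form (1 − HHI) for Simpson's index, and the ordinary least-squares fit on log rank versus log frequency as a rough Zipf check are all assumptions made for the example.

```python
import math


def diversity_indices(freqs):
    """Compute basic diversity measures from a {word: count} mapping.

    The mapping is a hypothetical stand-in for one period's frequency dictionary.
    """
    counts = [c for c in freqs.values() if c > 0]
    n_tokens = sum(counts)          # corpus size N
    n_types = len(counts)           # vocabulary size V
    probs = [c / n_tokens for c in counts]
    hhi = sum(p * p for p in probs)  # Herfindahl-Hirschman Index (concentration)
    return {
        "tokens": n_tokens,
        "types": n_types,
        "ttr": n_types / n_tokens,                    # Type-Token Ratio
        "hhi": hhi,
        "simpson_diversity": 1.0 - hhi,               # Gini-Simpson form (assumed)
        "berger_parker": max(probs),                  # share of the top word
        "shannon_bits": -sum(p * math.log2(p) for p in probs),
    }


def zipf_slope(freqs):
    """Least-squares slope of log(frequency) on log(rank).

    A slope near -1 is the classic Zipfian pattern; this is only a rough check,
    not a formal goodness-of-fit test.
    """
    ranked = sorted((c for c in freqs.values() if c > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(c) for c in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy counts with transliterated lemmas, invented purely for illustration.
toy = {"i": 8, "v": 4, "ne": 2, "yazyk": 1, "korpus": 1}
for name, value in diversity_indices(toy).items():
    print(name, value)
print("zipf_slope", zipf_slope(toy))
```

TTR falls as corpus size grows, so a comparison across periods of unequal size would normally require equal-sized samples or a size-corrected measure. The sketch leaves that step out.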

Results. Corpus-based analysis of diachronic data for the periods 1700–1916, 1918–1991, and 1992–2016 revealed a decrease in the overall lexical diversity and richness of Russian from the pre-revolutionary to the post-Soviet period. This lexical impoverishment, however, occurs mainly at the expense of rare and low-frequency words, whereas the active vocabulary expands and becomes more productive.

Author Biography

Tatiana A. Rychkova, Murmansk Arctic University

PhD in Philology, Associate Professor, Associate Professor of the Department of Philology, Intercultural Communications and Journalism

Published
2026-03-31
How to Cite
Rychkova, T. (2026). The evolution of the lexical richness of the Russian language (corpus research based on diachronic datasets from the national corpus of the Russian language). Russian Social and Humanitarian Studies, 18(1), 156–181. https://doi.org/10.12731/3033-5981-2026-18-1-541
Section
Applied Aspects of Linguistics