Principles of Developing a Chinese-Russian Polysemantic Dictionary as a Means of Improving Interpretability of Neural Machine Translators
https://doi.org/10.24833/2687-0126-2025-7-1-89-107
Abstract
This research addresses the challenge of polysemy in neural machine translation (NMT), particularly for the Chinese-Russian language pair, known for its significant interlingual and intercultural asymmetry. Despite considerable advancements in NMT, the accurate translation of polysemous words remains a key obstacle to achieving high-quality automated text generation, often leading to misinterpretations and hindering effective communication. Currently, methodologies for developing specialized dictionaries that can effectively address this issue for NMT systems are lacking. This article aims to define the qualitative characteristics for detailed polysemantic dictionaries designed to enhance the interpretability of NMT, specifically for Chinese-Russian translation. The study employs eco-cognitive modeling of professional translator communication to investigate human-machine interaction in handling lexical ambiguity, focusing on the cognitive processes involved in disambiguation. Parallel Chinese-Russian texts serve as the material, subjected to manual processing to identify polysemous units challenging for NMT. The article proposes a theoretical framework for bilingual dictionary compilation based on this manual analysis, outlining principles for structuring dictionary entries to capture subtleties of lexical usage. The developed algorithm details the manual processing of parallel texts and the design of dictionary entry schemes tailored for NMT. The research identifies key qualitative characteristics for detailed Chinese-Russian parallel training corpora. These include linguistic and definitional parameters, comprehensive dictionary representation, and translation variability informed by lexico-grammatical compatibility, discourse-genre affiliation, and conceptual-categorical taxonomy. This study contributes to translation theory by offering a practical approach to enhance NMT interpretability through targeted dictionary development. 
The findings are relevant for improving machine translation quality, particularly for complex language pairs, ultimately facilitating more effective cross-lingual communication and knowledge exchange in different spheres, including business and academic research.
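The entry parameters listed above (definition, translation variability, lexico-grammatical compatibility, discourse-genre affiliation, conceptual-categorical taxonomy) can be pictured as a small data structure. The following is only a minimal sketch under stated assumptions, not the article's entry scheme: the class names, field names, scoring heuristic, and sample data are all hypothetical and were chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TranslationVariant:
    """One Russian equivalent plus the context features that license it."""
    russian: str
    collocates: frozenset = frozenset()  # lexico-grammatical compatibility
    genres: frozenset = frozenset()      # discourse-genre affiliation
    category: str = ""                   # conceptual-categorical taxonomy


@dataclass
class DictionaryEntry:
    """A polysemous Chinese headword with its context-tagged variants."""
    headword: str
    definition: str
    variants: list = field(default_factory=list)

    def choose(self, context_words, genre):
        """Return the variant whose features overlap most with the context.

        Toy heuristic (an assumption, not the article's algorithm):
        one point per matching collocate, one point for a matching genre.
        Ties go to the earliest variant in the list.
        """
        context = set(context_words)

        def score(variant):
            return len(variant.collocates & context) + (genre in variant.genres)

        return max(self.variants, key=score)


# Illustrative data only; not taken from the article's dictionary.
entry = DictionaryEntry(
    headword="交易",
    definition="exchange of goods, money, or obligations between parties",
    variants=[
        TranslationVariant("сделка", frozenset({"达成", "合同"}),
                           frozenset({"legal", "business"}), "agreement"),
        TranslationVariant("транзакция", frozenset({"银行", "支付"}),
                           frozenset({"banking"}), "operation"),
        TranslationVariant("торговля", frozenset({"市场", "商品"}),
                           frozenset({"economics"}), "activity"),
    ],
)
```

Calling `entry.choose(["银行", "支付"], "banking")` selects the variant whose collocates and genre match the context. A real dictionary for NMT would need far richer features than this overlap count.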
Review
For citations:
Chistova E.V. Principles of Developing a Chinese-Russian Polysemantic Dictionary as a Means of Improving Interpretability of Neural Machine Translators. Professional Discourse & Communication. 2025;7(1):89-107. (In Russ.) https://doi.org/10.24833/2687-0126-2025-7-1-89-107