Creating an Algorithm for Detecting Lexical Minimums in Serbian as a Foreign Language at Level A1

Luka M. Medenica, University of Belgrade, Faculty of Philology, Belgrade, Serbia, email: luka.medenica@fil.bg.ac.rs
Milena S. Oparnica, Institute for Artificial Intelligence Research and Development, Novi Sad, Serbia
Inovacije u nastavi, XXXVII, 2024/2, pp. 154–165

DOI: 10.5937/inovacije2402154M

Abstract: Programs for automatic text correction and adaptation have a growing influence on modern language didactics. Because such tools are not used in teaching Serbian as a foreign language, the aim of our work was to determine the lexical minimums for level A1 in Serbian as a foreign language and to create an algorithm for detecting them, as a first step toward developing a program for automatic text correction and adaptation. Our methodology draws on the didactics of Russian as a foreign language, given the well-developed corpora of lexical minimums for that Slavic language and the existence of the online tool Textometr for checking text complexity. This yielded 783 lexemes, which make up the list of lexical minimums for level A1 in Serbian as a foreign language. For the purposes of the paper we then created an algorithm in the Python programming language, tested it on a specific text, and identified certain shortcomings in the lemmatization of the text. In the coming period, the lemmatizer needs to be refined so that a program for automatic text correction can be developed.

Keywords: Serbian as a foreign language, level A1, lexical minimums, automatic text correction and adaptation
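
The detection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, which is not reproduced here. It assumes that detection means lemmatizing each token and checking the lemma against the A1 list. The names `A1_LEMMAS`, `TOY_LEMMAS`, `lemmatize`, and `find_out_of_list` are hypothetical. The word list is a handful of illustrative entries rather than the 783 lexemes, and a small lookup table stands in for a real Serbian lemmatizer.

```python
import re

# Hypothetical sample standing in for the A1 list (the paper's list has 783 lexemes).
A1_LEMMAS = {"ja", "biti", "student", "i", "učiti", "srpski", "jezik"}

# Toy lookup table standing in for a real Serbian lemmatizer (assumption).
TOY_LEMMAS = {"sam": "biti", "učim": "učiti", "jezika": "jezik"}


def lemmatize(token):
    """Return the lemma from the toy table, or the lowercased token if unknown."""
    token = token.lower()
    return TOY_LEMMAS.get(token, token)


def find_out_of_list(text, allowed=A1_LEMMAS):
    """Return (token, lemma) pairs whose lemma is not in the allowed list."""
    flagged = []
    # \w matches Unicode letters in Python 3, so č, ć, š, ž, đ are kept.
    for token in re.findall(r"\w+", text):
        lemma = lemmatize(token)
        if lemma not in allowed:
            flagged.append((token, lemma))
    return flagged


if __name__ == "__main__":
    print(find_out_of_list("Ja sam student i učim srpski jezik."))
    print(find_out_of_list("Učim lingvistiku"))
    # "studenta" is an inflected form of "student", but the toy lemmatizer
    # does not know it, so it is flagged even though its lemma is on the list.
    print(find_out_of_list("studenta"))
```

The last call shows, in miniature, why lemmatization quality matters for this kind of check: an inflected form that the lemmatizer fails to reduce is reported as outside the list even when its lemma belongs to it.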

Copyright © 2024 by the publisher Faculty of Education, University of Belgrade, SERBIA. This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original paper is accurately cited.