Lietuvių kalbos gramatikos kompiuterizavimas

Collection:
Mokslo publikacijos / Scientific publications
Document Type:
Knyga / Book
Language:
Lietuvių kalba / Lithuanian
Title:
Lietuvių kalbos gramatikos kompiuterizavimas
Alternative Title:
Computerisation of the Lithuanian grammar
Publication Data:
Vilnius : Lietuvių kalbos institutas, 2022.
Pages:
Electronic (PDF), 279 p.
Notes:
Bibliography.
Contents:
Preface -- 1. Introduction: 1.1. General remarks; 1.2. A bit of history; 1.3. Modern language processing methods; 1.4. Chapter conclusions -- 2. Annotated corpora: 2.1. Types of corpora; 2.2. Morphological annotation of corpora; 2.3. Syntactic annotation of corpora; 2.4. Chapter conclusions -- 3. Computerisation of morphology: 3.1. Morphological analysers; 3.2. Representing the morphemic structure of a word; 3.3. Work on the computerisation of morphemics; 3.4. Chapter conclusions -- 4. Computerisation of syntax: 4.1. Representing the syntactic structure of a sentence; 4.2. Parsers; 4.3. Chapter conclusions -- 5. Digital grammar: 5.1. Electronic grammar publications of other languages intended for the general public; 5.2. Overview grammars; 5.3. Formal description of grammar rules; 5.4. Problems of multilingualism; 5.5. The Lithuanian language part in "Gramatinė struktūra"; 5.6. Chapter conclusions -- 6. Lithuanian Grammar Information System (LIGIS): 6.1. Types of grammatical information and their presentation; 6.2. Debatable cases in Lithuanian grammar: the participle; 6.3. Information obtained from LIGIS on unexplored questions of the Lithuanian language; 6.4. LIGIS outlook - the syntax part; 6.5. Chapter conclusions -- General conclusions -- Glossary of terms -- Abbreviations -- Internet links -- References -- Zusammenfassung -- Summary -- Appendices.
Summary / Abstract:

LT: The work describes the author's previously published and ongoing research on Lithuanian grammar, the need for which arises when the language is computerised. In developing the "Lithuanian Grammar Information System", the aim is to avoid the shortcomings observed in earlier work. One of them is insufficient word coverage in both corpora and dictionaries. To generate all theoretically possible derivatives and compounds of Lithuanian words, the derivational morphemes have to be analysed. For this reason, a comprehensive analysis of prefixes was carried out at the Institute of the Lithuanian Language (Lietuvių kalbos institutas, hereinafter LKI). Another problem arose from the inconsistent treatment of parts of speech across sources. Since the status of the participle chosen in the "Lithuanian Grammar Information System" differs from the one given in the academic "Lietuvių kalbos gramatika", an entire subsection of this study is based on an article about the participle in order to justify that choice. One more subsection prepared on the basis of an article deals with the digital grammar of Lithuanian. The remainder of the study also draws on material from other articles. [From the Preface]

EN: As soon as they first appeared, computers began to spread across most fields of human life, and languages are no exception. Much has been accomplished worldwide, and in Lithuania too. This study describes endeavours in the computerisation of grammar, including corpus annotation, morphological analysers and parsers, digital grammar, and a grammar information system. Each chapter covers a particular subject. The first efforts to combine languages with digits were made back in the Middle Ages. However, it was only with the advent of the computer that these ideas could begin to be implemented. The latest technology, neural networks, has produced decent results in some areas, yet 100 percent accuracy is still out of reach. Once they started annotating corpora, researchers discovered that tags are strongly affected by the diversity of languages, which is why no uniform annotation tag set has been developed yet. In addition to English tags, the morphologically annotated corpus of the Lithuanian language also provides Lithuanian tags. Many different formats have been developed for syntactic annotation, yet not all of them have spread equally widely. The Prague Markup Language (PML), designed by Prague researchers, was probably the most popular. It is also used to annotate sentences written in Lithuanian. The morphological analyser developed by Vytautas Magnus University (VMU) on the basis of the 'Hunspell' platform takes a rule-based approach.
Morphological analysis of Lithuanian grounded in statistical methods is performed by the 'UDPipe' Lithuanian language module. Developed by the Institute of Mathematics and Informatics, the first morphemic database contains detailed information about morphemes, including their types, yet its contents are not freely accessible. The VMU online morphemic database presents words split into morphemes by hyphens yet contains no information about morpheme type. The rule-based parser was accessible on VMU's website until February 2020, and no updated version has been made available yet. The 'UDPipe' module is a parser of the Lithuanian language that relies on statistical methods. The syntactically annotated Lithuanian corpus ALKSNIS was used for the machine learning. Inaccuracies in parsing are primarily caused by sentences that carry specific qualities of the Lithuanian language, such as word orders not typically found in English. It appears that methods that work well for English cannot always be used with other languages. For now, only a pilot sample of a digital grammar of the Lithuanian language and a limited Sketch grammar are available. The purpose of developing an information system for the grammar of the Lithuanian language is to document that grammar. The system stores information in two forms: a popular, comprehensive presentation designed for the general public, and a computer-friendly format used for scientific research. The website contains both morphological and morphemic data with an indication of morpheme type. The structure of the word, that is, the lemma and the underlying words for derivatives, is shown as well. All inflexional forms can be viewed by clicking the OTHER FORMS button.
The information on the website is available in seven languages: Lithuanian, English, German, French, Italian, Russian, and Japanese. Only the model for the morphological segment is available at this time, with the syntactic segment slated for development some time in the future. The development of the "Lithuanian Grammar Information System" (LIGIS) has highlighted phenomena that linguists have not covered before: some words lack certain paradigmatic forms because of their semantics, or rather because of the difference between their semantic meaning and their grammatical meaning. For instance, verbs that denote a group action cannot have a singular form, and passive-voice participles made from intransitive verbs can only have neuter forms. [From the publication]

DOI:
10.35321/e-pub.44.lietuviu-gramatikos-kompiuterizavimas
ISBN:
9786094113277
Permalink:
https://www.lituanistika.lt/content/99503
Updated:
2023-08-28 12:35:39