Keywords: Computer lexicography; Computer or electronic lexicography; Computer/electronic lexicography; Electronic dictionary; Electronic dictionary, Lithuanian language; Lexicographic database; Text recognition.
The Dictionary of the Lithuanian Language (Lietuvių kalbos žodynas, LKŽ) is a lexicographic milestone that covers the lexicon of Lithuanian from 1547 to 2001. The digitization of the Dictionary aims to create a lexicographic database with search engines that allow instant retrieval of information. We present a technology for converting the text of the Dictionary into a set of structured data records. The program automatically determines what each portion of text stands for (e.g. headwords, grammatical properties, meanings of a word, illustrations) and puts it into the corresponding field of a data structure created for each entry. The main concept of the process is that the human-oriented text of the Dictionary is converted to structured data fully automatically. Entering the data manually would have been a daunting task because of the formidable size of the Dictionary (22,000 pages, 11,000,000 words of text). [From the publication]
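The conversion described above, from running entry text to a record with one field per kind of information, can be illustrated with a minimal sketch. It is not the authors' program: the entry layout assumed here (headword, an abbreviated grammatical label, then numbered senses with a definition, a colon, and semicolon-separated illustrations) is a simplified, hypothetical one, and the names `Entry`, `Sense`, and `parse_entry` are invented for illustration. The real Dictionary text is far richer and would need many more rules.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Sense:
    number: int
    definition: str
    illustrations: list[str] = field(default_factory=list)


@dataclass
class Entry:
    headword: str
    grammar: str
    senses: list[Sense] = field(default_factory=list)


# Assumed (hypothetical) layout of one entry:
#   "<headword> <label>. 1. <definition>: <example>; <example>. 2. ..."
HEAD_RE = re.compile(
    r"^(?P<headword>\S+)\s+(?P<grammar>[a-z]+\.)\s+(?P<body>.*)$", re.DOTALL
)
# A sense runs from its number up to the next "<digits>. " or the end of text.
# Assumption: examples themselves contain no "<digits>. " sequences.
SENSE_RE = re.compile(r"(?P<num>\d+)\.\s+(?P<text>.*?)(?=\s+\d+\.\s|$)", re.DOTALL)


def parse_entry(text: str) -> Entry:
    """Split one entry's text into headword, grammar label, and senses."""
    match = HEAD_RE.match(text.strip())
    if match is None:
        raise ValueError("text does not follow the assumed entry layout")
    entry = Entry(headword=match["headword"], grammar=match["grammar"])
    for sense_match in SENSE_RE.finditer(match["body"]):
        # Text before the first colon is the definition; the rest are examples.
        definition, _, examples = sense_match["text"].partition(":")
        illustrations = [
            part.strip().rstrip(".") for part in examples.split(";") if part.strip()
        ]
        entry.senses.append(
            Sense(int(sense_match["num"]), definition.strip(), illustrations)
        )
    return entry


if __name__ == "__main__":
    sample = (
        "namas sm. 1. building for living: Naujas namas; Medinis namas. "
        "2. household: Visas namas susirinko."
    )
    print(parse_entry(sample))
```

Each recognized portion of text ends up in its own field, which is what makes field-specific searching possible once the records are stored in a database. A production system would also have to cope with accent classes, cross-references, source abbreviations, and irregular entries, none of which this sketch attempts.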