LT The article aims to present to the academic community a corpus of about 60 hours of spoken media language, compiled within the project "Lithuanian Language: Ideals, Ideologies and Identity Shifts", to discuss the methodological principles and problems of its compilation, and to share practical experience of searching for and selecting texts. The article gives a methodological overview of the principles of corpus compilation, focusing on the criteria of representativeness and balance and on the problems of implementing them. It presents the theoretical scheme of the corpus structure, which is based on the criteria of time (a division into periods according to the stages of development of spoken media) and genre (three genre groups are distinguished). It discusses the practical and methodological difficulties of implementing this scheme, which arose from the limited availability of material, the balance of material between periods, the comparability of genres across periods, the uneven diversity of genres in different periods, and the continuity of genres. The article closes with the actual structure of the corpus, its size and its technical features. [From the publication] Keywords: Spoken mass media; Genres of spoken mass media; Corpus; Corpus compilation; Public language; Corpus sampling; Language corpus; Spoken public language.
EN The article introduces to the academic community a corpus of broadcast media (radio and television) compiled within the project "Lithuanian Language: Ideals, Ideologies and Identity Shifts". The corpus includes about 63 hours of transcribed recordings from 1960 to 2010. The article discusses the theoretical principles of corpus sampling, which were based on the criteria of time period and genre; describes the methodological issues encountered when constructing the sampling scheme; shares the practical experience of selecting and gathering recordings for the corpus; and presents the actual structure of the corpus. Two of the main requirements for any corpus are representativeness and balance. One possible way to achieve them is to distinguish objectively defined text types and to build the corpus along these lines. The composition of the broadcast media corpus was therefore designed on the basis of two criteria: the period of broadcast media development and genre. Three periods are distinguished: Soviet (1960-1987), transitional (1988-1992) and contemporary (1993 to the present). In respect of genre, three groups of programs are included: talk programs (further subdivided into interview, debate and talk show); documentaries, features and journal programs; and information programs. The article discusses the problems encountered when trying to build the corpus along these lines: the availability of materials, which was limited by the technological peculiarities of different periods and by organisational factors in archive institutions; the balance between the periods; the comparability of genres and the differing diversity of genres in different periods; and the continuity of genres. Finally, the composition and the size of the corpus are presented (63 hours of recordings, about 350 thousand words).
The paper concludes that, although the limited availability of materials and the other problems discussed above mean that the corpus cannot be regarded as perfectly representative and balanced, it is sufficient for research into public language change. This was confirmed by tentative studies carried out on the corpus. The corpus meets the usual technical requirements: the transcriptions were made in the CLAN software developed within the CHILDES project; the recordings were transcribed, coded and morphologically annotated following the conventions of the CHILDES project; the speakers were assigned individual codes; and the transcriptions were linked to the sound/image files. [From the publication]