Keywords: Classification; Emotional vocabulary; Internet comments; Sentiment; Sentiment classification; Supervised machine learning method.
Although many methods effectively solve the sentiment classification task for widely used languages such as English, there is no clear answer as to which methods are most suitable for languages that are substantially different. In this paper we attempt to solve the sentiment classification task for Lithuanian Internet comments using two classification approaches: knowledge-based and supervised machine learning. We explore the influence of sentiment word dictionaries based on different parts of speech (adjectives, adverbs, nouns, and verbs) for the knowledge-based method; different feature types (bag-of-words, lemmas, word n-grams, character n-grams) for the machine learning methods; and pre-processing techniques (emoticon replacement with sentiment words, diacritics replacement, etc.) for both approaches. Although the supervised machine learning methods (Support Vector Machine and Naïve Bayes Multinomial) significantly outperform the proposed knowledge-based method, all obtained results are above the baseline. The best accuracy, 0.679, was achieved with Naïve Bayes Multinomial and token unigrams plus bigrams when pre-processing involved diacritics replacement. [From the publication]
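The best-performing configuration named in the abstract (diacritics replacement, token unigrams plus bigrams, multinomial Naïve Bayes) can be illustrated with a minimal sketch. The abstract does not specify the tokenizer, the smoothing, or the toolkit used, so whitespace tokenization, add-one smoothing, the class `TinyMultinomialNB`, and the four toy comments below are all assumptions made for illustration, not the authors' implementation or data.

```python
import math
import unicodedata
from collections import Counter, defaultdict


def strip_diacritics(text):
    """Replace accented letters with their base letters (e.g. 'š' -> 's')."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def unigrams_and_bigrams(text):
    """Lower-case, strip diacritics, split on whitespace (an assumption),
    and return token unigrams followed by token bigrams."""
    tokens = strip_diacritics(text.lower()).split()
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams


class TinyMultinomialNB:
    """Hypothetical, minimal multinomial Naive Bayes with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = unigrams_and_bigrams(text)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        # Features never seen in training are ignored.
        feats = [f for f in unigrams_and_bigrams(text) if f in self.vocab]
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            total = sum(self.feature_counts[label].values())
            denom = total + self.alpha * len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.class_counts[label] / n_docs)
            for f in feats:
                score += math.log((self.feature_counts[label][f] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy comments, for illustration only (not data from the paper).
train_texts = [
    "labai geras straipsnis",
    "puiki mintis ačiū",
    "labai blogas straipsnis",
    "baisi nesąmonė",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = TinyMultinomialNB().fit(train_texts, train_labels)
print(model.predict("geras straipsnis aciu"))
print(model.predict("blogas straipsnis"))
```

Because diacritics are stripped before feature extraction, a comment typed without Lithuanian letters ("aciu") maps to the same features as the correctly spelled form ("ačiū"), which is one plausible reason this pre-processing step helps on informal Internet text.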