Random Forest Approach for Sentiment Analysis in Indonesian Language

M. Ali Fauzi

Abstract


Sentiment analysis has become very useful since the rise of social media and online review websites, which created the need to analyze the sentiment they contain effectively and efficiently. Sentiment analysis can be treated as a text classification problem with sentiments as its categories. In this study, we explore the use of Random Forest for sentiment classification in the Indonesian language. We also explore bag-of-words (BOW) features with several term weighting methods: Binary TF, Raw TF, Logarithmic TF, and TF.IDF. The experimental results show that a sentiment analysis system using Random Forest performs well, with an average out-of-bag (OOB) score of 0.829. The results also show that all four term weighting methods give competitive results. Since the score differences are not very significant, we conclude that the choice of term weighting method in this study has no remarkable effect on sentiment analysis using Random Forest.
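The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the study's implementation: it assumes scikit-learn, a hypothetical toy set of Indonesian reviews, and scikit-learn's own definitions of the weighting schemes (logarithmic TF as 1 + log(tf), TF.IDF with smoothed IDF and L2 normalization), which may differ from the formulas used in the paper. The helper names `make_vectorizers` and `oob_scores` are made up for this example.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical toy reviews (1 = positive, 0 = negative); not the study's data.
texts = [
    "produk ini sangat bagus dan murah",
    "pelayanan cepat dan ramah sekali",
    "kualitas bagus saya sangat puas",
    "barang bagus pengiriman cepat",
    "sangat puas dengan pelayanan toko ini",
    "harga murah kualitas memuaskan",
    "produk jelek dan mahal",
    "pelayanan lambat dan tidak ramah",
    "kualitas buruk saya sangat kecewa",
    "barang rusak pengiriman lambat",
    "sangat kecewa dengan pelayanan toko ini",
    "harga mahal kualitas mengecewakan",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]


def make_vectorizers():
    """Return one bag-of-words vectorizer per term weighting scheme."""
    return {
        # 1 if the term occurs in the document, else 0.
        "Binary TF": CountVectorizer(binary=True),
        # Raw occurrence counts.
        "Raw TF": CountVectorizer(),
        # 1 + log(tf) for tf > 0, with no IDF and no normalization.
        "Logarithmic TF": TfidfVectorizer(
            use_idf=False, sublinear_tf=True, norm=None
        ),
        # scikit-learn's default TF.IDF variant (smoothed IDF, L2 norm).
        "TF.IDF": TfidfVectorizer(),
    }


def oob_scores(texts, labels, n_trees=100, seed=0):
    """Fit a Random Forest per weighting scheme and collect its OOB score."""
    scores = {}
    for name, vectorizer in make_vectorizers().items():
        features = vectorizer.fit_transform(texts)
        forest = RandomForestClassifier(
            n_estimators=n_trees, oob_score=True, random_state=seed
        )
        forest.fit(features, labels)
        # Accuracy estimated on samples left out of each tree's bootstrap.
        scores[name] = forest.oob_score_
    return scores


scores = oob_scores(texts, labels)
for name, score in scores.items():
    print(f"{name}: OOB score = {score:.3f}")
```

The OOB score is used here because each tree in the forest is trained on a bootstrap sample, so the documents it never saw give an accuracy estimate without a separate validation split. Scores from a toy corpus this small say nothing about the reported 0.829.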

Keywords


Text Classification; Sentiment Analysis; Random Forest; Term Weighting; TF.IDF

DOI: http://doi.org/10.11591/ijeecs.v12.i1.pp%25p

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
