Neighbor Weighted K-Nearest Neighbor for Sambat Online Classification

Annisya Aprilia Prasanti, M. Ali Fauzi, Muhammad Tanzil Furqon

Abstract


Sambat Online is an e-government complaint management service provided by the Malang City Government. Each complaint must be classified to its intended department. Because the Sambat Online data are imbalanced, this study proposes an automatic complaint classification system based on Neighbor Weighted K-Nearest Neighbor (NW-KNN). The system consists of three main stages: preprocessing, N-Gram feature extraction, and classification using NW-KNN. The experimental results show that NW-KNN classifies the imbalanced data well; the best configuration, k = 3 neighbors with unigram features, achieves 77.85% precision, 74.18% recall, and a 75.25% f-measure. Compared with conventional KNN, NW-KNN also performs better on the imbalanced data, although only by a very small margin.
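
The three stages named above can be illustrated with a rough sketch. It is not the authors' implementation: the class-weight formula follows the formulation usually attributed to Tan's NW-KNN paper (the weight of a class shrinks as the class grows relative to the smallest class), preprocessing is reduced to lowercasing and whitespace tokenization, and the function names, the exponent value, and the toy complaints and department labels are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n=1):
    """Stand-in preprocessing + N-Gram extraction: lowercase, split on
    whitespace, then build word n-grams (n=1 gives unigrams)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def cosine(a, b):
    """Cosine similarity between two Counter term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def class_weights(labels, exponent=2.0):
    """Assumed NW-KNN weighting: 1 / (class_size / smallest_class_size)^(1/exponent).
    The smallest class gets weight 1.0; larger classes get smaller weights."""
    counts = Counter(labels)
    smallest = min(counts.values())
    return {c: 1.0 / (size / smallest) ** (1.0 / exponent)
            for c, size in counts.items()}


def nwknn_predict(train_docs, train_labels, query, k=3, n=1, exponent=2.0):
    """Score each class by the weighted sum of similarities of the k nearest
    training documents that belong to it, and return the best class."""
    weights = class_weights(train_labels, exponent)
    q = Counter(ngrams(query, n))
    sims = [(cosine(q, Counter(ngrams(doc, n))), label)
            for doc, label in zip(train_docs, train_labels)]
    neighbors = sorted(sims, key=lambda s: s[0], reverse=True)[:k]
    scores = defaultdict(float)
    for sim, label in neighbors:
        scores[label] += weights[label] * sim
    return max(scores, key=scores.get)


# Hypothetical, deliberately imbalanced toy data (4 "roads" vs 1 "sanitation").
docs = [
    "road damaged potholes",
    "road traffic jam",
    "street lamp broken",
    "road flooded",
    "garbage pile uncollected",
]
labels = ["roads", "roads", "roads", "roads", "sanitation"]

predicted = nwknn_predict(docs, labels, "garbage pile on road", k=3, n=1)
print(predicted)
```

In this toy set, two of the three nearest neighbors of the query belong to the majority class, so summing raw similarities would favor "roads"; down-weighting the larger class is what lets the single "sanitation" neighbor win.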


Keywords


Text Classification, Sambat Online, N-Gram, NW-KNN, Neighbor Weighted K-Nearest Neighbor, K-Nearest Neighbor

DOI: http://doi.org/10.11591/ijeecs.v12.i1.pp%25p

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
