Article Details

Journal: International Journal of Computer Applications
Title: A Novel Weighted Classification Approach using Linguistic Text Mining
Index Term: Information Sciences
Abstract: Text categorization is the process of automatically assigning labels or categories to new or previously unseen text documents, which may be unstructured or semi-structured. In this work, we apply natural language processing concepts to text categorization, taking a lexical approach. We have developed an algorithm that automatically classifies articles into categories by identifying tokens in the abstracts of journal articles and assigning them weights. We implemented the approach using the K Nearest Neighbor (KNN) classifier, as it is the most widely used classifier in research. The proposed algorithm, Lexical KNN (LKNN), was evaluated on two datasets: a set of journal articles from the computer science discipline and a collection of medical documents (the Ohsumed collection). The experimental results show that LKNN performs better than the other existing classifiers.
Keywords: Text categorization, K Nearest Neighbor (KNN), Lexical Analysis, Tokens
No. of Pages: 7
Author Names: Rajni Jindal, Shweta Taneja
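
The abstract describes identifying tokens in article abstracts, assigning them weights, and classifying with KNN. The following is a minimal sketch of that general idea, not the authors' LKNN algorithm: this record does not give the paper's weighting scheme, so relative term frequency, cosine similarity, and similarity-weighted voting are all assumptions made here for illustration. The function names and the toy `TRAINING` data are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def weigh(tokens):
    """Assumed weighting: relative term frequency of each token."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tok: n / total for tok, n in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse token-weight vectors."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(query, training, k=3):
    """Label `query` by a similarity-weighted vote of its k nearest neighbours."""
    if not training:
        raise ValueError("training set must not be empty")
    q = weigh(tokenize(query))
    scored = sorted(
        ((cosine(q, weigh(tokenize(text))), label) for text, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter()
    for similarity, label in scored[:k]:
        votes[label] += similarity
    return votes.most_common(1)[0][0]


# Hypothetical toy data standing in for the two kinds of corpora mentioned above.
TRAINING = [
    ("support vector machine classifier training kernel", "computer science"),
    ("nearest neighbor algorithm text classification", "computer science"),
    ("neural networks for document classification", "computer science"),
    ("patient blood pressure clinical trial", "medicine"),
    ("tumor therapy patient outcome study", "medicine"),
    ("clinical diagnosis of heart disease in patients", "medicine"),
]

if __name__ == "__main__":
    print(knn_classify("a clinical study of patient therapy", TRAINING))
    print(knn_classify("kernel classifier for text classification", TRAINING))
```

A real system would likely replace the raw term-frequency weights with a scheme that discounts common words, and would choose `k` by validation on held-out documents.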

Publishing Information

Start Page No.: 9
Editor's Choice