Title: A FRAMEWORK FOR TOPIC CATEGORIZATION OF XML DOCUMENTS USING SUPPORT VECTOR MACHINES
DOI No: 10.1142/9781860948534_0057
Source: INNOVATIVE APPLICATIONS OF INFORMATION TECHNOLOGY FOR THE DEVELOPING WORLD (pp 367-371)
Author(s): K. G. SRINIVASA
Department of CSE, Bangalore University, Bangalore, India

S. SHARATH
Department of CSE, Bangalore University, Bangalore, India

K. R. VENUGOPAL
Department of CSE, Bangalore University, Bangalore, India

L. M. PATNAIK
Microprocessor Applications Laboratory, IISc, Bangalore, India

Abstract: Extensible Markup Language (XML) has emerged as a medium for interoperability over the Internet. As the number of documents published as XML increases, there is a need to categorize them into specific user interest categories. However, manual categorization is not feasible given the sheer number of XML documents available on the Internet. In this paper, we present a machine learning approach to topic categorization that uses a multiclass Support Vector Machine (SVM) to exploit the semantic content of XML documents. The SVM is supplemented by a feature selection technique that extracts the useful features. Experimental evaluations over a wide range of XML documents indicate that the proposed approach significantly improves the accuracy and efficiency of topic categorization.

Copyright © 2012 World Scientific Publishing Co. All rights reserved.