Document Summarization in Kannada Using Keyword Extraction

Authors

Jayashree. R, Srikanta Murthy. K and Sunny. K, PES Institute of Technology, India

Abstract

The internet has caused enormous growth in the amount of data available to the general public. Summaries of documents can help readers find the right information and are particularly effective when the document base is very large. Keywords are closely associated with a document: they reflect its content and act as indexes for it. In this work, we present a method to produce extractive summaries of documents in the Kannada language. The algorithm extracts keywords from pre-categorized Kannada documents collected from online resources. We combine GSS (Galavotti, Sebastiani, Simi) coefficients and IDF (Inverse Document Frequency) with TF (Term Frequency) to extract keywords, which are then used for summarization. In the current implementation, a document from a given category is selected from our database, and a summary is generated whose length is the number of sentences specified by the user.
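As an illustration only, the pipeline described in the abstract can be sketched in Python. The GSS coefficient follows its standard definition, GSS(t, c) = P(t, c)·P(¬t, ¬c) − P(t, ¬c)·P(¬t, c). Everything else is an assumption made for this sketch and is not taken from the paper: the function names, whitespace tokenization, the multiplicative combination TF × IDF × GSS, and ranking sentences by the sum of their keyword scores.

```python
import math
from collections import Counter


def gss(term, category, docs):
    """Standard GSS coefficient of `term` for `category`.

    `docs` is a list of (category, set_of_terms) pairs (a hypothetical
    representation of the pre-categorized corpus).
    """
    n = len(docs)
    t_c = sum(1 for c, terms in docs if c == category and term in terms) / n
    t_nc = sum(1 for c, terms in docs if c != category and term in terms) / n
    nt_c = sum(1 for c, terms in docs if c == category and term not in terms) / n
    nt_nc = sum(1 for c, terms in docs if c != category and term not in terms) / n
    return t_c * nt_nc - t_nc * nt_c


def idf(term, docs):
    """Inverse document frequency; 0.0 for terms unseen in the corpus."""
    df = sum(1 for _, terms in docs if term in terms)
    return math.log(len(docs) / df) if df else 0.0


def keyword_scores(tokens, category, docs):
    """Score each token as TF * IDF * GSS (an assumed combination)."""
    counts = Counter(tokens)
    total = len(tokens)
    return {
        t: (count / total) * idf(t, docs) * gss(t, category, docs)
        for t, count in counts.items()
    }


def summarize(sentences, category, docs, n_sentences):
    """Return the n highest-scoring sentences in their original order."""
    # Whitespace tokenization is a simplification for this sketch.
    tokens = [w for s in sentences for w in s.split()]
    scores = keyword_scores(tokens, category, docs)
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: sum(scores.get(w, 0.0) for w in sentences[i].split()),
        reverse=True,
    )
    chosen = sorted(ranked[:n_sentences])  # restore document order
    return [sentences[i] for i in chosen]
```

Under these assumptions, a term that occurs only in documents of the target category receives a positive GSS value, a term spread evenly across categories receives a value near zero, and the user-supplied sentence count maps directly to `n_sentences`.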

Keywords

Summary, Keywords, GSS coefficient, Term Frequency (TF), Inverse Document Frequency (IDF), Recall

Volume 1, Number 3