Improvement WSD Dictionary Using Annotated Corpus and Testing it with Simplified Lesk Algorithm

Authors

Ahmed H. Aliwy and Ayad R. Abbas, University of Technology, Iraq

Abstract

Word sense disambiguation (WSD) is a task with a long history in computational linguistics, and it remains an open problem in NLP. This research focuses on increasing the accuracy of the Lesk algorithm with the assistance of an annotated corpus, the Narodowy Korpus Jezyka Polskiego (NKJP, "Polish National Corpus"). The NKJP_WSI (NKJP Word Sense Inventory) is used as the sense inventory. The Lesk algorithm is first run on the whole corpus (training and test) to obtain baseline results. This is done with the assistance of a special dictionary that contains all possible senses for each ambiguous word. In this implementation, the similarity measure from information retrieval, based on tf-idf, is applied with a small modification to suit the task. Experimental results show accuracies of 82.016% and 84.063% without and with stop-word removal, respectively. Moreover, this paper addresses the practical challenge of execution time: we propose a special structure for building a second dictionary from the corpus in order to reduce the time complexity of the training process. The new dictionary contains only those words that help in solving WSD, together with their tf-idf values, and is derived from the existing dictionary with the assistance of the annotated corpus. Experimental results show that the two tests give identical accuracy, while the execution time of the second test is 20 times lower than that of the first.
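As an illustration of the approach summarized above, the following is a minimal sketch of a simplified Lesk disambiguator that scores each sense by tf-idf-weighted overlap with the context. It is not the authors' implementation: the abstract does not give the modified similarity equation, so a common smoothed tf-idf is assumed here, and the toy English sense inventory, the stop-word list, and the names `build_tfidf_index` and `disambiguate` are all hypothetical. Precomputing the weights once, rather than recomputing them for every query, loosely mirrors the idea of the second dictionary.

```python
import math
from collections import Counter

# Hypothetical toy sense inventory: word -> {sense_id: gloss tokens}.
# The paper uses NKJP_WSI; this English example is for illustration only.
SENSES = {
    "bank": {
        "bank_finance": "institution that accepts money deposits and gives loans".split(),
        "bank_river": "sloping land beside a river or lake".split(),
    }
}
# Hypothetical stop-word list.
STOP_WORDS = frozenset({"a", "an", "the", "and", "or", "that", "of", "on", "at", "in"})


def build_tfidf_index(senses, stop_words=frozenset()):
    """Precompute tf-idf weights once: word -> sense_id -> {token: weight}.

    Each sense gloss is treated as one document. The smoothed idf used here
    is an assumption, not the modified equation from the paper.
    """
    docs = {
        (word, sense): [t for t in tokens if t not in stop_words]
        for word, by_sense in senses.items()
        for sense, tokens in by_sense.items()
    }
    n_docs = len(docs)
    doc_freq = Counter(t for tokens in docs.values() for t in set(tokens))
    index = {}
    for (word, sense), tokens in docs.items():
        counts = Counter(tokens)
        index.setdefault(word, {})[sense] = {
            t: (c / len(tokens)) * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
            for t, c in counts.items()
        }
    return index


def disambiguate(word, context_tokens, index, stop_words=frozenset()):
    """Return the sense whose gloss has the highest weighted overlap with the context."""
    context = set(context_tokens) - stop_words
    scores = {
        sense: sum(weights.get(t, 0.0) for t in context)
        for sense, weights in index[word].items()
    }
    return max(scores, key=scores.get)


index = build_tfidf_index(SENSES, STOP_WORDS)
river_sense = disambiguate("bank", "she sat on the bank of the river".split(), index, STOP_WORDS)
money_sense = disambiguate("bank", "he deposits money at the bank".split(), index, STOP_WORDS)
```

Passing an empty stop-word set to both functions corresponds to the "without deleting stop words" setting; in this sketch ties between senses are broken by dictionary order.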

Keywords

Corpus-based WSD, Lesk algorithm, dictionary- and corpus-based WSD.

Full Text  Volume 5, Number 4