Author Identification using Traditional Machine Learning Models

Authors

Ojaswi Binnani, International Institute of Information Technology-Hyderabad, India

Abstract

The Internet puts a wealth of useful information at our fingertips. However, this resource can also be misused for cybercrime, fake emails, content theft, plagiarism, and similar offences. In many such cases the text is written anonymously, and accurately identifying its author is important for bringing the criminal to justice. Author identification addresses this task: given a set of suspect authors, it determines which of them wrote a given text. We aim to create a computationally inexpensive model that identifies the author of a given text and requires less data than deep learning methods. This paper explores various stylometric and word-based features, together with different machine learning models, to build the classifier that gives the best accuracy. We find that the XGBoost algorithm performs this task with good accuracy.

Keywords

Author Identification, Forensic Linguistics, Machine Learning.

Volume 12, Number 14