Weak Supervision Approach for Arabic Named Entity Recognition


Olga Simek and Courtland VanDam, MIT Lincoln Laboratory, USA


Arabic named entity recognition (NER) is a challenging problem, especially in conversational data such as social media posts. To address this problem, we propose an Arabic weak-learner NER model called ANER-HMM, which leverages low-quality predictions that each provide partial recognition of entities. By combining these predictions, we achieve state-of-the-art NER accuracy for out-of-domain predictions. ANER-HMM uses a hidden Markov model to combine multiple predictions from weak learners and gazetteers. We demonstrate that ANER-HMM outperforms state-of-the-art Arabic NER methods without requiring any labeled data or the training of deep learning models, which often requires large computing resources.
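
To illustrate the general idea of combining weak predictions with a hidden Markov model, the following is a minimal sketch and not the ANER-HMM implementation. It assumes a toy label set, hand-set per-source accuracies, and a single "stay" transition probability; all names (`LABELS`, `emission_logp`, `viterbi_combine`, the source names) are hypothetical. In a setting without labeled data these parameters would typically be estimated from the weak predictions themselves rather than fixed by hand.

```python
import math

# Toy label set (an assumption; a real system would likely use a richer scheme).
LABELS = ["O", "PER", "LOC"]


def emission_logp(state, votes, accuracies):
    """Log-probability of the observed weak votes given a hidden true label.

    Each source is assumed to be correct with probability `acc` (strictly
    between 0 and 1) and otherwise to pick a wrong label uniformly.
    A vote of None means the source abstains and is ignored.
    """
    logp = 0.0
    for source, vote in votes.items():
        if vote is None:
            continue
        acc = accuracies[source]
        p = acc if vote == state else (1.0 - acc) / (len(LABELS) - 1)
        logp += math.log(p)
    return logp


def viterbi_combine(token_votes, accuracies, stay_prob=0.6):
    """Return the most likely label sequence under the toy HMM (Viterbi)."""
    if not token_votes:
        return []
    switch = (1.0 - stay_prob) / (len(LABELS) - 1)
    log_trans = {
        (a, b): math.log(stay_prob if a == b else switch)
        for a in LABELS
        for b in LABELS
    }
    # Uniform prior over the first hidden label.
    scores = {
        s: math.log(1.0 / len(LABELS)) + emission_logp(s, token_votes[0], accuracies)
        for s in LABELS
    }
    backpointers = []
    for votes in token_votes[1:]:
        new_scores, pointers = {}, {}
        for s in LABELS:
            best_prev = max(LABELS, key=lambda p: scores[p] + log_trans[(p, s)])
            new_scores[s] = (
                scores[best_prev]
                + log_trans[(best_prev, s)]
                + emission_logp(s, votes, accuracies)
            )
            pointers[s] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace the best path backwards from the best final label.
    state = max(LABELS, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical votes for the three tokens of "زار محمد القاهرة".
example_votes = [
    {"gazetteer": None, "model_a": "O", "model_b": "O"},
    {"gazetteer": "PER", "model_a": "PER", "model_b": "O"},
    {"gazetteer": "LOC", "model_a": None, "model_b": "LOC"},
]
example_accuracies = {"gazetteer": 0.9, "model_a": 0.7, "model_b": 0.6}
combined = viterbi_combine(example_votes, example_accuracies)
```

In this sketch each source contributes partial evidence: the gazetteer abstains on the first token, and one model disagrees on the second, yet the decoded sequence still reflects the weighted agreement across sources.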


Named entity recognition, Arabic, weak learning.

Volume 13, Number 16