A Hybrid Approach for Urdu Sentence Boundary Disambiguation

Zobia Rehman, Waqas Anwar
Department of Computer Science, COMSATS Institute of IT, Pakistan

 
Abstract: Sentence boundary identification is a preliminary step in preparing a text document for Natural Language Processing tasks such as machine translation, POS tagging, and text summarization. We present a hybrid approach to Urdu sentence boundary disambiguation that combines a unigram statistical model with a rule-based algorithm. With disjoint training and test data, this approach achieved 99.48% precision, 86.35% recall, and 92.45% F1-measure; with the same data used for both training and testing, it achieved 99.36% precision, 96.45% recall, and 97.89% F1-measure.
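
As a quick consistency check on the reported figures, the F1-measure can be recomputed from precision and recall as their harmonic mean. The sketch below is illustrative only: the function name f1_score is a hypothetical helper, not part of the authors' system, and the inputs are the rounded percentages quoted in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (result is in the same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Percentages as quoted in the abstract (already rounded to two decimals).
disjoint_f1 = f1_score(99.48, 86.35)   # training and test data differ
same_data_f1 = f1_score(99.36, 96.45)  # same data for training and testing

print(f"disjoint data: {disjoint_f1:.2f}")
print(f"same data:     {same_data_f1:.2f}")
```

Both recomputed values agree with the reported F1-measures to within 0.01, which is the size of difference one would expect when precision and recall have themselves been rounded.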

Keywords: Sentence boundary disambiguation, unigram model.


Received October 19, 2009; accepted May 20, 2010
