A Rule-Based Approach for Tagging Non-Vocalized Arabic Words

Ahmad Al-Taani and Salah Abu Al-Rub
Department of Computer Sciences, Yarmouk University, Jordan


Abstract: In this work, we present a tagging system that assigns part-of-speech tags to the words of a non-vocalized Arabic text. The proposed system works in three levels of analysis. The first level is a lexical analyzer built on a lexicon that contains all fixed words and particles, such as prepositions and pronouns. The second level is a morphological analyzer that relies on word structure, using patterns and affixes to determine the word class. The third level is a syntax analyzer, or grammatical tagger, which assigns grammatical tags to words based on their context or position in the sentence. The syntax analyzer consists of two stages: the first depends on specific keywords that determine the tag of the following word; the second is a reversed parsing technique that scans the available grammars of the Arabic language to determine the class of a single ambiguous word in the sentence. We tested the proposed system on a corpus of 2355 words. Experimental results show that the system achieved a success rate approaching 94% of the total number of words in the sample used in the study.

Keywords: Part-of-speech tagging, lexical analyzer, morphological analyzer, Arabic language processing.

Received July 3, 2008; accepted September 3, 2008
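
The three levels described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the tag names, the tiny lexicon, the affix rules, the keyword table, and the sentence patterns standing in for "grammars" are all hypothetical, and the real system is certainly far richer.

```python
from typing import List, Optional

# Level 1 (hypothetical sample): lexicon of fixed words and particles.
LEXICON = {
    "في": "PARTICLE", "من": "PARTICLE", "إلى": "PARTICLE", "على": "PARTICLE",
    "لم": "PARTICLE", "لن": "PARTICLE",
    "هو": "PRONOUN", "هي": "PRONOUN",
}

# Level 3, stage 1 (hypothetical sample): keywords that determine the tag
# of the following word. Prepositions precede nouns; negation particles
# such as "لم" precede verbs.
NEXT_TAG = {
    "في": "NOUN", "من": "NOUN", "إلى": "NOUN", "على": "NOUN",
    "لم": "VERB", "لن": "VERB",
}

# Level 3, stage 2 (hypothetical sample): sentence-level tag patterns that
# stand in for the "available grammars" mentioned in the abstract.
GRAMMARS = [
    ("VERB", "NOUN", "PARTICLE", "NOUN"),
    ("NOUN", "NOUN"),
    ("PARTICLE", "VERB", "NOUN"),
]


def lexical_tag(word: str) -> Optional[str]:
    """Level 1: look the word up in the fixed-word lexicon."""
    return LEXICON.get(word)


def morphological_tag(word: str) -> Optional[str]:
    """Level 2: guess the class from affixes (illustrative rules only)."""
    if word.startswith("ال") and len(word) > 2:   # definite article
        return "NOUN"
    if word.endswith("ة"):                        # ta marbuta suffix
        return "NOUN"
    if word.startswith("سي") and len(word) > 3:   # future prefix + imperfect
        return "VERB"
    return None


def tag_sentence(words: List[str]) -> List[Optional[str]]:
    # Levels 1 and 2: lexicon first, then affix rules.
    tags = [lexical_tag(w) or morphological_tag(w) for w in words]

    # Level 3, stage 1: a keyword fixes the tag of the word that follows it.
    for i in range(1, len(words)):
        if tags[i] is None:
            tags[i] = NEXT_TAG.get(words[i - 1])

    # Level 3, stage 2: if exactly one word is still untagged, scan the
    # patterns that agree with every known tag and read off the missing one.
    unknown = [i for i, t in enumerate(tags) if t is None]
    if len(unknown) == 1:
        i = unknown[0]
        candidates = {
            g[i]
            for g in GRAMMARS
            if len(g) == len(tags)
            and all(t is None or t == gt for t, gt in zip(tags, g))
        }
        if len(candidates) == 1:  # only accept an unambiguous answer
            tags[i] = candidates.pop()
    return tags
```

In this sketch, calling `tag_sentence("ذهب الولد إلى المدرسة".split())` leaves the first word untagged after the lexicon, affix, and keyword passes, and the pattern scan is what resolves it. Words that no level can resolve keep the tag `None` rather than receiving a guess.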
