Resources


Arabic WordNet :
Arabic WordNet Description: This improved version extends the original Arabic WordNet (http://globalwordnet.org/arabic-wordnet/awn-browser/). It was enriched with new verbs and nouns, including broken plurals, a plural form specific to Arabic.

Citation: L. Abouenour, K. Bouzoubaa and P. Rosso, "On the evaluation and improvement of Arabic WordNet coverage and usability," Language Resources and Evaluation, vol. 47, no. 13, pp. 891-917, 2013.
Arabic characters lexicon :
LMF version Description: An LMF-conformant XML-based file containing all Arabic characters (letters, vowels and punctuation marks). Each character is described by a description, its display forms (isolated, initial, medial and final), an encoding (Unicode; others could be added later), and two transliterations (Buckwalter and wiki). The lexicon is composed of 42 characters: the 28 basic letters, 5 hamza forms, 9 special letters, 9 vowels and 3 punctuation marks.

ISLRN: 250-846-271-090-1

Citation:

- T. Loukili, K. Bouzoubaa, "Structuration et Standardisation des ressources linguistiques de l'Arabe - cas de l'alphabet, préfixes et suffixes", Journées Doctorales en Technologies de l'Information et Communication, Tangier, Morocco, 7/2011

- Driss Namly, Yasser Regragui and Karim Bouzoubaa. "Interoperable Arabic language resources building and exploitation in SAFAR platform". International Conference on Computer Systems and Applications, AICCSA 2016.
XML version Description: An XML-based file containing all Arabic characters (letters, vowels and punctuation marks). Each character is described by a description, its display forms (isolated, initial, medial and final), an encoding (Unicode; others could be added later), and two transliterations (Buckwalter and wiki). The lexicon is composed of 42 characters: the 28 basic letters, 5 hamza forms, 9 special letters, 9 vowels and 3 punctuation marks.

ISLRN: 306-352-322-908-4

Citation:

- T. Loukili, K. Bouzoubaa, "Structuration et Standardisation des ressources linguistiques de l'Arabe - cas de l'alphabet, préfixes et suffixes", Journées Doctorales en Technologies de l'Information et Communication, Tangier, Morocco, 7/2011

- Driss Namly, Yasser Regragui and Karim Bouzoubaa. "Interoperable Arabic language resources building and exploitation in SAFAR platform". International Conference on Computer Systems and Applications, AICCSA 2016.
Arabic clitics :
Enclitics Description: An XML-based file containing all Arabic enclitics. It consists of 14 atomic enclitics which, combined according to their association rules, generate about 73 enclitics.

ISLRN: 356-004-001-278-7

Citation:

- Driss Namly, Yasser Regragui and Karim Bouzoubaa. "Interoperable Arabic language resources building and exploitation in SAFAR platform". International Conference on Computer Systems and Applications, AICCSA 2016.
Proclitics Description: An XML-based file containing all Arabic proclitics. It consists of 12 atomic proclitics which, combined according to their association rules, generate about 94 proclitics.

ISLRN: 382-029-397-588-7

Citation:

- Driss Namly, Yasser Regragui and Karim Bouzoubaa. "Interoperable Arabic language resources building and exploitation in SAFAR platform". International Conference on Computer Systems and Applications, AICCSA 2016.
Arabic Stop-words lexicon :
Special nouns Description: An XML-based file containing Arabic stop words that follow noun syntax: particle nouns, demonstrative nouns, detached pronouns and relative nouns. This lexicon contains 180 special nouns.

ISLRN: 324-965-777-406-1

Citation:

- Driss Namly et al. "Development of Arabic particles lexicon using the LMF framework". Colloque pour les Etudiants Chercheurs en Traitement Automatique du Langage Naturel et ses applications (CEC-TAL 2015), Sousse, Tunisia, 23-25 March 2015.

- Driss Namly et al. "A Complex Arabic stop-words list design". The Second National Doctoral Symposium On Arabic Language Engineering (JDILA'2015), ENSA of Fez, USMBA, 28-29 October 2015.
Special verbs Description: An XML-based file containing Arabic stop words that follow verb syntax. This lexicon contains 66 special verbs.

ISLRN: 920-357-109-622-2

Citation:

- Driss Namly et al. "Development of Arabic particles lexicon using the LMF framework". Colloque pour les Etudiants Chercheurs en Traitement Automatique du Langage Naturel et ses applications (CEC-TAL 2015), Sousse, Tunisia, 23-25 March 2015.

- Driss Namly et al. "A Complex Arabic stop-words list design". The Second National Doctoral Symposium On Arabic Language Engineering (JDILA'2015), ENSA of Fez, USMBA, 28-29 October 2015.
Particles Description: An XML-based file containing Arabic particles. This lexicon contains 69 native particles.

ISLRN: 171-983-898-544-7

Citation:

- Driss Namly et al. "Development of Arabic particles lexicon using the LMF framework". Colloque pour les Etudiants Chercheurs en Traitement Automatique du Langage Naturel et ses applications (CEC-TAL 2015), Sousse, Tunisia, 23-25 March 2015.

- Driss Namly et al. "A Complex Arabic stop-words list design". The Second National Doctoral Symposium On Arabic Language Engineering (JDILA'2015), ENSA of Fez, USMBA, 28-29 October 2015.
"Al wassit" Arabic dictionary :
LMF version
(Waiting for validation)
Description: An LMF-conformant XML-based file containing the electronic version of the Al Wassit dictionary, a monolingual Arabic dictionary compiled by the Academy of the Arabic Language in Cairo. The Al Wassit dictionary consists of 6900 roots, 61101 lexical entries (18199 verbs, 42731 nouns and 171 particles), 8821 examples (5231 for verbs and 3590 for nouns) and 119140 meanings.

ISLRN: 795-847-093-546-5

Citation:

- Driss Namly, Yasser Regragui and Karim Bouzoubaa. "Interoperable Arabic language resources building and exploitation in SAFAR platform". International Conference on Computer Systems and Applications, AICCSA 2016.
XML version
(Waiting for validation)
Description: An XML-based file containing the electronic version of the Al Wassit dictionary, a monolingual Arabic dictionary compiled by the Academy of the Arabic Language in Cairo. The Al Wassit dictionary consists of 6900 roots, 61101 lexical entries (18199 verbs, 42731 nouns and 171 particles), 8821 examples (5231 for verbs and 3590 for nouns) and 119140 meanings.

ISLRN: 283-443-022-502-4

Citation:

- Driss Namly, Yasser Regragui and Karim Bouzoubaa. "Interoperable Arabic language resources building and exploitation in SAFAR platform". International Conference on Computer Systems and Applications, AICCSA 2016.
Contemporary Arabic dictionary :
LMF version
(Waiting for validation)
Description: An LMF-conformant XML-based file containing the electronic version of the al logha al arabia al moassira (Contemporary Arabic) dictionary, a monolingual Arabic dictionary compiled by Ahmed Mukhtar Abdul Hamid Omar (d. 1424 AH) with the help of a working group. The dictionary comprises 5778 roots, 32300 lexical entries (10475 verbs, 21457 nouns and 368 particles), 29118 entry examples and 43384 additional examples, 63019 meanings and 17883 contextual expressions.

ISLRN: 264-069-820-478-0

Citation:

- Driss Namly, Karim Bouzoubaa. "LMF conversion of an editorial dictionary: the case of the Contemporary Arabic dictionary". Journée d’étude Ressources langagières de l’arabe pour le TAL : construction, standardisation, gestion et exploitation, Institut d’Etudes et de Recherches pour l’Arabisation, Rabat, 26 November 2015.

- Driss Namly, Yasser Regragui and Karim Bouzoubaa. "Interoperable Arabic language resources building and exploitation in SAFAR platform". International Conference on Computer Systems and Applications, AICCSA 2016.
XML version
(Waiting for validation)
Description: An XML-based file containing the electronic version of the al logha al arabia al moassira (Contemporary Arabic) dictionary, a monolingual Arabic dictionary compiled by Ahmed Mukhtar Abdul Hamid Omar (d. 1424 AH) with the help of a working group. The dictionary comprises 5778 roots, 32300 lexical entries (10475 verbs, 21457 nouns and 368 particles), 29118 entry examples and 43384 additional examples, 63019 meanings and 17883 contextual expressions.

ISLRN: 065-323-843-026-9

Citation:

- Driss Namly, Karim Bouzoubaa. "LMF conversion of an editorial dictionary: the case of the Contemporary Arabic dictionary". Journée d’étude Ressources langagières de l’arabe pour le TAL : construction, standardisation, gestion et exploitation, Institut d’Etudes et de Recherches pour l’Arabisation, Rabat, 26 November 2015.

- Driss Namly, Yasser Regragui and Karim Bouzoubaa. "Interoperable Arabic language resources building and exploitation in SAFAR platform". International Conference on Computer Systems and Applications, AICCSA 2016.
CLEF-TREC Q/A Questions :
Excel version Description: A list of 2264 CLEF and TREC questions with their answers, translated into Arabic.

Citation: Abouenour L., Bouzoubaa K., Rosso P. "On the Evaluation and Improvement of Arabic WordNet Coverage and Usability", Language Resources and Evaluation, Springer Netherlands, doi:10.1007/s10579-013-9237-0, 6/2013.
Morphological evaluation corpus :
Evaluation corpus Description: An annotated corpus dedicated to benchmarking and evaluating Arabic morphological analyzers. It consists of 100 words with all their possible analyses, annotated with morphological information such as stem, pattern, root and lemma.
Stemming evaluation corpora :
NAFIS Gold Standard Corpus Description: Normalized Arabic Fragments for Inestimable Stemming (NAFIS) is a gold standard corpus for Arabic stemming. It is composed of a collection of texts, selected to be representative of Arabic stemming tasks and annotated manually.

ISLRN: 305-450-745-774-1

Citation:

- Driss Namly, Rachida Tajmout, Karim Bouzoubaa, Lahsen Abouenour. "NAFIS: A Gold Standard Corpus for Arabic Stemmers Evaluation". International Business Information Management Association (IBIMA), November 2016, Seville, Spain.
Quranic stemming evaluation corpus Description: This is a reduced version of the Quranic corpus developed by Kais Dukes et al. (http://corpus.quran.com/). It contains 18352 words with their stems, roots and lemmas. We created this reduced version to serve as a stemming evaluation corpus in our article:

Citation:

- Jaafar, Y., Namly, D., Bouzoubaa, K., & Yousfi, A. (2017). "Enhancing Arabic stemming process using resources and benchmarking tools". Journal of King Saud University-Computer and Information Sciences, 29(2), 164-170.
Moroccan Arabic :
LID Corpus Description: This resource is a corpus of 34k Moroccan Colloquial Arabic sentences collected from different sources. The sentences are written in Arabic script. It can be useful in NLP applications such as language identification.

ISLRN: 048-993-307-382-7

Citation: R. Tachicart, K. Bouzoubaa, Si Lhoucine Aouragh and Hamid Jaafar "Automatic Identification of Moroccan Colloquial Arabic", in the 6th International Conference on Arabic Language Processing ICALP'17, October 2017, Fez, Morocco.

MDED Lexicon Description: The Moroccan Dialect Electronic Dictionary (MDED) is an electronic lexicon containing almost 15000 MSA entries, written in Arabic script and translated into the Moroccan Arabic dialect. In addition, MDED entries are annotated with useful metadata such as POS, origin and root. MDED can be useful in advanced NLP applications such as machine translation and morphological analysis.

ISLRN: 977-057-254-691-5

Citation: R. Tachicart, K. Bouzoubaa, "Building a Moroccan dialect electronic Dictionary (MDED)", in the 5th International Conference on Arabic Language Processing CITALA, Oujda, Morocco, 11/2014.

MORALEX Lexicon Description: MORALEX is a lexicon of 402 Moroccan Arabic affixes and clitics that were created manually and checked linguistically. It is composed of 24 atomic affixes, 43 atomic clitics and 335 compound morphemes. The main advantage of this resource is its rich morphological information, such as POS, form and person. It can be used in different contexts, particularly in morphological tasks.

Citation: R. Tachicart, K. Bouzoubaa, "Towards Automatic Normalization of the Moroccan Dialectal Arabic User Generated Text" in the 7th International Conference on Arabic Language Processing ICALP'19, October 2019, Nancy, France

Copyright © 2012 IBTIKARAT research group | Mohammadia School of Engineers | Mohamed V University Agdal | Rabat-Morocco