Software Resources


  1. SAFAR (Software Architecture For Arabic language pRocessing): a platform dedicated to Arabic Natural Language Processing (ANLP)
  2. MCA LID: Moroccan Colloquial Arabic (MCA) processing software, including a Language Identification (LID) system that can distinguish between MCA and Modern Standard Arabic (MSA)

MSA Resources

  1. AWN v2: an extension of the original Arabic WordNet, enriched with new verbs and nouns, including broken plurals (a plural form specific to Arabic words)
  2. NAFIS Gold Standard Corpus: Normalized Arabic Fragments for Inestimable Stemming (NAFIS) is a gold-standard corpus for Arabic stemming, composed of a collection of texts selected to be representative of Arabic stemming tasks and manually annotated
  3. Arabic characters lexicon: an LMF (Lexical Markup Framework) file containing all Arabic characters (letters, vowels and punctuation marks). Each character has a description, its different display forms (isolated, at the beginning, in the middle and at the end of a word), a codification (Unicode; others could be added later) and two transliterations (Buckwalter and wiki)
  4. “Al Wassit” Arabic dictionary: an LMF file containing the electronic version of the Al Wassit dictionary, with 6900 roots, 61101 lexical entries (18199 verbs, 42731 nouns and 171 particles), 8821 examples (5231 for verbs and 3590 for nouns) and 119140 meanings
  5. Contemporary Arabic dictionary: an LMF file containing the electronic version of the Al Logha Al Arabia Al Moassira (Contemporary Arabic) dictionary, which is composed of 5778 roots, 32300 lexical entries (10475 verbs, 21457 nouns and 368 particles), 29118 entry examples and 43384 additional examples, 63019 meanings and 17883 contextual expressions
  6. CLEF-TREC Q/A Questions: a list of 2264 CLEF and TREC questions with their answers, translated into Arabic

MCA Resources

  1. MDED: an MCA-MSA lexicon containing almost 15k entries
  2. MCA corpus: an MCA corpus of 34k sentences collected from various sources


