Applying Vector Space Model (VSM) Techniques in Information Retrieval for Arabic Language
Information Retrieval (IR) is a part of Natural Language Processing (NLP): the science of retrieving useful (relevant) information while leaving irrelevant information behind as much as possible. Building an Information Retrieval system for any language is imperative, and many research efforts try to build IR systems using whichever of its models are valid for a specific language. This report presents an implementation of one IR technique, the Vector Space Model (VSM). We chose the VSM for our project because it is a term-weighting scheme, so the retrieved documents can be sorted by their degree of relevance. Another significant feature of this technique is the ability to obtain relevance feedback from the users of the system; users can judge whether a retrieved document is relevant to their need or not. We have built a web site, mainly using PHP and HTML, that covers all techniques of the vector space model and is valid for the Arabic language.
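To make the idea of term weighting and relevance ranking concrete, the following is a minimal sketch in Python, not the PHP implementation described in the report. It assumes TF-IDF weights and cosine similarity, which are common choices for the VSM but are not confirmed by the abstract. The function names (`tokenize`, `build_index`, `cosine`, `rank`) and the three toy Arabic documents are hypothetical, and tokenization is a plain whitespace split, whereas a real Arabic system would likely need normalization and stemming.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace tokenization only. No Arabic normalization,
    # stop-word removal, or stemming is attempted in this sketch.
    return text.split()


def build_index(docs):
    # Returns (idf, vectors): an IDF table and one sparse TF-IDF vector per document.
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return idf, vectors


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    # Returns (doc_index, score) pairs sorted by descending score,
    # omitting documents that share no weighted term with the query.
    idf, vectors = build_index(docs)
    query_vec = {
        term: tf * idf.get(term, 0.0)
        for term, tf in Counter(tokenize(query)).items()
    }
    scored = [(i, cosine(query_vec, vec)) for i, vec in enumerate(vectors)]
    matches = [(i, score) for i, score in scored if score > 0.0]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy collection, for illustration only.
DOCS = [
    "الكتاب في المكتبة",
    "الطالب يقرأ الكتاب",
    "السماء صافية اليوم",
]
```

Calling `rank("الكتاب المكتبة", DOCS)` ranks the first document above the second, because it matches both query terms and the rarer term carries a higher IDF weight; the third document shares no term with the query and is left out. Relevance feedback, as mentioned in the abstract, would then adjust the query vector using the documents a user marks as relevant, which this sketch does not cover.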