CS583 Info Retrieval
Keyword queries
Boolean queries (using AND, OR, NOT)
Phrase queries
Proximity queries
Full document queries
Natural language questions
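As an illustration of one of the query types above, Boolean queries can be evaluated as set operations over an inverted index. This is a minimal sketch, not the course's implementation: the toy documents and the helper `build_index` are hypothetical, and tokenization is plain whitespace splitting.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to the set of doc ids that contain it (hypothetical helper)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

# Toy collection, invented for illustration only.
docs = {
    1: "web search engine",
    2: "web data mining",
    3: "data mining algorithms",
}
index = build_index(docs)

and_result = index["web"] & index["mining"]   # web AND mining
or_result = index["web"] | index["mining"]    # web OR mining
not_result = index["data"] - index["web"]     # data AND NOT web
```

AND maps to set intersection, OR to union, and NOT (combined with another term) to set difference.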
frequency.
N: total number of docs
df_i: the number of docs in which term t_i appears.
The final TF-IDF term weight for term t_i in document d_j is: w_ij = tf_ij × idf_i, where idf_i = log(N / df_i).
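A minimal sketch of TF-IDF weighting, under stated assumptions: tf is taken as the raw count of a term divided by the count of the most frequent term in the same document, idf is log(N / df_i) with the natural logarithm, and the function name `tf_idf` and the toy documents are invented for illustration. Other normalizations and log bases are in common use.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per doc."""
    n_docs = len(docs)  # N: total number of docs

    # df[t]: number of docs in which term t appears
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        max_count = max(counts.values()) if counts else 1
        weights.append({
            # assumed normalized tf times idf = log(N / df)
            t: (c / max_count) * math.log(n_docs / df[t])
            for t, c in counts.items()
        })
    return weights

# Toy collection, invented for illustration only.
docs = [
    ["web", "search", "engine"],
    ["web", "mining", "web"],
    ["data", "mining"],
]
w = tf_idf(docs)
```

Under this formulation, a term that appears in every document gets idf = log(1) = 0, so it contributes no weight.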
may be constructed
Why do we need to remove stopwords?
Reduce indexing (or data) file size
Stopwords account for 20-30% of total word counts.
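Stopword removal can be sketched as a simple filter over tokens. The stopword list below is a tiny hypothetical sample, not a standard list, and the one-sentence input is far too small to reflect the proportion seen in real collections.

```python
# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "of", "a", "is", "in", "and", "to"}

def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Keep only the tokens that are not stopwords (case-insensitive)."""
    return [t for t in tokens if t.lower() not in stopwords]

tokens = "the size of the index is reduced".split()
kept = remove_stopwords(tokens)

# Fraction of tokens dropped; these tokens never enter the index.
removed_fraction = 1 - len(kept) / len(tokens)
```

Every token filtered out here is one fewer entry to store, which is where the reduction in index size comes from.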