Unit1 Introduction
DEPARTMENT OF CSE
IV YEAR - SEMESTER VIII
CS8080 INFORMATION RETRIEVAL TECHNIQUES
UNIT I INTRODUCTION
Stop-words
Stop-words are very frequent words such as "a", "and", "but", "how", "or"
Removing them reduces the set of representative keywords for a
large collection
For example, "What is a motherboard?" reduces to "motherboard"
Stop-list: contains the stop-words, which are not to be used as index terms
Prepositions, Articles, Pronouns
Some adverbs and adjectives, Some frequent words (e.g. document)
The removal of stop-words usually improves IR effectiveness
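Stop-word removal can be sketched in a few lines of Python. This is only a minimal illustration: the stop-list below is a tiny hypothetical one chosen for the motherboard example, and the function name remove_stop_words is an assumption, not part of any library.

```python
import re

# Hypothetical, deliberately tiny stop-list; real stop-lists are much longer.
STOP_WORDS = {"a", "an", "and", "but", "how", "or", "the", "what", "is"}

def remove_stop_words(text, stop_words=STOP_WORDS):
    """Lowercase the text, split it into word tokens, and drop stop-words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in stop_words]

print(remove_stop_words("What is a motherboard?"))  # ['motherboard']
```

Only the remaining tokens would be considered as index terms.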
Reason for stemming
Different word forms may bear similar meaning (e.g. search, searching):
create a “standard” representation for them
Stemming
Reduces distinct words to their common grammatical root by
removing some word endings
Ex: computer, compute, computes, computing, computed, computation
all reduce to the stem "comput"
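The idea of removing word endings can be sketched as naive suffix stripping. This is not a standard stemming algorithm such as Porter's; the suffix list and the minimum stem length are assumptions picked so that the example words above collapse to "comput".

```python
# Assumed suffix list, ordered longest first so "ation" wins over "s" or "e".
SUFFIXES = ("ation", "ing", "er", "ed", "es", "e", "s")

def naive_stem(word, min_stem_len=3):
    """Strip the first matching suffix, keeping at least min_stem_len letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word

words = ["computer", "compute", "computes", "computing", "computed", "computation"]
print({w: naive_stem(w) for w in words})
```

A real stemmer applies many more rules and conditions; a list this crude will over-strip or under-strip plenty of ordinary words.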
The standard interface for a textual query is a search box entry form
Studies suggest a relationship between query length and the width of
the entry form
o Results suggest that either small forms discourage long queries or
wide forms encourage longer queries
Some entry forms are followed by a form that filters the query in some
way
For instance, at yelp.com, the user can refine the search by location
using a second form
Notice that the yelp.com form also shows the user’s home location,
if it has been specified previously
Some search forms show hints on what kind of information should
be entered into each form
For instance, in zvents.com search, the first box is labeled “what
are you looking for?”
The previous example also illustrates specialized input types that
some search engines support today
o The zvents.com site recognizes that words like “tomorrow”
are time-sensitive
o It also allows flexibility in the syntax of dates
To illustrate, searching for “comedy on wed” automatically
computes the date for the nearest future Wednesday
o This is an example of how the interface can be designed to
reflect how people think
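Resolving a weekday word to the nearest future date can be sketched as follows. This is not how zvents.com is implemented; the abbreviation table, the function name next_weekday, and the choice that "future" excludes today are all assumptions made for illustration.

```python
from datetime import date, timedelta

# Assumed three-letter abbreviations, numbered as in date.weekday() (Monday = 0).
WEEKDAYS = {"mon": 0, "tue": 1, "wed": 2, "thu": 3, "fri": 4, "sat": 5, "sun": 6}

def next_weekday(word, today):
    """Return the nearest date strictly after `today` that falls on `word`."""
    target = WEEKDAYS[word.lower()[:3]]
    days_ahead = (target - today.weekday()) % 7
    if days_ahead == 0:
        days_ahead = 7  # assumption: today itself does not count as "future"
    return today + timedelta(days=days_ahead)

# 1 January 2024 was a Monday, so the nearest future Wednesday is 3 January.
print(next_weekday("wed", date(2024, 1, 1)))  # 2024-01-03
```

Taking only the first three letters lets "wed", "Wed" and "Wednesday" resolve to the same day, which is one simple way to allow flexible date syntax.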
Some interfaces show a list of query suggestions as the user types
the query
o This is referred to as auto-complete, auto-suggest, or
dynamic query suggestions
o Anick et al. found that users clicked on dynamic Yahoo
suggestions one third of the time
Often the suggestions shown are those whose prefix matches the
characters typed so far
o However, in some cases, suggestions are shown that only
have interior letters matching
Further, suggestions may be shown that are synonyms of the
words typed so far
Figure: Dynamic query suggestions, from Netflix.com
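The prefix and interior matching described above can be sketched as a simple filter over a list of candidate queries. The candidate list, the function name suggest, and the ranking rule (prefix matches first, then interior matches) are assumptions for illustration; real systems typically rank suggestions using query logs and other signals.

```python
def suggest(typed, candidates, limit=5):
    """Return up to `limit` suggestions: prefix matches first, then interior matches."""
    typed = typed.lower()
    prefix = [c for c in candidates if c.lower().startswith(typed)]
    interior = [
        c for c in candidates
        if typed in c.lower() and not c.lower().startswith(typed)
    ]
    return (prefix + interior)[:limit]

# Hypothetical candidate queries.
titles = ["star wars", "star trek", "dancing with the stars", "stranger things"]
print(suggest("star", titles))
# ['star wars', 'star trek', 'dancing with the stars']
```

Synonym-based suggestions would need an additional lookup table mapping typed words to related terms, which is omitted here.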