CS8080 IR Unit 2: Modeling and Retrieval Evaluation
DEPARTMENT OF CSE
CS8080 INFORMATION RETRIEVAL TECHNIQUES
IR Model Definition:
A Taxonomy of IR Models
Retrieval models are most frequently associated with distinct
combinations of a document logical view and a user task.
The user's task includes retrieval and browsing. Retrieval can be of two kinds:
i) Ad Hoc Retrieval:
The documents in the collection remain relatively static while
new queries are submitted to the system.
ii) Filtering:
The queries remain relatively static while new documents come
into the system.
Classic IR model:
Each document is described by a set of representative
keywords called index terms. Numerical weights are assigned
to index terms to capture differences in relevance among them.
Three classic models: Boolean, vector, probabilistic
Boolean Model:
The Boolean retrieval model is a model for information
retrieval in which we can pose any query in the form of a
Boolean expression of terms, that is, in which terms are
combined with the operators AND, OR, and NOT. The model
views each document as just a set of words. Retrieval is based
on a binary decision criterion (a document is either relevant or
not relevant) without any notion of a grading scale. Boolean
expressions have precise semantics.
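As a rough sketch of how such a query can be evaluated, the example below builds an inverted index and answers a Boolean query with set operations; the toy documents and the query are made up for illustration.

```python
# Minimal sketch of Boolean retrieval over a toy collection (illustrative only).
docs = {
    1: "antony and cleopatra meet caesar",
    2: "brutus kills caesar",
    3: "the tempest has mercy",
}

# Inverted index: term -> set of ids of documents containing it.
index = {}
for doc_id, text in docs.items():
    for term in set(text.split()):
        index.setdefault(term, set()).add(doc_id)

all_docs = set(docs)

# The query "caesar AND NOT brutus" expressed with set operations.
result = index.get("caesar", set()) & (all_docs - index.get("brutus", set()))
print(sorted(result))  # documents containing 'caesar' but not 'brutus' -> [1]
```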
Vector Model
Assign non-binary weights to index terms in queries
and in documents. Compute the similarity between
documents and query. More precise than Boolean
model.
Probabilistic Model
The probabilistic model tries to estimate the probability that
the user will find the document dj relevant. Documents are
ranked by the odds of relevance:
P(dj relevant to q) / P(dj non-relevant to q)
Given a user query q and the ideal answer set R of the
relevant documents, the problem is to specify the
properties of this set. Assumption (probabilistic
principle): the probability of relevance depends on the
query and document representations only; the ideal answer
set R should maximize the overall probability of
relevance.
Basic Concepts
Each document is represented by a set of representative
keywords or index terms
Index term:
In a restricted sense: it is a keyword that has
some meaning on its own; usually plays the role of
a noun
In a more general form: it is any word that appears in
a document
Let t be the number of index terms in the document
collection and ki be a generic index term. Then the
vocabulary V = {k1, . . . , kt} is the set of all distinct
index terms in the collection.
In matrix form, this can be written as a term-document
matrix, with one row per index term and one column per document.
Example:
A fat book which many people own is Shakespeare's Collected
Works. Two rows of the term-document incidence matrix (the columns
correspond to six of the plays):
Term     Antony&Cleopatra  Julius Caesar  The Tempest  Hamlet  Othello  Macbeth
mercy           1                 0             1          1       1        1
worser          1                 0             1          1       1        0
Term Frequency tf
One of the weighting schemes is term frequency, denoted
tf(t,d), with the subscripts denoting the term and the
document in that order.
The term frequency TF(t, d) of term t in document d is the number of
times that t occurs in d.
Ex: Term-Document Count Matrix (each cell records how many times a
term occurs in a document).
We would like to give more weight to documents that contain
a term several times than to documents that contain it only
once. To do this we need term frequency information, the
number of times a term occurs in a document, and we assign
a score that reflects this number of occurrences.
Log-Frequency Weighting
The log-frequency weight of term t in document d is calculated as
w(t,d) = 1 + log10(tf(t,d)) if tf(t,d) > 0, and w(t,d) = 0 otherwise.
TF-IDF Weighting
The tf-idf weight of a term is the product of its tf weight and its idf
weight; it is the best known weighting scheme in information retrieval.
TF(t, d) measures the importance of a term t in document d,
while IDF(t) measures the importance of a term t in the whole
collection of documents: IDF(t) = log10(N / df(t)), where N is the
number of documents in the collection and df(t) is the number of
documents containing t.
TF/IDF weighting puts TF and IDF together:
w(t,d) = (1 + log10 tf(t,d)) x log10(N / df(t))
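The sketch below computes this log-tf times idf weight for a toy collection; the documents and the queried term are made up for illustration.

```python
import math
from collections import Counter

docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats are pets",
]
N = len(docs)
tokenized = [d.split() for d in docs]

# df(t): number of documents in which term t appears.
df = Counter()
for tokens in tokenized:
    df.update(set(tokens))

def tf_idf(term, tokens):
    tf = tokens.count(term)
    if tf == 0:
        return 0.0                      # zero frequency -> zero weight
    return (1 + math.log10(tf)) * math.log10(N / df[term])

print(round(tf_idf("cat", tokenized[0]), 3))  # weight of 'cat' in the first document
```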
2.4 The Vector Model
Boolean matching and binary weights is too limiting
The vector model proposes a framework in which partial
matching is possible
This is accomplished by assigning non-binary weights to index
terms in queries and in documents
Term weights are used to compute a degree of similarity
between a query and each document
The documents are ranked in decreasing order of their degree of
similarity.
Weights in the vector model are basically tf-idf weights:
w(t,d) = TF(t,d) x IDF(t).
These equations should only be applied for values of term
frequency greater than zero; if the term frequency is zero, the
respective weight is also zero.
The similarity between a document d and a query q is taken as the
cosine of the angle between their weight vectors:
sim(d, q) = (d . q) / (|d| |q|)
Example:
(Worked example omitted: with N = 1,000,000 documents, tf-idf weights
are computed for the query and a document, the vectors are length
normalized, and the cosine score is obtained.)
Since in the example only one document is given, only one score is
calculated. If n documents are given, n scores are calculated and the
documents are ranked in decreasing order of their scores.
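The following sketch ranks documents by the cosine of the angle between tf-idf weight vectors, as the vector model prescribes; the weight values are hand-made for illustration.

```python
import math

def cosine(vec_a, vec_b):
    # dot product over shared terms divided by the product of the vector norms
    dot = sum(w * vec_b.get(t, 0.0) for t, w in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    return 0.0 if norm_a == 0 or norm_b == 0 else dot / (norm_a * norm_b)

query = {"car": 2.3, "insurance": 3.0}
docs = {
    "d1": {"car": 1.5, "auto": 2.1, "insurance": 2.8},
    "d2": {"best": 1.2, "auto": 1.9},
}

# Rank documents in decreasing order of similarity to the query.
ranking = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranking)  # -> ['d1', 'd2']
```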
Advantages:
term-weighting improves quality of the answer set
partial matching allows retrieval of docs that approximate the
query conditions
cosine ranking formula sorts documents according to a degree of
similarity to the query
document length normalization is naturally built-in into the ranking
Disadvantages:
It assumes independence of index terms
2.5 Probabilistic Model
An initial set of documents is retrieved somehow. The
user inspects these docs looking for the relevant ones (in
truth, only the top 10-20 need to be inspected). The IR
system uses this information to refine the description of
the ideal answer set. By repeating this process, it is
expected that the description of the ideal answer set will
improve.
The idea is to rank documents in decreasing order of their probability
of relevance to the information need: P(R | q, di).
The Ranking
1. Find measurable statistics (tf, df, document length) that
affect judgments about document relevance
2. Combine these statistics to estimate the probability of
document relevance
3. Order documents by decreasing estimated probability of
relevance P(R|d, q)
4. Assume that the relevance of each document is
independent of the relevance of other documents
• Rank documents using the log odds ratios ct for the terms
in the query:
ct = log [ pt (1 - ut) / ( ut (1 - pt) ) ]
where pt is the probability that the term occurs in a relevant
document and ut is the probability that it occurs in a non-relevant
document. These probabilities are estimated from the counts in a
term-relevance contingency table.
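A minimal sketch of how ct could be estimated from contingency-table counts, assuming the usual add-0.5 smoothing; the counts in the example call are invented.

```python
import math

def c_t(N, df_t, R, r_t):
    """
    N    : total number of documents
    df_t : documents containing term t
    R    : number of known relevant documents
    r_t  : relevant documents containing term t
    """
    p_t = (r_t + 0.5) / (R + 1)             # P(term present | relevant)
    u_t = (df_t - r_t + 0.5) / (N - R + 1)  # P(term present | non-relevant)
    return math.log((p_t * (1 - u_t)) / (u_t * (1 - p_t)))

# A document's retrieval score is the sum of c_t over the query terms it contains.
print(round(c_t(N=1000, df_t=50, R=10, r_t=8), 2))
```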
Advantages:
Docs ranked in decreasing order of probability of
relevance
Disadvantages:
need to guess initial estimates for piR (the probability that
term ki occurs in a relevant document)
method does not take into account tf factors
the lack of document length normalization
2.6 Latent Semantic Indexing Model
Key Idea:
The process of matching documents to a given query
could be concept matching instead of index term
matching
A document that shares concepts with another
document known to be relevant might be of interest
The key idea is to map documents and queries into a
lower dimensional space (i.e., composed of higher level
concepts which are fewer in number than the index
terms). Retrieval in this reduced concept space might
be superior to retrieval in the space of index terms.
LSI tends to increase recall but can hurt precision.
Example: LSI
Documents d2 and d3 contain words with similar meanings, such as
"ship" and "boat", but their similarity score is 0 without SVD. After
applying SVD in latent semantic indexing, the score becomes 0.52.
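A small sketch of the idea using a truncated SVD on a tiny, made-up term-document matrix; the numbers will not reproduce the 0.52 score from the example, they only illustrate the mechanics.

```python
import numpy as np

# Rows = terms (ship, boat, ocean, tree), columns = documents d1..d3 (invented counts).
A = np.array([
    [1, 0, 1],   # ship
    [0, 1, 0],   # boat
    [1, 1, 0],   # ocean
    [0, 0, 1],   # tree
], dtype=float)

# Full SVD, then keep only the k strongest latent concepts.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
doc_vectors = (np.diag(s[:k]) @ Vt[:k]).T   # documents in the k-dimensional concept space

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Similarity between d2 and d3 measured in the reduced concept space.
print(round(cosine(doc_vectors[1], doc_vectors[2]), 2))
```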
2.7 Neural Network Model
Classic IR:
Terms are used to index documents and queries
Retrieval is based on index term matching
Motivation: Neural networks are known to be good pattern matchers
A neural network is an oversimplified representation of the neuron
interconnections in the human brain:
nodes are processing units
edges are synaptic connections
the strength of a propagating signal is modeled by a weight
assigned to each edge
the state of a node is defined by its activation level
depending on its activation level, a node might issue an output
signal
Neural Network Model
n inputs xi are given, each with a weight wi; w0 is the bias. The neuron
has a threshold to fire. To determine the output, the neuron applies a
non-linear activation function f(a) to the weighted sum
a = w0 x0 + w1 x1 + ... + wn xn, with x0 = 1 acting as the bias input.
Example (weights w0 = -1.5, w1 = 1, w2 = 1; output y = 1 when S > 0):
x0  x1  x2    S     y
1   0   0   -1.5    0
1   0   1   -0.5    0
1   1   0   -0.5    0
1   1   1   +0.5    1
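A minimal sketch that reproduces the truth table above, assuming a step activation that fires when the weighted sum is positive.

```python
def neuron(x1, x2, w0=-1.5, w1=1.0, w2=1.0):
    s = w0 * 1 + w1 * x1 + w2 * x2      # weighted sum, with bias input x0 = 1
    return 1 if s > 0 else 0            # fire only when the sum exceeds the threshold

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", neuron(x1, x2))   # reproduces the table (a logical AND)
```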
Neural Network Model For Information Retrieval
Three layers network: one for the query terms, one for the
document terms, and a third one for the documents
Signals propagate across the network
First level of propagation:
o Query terms issue the first signals
o These signals propagate across the network to reach the
document nodes
Second level of propagation:
o Document nodes might themselves generate new signals
which affect the document term nodes
o Document term nodes might respond with new signals of
their own
2.8 Retrieval Evaluation
Retrieval performance evaluation consists of assigning a quantitative
measure to the quality of the results returned to the user.
Usually, its computation requires comparing the results
produced by the system with results suggested by humans for the
same set of queries.
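For instance, given a set of human relevance judgements, precision and recall can be computed for a query as in the sketch below; the documents and judgements are made up for illustration.

```python
retrieved = ["d3", "d1", "d7", "d5"]   # ranking produced by the system
relevant = {"d1", "d2", "d5"}          # documents judged relevant by assessors

hits = [d for d in retrieved if d in relevant]
precision = len(hits) / len(retrieved)  # fraction of retrieved docs that are relevant
recall = len(hits) / len(relevant)      # fraction of relevant docs that were retrieved

print(precision, recall)                # -> 0.5 0.666...
```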
The next step was to devise a set of experiments that would allow
evaluating each indexing system in isolation more thoroughly. The
result was a test reference collection composed of documents,
queries, and relevance judgements. It became known as the
Cranfield-2 collection. The reference collection allows using the same
set of documents and queries to evaluate different ranking systems.
The uniformity of this setup allows quick evaluation of new ranking
functions
Reference Collections
The main TREC collection has been growing steadily over the years.
The TREC-3 collection has roughly 2 gigabytes.
The TREC-6 collection has roughly 5.8 gigabytes
It is distributed in 5 CD-ROM disks of roughly 1 gigabyte of
compressed text each
Its 5 disks were also used at the TREC-7 and TREC-8
Conferences.
The Terabyte test collection introduced at TREC-15, also referred to as
GOV2, includes 25 million Web documents crawled from sites in the
“.gov” domain
TREC documents come from several distinct sources (the list of
sources is omitted here). An example TREC topic (information request)
follows:
<top>
<num> Number: 168
<title> Topic: Financing AMTRAK
<desc> Description:
A document will address the role of the Federal Government in
financing the operation of the National Railroad Transportation
Corporation (AMTRAK)
<narr> Narrative: A relevant document must provide information on
the government’s responsibility to make AMTRAK an economically
viable
entity. It could also discuss the privatization of AMTRAK as an
alternative to continuing government subsidies. Documents
comparing
government subsidies given to air and bus transportation with those
provided to AMTRAK would also be relevant
</top>
The set of relevant documents for each topic is obtained from a pool
of possible relevant documents. This pool is created by taking the top
K documents (usually,K = 100) in the rankings generated by various
retrieval systems. The documents in the pool are then shown to
human assessors who ultimately decide on the relevance of each
document.
This technique of assessing relevance is called the pooling method
and is based on two assumptions:
First, that the vast majority of the relevant documents are
collected in the assembled pool
Second, that the documents which are not in the pool can be
considered to be not relevant
The Benchmark Tasks
Other tasks added in TREC-6:
Cross languages — ad hoc task in which the documents are in
one language but the topics are in a different language
High precision — task in which the user of a retrieval system is
asked to retrieve ten documents that answer a given
information request within five minutes
Spoken document retrieval — intended to stimulate research
on retrieval techniques for spoken documents
Very large corpus — ad hoc task in which the retrieval systems
have to deal with collections of size 20 gigabytes
Human Experimentation in the Lab
Side-by-side panels for Yahoo! and Google: the top 5 answers produced
by each search engine for the query "information retrieval
evaluation" are shown next to each other.
One limitation is that the users know that they are participating in an
experiment. Further, a side-by-side experiment cannot be repeated in
the same conditions of a previous execution. Finally, side-by-side
panels do not allow measuring how much better system A is when
compared to system B. Despite these disadvantages, side-by-side
panels constitute a dynamic evaluation method that provides insights
that complement other evaluation methods.
Crowdsourcing
Evaluation using Clickthrough Data
One way to compare two ranking functions is to collect clickthrough
data for the two rankings. To mix the results of the two rankings, we
look at the top results from each ranking and interleave them.
Notice that, among any set of top r ranked answers, the number of
answers originating from each ranking differs by no more than 1.
By collecting clickthrough data for the combined ranking, we further
ensure that the data is unbiased and reflects the user preferences.
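A simplified sketch of mixing two rankings in this alternating fashion (not the exact algorithm from the notes; the rankings are made up for illustration):

```python
def interleave(ranking_a, ranking_b):
    mixed, seen = [], set()
    for doc_a, doc_b in zip(ranking_a, ranking_b):
        # alternate between the two rankings, skipping duplicates
        for doc in (doc_a, doc_b):
            if doc not in seen:
                mixed.append(doc)
                seen.add(doc)
    return mixed

ranking_a = ["d1", "d2", "d3"]
ranking_b = ["d2", "d4", "d1"]
print(interleave(ranking_a, ranking_b))   # -> ['d1', 'd2', 'd4', 'd3']
```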
In an implicit relevance feedback cycle, the feedback
information is derived implicitly by the system
There are two basic approaches for compiling implicit feedback
information:
o local analysis, which derives the feedback information
from the top ranked documents in the result set
o global analysis, which derives the feedback information
from external sources such as a thesaurus
Explicit Relevance Feedback
In a classic relevance feedback cycle, the user is presented with
a list of the retrieved documents
Then, the user examines them and marks those that are
relevant
In practice, only the top 10 (or 20) ranked documents need to
be examined
The main idea consists of
o selecting important terms from the documents that have
been identified as relevant, and
o enhancing the importance of these terms in a new query
formulation
Expected effect: the new query will be moved towards the
relevant docs and away from the non-relevant ones
Early experiments have shown good improvements in precision
for small test collections
Relevance feedback presents the following characteristics:
o it shields the user from the details of the query
reformulation process (all the user has to provide is a
relevance judgement)
o it breaks down the whole searching task into a sequence
of small steps which are easier to grasp
Rocchio 1971 algorithm
The modified query qm is computed as
qm = α q0 + (β / |Dr|) Σ{dj ∈ Dr} dj - (γ / |Dnr|) Σ{dj ∈ Dnr} dj
where q0 is the original query vector, Dr and Dnr are the sets of known
relevant and non-relevant documents, and α, β, γ are tuning constants.
β is usually set higher than γ to give more weight to positive feedback.
Many systems only allow positive feedback (γ = 0).
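A minimal sketch of this reformulation, assuming query and document vectors are represented as dictionaries of term weights; all values below are invented.

```python
from collections import defaultdict

def rocchio(q0, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    qm = defaultdict(float)
    for term, w in q0.items():
        qm[term] += alpha * w
    for doc in relevant:                       # move the query towards relevant docs
        for term, w in doc.items():
            qm[term] += beta * w / len(relevant)
    for doc in non_relevant:                   # and away from non-relevant docs
        for term, w in doc.items():
            qm[term] -= gamma * w / len(non_relevant)
    # negative weights are usually clipped to zero
    return {t: w for t, w in qm.items() if w > 0}

q0 = {"car": 1.0, "insurance": 1.0}
relevant = [{"car": 0.8, "policy": 0.5}]
non_relevant = [{"car": 0.2, "racing": 0.9}]
print(rocchio(q0, relevant, non_relevant))
```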
Disadvantages of Relevance feedback