CHAPTER 21
Information Retrieval
Practice Exercises
21.1 Compute the relevance (using appropriate definitions of term
frequency and inverse document frequency) of each of the Practice
Exercises in this chapter to the query “SQL relation”.
Answer: We omit the questions that contain neither keyword, since
their relevance to the query is zero. The word count for each question
includes stop words. We use the equations given in Section 21.2 to
compute relevance; the logarithm in the equation is taken to base 2.
Q#  #words  #“SQL”  #“relation”  “SQL”       “relation”  “SQL”   “relation”  Total
                                 term freq.  term freq.  relv.   relv.       relv.
 1  84      1       1            0.0170      0.0170      0.0002  0.0002      0.0004
 4  22      0       1            0.0000      0.0641      0.0000  0.0029      0.0029
 5  46      1       1            0.0310      0.0310      0.0006  0.0006      0.0013
 6  22      1       0            0.0641      0.0000      0.0029  0.0000      0.0029
 7  33      1       1            0.0430      0.0430      0.0013  0.0013      0.0026
 8  32      1       3            0.0443      0.1292      0.0013  0.0040      0.0054
 9  77      0       1            0.0000      0.0186      0.0000  0.0002      0.0002
14  30      1       0            0.0473      0.0000      0.0015  0.0000      0.0015
15  26      1       1            0.0544      0.0544      0.0020  0.0020      0.0041
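The tabulated values appear consistent with a term frequency of log2(1 + occurrences / words) and a per-term relevance obtained by dividing that term frequency by the word count of the question. Both formulas are inferred here from the numbers in the table, not quoted from Section 21.2, and the function names are hypothetical. Under those assumptions, a minimal Python sketch for recomputing a row is:

```python
import math

def term_freq(occurrences, total_words):
    # Assumed form: log base 2 of (1 + occurrences / total words).
    return math.log2(1 + occurrences / total_words)

def relevance(occurrences, total_words):
    # Assumed normalization, inferred from the table: the term
    # frequency divided by the question's word count.
    return term_freq(occurrences, total_words) / total_words

def total_relevance(counts, total_words):
    # Sum the per-keyword relevance over all query keywords.
    return sum(relevance(c, total_words) for c in counts)

# Question 8: 32 words, 1 occurrence of "SQL", 3 of "relation".
print(term_freq(1, 32), term_freq(3, 32), total_relevance([1, 3], 32))
```

The printed values may differ from the table in the last digit, since the table entries appear to be truncated rather than rounded.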
21.2 Suppose you want to find documents that contain at least k of a given
set of n keywords. Suppose also you have a keyword index that gives
you a (sorted) list of identifiers of documents that contain a specified
keyword. Give an efficient algorithm to find the desired set of
documents.
Answer: Let S be a set of n keywords. An algorithm to find all
documents that contain at least k of these keywords is given below:
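One way to approach this, sketched here as an illustration rather than as the exact algorithm the answer refers to, is to merge the n sorted identifier lists and count how many lists each identifier appears in. Because the lists are sorted, equal identifiers arrive consecutively in the merged stream, so a single counter suffices. The function name and the in-memory list representation are assumptions, and each list is assumed to contain no duplicate identifiers.

```python
import heapq

def docs_with_at_least_k(postings, k):
    """Return the sorted identifiers that occur in at least k lists.

    postings: one sorted, duplicate-free list of document
    identifiers per keyword (a hypothetical in-memory stand-in
    for the keyword index).
    """
    result = []
    current, count = None, 0
    # heapq.merge lazily yields all identifiers in sorted order.
    for doc_id in heapq.merge(*postings):
        if doc_id == current:
            count += 1
        else:
            # A new identifier begins: emit the previous one if it
            # was seen in at least k lists.
            if current is not None and count >= k:
                result.append(current)
            current, count = doc_id, 1
    # Emit the final identifier if it qualifies.
    if current is not None and count >= k:
        result.append(current)
    return result

lists = [[1, 3, 5], [2, 3, 5, 8], [3, 8, 9]]
print(docs_with_at_least_k(lists, 2))
```

With N identifiers in total across the n lists, the heap-based merge takes O(N log n) time and reads each list only once, in order.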