| Lecture | Date / Time | Title | Description |
|---|---|---|---|
| Lecture 1 | 1 February, 2010 / 18:15h-19:15h | Introduction to IR | Information retrieval problem, inverted index, processing Boolean queries |
| Lecture 2 | 1 February, 2010 / 19:30h-20:30h | The term vocabulary, tolerant retrieval | Tokenization, stemming, lemmatization, wildcard queries, spelling correction |
| Lecture 3 | 2 February, 2010 / 18:15h-19:15h | The vector space model | Inverse document frequency, TF-IDF weighting, efficient scoring and ranking |
| Lecture 4 | 2 February, 2010 / 19:30h-20:30h | Evaluation in IR | Test collections, evaluation of ranked and unranked retrieval sets |
| Lecture 5 | 3 February, 2010 / 18:15h-19:15h | Text classification | Naive Bayes text classification, Rocchio classification, K-NN, evaluation of text classification |
| Lecture 6 | 3 February, 2010 / 19:30h-20:30h | Text clustering | K-Means, hierarchical agglomerative clustering, cluster labeling |
| Lecture 7 | 4 February, 2010 / 18:15h-19:15h | Web search engines | Index size and estimation, crawling, near-duplicate detection, distributing indexes |
| Lecture 8 | 4 February, 2010 / 19:30h-20:30h | Link analysis and PageRank algorithm | The Web as a graph, Hubs and Authorities, Google's PageRank computation |