Workshop on

INFORMATION RETRIEVAL AND WEB SEARCH

1st - 4th February, 2010

Organized by

Open and Distance Learning Centre at
The Faculty of Electrical Engineering and Information Technologies

and
TIME.mk Education

FINISHED !!!



Description

Information retrieval (IR) is the science of searching for documents and for information within documents, as well as searching relational databases and the World Wide Web. IR is interdisciplinary, drawing on computer science, mathematics, library science, linguistics and statistics. Automated information retrieval systems are used to reduce what has been called "information overload".

This workshop will cover traditional material as well as recent advances in IR, the study of the processing, indexing, querying, organization, and classification of textual documents, including hypertext documents available on the World Wide Web. Practical working code examples will be shown that demonstrate some of the presented ideas.

Prerequisite:
Basic knowledge of data structures, algorithms and probability.

Language:
All materials (slides, code, etc.) will be provided in English, but the lectures will be taught in Macedonian.

Learning outcomes:
On completion of this workshop, you should be able to:
  1. Build a complete IR system from scratch.
  2. Build a simple web search engine (crawling, indexing, ranking).
  3. Apply different machine learning techniques (classification, clustering, etc.) to text documents.

Schedule

Lecture | Date / Time | Title | Description
Lecture 1 | 1 February, 2010 / 18:15h-19:15h | Introduction to IR | Information retrieval problem, inverted index, processing Boolean queries
Lecture 2 | 1 February, 2010 / 19:30h-20:30h | The term vocabulary, tolerant retrieval | Tokenization, stemming, lemmatization, wildcard queries, spelling correction
Lecture 3 | 2 February, 2010 / 18:15h-19:15h | The vector space model | Inverse document frequency, TF-IDF weighting, efficient scoring and ranking
Lecture 4 | 2 February, 2010 / 19:30h-20:30h | Evaluation in IR | Test collections, evaluation of ranked and unranked retrieval sets
Lecture 5 | 3 February, 2010 / 18:15h-19:15h | Text classification | Naive Bayes text classification, Rocchio classification, K-NN, evaluation of text classification
Lecture 6 | 3 February, 2010 / 19:30h-20:30h | Text clustering | K-Means, hierarchical agglomerative clustering, cluster labeling
Lecture 7 | 4 February, 2010 / 18:15h-19:15h | Web search engines | Index size and estimation, crawling, near-duplicate detection, distributing indexes
Lecture 8 | 4 February, 2010 / 19:30h-20:30h | Link analysis and PageRank algorithm | The Web as a graph, hubs and authorities, Google's PageRank computation

Lecturer

Dr. Igor Trajkovski graduated from the Institute of Informatics, Faculty of Natural Sciences and Mathematics, Ss. Cyril and Methodius University - Skopje, in 2001. In 2004 he obtained a Master of Computer Science at the Max Planck Institute for Informatics, Saarbrücken, Germany. He continued his studies and research at the Jožef Stefan Institute in Ljubljana, Slovenia, in the Department of Knowledge Technologies, where in 2007 he gained the academic title Doctor of Sciences. In 2007-2008 he worked for Google (Mountain View, California, USA and Zurich, Switzerland) as a software engineer and a software engineer in testing. He is the founder of the Macedonian and Slovenian news aggregators TIME.mk and TIMES.si, and of TIME.mk's branch for life-long continuous education, TIME.mk Education. His research interests include Statistical Natural Language Processing, Machine Learning, Advanced Algorithms and Data Structures, Parallel Algorithms, and Bioinformatics.

Registration and fees

The workshop tuition of 10200 MKD (6800 MKD for undergraduate students; proof required) covers four days (8 lectures) of presentations and learning. You will also receive comprehensive take-home reference material. The minimum number of attendees is 10 and the maximum is 30. Places are allocated on a first-come, first-served basis, in order of payment of the workshop tuition.

[Image: PP 50 form for tuition payment]
[Image: Venue of the event]

Certificates of Attendance, issued by ODLC at FEIT and edu.TIME.mk, will be awarded to students who attend at least 6 of the 8 lectures.

Contact

If you have additional questions concerning the workshop, you can ask Dr. Igor Trajkovski (email: admin@time.mk).



© TIME.mk Education