Credits: 4
Taught in: Winter
Scope of teaching: 2+2c

The course reviews current data mining tools and illustrates their properties on real-life tasks. Particular attention is devoted to descriptive presentation of the results obtained throughout the data mining process. This approach significantly improves communication with the domain expert or data owner (e.g. a medical professional), who can then take an active part in the process by focusing on the most promising directions.

1. Data mining: the CRISP-DM process and methodology. Motivating case studies.
2. Review of data modelling tools and examples of their application I.
3. Sources of data. Data anonymization and protection.
4. Fusing data from heterogeneous sources.
5. Data understanding, pre-processing and aggregation.
6. Methods of data visualisation. Identification of outliers and erroneous values.
7. Choice of relevant attributes.
8. Time series data and their processing.
9. Review of data modelling tools and examples of their application II.
10. Model evaluation and knowledge derived. Deployment.
11. Visualisation of models.
12. Processing input in the form of natural language text.
13. Processing very complex data.
14. Reserve.

Accompanying computer labs give students an opportunity to master the tools and methods presented in the lectures by solving simple real-life problems. Hands-on exercises follow the lecture syllabus. Every student receives an individual data mining assignment, which helps them gain experience with the CRISP-DM methodology.

[1] Few, S.: Now You See It: Simple Visualization Techniques for Quantitative Analysis. Analytics Press, 2009.
[2] Larose, D.T.: Discovering Knowledge in Data: An Introduction to Data Mining. Wiley, 2005.
[3] Larose, D.T.: Data Mining Methods and Models. Wiley, 2006.
