Data Mining, CSE 498C/598C, Spring 2005, MW 3:15 to 4:30 PM, DeBartolo 216

Professor-In-Charge: Dr. Nitesh Chawla (nchawla [at] cse.nd.edu)

Join the hunt for patterns in data.

"Print, film, magnetic, and optical storage media produced about 5 exabytes of new information in 2002. Ninety-two percent of the new information was stored on magnetic media, mostly in hard disks." (http://www.sims.berkeley.edu/research/projects/how-much-info-2003/)



Data mining: “The non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data” (Fayyad, Piatetsky-Shapiro & Smyth, 1996).

Data mining is the process of automatically discovering (potentially) useful information, patterns, associations, and even anomalies in data. It is becoming pervasive across many sectors, including but not limited to medicine, biology, commerce, the Web, security, network intrusion and fraud detection, and space research.

Data mining uses methods from multiple fields, including machine learning, pattern recognition, databases, probability, statistics, information theory, and visualization. This course will focus primarily on the machine learning component, drawing on probability, statistics, pattern recognition, and information theory where relevant. The course will introduce the key principles, techniques (data preparation and preprocessing, feature selection, classification, regression, clustering, combining multiple models, etc.), performance evaluation criteria, and applications. It will give you an opportunity to implement and experiment with some of these concepts and to apply them to real-world data sets. It will discuss some of the challenges encountered in real-world data mining applications: massive data sets, high class imbalance, unlabeled data, etc. It will also touch upon advances in related fields such as web mining, intrusion detection, bioinformatics, and distributed data mining. In addition, we will discuss the role of data mining in society, drawing on examples from the popular media.

Given the flood of data, there is a great deal of information to mine, and data mining is indeed becoming a compelling field.