Fall 2013 DATA 101

= Important Links =
 * Learn2Mine
 * Learn2Mine Galaxy
 * Learn2Mine RStudio
 * Course Materials
 * Book website for code and data
 * One of many online R tutorials

= Course Description =
Introduction to the use of computer-based tools for the analysis of large data sets for the purpose of knowledge discovery. Students will learn the Data Science process and the difference between deductive, hypothesis-driven modeling and inductive, data-driven modeling. Students will gain hands-on experience with various online analytical processing and data mining software and complete a project using real data.

= Syllabus =
[[Media:DATA_101_Syllabus_Fall_2013.pdf|Download PDF version here]]

= Facebook Group =
https://www.facebook.com/groups/324697457664695/

= Department of Computer Science =
[[Media:Computer_Science_Guide.pdf|Guide to the Computer Science Department]]

= Teams =

Section 01
Team 1: Ken Startin, Julia Pogue, Travis Nesland, Justin Carper, and Robert Ziehr (dropped)

Email: startinkr@g.cofc.edu, poguejk@g.cofc.edu, tnesland@gmail.com, carperjt@g.cofc.edu

Team 2: Cristovam, Aaron Walton, Joe Suggs, Kellan Fluett, and Anna Bishop (dropped)

Email: fluetteka@g.cofc.edu, araujosegundoc@g.cofc.edu, Awalton1337@gmail.com, suggsjm@g.cofc.edu

Team 3: Clay Gardner, Ethan Redel, Isis Barber (dropped), Carson Smith, Chris Johnson, and Alex Rand

Email: gardnercm@g.cofc.edu, redelea@g.cofc.edu, johnsoncd2@g.cofc.edu, smithc2@g.cofc.edu, jalex.rand@gmail.com

Team 4: Alex Wang, Josh Pugsley (dropped), Bryan Craig, Dayna Karns, and Brandon Olesh

Email: craigbj@g.cofc.edu, wangmx@g.cofc.edu, karnsdf@g.cofc.edu, bsolesh@g.cofc.edu, goldbergjl@g.cofc.edu

Team 5: Patrick Brewer, Ben Rucker, Tony Tang, Alex Wray, and Maya Jackson

Email: pkbrewer@g.cofc.edu, jacksonmd@g.cofc.edu, tktang@g.cofc.edu, ruckerbe@g.cofc.edu, wrayat@g.cofc.edu

Section 02
Team 1: Ian Dilling, Liana Valentino, Alexander Shoup, Bryce Lowell, and Raven Mack

Email: valentinol@g.cofc.edu, mackrl1@g.cofc.edu, dillingib@g.cofc.edu, lowellba@g.cofc.edu, shoupar@g.cofc.edu

Team 2: Stephanie Roberts, Sheree Grant, Connor Olds, Brad Maran, and Declan Whitmyer

Email: sagrant@g.cofc.edu, robertssr@g.cofc.edu, maranbg@g.cofc.edu, oldscw@g.cofc.edu, whitmyerdw@g.cofc.edu

Team 3: Laura Henderson, Megan McCorry, Logan Dowdle, Diana Devine, and Melissa Lorang

Email: lorangmr@g.cofc.edu, hendersonla@g.cofc.edu, dowdlelt@g.cofc.edu, devinedm@g.cofc.edu, mccorrymk@g.cofc.edu

Team 4: Dan Adams, Brooke Boyd, John Lloyd, Madi McGregor, and Whitney Miles

Email: adamsdb@g.cofc.edu, mcgregormr@g.cofc.edu, lloydje1@g.cofc.edu, brooke.m.boyd22@gmail.com, wdmiles@g.cofc.edu

Team 5: Blake Jenks, Domenick Larosa, Scott Sandie, Kaya Tollos, and Veronica Dales

Email: dlarosa@g.cofc.edu, tollask@g.cofc.edu, jenksbc@g.cofc.edu, dalesvr@g.cofc.edu, sandiesm@g.cofc.edu

= R Tutorial =
One of many online tutorials

= Course Materials =
Course Materials

= Schedule =

Midterm
The midterm is scheduled for Thursday, October 10th.

Week 1

 * Tuesday
 * Overview of Emerging Field
 * Historical background and motivations
 * Knowledge discovery overview
 * Thursday
 * Select partners and introduction to stand-up format
 * Introduction to Kaggle.com
 * Introduction to Learn2Mine
 * Introduction to R and RStudio

Week 2

 * Tuesday
 * Overview of Emerging Field
 * Historical background and motivations
 * Knowledge discovery overview
 * Algorithms and Complexity
 * Thursday
 * Stand-up meeting on Kaggle topics
 * Introduction to R and RStudio

Week 3

 * Tuesday
 * Predicting Algae Blooms
 * Data Description (2.1 and 2.2)
 * Loading Data into R (2.3)


 * Thursday
 * Lab based on Chapter 2 and progress on Tuesday

Week 4

 * Tuesday
 * Predicting Algae Blooms
 * Data Visualization and Summarization (2.4)
 * Unknown Values (2.5)


 * Thursday
 * Lab based on Chapter 2 and progress on Tuesday

Week 5

 * Tuesday
 * Obtaining Prediction Models (2.6)
 * Model Evaluation and Selection
 * Predictions for the 7 Algae


 * Thursday
 * Lab based on Chapter 2 and progress on Tuesday

Week 6

 * Tuesday
 * Obtaining Prediction Models (2.6)
 * Model Evaluation and Selection
 * Predictions for the 7 Algae


 * Thursday
 * Lab based on Chapter 2 and progress on Tuesday

Week 7

 * Tuesday
 * Predicting Stock Market Returns
 * Available Data (3.2)
 * Defining Prediction Tasks (3.3)
 * Prediction Models (3.4)
 * From Predictions into Actions (3.5)
 * Model Evaluation and Selection (3.6)
 * Trading System (3.7)


 * Thursday
 * Lab based on Chapter 3 and progress on Tuesday

Week 8

 * Tuesday
 * Detecting Fraudulent Transactions
 * Problem Description and Objectives (4.1)
 * Available Data (4.2)
 * Defining the Data Mining Tasks (4.3)


 * Thursday
 * Lab based on Chapter 4 and progress on Tuesday

Week 9

 * Tuesday
 * Detecting Fraudulent Transactions
 * Obtaining Outlier Ranking (4.4)
 * Unsupervised Approaches (4.4.1)


 * Thursday
 * Lab based on Chapter 4 and progress on Tuesday

Week 10

 * Tuesday
 * Detecting Fraudulent Transactions
 * Obtaining Outlier Ranking (4.4)
 * Supervised Approaches (4.4.2)


 * Thursday
 * Lab based on Chapter 4 and progress on Tuesday

Week 11

 * Tuesday
 * Detecting Fraudulent Transactions
 * Obtaining Outlier Ranking (4.4)
 * Supervised Approaches (4.4.2)
 * Semi-Supervised Approaches (4.4.3)


 * Thursday
 * Lab based on Chapter 4 and progress on Tuesday

Week 12

 * Tuesday
 * Classifying Microarray Samples
 * Problem Description and Objectives (5.1)
 * Available Data (5.2)
 * Gene (Feature) Selection (5.3)
 * Simple Filters Based on Distribution Properties


 * Thursday
 * Lab based on Chapter 5 and progress on Tuesday

Week 13

 * Tuesday
 * Classifying Microarray Samples
 * Gene (Feature) Selection (5.3)
 * ANOVA Filters
 * Filtering Using Random Forests
 * Filtering Using Feature Clustering Ensembles


 * Thursday
 * Lab based on Chapter 5 and progress on Tuesday

Week 14

 * Tuesday
 * Classifying Microarray Samples
 * Predicting Cytogenetic Abnormalities (5.4)
 * Defining the Prediction Task
 * Evaluation Metric
 * The Experimental Procedure


 * Thursday
 * Lab based on Chapter 5 and progress on Tuesday

Week 15

 * Tuesday
 * Classifying Microarray Samples
 * Predicting Cytogenetic Abnormalities (5.4)
 * The Modeling Techniques
 * Random Forests
 * k-Nearest Neighbors
 * Comparing Models


 * Thursday
 * Lab based on Chapter 5 and progress on Tuesday