Understanding Data Mining: Extracting, Organizing, and Analyzing Large Sets of Data
Author: Dail Midgette
Level: High School
Content Area: Mathematics
In Algebra I, students study data sets with one predictor variable and one response variable. In the real world, however, most response variables depend on many predictor variables, and several of them may affect the response significantly. These variables can also differ in how strongly they affect the situation at hand, so it is important to identify their effects and use them appropriately to make sounder, more valid predictions. In this lesson, students will create their own data set and then use R Statistical Software to mine the data and identify the variables that most significantly affect the selling price of a home. Students will apply three methods of variable selection (forward selection, backward selection, and stepwise selection) to determine which variables are most influential.
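As a rough illustration of the three selection methods named above, the following R sketch runs each one with the built-in step() function. The data frame homes, its column names, and the simulated prices are hypothetical and are not taken from the lesson packet. Note also that step() compares models by AIC, which may differ from the criterion the packet asks students to use.

```r
# Hypothetical home-sale data, simulated purely for illustration
set.seed(42)
n <- 60
homes <- data.frame(
  sqft      = round(runif(n, 900, 3500)),        # living area in square feet
  bedrooms  = sample(2:5, n, replace = TRUE),
  bathrooms = sample(1:3, n, replace = TRUE),
  age       = sample(1:60, n, replace = TRUE),   # age of the home in years
  lot_acres = round(runif(n, 0.1, 2), 2)
)

# In this simulation the price depends only on sqft and age, plus random noise
homes$price <- 50000 + 110 * homes$sqft - 1500 * homes$age +
  rnorm(n, mean = 0, sd = 20000)

# Smallest and largest models the search is allowed to consider
null_model <- lm(price ~ 1, data = homes)   # intercept only
full_model <- lm(price ~ sqft + bedrooms + bathrooms + age + lot_acres,
                 data = homes)
search_scope <- list(
  lower = ~ 1,
  upper = ~ sqft + bedrooms + bathrooms + age + lot_acres
)

# Forward selection: start with no predictors and add one at a time
forward <- step(null_model, scope = search_scope,
                direction = "forward", trace = 0)

# Backward selection: start with every predictor and drop one at a time
backward <- step(full_model, direction = "backward", trace = 0)

# Stepwise selection: at each step a predictor may be added or dropped
stepwise <- step(null_model, scope = search_scope,
                 direction = "both", trace = 0)

# Show which predictors each method kept
print(formula(forward))
print(formula(backward))
print(formula(stepwise))
```

Setting trace = 0 hides the step-by-step log. Students who want to see each variable enter or leave the model can set trace = 1 instead, and summary(forward) reports the coefficients of the chosen model.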
At the end of this lesson, students should be able to use R to:
Students should also be able to:
This lesson requires one block period or two traditional periods (approximately ninety minutes in total). It is the second lesson in a set of three.
Prior to completing this lesson, students should be able to:
Prior to implementing this lesson, teachers should:
Distribute the Putting it All Together packet to students and lead them through the overall outline of the assignment. It may be helpful to walk through the initial search engine setup together so that students can easily find the information they need to create their own data sets. At this point in the instructional unit, students should be able to use their previous lesson packets to work through this assignment independently. At the end of the lesson, students should print a copy of their R script (including all commands used) and attach it to their lesson packets.
This activity can be used in almost any type of classroom setting. It may be helpful to pair students with learning disabilities or English language learners with higher-achieving students. Also, identify “helpers”—those students who quickly catch on to how the R environment works and can help you keep other students on track. One teacher trying to help thirty students write programming code can be a frustrating experience for everyone involved.