Understanding Data Mining: Extracting, Organizing, and Analyzing Large Sets of Data
Author: Dail Midgette
Level: High School
Content Area: Mathematics
In Algebra I, students study data sets with one predictor variable and one response variable. In the real world, however, most response variables depend on numerous predictor variables, many of which may have a significant impact on the data. These variables may also affect the situation at hand in different ways, so it is important to identify their effects and then use them appropriately to make sounder, more valid predictions. In this lesson, students will use the R statistical software to navigate through the basics of data mining, a process in which the effects of individual variables can be determined. Students will use three methods of variable selection (forward selection, backward selection, and stepwise selection) in an attempt to determine which variables are most influential in a given situation.
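To make the three selection methods concrete, the sketch below runs each of them with R's built-in step() function. It is only an illustration, not the code from the student packet: the mtcars data set, the choice of mpg as the response variable, and the object names are all assumptions made for this example. Note also that step() compares models using AIC, which may differ from the criterion used in the packet.

```r
# Illustrative sketch only: the data set (mtcars), the response (mpg), and all
# object names are assumptions, not taken from the lesson packet.
data(mtcars)

# Smallest candidate model: intercept only, no predictors.
null_model <- lm(mpg ~ 1, data = mtcars)

# Largest candidate model: every other column used as a predictor.
full_model <- lm(mpg ~ ., data = mtcars)

# The range of models that the search is allowed to visit.
search_scope <- list(lower = formula(null_model), upper = formula(full_model))

# Forward selection: start with no predictors and add one at a time
# for as long as adding a predictor lowers the AIC.
forward_fit <- step(null_model, scope = search_scope,
                    direction = "forward", trace = 0)

# Backward selection: start with all predictors and drop one at a time
# for as long as dropping a predictor lowers the AIC.
backward_fit <- step(full_model, direction = "backward", trace = 0)

# Stepwise selection: at each step, a predictor may be added or dropped.
stepwise_fit <- step(null_model, scope = search_scope,
                     direction = "both", trace = 0)

# Show which predictors each method kept.
print(formula(forward_fit))
print(formula(backward_fit))
print(formula(stepwise_fit))
```

The three methods do not have to agree on a final model. Comparing the printed formulas is a natural starting point for a class discussion about which variables seem most influential and why.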
At the end of this lesson, students should be able to use R to:
Students should also be able to:
This lesson requires one block period or two traditional periods (approximately ninety minutes in total). It is the second lesson in a set of three.
R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.
Prior to completing this lesson, students should be able to:
Prior to implementing this lesson, teachers should:
Distribute the Using R for Data Mining packet to students. Lead students through the information in section I, and then allow students to complete section II on their own for approximately ten minutes. Have students share some of their responses with the class, and then begin guiding students through section III. Encourage students to take their time when typing the code, and also emphasize to them the complexity of the work that the software is doing. After completing section III as a group, allow students approximately ten minutes to complete section IV. At the end of class, have students share some of the uses of data mining they found, and compile a list for future reference.
This activity can be used in almost any type of classroom setting. It may be helpful to pair students with learning disabilities or English language learners with higher-achieving students. Also, identify “helpers”—those students who quickly catch on to how the R environment works and can help you keep other students on track. One teacher trying to help thirty students write programming code can be a frustrating experience for everyone involved.