Understanding Data Mining: Extracting, Organizing, and Analyzing Large Sets of Data
Students often learn how to do linear regression activities using a graphing calculator. This lesson provides an extension—an opportunity to complete these same types of assignments using R statistical software. Students will use the software to create scatter plots and to develop linear regression models that can be used to make predictions.
At the end of this lesson, students should be able to use R to:
Students should also be able to:
This lesson requires one block period or two traditional periods (approximately ninety minutes in total). It is the first lesson in a set of three.
Prior to completing this lesson, students should be able to:
Prior to implementing this lesson, teachers should:
First, have students use a graphing calculator to create a scatter plot and to find a linear regression model for the following data set, which details the results of a survey of one of my Honors Geometry classes. I asked students how many hours of television they watched on the night before our big test on triangles:
|Hours Spent Watching TV|Grade on Test (out of 100)|
Discuss with students the meaning of the correlation coefficient, as well as correct interpretations of the slope and y-intercept of the linear regression equation. Share with students that the graphing calculator is just one of many tools that can be used to work on linear regression problems, and then hand out The Basics of R sheet. This is a good time to review the usefulness of R as well as some of the basic commands that will be used. Emphasize to students that R uses code, just like a program.
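For teachers who want to preview what the calculator activity looks like in R, here is a minimal sketch using only base R functions (`plot`, `lm`, `cor`, `coef`, `predict`). The numbers are hypothetical placeholders chosen for illustration, not the survey results from my class, and the variable names `tv_hours` and `test_grade` are my own assumptions rather than names taken from the lesson packet.

```r
# Hypothetical placeholder values for illustration only;
# these are NOT the actual survey results from the lesson.
tv_hours   <- c(0, 1, 2, 3, 4)
test_grade <- c(96, 88, 85, 76, 70)

# Scatter plot of grade against hours of TV watched
plot(tv_hours, test_grade,
     xlab = "Hours Spent Watching TV",
     ylab = "Grade on Test (out of 100)",
     main = "TV Time vs. Test Grade")

# Fit the linear regression model: test_grade = intercept + slope * tv_hours
model <- lm(test_grade ~ tv_hours)

# Draw the fitted line on top of the scatter plot
abline(model)

# Correlation coefficient, y-intercept, and slope
r         <- cor(tv_hours, test_grade)
intercept <- coef(model)[["(Intercept)"]]
slope     <- coef(model)[["tv_hours"]]

# Use the model to predict the grade for a student who watched 2.5 hours
predicted <- predict(model, newdata = data.frame(tv_hours = 2.5))

print(c(r = r, intercept = intercept, slope = slope))
print(predicted)
```

With this made-up data, the slope is read as the predicted change in test grade for each additional hour of television, and the y-intercept as the predicted grade for a student who watched none. Those are the same interpretations students give for the calculator output.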
Next, distribute the Using R for Linear Regression lesson packet. Complete section I together as a class, and make sure that you model the steps on your computer using a projector. After finishing section I, have students address the questions in section II (either independently or in small groups) as you circulate. After a few minutes, discuss the responses as a class, and then have students complete section III on their own. At this point, circulate to answer questions and/or address concerns. After all students have had an opportunity to complete the task, move on to section IV. Begin it together (to make sure students properly load the data set from Microsoft Excel), and then let students work on their own to complete numbers 5 through 8.
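Loading the data is the step where students most often get stuck, so it can help to rehearse it beforehand. The sketch below shows one common approach, assuming the Excel worksheet has first been saved as a CSV file; the packet may use a different method. The file name `class_data.csv` and the column names `x` and `y` are hypothetical, and the `writeLines` call only stands in for the exported file so that the example runs on its own.

```r
# Stand-in for a worksheet saved from Excel as CSV.
# The file name, column names, and values are hypothetical.
csv_path <- file.path(tempdir(), "class_data.csv")
writeLines(c("x,y",
             "1,2",
             "2,4",
             "3,5",
             "4,8"),
           csv_path)

# Read the CSV file into a data frame (the first row supplies column names)
class_data <- read.csv(csv_path)

# Check that the data loaded as expected before doing any analysis
str(class_data)
head(class_data)

# Once loaded, columns can be used directly in a regression model
model <- lm(y ~ x, data = class_data)
print(coef(model))
```

Having students run `str` and `head` right after loading gives them a quick way to confirm that the columns came through as numbers before they move on to plotting.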
This activity can be used in almost any type of classroom setting. It may be helpful to pair students who have learning disabilities, or who are English language learners, with higher-achieving students. Also, identify “helpers,” those students who quickly catch on to how the R environment works and can help you keep other students on track. One teacher trying to help thirty students write programming code can be a frustrating experience for everyone involved. You can also choose to have students create their own data for section IV, rather than using a pre-assigned set. In the past, I have had students work in groups of six to collect their forearm lengths and heights as their data set.