Kenan Fellows Program

Understanding Data Mining: Extracting, Organizing, and Analyzing Large Sets of Data

Lesson One: Learning to Use R Statistical Software for Linear Regression — An Alternative to the Graphing Calculator

Introduction

Students often learn how to do linear regression activities using a graphing calculator. This lesson provides an extension: an opportunity to complete these same types of assignments using R statistical software. Students will use the software to create scatterplots and to develop linear regression models that can be used to make predictions.

Learning Outcomes

At the end of this lesson, students should be able to use R to:

  • Create scatterplots with titles
  • Find linear regression models
  • Add the graph of the line of best fit to their scatterplots
  • Calculate correlation of variables

Students should also be able to:

  • Interpret the meaning of the slope and the y-intercept of linear models
  • Use linear models to make predictions

Classroom Time Required

This lesson requires one block period or two traditional periods (approximately ninety minutes in total). It is the first lesson in a set of three.

Materials Needed

  • The Basics of R handout
  • Using R for Linear Regression packet
  • Pencil or pen
  • Graphing calculator
  • Overhead graphing panel (optional)
  • One computer per student
  • One computer with projector for teacher
  • R statistical software (used with permission from R Development Core Team (2008). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org)
  • Microsoft Excel

Pre-Activities

Prior to completing this lesson, students should be able to:

  • Distinguish between independent and dependent variables
  • Create scatterplots and describe correlation
  • Use a graphing calculator to find linear regression models for data sets

Prior to implementing this lesson, teachers should:

  • Become familiar with basic R commands
  • Develop data sets that may be used for demonstration purposes, if necessary
  • Ensure that R (free software) is installed on student computers

Activities

First, have students use a graphing calculator to create a scatterplot and to find a linear regression model for the following data set, which details the results of a survey of one of my Honors Geometry classes. I asked students to identify how many hours of television they watched on the night before our big test on triangles:

Hours Spent Watching TV    Grade on Test (out of 100)
4                          71
2                          81
4                          62
1                          86
3                          77
1                          93
2                          84
3                          80
2                          85
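
Once the class moves from the calculator to R later in the lesson, the same survey data can be entered and plotted with a few commands. The following is a minimal sketch, not the exact steps from the lesson packet; the variable names (tv_hours, test_grade, model) and the plot title are assumptions chosen for illustration.

```r
# Survey data from the table above (one entry per student).
tv_hours   <- c(4, 2, 4, 1, 3, 1, 2, 3, 2)
test_grade <- c(71, 81, 62, 86, 77, 93, 84, 80, 85)

# Scatterplot with a title and axis labels.
plot(tv_hours, test_grade,
     main = "Test Grade vs. Hours of TV Watched",
     xlab = "Hours Spent Watching TV",
     ylab = "Grade on Test (out of 100)")

# Fit the linear regression model: grade as a function of TV hours.
model <- lm(test_grade ~ tv_hours)

# Add the line of best fit to the existing scatterplot.
abline(model)

# Display the y-intercept and slope of the fitted line.
print(coef(model))
```

Students can compare the printed intercept and slope with the values reported by the graphing calculator's linear regression; the two tools should agree up to rounding.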

Discuss with students the meaning of the correlation coefficient, as well as correct interpretations of the slope and y-intercept of the linear regression equation. Explain that the graphing calculator is just one of many tools for working on linear regression problems, and then hand out The Basics of R sheet. This is a good time to review why R is useful and to go over the basic commands that students will use. Emphasize that R is operated by typing code, much like writing a program.
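
To support this discussion, the correlation coefficient, the slope, the y-intercept, and a prediction can all be computed in R. This is a hedged sketch using the survey data from this lesson; the variable names and the 2.5-hour example input are assumptions for illustration, not part of the handout.

```r
# Survey data from this lesson.
tv_hours   <- c(4, 2, 4, 1, 3, 1, 2, 3, 2)
test_grade <- c(71, 81, 62, 86, 77, 93, 84, 80, 85)

# Correlation coefficient r between hours of TV and test grade.
r <- cor(tv_hours, test_grade)

# Linear model; pull out the y-intercept and slope by name.
model     <- lm(test_grade ~ tv_hours)
intercept <- unname(coef(model)["(Intercept)"])
slope     <- unname(coef(model)["tv_hours"])

# Predicted grade for a hypothetical student who watched 2.5 hours.
predicted <- unname(predict(model, newdata = data.frame(tv_hours = 2.5)))

cat("Correlation r:", round(r, 3), "\n")
cat("Slope (change in grade per extra hour of TV):", round(slope, 2), "\n")
cat("Y-intercept (predicted grade at 0 hours of TV):", round(intercept, 2), "\n")
cat("Predicted grade at 2.5 hours:", round(predicted, 1), "\n")
```

For this data set the slope is negative (about -7.39 points per additional hour of television) and the y-intercept is about 97.96, which gives students concrete numbers to interpret in context.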

Next, distribute the Using R for Linear Regression lesson packet. Complete section I together as a class, and make sure that you model the steps on your computer using a projector. After finishing section I, have students address the questions in section II (either independently or in small groups) as you circulate. After a few minutes, discuss the responses as a class, and then have students complete section III on their own. At this point, circulate to answer questions and/or address concerns. After all students have had an opportunity to complete the task, move on to section IV: begin it together (to make sure students properly load the data set from Microsoft Excel), and then let students work on their own to complete numbers 5 through 8.
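
Loading the Excel data is the step where students most often need help. Base R has no built-in reader for Excel workbooks, so one common approach (assumed here; the packet may use a different method) is to save the worksheet from Excel as a CSV file and read it with read.csv. In the sketch below, the column names are hypothetical, and a temporary file stands in for the CSV a student would export from Excel.

```r
# Stand-in for a worksheet saved from Excel as CSV. In class, csv_path would
# instead be the path to the student's exported file (hypothetical here).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("tv_hours,test_grade",
             "4,71", "2,81", "4,62", "1,86", "3,77",
             "1,93", "2,84", "3,80", "2,85"),
           csv_path)

# Read the CSV into a data frame; the first row supplies the column names.
survey <- read.csv(csv_path)

# Check that the data loaded as expected before doing any analysis.
str(survey)

# Fit the model and plot using the columns of the data frame.
model <- lm(test_grade ~ tv_hours, data = survey)
plot(survey$tv_hours, survey$test_grade,
     main = "Test Grade vs. Hours of TV Watched",
     xlab = "Hours Spent Watching TV",
     ylab = "Grade on Test (out of 100)")
abline(model)
```

Checking the structure with str before fitting the model helps students catch a misread header row or a column imported as text.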

Modifications

This activity can be used in almost any type of classroom setting. It may be helpful to pair students with learning disabilities or English language learners with higher-achieving students. Also, identify “helpers”: students who quickly catch on to how the R environment works and can help you keep other students on track. One teacher trying to help thirty students write programming code can be a frustrating experience for everyone involved. You can also choose to have students create their own data for section IV, rather than using a pre-assigned set. In the past, I have had students work in groups of six to collect their forearm lengths and heights as their data set.

File Links for Supporting Materials