Understanding Data Mining
Author: Celia Rowland
Level: High School
Content Area: Mathematics
Corporations use data mining to track consumer purchases of products ranging from gasoline to oranges to Coca-Cola™. Hospitals build predictive models of infection rates, demand for x-rays, and anticipated births. Today most Fortune 500 corporations use data mining to build credible models that anticipate the demand for goods and services. In these explorations of data analysis and model fitting, students go beyond the textbook lessons on linear regression in AP Statistics. Working in R statistical software gives them a real taste of how data analysis is conducted in 21st-century business environments around the world. In these lessons you, the teacher, act as a consultant rather than a director.
Students should be able to answer these essential questions after completion of these lessons:
These lesson plans are aligned to the North Carolina Standard Course of Study, the AP Statistics course requirements and the National Council of Teachers of Mathematics Standards.
NCTM Standard: Algebra – Use mathematical models to represent and understand quantitative relationships. (1)
NCSCOS (2005) Competency Goal 4 – The learner will analyze bivariate data to solve problems. (Objectives 4.01a, 4.01b, 4.01c) (2)
AP Statistics Topic 1D – Exploring bivariate data. (3)
Lesson 1 requires one block period or two 45-minute classes on sequential days. This includes time to download and install R statistical software. Ideally, this lesson should occur immediately after linear regression has been introduced in the classroom. Cover residual analysis before this lab as well: students understand residual plots better when the concepts have already been introduced in class.
Lesson 2 requires one block period or two 45-minute classes on sequential days. Lesson 2 can occur at any time following Lesson 1. For continuity’s sake, this lesson should be completed prior to moving to another unit in the course.
Each lesson has separate student handouts that list the commands and outline the lesson.
Lesson 1 has two handouts: (1) Installing and Working with R Statistical Software, (2) Least Squares Linear Regression in R.
Lesson 2 has one handout: Multivariate Least Squares Regression Model with Variable Selection.
Data files are provided for each lesson as Excel spreadsheets with directions on how to convert the file to the proper format for use with R.
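The Lesson 2 handout listed above covers multivariate least squares regression with variable selection. As a rough illustration of what that can look like in R, the sketch below fits a multiple regression and then prunes it. It rests on assumptions that are not from the handout: the built-in `mtcars` data set stands in for the lesson data, the four predictors are chosen arbitrarily, and backward elimination by AIC with `step()` is only one of several possible selection methods.

```r
# Illustrative sketch only: mtcars and these predictors are stand-ins,
# not the data or variables used in the Lesson 2 handout.
data(mtcars)

# Full model: miles per gallon explained by four candidate predictors.
full_fit <- lm(mpg ~ wt + hp + disp + qsec, data = mtcars)

# Backward elimination by AIC (an assumed selection method).
# trace = 0 suppresses the step-by-step printout.
selected_fit <- step(full_fit, direction = "backward", trace = 0)

# Inspect which predictors survived and how well the reduced model fits.
print(formula(selected_fit))
print(summary(selected_fit)$adj.r.squared)
```

Comparing `summary(full_fit)` with `summary(selected_fit)` lets students see how dropping weak predictors changes the coefficients and the adjusted R-squared.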
You will need the R statistical software and a computer running Windows. (A version of R is also available if your school has Mac computers.) Download the software at http://www.r-project.org. You will also need a central location where students can access the data files. Some examples are your own web page, the course site on Blackboard, or a shared folder on the school’s server. The files do not work well if placed in Google Documents, because the statistical software cannot read them there. Students may want a flash drive to save their files if they do not have space on the school’s server.
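To check that R can read a data file from the location you choose, a short import test such as the following may help. This is a minimal sketch under stated assumptions: it supposes the spreadsheet has been saved as a comma-separated (CSV) file, which may differ from the format the lesson directions specify, and the file name and the `hours` and `score` columns are hypothetical. The sketch writes its own small file so that it runs anywhere.

```r
# Hypothetical file and column names, used only for illustration.
# A tiny CSV is written to a temporary folder to stand in for a lesson file.
csv_path <- file.path(tempdir(), "lesson_data.csv")
writeLines(c("hours,score", "1,52", "2,58", "3,65", "4,71", "5,80"), csv_path)

# read.csv() treats the first row as column names when header = TRUE.
lesson_data <- read.csv(csv_path, header = TRUE)

# Confirm the import: column types, then the first few rows.
str(lesson_data)
print(head(lesson_data))
```

In class, `csv_path` would instead point to the shared folder or flash drive where the converted file is stored.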
Basic lessons on correlation, the coefficient of determination, and the least-squares linear regression (LSLR) model from the AP Statistics curriculum should be covered in the classroom prior to the lab. If you wish, you may also show students how to run simple linear regression on a TI-8* series graphing calculator before any lab time. Students should already be familiar with the vocabulary related to LSLR: explanatory variable, response variable, error, slope, intercept, and residual.
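The vocabulary above maps directly onto a few R commands. The sketch below is illustrative only and is not taken from the lesson handouts: it uses R's built-in `cars` data set (stopping distance against speed) as an assumed stand-in for the lesson data.

```r
# Illustrative sketch: the built-in cars data set stands in for lesson data.
data(cars)

# Explanatory variable: speed. Response variable: dist (stopping distance).
fit <- lm(dist ~ speed, data = cars)

# Intercept and slope of the least-squares line.
print(coef(fit))

# Correlation r and coefficient of determination R-squared.
r <- cor(cars$speed, cars$dist)
r_squared <- summary(fit)$r.squared
print(c(r = r, r_squared = r_squared))

# Residuals: observed response minus the value predicted by the line.
residuals_fit <- resid(fit)
print(head(residuals_fit))
```

For simple linear regression `r_squared` equals `r^2`, which students can verify themselves. Running `plot(cars$speed, resid(fit))` followed by `abline(h = 0)` draws the residual plot.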
A multiple-question quiz is provided for each of the two lessons. These questions can be incorporated into a unit test on all of the linear regression material covered in AP Statistics. There are also two short assignments that students may complete as a follow-up to their lab activities. Rubrics are provided for the assignments.
(1) Principles and Standards for School Mathematics, NCTM, p. 37.
(2) http://www.ncpublicschools.org/curriculum/mathematics/scos/2003/9-12/72a...
(3) AP Statistics Teacher’s Guide, p. 3.