DR. JOHN RASP'S STATISTICS WEBSITE

STAT 440 - Forecasting
Class Activity - Correlation Transformation
(counts as Lecture Review #13)

This in-class activity uses the Houses data set. These data were obtained by the class in a previous semester, from housing listings for DeLand found on the Realtor.com website. The goal is to predict the house’s asking price, given three variables: the number of bedrooms, number of bathrooms, and house size (in square feet). Use these data to answer the following questions:

1) Begin by fitting a multiple regression model that predicts house price given the three predictor variables (bedrooms, bathrooms, house size).
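As a sketch of what this fit looks like in code, here is a minimal numpy version. The numbers below are made up for illustration; they stand in for the actual Houses data from the DeLand listings, which are not reproduced here.

```python
import numpy as np

# Hypothetical stand-in for the Houses data (price in $1000s);
# substitute the actual class data set.
price = np.array([150., 200., 250., 180., 320., 210.])
beds  = np.array([2., 3., 4., 3., 5., 3.])
baths = np.array([1., 2., 2., 2., 3., 2.])
sqft  = np.array([900., 1400., 1800., 1200., 2600., 1500.])

# Design matrix: a column of ones for the intercept, then the three predictors.
X = np.column_stack([np.ones_like(price), beds, baths, sqft])

# Least-squares fit: returns [b0, b_beds, b_baths, b_sqft]
coef, *_ = np.linalg.lstsq(X, price, rcond=None)
print("b0, b_beds, b_baths, b_sqft =", coef)
```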

2) Interpret the slope coefficients. Do they make sense? Can we easily and meaningfully say that one predictor variable is more (or less) important than the others?

3) Find the correlations between each of the three predictors and the dependent variable. What do these tell you about the strength of these relationships?
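These pairwise correlations are quick to compute; a sketch with the same made-up stand-in numbers as above:

```python
import numpy as np

# Hypothetical stand-in for the Houses data.
price = np.array([150., 200., 250., 180., 320., 210.])
beds  = np.array([2., 3., 4., 3., 5., 3.])
baths = np.array([1., 2., 2., 2., 3., 2.])
sqft  = np.array([900., 1400., 1800., 1200., 2600., 1500.])

# Correlation of each predictor with the dependent variable.
for name, x in [("beds", beds), ("baths", baths), ("sqft", sqft)]:
    r = np.corrcoef(x, price)[0, 1]
    print(f"corr({name}, price) = {r:.3f}")
```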

4) One problem with comparing the three slopes directly is that we are in some sense comparing "apples to oranges" — since the units on the variables are not the same. One way of dealing with this issue is called the correlation transformation:

transformed data = (1/√(n-1)) * (data-mean)/stdev

Compute the correlation-transformed data for each of the four variables (Y and X1 through X3). NOTE that we can interpret the transformed data values as representing "number of standard deviations away from the mean."
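A sketch of the transformation as a function (using the sample, n−1, standard deviation). A useful check: each transformed variable has mean 0 and sum of squares exactly 1.

```python
import numpy as np

def corr_transform(v):
    """Correlation transformation: (1/sqrt(n-1)) * (v - mean) / stdev."""
    n = len(v)
    return (v - v.mean()) / (np.sqrt(n - 1) * v.std(ddof=1))

# Hypothetical stand-in values for one variable.
price = np.array([150., 200., 250., 180., 320., 210.])
pt = corr_transform(price)
print("mean:", pt.mean(), " sum of squares:", (pt ** 2).sum())
```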

5) Begin your analysis of the transformed data by fitting three separate regression models, one for each of the three predictor variables. Do those slopes look familiar? (Hint: see your answer to Question 3.)
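The pattern the hint is pointing at can be checked numerically: the simple-regression slope on the transformed data reproduces the correlation coefficient. A sketch, again with made-up stand-in numbers:

```python
import numpy as np

def corr_transform(v):
    n = len(v)
    return (v - v.mean()) / (np.sqrt(n - 1) * v.std(ddof=1))

# Hypothetical stand-in for one predictor and the response.
price = np.array([150., 200., 250., 180., 320., 210.])
sqft  = np.array([900., 1400., 1800., 1200., 2600., 1500.])

yt, xt = corr_transform(price), corr_transform(sqft)
slope, intercept = np.polyfit(xt, yt, 1)   # simple regression of yt on xt
r = np.corrcoef(sqft, price)[0, 1]
print("slope on transformed data:", slope, " correlation:", r)
```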

6) Now do the full regression again (don’t "re-run" it … running is bad for regressions), only with the transformed data. What are the slopes? How do we interpret them?

7) What is the intercept of your model on the transformed data? Why?
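Questions 6 and 7 can be checked together in a short sketch: fit the full model to the transformed data and inspect the intercept. Because every transformed column (including Y) has mean zero, the fitted intercept comes out as zero up to rounding.

```python
import numpy as np

def corr_transform(v):
    n = len(v)
    return (v - v.mean()) / (np.sqrt(n - 1) * v.std(ddof=1))

# Hypothetical stand-in for the Houses data.
price = np.array([150., 200., 250., 180., 320., 210.])
beds  = np.array([2., 3., 4., 3., 5., 3.])
baths = np.array([1., 2., 2., 2., 3., 2.])
sqft  = np.array([900., 1400., 1800., 1200., 2600., 1500.])

yt = corr_transform(price)
Xt = np.column_stack([np.ones_like(yt)] +
                     [corr_transform(v) for v in (beds, baths, sqft)])
coef, *_ = np.linalg.lstsq(Xt, yt, rcond=None)
print("intercept:", coef[0])            # essentially zero
print("standardized slopes:", coef[1:])
```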

8) Now use matrix computations for regression (since it’s been a while since we’ve done them, and we need to review them so we can use them in upcoming class material). Recall that the matrix formula for a regression model is

β = (XᵀX)⁻¹ (XᵀY)
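The normal-equations formula translates directly to code. A sketch with the same made-up stand-in data (solving the linear system rather than forming the inverse explicitly, which is numerically preferable but algebraically identical):

```python
import numpy as np

# Hypothetical stand-in for the Houses data.
price = np.array([150., 200., 250., 180., 320., 210.])
beds  = np.array([2., 3., 4., 3., 5., 3.])
baths = np.array([1., 2., 2., 2., 3., 2.])
sqft  = np.array([900., 1400., 1800., 1200., 2600., 1500.])

X = np.column_stack([np.ones_like(price), beds, baths, sqft])

# beta = (X'X)^{-1} (X'Y), via a linear solve of (X'X) beta = X'Y
beta = np.linalg.solve(X.T @ X, X.T @ price)
print("beta =", beta)
```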

9) You can back-transform coefficients obtained from the correlation transformation by

β = (stdev(Y) / stdev(X)) · β*

where β represents the slope in the un-transformed world, and β* is the slope in the transformed variables. Do this back-transformation for all slopes. Do the resulting numbers look familiar?
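The whole round trip can be sketched in a few lines: fit the transformed model, back-transform each slope by stdev(Y)/stdev(Xj), and compare against the slopes from the raw-data regression. Again, the numbers are made-up stand-ins for the Houses data.

```python
import numpy as np

def corr_transform(v):
    n = len(v)
    return (v - v.mean()) / (np.sqrt(n - 1) * v.std(ddof=1))

# Hypothetical stand-in for the Houses data.
price = np.array([150., 200., 250., 180., 320., 210.])
beds  = np.array([2., 3., 4., 3., 5., 3.])
baths = np.array([1., 2., 2., 2., 3., 2.])
sqft  = np.array([900., 1400., 1800., 1200., 2600., 1500.])
preds = [beds, baths, sqft]

# Slopes from the transformed regression (beta-star); no intercept
# column is needed since every transformed variable is centered.
yt = corr_transform(price)
Xt = np.column_stack([corr_transform(v) for v in preds])
bstar = np.linalg.lstsq(Xt, yt, rcond=None)[0]

# Back-transform: b_j = (stdev(Y) / stdev(X_j)) * bstar_j
back = np.array([price.std(ddof=1) / v.std(ddof=1) for v in preds]) * bstar

# Slopes from the raw regression, for comparison.
X = np.column_stack([np.ones_like(price)] + preds)
raw = np.linalg.lstsq(X, price, rcond=None)[0][1:]
print("back-transformed:", back)
print("raw slopes:      ", raw)
```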


This website maintained by John Rasp. Contact me via email: jrasp@stetson.edu