STATISTICS 174: COURSE DESCRIPTION
Fall 2004


Instructor

Richard L. Smith
Smith 201
Office Phone: 962-2660
Fax: 962-0391
Home Phone: 408-8126
Email: rls@email.unc.edu
Web Page: /faculty/rsmith.html

Once on the web page you can click "Course Web Page" for updated information about the course. The "Data Page" is used to store all data sets which are used in the text or set as class exercises.

Please check the course web page often, as important information will be placed on it.

Class time and place

Tuesdays and Thursdays 12:30 p.m. - 1:45 p.m.
Hanes 308

Office Hours

Feel free to come by my office any time, or approach me after class, or email or call for an appointment. However, there will be an official office hour as well, when anyone can drop in. Day and time to be announced, after consultation with the class.

Grader

Jungyeon Yoon
Office: New West 108
Phone: 962-5707
Email: june0821@email.unc.edu

Course Text

The course text will be the draft version of Linear Regression by R.L. Smith and K.D.S. Young. There will be a charge of $25 to cover photocopying and staff time. Donna Terrell, Smith 210 (962-8401, daterrel@email.unc.edu), will be responsible for collecting the money and distributing the text. The schedule for distributing this material will be announced in class.

Chapter Headings

Chapter 1: Air pollution and public health: A case study for regression analysis.
This introductory chapter discusses a major public policy issue where the use (or, depending on your point of view, misuse) of regression analysis has featured heavily. It illustrates some of the techniques which we will be discussing in detail later in the course, and also describes some of the pitfalls associated with the use of regression to solve substantive scientific problems.

Chapter 2: Simple linear regression.
For most of you, much of this material will be revision, covering the simple case of one y variable and one x variable. However, we also discuss some more subtle features, such as simultaneous confidence intervals, inverse regression or calibration, and tests for autocorrelation.

Chapter 3: Multiple regression.
Matrix formulation and solutions. Confidence and prediction intervals, and hypothesis tests. Simultaneous estimation. Power of the F test. Examples. The chapter concludes with an outline of the geometric approach to least squares theory, with the aid of which we are able to provide slick proofs of all the major mathematical results.
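For orientation, the matrix formulation can be summarized as follows. This is only a sketch in common textbook notation; the symbols are assumptions and may differ from those used in the course text.

```latex
% Linear model with n observations and p regression coefficients
y = X\beta + \varepsilon, \qquad
\mathrm{E}(\varepsilon) = 0, \quad \mathrm{Var}(\varepsilon) = \sigma^2 I_n .

% Least squares estimator (assuming X has full column rank)
\hat{\beta} = (X^{T}X)^{-1}X^{T}y .

% Fitted values: the orthogonal projection of y onto the column space of X
\hat{y} = X\hat{\beta} = Hy, \qquad H = X(X^{T}X)^{-1}X^{T} .
```

The last line is the starting point for the geometric approach: fitting by least squares amounts to projecting y onto the space spanned by the columns of X.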

Chapter 4: Diagnostics for influential observations.
This chapter is concerned with the effect of outliers among either the x or y values. The hat matrix. Diagnostics for influence: DFFITS, DFBETAS, Cook's statistic, COVRATIO. Graphical methods. Examples.
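As a rough illustration of these diagnostics, the sketch below computes the leverages (the diagonal of the hat matrix) and Cook's statistic for a simple linear regression in plain Python. The function name and the data are hypothetical and chosen only for illustration; in the course itself such quantities would be obtained from SAS or S-PLUS.

```python
def leverage_and_cooks(x, y):
    """Leverages and Cook's statistic for the model y = a + b*x + error.

    Assumes at least three observations that do not lie exactly on a line.
    """
    n = len(x)
    p = 2  # number of regression coefficients: intercept and slope
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

    # Least squares estimates
    slope = sxy / sxx
    intercept = ybar - slope * xbar

    # Residuals and the usual unbiased estimate of the error variance
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)

    # Diagonal entries of the hat matrix for simple linear regression
    h = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]

    # Cook's statistic combines residual size with leverage
    cooks = [e * e / (p * s2) * hi / (1.0 - hi) ** 2
             for e, hi in zip(resid, h)]
    return h, cooks


# Hypothetical data: the last x value lies far from the others
x = [1.0, 2.0, 3.0, 4.0, 10.0]
y = [1.1, 1.9, 3.2, 3.9, 6.0]
h, cooks = leverage_and_cooks(x, y)
```

The leverages always sum to the number of coefficients (here 2), and the observation with x = 10 has by far the largest leverage because it lies far from the mean of the x values.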

Chapter 5: Diagnostics for model selection.
Multicollinearity. Variable selection. Transformations. Applications.

Chapter 6: Miscellaneous topics in regression.
Weighted and generalized least squares. Response surface methodology. Introduction to nonlinear regression.

Chapter 7: Analysis of designed experiments.
One-way and two-way analysis of variance, Latin squares, factorial designs.

Chapter 8: An introduction to generalized linear models.
Extensions of linear regression: logistic regression, Poisson regression.

Computing

The course includes an extensive practical computing component. The main software packages I intend to use are SAS and S-PLUS. Depending on how the time goes, I may also introduce a little MATLAB. There is also R, a free implementation of the S language underlying S-PLUS, which can be used as an alternative to S-PLUS for most applications.

It is your responsibility to get hold of these packages and to familiarize yourself with their basic features, but I will give you every help that I can to get started.

To use these packages, you have two main options:

(1) Use the University's statistical applications computer "statapps", to which, as graduate students, you should all have direct access via your ONYEN. This is a Unix-based system which includes SAS, S-PLUS and MATLAB.

(2) Use a free-standing PC or laptop with SAS and S-PLUS installed. For most students, this is the more convenient option. Statistics Department students have access to Statistics Department machines, which have these packages installed. OR Department students have access to machines in their own department, which should also have the packages installed, but please let me know if there are any problems with that. As an alternative, you can install SAS and S-PLUS yourself, either on a departmental machine or your own personal machine, using CDs that you can obtain from ATN. To find out the procedures for this, send an email to the Software Acquisition Office, software@unc.edu. The corresponding web page is

http://www.unc.edu/atn/software/

If you want to install R, you should go to

http://cran.r-project.org

If your machine is running Windows, click on the button that says "Windows (95 and later)", go into the "base" directory, and then click on "SetupR.exe" to install the basic package.

Although I will be providing handouts or weblinks for all the features of SAS and S-PLUS which will actually be used for the exercises, you may want to gain some familiarity with these packages for yourself. An excellent introduction to SAS is The Little SAS Book by Delwiche and Slaughter, available through the Campus Store. This is written for beginners, but it will take you as far as PROC REG and PROC ANOVA (Chapter 7), which is plenty to give you the flavor of how the package works. In addition, there are various web-based guides: I have included a link to Bob Derr's guide on my home web page, and you can also access SAS documentation through the ATN pages on the web (www.unc.edu and follow links to "Computing", "ATN", "Technical Support", "Statistical Computing"). Further links from there to "StatApps", "Introduction to the Statistical Server" and "Running Statistical Applications" will tell you exactly how to access SAS on the Statapps machine.

On the course home webpage you will also find an introduction to S-PLUS written by Kouros Owzar, as well as links to the official S-PLUS documentation. For R, there is a very nice book entitled Introductory Statistics with R by Peter Dalgaard (Springer-Verlag, published 2002), which could actually serve as a pretty good introduction to S-PLUS as well.

If we get as far as using MATLAB, I will provide separate instructions and documentation about this.

Assignments and Exams

Homework assignments consisting of both theoretical and computational exercises will be set at approximately two-week intervals. There will be a midterm and a final exam. Provisional distribution of marks: 30% for homework assignments, 30% for the midterm, 40% for the final exam.
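To see how the provisional weights combine, here is a minimal sketch. It assumes each component is scored on a 0-100 scale, and the student scores are hypothetical; the actual computation of final marks may differ.

```python
# Provisional weights from the course description
WEIGHTS = {"homework": 0.30, "midterm": 0.30, "final": 0.40}


def course_mark(scores):
    """Weighted overall mark, assuming each score is on a 0-100 scale."""
    return sum(WEIGHTS[part] * scores[part] for part in WEIGHTS)


# Hypothetical student: 90 on homework, 80 on the midterm, 70 on the final
example = course_mark({"homework": 90, "midterm": 80, "final": 70})
# 0.30*90 + 0.30*80 + 0.40*70 = 27 + 24 + 28 = 79 (up to rounding error)
```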

Further reading

Other references that may be helpful include the following:

Atkinson, A.C. (1985), Plots, transformations, and regression. Oxford: Oxford University Press. QA278.2 .A85 1985
Cook, R.D. and Weisberg, S. (1982), Residuals and influence in regression. New York: Chapman and Hall. QA278.2 .C665 1982
Cook, R.D. and Weisberg, S. (1999), Applied regression including computing and graphics. New York: Wiley. QA278.2 .C6617 1999
Dean, A. and Voss, D. (1999), Design and analysis of experiments. New York: Springer. QA279 .D43 1999
Draper, N.R. and Smith, H. (1998), Applied Regression Analysis (Third Edition). New York: Wiley. QA278.2 .D7 1998
McCullagh, P. and Nelder, J.A. (1989), Generalized linear models. London: Chapman and Hall. QA276 .M38 1989
Neter, J., Kutner, M.H., Nachtsheim, C.J. and Wasserman, W. (1996), Applied Linear Statistical Models (Fourth Edition). Chicago: Irwin. QA278.2 .A66 1996
Rawlings, J.O., Pantula, S. and Dickey, D.A. (1998), Applied regression analysis: a research tool. New York: Springer. QA278.2 .R38 1998
Scheffé, H. (1959), The analysis of variance. New York: Wiley. QA276 .S34
Weisberg, S. (1985), Applied linear regression. New York: Wiley. QA278.2 .W44 1985
