Instructor
Richard L. Smith
Once on the web page you can click "Course Web Page" for updated information
about the course. The "Data Page" is used to store all data sets which are
used in the text or set as class exercises.
Please check the course web page often, as important information
will be placed on it.
Class time and place
Tuesdays and Thursdays 12:30 p.m. - 1:45 p.m.
Office Hours
Feel free to come by my office any time, or approach me after class,
or email or call for an
appointment. However, there will be an official office hour as
well, when anyone can drop in. Day and time to be announced,
after consultation with the class.
Grader
Jungyeon Yoon.
Course Text
The course text will be the draft version of Linear Regression by
R.L. Smith and K.D.S. Young.
There will be a charge of $25 to cover photocopying and staff time.
Donna Terrell, Smith 210 (962-8401, daterrel@email.unc.edu), will be
responsible for collecting the money and distributing the text.
The schedule for distributing this material will be announced in class.
Chapter Headings
Chapter 1: Air pollution and public health: A case study for regression analysis.
Chapter 2: Simple linear regression.
Chapter 3: Multiple regression.
Chapter 4: Diagnostics for influential observations.
Chapter 5: Diagnostics for model selection.
Chapter 6: Miscellaneous topics in regression.
Chapter 7: Analysis of Designed Experiments.
Chapter 8: An introduction to generalized linear models.
Computing
The course includes an extensive practical computing component. The
main software packages I intend to use are SAS and S-PLUS. Depending
on how the time goes, I may also introduce a little MATLAB. There is
also R, a free implementation of the S language on which S-PLUS is based,
which can be used as an alternative to S-PLUS for most applications.
It is your responsibility to get hold of these packages and to
familiarize yourself with their basic features, but I will give you
every help that I can to get started.
To use these packages, you have two main options:
(1) Use the University's statistical applications computer "statapps", to
which, as graduate students, you should all have direct access via your
ONYEN. This is a Unix-based system which includes SAS, S-PLUS and MATLAB.
(2) Use a free-standing PC or laptop with SAS and S-PLUS installed. For most
students, this is the more convenient option. Statistics Department
students have access to Statistics Department machines, which have
these packages installed. OR Department students have access to machines
in their own department, which should also have the packages installed,
but please let me know if there are any problems with that. As an
alternative, you can install SAS and S-PLUS yourself, either on a
departmental machine or your own personal machine, using CDs that you
can obtain from ATN. To find out the procedures for this, send an
email to the Software Acquisition Office, software@unc.edu.
The corresponding web page is
http://www.unc.edu/atn/software/
If you want to install R, you should go to
If your machine is running Windows, click on the button that says
"Windows (95 and later)", go into the "base" directory, and then
click on "SetupR.exe" to install the basic package.
Although I will be providing handouts or weblinks for all the features
of SAS and S-PLUS which will actually be used for the exercises, you
may want to gain some familiarity with these packages for yourself.
An excellent introduction to SAS is
The Little SAS Book by Delwiche and Slaughter, available through the
Campus Store. This is written for beginners,
but it will take you as far as PROC REG and PROC ANOVA (Chapter 7), which
is plenty to give you the flavor of how the package works.
In addition, there are various web-based guides - I have included a link to
Bob Derr's guide on my home web page, and you can also access SAS documentation
through the ATN pages on the web (www.unc.edu and follow links to "Computing",
"ATN", "Technical Support", "Statistical Computing"). Further links from there
to "StatApps", "Introduction to the Statistical Server" and "Running
Statistical Applications" will tell you exactly how to access SAS on the
Statapps machine.
On the course home webpage you will also find an introduction to
S-PLUS written by Kouros Owzar, as well as links to the official S-PLUS
documentation. For R, there is a very nice book entitled
Introductory Statistics with R by Peter Dalgaard (Springer
Verlag, published 2002) which could actually serve as a pretty good
introduction to S-PLUS as well.
If we get as far as using MATLAB, I will provide separate instructions
and documentation about this.
Assignments and Exams
Homework assignments, consisting of both theoretical and computational exercises,
will be set at approximately two-week intervals. There will be a midterm and a final exam.
Provisional distribution of marks: 30% for homework assignments, 30% for the
midterm, 40% for the final exam.
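As an illustration of the provisional weighting above, the overall mark is a weighted average of the three components. The sketch below is in Python purely for illustration (the course itself uses SAS and S-PLUS). The function name and the example scores are hypothetical, and it assumes each component is scored out of 100.

```python
# Provisional weights from the syllabus: homework 30%, midterm 30%, final 40%.
WEIGHTS = {"homework": 0.30, "midterm": 0.30, "final": 0.40}


def overall_mark(scores):
    """Weighted average of component scores, each assumed to be out of 100."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example = {"homework": 80.0, "midterm": 70.0, "final": 90.0}
print(round(overall_mark(example), 2))
```

Because the weights are provisional, they are kept in one place so that a change to the distribution needs only one edit.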
Further reading
Other references that may be helpful include the following:
Atkinson, A.C. (1985),
Plots, transformations, and regression.
Oxford : Oxford University Press.
QA278.2 .A85 1985
Instructor contact details:
Smith 201
Office Phone: 962-2660
Fax: 962-0391
Home Phone: 408-8126
Email: rls@email.unc.edu
Web Page: /faculty/rsmith.html
Class place: Hanes 308
Grader contact details:
Office: New West 108
Phone: 962-5707
Email: june0821@email.unc.edu
Chapter 1: This introductory chapter discusses a major public policy issue where the use
(or, depending on your point of view, misuse) of regression analysis has
featured heavily. It illustrates some of the techniques which we will be
discussing in detail later in the course, and also describes some of the
pitfalls associated with the use of regression to solve substantive
scientific problems.
Chapter 2: For most of you, much of this material will be revision, covering the simple
case of one y variable and one x variable. However, we also discuss some more
subtle features, such as simultaneous confidence intervals, inverse regression
or calibration, and tests for autocorrelation.
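The simple case of one y variable and one x variable can be sketched in a few lines. The example below is in Python purely for illustration (the course itself uses SAS and S-PLUS). The function names and the data are made up for this sketch and do not come from the course text.

```python
def fit_line(x, y):
    """Least squares intercept and slope for the model y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def residual_variance(x, y, intercept, slope):
    """Unbiased estimate of the error variance: RSS / (n - 2)."""
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return rss / (len(x) - 2)


# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = fit_line(x, y)
print(a, b, residual_variance(x, y, a, b))
```

The divisor n - 2 reflects the two fitted parameters (intercept and slope); it requires at least three observations.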
Chapter 3: Matrix formulation and solutions. Confidence and prediction intervals, and
hypothesis tests. Simultaneous estimation. Power of the F test. Examples.
The chapter concludes with an outline of the geometric approach to least
squares theory, with the aid of which we are able to provide slick proofs
of all the major mathematical results.
Chapter 4: This chapter is concerned with the effect of outliers among either the x or
y values. The hat matrix. Diagnostics for influence: DFFITS, DFBETAS, Cook's
statistic, COVRATIO. Graphical methods. Examples.
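Two of the diagnostics named above, the diagonal of the hat matrix (the leverages) and Cook's statistic, have closed forms in the one-predictor case. The following is a minimal sketch in Python, used here only for illustration (the course itself uses SAS and S-PLUS). The function name and the data are made up, and the sketch assumes the fit is not exact, since the residual variance appears in a denominator.

```python
def influence_measures(x, y):
    """Leverages (hat matrix diagonal) and Cook's statistic for y = a + b*x."""
    n = len(x)
    p = 2  # number of fitted parameters: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    intercept = y_bar - slope * x_bar
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in residuals) / (n - p)  # residual variance estimate
    # For simple regression the i-th diagonal element of the hat matrix is
    # h_i = 1/n + (x_i - x_bar)^2 / S_xx.
    leverages = [1.0 / n + (xi - x_bar) ** 2 / s_xx for xi in x]
    # Cook's statistic: D_i = e_i^2 / (p * s^2) * h_i / (1 - h_i)^2.
    cooks = [e * e / (p * s2) * h / (1.0 - h) ** 2
             for e, h in zip(residuals, leverages)]
    return leverages, cooks


# Made-up data; the last point has an outlying x value.
x = [1.0, 2.0, 3.0, 4.0, 10.0]
y = [1.1, 1.9, 3.2, 3.9, 6.0]
leverages, cooks = influence_measures(x, y)
for h, d in zip(leverages, cooks):
    print(round(h, 3), round(d, 3))
```

The leverages always sum to the number of fitted parameters (here 2), which gives a quick check on the calculation.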
Chapter 5: Multicollinearity. Variable selection. Transformations.
Applications.
Chapter 6: Weighted and generalized least squares.
Response surface methodology.
Introduction to nonlinear regression.
Chapter 7: One-way and two-way analysis of variance, Latin squares, factorial designs.
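For the simplest of these designs, one-way analysis of variance, the F statistic compares the variation between group means with the variation within groups. The sketch below is in Python purely for illustration (the course itself uses SAS and S-PLUS). The function name and the data are made up for this example.

```python
def one_way_anova_f(groups):
    """F statistic for a one-way analysis of variance.

    `groups` is a list of lists, one list of observations per treatment group.
    """
    k = len(groups)                          # number of groups
    n = sum(len(g) for g in groups)          # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares, on k - 1 degrees of freedom.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Within-group (residual) sum of squares, on n - k degrees of freedom.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Made-up data: three groups of three observations each.
data = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
print(one_way_anova_f(data))
```

Under the null hypothesis of equal group means, this statistic is referred to an F distribution with k - 1 and n - k degrees of freedom.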
Chapter 8: Extensions of linear regression: logistic regression, Poisson regression.
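As a rough illustration of how logistic regression extends the linear model, the sketch below fits a one-predictor logistic regression by Newton-Raphson iteration on the log likelihood. It is written in Python purely for illustration (the course itself uses SAS and S-PLUS). The function name, the fixed iteration count and the data are assumptions made for this example, and the sketch assumes the data are not perfectly separated, in which case the maximum likelihood estimate would not exist.

```python
import math


def fit_logistic(x, y, iterations=25):
    """Fit P(y = 1) = 1 / (1 + exp(-(a + b*x))) by Newton-Raphson."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Score vector (gradient of the log likelihood).
        u_a = sum(yi - pi for yi, pi in zip(y, p))
        u_b = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        # Fisher information matrix (2 x 2, symmetric).
        i_aa = sum(w)
        i_ab = sum(wi * xi for wi, xi in zip(w, x))
        i_bb = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i_aa * i_bb - i_ab * i_ab
        # Newton step: solve the 2 x 2 system explicitly.
        a += (i_bb * u_a - i_ab * u_b) / det
        b += (i_aa * u_b - i_ab * u_a) / det
    return a, b


# Made-up binary responses, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [0, 0, 1, 0, 1, 1]
a, b = fit_logistic(x, y)
print(a, b)
```

At the fitted values the score equations are satisfied, so the fitted probabilities sum to the observed number of successes.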
Cook, R.D. and Weisberg, S. (1982),
Residuals and influence in regression.
New York : Chapman and Hall.
QA278.2 .C665 1982
Cook, R.D. and Weisberg, S. (1999),
Applied regression including computing and graphics.
New York : Wiley.
QA278.2 .C6617 1999
Dean, A. and Voss, D. (1999),
Design and analysis of experiments.
New York : Springer.
QA279 .D43 1999
Draper, N.R. and Smith, H. (1998),
Applied Regression Analysis (Third Edition).
New York: Wiley.
QA278.2 .D7 1998
McCullagh, P. and Nelder, J.A. (1989),
Generalized linear models.
London : Chapman and Hall.
QA276 .M38 1989
Neter, J., Kutner, M.H., Nachtsheim, C.J. and Wasserman, W. (1996),
Applied Linear Statistical Models (Fourth Edition).
Chicago : Irwin.
QA278.2 .A66 1996
Rawlings, J.O., Pantula, S. and Dickey, D.A. (1998),
Applied regression analysis : a research tool.
New York : Springer.
QA278.2 .R38 1998
Scheffé, H. (1959),
The analysis of variance.
New York : Wiley.
QA276 .S34
Weisberg, S. (1985),
Applied linear regression.
New York : Wiley.
QA278.2 .W44 1985