STOR 556, SPRING 2019
ADVANCED METHODS OF DATA ANALYSIS
Instructor: Richard L. Smith


This course is a continuation of STOR 455 STATISTICAL METHODS I. The prerequisites are that course and STOR 435 INTRODUCTION TO PROBABILITY (MATH 535).

First class: January 10. Classes meet Tuesdays and Thursdays, 9:30-10:45 am, in Hanes 120. Students who plan to take the course but are not yet registered should contact the instructor immediately (rls at email.unc.edu).

The primary aim of the course is to cover those parts of linear models and regression analysis that go beyond STOR 455, where these concepts are first introduced. The course will be strongly applied, with an emphasis on computing in the R statistical programming language.

Major topics to be covered:

(a) Generalized Linear Models applied to various specific kinds of data, such as binary data, count data, proportions, contingency tables and multinomial data. Logistic regression for binary data was already introduced in STOR 455; we shall start there and explore its many generalizations.
(b) Random Effects Models. Many analysis of variance (ANOVA) models are more conveniently and more logically represented as models in which the treatment and other effects being estimated are themselves random variables. This leads us into the whole world of random effects models and their extensions, such as mixed effects models.
(c) Nonparametric Regression and Additive Models. For many statistical problems, a model that assumes the response is a simple linear function of a covariate is too simple to represent reality: we want the response to be a nonlinear, and in some cases nonparametric, function of the covariate(s). Nonparametric estimators include kernel estimators, splines and wavelets; when these are combined across several covariates as predictors, we enter the world of Generalized Additive Models.
(d) Trees and Neural Networks. If time permits, we will cover these more advanced techniques for modeling data in very general settings.

For each of these techniques, packages are available within the R statistical language. R is a freely downloadable programming language, and most students are assumed to be familiar with it from prior use in STOR 455. The course will not aim to teach R as a language; however, there will be extensive examples showing how R is used for each of the statistical problems discussed. The main purpose of the course is to teach students how to apply each of the statistical techniques covered using R, and then to use R to answer common statistical questions, such as constructing confidence intervals, performing hypothesis tests and making predictions. Theoretical derivations will be kept to a minimum, but facility with college-level algebra and with probability theory at the level of STOR 435 will be assumed.

Assessment will be in the form of regular homework assignments (mostly exercises in R), one midterm exam and one final exam. I intend to explore the possibility of making both the midterm and the final take-home exams so that they can also focus on R. Until announced otherwise, however, please assume that the midterm exam will be held in class on Thursday, February 28, and that the final will be a written exam at 8:00 am on Friday, May 3, 2019, as announced in the University Registrar's schedule.

The required text is the following:
Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models, Second Edition
by Julian J. Faraway, Published 2016, Chapman and Hall/CRC Press.
