The following data series accompany the draft manuscript LINEAR REGRESSION, by R.L. Smith and K.D.S. Young.
lon52.d Pollution-mortality data from London, 1952.
lon57.d Pollution-mortality data from London, 1957; Table 1.1.
lon5872.d Annual pollution and mortality averages from London, 1958-72; Table 1.2.
lon59.s Pollution-mortality data from London, 1959; Table 1.4. (S-PLUS format; for SAS applications, remove header in line 1).
phily.s Pollution-mortality data from Philadelphia, 1974-1988.
f1-3.i S-PLUS source code for Figure 1.3.
f1-8.i S-PLUS source code for Figure 1.8.
f1-9.i S-PLUS source code for Figure 1.9.
amherst.dat Amherst data; Table 2.1.
amherst.s S-PLUS version of the above data file; Section 2.6.4.
Updated Amherst File: 1893-2021
Updated Mount Airy and Charleston File: 1937-2021
R code for simulated goodness of fit tests
amh2.sas SAS code for Amherst data set; Section 2.5.
amh2.lst Output of amh2.sas program.
amh2.i S-PLUS code for Amherst data set; Section 2.5.
amh3.i S-PLUS code for simulated goodness of fit tests; Section 2.6.
lfit.sas SAS code for the testing linearity procedure; Section 2.7.2.
lfit.d Data file for the above.
weight1.dat Weights (col. 1) and heights (col. 2) of 15 students; Table 2.11.
gesell.dat Age at first word (col. 2) and Gesell adaptive score (col. 3) for 21 children; Table 2.12.
olympic.dat Year (col. 1) and winning time in the 1500 meter race (col. 2) in the Olympic Games, 1896-2024; updated in 2024.
co.dat K/C atomic ratio (col. 2) and CO desorbed (col. 2) for 22 experiments of carbon monoxide desorption; Table 2.14.
forbes.dat Forbes' data: boiling point of water (col. 2) and pressure (col. 3); Table 2.15.
fiber1.dat Fiber lengths (col. 2) and strengths (col. 1); Table 2.16.
providen.dat Annual rainfall totals in Providence, Rhode Island, 1835-1997; Table 2.17.
weight2.dat Sex (0=male; 1=female; col. 2), age (col. 3), height (col. 4) and weight (col. 5) for 21 students; Table 2.18.
berkeley.dat Year (col. 1), mean temperature in Berkeley (col. 2) and mean temperature in Santa Barbara (col. 3); Table 2.19.
dmark.dat 1991 weekly prices in US dollars of the German mark (col. 2) and the British pound (col. 3); Table 2.20
marathon.dat Measurement of the Los Angeles Olympic marathon course: true distance of segment if known (col. 1; 0 indicates a missing value), counts of the bicycle wheel (col. 2); Table 2.21
tree.dat Minitab tree data; Table 3.3.
nukes.dat Nuclear power plant data; Table 3.4.
yields.dat Yields from six plots for each of three different amounts of fertilizer; Table 3.6.
weight3.dat Weight (col. 2), age (col. 3), height (col. 4) and sex (0=male, 1=female; col. 5) for 23 individuals; Table 3.7.
inflate.dat Inflation rate (col. 2) and unemployment rate (col. 3); Table 3.8.
fiber2.dat Fiber strength data of Table 3.10; variables X1-X4 are in columns 2-5.
protein.dat Data on protein content (col. 2) and six reflectance measurements (cols. 3-8); Table 3.11.
reflect.dat Prediction data set for previous example; Table 3.12.
prater.dat Prater's gasoline data set; Table 4.1.
prater1.dat Modified form of Prater's gasoline data suitable for question 4.12 (b,c).
iqscores.dat IQ scores; Table 5.12
ukrain.dat U.K. rainfall data; Table 5.17
nmmaps.s NMMAPS summary data table (Section 6.3) (S-PLUS format; omit first row for SAS analysis).
longley.d Longley's data (problem 5.6/5.7)
crime2.txt English crimes data (problem 5.8)
salinity.d Pamlico Sound salinity data (problem 5.9).
lint.dat Lindhurst data
rock2.dat Venables and Ripley rock permeability data (problem 5.10)
prog1 S-PLUS code to draw the residual plots for problem 4.12 (from my S-PLUS handout)
fig1.rsd Data file needed for the S-PLUS code prog1
nukes.s S-PLUS version of nuclear power data set, for import into a data frame (includes column headers)
sfns.i S-PLUS code for regression diagnostics (Section 4.6; download to your personal directory and type source("sfns.i") within S-PLUS to make the functions available to you).
dnsim.i Example S-PLUS code to produce Atkinson's plots for simulated diagnostics (Section 4.6)
noncentral S-PLUS function to evaluate Pearson and Hartley charts for the noncentral F distribution (Section 3.7)
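Several entries above (lon59.s, nmmaps.s) note that the S-PLUS-format files carry a header in line 1 that should be removed before SAS reads them. The following is a minimal sketch of that step in Python, which is not a language used by the manuscript and is chosen here only for illustration. The helper name strip_header, the column names, and the sample values are all hypothetical; the real files may be laid out differently.

```python
import io

def strip_header(src, dst, n_header=1):
    """Copy src to dst, skipping the first n_header lines.

    src and dst are open text file objects. This is a hypothetical
    helper, not part of the distributed programs.
    """
    for i, line in enumerate(src):
        if i >= n_header:
            dst.write(line)

# Made-up sample standing in for an S-PLUS-format file with a header row.
sample = io.StringIO("day mortality smoke\n1 250 0.3\n2 271 0.5\n")
out = io.StringIO()
strip_header(sample, out)
result = out.getvalue()
```

With a downloaded file, the same function could be called on open("lon59.s") and an output file opened for writing; the output file name is left to the reader.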
Data and Programs for Birmingham Particulate Matter Example:
reg00.out1
SAS handout distributed in class (used to illustrate computational method for Birmingham PM data)
pmda3
Full data series.
pmda3new.txt
Shortened data series (omit first 215 observations where there are not many PM10 values).
bir1.sas
Initial exploration of Birmingham data;
Examine number of knots for B-spline and effect of met variables;
Then select specific models using forward, stepwise, backward selection;
bir1.lst
Output of bir1.sas.
bir2.sas
Use met model with K=12 and variables sh0, sh2, shsq0, shsq2;
Try 12 different models for PM10;
Also use 5 daily values of PM10 and compute multicollinearity diagnostics;
bir2.lst
Output of bir2.sas.
bir3.sas
Like bir2, but use alternative met model;
bir3.lst
Output of bir3.sas.
bir4.sas
Add nonlinear PM effects to models in bir2 and bir3
bir4.lst
Output of bir4.sas.
bir5.sas
Use full met model plus all pm covariates;
Ridge and IPC regression;
bir5.lst
Output of bir5.sas.
bir6.sas
Use full met model plus all pm covariates;
Uses PROC PLS;
bir6.lst
Output of bir6.sas.
bir7.sas
Program for power calculations
bir7.lst
Output of bir7.sas.
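The shortened series pmda3new.txt is described above as the full series with the first 215 observations omitted. A rough sketch of that relationship in Python, assuming one observation per record and no header row (neither is confirmed by this page); the function name drop_leading and the stand-in records are hypothetical:

```python
from itertools import islice

def drop_leading(records, n_skip=215):
    """Return the records that remain after omitting the first n_skip."""
    return list(islice(records, n_skip, None))

# Made-up stand-in for the full series: 220 numbered records.
full_series = [f"obs{i}" for i in range(1, 221)]
short_series = drop_leading(full_series)
```

Applied to the lines of the full data file, this would leave only the later period in which PM10 values are more plentiful.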
Data and Programs for Florida Elections Example:
Link to Florida elections website
Link to U.S. Census Bureau website (includes Florida map)
fldat1.txt
Florida elections data set
fldat2.txt
Explanations of Florida elections data set
Data and Programs for RSREG and NLIN Examples
Michaelis-Menten example dataset
Michaelis-Menten example SAS code
Michaelis-Menten example output
Mile running times example SAS code
Mile running times example output
Chapter 8 Data Sets
SAS code for analysis of PEMA data
S-PLUS code for analysis of PEMA data
SAS code for analysis of alloy data
S-PLUS code for analysis of alloy data
SAS code for analysis of Fisher's barley data
S-PLUS code for analysis of Fisher's barley data
SAS code for first analysis of chlorophyll data
SAS code for second analysis of chlorophyll data
Return to Richard Smith's page