STOR 664 Data Page

DATA SERIES FOR LINEAR REGRESSION

The following data series accompany the draft manuscript LINEAR REGRESSION, by R.L. Smith and K.D.S. Young. Last updated February 16, 2026.

lon52.d Pollution-mortality data from London, 1952.

lon57.d Pollution-mortality data from London, 1957; Table 1.1.

lon5872.d Annual pollution and mortality averages from London, 1958-72; Table 1.2.

lon59.s Pollution-mortality data from London, 1959; Table 1.4. (S-PLUS format; for SAS applications, remove header in line 1).

phily.s Pollution-mortality data from Philadelphia, 1974-1988

f1-3.i S-PLUS source code for Figure 1.3.

f1-8.i S-PLUS source code for Figure 1.8.

f1-9.i S-PLUS source code for Figure 1.9.

amherst.dat Amherst data; Table 2.1.

amherst.s S-PLUS version of the above data file; Section 2.6.4.

Updated Amherst File: 1893-2021

charles.dat

Updated Mount Airy and Charleston File: 1937-2021

R code for simulated goodness of fit tests

amh2.sas SAS code for Amherst data set; Section 2.5.

amh2.lst Output of amh2.sas program.

amh2.i S-PLUS code for Amherst data set; Section 2.5.

amh3.i S-PLUS code for simulated goodness of fit tests; Section 2.6.

lfit.sas SAS code for the testing linearity procedure; Section 2.7.2.

lfit.d Data file for the above.

weight1.dat Weights (col. 1) and heights (col. 2) of 15 students; Table 2.11.

gesell.dat Age at first word (col. 2) and Gesell adaptive score (col. 3) for 21 children; Table 2.12.

olympic.dat Year (col. 1) and winning time in the 1500 meter race (col. 2) in the Olympic Games, 1896-2024; updated in 2024.

co.dat K/C atomic ratio (col. 2) and CO desorbed (col. 2) for 22 experiments of carbon monoxide desorption; Table 2.14.

forbes.dat Forbes' data: boiling point of water (col. 2) and pressure (col. 3); Table 2.15. csv format

fiber1.dat Fiber lengths (col. 2) and strengths (col. 1); Table 2.16.

providen.dat Annual rainfall totals in Providence, Rhode Island, 1835-1997; Table 2.17.

weight2.dat Sex (0=male; 1=female; col. 2), age (col. 3), height (col. 4) and weight (col. 5) for 21 students; Table 2.18.

berkeley.dat Year (col. 1), mean temperature in Berkeley (col. 2) and mean temperature in Santa Barbara (col. 3); Table 2.19.

dmark.dat 1991 weekly prices in US dollars of the German mark (col. 2) and the British pound (col. 3); Table 2.20

marathon.dat Measurement of the Los Angeles Olympic marathon course: true distance of segment if known (col. 1; 0 indicates a missing value), counts of the bicycle wheel (col. 2); Table 2.21. csv format

tree.dat Minitab tree data; Table 3.3.

nukes.dat Nuclear power plant data; Table 3.4.

yields.dat Yields from six plots for each of three different amounts of fertilizer; Table 3.6.

weight3.dat Weight (col. 2), age (col. 3), height (col. 4) and sex (0=male, 1=female; col. 5) for 23 individuals; Table 3.7.

inflate.dat Inflation rate (col. 2) and unemployment rate (col. 3); Table 3.8.

fiber2.dat Fiber strength data of Table 3.10; variables X1-X4 are in columns 2-5.

protein.dat Data on protein content (col. 2) and six reflectance measurements (cols. 3-8); Table 3.11.

reflect.dat Prediction data set for previous example; Table 3.12.

prater.dat Prater's gasoline data set; Table 4.1.

prater1.dat Modified form of Prater's gasoline data suitable for question 4.12 (b,c).

iqscores.dat IQ scores; Table 5.12

ukrain.dat U.K. rainfall data; Table 5.17

nmmaps.s NMMAPS summary data table (Section 6.3) (S-PLUS format; omit first row for SAS analysis).

longley.d Longley's data (problem 5.6/5.7)

crime2.txt English crimes data (problem 5.8)

salinity.d Pamlico Sound salinity data (problem 5.9).

lint.dat Lindhurst data

rock2.dat Venables and Ripley rock permeability data (problem 5.10)

prog1 S-PLUS code to draw the residual plots for problem 4.12 (from my S-PLUS handout)

fig1.rsd Data file needed for the S-PLUS code prog1

nukes.s S-PLUS version of nuclear power data set, for import into a data frame (includes column headers)

sfns.i S-PLUS code for regression diagnostics (Section 4.6; download to your personal directory and type source("sfns.i") within S-PLUS to make the functions available to you).

dnsim.i Example S-PLUS code to produce Atkinson's plots for simulated diagnostics (Section 4.6)

noncentral S-PLUS function to evaluate Pearson and Hartley charts for the noncentral F distribution (Section 3.7)
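
Several entries above carry small preparation notes: S-PLUS format files need their first (header) line removed before use in SAS, and marathon.dat codes a missing distance as 0. The Python sketch below illustrates both steps. It is only a sketch under stated assumptions: the function names are hypothetical, the sample values are made up rather than taken from any file on this page, and the CSV layout (two numeric columns, no header row) is assumed.

```python
import csv
import io


def drop_header(text: str) -> str:
    """Return text without its first line (e.g. an S-PLUS column-header row)."""
    _, _, rest = text.partition("\n")
    return rest


def read_two_columns(text: str, missing_code: float = 0.0):
    """Parse two-column CSV text, mapping missing_code in column 1 to None.

    Assumes every non-blank line holds exactly two numeric fields.
    """
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        first, second = float(record[0]), float(record[1])
        rows.append((None if first == missing_code else first, second))
    return rows


# Made-up values for illustration only; not taken from any file on this page.
splus_text = "x y\n1 2\n3 4\n"
csv_text = "0,100\n5.5,250\n"

headerless = drop_header(splus_text)
parsed = read_two_columns(csv_text)
```

Check the actual layout of each file before relying on either helper.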

Data and Programs for Birmingham Particulate Matter Example:

reg00.out1 SAS handout distributed in class (used to illustrate computational method for Birmingham PM data)

pmda3 Full data series.

pmda3new.txt Shortened data series (omits the first 215 observations, where there are not many PM10 values).

bir1.sas Initial exploration of Birmingham data; examine number of knots for B-spline and effect of met variables; then select specific models using forward, stepwise, and backward selection.

bir1.lst Output of bir1.sas.

bir2.sas Use met model with K=12 and variables sh0, sh2, shsq0, shsq2; try 12 different models for PM10; also use 5 daily values of PM10 and compute multicollinearity diagnostics.

bir2.lst Output of bir2.sas.

bir3.sas Like bir2, but use alternative met model.

bir3.lst Output of bir3.sas.

bir4.sas Add nonlinear PM effects to models in bir2 and bir3

bir4.lst Output of bir4.sas.

bir5.sas Use full met model plus all pm covariates; ridge and IPC regression.

bir5.lst Output of bir5.sas.

bir6.sas Use full met model plus all pm covariates; uses PROC PLS.

bir6.lst Output of bir6.sas.

bir7.sas Program for power calculations

bir7.lst Output of bir7.sas.

Data and Programs for Florida Elections Example: