|
An Introduction to Categorical Data Analysis |
3 |
|
|
Contents |
7 |
|
|
Preface |
11 |
|
|
About the Companion Website |
15 |
|
|
1 Introduction |
17 |
|
|
1.1 CATEGORICAL RESPONSE DATA |
17 |
|
|
1.1.1 Response Variable and Explanatory Variables |
18 |
|
|
1.1.2 Binary–Nominal–Ordinal Scale Distinction |
18 |
|
|
1.1.3 Organization of this Book |
19 |
|
|
1.2 PROBABILITY DISTRIBUTIONS FOR CATEGORICAL DATA |
19 |
|
|
1.2.1 Binomial Distribution |
19 |
|
|
1.2.2 Multinomial Distribution |
21 |
|
|
1.3 STATISTICAL INFERENCE FOR A PROPORTION |
21 |
|
|
1.3.1 Likelihood Function and Maximum Likelihood Estimation |
21 |
|
|
1.3.2 Significance Test About a Binomial Parameter |
23 |
|
|
1.3.3 Example: Surveyed Opinions About Legalized Abortion |
23 |
|
|
1.3.4 Confidence Intervals for a Binomial Parameter |
24 |
|
|
1.3.5 Better Confidence Intervals for a Binomial Proportion |
24 |
|
|
1.4 STATISTICAL INFERENCE FOR DISCRETE DATA |
26 |
|
|
1.4.1 Wald, Likelihood-Ratio, and Score Tests |
26 |
|
|
1.4.2 Example: Wald, Score, and Likelihood-Ratio Binomial Tests |
27 |
|
|
1.4.3 Small-Sample Binomial Inference and the Mid P-Value |
28 |
|
|
1.5 BAYESIAN INFERENCE FOR PROPORTIONS |
29 |
|
|
1.5.1 The Bayesian Approach to Statistical Inference |
30 |
|
|
1.5.2 Bayesian Binomial Inference: Beta Prior Distributions |
31 |
|
|
1.5.3 Example: Opinions about Legalized Abortion, Revisited |
32 |
|
|
1.5.4 Other Prior Distributions |
32 |
|
|
1.6 USING R SOFTWARE FOR STATISTICAL INFERENCE ABOUT PROPORTIONS |
33 |
|
|
1.6.1 Reading Data Files and Installing Packages |
33 |
|
|
1.6.2 Using R for Statistical Inference about Proportions |
34 |
|
|
1.6.3 Summary: Choosing an Inference Method |
36 |
|
|
Exercises |
37 |
|
|
2 Analyzing Contingency Tables |
41 |
|
|
2.1 PROBABILITY STRUCTURE FOR CONTINGENCY TABLES |
42 |
|
|
2.1.1 Joint, Marginal, and Conditional Probabilities |
42 |
|
|
2.1.2 Example: Sensitivity and Specificity |
42 |
|
|
2.1.3 Statistical Independence of Two Categorical Variables |
44 |
|
|
2.1.4 Binomial and Multinomial Sampling |
44 |
|
|
2.2 COMPARING PROPORTIONS IN 2×2 CONTINGENCY TABLES |
45 |
|
|
2.2.1 Difference of Proportions |
45 |
|
|
2.2.2 Example: Aspirin and Incidence of Heart Attacks |
46 |
|
|
2.2.3 Ratio of Proportions (Relative Risk) |
46 |
|
|
2.2.4 Using R for Comparing Proportions in 2×2 Tables |
47 |
|
|
2.3 THE ODDS RATIO |
47 |
|
|
2.3.1 Properties of the Odds Ratio |
48 |
|
|
2.3.2 Example: Odds Ratio for Aspirin Use and Heart Attacks |
49 |
|
|
2.3.3 Inference for Odds Ratios and Log Odds Ratios |
49 |
|
|
2.3.4 Relationship Between Odds Ratio and Relative Risk |
50 |
|
|
2.3.5 Example: The Odds Ratio Applies in Case-Control Studies |
51 |
|
|
2.3.6 Types of Studies: Observational Versus Experimental |
52 |
|
|
2.4 CHI-SQUARED TESTS OF INDEPENDENCE |
52 |
|
|
2.4.1 Pearson Statistic and the Chi-Squared Distribution |
53 |
|
|
2.4.2 Likelihood-Ratio Statistic |
54 |
|
|
2.4.3 Testing Independence in Two-Way Contingency Tables |
54 |
|
|
2.4.4 Example: Gender Gap in Political Party Affiliation |
55 |
|
|
2.4.5 Residuals for Cells in a Contingency Table |
55 |
|
|
2.4.6 Partitioning Chi-Squared Statistics |
57 |
|
|
2.4.7 Limitations of Chi-Squared Tests |
58 |
|
|
2.5 TESTING INDEPENDENCE FOR ORDINAL VARIABLES |
58 |
|
|
2.5.1 Linear Trend Alternative to Independence |
59 |
|
|
2.5.2 Example: Alcohol Use and Infant Malformation |
59 |
|
|
2.5.3 Ordinal Tests Usually Have Greater Power |
61 |
|
|
2.5.4 Choice of Scores |
61 |
|
|
2.5.5 Trend Tests for r×2 and 2×c and Nominal–Ordinal Tables |
62 |
|
|
2.6 EXACT FREQUENTIST AND BAYESIAN INFERENCE |
62 |
|
|
2.6.1 Fisher’s Exact Test for 2×2 Tables |
62 |
|
|
2.6.2 Example: Fisher’s Tea Tasting Colleague |
63 |
|
|
2.6.3 Conservatism for Actual P(Type I Error) |
65 |
|
|
2.6.4 Small-Sample Confidence Intervals for Odds Ratio |
66 |
|
|
2.6.5 Bayesian Estimation for Association Measures |
66 |
|
|
2.6.6 Example: Bayesian Inference in a Small Clinical Trial |
67 |
|
|
2.7 ASSOCIATION IN THREE-WAY TABLES |
68 |
|
|
2.7.1 Partial Tables |
69 |
|
|
2.7.2 Example: Death Penalty Verdicts and Race |
69 |
|
|
2.7.3 Simpson’s Paradox |
70 |
|
|
2.7.4 Conditional and Marginal Odds Ratios |
71 |
|
|
2.7.5 Homogeneous Association |
72 |
|
|
Exercises |
72 |
|
|
3 Generalized Linear Models |
81 |
|
|
3.1 COMPONENTS OF A GENERALIZED LINEAR MODEL |
82 |
|
|
3.1.1 Random Component |
82 |
|
|
3.1.2 Linear Predictor |
82 |
|
|
3.1.3 Link Function |
82 |
|
|
3.1.4 Ordinary Linear Model: GLM with Normal Random Component |
83 |
|
|
3.2 GENERALIZED LINEAR MODELS FOR BINARY DATA |
84 |
|
|
3.2.1 Linear Probability Model |
84 |
|
|
3.2.2 Logistic Regression Model |
84 |
|
|
3.2.3 Example: Snoring and Heart Disease |
85 |
|
|
3.2.4 Using R to Fit Generalized Linear Models for Binary Data |
87 |
|
|
3.2.5 Data Files: Ungrouped or Grouped Binary Data |
88 |
|
|
3.3 GENERALIZED LINEAR MODELS FOR COUNTS AND RATES |
88 |
|
|
3.3.1 Poisson Distribution for Counts |
88 |
|
|
3.3.2 Poisson Loglinear Model |
89 |
|
|
3.3.3 Example: Female Horseshoe Crabs and their Satellites |
89 |
|
|
3.3.4 Overdispersion: Greater Variability than Expected |
91 |
|
|
3.4 STATISTICAL INFERENCE AND MODEL CHECKING |
92 |
|
|
3.4.1 Wald, Likelihood-Ratio, and Score Inference Use the Likelihood Function |
93 |
|
|
3.4.2 Example: Political Ideology and Belief in Evolution |
94 |
|
|
3.4.3 The Deviance of a GLM |
96 |
|
|
3.4.4 Model Comparison Using the Deviance |
96 |
|
|
3.4.5 Residuals Comparing Observations to the Model Fit |
97 |
|
|
3.5 FITTING GENERALIZED LINEAR MODELS |
98 |
|
|
3.5.1 The Fisher Scoring Algorithm Fits GLMs |
98 |
|
|
3.5.2 Bayesian Methods for Generalized Linear Models |
99 |
|
|
3.5.3 GLMs: A Unified Approach to Statistical Analysis |
99 |
|
|
Exercises |
100 |
|
|
4 Logistic Regression |
105 |
|
|
4.1 THE LOGISTIC REGRESSION MODEL |
105 |
|
|
4.1.1 The Logistic Regression Model |
106 |
|
|
4.1.2 Odds Ratio and Linear Approximation Interpretations |
106 |
|
|
4.1.3 Example: Whether a Female Horseshoe Crab Has Satellites |
107 |
|
|
4.1.4 Logistic Regression with Retrospective Studies |
109 |
|
|
4.1.5 Normally Distributed X Implies Logistic Regression for Y |
110 |
|
|
4.2 STATISTICAL INFERENCE FOR LOGISTIC REGRESSION |
110 |
|
|
4.2.1 Confidence Intervals for Effects |
110 |
|
|
4.2.2 Significance Testing |
111 |
|
|
4.2.3 Fitted Values and Confidence Intervals for Probabilities |
112 |
|
|
4.2.4 Why Use a Model to Estimate Probabilities? |
113 |
|
|
4.3 LOGISTIC REGRESSION WITH CATEGORICAL PREDICTORS |
114 |
|
|
4.3.1 Indicator Variables Represent Categories of Predictors |
114 |
|
|
4.3.2 Example: Survey about Marijuana Use |
115 |
|
|
4.3.3 ANOVA-Type Model Representation of Factors |
116 |
|
|
4.3.4 Tests of Conditional Independence and of Homogeneity for Three-Way Contingency Tables |
117 |
|
|
4.4 MULTIPLE LOGISTIC REGRESSION |
118 |
|
|
4.4.1 Example: Horseshoe Crabs with Color and Width Predictors |
118 |
|
|
4.4.2 Model Comparison to Check Whether a Term is Needed |
120 |
|
|
4.4.3 Example: Treating Color as Quantitative or Binary |
120 |
|
|
4.4.4 Allowing Interaction between Explanatory Variables |
122 |
|
|
4.4.5 Effects Depend on Other Explanatory Variables in Model |
123 |
|
|
4.5 SUMMARIZING EFFECTS IN LOGISTIC REGRESSION |
123 |
|
|
4.5.1 Probability-Based Interpretations |
123 |
|
|
4.5.2 Marginal Effects and Their Average |
124 |
|
|
4.5.3 Standardized Interpretations |
125 |
|
|
4.6 SUMMARIZING PREDICTIVE POWER: CLASSIFICATION TABLES, ROC CURVES, AND MULTIPLE CORRELATION |
126 |
|
|
4.6.1 Summarizing Predictive Power: Classification Tables |
126 |
|
|
4.6.2 Summarizing Predictive Power: ROC Curves |
127 |
|
|
4.6.3 Summarizing Predictive Power: Multiple Correlation |
128 |
|
|
Exercises |
129 |
|
|
5 Building and Applying Logistic Regression Models |
139 |
|
|
5.1 STRATEGIES IN MODEL SELECTION |
139 |
|
|
5.1.1 How Many Explanatory Variables Can the Model Handle? |
140 |
|
|
5.1.2 Example: Horseshoe Crab Satellites Revisited |
140 |
|
|
5.1.3 Stepwise Variable Selection Algorithms |
141 |
|
|
5.1.4 Purposeful Selection of Explanatory Variables |
142 |
|
|
5.1.5 Example: Variable Selection for Horseshoe Crabs |
143 |
|
|
5.1.6 AIC and the Bias/Variance Tradeoff |
144 |
|
|
5.2 MODEL CHECKING |
146 |
|
|
5.2.1 Goodness of Fit: Model Comparison Using the Deviance |
146 |
|
|
5.2.2 Example: Goodness of Fit for Marijuana Use Survey |
147 |
|
|
5.2.3 Goodness of Fit: Grouped versus Ungrouped Data and Continuous Predictors |
147 |
|
|
5.2.4 Residuals for Logistic Models with Categorical Predictors |
148 |
|
|
5.2.5 Example: Graduate Admissions at University of Florida |
148 |
|
|
5.2.6 Standardized versus Pearson and Deviance Residuals |
150 |
|
|
5.2.7 Influence Diagnostics for Logistic Regression |
150 |
|
|
5.2.8 Example: Heart Disease and Blood Pressure |
151 |
|
|
5.3 INFINITE ESTIMATES IN LOGISTIC REGRESSION |
152 |
|
|
5.3.1 Complete and Quasi-Complete Separation: Perfect Discrimination |
152 |
|
|
5.3.2 Example: Infinite Estimate for Toy Example |
153 |
|
|
5.3.3 Sparse Data and Infinite Effects with Categorical Predictors |
154 |
|
|
5.3.4 Example: Risk Factors for Endometrial Cancer Grade |
155 |
|
|
5.4 BAYESIAN INFERENCE, PENALIZED LIKELIHOOD, AND CONDITIONAL LIKELIHOOD FOR LOGISTIC REGRESSION |
156 |
|
|
5.4.1 Bayesian Modeling: Specification of Prior Distributions |
157 |
|
|
5.4.2 Example: Risk Factors for Endometrial Cancer Revisited |
157 |
|
|
5.4.3 Penalized Likelihood Reduces Bias in Logistic Regression |
159 |
|
|
5.4.4 Example: Risk Factors for Endometrial Cancer Revisited |
160 |
|
|
5.4.5 Conditional Likelihood and Conditional Logistic Regression |
160 |
|
|
5.4.6 Conditional Logistic Regression and Exact Tests for Contingency Tables |
161 |
|
|
5.5 ALTERNATIVE LINK FUNCTIONS: LINEAR PROBABILITY AND PROBIT MODELS |
161 |
|
|
5.5.1 Linear Probability Model |
162 |
|
|
5.5.2 Example: Political Ideology and Belief in Evolution |
162 |
|
|
5.5.3 Probit Model and Normal Latent Variable Model |
163 |
|
|
5.5.4 Example: Snoring and Heart Disease Revisited |
164 |
|
|
5.5.5 Latent Variable Models Imply Binary Regression Models |
165 |
|
|
5.5.6 CDFs and Shapes of Curves for Binary Regression Models |
165 |
|
|
5.6 SAMPLE SIZE AND POWER FOR LOGISTIC REGRESSION |
166 |
|
|
5.6.1 Sample Size for Comparing Two Proportions |
166 |
|
|
5.6.2 Sample Size in Logistic Regression Modeling |
166 |
|
|
5.6.3 Example: Modeling the Probability of Heart Disease |
167 |
|
|
Exercises |
167 |
|
|
6 Multicategory Logit Models |
175 |
|
|
6.1 BASELINE-CATEGORY LOGIT MODELS FOR NOMINAL RESPONSES |
175 |
|
|
6.1.1 Baseline-Category Logits |
176 |
|
|
6.1.2 Example: What Do Alligators Eat? |
176 |
|
|
6.1.3 Estimating Response Probabilities |
179 |
|
|
6.1.4 Checking Multinomial Model Goodness of Fit |
180 |
|
|
6.1.5 Example: Belief in Afterlife |
180 |
|
|
6.1.6 Discrete Choice Models |
182 |
|
|
6.1.7 Example: Shopping Destination Choice |
183 |
|
|
6.2 CUMULATIVE LOGIT MODELS FOR ORDINAL RESPONSES |
183 |
|
|
6.2.1 Cumulative Logit Models with Proportional Odds |
184 |
|
|
6.2.2 Example: Political Ideology and Political Party Affiliation |
185 |
|
|
6.2.3 Inference about Cumulative Logit Model Parameters |
187 |
|
|
6.2.4 Increased Power for Ordinal Analyses |
188 |
|
|
6.2.5 Example: Happiness and Family Income |
188 |
|
|
6.2.6 Latent Variable Linear Models Imply Cumulative Link Models |
190 |
|
|
6.2.7 Invariance to Choice of Response Categories |
191 |
|
|
6.3 CUMULATIVE LINK MODELS: MODEL CHECKING AND EXTENSIONS |
192 |
|
|
6.3.1 Checking Ordinal Model Goodness of Fit |
192 |
|
|
6.3.2 Cumulative Logit Model without Proportional Odds |
192 |
|
|
6.3.3 Simpler Interpretations Use Probabilities |
194 |
|
|
6.3.4 Example: Modeling Mental Impairment |
194 |
|
|
6.3.5 A Latent Variable Probability Comparison of Groups |
196 |
|
|
6.3.6 Cumulative Probit Model |
197 |
|
|
6.3.7 R² Based on the Latent Variable Model |
198 |
|
|
6.3.8 Bayesian Inference for Multinomial Models |
199 |
|
|
6.3.9 Example: Modeling Mental Impairment Revisited |
199 |
|
|
6.4 PAIRED-CATEGORY LOGIT MODELING OF ORDINAL RESPONSES |
200 |
|
|
6.4.1 Adjacent-Categories Logits |
200 |
|
|
6.4.2 Example: Political Ideology Revisited |
201 |
|
|
6.4.3 Sequential Logits |
202 |
|
|
6.4.4 Example: Tonsil Size and Streptococcus |
202 |
|
|
Exercises |
203 |
|
|
7 Loglinear Models for Contingency Tables and Counts |
209 |
|
|
7.1 LOGLINEAR MODELS FOR COUNTS IN CONTINGENCY TABLES |
210 |
|
|
7.1.1 Loglinear Model of Independence for Two-Way Contingency Tables |
210 |
|
|
7.1.2 Interpretation of Parameters in the Independence Model |
210 |
|
|
7.1.3 Example: Happiness and Belief in Heaven |
211 |
|
|
7.1.4 Saturated Model for Two-Way Contingency Tables |
212 |
|
|
7.1.5 Loglinear Models for Three-Way Contingency Tables |
213 |
|
|
7.1.6 Two-Factor Parameters Describe Conditional Associations |
213 |
|
|
7.1.7 Example: Student Alcohol, Cigarette, and Marijuana Use |
214 |
|
|
7.2 STATISTICAL INFERENCE FOR LOGLINEAR MODELS |
216 |
|
|
7.2.1 Chi-Squared Goodness-of-Fit Tests |
216 |
|
|
7.2.2 Cell Standardized Residuals for Loglinear Models |
217 |
|
|
7.2.3 Significance Tests about Conditional Associations |
217 |
|
|
7.2.4 Confidence Intervals for Conditional Odds Ratios |
218 |
|
|
7.2.5 Bayesian Fitting of Loglinear Models |
219 |
|
|
7.2.6 Loglinear Models for Higher-Dimensional Contingency Tables |
219 |
|
|
7.2.7 Example: Automobile Accidents and Seat Belts |
220 |
|
|
7.2.8 Interpreting Three-Factor Interaction Terms |
221 |
|
|
7.2.9 Statistical Versus Practical Significance: Dissimilarity Index |
222 |
|
|
7.3 THE LOGLINEAR–LOGISTIC MODEL CONNECTION |
223 |
|
|
7.3.1 Using Logistic Models to Interpret Loglinear Models |
223 |
|
|
7.3.2 Example: Auto Accident Data Revisited |
224 |
|
|
7.3.3 Condition for Equivalent Loglinear and Logistic Models |
225 |
|
|
7.3.4 Loglinear/Logistic Model Selection Issues |
225 |
|
|
7.4 INDEPENDENCE GRAPHS AND COLLAPSIBILITY |
226 |
|
|
7.4.1 Independence Graphs |
226 |
|
|
7.4.2 Collapsibility Conditions for Contingency Tables |
227 |
|
|
7.4.3 Example: Loglinear Model Building for Student Substance Use |
228 |
|
|
7.4.4 Collapsibility and Logistic Models |
229 |
|
|
7.5 MODELING ORDINAL ASSOCIATIONS IN CONTINGENCY TABLES |
230 |
|
|
7.5.1 Linear-by-Linear Association Model |
231 |
|
|
7.5.2 Example: Linear-by-Linear Association for Sex Opinions |
232 |
|
|
7.5.3 Ordinal Significance Tests of Independence |
232 |
|
|
7.6 LOGLINEAR MODELING OF COUNT RESPONSE VARIABLES |
233 |
|
|
7.6.1 Count Regression Modeling of Rate Data |
233 |
|
|
7.6.2 Example: Death Rates for Lung Cancer Patients |
234 |
|
|
7.6.3 Negative Binomial Regression Models |
236 |
|
|
7.6.4 Example: Female Horseshoe Crab Satellites Revisited |
236 |
|
|
Exercises |
237 |
|
|
8 Models for Matched Pairs |
243 |
|
|
8.1 COMPARING DEPENDENT PROPORTIONS FOR BINARY MATCHED PAIRS |
244 |
|
|
8.1.1 McNemar Test Comparing Marginal Proportions |
244 |
|
|
8.1.2 Estimating the Difference between Dependent Proportions |
246 |
|
|
8.2 MARGINAL MODELS AND SUBJECT-SPECIFIC MODELS FOR MATCHED PAIRS |
246 |
|
|
8.2.1 Marginal Models for Marginal Proportions |
246 |
|
|
8.2.2 Example: Environmental Opinions Revisited |
247 |
|
|
8.2.3 Subject-Specific and Population-Averaged Tables |
248 |
|
|
8.2.4 Conditional Logistic Regression for Matched-Pairs |
249 |
|
|
8.2.5 Logistic Regression for Matched Case-Control Studies |
250 |
|
|
8.3 COMPARING PROPORTIONS FOR NOMINAL MATCHED-PAIRS RESPONSES |
251 |
|
|
8.3.1 Marginal Homogeneity for Baseline-Category Logit Models |
251 |
|
|
8.3.2 Example: Coffee Brand Market Share |
251 |
|
|
8.3.3 Using the Cochran–Mantel–Haenszel Test to Test Marginal Homogeneity |
253 |
|
|
8.3.4 Symmetry and Quasi-Symmetry Models for Square Contingency Tables |
253 |
|
|
8.3.5 Example: Coffee Brand Market Share Revisited |
254 |
|
|
8.4 COMPARING PROPORTIONS FOR ORDINAL MATCHED-PAIRS RESPONSES |
255 |
|
|
8.4.1 Marginal Homogeneity and Cumulative Logit Marginal Model |
256 |
|
|
8.4.2 Example: Recycle or Drive Less to Help the Environment? |
256 |
|
|
8.4.3 An Ordinal Quasi-Symmetry Model |
257 |
|
|
8.4.4 Example: Recycle or Drive Less Revisited? |
258 |
|
|
8.5 ANALYZING RATER AGREEMENT |
259 |
|
|
8.5.1 Example: Agreement on Carcinoma Diagnosis |
259 |
|
|
8.5.2 Cell Residuals for Independence Model |
259 |
|
|
8.5.3 Quasi-Independence Model |
260 |
|
|
8.5.4 Quasi Independence and Odds Ratios Summarizing Agreement |
261 |
|
|
8.5.5 Kappa Summary Measure of Agreement |
262 |
|
|
8.6 BRADLEY–TERRY MODEL FOR PAIRED PREFERENCES |
263 |
|
|
8.6.1 The Bradley–Terry Model and Quasi-Symmetry |
263 |
|
|
8.6.2 Example: Ranking Men Tennis Players |
263 |
|
|
Exercises |
265 |
|
|
9 Marginal Modeling of Correlated, Clustered Responses |
269 |
|
|
9.1 MARGINAL MODELS VERSUS SUBJECT-SPECIFIC MODELS |
270 |
|
|
9.1.1 Marginal Models for a Clustered Binary Response |
270 |
|
|
9.1.2 Example: Repeated Responses on Similar Survey Questions |
270 |
|
|
9.1.3 Subject-Specific Models for a Repeated Response |
271 |
|
|
9.2 MARGINAL MODELING: THE GENERALIZED ESTIMATING EQUATIONS (GEE) APPROACH |
271 |
|
|
9.2.1 Quasi-Likelihood Methods |
271 |
|
|
9.2.2 Generalized Estimating Equation Methodology: Basic Ideas |
272 |
|
|
9.2.3 Example: Opinion about Legalized Abortion Revisited |
273 |
|
|
9.2.4 Limitations of GEE Compared to ML |
275 |
|
|
9.3 MARGINAL MODELING FOR CLUSTERED MULTINOMIAL RESPONSES |
276 |
|
|
9.3.1 Example: Insomnia Study |
276 |
|
|
9.3.2 Alternative GEE Specification of Working Association |
278 |
|
|
9.4 TRANSITIONAL MODELING, GIVEN THE PAST |
279 |
|
|
9.4.1 Transitional Models with Explanatory Variables |
279 |
|
|
9.4.2 Example: Respiratory Illness and Maternal Smoking |
279 |
|
|
9.4.3 Group Comparisons Treating Initial Response as a Covariate |
281 |
|
|
9.5 DEALING WITH MISSING DATA |
282 |
|
|
9.5.1 Missing at Random: Impact on ML and GEE Methods |
282 |
|
|
9.5.2 Multiple Imputation: Monte Carlo Prediction of Missing Data |
283 |
|
|
Exercises |
284 |
|
|
10 Random Effects: Generalized Linear Mixed Models |
289 |
|
|
10.1 RANDOM EFFECTS MODELING OF CLUSTERED CATEGORICAL DATA |
289 |
|
|
10.1.1 The Generalized Linear Mixed Model (GLMM) |
290 |
|
|
10.1.2 A Logistic GLMM for Binary Matched Pairs |
290 |
|
|
10.1.3 Example: Environmental Opinions Revisited |
291 |
|
|
10.1.4 Differing Effects in GLMMs and Marginal Models |
292 |
|
|
10.1.5 Model Fitting for GLMMs |
293 |
|
|
10.1.6 Inference for Model Parameters and Prediction |
294 |
|
|
10.2 EXAMPLES: RANDOM EFFECTS MODELS FOR BINARY DATA |
294 |
|
|
10.2.1 Small-Area Estimation of Binomial Probabilities |
294 |
|
|
10.2.2 Example: Estimating Basketball Free Throw Success |
295 |
|
|
10.2.3 Example: Opinions about Legalized Abortion Revisited |
297 |
|
|
10.2.4 Item Response Models: The Rasch Model |
299 |
|
|
10.2.5 Choice of Marginal Model or Random Effects Model |
299 |
|
|
10.3 EXTENSIONS TO MULTINOMIAL RESPONSES AND MULTIPLE RANDOM EFFECT TERMS |
300 |
|
|
10.3.1 Example: Insomnia Study Revisited |
300 |
|
|
10.3.2 Meta-Analysis: Bivariate Random Effects for Association Heterogeneity |
301 |
|
|
10.4 MULTILEVEL (HIERARCHICAL) MODELS |
304 |
|
|
10.4.1 Example: Two-Level Model for Student Performance |
304 |
|
|
10.4.2 Example: Smoking Prevention and Cessation Study |
305 |
|
|
10.5 LATENT CLASS MODELS |
307 |
|
|
10.5.1 Independence Given a Latent Categorical Variable |
307 |
|
|
10.5.2 Example: Latent Class Model for Rater Agreement |
308 |
|
|
Exercises |
311 |
|
|
11 Classification and Smoothing |
315 |
|
|
11.1 CLASSIFICATION: LINEAR DISCRIMINANT ANALYSIS |
316 |
|
|
11.1.1 Classification with Fisher’s Linear Discriminant Function |
316 |
|
|
11.1.2 Example: Horseshoe Crab Satellites Revisited |
317 |
|
|
11.1.3 Discriminant Analysis Versus Logistic Regression |
318 |
|
|
11.2 CLASSIFICATION: TREE-BASED PREDICTION |
318 |
|
|
11.2.1 Classification Trees |
318 |
|
|
11.2.2 Example: A Classification Tree for Horseshoe Crab Mating |
319 |
|
|
11.2.3 How Does the Classification Tree Grow? |
320 |
|
|
11.2.4 Pruning a Tree and Checking Prediction Accuracy |
320 |
|
|
11.2.5 Classification Trees Versus Logistic Regression and Discriminant Analysis |
321 |
|
|
11.3 CLUSTER ANALYSIS FOR CATEGORICAL RESPONSES |
322 |
|
|
11.3.1 Measuring Dissimilarity Between Observations |
322 |
|
|
11.3.2 Hierarchical Clustering Algorithm and Dendrograms |
323 |
|
|
11.3.3 Example: Clustering States on Presidential Elections |
324 |
|
|
11.4 SMOOTHING: GENERALIZED ADDITIVE MODELS |
326 |
|
|
11.4.1 Generalized Additive Models |
326 |
|
|
11.4.2 Example: GAMs for Horseshoe Crab Data |
327 |
|
|
11.4.3 How Much Smoothing? The Bias/Variance Tradeoff |
327 |
|
|
11.4.4 Example: Smoothing to Portray Probability of Kyphosis |
328 |
|
|
11.5 REGULARIZATION FOR HIGH-DIMENSIONAL CATEGORICAL DATA (LARGE p) |
329 |
|
|
11.5.1 Penalized-Likelihood Methods and Lq-Norm Smoothing |
330 |
|
|
11.5.2 Implementing the Lasso |
331 |
|
|
11.5.3 Example: Predicting Opinion on Abortion with Student Survey |
331 |
|
|
11.5.4 Why Shrink ML Estimates Toward 0? |
334 |
|
|
11.5.5 Issues in Variable Selection (Dimension Reduction) |
334 |
|
|
11.5.6 Controlling the False Discovery Rate |
335 |
|
|
11.5.7 Large p also Makes Bayesian Inference Challenging |
337 |
|
|
Exercises |
337 |
|
|
12 A Historical Tour of Categorical Data Analysis |
341 |
|
|
The Pearson–Yule Association Controversy |
341 |
|
|
R.A. Fisher’s Contributions |
342 |
|
|
Logistic Regression |
343 |
|
|
Multiway Contingency Tables and Loglinear Models |
344 |
|
|
Final Comments |
345 |
|
|
Appendix: Software for Categorical Data Analysis |
347 |
|
|
A.1 R FOR CATEGORICAL DATA ANALYSIS |
347 |
|
|
A.2 SAS FOR CATEGORICAL DATA ANALYSIS |
348 |
|
|
Chapters 1–2: Introduction and Contingency Tables |
348 |
|
|
Chapters 3–5: Generalized Linear Models and Logistic Regression |
350 |
|
|
Chapters 6–7: Multicategory Logit Models and Loglinear Models |
352 |
|
|
Chapter 8: Matched Pairs |
354 |
|
|
Chapters 9–10: Marginal Models and Random Effects Models (GLMMs) |
357 |
|
|
Chapter 11: Non-Model-Based Classification and Clustering |
358 |
|
|
A.3 STATA FOR CATEGORICAL DATA ANALYSIS |
358 |
|
|
Chapters 1–2: Introduction and Contingency Tables |
359 |
|
|
Chapters 3–5: Generalized Linear Models and Logistic Regression |
360 |
|
|
Chapters 6–7: Multicategory Logit Models and Loglinear Models |
361 |
|
|
Chapters 8–11: Correlated Observations, Advanced Methods |
362 |
|
|
A.4 SPSS FOR CATEGORICAL DATA ANALYSIS |
362 |
|
|
Chapters 1–2: Introduction and Contingency Tables |
363 |
|
|
Chapters 3–5: Generalized Linear Models and Logistic Regression |
363 |
|
|
Chapters 6–7: Multicategory Logit Models and Loglinear Models |
364 |
|
|
Chapters 8–11: Correlated Observations, Advanced Methods |
364 |
|
|
Brief Solutions to Odd-Numbered Exercises |
365 |
|
|
Chapter 1 |
365 |
|
|
Chapter 2 |
366 |
|
|
Chapter 3 |
367 |
|
|
Chapter 4 |
368 |
|
|
Chapter 5 |
370 |
|
|
Chapter 6 |
371 |
|
|
Chapter 7 |
373 |
|
|
Chapter 8 |
374 |
|
|
Chapter 9 |
375 |
|
|
Chapter 10 |
376 |
|
|
Chapter 11 |
377 |
|
|
Bibliography |
379 |
|
|
Examples Index |
381 |
|
|
Subject Index |
385 |
|
|
EULA |
392 |
|