Categorical Data Analysis
by: Alan Agresti
Wiley, 2014
ISBN: 9781118710852
752 pages, download: 59882 KB
 
Format: PDF
Suitable for: Apple iPad, Android tablet PCs, online reading, PC, Mac, laptop

Type: A (simple access)

 

 
Table of Contents

  Cover 1  
  Title Page 5  
  Copyright Page 6  
  Contents 9  
  Preface 15  
  1 Introduction: Distributions and Inference for Categorical Data 19  
     1.1 Categorical Response Data 19  
        1.1.1 Response–Explanatory Variable Distinction 20  
        1.1.2 Binary–Nominal–Ordinal Scale Distinction 20  
        1.1.3 Discrete–Continuous Variable Distinction 21  
        1.1.4 Quantitative–Qualitative Variable Distinction 21  
        1.1.5 Organization of Book and Online Computing Appendix 22  
     1.2 Distributions for Categorical Data 23  
        1.2.1 Binomial Distribution 23  
        1.2.2 Multinomial Distribution 24  
        1.2.3 Poisson Distribution 24  
        1.2.4 Overdispersion 25  
        1.2.5 Connection Between Poisson and Multinomial Distributions 25  
        1.2.6 The Chi-Squared Distribution 26  
     1.3 Statistical Inference for Categorical Data 26  
        1.3.1 Likelihood Functions and Maximum Likelihood Estimation 27  
        1.3.2 Likelihood Function and ML Estimate for Binomial Parameter 27  
        1.3.3 Wald–Likelihood Ratio–Score Test Triad 28  
        1.3.4 Constructing Confidence Intervals by Inverting Tests 30  
     1.4 Statistical Inference for Binomial Parameters 31  
        1.4.1 Tests About a Binomial Parameter 31  
        1.4.2 Confidence Intervals for a Binomial Parameter 32  
        1.4.3 Example: Estimating the Proportion of Vegetarians 33  
        1.4.4 Exact Small-Sample Inference and the Mid P-Value 34  
     1.5 Statistical Inference for Multinomial Parameters 35  
        1.5.1 Estimation of Multinomial Parameters 35  
        1.5.2 Pearson Chi-Squared Test of a Specified Multinomial 36  
        1.5.3 Likelihood-Ratio Chi-Squared Test of a Specified Multinomial 36  
        1.5.4 Example: Testing Mendel's Theories 37  
        1.5.5 Testing with Estimated Expected Frequencies 38  
        1.5.6 Example: Pneumonia Infections in Calves 38  
        1.5.7 Chi-Squared Theoretical Justification 40  
     1.6 Bayesian Inference for Binomial and Multinomial Parameters 40  
        1.6.1 The Bayesian Approach to Statistical Inference 40  
        1.6.2 Binomial Estimation: Beta and Logit-Normal Prior Distributions 42  
        1.6.3 Multinomial Estimation: Dirichlet Prior Distributions 43  
        1.6.4 Example: Estimating Vegetarianism Revisited 44  
        1.6.5 Binomial and Multinomial Estimation: Improper Priors 44  
     Notes 45  
     Exercises 46  
  2 Describing Contingency Tables 55  
     2.1 Probability Structure for Contingency Tables 55  
        2.1.1 Contingency Tables 55  
        2.1.2 Joint/Marginal/Conditional Distributions for Contingency Tables 56  
        2.1.3 Example: Sensitivity and Specificity for Medical Diagnoses 57  
        2.1.4 Independence of Categorical Variables 58  
        2.1.5 Poisson, Binomial, and Multinomial Sampling 58  
        2.1.6 Example: Seat Belts and Auto Accident Injuries 59  
        2.1.7 Example: Case–Control Study of Cancer and Smoking 60  
        2.1.8 Types of Studies: Observational Versus Experimental 61  
     2.2 Comparing Two Proportions 61  
        2.2.1 Difference of Proportions 62  
        2.2.2 Relative Risk 62  
        2.2.3 Odds Ratio 62  
        2.2.4 Properties of the Odds Ratio 63  
        2.2.5 Example: Association Between Heart Attacks and Aspirin Use 64  
        2.2.6 Case–Control Studies and the Odds Ratio 64  
        2.2.7 Relationship Between Odds Ratio and Relative Risk 65  
     2.3 Conditional Association in Stratified 2 × 2 Tables 65  
        2.3.1 Partial Tables 66  
        2.3.2 Example: Racial Characteristics and the Death Penalty 66  
        2.3.3 Conditional and Marginal Odds Ratios 68  
        2.3.4 Marginal Independence Versus Conditional Independence 69  
        2.3.5 Homogeneous Association 71  
        2.3.6 Collapsibility: Identical Conditional and Marginal Associations 71  
     2.4 Measuring Association in I × J Tables 72  
        2.4.1 Odds Ratios in I × J Tables 72  
        2.4.2 Association Factors 73  
        2.4.3 Summary Measures of Association 74  
        2.4.4 Ordinal Trends: Concordant and Discordant Pairs 74  
        2.4.5 Ordinal Measure of Association: Gamma 75  
        2.4.6 Probabilistic Comparisons of Two Ordinal Distributions 76  
        2.4.7 Example: Comparing Pain Ratings After Surgery 77  
        2.4.8 Correlation for Underlying Normality 77  
     Notes 78  
     Exercises 78  
  3 Inference for Two-Way Contingency Tables 87  
     3.1 Confidence Intervals for Association Parameters 87  
        3.1.1 Interval Estimation of the Odds Ratio 87  
        3.1.2 Example: Seat-Belt Use and Traffic Deaths 88  
        3.1.3 Interval Estimation of Difference of Proportions and Relative Risk 89  
        3.1.4 Example: Aspirin and Heart Attacks Revisited 89  
        3.1.5 Deriving Standard Errors with the Delta Method 90  
        3.1.6 Delta Method Applied to the Sample Logit 91  
        3.1.7 Delta Method for the Log Odds Ratio 91  
        3.1.8 Simultaneous Confidence Intervals for Multiple Comparisons 93  
     3.2 Testing Independence in Two-way Contingency Tables 93  
        3.2.1 Pearson and Likelihood-Ratio Chi-Squared Tests 93  
        3.2.2 Example: Education and Belief in God 95  
        3.2.3 Adequacy of Chi-Squared Approximations 95  
        3.2.4 Chi-Squared and Comparing Proportions in 2 × 2 Tables 96  
        3.2.5 Score Confidence Intervals Comparing Proportions 96  
        3.2.6 Profile Likelihood Confidence Intervals 97  
     3.3 Following-up Chi-Squared Tests 98  
        3.3.1 Pearson Residuals and Standardized Residuals 98  
        3.3.2 Example: Education and Belief in God Revisited 99  
        3.3.3 Partitioning Chi-Squared 99  
        3.3.4 Example: Origin of Schizophrenia 101  
        3.3.5 Rules for Partitioning 102  
        3.3.6 Summarizing the Association 102  
        3.3.7 Limitations of Chi-Squared Tests 102  
        3.3.8 Why Consider Independence If It's Unlikely to Be True? 103  
     3.4 Two-Way Tables with Ordered Classifications 104  
        3.4.1 Linear Trend Alternative to Independence 104  
        3.4.2 Example: Is Happiness Associated with Political Ideology? 105  
        3.4.3 Monotone Trend Alternatives to Independence 105  
        3.4.4 Extra Power with Ordinal Tests 106  
        3.4.5 Sensitivity to Choice of Scores 106  
        3.4.6 Example: Infant Birth Defects by Maternal Alcohol Consumption 107  
        3.4.7 Trend Tests for I × 2 and 2 × J Tables 108  
        3.4.8 Nominal-Ordinal Tables 108  
     3.5 Small-Sample Inference for Contingency Tables 108  
        3.5.1 Fisher's Exact Test for 2 × 2 Tables 108  
        3.5.2 Example: Fisher's Tea Drinker 109  
        3.5.3 Two-Sided P-Values for Fisher's Exact Test 110  
        3.5.4 Confidence Intervals Based on Conditional Likelihood 110  
        3.5.5 Discreteness and Conservatism Issues 111  
        3.5.6 Small-Sample Unconditional Tests of Independence 111  
        3.5.7 Conditional Versus Unconditional Tests 112  
     3.6 Bayesian Inference for Two-way Contingency Tables 114  
        3.6.1 Prior Distributions for Comparing Proportions in 2 × 2 Tables 114  
        3.6.2 Posterior Probabilities Comparing Proportions 115  
        3.6.3 Posterior Intervals for Association Parameters 115  
        3.6.4 Example: Urn Sampling Gives Highly Unbalanced Treatment Allocation 116  
        3.6.5 Highest Posterior Density Intervals 116  
        3.6.6 Testing Independence 117  
        3.6.7 Empirical Bayes and Hierarchical Bayesian Approaches 118  
     3.7 Extensions for Multiway Tables and Nontabulated Responses 118  
        3.7.1 Categorical Data Need Not Be Contingency Tables 118  
     Notes 119  
     Exercises 121  
  4 Introduction to Generalized Linear Models 131  
     4.1 The Generalized Linear Model 131  
        4.1.1 Components of Generalized Linear Models 132  
        4.1.2 Binomial Logit Models for Binary Data 132  
        4.1.3 Poisson Loglinear Models for Count Data 133  
        4.1.4 Generalized Linear Models for Continuous Responses 133  
        4.1.5 Deviance of a GLM 133  
        4.1.6 Advantages of GLMs Versus Transforming the Data 134  
     4.2 Generalized Linear Models for Binary Data 135  
        4.2.1 Linear Probability Model 135  
        4.2.2 Example: Snoring and Heart Disease 136  
        4.2.3 Logistic Regression Model 137  
        4.2.4 Binomial GLM for 2 × 2 Contingency Tables 138  
        4.2.5 Probit and Inverse cdf Link Functions 139  
        4.2.6 Latent Tolerance Motivation for Binary Response Models 140  
     4.3 Generalized Linear Models for Counts and Rates 140  
        4.3.1 Poisson Loglinear Models 141  
        4.3.2 Example: Horseshoe Crab Mating 141  
        4.3.3 Overdispersion for Poisson GLMs 144  
        4.3.4 Negative Binomial GLMs 145  
        4.3.5 Poisson Regression for Rates Using Offsets 146  
        4.3.6 Example: Modeling Death Rates for Heart Valve Operations 146  
        4.3.7 Poisson GLM of Independence in Two-Way Contingency Tables 148  
     4.4 Moments and Likelihood for Generalized Linear Models 148  
        4.4.1 The Exponential Dispersion Family 148  
        4.4.2 Mean and Variance Functions for the Random Component 149  
        4.4.3 Mean and Variance Functions for Poisson and Binomial GLMs 150  
        4.4.4 Systematic Component and Link Function of a GLM 150  
        4.4.5 Likelihood Equations for a GLM 151  
        4.4.6 The Key Role of the Mean–Variance Relationship 152  
        4.4.7 Likelihood Equations for Binomial GLMs 152  
        4.4.8 Asymptotic Covariance Matrix of Model Parameter Estimators 153  
        4.4.9 Likelihood Equations and cov(β̂) for Poisson Loglinear Model 154  
     4.5 Inference and Model Checking for Generalized Linear Models 154  
        4.5.1 Deviance and Goodness of Fit 154  
        4.5.2 Deviance for Poisson GLMs 155  
        4.5.3 Deviance for Binomial GLMs: Grouped Versus Ungrouped Data 155  
        4.5.4 Likelihood-Ratio Model Comparison Using the Deviances 156  
        4.5.5 Score Tests for Goodness of Fit and for Model Comparison 157  
        4.5.6 Residuals for GLMs 158  
        4.5.7 Covariance Matrices for Fitted Values and Residuals 160  
        4.5.8 The Bayesian Approach for GLMs 160  
     4.6 Fitting Generalized Linear Models 161  
        4.6.1 Newton–Raphson Method 161  
        4.6.2 Fisher Scoring Method 162  
        4.6.3 Newton–Raphson and Fisher Scoring for Binary Data 163  
        4.6.4 ML as Iterative Reweighted Least Squares 164  
        4.6.5 Simplifications for Canonical Link Functions 165  
     4.7 Quasi-Likelihood and Generalized Linear Models 167  
        4.7.1 Mean–Variance Relationship Determines Quasi-likelihood Estimates 167  
        4.7.2 Overdispersion for Poisson GLMs and Quasi-likelihood 167  
        4.7.3 Overdispersion for Binomial GLMs and Quasi-likelihood 168  
        4.7.4 Example: Teratology Overdispersion 169  
     Notes 170  
     Exercises 171  
  5 Logistic Regression 181  
     5.1 Interpreting Parameters in Logistic Regression 181  
        5.1.1 Interpreting β: Odds, Probabilities, and Linear Approximations 182  
        5.1.2 Looking at the Data 183  
        5.1.3 Example: Horseshoe Crab Mating Revisited 184  
        5.1.4 Logistic Regression with Retrospective Studies 186  
        5.1.5 Logistic Regression Is Implied by Normal Explanatory Variables 187  
     5.2 Inference for Logistic Regression 187  
        5.2.1 Inference About Model Parameters and Probabilities 187  
        5.2.2 Example: Inference for Horseshoe Crab Mating Data 188  
        5.2.3 Checking Goodness of Fit: Grouped and Ungrouped Data 189  
        5.2.4 Example: Model Goodness of Fit for Horseshoe Crab Data 190  
        5.2.5 Checking Goodness of Fit with Ungrouped Data by Grouping 190  
        5.2.6 Wald Inference Can Be Suboptimal 192  
     5.3 Logistic Models with Categorical Predictors 193  
        5.3.1 ANOVA-Type Representation of Factors 193  
        5.3.2 Indicator Variables Represent a Factor 193  
        5.3.3 Example: Alcohol and Infant Malformation Revisited 194  
        5.3.4 Linear Logit Model for I × 2 Contingency Tables 195  
        5.3.5 Cochran–Armitage Trend Test 196  
        5.3.6 Example: Alcohol and Infant Malformation Revisited 197  
        5.3.7 Using Directed Models Can Improve Inferential Power 197  
        5.3.8 Noncentral Chi-Squared Distribution and Power for Narrower Alternatives 198  
        5.3.9 Example: Skin Damage and Leprosy 199  
        5.3.10 Model Smoothing Improves Precision of Estimation 200  
     5.4 Multiple Logistic Regression 200  
        5.4.1 Logistic Models for Multiway Contingency Tables 201  
        5.4.2 Example: AIDS and AZT Use 202  
        5.4.3 Goodness of Fit as a Likelihood-Ratio Test 204  
        5.4.4 Model Comparison by Comparing Deviances 205  
        5.4.5 Example: Horseshoe Crab Satellites Revisited 205  
        5.4.6 Quantitative Treatment of Ordinal Predictor 207  
        5.4.7 Probability-Based and Standardized Interpretations 208  
        5.4.8 Estimating an Average Causal Effect 209  
     5.5 Fitting Logistic Regression Models 210  
        5.5.1 Likelihood Equations for Logistic Regression 210  
        5.5.2 Asymptotic Covariance Matrix of Parameter Estimators 211  
        5.5.3 Distribution of Probability Estimators 212  
        5.5.4 Newton–Raphson Method Applied to Logistic Regression 212  
     Notes 213  
     Exercises 214  
  6 Building, Checking, and Applying Logistic Regression Models 225  
     6.1 Strategies in Model Selection 225  
        6.1.1 How Many Explanatory Variables Can Be in the Model? 226  
        6.1.2 Example: Horseshoe Crab Mating Data Revisited 226  
        6.1.3 Stepwise Procedures: Forward Selection and Backward Elimination 227  
        6.1.4 Example: Backward Elimination for Horseshoe Crab Data 228  
        6.1.5 Model Selection and the "Correct" Model 229  
        6.1.6 AIC: Minimizing Distance of the Fit from the Truth 230  
        6.1.7 Example: Using Causal Hypotheses to Guide Model Building 231  
        6.1.8 Alternative Strategies, Including Model Averaging 233  
     6.2 Logistic Regression Diagnostics 233  
        6.2.1 Residuals: Pearson, Deviance, and Standardized 233  
        6.2.2 Example: Heart Disease and Blood Pressure 234  
        6.2.3 Example: Admissions to Graduate School at Florida 236  
        6.2.4 Influence Diagnostics for Logistic Regression 238  
     6.3 Summarizing the Predictive Power of a Model 239  
        6.3.1 Summarizing Predictive Power: R and R-Squared Measures 239  
        6.3.2 Summarizing Predictive Power: Likelihood and Deviance Measures 240  
        6.3.3 Summarizing Predictive Power: Classification Tables 241  
        6.3.4 Summarizing Predictive Power: ROC Curves 242  
        6.3.5 Example: Evaluating Predictive Power for Horseshoe Crab Data 242  
     6.4 Mantel–Haenszel and Related Methods for Multiple 2 × 2 Tables 243  
        6.4.1 Using Logistic Models to Test Conditional Independence 244  
        6.4.2 Cochran–Mantel–Haenszel Test of Conditional Independence 245  
        6.4.3 Example: Multicenter Clinical Trial Revisited 246  
        6.4.4 CMH Test Is Advantageous for Sparse Data 246  
        6.4.5 Estimation of Common Odds Ratio 247  
        6.4.6 Meta-analyses for Summarizing Multiple 2 × 2 Tables 248  
        6.4.7 Meta-analyses for Multiple 2 × 2 Tables: Difference of Proportions 249  
        6.4.8 Collapsibility and Logistic Models for Contingency Tables 250  
        6.4.9 Testing Homogeneity of Odds Ratios 250  
        6.4.10 Summarizing Heterogeneity in Odds Ratios 251  
        6.4.11 Propensity Scores in Observational Studies 251  
     6.5 Detecting and Dealing with Infinite Estimates 251  
        6.5.1 Complete or Quasi-complete Separation 252  
        6.5.2 Example: Multicenter Clinical Trial with Few Successes 253  
        6.5.3 Remedies When at Least One ML Estimate Is Infinite 254  
     6.6 Sample Size and Power Considerations 255  
        6.6.1 Sample Size and Power for Comparing Two Proportions 255  
        6.6.2 Sample Size Determination in Logistic Regression 256  
        6.6.3 Sample Size in Multiple Logistic Regression 257  
        6.6.4 Power for Chi-Squared Tests in Contingency Tables 257  
        6.6.5 Power for Testing Conditional Independence 258  
        6.6.6 Effects of Sample Size on Model Selection and Inference 259  
     Notes 259  
     Exercises 261  
  7 Alternative Modeling of Binary Response Data 269  
     7.1 Probit and Complementary Log-log Models 269  
        7.1.1 Probit Models: Three Latent Variable Motivations 270  
        7.1.2 Probit Models: Interpreting Effects 270  
        7.1.3 Probit Model Fitting 271  
        7.1.4 Example: Modeling Flour Beetle Mortality 272  
        7.1.5 Complementary Log–Log Link Models 273  
        7.1.6 Example: Beetle Mortality Revisited 275  
     7.2 Bayesian Inference for Binary Regression 275  
        7.2.1 Prior Specifications for Binary Regression Models 275  
        7.2.2 Example: Risk Factors for Endometrial Cancer Grade 276  
        7.2.3 Bayesian Logistic Regression for Retrospective Studies 278  
        7.2.4 Probability–Based Prior Specifications for Binary Regression Models 278  
        7.2.5 Example: Modeling the Probability a Trauma Patient Survives 279  
        7.2.6 Bayesian Fitting for Probit Models 281  
        7.2.7 Bayesian Model Checking for Binary Regression 283  
     7.3 Conditional Logistic Regression 283  
        7.3.1 Conditional Likelihood 283  
        7.3.2 Small-Sample Inference for a Logistic Regression Parameter 285  
        7.3.3 Small-Sample Conditional Inference for 2 × 2 Contingency Tables 285  
        7.3.4 Small-Sample Conditional Inference for Linear Logit Model 286  
        7.3.5 Small-Sample Tests of Conditional Independence in 2 × 2 × K Tables 287  
        7.3.6 Example: Promotion Discrimination 287  
        7.3.7 Discreteness Complications of Using Exact Conditional Inference 288  
     7.4 Smoothing: Kernels, Penalized Likelihood, Generalized Additive Models 288  
        7.4.1 How Much Smoothing? The Variance/Bias Trade-off 288  
        7.4.2 Kernel Smoothing 289  
        7.4.3 Example: Smoothing to Portray Probability of Kyphosis 290  
        7.4.4 Nearest Neighbors Smoothing 290  
        7.4.5 Smoothing Using Penalized Likelihood Estimation 291  
        7.4.6 Why Shrink Estimates Toward 0? 293  
        7.4.7 Firth's Penalized Likelihood for Logistic Regression 293  
        7.4.8 Example: Complete Separation but Finite Logistic Estimates 293  
        7.4.9 Generalized Additive Models 294  
        7.4.10 Example: GAMs for Horseshoe Crab Mating Data 295  
        7.4.11 Advantages/Disadvantages of Various Smoothing Methods 295  
     7.5 Issues in Analyzing High–Dimensional Categorical Data 296  
        7.5.1 Issues in Selecting Explanatory Variables 296  
        7.5.2 Adjusting for Multiplicity: The Bonferroni Method 297  
        7.5.3 Adjusting for Multiplicity: The False Discovery Rate 298  
        7.5.4 Other Variable Selection Methods with High–Dimensional Data 299  
        7.5.5 Examples: High–Dimensional Applications in Genomics 300  
        7.5.6 Example: Motif Discovery for Protein Sequences 301  
        7.5.7 Example: The Netflix Prize 302  
        7.5.8 Example: Credit Scoring 303  
     Notes 303  
     Exercises 305  
  8 Models for Multinomial Responses 311  
     8.1 Nominal Responses: Baseline–Category Logit Models 311  
        8.1.1 Baseline–Category Logits 311  
        8.1.2 Example: Alligator Food Choice 312  
        8.1.3 Estimating Response Probabilities 314  
        8.1.4 Fitting Baseline–Category Logistic Models 315  
        8.1.5 Multicategory Logit Model as a Multivariate GLM 317  
        8.1.6 Multinomial Probit Models 317  
        8.1.7 Example: Effect of Menu Pricing 318  
     8.2 Ordinal Responses: Cumulative Logit Models 319  
        8.2.1 Cumulative Logits 319  
        8.2.2 Proportional Odds Form of Cumulative Logit Model 319  
        8.2.3 Latent Variable Motivation for Proportional Odds Structure 321  
        8.2.4 Example: Happiness and Traumatic Events 322  
        8.2.5 Checking the Proportional Odds Assumption 324  
     8.3 Ordinal Responses: Alternative Models 326  
        8.3.1 Cumulative Link Models 326  
        8.3.2 Cumulative Probit and Log-Log Models 326  
        8.3.3 Example: Happiness Revisited with Cumulative Probits 327  
        8.3.4 Adjacent–Categories Logit Models 327  
        8.3.5 Example: Happiness Revisited 328  
        8.3.6 Continuation–Ratio Logit Models 329  
        8.3.7 Example: Developmental Toxicity Study with Pregnant Mice 330  
        8.3.8 Stochastic Ordering Location Effects Versus Dispersion Effects 331  
        8.3.9 Summarizing Predictive Power of Explanatory Variables 332  
     8.4 Testing Conditional Independence in I × J × K Tables 332  
        8.4.1 Testing Conditional Independence Using Multinomial Models 333  
        8.4.2 Example: Homosexual Marriage and Religious Fundamentalism 334  
        8.4.3 Generalized Cochran–Mantel–Haenszel Tests for I × J × K Tables 335  
        8.4.4 Example: Homosexual Marriage Revisited 337  
        8.4.5 Related Score Tests for Multinomial Logit Models 337  
     8.5 Discrete-Choice Models 338  
        8.5.1 Conditional Logits for Characteristics of the Choices 338  
        8.5.2 Multinomial Logit Model Expressed as Discrete-Choice Model 339  
        8.5.3 Example: Shopping Destination Choice 339  
        8.5.4 Multinomial Probit Discrete–Choice Models 339  
        8.5.5 Extensions: Nested Logit and Mixed Logit Models 340  
        8.5.6 Extensions: Discrete Choice with Ordered Categories 340  
     8.6 Bayesian Modeling of Multinomial Responses 341  
        8.6.1 Bayesian Fitting of Cumulative Link Models 341  
        8.6.2 Example: Cannabis Use and Mother's Age 342  
        8.6.3 Bayesian Fitting of Multinomial Logit and Probit Models 343  
        8.6.4 Example: Alligator Food Choice Revisited 344  
     Notes 344  
     Exercises 347  
  9 Loglinear Models for Contingency Tables 357  
     9.1 Loglinear Models for Two-way Tables 357  
        9.1.1 Independence Model for a Two-Way Table 357  
        9.1.2 Interpretation of Loglinear Model Parameters 358  
        9.1.3 Saturated Model for a Two-Way Table 358  
        9.1.4 Alternative Parameter Constraints 359  
        9.1.5 Hierarchical Versus Nonhierarchical Models 359  
        9.1.6 Multinomial Models for Cell Probabilities 360  
     9.2 Loglinear Models for Independence and Interaction in Three-way Tables 360  
        9.2.1 Types of Independence 360  
        9.2.2 Homogeneous Association and Three-Factor Interaction 362  
        9.2.3 Interpretation of Loglinear Model Parameters 363  
        9.2.4 Example: Alcohol, Cigarette, and Marijuana Use 364  
     9.3 Inference for Loglinear Models 366  
        9.3.1 Chi-Squared Goodness-of-Fit Tests 366  
        9.3.2 Inference about Conditional Associations 366  
     9.4 Loglinear Models for Higher Dimensions 368  
        9.4.1 Models for Four–Way Contingency Tables 368  
        9.4.2 Example: Automobile Accidents and Seat-Belt Use 368  
        9.4.3 Large Samples and Statistical Versus Practical Significance 370  
        9.4.4 Dissimilarity Index 370  
     9.5 Loglinear—Logistic Model Connection 371  
        9.5.1 Using Logistic Models to Interpret Loglinear Models 371  
        9.5.2 Example: Auto Accidents and Seat-Belts Revisited 372  
        9.5.3 Equivalent Loglinear and Logistic Models 372  
        9.5.4 Example: Detecting Gene–Environment Interactions in Case–Control Studies 373  
     9.6 Loglinear Model Fitting: Likelihood Equations and Asymptotic Distributions 374  
        9.6.1 Minimal Sufficient Statistics 374  
        9.6.2 Likelihood Equations for Loglinear Models 375  
        9.6.3 Unique ML Estimates Match Data in Sufficient Marginal Tables 376  
        9.6.4 Direct Versus Iterative Calculation of Fitted Values 376  
        9.6.5 Decomposable Models 377  
        9.6.6 Chi-Squared Goodness-of-Fit Tests 377  
        9.6.7 Covariance Matrix of ML Parameter Estimators 378  
        9.6.8 Connection Between Multinomial and Poisson Loglinear Models 379  
        9.6.9 Distribution of Probability Estimators 380  
        9.6.10 Proof of Uniqueness of ML Estimates 381  
        9.6.11 Pseudo ML for Complex Sampling Designs 381  
     9.7 Loglinear Model Fitting: Iterative Methods and Their Application 382  
        9.7.1 Newton-Raphson Method 382  
        9.7.2 Iterative Proportional Fitting 383  
        9.7.3 Comparison of IPF and Newton–Raphson Iterative Methods 384  
        9.7.4 Raking a Table: Contingency Table Standardization 385  
     Notes 386  
     Exercises 387  
  10 Building and Extending Loglinear Models 395  
     10.1 Conditional Independence Graphs and Collapsibility 395  
        10.1.1 Conditional Independence Graphs 395  
        10.1.2 Graphical Loglinear Models 396  
        10.1.3 Collapsibility in Three–Way Contingency Tables 397  
        10.1.4 Collapsibility for Multiway Tables 398  
     10.2 Model Selection and Comparison 398  
        10.2.1 Considerations in Model Selection 398  
        10.2.2 Example: Model Building for Student Survey 399  
        10.2.3 Loglinear Model Comparison Statistics 401  
        10.2.4 Partitioning Chi-Squared with Model Comparisons 402  
        10.2.5 Identical Marginal and Conditional Tests of Independence 402  
     10.3 Residuals for Detecting Cell-Specific Lack of Fit 403  
        10.3.1 Residuals for Loglinear Models 403  
        10.3.2 Example: Student Survey Revisited 403  
        10.3.3 Identical Loglinear and Logistic Standardized Residuals 404  
     10.4 Modeling Ordinal Associations 404  
        10.4.1 Linear-by-Linear Association Model for Two-Way Tables 405  
        10.4.2 Corresponding Logistic Model for Adjacent Responses 406  
        10.4.3 Likelihood Equations and Model Fitting 407  
        10.4.4 Example: Sex and Birth Control Opinions Revisited 407  
        10.4.5 Directed Ordinal Test of Independence 409  
        10.4.6 Row Effects and Column Effects Association Models 409  
        10.4.7 Example: Estimating Category Scores for Premarital Sex 410  
        10.4.8 Ordinal Variables in Models for Multiway Tables 410  
     10.5 Generalized Loglinear and Association Models, Correlation Models, and Correspondence Analysis 411  
        10.5.1 Generalized Loglinear Model 411  
        10.5.2 Multiplicative Row and Column Effects Model 412  
        10.5.3 Example: Mental Health and Parents' SES 413  
        10.5.4 Correlation Models 413  
        10.5.5 Correspondence Analysis 414  
        10.5.6 Model Selection and Score Choice for Ordinal Variables 416  
     10.6 Empty Cells and Sparseness in Modeling Contingency Tables 416  
        10.6.1 Empty Cells: Sampling Versus Structural Zeros 416  
        10.6.2 Existence of Estimates in Loglinear Models 417  
        10.6.3 Effects of Sparseness on X², G², and Model-Based Tests 418  
        10.6.4 Alternative Sparse Data Asymptotics 419  
        10.6.5 Adding Constants to Cells of a Contingency Table 419  
     10.7 Bayesian Loglinear Modeling 419  
        10.7.1 Estimating Loglinear Model Parameters in Two-Way Tables 420  
        10.7.2 Example: Polarized Opinions by Political Party 420  
        10.7.3 Bayesian Loglinear Modeling of Multidimensional Tables 421  
        10.7.4 Graphical Conditional Independence Models 422  
     Notes 422  
     Exercises 425  
  11 Models for Matched Pairs 431  
     11.1 Comparing Dependent Proportions 432  
        11.1.1 Confidence Intervals Comparing Dependent Proportions 432  
        11.1.2 McNemar Test Comparing Dependent Proportions 433  
        11.1.3 Example: Changes in Presidential Election Voting 433  
        11.1.4 Increased Precision with Dependent Samples 434  
        11.1.5 Small-Sample Test Comparing Dependent Proportions 434  
        11.1.6 Connection Between McNemar and Cochran–Mantel–Haenszel Tests 435  
        11.1.7 Subject-Specific and Population–Averaged (Marginal) Tables 436  
     11.2 Conditional Logistic Regression for Binary Matched Pairs 436  
        11.2.1 Subject–Specific Versus Marginal Models for Matched Pairs 436  
        11.2.2 Logistic Models with Subject-Specific Probabilities 437  
        11.2.3 Conditional ML Inference for Binary Matched Pairs 438  
        11.2.4 Random Effects in Binary Matched-Pairs Model 439  
        11.2.5 Conditional Logistic Regression for Matched Case–Control Studies 439  
        11.2.6 Conditional Logistic Regression for Matched Pairs with Multiple Predictors 440  
        11.2.7 Marginal Models and Subject-Specific Models: Extensions 441  
     11.3 Marginal Models for Square Contingency Tables 442  
        11.3.1 Marginal Models for Nominal Classifications 442  
        11.3.2 Example: Regional Migration 443  
        11.3.3 Marginal Models for Ordinal Classifications 443  
        11.3.4 Example: Opinions on Premarital and Extramarital Sex 444  
     11.4 Symmetry, Quasi-Symmetry, and Quasi-Independence 444  
        11.4.1 Symmetry as Logistic and Loglinear Models 445  
        11.4.2 Quasi-symmetry 445  
        11.4.3 Marginal Homogeneity and Quasi-symmetry 447  
        11.4.4 Quasi-independence 447  
        11.4.5 Example: Migration Revisited 448  
        11.4.6 Ordinal Quasi-symmetry 449  
        11.4.7 Example: Premarital and Extramarital Sex Revisited 450  
     11.5 Measuring Agreement Between Observers 450  
        11.5.1 Agreement: Departures from Independence 451  
        11.5.2 Using Quasi-independence to Analyze Agreement 451  
        11.5.3 Quasi-symmetry and Agreement Modeling 452  
        11.5.4 Kappa: A Summary Measure of Agreement 452  
        11.5.5 Weighted Kappa: Quantifying Disagreement 453  
        11.5.6 Extensions to Multiple Observers 453  
     11.6 Bradley-Terry Model for Paired Preferences 454  
        11.6.1 Bradley-Terry Model 454  
        11.6.2 Example: Major League Baseball Rankings 454  
        11.6.3 Example: Home Team Advantage in Baseball 455  
        11.6.4 Bradley-Terry Model and Quasi-symmetry 456  
        11.6.5 Extensions to Ties and Ordinal Pairwise Evaluations 457  
     11.7 Marginal Models and Quasi-Symmetry Models for Matched Sets 457  
        11.7.1 Marginal Homogeneity, Complete Symmetry, and Quasi-symmetry 457  
        11.7.2 Types of Marginal Symmetry 458  
        11.7.3 Comparing Binary Marginal Distributions in Multiway Tables 458  
        11.7.4 Example: Attitudes Toward Legalized Abortion 459  
        11.7.5 Marginal Homogeneity for a Multicategory Response 460  
        11.7.6 Wald and Generalized CMH Score Tests of Marginal Homogeneity 460  
     Notes 461  
     Exercises 463  
  12 Clustered Categorical Data: Marginal and Transitional Models 473  
     12.1 Marginal Modeling: Maximum Likelihood Approach 474  
        12.1.1 Example: Longitudinal Study of Mental Depression 474  
        12.1.2 Modeling a Repeated Multinomial Response 476  
        12.1.3 Example: Insomnia Clinical Trial 476  
        12.1.4 ML Fitting of Marginal Logistic Models: Constraints on Cell Probabilities 477  
        12.1.5 ML Fitting of Marginal Logistic Models: Other Methods 479  
     12.2 Marginal Modeling: Generalized Estimating Equations (GEEs) Approach 480  
        12.2.1 Generalized Estimating Equations Methodology: Basic Ideas 480  
        12.2.2 Example: Longitudinal Mental Depression Revisited 481  
        12.2.3 Example: Multinomial GEE Approach for Insomnia Trial 482  
     12.3 Quasi-Likelihood and Its GEE Multivariate Extension: Details 483  
        12.3.1 The Univariate Quasi-likelihood Method 483  
        12.3.2 Properties of Quasi-likelihood Estimators 484  
        12.3.3 Sandwich Covariance Adjustment for Variance Misspecification 485  
        12.3.4 GEE Multivariate Methodology: Technical Details 486  
        12.3.5 Working Associations Characterized by Odds Ratios 488  
        12.3.6 GEE Approach: Multinomial Responses 488  
        12.3.7 Dealing with Missing Data 489  
     12.4 Transitional Models: Markov Chain and Time Series Models 491  
        12.4.1 Markov Chains 491  
        12.4.2 Example: Changes in Evapotranspiration Rates 492  
        12.4.3 Transitional Models with Explanatory Variables 493  
        12.4.4 Example: Child's Respiratory Illness and Maternal Smoking 494  
        12.4.5 Example: Initial Response in Matched Pair as a Covariate 495  
        12.4.6 Transitional Models and Loglinear Conditional Models 496  
     Notes 496  
     Exercises 497  
  13 Clustered Categorical Data: Random Effects Models 507  
     13.1 Random Effects Modeling of Clustered Categorical Data 507  
        13.1.1 Generalized Linear Mixed Model 508  
        13.1.2 Logistic GLMM with Random Intercept for Binary Matched Pairs 509  
        13.1.3 Example: Changes in Presidential Voting Revisited 510  
        13.1.4 Extension: Rasch Model and Item Response Models 510  
        13.1.5 Random Effects Versus Conditional ML Approaches 511  
     13.2 Binary Responses: Logistic-Normal Model 512  
        13.2.1 Shared Random Effect Implies Nonnegative Marginal Correlations 512  
        13.2.2 Interpreting Heterogeneity in Logistic-Normal Models 512  
        13.2.3 Connections Between Random Effects Models and Marginal Models 513  
        13.2.4 Comments About GLMMs Versus Marginal Models 515  
     13.3 Examples of Random Effects Models for Binary Data 516  
        13.3.1 Example: Small-Area Estimation of Binomial Proportions 516  
        13.3.2 Modeling Repeated Binary Responses: Attitudes About Abortion 518  
        13.3.3 Example: Longitudinal Mental Depression Study Revisited 520  
        13.3.4 Example: Capture–Recapture Prediction of Population Size 521  
        13.3.5 Example: Heterogeneity Among Multicenter Clinical Trials 523  
        13.3.6 Meta-analysis Using a Random Effects Approach 525  
        13.3.7 Alternative Formulations of Random Effects Models 525  
        13.3.8 Example: Matched Pairs with a Bivariate Binary Response 526  
        13.3.9 Time Series Models Using Autocorrelated Random Effects 527  
        13.3.10 Example: Oxford and Cambridge Annual Boat Race 528  
     13.4 Random Effects Models for Multinomial Data 529  
        13.4.1 Cumulative Logit Model with Random Intercept 529  
        13.4.2 Example: Insomnia Study Revisited 529  
        13.4.3 Example: Combining Measures on Ordinal Items 530  
        13.4.4 Example: Cluster Sampling 531  
        13.4.5 Baseline-Category Logit Models with Random Effects 532  
        13.4.6 Example: Effectiveness of Housing Program 532  
     13.5 Multilevel Modeling 533  
        13.5.1 Hierarchical Random Terms: Partitioning Variability 534  
        13.5.2 Example: Children's Care for an Unmarried Mother 534  
     13.6 GLMM Fitting, Inference, and Prediction 537  
        13.6.1 Marginal Likelihood and Maximum Likelihood Fitting 537  
        13.6.2 Gauss–Hermite Quadrature Methods for ML Fitting 538  
        13.6.3 Monte Carlo and EM Methods for ML Fitting 538  
        13.6.4 Laplace and Penalized Quasi-likelihood Approximations to ML 539  
        13.6.5 Inference for GLMM Parameters 540  
        13.6.6 Prediction Using Random Effects 540  
     13.7 Bayesian Multivariate Categorical Modeling 541  
        13.7.1 Marginal Homogeneity Analyses for Matched Pairs 541  
        13.7.2 Bayesian Approaches to Meta-analysis and Multicenter Trials 541  
        13.7.3 Example: Bayesian Analyses for a Multicenter Trial 542  
        13.7.4 Bayesian GLMMs and Marginal Models 542  
     Notes 543  
     Exercises 545  
  14 Other Mixture Models for Discrete Data 553  
     14.1 Latent Class Models 553  
        14.1.1 Independence Given a Latent Categorical Variable 554  
        14.1.2 Fitting Latent Class Models 555  
        14.1.3 Example: Latent Class Model for Rater Agreement 556  
        14.1.4 Example: Latent Class Models for Capture-Recapture 558  
        14.1.5 Example: Latent Class Transitional Models 559  
     14.2 Nonparametric Random Effects Models 560  
        14.2.1 Logistic Models with Unspecified Random Effects Distribution 560  
        14.2.2 Example: Attitudes About Legalized Abortion 560  
        14.2.3 Example: Nonparametric Mixing of Logistic Regressions 561  
        14.2.4 Is Misspecification of Random Effects a Serious Problem? 561  
        14.2.5 Rasch Mixture Model 563  
        14.2.6 Example: Modeling Rater Agreement Revisited 563  
        14.2.7 Nonparametric Mixtures and Quasi-symmetry 564  
        14.2.8 Example: Attitudes About Legalized Abortion Revisited 565  
     14.3 Beta-Binomial Models 566  
        14.3.1 Beta-Binomial Distribution 566  
        14.3.2 Models Using the Beta-Binomial Distribution 567  
        14.3.3 Quasi-likelihood with Beta-Binomial Type Variance 567  
        14.3.4 Example: Teratology Overdispersion Revisited 568  
        14.3.5 Conjugate Mixture Models 570  
     14.4 Negative Binomial Regression 570  
        14.4.1 Gamma Mixture of Poissons Is Negative Binomial 571  
        14.4.2 Negative Binomial Regression Modeling 571  
        14.4.3 Example: Frequency of Knowing Homicide Victims 572  
     14.5 Poisson Regression with Random Effects 573  
        14.5.1 A Poisson GLMM 574  
        14.5.2 Marginal Model Implied by Poisson GLMM 574  
        14.5.3 Example: Homicide Victim Frequency Revisited 575  
        14.5.4 Negative Binomial Models versus Poisson GLMMs 575  
     Notes 575  
     Exercises 576  
  15 Non-Model-Based Classification and Clustering 583  
     15.1 Classification: Linear Discriminant Analysis 583  
        15.1.1 Classification with Normally Distributed Predictors 584  
        15.1.2 Example: Horseshoe Crab Satellites Revisited 585  
        15.1.3 Multicategory Classification and Other Versions of Discriminant Analysis 586  
        15.1.4 Classification Methods for High Dimensions 587  
        15.1.5 Discriminant Analysis Versus Logistic Regression 587  
     15.2 Classification: Tree-Structured Prediction 588  
        15.2.1 Classification Trees 588  
        15.2.2 Example: Classification Tree for a Health Care Application 589  
        15.2.3 How Does the Classification Tree Grow? 590  
        15.2.4 Pruning a Tree and Checking Prediction Accuracy 591  
        15.2.5 Classification Trees Versus Logistic Regression 592  
        15.2.6 Support Vector Machines for Classification 593  
     15.3 Cluster Analysis for Categorical Data 594  
        15.3.1 Supervised Versus Unsupervised Learning 595  
        15.3.2 Measuring Dissimilarity Between Observations 595  
        15.3.3 Clustering Algorithms: Partitions and Hierarchies 596  
        15.3.4 Example: Clustering States on Election Results 597  
     Notes 599  
     Exercises 600  
  16 Large- and Small-Sample Theory for Multinomial Models 605  
     16.1 Delta Method 605  
        16.1.1 O, o Rates of Convergence 606  
        16.1.2 Delta Method for a Function of a Random Variable 606  
        16.1.3 Delta Method for a Function of a Random Vector 607  
        16.1.4 Asymptotic Normality of Functions of Multinomial Counts 608  
        16.1.5 Delta Method for a Vector Function of a Random Vector 609  
        16.1.6 Joint Asymptotic Normality of Log Odds Ratios 609  
     16.2 Asymptotic Distributions of Estimators of Model Parameters and Cell Probabilities 610  
        16.2.1 Asymptotic Distribution of Model Parameter Estimator 610  
        16.2.2 Asymptotic Distribution of Cell Probability Estimators 611  
        16.2.3 Model Smoothing Is Beneficial 612  
     16.3 Asymptotic Distributions of Residuals and Goodness-of-fit Statistics 612  
        16.3.1 Joint Asymptotic Normality of p and π̂ 612  
        16.3.2 Asymptotic Distribution of Pearson and Standardized Residuals 613  
        16.3.3 Asymptotic Distribution of Pearson X² Statistic 614  
        16.3.4 Asymptotic Distribution of Likelihood-Ratio Statistic 615  
        16.3.5 Asymptotic Noncentral Distributions 616  
     16.4 Asymptotic Distributions for Logit/Loglinear Models 617  
        16.4.1 Asymptotic Covariance Matrices 617  
        16.4.2 Connection with Poisson Loglinear Models 618  
     16.5 Small-Sample Significance Tests for Contingency Tables 619  
        16.5.1 Exact Conditional Distribution for I × J Tables Under Independence 619  
        16.5.2 Exact Tests of Independence for I × J Tables 620  
        16.5.3 Example: Sexual Orientation and Party ID 620  
     16.6 Small-Sample Confidence Intervals for Categorical Data 621  
        16.6.1 Small-Sample CIs for a Binomial Parameter 621  
        16.6.2 CIs Based on Tests Using the Mid P-Value 623  
        16.6.3 Example: Proportion of Vegetarians Revisited 623  
        16.6.4 Small-Sample CIs for Odds Ratios 624  
        16.6.5 Example: Fisher's Tea Taster Revisited 625  
        16.6.6 Small-Sample CIs for Logistic Regression Parameters 625  
        16.6.7 Example: Diarrhea and an Antibiotic 626  
        16.6.8 Unconditional Small-Sample CIs for Difference of Proportions 627  
     16.7 Alternative Estimation Theory for Parametric Models 628  
        16.7.1 Weighted Least Squares for Categorical Data 628  
        16.7.2 Inference Using the WLS Approach to Model Fitting 629  
        16.7.3 Scope of WLS Versus ML Estimation 630  
        16.7.4 Minimum Chi-Squared Estimators 631  
        16.7.5 Minimum Discrimination Information 632  
     Notes 633  
     Exercises 634  
  17 Historical Tour of Categorical Data Analysis 641  
     17.1 Pearson-Yule Association Controversy 641  
     17.2 R. A. Fisher's Contributions 643  
     17.3 Logistic Regression 645  
     17.4 Multiway Contingency Tables and Loglinear Models 647  
     17.5 Bayesian Methods for Categorical Data 651  
     17.6 A Look Forward, and Backward 652  
  Appendix A Statistical Software for Categorical Data Analysis 655  
  Appendix B Chi-Squared Distribution Values 659  
  References 661  
  Author Index 707  
  Example Index 719  
  Subject Index 723  

