Introduction to Statistics for Forensic Scientists

Contents

Preface
List of figures
List of tables

1 A short history of statistics in the law
1.1 History
1.2 Some recent uses of statistics in forensic science
1.3 What is probability?

2 Data types, location and dispersion
2.1 Types of data
2.2 Populations and samples
2.3 Distributions
2.4 Location
2.5 Dispersion
2.6 Hierarchies of variation

3 Probability
3.1 Aleatory probability
One throw of a six-sided die
A single throw with more than one outcome of interest
Two six-sided dice
3.2 Binomial probability
3.3 Poisson probability
3.4 Empirical probability
Modelled empirical probabilities
Truly empirical probabilities

4 The normal distribution
4.1 The normal distribution
4.2 Standard deviation and standard error of the mean
4.3 Percentage points of the normal distribution
4.4 The t-distribution and the standard error of the mean
4.5 t-testing between two independent samples
4.6 Testing between paired observations
4.7 Confidence, significance and p-values

5 Measures of nominal and ordinal association
5.1 Association between discrete variables
5.2 χ² test for a 2 × 2 table
5.3 Yule's Q
5.4 χ² tests for greater than 2 × 2 tables
5.5 φ² and Cramér's V²
5.6 The limitations of χ² testing
5.7 Interpretation and conclusions

6 Correlation
6.1 Significance tests for correlation coefficients
6.2 Correlation coefficients for non-linear data
6.3 The coefficient of determination
6.4 Partial correlation
6.5 Partial correlation controlling for two or more covariates

7 Regression and calibration
7.1 Linear models
7.2 Calculation of a linear regression model
7.3 Testing ‘goodness of fit’
7.4 Testing coefficients a and b
7.5 Residuals
7.6 Calibration
A linear calibration model
Calculation of a confidence interval for a point
7.7 Points to remember

8 Evidence evaluation
8.1 Verbal statements of evidential value
8.2 Evidence types
8.3 The value of evidence
8.4 Significance testing and evidence evaluation

9 Conditional probability and Bayes’ theorem
9.1 Conditional probability
9.2 Bayes’ theorem
9.3 The value of evidence

10 Relevance and the formulation of propositions
10.1 Relevance
10.2 Hierarchy of propositions
10.3 Likelihood ratios and relevance
10.4 The logic of relevance
10.5 The formulation of propositions
10.6 What kind of propositions can we not evaluate?

11 Evaluation of evidence in practice
11.1 Which database to use
Type and geographic factors
DNA and database selection
11.2 Verbal equivalence of the likelihood ratio
11.3 Some common criticisms of statistical approaches

12 Evidence evaluation examples
12.1 Blood group frequencies
12.2 Trouser fibres
12.3 Shoe types
12.4 Airweapon projectiles
12.5 Height description from eyewitness

13 Errors in interpretation
13.1 Statistically based errors of interpretation
Transposed conditional
Defender’s fallacy
Another match error
Numerical conversion error
13.2 Methodological errors of interpretation
Different level error
Defendant’s database fallacy
Independence assumption

14 DNA I
14.1 Loci and alleles
14.2 Simple case genotypic frequencies
14.3 Hardy-Weinberg equilibrium
14.4 Simple case allelic frequencies
14.5 Accounting for sub-populations

15 DNA II
15.1 Paternity – mother and father unrelated
15.2 Database searches and value of evidence
15.3 Discussion

16 Sampling and sample size estimation
16.1 Estimation of a mean
16.2 Sample sizes for t-tests
Two sample t-test
One sample t-test
16.3 How many drugs to sample
16.4 Concluding comments

17 Epilogue
17.1 Graphical models and Bayesian Networks
Graphical models
Bayesian networks
17.2 Kernel density estimation
17.3 Multivariate continuous matching

Appendices
A Worked solutions to questions
B Percentage points of the standard normal distribution
C Percentage points of t-distributions
D Percentage points of χ²-distributions
E Percentage points of beta-beta distributions
F Percentage points of F-distributions
G Calculating partial correlations using Excel software
H Further algebra using the “third law”

References
Index