Description
When Statistics at Square One was first published in 1976 the type of statistics seen in the medical literature was relatively simple: means and medians, t-tests and Chi-squared tests. Carrying out complicated analyses then required arcane skills in calculation and computers, and was restricted to a minority who had undergone considerable training in data analysis. Since then statistical methodology has advanced considerably and, more recently, statistical software has become available to enable research workers to carry out complex analyses with little effort. It is now commonplace to see advanced statistical methods used in medical research, but often the training received by the practitioners has been restricted to a cursory reading of a software manual. I have this nightmare of investigators actually learning statistics by reading a computer package manual. This means that much statistical methodology is used rather uncritically, and the data to check whether the methods are valid are often not provided when the investigators write up their results.
This book is intended to build on Statistics at Square One.¹ It is meant as a “vade mecum” for investigators who have undergone a basic statistics course, to extend and explain what is found in the statistical package manuals and to help in the presentation and reading of the literature. It is also intended for readers and users of the medical literature, but aims to be rather more than a simple “bluffer’s guide”. I hope it will encourage the user to seek professional help when necessary. Each chapter includes an important section of tips on reporting a particular technique, and the book emphasises correct interpretation of results in the literature.
Since most researchers do not want to become statisticians, detailed explanations of the methodology will be avoided. I hope the book will prove useful to students on postgraduate courses, and for this reason there are a number of exercises.
The choice of topics reflects what I feel are commonly encountered in the medical literature, based on many years of statistical refereeing. The linking theme is regression models, and we cover multiple regression, logistic regression, Cox regression, ordinal regression and Poisson regression. The predominant philosophy is frequentist, since this reflects the literature and what is available in most packages. However, a section on the uses of Bayesian methods is given. Probably the most important contribution of statistics to medical research is in the design of studies. I make no apology for an absence of direct design issues here, partly because I think an investigator should consult a specialist to design a study and partly because there are a number of books available.²⁻⁵ Most of the concepts in statistical inference have been covered in Statistics at Square One. In order to keep this book short, reference will be made to the earlier book for basic concepts. All the analyses described here have been conducted in STATA 8.⁶ However, most, if not all, can also be carried out using common statistical packages, such as SPSS, SAS, StatsDirect or S-PLUS.
While updating this book for the second edition, I have been motivated by two inclusion criteria: (i) techniques that are not included in elementary books but have widespread use, particularly as used in the British Medical Journal, the New England Journal of Medicine and other leading medical journals, and (ii) topics mentioned in the syllabus for the Part 1 Examinations of the Faculty of Public Health Medicine in the UK. I now have a section on what are known as robust standard errors, since they seem to me to be very useful, and are not widely appreciated at an elementary level. The most common use of random effects models would appear to be meta-analysis and so this is covered, including a description of forest and funnel plots. I have expanded the section on model building, to make it clearer how models are developed. Simpson’s paradox is discussed under logistic regression. Recent developments in Poisson regression have appeared useful to me and so are included in the final chapter. All practical statisticians have to deal with missing data, hence I have discussed these and I have also added a Glossary.
I am also aware that most readers will want to use the book to help them interpret the literature and therefore I have removed the multiple-choice questions and replaced them with questions based on interpreting genuine papers.
I am grateful to Stephen Walters, Steven Julious and Jenny Freeman for support and comments, and to readers who contacted me, for making useful suggestions and removing some of the errors and ambiguities, and to David Machin and Ben Armstrong for their detailed comments on the manuscript for the first edition. Any remaining errors are my own.
Michael J. Campbell
Sheffield, 2006