DISCOVERING STATISTICS
USING R
ANDY FIELD | JEREMY MILES | ZOE FIELD
SAGE
Los Angeles | London | New Delhi
Singapore | Washington DC

© Andy Field, Jeremy Miles and Zoe Field 2012
First published 2012
Reprinted 2012
Apart from any fair dealing for the purposes of research or
private study, or criticism or review, as permitted under the
Copyright, Designs and Patents Act, 1988, this publication
may be reproduced, stored or transmitted in any form, or by
any means, only with the prior permission in writing of the
publishers, or in the case of reprographic reproduction, in
accordance with the terms of licences issued by the Copyright
Licensing Agency. Enquiries concerning reproduction outside
those terms should be sent to the publishers.
SAGE Publications Ltd
1 Oliver's Yard
55 City Road
London EC1Y 1SP
SAGE Publications Inc.
2455 Teller Road
Thousand Oaks, California 91320
SAGE Publications India Pvt Ltd
B 1/I 1 Mohan Cooperative Industrial Area
Mathura Road
New Delhi XXXXXXXXXX
SAGE Publications Asia-Pacific Pte Ltd
3 Church Street
#10-04 Samsung Hub
Singapore 049483
Library of Congress Control Number: Available
British Library Cataloguing in Publication data
A catalogue record for this book is available from the British Library
ISBN XXXXXXXXXX
ISBN XXXXXXXXXX (pbk)
Typeset by C&M Digitals (P) Ltd, Chennai, India
Printed and bound in Great Britain by the MPG Books Group
Printed on paper from sustainable resources
CONTENTS
Preface
How to use this book
Acknowledgements
Dedication
Symbols used in this book
Some maths revision
1 Why is my evil lecturer forcing me to learn statistics?
1.1. What will this chapter tell me?
1.2. What the hell am I doing here? I don't belong here
1.3. Initial observation: finding something that needs explaining
1.4. Generating theories and testing them
1.5. Data collection 1: what to measure
1.5.1. Variables
1.5.2. Measurement error
1.5.3. Validity and reliability
1.6. Data collection 2: how to measure
1.6.1. Correlational research methods
1.6.2. Experimental research methods
1.6.3. Randomization
1.7. Analysing data
1.7.1. Frequency distributions
1.7.2. The centre of a distribution
1.7.3. The dispersion in a distribution
1.7.4. Using a frequency distribution to go beyond the data
1.7.5. Fitting statistical models to the data
What have I discovered about statistics?
Key terms that I've discovered
Smart Alex's tasks
Further reading
Interesting real research
2 Everything you ever wanted to know about statistics (well, sort of)
2.1. What will this chapter tell me?
2.2. Building statistical models
2.3. Populations and samples
2.4. Simple statistical models
2.4.1. The mean: a very simple statistical model
2.4.2. Assessing the fit of the mean: sums of squares, variance and standard deviations
2.4.3. Expressing the mean as a model
2.5. Going beyond the data
2.5.1. The standard error
2.5.2. Confidence intervals
2.6. Using statistical models to test research questions
2.6.1. Test statistics
2.6.2. One- and two-tailed tests
2.6.3. Type I and Type II errors
2.6.4. Effect sizes
2.6.5. Statistical power
What have I discovered about statistics?
Key terms that I've discovered
Smart Alex's tasks
Further reading
Interesting real research
3 The R environment
3.1. What will this chapter tell me?
3.2. Before you start
3.2.1. The R-chitecture
3.2.2. Pros and cons of R
3.2.3. Downloading and installing R
3.2.4. Versions of R
3.3. Getting started
3.3.1. The main windows in R
3.3.2. Menus in R
3.4. Using R
3.4.1. Commands, objects and functions
3.4.2. Using scripts
3.4.3. The R workspace
3.4.4. Setting a working directory
3.4.5. Installing packages
3.4.6. Getting help
3.5. Getting data into R
3.5.1. Creating variables
3.5.2. Creating dataframes
3.5.3. Calculating new variables from existing ones
3.5.4. Organizing your data
3.5.5. Missing values
3.6. Entering data with R Commander
3.6.1. Creating variables and entering data with R Commander
3.6.2. Creating coding variables with R Commander
3.7. Using other software to enter and edit data
3.7.1. Importing data
3.7.2. Importing SPSS data files directly
3.7.3. Importing data with R Commander
3.7.4. Things that can go wrong
3.8. Saving data
3.9. Manipulating data
3.9.1. Selecting parts of a dataframe
3.9.2. Selecting data with the subset() function
3.9.3. Dataframes and matrices
3.9.4. Reshaping data
What have I discovered about statistics?
R packages used in this chapter
R functions used in this chapter
Key terms that I've discovered
Smart Alex's tasks
Further reading
4 Exploring data with graphs
4.1. What will this chapter tell me?
4.2. The art of presenting data
4.2.1. Why do we need graphs
4.2.2. What makes a good graph?
4.2.3. Lies, damned lies, and ... erm ... graphs
4.3. Packages used in this chapter
4.4. Introducing ggplot2
4.4.1. The anatomy of a plot
4.4.2. Geometric objects (geoms)
4.4.3. Aesthetics
4.4.4. The anatomy of the ggplot() function
4.4.5. Stats and geoms
4.4.6. Avoiding overplotting
4.4.7. Saving graphs
4.4.8. Putting it all together: a quick tutorial
4.5. Graphing relationships: the scatterplot
4.5.1. Simple scatterplot
4.5.2. Adding a funky line
4.5.3. Grouped scatterplot
4.6. Histograms: a good way to spot obvious problems
4.7. Boxplots (box-whisker diagrams)
4.8. Density plots
4.9. Graphing means
4.9.1. Bar charts and error bars
4.9.2. Line graphs
4.10. Themes and options
What have I discovered about statistics?
R packages used in this chapter
R functions used in this chapter
Key terms that I've discovered
Smart Alex's tasks
Further reading
Interesting real research
5 Exploring assumptions
5.1. What will this chapter tell me?
5.2. What are assumptions?
5.3. Assumptions of parametric data
5.4. Packages used in this chapter
5.5. The assumption of normality
5.5.1. Oh no, it's that pesky frequency distribution again: checking normality visually
5.5.2. Quantifying normality with numbers
5.5.3. Exploring groups of data
5.6. Testing whether a distribution is normal
5.6.1. Doing the Shapiro-Wilk test in R
5.6.2. Reporting the Shapiro-Wilk test
5.7. Testing for homogeneity of variance
5.7.1. Levene's test
5.7.2. Reporting Levene's test
5.7.3. Hartley's Fmax: the variance ratio
5.8. Correcting problems in the data
5.8.1. Dealing with outliers
5.8.2. Dealing with non-normality and unequal variances
5.8.3. Transforming the data using R
5.8.4. When it all goes horribly wrong
What have I discovered about statistics?
R packages used in this chapter
R