International Statistical Review (2015), 83, 1, 160–174 doi:10.1111/insr.12095
Book Reviews
Editor: Ananda Sen
Bayesian Phylogenetics: Methods, Algorithms and Applications
Edited by Ming-Hui Chen, Lynn Kuo, and Paul O. Lewis
Chapman & Hall/CRC, 2014, xxx + 365 pages, £63.99/$99.95, hardcover
ISBN: 978-1-4665-0079-2
Readership: Quantitatively oriented evolutionary biologists, statistics graduate and advanced
undergraduate students, as well as researchers in the field.
Phylogenetics is concerned with the estimation of evolutionary relationships among biological organisms, typically using molecular sequence data from living organisms, but sometimes also morphological characteristics from both living organisms and fossils. In many situations, the
primary biological questions of interest are encapsulated in the estimation of a tree (part of a
great Tree of Life) as a graphical structure that represents the past evolutionary history of speciation and descent of a group of organisms. Sometimes the biological questions are not directly
related to the relationships among species, but an estimated phylogeny is critical to account for
dependence due to shared history from common ancestors in data measured across different
species. The history of phylogenetic methods traces back to the middle of the previous century.
Statistical methods based on likelihood models for molecular sequence evolution originated
in the early 1980s, and Bayesian approaches began in the mid-1990s, exploding in popularity
throughout the 2000s following the release of powerful software packages specifically designed
to carry out Bayesian phylogenetic analyses. There are many previous books on the topic of
phylogenetics; some take a statistical point of view, and some devote a chapter or so to the Bayesian approach. However, Bayesian Phylogenetics has the distinction of being
the first published text devoted solely to the Bayesian approach to phylogenetics.
The editors describe Bayesian Phylogenetics as providing a ‘snapshot of current trends in
Bayesian phylogenetic research’ with a ‘heavy emphasis on model selection’ and other important themes including new approaches to improve MCMC mixing, divergence time estimation,
the movement towards more biologically realistic models and the interface with population
genetics. This is a fairly accurate portrayal of the book, which highlights both its strengths and
weaknesses. The included chapters are generally well-written and informative about their topics, all of which are current, but Bayesian Phylogenetics has gaping holes of omission with
regard to an informative introduction to the field and a broader survey of methods and topics
that are outside the direct interests of the editors.
The first chapter provides an introduction to the contents of the book but fails to provide
an overview of phylogenetics or the Bayesian approach to phylogenetics. A statistician who
seeks an introduction to the field without previous exposure to phylogenetics will need to look
elsewhere before finding this book to be useful. The closest thing to an introduction is, curiously, located in Chapter 7. This chapter contains a nice history of the earliest development of Bayesian phylogenetic methods and brief descriptions of tree representations and
models of sequence evolution. An expansion of this part of the chapter as a separate chapter that introduced both the foundations and the history of the development of methods, as
well as placement of this introduction at the beginning of the book (right after the short overview), would have improved the usefulness of the book immensely. Unfortunately, the remainder of
Chapter 7 elects to compare a relatively new sequential stochastic approximation Monte Carlo
approach to early conventional software implementations that have either been in stasis for more than a decade or have since been replaced by versions far newer than the 2001 version used in the comparison.
Chapter 2 contains a nice summary of recent literature results that demonstrate unrecognised effects on branch length and divergence time estimation from what are supposedly only
vaguely informative prior distributions. Chapters 3–6 extend for 100 pages and provide a wonderful description of many modern computational methods for estimating marginal likelihoods
in a phylogenetic context. Readers interested in marginal likelihood calculations, even in other
contexts, will find a wealth of helpful information. Readers interested in the broad array of
current research in Bayesian phylogenetics will be disappointed that more than a quarter of
the total content of the book is so narrowly focused. Chapter 8 describes a sequential Monte
Carlo approach that can also be of interest beyond phylogenetics. Chapters 9–13 each outline
an application of Bayesian inference for specific phylogenetic questions.
Bayesian Phylogenetics makes no claim to being comprehensive, but the list of excluded topics is lengthy and includes areas under intensive, active research by multiple groups. The omission of these topics lessens the impact that the book might otherwise have
had. Missing topics include: co-speciation and coevolution, ancestral state estimation, detection of positive selection, joint estimation of phylogeny and alignment, gene-tree/species-tree
estimation, species networks, estimation of the rate of speciation, trait evolution and the effects
of protein structure on models of molecular evolution.
Despite its weaknesses, Bayesian Phylogenetics will be a useful resource for many
researchers in the field and for statisticians interested in joining the game. For this latter group, Felsenstein’s Inferring Phylogenies (2004) is a better place to get started for an
overview of statistical phylogenetics, albeit from a markedly non-Bayesian point of view,
before trying to digest the material here. Furthermore, the primary scientific literature will
continue to be the only available resource for many important and interesting topics in
Bayesian phylogenetics.
Bret Larget: brlarget@wisc.edu
Departments of Statistics and of Botany
1300 University Avenue, University of Wisconsin, Madison, WI 53706, USA
References
Felsenstein, J. (2004). Inferring Phylogenies. Sunderland, MA, USA: Sinauer Associates, Inc.
Biostatistics Decoded
A. Gouveia Oliveira
Wiley, 2013, x + 346 pages, £40.00/€50.10, paperback
ISBN: 978-1-119-95337-1
Readership: Biostatisticians, clinical researchers, health professionals, biostatistics and statistics graduate students.
As stated in the preface, the purpose of Biostatistics Decoded is to present statistical theory
and biostatistical methods and applications through a different approach. The four claims made for this approach are discussed below, along with an assessment of how well each is met.
Claim 1: Integrates topics usually covered in separate books (sampling, study design,
statistical methods)
This integration begins with the introduction in Chapter 1 as it gets directly at key terms of
population, study design, sampling and statistical inference with the ‘BIG IDEA’ approach.
There is good emphasis up front on the need for a clear definition of the population and the need for a representative sample.
This book is organised in short chapters, providing the reader with smaller, digestible pieces of content. After the introduction, the chapters on basic concepts and statistical inference can be reviewed as needed; the reader can then be more selective in exploring the sampling methods, study design and statistical methods according to the area of application of interest.
Claim 2: Strictly non-mathematical approach
A cursory flip through the pages does show a few formulas. If you wish to lessen the math, you might avoid some of the shaded grey boxes that appear throughout, which go into more detailed discussion that is often not needed to grasp the bigger picture. One example is Section 2.8, which provides a detailed justification for the use of the n − 1 divisor in the sample standard deviation formula. This book also includes statistical tables and explains how to use them to compute probabilities and p-values. If the focus is not on the mathematics, a statistical computing package could instead be used to produce the test statistic and p-value, with the emphasis placed on understanding which methods to apply, what the results are measuring and how to interpret them in context.
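To illustrate this point (a generic sketch with hypothetical data, not an example from the book), a single call in R returns the test statistic, the p-value and a confidence interval, and the built-in sample standard deviation already uses the n − 1 divisor:

# Hypothetical systolic blood pressure measurements for 10 patients
bp <- c(128, 135, 121, 140, 133, 126, 138, 130, 144, 127)
sd(bp)                # sample standard deviation; R uses the n - 1 divisor
t.test(bp, mu = 125)  # one-sample t-test: statistic, df, p-value and 95% CI in one call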
Chapters 2 and 3 do cover many key introductory statistical topics and do avoid nearly all
mathematical expressions, but both could be enhanced with some nice visual pictures to better
convey the descriptions, and both could also use more examples with a clinical context (versus
generic variables A and B). In general, the book should promote graphing data first, before
‘compressing’ data into just a few numbers.
Claim 3: Focus on topics for clinical researchers (simple and advanced)
Indeed, the later chapters do provide more contextual examples and overviews to better understand the various studies, designs, tests and issues that arise in the medical field. The focus is on breadth rather than theoretical depth, covering everything from simple inference on means and proportions to logistic regression, factor analysis and meta-analysis, and from sample size estimation to measurement issues, adaptive clinical trials and secondary analyses.
Some inappropriate usage of data exists that can be misleading, especially to the clinical
researcher. For example, in Section 2.6, the variable patient chart number was treated as a numerical (quantitative) variable, a histogram was made and summary measures (median, etc.) were computed, none of which is appropriate. Using a medical variable (e.g. pulse rate) would be a better way to show the main idea of this section.
The most disconcerting inaccuracy lies in the illustration of the interpretation of a confidence interval. On p. 44, the following statement appears:
If the sample mean is 210, then P[210 − 2(std errors) < μ < 210 + 2(std errors)] = 0.95.
This is clearly incorrect as there is nothing random in this probability statement (the probability
is 0 or 1). Throughout, the book continues to use the probability idea even when using the
estimated standard error in making a confidence statement.
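The distinction can be made explicit (a standard textbook statement, not a quotation from the book): before the data are observed, the interval is random and the coverage statement refers to the procedure; once the sample mean is observed, the interval is fixed and either contains the parameter or does not:
\[
P\!\left(\bar{X} - 2\,\mathrm{SE} < \mu < \bar{X} + 2\,\mathrm{SE}\right) \approx 0.95 \quad \text{(a pre-data statement about the random interval),}
\]
whereas, after observing \(\bar{x} = 210\), the event \(\{210 - 2\,\mathrm{SE} < \mu < 210 + 2\,\mathrm{SE}\}\) involves no randomness and has probability either 0 or 1.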
Claim 4: Theory is built on basic concepts so the reader can understand the conditions
required and limitations
This claim is achieved to some extent, but the accuracy of the presentation is sometimes hampered by oversimplification. A case in point appears on p. 42, where a couple of bullets state the following:
Sample means have a normal distribution, regardless of the distribution of the attribute, but on the condition that THEY are large (THEY refers to the sample size, not the sample means, which is unclear).
Small samples have a normal distribution only if the attribute has a normal distribution (the author meant to state this for sample means).
This book starts by presenting basic concepts, such as the properties of means and variances,
the properties of the normal distribution and the central limit theorem as building blocks, but
many of the building-block pieces are presented generically, without context, with few visuals and with some statistical inaccuracies. However, because of its breadth, this book can be a great
reference and contains very important pieces that are not always well emphasised or addressed
in other texts.
Brenda Gunderson: bkg@umich.edu
Department of Statistics, University of Michigan
1085 South University, Ann Arbor, MI 48109-1107, USA
Nonparametric Statistics: A Step-by-Step Approach
Gregory W. Corder and Dale I. Foreman
Wiley, 2014, xiv + 267 pages, £63.50/€76.20, hardcover
ISBN: 978-1-118-84031-3
Readership: Students of statistics and researchers in other disciplines, especially in social sciences.
The nine chapters in the book cover the basic introductory topics in non-parametric statistics. The focus is on comparing two or more populations measured on nominal/ordinal as
well as interval/ratio scales. The final chapter discusses a test for randomness based on the
runs statistic. The authors start off by defining their 'parameters' as the assumptions of traditional introductory statistical methods. This unusual use of language tripped me up from the very beginning, and I never really regained my footing. Many other loose uses of language remain from Corder & Foreman (2010), for instance in the distinctions between data and variables, samples and observations, and critical values and p-values. Columns of squared,
cubed and fourth power deviations for calculating skewness and kurtosis in Chapter 2 are followed by hand calculations for many of the non-parametric methods. While I do think there
is some value in at least discussing the hand calculations to provide insight into the working of test statistics, it is unlikely that this is how most students will be tackling their own
datasets.
An eight-step procedure is used for each test, typically comparing the test statistic to a tabulated critical value. The sections on uses of each test in the literature are helpful, with references generally from the 1990s and 2000s.
A handful of drill-style questions accompany each chapter. Solutions follow immediately.
The authors present plenty of SPSS 21 output, but they are silent on the issue of how to get
confidence intervals even though they are recommended by the APA.
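For readers who want interval estimates to accompany these tests, they are readily available outside SPSS; for example (an illustrative R sketch with made-up data, not material from the book), the Wilcoxon rank-sum test returns a Hodges–Lehmann confidence interval on request:

# Hypothetical scores for two independent groups (not data from the book)
g1 <- c(12, 15, 9, 20, 17, 14)
g2 <- c(8, 11, 13, 7, 10, 9)
# Rank-sum test with a 95% Hodges-Lehmann confidence interval for the location shift
wilcox.test(g1, g2, conf.int = TRUE, conf.level = 0.95)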
The cover says that this is the second edition (2014). This journal carried my review
(Richardson, 2010) of Corder & Foreman (2010). The book has gained 20 pages, due largely
to the addition of two tests. Appendices cover SPSS basics and 10 tables of critical values. The
preface refers to web-based tools including a decision tree in the form of a Prezi that links
to YouTube videos of SPSS 21 use. Unless you are desperate for the extra tests or this online
material, you do not need to buy the second edition if you already own the first.
Alice M. Richardson: Alice.Richardson@canberra.edu.au
Faculty of Education, Science, Technology & Mathematics
University of Canberra, ACT 2601, Australia
References
Corder, G.W. & Foreman, D.I. (2010). Nonparametric Statistics for Non-statisticians: A Step-by-Step Approach.
Hoboken, NJ: Wiley.
Richardson, A.M. (2010). Review of "Nonparametric Statistics for Non-Statisticians". Int. Stat. Rev., 78, 451–452.
Circular Statistics in R
Arthur Pewsey, Markus Neuhäuser, and Graeme D. Ruxton
Oxford University Press, 2013, 208 pages, $49.95, paperback
ISBN: 978-0-19-967113-7
Readership: Graduate students, researchers and professionals across biological, social, physical, medical and earth sciences.
The book is an easy-to-access guide suitable for readers with varying levels of experience in circular statistics and R.
The book has eight chapters. It covers a wide range of circular statistics topics from graphics and summary statistics to circular distribution theory, and from inference to correlation
and regression, with an emphasis on fitting a broad range of models and hypothesis testing. It introduces R’s circular package, web-based R code and resources. It uses R not only
as a data analysis tool but also as a learning tool. It is not a textbook; however, it features many examples and case studies with extensive R illustrations and graphics. It also includes an appendix, which briefly comments on, and suggests additional reading from, the only six other books available on the subject. The book has an accompanying webpage, http://circstatinr.st-andrews.ac.uk/, which provides access to resources, including text files with the R code and datasets discussed.
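To give a flavour of the kind of analysis the circular package supports (a minimal sketch with invented data, not an example reproduced from the book):

library(circular)
# Hypothetical wind directions in degrees (illustrative only)
theta <- circular(c(35, 45, 50, 300, 320, 340, 10, 20),
                  units = "degrees", template = "geographics")
mean(theta)                  # circular mean direction
rho.circular(theta)          # mean resultant length
rose.diag(theta, bins = 16)  # rose diagram of the directions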
Indeed, this book is a much-needed, up-to-date guide to the theory and practice of circular
statistics, for graduate students, researchers and professionals working in a wide range of scientific disciplines, and for applied statisticians.
Shuangzhe Liu: shuangzhe.liu@canberra.edu.au
Faculty of Education, Science, Technology & Mathematics
University of Canberra, ACT 2601, Australia
Displaying Time Series, Spatial and Space-Time Data with R
Oscar Perpinan Lamigueiro
CRC Press, 2014, vii + 200 pages, £49.99, hardback
ISBN: 978-1-4665-6520-3
Readership: Statistics undergraduate and graduate students, researchers in spatial statistics,
environmental and geo-sciences, and computer science students.
A well-known deficiency of R, a popular free software environment for statistical computing and graphics, is its limited graphical capabilities, especially when compared with MATLAB, its main commercial competitor. High-end graphical visualisation is particularly needed when working with space-time data. Thus, a researcher in this field will particularly benefit from this compact account of some elegant visualisation techniques implemented in R for time, space and space-time data.
The focus of the book is not on methodological data analysis but rather on purely graphical, descriptive presentation of the data. While no attempt has been made to present a complete set of graphical methods, the selected examples are very informative in demonstrating the visualisation capabilities of different R packages that have been developed and are currently available from CRAN. The author is knowledgeable on the subject and has the ability to present essentials without obscuring them with technical details. Descriptions of the methods are somewhat sketchy, which sometimes makes it difficult to fully appreciate the associated features. Colorful illustrations that nicely complement the descriptive aspects of the book help in the process of understanding.
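As a small indication of the style of display the book is concerned with (a sketch using the zoo and lattice packages with simulated data, not code taken from the book):

library(zoo)
library(lattice)
set.seed(1)
# Simulated daily series for one year (illustrative only)
dates <- as.Date("2014-01-01") + 0:364
z <- zoo(cbind(temperature = 15 + 10 * sin(2 * pi * (1:365) / 365) + rnorm(365),
               radiation   = 200 + 150 * sin(2 * pi * (1:365) / 365) + rnorm(365, sd = 20)),
         order.by = dates)
# Lattice display with one panel per variable, sharing the time axis
xyplot(z, type = "l", xlab = "Date")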
The book is organised into three chapters: Time Series, Spatial Data and Space-Time
Data. Each selects several visualisation techniques that are applied to examples of relevant real-life data. The use of real data makes the presentation more convincing. The book would, however, benefit from having a set of computer labs that would allow a reader to perform her or his own hands-on exploration of the presented computational tools. The text contains a substantial number of references to material that should help enhance knowledge of the selected applications and methods. The book's website and its repository freely provide the complete code needed. All datasets are available without restriction for public use, either from the repository or from other publicly
available websites.
To summarise, the book is a valuable source of graphical visualisation techniques in R that will be very much appreciated by anyone considering non-commercial alternatives to MATLAB for her or his work on spatial and/or temporal data.
Krzysztof Podgorski: Krzysztof.Podgorski@stat.lu.se
Department of Statistics
Lund University, Box 743, 220 07 Lund, Sweden
A Chronicle of Permutation Statistical Methods: 1920–2000, and Beyond
Kenneth J. Berry, Janis E. Johnston, and Paul W. Mielke Jr.
Springer, 2014, xix + 517 pages, €116.59, hardcover
ISBN: 978-3-319-02743-2
Readership: Anyone with an interest in the history of statistics and a good knowledge of
statistical methods.
The authors of this book state that the purpose is ‘to chronicle the birth and development
of permutation statistical methods over the approximately 80-year period from 1920 to 2000’
(p. viii). To this end, the authors have consulted an impressive number of articles, books and
other sources, with the reference list running over 56 pages and containing 1,498 entries. After a
brief introductory chapter that explains the basics of permutation statistical methods and gives
an overview of the contents of the remaining chapters, the main part of the book consists of
four chapters devoted to detailing the development of permutation statistical methods during
each of the approximately 20-year periods 1920–1939, 1940–1959, 1960–1979 and 1980–2000.
The final chapter gives a brief overview of the development of permutation statistical methods
beyond 2000.
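For readers unfamiliar with the basic idea, the essence of what the book chronicles can be sketched in a few lines of R (a generic two-sample permutation test on made-up data, not one of the book's historical examples):

set.seed(42)
x <- c(14.2, 15.1, 13.8, 16.0, 15.5)   # hypothetical treatment yields
y <- c(12.9, 13.4, 14.1, 12.5, 13.0)   # hypothetical control yields
obs <- mean(x) - mean(y)               # observed difference in means
pooled <- c(x, y)
# Monte Carlo approximation to the permutation distribution of the difference
perm <- replicate(9999, {
  idx <- sample(length(pooled), length(x))
  mean(pooled[idx]) - mean(pooled[-idx])
})
# Two-sided permutation p-value (counting the observed statistic itself)
(sum(abs(perm) >= abs(obs)) + 1) / (length(perm) + 1)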
The text details the important discoveries in the area of permutation-based statistical methods, the persons who made the discoveries, as well as the historical and social context in which
the discoveries were made. This is accompanied by fact boxes, mainly containing short biographies of individuals mentioned in the text. The permutation statistical methods are described
in such detail, including the necessary formulas, that it should often be possible for the reader to apply the methods directly to his or her own datasets. Many of the early important discoveries in the area of permutation statistical methods were made by people trained in, and primarily working in, disciplines other than statistics. This means that many of the presented methods also
had practical motivations and applications based on real world datasets that the researcher had
to handle. In particular, research in agriculture seems to have given rise to real-world problems that were solved by newly discovered permutation statistical methods. Some of these real-world datasets are also presented in the text, with the calculations given in detail. Because
the practical implementation of permutation statistical methods is very much dependent on
access to cheap high-speed computing, the development of computing is an integral part of the
text. Although this is often interesting and gives valuable insight into the practicalities of the
application of permutation statistical methods to real-world datasets, sometimes it goes too far outside the scope of the book's subject, such as when it provides fact boxes about Google and biographies of the Google founders Larry Page and Sergey Brin. One wonders what these have achieved in the area of permutation statistical methods.
Although this is a very impressive work, there are some minor flaws, such as repeated blocks of text, indicating that the book has not been carefully copy-edited. For example, on p. 12,
at the end of the text in one paragraph, it says: ‘Table 1.2 contains the 1831 per capita relief
expenditures, in shillings, for 36 parishes in two counties: Oxford and Hertford.’ Then, the
next paragraph on the same page starts: ‘The relief expenditure data from Oxford and Hertford
counties are listed in Table 1.2.’ Likewise, on p. 32, it says: ‘Geary investigated the ways that
cancer mortality rates varied with the consumption of potatoes in Ireland [. . . ]’. Then, just a
few rows down on the same page, it says: ‘[. . . ] Geary considered potato consumption and the
incidence of cancer deaths in Ireland.’
Despite these shortcomings, it should be emphasised that this book is a very impressive work
with a unique, comprehensive and in-depth coverage of the birth and development of permutation-based statistical methods that should prove invaluable for anyone interested in this topic.
It is, however, hard to agree with the back cover’s assertion that its ‘non-mathematical approach
makes the text accessible to readers of all levels’. On the contrary, a good knowledge of statistical methods is necessary to fully appreciate the text. But for anyone with this knowledge and an
interest in permutation statistical methods and the history of statistics, this is a must-have book.
It should also be useful for practitioners who are using permutation-based statistical methods
and are interested in understanding the foundations and the motivations behind the usage of a
specific method. In short, it is a highly recommended book for the audience indicated in the
readership category.
Andreas Rosenblad: andreas.rosenblad@ltv.se
Center for Clinical Research Västerås, Uppsala University
Västmanland County Hospital Västerås, S-721 89 Västerås, Sweden
Statistical Methods for Survival Data Analysis
Elisa T. Lee and John Wenyu Wang
Wiley, 2013, 513 pages, £86.95/€104.40, hardcover
ISBN: 978-1-118-09502-7
Readership: Biostatistics and Statistics graduate students, researchers in the area of survival
analysis.
This fourth edition of the original 1980 text by Elisa Lee enhances a book that has been a solid
introductory survival analysis text for over three decades. Interim editions in 1992 and 2003,
when co-author John Wenyu Wang was added, have each made substantial improvements to
previous editions. Much of the book remains accessible to readers with college algebra but not
calculus. As in previous editions, examples with real data are abundant, and concepts are clearly
explained. The layout of the text and equations, and frequent graphs and tables, all contribute
to a pleasing and welcoming look.
The fourth edition adds several new topics and notably has added examples using the R software package in addition to the existing SAS and SPSS examples. Chapter 4 on non-parametric
methods of estimating survival functions now includes the Nelson–Aalen estimator. Chapters
6 and 7 on parametric survival distributions and estimation of their parameters, respectively,
have added the parameterization for the generalised gamma distribution used in SAS software.
Chapter 7 has also added the Gompertz distribution. Chapter 11 on the Cox model has added
substantial new material on model assessment, including methods for checking the proportional
hazards assumption and the functional form of continuous covariates. The methods presented
include plots based on Schoenfeld residuals in R and plots based on martingale residuals using
the ASSESS option of the PHREG procedure in SAS. Chapter 12 on identifying prognostic
factors in non-proportional hazards models has added several pages on methods for correlated
data, specifically frailty models, both based on the Cox model and based on parametric accelerated failure time models. New exercises have been added at the end of each chapter. Some
material has been dropped, including the bibliographic remarks at the end of each chapter.
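For readers curious what the proportional hazards check mentioned above looks like in practice, a minimal R sketch using the survival package (not an example taken from the book) might be:

library(survival)
# Cox model fitted to the package's lung cancer data (illustrative only)
fit <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = lung)
# Test and plots of proportional hazards based on scaled Schoenfeld residuals
zp <- cox.zph(fit)
zp
plot(zp)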
Although much of the material presented is standard, I would take issue with a few items
in the book: First, the presentation of the Kaplan–Meier (KM) estimator using interpolation
between failure point estimates of S(t) rather than the maximum-likelihood-based step function is inappropriate. In Section 4.1, an example dataset of time to death in months for 10 lung
cancer patients (with no censoring) is presented with both the step function version and the
'connect the dots' version of the KM plot as two viable alternatives. The text goes on to calculate the sample median, which equals eight, obtained by taking the mean of the fifth and sixth ordered
observations (both equal to eight), and which is the same as the median observed from the
step function KM plot. However, the interpolated version of the KM yields a sample median of
7.3, which is clearly wrong; an unfortunate misplacement of a point on the KM graph (Figure 4.1(b)) further confuses the issue. This improper suggestion again appears in Figure 4.3(b). Second, the short discussion on missing data in Section 10.1 gives poor advice, including the
suggestion to replace missing values of continuous variables with the mean and to replace
missing categorical variables with a new indicator variable for a category of missing data. Multiple imputation deserves mention, and the other methods, albeit useful expedients in cases
of a few missing values, deserve a healthy dose of caution with non-trivial amounts of missing data. Third, Chapter 13 gives useful methods to treat survival data as a dichotomous (e.g.
alive and dead) or polychotomous outcome (e.g. complete, partial or no remission) but barely
mentions that these methods can only be used when all subjects have the same potential follow-up time (e.g. survival to 1 year [yes/no] among those with potential follow-up through 1 year
or 1-month evaluation of cancer remission). If this requirement is not met, such analyses are
not appropriate.
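The maximum-likelihood step function is also what standard software reports directly; for example (again using the survival package's lung data rather than the book's 10-patient example):

library(survival)
km <- survfit(Surv(time, status) ~ 1, data = lung)
print(km)   # reports the median survival time read from the step function
plot(km, conf.int = FALSE, xlab = "Days",
     ylab = "Estimated survival")   # step-function KM plot, no interpolation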
Two other survival analysis texts at a similar level deserve mention: one by Hosmer et al. (2008) and another by Kleinbaum & Klein (2011). Although all three texts cover much of the same material, there are differences in presentation and coverage of side issues that may distinguish them for different purposes. More advanced texts by Klein & Moeschberger (2005) and Collett (2014) may also be of interest.
In summary, this book continues to improve, and the fourth edition is a welcome addition to
the available books on survival analysis. The expanded sections on modelling and the addition
of R software examples are particularly helpful.
Brenda W. Gillespie: bgillesp@umich.edu
Center for Statistical Consultation and Research
University of Michigan, Ann Arbor, MI, USA
References
Collett, D. (2014). Modelling Survival Data in Medical Research, 3rd ed. Boca Raton, FL: Chapman & Hall/CRC.
Hosmer, D.W. Jr., Lemeshow, S. & May, S. (2008). Applied Survival Analysis: Regression Modeling of Time to Event Data. Hoboken, NJ: Wiley.
Klein, J.P. & Moeschberger, M.L. (2005). Survival Analysis: Techniques for Censored and Truncated Data. New York, NY: Springer.
Kleinbaum, D.G. & Klein, M. (2011). Survival Analysis: A Self-Learning Text, 3rd ed. New York, NY: Springer.
Sample Size Determination and Power
Thomas P. Ryan
Wiley, 2013, 374 pages, £73.50/€88.20, hardcover
ISBN: 978-1-118-43760-5
Readership: Readers interested in methodology, upper-level undergraduate and graduate students.
For nearly all scientific studies, it is important to determine a large enough sample size that
will give adequate statistical power to detect a desired effect size. Sample size determination and power considerations therefore arise in scientific endeavours across a variety of fields. It is also an active research area, evolving along with expanding scientific fields and new design considerations. Because sample size consideration is such a broad subject, it is impossible for a single reference book to cover all areas adequately and provide appropriate examples. The aim of this book is to provide a general-purpose treatment of the subject of sample size determination. This text is a useful modern addition to the literature and, as intended, covers a wide range of topics and applications that require sample
size considerations.
This very comprehensive book is well structured. It starts with an introductory chapter briefly
reviewing hypothesis testing and confidence intervals, the core statistical concepts needed for
understanding sample size determination and power, and presents a comprehensive approach
to sample size determination. It then covers a wide range of specific topics from comparison of means and proportions to more complicated designs and settings. Broad topic and
application areas discussed include clinical trials, survival analysis, reliability, epidemiology,
microarrays and quality improvement. Each chapter ends with a section covering currently
available commercial and non-commercial software for sample size determination considered
in that chapter.
This book gives a comprehensive overview with an extensive list of topics. For any specific topic, however, the discussion is inevitably brief, with the emphasis placed on extensive coverage of the literature. For example, a chapter titled Clinical Trials provides a good overview and discussion of recent developments and issues surrounding clinical trials, including the particular importance of adequate sample size when sampling human subjects. This chapter contains many sections, covering phase II trials, cluster randomised trials, longitudinal clinical trials, non-inferiority trials, repeated measurements and more; however, each is given less than half a page. The discussions, therefore, are brief, and a
more unified presentation rather than a brief mention of published articles is desired at times.
For example, if anyone considering a cluster randomised trial is looking to carry out a sample
size calculation, this book gives references, but no clear guidance or an example showing how
to do it.
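To make this point concrete, a common back-of-the-envelope approach in the cluster-randomised-trial literature (a standard approximation, not a calculation taken from the book) is to compute the individually randomised sample size and inflate it by the design effect 1 + (m − 1)ρ, where m is the cluster size and ρ the intra-cluster correlation. In R this might look like:

# Per-arm sample size for an individually randomised two-sample comparison
n_ind <- power.t.test(delta = 0.4, sd = 1, sig.level = 0.05, power = 0.80)$n
# Inflate by the design effect for clusters of size m with intra-cluster correlation rho
m    <- 20
rho  <- 0.03
deff <- 1 + (m - 1) * rho
n_clustered <- ceiling(n_ind * deff)   # subjects required per arm
ceiling(n_clustered / m)               # clusters required per arm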
In summary, this book succeeds in the author's aim of providing a general-purpose text for readers interested in methodology without much technical fuss. In general, the book focuses on providing a wealth of discussion and review of sample size and power and
is an excellent source for up-to-date software and references for a wide range of topics
related to sample size determination. It is written with enough mathematical sophistication
for inquisitive readers and can be a supplementary textbook for upper-level undergraduate
and graduate-level special topics courses. Because of the wide coverage, for many topics,
this book chooses to provide extensive references and brief discussions of important issues in lieu of the specifics of the sample size calculations. This book,
however, is an appealing modern reference book for statisticians as well as scientific investigators interested in understanding general issues and theoretical backgrounds in sample
size consideration.
Hyungjin Myra Kim: myrakim@umich.edu
Center for Statistical Consultation and Research and Department of Biostatistics
University of Michigan, 1085 South University, Ann Arbor, MI 48109-1107, USA
Analysis of Multivariate and High-Dimensional Data
Inge Koch
Cambridge University Press, 2013, 526 pages, CAD$92.95, hardcover
ISBN: 978-0-5218-8793-9
Readership: Researchers in multivariate and high-dimensional statistics, graduate students with
a significant theoretical statistical background who are interested in the area.
At the outset, I must mention that this book is not for the uninitiated and requires a serious
reading with lots of concentration. That said, I must highly commend the author for writing
an excellent, comprehensive review of multivariate and high-dimensional statistics, albeit on a
highly selective set of topics. The lucid treatment and thoughtful presentation are two additional features that make this book very attractive. I have been an admirer of books
written by G. A. F. Seber, and this book, again coming from that part of the world, in terms of rigour
as well as its presentation, reminds me of those books. The book concentrates on various topics nicely classified in three parts as Classical Methods, Factors and Groups, and Non-Gaussian
Analysis. The first part covers the standard techniques of principal component, canonical and
discriminant analyses. The second part is about Clustering, Factoring and Multidimensional
Scaling, which arguably are also the classical methods. It is, however, the third part that is relatively new and neoclassical dealing with Independent Component Analysis, Projection Pursuits,
Kernel Component Analyses and Feature Selection.
The first part, the Classical Methods, is very well written and at a much higher level, in
terms of mathematical rigour as well as understanding, than the other recent books on similar
topics. Discussion of techniques with all mathematical details, interpretations and asymptotics
makes the discussion very comprehensive and nearly complete. Especially attractive is the discussion of principal component analysis for various cases when dimensionality increases. This
makes a strong justification for the current revival of interest in principal component analysis
as an exploratory tool for high-dimensional data. The canonical correlation analysis chapter includes much more material than is normally discussed in any multivariate analysis course. Clarity of presentation for a topic such as this, inherently crowded with long, tedious mathematical expressions, is a difficult task, but the author seems to have tamed the monster successfully. The addition of topics like partial least squares is very appropriate, taking the
mystery out of the algorithms as well as the approach and bringing it into mainstream statistics from a specialised discipline like chemometrics. The chapter on discriminant analysis is well written. It is, however, a vast topic and receives less attention in this book than it rightly deserves. Logistic regression has become a standard technique for classification, yet it receives only a cursory treatment. Support vector machines are relatively newer, and their
inclusion with more details along with a discussion of their comparison with other techniques
could really add considerable value for learners. Discrimination with repeated measures data,
where information through covariance structures could be incorporated, is another important
missing section.
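As a reminder of how accessible the classical toolkit of this first part is in practice (a generic R illustration on Fisher's iris data, not drawn from the book):

# Principal component analysis of the four iris measurements (illustrative only)
pc <- prcomp(iris[, 1:4], scale. = TRUE)
summary(pc)     # proportion of variance explained by each component
head(pc$x, 3)   # component scores for the first few observations
biplot(pc)      # joint display of scores and loadings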
As for the second part, dealing with factorings and groupings, the chapters again provide a
wealth of useful information in a succinct manner. However, some chapters, especially the one on cluster analysis, could have used more updating to include newer techniques.
It is the third part of the book dealing with non-Gaussian data where the book really
stands out at its best. All three topics (Independent Component Analysis, Projection Pursuit and Kernel Methods) are relatively new, and the author has taken pains to address each of them very carefully. She has nicely extracted the relevant details from the recent literature on these subjects and sequenced them in such a way that a serious reader will be able to appreciate the importance of these methods and will understand how to apply them. I must say that by reading this part of the book, I felt not only a sense of satisfaction but also a sense of accomplishment. I believe there are very few people in the statistics domain who could claim to be authorities on these topics, with a full understanding and appreciation of these methods. In that sense, this book does a great service
by providing a concise account of these more recent techniques. Without any hesitation and
with admiration, I would give the author a 10 out of 10 for undertaking and accomplishing
this task.
A book of reasonable size, aiming to cover topics in reasonable detail, will have limitations on what it can include, and I cannot fault the author for her selection of topics. Personally, however, I would have preferred to see adequate coverage of Gaussian-based
analyses. The author has completely stayed away from multivariate regression, multivariate
analysis of variance and covariance structures. Multivariate distribution theory is minimally
included and important topics such as multivariate t or Wishart distributions are completely
avoided. The multivariate Gaussian theory is elegant in its own way and, given that enough (and sufficiently important) asymptotics are covered in this book, it would have been nice to provide some of its basic results. The absence of Gaussian theory somewhat takes the punch out of the third part of the book, titled 'Non-Gaussian Analysis', as a reader is left without an appropriate baseline for comparison. An average reader is likely to miss the importance of this part simply because the absent discussion of the Gaussian distribution leaves him or her without the needed appreciation. Correspondence analysis is another topic that I think deserves a place in a book such as this one. In terms of minor quibbles, the use of 'we' as well as 'I' (e.g. see Section 1.3.2), even within the same paragraph, is somewhat annoying; the notational conventions are somewhat new (but the author has a reason for them); parallel coordinates as a graphical technique are inadequately
covered, and some of the figures, in my opinion, are too tiny for what they are trying to depict,
especially when they correspond to large datasets with a large number of points. Small pictures can unintentionally hide some important features, which the author may see but a reader
may not.
As I have already said, I will not hold these points as criticisms of the book; these
were the choices made by the author for her book. What I see is an excellent collection of
information presented in a very succinct and organised way. What she has accomplished in this difficult area of statistics is something very few could achieve. The
wealth of information is enormous and a motivated student can learn a great deal from
this book. This book is not for everyone but I believe that it is going to work as a model
for many more multivariate books to appear in the future. I highly recommend this excellent book to researchers working in the field of high dimensional data and to motivated
graduate students.
Ravindra Khattree: khattree@oakland.edu
Department of Mathematics and Statistics, Oakland University
Rochester Hills, Michigan, USA
Statistical Analysis of Network Data with R
Eric D. Kolaczyk and Gabor Csardi
Springer, 2014, xiii + 207 pages, €52.99, softcover
ISBN: 978-1-4939-0982-7
Readership: Anyone interested in analyzing network data in R.
Network data is quite trendy at the moment as it is available in abundance, especially on the
internet. This type of data is in many ways quite different from traditional data and therefore
requires special methods for handling and analyzing it. This book is a quite practical guide
to get started with analyzing networks using the statistical software R. For the reader, it is
definitely beneficial to be already familiar with this software. The book starts in Chapter 2 with
a definition of network graphs, how they are best represented and how to operate on them.
As in all chapters in the book, the terminology is always well explained and everything is
easy to follow. Chapter 3 concerns the visualisation of graphs, which is not a trivial task as
networks are usually large and many different concepts and options are available. Chapter 4
introduces descriptive measures for characterising a network and concludes with graph partitioning.
The next six chapters concern modelling network data where the foundation is laid in Chapter
5 with a short introduction to the underlying mathematical models. Chapter 6 introduces three common statistical models: the exponential random graph model, network block models and
latent network models. More specialised topics like inference on network topology or processes
on network graphs as well as analysis of the flow in a network are discussed in Chapters 7–9.
The final chapter considers dynamic networks very briefly.
Network data occurs in different fields with many different questions of interest and the
book reflects that by having many real data sets coming from quite a range of disciplines,
such as social science or biology. I would have preferred the data descriptions to be a bit more detailed. Nonetheless, the book demonstrates the methods and representations that can be used for the different datasets and illustrates their analysis well. Naturally, many R packages are available for such analyses. The book, which is accompanied by its own package, relies here especially on the igraph package, together with other specialised packages for specific tasks.
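To indicate the flavour of the igraph-based workflow (a minimal sketch on a simulated graph, not code reproduced from the book or its companion package):

library(igraph)
set.seed(1)
g <- sample_gnp(25, p = 0.12)   # random Erdos-Renyi graph as a stand-in for real data
vcount(g); ecount(g)            # numbers of vertices and edges
degree(g)                       # degree sequence
edge_density(g)                 # a simple descriptive measure
plot(g, vertex.size = 8, vertex.label = NA)   # quick visualisation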
A nice feature of the accompanying package is that it installs all other packages needed to
follow the analyses in the book (and there are many) and contains all the code from the book.
There are also references to alternative packages available in R.
While Chapters 2–4 are quite self-contained and do not require additional reading,
Chapters 5–10 are all quite short introductions to the specific methods, and additional reading is recommended there. Relevant references are conveniently provided at the end of
each chapter.
While in my opinion this book is not useful as a textbook, it is a very nice hands-on
introduction to the analysis of network data that gives a good overview suitable for applied
scientists and statisticians. If certain models and methods need deeper understanding, the book
provides the necessary references for further reading.
Klaus Nordhausen: klaus.nordhausen@utu.fi
Department of Mathematics and Statistics
University of Turku, 20014 Turku, Finland
Statistical Methods for Ranking Data
Mayer Alvo and Philip L. H. Yu
Springer, 2014, xi + 273 pages, €90.09, hardcover
ISBN: 978-1-4939-1470-8
Readership: Advanced undergraduate and graduate students in basic and non-parametric statistics, and practitioners and researchers interested in the foundations and theory of statistical methods for ranking data.
This book treats the most important aspects of non-parametric ranking methods. It studies situations where it is desired to rank a set of individuals or objects in accordance with some criterion. Additionally, the authors also consider the ranking data that arise when continuous or discrete data are transformed to ranks in the context of non-parametric analysis.
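The rank transformation the authors have in mind is easily illustrated (a generic R example, not taken from the book): Spearman's rank correlation is simply Pearson's correlation computed on the ranks.

set.seed(7)
x <- rnorm(30)
y <- x + rnorm(30)
cor(rank(x), rank(y))            # Pearson correlation of the ranks
cor(x, y, method = "spearman")   # identical Spearman rank correlation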
The interest of Mayer Alvo in this research was sparked by a problem emanating from the
School of Nursing at the University of Ottawa; it was desired to test if patients who were mobile
differed from those who were not with respect to how they ranked certain sets of situations.
Following an introductory chapter, Chapter 2 is dedicated to exploratory analysis of data. Correlation is studied in Chapter 3, followed by tests for randomness, agreement and interaction in Chapters 4 and 5. The general theory of ranking is discussed in Chapter 6. Hypothesis testing for ordered alternatives is discussed in Chapter 7, followed by a description of general probability models in Chapter 8; more advanced topics such as probit and factor models and decision trees are investigated in Chapters 9 and 10. The book concludes in Chapter 11
with a description of some extensions of distance-based models.
The theory deals with preferences among objects expressed by judges, and such preferences can be highly subjective. This possibility greatly limits the objectivity of the elements of the methods for ranking data. However, some of the elements studied in the book are perfectly objective, and the associated theory, based on probabilistic and asymptotic models, serves as a rough approximation that is useful for carrying out non-parametric inference.
For these reasons, we suggest complementing the study of this content with more fully objective methods based on objective real data. Such an investigation would be more interesting and more applicable in practice, especially when the resulting ranking has consequences for justice, for example with respect to life, professional recognition in health care and economic remuneration. Other, more subjective methods may be more productive in an academic sense but are less appropriate for objective practical purposes.
Mariano Ruiz Espejo: mruiz033@alu.ucam.edu
Departamento de Ciencias Humanas y Religiosas, UCAM
Campus de Los Jerónimos 135, 30107 Guadalupe, Murcia, Spain
Statistical Studies of Income, Poverty and Inequality in Europe: Computing and
Graphics in R Using EU-SILC
Nicholas T. Longford
Chapman & Hall/CRC Press, 2015, xxii + 354 pages, £49.99, hardcover
ISBN: 978-1-4665-6832-7
Readership: Researchers in specific methods of analysis of economic poverty and inequality.
Poverty, according to Atkinson (1998), is about as widespread in our societies currently as it
was a few decades ago when, admittedly, our standards for what amounts to prosperity were
somewhat more modest. The author of the book under review writes that 'by definition, regardless of its details, the poor are poorly integrated in societies in which prosperity dominates,
and substantial reduction of poverty would be generally hailed as a great social and economic
achievement’.
The poor population is studied with surveys. The design and analysis of such surveys are
statistical and computational. The book focuses on statistical computing.
The European Union Statistics on Income and Living Conditions (EU-SILC) is a programme
of annual national surveys that collect data related to poverty and social exclusion by interviews
with adult members of randomly selected households. In this book, the analyses of surveys
conducted by EU-SILC are carried out using the statistical language R. With the growing popularity of other software such as Stata or MATLAB, it would be useful to include implementations on alternative platforms as well.
There are nine chapters altogether that go back and forth between introducing concepts associated with poverty analysis and the statistical framework. One noteworthy section is Section 2.4,
which is devoted to Horvitz–Thompson estimation and is methodologically solid. Other statistical
techniques presented elsewhere in the book are often subjective or approximate in nature.
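For reference, the estimator treated in that section takes the familiar form (standard survey-sampling notation, not necessarily the book's own): for a sample \(s\) drawn with inclusion probabilities \(\pi_i > 0\), the population total \(Y = \sum_{i=1}^{N} y_i\) is estimated design-unbiasedly by
\[
\hat{Y}_{\mathrm{HT}} \;=\; \sum_{i \in s} \frac{y_i}{\pi_i}.
\]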
Other books in the literature provide a more objective guide to choosing an appropriate statistical technique for survey data, but methodological development of statistical techniques is not the focus of this book. On the other hand, the book opens up an analytical agenda in the context of the EU-SILC surveys. The methods presented are not exhaustive of the range of analyses suitable for EU-SILC but are illustrative in their use of software code, figures, tables and graphics.
Mariano Ruiz Espejo: mruiz033@alu.ucam.edu
Departamento de Ciencias Humanas y Religiosas, UCAM
Campus de Los Jerónimos 135, 30107 Guadalupe, Murcia, Spain
References
Atkinson, A.B. (1998). Poverty in Europe. Oxford: Blackwell.
Statistical Theory and Inference
David J. Olive
Springer, 2014, xii + 434 pages, €62.99, hardcover
ISBN: 978-3-319-04971-7
Readership: Statistics undergraduate, graduate and doctoral students and teachers.
This book describes the most important aspects of subjective classical statistical theory and
inference, similar to the treatment in Rohatgi (1984). Topics covered include probability and expectation, multivariate distributions and transformations, exponential families, estimation and hypothesis testing, large-sample theory and Bayesian methods. The book also includes
specific pointers for students.
The book can be considered a guide for teachers and students in a first or second course in classical statistical methods, which many universities around the world teach as part of the core programme in statistics at both the Masters and PhD levels. The interest in the book, however, would be limited by its subjectivity, as the book does not put an emphasis on model diagnostics. This creates a gap between the mathematical model and its application to a
real problem. A further shortcoming of the book is the omission of any reference to the issue
of identifiability of the population units.
The book has been written with careful attention to detail and can serve as a good reference on the
topics it covers. It is highly recommended from a didactic point of view but is of limited use to
statistical practitioners.
Mariano Ruiz Espejo: mruiz033@alu.ucam.edu
Departamento de Ciencias Humanas y Religiosas, UCAM
Campus de Los Jerónimos 135, 30107 Guadalupe, Murcia, Spain
References
Rohatgi, V.K. (1984). Statistical Inference. New York, NY: Wiley.