Developing open source scientific practice∗
K. Jarrod Millman
Division of Biostatistics
University of California, Berkeley
Fernando Pérez
Henry H. Wheeler Jr. Brain Imaging Center
University of California, Berkeley
August 31, 2017
Dedicated to the memory of John D. Hunter III.
Contents
1 Introduction
2 Computational research
2.1 Computational research life cycle
2.2 Open source ecosystem
2.3 Communities of practice
3 Routine practice
3.1 Version control
3.2 Execution automation
3.3 Testing
3.4 Readability
3.5 Infrastructure
4 Collaboration
4.1 Distributed version control
4.2 Code review
4.3 Infrastructure redux
5 Communication
5.1 Literate programming
5.2 Literate computing
5.3 IPython notebook
6 Conclusion
∗In Implementing Reproducible Research. Eds. Victoria Stodden, Friedrich Leisch, and Roger D. Peng. pages
149–183. Chapman and Hall/CRC Press, 2014.
1 Introduction
Computational tools are at the core of modern research. In addition to experiment and theory, the notions of simulation and data-intensive discovery are often referred to as “third and fourth pillars” of science [12]. It is more accurate to simply accept that computing is now inextricably woven into the DNA of science, as today, even theory and experiment are computational. Experimental work requires computing (whether in data collection, preprocessing, or analysis) and theoretical work requires symbolic manipulation and numerical exploration to develop and refine models. Scanning the pages of any recent scientific journal, one is hard-pressed to find an article that does not depend on computing for its findings.
Yet, for all its importance, computing receives perfunctory attention in the training of new scientists and in the conduct of everyday research. It is treated as an inconsequential task that students and researchers learn “on the go” with little consideration for ensuring computational results are trustworthy, comprehensible, and ultimately a secure foundation for reproducible outcomes. Software and data are stored with poor organization, little documentation, and few tests. A haphazard patchwork of software tools is used with limited attention paid to capturing the complex workflows that emerge. The evolution of code is not tracked over time, making it difficult to understand what iteration of the code was used to obtain any specific result. Finally, many of the software packages used by scientists in research are proprietary and closed-source, preventing complete understanding and control of the final scientific results.
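Part of the versioning problem described above can be addressed by saving provenance metadata next to every result, so that a number or figure can later be traced to the exact code and inputs that produced it. The following is a minimal Python sketch under stated assumptions, not a method taken from this chapter: the helper names `current_git_commit` and `provenance_record` are hypothetical, and it assumes the analysis code is tracked in a git repository (falling back to `None` when it is not).

```python
import hashlib
import json
import platform
import subprocess
import sys
from datetime import datetime, timezone


def current_git_commit():
    """Return the current git commit hash, or None if unavailable.

    Hypothetical helper: assumes the analysis code is tracked with git.
    """
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, or we are not inside a repository.
        return None


def provenance_record(params, input_bytes):
    """Bundle the metadata needed to trace a result back to its origins."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "git_commit": current_git_commit(),      # which version of the code
        "python": sys.version.split()[0],        # interpreter version
        "platform": platform.platform(),         # operating system details
        "params": params,                        # analysis parameters used
        "input_sha256": hashlib.sha256(input_bytes).hexdigest(),  # exact input
    }


if __name__ == "__main__":
    record = provenance_record({"threshold": 0.5}, b"example input data")
    print(json.dumps(record, indent=2))
```

Writing such a record as JSON beside each output file is one simple design choice; the key idea is that the code version and the inputs are captured automatically rather than remembered.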
We argue that these considerations must play a more central role in how scientists are trained and conduct their research. Our approach grows out of our experience as part of both the research and the open source scientific Python communities. We begin (§ 2) by outlining our vision for scientific software development in everyday research. In the remaining sections, we provide specific recommendations for computational work. First, we describe the routine practices (§ 3) that should be part of the daily conduct of computational work. We next discuss tools and practices developed by open source communities to enable and streamline collaboration (§ 4). Finally, we present an approach to developing and communicating computational work that we call literate computing in contrast to the traditional approach of literate programming (§ 5).
2 Computational research
Consider a researcher using Matlab for prototyping a new analysis method, developing high-performance code in C, post-processing by twiddling controls in a graphical user interface, importing data back into Matlab for generating plots, polishing the resulting plots by hand in Adobe Illustrator, and finally pasting the plots into a publication manuscript or PowerPoint presentation. What if months later they realize there is a problem with the results? Will they be able to remember what buttons they clicked to reproduce the workflow to generate updated plots, manuscript, and presentation? Can they validate that their programs and overall workflow are free of errors? Will other researchers or students be able to reproduce these steps to learn how a new method works or understand how the presented results were obtained?
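One way to keep a workflow like this repeatable is to express every step, including post-processing, as code rather than as clicks in a graphical interface, so that fixing a problem means editing one function and rerunning everything. The sketch below is illustrative only: it assumes the data arrive as CSV text with a `value` column, all function names are hypothetical, and plotting is replaced by a plain-text report to keep the example dependency-free.

```python
import csv
import io
import statistics


def load_measurements(text):
    """Parse CSV text with a 'value' column into a list of floats.

    Illustrative sketch: assumes CSV input with a 'value' column.
    """
    reader = csv.DictReader(io.StringIO(text))
    return [float(row["value"]) for row in reader]


def analyze(values, scale=1.0):
    """Compute summary statistics after applying a scale factor."""
    scaled = [v * scale for v in values]
    return {
        "n": len(scaled),
        "mean": statistics.mean(scaled),
        # Sample standard deviation needs at least two points.
        "stdev": statistics.stdev(scaled) if len(scaled) > 1 else 0.0,
    }


def report(summary):
    """Render the summary as text that can be regenerated at any time."""
    return "n={n} mean={mean:.3f} stdev={stdev:.3f}".format(**summary)


def run_workflow(raw_csv, scale=1.0):
    """Run every step end to end; rerunning reproduces all outputs."""
    return report(analyze(load_measurements(raw_csv), scale=scale))


if __name__ == "__main__":
    data = "value\n1.0\n2.0\n3.0\n"
    print(run_workflow(data))
```

For the sample data above, `run_workflow(data)` returns `"n=3 mean=2.000 stdev=1.000"`. If a problem is later found in `analyze`, correcting it and calling `run_workflow` again regenerates every downstream output without relying on memory of manual steps.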
The pressure to publish encourages us to charge forward chasing the goal of an accepted manuscript, but the term “reproducibility” implies repetition and thus a requirement to also move back: to retrace one’s steps, question or change assumptions, and move forward again. Unfortunately, the all-too-common way scientists conduct computational work makes this necessary part of the research process difficult at best, often impossible.
The open source software development community¹ has cultivated tools and practices that, if embraced and adapted by the scientific community, will greatly enhance our ability to achieve reproducible outcomes. Open source software development uses public forums for most discussion and systems for sharing code and data. There is a strong culture of public disclosure, tracking and fixing of bugs, and development often includes exhaustive validation tests that are executed automatically whenever changes are made to the software and whose output is publicly available on the Internet. This detects problems early, mitigates their recurrence, and ensures that the state and quality of the software are known under a wide variety of situations (operating systems, inputs, parameter ranges, etc.). The same systems used for sharing code also track the authorship of contributions. All of this ensures an open collaboration that recognizes the work of individual developers and allows for a meritocracy to emerge.
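The automatically executed validation tests mentioned above often take the form of small functions that check results against known answers. The following is a minimal sketch assuming a pytest-style layout, in which functions named `test_*` are collected and run (for example, by a continuous integration service on every change); `trapezoid` is a hypothetical function under test, not code from this chapter.

```python
import math


def trapezoid(f, a, b, n=1000):
    """Approximate the integral of f over [a, b] with the trapezoidal rule.

    Hypothetical function under test, used only for illustration.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def test_linear_is_exact():
    # The trapezoidal rule integrates straight lines exactly: the integral
    # of 2x + 1 over [0, 3] is 12.
    assert math.isclose(trapezoid(lambda x: 2 * x + 1, 0.0, 3.0, n=4), 12.0)


def test_sine_converges():
    # The integral of sin x over [0, pi] is 2; with n = 1000 the
    # discretization error is far below the tolerance used here.
    assert abs(trapezoid(math.sin, 0.0, math.pi, n=1000) - 2.0) < 1e-5


def test_rejects_bad_n():
    # Invalid input should fail loudly rather than return a silent result.
    try:
        trapezoid(math.sin, 0.0, 1.0, n=0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for n=0")
```

Because these checks compare against exact analytical results, a change that breaks the numerical code shows up immediately as a failing test instead of as a subtly wrong figure months later.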
As we learn from the open source process how to improve our scientific practice, we recognize that the ideal of scientific reproducibility is by necessity a reality of shades. We see a gradation from a pure mathematical result whose proof should be accessible to any person skilled in the necessary specialty to one-of-a-kind experiments such as the Large Hadron Collider or the Hubble Space Telescope, that cannot be reproduced in any