For Interactive Data Science Collaboration. CineGrid, December 10, 2015
CAROL WILLING ➤ Python Software Foundation, Director ➤ Project Jupyter, Contributor ➤ Fab Lab San Diego, Geek in Residence
MANAGER AND ANALYST
WONDER AND CURIOSITY
PROJECT JUPYTER Just the Facts
The Notebook: “Literate Computing” Computational Narratives ❖ Computers deal with code and data. ❖ Humans deal with narratives that communicate. Literate Computing (not Literate Programming): narratives anchored in a live computation that communicate a story based on data and results. Cf. Mathematica, Maple, MuPAD, Sage…
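A notebook file makes this pairing of narrative and computation concrete: it is a JSON document holding an ordered list of cells, with Markdown cells carrying the story and code cells carrying the live computation. The sketch below builds a minimal notebook in the nbformat version 4 layout using only the standard library; the `make_notebook` helper and the example data are hypothetical, and real notebooks usually carry extra metadata (kernel, language) that is omitted here.

```python
import json


def make_notebook(steps):
    """Build a minimal nbformat-4-style notebook from (narrative, code) pairs.

    Each step becomes a Markdown cell (the story) followed by a code cell
    (the computation). Outputs stay empty until a kernel executes the cells.
    """
    cells = []
    for narrative, code in steps:
        cells.append({"cell_type": "markdown", "metadata": {}, "source": narrative})
        cells.append({
            "cell_type": "code",
            "execution_count": None,  # no kernel has run this cell yet
            "metadata": {},
            "outputs": [],
            "source": code,
        })
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


# Hypothetical two-step narrative.
steps = [
    ("## Load the data\nWe start from a small list of measurements.",
     "data = [2.0, 3.5, 4.1]"),
    ("## Summarize\nThe mean tells the first part of the story.",
     "sum(data) / len(data)"),
]
nb = make_notebook(steps)

# The serialized text is what would be saved to a .ipynb file.
text = json.dumps(nb, indent=1)
print([cell["cell_type"] for cell in nb["cells"]])
```

The interleaving of cell types, not the code alone, is what turns the file into a computational narrative that a reader can both follow and re-execute.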
“Project Jupyter serves not only the academic and scientific communities but also a much broader constituency of data scientists in research, education, industry and journalism…” – Fernando Pérez, UC Berkeley
“…we see uses of our tools that range from high school education in programming to the nation’s supercomputing facilities and the leaders of the tech industry.” – Fernando Pérez, UC Berkeley
“More than a million people are currently using Jupyter for everything from…” – Prof. Brian Granger, Cal Poly
“…analyzing massive gene sequencing datasets to processing images from the Hubble Space Telescope and developing models of financial markets.” – Prof. Brian Granger, Cal Poly
“We are excited by the potential of Project Jupyter to reach even wider audiences and to contribute to increased cross-disciplinary collaboration in the sciences.” – Betsy Fader, Helmsley Charitable Trust
“Jupyter Notebook… will enable data exploration, visualization, and analysis in a way that encourages sound science and speeds progress.” – Chris Mentzel, The Gordon and Betty Moore Foundation
DATA CHALLENGES Constraints or Opportunities?
OPPORTUNITIES Use our strengths
“The purpose of computing is insight, not numbers.” – Richard Hamming, 1962
The Lifecycle of a Scientific Idea (schematically) 1. Individual exploratory work 2. Collaborative development 3. Parallel production runs (HPC, cloud, ...) 4. Publication & communication (reproducibly!) 5. Education 6. Goto 1.
JUPYTERHUB and Project Jupyter ecosystem
nbviewer: seamless notebook sharing ❖ Zero-install reading of notebooks ❖ Just share a URL ❖ nbviewer.ipython.org
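Sharing through nbviewer amounts to building a URL that points it at a notebook already hosted somewhere public. A minimal sketch, assuming the notebook lives in a public GitHub repository and that nbviewer's `/github/<user>/<repo>/blob/<ref>/<path>` route applies; the `nbviewer_url` helper, the user and repository names, and the default branch are hypothetical, and the host defaults to the one on the slide (today the service is also reachable under newer hostnames).

```python
from urllib.parse import quote


def nbviewer_url(user, repo, path, ref="master", base="https://nbviewer.ipython.org"):
    """Build an nbviewer link for a notebook stored in a public GitHub repo.

    quote() percent-encodes spaces and other unsafe characters but keeps "/"
    so that nested notebook paths survive intact.
    """
    return f"{base}/github/{quote(user)}/{quote(repo)}/blob/{quote(ref)}/{quote(path)}"


# Hypothetical repository and notebook path.
link = nbviewer_url("example-user", "example-repo", "notebooks/Chapter 1.ipynb")
print(link)
```

The reader needs only a browser: nbviewer fetches the file and renders it as static HTML, which is what makes the reading "zero-install".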
Executable books Python for Signal Processing, by José Unpingco ❖ Springer hardcover book ❖ Chapters: IPython Notebooks ❖ Posted as a blog entry ❖ All available as a GitHub repo
University Courses These are just some we are aware of!
JupyterHub at NERSC and OpenMSI: Shreyas Cholia & Oliver Ruebel, NERSC Data & Analytics Services Group. JupyterHub Day, July 17, 2015
NERSC is the Production HPC & Data Facility for DOE Office of Science Research ❖ Largest funder of physical science research in the U.S. ❖ Bio Energy, Environment ❖ Computing ❖ Materials, Chemistry, Geophysics ❖ Particle Physics, Nuclear Physics ❖ Fusion Energy, Plasma Physics ❖ Astrophysics
Quantopian: algorithmic trading ❖ Karen Rubin, Dir. Product Management at Quantopian ❖ Quantopian Research ❖ Fortune.com post
Microsoft: Python Tools for Visual Studio ❖ Shahrokh Mortazavi, Dino Viehland, Wenming Ye, Dennis Gannon
Microsoft Azure: Notebooks in the Cloud
Google CoLaboratory ❖ Kayur Patel, Kester Tong, Mark Sanders, Corinna Cortes @ Google ❖ Matt Turk @ NCSA/UIUC
JupyterHub: multiuser support ❖ Out of the box ❖ Unix accounts ❖ Local single-user notebooks ❖ Customizable ❖ Authentication: OAuth, LDAP, etc. ❖ Subprocess control: Docker, VMs, etc.
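The split between "out of the box" and "customizable" shows up in JupyterHub's configuration file: by default users log in with local Unix accounts and each gets a single-user notebook server started as a local process, and both pieces can be swapped out. A minimal `jupyterhub_config.py` sketch, assuming the separately installed `oauthenticator` and `dockerspawner` packages; the class and option names are as we understand those projects and may differ across versions, and the hub URL and credentials are hypothetical placeholders. This is a configuration fragment read by JupyterHub, not a standalone script.

```python
# jupyterhub_config.py: a configuration sketch, not a production setup.
c = get_config()  # noqa: F821  (injected by JupyterHub's config loader)

# Authentication: the default PAMAuthenticator checks local Unix accounts.
# Swap in GitHub OAuth instead (requires the `oauthenticator` package).
c.JupyterHub.authenticator_class = 'oauthenticator.GitHubOAuthenticator'
c.GitHubOAuthenticator.oauth_callback_url = 'https://hub.example.org/hub/oauth_callback'  # hypothetical
c.GitHubOAuthenticator.client_id = 'YOUR-CLIENT-ID'  # hypothetical placeholder
c.GitHubOAuthenticator.client_secret = 'YOUR-CLIENT-SECRET'  # hypothetical placeholder

# Spawning: the default LocalProcessSpawner runs each user's single-user
# notebook server as a local subprocess. Run each one in its own Docker
# container instead (requires the `dockerspawner` package).
c.JupyterHub.spawner_class = 'dockerspawner.DockerSpawner'
```

Because authenticators and spawners are pluggable classes, the same hub can move from a lab workstation with Unix accounts to an institutional deployment without changing how users reach their notebooks.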