Jupyter Notebook Configuration

date: 2018-03-04
revised: 2018-07-16

Update: while notebooks usefully package code, kernel, and exposition, putting content into the ipynb qua json format is (or should be) the last step in my workflow.

Why? Rather than tooling around in a browser, I prefer to open a Jupyter Qt-console^[Always a nightmare to setup. Presently, I have qtconsole installed at $HOME/.local/lib/python3.7/site-packages/qtconsole. For whatever reason, I usually forget this, then install qtconsole for python 2.7 or python 3.6, then get stuck with cryptic error messages.], load a data set (which, thanks to the IPython kernel, will stay “hot in RAM”1), draft functions in vim, then %run -i to bring the defined functions into the kernel. I only create a *ipynb file (i) when I am certain I can generalize my sequence of transformations of the data, and (ii) when I am satisfied that the data set is exemplar to lead into exposition. Finally, thanks to ipynb_notedown, I can polish my argument in markdown.

Old: I treat Jupyter notebooks as journal entries. This post outlines my exposure and initial workflow in the IPython kernel as a user of SageMath.

Sage Kernel

In 2016, I compiled Sage from binary. I generally opened the Jupyter notebook interface with sage -n jupyter.

By 2018, I realized I wanted nbextensions (namely, the jupyter-vim-binding) for all my Jupyter notebooks. (See “Unofficial Jupyter Notebook Extensions — jupyter_contrib_nbextensions 0.4.0 documentation”.)

I discovered that Sage uses its own version of the Jupyter notebook, whence I needed to install sagemath as a kernel in Jupyter. I followed Samuel Lelièvre’s advice

sudo jupyter kernelspec install ./SageMath/local/share/jupyter/kernels/sagemath

Where SageMath is your root sagemath directory.

On MacOS /Applications/SageMath/. Further, they advised

…edit the kernel.json file to make the SageMath Jupyter kernel aware of SageMath’s location, by adding "env":{"SAGE_ROOT":"/path/to/sage"} at the end of the dictionary in kernel.json.

You will find the location of kernel.json by executing jupyter kernelspec list.

My kernel.json now reads

{
 "display_name": "SageMath 8.0", 
 "argv": [
  "/Applications/SageMath/local/bin/sage", 
  "--python", 
  "-m", 
  "sage.repl.ipython_kernel", 
  "-f", 
  "{connection_file}"], 
 "env":{"SAGE_ROOT":"/Applications/SageMath/"}
}

whence SageMath is loaded as a kernel in the (standard) Jupyter Notebook.

Version Control

How should I curate individual notebooks? By subject? By date? By most recently updated?

On one hand, I could follow Jonathan Whitmore’s suggestion - be verbose!

Let a traditional paper lab notebook be your guide here:

Yet, I have no one in mind as an imagined reader. I might as well get notebooks configured in a way that would be useful for a data-analytic “reading”. As in, which did I return to edit most frequently? which were referenced for code most often? least often?

Thinking about “Philip Guo - PG Podcast - Episode 35 - Audrey Boguchwal + Nadia Eghbal on sustainable online communities”, right now, I want neither to add noise to an online community nor hear it. It seems more beneficial to use notebooks to (i) build mechanical skill, and (ii) develop intuition, than to write for an audience.

Consequently, I’m following Tim Saley’s advice: strip notebooks of outputs and metadata for version control.

Because I occasionally run sagemath and mit-scheme as kernels for REPL style computations, I’ll only remove metadata from the individual cells.

I’ve set a filter to clean *.ipynb files in any of my git repos with jq.

In my ~/.gitconfig

[core]
attributesfile = ~/.gitattributes_global

[filter "nbstrip_full"]
clean = "jq --indent 1 \
        '(.cells[] | select(has(\"outputs\")) | .outputs) = []  \
        | (.cells[] | select(has(\"execution_count\")) | .execution_count) = null  \
        | .cells[].metadata = {} \
        '"
smudge = cat
required = true

and then in my ~/.gitattributes_global

*.ipynb filter=nbstrip_full

meta

Similar to my activities feeds, I’ve flattened the directory tree for notebooks. I have no doubt an organization beyond chronology will fall out.

In a sociological analog, from Why Workers Can Suffer in Bossless Companies Like GitHub, we have

Critics say flat organizations can conceal power structures and shield individuals from accountability. This idea dates to the 1972 essay The Tyranny of Structurelessness by Jo Freeman, who describes her experiences in “leaderless” feminist organizations in the 1960s. “There is no such thing as a structureless group,” Freeman wrote. “Any group of people of whatever nature that comes together for any length of time for any purpose will inevitably structure itself in some fashion.”

see also

  1. “Philip Guo - PG Vlog #145 - Python + R Data Analysis Setup”. Retrieved July 16, 2018.