Mining your IPython Notebooks with nbgrep

One of the things I like most about IPython Notebook is right there in the name -- it's a great notebook. Often I'll figure out how to do something, be it talk to a certain API, format a graph in a particular way, parse a certain kind of file, and so on, in one of my notebooks.

The problem is this: I have notebooks in lots of directories around my machine. Each data collection/analysis project has its own repository. I've got notebooks that go with presentations, notebooks for blog posts, a general playground directory... you get the idea. So, I'll often remember that I solved a particular problem in the past, but not where I solved it.

The second problem is that grep and ack don't work well with .ipynb files.

  1. They're not normal line-oriented text, they're JSON files.
  2. They don't just have the code; they also have your text and, more distractingly, the cell outputs, many of which might be SVG or base64-encoded images, large HTML tables, etc.

I found a useful technique from Michelle Gill that helps address this second problem. Using jq, a command line JSON processor, you can pick out only the code cells.

$ jq '.worksheets[].cells[] | select(.cell_type=="code") | .input[]' MyFile.ipynb
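If jq isn't installed, roughly the same filter can be sketched with Python's json module. This is only an illustration, not part of the original gist: the helper names code_cells and code_from_file are made up here. The first loop mirrors the jq path above (worksheets, cells, input); the second loop is an extra, covering the newer nbformat 4 layout, where cells sit at the top level and the code lives under "source".

```python
import json


def _as_text(value):
    # Cell text may be stored as one string or as a list of line strings.
    return value if isinstance(value, str) else "".join(value)


def code_cells(notebook):
    """Yield the source text of each code cell in a parsed notebook dict."""
    # Older layout, same path as the jq filter: worksheets[].cells[].input
    for sheet in notebook.get("worksheets", []):
        for cell in sheet.get("cells", []):
            if cell.get("cell_type") == "code":
                yield _as_text(cell.get("input", ""))
    # Newer layout (nbformat 4): cells[].source at the top level
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            yield _as_text(cell.get("source", ""))


def code_from_file(path):
    """Load one .ipynb file and return its code cells as a list of strings."""
    with open(path, encoding="utf-8") as handle:
        return list(code_cells(json.load(handle)))
```

Calling code_from_file("MyFile.ipynb") should give back one string per code cell, with markdown cells and outputs left out.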

Great! Now I just need to find all the notebooks. Since I'm on OSX, I know that Spotlight knows where all my .ipynb files are, and I can access that from the CLI with mdfind.

$ mdfind -onlyin ~/work -name '.ipynb'

Update: Thanks to Thomas Spura for the fork, this now works on Linux with find if you don't have mdfind; I updated the original gist.

Bolting those ideas together, I have the very useful script nbgrep. So if I want to find the notebook I was playing around with the Twitter API in, it's an nbgrep twitter away. (Bonus: in the terminal, you even get Python syntax highlighting.)

$ nbgrep twitter

/Users/jbarratt/work/notebookcookbook/Tweet Relief.ipynb:

import twitter
auth = twitter.oauth.OAuth(creds['access_token'], 
twitter_api = twitter.Twitter(auth=auth)
search_results ='#oscon', count
    search_results =**kwargs)
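For a machine with neither jq nor Spotlight, the whole pipeline can be approximated in plain Python. To be clear, this is not the nbgrep script itself, just a sketch of the same idea under a few assumptions: it walks the directory tree with os.walk instead of asking mdfind, it matches a case-insensitive substring rather than a regular expression, it skips .ipynb_checkpoints directories, and it does no syntax highlighting. The function names are made up for the example.

```python
import json
import os


def find_notebooks(root):
    """Yield paths of .ipynb files under root (stand-in for mdfind/find)."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Assumption: autosaved checkpoint copies are noise, so prune them.
        dirnames[:] = [d for d in dirnames if d != ".ipynb_checkpoints"]
        for name in filenames:
            if name.endswith(".ipynb"):
                yield os.path.join(dirpath, name)


def code_lines(path):
    """Yield every line of code from the code cells of one notebook."""
    with open(path, encoding="utf-8") as handle:
        notebook = json.load(handle)
    # Older files keep cells under worksheets; nbformat 4 keeps them top-level.
    cells = [c for ws in notebook.get("worksheets", []) for c in ws.get("cells", [])]
    cells += notebook.get("cells", [])
    for cell in cells:
        if cell.get("cell_type") != "code":
            continue
        text = cell.get("input", cell.get("source", ""))
        if not isinstance(text, str):
            text = "".join(text)
        yield from text.splitlines()


def nbgrep(pattern, root):
    """Return {notebook path: [matching code lines]} for a substring search."""
    needle = pattern.lower()
    hits = {}
    for path in find_notebooks(root):
        try:
            matches = [line for line in code_lines(path) if needle in line.lower()]
        except (ValueError, OSError):
            continue  # unreadable file or invalid JSON: skip it
        if matches:
            hits[path] = matches
    return hits
```

Something like nbgrep("twitter", os.path.expanduser("~/work")) would then return a dict keyed by notebook path, which is easy to print in the same "path, then matching lines" shape as the output above.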

Quick Curated Podcasts

During a time in my life when I was spending around 3 hours a day in a car, I developed a big appreciation for podcasts. Now that I don't travel as much, I tend to listen to them when I'm doing otherwise rote chores or driving long distances. There's a lot to love about the medium.

Hosts aren't constrained by trying to fit things into segments of a few minutes, or by the need to constantly "reset" the conversation in case someone just tuned into the radio. Podcasts can be extremely cheap to make, so people can take risks. There's something, too, about listening through headphones which can lead to a far more intimate feeling; more like being on a phone call than "consuming media." I particularly enjoy those produced by comics. For those who typically have to fight to get a few minutes of stage time, the liberation they feel in having this platform is palpable. I love to hear people having fun doing what they're doing.

So, to the purpose of this post: I would occasionally listen to an episode that I thought my wife would really appreciate. I don't think she'd be into every episode of most of the shows I listen to, but I wanted a way to hand-pick a few so that she could listen. And, after a few minutes of searching, I found an approach that I love: A Personal Podcast Generator Script. I'm using that script and approach with a few small tweaks.

  1. Create a new directory in your ~/Dropbox/Public/ directory, and put the generator script in it
  2. Configure the script. (Including giving it the public URL of the directory, which you can find by alt-clicking a file and saying "copy public link".)
  3. Start downloading the episodes you like into the directory.
  4. Run the script, which creates a podcast.rss file.
  5. Alt-click that file and say "copy public link"; that link is an official podcast!
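To give a feel for what step 4 produces, here is a rough sketch of a feed generator. It is not the Personal Podcast Generator Script mentioned above, just a minimal stand-in built on assumptions: the function name build_feed, the default title, and the list of audio extensions are all invented for the example, and base_url stands for whatever public URL your folder has. It emits a bare RSS 2.0 channel with one item and one enclosure per audio file; a real feed would likely want more metadata, such as publication dates.

```python
import mimetypes
import os
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Assumption: only these file types count as episodes.
AUDIO_EXTENSIONS = (".mp3", ".m4a")


def build_feed(directory, base_url, title="Hand-picked episodes"):
    """Return RSS 2.0 XML (as a string) listing the audio files in directory."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = base_url
    ET.SubElement(channel, "description").text = title
    for name in sorted(os.listdir(directory)):
        if not name.lower().endswith(AUDIO_EXTENSIONS):
            continue
        # Public URL of the file; spaces and the like are percent-encoded.
        url = base_url.rstrip("/") + "/" + quote(name)
        size = os.path.getsize(os.path.join(directory, name))
        mime = mimetypes.guess_type(name)[0] or "audio/mpeg"
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = os.path.splitext(name)[0]
        ET.SubElement(item, "guid").text = url
        # The enclosure is what podcast apps actually download.
        ET.SubElement(item, "enclosure", url=url, length=str(size), type=mime)
    return ET.tostring(rss, encoding="unicode")
```

Writing the returned string to podcast.rss in the same folder gives you the file that step 5 shares.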

I configured my wife's phone with the Apple Podcasts App, because it seemed more straightforward for this than my favorite, Downcast, and it's been working just fine for this purpose.

For reference, my shows of choice these days:

Contents © 2014 Joshua Barratt - Powered by Nikola