Embed a gist in WordPress

Mostly I don’t mind the WordPress Gutenberg editor, but this was a real pain in the ass.

First, you need a new “block.” It needs to be a regular paragraph block, NOT a code block. Code blocks are for writing code.

Then you need to turn on “Add Custom HTML” for that block.

[Screenshot: the WP Custom HTML option]

Then you take the embed code from your gist (see image below), including the script tags, and paste it into the block. There is no WordPress embed option for gists. Do not waste your time Googling for how to embed a gist, as I did.

[Screenshot: the gist embed box at the top of a gist page]
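For reference, a gist’s embed code is a single script tag like this (the username and gist ID here are placeholders):

<script src="https://gist.github.com/username/1a2b3c4d5e.js"></script>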

Now you’ll have a proper gist embed in your WordPress post.

Clarification: It’s not a pain in the ass once you know this is the way to do it. It’s a pain in the ass to figure it out because (a) it doesn’t work like other WordPress embeds in Gutenberg, and (b) there are a bunch of incorrect posts about how to do it, including one at WordPress Support.

Exporting from Bokeh

Bokeh is a Python library for creating interactive data visualizations. I just started learning about it, and I immediately wanted to export either static images or HTML/JavaScript — or both! However, at first it seemed I would need to install extra libraries to make it happen.

But I kept searching, and I found that there are export options that do not require any extra libraries. Hooray!

So say you have already created and displayed a chart assigned to the variable chart1, using a Jupyter Notebook. This is all you’ll need to export a complete, fully functioning HTML file with the JavaScript included:
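In essence, it’s Bokeh’s standard output_file() and save() pattern (a minimal sketch; the filename is a placeholder):

from bokeh.plotting import output_file, save

# chart1 is the figure you created earlier in the notebook
output_file("chart1.html")  # placeholder filename
save(chart1)  # writes a standalone HTML file, JavaScript included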

Lordy, it was torture to embed that freaking gist using the WP Gutenberg editor. New post to follow.

[Screenshot: the Bokeh Save tool]

The default toolset in Bokeh includes a “Save” icon, which outputs a PNG image of the chart.

Installing Python for beginners

Students really struggle with setup. By the time they’ve finished installing Python, Jupyter Notebooks, etc., they’re ready to quit the course without ever learning Python at all — especially students using Windows.

I think with Miniconda I’ve finally tamed that beast. Here are my instructions for students, in one Google doc. Feel free to copy and edit it for your own use.

http://bit.ly/mm-conda
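The heart of it is just a few commands (a sketch, not the exact contents of the doc; the environment name and packages are examples):

# create an isolated environment with Python and the course packages
conda create --name mycourse python jupyter pandas
# switch into the environment
conda activate mycourse
# launch Jupyter Notebook
jupyter notebook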

Geocoding addresses

I used to use Google Sheets to geocode for Leaflet maps, but:

You can display Geocoding API results on a Google Map, or without a map. If you want to display Geocoding API results on a map, then these results must be displayed on a Google Map. It is prohibited to use Geocoding API data on a map that is not a Google map.

Geocoding API Policies

So here are other possibilities for getting your geocoding done:
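One possibility is the geopy library, which wraps a number of geocoding services, including OpenStreetMap’s free Nominatim geocoder. A minimal sketch (the user_agent string and address are placeholders):

from geopy.geocoders import Nominatim

# Nominatim asks for an identifying user_agent string
geolocator = Nominatim(user_agent="my-leaflet-map")

location = geolocator.geocode("1600 Pennsylvania Ave NW, Washington, DC")
if location:
    print(location.latitude, location.longitude)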

And if you are going to use a Google Map, then this code will be helpful to get your addresses geocoded for you.

Responsive YouTube embeds

  1. Embed YouTube video in an iframe.
  2. Place the iframe in a container div.
  3. Give the iframe position: absolute; top: 0; left: 0; width: 100%; height: 100%;
  4. The container gets position: relative; and its ::before pseudo-element is: content: ""; display: block; padding-bottom: n% where n is the ratio of height to width multiplied by 100. So a 16:9 video should be given a bottom padding of 9 / 16 * 100, or 56.25%.

This solution was provided by Thomas Wilburn in the News Nerdery Slack, and it can be coded auto-magically for you here.
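Put together, the technique looks like this (a minimal sketch; VIDEO_ID is a placeholder):

<style>
.video-container {
  position: relative;
}
.video-container::before {
  content: "";
  display: block;
  padding-bottom: 56.25%; /* 9 / 16 * 100, for a 16:9 video */
}
.video-container iframe {
  position: absolute;
  top: 0;
  left: 0;
  width: 100%;
  height: 100%;
}
</style>

<div class="video-container">
  <iframe src="https://www.youtube.com/embed/VIDEO_ID" frameborder="0" allowfullscreen></iframe>
</div>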

Python, data work, and O’Reilly books

I own many O’Reilly books about code. I’m kind of mad that they quit selling PDFs, because I loved those PDFs for searchability, and the Kindle editions are nowhere near as good (they have layout issues that don’t occur in PDFs).

Recently, though, I bought a hardcopy of Python Data Science Handbook, and this inspired me to examine my O’Reilly Python library.

First, a bit about Python Data Science Handbook: It’s a large book, 530 pages, but it has only five chapters:

  1. “IPython: Beyond Normal Python” (all the stuff you can do with the IPython shell, which is different from Jupyter Notebooks)
  2. Intro to NumPy
  3. Pandas
  4. Matplotlib
  5. Machine learning

That list is exactly why I bought this book, even though I already owned others. (See the whole book online.) I especially want to learn more about using Matplotlib in a Jupyter Notebook.

After reading chapters 1 and 2, I went into my older O’Reilly PDFs to see what other Python books I have in that collection. I opened Data Wrangling with Python and ended up spending more time in it than I’d expected, because — surprise! — not only is it completely different from Python Data Science Handbook; it is all about the kinds of things journalists use Python for the most: web scraping, document management, data cleaning. I don’t know why I’ve never spent more time with that book! (See the table of contents.) The first two chapters explain the Python language for beginners, and then it goes on to data types (CSV, JSON, XML) that you need to know about when dealing with data provided by government agencies and the like. There’s a whole chapter on working with PDFs.

The big downside to Data Wrangling with Python is that the examples and code are Python 2.7. I understand why the authors made that choice in 2015, but now it’s a detriment, as those old 2.7 libraries are no longer being maintained. You can still learn from this book, and if you’re a bit experienced with Python and the differences between 2.x and 3.x, it should be easy to work around any issues caused by the 2.7 code.

Another criticism I’d offer about Wrangling is that the chapter “Data Exploration and Analysis” uses agate, a Python library designed for journalists, but in 2019 Pandas (another Python library) would be a much better choice.

I’ve been teaching web scraping with Python to journalism students for four years now, and I’ve used a different O’Reilly book, Web Scraping with Python, by Ryan Mitchell, since the beginning. A second edition of Mitchell’s book came out last year, moving from 2.x to 3.x, which is good. (See the table of contents.)

I have several other Python books (including some not from O’Reilly), but as I’m focused here on dealing with data issues (analysis and charts as well as scraping and documents), there’s only one other book I’d like to include in this post. It’s actually not a Python book, but it is from O’Reilly: Doing Data Science, by Schutt and O’Neil. (See the table of contents.) It’s older (published in 2013), but I think it holds up as an introduction to data analysis, algorithms, etc. It even has a chapter titled “Social Networks and Data Journalism.” Charts are in color, which I like very much. There’s not a lot of code in the book — it’s not about showing us how to write the code — and examples are in several languages, including Python, R, and Go.

All four books referenced here are distinctly different from one another. Although there is some overlap, it’s minimal.

(This post was edited in November 2019. After a recent closer reading of several chapters in the first edition of Data Wrangling with Python, I have concluded that it really needs an update, and much of it cannot be comfortably used with today’s libraries.)

Scraping details

I’ve been scraping websites with BeautifulSoup for several years, but not always using the Requests library.

Old way:

from urllib.request import urlopen
from bs4 import BeautifulSoup

url = "https://weimergeeks.com/index.html"
html = urlopen(url)  # returns an HTTPResponse object
soup = BeautifulSoup(html, "html.parser")

New way:

import requests
from bs4 import BeautifulSoup

url = "https://weimergeeks.com/index.html"
html = requests.get(url)  # returns a Response object
soup = BeautifulSoup(html.text, "html.parser")

So the two ways are really similar, but it turns out that the Requests library offers us a choice: instead of html.text, we could use html.content. So what’s the difference, and does it matter?

As usual, it’s Stack Overflow to the rescue. html.text will be the normal, usual choice: it gives us the content of the HTTP response in Unicode, which will suit probably 99.9 percent of all requests. html.content gives us the content of the response as raw bytes. We would choose that for a non-HTML file, such as a PDF or an image.
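For example, a minimal sketch of saving a PDF with .content (the URL and filename are placeholders):

import requests

url = "https://example.com/report.pdf"
response = requests.get(url)

# .content is raw bytes, so open the output file in binary mode
with open("report.pdf", "wb") as f:
    f.write(response.content)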

Connect to a remote MySQL database

Pretty much every web hosting company allows you to create MySQL databases. It’s reasonably simple with cPanel and the MySQL Database Wizard. Then you can create tables and field names, etc., with phpMyAdmin.

So what if you’re developing a small database application on your laptop, and you don’t want to mess around with XAMPP or MAMP? No problem! Handy cPanel also gives us Remote MySQL so you can allow remote access to your databases from your laptop. Find it under Databases in cPanel.

To find the IP address your laptop is using right now, go to ip4.me. Copy the address and paste it into the box labeled “Host,” then click “Add Host.”

[Screenshot: the Remote MySQL panel]

NOTE 1: At home, at school, and in a coffee shop, your IP address will be different. Therefore you will need to enter a new IP address into Remote MySQL at each location where you work on your app locally.

NOTE 2: Use the separate cPanel tool MySQL Databases for managing database users.

NOTE 3: Alternatives to phpMyAdmin that run on your computer include Sequel Pro (Mac) and HeidiSQL (Windows).
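Once your IP address is added, connecting from a local Python script looks something like this (a sketch using the PyMySQL library, which is just one option; the host, credentials, and table are placeholders):

import pymysql

# placeholders: use the values from your hosting account
connection = pymysql.connect(
    host="example.yourhost.com",
    user="dbuser",
    password="dbpassword",
    database="mydatabase",
)

with connection.cursor() as cursor:
    cursor.execute("SELECT * FROM mytable LIMIT 5")
    for row in cursor.fetchall():
        print(row)

connection.close()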

Here are detailed instructions for an assignment for which my students create a new database via cPanel.

Cron jobs

Cron is a utility for scheduling tasks to run on Linux and Unix systems. For example, you could run a script to scrape a website every Monday at 3 a.m. Or every third Monday. Or every four hours.
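Two of those schedules as crontab entries (a sketch; the script path is a placeholder):

# minute hour day-of-month month day-of-week command
# Every Monday at 3 a.m.:
0 3 * * 1 /usr/bin/python3 /home/me/scrape.py
# Every four hours, on the hour:
0 */4 * * * /usr/bin/python3 /home/me/scrape.py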

“What is Cron?” and other basic questions are answered clearly here.

Crontab Guru is a page that helps you write cron scheduling expressions correctly. Corntab is similar.

EasyCron is a web-based cron manager. Starter plan: $12/month. Try it free for eight days.

Added 11/8/2019:

A clear 6-min. introduction to cron jobs