Geocoding addresses

I used to use Google Sheets to geocode for Leaflet maps, but:

You can display Geocoding API results on a Google Map, or without a map. If you want to display Geocoding API results on a map, then these results must be displayed on a Google Map. It is prohibited to use Geocoding API data on a map that is not a Google map.

Geocoding API Policies

So here are other possibilities for getting your geocoding done:

And if you are going to use a Google Map, then this code will be helpful to get your addresses geocoded for you.

Responsive YouTube embeds

  1. Embed YouTube video in an iframe.
  2. Place the iframe in a container div.
  3. Give the iframe position: absolute; top: 0; left: 0; width: 100%; height: 100%;
  4. The container gets position: relative; and its ::before pseudo-element is: content: ""; display: block; padding-bottom: n% where n is the ratio of height to width multiplied by 100. So a 16:9 video should be given a bottom padding of 9 / 16 * 100, or 56.25%.

This solution was provided by Thomas Wilburn in the News Nerdery Slack, and it can be coded auto-magically for you here.
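
The padding arithmetic in step 4 can be sketched as a small Python helper. This is only an illustration of the math. The function names and the `.video-container` class are my own placeholders, not part of the original solution.

```python
def padding_bottom_percent(width: int, height: int) -> float:
    """Ratio of height to width, multiplied by 100 (step 4 in the list)."""
    return height / width * 100


def responsive_embed_css(width: int, height: int,
                         container: str = ".video-container") -> str:
    """Build the CSS rules from the steps above for a given aspect ratio.

    The container selector is a hypothetical class name; use your own.
    """
    pct = padding_bottom_percent(width, height)
    return (
        f'{container} {{ position: relative; }}\n'
        f'{container}::before {{ content: ""; display: block; '
        f'padding-bottom: {pct:g}%; }}\n'
        f'{container} iframe {{ position: absolute; top: 0; left: 0; '
        f'width: 100%; height: 100%; }}\n'
    )


print(padding_bottom_percent(16, 9))  # 56.25
print(padding_bottom_percent(4, 3))   # 75.0
print(responsive_embed_css(16, 9))
```

For an older 4:3 video, the same function gives a bottom padding of 75%.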

Python, data work, and O’Reilly books

I own many O’Reilly books about code. I’m kind of mad that they quit selling PDFs, because I loved those PDFs for searchability, and the Kindle editions are nowhere near as good (they have layout issues that don’t occur in PDFs).

Recently, though, I bought a hardcopy of Python Data Science Handbook, and this inspired me to examine my O’Reilly Python library.

First, a bit about Python Data Science Handbook: It’s a large book, 530 pages, but it has only five chapters:

  1. “IPython: Beyond Normal Python” (all the stuff you can do with the IPython shell, which is different from Jupyter Notebooks)
  2. Intro to NumPy
  3. Pandas
  4. Matplotlib
  5. Machine learning

That list is exactly why I bought this book, even though I already owned others. (See the whole book online.) I especially want to learn more about using Matplotlib in a Jupyter Notebook.

After reading chapters 1 and 2, I went into my older O’Reilly PDFs to see what other Python books I have in that collection. I opened Data Wrangling with Python and ended up spending more time in it than I’d expected, because (surprise!) not only is it completely different from Python Data Science Handbook, but it is also all about the kinds of things journalists use Python for the most: web scraping, document management, data cleaning. I don’t know why I’ve never spent more time with that book! (See the table of contents.) The first two chapters explain the Python language well for beginners, and then it goes on to data formats (CSV, JSON, XML) that you need to know about when dealing with data provided by government agencies and the like. There’s a whole chapter on working with PDFs.

The only downside to Data Wrangling with Python is that the examples and code are Python 2.7. I understand why the authors made that choice in 2015, but now it’s a detriment, as those old 2.7 libraries are no longer being maintained. You can still learn a ton from this book, and if you’re a bit experienced with Python and the differences between 2.x and 3.x, it should be easy to work around any issues caused by the 2.7 code.

One other criticism I’d offer about Wrangling is that the chapter “Data Exploration and Analysis” uses agate, a Python library designed for journalists, but I think Pandas (another Python library) would be a better choice.

I’ve been teaching web scraping with Python to journalism students for four years now, and I’ve used a different O’Reilly book, Web Scraping with Python, by Ryan Mitchell, since the beginning. An updated second edition of Mitchell’s book came out last year, updating from 2.x to 3.x, which is good. (See the table of contents.) However, after yesterday’s time spent with Data Wrangling with Python, I wish I were using that book instead. The 2.x issue will prevent changing, though, because my students are beginners and we use Python 3.x. I like a lot of things about Mitchell’s book, but it’s a bit of a tough slog for Python beginners.

I have several other Python books (including some not from O’Reilly), but as I’m focused here on dealing with data issues (analysis and charts as well as scraping and documents), there’s only one other book I’d like to include in this post. It’s actually not a Python book, but it is from O’Reilly: Doing Data Science, by Schutt and O’Neil. (See the table of contents.) It’s older (published in 2013), but I think it holds up as an introduction to data analysis, algorithms, etc. It even has a chapter titled “Social Networks and Data Journalism.” Charts are in color, which I like very much. There’s not a lot of code in the book — it’s not about showing us how to write the code — and examples are in several languages, including Python, R, and Go.

All four books referenced here are distinctly different from one another. Although there is some overlap, it’s minimal.

Scraping details

I’ve been scraping websites with BeautifulSoup for several years, but not always using the Requests library.

Old way:

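# urllib is in the standard library; BeautifulSoup is a separate install
from urllib.request import urlopen
from bs4 import BeautifulSoup
url = ""  # the address of the page to scrape goes here
html = urlopen(url)  # a response object that BeautifulSoup can read directly
soup = BeautifulSoup(html, "html.parser")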

New way:

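# both Requests and BeautifulSoup are separate installs
import requests
from bs4 import BeautifulSoup
url = ""  # the address of the page to scrape goes here
html = requests.get(url)  # a Response object, not the HTML itself
soup = BeautifulSoup(html.text, "html.parser")  # .text is the decoded page source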

So they are really similar, but it turns out that the Requests library offers us two choices here: instead of html.text, we could use html.content. So what’s the diff, and does it matter?

As usual, it’s Stack Overflow to the rescue. html.text will be the normal, usual choice. It gives us the content of the HTTP response as Unicode text (a decoded string), which will suit probably 99.9 percent of all requests. html.content would give us the content of the HTTP response in bytes, meaning raw and undecoded. We would choose that for a non-HTML file, such as a PDF or an image.
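
Here is a network-free sketch of that difference, using only the standard library. The byte string is a made-up stand-in for what html.content might hold, and the decoded string stands in for html.text. Nothing here comes from a real request.

```python
# Stand-in for html.content: the raw bytes a server might send.
raw_bytes = "<p>café</p>".encode("utf-8")

# Stand-in for html.text: the same bytes decoded into a Unicode string.
decoded_text = raw_bytes.decode("utf-8")

# Decoding with the wrong encoding garbles any non-ASCII characters,
# which is one reason the bytes-versus-text distinction can matter.
wrong_text = raw_bytes.decode("latin-1")

print(type(raw_bytes).__name__, len(raw_bytes))        # bytes 12
print(type(decoded_text).__name__, len(decoded_text))  # str 11
print(wrong_text)                                      # <p>cafÃ©</p>
```

The accented letter takes two bytes in UTF-8 but counts as a single character once decoded, which is why the two lengths differ.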

Connect to a remote MySQL database

Pretty much every web hosting company allows you to create MySQL databases. It’s reasonably simple with cPanel and the MySQL Database Wizard. Then you can create tables and field names, etc., with phpMyAdmin.

So what if you’re developing a small database application on your laptop, and you don’t want to mess around with XAMPP or MAMP? No problem! Handy cPanel also gives us Remote MySQL so you can allow remote access to your databases from your laptop. Find it under Databases in cPanel.

To find the IP address your laptop is using right now, look it up on any what-is-my-IP web page. Copy and paste it into the box labeled “Host” and then click “Add Host.”

Remote MySQL panel

NOTE 1: At home, at school, and in a coffee shop, your IP address will be different. Therefore you will need to enter a new IP address into Remote MySQL at each location where you work on your app locally.

NOTE 2: Use the separate cPanel tool MySQL Databases for managing database users.

NOTE 3: Alternatives to phpMyAdmin that run on your computer include Sequel Pro (Mac) and HeidiSQL (Windows).
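
Once your IP address has been added, a script on your laptop can connect too. The following is a minimal sketch, assuming the third-party PyMySQL package. The host name, user, password, and database name are all hypothetical placeholders, and the helper function names are mine.

```python
def remote_mysql_settings(host, user, password, database, port=3306):
    """Collect the values a MySQL client needs to reach a remote database.

    3306 is the default MySQL port; your host may use a different one.
    """
    return {
        "host": host,
        "user": user,
        "password": password,
        "database": database,
        "port": port,
    }


def open_connection(settings):
    """Open a connection with PyMySQL (install it with: pip install pymysql).

    The import lives inside the function so the settings helper above
    works even where PyMySQL is not installed.
    """
    import pymysql
    return pymysql.connect(**settings)


# Every value below is a placeholder, not a real account.
settings = remote_mysql_settings(
    host="db.example.com",
    user="example_user",
    password="change-me",
    database="example_db",
)

# Uncomment only after your current IP has been added in Remote MySQL:
# connection = open_connection(settings)
```

If the connection is refused, the first thing to check is whether your IP address has changed since you added it.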

Here are detailed instructions for an assignment for which my students create a new database via cPanel.

Cron jobs

Cron is a utility for scheduling tasks to run on Linux and Unix systems. For example, you could run a script to scrape a website every Monday at 3 a.m. Or every third Monday. Or every four hours.
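
Two of those schedules can be written as standard five-field cron expressions (minute, hour, day of month, month, day of week). The Python below is only a rough sketch to show how the fields are read. It handles just `*`, `*/n`, and single numbers, and it is a simplification rather than a full cron parser. “Every third Monday” is left out because it usually needs extra logic in the script itself.

```python
from datetime import datetime


def field_matches(field: str, value: int, minimum: int = 0) -> bool:
    """Check one cron field; supports only '*', '*/n', and a single number."""
    if field == "*":
        return True
    if field.startswith("*/"):
        return (value - minimum) % int(field[2:]) == 0
    return value == int(field)


def cron_matches(expression: str, when: datetime) -> bool:
    """Return True if `when` falls on a minute the expression would fire."""
    minute, hour, day, month, weekday = expression.split()
    cron_weekday = when.isoweekday() % 7  # cron counts Sunday as 0, Monday as 1
    return (
        field_matches(minute, when.minute)
        and field_matches(hour, when.hour)
        and field_matches(day, when.day, minimum=1)
        and field_matches(month, when.month, minimum=1)
        and field_matches(weekday, cron_weekday)
    )


EVERY_MONDAY_3AM = "0 3 * * 1"    # minute 0, hour 3, any date, Monday
EVERY_FOUR_HOURS = "0 */4 * * *"  # minute 0 of every fourth hour

print(cron_matches(EVERY_MONDAY_3AM, datetime(2024, 1, 1, 3, 0)))  # True (a Monday)
print(cron_matches(EVERY_FOUR_HOURS, datetime(2024, 1, 1, 9, 0)))  # False
```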

“What is Cron?” and other basic questions are answered clearly here.

Crontab Guru is a page that helps you write cron scheduling expressions correctly. Corntab is similar.

EasyCron is a web-based cron manager. Starter plan: $12/month. Try it free for eight days.

Set up a PostgreSQL database at Heroku

For whatever reason, I’ve been slow to embrace Heroku. Its charms are growing on me, though. For a project I made for an online course, creating a PostgreSQL database at Heroku was required. These are the instructions, and they worked perfectly.

  1. Navigate to Heroku, and create an account if you don’t already have one.
  2. On Heroku’s Dashboard, click “New” and choose “Create new app.”
  3. Give your app a name,* and click “Create app.”
  4. On your app’s “Overview” page, click the “Configure Add-ons” button.
  5. In the “Add-ons” section of the page, type in and select “Heroku Postgres.”
  6. Choose the “Hobby Dev – Free” plan, which will give you access to a free PostgreSQL database that will support up to 10,000 rows of data. Click “Provision.”
  7. Now, click the “Heroku Postgres :: Database” link.
  8. You should now be on your database’s overview page. Click on “Settings,” and then “View Credentials.” This is the information you’ll need to log into your database.

* This app name will appear in the URL if later you deploy an online app using this database. So do not name it something generic like “database1.” If your app is about cities in Europe, for example, name it europe-cities. A hyphen is fine. Do not use spaces or other marks of punctuation.
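
If the credentials include a single connection URI, it can be split into its parts with Python’s standard library. This is a sketch under the assumption that the URI has the usual postgres://user:password@host:port/dbname shape. The example URI and the function name are invented for illustration.

```python
from urllib.parse import urlsplit


def parse_postgres_uri(uri: str) -> dict:
    """Split a postgres:// connection URI into its separate credentials."""
    parts = urlsplit(uri)
    return {
        "host": parts.hostname,
        "port": parts.port or 5432,  # 5432 is the default PostgreSQL port
        "user": parts.username,
        "password": parts.password,
        "database": parts.path.lstrip("/"),
    }


# A made-up URI, not real credentials.
example_uri = "postgres://demo_user:demo_pass@db.example.com:5432/demo_db"
creds = parse_postgres_uri(example_uri)
print(creds["host"], creds["database"])  # db.example.com demo_db
```

Treat the real URI like a password: keep it out of any code you publish.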

Now, how will you use that database? That will be the topic of a future post. Stay tuned.

Could regex’s days be numbered?

Regex lets us find anything, check for patterns, format accurately — and drives us crazy with anxiety and deepest discomfort. Regex stands for regular expressions, and my fave regex editor for Python is Pythex (I could not possibly write regex without it).

But there’s hope for the future! Check out Rosie and the Rosie Pattern Language (RPL) for Python!

I watched this video (the whole thing) and it just made me feel so happy. Oh, and Rosie works with other languages too!

This post will help me remember “What was that thing I heard about that can replace regex?” when I need to.

See also: Getting Started with RPL in 15 minutes

5-minute JavaScript tutorials

So many tutorial videos are so darned long! This summer, I set out to make a new series of videos for absolute beginners in JavaScript. They are free on YouTube, and here is the playlist:

JavaScript Beginners

Most of these videos are shorter than 5 minutes. Only one is longer than 6 minutes (6:43), and I might try to go back and make that one shorter.

I have tried to title each video very specifically so that you can scan the playlist and pick a focused topic without thinking too hard.

The idea was to demonstrate characteristics of JavaScript, as well as the most basic programming concepts, in the JavaScript console — which every modern web browser has. I hope this will encourage beginners to play along and try things out in the console themselves.

Beginner JavaScript Resources

Eloquent JavaScript, 3rd edition (2018), by Marijn Haverbeke: This free online book is available as a PDF, EPUB or MOBI (Kindle) download. I recommend it highly (as do many others).

I urge you to seek out resources at Mozilla Developer Network (MDN) and NOT at W3schools. Here are some great MDN resources for finding answers about JavaScript:

The excellent book series You Don’t Know JS is fully available online to read for free.

These two recently updated Codecademy courses are interactive, hands-on, and really great for beginners:

If you learned jQuery back when it was truly necessary, you might (like me) appreciate this resource for doing all those things using pure vanilla JavaScript instead: plainJS – The Vanilla JavaScript Repository