Data science

Data science is recognized as a field distinct from computer science; it has been called “the child of statistics and computer science” (Blei & Smyth, 2017, p. 8689; doi:10.1073/pnas.1702076114).

Many universities have whipped up degree programs in data science. I’ve searched and examined a lot of the curricula, and my favorite is the master’s program at the University of San Francisco, because it seems very comprehensive, and the faculty have solid credentials.

There’s a block of “foundation courses” from which students must complete two:

  • MSDS 501 – Computation for Analytics
  • MSDS 502 – Review of Linear Algebra
  • MSDS 504 – Review Probability and Stats

Then there are 33 units of required courses:

  • MSDS 593 – EDA and Visualization
  • MSDS 601 – Linear Regression Analysis
  • MSDS 603 – Product Analytics
  • MSDS 604 – Time Series Analysis
  • MSDS 605 – Practicum I
  • MSDS 610 – Communications for Analytics
  • MSDS 621 – Intro to Machine Learning
  • MSDS 625 – Practicum II
  • MSDS 626 – Case Studies in Data Science
  • MSDS 627 – Practicum III
  • MSDS 629 – Experiments in Data Science
  • MSDS 630 – Advanced Machine Learning
  • MSDS 631 – Special Topics in Analytics
  • MSDS 632 – Practicum IV
  • MSDS 633 – Ethics in Data Science
  • MSDS 689 – Data Structures and Algorithms
  • MSDS 691 – Relational Databases
  • MSDS 692 – Data Acquisition
  • MSDS 694 – Distributed Computing
  • MSDS 697 – Distributed Data Systems
  • MSDS 699 – Machine Learning Laboratory

In addition, students must attend seminars and take 10 hours of interview skills training.

This is a one-year, full-time residential program, and for nine months of it students spend 15 hours a week on the practicum.

Some of the things that most impress me about the curriculum:

  • Three courses on machine learning
  • A course devoted to ethics
  • A course on exploratory data analysis and visualization
  • The 2-unit course on data acquisition focuses on web scraping with Python (check out the course description for this!)
  • A communications course for learning how to present data to clients and stakeholders
  • Use of both R and Python; omission of unnecessary programming languages
  • A course on SQL databases and a separate course on MongoDB
  • A course on conducting experiments

I have no stake in this master’s degree program (in fact I work at a different university in another state), but when I’ve looked at other programs with “data science” in the title, I’ve concluded that most do not compare favorably with this one.

Mainly I am interested in the intersection of journalism and data science, so I’m continually making comparisons between data-focused journalism projects and the work of data scientists.

Related post: Python, data work, and O’Reilly books

Packaging apps

This is a summary of a news nerds discussion in early November 2019.

  • Grunt and Gulp are old news, although still used.
  • Webpack is difficult but widely used.
  • Rollup is simpler than Webpack and works well.
  • Microbundle is even easier than Rollup, and suitable for smaller modules.
  • The Airbnb JavaScript Style Guide is opinionated but useful for establishing and following conventions.

A long but clear explanation of the whole module-bundling process is Modern JavaScript Explained For Dinosaurs (2017).

Embed a gist in WordPress

Mostly I don’t mind the WordPress Gutenberg editor, but this was a real pain in the ass.

First, you need a new “block.” It needs to be a regular paragraph block, NOT a code block. Code blocks are for displaying code, not running it.

Then you need to turn on “Add Custom HTML” for that block.

Screenshot of WP Custom HTML option

Then you take the embed code from your gist (see image below), including the script tags, and paste it. There is no WordPress embed option for gists. Do not waste your time Googling for how to embed a gist, as I did.

Screenshot of gist embed box at top of gist page

Now you’ll have a proper gist embed in your WordPress post.

Clarification: It’s not a pain in the ass once you know this is the way to do it. It’s a pain in the ass to figure out because (a) it doesn’t work like other WordPress embeds in Gutenberg, and (b) there are a bunch of incorrect posts about how to do it, including one at WordPress Support.
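The embed code you copy from the gist page is a single script tag: the gist’s URL with `.js` appended. If it helps to see exactly what you’re pasting, here is a minimal Python sketch that builds that tag from a gist URL. The `gist_embed_tag` helper and the `USER/GIST_ID` URL are hypothetical placeholders, and the sketch assumes the usual https://gist.github.com/USER/GIST_ID form.

```python
def gist_embed_tag(gist_url: str) -> str:
    """Build the <script> embed tag for a public GitHub gist (hypothetical helper).

    Assumes gist_url looks like https://gist.github.com/USER/GIST_ID.
    """
    url = gist_url.strip().rstrip("/")
    if url.endswith(".js"):
        url = url[:-3]  # already an embed script URL; drop the suffix so it isn't doubled
    return f'<script src="{url}.js"></script>'


if __name__ == "__main__":
    print(gist_embed_tag("https://gist.github.com/USER/GIST_ID"))
```

Run as-is, it prints `<script src="https://gist.github.com/USER/GIST_ID.js"></script>`, which is the kind of line that goes into the Custom HTML block.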

Exporting from Bokeh

Bokeh is a Python library for creating interactive data visualizations. I just started learning about it, and I immediately wanted to export either static images or HTML/JavaScript — or both! However, at first it seemed I would need to install extra libraries to make it happen.

I kept searching, and I found that there are export options that do not require any extra libraries. Hooray!

Say you have already created and displayed a chart in a Jupyter Notebook and assigned it to the variable chart1. This is all you’ll need to export a complete, fully functioning HTML file with included JavaScript:
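A minimal sketch of one way to do this with Bokeh’s own `bokeh.embed.file_html` and `bokeh.resources.CDN`, which need nothing beyond Bokeh itself. The small line chart stands in for your chart1 (its data and title are made up), and the output filename `chart1.html` is an assumption.

```python
from bokeh.embed import file_html
from bokeh.plotting import figure
from bokeh.resources import CDN

# Hypothetical stand-in for the chart1 you already built in the notebook
chart1 = figure(title="Example chart")
chart1.line([1, 2, 3, 4], [4, 2, 5, 3], line_width=2)

# file_html() returns a complete HTML document as a string.
# CDN makes the page load BokehJS from Bokeh's CDN; swapping in
# bokeh.resources.INLINE embeds the JavaScript in the file itself instead.
html = file_html(chart1, CDN, "Example chart")

# Write the standalone page to disk
with open("chart1.html", "w", encoding="utf-8") as f:
    f.write(html)
```

Opening `chart1.html` in a browser should give the same interactive chart, toolbar included. Using CDN keeps the file small but needs an internet connection to load BokehJS; INLINE makes a bigger file that works offline.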

Lordy, it was torture to embed that freaking gist using the WP Gutenberg editor. New post to follow.

Screenshot of Bokeh Save tool

The default toolset in Bokeh includes a “Save” icon. This outputs a PNG image of the chart.

Installing Python for beginners

Students really struggle with setup. By the time they’ve finished setting up Python, Jupyter Notebooks, etc., they’re ready to quit the course and not even learn Python at all — especially students using Windows.

I think with Miniconda I’ve finally tamed that beast. Here are my instructions for students, in one Google doc. Feel free to copy and edit it for your own use.

http://bit.ly/mm-conda
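Once everything is installed, a quick sanity check can confirm that students are running the Python they just installed rather than some other copy on their machine. This is a minimal sketch and is not taken from the Google doc: `check_setup` is a hypothetical helper, and checking for the `notebook` and `pandas` packages is an assumption, so swap in whatever your course actually uses.

```python
import importlib.util
import os
import sys


def check_setup() -> dict:
    """Report which Python is running and whether a few packages can be imported."""
    return {
        # Full path of the running interpreter; with Miniconda this is
        # expected (an assumption) to sit inside the Miniconda install folder
        "python": sys.executable,
        "version": sys.version.split()[0],
        # Set by `conda activate`; None when no conda environment is active
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # find_spec returns None when a top-level package can't be found
        "notebook_installed": importlib.util.find_spec("notebook") is not None,
        "pandas_installed": importlib.util.find_spec("pandas") is not None,
    }


if __name__ == "__main__":
    for key, value in check_setup().items():
        print(f"{key}: {value}")
```

Students can run it in a notebook cell or from the terminal. If the interpreter path points somewhere outside Miniconda, the wrong Python is on their PATH.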