Data science

Data science is recognized as a field distinct from computer science; it has been called “the child of statistics and computer science” (Blei & Smyth, 2017, p. 8689; doi: 10.1073/pnas.1702076114).

Many universities have whipped up degree programs in data science. I’ve searched and examined a lot of the curricula, and my favorite is the master’s program at the University of San Francisco, because it seems very comprehensive, and the faculty have solid credentials.

There’s a block of “foundation courses” from which students must complete two:

  • MSDS 501 – Computation for Analytics
  • MSDS 502 – Review of Linear Algebra
  • MSDS 504 – Review Probability and Stats

Then there are 33 units of required courses:

  • MSDS 593 – EDA and Visualization
  • MSDS 601 – Linear Regression Analysis
  • MSDS 603 – Product Analytics
  • MSDS 604 – Time Series Analysis
  • MSDS 605 – Practicum I
  • MSDS 610 – Communications for Analytics
  • MSDS 621 – Intro to Machine Learning
  • MSDS 625 – Practicum II
  • MSDS 626 – Case Studies in Data Science
  • MSDS 627 – Practicum III
  • MSDS 629 – Experiments in Data Science
  • MSDS 630 – Advanced Machine Learning
  • MSDS 631 – Special Topics in Analytics
  • MSDS 632 – Practicum IV
  • MSDS 633 – Ethics in Data Science
  • MSDS 689 – Data Structures and Algorithms
  • MSDS 691 – Relational Databases
  • MSDS 692 – Data Acquisition
  • MSDS 694 – Distributed Computing
  • MSDS 697 – Distributed Data Systems
  • MSDS 699 – Machine Learning Laboratory

In addition, students must attend seminars and take 10 hours of interview skills training.

This is a one-year full-time residential program that includes 15 hours/week of practicum for nine months of the program.

Some of the things that most impress me about the curriculum:

  • Three courses on machine learning
  • A course devoted to ethics
  • A course on exploratory data analysis and visualization
  • The 2-unit course on data acquisition focuses on web scraping with Python (check out the course description for this!)
  • A communications course for learning how to present data to clients and stakeholders
  • Use of both R and Python; omission of unnecessary programming languages
  • A course on SQL databases and a separate course on MongoDB
  • A course on conducting experiments

I have no stake in this master’s degree program (in fact I work at a different university in another state), but when I’ve looked at other programs with “data science” in the title, I’ve concluded that most do not compare favorably with this one.

Mainly I am interested in the intersection of journalism and data science, so I’m continually making comparisons between data-focused journalism projects and the work of data scientists.

Related post: Python, data work, and O’Reilly books

5-minute JavaScript tutorials

So many tutorial videos are so darned long! This summer, I set out to make a new series of videos for absolute beginners in JavaScript. They are free on YouTube, and here is the playlist:

JavaScript Beginners

Most of these videos are shorter than 5 minutes. Only one is longer than 6 minutes (6:43), and I might try to go back and make that one shorter.

I have tried to title each video very specifically so that you can scan the playlist and pick a focused topic without thinking too hard.

The idea was to demonstrate characteristics of JavaScript, as well as the most basic programming concepts, in the JavaScript console — which every modern web browser has. I hope this will encourage beginners to play along and try things out in the console themselves.
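As a rough illustration of the kind of thing a beginner might type into the console, here is a minimal sketch. It is not taken from the videos; the variable names and numbers are made up for the example.

```javascript
// Values have types; typeof reports the type as a string.
const title = "JavaScript Beginners";
const typeOfTitle = typeof title; // "string"
const typeOfNumber = typeof 42;   // "number"

// Arrays hold ordered lists. These video lengths (in minutes) are hypothetical.
const lengths = [4.5, 3.75, 6.5];
const underFive = lengths.filter((len) => len < 5);

// A classic surprise: + joins strings but adds numbers.
const joined = "5" + 5; // "55"
const added = 5 + 5;    // 10

console.log(typeOfTitle, typeOfNumber, underFive.length, joined, added);
```

In a browser console you can also skip `console.log` and just type a name such as `joined` to see its value.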

A new Flask tutorial for journalism students

This semester I’ve been gradually building out a single, comprehensive python-beginners repo at GitHub, and the latest segment is all about getting started with Flask — a popular Python framework for building web apps (and easier than Django).

I have tried to use various books and online tutorials to teach Flask for the past two years, but I’ve finally given up on that because they include so much extra material (confusing for my students), or they are outdated and largely wrong now, or both.

Miguel Grinberg’s new, fully updated mega-tutorial for Flask is comprehensive and thorough — but it’s just too thorough, really, for journalism and communications students who only learned Python about four weeks ago.

Leaflet for interactive maps

The latest update I’ve made to a handout for students is Introduction to Leaflet.js. Leaflet is a JavaScript library for creating very flexible interactive maps, using a variety of tile sets — which enables you to make a map that looks quite different from a Google Map. There are lots of options for customization.
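To make the idea of a map plus a tile set concrete, here is a minimal sketch. It assumes the Leaflet script and stylesheet are already loaded on the page and that the page has a `<div id="map">` with a height set in CSS. The coordinates, zoom level, and popup text are arbitrary, and the example uses OpenStreetMap tiles rather than Mapbox; none of it comes from the handout.

```javascript
// Hypothetical settings for illustration only.
const mapSettings = {
  elementId: "map",            // id of the <div> that will hold the map
  center: [29.6516, -82.3248], // [latitude, longitude], an arbitrary example point
  zoom: 13,
  tileUrl: "https://tile.openstreetmap.org/{z}/{x}/{y}.png",
  attribution: "&copy; OpenStreetMap contributors",
};

// Takes the Leaflet object as a parameter so the steps are easy to follow:
// create the map, add a tile layer, add one marker with a popup.
function buildMap(L, settings) {
  const map = L.map(settings.elementId).setView(settings.center, settings.zoom);
  L.tileLayer(settings.tileUrl, {
    maxZoom: 19,
    attribution: settings.attribution,
  }).addTo(map);
  L.marker(settings.center).addTo(map).bindPopup("Hello from Leaflet");
  return map;
}

// Only runs in a page where Leaflet has defined the global L.
if (typeof L !== "undefined") {
  buildMap(L, mapSettings);
}
```

Swapping in a different tile set should mostly be a matter of changing `tileUrl` and `attribution` to whatever the tile provider specifies.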

The Leaflet Quick Start Guide is good, but it doesn’t walk a beginner through how to enable the use of Mapbox tile sets, even though Leaflet recommends using them. Mapbox changed drastically about a year back, and those changes made it necessary to rewrite my tutorial.

Separately, I have a Leaflet assignment for students in their seventh week of working with JavaScript.

Highcharts for data charts and graphs

I had to update my resource document Getting Started with Highcharts because the library has been updated, and jQuery is no longer required. The document covers:

  • Installation
  • Your First Chart
  • Changing Colors and Styles
  • Setting and Changing Chart Options
  • Licensing and Payments
  • Interactive Demos with jsFiddle
  • Which Chart Type Should I Use?
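
As a sketch of what “jQuery is no longer required” can look like in practice, the example below creates a chart with plain JavaScript. It assumes the Highcharts script is loaded and the page has a `<div id="container">`. The chart title, categories, and numbers are invented for illustration and are not from the resource document.

```javascript
// Hypothetical data for illustration only.
const chartOptions = {
  chart: { type: "column" },
  title: { text: "Stories published per month" },
  xAxis: { categories: ["Jan", "Feb", "Mar"] },
  yAxis: { title: { text: "Stories" } },
  series: [{ name: "Newsroom A", data: [12, 19, 7] }],
};

// Plain JavaScript, no jQuery: wait for the page to load,
// then draw the chart into the element with id "container".
if (typeof document !== "undefined" && typeof Highcharts !== "undefined") {
  document.addEventListener("DOMContentLoaded", () => {
    Highcharts.chart("container", chartOptions);
  });
}
```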

Separately, I have a Highcharts assignment for students in their third week of learning JavaScript. It includes a GitHub repo.

Learn progressive web apps

New find: Progressive Web Apps Training, from Google.

“A PWA is not an API or a technology, but it is a web development approach that uses a combination of tools and technologies already available to create targeted, ideal user experiences. [This course] shows how to use service workers, APIs, and an application shell architecture for meaningful offline experiences, fast first load, and easy user reengagement upon repeat visits.”

Two things PWAs can do that a normal web app can’t:

  • Send push alerts/notifications.
  • Be used offline, and update any changes you made when you are back online.

Smashing Magazine wrote:

“Progressive web apps could be the next big thing for the mobile web. Originally proposed by Google in 2015, they have already attracted a lot of attention because of the relative ease of development and the almost instant wins for the application’s user experience.”

Unlike a native app, a web app can be used on mobile immediately. There’s no download from the App Store or equivalent.

A progressive web app will be built with HTML5 and with JavaScript, probably using a JavaScript framework.

This video explains why a progressive web app is desirable and shows excellent examples, including The Washington Post’s PWA.

Resources

What are Progressive Web Apps? A blog post that is quite clear, does not get bogged down in jargon (of which there is plenty, where PWAs are concerned), and summarizes the whole mess nicely.

Why does The Washington Post’s Progressive Web App increase engagement on iOS? Even without iOS support, a PWA just loads faster. Way faster, apparently.

Forbes rebuilt its new mobile website as a Progressive Web App: All journalism is mobile, and there’s a lot of food for thought in this article.

5 awesome progressive web apps worth exploring: Twitter Mobile, The Washington Post, Flipboard, Paper Planes, Topple Trump.

Lighthouse analyzes web apps and web pages, collecting modern performance metrics and insights on developer best practices (GitHub repo).

Service Worker API (MDN): A JavaScript file that controls the web page/site, “intercepting and modifying navigation and resource requests, and caching resources …” Key to offline use of a PWA.

PWAs vs. native apps: A deeper dive into pros and cons (published January 2017).
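
To give a feel for how a service worker’s intercepting and caching supports offline use, here is a minimal sketch of a “cache-first” approach: answer from the cache when possible, otherwise fetch from the network and save a copy. This is one common strategy, not the one used by any app mentioned above. The cache name and the helper function `cacheFirst` are hypothetical.

```javascript
const CACHE_NAME = "pages-v1"; // hypothetical cache name

// Cache-first: return the cached copy if there is one; otherwise
// fetch from the network and store a copy for next time.
async function cacheFirst(request, cache, fetchFn) {
  const cached = await cache.match(request);
  if (cached) {
    return cached;
  }
  const response = await fetchFn(request);
  await cache.put(request, response.clone());
  return response;
}

// Inside a real service worker file, "self" is the worker's global scope
// and "caches" is the browser's cache storage.
if (typeof self !== "undefined" && typeof caches !== "undefined") {
  self.addEventListener("fetch", (event) => {
    // Only GET requests can be stored in the cache; let others pass through.
    if (event.request.method !== "GET") {
      return;
    }
    event.respondWith(
      caches.open(CACHE_NAME).then((cache) => cacheFirst(event.request, cache, fetch))
    );
  });
}
```

The page itself would still need to register this file, typically with `navigator.serviceWorker.register()`, before the worker can intercept anything.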
