Data science

Data science is recognized as a field distinct from computer science; it has been called “the child of statistics and computer science” (Blei & Smyth, 2017, p. 8689; doi: 10.1073/pnas.1702076114).

Many universities have whipped up degree programs in data science. I’ve searched for and examined a lot of the curricula, and my favorite is the master’s program at the University of San Francisco, because it seems very comprehensive and the faculty have solid credentials.

There’s a block of “foundation courses” from which students must complete two:

  • MSDS 501 – Computation for Analytics
  • MSDS 502 – Review of Linear Algebra
  • MSDS 504 – Review Probability and Stats

Then there are 33 units of required courses:

  • MSDS 593 – EDA and Visualization
  • MSDS 601 – Linear Regression Analysis
  • MSDS 603 – Product Analytics
  • MSDS 604 – Time Series Analysis
  • MSDS 605 – Practicum I
  • MSDS 610 – Communications for Analytics
  • MSDS 621 – Intro to Machine Learning
  • MSDS 625 – Practicum II
  • MSDS 626 – Case Studies in Data Science
  • MSDS 627 – Practicum III
  • MSDS 629 – Experiments in Data Science
  • MSDS 630 – Advanced Machine Learning
  • MSDS 631 – Special Topics in Analytics
  • MSDS 632 – Practicum IV
  • MSDS 633 – Ethics in Data Science
  • MSDS 689 – Data Structures and Algorithms
  • MSDS 691 – Relational Databases
  • MSDS 692 – Data Acquisition
  • MSDS 694 – Distributed Computing
  • MSDS 697 – Distributed Data Systems
  • MSDS 699 – Machine Learning Laboratory

In addition, students must attend seminars and take 10 hours of interview skills training.

This is a one-year, full-time residential program; for nine of those months, students spend 15 hours per week on a practicum.

Some of the things that most impress me about the curriculum:

  • Three courses on machine learning
  • A course devoted to ethics
  • A course on exploratory data analysis and visualization
  • A 2-unit course on data acquisition that focuses on web scraping with Python (check out the course description for this!)
  • A communications course for learning how to present data to clients and stakeholders
  • Use of both R and Python; omission of unnecessary programming languages
  • A course on SQL databases and a separate course on MongoDB
  • A course on conducting experiments

I have no stake in this master’s degree program (in fact, I work at a different university in another state), but having looked at other programs with “data science” in the title, I’ve concluded that most do not compare favorably with this one.

I am mainly interested in the intersection of journalism and data science, so I’m continually making comparisons between data-focused journalism projects and the work of data scientists.

Related post: Python, data work, and O’Reilly books