Data science is recognized as a field distinct from computer science; it has been called “the child of statistics and computer science” (Blei & Smyth, 2017, p. 8689; doi: 10.1073/pnas.1702076114).

Many universities have whipped up degree programs in data science. I’ve searched for and examined a lot of the curricula, and my favorite is the master’s program at the University of San Francisco, because it seems very comprehensive and the faculty have solid credentials.

There’s a block of “foundation courses” from which students must complete two:

- MSDS 501 – Computation for Analytics
- MSDS 502 – Review of Linear Algebra
- MSDS 504 – Review Probability and Stats

Then there are 33 units of required courses:

- MSDS 593 – EDA and Visualization
- MSDS 601 – Linear Regression Analysis
- MSDS 603 – Product Analytics
- MSDS 604 – Time Series Analysis
- MSDS 605 – Practicum I
- MSDS 610 – Communications for Analytics
- MSDS 621 – Intro to Machine Learning
- MSDS 625 – Practicum II
- MSDS 626 – Case Studies in Data Science
- MSDS 627 – Practicum III
- MSDS 629 – Experiments in Data Science
- MSDS 630 – Advanced Machine Learning
- MSDS 631 – Special Topics in Analytics
- MSDS 632 – Practicum IV
- MSDS 633 – Ethics in Data Science
- MSDS 689 – Data Structures and Algorithms
- MSDS 691 – Relational Databases
- MSDS 692 – Data Acquisition
- MSDS 694 – Distributed Computing
- MSDS 697 – Distributed Data Systems
- MSDS 699 – Machine Learning Laboratory

In addition, students must attend seminars and take 10 hours of interview skills training.

This is a one-year, full-time residential program that includes 15 hours per week of practicum for nine months.

Some of the things that most impress me about the curriculum:

- Three courses on machine learning
- A course devoted to **ethics**
- A course on exploratory data analysis and visualization
- The 2-unit course on **data acquisition** focuses on web scraping with Python (check out the course description for this!)
- A **communications** course for learning how to present data to clients and stakeholders
- Use of **both R and Python**; omission of unnecessary programming languages
- A course on **SQL databases** and *a separate course* on MongoDB
- A course on conducting experiments

I have no stake in this master’s degree program (in fact I work at a different university in another state), but when I’ve looked at other programs with “data science” in the title, I’ve concluded that most do not compare favorably with this one.

Mainly I am interested in the intersection of **journalism** and data science, so I’m continually making comparisons between data-focused journalism projects and the work of data scientists.

Related post: Python, data work, and O’Reilly books