Data science is recognized as a field distinct from computer science; it has been called “the child of statistics and computer science” (Blei & Smyth, 2017, p. 8689; doi:10.1073/pnas.1702076114).
Many universities have whipped up degree programs in data science. I’ve searched for and examined a lot of their curricula, and my favorite is the master’s program at the University of San Francisco, because it seems very comprehensive and the faculty have solid credentials.
There’s a block of “foundation courses” from which students must complete two:
- MSDS 501 – Computation for Analytics
- MSDS 502 – Review of Linear Algebra
- MSDS 504 – Review Probability and Stats
Then there are 33 units of required courses:
- MSDS 593 – EDA and Visualization
- MSDS 601 – Linear Regression Analysis
- MSDS 603 – Product Analytics
- MSDS 604 – Time Series Analysis
- MSDS 605 – Practicum I
- MSDS 610 – Communications for Analytics
- MSDS 621 – Intro to Machine Learning
- MSDS 625 – Practicum II
- MSDS 626 – Case Studies in Data Science
- MSDS 627 – Practicum III
- MSDS 629 – Experiments in Data Science
- MSDS 630 – Advanced Machine Learning
- MSDS 631 – Special Topics in Analytics
- MSDS 632 – Practicum IV
- MSDS 633 – Ethics in Data Science
- MSDS 689 – Data Structures and Algorithms
- MSDS 691 – Relational Databases
- MSDS 692 – Data Acquisition
- MSDS 694 – Distributed Computing
- MSDS 697 – Distributed Data Systems
- MSDS 699 – Machine Learning Laboratory
In addition, students must attend seminars and take 10 hours of interview skills training.
This is a one-year, full-time residential program that includes a practicum of 15 hours per week for nine months.
Some of the things that most impress me about the curriculum:
- Three courses on machine learning
- A course devoted to ethics
- A course on exploratory data analysis and visualization
- The 2-unit course on data acquisition focuses on web scraping with Python (check out the course description for this!)
- A communications course for learning how to present data to clients and stakeholders
- Use of both R and Python; omission of unnecessary programming languages
- A course on SQL databases and a separate course on MongoDB
- A course on conducting experiments
I have no stake in this master’s degree program (in fact I work at a different university in another state), but when I’ve looked at other programs with “data science” in the title, I’ve concluded that most do not compare favorably with this one.
My main interest is the intersection of journalism and data science, so I’m continually comparing data-focused journalism projects with the work of data scientists.