Matthew Theisen

Data science and beyond

I'm Matt Theisen, a data scientist in the Greater Los Angeles area. I'm interested in data science, conceptual modeling, and analytics. Here are some of my projects.

Featured Projects

A project for exploring the Bureau of Labor Statistics' (BLS) Occupational Employment Statistics (OES) dataset.

Los Angeles Neighborhood Ranker (Beta): An interactive web app that ranks Los Angeles neighborhoods by features the user selects.

Data science/statistics

Identifying Customers Using Logistic Regression With glmnet in R: Using logistic regression to select which customers to market to.

Recursive Clustering Algorithm for Word Cloud Quiz: Implementation of a recursive/hierarchical clustering algorithm that enforces roughly equal cluster sizes.

Using NLP to Find The Magic Words for Resumes: Machine learning applied to resume text.

Traversing The Cancer Genome Atlas: Ongoing project in which I use Python to parse, collect, and analyze data from The Cancer Genome Atlas.


The limits of data: An article about how data can and cannot be used.

How I Got Into Data Science: The story of how I went from computational biology researcher to data scientist.

Writing on Grad School: A pamphlet on the many issues with PhD education.

Video production/visual storytelling

Video production: Contest-winning videos I produced or co-produced for video contests at UCLA.