Data science and beyond
I'm Matt Theisen, a data scientist in the Greater Los Angeles area. I'm interested in data science, conceptual modeling, and analytics. Here are some of my projects.
Jobs-Numbers.com: a project for exploring the Bureau of Labor Statistics' (BLS) Occupational Employment Statistics (OES) dataset.
Los Angeles Neighborhood Ranker: (Beta) An interactive web app that ranks Los Angeles neighborhoods according to features the user selects.
Identifying Customers Using Logistic Regression With glmnet in R: using regularized logistic regression to decide which customers to target with marketing.
Recursive Clustering Algorithm for Word Cloud Quiz: Implementation of a recursive/hierarchical clustering algorithm that enforces roughly equal cluster sizes.
Using NLP to Find The Magic Words for Resumes: Machine learning applied to resume text.
Traversing The Cancer Genome Atlas: An ongoing project in which I use Python to collect, parse, and analyze data from The Cancer Genome Atlas (TCGA).
The Limits of Data: an article about what data can and cannot be used for.
How I Got Into Data Science: the story of how I went from computational biology researcher to data scientist.
Writing on Grad School: A pamphlet on the many issues with PhD education.
Video Production: Contest-winning videos I produced or co-produced for video contests at UCLA.