The DIY Data Scientist is a weekly newsletter sent out each Saturday morning to thousands of professionals wanting to build real-world, do-it-yourself (DIY) data science skills for the AI-driven future of business.
The newsletter is structured around hands-on tutorials covering the skills that I’ve used for years, both as a data scientist for previous employers and in my current work as a data science consultant and educator.
Tutorials include Python code and data so you can follow along and experiment to build your skills.
You can access past newsletter issues here, organized by the various tutorial series.
Data Profiling Tutorial Series
December 7th, 2024: Issue #1 - Data Profiling for Machine Learning
December 14th, 2024: Issue #2 - Profiling Numeric Data For Machine Learning
December 21st, 2024: Issue #3 - Profiling Categorical Data For Machine Learning
December 28th, 2024: Issue #4 - Profiling Date-Time & Text Data For Machine Learning
January 4th, 2025: Issue #5 - Profiling Correlation & Duplicate Data for Machine Learning
Missing Data Tutorial Series
January 11th, 2025: Issue #6 - Handling Missing Data for Machine Learning: Fixing Data & Algorithms
January 18th, 2025: Issue #7 - Handling Missing Data for Machine Learning: Removing Rows & Columns
January 25th, 2025: Issue #8 - Handling Missing Data For Machine Learning: Proxy Features & Imputation
Interpreting ML Models Tutorial Series
February 1st, 2025: Issue #9 - Interpreting ML Models Is Your Superpower
February 8th, 2025: Issue #10 - The Spectrum of ML Model Interpretability
February 15th, 2025: Issue #11 - Interpreting ML Models with Feature Importance
February 22nd, 2025: Issue #12 - Interpreting ML Models with Data Visualization
March 1st, 2025: Issue #13 - Using Surrogate Models for Interpretation
Hierarchical Clustering Tutorial Series
March 8th, 2025: Issue #14 - Hierarchical Clustering Part 1: Introduction
March 15th, 2025: Issue #15 - Hierarchical Clustering Part 2: Algorithm Intuition
March 22nd, 2025: Issue #16 - Hierarchical Clustering Part 3: Calculating Distance
March 29th, 2025: Issue #17 - Hierarchical Clustering Part 4: Python Code