This notebook provides a detailed technical walkthrough of the Total Depth Index (TDI) - a psychometric approach to measuring team depth in the NHL.
Author: Dylan Wiwad
Data: NHL play-by-play and shift data (2010-2025)
Overview
I conceptualize depth as a latent construct - an unobservable quality that manifests through multiple measurable indicators. Using structural equation modeling (SEM), we combine four observable measures of roster balance:
Shot Depth (SOG_D): Distribution of shots on goal
Corsi Depth (CF_D): Distribution of shot attempts (possession proxy)
Expected Goals Depth (xG_D): Distribution of scoring chance quality
Time-on-Ice Depth (TOI_D): Distribution of ice time
Each indicator is computed as 1 - Gini coefficient for that team-game, where higher values indicate more even distribution across the roster.
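To make the construction concrete, here is a minimal sketch of how one such indicator could be computed from a per-game vector of player-level values. The helper names and the example shot counts are illustrative, not the production pipeline.

```python
import numpy as np

def gini(x: np.ndarray) -> float:
    """Gini coefficient of a non-negative vector (0 = perfectly even)."""
    x = np.sort(np.asarray(x, dtype=float))  # sort ascending
    n = len(x)
    if n == 0 or x.sum() == 0:
        return 0.0
    ranks = np.arange(1, n + 1)
    # Standard rank-based formula for the Gini coefficient
    return (2 * (ranks * x).sum()) / (n * x.sum()) - (n + 1) / n

def depth_indicator(player_values: np.ndarray) -> float:
    """Depth = 1 - Gini: higher values mean a more even distribution."""
    return 1.0 - gini(player_values)

# Hypothetical example: shots on goal by skater for one team-game
sog = np.array([5, 4, 3, 3, 2, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0])
print(f"SOG depth: {depth_indicator(sog):.3f}")
```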
Code
```python
#!/usr/bin/env python3
"""Total Depth Index (TDI) - Technical Appendix"""
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
import seaborn as sns
from scipy import stats
from sklearn.metrics import roc_auc_score, log_loss, accuracy_score
import warnings

warnings.filterwarnings('ignore')

# Brand setup
mpl.rcParams['font.family'] = 'Charter'
mpl.rcParams['figure.dpi'] = 100

# Brand colors
col_blue_pale = '#3B4B64'
col_blue_dark = '#041E42'
col_oil_orange = '#FF4C00'
col_orange_pale = '#E76F2B'
col_white = '#FFFFFF'
col_teal = '#6EB4B8'
col_gray = '#9A9A9A'

# Project paths
PROJECT_ROOT = "/Users/dwiwad/dev/hockey_site"
DATA_PATH = f"{PROJECT_ROOT}/data/total-depth-index/all_seasons"

print("✅ Setup complete")
```
✅ Setup complete
We start by loading the processed team-game data that contains our four depth indicators along with outcome variables.
Code
```python
# Load main dataset
data = pd.read_csv(f"{DATA_PATH}/final_data_20102025.csv")

print(f"Dataset shape: {data.shape[0]:,} team-games × {data.shape[1]} variables")
print(f"Seasons covered: {data['season'].min()} to {data['season'].max()}")
print(f"Teams: {data['teamAbbrev'].nunique()} unique franchises")
print(f"\nVariables:")
print(data.columns.tolist())
```
✅ High inter-correlations among depth indicators (0.5-0.8) suggest they measure a common underlying construct
✅ Low correlations with volume metrics (|r| < 0.3) confirm that depth (distribution) is conceptually distinct from magnitude
This supports our theoretical model: depth is about how evenly production is distributed, not how much is produced.
3. Confirmatory Factor Analysis (CFA)
We now fit a single-factor model where all four depth indicators load onto one latent “Depth” variable. The model is specified as:
Depth =~ SOG_D + CF_D + xG_D + TOI_D
In R (lavaan), this was fit with:
```r
model <- 'depth =~ cf_depth_raw + sog_depth_raw + toi_depth_raw + xgoal_depth_raw'
fit <- sem(model, data = data, std.lv = TRUE)
```
The factor scores from this model become our TDI metric. We've pre-computed these and they're stored in the tdi_factor column.
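For readers working purely in Python, a rough single-factor analogue can be fit with scikit-learn. This is only a sketch of the idea, not the lavaan model that produced tdi_factor, and its scores will not reproduce the published values exactly (sign and scale can differ).

```python
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

indicators = ['cf_depth_raw', 'sog_depth_raw', 'toi_depth_raw', 'xgoal_depth_raw']
X = data[indicators].dropna()

# One latent factor on standardized indicators (rough analogue of std.lv = TRUE)
fa = FactorAnalysis(n_components=1, random_state=42)
scores = fa.fit_transform(StandardScaler().fit_transform(X))

print("Approximate loadings:", dict(zip(indicators, fa.components_[0].round(2))))
```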
Code
```python
# The factor scores are already in the data (computed in R via lavaan)
# Let's inspect them
tdi = data['tdi_factor'].dropna()

print("="*60)
print("TDI FACTOR SCORES - Summary Statistics")
print("="*60)
print(tdi.describe().round(3))
print(f"\nSkewness: {stats.skew(tdi):.3f}")
print(f"Kurtosis: {stats.kurtosis(tdi):.3f}")

# Test normality
_, p_shapiro = stats.shapiro(tdi.sample(5000))  # subsample for computational efficiency
print(f"Shapiro-Wilk p-value (n=5000): {p_shapiro:.4f}")
```
```python
# Distribution of TDI with reference lines
fig, ax = plt.subplots(figsize=(12, 7))
ax.hist(tdi, bins=60, density=True, color=col_blue_pale, alpha=0.7,
        edgecolor='white', linewidth=0.3)
tdi.plot.kde(ax=ax, color=col_oil_orange, linewidth=3.5)

# Add reference lines for quartiles
q25, q50, q75 = tdi.quantile([0.25, 0.50, 0.75])
for q, label in zip([q25, q50, q75], ['Q1', 'Median', 'Q3']):
    ax.axvline(q, color=col_blue_dark, linestyle='--', linewidth=1.2, alpha=0.6)
    ax.text(q, ax.get_ylim()[1]*0.95, f' {label}={q:.2f}', rotation=0,
            va='top', fontsize=10, color=col_blue_dark)

sns.despine()
ax.grid(False)
ax.set_xlabel("Depth (latent factor score)", fontsize=18)
ax.set_ylabel("Density", fontsize=18)
ax.set_xlim(-3.5, 3.5)
ax.tick_params(axis='both', labelsize=14)

left_x = ax.get_position().x0
plt.subplots_adjust(top=0.85)
fig.suptitle("Distribution of Depth (TDI) Factor Scores", fontsize=20,
             weight='bold', x=left_x, ha='left', y=0.97)
fig.text(left_x, 0.87, "Team–game factor scores across all seasons (2010-2025)",
         fontsize=14, ha='left')
plt.show()
```
Model Fit Indices (from R lavaan output):

| Fit Index | Value | Threshold | Interpretation |
|-----------|-------|-----------|----------------|
| CFI   | 0.98 | > 0.95 | ✅ Excellent |
| TLI   | 0.95 | > 0.95 | ✅ Good |
| RMSEA | 0.04 | < 0.06 | ✅ Excellent |
| SRMR  | 0.02 | < 0.08 | ✅ Excellent |
Standardized Factor Loadings:

| Indicator | Loading | SE   | p-value |
|-----------|---------|------|---------|
| CF_D  | 0.85 | 0.01 | < .001 |
| SOG_D | 0.83 | 0.01 | < .001 |
| xG_D  | 0.76 | 0.01 | < .001 |
| TOI_D | 0.42 | 0.01 | < .001 |
Key Findings:
- Shot-based indicators (CF, SOG, xG) are the strongest contributors
- TOI Depth loads weakly, suggesting coaches distribute minutes evenly regardless of performance depth
- The model fits the data extremely well, confirming depth is a coherent latent construct
4. Game-to-Game Stability
A critical question: is depth a stable trait of a team, or does it fluctuate game-to-game?
Code
```python
# Calculate lag-1 autocorrelation by team-season
ac1_results = []
for (team, season), group in data.groupby(['teamAbbrev', 'season']):
    group = group.sort_values('game_id').reset_index(drop=True)
    if len(group) >= 10:  # need sufficient games
        tdi_vals = group['tdi_factor'].values
        tdi_lag1 = pd.Series(tdi_vals).shift(1).values
        mask = ~(pd.isna(tdi_vals) | pd.isna(tdi_lag1))
        if mask.sum() >= 5:
            r = pd.Series(tdi_vals[mask]).corr(pd.Series(tdi_lag1[mask]))
            ac1_results.append({
                'team': team,
                'season': season,
                'n_games': len(group),
                'ac1': r
            })

ac1_df = pd.DataFrame(ac1_results).dropna(subset=['ac1'])

print("="*60)
print("LAG-1 AUTOCORRELATION SUMMARY")
print("="*60)
print(ac1_df['ac1'].describe().round(3))
print(f"\nMedian r: {ac1_df['ac1'].median():.3f}")
print(f"IQR: [{ac1_df['ac1'].quantile(0.25):.3f}, {ac1_df['ac1'].quantile(0.75):.3f}]")
print(f"\n% of team-seasons with r < 0.10: {(ac1_df['ac1'] < 0.10).mean()*100:.1f}%")
```
============================================================
LAG-1 AUTOCORRELATION SUMMARY
============================================================
count 462.000
mean 0.012
std 0.109
min -0.306
25% -0.061
50% 0.013
75% 0.086
max 0.313
Name: ac1, dtype: float64
Median r: 0.013
IQR: [-0.061, 0.086]
% of team-seasons with r < 0.10: 79.4%
Code
```python
# Visualize by team (ordered by median AC1)
# Exclude ATL (defunct) and UTA (new franchise) — too few seasons of data
ac1_df_clean = ac1_df[~ac1_df['team'].isin(['ATL', 'UTA'])]
team_order = (ac1_df_clean.groupby('team')['ac1']
              .median().sort_values().index.tolist())

fig, ax = plt.subplots(figsize=(10, 5.5))

# Violin plot
parts = ax.violinplot(
    [ac1_df_clean[ac1_df_clean['team'] == t]['ac1'].values for t in team_order],
    positions=range(len(team_order)), widths=0.7,
    showmeans=False, showmedians=False, showextrema=False)
for pc in parts['bodies']:
    pc.set_facecolor(col_blue_pale)
    pc.set_alpha(0.7)

# Boxplot overlay
bp = ax.boxplot(
    [ac1_df_clean[ac1_df_clean['team'] == t]['ac1'].values for t in team_order],
    positions=range(len(team_order)), widths=0.18, patch_artist=True,
    showfliers=False,
    medianprops=dict(color=col_blue_dark, linewidth=2),
    boxprops=dict(facecolor=col_white, edgecolor=col_blue_dark, linewidth=1.2),
    whiskerprops=dict(color=col_blue_dark, linewidth=1.2),
    capprops=dict(color=col_blue_dark, linewidth=1.2))

ax.axhline(y=0, color=col_oil_orange, linewidth=1.4, linestyle='--', alpha=0.95)
ax.axhline(y=ac1_df_clean['ac1'].median(), color=col_teal, linewidth=1.2,
           linestyle=':', alpha=0.7,
           label=f"League Median (r={ac1_df_clean['ac1'].median():.3f})")

sns.despine()
ax.grid(False)
ax.set_xticks(range(len(team_order)))
ax.set_xticklabels(team_order, rotation=90, fontsize=10, va='top')
ax.set_xlabel("Team", fontsize=18)
ax.set_ylabel("Lag-1 autocorrelation of Depth", fontsize=18)
ax.tick_params(axis='y', labelsize=14)
ax.legend(loc='upper left', fontsize=11, frameon=False)

left_x = ax.get_position().x0
plt.subplots_adjust(top=0.85, bottom=0.15)
fig.suptitle("Short-Term Stability of Depth by Team", fontsize=20,
             weight='bold', x=left_x, ha='left', y=0.97)
fig.text(left_x, 0.87, "Lag-1 autocorrelation within team-seasons (2010-2025)",
         fontsize=14, ha='left')
plt.show()
```
Finding:
The median lag-1 autocorrelation is r ≈ 0.01 — essentially zero!
What this means:
- Depth in game t tells you almost nothing about depth in game t+1
- ~98% of variance in depth is within-team, not between-teams
- Depth is highly volatile game-to-game
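A back-of-the-envelope check on that variance split is sketched below. This is a crude one-way decomposition that pools across seasons and ignores schedule structure; the ~98% figure in the text comes from the fuller analysis.

```python
# Decompose TDI variance into between-team and within-team components
team_means = data.groupby('teamAbbrev')['tdi_factor'].transform('mean')
between_var = team_means.var()
within_var = (data['tdi_factor'] - team_means).var()
icc = between_var / (between_var + within_var)

print(f"Between-team share of variance (ICC): {icc:.1%}")
print(f"Within-team share: {1 - icc:.1%}")
```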
Why does this matter? If we want depth to predict outcomes, we need to smooth out the noise. This motivates using a rolling average approach.
5. Rolling Window Analysis
To capture sustained depth patterns, we compute 10-game rolling averages of TDI. The choice of 10 games balances:
- Capturing recent trends (not too long)
- Smoothing out single-game noise (not too short)
- Having enough data for early-season predictions
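As a quick sanity check on the window length, here is an illustrative sweep over several window sizes, using the same lagged-rolling construction as the cell below. The specific windows compared are our choice, not part of the original analysis.

```python
# Illustrative sensitivity check: how does window length affect the
# correlation between rolling depth and next-game depth?
d = data.sort_values(['teamAbbrev', 'game_id']).copy()
d['depth_next'] = d.groupby('teamAbbrev')['tdi_factor'].shift(-1)

for w in [5, 10, 15, 20]:
    roll = (d.groupby('teamAbbrev')['tdi_factor']
              .transform(lambda x: x.shift(1).rolling(w, min_periods=w).mean()))
    valid = roll.notna() & d['depth_next'].notna()
    r = roll[valid].corr(d.loc[valid, 'depth_next'])
    print(f"window={w:2d}: r = {r:.3f}  (n = {valid.sum():,})")
```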
Code
```python
# Calculate rolling 10-game average (lagged to avoid leakage)
data_roll = data.sort_values(['teamAbbrev', 'game_id']).copy()
data_roll['depth_roll10'] = (
    data_roll.groupby('teamAbbrev')['tdi_factor']
    .transform(lambda x: x.shift(1).rolling(10, min_periods=10).mean())
)
data_roll['depth_next'] = data_roll.groupby('teamAbbrev')['tdi_factor'].shift(-1)
df_model = data_roll.dropna(subset=['depth_roll10', 'depth_next'])

print(f"Valid observations for rolling model: {len(df_model):,}")
```
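The cell above only reports the sample size; the regression that produces the output below is not shown in this extract. A minimal version using scipy.stats.linregress would presumably look like this:

```python
from scipy import stats

# Simple linear regression: next-game depth on 10-game rolling depth
res = stats.linregress(df_model['depth_roll10'], df_model['depth_next'])

print("="*60)
print("ROLLING 10-GAME DEPTH → NEXT GAME DEPTH")
print("="*60)
print(f"β (slope): {res.slope:.3f}")
print(f"R²: {res.rvalue**2:.3f}")
print(f"p-value: {res.pvalue:.2e}")
print(f"Correlation: {res.rvalue:.3f}")
if res.pvalue < 0.001:
    print("\n✅ Rolling average is a significant predictor!")
```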
============================================================
ROLLING 10-GAME DEPTH → NEXT GAME DEPTH
============================================================
β (slope): 0.452
R²: 0.036
p-value: 7.87e-306
Correlation: 0.190
✅ Rolling average is a significant predictor!
Key Result:
✅ β = 0.45, p < .001: rolling depth significantly predicts next-game depth
While single-game depth is noise, the 10-game average captures a meaningful signal about a team’s sustained depth state.
6. Depth and Winning
Now for the key question: Does depth help you win?
We'll examine this at three levels:
1. Same-game association: Does having more depth in a game predict winning that game?
2. Prospective prediction: Does recent depth (10-game rolling average) predict next-game outcomes?
3. Season-level correlation: Do teams with higher average depth win more?
Code
```python
# 1. Same-game association
from sklearn.linear_model import LogisticRegression

# Simple logistic regression: Win ~ Depth
model_sg = LogisticRegression()
X_sg = data[['tdi_factor']].dropna()
y_sg = data.loc[X_sg.index, 'outcome']
model_sg.fit(X_sg, y_sg)

coef_sg = model_sg.coef_[0][0]
odds_ratio_sg = np.exp(coef_sg)

print("="*60)
print("SAME-GAME ASSOCIATION")
print("="*60)
print(f"β (logit): {coef_sg:.4f}")
print(f"Odds Ratio: {odds_ratio_sg:.4f}")
print(f"Direction: {'✅ Positive' if coef_sg > 0 else '❌ Negative'}")
if coef_sg < 0:
    print("\n⚠️ Paradoxical finding: Higher same-game depth predicts LOWER win probability")
    print("   This suggests balanced depth in a single game may reflect desperation")
    print("   (trailing teams rolling all four lines) rather than dominance.")
```
============================================================
SAME-GAME ASSOCIATION
============================================================
β (logit): -0.0142
Odds Ratio: 0.9859
Direction: ❌ Negative
⚠️ Paradoxical finding: Higher same-game depth predicts LOWER win probability
This suggests balanced depth in a single game may reflect desperation
(trailing teams rolling all four lines) rather than dominance.
Code
```python
# 2. Prospective prediction (10-game rolling depth)
X_prosp = data_roll[['depth_roll10']].dropna()
y_prosp = data_roll.loc[X_prosp.index, 'outcome']
model_prosp = LogisticRegression()
model_prosp.fit(X_prosp, y_prosp)

coef_prosp = model_prosp.coef_[0][0]
odds_ratio_prosp = np.exp(coef_prosp)

print("\n" + "="*60)
print("PROSPECTIVE PREDICTION (10-game rolling depth)")
print("="*60)
print(f"β (logit): {coef_prosp:.4f}")
print(f"Odds Ratio: {odds_ratio_prosp:.4f}")
print(f"Direction: {'✅ Positive' if coef_prosp > 0 else '❌ Negative'}")
if coef_prosp > 0:
    print("\n✅ Sustained depth (10-game avg) DOES predict winning!")
    # Note: the predictor is unstandardized, so the OR is per 1-unit change
    print(f"   Teams with a 1-unit higher rolling depth have {(odds_ratio_prosp-1)*100:.1f}% higher odds of winning")
```
============================================================
PROSPECTIVE PREDICTION (10-game rolling depth)
============================================================
β (logit): 0.1453
Odds Ratio: 1.1564
Direction: ✅ Positive
✅ Sustained depth (10-game avg) DOES predict winning!
Teams with a 1-unit higher rolling depth have 15.6% higher odds of winning
Code
```python
# 3. Season-level correlation
team_season_summary = (
    data.groupby(['teamAbbrev', 'season'])
    .agg({
        'tdi_factor': 'mean',
        'outcome': 'mean',
        'game_id': 'count'
    })
    .rename(columns={'tdi_factor': 'avg_depth', 'outcome': 'win_rate',
                     'game_id': 'n_games'})
    .reset_index()
)

r_ts, p_ts = stats.pearsonr(team_season_summary['avg_depth'],
                            team_season_summary['win_rate'])

print("\n" + "="*60)
print("SEASON-LEVEL CORRELATION")
print("="*60)
print(f"Correlation: r = {r_ts:.3f}")
print(f"p-value: {p_ts:.2e}")
print(f"R²: {r_ts**2:.3f}")
print("\n✅ Teams with higher average depth win more games over the season")
```
============================================================
SEASON-LEVEL CORRELATION
============================================================
Correlation: r = 0.189
p-value: 4.41e-05
R²: 0.036
✅ Teams with higher average depth win more games over the season
The resolution: Single-game depth can reflect desperation (rolling all lines when trailing), but sustained depth over multiple games indicates a true roster advantage.
7. Mediation Analysis: Volume vs Quality
We know depth predicts winning, but how? Two competing hypotheses:
Volume Hypothesis: More depth → More total shots → More wins
Quality Hypothesis: More depth → Better shot quality (xG) → More wins
I test this with parallel mediation: Depth → {SOG, CF, xG} → Winning
Code
```python
# Ensure we have the rolling depth column for mediation analysis
# Either use the existing column or calculate it if needed
if 'depth_rolling10' in data.columns and 'depth_roll10' not in data.columns:
    # Use existing pre-calculated column
    data['depth_roll10'] = data['depth_rolling10']
    print("✅ Using existing 'depth_rolling10' column")
elif 'depth_roll10' not in data.columns:
    # Calculate it fresh
    print("⚠️ Calculating rolling 10-game depth...")
    data = data.sort_values(['teamAbbrev', 'game_id']).copy()
    data['depth_roll10'] = (
        data.groupby('teamAbbrev')['tdi_factor']
        .transform(lambda x: x.shift(1).rolling(10, min_periods=10).mean())
    )
    print("✅ Created 'depth_roll10' column")
else:
    print("✅ 'depth_roll10' already exists")

# Quick check
print(f"\nRolling depth stats:")
print(data['depth_roll10'].describe())
print(f"Non-null values: {data['depth_roll10'].notna().sum():,} / {len(data):,}")
```
✅ Using existing 'depth_rolling10' column
Rolling depth stats:
count 38219.000000
mean 0.000391
std 0.420790
min -1.879913
25% -0.273492
50% 0.006728
75% 0.287610
max 1.548830
Name: depth_roll10, dtype: float64
Non-null values: 38,219 / 38,394
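The cell that fits the full model and produces the output below is not included in this extract. A minimal sketch of a parallel specification follows; note that 'sog' is an assumed column name for shots-on-goal volume (the actual name may differ), while 'corsi_for' and 'xgoal' appear later in this notebook.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Full model sketch: Win ~ rolling depth + z-scored volume/quality mediators.
# NOTE: 'sog' is a hypothetical column name for shots-on-goal volume.
med = data.dropna(subset=['depth_roll10', 'sog', 'corsi_for', 'xgoal', 'outcome']).copy()
for col in ['sog', 'corsi_for', 'xgoal']:
    med[col + '_z'] = (med[col] - med[col].mean()) / med[col].std()

X = med[['depth_roll10', 'sog_z', 'corsi_for_z', 'xgoal_z']]
full_model = LogisticRegression().fit(X, med['outcome'])

for name, b in zip(X.columns, full_model.coef_[0]):
    print(f"{name}: β = {b:.4f}, OR = {np.exp(b):.4f}")
```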
============================================================
FULL MODEL: Depth + Mediators → Winning
============================================================
Depth (rolling 10):
β = 0.1461
OR = 1.1574
SOG (z):
β = -0.1575
OR = 0.8542
Corsi For (z):
β = -0.2969
OR = 0.7431
xGoal (z):
β = 0.7326
OR = 2.0806
Mediation Results: The Quality Pathway
Key Findings:
Depth increases ALL three metrics (SOG, CF, xG) ✅
Only xG positively predicts winning when all three are in the model ✅
SOG and CF become NEGATIVE once xG is controlled ❌
Interpretation:
This is a classic suppression effect. Once shot quality (xG) is accounted for, extra shot volume actually indicates low-quality desperation — perimeter shots, blocked attempts, point shots into traffic.
The mechanism: Depth → Quality Chances (xG) → Winning
Depth helps you win because balanced rosters create dangerous chances across all four lines, not because they generate more total shots.
8. Out-of-Sample Predictive Validity
The big test: can depth predict game outcomes before they happen, across different years?
We use leave-one-season-out (LOSO) cross-validation:
- Train on 14 seasons → test on 1 held-out season
- Repeat 15 times (once for each season)
- Compare to a baseline (home team always wins) and MoneyPuck's model
Code
```python
# ============================================================
# LOAD SCHEDULE DATA FOR MATCHUP-LEVEL ANALYSIS
# ============================================================
# This replicates the R approach: one observation per game (home vs away)

print("="*60)
print("LOADING SCHEDULE DATA")
print("="*60)

sched = pd.read_csv(f"{DATA_PATH}/all_games_meta_20102025.csv")
print(f"Schedule data: {len(sched):,} games")
print(f"Columns: {sched.columns.tolist()}")

# Process schedule
sched_lookup = sched.rename(columns={'id': 'game_id'}).copy()
sched_lookup['season'] = sched_lookup['season'].astype(str)  # match data type

# Extract home/away team abbreviations and scores
sched_lookup['home_abbrev'] = sched_lookup['homeTeam.abbrev'].str.upper()
sched_lookup['away_abbrev'] = sched_lookup['awayTeam.abbrev'].str.upper()
sched_lookup['home_score'] = pd.to_numeric(sched_lookup['homeTeam.score'], errors='coerce')
sched_lookup['away_score'] = pd.to_numeric(sched_lookup['awayTeam.score'], errors='coerce')

# Keep only needed columns
sched_lookup = sched_lookup[['game_id', 'season', 'home_abbrev', 'away_abbrev',
                             'home_score', 'away_score']].drop_duplicates('game_id')

print(f"✅ Processed {len(sched_lookup):,} unique games")
```
```python
# ============================================================
# NORMALIZE TEAM ABBREVIATIONS
# ============================================================
# Match inconsistencies between data sources (e.g., LA vs LAK)

abbr_map = {
    'LA': 'LAK',
    'SJ': 'SJS',
    'TB': 'TBL',
    'NJ': 'NJD'
}

def normalize_abbrev(x):
    x_upper = str(x).upper()
    return abbr_map.get(x_upper, x_upper)

# Apply to both datasets
data['teamAbbrev'] = data['teamAbbrev'].apply(normalize_abbrev)
sched_lookup['home_abbrev'] = sched_lookup['home_abbrev'].apply(normalize_abbrev)
sched_lookup['away_abbrev'] = sched_lookup['away_abbrev'].apply(normalize_abbrev)

print("✅ Normalized team abbreviations")
```
✅ Normalized team abbreviations
Code
```python
# ============================================================
# CALCULATE ROLLING AVERAGES FOR xG AND CORSI
# ============================================================
# Need rolling 10-game averages for xgoal and corsi_for (matching depth_rolling10)

print("\n" + "="*60)
print("CALCULATING ROLLING AVERAGES")
print("="*60)

data = data.sort_values(['teamAbbrev', 'game_id']).copy()

# Create rolling averages (lagged by 1 to prevent leakage, 10-game window)
data['xg_roll10'] = (
    data.groupby('teamAbbrev')['xgoal']
    .transform(lambda x: x.shift(1).rolling(10, min_periods=10).mean())
)
data['cf_roll10'] = (
    data.groupby('teamAbbrev')['corsi_for']
    .transform(lambda x: x.shift(1).rolling(10, min_periods=10).mean())
)

# Use existing depth_rolling10 or create if needed
if 'depth_rolling10' in data.columns:
    data['depth_roll10'] = data['depth_rolling10']
else:
    data['depth_roll10'] = (
        data.groupby('teamAbbrev')['tdi_factor']
        .transform(lambda x: x.shift(1).rolling(10, min_periods=10).mean())
    )

print(f"✅ Created rolling averages:")
print(f"   - depth_roll10: {data['depth_roll10'].notna().sum():,} valid values")
print(f"   - xg_roll10: {data['xg_roll10'].notna().sum():,} valid values")
print(f"   - cf_roll10: {data['cf_roll10'].notna().sum():,} valid values")
```
============================================================
CALCULATING ROLLING AVERAGES
============================================================
✅ Created rolling averages:
- depth_roll10: 38,219 valid values
- xg_roll10: 38,044 valid values
- cf_roll10: 38,044 valid values
Code
```python
# ============================================================
# CREATE MATCHUP-LEVEL DATA (One row per game)
# ============================================================
# Join schedule to determine home/away, then pivot to matchups

print("\n" + "="*60)
print("CREATING MATCHUP-LEVEL DATA")
print("="*60)

# Ensure season dtypes match before merge
data['season'] = data['season'].astype(str)
sched_lookup['season'] = sched_lookup['season'].astype(str)
print(f"Data types before merge:")
print(f"  data['season']: {data['season'].dtype}")
print(f"  sched_lookup['season']: {sched_lookup['season'].dtype}")

# Join schedule to team-game data
data_with_schedule = data.merge(sched_lookup, on=['game_id', 'season'], how='inner')
print(f"✅ Merged: {len(data_with_schedule):,} rows")

# Determine home/away for each team
data_with_schedule['home_away'] = 'UNKNOWN'
data_with_schedule.loc[
    data_with_schedule['teamAbbrev'] == data_with_schedule['home_abbrev'],
    'home_away'] = 'home'
data_with_schedule.loc[
    data_with_schedule['teamAbbrev'] == data_with_schedule['away_abbrev'],
    'home_away'] = 'away'

# Check for any unmatched teams (before filtering, so UNKNOWN rows are visible)
unmatched = data_with_schedule[data_with_schedule['home_away'] == 'UNKNOWN']
if len(unmatched) > 0:
    print(f"⚠️ {len(unmatched)} rows with no home/away match")
    print(unmatched[['game_id', 'teamAbbrev', 'home_abbrev', 'away_abbrev']].head())

# Filter to valid home/away assignments
data_with_schedule = data_with_schedule[data_with_schedule['home_away'].isin(['home', 'away'])]
print(f"Rows with home/away assigned: {len(data_with_schedule):,}")

# Pivot to matchup level (one row per game with home and away columns)
matchups = data_with_schedule.pivot_table(
    index=['game_id', 'season'],
    columns='home_away',
    values=['teamAbbrev', 'depth_roll10', 'xg_roll10', 'cf_roll10'],
    aggfunc='first'
).reset_index()

# Flatten column names
matchups.columns = ['_'.join(col).strip('_') if col[1] else col[0]
                    for col in matchups.columns]

# Rename for clarity
matchups = matchups.rename(columns={
    'teamAbbrev_home': 'home_team',
    'teamAbbrev_away': 'away_team',
    'depth_roll10_home': 'home_depth',
    'depth_roll10_away': 'away_depth',
    'xg_roll10_home': 'home_xg',
    'xg_roll10_away': 'away_xg',
    'cf_roll10_home': 'home_cf',
    'cf_roll10_away': 'away_cf'
})
print(f"\n✅ Pivoted to matchups: {len(matchups):,} games")

# Add outcome from schedule scores
matchups = matchups.merge(sched_lookup[['game_id', 'home_score', 'away_score']],
                          on='game_id', how='left')
matchups['home_outcome'] = (matchups['home_score'] > matchups['away_score']).astype(int)

# Calculate differentials (home - away)
matchups['depth_diff'] = matchups['home_depth'] - matchups['away_depth']
matchups['xg_diff'] = matchups['home_xg'] - matchups['away_xg']
matchups['cf_diff'] = matchups['home_cf'] - matchups['away_cf']

# Drop rows with missing values
matchups_clean = matchups.dropna(subset=['home_outcome', 'depth_diff', 'xg_diff', 'cf_diff'])
print(f"\n✅ Created {len(matchups_clean):,} complete matchups")
print(f"   Dropped {len(matchups) - len(matchups_clean):,} incomplete rows")
print(f"\nMatchup data shape: {matchups_clean.shape}")
print(f"Sample:")
print(matchups_clean[['game_id', 'season', 'home_team', 'away_team',
                      'depth_diff', 'xg_diff', 'home_outcome']].head(10))
```
============================================================
CREATING MATCHUP-LEVEL DATA
============================================================
Data types before merge:
data['season']: object
sched_lookup['season']: object
✅ Merged: 38,394 rows
Rows with home/away assigned: 38,394
✅ Pivoted to matchups: 19,197 games
✅ Created 18,987 complete matchups
Dropped 210 incomplete rows
Matchup data shape: (18987, 16)
Sample:
game_id season home_team away_team depth_diff xg_diff \
141 2010020142 20102011 ANA NJD -0.590623 -0.087772
145 2010020146 20102011 PHI NYI 0.263789 0.696856
153 2010020154 20102011 CGY WSH 0.144362 0.179592
154 2010020155 20102011 LAK NJD -0.824380 0.525740
156 2010020157 20102011 NYR CHI 0.169929 0.605389
157 2010020158 20102011 PHI CAR 0.011776 0.282827
159 2010020160 20102011 TOR OTT -0.788341 -0.211515
160 2010020161 20102011 CBJ MTL -0.419210 0.175837
164 2010020165 20102011 WSH TOR 0.358041 0.367720
165 2010020166 20102011 CAR NYI 0.199113 0.443913
home_outcome
141 0
145 1
153 0
154 1
156 1
157 1
159 0
160 1
164 1
165 1
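The LOSO loop that generates cv_results for the figure below is not included in this extract (and the moneypuck comparison table is loaded separately). A minimal sketch of a loop that would produce cv_results from the matchups_clean frame built above:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, log_loss

# Leave-one-season-out CV: hold out each season once, train on the rest
features = ['depth_diff', 'xg_diff', 'cf_diff']
rows = []
for season in sorted(matchups_clean['season'].unique()):
    train = matchups_clean[matchups_clean['season'] != season]
    test = matchups_clean[matchups_clean['season'] == season]
    clf = LogisticRegression().fit(train[features], train['home_outcome'])
    p = clf.predict_proba(test[features])[:, 1]
    rows.append({'season': season,
                 'accuracy_model': accuracy_score(test['home_outcome'], p > 0.5),
                 'logloss': log_loss(test['home_outcome'], p)})

cv_results = pd.DataFrame(rows)
print(cv_results.round(3))
```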
Code

```python
# ============================================================
# STACKED ACCURACY + LOGLOSS VISUALIZATION
# ============================================================
# Uses explicit integer x-positions to prevent squishing

cv_results['season'] = cv_results['season'].astype(str)
moneypuck['season'] = moneypuck['season'].astype(str)

# Brand colors
col_depth = '#3B4B64'      # Depth model
col_moneypuck = '#6EB4B8'  # MoneyPuck
col_baseline = '#9A9A9A'   # Baseline
col_orange = '#E76F2B'     # Reference lines

# Ensure we have the baseline columns
if 'accuracy_baseline' not in cv_results.columns:
    print("⚠️ Adding baseline columns from hardcoded R values")
    seasons = ['20102011', '20112012', '20122013', '20132014', '20142015',
               '20152016', '20162017', '20172018', '20182019', '20192020',
               '20202021', '20212022', '20222023', '20232024', '20242025']
    baseline_map = dict(zip(seasons,
        [0.520, 0.552, 0.581, 0.541, 0.544,
         0.528, 0.558, 0.558, 0.535, 0.525,
         0.535, 0.541, 0.520, 0.537, 0.565]))
    logloss_baseline_map = dict(zip(seasons,
        [0.692, 0.688, 0.680, 0.690, 0.689,
         0.692, 0.686, 0.686, 0.691, 0.692,
         0.691, 0.690, 0.692, 0.690, 0.685]))
    cv_results['accuracy_baseline'] = cv_results['season'].map(baseline_map)
    cv_results['logloss_baseline'] = cv_results['season'].map(logloss_baseline_map)

# Create explicit x-positions for plotting
cv_results = cv_results.sort_values('season').reset_index(drop=True)
cv_results['x_pos'] = range(len(cv_results))

# Map MoneyPuck seasons to x-positions
moneypuck_with_pos = moneypuck.merge(cv_results[['season', 'x_pos']],
                                     on='season', how='left')

# Create figure with 2 stacked subplots
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 11), sharex=True)

# ==========================================
# TOP PANEL: ACCURACY
# ==========================================
ax1.plot(cv_results['x_pos'], cv_results['accuracy_model'],
         color=col_depth, linewidth=3.5, marker='o', markersize=8,
         label='Depth model (LOSO-CV)')
ax1.plot(moneypuck_with_pos['x_pos'], moneypuck_with_pos['acc_mp'],
         color=col_moneypuck, linewidth=3.5, linestyle='dotted',
         marker='^', markersize=9, label='MoneyPuck pre-game model')
ax1.plot(cv_results['x_pos'], cv_results['accuracy_baseline'],
         color=col_baseline, linewidth=3, linestyle='--',
         marker='s', markersize=7, label='Baseline (home win rate)')

# Random 50% reference
ax1.axhline(y=0.5, color=col_orange, linewidth=1.4, linestyle='--', alpha=0.7)
ax1.text(1, 0.503, 'Random baseline (50%)', color=col_orange,
         fontsize=11, style='italic', va='bottom')

sns.despine(ax=ax1)
ax1.grid(False)
ax1.set_ylabel("Accuracy", fontsize=18)
ax1.tick_params(axis='y', labelsize=14)
ax1.set_ylim(0.50, 0.66)
# Format y-axis as percentages
ax1.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: f'{int(y*100)}%'))

# ==========================================
# BOTTOM PANEL: LOG LOSS
# ==========================================
ax2.plot(cv_results['x_pos'], cv_results['logloss'],
         color=col_depth, linewidth=3.5, marker='o', markersize=8,
         label='Depth model (LOSO-CV)')
ax2.plot(moneypuck_with_pos['x_pos'], moneypuck_with_pos['logloss_mp'],
         color=col_moneypuck, linewidth=3.5, linestyle='dotted',
         marker='^', markersize=9, label='MoneyPuck pre-game model')
ax2.plot(cv_results['x_pos'], cv_results['logloss_baseline'],
         color=col_baseline, linewidth=3, linestyle='--',
         marker='s', markersize=7, label='Baseline (home win rate)')

# Random 0.693 reference
ax2.axhline(y=0.693, color=col_orange, linewidth=1.4, linestyle='--', alpha=0.7)
ax2.text(1, 0.6935, 'Random baseline (0.693)', color=col_orange,
         fontsize=11, style='italic', va='bottom')

sns.despine(ax=ax2)
ax2.grid(False)
ax2.set_xlabel("Season", fontsize=18)
ax2.set_ylabel("Log Loss", fontsize=18)
ax2.tick_params(axis='both', labelsize=14)
ax2.set_ylim(0.645, 0.70)

# X-axis: use integer positions with season labels
ax2.set_xticks(cv_results['x_pos'])
ax2.set_xticklabels(cv_results['season'], rotation=45, ha='right', fontsize=10)

# ==========================================
# TITLES AND LEGEND
# ==========================================
# Calculate left margin for title alignment
left_x = ax1.get_position().x0
plt.subplots_adjust(top=0.88, bottom=0.10, hspace=0.35)

fig.suptitle("Predictive Accuracy and Log Loss by Season", fontsize=20,
             weight='bold', x=left_x, ha='left', y=0.97)
fig.text(left_x, 0.92,
         "Depth model vs MoneyPuck vs seasonal baseline (LOSO-CV for depth model)",
         fontsize=14, ha='left')

# Shared legend at bottom
handles, labels = ax1.get_legend_handles_labels()
fig.legend(handles, labels, loc='center', ncol=3, frameon=False,
           fontsize=14, bbox_to_anchor=(0.5, 0.49))

plt.tight_layout()
plt.show()

# Print summary
print("\n" + "="*60)
print("FIGURE SUMMARY")
print("="*60)
print(f"Seasons plotted: {len(cv_results)}")
print(f"Average accuracy (depth model): {cv_results['accuracy_model'].mean():.1%}")
print(f"Average accuracy (baseline): {cv_results['accuracy_baseline'].mean():.1%}")
print(f"Average log loss (depth model): {cv_results['logloss'].mean():.4f}")
print(f"Average log loss (baseline): {cv_results['logloss_baseline'].mean():.4f}")
print(f"\n✅ Depth model beats baseline in "
      f"{(cv_results['accuracy_model'] > cv_results['accuracy_baseline']).sum()}"
      f"/{len(cv_results)} seasons")
```
============================================================
FIGURE SUMMARY
============================================================
Seasons plotted: 15
Average accuracy (depth model): 56.1%
Average accuracy (baseline): 54.3%
Average log loss (depth model): 0.6816
Average log loss (baseline): 0.6890
✅ Depth model beats baseline in 14/15 seasons
Predictive Validity Summary:
✅ Depth model beats baseline in 14/15 seasons
✅ Accuracy improvement: +0.5 to +5.3 points depending on season
✅ Log loss consistently lower (better calibrated probabilities)
⚠️ MoneyPuck still better (as expected — they use comprehensive pre-game data)
Key insight: Depth captures a real, predictive signal about game outcomes that goes beyond shot volume and quality. Even with just historical depth, we can beat naive baselines.
9. Team-Level Depth Profiles (2024-25 Season)
Let’s examine which teams are currently playing with the most depth.
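The aggregation that builds the team_2425 frame used by the chart below is not shown in this extract. A minimal sketch, assuming the 2024-25 season is coded '20242025' (consistent with the season strings seen earlier):

```python
# Average TDI by team for the 2024-25 season (season coding assumed)
team_2425 = (data[data['season'] == '20242025']
             .groupby('teamAbbrev')['tdi_factor'].mean()
             .reset_index(name='avg_depth')
             .sort_values('avg_depth', ascending=False)
             .reset_index(drop=True))
print(team_2425.head(10))
```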
Code

```python
# Visualize top 10 and bottom 10
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))

# Top 10
top10 = team_2425.head(10)
ax1.barh(range(10), top10['avg_depth'], color=col_blue_pale,
         edgecolor=col_blue_dark, linewidth=1.5)
ax1.set_yticks(range(10))
ax1.set_yticklabels(top10['teamAbbrev'].tolist())
ax1.invert_yaxis()
ax1.set_xlabel('Average Depth (TDI)', fontsize=13, weight='bold')
ax1.set_title('Top 10 Teams by Depth', fontsize=14, weight='bold', pad=15)
ax1.axvline(0, color=col_blue_dark, linewidth=1)
sns.despine(ax=ax1)
ax1.grid(axis='x', alpha=0.3)

# Bottom 10
bottom10 = team_2425.tail(10).sort_values('avg_depth')
ax2.barh(range(10), bottom10['avg_depth'], color=col_oil_orange,
         edgecolor=col_orange_pale, linewidth=1.5)
ax2.set_yticks(range(10))
ax2.set_yticklabels(bottom10['teamAbbrev'].tolist())
ax2.invert_yaxis()
ax2.set_xlabel('Average Depth (TDI)', fontsize=13, weight='bold')
ax2.set_title('Bottom 10 Teams by Depth', fontsize=14, weight='bold', pad=15)
ax2.axvline(0, color=col_blue_dark, linewidth=1)
sns.despine(ax=ax2)
ax2.grid(axis='x', alpha=0.3)

plt.tight_layout()
plt.show()
```
10. Measurement Invariance
A critical psychometric question: Does depth mean the same thing across teams and seasons?
If the factor loadings vary substantially by group, then a “depth score of 0.5” might mean different things for different teams/eras.
In the R analysis (lavaan), we tested:
- Configural invariance: same structure (4 indicators → 1 factor)
- Metric invariance: same factor loadings across groups
- Scalar invariance: same indicator intercepts across groups
Across Teams:
- Metric invariance: ✅ Supported
- Scalar invariance: ⚠️ Partial support (TOI intercept freed)
Interpretation: The factor structure is largely stable, but TOI depth has slightly different baseline levels across groups. This makes sense: some teams/eras distribute ice time more evenly as a stylistic choice regardless of performance depth.
Conclusion: TDI scores are meaningfully comparable across teams and seasons, with minor adjustments.
11. Additional Exploratory Visualizations
Let’s examine some interesting patterns in the data.