chrisberry888.github.io

<!DOCTYPE html>

CMSC320_Final_Project

Expected Value, Win Probability, and why "Common Knowledge" Hurts Sports Teams

Tutorial by Chris Berry

The four major American sports leagues (MLB, NFL, NBA, and NHL) have very long and extensive histories, and established throughout each league's history were traditions and "common knowledge" that many managers and front offices adhere to, to this day. However, many of the rules and practices that teams follow simply do not match up with what they mathematically should be doing.

The goal of every sports team should be to win games, and so every decision made by a coach should be made with the intention of increasing the probability that that team will win the game. For example, if a football coach thinks that kicking an onside kick will make them more likely to win the game, then they should choose to kick an onside kick. Despite this, there are many decisions that coaches make that are made because of deep-rooted traditions in the sport, without considering the data whatsoever. One of the most famous examples of this is punting in football: conventional wisdom states that football teams should almost never go for it on 4th down, but many different sources show that punting is almost never the mathematically correct decision (https://www.footballstudyhall.com/2013/11/15/5105958/fourth-down-pulaski-academy-kevin-kelley#:~:text=Kevin%20Kelley%2C%20Pulaski's%20coach%20never,attempts%20onside%20kicks%20after%20scoring).

In this tutorial I will discuss the ways in which win probability is used and misused in when deciding to pull the goalie in the NHL. In doing so, I will walk through all the steps of analyzing data, from retreiving, cleaning, and processing the data to creating predictive models of different real-life scenareos.

Pulling the Goalie

Before we talk about win probability, we need to talk about a concept called "expected value". In the context of sports, expected value (EV) is a measure of how many points that you can expect will result from a decision. To explain EV, here's an example: say we are playing a game, where whenever I flip a coin, if it lands on heads, I give you 10 dollars, and if it lands on tails, you give me 5 dollars. The equation for EV in this situation is EV = .5(10 dollars) + .5(-5 dollars) = 2.5 dollars; therefore, you can "expect" to win 2 dollars and 50 cents every time I flip the coin.

You might expect that making the highest EV decisions would always result in a higher chance for winning a game, but this is not always the case. Imagine a hockey game between the Boston Bruins and the New York Rangers: there are 2 minutes left in the game, and the Bruins are losing 2-3. With two minutes left, the Bruins decide to substitute their goalie for another skater. Andrew Thomas of Harvard University analyzed data from all NHL games between 2003 and 2007 and found that, in games where a goalie was pulled, the winning team scored on the empty net 34% of the time, while the team with the extra skater scored 30% of the time (https://fliphtml5.com/snvq/xptx/basic). So if the Rangers and Bruins are evenly matched, then the Bruins' expected value of pulling the goalie is EV = .3 (1) + .34 (-1) = -.04.

If the Bruins get negative EV from pulling the goalie, then why do they do it? The answer is because they don't care whether they lose by 1 or lose by 2: they just want to tie the game. Despite being a negative EV decision, deciding to pull the goalie increases the Bruins' chances of winning the game. It would be terribly stupid to pull the goalie while the score is still 0-0 in the first period, but it becomes the smart thing to do if you're losing in the third. This begs the question: when should a team decide to pull their goalie?

In this part, we will be using NHL Game Data courtesy of Martin Ellis (https://www.kaggle.com/martinellis/nhl-game-data?select=game_plays.csv). This database contains game data from almost all NHL games played between the 2000-01 season and the 2019-20 season. We will be extracting data from 4 of the tables in this database: "total_shifts" tells us how much time each player is on the ice; "total_plays" tells us about every single play that happens in each game, be it a shot, penalty, goal, faceoff, or anything else; "games" gives us general stats about each game; and "game_goalie_stats" gives us the stats for each goaltender for every game. The ultimate goal is to learn more about when the optimal time to pull the goalie is in a losing game, based on available data.

The following are all of the python packages that we will be using. They will be used later on for manipulating data and creating models, among other things

In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import datetime
import numpy as np
import re
from matplotlib import pyplot as plt
import sqlite3
import statistics
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.neural_network import MLPRegressor
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVR
import warnings
import random
warnings.filterwarnings("ignore")

pref = "D:\\Documents\\Computer Programs\\CMSC320\\Final_Project\\"

We start by importing the data into Python. The "drop_duplicates()" command ensures that no game, goal, or player is being double-counted.

In [2]:
total_shifts = pd.read_csv(pref + "NHL\\game_shifts.csv")
total_shifts = total_shifts.drop_duplicates()
total_plays = pd.read_csv(pref + "NHL\\game_plays.csv")
total_plays = total_plays.drop_duplicates()
games = pd.read_csv(pref + "NHL\\game.csv")
games = games.drop_duplicates(subset=["game_id"])
game_goalie_stats = pd.read_csv(pref + "NHL\\game_goalie_stats.csv")
game_goalie_stats = game_goalie_stats.drop_duplicates()

Next, we will find every game in which a team was losing at the end of the game and decided to pull the goalie. Unfortunately, this information is not readily available in the database, so we will have to extrapolate from what we have. So in this step, we will first isolate every goalie's shift. Goalies are the only players in hockey who stay on the ice for more than just a minute or so at a time. All goalies start their shift at the beginning of the third period (2400 seconds into the game), and given that the longest shift by a skater in NHL history is around 310 seconds, we can safely say that if a shift starts at time marker 2400 and ends beyond 2710, then it's a goalie's shift.

We also know that if a goalie finished the game in net (i.e. their shift ends at time marker 3600), then they were not pulled. Therefore, any goalie whose shift ends before the 3600 mark must have been pulled. We store these shifts in the "pulled_goalies" variable.

In [3]:
goalies = total_shifts[ (total_shifts["shift_start"] == 2400) & (total_shifts["shift_end"] > 2710) ]
pulled_goalies = goalies[goalies["shift_end"] < 3600]

There are almost always only 2 goalies playing per game, which means that by dividing the total number of third-period goalie shifts by two, we will get the total number of games in this data set. Dividing the number of games with a pulled goalie by the total number of games yields the proportion of games that have a pulled goalie:

In [4]:
number_of_games = goalies.shape[0] / 2
print(pulled_goalies.shape[0] / number_of_games)
0.7292955290796629

So about 72.93% of games have a situation where the goalie is pulled at the end of the game. While this is a highly accurate figure, it is not completely accurate. The main culprit of this inaccuracy has to do with delayed penalties: when a team gets flagged for a penalty, the other team's goalie will go to the bench as a way of starting their power play early. This means that the 72.9% number is an overestimate, since it counts all delayed penalties like that in its calculation. Despite this inaccuracy, we will continue to use this table since it is accurate enough.

In [5]:
pulled_goalies.head()
Out[5]:
game_id player_id period shift_start shift_end
1599 2018020003 8469608 3 2400 3469.0
2646 2018020004 8474889 3 2400 3511.0
6597 2018020005 8475622 3 2400 3297.0
8294 2018020008 8475852 3 2400 3418.0
8708 2018020006 8468685 3 2400 3489.0

This is a small section of the "pulled_goalies" table. From now on, we will use this table to represent the games that have a pulled goalie in them. This table doesn't tell us very much about each game, so we'll use the dataset to add more and more useful information. First, we'll add the season that each game was played in, as well as the time left in the game when the goalie was pulled:

In [6]:
seasons = []
for index, row in pulled_goalies.iterrows():
    seasons.append(int(games.loc[games["game_id"] == row["game_id"], "season"].drop_duplicates()))
pulled_goalies["season"] = seasons

pulled_goalies["time_left"] = 3600 - pulled_goalies["shift_end"]

pulled_goalies.head()
Out[6]:
game_id player_id period shift_start shift_end season time_left
1599 2018020003 8469608 3 2400 3469.0 20182019 131.0
2646 2018020004 8474889 3 2400 3511.0 20182019 89.0
6597 2018020005 8475622 3 2400 3297.0 20182019 303.0
8294 2018020008 8475852 3 2400 3418.0 20182019 182.0
8708 2018020006 8468685 3 2400 3489.0 20182019 111.0

Next, we'll figure out what teams played in this game, what the score was when the goalie was pulled, and what happened afterwards. To do this, we'll start by creating another table containing every goal, as well as what game the goal belongs to:

In [7]:
goals = total_plays[total_plays["event"] == "Goal"]
goals["gameTime"] = goals["periodTime"] + (goals["period"] - 1) * 1200

This next chunk of code looks intimidating, but it's simpler than it looks. We want a few bits of information from the "goals" table: for each game we want to figure out who's winning, who's losing, what's the score, how much is the losing team losing by, how many goals did the losing team get after pulling the goalie, and did the winning team score at all.

In [8]:
losing_scores = []
losing_teams = []
winning_scores = []
winning_teams = []
goals_for_losing_team = []
did_winning_team_score = []
for index, row in pulled_goalies.iterrows():
    game = row["game_id"]
    time = row["shift_end"]
    home_team = games[games["game_id"] == game]
    home_team = int(home_team["home_team_id"])
    away_team = games[games["game_id"] == game]
    away_team = int(away_team["away_team_id"])
    p3_goals = goals[goals["game_id"] == game]
    post_hook_goals = p3_goals[(p3_goals["gameTime"] > time)&(p3_goals["period"] == 3)]
    
    p3_goals = p3_goals[p3_goals["gameTime"] < time]
    
    if p3_goals.shape[0] > 0:
        
    
        goal_before = p3_goals[p3_goals["gameTime"] == p3_goals["gameTime"].max()]
        
        
        a = int(goal_before["goals_away"])
        h = int(goal_before["goals_home"])
        losing_score = ([a, away_team] if a < h else [h, home_team])
        winning_score = ([h, home_team] if a < h else [a, away_team])
        
        diff = winning_score[0] - losing_score[0]
        phg = 0 #goals scored by losing team before 1) the period ends, 2) the other team scores, or 3) they even the score
        dwts = 0
        if(len(post_hook_goals) > 0):
            g1 = post_hook_goals.iloc[0]
            if g1.loc["team_id_for"] == losing_score[1]:
                phg += 1
                if diff >= 2 and len(post_hook_goals) >= 2:
                    g2 = post_hook_goals.iloc[1]
                    if g2.loc["team_id_for"] == losing_score[1]:
                        phg += 1
                        if diff >= 3 and len(post_hook_goals) >= 3:
                            g3 = post_hook_goals.iloc[2]
                            if g3.loc["team_id_for"] == losing_score[1]:
                                phg += 1
                            else:
                                dwts = 1
                    else:
                        dwts = 1
            else:
                dwts = 1
        
        losing_scores.append(losing_score[0])
        losing_teams.append(losing_score[1])
        winning_scores.append(winning_score[0])
        winning_teams.append(winning_score[1])
        goals_for_losing_team.append(phg)
        did_winning_team_score.append(dwts)
    else:
        losing_scores.append(0)
        losing_teams.append(home_team)
        winning_scores.append(0)
        winning_teams.append(away_team)
        goals_for_losing_team.append(0)
        did_winning_team_score.append(0)
    
pulled_goalies["losingScore"] = losing_scores
pulled_goalies["losingTeam"] = losing_teams
pulled_goalies["winningScore"] = winning_scores
pulled_goalies["winningTeam"] = winning_teams
pulled_goalies["goalsForLosingTeam"] = goals_for_losing_team
pulled_goalies["didWinningTeamScore"] = did_winning_team_score

pulled_goalies = pulled_goalies[pulled_goalies["losingScore"] != pulled_goalies["winningScore"]]

pulled_goalies["goal_diff"] = pulled_goalies["winningScore"] - pulled_goalies["losingScore"]


pulled_goalies.head()
    
    
Out[8]:
game_id player_id period shift_start shift_end season time_left losingScore losingTeam winningScore winningTeam goalsForLosingTeam didWinningTeamScore goal_diff
1599 2018020003 8469608 3 2400 3469.0 20182019 131.0 2 20 4 23 0 1 2
2646 2018020004 8474889 3 2400 3511.0 20182019 89.0 2 28 4 24 0 1 2
6597 2018020005 8475622 3 2400 3297.0 20182019 303.0 0 7 3 6 0 1 3
8294 2018020008 8475852 3 2400 3418.0 20182019 182.0 0 12 1 2 1 0 1
8708 2018020006 8468685 3 2400 3489.0 20182019 111.0 1 3 2 18 0 1 1

At this point, we have enough data to do some simple analysis. But what other factors might influence the losing team's ability to score? Well, if the losing team is usually pretty good on offense, then clearly they will be more likely to come back. Likewise, if the winning team is usually pretty good on defense, then a comeback will be less likely. Because of this, we should calculate how good each losing team's offense is, as well as how good each winning team's defense is. How can we quantify this? One way of measuring a team's offense is by using their Goals Per Game (GPG), which is the average number of goals that team scores in an average game. A way of quantifying a team's defense is by using their goalie's Goals Against Average (GAA), which is the average number of goals that they allow in an average game.

In [9]:
seasons = []
for index, row in game_goalie_stats.iterrows():
    seasons.append(int(games.loc[games["game_id"] == row["game_id"], "season"].drop_duplicates()))
    

game_goalie_stats["season"] = seasons
goalies = game_goalie_stats[["team_id", "season"]].drop_duplicates()


game_goalie_stats["goalsAgainst"] = game_goalie_stats["shots"] - game_goalie_stats["saves"]
GAA = []
for index, row in goalies.iterrows():
    total_GA = game_goalie_stats.loc[(game_goalie_stats["team_id"] == row["team_id"])&(game_goalie_stats["season"] == row["season"]), "goalsAgainst"].sum()
    TOI = game_goalie_stats.loc[(game_goalie_stats["team_id"] == row["team_id"])&(game_goalie_stats["season"] == row["season"]), "timeOnIce"].sum()
    GAA.append(total_GA/(TOI/3600))
    

teams = games[["away_team_id", "season"]].drop_duplicates()
GPG = []
for index, row in teams.iterrows():
    total_GA = games.loc[(games["away_team_id"] == row["away_team_id"])&(games["season"] == row["season"]), "away_goals"].sum()
    total_GA += games.loc[(games["home_team_id"] == row["away_team_id"])&(games["season"] == row["season"]), "home_goals"].sum()
    total_games = int(games.loc[(games["away_team_id"] == row["away_team_id"])&(games["season"] == row["season"])].shape[0]*2)
    GPG.append(total_GA/total_games)
    
goalies["GAA"] = GAA
teams["GPG"] = GPG

We then add this information to the main table:

In [10]:
losing_GPG = []
winning_GAA = []
for index, row in pulled_goalies.iterrows():
    losing_GPG.append(float(teams.loc[(teams["away_team_id"] == row["losingTeam"])&(teams["season"] == row["season"]), "GPG"]))
    winning_GAA.append(float(goalies.loc[(goalies["team_id"] == row["winningTeam"])&(goalies["season"] == row["season"]), "GAA"]))
    
pulled_goalies["losingGPG"] = losing_GPG
pulled_goalies["winningGAA"] = winning_GAA
In [11]:
pulled_goalies.head()
Out[11]:
game_id player_id period shift_start shift_end season time_left losingScore losingTeam winningScore winningTeam goalsForLosingTeam didWinningTeamScore goal_diff losingGPG winningGAA
1599 2018020003 8469608 3 2400 3469.0 20182019 131.0 2 20 4 23 0 1 2 3.488372 2.908813
2646 2018020004 8474889 3 2400 3511.0 20182019 89.0 2 28 4 24 0 1 2 3.470000 2.843036
6597 2018020005 8475622 3 2400 3297.0 20182019 303.0 0 7 3 6 0 1 3 2.756098 2.324237
8294 2018020008 8475852 3 2400 3418.0 20182019 182.0 0 12 1 2 1 0 1 2.897959 2.185846
8708 2018020006 8468685 3 2400 3489.0 20182019 111.0 1 3 2 18 0 1 1 2.768293 2.517451

We finally have all the information we need. In our analysis, we'll define a few measures of success. The first is simply whether the losing team came back and won the game: 1 represents a win, 0 represents a loss.

In [12]:
did_they_win = []
for index, row in pulled_goalies.iterrows():
    succ = 0
    game = games.loc[games["game_id"] == row["game_id"]]
    
    if (row["losingTeam"] == game["away_team_id"]).bool() and ("away win" in str(game["outcome"])):
        succ = 1
    elif (row["losingTeam"] == game["home_team_id"]).bool() and "home win" in game["outcome"]:
        succ = 1
    did_they_win.append(succ)

pulled_goalies["didTheyWin"] = did_they_win

The next is whether the losing team overcame their defecit, regardless of whether they won or lost: 1 represents if they did, 0 represents if they didn't.

In [13]:
pulled_goalies["didTheyComeBack"] = np.where(pulled_goalies["goal_diff"] - pulled_goalies["goalsForLosingTeam"] == 0, 1, 0)

The third measure of success will be net goals, i.e. the goals they scored with the net empty minus the goals their opponent scored.

In [14]:
pulled_goalies["net_goals"] = pulled_goalies["goalsForLosingTeam"] - pulled_goalies["didWinningTeamScore"]

Before we construct any models of this data, we will first make some general observations. First, one might wonder: how often are teams able to eliminate their defecit when they pull the goalie?

In [15]:
temp = pulled_goalies

print(temp.shape[0]) #sample size
print(temp.loc[temp["didTheyComeBack"] == 1].shape[0]) # number of games where they came back
print(temp.loc[temp["didTheyComeBack"] == 0].shape[0]) # number of games where they didn't come back
print(temp.loc[temp["didTheyComeBack"] == 1].shape[0]/temp.shape[0]) # proportion of games where they came back
8182
761
7421
0.09300904424346125

According to our data, pulling the goalie works about 9.3% of the time. Now, assuming parity in the league, one would expect teams to win even-scored games about equally as often as they lose them, so one would expect the losing team to win about 4.65% of the time:

In [16]:
print(temp.loc[temp["didTheyWin"] == 1].shape[0]/temp.shape[0])
0.02309948667807382

However, this is not the case; the losing teams win a measley 2.3% of the time when they pull the goalie. Why is this, what is causing this discrepency? There are a few explanations, and the first is with the data itself. The database is not 100% accurate, and as stated at the beginning of this tutorial, there are incorrect data points representing delayed penalties and mid-period goalie changes. These inaccurace data points will make our findings less accurate, but it certainly doesn't explain the above discrepency. Another explanation is that there isn't parity in these games: if a team gets out to a big lead at the beginning of the game, then there's a good chance that they're simply a better team.

Next we'll look at how success changes as the defecit changes:

In [17]:
temp1 = pulled_goalies[(pulled_goalies["goal_diff"] == 1)]
temp2 = pulled_goalies[(pulled_goalies["goal_diff"] == 2)]
temp3 = pulled_goalies[(pulled_goalies["goal_diff"] == 3)]

print(temp1.loc[temp["didTheyComeBack"] == 1].shape[0]/temp1.shape[0])
print(temp2.loc[temp["didTheyComeBack"] == 1].shape[0]/temp2.shape[0])
print(temp3.loc[temp["didTheyComeBack"] == 1].shape[0]/temp3.shape[0])
0.1578708946772367
0.02038664323374341
0.009259259259259259

This indicates that pulling the goalie will help eliminate a 1, 2, and 3 goal defecit 15.79%, 2.04%, and .93% of the time, respectively. This tells us the obvious: the larger your opponent's lead, the harder it will be to claw your way back. In fact, only once in the past 20 years has a team come back to win the game after pulling the goalie with a 3-0 defecit.

Next we'll look at the times that successful and unsuccessful teams decide to take the goalie out:

In [18]:
temp1 = pulled_goalies
temp2 = pulled_goalies[pulled_goalies["didTheyComeBack"] == 1]
temp3 = pulled_goalies[pulled_goalies["didTheyComeBack"] == 0]

q1 = temp2["time_left"].quantile(.25)
q3 = temp2["time_left"].quantile(.75)
mask = temp2["time_left"].between(q1, q3, inclusive=True)
iqr_success = temp2.loc[mask, "time_left"]

q1 = temp3["time_left"].quantile(.25)
q3 = temp3["time_left"].quantile(.75)
mask = temp3["time_left"].between(q1, q3, inclusive=True)
iqr_fail = temp3.loc[mask, "time_left"]


print(temp1.loc[temp1["didTheyComeBack"] == 1, "time_left"].median())
print(temp1.loc[temp1["didTheyComeBack"] == 0, "time_left"].median())
print()
print(iqr_success.mean())
print(iqr_fail.mean())
121.0
116.0

155.90414507772022
121.37707997852925

The above block calculated the median time and interquartile mean time that losing teams were successful/unsuccessful when pulling the goalie: the median successful time is 121 seconds left, while the median unsuccessful time is 116 seconds. Meanwhile, the IQM successful time is a full 34 seconds sooner on average than the IQM unsuccessful time. This gives the indication that teams should pull their goalies sooner than they do now.

NOTE: I decided to use the interquartile mean instead of the regular mean in order to account for the very large and very small outliers in the dataset; the IQM is insensitive to outliers, unlike the regular mean.

Lastly, we'll attempt to model the relationship between the different independent variables with success rate. We'll start by creating a neural network that takes 3 inputs (time left in the game, the losing team's GPG, and the winning team's GAA) and returns one output, which we can interpret as the probability that a team will come back if a goalie is pulled in a given situation.

In [19]:
data = pulled_goalies[pulled_goalies["goal_diff"] == 1]

X = data[["time_left", "losingGPG", "winningGAA"]]
y = data["didTheyComeBack"]

reg = MLPRegressor(hidden_layer_sizes=(10,10), activation = 'logistic', max_iter=1000)
reg.fit(X, y)

# sit = [[time left, GPG, GAA]]
sit1 = [[60, 3, 3]]
sit2 = [[150, 2, 4]]
sit3 = [[300, 2, 4]]
sit4 = [[1200, 4, 3]]
print(reg.predict(sit1))
print(reg.predict(sit2))
print(reg.predict(sit3))
print(reg.predict(sit4))
[0.09582011]
[0.17331494]
[0.27777061]
[0.37132783]

There is a very clear problem with this model, and that's that it's very close to being proportional to the the time left in the game. Situation 4 above is one in which the losing team is down by 1 and there are 20 minutes left in the game; it's logically impossible for the losing team to have such a high chance of winning if they pull the goalie. Problems show up in other kinds of models as well:

In [20]:
data = pulled_goalies[pulled_goalies["goal_diff"] == 1]

X = data[["time_left", "losingGPG", "winningGAA"]]
y = data["didTheyComeBack"]

reg = SVR()
reg.fit(X, y)

# sit = [[time left, GPG, GAA]]
sit1 = [[60, 3, 3]]
sit2 = [[150, 3, 3]]
sit3 = [[300, 3, 3]]
sit4 = [[1200, 3, 3]]
print(reg.predict(sit1))
print(reg.predict(sit2))
print(reg.predict(sit3))
print(reg.predict(sit4))
[0.10004893]
[0.10033785]
[0.09958022]
[0.10627567]

This time we trained a support vector machine, and the numbers are practically indistinguishable from each other. Despite all of the data wrangling we did to get to this point, there is very little insight that we can gather from these machine learning models: the situation is simply too complicated to be predicted with the data points that we chose. So if we can't build a machine learning model that tells us when the optimal time to pull a goalie, what insights can we gain from this? Let's look at goal frequency, and how often a goal is scored at even strength versus during a pulled-goalie scenareo.

In this step we will calculate the average goals-per-game in "regular" situations, then calculate the goals-against frequency and goals-for frequency with the goalie pulled.

In [21]:
regular_goals_number = pulled_goalies["losingScore"].sum() + pulled_goalies["winningScore"].sum()
regular_goals_frequency = regular_goals_number/(3600 * pulled_goalies.shape[0] - pulled_goalies["time_left"].sum())/2
print(regular_goals_frequency) # Expected goals scored per second
print(regular_goals_frequency*3600) # Expected goals scored per game
0.0006976627193157048
2.511585789536537

The above block calculates the goals-per-second and goals-per-game that one can expect in all situations where the goalie is not pulled. In any given second, a team has a .0698% chance of scoring. This translates to about 2.51 goals per 60 minutes, or one goal per 23.9 minutes.

In [22]:
goals_for_frequency = pulled_goalies["goalsForLosingTeam"].sum()/pulled_goalies["time_left"].sum()
goals_against_frequency = pulled_goalies["didWinningTeamScore"].sum()/pulled_goalies["time_left"].sum()

print(goals_for_frequency*3600)
print(goals_against_frequency*3600)
3.4886327803892905
7.0691292452778605

The above block calculates the goals-per-game that one can expect 1) if your goalie is pulled and 2) if your opponent has pulled the goalie. If your goalie is pulled, you can expect to score about 3.49 goals per 60 minutes, or one goal per 17.2 minutes. If your opponent's goalie is pulled, you can expect to score about 7.07 goals per 60 minutes, or one goal per 8.49 minutes.

This alligns with what we already suspect about pulling the goalie: it is a negative EV decision, but while your opponent is much more likely to score a goal, your likelihood of scoring also increases.

In this next part, we will simulate 1000 games for every 10-second time interval, and we will determine the percent of times that the losing team comes back from their defecit after pulling their goalie during this time interval.

In [24]:
df = pd.DataFrame([], columns = ["defecit","time_left","even_strength_comeback_percentage","pull_goalie_comeback_percentage"])
escp = []
defecits = []
timeLeft = []
for defecit in [1]: # was [1,2], this took too long to run
    for time in range(0,1200,10):
        es_comebacks = 0
        for i in range(1000): # simulate 1000 games with these conditions
            goalsTotal = 0
            for sec in range(time):
                if random.random() < regular_goals_frequency:
                    goalsTotal += 1
                elif random.random() < regular_goals_frequency:
                    goalsTotal -= 1
            
                if goalsTotal >= defecit:
                    es_comebacks += 1
                    break
        escp.append(es_comebacks/1000)
        defecits.append(defecit)
        timeLeft.append(time)
        
df["defecit"] = defecits
df["time_left"] = timeLeft
df["even_strength_comeback_percentage"] = escp

pgcp = []
for index, row in df.iterrows():
    pg_comebacks = 0
    for i in range(1000): # simulate 1000 games
        goalsTotal = 0
        for sec in range(row["time_left"]):
            if random.random() < goals_for_frequency:
                goalsTotal += 1
            elif random.random() < goals_against_frequency:
                goalsTotal -= 1
                
            if goalsTotal >= row["defecit"]:
                pg_comebacks += 1
                break
        
    pgcp.append(pg_comebacks/1000)

df["pull_goalie_comeback_percentage"] = pgcp

df["diff"] = df["pull_goalie_comeback_percentage"] - df["even_strength_comeback_percentage"]
In [25]:
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    display(df.loc[df["defecit"] == 1])
defecit time_left even_strength_comeback_percentage pull_goalie_comeback_percentage diff
0 1 0 0.000 0.000 0.000
1 1 10 0.005 0.009 0.004
2 1 20 0.010 0.017 0.007
3 1 30 0.026 0.031 0.005
4 1 40 0.021 0.033 0.012
5 1 50 0.037 0.042 0.005
6 1 60 0.028 0.048 0.020
7 1 70 0.045 0.066 0.021
8 1 80 0.053 0.074 0.021
9 1 90 0.052 0.078 0.026
10 1 100 0.054 0.084 0.030
11 1 110 0.086 0.115 0.029
12 1 120 0.082 0.098 0.016
13 1 130 0.069 0.113 0.044
14 1 140 0.081 0.113 0.032
15 1 150 0.103 0.109 0.006
16 1 160 0.101 0.124 0.023
17 1 170 0.109 0.119 0.010
18 1 180 0.115 0.154 0.039
19 1 190 0.113 0.135 0.022
20 1 200 0.117 0.150 0.033
21 1 210 0.120 0.167 0.047
22 1 220 0.139 0.163 0.024
23 1 230 0.125 0.162 0.037
24 1 240 0.152 0.170 0.018
25 1 250 0.137 0.174 0.037
26 1 260 0.138 0.180 0.042
27 1 270 0.154 0.194 0.040
28 1 280 0.177 0.183 0.006
29 1 290 0.172 0.169 -0.003
30 1 300 0.192 0.212 0.020
31 1 310 0.168 0.199 0.031
32 1 320 0.187 0.205 0.018
33 1 330 0.173 0.207 0.034
34 1 340 0.170 0.208 0.038
35 1 350 0.186 0.211 0.025
36 1 360 0.207 0.239 0.032
37 1 370 0.186 0.216 0.030
38 1 380 0.198 0.221 0.023
39 1 390 0.180 0.221 0.041
40 1 400 0.228 0.235 0.007
41 1 410 0.195 0.263 0.068
42 1 420 0.215 0.246 0.031
43 1 430 0.221 0.245 0.024
44 1 440 0.222 0.231 0.009
45 1 450 0.224 0.258 0.034
46 1 460 0.243 0.246 0.003
47 1 470 0.251 0.249 -0.002
48 1 480 0.245 0.267 0.022
49 1 490 0.249 0.267 0.018
50 1 500 0.267 0.277 0.010
51 1 510 0.273 0.251 -0.022
52 1 520 0.260 0.264 0.004
53 1 530 0.263 0.267 0.004
54 1 540 0.278 0.300 0.022
55 1 550 0.269 0.297 0.028
56 1 560 0.296 0.299 0.003
57 1 570 0.291 0.279 -0.012
58 1 580 0.278 0.279 0.001
59 1 590 0.277 0.267 -0.010
60 1 600 0.264 0.306 0.042
61 1 610 0.300 0.296 -0.004
62 1 620 0.311 0.295 -0.016
63 1 630 0.290 0.283 -0.007
64 1 640 0.308 0.279 -0.029
65 1 650 0.291 0.301 0.010
66 1 660 0.306 0.312 0.006
67 1 670 0.280 0.317 0.037
68 1 680 0.305 0.329 0.024
69 1 690 0.333 0.286 -0.047
70 1 700 0.327 0.299 -0.028
71 1 710 0.308 0.334 0.026
72 1 720 0.336 0.292 -0.044
73 1 730 0.303 0.314 0.011
74 1 740 0.342 0.311 -0.031
75 1 750 0.310 0.321 0.011
76 1 760 0.333 0.344 0.011
77 1 770 0.309 0.314 0.005
78 1 780 0.383 0.318 -0.065
79 1 790 0.348 0.358 0.010
80 1 800 0.370 0.357 -0.013
81 1 810 0.354 0.349 -0.005
82 1 820 0.348 0.328 -0.020
83 1 830 0.376 0.352 -0.024
84 1 840 0.351 0.347 -0.004
85 1 850 0.356 0.366 0.010
86 1 860 0.371 0.332 -0.039
87 1 870 0.381 0.363 -0.018
88 1 880 0.389 0.346 -0.043
89 1 890 0.371 0.335 -0.036
90 1 900 0.379 0.351 -0.028
91 1 910 0.392 0.329 -0.063
92 1 920 0.367 0.331 -0.036
93 1 930 0.363 0.348 -0.015
94 1 940 0.361 0.322 -0.039
95 1 950 0.393 0.342 -0.051
96 1 960 0.411 0.352 -0.059
97 1 970 0.389 0.335 -0.054
98 1 980 0.377 0.387 0.010
99 1 990 0.392 0.339 -0.053
100 1 1000 0.360 0.354 -0.006
101 1 1010 0.443 0.360 -0.083
102 1 1020 0.399 0.347 -0.052
103 1 1030 0.421 0.364 -0.057
104 1 1040 0.405 0.377 -0.028
105 1 1050 0.390 0.349 -0.041
106 1 1060 0.403 0.353 -0.050
107 1 1070 0.398 0.356 -0.042
108 1 1080 0.408 0.338 -0.070
109 1 1090 0.406 0.381 -0.025
110 1 1100 0.415 0.372 -0.043
111 1 1110 0.409 0.381 -0.028
112 1 1120 0.416 0.360 -0.056
113 1 1130 0.415 0.365 -0.050
114 1 1140 0.417 0.398 -0.019
115 1 1150 0.435 0.389 -0.046
116 1 1160 0.427 0.357 -0.070
117 1 1170 0.425 0.351 -0.074
118 1 1180 0.428 0.393 -0.035
119 1 1190 0.462 0.361 -0.101

Shown above is the table of all simulation results for games with a one-goal defecit: it contains the game's time, the percent of the time that the team came back from their defecit without ever pulling the goalie, the percent of the time that the team came back after pulling the goalie at time = "time_left", and the difference between these two percentages. Shown below is a scatter plot of the difference between the percentages versus time left in the game when the goalie was pulled.

In [26]:
temp = df.loc[df["defecit"] == 1]

ax = plt.scatter(temp["time_left"], temp["diff"])
plt.ylabel("Comeback Percent Difference")
plt.xlabel("Time Left In Game (Seconds)")

plt.show()

Coaches should want to give their team the greatest opportunity to succeed, so they should decide to pull the goalie when pulling the goalie gives them the best advantage versus leaving the goalie in. According to this graph, with a one-goal defecit this happens around the 300-second mark: that is the approximate maximum of this graph.

Conclusion:

This is an imperfect analysis, but it indicates something that many hockey enthusiasts would not expect: if teams want to win games, they should pull their goalies sooner. On average, successful teams pull their goalie with around 2 minutes and 30 seconds left in the game, which is a full 30 seconds sooner than unsuccessful teams. When examining times that teams score and get scored on with the goalie pulled, we also found that teams pulling their goalie with up to 5 minutes remaining could be the best possible option. Further analysis needs to be done, but this shows one thing: if hockey teams want to win games, they should pull their goalies earlier.

In [ ]: