The Chargers and Close Games

A Good Way to Lose

As a Chargers fan for 20+ years, the last 5ish years have been interesting to say the least. In some ways it has been comical watching coach after coach lead Philip Rivers to an endless line of close defeats. It’s funny to blame the quarterback, but the failure modes transcend beyond quarterback play. The on-field buffoonery has been written about plenty. Click here for a comprehensive review. I am curious to see if other teams experience close contests and if they similarly suck. Below we will examine close game frequency and win percentage across the NFL to see how the Chargers stack up against the rest of the league.

Get Ready

While R is my native tongue, this post contains some python. If you would like to follow along, you’ll need to be prepared to use Python and should consider a consult with the reticulate documentation.

library(tidyverse)
library(reticulate)
library(rvest)
library(janitor)
library(glue)
library(nflfastR)
library(ggimage)

# i use this to force python chunks to use anaconda distro
#use_condaenv("anaconda3", required = TRUE)

Finding a Game Log

In order to compare close game rates between NFL teams, we’ll need to find a source for historical scores. Pro Football Reference looks to contain the information we need. Inspecting the HTML in a browser, you can see the table class id is “gamelog2020”. With the rvest package, we can parse the html and search xpaths for “//*[@id="gamelog2020"]“. We’ll rely on the html_table() function to return a list of child tables (which should result in 1 table).

url <- 'https://www.pro-football-reference.com/teams/sdg/2020/gamelog.html'

url %>% 
  read_html() %>% 
  html_nodes(xpath = '//*[@id="gamelog2020"]') %>%
  html_table() %>% 
  .[[1]] %>%  # when result is list of 1 dataframe, we can extract this way
  clean_names() %>% 
  glimpse()
## Rows: 17
## Columns: 36
## $ x          <chr> "Week", "1", "2", "3", "4", "5", "7", "8", "9", "10", "11",~
## $ x_2        <chr> "Day", "Sun", "Sun", "Sun", "Sun", "Mon", "Sun", "Sun", "Su~
## $ x_3        <chr> "Date", "September 13", "September 20", "September 27", "Oc~
## $ x_4        <chr> "", "boxscore", "boxscore", "boxscore", "boxscore", "boxsco~
## $ x_5        <chr> "", "W", "L", "L", "L", "L", "W", "L", "L", "L", "W", "L", ~
## $ x_6        <chr> "OT", "", "OT", "", "", "OT", "", "", "", "", "", "", "", "~
## $ x_7        <chr> "", "@", "", "", "@", "@", "", "@", "", "@", "", "@", "", "~
## $ x_8        <chr> "Opp", "Cincinnati Bengals", "Kansas City Chiefs", "Carolin~
## $ score      <chr> "Tm", "16", "20", "16", "31", "27", "39", "30", "26", "21",~
## $ score_2    <chr> "Opp", "13", "23", "21", "38", "30", "29", "31", "31", "29"~
## $ passing    <chr> "Cmp", "16", "22", "35", "20", "20", "28", "29", "28", "20"~
## $ passing_2  <chr> "Att", "30", "33", "49", "25", "34", "44", "44", "42", "32"~
## $ passing_3  <chr> "Yds", "207", "296", "319", "278", "239", "349", "275", "31~
## $ passing_4  <chr> "TD", "0", "1", "1", "3", "4", "3", "3", "2", "2", "3", "1"~
## $ passing_5  <chr> "Int", "0", "1", "1", "1", "0", "0", "2", "0", "1", "0", "1~
## $ passing_6  <chr> "Sk", "2", "2", "2", "2", "3", "1", "2", "2", "2", "3", "3"~
## $ passing_7  <chr> "Yds", "1", "15", "11", "12", "25", "2", "3", "14", "13", "~
## $ passing_8  <chr> "Y/A", "6.9", "9.4", "6.7", "11.6", "7.8", "8.0", "6.3", "7~
## $ passing_9  <chr> "NY/A", "6.5", "8.5", "6.3", "10.3", "6.5", "7.8", "6.0", "~
## $ passing_10 <chr> "Cmp%", "53.3", "66.7", "71.4", "80", "58.8", "63.6", "65.9~
## $ passing_11 <chr> "Rate", "75.3", "92.5", "87.0", "135.9", "119.6", "110.9", ~
## $ rushing    <chr> "Att", "39", "44", "23", "23", "32", "32", "38", "31", "29"~
## $ rushing_2  <chr> "Yds", "155", "183", "117", "46", "111", "135", "210", "128~
## $ rushing_3  <chr> "Y/A", "4.0", "4.2", "5.1", "2.0", "3.5", "4.2", "5.5", "4.~
## $ rushing_4  <chr> "TD", "1", "1", "1", "0", "0", "2", "0", "1", "1", "0", "1"~
## $ scoring    <chr> "FGM", "3", "2", "1", "1", "0", "2", "3", "2", "0", "2", "1~
## $ scoring_2  <chr> "FGA", "4", "2", "1", "2", "1", "3", "3", "3", "0", "2", "1~
## $ scoring_3  <chr> "XPM", "1", "2", "1", "4", "3", "3", "3", "2", "3", "4", "0~
## $ scoring_4  <chr> "XPA", "1", "2", "1", "4", "4", "4", "3", "2", "3", "4", "1~
## $ punting    <chr> "Pnt", "5", "3", "3", "3", "7", "4", "3", "3", "5", "3", "6~
## $ punting_2  <chr> "Yds", "244", "131", "119", "160", "339", "138", "117", "14~
## $ downs      <chr> "3DConv", "6", "6", "10", "4", "8", "6", "7", "7", "4", "9"~
## $ downs_2    <chr> "3DAtt", "16", "13", "15", "10", "17", "14", "16", "15", "1~
## $ downs_3    <chr> "4DConv", "0", "1", "0", "1", "0", "0", "0", "2", "2", "0",~
## $ downs_4    <chr> "4DAtt", "2", "2", "0", "1", "1", "0", "1", "2", "3", "1", ~
## $ x_9        <chr> "ToP", "30:10", "39:27", "31:11", "25:05", "30:42", "34:31"~

Data Wrangle

This is a good start, but we need to clean the data. The first row contains the column headers and there are quite a few columns that are not needed to compare final scores. We’ll remove the first row with slice(), add a column to distinguish the season and clean up the column names with dplyr.

url %>% 
  read_html() %>% 
  html_nodes(xpath = '//*[@id="gamelog2020"]') %>%
  html_table() %>% 
  .[[1]] %>% # when result is list of 1 dataframe, we can extract this way
  clean_names() %>% 
  slice(-1) %>% # remove first row 
  mutate(season = '2020') %>%
  select(season, x, x_8, score, score_2) %>% 
  rename(week = x,
         opp = x_8,
         team_score = score,
         opp_score = score_2) %>% 
  kable(format = 'html') %>% 
  kable_styling()
season week opp team_score opp_score
2020 1 Cincinnati Bengals 16 13
2020 2 Kansas City Chiefs 20 23
2020 3 Carolina Panthers 16 21
2020 4 Tampa Bay Buccaneers 31 38
2020 5 New Orleans Saints 27 30
2020 7 Jacksonville Jaguars 39 29
2020 8 Denver Broncos 30 31
2020 9 Las Vegas Raiders 26 31
2020 10 Miami Dolphins 21 29
2020 11 New York Jets 34 28
2020 12 Buffalo Bills 17 27
2020 13 New England Patriots 0 45
2020 14 Atlanta Falcons 20 17
2020 15 Las Vegas Raiders 30 27
2020 16 Denver Broncos 19 16
2020 17 Kansas City Chiefs 38 21

Retrieving Multiple Seasons

It’s nice having one season’s worth of scores, but we are trying to acquire scores since 2015. Below we turn the code above into a function. Using the glue package, we’ll simply set up our code to accept a ‘year’ parameter. The additional data wrangling steps are for using the ‘year’ input as a column to distinguish the season as well as extracting the team slug in the url for generating a ‘team’ column. Using purrr, we can map our function to the input list of seasons. The result is a nice, clean data frame.

scrape_pfr_team_game_logs <- function(year) {
  url <- glue::glue('https://www.pro-football-reference.com/teams/sdg/{year}/gamelog.html')
  
  url %>% 
  read_html() %>% 
  html_nodes(xpath = glue::glue('//*[@id="gamelog{year}"]')) %>%
  html_table() %>% 
  .[[1]] %>% 
  clean_names() %>% 
  slice(-1) %>% 
  mutate(season = as.character(year), # use year parameter to annotate season
         team = substr(url, 46, 48)) %>% # use portion of url to label the team
  select(season, x, team, x_8, score, score_2) %>% 
  rename(week = x,
         opp = x_8,
         team_score = score,
         opp_score = score_2)
}

years <- 2015:2020

scores <- map_df(years, scrape_pfr_team_game_logs)

scores %>%
  sample_n(10) %>%  # picking 10 random rows to highlight
  kable(format = 'html') %>% 
  kable_styling()
season week team opp team_score opp_score
2017 15 sdg Kansas City Chiefs 13 30
2017 16 sdg New York Jets 14 7
2017 11 sdg Buffalo Bills 54 24
2017 10 sdg Jacksonville Jaguars 17 20
2018 12 sdg Arizona Cardinals 45 10
2017 5 sdg New York Giants 27 22
2015 14 sdg Kansas City Chiefs 3 10
2017 3 sdg Kansas City Chiefs 10 24
2017 7 sdg Denver Broncos 21 0
2016 8 sdg Denver Broncos 19 27

Retrieving Multiple Teams’ Seasons

In order to compare the Chargers against the rest of the league, we’ll need game logs from all of the other teams. Usually I can use rvest and obtain whatever I need from the web, but here, I decided BeautifulSoup was a better fit. With this python library, it’s a bit more intuitive to find explicit tags. The ability to string together super flexible find commands was extremely helpful for this particular website.

What we’re trying to do here is find our base URLs for the other teams. Once you locate the class that houses all of the links, you can parse them with regex. We’ll loop through the results and generate a list of base URLs as shown below.

from urllib.request import urlopen
from bs4 import BeautifulSoup
import re

html = urlopen("https://www.pro-football-reference.com/teams/")
bsObj = BeautifulSoup(html)

teamSlugs = bsObj.find("div", {"id":"all_teams_active"}).findAll("a", href = re.compile("^(/teams/)((?!:).)*$"))

teamSlugList = []
for ts in teamSlugs[0:]:
  result = ts.attrs["href"]
  teamSlugList.append(result)
  
print(teamSlugList)
## ['/teams/crd/', '/teams/atl/', '/teams/rav/', '/teams/buf/', '/teams/car/', '/teams/chi/', '/teams/cin/', '/teams/cle/', '/teams/dal/', '/teams/den/', '/teams/det/', '/teams/gnb/', '/teams/htx/', '/teams/clt/', '/teams/jax/', '/teams/kan/', '/teams/rai/', '/teams/sdg/', '/teams/ram/', '/teams/mia/', '/teams/min/', '/teams/nwe/', '/teams/nor/', '/teams/nyg/', '/teams/nyj/', '/teams/phi/', '/teams/pit/', '/teams/sfo/', '/teams/sea/', '/teams/tam/', '/teams/oti/', '/teams/was/']

Another Function

First on the agenda is transferring our python object over to R. Then we need to parse the slug.

team_slugs <- py$teamSlugList %>% # pass to R
  map(~ substr(., 8, 10)) %>% # extract team slugs
  unlist()

print(team_slugs[1:5])
## [1] "crd" "atl" "rav" "buf" "car"

Using crossing(), we’ll generate all combinations of teams and years. Updating our function too allow for ‘team’ and ‘year’, this will give us two arguments for our scrape function. With purrr, pmap() accepts multiple arguments which supports our effort here. This will take a few minutes to run.

teams_and_years <- crossing(year = years, team = team_slugs) # all possible combinations

# use glue to concat input w/ strings
scrape_pfr_team_game_logs <- function(team, year) {
  url <- glue::glue('https://www.pro-football-reference.com/teams/{team}/{year}/gamelog.html')
  
  url %>% 
  read_html() %>% 
  html_nodes(xpath = glue::glue('//*[@id="gamelog{year}"]')) %>%
  html_table() %>% 
  .[[1]] %>% 
  clean_names() %>% 
  slice(-1) %>% 
  mutate(season = as.character(year),
         team = substr(url, 46, 48)) %>% 
  select(season, x, team, x_8, score, score_2) %>% 
  rename(week = x,
         opp = x_8,
         team_score = score,
         opp_score = score_2) %>%
  filter(!is.na(team_score)) # unplayed games can be filtered out
}

all_scores <- pmap_df(teams_and_years, scrape_pfr_team_game_logs)

all_scores %>% 
  sample_n(15) %>% 
  kable(format = 'html') %>% 
  kable_styling()
season week team opp team_score opp_score
2015 3 rai Cleveland Browns 27 20
2018 4 cin Atlanta Falcons 37 36
2015 17 sfo St. Louis Rams 19 16
2019 5 sdg Denver Broncos 13 20
2020 8 det Indianapolis Colts 21 41
2020 17 oti Houston Texans 41 38
2019 5 oti Buffalo Bills 7 14
2018 1 tam New Orleans Saints 48 40
2015 14 buf Philadelphia Eagles 20 23
2020 17 mia Buffalo Bills 26 56
2016 17 min Chicago Bears 38 10
2017 15 mia Buffalo Bills 16 24
2019 16 nwe Buffalo Bills 24 17
2017 4 sdg Philadelphia Eagles 24 26
2017 16 tam Carolina Panthers 19 22

Transform the Data

Equipped with several years of final scores from the NFL, we can start to perform transformations and do some digging. We’ll start by calculating three additional fields. ‘score_difference’ will hold the margin of victory or defeat. ‘within_a_score’ will determine if a game’s final score is decided by 8 points or less. ‘is_win’ will store a 1 for wins and 0 for losses. Within our the summarize call, we’ll assemble metrics as follows: ‘win_pct’ is the percentage of games resulting in a win. ‘close_game_pct’ is the percentage of games decided by 8 points or less. ‘close_game_win_pct’ represents win percentage in games decided by 8 points or less. ‘pct_win_close’ calculates the percentage of a team’s wins which were decided by 8 points or less.

transform <- all_scores %>%
  mutate(team = case_when(team == 'sdg' ~ 'lac',
                          team == 'oti' ~ 'ten',
                          TRUE ~ as.character(team))) %>% 
  mutate_at(vars(team_score, opp_score),
            as.numeric) %>% 
  mutate(score_difference = abs(team_score - opp_score),
         within_a_score = case_when(score_difference <= 8 ~ 1,
                                  TRUE ~ 0),
         is_win = case_when(team_score > opp_score ~ 1,
                            TRUE ~ 0)) %>% 
  group_by(team) %>% 
  summarize(win_pct = sum(is_win) / n(),
            close_game_pct = sum(within_a_score) / n(),
            close_game_win_pct = sum(is_win[within_a_score == 1]) / length(team[within_a_score ==1]),
            pct_win_close = sum(within_a_score[is_win == 1]) / sum(is_win))

Visualize

With the data oriented in this manner, we can begin to evaluate the prevalence and performance in close games for the Chargers relative to the rest of the league. First, it’s worth exploring if Chargers’ games are more likely to be close than games played by other teams. As shown, the Chargers do indeed hold the top close game rate in the NFL. I’m shocked.

transform %>% 
  select(team, close_game_pct) %>% 
  arrange(desc(close_game_pct)) %>%
  mutate(close_game_pct = round(100*close_game_pct,0)) %>% 
  slice(1:10) %>% 
  kable(format = 'html') %>% 
  kable_styling()
team close_game_pct
lac 66
sea 62
chi 61
nyg 60
clt 59
pit 59
htx 58
det 57
tam 57
atl 56

Next, Let’s explore which teams have the worst win percentage in close games. Stated another way, which teams have the worst win percentage in games decided by 8 points or less? The Chargers come in ranked tied for 3rd worst win percentage in close games. Considering that my team plays in close games nearly two-thirds of the time and wins a little over a third of those contests, it’s no wonder this is a frustrated fan base.

transform %>% 
  select(team, close_game_win_pct) %>% 
  arrange(close_game_win_pct) %>%
  mutate(close_game_win_pct = round(100*close_game_win_pct,0)) %>% 
  slice(1:10) %>% 
  kable(format = 'html') %>% 
  kable_styling()
team close_game_win_pct
jax 29
cin 30
lac 37
cle 37
sfo 41
nyg 41
nyj 43
tam 44
chi 44
rav 44
team_data <- teams_colors_logos %>% 
  mutate(team = tolower(team_abbr),
         team = case_when(team == 'gb' ~ 'gnb',
                          team == 'sd' ~ 'lac',
                          team == 'ari' ~ 'crd',
                          team == 'bal' ~ 'rav',
                          team == 'hou' ~ 'htx',
                          team == 'ind' ~ 'clt',
                          team == 'kc' ~ 'kan',
                          team == 'no' ~ 'nor',
                          team == 'ne' ~ 'nwe',
                          team == 'oak' ~ 'rai',
                          team == 'sf' ~ 'sfo',
                          team == 'tb' ~ 'tam',
                          team == 'lar' ~ 'ram',
                          TRUE ~ as.character(team)))

plot <- transform %>% 
  left_join(team_data)
  
ggplot(plot, aes(x = close_game_win_pct, y = close_game_pct, label = team)) + 
  geom_image(aes(image = team_logo_wikipedia), size = 0.05, by = "width", asp = 1.618) + 
  labs(x = 'Win Percentage in One Score Games',
       y = 'Percentage of One Score Games') + 
  theme_bw()

 Share!