A Good Way to Lose
As a Chargers fan for 20+ years, the last 5ish years have been interesting to say the least. In some ways it has been comical watching coach after coach lead Philip Rivers to an endless line of close defeats. It’s funny to blame the quarterback, but the failure modes transcend beyond quarterback play. The on-field buffoonery has been written about plenty. Click here for a comprehensive review. I am curious to see if other teams experience close contests and if they similarly suck. Below we will examine close game frequency and win percentage across the NFL to see how the Chargers stack up against the rest of the league.
Get Ready
While R is my native tongue, this post contains some python. If you would like to follow along, you’ll need to be prepared to use Python and should consider a consult with the reticulate
documentation.
library(tidyverse)
library(reticulate)
library(rvest)
library(janitor)
library(glue)
library(nflfastR)
library(ggimage)
# i use this to force python chunks to use anaconda distro
#use_condaenv("anaconda3", required = TRUE)
Finding a Game Log
In order to compare close game rates between NFL teams, we’ll need to find a source for historical scores. Pro Football Reference looks to contain the information we need. Inspecting the HTML in a browser, you can see the table class id is “gamelog2020”. With the rvest
package, we can parse the html and search xpaths for “//*[@id="gamelog2020"]“. We’ll rely on the html_table()
function to return a list of child tables (which should result in 1 table).
url <- 'https://www.pro-football-reference.com/teams/sdg/2020/gamelog.html'
url %>%
read_html() %>%
html_nodes(xpath = '//*[@id="gamelog2020"]') %>%
html_table() %>%
.[[1]] %>% # when result is list of 1 dataframe, we can extract this way
clean_names() %>%
glimpse()
## Rows: 17
## Columns: 36
## $ x <chr> "Week", "1", "2", "3", "4", "5", "7", "8", "9", "10", "11",~
## $ x_2 <chr> "Day", "Sun", "Sun", "Sun", "Sun", "Mon", "Sun", "Sun", "Su~
## $ x_3 <chr> "Date", "September 13", "September 20", "September 27", "Oc~
## $ x_4 <chr> "", "boxscore", "boxscore", "boxscore", "boxscore", "boxsco~
## $ x_5 <chr> "", "W", "L", "L", "L", "L", "W", "L", "L", "L", "W", "L", ~
## $ x_6 <chr> "OT", "", "OT", "", "", "OT", "", "", "", "", "", "", "", "~
## $ x_7 <chr> "", "@", "", "", "@", "@", "", "@", "", "@", "", "@", "", "~
## $ x_8 <chr> "Opp", "Cincinnati Bengals", "Kansas City Chiefs", "Carolin~
## $ score <chr> "Tm", "16", "20", "16", "31", "27", "39", "30", "26", "21",~
## $ score_2 <chr> "Opp", "13", "23", "21", "38", "30", "29", "31", "31", "29"~
## $ passing <chr> "Cmp", "16", "22", "35", "20", "20", "28", "29", "28", "20"~
## $ passing_2 <chr> "Att", "30", "33", "49", "25", "34", "44", "44", "42", "32"~
## $ passing_3 <chr> "Yds", "207", "296", "319", "278", "239", "349", "275", "31~
## $ passing_4 <chr> "TD", "0", "1", "1", "3", "4", "3", "3", "2", "2", "3", "1"~
## $ passing_5 <chr> "Int", "0", "1", "1", "1", "0", "0", "2", "0", "1", "0", "1~
## $ passing_6 <chr> "Sk", "2", "2", "2", "2", "3", "1", "2", "2", "2", "3", "3"~
## $ passing_7 <chr> "Yds", "1", "15", "11", "12", "25", "2", "3", "14", "13", "~
## $ passing_8 <chr> "Y/A", "6.9", "9.4", "6.7", "11.6", "7.8", "8.0", "6.3", "7~
## $ passing_9 <chr> "NY/A", "6.5", "8.5", "6.3", "10.3", "6.5", "7.8", "6.0", "~
## $ passing_10 <chr> "Cmp%", "53.3", "66.7", "71.4", "80", "58.8", "63.6", "65.9~
## $ passing_11 <chr> "Rate", "75.3", "92.5", "87.0", "135.9", "119.6", "110.9", ~
## $ rushing <chr> "Att", "39", "44", "23", "23", "32", "32", "38", "31", "29"~
## $ rushing_2 <chr> "Yds", "155", "183", "117", "46", "111", "135", "210", "128~
## $ rushing_3 <chr> "Y/A", "4.0", "4.2", "5.1", "2.0", "3.5", "4.2", "5.5", "4.~
## $ rushing_4 <chr> "TD", "1", "1", "1", "0", "0", "2", "0", "1", "1", "0", "1"~
## $ scoring <chr> "FGM", "3", "2", "1", "1", "0", "2", "3", "2", "0", "2", "1~
## $ scoring_2 <chr> "FGA", "4", "2", "1", "2", "1", "3", "3", "3", "0", "2", "1~
## $ scoring_3 <chr> "XPM", "1", "2", "1", "4", "3", "3", "3", "2", "3", "4", "0~
## $ scoring_4 <chr> "XPA", "1", "2", "1", "4", "4", "4", "3", "2", "3", "4", "1~
## $ punting <chr> "Pnt", "5", "3", "3", "3", "7", "4", "3", "3", "5", "3", "6~
## $ punting_2 <chr> "Yds", "244", "131", "119", "160", "339", "138", "117", "14~
## $ downs <chr> "3DConv", "6", "6", "10", "4", "8", "6", "7", "7", "4", "9"~
## $ downs_2 <chr> "3DAtt", "16", "13", "15", "10", "17", "14", "16", "15", "1~
## $ downs_3 <chr> "4DConv", "0", "1", "0", "1", "0", "0", "0", "2", "2", "0",~
## $ downs_4 <chr> "4DAtt", "2", "2", "0", "1", "1", "0", "1", "2", "3", "1", ~
## $ x_9 <chr> "ToP", "30:10", "39:27", "31:11", "25:05", "30:42", "34:31"~
Data Wrangle
This is a good start, but we need to clean the data. The first row contains the column headers and there are quite a few columns that are not needed to compare final scores. We’ll remove the first row with slice()
, add a column to distinguish the season and clean up the column names with dplyr
.
url %>%
read_html() %>%
html_nodes(xpath = '//*[@id="gamelog2020"]') %>%
html_table() %>%
.[[1]] %>% # when result is list of 1 dataframe, we can extract this way
clean_names() %>%
slice(-1) %>% # remove first row
mutate(season = '2020') %>%
select(season, x, x_8, score, score_2) %>%
rename(week = x,
opp = x_8,
team_score = score,
opp_score = score_2) %>%
kable(format = 'html') %>%
kable_styling()
season | week | opp | team_score | opp_score |
---|---|---|---|---|
2020 | 1 | Cincinnati Bengals | 16 | 13 |
2020 | 2 | Kansas City Chiefs | 20 | 23 |
2020 | 3 | Carolina Panthers | 16 | 21 |
2020 | 4 | Tampa Bay Buccaneers | 31 | 38 |
2020 | 5 | New Orleans Saints | 27 | 30 |
2020 | 7 | Jacksonville Jaguars | 39 | 29 |
2020 | 8 | Denver Broncos | 30 | 31 |
2020 | 9 | Las Vegas Raiders | 26 | 31 |
2020 | 10 | Miami Dolphins | 21 | 29 |
2020 | 11 | New York Jets | 34 | 28 |
2020 | 12 | Buffalo Bills | 17 | 27 |
2020 | 13 | New England Patriots | 0 | 45 |
2020 | 14 | Atlanta Falcons | 20 | 17 |
2020 | 15 | Las Vegas Raiders | 30 | 27 |
2020 | 16 | Denver Broncos | 19 | 16 |
2020 | 17 | Kansas City Chiefs | 38 | 21 |
Retrieving Multiple Seasons
It’s nice having one season’s worth of scores, but we are trying to acquire scores since 2015. Below we turn the code above into a function. Using the glue
package, we’ll simply set up our code to accept a ‘year’ parameter. The additional data wrangling steps are for using the ‘year’ input as a column to distinguish the season as well as extracting the team slug in the url for generating a ‘team’ column. Using purrr
, we can map our function to the input list of seasons. The result is a nice, clean data frame.
scrape_pfr_team_game_logs <- function(year) {
url <- glue::glue('https://www.pro-football-reference.com/teams/sdg/{year}/gamelog.html')
url %>%
read_html() %>%
html_nodes(xpath = glue::glue('//*[@id="gamelog{year}"]')) %>%
html_table() %>%
.[[1]] %>%
clean_names() %>%
slice(-1) %>%
mutate(season = as.character(year), # use year parameter to annotate season
team = substr(url, 46, 48)) %>% # use portion of url to label the team
select(season, x, team, x_8, score, score_2) %>%
rename(week = x,
opp = x_8,
team_score = score,
opp_score = score_2)
}
years <- 2015:2020
scores <- map_df(years, scrape_pfr_team_game_logs)
scores %>%
sample_n(10) %>% # picking 10 random rows to highlight
kable(format = 'html') %>%
kable_styling()
season | week | team | opp | team_score | opp_score |
---|---|---|---|---|---|
2017 | 15 | sdg | Kansas City Chiefs | 13 | 30 |
2017 | 16 | sdg | New York Jets | 14 | 7 |
2017 | 11 | sdg | Buffalo Bills | 54 | 24 |
2017 | 10 | sdg | Jacksonville Jaguars | 17 | 20 |
2018 | 12 | sdg | Arizona Cardinals | 45 | 10 |
2017 | 5 | sdg | New York Giants | 27 | 22 |
2015 | 14 | sdg | Kansas City Chiefs | 3 | 10 |
2017 | 3 | sdg | Kansas City Chiefs | 10 | 24 |
2017 | 7 | sdg | Denver Broncos | 21 | 0 |
2016 | 8 | sdg | Denver Broncos | 19 | 27 |
Retrieving Multiple Teams’ Seasons
In order to compare the Chargers against the rest of the league, we’ll need game logs from all of the other teams. Usually I can use rvest
and obtain whatever I need from the web, but here, I decided BeautifulSoup
was a better fit. With this python library, it’s a bit more intuitive to find explicit tags. The ability to string together super flexible find
commands was extremely helpful for this particular website.
What we’re trying to do here is find our base URLs for the other teams. Once you locate the class that houses all of the links, you can parse them with regex
. We’ll loop through the results and generate a list of base URLs as shown below.
from urllib.request import urlopen
from bs4 import BeautifulSoup
import re
html = urlopen("https://www.pro-football-reference.com/teams/")
bsObj = BeautifulSoup(html)
teamSlugs = bsObj.find("div", {"id":"all_teams_active"}).findAll("a", href = re.compile("^(/teams/)((?!:).)*$"))
teamSlugList = []
for ts in teamSlugs[0:]:
result = ts.attrs["href"]
teamSlugList.append(result)
print(teamSlugList)
## ['/teams/crd/', '/teams/atl/', '/teams/rav/', '/teams/buf/', '/teams/car/', '/teams/chi/', '/teams/cin/', '/teams/cle/', '/teams/dal/', '/teams/den/', '/teams/det/', '/teams/gnb/', '/teams/htx/', '/teams/clt/', '/teams/jax/', '/teams/kan/', '/teams/rai/', '/teams/sdg/', '/teams/ram/', '/teams/mia/', '/teams/min/', '/teams/nwe/', '/teams/nor/', '/teams/nyg/', '/teams/nyj/', '/teams/phi/', '/teams/pit/', '/teams/sfo/', '/teams/sea/', '/teams/tam/', '/teams/oti/', '/teams/was/']
Another Function
First on the agenda is transferring our python object over to R. Then we need to parse the slug.
team_slugs <- py$teamSlugList %>% # pass to R
map(~ substr(., 8, 10)) %>% # extract team slugs
unlist()
print(team_slugs[1:5])
## [1] "crd" "atl" "rav" "buf" "car"
Using crossing()
, we’ll generate all combinations of teams and years. Updating our function too allow for ‘team’ and ‘year’, this will give us two arguments for our scrape function. With purrr
, pmap()
accepts multiple arguments which supports our effort here. This will take a few minutes to run.
teams_and_years <- crossing(year = years, team = team_slugs) # all possible combinations
# use glue to concat input w/ strings
scrape_pfr_team_game_logs <- function(team, year) {
url <- glue::glue('https://www.pro-football-reference.com/teams/{team}/{year}/gamelog.html')
url %>%
read_html() %>%
html_nodes(xpath = glue::glue('//*[@id="gamelog{year}"]')) %>%
html_table() %>%
.[[1]] %>%
clean_names() %>%
slice(-1) %>%
mutate(season = as.character(year),
team = substr(url, 46, 48)) %>%
select(season, x, team, x_8, score, score_2) %>%
rename(week = x,
opp = x_8,
team_score = score,
opp_score = score_2) %>%
filter(!is.na(team_score)) # unplayed games can be filtered out
}
all_scores <- pmap_df(teams_and_years, scrape_pfr_team_game_logs)
all_scores %>%
sample_n(15) %>%
kable(format = 'html') %>%
kable_styling()
season | week | team | opp | team_score | opp_score |
---|---|---|---|---|---|
2015 | 3 | rai | Cleveland Browns | 27 | 20 |
2018 | 4 | cin | Atlanta Falcons | 37 | 36 |
2015 | 17 | sfo | St. Louis Rams | 19 | 16 |
2019 | 5 | sdg | Denver Broncos | 13 | 20 |
2020 | 8 | det | Indianapolis Colts | 21 | 41 |
2020 | 17 | oti | Houston Texans | 41 | 38 |
2019 | 5 | oti | Buffalo Bills | 7 | 14 |
2018 | 1 | tam | New Orleans Saints | 48 | 40 |
2015 | 14 | buf | Philadelphia Eagles | 20 | 23 |
2020 | 17 | mia | Buffalo Bills | 26 | 56 |
2016 | 17 | min | Chicago Bears | 38 | 10 |
2017 | 15 | mia | Buffalo Bills | 16 | 24 |
2019 | 16 | nwe | Buffalo Bills | 24 | 17 |
2017 | 4 | sdg | Philadelphia Eagles | 24 | 26 |
2017 | 16 | tam | Carolina Panthers | 19 | 22 |
Transform the Data
Equipped with several years of final scores from the NFL, we can start to perform transformations and do some digging. We’ll start by calculating three additional fields. ‘score_difference’ will hold the margin of victory or defeat. ‘within_a_score’ will determine if a game’s final score is decided by 8 points or less. ‘is_win’ will store a 1 for wins and 0 for losses. Within our the summarize
call, we’ll assemble metrics as follows: ‘win_pct’ is the percentage of games resulting in a win. ‘close_game_pct’ is the percentage of games decided by 8 points or less. ‘close_game_win_pct’ represents win percentage in games decided by 8 points or less. ‘pct_win_close’ calculates the percentage of a team’s wins which were decided by 8 points or less.
transform <- all_scores %>%
mutate(team = case_when(team == 'sdg' ~ 'lac',
team == 'oti' ~ 'ten',
TRUE ~ as.character(team))) %>%
mutate_at(vars(team_score, opp_score),
as.numeric) %>%
mutate(score_difference = abs(team_score - opp_score),
within_a_score = case_when(score_difference <= 8 ~ 1,
TRUE ~ 0),
is_win = case_when(team_score > opp_score ~ 1,
TRUE ~ 0)) %>%
group_by(team) %>%
summarize(win_pct = sum(is_win) / n(),
close_game_pct = sum(within_a_score) / n(),
close_game_win_pct = sum(is_win[within_a_score == 1]) / length(team[within_a_score ==1]),
pct_win_close = sum(within_a_score[is_win == 1]) / sum(is_win))
Visualize
With the data oriented in this manner, we can begin to evaluate the prevalence and performance in close games for the Chargers relative to the rest of the league. First, it’s worth exploring if Chargers’ games are more likely to be close than games played by other teams. As shown, the Chargers do indeed hold the top close game rate in the NFL. I’m shocked.
transform %>%
select(team, close_game_pct) %>%
arrange(desc(close_game_pct)) %>%
mutate(close_game_pct = round(100*close_game_pct,0)) %>%
slice(1:10) %>%
kable(format = 'html') %>%
kable_styling()
team | close_game_pct |
---|---|
lac | 66 |
sea | 62 |
chi | 61 |
nyg | 60 |
clt | 59 |
pit | 59 |
htx | 58 |
det | 57 |
tam | 57 |
atl | 56 |
Next, Let’s explore which teams have the worst win percentage in close games. Stated another way, which teams have the worst win percentage in games decided by 8 points or less? The Chargers come in ranked tied for 3rd worst win percentage in close games. Considering that my team plays in close games nearly two-thirds of the time and wins a little over a third of those contests, it’s no wonder this is a frustrated fan base.
transform %>%
select(team, close_game_win_pct) %>%
arrange(close_game_win_pct) %>%
mutate(close_game_win_pct = round(100*close_game_win_pct,0)) %>%
slice(1:10) %>%
kable(format = 'html') %>%
kable_styling()
team | close_game_win_pct |
---|---|
jax | 29 |
cin | 30 |
lac | 37 |
cle | 37 |
sfo | 41 |
nyg | 41 |
nyj | 43 |
tam | 44 |
chi | 44 |
rav | 44 |
team_data <- teams_colors_logos %>%
mutate(team = tolower(team_abbr),
team = case_when(team == 'gb' ~ 'gnb',
team == 'sd' ~ 'lac',
team == 'ari' ~ 'crd',
team == 'bal' ~ 'rav',
team == 'hou' ~ 'htx',
team == 'ind' ~ 'clt',
team == 'kc' ~ 'kan',
team == 'no' ~ 'nor',
team == 'ne' ~ 'nwe',
team == 'oak' ~ 'rai',
team == 'sf' ~ 'sfo',
team == 'tb' ~ 'tam',
team == 'lar' ~ 'ram',
TRUE ~ as.character(team)))
plot <- transform %>%
left_join(team_data)
ggplot(plot, aes(x = close_game_win_pct, y = close_game_pct, label = team)) +
geom_image(aes(image = team_logo_wikipedia), size = 0.05, by = "width", asp = 1.618) +
labs(x = 'Win Percentage in One Score Games',
y = 'Percentage of One Score Games') +
theme_bw()