Predicting NFL Injuries with Stan Part II

Previously I use data from armchair analysis to build a simple model to predict whether an NFL player would have an injury based only on their position. I restricted the analysis to QBs, RBs, TEs, and WRs. In this post I’d like to expand the model to include 3 years of injury data (which is all I have), as well as include some additional covariates such as player height, weight, and age. The inclusion of multiple years of data makes building the model more tricky since some players will be included in each years’ data.

Additionally, I thought it would be more interesting to predict the number of games in a season a player is unable to play due to injury. A common model for counts is the poisson model. This model has a single parameter for its mean and variance, and so is not very flexible. It’s easy to imagine two simple processes by which a player can remain injury free all year. The first is they are simply lucky and don’t get hurt. The other is that they don’t play very much. Since there are (at least) two processes that can produce zeros for the number of games a player misses, I’m confident that there will be greater than expected number of zeros. Therefore it makes sense to include an additional parameter to account for this in the model.

To simplify the coding I’m going to use the excellent brms package in R. This package allows you to fit models in Stan using standard R formula syntax. It’ll also be helpful since it allows the use of splines to model potentially nonlinear predictors like height, weight, and age. Additionally, brms provides a ton of other functionality I’ll use. This post will be more involved than the previous one and so I’ve tried to split it into four sections: data pre-processing, exploratory analysis, statistical modeling, model comparison, and finally a simple simulation using the model.

Data Pre-Processing

The data for this analysis are different than the previous post primarily in two ways. The first is that it includes multiple years. This means that I will have more than one record (row) for players that played multiple years. The second is that I’m looking at the number of games missed due to injury. I’m defining an injury here as having a game time status of: out, physically unable to perform, or Injured Reserve. So essentially what I would like is a record for each year and each player with a number between 0 and 16 for games injured. The first 10 rows of the tables that need to be joined are printed below:

player

## # A tibble: 10,139 x 27
##    player  fname  lname  pname pos1  pos2  height weight dob   forty bench
##    <chr>   <chr>  <chr>  <chr> <chr> <chr>  <int>  <int> <chr> <dbl> <int>
##  1 AA-0025 Ameer  Abdul~ A.Ab~ RB    <NA>      69    205 6/13~  4.50    24
##  2 AA-0050 Aaron  Adams  A.Ad~ OL    <NA>      77    305 5/16~  5.26    22
##  3 AA-0075 Andrew Adams  A.Ad~ DB    <NA>      71    203 10/2~  4.54    24
##  4 AA-0100 Antho~ Adams  A.Ad~ DL    <NA>      72    300 6/18~  5.13     0
##  5 AA-0200 Al     Afala~ A.Af~ DB    <NA>      71    212 1/20~  4.48    25
##  6 AA-0300 Antho~ Alabi  A.Al~ OL    <NA>      77    315 2/16~  5.12    18
##  7 AA-0400 Alex   Albri~ A.Al~ LB    <NA>      77    260 1/29~  4.86    22
##  8 AA-0500 Allen  Aldri~ A.Al~ LB    <NA>      73    254 5/30~  0        0
##  9 AA-0525 Alves~ Alexa~ A.Al~ RB    <NA>      70    206 10/1~  4.45    18
## 10 AA-0543 Adrian Amos   A.Am~ DB    <NA>      72    218 4/29~  4.39    21
## # ... with 1.013e+04 more rows, and 16 more variables: vertical <dbl>,
## #   broad <int>, shuttle <dbl>, cone <dbl>, arm <dbl>, hand <dbl>,
## #   dpos <int>, col <chr>, dv <chr>, start <int>, cteam <chr>, posd <chr>,
## #   jnum <int>, dcp <int>, nflid <int>, pimg <chr>

game

## # A tibble: 4,790 x 17
##      gid  seas    wk day   v     h     stad   temp  humd  wspd wdir  cond 
##    <int> <int> <int> <chr> <chr> <chr> <chr> <int> <int> <int> <chr> <chr>
##  1     1  2000     1 SUN   SF    ATL   Geor~    79    NA    NA <NA>  Dome 
##  2     2  2000     1 SUN   JAC   CLE   Clev~    78    63     9 NE    Sunny
##  3     3  2000     1 SUN   PHI   DAL   Texa~   109    19     5 S     Sunny
##  4     4  2000     1 SUN   NYJ   GB    Lamb~    77    66     5 E     Most~
##  5     5  2000     1 SUN   IND   KC    Arro~    90    50     8 E     Most~
##  6     6  2000     1 SUN   SEA   MIA   Pro ~    89    59    13 E     Sunny
##  7     7  2000     1 SUN   CHI   MIN   Metr~    65    NA    NA <NA>  Dome 
##  8     8  2000     1 SUN   TB    NE    Foxb~    71    93     5 VAR   Clou~
##  9     9  2000     1 SUN   DET   NO    Loui~    89    NA    NA <NA>  Dome 
## 10    10  2000     1 SUN   ARI   NYG   Gian~    80    79     3 VAR   Part~
## # ... with 4,780 more rows, and 5 more variables: surf <chr>, ou <dbl>,
## #   sprv <dbl>, ptsv <int>, ptsh <int>

injury

## # A tibble: 15,862 x 6
##      gid player  team  details    pstat                             gstat 
##    <int> <chr>   <chr> <chr>      <chr>                             <chr> 
##  1  3990 BS-4350 NE    Concussion Did Not Participate In Practice   Out   
##  2  3990 TF-0750 NE    Knee       Full Participation in Practice    Quest~
##  3  3990 TW-3250 NE    Quadricep  Limited Participation in Practice Quest~
##  4  3990 LJ-1250 PIT   Concussion Did Not Participate In Practice   Out   
##  5  3991 JB-4550 CHI   Ankle      Limited Participation in Practice Quest~
##  6  3991 JB-8200 CHI   Back       Did Not Participate In Practice   Proba~
##  7  3991 AJ-0430 CHI   Calf       Limited Participation in Practice Quest~
##  8  3991 EG-0350 CHI   Concussion Full Participation in Practice    Proba~
##  9  3991 JC-3100 CHI   Concussion Full Participation in Practice    Proba~
## 10  3991 MW-3250 CHI   Hamstring  Limited Participation in Practice Quest~
## # ... with 1.585e+04 more rows

offense

## # A tibble: 99,866 x 33
##      uid   gid player     pa    pc    py  ints   tdp    ra   sra    ry
##    <int> <int> <chr>   <int> <int> <int> <int> <int> <int> <int> <int>
##  1     1     1 KW-1500     0     0     0     0     0     0     0     0
##  2     2     1 CG-0400     0     0     0     0     0    15     7    62
##  3     3     1 JG-0600    36    23   252     1     3     2     1    22
##  4     4     1 JR-2000     0     0     0     0     0     1     0   - 2
##  5     5     1 TO-0200     0     0     0     0     0     0     0     0
##  6     6     1 FB-0200     0     0     0     0     0     5     2    10
##  7     7     1 JA-1600     0     0     0     0     0    24    10    77
##  8     8     1 CC-1400    31    16   264     0     2     4     1    12
##  9     9     1 BC-1100     0     0     0     0     0     3     0     5
## 10    10     1 TM-0900     0     0     0     0     0     0     0     0
## # ... with 9.986e+04 more rows, and 22 more variables: tdr <int>,
## #   trg <int>, rec <int>, recy <int>, tdrec <int>, ret <int>, rety <int>,
## #   tdret <int>, fuml <int>, peny <int>, conv <int>, fp <dbl>, fp2 <dbl>,
## #   fp3 <dbl>, game <int>, seas <int>, year <int>, team <chr>, posd <chr>,
## #   jnum <int>, dcp <int>, nflid <int>

I join the tables using dplyr functions from the tidyverse package. In addition to the year, player, and number of games injured, I also get player name, weight, height, date of birth, as well as the year they entered the league (start).

#All games from 2015-2017
player.data <-
  data_frame(year = rep(2015:2017, each = 17),
                        wk = rep(1L:17L, times = 3)) %>%
  #Add game ids
  left_join(game %>% select(gid, seas, wk),
                        by = c("year" = "seas",
                               "wk" = "wk")) %>%
  #Add players with offense stats recorded in those years
  left_join(offense %>% select(gid,player), by = c("gid" = "gid")) %>%
  left_join(player %>% select(player,pname,pos1,height,weight,dob,start),
            by = c("player" = "player")) 
#add injuries
injury.data <-
  data_frame(year = rep(2015:2017, each = 17),
             wk = rep(1L:17L, times = 3)) %>%
  #Add game ids
  left_join(game %>% select(gid, seas, wk),
            by = c("year" = "seas",
                   "wk" = "wk")) %>%
  left_join(injury %>% select(gid, player, gstat),
            by = c("gid" = "gid")) %>%
  left_join(player %>% select(player,pname,pos1,height,weight,dob,start),
            by = c("player" = "player")) %>%
  select(year,wk,gid,player,pos1,pname,height,weight,dob,start,gstat)
#Full data
full.data <-
  player.data %>% bind_rows(injury.data) %>%
  mutate(injured = ifelse(gstat %in% c("Out","IR","Pup","PUP", "Out\r"),1L,0L)) %>%
  select(year,player,pname,pos1,height,weight,dob,start,injured) %>%
  filter(pos1 %in% c("QB","RB","TE","WR")) %>%
  group_by(year,player,pname,pos1,height,weight,dob,start) %>%
  summarize(games.inj = sum(injured)) %>% unique()

#Age variable
full.data$age <- full.data$year - lubridate::year(as.Date(full.data$dob, "%m/%d/%Y"))

The table I’ll use for analysis is printed here.

full.data

## # A tibble: 1,908 x 10
## # Groups:   year, player, pname, pos1, height, weight, dob [1,908]
##     year player  pname    pos1  height weight dob    start games.inj   age
##    <int> <chr>   <chr>    <chr>  <int>  <int> <chr>  <int>     <int> <dbl>
##  1  2015 AA-0025 A.Abdul~ RB        69    205 6/13/~  2015         0  22.0
##  2  2015 AA-1350 A.Andre~ RB        70    225 10/15~  2014         1  23.0
##  3  2015 AB-1975 A.Blue   RB        74    223 4/27/~  2014         0  24.0
##  4  2015 AB-2000 A.Boldin WR        73    220 10/3/~  2003         0  35.0
##  5  2015 AB-2600 A.Brads~ RB        69    214 3/19/~  2007         0  29.0
##  6  2015 AB-3500 A.Brown  WR        70    186 7/10/~  2010         0  27.0
##  7  2015 AC-0100 A.Caldw~ WR        72    200 4/15/~  2008         0  30.0
##  8  2015 AC-1650 A.Cleve~ TE        77    258 3/21/~  2014         0  23.0
##  9  2015 AC-2350 A.Cooper WR        73    211 6/17/~  2015         0  21.0
## 10  2015 AD-0100 A.Dalton QB        74    215 10/29~  2011         3  28.0
## # ... with 1,898 more rows

Before modeling the data, it’s always good to do some quality checks. Exploratory analysis can be very helpful in this regard since it can help identify irregularities and other issues with the data. Prior knowledge can be helpful here as well. For instance I know how many games some of my favorite players have missed in the last three seasons and I can use that as a standard to check the data against.

Le’veon Bell is one of my favorite players (I’m of course a Steelers fan) and I know that he hurt his knee is 2015 and missed most of the season. Let’s check the data set I created to make sure it contains this information. Bell’s player unique player id in this data set is “LB-0250”. We can select only the rows with his id by using the filter function from dplyr.

filter(full.data, player == "LB-0250")

## # A tibble: 3 x 10
## # Groups:   year, player, pname, pos1, height, weight, dob [3]
##    year player  pname  pos1  height weight dob       start games.inj   age
##   <int> <chr>   <chr>  <chr>  <int>  <int> <chr>     <int>     <int> <dbl>
## 1  2015 LB-0250 L.Bell RB        73    230 2/18/1992  2013         0  23.0
## 2  2016 LB-0250 L.Bell RB        73    230 2/18/1992  2013         0  24.0
## 3  2017 LB-0250 L.Bell RB        73    230 2/18/1992  2013         0  25.0

Well that’s not good. The data set says Bell didn’t miss any games in 2015, even though we know he did. Before rechecking the code I used to join the tables, let’s look at the entries for Bell in the full injury data set. If it’s missing there then the data I have is simply incomplete.

filter(injury, player == "LB-0250")

## # A tibble: 3 x 6
##     gid player  team  details    pstat gstat  
##   <int> <chr>   <chr> <chr>      <chr> <chr>  
## 1  4271 LB-0250 PIT   Suspension <NA>  Suspend
## 2  4275 LB-0250 PIT   Suspension <NA>  Suspend
## 3  4301 LB-0250 PIT   Suspension <NA>  Suspend

It looks like the raw data simply doesn’t have the information recorded! There must be something going on then with how the injury data was recorded. There are other omissions in the raw injury data set that I discovered as well. It would be too time consuming to fix all of them, so I’m simply going to proceed with the analysis, however this means anything that follows is obviously suspect.

Exploratory Analysis

Simple summaries are an effective way of exploring your data and can help diagnose potential issues. A simple summary table of the data is printed below.

summary(full.data)

##       year          player        pname           pos1         height     
##  Min.   :2015   AA-0025:   3   Length:1908        QB:247   Min.   :66.00  
##  1st Qu.:2015   AB-1975:   3   Class :character   RB:538   1st Qu.:71.00  
##  Median :2016   AB-3500:   3   Mode  :character   TE:400   Median :73.00  
##  Mean   :2016   AC-1650:   3                      WR:723   Mean   :73.22  
##  3rd Qu.:2017   AC-2350:   3                               3rd Qu.:76.00  
##  Max.   :2017   AD-0100:   3                               Max.   :80.00  
##                 (Other):1890                                              
##      weight        dob                start        games.inj     
##  Min.   :149   Length:1908        Min.   :1994   Min.   : 0.000  
##  1st Qu.:200   Class :character   1st Qu.:2011   1st Qu.: 0.000  
##  Median :217   Mode  :character   Median :2013   Median : 0.000  
##  Mean   :219                      Mean   :2013   Mean   : 1.327  
##  3rd Qu.:236                      3rd Qu.:2015   3rd Qu.: 1.000  
##  Max.   :283                      Max.   :2017   Max.   :16.000  
##                                                                  
##       age       
##  Min.   :17.00  
##  1st Qu.:24.00  
##  Median :25.00  
##  Mean   :26.19  
##  3rd Qu.:28.00  
##  Max.   :58.00  
##  NA's   :3

From examining the table, a few things immediately pop out. The max age is 58 years old! And someone has 1994 listed as the year they entered the league. I’ll spare you the boring details, but essentially several new players share names with previous players and some how their information has gotten mixed up. For instance this profile on NFL.com for WR George Farmer (the culprit in our data set) has his age listed as 59 years old. The player in our data set is also listed as a WR. I updated the data set to reflect what I think are the correct values, 24 year old running back. The ages are missing for 3 players as well. They all recently entered the league, so I simply assume they are 23 years old.

There is also a Charles Johnson, WR, who was born 1972. I’m pretty sure this is actually a player who was born in 1989 and now plays for the jets and not the Charles Johnson that won a superbowl with the Patriots in the early 2000s. Another WR name Trevor Davis also had an incorrect age. I manually corrected these records.

Simple plots should help to identify if there are any other data issues. I produce several plots below.

####Exploratory Analysis####
full.data %>% group_by(games.inj) %>%
  summarize(FREQ = n()) %>%
  ggplot(aes(x = games.inj, y = FREQ)) + geom_point()

full.data %>% group_by(games.inj, pos1) %>%
  summarize(FREQ = n()) %>%
  ggplot(aes(x = games.inj, y = FREQ, col = pos1)) + geom_point()

full.data %>% group_by(pos1) %>% mutate(total_pos = n()) %>%
  ungroup() %>% group_by(games.inj, pos1) %>%
  mutate(FREQ = n(),
         proportion = FREQ/total_pos) %>%
  ggplot(aes(x = games.inj, y = proportion, col = pos1)) + geom_line()

full.data %>% group_by(year,games.inj,pos1) %>%
  ggplot(aes(pos1, games.inj)) + geom_boxplot() + facet_wrap( ~ year)

#Injuries by year
#Vast majority of injuries occur in 2017
full.data %>% group_by(year) %>% summarize(total = sum(games.inj)) %>%
  ggplot(aes(y = total, x = year)) + geom_point()

The last graph really sticks out to me. I see no reason for there to be a substantial differences in the number of injuries from 2015 to 2017. My guess is that the veracity of the data declines from 2017 backward. This coheres with the missing injury checks above. The years I found missing injuries were from either 2015 or 2016. I think this implies that I should only model the 2017 data. I’m going to ignore this sound advice simply because I want to get some practice fitting complex models with brms, but all the results should be taken with a grain of salt (or maybe an entire shaker).

Statistical Modeling

Zero Inflated Poisson

The first model I’m going to fit is a zero inflated poisson. This model is a mixture of two models. A binomial model that estimates the probability of missing zero games and a separate poisson model to estimate the number of games missed. Zeros can then be produced by both processes. The binomial model only uses year and position to estimate the probability of zero. The poisson model uses zero, position, player height, weight, and age. Additionally, the poisson model has a random intercept for each player to account for the correlation across years. Simple mathematical representations of the models are presented below. I stick an exponential prior with rate parameter = 2 to impose some regularization on the varying player intercepts. This factor has over 900 levels and at most 3 observations per level. \[Y \sim ZIPoisson(p_i, \lambda_i) \] \[logit(p_i) \sim \alpha_p + \beta_p*X \] \[log(\lambda_i) \sim \alpha_\lambda + \alpha_{player_p} + \beta_\lambda*X \] \[\alpha_{\lambda} \sim Normal(0,10)\] \[\alpha_z \sim Normal(0,1)\] \[\beta_\lambda, \beta_z \sim Normal(0,1) \] \[\alpha_{player_\lambda} \sim Normal(0,sd_\lambda)\] \[sd_\lambda \sim exponential(2)\]

fit0 <- brm(bf(games.inj ~ year + pos1 + height + weight + age + (1|player),
               zi ~ 1),
            data = full.data, family = zero_inflated_poisson(link = "log"),
            prior = c(set_prior("normal(0, 5)", class = "Intercept"),
                      set_prior("normal(0,.5)", class = "Intercept", dpar = "zi"),
                      set_prior("normal(0, 1)", class = "b"),
                      set_prior("exponential(2)", class = "sd")),
            cores = 4)
summary(fit0)

##  Family: zero_inflated_poisson 
##   Links: mu = log; zi = logit 
## Formula: games.inj ~ year + pos1 + height + weight + age + (1 | player) 
##          zi ~ 1
##    Data: full.data (Number of observations: 1908) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1; 
##          total post-warmup samples = 4000
##     ICs: LOO = NA; WAIC = NA; R2 = NA
##  
## Group-Level Effects: 
## ~player (Number of levels: 959) 
##               Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sd(Intercept)     1.18      0.07     1.04     1.32        648 1.00
## 
## Population-Level Effects: 
##              Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept    -1440.29    105.04 -1650.41 -1232.46       1905 1.00
## zi_Intercept     0.05      0.09    -0.12     0.23       1186 1.00
## year             0.72      0.05     0.61     0.82       1898 1.00
## pos1RB           0.15      0.25    -0.34     0.64        696 1.00
## pos1TE           0.36      0.24    -0.13     0.82        740 1.00
## pos1WR           0.05      0.22    -0.37     0.48        753 1.00
## height          -0.00      0.04    -0.09     0.08        526 1.01
## weight          -0.01      0.01    -0.02     0.00        523 1.01
## age              0.01      0.02    -0.03     0.04        673 1.00
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

Posterior Predictive Checks

pp_check(fit0, nsamples = 20) + xlim(c(0,16))

## Warning: Removed 373 rows containing non-finite values (stat_density).

pp_check(fit0, nsamples = 20, type = "ecdf_overlay")

The zero inflated poisson model seems to fit the data fairly well except for those players who missed 16 games. It’s conceivable that there is a third process at work which determines whether a player will miss 16 games, namely that they suffer a season ending injury in the off-season or pre-season. Stan would allow us to create a mixture distribution like the zero inflated poisson.

For simplicity (i.e. I’m lazy) I’m instead going to divide the number of games missed by 16, which transforms the outcome into the proportion of games missed out of 16. This allows us to fit a zero-one inflated beta regression. This model is a mixture distribution like the zero inflated poisson, except there are now two binomial processes that account for the values 0 and 1. The beta distribution is then used as the model for all values in between. One big issue with this model is that it’s continuously valued between the (0,1) interval, where as our data were originally integers.

Zero-One Inflated Beta

#zero-one inflated beta
full.data$prop.inj <- full.data$games.inj/16

fit1 <- brm(bf(prop.inj ~ year + pos1 + height + weight + age + (1|player),
               coi ~ 1,
               zoi ~ 1),
            data = full.data, family = zero_one_inflated_beta(),
            prior = c(set_prior("normal(0, 5)", class = "Intercept"),
                      set_prior("normal(0,.5)", class = "Intercept", dpar = "zoi"),
                      set_prior("normal(0,.5)", class = "Intercept", dpar = "coi"),
                      set_prior("normal(0,1)", class = "b"),
                      set_prior("exponential(2)", class = "sd")),
            cores = 4)
summary(fit1)

##  Family: zero_one_inflated_beta 
##   Links: mu = logit; phi = identity; zoi = logit; coi = logit 
## Formula: prop.inj ~ year + pos1 + height + weight + age + (1 | player) 
##          coi ~ 1
##          zoi ~ 1
##    Data: full.data (Number of observations: 1908) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1; 
##          total post-warmup samples = 4000
##     ICs: LOO = NA; WAIC = NA; R2 = NA
##  
## Group-Level Effects: 
## ~player (Number of levels: 959) 
##               Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sd(Intercept)     0.06      0.05     0.00     0.18       1983 1.00
## 
## Population-Level Effects: 
##               Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept      -495.65     99.30  -696.45  -296.50       4000 1.00
## zoi_Intercept     0.90      0.05     0.80     1.00       4000 1.00
## coi_Intercept    -2.96      0.12    -3.20    -2.73       4000 1.00
## year              0.24      0.05     0.15     0.34       4000 1.00
## pos1RB            0.16      0.17    -0.16     0.49       4000 1.00
## pos1TE            0.07      0.16    -0.25     0.37       4000 1.00
## pos1WR            0.06      0.14    -0.20     0.34       4000 1.00
## height            0.00      0.03    -0.05     0.05       4000 1.00
## weight           -0.00      0.00    -0.01     0.00       4000 1.00
## age               0.02      0.01     0.00     0.05       4000 1.00
## 
## Family Specific Parameters: 
##     Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## phi     5.01      0.30     4.45     5.63       4000 1.00
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

I’m more familiar with the zero inflated poisson so I’m going to stick with the zero inflated poisson, even though it doesn’t account for the increased prevalence of missing 16 games.

Simple Interpretation

It’s really easy to get marginal effect plots in brms. The estimates show an incredibly strong yearly trend due to under reporting for 2015, 2016 in the data.

marginal_effects(fit0, ask = FALSE)

Simple Simulation

One interesting application of the model would be to simulate the expected number of games missed for a particular player. I try to do that for Antonio Brown below. I wanted to see how ridiculous the yearly extrapolation was so I predicted the expected number of missed games for Brown in 2016, 2017, and finally 2018. It’s clear that the model is primarily driven by the unrealistic yearly trend in the data.

sim.data <- data.frame(year = c(rep(2016, 1000),rep(2017,1000),rep(2018, 1000)),
                       pos1 = rep("WR", 3000),
                       height = rep(70, 3000),
                       weight = rep(186, 3000),
                       age = c(rep(28, 1000),rep(29,1000),rep(30,1000)),
                       player = rep("AB-3500", 3000))

abrown.pred <- predict(fit0, newdata = sim.data)

ggplot(data.frame(abrown.pred,year = c(rep(2016, 1000),rep(2017,1000),rep(2018, 1000))),
       aes(exp(Estimate))) + geom_density() + facet_wrap( ~ year)