9 min read

Fantasy Football Player Rankings

Accurate prediction of player performance is of immense value to those of use who play fantasy football. With this in mind, I was curious about how well simple prediction models could perform in this context. Conveniently, Sean J. Taylor provided some nice code to make 2017 fantasy football projections which we can use to make 2018 predictions. Looking at his predictions for 2017, I’m not overly optimistic about how the models will perform. At any rate, it ought to be fun to see what theys produce.

My code is mostly identical to Sean’s except I’ve updated some of the year values so that it predicts for the 2018 season and makes use of armchair analysis data from 2017. Since I only changed a few values in the original code to create the analysis datasets, I’ve decided not to display it. However, while running the block of Sean’s code to generate the actual predictions, I encountered some errors having to do with his use of the return() function. Because of this, I’m going to display the code I used to generate the 2018 predictions.

Predictions

The code to generate the predictions is a little tricky, since it uses dplyr’s non standard evaluation (it’s generates them within the do() function). You’ll notice within the if else block of code I’ve removed the return function. The code builds separate models for rushing attempts, passing attempts, receiving yards, etc. for a total of 12 models. The first three metrics, rushing attempts, receiving targets, and passing attempts are the “opportunity” metrics and are modeled using a k-nearest neighbors algorithm. The justification for this that Sean provides is that penalized linear models used for other 9 rate metics would “shrink” the predictions too much (push them toward the average). The ultimate projections for player performance are heavily influenced by the opportunity projections, so it’s very important to get them right.

mySummary <- function(data, lev = NULL, model = NULL) {
    out <- c(mean(abs(data$obs - data$pred)), 
                     Evaluation.NDCG(order(data$pred), data$obs))
    names(out) <- c('MAE', 'NDCG')
    out
}
ctrl <- trainControl(method = 'cv',
                                         number = 10,
                                         summaryFunction = mySummary)

config <- data_frame(metric = c('ra', 'trg', 'pa', 'rec_trg', 'ry_ra', 'tdr_ra', 'tdrec_trg', 'recy_trg', 'py_pa', 'tdp_pa', 'ints_pa', 'fuml_ratrg'),
                                         method = c('kknn', 'kknn', 'kknn', rep('glmnet', 9)),
                                         outcome.metric = c('NDCG', 'NDCG', 'NDCG', rep('RMSE', 9)),
                                         maximize = c(TRUE, TRUE, TRUE, rep(FALSE, 9)),
                                         key = 1)


positions <- data_frame(pos1 = c('WR', 'TE', 'RB', 'QB'), key = 1)
cutoffs <- data_frame(cutoff = 2016:2017, key = 1)


preds <- config %>%
    inner_join(positions) %>%
    inner_join(cutoffs) %>%
    group_by(metric, pos1, cutoff) %>% do({
        my.metric <- first(.$metric)
        my.pos <- first(.$pos1)
        my.cutoff <- first(.$cutoff)
        for.reg <- season.metrics %>%
            filter(metric == my.metric) %>%
            dplyr::select(player, seas, value) %>%
            inner_join(features, by = c('player', 'seas')) %>%
            filter(pos1 == my.pos) %>%
            ungroup()
        if (str_detect(my.metric, 'pa') & my.pos != 'QB') {
            for.reg$yhat <- 0
            for.reg %>% select(player, seas, yhat = yhat)
        } else if (str_detect(my.metric, 'trg') & my.pos == 'QB') {
            for.reg$yhat <- 0
            for.reg %>% select(player, seas, yhat = yhat)
        } else if (str_detect(my.metric, 'ra') & my.pos %in% c('TE', 'WR')) {
            for.reg$yhat <- 0
            for.reg %>% select(player, seas, yhat = yhat)
        } else {
        X <- model.matrix( ~ 0 + ., for.reg %>% dplyr::select(-player, -value, -pos1))
        nzv <- nearZeroVar(X)
        X <- X[, -nzv]
        y <- with(for.reg, value)
        trainable <- !is.na(y) & (for.reg$seas <= my.cutoff)
        fit <- train(X[trainable,],y[trainable],
                                 metric = first(.$outcome.metric),
                                 method = first(.$method),
                                 maximize = first(.$maximize),
                                 trControl = ctrl,
                                 preProcess = c("center", "scale"))
        for.reg$yhat <- predict(fit, X)
        for.reg %>% dplyr::select(player, seas, yhat = yhat)
        }
    })

Rankings based on Predictions

rnks <- preds %>%
    spread(metric, yhat) %>%
    mutate(py = pa * py_pa,
                 tdp = pa * tdp_pa,
                 ints = pa * ints_pa,
                 rec = trg * rec_trg,
                 recy = trg * recy_trg,
                 ry = ra * ry_ra,
                 td = trg * tdrec_trg+ ra * tdr_ra,
                 fuml = fuml_ratrg * (ra + pa + trg),
                 fpts = py / 25 
                 + recy / 15 
                 + ry / 10 
                 + rec * 0.5 
                 + (td + tdp) * 6 
                 + ints * -2 
                 + fuml * -2) %>%
    left_join(player %>% select(player, pos1, fname, lname)) %>%
    dplyr::select(seas, player, fname, lname, pos1, fpts) %>%
    group_by(cutoff, seas, pos1) %>%
    arrange(-fpts) %>%
    mutate(rnk = row_number())
## Joining, by = c("pos1", "player")
## Adding missing grouping variables: `cutoff`

Running Back Rankings

rnks %>% filter(pos1 == 'RB', seas == 2018, cutoff == 2017) %>% 
  dplyr::select(player,fname,lname,pos1,fpts,rnk) %>%
  .[1:20,] %>% knitr::kable()
## Warning: package 'bindrcpp' was built under R version 3.4.4
## Adding missing grouping variables: `cutoff`, `seas`
cutoff seas player fname lname pos1 fpts rnk
2017 2018 LB-0250 Le’Veon Bell RB 215.4427 1
2017 2018 LF-0650 Leonard Fournette RB 205.4761 2
2017 2018 DM-4300 DeMarco Murray RB 203.3259 3
2017 2018 LM-1000 LeSean McCoy RB 199.9459 4
2017 2018 FG-0200 Frank Gore RB 180.6542 5
2017 2018 TG-1950 Todd Gurley RB 177.3315 6
2017 2018 MI-0100 Mark Ingram RB 171.8154 7
2017 2018 JH-5575 Jordan Howard RB 158.1608 8
2017 2018 KH-2850 Kareem Hunt RB 157.1811 9
2017 2018 DF-1875 Devonta Freeman RB 150.4969 10
2017 2018 CH-5000 Carlos Hyde RB 150.1606 11
2017 2018 JA-0450 Jay Ajayi RB 141.6486 12
2017 2018 LM-1150 Lamar Miller RB 141.0932 13
2017 2018 MG-1150 Melvin Gordon RB 139.9217 14
2017 2018 IC-0300 Isaiah Crowell RB 138.7915 15
2017 2018 Ronald Jones II NA NA RB 133.4248 16
2017 2018 CM-1225 Christian McCaffrey RB 133.0426 17
2017 2018 LM-1850 Latavius Murray RB 132.4538 18
2017 2018 MF-1300 Matt Forte RB 131.4645 19
2017 2018 AK-0050 Alvin Kamara RB 130.3700 20

Wide Receiver Rankings

rnks %>% filter(pos1 == 'WR', seas == 2018, cutoff == 2017) %>% 
  dplyr::select(player,fname,lname,pos1,fpts,rnk) %>%
  .[1:20,] %>% knitr::kable()
## Adding missing grouping variables: `cutoff`, `seas`
cutoff seas player fname lname pos1 fpts rnk
2017 2018 JJ-4700 Julio Jones WR 178.7398 1
2017 2018 AB-3500 Antonio Brown WR 176.9903 2
2017 2018 LF-0200 Larry Fitzgerald WR 164.6259 3
2017 2018 ME-0600 Mike Evans WR 149.0616 4
2017 2018 TH-1850 Ty Hilton WR 145.0079 5
2017 2018 JL-0215 Jarvis Landry WR 141.7893 6
2017 2018 RM-1500 Rishard Matthews WR 138.5234 7
2017 2018 DB-5300 Dez Bryant WR 132.4038 8
2017 2018 AG-1500 A.J. Green WR 131.2524 9
2017 2018 GT-0100 Golden Tate WR 130.7704 10
2017 2018 BC-2325 Brandin Cooks WR 128.0500 11
2017 2018 DT-0900 Demaryius Thomas WR 122.2746 12
2017 2018 DH-3950 DeAndre Hopkins WR 122.2129 13
2017 2018 AT-0350 Adam Thielen WR 119.4316 14
2017 2018 PG-0100 Pierre Garcon WR 119.1293 15
2017 2018 MC-2900 Michael Crabtree WR 118.0380 16
2017 2018 DB-0500 Doug Baldwin WR 117.1213 17
2017 2018 JS-4750 Juju Smith-Schuster WR 115.4172 18
2017 2018 CK-1300 Cooper Kupp WR 115.0892 19
2017 2018 MB-4550 Martavis Bryant WR 112.0642 20

Quarterback Rankings

rnks %>% filter(pos1 == 'QB', seas == 2018, cutoff == 2017) %>%
  dplyr::select(player,fname,lname,pos1,fpts,rnk) %>%
  .[1:20,] %>% knitr::kable()
## Adding missing grouping variables: `cutoff`, `seas`
cutoff seas player fname lname pos1 fpts rnk
2017 2018 CN-0500 Cam Newton QB 328.2320 1
2017 2018 DB-3800 Drew Brees QB 326.2442 2
2017 2018 TB-2300 Tom Brady QB 324.2403 3
2017 2018 MS-4100 Matthew Stafford QB 323.5988 4
2017 2018 MR-2500 Matt Ryan QB 323.4057 5
2017 2018 BB-2425 Blake Bortles QB 319.7643 6
2017 2018 PR-0300 Philip Rivers QB 318.7249 7
2017 2018 RW-3850 Russell Wilson QB 311.1636 8
2017 2018 KC-2350 Kirk Cousins QB 308.0833 9
2017 2018 JG-1850 Jared Goff QB 293.0441 10
2017 2018 DC-0725 Derek Carr QB 289.4341 11
2017 2018 DP-2037 Dak Prescott QB 286.5810 12
2017 2018 EM-0200 Eli Manning QB 283.5398 13
2017 2018 BR-1100 Ben Roethlisberger QB 281.3730 14
2017 2018 AD-0100 Andy Dalton QB 279.9740 15
2017 2018 JF-1900 Joe Flacco QB 274.1218 16
2017 2018 JC-6200 Jay Cutler QB 272.2124 17
2017 2018 TT-0500 Tyrod Taylor QB 267.1779 18
2017 2018 AR-1300 Aaron Rodgers QB 264.5544 19
2017 2018 CP-0500 Carson Palmer QB 253.5234 20

Tight End Rankings

rnks %>% filter(pos1 == 'TE', seas == 2018, cutoff == 2017) %>% 
  dplyr::select(player,fname,lname,pos1,fpts,rnk) %>%
  .[1:20,] %>% knitr::kable()
## Adding missing grouping variables: `cutoff`, `seas`
cutoff seas player fname lname pos1 fpts rnk
2017 2018 TK-0150 Travis Kelce TE 148.11218 1
2017 2018 JG-2900 Jimmy Graham TE 137.32173 2
2017 2018 ZE-0100 Zach Ertz TE 135.75456 3
2017 2018 EE-0400 Evan Engram TE 134.18324 4
2017 2018 JW-6000 Jason Witten TE 110.39928 5
2017 2018 KR-1200 Kyle Rudolph TE 106.56947 6
2017 2018 RG-2200 Rob Gronkowski TE 101.79246 7
2017 2018 AG-0500 Antonio Gates TE 101.16792 8
2017 2018 EE-0050 Eric Ebron TE 90.60055 9
2017 2018 CC-2100 Charles Clay TE 87.62130 10
2017 2018 HH-0225 Hunter Henry TE 85.74416 11
2017 2018 JC-4300 Jared Cook TE 83.79069 12
2017 2018 VD-0100 Vernon Davis TE 83.44308 13
2017 2018 JD-2550 Jack Doyle TE 83.36575 14
2017 2018 NO-0150 Nick O’Leary TE 77.43070 15
2017 2018 JT-2000 Julius Thomas TE 73.97472 16
2017 2018 TE-0300 Tyler Eifert TE 72.50556 17
2017 2018 JR-1150 Jordan Reed TE 72.44428 18
2017 2018 LK-0200 Lance Kendricks TE 72.07954 19
2017 2018 OH-0250 O.J. Howard TE 67.21814 20

Model Performance Plots

truth <- season.metrics %>%
    filter(seas == 2017) %>%
    dplyr::select(-career.year, -lst, -avg) %>%
    spread(metric, value) %>%
    mutate(fpts = py / 25 
                 + recy / 15 
                 + ry / 10 
                 + rec * 0.5 
                 + (tdr + tdrec + tdp) * 6 
                 + ints * -2 
                 + fuml * -2) %>%
    arrange(-fpts) %>%
    mutate(rnk = row_number())

rnks %>%
    filter(cutoff == 2016, seas == 2017) %>%
    inner_join(truth %>% dplyr::select(player, seas, fpts.true = fpts)) %>%
    ggplot(aes(x = fpts, y = fpts.true)) + 
    geom_point() +
    facet_wrap('pos1', scales = 'free') +
    theme_bw() +
    geom_abline() +
    xlab('Projected Points') +
    ylab('Actual Points')
## Joining, by = c("seas", "player")

Comments

These predictions do not seem very good at all to me. Looking at the last plot you can see the immense variability about the 45 degree line, some of the estimates are off by very large amounts. This is maybe not surprising when you consider the very limited basis on which the “opportunity” (rushing attempts, etc) projections are made. There is a lot of information that is not captured by previous data. Allowing for smooth functions of the predictors in the rate models would probably improve the performance as well.

Ultimately, in my opinion a better approach to fantasy football player projection is something like what is done at Fantasy Football Analytics. The approach they take is to combine the projections of lots of analysts via some weighting scheme. This would allow the incorporation of a lot more information than the models above, as well as averaging over particular analyst biases.