Can Crusher Ratings - vol.1

It seems pretty cool but surely the calculation on Lawler is off? He loses one fight to Hendricks in his entire UFC run, beats him, beats MacDonald, Ellenberger, Brown and Koscheck and he's somehow below Askren, Woodley and Lombard who have nowhere near as good wins? Actually neither does MacDonald.

Certainly has potential though. I like the way you're calculating pound for pound rankings especially.

Yeah, that's an interesting situation. Lawler fell a lot, and his comeback includes wins over Hendricks and MacDonald but they were both split decisions. He's probably risen twice as much as the other guys since 2013, but it's just not quite enough to overtake them. I'm focusing on the fights now so I'll write more about it later, maybe tomorrow. (Stay tuned!)
 
MMath to the extreme

I will be feeding this data into my mma math machine to see if the algorithms match that of the average poster. I suspect this might vibe at a higher frequency. MMA math it may be, but it sure as hell beats most of the ignorant chatter that can fill this place.
 
The best "system" is probably Vegas, which predicts the winners 68% of the time according to BestFightOdds. On the same set of fights, CCR picks the correct winners in 64% of UFC fights and 65% of non-UFC fights (over 2000 fights in each group). And it's well calibrated: fighters that it claims have a 30% chance of winning end up winning about 30% of the time.
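For anyone who wants to check calibration on their own predictions, here's a rough Python sketch. It isn't the code behind CCR; the function name `calibration_table`, the ten equal-width bins and the sample data are all assumptions for illustration. It groups predictions by stated probability and compares the average stated probability with how often those fighters actually won.

```python
from collections import defaultdict

def calibration_table(predictions, n_bins=10):
    """predictions: iterable of (predicted_prob, won), with won equal to 0 or 1.

    Returns one (mean_predicted, observed_win_rate, count) tuple per
    non-empty bin, ordered from the lowest probabilities to the highest.
    """
    bins = defaultdict(lambda: [0, 0.0, 0])  # count, sum of probs, wins
    for prob, won in predictions:
        idx = min(int(prob * n_bins), n_bins - 1)  # prob == 1.0 goes in the top bin
        b = bins[idx]
        b[0] += 1
        b[1] += prob
        b[2] += won
    return [(b[1] / b[0], b[2] / b[0], b[0]) for _, b in sorted(bins.items())]

# Hypothetical data: ten fighters each given a 30% chance, three of whom won.
sample = [(0.3, 1)] * 3 + [(0.3, 0)] * 7
for mean_pred, observed, count in calibration_table(sample):
    print(f"predicted {mean_pred:.2f}  observed {observed:.2f}  n={count}")
```

A well-calibrated model has the predicted and observed columns close to each other in every bin that holds a reasonable number of fights.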
Impressive. How does CCR compare to the (overround adjusted) Vegas odds in terms of a proper scoring rule?

What language did you code your model in? I like the look of those graphs.
 
7obj6jg.png

Here's a recap of Saturday's event. Winners are on the left. Cormier dropped to #9 on the pound-for-pound list. Gaudinot moved up slightly (0.3 points) even though he lost, and actually Horiguchi moved up also (0.1 points, to a rating of exactly 75 because he's the #100 active fighter). That's the match-making assumption having a (very small) effect.
 
Here's a graph of various pound-for-pound ranks, with points calculated monthly. By coincidence the ratings roughly translate to letter grades. You can think of #100 = 75 points being a "C" average (although being #100 in the world is pretty good!). Then #10 is around 90 points or an "A", #1000 is around 60 points or a "D", and so on. It's a bit tricky looking earlier than 1995 because there aren't really 100 active fighters. There have been just six fighters with ratings above 100: Bas Rutten, Fedor Emelianenko, Georges St. Pierre, Anderson Silva, Jon Jones and Chris Weidman.


Here's a graph of nine of the ten fighters that have held the #1 Pound-for-Pound ranking since 1995. I'm omitting Igor Vovchanchyn who became #1 by default for two months at the start of 2001 after Bas dropped from inactivity and Hendo lost to Wanderlei and before Big Nog won the Rings King of Kings Tournament. This is the big dip at that point in the pound-for-pound graph.


You can click the graphs to get a larger view.

Some sick stats right there
 
Impressive. How does CCR compare to the (overround adjusted) Vegas odds in terms of a proper scoring rule?

Here are the numbers. I ignored no contests, disqualifications and draws, but counted split decisions as full wins. The columns are corr = avg(prob>0.5), rmse = sqrt(avg((1-prob)^2)), logl = avg(-ln(prob)), where prob = the winner's probability of winning. Better predictions mean higher corr and lower rmse & logl. A coin flip has corr = 0.5, rmse = 0.5, logl = 0.693. CCR is whr, Vegas is odds, and for comparison I included vet = pick the guy who's had more fights and rec = pick the guy with the better record. For Vegas I used prob = p/(p+q) where p and q were the average implied probabilities for the winner and loser from BestFightOdds. I restricted to fights where both fighters had at least three prior fights.

Code:
+---------+------+----------+----------+----------+-----------+-----------+-----------+----------+----------+
| ufc     | num  | corr_whr | rmse_whr | logl_whr | corr_odds | rmse_odds | logl_odds | corr_vet | corr_rec |
+---------+------+----------+----------+----------+-----------+-----------+-----------+----------+----------+
| non-ufc | 2763 |   0.6518 |   0.4624 |   0.6158 |    0.6808 |    0.4484 |    0.5867 |   0.4888 |   0.5917 |
| ufc     | 2241 |   0.6417 |   0.4719 |   0.6360 |    0.6796 |    0.4547 |    0.6010 |   0.4913 |   0.5433 |
+---------+------+----------+----------+----------+-----------+-----------+-----------+----------+----------+
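For anyone who wants to reproduce the columns, here's a minimal Python sketch of the definitions above. It isn't the code that produced the table; the function names (`devig`, `score`) and the sample probabilities are made up for illustration.

```python
import math

def devig(p_winner, q_loser):
    """Overround-adjusted probability for the winner: p / (p + q)."""
    return p_winner / (p_winner + q_loser)

def score(winner_probs):
    """Return (corr, rmse, logl) given each fight's probability for the actual winner."""
    n = len(winner_probs)
    corr = sum(1 for p in winner_probs if p > 0.5) / n             # avg(prob > 0.5)
    rmse = math.sqrt(sum((1 - p) ** 2 for p in winner_probs) / n)  # sqrt(avg((1-prob)^2))
    logl = sum(-math.log(p) for p in winner_probs) / n             # avg(-ln(prob))
    return corr, rmse, logl

# Hypothetical odds: implied probabilities of 0.60 and 0.45 sum to 1.05,
# so the adjusted probability for the winner is 0.60 / 1.05.
prob = devig(0.60, 0.45)
print(round(prob, 4))  # 0.5714

# Three hypothetical fights, each entry being the winner's predicted probability.
print(score([prob, 0.7, 0.4]))
```

One thing to watch: with a strict `prob > 0.5`, a prediction of exactly 0.5 counts as wrong, so the coin-flip corr of 0.5 comes from picking a side at random, while its rmse of 0.5 and logl of ln(2) = 0.693 follow directly from prob = 0.5.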

Of course, when finding the best choice of the parameter w I looked more at the errors than the percentage correct. I could've chosen a different value where the percentages would've rounded up to 66% and 65%, but at a certain point it seems like you're optimizing on noise. Half a percent is only ten fights out of two thousand. Here are the numbers broken down by year; you can see they vary a lot.

Code:
+------+---------+-----+----------+----------+----------+-----------+-----------+-----------+----------+----------+
| year | ufc     | num | corr_whr | rmse_whr | logl_whr | corr_odds | rmse_odds | logl_odds | corr_vet | corr_rec |
+------+---------+-----+----------+----------+----------+-----------+-----------+-----------+----------+----------+
| 2007 | non-ufc |  89 |   0.6742 |   0.4762 |   0.6471 |    0.6517 |    0.4557 |    0.6018 |   0.4888 |   0.5281 |
| 2008 | non-ufc | 268 |   0.6940 |   0.4414 |   0.5758 |    0.7127 |    0.4295 |    0.5498 |   0.4851 |   0.6455 |
| 2009 | non-ufc | 359 |   0.6490 |   0.4683 |   0.6280 |    0.6741 |    0.4555 |    0.6000 |   0.4889 |   0.5947 |
| 2010 | non-ufc | 381 |   0.5827 |   0.4840 |   0.6605 |    0.6378 |    0.4607 |    0.6123 |   0.4790 |   0.5394 |
| 2011 | non-ufc | 398 |   0.6457 |   0.4608 |   0.6136 |    0.6508 |    0.4618 |    0.6130 |   0.5050 |   0.5892 |
| 2012 | non-ufc | 392 |   0.6607 |   0.4529 |   0.5936 |    0.6735 |    0.4458 |    0.5791 |   0.4974 |   0.5931 |
| 2013 | non-ufc | 487 |   0.6899 |   0.4539 |   0.5977 |    0.7207 |    0.4368 |    0.5659 |   0.4805 |   0.6181 |
| 2014 | non-ufc | 389 |   0.6375 |   0.4679 |   0.6285 |    0.7018 |    0.4441 |    0.5780 |   0.4859 |   0.5861 |
| 2007 | ufc     |  95 |   0.7895 |   0.4302 |   0.5578 |    0.7684 |    0.4187 |    0.5338 |   0.5579 |   0.5263 |
| 2008 | ufc     | 195 |   0.6410 |   0.4764 |   0.6513 |    0.6872 |    0.4545 |    0.5999 |   0.5000 |   0.5026 |
| 2009 | ufc     | 211 |   0.6161 |   0.4822 |   0.6584 |    0.6682 |    0.4615 |    0.6154 |   0.4810 |   0.5498 |
| 2010 | ufc     | 248 |   0.6331 |   0.4744 |   0.6394 |    0.6331 |    0.4709 |    0.6347 |   0.5020 |   0.5625 |
| 2011 | ufc     | 295 |   0.6203 |   0.4852 |   0.6629 |    0.6847 |    0.4502 |    0.5935 |   0.4525 |   0.5203 |
| 2012 | ufc     | 331 |   0.6435 |   0.4758 |   0.6430 |    0.6677 |    0.4592 |    0.6098 |   0.5211 |   0.5227 |
| 2013 | ufc     | 375 |   0.6267 |   0.4705 |   0.6330 |    0.6880 |    0.4489 |    0.5883 |   0.4880 |   0.5693 |
| 2014 | ufc     | 491 |   0.6517 |   0.4623 |   0.6151 |    0.6864 |    0.4541 |    0.5993 |   0.4796 |   0.5580 |
+------+---------+-----+----------+----------+----------+-----------+-----------+-----------+----------+----------+
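To put a rough number on the "optimizing on noise" point, here's a small Python sketch using the usual binomial standard error, sqrt(p(1-p)/n). It assumes fights are independent, which is a simplification, and the helper name `std_error` is just for illustration. The inputs are corr_whr values and fight counts taken from the tables above.

```python
import math

def std_error(p, n):
    """Binomial standard error of a proportion p observed over n fights."""
    return math.sqrt(p * (1 - p) / n)

rows = [
    ("non-ufc, all years", 0.6518, 2763),
    ("ufc, all years", 0.6417, 2241),
    ("ufc, 2007 only", 0.7895, 95),
]
for label, p, n in rows:
    print(f"{label:20s} corr={p:.4f}  n={n:4d}  se={std_error(p, n):.4f}")
```

The standard error comes out at roughly one percentage point for the full samples and about four points for the 95-fight 2007 UFC sample, so half a percent either way is well inside the noise and the yearly swings aren't surprising.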

Here are the numbers for all fights, divided into early and late at 2007-06-16 (the point when BestFightOdds starts keeping track).

Code:
+-------+---------+-------+----------+----------+----------+----------+----------+
| years | ufc     | num   | corr_whr | rmse_whr | logl_whr | corr_vet | corr_rec |
+-------+---------+-------+----------+----------+----------+----------+----------+
| early | non-ufc | 14236 |   0.7056 |   0.4376 |   0.5641 |   0.5390 |   0.6558 |
| early | ufc     |   572 |   0.7010 |   0.4610 |   0.6152 |   0.5122 |   0.5848 |
| late  | non-ufc | 47999 |   0.6962 |   0.4433 |   0.5759 |   0.5101 |   0.6516 |
| late  | ufc     |  2242 |   0.6418 |   0.4718 |   0.6358 |   0.4915 |   0.5430 |
+-------+---------+-------+----------+----------+----------+----------+----------+

In the table below, 1 and 0 in the whr and odds columns mean correct and incorrect predictions. CCR disagreed with Vegas 21-26% (1/5 to 1/4) of the time, and of those fights Vegas got it right 57% (4/7) of the time. When they agreed, they picked the winner a little over 70% of the time.

Code:
+---------+------+-----+------+------+----------+-------+
| ufc     | num  | whr | odds | pct  | disagree | agree |
+---------+------+-----+------+------+----------+-------+
| non-ufc |  637 |   0 |    0 |  23% |          |   29% |
| non-ufc |  325 |   0 |    1 |  12% |      57% |       |
| non-ufc |  245 |   1 |    0 |   9% |      43% |       |
| non-ufc | 1556 |   1 |    1 |  56% |          |   71% |
| total   | 2763 |     |      | 100% |      21% |   79% |
+---------+------+-----+------+------+----------+-------+
| ufc     |  467 |   0 |    0 |  21% |          |   28% |
| ufc     |  336 |   0 |    1 |  15% |      57% |       |
| ufc     |  251 |   1 |    0 |  11% |      43% |       |
| ufc     | 1187 |   1 |    1 |  53% |          |   72% |
| total   | 2241 |     |      | 100% |      26% |   74% |
+---------+------+-----+------+------+----------+-------+
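The disagree and agree percentages can be recomputed from the four counts in each half of the table. Here's a quick Python sketch; the function name `agreement_summary` and the dictionary keys are just labels chosen for illustration.

```python
def agreement_summary(both_wrong, odds_only, whr_only, both_right):
    """Summarize a 2x2 table of correct/incorrect picks for whr (CCR) and odds (Vegas)."""
    total = both_wrong + odds_only + whr_only + both_right
    disagree = odds_only + whr_only   # exactly one of the two was right
    agree = both_wrong + both_right   # both made the same pick
    return {
        "disagree_rate": disagree / total,
        "odds_right_when_disagree": odds_only / disagree,
        "right_when_agree": both_right / agree,
    }

# Counts copied from the table: (0,0), (0,1), (1,0), (1,1) for whr, odds.
non_ufc = agreement_summary(637, 325, 245, 1556)
ufc = agreement_summary(467, 336, 251, 1187)

for name, summary in [("non-ufc", non_ufc), ("ufc", ufc)]:
    print(name, {key: round(value, 2) for key, value in summary.items()})
```

Rounded to two places this gives disagreement rates of 0.21 and 0.26, Vegas right in 0.57 of the disagreements for both groups, and 0.71 and 0.72 correct when the two agree, matching the table.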

What language did you code your model in? I like the look of those graphs.

A lot of different languages, but the graphs are gnuplot (with Brewer palettes).
 
It seems pretty cool but surely the calculation on Lawler is off? He loses one fight to Hendricks in his entire UFC run, beats him, beats MacDonald, Ellenberger, Brown and Koscheck and he's somehow below Askren, Woodley and Lombard who have nowhere near as good wins? Actually neither does MacDonald.

Here's a graph of Lawler and some of his opponents. He had a rough run in Strikeforce before being reborn in the UFC.

vXU36jl.png

Currently Brown is #15, Koscheck is #22 and Ellenberger is #26. MacDonald beat #6 Woodley and #9 Maia. Woodley beat #12 Condit and #14 Kim (and Koscheck). Lombard beat #8 Shields. And Askren beat a lot of guys...

The split decisions hurt him too. Below are the current Welterweight ratings under three different scenarios. The first is reality, the second is after changing the split decision over Hendricks to unanimous, and the third is after changing both his Hendricks and MacDonald split decisions to unanimous.

plsjjNn.png
 
very interesting read TS, cheers.

I've always been a big fan of the Elo model, but all methods of measurement interest me.
 
Here are a couple of graphs for fighters in tonight's event. On the site I list their fights too, so you can see their opponents' ratings at the time of their fights. I think it adds some perspective. First McGregor:


And now Henderson and Cerrone:


This coming week I'll take a look at the light heavyweight division with the big Gus/Rumble and Davis/Bader fights.

Here's the preview for tonight. The biggest difference from the bookmakers is in the main event: CCR's not sold on McGregor yet. (Of course it can't hear all the hype.)

 