Bryan Wilson Empirical Ratings
About my ratings
The Brief Summary
This is an "advanced" ranking system in the sense that it uses strength of schedule, which is
compounded iteratively until the ratings stabilize. It is meant to be purely predictive, which is to say it
focuses on making future predictions rather than explaining the past. It uses final scores, dates,
and locations of games, and currently nothing more. It also uses data from previous seasons until there
is enough data in the current season.
Why I started making sports rankings
My interest in sports rankings started with college football in 2005. I was already
a programmer in Java (just as a hobby, not by profession), and had made a number of other
programs, including a chess AI. Most of them I made for the challenge of it, just to see
if I could do it. The sports ranking systems were no different: I wanted to see how good
a system I could make.
My first system was not very good, but it was a decent starting point and provided the basis
for the systems to come. I entered all games by hand and assigned somewhat subjective values to
the winners and losers of games based on my own sense of how impressive a
performance was. I used it to enter (and sometimes win) small bowl pick'em pools.
In the summer of 2014, I got the idea to do the same for basketball. I had been filling out brackets
myself since 2005 and made a simple bracket-making program in 2008. Until recently,
the win probabilities were either estimated or simply pulled from historical seed-upset probabilities.
The idea of using actual, objective power ratings from which probabilities could be calculated more
accurately was appealing. That idea has now been realized here.
My system's philosophy
There is a seemingly endless number of ways to evaluate a team, the season it has had,
and the way the team is expected to play in the future. On the one hand, you have over a hundred
computer systems represented in Massey's Computer Composite, linked above.
It is interesting to click through all the pages and see how so many different people have come
to different (and reasonable-sounding) conclusions about how to rank the teams. There are
linear regression models, win-probability-ratio models, Elo-type sequential systems, systems
which try to break the game down into play-by-play efficiency and simulate games using the data,
and countless more.
On the other hand, there are human polls, tendencies, and pieces of conventional wisdom. Lines
like the following are thrown around constantly on all manner of sports networks:
- "Team X has a problem with finishing games."
- "It was ugly, but all that matters is they got the W today."
- "Team X will stay #1 until they are beaten."
- "Defense wins championships."
- "Team X's tough nonconference schedule will have them ready for conference play."
- "Team X is in for a shock the first time they play an opponent with a pulse."
- "I like Team X in this matchup, coming off a bye week."
- "This is a trap game for Team X, going into their big showdown next week."
- "Team X might struggle today after the big physical and emotional game last week."
There is probably some measure of truth in all of the above statements. The question is, how much
truth, and how can it be quantified? One truth of which we can all be certain is
that no system can possibly be 100% accurate. Sports are too random, and their outcomes depend
on too many variables which simply cannot be predicted. This leads to results which
don't make sense and seemingly cannot be explained. The goal of my system is to take
into account as many factors as I can think of that might have an impact on a game, and
to combine and weight them so that the result does the best possible job of predicting
games in past seasons. If enough past seasons are included in the analysis, the
randomness starts to fall out and real patterns begin to emerge. Mine is thus an empirical
approach: out of all these rating methods and wisdoms, which ones hold
true over history, and which ones are myths?
How my system works
The rating system incorporates three main components at the moment: a straight-up transitive scoring-margin
component, similar to linear regression models; a more achievement-based component, which values
wins more highly, especially those against stronger opponents, and gives a diminishing return on
blowout wins; and a transitive "tempo-free" component, which cares only about the ratio of
points scored between the two teams.
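As one illustration of what these three per-game values could look like, here is a minimal Python sketch. The function names, the flat win bonus, the 20-point cap, and the logarithmic damping are my own illustrative assumptions, not the actual formulas used by the system.

```python
import math

def margin_component(pts_for, pts_against):
    # Straight-up scoring margin, as in linear-regression-style systems.
    return pts_for - pts_against

def achievement_component(pts_for, pts_against, cap=20.0):
    # Values the win itself, with a diminishing return on blowouts:
    # margins beyond `cap` are compressed logarithmically (illustrative choice).
    margin = pts_for - pts_against
    bonus = 3.0 if margin > 0 else -3.0  # flat credit (or penalty) for the W/L
    if abs(margin) <= cap:
        return bonus + margin
    damped = cap + math.log1p(abs(margin) - cap)
    return bonus + math.copysign(damped, margin)

def tempo_free_component(pts_for, pts_against, scale=100.0):
    # Cares only about the ratio of points scored, not the raw margin.
    return scale * math.log(pts_for / pts_against)
```

Under these assumptions, a 90-60 blowout scores lower on the achievement component than on raw margin, while a 75-70 squeaker still gets full credit for the win.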
All three components are run through an iterative process which adjusts them according to strength
of schedule until the ratings stabilize. A linear combination of the three components is then
taken as the final rating. The component weights are chosen so that the point spreads predicted
by that combination come as close as possible to the actual historical results. One might
say that instead of being purely based on mathematics, it is more of a brute-force historical
comparison.
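The iterate-until-stable idea can be sketched as follows: each team's rating is repeatedly recomputed as its average per-game value plus the average rating of its opponents, until nothing changes. This is a generic simple-rating-system loop written under my own assumptions, not the system's actual code.

```python
def iterate_ratings(games, tol=1e-9, max_iter=10000):
    """games: list of (team_a, team_b, value), where `value` is a per-game
    component value from team_a's perspective (e.g. the scoring margin).
    Returns strength-of-schedule-adjusted ratings (illustrative sketch only)."""
    teams = {t for a, b, _ in games for t in (a, b)}
    ratings = {t: 0.0 for t in teams}
    for _ in range(max_iter):
        new = {}
        for t in teams:
            # performance in each game = my value plus my opponent's strength
            perf = [v + ratings[b] for a, b, v in games if a == t]
            perf += [-v + ratings[a] for a, b, v in games if b == t]
            new[t] = sum(perf) / len(perf)
        mean = sum(new.values()) / len(new)  # re-center around zero
        new = {t: r - mean for t, r in new.items()}
        done = max(abs(new[t] - ratings[t]) for t in teams) < tol
        ratings = new
        if done:
            break
    return ratings
```

With games "A beats B by 10", "B beats C by 10", and "A beats C by 15", this converges to roughly +8.3, 0.0, and -8.3 for A, B, and C, so each team's rating already bakes in who it played.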
My system currently uses nothing in its calculations besides final scores, venues, and
the dates of games. The date is important because more recent results are valued more highly than
distant ones. Early in the season, the system also uses some data from previous seasons. The
basketball system uses only D1 results, and the football system uses only FBS
results (other games are still listed in the win/loss columns, but have no effect on the ratings).
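One common way to value recent results more highly is an exponential decay on the age of the game; the half-life below is purely illustrative and not necessarily how this system weights dates.

```python
def recency_weight(days_ago, half_life=60.0):
    # A game `half_life` days old counts half as much as one played today
    # (the 60-day half-life is purely illustrative).
    return 0.5 ** (days_ago / half_life)

print(recency_weight(0), recency_weight(60), recency_weight(120))  # 1.0 0.5 0.25
```

Each per-game value would then be multiplied by its weight before entering the averages, so the most recent games dominate.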
I'd like to add that I think the idea of providing a single list of rating numbers for each team,
ranking them that way, and using the numbers to make predictions is inherently flawed. While
I do provide such numbers on my site and allow their use in making point spread estimates, I need
only refer you to the list of "human network" questions above to see why. Factors such as emotional
wins, trap games, and plain rock-paper-scissors situations between team play styles make such
a transitive, single list impossible. Beyond all that, it is simply not true that if Team A beats
B by 10 points and B beats C by 10 points, then A should beat C by 20 points; in practice the result
is typically inflated into something more like a 40-point blowout. There is also a limit on the upper end:
transitive arguments like this might suggest the #1 team would beat the worst team by 150 points, which
obviously just does not happen, because of second-stringers subbing in, etc. The solution is to post a set of
ratings which does its best job, but to have a separate predictor system which uses different logic.
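A separate predictor can avoid the runaway-transitivity problem by compressing large rating gaps. The sketch below uses a tanh curve so predictions stay roughly linear for ordinary matchups but flatten toward a ceiling for extreme mismatches; both the ceiling value and the tanh shape are my own illustrative choices, not the predictor's actual logic.

```python
import math

def predicted_spread(rating_diff, ceiling=60.0):
    # Nearly linear for ordinary rating gaps, but flattens toward `ceiling`
    # for huge mismatches (the ceiling and the shape are illustrative).
    return ceiling * math.tanh(rating_diff / ceiling)
```

A 10-point rating edge still predicts close to a 10-point spread, but a 150-point transitive edge predicts well under 60, matching the observation that 150-point games just do not happen.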
Thoughts on the future
As of this update (2023), I have mostly stopped making adjustments to my system. I do intend at
some point to do a more rigorous analysis of its performance. Every once in a while
I have time to complete a substantial project, such as this season's analysis of the correct
rating to give teams transitioning to Division 1. I was also forced to make some modifications
for the pandemic year. Beyond that, though, I have found myself too busy and too invested in some
other projects to analyze performance constantly. The one system procedure I do analyze regularly
is the bracket generator, as I really want it to be the best it can possibly be.
Acknowledgements
Thanks should be given to the following, some of which are linked above:
Ken Pomeroy - For being an early inspiration to me for computer ranking
systems, and for fighting the good fight for public acceptance of advanced metrics. Down with RPI!
He also provides my basketball data HERE,
without which I would not be able to compile my ratings at all.
Kenneth Massey - For operating a great site and
providing a nice introduction to computer ranking systems in general. His Master's thesis, available
at that location, was very thought-provoking and a good read for anyone interested in this stuff. He
also provides easily readable football scores which I use for my rankings. My ratings are featured
on his college sports composites, which compare hundreds of different ranking systems.
Bracket Matrix - For hosting my automated
basketball bracket predictions (and also just for being awesome and existing).
My Dad, Tom Wilson, for encouraging me in the three passions which collide in making these rankings
possible - sports, math, and programming. Thanks to him also for hosting this web domain.
Wayward Trends is a general repository for a number of
projects Tom and I have worked on.
James Howell, for providing historical
football scores (which are badly needed for the kind of
analysis I do!)
DISCLAIMER
Although I believe the predictive power of my ratings to be quite good, I also acknowledge that
sports are inherently random and unpredictable. Just viewing the upset tables can
convince you of that - a team that is a 20 point underdog still has a chance if the cards fall
right (and the chances might be higher than most people think!). Even a theoretically perfect rating
system will be wrong some percentage of the time (for example, a typical retrodictive college basketball
rating system is wrong about 16 percent of the time on PAST games, even though it knows all the outcomes!).
Keep this in mind if you plan to use these ratings for any personal purposes.
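To put rough numbers to the underdog's chances: a common rule of thumb (my assumption here, not a statement of how these ratings actually work) models the final margin as roughly normal around the point spread, with a standard deviation near 11 points.

```python
import math

def underdog_win_probability(spread, sd=11.0):
    # P(underdog wins) when the favorite is favored by `spread` points,
    # modeling the final margin as Normal(spread, sd). The sd of ~11 points
    # is a commonly cited rule of thumb, not a value taken from these ratings.
    z = -spread / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

Under that model, underdog_win_probability(20.0) works out to roughly 3 to 4 percent - small, but far from impossible.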
It is also worth mentioning that this site is the work of myself (Bryan Wilson) alone, so if you
find any errors, know that I am trying my best; it is hard work to keep everything
maintained.
CONTACT
If you have any questions or comments about my ratings, methodology, bracketology, or anything else,
you can contact me at:
madbuttonmasher@gmail.com