Bryan Wilson Empirical Ratings
About my ratings

The Brief Summary

This is an "advanced" ranking system in the sense that it uses strength of schedule, compounded iteratively until the ratings stabilize. It is meant to be purely predictive, which is to say it focuses on predicting future games rather than explaining the past. It uses the final scores, dates, and locations of games, and currently nothing more. It also uses data from previous seasons until there is enough data in the current season.

Why I started making sports rankings

My interest in sports rankings started with college football in 2005. I was already a programmer in Java (just as a hobby, not by profession) and had made a number of other programs, including a chess AI. Most of them I made for the challenge of it, just to see if I could do it. The sports ranking systems were no different. I wanted to see how good of a system I could make.

My first system was not very good, but it was a decent starting point and provided the basis for the systems to come. I entered all games by hand and gave somewhat subjective values to the winners and losers of games based on my own thoughts about how impressive a performance was. I used it to enter (and sometimes win) small bowl pick'em pools. In the summer of 2014, I got the idea to do the same in basketball. I had been filling out brackets myself since 2005 and made a simple bracket-making program in 2008. Up until recently, the win probabilities were either estimated or just pulled from historic seed upset probabilities. The idea of using actual, objective power ratings from which probabilities could be better calculated was appealing. That idea has now been realized here.

My system's philosophy

There exists a seemingly endless number of ways to evaluate a team, the season it has had, and the way the team is expected to play in the future. On the one hand, you have over a hundred computer systems represented in Massey's Computer Composite, linked above. It is interesting to click through all the pages and see how so many different people have come to different (and reasonable-sounding) conclusions about how to rank the teams. There are linear regression models, win probability ratio models, Elo-type sequential systems, systems which try to break the game down to play-by-play efficiency and simulate games using the data, and countless more.

On the other hand, there are human polls, tendencies, and conventional wisdom. Lines like the following are thrown around constantly on sports networks of every kind:
- "Team X has a problem with finishing games."
- "It was ugly, but all that matters is they got the W today."
- "Team X will stay #1 until they are beaten."
- "Defense wins championships."
- "Team X's tough nonconference schedule will have them ready for conference play."
- "Team X is in for a shock the first time they play an opponent with a pulse."
- "I like Team X in this matchup, coming off a bye week."
- "This is a trap game for Team X, going into their big showdown next week."
- "Team X might struggle today after the big physical and emotional game last week."

There is probably some measure of truth in all of the above statements. The question is, how much truth, and how can it be quantified? One truth of which we can all be certain is that no system can possibly be 100% accurate. Sports are too random, and their outcomes depend on too many variables which simply cannot be predicted. This leads to results which don't make sense and seemingly cannot be explained. The goal of my system is to take into account as many factors as I can think of that might have an impact on a game, and to combine and weight them so that the system at least does the best possible job of predicting games in past seasons. If enough past seasons are included in the analysis, the randomness starts to fall out and real patterns can emerge. Mine is therefore an empirical approach: out of all these rating methods and wisdoms, which ones hold true over history and which ones are myths?

How my system works

The rating system currently incorporates three main components (a rough sketch of each follows below):
- a straight-up transitive scoring-margin component, similar to linear regression models;
- a more achievement-based component, which values wins more highly, especially those against stronger opponents, and gives a diminishing return on blowout wins;
- a transitive "tempo-free" component, which cares only about the ratio of points scored between the two teams.
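As a rough illustration, here is how such per-game values might be computed. These are simplified formulas of my own choosing for explanation's sake, not the exact formulas my system uses:

```java
// Hypothetical sketches of the three per-game component values.
// Illustrative simplifications, not the system's actual formulas.
public final class GameComponents {

    // Component 1: straight-up scoring margin (transitive, regression-style).
    static double marginComponent(int ptsFor, int ptsAgainst) {
        return ptsFor - ptsAgainst;
    }

    // Component 2: achievement-based value. A win is worth a base amount,
    // plus a bonus that grows with the margin at a diminishing rate, so a
    // 40-point blowout is not worth four times a 10-point win.
    static double achievementComponent(int ptsFor, int ptsAgainst) {
        int margin = ptsFor - ptsAgainst;
        return Math.signum((double) margin) * (1.0 + Math.sqrt(Math.abs(margin)));
    }

    // Component 3: tempo-free value, which cares only about the ratio of
    // points scored, so a 70-60 win and a 105-90 win look identical.
    static double tempoFreeComponent(int ptsFor, int ptsAgainst) {
        return Math.log((double) ptsFor / ptsAgainst);
    }
}
```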

All three components are run through an iterative process which adjusts them according to strength of schedule until the ratings stabilize. Then a linear combination of the three components is taken as the final rating. The weights of the components are chosen such that the historic point-spread predictions from that combination come as close as possible to the actual game results. One might say that instead of being purely based on mathematics, it is more a brute-force historical comparison. A minimal sketch of the iteration follows.
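Here is a minimal sketch of what such an iteration can look like, under the assumption of a simple average-of-opponents adjustment. This is my own reconstruction for explanation, not the actual implementation:

```java
import java.util.Arrays;

// Minimal sketch of an iterative strength-of-schedule adjustment for one
// component. Each team's rating is its average per-game value plus the
// average current rating of its opponents; repeating this lets schedule
// strength compound through the whole graph of games.
public final class IterativeRater {

    // games[g] = {teamA, teamB}; gameValues[g] is the per-game component
    // value from team A's perspective (team B gets the negation).
    static double[] rate(int numTeams, int[][] games, double[] gameValues) {
        double[] ratings = new double[numTeams];
        for (int iter = 0; iter < 10000; iter++) {
            double[] next = new double[numTeams];
            int[] count = new int[numTeams];
            for (int g = 0; g < games.length; g++) {
                int a = games[g][0], b = games[g][1];
                next[a] += gameValues[g] + ratings[b];
                next[b] += -gameValues[g] + ratings[a];
                count[a]++;
                count[b]++;
            }
            for (int t = 0; t < numTeams; t++) {
                next[t] /= Math.max(1, count[t]);
            }
            // Recenter to mean zero so the scale cannot drift.
            double mean = Arrays.stream(next).average().orElse(0);
            double maxChange = 0;
            for (int t = 0; t < numTeams; t++) {
                next[t] -= mean;
                maxChange = Math.max(maxChange, Math.abs(next[t] - ratings[t]));
            }
            ratings = next;
            if (maxChange < 1e-9) break; // the ratings have stabilized
        }
        return ratings;
    }
}
```

The final rating would then be something like w1*margin + w2*achievement + w3*tempoFree, with the weights fit against historical point spreads rather than derived mathematically.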

My system currently uses nothing in its calculations besides final scores, venues, and the dates of games. The date is important because more recent results are valued more highly than distant ones (a sketch of one such weighting follows below). Early in the season, the system also uses some data from previous seasons. The basketball system uses only D1 results, and the football system uses only FBS results (other games are still listed in the win/loss column, but have no effect on the ratings).
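For example, one simple way to value recent results more highly is an exponential decay on the age of the game. This is an illustrative scheme with an assumed half-life, not necessarily what my system does:

```java
// Hypothetical recency weighting: a game's influence decays exponentially
// with its age. The 60-day half-life is an assumption for illustration,
// not the system's actual value.
public final class RecencyWeight {

    static double weight(long daysAgo) {
        double halfLifeDays = 60.0; // assumed: a 60-day-old game counts half
        return Math.pow(0.5, daysAgo / halfLifeDays);
    }

    public static void main(String[] args) {
        System.out.printf("today: %.2f, 60 days ago: %.2f, 120 days ago: %.2f%n",
                weight(0), weight(60), weight(120));
        // Prints: today: 1.00, 60 days ago: 0.50, 120 days ago: 0.25
    }
}
```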

I'd like to add that I think the idea of providing a single list of rating numbers for each team, ranking them that way, and using the numbers to make predictions is inherently flawed. While I do provide such numbers on my site and allow their use in making point-spread estimates, I need only refer you to the list of sports-network wisdoms above to see why. Factors such as emotional wins, trap games, and plain rock-paper-scissors matchups between play styles make such a transitive, single list impossible. In addition, it is simply not true that if Team A beats B by 10 points and B beats C by 10 points, then A should beat C by 20 points; in practice, the result is often inflated into something closer to a 40-point blowout. There is also a limit on the upper end: transitive arguments like this might suggest the #1 team would beat the worst team by 150 points, which obviously does not happen, because of substituting in the second string and so on. The solution is to post a set of ratings which does its best single-list job, but to have a separate predictor system which uses different logic. A toy illustration of this non-linearity follows.
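As a toy illustration of that non-linearity (my own example, not the site's actual predictor), consider an S-shaped curve from rating gap to predicted margin: in the mid-range, doubling the gap more than doubles the predicted margin, yet the curve saturates far below what naive chaining would suggest. The cap and scale constants here are arbitrary choices for the demonstration:

```java
// Toy S-shaped mapping from rating gap to predicted margin. The 50-point
// cap and 20-point scale are arbitrary illustration values.
public final class MarginCurve {
    static final double CAP = 50.0; // assumed realistic maximum blowout

    static double predictedMargin(double ratingGap) {
        double u = Math.pow(Math.abs(ratingGap) / 20.0, 2);
        return Math.signum(ratingGap) * CAP * Math.tanh(u);
    }

    public static void main(String[] args) {
        for (double gap : new double[] {10, 20, 40, 100}) {
            System.out.printf("rating gap %5.1f -> predicted margin %5.1f%n",
                    gap, predictedMargin(gap));
        }
        // gap 10 -> ~12, gap 20 -> ~38 (more than double), gap 40 and
        // beyond -> ~50 (saturated, nowhere near a naive 150).
    }
}
```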

Thoughts on the future

As of this update (2023), I have mostly stopped making adjustments to my system. I do intend at some point to do a more rigorous analysis of its performance. Every once in a while I have time to complete a substantial project, such as this season, when I analyzed the correct rating to give to teams transitioning to Division 1. I was also forced to make some modifications for the pandemic year. Beyond that, though, I have found myself too busy and too invested in other projects to constantly analyze performance. The one piece of the system I do analyze regularly is the bracket generator, as I really want it to be the best it can possibly be.

Acknowledgements

Thanks should be given to the following people and sites, some of which are linked above:

Ken Pomeroy - For being an early inspiration to me for computer ranking systems, and for fighting the good fight for public acceptance of advanced metrics. Down with RPI! He also provides me with my basketball data HERE - without which I would not be able to compile my ratings at all.

Kenneth Massey - For operating a great site and providing a nice introduction to computer ranking systems in general. His Master's thesis, available at that location, was very thought-provoking and a good read for anyone interested in this stuff. He also provides easily readable football scores which I use for my rankings. My ratings are featured on his college sports composites, which compare hundreds of different ranking systems.

Bracket Matrix - For hosting my automated basketball bracket predictions (and also just for being awesome and existing).

My dad, Tom Wilson - For encouraging me in the three passions that collide to make these rankings possible: sports, math, and programming. Thanks to him also for hosting this web domain. Wayward Trends is a general repository for a number of projects Tom and I have worked on.

James Howell - For providing historical football scores (which are badly needed for the kind of analysis I do!).

DISCLAIMER

Although I believe the predictive power of my ratings to be quite good, I also acknowledge that sports are inherently random and unpredictable. Just viewing the upset tables can convince you of that: a 20-point underdog still has a chance if the cards fall right (and that chance might be higher than most people think!). Even a theoretically perfect rating system will be wrong some percentage of the time (for example, a typical retrodictive college basketball rating system will be wrong about 16 percent of PAST games, even though it knows all the outcomes!). Keep this in mind if you plan to use these ratings for any personal purposes.
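To put a rough number on that underdog's chance, here is a back-of-the-envelope sketch using a normal-error model. The 11-point standard deviation is a commonly cited ballpark for college basketball margins against the spread, and an assumption here, not a value fitted by my system:

```java
// Rough upset-probability sketch: assume actual margins scatter around the
// point spread approximately normally, with an assumed SD of 11 points.
public final class UpsetChance {
    static final double MARGIN_SD = 11.0; // assumption, not a fitted value

    static double underdogWinProb(double spread) {
        // The underdog wins when the favorite's actual margin falls below zero.
        return normalCdf(-spread / MARGIN_SD);
    }

    // Standard normal CDF via the Zelen-Severo (Abramowitz-Stegun 26.2.17)
    // polynomial approximation; accurate to about 1e-7.
    static double normalCdf(double z) {
        double t = 1.0 / (1.0 + 0.2316419 * Math.abs(z));
        double d = Math.exp(-z * z / 2) / Math.sqrt(2 * Math.PI);
        double p = d * t * (0.319381530 + t * (-0.356563782
                + t * (1.781477937 + t * (-1.821255978 + t * 1.330274429))));
        return z >= 0 ? 1 - p : p;
    }

    public static void main(String[] args) {
        // A 20-point underdog wins roughly 3 to 4 percent of the time
        // under these assumptions.
        System.out.printf("20-point underdog wins %.1f%% of games%n",
                100 * underdogWinProb(20));
    }
}
```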

It is also worth mentioning that this site is the work of one person (me, Bryan Wilson), so if you find any errors, just know that I am trying my best; keeping everything maintained is hard work.

CONTACT

If you have any questions or comments about my ratings, methodology, bracketology, or anything else, you can contact me at:

madbuttonmasher@gmail.com