The 2010 CrossFit Games
The ultimate proving grounds of the world’s fittest athletes.
July 16-18, 2010 • Carson, CA
The Home Depot Center Sports Complex
Scoring CrossFit Competitions
How numbers and systems help determine who is the fittest athlete across broad time and modal domains
The question of how best to score a CrossFit competition has been a hot topic for a few years now. We have a new sport, and we have experimented with a variety of scoring formats. In this article, I will explore several of the key issues and the implications of each scoring approach.
First and foremost, the issues of the arbitrariness of rules and questions of fairness have been addressed in previous CrossFit Journal articles ("Capacity, Standards, and Sport" and "All Other Things Being Equal: The CrossFit Fairness Doctrine"). In sport, rules are agreed upon before the competition starts, and these rules are not necessarily fair. In boxing, you can't kick, but in kickboxing you can. Neither prescription is more inherently fair; fairness comes from every participant being governed by the same set of rules.
This, of course, brings up a different topic: the quality of judging. From Olympic boxing to Major League Baseball to high-school volleyball, almost every sporting event has fans complaining about the judging, reffing or umping. While this is an important topic, it’s quite outside the scope of this article.
So, before we dig into the question of which (if any) scoring system is ideal for a CrossFit competition, it’s essential to recognize that any reasonable system is legitimate because there is nothing wrong with arbitrary rules. Those rules might favor certain traits (intentionally or unintentionally), but that's true with every sport. Instead, we're going to explore the implications of different scoring systems and compare those to the intent of the competitions.
What Are We Testing?
Most sports have a clearly defined goal. A single ball, puck, discus, javelin, frisbee, or flag has to get somewhere. Or a distance must be covered. Or an opponent vanquished. Other sports have aesthetic and execution criteria that are judged, such as gymnastics, ice skating, and diving. For CrossFit, the general test is one of fitness.
Fitness is defined as increased work capacity across broad time and modal domains, which is a tricky thing to get your head around. No single test could possibly determine the fittest because each test is of a single time domain. Each reasonable test also has limited modal domains. So, any test of fitness must be a multi-event process.
The next obvious question is how many events do you need? The answer is that you need enough events to produce a consistent winner. Assuming there is such a thing as overall fitness, at some point adding tests of fitness will not change the determination. In 2009, we crowned Mikko Salo the World's Fittest Man after eight events. Would that have changed if we had made the competition 10 events? Twenty events? I don't think so.
Of course, not all events are equal tests of fitness. In 2009, we had a long run as the first event, followed by a heavy deadlift as the second event. Both are fair and legitimate events in a CrossFit competition (barring the limitation of the heaviest barbells for the men, which was a question of logistics not concept). So also would have been a single event combining a long run interrupted by heavy deadlifts. So also would have been adding handstand push-ups, pull-ups, or even swimming.
How exercises are combined into events, and how events are combined into competitions, is one of the most interesting components of our sport. It allows for constant variation and the evaluation of the broadest tests of GPP (general physical preparedness). Not knowing what the events are ahead of time improves an imperfect test of fitness (one that has a bias, whether intended or not). For example, being ready for anything and doing well at heavy deadlifts is very different from specializing in heavy deadlifts and not being prepared to run long distances.
The Ability to Predict
A good test of fitness should be able to reasonably predict the results of other good tests of fitness. In other words, if the ’09 Games events were replaced with the 2010 Northwest Regional events, would Mikko Salo still have won? If both were good tests of fitness, then the answer should be yes. If not, then perhaps not.
Let's dig in a bit. Does someone's Fran time predict his or her ability to win other workouts? How about Helen? Does the finishing order of Fran predict the finishing order of Helen and vice versa? The answer is sometimes but not always. What about the combined finishing order of Fran and Helen as a predictor for the finishing order of another workout? Depends on the workout, right? Maybe the data would apply more to Diane than max deadlifts or a long run—unless handstand push-ups are a particular weakness.
I think we would all agree that the overall winner of Fran, Diane, Helen, and the Filthy Fifty would be more likely to win Cindy or Kelly than someone who finished lower, and, more importantly, than someone who just won any one of those workouts.
Let's go to extremes. What about a powerlifting meet or a marathon? How well do they predict the finishing order of other events? Not nearly as well.
We can play this game all day long. Every individual workout is limited. The broader the test, the more likely it is to predict the finishing order of another broad test. When it comes to a competition, though, there are finite options for time and modes. How much is enough?
Two Primary Approaches to Scoring
There are two basic approaches to scoring, each with multiple permutations. The first is points based on placement, and the second is based on performance relative to other competitors. The 2007 and 2009 Games are versions of the former; the 2008 Games are a version of the latter.
Each system has benefits and drawbacks. The primary drawback of the first is that it doesn't recognize margins. We can easily create scenarios where not recognizing margins is flawed. Athlete A beats Athlete B in Fran by 1 second. Athlete B beats Athlete A in Helen by 2 minutes. With just points and two events, they tie, but obviously Athlete B has demonstrated superior fitness.
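The tie described above is easy to verify with a little arithmetic. Here is a minimal sketch, with hypothetical times chosen to match the scenario (Athlete A wins Fran by 1 second, Athlete B wins Helen by 2 minutes): placement points produce a dead tie even though the raw times tell a different story.

```python
# Hypothetical times (in seconds) matching the scenario above:
# A wins Fran by 1 second; B wins Helen by 2 minutes.

def placement_points(times_by_event):
    """Sum each athlete's finishing place across events (lower is better)."""
    totals = {name: 0 for name in next(iter(times_by_event.values()))}
    for times in times_by_event.values():
        ranked = sorted(times, key=times.get)          # fastest first
        for place, name in enumerate(ranked, start=1):
            totals[name] += place
    return totals

events = {
    "Fran":  {"A": 180, "B": 181},
    "Helen": {"A": 600, "B": 480},
}

points = placement_points(events)
# points == {'A': 3, 'B': 3} -- a tie under placement scoring

total_time = {n: sum(e[n] for e in events.values()) for n in ("A", "B")}
# total_time == {'A': 780, 'B': 661} -- B is nearly 2 minutes ahead overall
```

The rank-based total throws away the margins entirely, which is exactly the flaw in the two-event case; as the article notes, the problem shrinks as events and competitors are added.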
Fortunately, this problem is at its worst with two events and decreases in severity with each additional event and participant. With enough diverse events and a healthy field of competitors, the cumulative placings are an excellent method of ranking the athletes' relative capacity across broad time and modal domains. This becomes less true if there are cuts, particularly if those cuts are dramatic. What happens with cuts is that chinks in the armor are more heavily penalized early in the competition than later. This isn't necessarily a bad thing if you're trying to determine the most well-rounded athlete (I'll leave the merits and drawbacks of penalizing chinks to another article).
In contrast, several types of proportional scoring recognize margins. "Every second counts" is one method, but it requires that everything be measured in time (no 1RM events and no AMRAPs). There are several mathematical approaches to creating scoring systems that allow for all types of events. Some are relatively simple, and some exceptionally complex. The intent, though, is the same: to recognize that beating someone by 1 second is different from beating someone by 2 minutes.
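One simple proportional scheme (an illustration of the general idea, not the official 2008 Games formula) scores each athlete as a percentage of the event winner's result, so margins carry through to the total. The athlete names and numbers below are hypothetical.

```python
# A simple proportional scheme (illustrative, not the official 2008 formula):
# score each athlete as a percentage of the best result in the event.

def proportional_scores(results, lower_is_better=True):
    """Return each athlete's score as a percentage of the event winner's result."""
    if lower_is_better:                      # timed events: winner has the lowest
        best = min(results.values())
        return {n: 100.0 * best / v for n, v in results.items()}
    best = max(results.values())             # reps/loads: winner has the highest
    return {n: 100.0 * v / best for n, v in results.items()}

# A 1-second Fran margin barely moves the score...
fran = proportional_scores({"A": 180, "B": 181})
# fran: A == 100.0, B ~ 99.4

# ...while a big gap in a max press swallows it many times over.
press = proportional_scores({"A": 120, "B": 175}, lower_is_better=False)
# press: B == 100.0, A ~ 68.6
```

This handles mixed event types (times, loads, reps) in one formula, but it also shows the weighting problem discussed next: the single-element press event produces a far larger margin than the couplet, so it dominates the total.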
The Problem of Ranking Margins
There is, however, a fatal flaw in using proportional scoring for most CrossFit competitions. It assumes that the margins (seconds, pounds, reps, or even percentages) are equally valid representations of fitness. This is not necessarily true. It might be true, but it's pretty easy to see examples where it isn't. The margins in single-element workouts are generally much more dramatic than in multi-element workouts. Yet, I would argue that multi-element workouts are much more indicative of fitness, whereas the others reward specialists. Therefore, a proportional system, which inevitably weights specialized events more heavily, would not rank athletes ideally for fitness levels.
Let's take a dramatic example. Chris Spealler shoulder-presses about half what Rob Orlando presses. If we were to have a press event and score it with weighted margins, Speal would have no way to make up the difference in all the other workouts combined, even if he won each subsequent event. Does that mean Rob is surely fitter? No way.
Now, while this is an extreme example, once you accept that the margins in various workouts are not equal, you instantly get the monumental task of trying to assign relative values to margins within each event. And this, I'm sure you can understand, is practically impossible given the limits of today's knowledge. How do you compare the differences in a long chipper, an 8-minute AMRAP, a max deadlift and a shuttle sprint? We are quite far from being able to accurately and precisely determine the quantitative differences.
How Much Does the Scoring System Matter?
The best athletes tend to win. Change the events and the scoring system, and if the competitions are reasonable tests of fitness, the finishing order will be mostly the same. Credit pure fitness, drive to win, or some combination thereof—this is the primary factor in finishing order.
The quality of programming is probably the second-most-important component, at least when it comes to CrossFit competitions. It is well understood that most exercises, and thus some workouts, favor one type of athlete over another (all else being equal, body-weight exercises favor lighter athletes and heavy barbells favor bigger athletes). Putting together workouts and combinations of workouts that test fitness over physical traits is a challenge. Broad time and modal domains are hard to capture in four to five events.
Finally, the scoring system should work in conjunction with the programming to determine the fittest athletes. The right scoring can't make up for bad programming, nor is the wrong scoring system likely to ruin great programming. That being said, I think simple ranking systems are better than any other for most CrossFit competitions, particularly if you aren't highly confident that each event is an equal test of fitness.