2010 CrossFit Games Finals

Scoring CrossFit Competitions

How numbers and systems help determine who is the fittest athlete across broad time and modal domains

The question of how best to score a CrossFit competition has been a hot topic for a few years now. We have a new sport, and we have experimented with a variety of scoring formats. In this article, I will explore several of the key issues and the implications of each scoring approach.

First and foremost, the issues of the arbitrariness of rules and questions of fairness have been addressed in previous CrossFit Journal articles ("Capacity, Standards, and Sport" and "All Other Things Being Equal: The CrossFit Fairness Doctrine"). In sport, rules are agreed upon before the competition starts, and these rules are not necessarily fair. In boxing, you can't kick, but in kickboxing you can. Neither prescription is more inherently fair; the fairness comes from each participant being governed by the same set of rules.

This, of course, brings up a different topic: the quality of judging. From Olympic boxing to Major League Baseball to high-school volleyball, almost every sporting event has fans complaining about the judging, reffing or umping. While this is an important topic, it’s quite outside the scope of this article.

So, before we dig into the question of which (if any) scoring system is ideal for a CrossFit competition, it’s essential to recognize that any reasonable system is legitimate because there is nothing wrong with arbitrary rules. Those rules might favor certain traits (intentionally or unintentionally), but that's true with every sport. Instead, we're going to explore the implications of different scoring systems and compare those to the intent of the competitions.

What Are We Testing?

Most sports have a clearly defined goal. A single ball, puck, discus, javelin, frisbee, or flag has to get somewhere. Or a distance must be covered. Or an opponent vanquished. Other sports have aesthetic and execution criteria that are judged, such as gymnastics, ice skating, and diving. For CrossFit, the general test is one of fitness.

Fitness is defined as increased work capacity across broad time and modal domains, which is a tricky thing to get your head around. No single test could possibly determine the fittest because each test covers only a single time domain. Each reasonable test also covers limited modal domains. So, any test of fitness must be a multi-event process.

The next obvious question is how many events you need. The absolute answer is enough events to produce a consistent winner, one whom a different or larger set of events would not displace. Assuming there is such a thing as overall fitness, at some point adding tests of fitness will not change the determination. In 2009, we crowned Mikko Salo the World's Fittest Man after eight events. Would that have changed if we had made the competition 10 events? Twenty events? I don't think so.

Of course, not all events are equal tests of fitness. In 2009, we had a long run as the first event, followed by a heavy deadlift as the second event. Both are fair and legitimate events in a CrossFit competition (barring the limitation of the heaviest barbells for the men, which was a question of logistics, not concept). So would a single event combining a long run interrupted by heavy deadlifts, and so would adding handstand push-ups, pull-ups, or even swimming.

How exercises are combined into events, and how events are combined into competitions, is one of the most interesting components of our sport. It allows for constant variation and the evaluation of the broadest tests of GPP (general physical preparedness). Not knowing what the events are ahead of time improves an imperfect test of fitness (one that has a bias, whether intended or not). For example, being ready for anything and doing well at heavy deadlifts is very different from specializing in heavy deadlifts and not being prepared to run long distances.

The Ability to Predict

A good test of fitness should be able to reasonably predict the results of other good tests of fitness. In other words, if the ’09 Games events were replaced with the 2010 Northwest Regional events, would Mikko Salo still have won? If both were good tests of fitness, then the answer should be yes. If not, then perhaps not.

Let's dig in a bit. Does someone's Fran time predict his or her ability to win other workouts? How about Helen? Does the finishing order of Fran predict the finishing order of Helen and vice versa? The answer is sometimes but not always. What about the combined finishing order of Fran and Helen as a predictor of the finishing order of another workout? It depends on the workout, right? The combined results might predict Diane better than a max deadlift or a long run, unless handstand push-ups are a particular weakness.

I think we would all agree that the overall winner of Fran, Diane, Helen, and the Filthy Fifty would be more likely to win Cindy or Kelly than someone who finished lower, and, more importantly, than someone who just won any one of those workouts.

Let's go to extremes. What about a powerlifting meet or a marathon? How well do they predict the finishing order of other events? Not nearly as well.

We can play this game all day long. Every individual workout is limited. The broader the test, the more likely it is to predict the finishing order of another broad test. When it comes to a competition, though, there are finite options for time and modes. How much is enough?

Two Primary Approaches to Scoring

There are two basic approaches to scoring, each with multiple permutations. The first is points based on placement, and the second is based on performance relative to other competitors. The 2007 and 2009 Games are versions of the former; the 2008 Games are a version of the latter.

Each system has benefits and drawbacks. The primary drawback of the first is that it doesn't recognize margins, and it is easy to construct scenarios where ignoring margins gives a flawed result. Athlete A beats Athlete B in Fran by 1 second. Athlete B beats Athlete A in Helen by 2 minutes. With placement points and only two events, they tie, but Athlete B has obviously demonstrated superior fitness.
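A minimal sketch of placement scoring makes the tie concrete. It assumes 1 point per place (lowest total wins), timed events only, and no ties; the Fran and Helen times are hypothetical, chosen only to reproduce the Athlete A / Athlete B scenario:

```python
# Placement scoring: 1st place earns 1 point, 2nd earns 2, and so on.
# The lowest total wins. Assumes timed events (lower is better) and no ties.


def placements(times: dict[str, float]) -> dict[str, int]:
    """Return each athlete's place (1 = fastest) in a single timed event."""
    ordered = sorted(times, key=times.get)
    return {athlete: place for place, athlete in enumerate(ordered, start=1)}


def placement_totals(events: list[dict[str, float]]) -> dict[str, int]:
    """Sum each athlete's placings across all events."""
    totals: dict[str, int] = {}
    for times in events:
        for athlete, place in placements(times).items():
            totals[athlete] = totals.get(athlete, 0) + place
    return totals


# Hypothetical times in seconds.
fran = {"A": 180.0, "B": 181.0}   # A wins Fran by 1 second
helen = {"A": 660.0, "B": 540.0}  # B wins Helen by 2 minutes

print(placement_totals([fran, helen]))  # {'A': 3, 'B': 3}: a tie
```

The 1-second win and the 2-minute win are each worth exactly one place, which is the margin blindness described above.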

Fortunately, this problem is at its worst with two events and decreases in severity with each additional event and participant. With enough diverse events and a healthy field of competitors, the cumulative placings are an excellent method of ranking the athletes' relative capacity across broad time and modal domains. This becomes less true if there are cuts, particularly if those cuts are dramatic. What happens with cuts is that chinks in the armor are more heavily penalized early in the competition than later. This isn't necessarily a bad thing if you're trying to determine the most well-rounded athlete (I'll leave the merits and drawbacks of penalizing chinks to another article).

In contrast, several types of proportional scoring recognize margins. "Every second counts" is one method, but it requires that everything be measured in time (no 1RM events and no AMRAPs). There are several mathematical approaches to creating scoring systems that allow for all types of events. Some are relatively simple, and some are exceptionally complex. The intent, though, is the same: to recognize that beating someone by 1 second is different from beating someone by 2 minutes.
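One of the simpler proportional approaches, sketched here as an illustration rather than the formula any Games actually used, scores each event as a percentage of the best result: the best time divided by the athlete's time, or the athlete's load or reps divided by the best. Applied to the same hypothetical Fran and Helen times, it separates the two athletes:

```python
# Percent-of-best scoring: one simple proportional scheme (an illustration,
# not a documented Games formula). The best result in each event earns 100.


def percent_of_best(results: dict[str, float], lower_is_better: bool) -> dict[str, float]:
    """Score each result as a percentage of the event's best result."""
    if lower_is_better:  # times: best time / athlete's time
        best = min(results.values())
        return {a: 100.0 * best / v for a, v in results.items()}
    best = max(results.values())  # loads or reps: athlete's result / best
    return {a: 100.0 * v / best for a, v in results.items()}


def proportional_totals(events: list[tuple[dict[str, float], bool]]) -> dict[str, float]:
    """Sum percent-of-best scores across events; the highest total wins."""
    totals: dict[str, float] = {}
    for results, lower_is_better in events:
        for athlete, score in percent_of_best(results, lower_is_better).items():
            totals[athlete] = totals.get(athlete, 0.0) + score
    return totals


fran = {"A": 180.0, "B": 181.0}   # hypothetical times in seconds
helen = {"A": 660.0, "B": 540.0}

totals = proportional_totals([(fran, True), (helen, True)])
print({a: round(t, 1) for a, t in totals.items()})  # {'A': 181.8, 'B': 199.4}
```

B now finishes ahead, because the 2-minute Helen margin costs A about 18 points while the 1-second Fran margin costs B about half a point.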

The Problem of Ranking Margins

There is, however, a fatal flaw in using proportional scoring for most CrossFit competitions. It assumes that the margins (seconds, pounds, reps, or even percentages) are equally valid representations of fitness across events. This might be true, but it is not necessarily so, and it's pretty easy to see examples where it isn't. The margins in single-element workouts are generally much more dramatic than those in multi-element workouts. Yet, I would argue that multi-element workouts are much more indicative of fitness, whereas single-element workouts reward specialists. Therefore, a proportional system, which inevitably weights specialized events more heavily, would not rank athletes ideally by fitness.

Let's take a dramatic example. Chris Spealler shoulder-presses about half what Rob Orlando presses. If we were to have a press event and score it with weighted margins, Speal would have no way to make up the difference in all the other workouts combined, even if he won each subsequent event. Does that mean Rob is surely fitter? No way.
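To see how a single specialized event can swamp everything else, here is a sketch with entirely hypothetical numbers: a press where the lighter athlete lifts half the heavier athlete's load, followed by seven timed events the lighter athlete wins by 2 percent each. Proportional scores use the same assumed percent-of-best scheme as a simple proportional system; placement totals are shown for comparison:

```python
# Hypothetical numbers only: a press where "Lighter" lifts half of "Heavier's"
# load, then seven timed events that "Lighter" wins by 2 percent each.


def percent_of_best(results: dict[str, float], lower_is_better: bool) -> dict[str, float]:
    """Score each result as a percentage of the event's best (assumed scheme)."""
    best = min(results.values()) if lower_is_better else max(results.values())
    if lower_is_better:
        return {a: 100.0 * best / v for a, v in results.items()}
    return {a: 100.0 * v / best for a, v in results.items()}


def compare(events: list[tuple[dict[str, float], bool]]) -> tuple[dict[str, float], dict[str, int]]:
    """Return proportional totals (higher is better) and placement totals (lower is better)."""
    prop: dict[str, float] = {}
    places: dict[str, int] = {}
    for results, lower_is_better in events:
        for a, s in percent_of_best(results, lower_is_better).items():
            prop[a] = prop.get(a, 0.0) + s
        ranked = sorted(results, key=results.get, reverse=not lower_is_better)
        for place, a in enumerate(ranked, start=1):
            places[a] = places.get(a, 0) + place
    return prop, places


press = {"Heavier": 200.0, "Lighter": 100.0}  # pounds, hypothetical
timed = {"Heavier": 306.0, "Lighter": 300.0}  # seconds, hypothetical
prop, places = compare([(press, False)] + [(timed, True)] * 7)

print({a: round(s, 1) for a, s in prop.items()})  # {'Heavier': 786.3, 'Lighter': 750.0}
print(places)                                      # {'Heavier': 15, 'Lighter': 9}
```

Under these assumptions the heavier athlete wins the proportional total despite losing seven of eight events, while placement scoring favors the lighter athlete. Each timed win gains the lighter athlete only about 2 points against a 50-point press deficit, so roughly 26 such wins would be needed to catch up.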

Now, while this is an extreme example, once you accept that the margins in various workouts are not equal, you instantly get the monumental task of trying to assign relative values to margins within each event. And this, I'm sure you can understand, is practically impossible given the limits of today's knowledge. How do you compare the differences in a long chipper, an 8-minute AMRAP, a max deadlift and a shuttle sprint? We are quite far from being able to accurately and precisely determine the quantitative differences.

How Much Does the Scoring System Matter?

The best athletes tend to win. Change the events and the scoring system, and if the competitions are reasonable tests of fitness, the finishing order will be mostly the same. Whether you credit pure fitness, the drive to win, or some combination of the two, the athletes themselves are the primary factor in finishing order.

The quality of programming is probably the second-most-important component, at least when it comes to CrossFit competitions. It is well understood that most exercises, and thus some workouts, favor one type of athlete over another (all else being equal, body-weight exercises favor lighter athletes and heavy barbells favor bigger athletes). Putting together workouts and combinations of workouts that test fitness over physical traits is a challenge. Broad time and modal domains are hard to capture in four to five events.

Finally, the scoring system should work in conjunction with the programming to determine the fittest athletes. The right scoring can't make up for bad programming, nor is the wrong scoring system likely to ruin great programming. That being said, I think simple ranking systems are better than any other for most CrossFit competitions, particularly if you aren't highly confident that each event is an equal test of fitness.
 


24 comments on this entry

1. Christopher Cavanaugh wrote...

Very well put. This was my first rodeo, and my judge was fair. I gave 100% and missed the bus by coming in 16th when only 15 made it. No regrets.
I AM CFJAX
Ret Master Chief
I will return

2. Sean Falconer wrote...

This is really interesting and highly related to some data analysis I've recently been doing with the regional data sets available.

For anyone interested, I've built a tool for visually exploring sectional and regional data: http://keg.cs.uvic.ca/seanf/crossfit.php

It actually allows you to dig into the results data a bit to see how different programming and ranking can potentially influence the outcome of a particular competition.

I've also written a couple of posts about my own analysis of the data. In the first one I applied clustering techniques to explore what characteristics were consistent across athletes that had similar performances in the Canadian Regional competition. In a second post I use my visualization tool to further explore the Canada Regional events and athletes.

1st post: http://tinyurl.com/23bzpav
2nd post: http://tinyurl.com/23xmuoj

I think it would be interesting to see how a given sectional, regional or games event result changes based on the scoring system used. However, like the article discusses, this is a really complex issue and there isn't a simple perfect system.

3. Web Smith wrote...

Sean,

Your analysis is BEAUTIFUL. Wow, thank you for taking the time to put forth the graphic analysis.

Web Smith
WeAreCrossFit.com

4. Rob Corson wrote...

Steve,
Very cool!

5. jnel wrote...

I believe scoring based on placement is the best way to go with a CrossFit competition. Though the problem described with Athletes A and B is real, I believe that scenario would be impossible, because a CrossFit competition is not based on 2 athletes and 2 events. One more event (provided it reveals another domain of fitness) would certainly show which athlete is superior. But we are talking about 50+ athletes across 8 events. As the pool of athletes becomes more competitive in this constantly varied sport, that type of situation would be impossible. The best man at the end of the weekend will be quite evident.

6. Sean Falconer wrote...

@Web,

Thanks for the response. You may be interested in the proficiency chart calculator and visualizations I have also built: http://www.myfrantime.com/calculators/proficiencychart

It's not perfect, but I think it's quite useful. One of its problems is that it always assumes a linear relationship between levels of fitness, which is obviously not correct. As someone comes closer to an elite time, like a 4-minute mile, every second closer means a lot more than if you're at, say, an 8-minute mile. The slope should not be the same; it's more likely an exponential curve.

I heard that you have been working on a system for predicting the outcome of competition events. I'd be interested in learning more about this. This is something I also considered when I did the cluster analysis. The characteristics of each athlete cluster can be used as predictors based on the characteristics of particular events. With more data, like athlete heights, weight, lifts, metcon times, etc. I could incorporate more features to further delineate the pool.
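A sketch of the linear-versus-exponential point, assuming a 0 to 100 proficiency scale anchored at an 8-minute mile (0) and a 4-minute mile (100), and an exponential curve with a hypothetical steepness `k`. Neither the anchors nor the curve come from the linked proficiency chart calculator:

```python
import math

# Hypothetical anchors: an 8-minute mile scores 0 and a 4-minute mile scores 100.
BEGINNER_S = 480.0
ELITE_S = 240.0


def fraction(seconds: float) -> float:
    """Linear position between the beginner (0.0) and elite (1.0) anchors, clamped."""
    x = (BEGINNER_S - seconds) / (BEGINNER_S - ELITE_S)
    return min(max(x, 0.0), 1.0)


def linear_score(seconds: float) -> float:
    """Every second is worth the same number of points."""
    return 100.0 * fraction(seconds)


def exponential_score(seconds: float, k: float = 3.0) -> float:
    """Convex curve: seconds near the elite anchor are worth more (k is an assumed steepness)."""
    return 100.0 * math.expm1(k * fraction(seconds)) / math.expm1(k)


for label, slow, fast in [("near beginner", 480.0, 475.0), ("near elite", 250.0, 245.0)]:
    lin_gain = linear_score(fast) - linear_score(slow)
    exp_gain = exponential_score(fast) - exponential_score(slow)
    print(f"{label}: linear +{lin_gain:.2f}, exponential +{exp_gain:.2f}")
```

On the linear scale, shaving 5 seconds is worth about 2.08 points anywhere; on the exponential curve it is worth well under 1 point near the beginner anchor and several points near the elite anchor. The value of `k` controls how strongly elite seconds are favored and would itself have to be justified.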

7. WhatIf wrote...

What if you used a competitor's body weight in the equation when looking at a 1RM shoulder press? If Chris can only lift half as much as Rob, and Chris' body weight is only half of Rob's, then everything would be equal. What are some negatives of using that approach?

8. chad mckay wrote...

"It is well understood that most exercises, and thus some workouts, favor one type of athlete over another..."

Specifically, I think the muscle-up and double-under favor CrossFit athletes over, say, Jimbo's strength and conditioning athletes. In order to keep the Games a contest for the "world's fittest human," I propose that the bar muscle-up replace the ring muscle-up as the Games standard (it requires less proficiency) and that the double-under just be eliminated outright.

9. Sean wrote...

Sean,

I am. I've made a lot of progress, but it's still not ready yet. I love the sport, so I frequently analyze the individuals that I compete against (many of whom I look up to). The problem with developing a ranking system is that subjective (non-evidentiary) data seeps into the formulation of objective numbers.

For instance, with the numbers that I gathered before the regional, I was pleased with my post-qualifier outcome. I knew that Vic Zachary had an "off-the-charts" level of capacity. Knowing him personally, I also knew that he didn't train 3 of the last 5 weeks prior to the qualifier. Quantifying the meaning (in numbers) of such a training "lull" is one of the problems with my numbers. It is hard to determine if his taking time off would help him or hurt him.

I took a close look at several other athletes whose performances matched their pre-qualifier "ranking" and in the end, I was only moderately happy with it. I know that it will really work when I can look at a regional competition in Australia (where I am not closely familiar with any of the athletes) and correctly determine their positions.

I just love this stuff. Hopefully, we can chat at the games. I'll be competing for CrossFit Central's AFF team.

10. Sean Falconer wrote...

@WhatIf,

I think there are several potential problems with this approach. The first thing to consider is that you would need to apply this type of proportion across the board, as one could argue it's not fair to a heavier person to run 5Ks and be compared to a lighter person. Potentially, this could be extended to metcons as well. There's lots of running in Nancy and lots of lifting in Isabel, so we would need to take bodyweight into account there as well.

So, assuming we agree on this, the next issue is how you actually measure performance relative to bodyweight. With pure weight, we can measure the lift relative to bodyweight; however, even in this very simple scenario there are issues. If we take the proportion, then we assume that the relationship between strength and weight is linear. This may not be the case.

There's also potentially other factors that could play a part in comparing athletic performance. For example, height could be another compounding factor. Wallballs favor a taller person while air squats favor a shorter person. If athlete A does 20 air squats in 20 seconds and athlete B does 20 as well, but B is a foot shorter, then should A be rewarded for covering a larger range of motion in less time?

This could get pretty complicated. We could end up with a figure skating type scoring system that's perhaps super fair but no one understands :-).
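The linearity point can be made concrete. The sketch below compares a simple press-to-bodyweight ratio with allometric scaling (dividing by bodyweight raised to the 2/3 power), a choice that follows from geometric similarity rather than anything proposed in this thread; the athletes and numbers are hypothetical:

```python
# Two ways to adjust a lift for bodyweight. Athletes and numbers are hypothetical.
# ratio: assumes strength scales linearly with bodyweight.
# allometric: assumes strength scales with bodyweight ** (2/3), a common
# geometric-similarity assumption, not a CrossFit standard.


def ratio_score(lift: float, bodyweight: float) -> float:
    """Lift divided by bodyweight."""
    return lift / bodyweight


def allometric_score(lift: float, bodyweight: float, exponent: float = 2.0 / 3.0) -> float:
    """Lift divided by bodyweight raised to an assumed exponent."""
    return lift / bodyweight ** exponent


athletes = {
    "Heavier": {"press": 190.0, "bodyweight": 220.0},  # pounds
    "Lighter": {"press": 130.0, "bodyweight": 145.0},
}

for name, a in athletes.items():
    print(
        name,
        round(ratio_score(a["press"], a["bodyweight"]), 3),
        round(allometric_score(a["press"], a["bodyweight"]), 2),
    )
```

With these numbers the ratio ranks the lighter athlete ahead, while the allometric score ranks the heavier athlete ahead, so the choice of formula alone decides who wins the event.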

11. jeremiah Ingersoll wrote...

Great article with lots of good points.
I do have a question off the topic though. Is there a place for teams who made it to the games to register or confirm they are going? I am with Kirkland Crossfit and we haven't heard anything yet. Any information would be helpful in our planning. Thanks.

12. Sean Falconer wrote...

@Chad,

I think what you are proposing gets a little sticky. If we eliminate double-unders because they are not just an exercise but one that requires training and skill, then we could carry this argument to other fundamental movements of CrossFit. Should we eliminate Olympic lifts in favor of lifts requiring less proficiency of movement?

I think we have to accept that skill in all kinds of movement is part of fitness.

13. Sean Falconer wrote...

@Web,

Sounds good about the Games. I'm hoping to be there as a spectator.

14. chad mckay wrote...

@sean

In my mind, CrossFit already eliminated the pure Olympic lifts. For example, at last year's Games we used a "looser" standard to define the snatch than a typical Oly lifting competition.

I do agree that this argument gets sticky and I may not have thought it through to its conclusions, but I think it is important to consider.

In the future I imagine the Games attracting top-tier competitors from other athletic backgrounds, not just CrossFit. My hope is that in the future those athletes won't be penalized like Annie Thorisdottir was at last year's Games.

15. AA wrote...

What about a system like the one the Olympics use in the decathlon? It rewards performance: the better you do, the more points you get. It balances the different measurement types and orders of magnitude. Each event has its own second/meter/centimeter-to-point ratio. The unit-of-measure-to-point formula is also progressive, to reward the generalist and penalize the specialist in his/her non-specialties.
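For reference, the decathlon tables convert each mark with a power formula: Points = A(B - T)^C for running events and Points = A(M - B)^C for jumps and throws, with a separate A, B, and C per event. A sketch of two events follows; the coefficients are as commonly published for the IAAF tables and should be treated as assumptions to verify against the official tables:

```python
import math

# Coefficients (A, B, C) as commonly published for the IAAF decathlon tables;
# verify against the official tables before relying on them.
TRACK = {"100m": (25.4347, 18.0, 1.81)}       # time in seconds
FIELD = {"long_jump": (0.14354, 220.0, 1.4)}  # distance in centimetres


def track_points(event: str, seconds: float) -> int:
    """Points = A * (B - T) ** C, truncated; zero at or beyond the B threshold."""
    a, b, c = TRACK[event]
    return math.floor(a * (b - seconds) ** c) if seconds < b else 0


def field_points(event: str, measure: float) -> int:
    """Points = A * (M - B) ** C, truncated; zero at or below the B threshold."""
    a, b, c = FIELD[event]
    return math.floor(a * (measure - b) ** c) if measure > b else 0


# Because C > 1, the same half-second improvement earns more points for a faster athlete.
print(track_points("100m", 10.5) - track_points("100m", 11.0))
print(track_points("100m", 12.0) - track_points("100m", 12.5))
print(field_points("long_jump", 776.0))  # roughly 1000 points
```

The exponent C above 1 is what makes the formula progressive. The catch is that every event needs its own fitted A, B, and C, which is practical when the event list never changes.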

16. Sean Falconer wrote...

@AA,

I've thought about that before as well. I think the difficulty here would be figuring this out for an arbitrary workout. With the decathlon, it's always the same 10 events. However, across the regional competitions alone there's probably several hundred workouts.

17. Ken Gall wrote...

Sean - excellent article. I agree completely on the scoring system tradeoffs. And your other thoughts are really spot on.

The one thing that I think is sometimes overlooked is the number of events. There is no way 3-4 events can assess overall fitness defined as broadly as crossfit defines it. I think the best example of this would be Tommy Hackenbruck. He is clearly a tremendous athlete and finished second when a large number of events covering a broad range of fitness were incorporated. But his regional in 2009 was heavily biased towards strength with only a few events and this eliminated him initially.

Cumulative scoring that carries from sectionals to regionals to the Games may be a way to add more events without having regionals with 8 events. But that has some challenges also. In either case, I do think the number of events in a regional should be on par with the Games, particularly with the decreasing number of spots. And it seems this number needs to be at least in the 8-10 range if the events are truly distinct.

18. Ross blake wrote...

What about having skill sets and elite athlete requirements within competitions to help decide the impact on points and placings?

For example, the OPT Big Dawg Challenge earlier this year referenced a level 4 CrossFit standard on an OHS WOD. The WOD was 3 sets of 15 reps of OHS, and the top 10 after day 1 of various events had to have achieved bodyweight on the squats in order to progress to the next day of competition. I think DJ Wickham moved the most load; however, I think it was Brett Marshal who won the event due to him going over bodyweight.

Could be handy, or could just open a can of worms, but perhaps this is an answer to the Speal-Orlando comparison?

19. Jeff Chester wrote...

The NBA does not make any concessions for height. The game favours people who are tall, but that doesn't mean that people who aren't 6'6" don't make it to the pros.

If CrossFit really wanted to equalize things you could get rid of all the confusing bodyweight calculation talk and just have a few weight classes. You would have a winner from each weight class and an overall winner. All classes could do the workouts at the same time with the same weights. This could possibly increase competitive interest in people who think they would have no chance of winning anything as it is. Personally I am a stereotypical CrossFit size at 190 lbs and 5'10" so it doesn't matter to me but just thought I would throw it out there.

20. Jeff Chester wrote...

As for getting rid of ring muscle-ups and double-unders, that is just crazy silliness! I wouldn't sign up to compete in something that I haven't done some research on and training for. MUs and DUs aren't exactly secrets if you look at the main-site WODs. You wouldn't show up to a swim meet and not know how to do the front crawl, would you? CrossFit Games competition is way beyond just trying to get people out to "be active" and "give it a go"; that should be saved for affiliate-level competitions.

21. chad mckay wrote...

I might concede the point on double-unders. Mikko Salo didn't know how to do them effectively before last year's final chipper, and he did them one at a time, so there is not that great of a skill requirement.

I still think the ring muscle-up is too technical. I think it has a great place in CrossFit training, but the bigger the Games get, the more likely great "outside" athletes will want to compete (e.g., the Icelanders at last year's Games not having ring muscle-ups). Anyone with the requisite strength can do a bar muscle-up, even if it is ugly as sin.

22. Sean Falconer wrote...

@Jeff,

I think weight classes is an interesting thing to consider. I actually just commented about this on my own blog.

"I know this goes against some of the CrossFit philosophy, but I think that, like old-school UFC fighters, CrossFit athletes are evolving. When the UFC started, there were no weight classes. Large fighters were usually slower and less skilled than smaller ones, so things balanced out. However, as the sport evolved, the larger fighters got faster and more skilled. Now, a light heavyweight would most likely destroy any lightweight.

I think we're seeing a similar evolution with CrossFit athletes. Big guys are getting faster and they are super strong. Consider Paul "Kong" Smith: http://www.sicfit.com/blog/post/show/id/117-Paul-Kong-Smith-Signs-that-the-Game-has-Changed

This is only the beginning of the evolution."

Weight classes would be another way to include more athletes but keep the competition level high.

23. Kody King wrote...

The following is an excerpt from an article submitted to the CFJournal last week. It covers problems with our current scoring systems, and in it I offer the following solutions. I am going to have to post it in two installments because of length:

"1. We go back to our roots. We use creativity and appropriate planning in order to still use an “every second counts” format. I personally love the creativity that can be put into this one.

“What about AMRAPS?”, you quickly ask. Let’s consider Cindy. For WOD 1, we could have the athletes perform a “For Time” WOD in order to establish an initial time. WOD 2 could be 10 minutes of Cindy. Each round of Cindy could subtract 15 seconds from the athlete’s overall time. There are 30 reps in each round of Cindy so each rep could be worth 0.5 seconds. So for each rep of Cindy we would subtract half of a second from an athletes total time.

“What about max loads and/or max reps?”, could be your next question. Let’s consider the 2009 Deadlift event. Each of the athletes received a time for the 7K run. The competitors could then subtract 30 seconds from their total time for each bar they lift. Reaching the final bar could remove a total of ten minutes from an athlete’s total time.

The South Central Sectional used a version of this format in an event where they had competitors row 1K then do a max rep of deadlifts at a certain weight. The men were able to subtract 7 seconds from their row for every deadlift rep they completed and the women got to subtract 10 seconds per rep.

This type of format takes careful planning in order to weight each of the events appropriately. If the events were carefully selected, and enough modalities are tested, the results would be flawless. The winner would be undeniable."
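The conversion rates in this excerpt (0.5 seconds per Cindy rep, 30 seconds per deadlift bar after the run, and 7 or 10 seconds per deadlift rep after a 1K row) can be expressed as simple time credits. A minimal sketch; the function names and athlete results are hypothetical:

```python
# "Every second counts" time credits, using the rates described in the excerpt.
CINDY_REPS_PER_ROUND = 30        # 5 pull-ups + 10 push-ups + 15 air squats
SECONDS_PER_CINDY_REP = 0.5      # 15 seconds per full round
SECONDS_PER_DEADLIFT_BAR = 30.0  # per bar lifted after the run
ROW_DEADLIFT_CREDIT = {"men": 7.0, "women": 10.0}  # seconds per deadlift rep


def cindy_credit(rounds: int, extra_reps: int) -> float:
    """Seconds subtracted for a Cindy score of `rounds` plus `extra_reps`."""
    return (rounds * CINDY_REPS_PER_ROUND + extra_reps) * SECONDS_PER_CINDY_REP


def deadlift_bar_credit(bars: int) -> float:
    """Seconds subtracted for the number of deadlift bars lifted."""
    return bars * SECONDS_PER_DEADLIFT_BAR


def row_deadlift_time(row_seconds: float, reps: int, division: str) -> float:
    """1K row time minus the per-rep deadlift credit for the division."""
    return row_seconds - reps * ROW_DEADLIFT_CREDIT[division]


# Hypothetical athlete results, in seconds:
print(1200.0 - cindy_credit(rounds=20, extra_reps=10))    # 895.0
print(1680.0 - deadlift_bar_credit(bars=12))              # 1320.0
print(row_deadlift_time(200.0, reps=15, division="men"))  # 95.0
```

Every credit rate is a judgment call about how many seconds a rep or a bar is worth, which is the same margin-weighting problem the article raises for proportional scoring.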

24. Kody King wrote...

"2. We use a combination of finishing place and standard deviation, while having each event weighted exactly the same.

This could be accomplished by making the maximum score for each event 100.

Fifty percent of that score would be achieved by normalizing the standard deviations so that the first-place finisher receives 50 points and the last-place finisher receives 0 points. Everyone in between would receive a normalized version of where they placed relative to these two competitors and relative to the average score for the event. This sounds complicated, but the SC Regional just used standard deviation as their only method of scoring. Using standard deviation allows us to factor in an athlete's margin of victory over his/her competitors. Normalizing it to values between 0 and 50 would weight each event equally.

The other fifty percent of the score would be a normalized version of the competitor's rank. If a competitor finishes first out of a field of twenty-five, he would then receive a raw score of 25. That number would then be plugged in for "y" in the formula ((y/25) x 50). So, half of his score would be ((25/25) x 50), which equals 50. There are 25 total competitors, so each competitor's raw score (25 for first, 24 for second, and so on) would also be divided by twenty-five, so that the tenth-place competitor would receive ((16/25) x 50) for half of his score. Half of the last-place competitor's score would be ((1/25) x 50).

This process would award competitors in two ways. First, by using a normalized standard deviation, it awards “WODKILLAs” for how far ahead of the field of athletes they are on a WOD. Second, it also awards competitors using place order, so that athletes who “edge out” their competitors in the heat of battle are rewarded for that effort.

This method also assures that each of the events is weighted equally over the course of a competition."

The problem with the second solution is that people may see it as too complicated; I see it as a way to reward great athletes for performing well.
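A sketch of the second proposal under stated assumptions: "normalizing the standard deviations" is read here as converting each result to a z-score and rescaling so the best result gets 50 and the worst gets 0; times are flipped so faster is better; and ties are ignored. The rank half follows the ((y/25) x 50) formula, generalized to a field of any size n. Function names and results are hypothetical:

```python
from statistics import mean, pstdev


def sd_half(results: dict[str, float], lower_is_better: bool) -> dict[str, float]:
    """Standard-deviation half: z-scores rescaled so the best gets 50 and the worst 0."""
    values = list(results.values())
    sd = pstdev(values)
    if sd == 0:  # everyone tied; giving all 50 is an arbitrary choice for this sketch
        return {a: 50.0 for a in results}
    mu = mean(values)
    sign = -1.0 if lower_is_better else 1.0  # flip times so a higher z is better
    z = {a: sign * (v - mu) / sd for a, v in results.items()}
    lo, hi = min(z.values()), max(z.values())
    return {a: 50.0 * (s - lo) / (hi - lo) for a, s in z.items()}


def rank_half(results: dict[str, float], lower_is_better: bool) -> dict[str, float]:
    """Rank half: raw score n for 1st down to 1 for last, scaled by (y / n) * 50. No ties."""
    n = len(results)
    ordered = sorted(results, key=results.get, reverse=not lower_is_better)
    return {a: 50.0 * (n - place + 1) / n for place, a in enumerate(ordered, start=1)}


def event_score(results: dict[str, float], lower_is_better: bool = True) -> dict[str, float]:
    """Combined 0 to 100 event score: standard-deviation half plus rank half."""
    sd = sd_half(results, lower_is_better)
    rk = rank_half(results, lower_is_better)
    return {a: sd[a] + rk[a] for a in results}


times = {"W": 250.0, "X": 300.0, "Y": 305.0, "Z": 310.0}  # hypothetical seconds
print({a: round(s, 2) for a, s in event_score(times).items()})
# {'W': 100.0, 'X': 45.83, 'Y': 29.17, 'Z': 12.5}
```

Because rescaling between the best and worst z-scores removes both the mean and the standard deviation, the standard-deviation half under this reading equals a straight-line interpolation of the raw results between the best and worst finishers. With a field of 25, the rank half gives 50 for first, 32 for tenth, and 2 for last, matching the arithmetic in the comment.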

