Ultra Running Races- shallow competition? Part 2

A question addressed to Part 1 of this post indicated that the development of a definition of competitiveness was lacking. I agree and provide the following in an attempt to elucidate the basis in a more complete way. The “statement of competitiveness” in Part 1 of this post is the following:

“The most defensible metric for such comparisons is the distribution of finishers as a function of percent back from the winning time.”

There is much background to this statement, but I will summarize by pointing out that what ‘competitive’ means is often misunderstood. An individual sport is competitive when the tail of the performance distribution is populated, i.e. the best talent has been attracted to the sport and this talent regularly performs at or near the individual’s attainable level. This axiom is based on the reality that individual athletic performance is the result of the interaction of many variables (talent (however one wishes to define it, but likely to be based on physiology), training, stress (physical and emotional), health, experience, etc.) and yields an approximately normal (or Gaussian) distribution of performance. Such a distribution has a ‘high performance tail’ that informs one as to the extent of the competitiveness of the sport. This tail is populated by athletes who perform at 3+ standard deviations from the mean. Here we are speaking of athletes whose performance places them in roughly the top 0.13% (the often-quoted 0.27% counts both tails of the distribution). If this tail is populated, and provided there is a sufficient number of participants in the sport to ensure validity, then it is a direct measure of how competitive the sport (or event) is. A sometimes unappreciated corollary of this approach is the reality that such ‘3 sigma’ athletes are rare, very rare, and that the most competitive races will therefore reliably (although not always) exhibit this high performance tail. The 2011 London Marathon data presented in Part 1 of this post are indicative of a highly competitive event where the high performance tail of the distribution is nicely defined. Just ask someone like Max King or Sage Canaday (or any 2:15 marathoner) exactly how superior a 2:05 marathoner is- if they do not use the term ‘exponentially’, then they should, as the differences are parametrized by an exponential function.
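As a quick check on how rare a ‘3 sigma’ athlete is, the tail fraction can be computed directly. This is only a sketch that assumes performance is exactly normally distributed; the function name and the field size of 35,000 are hypothetical and chosen purely for illustration.

```python
import math


def upper_tail_fraction(z):
    """Fraction of a standard normal population lying more than
    z standard deviations beyond the mean, on one side only."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# One-sided tail: the athletes who are 3+ sigma better than the mean.
one_sided = upper_tail_fraction(3.0)

# Two-sided tail: 3+ sigma from the mean in either direction.
two_sided = 2.0 * one_sided

# Hypothetical field size, used only to turn the fraction into a head count.
field_size = 35000
expected_tail_athletes = field_size * one_sided

print(f"one-sided tail: {100 * one_sided:.3f}% of the population")
print(f"two-sided tail: {100 * two_sided:.3f}% of the population")
print(f"expected 3-sigma athletes in a field of {field_size}: "
      f"{expected_tail_athletes:.1f}")
```

Under these assumptions even a very large field would be expected to contain only a few dozen such athletes, which is why a small race can easily have none.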

To further substantiate this approach, presented below is a table of the percent back analysis described in Part 1 for a few additional open* races- the 2012 Chicago Marathon, the 2012 Pikes Peak Marathon, and additional time series data for the Western States Endurance Run (2004-2011).

[Table (image not reproduced): percent back analysis for the races listed above]

Note the very low proportions of competitors finishing in the top 10% through the top 30% in the road marathons compared to the trail marathon and ultramarathon events. This supports a view that the competitive nature of these events is likely at an immature state relative to road marathons. These data are not surprising given that participation in trail and ultramarathon events is just now growing at a (seemingly) fast rate, whereas participation levels in road marathons have essentially plateaued (or are in slight decline). It is expected, as the trail and ultramarathon sports continue to grow, that the ‘high performance tail’ will have an increasing probability of populating and that statistically ‘superior’ performances will be extant, just as they are in road marathons.
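The percent back binning behind such a table can be sketched as follows. This is a minimal illustration and not the code used to produce the table: the function names, the 10% bin width, and the example finishing times are assumptions, while the 110% truncation is the one mentioned in the comments below for the marathon data sets.

```python
import math


def percent_back(times):
    """Percent behind the winning (fastest) time for each finisher."""
    winner = min(times)
    return [100.0 * (t - winner) / winner for t in times]


def percent_back_distribution(times, bin_width=10.0, cutoff=110.0):
    """Percentage of retained finishers in each percent-back bin.

    Finishers at or beyond `cutoff` percent back are dropped before
    the shares are computed, so the shares sum to 100.
    """
    kept = [p for p in percent_back(times) if p < cutoff]
    n_bins = int(math.ceil(cutoff / bin_width))
    counts = [0] * n_bins
    for p in kept:
        # Guard the index in case cutoff is not a multiple of bin_width.
        counts[min(int(p // bin_width), n_bins - 1)] += 1
    return [100.0 * c / len(kept) for c in counts]


# Hypothetical finishing times in minutes, for illustration only.
example_times = [125, 130, 136, 150, 160, 175, 200, 240, 260, 300]
shares = percent_back_distribution(example_times)

for i, share in enumerate(shares):
    low = i * 10
    print(f"{low:3d}-{low + 10:3d}% back: {share:5.1f}% of finishers")
```

Because each bin is expressed as a share of the retained finishers rather than a raw count, races with very different field sizes can be laid side by side.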

The existence of a group of competitors who compete at a similar level and compose a ‘winning’ population in a sport does not mean that the sport is competitive. Examples of this have been seen in triathlon and cross country mountain biking. The early days of competition in these sports yielded winning finishing times (or speeds, in the case of mountain biking) that today, only 20-30 years later, are mediocre. Technology certainly played a role in the decreased times and increased speeds but, more importantly, these sports went through substantial growth coincident with significant improvements in performance. This is the result of the attraction of competitors to these sports whose ability and focus allowed for the continued challenge of what would, in times past, be considered ‘superior’ performance. These athletes pushed the boundaries of what was considered ‘possible’, i.e. the tail of the performance distribution was populated and rare, ‘3 sigma’, athletes became an integral part of the sports. There is every reason to expect that the same will obtain in trail and ultramarathon disciplines as participation goes mainstream. Given the current distributional data on ‘percent back’ from winning times (examples of which are provided above), it is apparent that the sport of trail ultra running is, from a performance perspective, in its infancy. As stated in Part 1:

“If the 2011 London marathon data are indicative, these ‘sharp end’ ultramarathon competitors would be few and significantly better than the rest.”

At this point, statistically, the ‘sharp end’ competitors in ultramarathons are many and not that much better than the rest.

These data and the associated analysis are provided to add a data-based entry into the ongoing discourse on competitiveness in ultramarathons. These comments are specifically intended to offer an objective view, independent of individual athlete references.

*note: ‘Open’ races are those that do not involve any world-class time qualification for participation; ‘closed’ races such as world championships, Olympics, and numerous other races typically involve time or team (and sometimes both) qualifications. However, there are data to suggest that even the ‘selected’ populations in ‘closed’ races are self-similar to the at-large populations and therefore exhibit a normal (Gaussian) distribution, in which case the arguments above would apply. This is a subject for another post.


6 thoughts on “Ultra Running Races- shallow competition? Part 2”

  1. Interesting, but some apparent flaws:
    1. The cutoff for WSER and other ultras is essentially ~1.9 or so of the winning time, vs. 3x or so for marathons. This makes the direct comparison of unscaled %-back suspect.
    2. Similarly, you address “open” vs. “invite” races, but even if a Gaussian distribution applies, why would we expect raw percent back to be the applicable metric, then?
    On these two, I think we would benefit from std dev instead.

    3. By this metric, looking at the time series, it suggests that WSER is getting *less competitive* over time, which is opposed to your observation, “these sports went through substantial growth coincident with significant improvements in performance.” The improvements in performance in ultramarathons over time, I think, can be more easily justified (course records, popularity) — but goes against the distribution trend that you mention.
    4. “The existence of a group of competitors who compete at a similar level and compose a ‘winning’ population in a sport does not mean that the sport is competitive” — given the choice, I would argue the opposite, and against the notion that a few outlying “elites” necessarily means *more competitiveness.*
    Aren’t the close-fought battles (“Duel in the Sun”) more competitive than runaway victories?
    Wouldn’t most people consider “parity” in team sports (a close final score, victory percentages closer to 0.5) to be more “competitive” than runaway victories?

    In fact, I would suggest that the use of the word “competitive” in most disciplines (economics, biology, etc.) would tend to favour definitions in which the players are more evenly matched. The opposite would be “uncompetitive,” which would have players farther away from victory.

    OK, it is a good summary of the trends and differences, but I do not buy the metric as a valid comparative measure of competitiveness. This could be strengthened or changed, perhaps, if you wanted to argue “professionalism” and “specialization,” in which we would observe a growing trend of a larger group of people making continued progress over contemporary and past peers.
    Thanks for the diversion!

    • Hi Mike,

      Thanks for the thoughtful comments. The post is an analysis in process and input is much appreciated! Here are some answers- and some questions:

      1. Not sure what you are asking, but the truncation of the marathon data sets was conducted at 110%, or about 2X the winning time.

      2. I am using the percent back as a measure of the relative performance of the competitors in the race. ‘Percent back’ is commonly used in many sports to assess existing and developing talent. As it is a direct derivative of finishing time, I see no loss of information by using it. Percent back also gives a calibration to absolute performance on the day and a more usable metric (percentage) for training development. The std dev is, of course, useful, but I was interested in the shape of the distribution, given the ‘bunching’ at the high performance end of the distribution in the ultramarathon data. I have since dragged out some stats software from my previous life and have found that the marathon data better fit a log normal distribution, indicating that the characteristics controlling performance are multiplicative rather than additive. Work in progress. The ultramarathon data have the ‘bunching up’ at the high performance end, which does not fit well. However, I have done a bit more research and found this interesting article that describes just such a ‘bunching’ of high performance in aggregated marathon data:


      They suggest a sociological origin…. still thinking about this. Such ‘bunching’, although of a much greater magnitude, is quite evident in virtually all ultramarathon data that I have analyzed to date.

      3. If you run an analysis of the WSER time series data there is no significant change in the distributions of percent back within the sample. There are outliers, but with any reasonable confidence interval (I will suggest a 90% CI), there is no statistical difference. The significant difference is between the ultramarathon data and the marathon data. I see these ultramarathon data as indicating that the shape of the percent back distribution is ‘blunt’ at the high performance end- something that I explained might be due to, or indicative of, a competitive environment that is not fully developed. Perhaps the sport has yet to see the results of the growth on competitiveness- there could be a lag time or a transition period… or sociological elements may play a role… or other things we have yet to think of.

      4. Yes, parity in team sports is essential to the ‘economics’ of such sports, otherwise who would care. Much of the literature on ‘competitiveness’ is on just this subject, primarily because of the multi-100 billion $ industry which is extant in team sports. Unfortunately there is not a lot of research on individual sport competitiveness, hence my ‘shot’ at it. I think competitiveness in individual sport is a different calculus than for team sport, and particularly so for such a ‘pure’ sport as running. One view, as I tried to present it here, is that excellence in individual sport is defined by how deep a given performance is into the tail of the performance distribution- that these deep-into-the-tail performances set the competitive context for all other athletes. As argued, using a simple approach that realizes that individual athletic performance is the result of an interaction of a large number of variables, normal (or perhaps, more accurately, log normal) statistics will control the shape of the performance distribution. Such statistics define an exponential tail- something that does not seem to be present in the ultramarathon data that I have, so far, analyzed. Once again, work in progress…

      I plan to continue the development of a ‘competitiveness’ metric for ultramarathons, if for no other purpose, than to satisfy my own curiosity. If any progress is made I will post it here. Given the paucity of data-based discourse on the subject, I hope that this and future posts are taken as attempts to add positively to understanding ‘competitiveness’ in ultramarathons (and marathons as well) and not as any sort of derogatory commentary on the sport.

  2. Couple comments – first, the 2011 WSER data you used in your original analysis is clearly an outlier; the percent finishing in the first 10% back is far higher than any of the other races. Your data above makes a pretty compelling counter-argument to the entire original post (at least as far as the matter of degree). Second, given the nature of WS100, with qualifying standards, a small field, and a (comparatively) larger proportion of that field consisting of ‘elites’ who are there to shoot for a top finish, it’s not clear that a comparison with a major road marathon like London or Chicago is appropriate. A more relevant analysis might compare Western States runners with finishers at Boston – including *only* runners who gained entry by qualifying. Another important step to take would be to include the DNFs somehow. The rank and file runners who struggle at a marathon can easily walk the second half and finish, if that’s their goal. When was the last time London saw a 70% finish rate? Chopping off the slow end of the London distribution helps, but it also implicitly assumes the DNFs at States would have finished after 30 hours anyway. If you plunk them back in and stick them between 25-30 hours, it changes the shape of the distribution and the implications. All that said, I appreciate what you’re trying to do, but a simple glance at the numbers you’ve included above shows how variable data for even a single race can be. And wouldn’t an equally valid explanation for the numbers be that only athletes of a certain quality participate in 100 mile runs? You can’t simply rely on a definition of ‘competitive’ to give your argument weight. I’m not necessarily disagreeing with you, just making some observations.

    • Hi Ethan,

      Thanks for the encouragement… as I said to Mike above, I am just trying to put some positive data-based information and analysis in place.

      I do not think that WSER data for 2011 is an outlier. I will just point to the WSER 2006 data as an example and see my response to Mike above for additional information.

      Not sure I understand the counter-argument you suggest and the logical process involved.

      w/r/t WSER, yes, it is a relatively small field but since any errors are going to be proportional to n^(1/2), the 300 or so participants represent a reasonable population for testing. In the case of the selection process, I think maybe 30 of the participants are ‘invited’ (previous top 10, if they choose to participate, and the Montrail Cup qualifiers, if they choose to participate); the rest are from the lottery, so I do not see the race as dominated by elites. In fact many ‘elite’ and national-class runners complain that they cannot get in. If you do look at a sample of Boston data, you will find that the distribution has the expected exponential tail- and interestingly, sometimes has the ‘bunching’ at the high performance end mentioned above in my response to Mike. I am currently looking into this as this may bear some fruit for understanding the ultramarathon data.

      Still working on developing good metrics for ‘competitiveness’ in ultramarathons, so any suggestions will be appreciated. Thanks for taking the time to read and comment on what I posted!

  3. Thanks, this does clear things up, but one thing that I think is important to keep in mind is the Western States entry process. Pretty much every runner who wants to run who has a chance at being in the 10%-20% back groups has the chance to run (either through the previous top 10 exception, the ultra cup qualifiers, or via special consideration). In this sense pretty much all of the few thousand runners who want to run each year, but don’t get in via the lottery would be down in the slower parts of the race. If WS was open to everyone who wanted to run it the numbers would look a lot more like the numbers for UTMB, which are more or less similar to the marathon numbers you point out. Yes, the WS race size seems on the surface that it would be enough participants to be a decent statistical comparison, but when you consider that the entry process is set up the way it is, I think the fact that the race is limited to 350 runners (but all top runners who want in can get in) is the only reason that these numbers you present look so skewed. The ironic thing is that WS uses the entry process they use to try to ensure as competitive a race as possible, but this actually leads to making the race look much less competitive (using your system) than it would if they just took 350 runners at random without preference to faster runners. This is where I think your system of measuring competitiveness is extremely flawed. Basically the more middle and back of the pack runners that a race has (in comparison to the number of runners it has within the top 30% back or so), makes it look “more competitive.” The only other way for it to look significantly more competitive using this system of rating is for someone to run 20 or 30% faster than the current winning times, something which is never going to happen.
     Even if someone ran WS 10% faster than the current fastest times (something which is also very unlikely to ever happen, but at least within the realm of the possible), the numbers for WS would still make it look much less competitive than a large marathon. This wouldn’t be because a 13:15 (10% below current race record) at WS isn’t as unusual or as impressive as a 2:05 marathon (in comparison to the typical runner), it would be because WS entry system doesn’t allow the vast majority of typical runners that want to run into the race, and because it has a time cut off that is little more than 100% behind the front runners. Yes, the fastest marathon runners in the world are insanely impressive athletes, but a huge part of what makes these marathons you highlight look so much more competitive on paper than WS has little to do with true competitiveness and more to do with an entry system that allows thousands and thousands of folks who would not have a chance in a million of finishing WS within the time cutoff to go out and plod along for 5+ hours. I’m not saying that high profile road marathons aren’t more competitive than WS, but I don’t think it’s anywhere near to the extent that you highlight here.

    • Hi Geoff,

      Thanks for the thoughts- they help in moving toward a better understanding of the ultramarathon data.

      I will encourage you to read the paper I linked in the response to Mike above, repeated here:


      You will see a possible effect that the WS selection process might have on the finishing time data. This is perhaps revealed in the Boston marathon data in that article. Compared to the ‘open’ marathons the finishing times are shifted to the left and there appears a ‘bunching’ at the high performance tail. This ‘bunching’, or ‘crowding’ as the authors put it, is of a much lower magnitude than that which we see in the ultramarathon data. The authors note a leveling out of the data in the expected exponential tail of the ‘open’ marathons and, in addition, a small peak in the Boston data. This leveling and/or peaking in the tail is not expected from a uniform random population of marathon runners. Note however, that the leveling and/or peaking is very slight in the marathon data but very prominent in the ultramarathon data.

      There may be some fundamental reason that the ultramarathon data are so different than the marathon data. Sample size is not a likely reason as a population of 300 or so is sufficient to yield results with sufficient confidence. Also the ‘percent in bin’ data is essentially corrected for population size so comparisons are valid. I suspect that there are other factors that may account for the bluntness of the ultramarathon distribution data, competitiveness being at least one worth looking at.

      Your point about ‘plodders’ finishing a marathon is well placed; however, the analysis protocol truncates the marathon populations at 110% or about 4 hours- a finishing time that is perhaps the equivalent of a 30 hour finish for a WSER competitor. Remember there are a few 70 year olds finishing WSER in less than 30 hours (I think the record is about 28 hours)- the 70-75 age record for the marathon is about 3 hours. w/r/t ‘packing’ of the population with middle and back-of-the-pack runners in the marathons, I see no basis for assuming that the population competing at WSER would necessarily have a significantly lower proportion of middle and back-of-the-packers than a marathon; over 90% of the participants are chosen by lottery. The proportional number of middle-of-the-pack competitors in a marathon should be similar to that in an ultramarathon.

      As I indicated in the other responses to this post, I truly appreciate the input and will continue to go forward with attempting to understand the unusual shape of the distributions of finishing times in ultramarathons. Perhaps a greater mind than mine will provide clarity; in the meantime I continue to ‘plod’.
