Competitiveness in Running Races – Part IV – Ultramarathons

Logical validity is not a guarantee of truth.

David Foster Wallace

 

At the outset of this series of posts, I noted that many discussions of the competitiveness of running races, and particularly of ultramarathons, lack any analytical context upon which one might rely to assert whether or not a race was “competitive”. In the prior three posts a methodology was developed (Part I) and successfully applied to known competitive races at the road marathon (Part II) and road 10 km (Part III) distances. The methodology uses the finishing time and finishing rank order as input data to define two normalized variables: the percentage back from the winning time (from the finishing time data) and the cumulative probability/percentile rank (from the finishing rank order). Plotting these variables against one another for each race yields a graphical representation of the performance distribution for that race. Because the data are normalized, robust comparisons can be made with other races.

Analysis of the functionality of this performance distribution typically leads to a simple exponential function of the form:

y = a • exp (b • x)
where:
x = percentage back from winning time for the cohort
y = cumulative probability of the result in the cohort
a = a pre-exponential factor inversely proportional to the excellence of the winning time relative to the cohort
b = the exponential factor directly proportional to the competitiveness of the cohort
It has been found that, with the exception of the Falmouth Road Race, an exponential performance distribution is extant in all marathon and 10 km road races analyzed. An exponential functionality is expected as the analysis is parameterizing the high performance tail of a normal distribution of competitors. The “steepness” of the exponential function (controlled by the magnitude of “b” in the equation above) is directly proportional to the competitiveness. This allows for the definition of a competitive index (CI) as equal to “b” in the functional form outlined above. Calculation of “b” and comparisons with other races and other race types allows for an analytical basis for assessing the competitiveness of a given race. An extensive comparison of marathon races is provided in Part II of this series and comparison of two well known 10 km races is provided in Part III.
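The steps above can be sketched in a few lines of Python. This is a minimal illustration with NumPy on synthetic data; the function names and the example values of “a” and “b” are mine, not taken from the races analyzed here:

```python
import numpy as np

def percent_back(times):
    """Percentage back from the winning (fastest) finishing time."""
    t = np.sort(np.asarray(times, dtype=float))
    return 100.0 * (t - t[0]) / t[0]

def cumulative_probability(n):
    """Percentile rank (cumulative probability) for finishers 1..n."""
    return np.arange(1, n + 1) / n

def fit_ci(x, y):
    """Fit y = a * exp(b * x) via linear regression on log(y).

    Returns (a, b); b is the competitive index (CI).
    """
    b, log_a = np.polyfit(x, np.log(y), 1)
    return np.exp(log_a), b

# Synthetic illustration: a field whose distribution follows CI = 0.12
x = np.linspace(0.0, 25.0, 20)           # percentage back, 125% cohort
y = 0.05 * np.exp(0.12 * x)              # cumulative probability
a, b = fit_ci(x, y)
print(f"a = {a:.3f}, CI = b = {b:.3f}")  # recovers a = 0.050, CI = 0.120
```

Fitting the logarithm of the cumulative probability turns the exponential fit into an ordinary linear regression, so “b” (the CI) falls out as the slope of that line.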
The Falmouth Road Race typically exhibits a linear performance distribution and it is suggested that this is due to the presence of a “stacked field” of competitors assembled by the race directors. As a result, the performance distribution is not exponential since the field of competitors in the high performance tail is not representative of the tail of a normal distribution. Rather, this “non-normal” group of high performance competitors out-perform the expected exponential distribution. This is because the race has artificially assembled the high performance end of the population of competitors and “stacked” the field. Such races may represent the practical ultimate in competitiveness of a given race.
In this post an assessment of the competitiveness of ultramarathons is presented.

Ultramarathons

A selection of ultramarathon events has been made to represent the wide variety of such races. Presented here are analyses of:

  1. Western States Endurance Run – a mountainous, primarily trail 100 mile race
  2. Wasatch 100 – a very mountainous 100 mile trail race
  3. JFK 50 – a trail, towpath, and road 50 mile race
  4. Leadville 100 – a high altitude, mountainous trail and dirt road 100 mile race
  5. Ultra Trail du Mont Blanc (UTMB) – a very mountainous, primarily trail 100 mile race
  6. Comrades Marathon – a road ultramarathon
  7. Pikes Peak Marathon – a marathon-length, mountainous trail “ultramarathon”
  8. The North Face Endurance Challenge 50 Mile Championship – a late season (December) trail 50 mile race that typically draws a large proportion of sponsored full-time athletes

The Comrades Marathon and Pikes Peak Marathon will be addressed separately, as Comrades is a road ultramarathon (56 miles/90 km) and the Pikes Peak Marathon is a marathon-length “ultramarathon” (for the purposes of the analysis presented here, the Pikes Peak Marathon is considered an ultramarathon because of its 7000 ft (2100 m) of both climbing and descending).

A great majority of the ultramarathon data fit well to the exponential performance distribution, as is observed in the overwhelming majority of the other races analyzed in this study. However, in contrast to all of the road marathon races analyzed, there are numerous ultramarathon races that exhibit a linear functionality in some years. Presented below is a summary of the data for the eight trail, hybrid, and road ultramarathons analyzed.

[Table: ultramarathon parametrics, all events (including Leadville)]

[Table: parametrics for Comrades, Pikes Peak, and NFEC]

Among these data are quite a few events that are best described by a linear relationship. As shown in Part III, such linear functionality can be the result of a “stacked field” of competitors in which the expected normal distribution is perturbed at the high performance tail. The data for each ultramarathon will be discussed separately below.
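One plausible way to decide whether a given year is better described by a linear or an exponential performance distribution is to fit both forms and keep the one with the higher coefficient of determination (R²). This is a sketch of that idea on synthetic data, not necessarily the exact criterion used in this study:

```python
import numpy as np

def r_squared(y, y_pred):
    """Coefficient of determination of a fit."""
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def classify_distribution(x, y):
    """Fit both forms and return ('linear', slope) or ('exponential', b),
    whichever has the higher R^2. y must be positive for the log fit."""
    m, c = np.polyfit(x, y, 1)                 # linear: y = m*x + c
    r2_lin = r_squared(y, m * x + c)
    b, log_a = np.polyfit(x, np.log(y), 1)     # exponential: y = a*exp(b*x)
    r2_exp = r_squared(y, np.exp(log_a + b * x))
    return ("linear", m) if r2_lin >= r2_exp else ("exponential", b)

x = np.linspace(0.0, 25.0, 20)                        # percentage back
print(classify_distribution(x, 0.04 * x + 0.02))      # a "stacked field" year
print(classify_distribution(x, 0.05 * np.exp(0.12 * x)))  # a "normal" year
```

In either case the competitiveness index is the returned coefficient: the slope for a linear year, the exponential factor “b” for an exponential year.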

Western States

The Western States Endurance Run is a long-running (since 1973), very popular, and generally accepted de-facto 100 mile trail ultrarunning championship race (although no “championship” award is given). The race is also one of the four races comprising the “grand slam of ultrarunning”. Due to the popularity of this race, a lottery system for entry has been in place for quite some time. The field is limited to about 350 due to USFS Wilderness permit restrictions on a tiny portion of the course and, as of 2014, in excess of 2000 qualified applicants enter the lottery each year, so any individual applicant is unlikely to gain entry. As a result, the starting field of competitors is not necessarily representative of the ultramarathon population and may be skewed to some degree. One part of the population that can be compromised in such a system is the most competitive portion- the part that is the subject of this series of posts. However, starting in 2007, Western States entered into an agreement with presenting sponsor Montrail under which top finishers in a series of races known as the Montrail Ultra Cup gain direct entry to Western States. This has greatly increased the probability that top competitors get into the race.

Turning to the data on Western States presented above (and individually below), there is a dramatic change in the functionality of the performance distribution that is directly aligned with the Montrail Ultra Cup entry process. Starting in 2007, the performance distribution of the 125% cohort follows a linear relationship, whereas prior to 2007 the expected exponential functionality is extant.

[Table: Western States parametrics]

As with the Falmouth Road Race fields, this linear relationship is indicative of a “stacked field”. The temporal correlation of this change in functionality with the Montrail Ultra Cup direct entry process for top competitors, although not necessarily causal, suggests that the process has produced a consistent crop of competitors that effectively “stack” the field. As an example, presented below is a plot of the cumulative probability versus the percentage back from the winning time for the 2014 Western States race. Both linear and exponential fits are shown; clearly the data are best fit by a linear relationship. Note also that, just as in the Falmouth Road Race results, the competitors in the 2014 Western States race out-perform the equivalent exponential distribution, meaning that this field of competitors is of a higher caliber than would occur with a more random selection process.

[Figure: 2014 Western States, cumulative probability vs. percentage back, with linear and exponential fits]

Presented below is a plot of the same data as above together with the data from the 2004 Western States race (pre-Montrail Ultra Cup direct entry). This comparison is representative of the entire dataset. The performance distribution comparison shows that the 2014 field substantially out-performed the 2004 field, meaning that the 2014 race was more competitive than the 2004 race.

[Figure: 2004 vs. 2014 Western States performance distributions]

Comparison of results from the Western States 100 2004 (pre-Montrail Ultra Cup entry process) and 2014 (post Montrail Ultra Cup entry process). The different functionality is indicative of a “stacked field” in the 2014 event.

Finally, presented below is a comparison of the 2014 Western States data and the 2009 Falmouth Road Race data showing a very similar performance distribution with essentially the same slope. The slope of the linear fit to the data provides the same competitiveness metric as the exponential factor “b” described above, i.e. the slope is the competitiveness index for these fields where a steeper slope indicates a more competitive field. What this means is that the 2014 Western States race was, within a reasonable error estimation, as competitive as the 2009 Falmouth Road Race, a race which is one of the most competitive 10 km road events in the world.

[Figure: 2014 Western States vs. 2009 Falmouth Road Race performance distributions]

The calculated slopes for the 2007-2014 Western States races (there was no race in 2008) are 0.039, 0.036, 0.037, 0.040, 0.041, 0.033, and 0.039, respectively. With the exception of the very hot 2013 race, there has been a general increasing trend in the competitiveness of the Western States race, something that has been discussed anecdotally within the ultramarathon community for the past few years. This is corroborated by an observed decrease of about 8% in the average finishing time of the 125% cohort studied here- once again with the exception of the hot 2013 race.

[Figure: Western States finishing times, 2007-2014]

Finishing times for the Western States Endurance Run, 2007-2014 (race was cancelled in 2008) showing reduction of about 8% in the average finishing time of the 125% cohort over the period.

It is noted that although a linear finishing time distribution is indicative of a “stacked field” of high performance competitors, such a linear relationship could obtain in the less probable instance of a disproportionate number of comparatively lower-performing competitors being in the 125% cohort. This would lead to a significantly lower slope and should therefore be identifiable. There is no evidence that such a low-performance linear distribution is extant in any of the data analyzed in this 4-part series.

Prior to 2007 and the inclusion of competitors from the Montrail Ultra Cup designated slots, the Western States race exhibits uniform adherence to the expected exponential functionality, as seen in competitor populations that are not “manipulated”. Prior to 2007, the Western States competitor slots (with the exception of those competitors who finished in the top ten the prior year and chose to return) were filled via a lottery. The results of these lotteries appear to represent a random cross section of the competitor population; otherwise a non-exponential functionality would likely be in evidence. The pre-2007 races analyzed here have CIs in the 0.111-0.125 range with an average of about 0.118. When compared to the “big 5” marathons and the 10 km road races analyzed in Parts II and III, the pre-2007 Western States races were, on average, significantly less competitive (about 20%-30% less competitive).

The introduction of competitor entry via the Montrail Ultra Cup events has significantly increased the competitiveness of Western States to a level on par with one of the most competitive 10 km-type road races (the Falmouth Road Race). In addition, during this period the course record has been broken and reset twice, so not only has the race been very competitive in the post-2007 period, it is also a very fast race, once again similar to the fast and highly competitive Falmouth Road Race. Western States sets a high standard for competitiveness in ultramarathons.

Wasatch 100

The Wasatch 100 mile trail race was chosen for this study because it represents one of the more difficult mountain trail races (with over 25,000 feet (7600 m) of climbing and a similar amount of descending). In addition, Wasatch is one of the very few 100 mile mountain trail races that has been run on the same course for an extended period. This allows for transparent and robust aggregation of data from numerous years. Such aggregated data will be analyzed and presented in Part V (Syntopicon). Wasatch 100 is also one of the 4 “grand slam of ultrarunning” races.

Presented below are the competitiveness parametrics for the Wasatch 100 race for the study period.

[Table: Wasatch 100 parametrics]

* the data for 2013 fit linear and exponential functionalities nearly equally well, with the exponential functionality giving a slightly better fit

The Wasatch 100 race has a course record time of 18:30:55, set by Geoff Roes in 2009. This compares to the Western States Endurance Run record of 14:46:44, set by Timothy Olson in 2012. An almost 4 hour difference in record time over a nearly equivalent course distance reveals just how difficult the Wasatch race is in comparison to Western States. Wasatch is known to be much less of a “runners'” race as there are a couple of miles more of vertical than at Western States. There are also substantial sections of steep power-hiking and equally difficult descents at Wasatch, both of which will slow even the fastest competitors. Such a race will also have a different pool of competitors, as the climbing, the altitude (+5500 feet (1700 m)), the heat (at times in excess of 100F (38C)), and the technical nature of portions of the course all filter out a good proportion of competitive runners who choose to participate in races with more “runnable” courses. Wasatch still attracts many highly competitive ultrarunners given the underlying ethos of “challenge above all else” that is the fabric of ultrarunning as a sport.

The competitiveness of Wasatch is on par with pre-2007 Western States and includes a couple of more competitive “linear” years (2001 and 2008). The 2001 winning time of 21:44:38 is the slowest winning time in the study period- about 5% slower than the next slowest and over 17% slower than the 2009 record of 18:30:55. So although the 2001 race was competitive, it was not a particularly fast race. The 2008 winning time of 20:01:07 is relatively fast and, combined with the linear functionality of the percent back distribution, makes that race a fast and competitive example for the Wasatch 100. However, the Wasatch race is primarily a regional event and does not routinely draw a large group of known high-level competitors, so a highly competitive, linear finishing time distribution does not necessarily mean a fast race, as noted above. Of course, the weather can (and does) have significant negative effects on finishing times, so that should be considered as well.

For the 12 “exponential” years in the study period, Wasatch has an average CI of about 0.110, compared to an average of about 0.118 for the pre-2007 (“exponential”) period at Western States. This makes Western States about 6% more competitive than Wasatch in these years. Neither the Wasatch nor the pre-2007 Western States races are as competitive as the “Big 5” road marathons or the sub-elite road marathons analyzed, where the minimum average CIs are 0.132 (New York) for the “Big 5” and 0.131 (Columbus) for the two sub-elite races. These road marathons are, on average, about 11% more competitive than Western States and about 16% more competitive than Wasatch. The two linear years, 2001 and 2008, exhibit slopes of 0.035 and 0.039, respectively. These values are similar to, but generally lower than, those found at Western States, although given the much smaller number of competitors in the 125% cohort in these years, the associated error is significantly greater; it is best to keep this in mind when making comparisons.

Western States has an average of 19 finishers in the 125% cohort whereas Wasatch has an average of about 14. Western States has a field size of about 375. The Wasatch 100 field has grown during the study period from about 200 to about 325. Although starting field size plays some role, the smaller number of finishers in the 125% cohort at Wasatch is at least partly due to the difficulty of the course, where the slower pace naturally leads to larger multiplicative time differentials.

JFK 50

The JFK 50 is a long-running 50 mile ultramarathon race first run in 1963 with 4 finishers and now with typically over 1000 finishers. The course is a hybrid of trail, gravel road (towpath), and road and is very “runnable”. This race was also chosen because of the longevity of the race on a stable course route thereby enabling aggregation of data.

Presented below are the competitiveness parametrics for the JFK 50 race for the study period.

[Table: JFK 50 parametrics]

As can be gleaned from the table, the JFK 50 has a similar level of competitiveness to Western States and Wasatch in the “exponential” years. The slopes in the linear years are 0.040, 0.040, 0.038, 0.038, and 0.041 for 2001, 2002, 2008, 2009, and 2011, respectively- all similar to those in the linear years of Western States. It is noted that although the JFK 50 exhibits similar competitiveness to Western States, the number of competitors in the 125% cohort is also similar (particularly in the last few years) even though the total field is about 3 times larger at the JFK 50. This indicates that Western States has a proportionally deeper field in the 125% cohort, likely as a result of the Montrail Ultra Cup entry process. However, the competitive depth in both races, in an absolute sense, is essentially the same.

UTMB

Ultra Trail du Mont Blanc (UTMB) is a 100 mile, very mountainous (31,000+ feet (9600 m) of climbing), primarily trail race and is thought to be one of the most difficult 100 mile races. The race is also viewed as being very competitive as it attracts a large group of sponsored professional athletes from around the globe. UTMB, like Wasatch, is different from Western States and the JFK 50 as it is not considered to be a “runnable” race, meaning that there are significant portions of speed hiking, technical climbs, and slow, technical descents.

The competitiveness data for UTMB are presented below; years with shortened courses are excluded.

[Table: UTMB parametrics]

UTMB exhibits a wide range of competitiveness, from very low (0.077) to values on par with some of the most competitive “Big 5” marathons (0.137). No linear years are observed. Why there is such a range in competitiveness, and no highly competitive linear years even with a high quality starting contingent, may have to do with the ruggedness of the course (lack of “runnability”) and the highly variable weather playing havoc with even the fastest, most prepared athletes. The Alps are known for rapid weather changes that can test the mettle of anyone. The number of variables that play into a competitive time at UTMB is therefore large enough to be seen in the results, independent of the quality of the field. The same may very well be the case for Wasatch and any other “difficult”, non-“runnable” 100 mile race.

Leadville 100

The Leadville 100 is a high altitude (10,000+ feet (3200 m)), mountainous (about 11,000 feet (3300 m) of climbing and descending), trail and dirt road race. This race has become very popular and the race promoters have recently established a lottery for entrance into the race (starting with the 2015 event).

The parametric data for Leadville for each year of the study period are presented below.

[Table: Leadville 100 parametrics]

The Leadville 100 is unique among the events studied here in that, prior to about 2008, the number of finishers in the 125% cohort is very small. This means that the analysis for these years carries high quantitative uncertainty; however, as explained below, some very solid conclusions can still be made about the competitiveness of this race.

First we examine 2001-2007 as, with the exception of 2003, these races have similarly low CIs and very shallow fields in the 125% cohort. It should be pointed out that this period includes both Matt Carpenter’s record time of 15:42:59 (2005) and Anton Krupicka’s two attempts in 2006 and 2007 (17:01:56 and 16:14:35, respectively) to take down that record. At the time, these finishing times were much faster than those preceding them and led to very few competitors in the 125% cohort. In fact, in 2005 the second place finisher was over 20% back, which shows just how fast Carpenter’s time was. Similarly, in 2006 and 2007 the second place finishers were 10% and 20% back from Krupicka’s times. No other event studied here shows winning times so much faster than the remainder of the field. This is indicative of a singular talent that dominates the field, which can result from the winner having a “perfect” day or from the winner simply being that much better than everyone else. It is probably a mixture of both in this case; Carpenter was clearly a “super” talent much like Jornet is today. Presented below are the percentage back versus cumulative probability plots for the 2005 and 2007 races, along with the 2004 race as a representative race from the 2001-2007 period. This plot shows just how superior the winning performances by Carpenter and Krupicka were.

[Figure: Leadville 100 finishing time distributions, 2005, 2007, and 2004]

Finishing time distributions for the Leadville 100 2005 (blue), 2007 (red), and 2004 (green) races showing how extraordinary the winning performances were in 2005 and 2007. Note, the cumulative probability values for the 0% percentage back performances (winners) for 2005 and 2007 are coincident.

Since the 2001-2007 period there have been seven sub-17 hour finishes, starting with Ryan Sandes’ 16:46:54 in 2011; in 2014 alone there were three sub-17 hour finishers. The race is drawing a deeper field of high caliber competitors and is therefore becoming significantly more competitive. This is substantiated by the appearance of more competitive “linear” years, although the CIs for those years are relatively low- 0.036, 0.037, and 0.029 for 2010, 2011, and 2014, respectively. Should the trends of the past few years continue, it is likely that Leadville will continue to become more competitive.

North Face Endurance Challenge Championship

The North Face Endurance Challenge Championship (NFECC) race is a late season 50 mile primarily trail race with significant prize money that started in 2007. This race also experiences little to no competition for runners from other similarly scheduled races and therefore this race regularly draws a high caliber, international field of professional runners and many upcoming elite runners.

The parametric data for the finishing time distributions for the 2008-2014 NFECC are presented below (I was unable to find the data for 2007).

[Table: NFECC parametrics]

All of the years studied here exhibit linear finishing time distributions and the 125% cohort is as large as for any other trail ultramarathon in this study. In addition, the finishing times for this race are very fast for a trail race- for instance, Sage Canaday averaged a 7:12 mile pace for the 2014 race in muddy conditions.

The linear finishing time distributions indicate that this race is very competitive. Combined with the consistently fast times and the deep fields, this means that the NFECC is arguably the most competitive ultramarathon studied here. A summary of the CIs for this event is presented in the following table, along with the coefficient of determination of the fit and the size of the 125% cohort. Note that in this case the CI is the slope of the linear fit to the data, as explained above.

[Table: NFECC competitive indices (linear slopes)]

The most competitive year for this race is 2012, but in that year the course was changed and shortened due to torrential rainfall, and a number of the top competitors got lost, inadvertently “shorted” the course, and were therefore DQ’d- so that result carries an “asterisk”. Nonetheless, a clear trend of increasing competitiveness is seen in the data, and the magnitude of this competitiveness is on par with the highest values observed for Western States. The field is, however, much deeper than that of Western States. This is partly because the race is 50 miles and late-race attrition is lower, but also because entry into the NFECC is essentially barrier-free for established elite runners.

“Other” Ultramarathons

Two “other” ultramarathons have been analyzed to provide additional context for comparisons:

  1. Pikes Peak Marathon- a trail marathon with 7000 feet (2100 m) of climbing and descending
  2. Comrades- a road ultramarathon

Pikes Peak Marathon

This race has a long history and has been run on the same course for many years. The 7000 foot (2100 m) climb followed by a return route down the same trail makes this marathon an “ultramarathon”. The race regularly draws top talent from the mountain and ultramarathon running world as well as a few washed-up elite marathoners looking for a new challenge. In 2014 the “ascent” race (run the day before the marathon) was the designated World Mountain Running Championship.

The parametric data for the Pikes Peak marathon during the study period is presented below.

[Table: Pikes Peak Marathon parametrics]

The competitiveness of this race exhibits a very wide range and includes some more competitive “linear” years as well. This presents a very mixed bag, with the depth of the field showing a decreasing trend, so comparisons of this race with others are best done on a year-by-year basis. The “linear” years have CIs of 0.037, 0.036, 0.038, and 0.040 for 2005, 2010, 2012, and 2013, respectively. These values are all very similar to those found in the “linear” years of Western States and the JFK 50, indicating that in these years the Pikes Peak Marathon is similarly competitive. I plan to do a more extensive post on the Pikes Peak Marathon in the near future, as there are numerous interesting results when the races from the 1990s (i.e. the “Matt Carpenter era”) are included in the analysis.

Comrades

Comrades is a long-running, very well known, highly popular 56 mile (90 km) road ultramarathon that draws an international field of high caliber athletes, making it a good choice for comparisons with trail and mountainous trail ultramarathons. The race is run in opposing directions in alternate years- one year “up” and the next year “down”.

The parametric data for the Comrades race during the study period are presented below.

[Table: Comrades parametrics]

As expected the Comrades race has very similar results to that of the “Big 5” marathons, both with respect to the competitiveness and to the depth of the field in the 125% cohort. The competitiveness is on par with all of the “Big 5” marathons and encompasses a range of CI values that are essentially the same.

There is no apparent difference in competitiveness on the alternate, opposing direction, years.

Comrades represents an ultramarathon that is as competitive as any standard marathon.

Discussion

A substantial quantity of data and analysis has been presented here. Although the results are clear, additional insight can be gained with a few graphical comparative examples.

First we compare a very competitive road marathon (Berlin) to a very competitive road ultramarathon (Comrades). Presented below is a competitiveness plot with results from the 2008 Berlin Marathon and the 2010 Comrades ultramarathon. The CIs are 0.158 and 0.155, respectively, so the two races are very similar from a competitiveness perspective. They also have similarly deep fields- 106 and 179, respectively, in the 125% cohort- and the finishing times for both races are very fast. In fact, even the pre-exponentials are essentially the same, which indicates that the winning times for these races are similarly “fast” in relation to their 125% cohorts.

[Figure: 2008 Berlin Marathon vs. 2010 Comrades performance distributions]

Both races show an exponential relationship indicating that the fields are not “stacked” and represent a normal distribution of competitors. This comparison illustrates that the most competitive road ultramarathons are just as competitive as the most competitive road marathons. Part II provides more data on road marathons, all of which support the observations made here.

A question that arises is why there are no “linear” years in either the road marathons or the road ultramarathon studied here, while the 10 km road race, hybrid ultramarathons, and trail ultramarathons all show some “linear” years. One reason may be that, particularly in the “Big 5” road marathons, the top competitors typically choose one or two marathons to run each year with the expectation that a win will have a big payoff from a remuneration perspective. The top athletes are picking and choosing which marathons to enter, perhaps to best increase their odds of winning. As a result there are no truly “stacked” fields, because the top competitors are distributed among the numerous prestigious marathons rather than all showing up to just one event. In the case of trail and hybrid ultramarathons, similar economics do not prevail, so the competitors may not be picking races primarily to increase their odds of winning; rather, they are engaging with the best competition they can find and end up “stacking” the fields of certain races (like the NFECC). In the case of the Falmouth Road Race, 10 km races do not require the same kind of recovery that marathons and ultramarathons do, so a competitor can race many more 10 km events without being as concerned with recovery and injury. Certainly there are other possibilities that may explain the lack of “linear” years for the road marathons and the road ultramarathon.

Moving on to comparisons of trail ultramarathons with road marathons, we will use the 2008 Berlin Marathon as the example of a highly competitive, deep, and fast exponential finishing time distribution. Presented below is a comparison plot of the 2014 Western States, the 2002 Western States, and the 2008 Berlin Marathon. Here it is clear how much more competitive a linear distribution is when one compares the 2014 Western States (linear) to the 2002 Western States (exponential). Even though the depth of the 125% cohort is the same (21 for both 2014 and 2002), only 30% of the cohort is less than 15% back in 2002, whereas this value rises to 60% for the 2014 race. That is a big difference in competitiveness.
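The 30% figure for 2002 can be checked directly from the exponential parameters for that race quoted elsewhere in this post (a = 0.0523, b = 0.118): the cumulative probability at 15% back is simply a·exp(15b).

```python
import math

# Exponential fit parameters for the 2002 Western States race
# (values quoted in this post)
a, b = 0.0523, 0.118

frac = a * math.exp(b * 15.0)  # cumulative probability at 15% back
print(round(frac, 2))          # about 0.31, i.e. ~30% of the cohort
```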

[Figure: 2014 Western States, 2002 Western States, and 2008 Berlin Marathon performance distributions]

It is also clear from this comparative graph that the depth of the Berlin Marathon field is much greater and the competitiveness is very high as well (exp = 0.158) when compared to the 2002 Western States race (exp = 0.118), a difference of over 25%. The 2008 Berlin Marathon is significantly more competitive than the 2002 Western States. Note: when comparing these finishing time distributions one must take into account the pre-exponential values, as in this case they are very different (0.0195 for the 2008 Berlin Marathon and 0.0523 for the 2002 Western States race); this displaces the 2002 Western States graph to a position “above” the 2008 Berlin Marathon in the plot. This is a result of the winning time for the 2002 Western States being comparatively slow, relative to its 125% cohort, compared to the winning time for the 2008 Berlin Marathon. However, the 2002 Western States graph is clearly a shallower function, and therefore less competitive, than the 2008 Berlin Marathon. Prior to 2007 and the introduction of entry into Western States via the Montrail Ultra Cup, and with just four exceptions among the 67 “Big 5” marathon races analyzed, none of the Western States races in this period were as competitive as the “Big 5” road marathons. Certainly, on average, the pre-2007 Western States races were much less competitive than the “Big 5” road marathons of the same period.

It is difficult to analytically compare the 2014 Western States race to the 2008 Berlin Marathon, as the finishing time distributions are functionally different and therefore yield different parametrics for assessment of competitiveness (i.e. the slope of the linear fit for linear functions and the exponential factor for exponential functions). So, in the absence of any sort of normalization approach, it is best to make comparisons of exponential finishing time distributions as a group and likewise for linear finishing time distributions. In the case of the 2008 Berlin Marathon, the CI is one of the highest measured in this study (only the 2013 and 2014 Boston Marathons exhibit higher values of CI), and this value is higher than that of any of the ultramarathons with exponential finishing time distributions studied here.
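Since the two functional forms yield incommensurable parameters, one practical approach is to first classify each race by which form fits its normalized data better, and only then compare CIs within a group. Here is a minimal pure-Python sketch of such a classification; the helper names and the tie-break rule are my assumptions, not the author's actual procedure:

```python
import math

def lsq(x, y):
    """Ordinary least squares for y = m*x + c; returns (m, c, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    m = sxy / sxx
    c = my - m * mx
    ss_res = sum((yi - (m * xi + c)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return m, c, 1.0 - ss_res / ss_tot

def classify(x, y):
    """Classify a normalized finishing-time distribution.

    x: percentage back from the winning time; y: cumulative probability.
    Returns ('linear', slope) or ('exponential', b), whichever form fits
    better, with R^2 for the exponential fit computed in the original
    (untransformed) space so the comparison is fair.
    """
    m, _, r2_lin = lsq(x, y)
    # Fit y = a*exp(b*x) log-linearly, then score it in the original space.
    b, log_a, _ = lsq(x, [math.log(yi) for yi in y])
    a = math.exp(log_a)
    my = sum(y) / len(y)
    ss_res = sum((yi - a * math.exp(b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    r2_exp = 1.0 - ss_res / ss_tot
    return ("linear", m) if r2_lin >= r2_exp else ("exponential", b)
```

With this grouping in hand, exponential races can be ranked by the exponential factor and linear races by the slope, as is done in the tables that follow.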

A representative example for linearly distributed races is presented below: a comparison of the 2014 Western States with the 2009 Falmouth Road Race and the 2014 North Face Endurance Challenge Championship (NFECC).

[Figure: Comparison plot of the finishing time distributions for the 2014 Western States, 2009 Falmouth Road Race, and 2014 NFECC 125% cohorts.]

In this case the slope of the linear fit is the CI, and we find about a 6% variation in CI for this comparison, which is not statistically significant; within the error of the fit, all of these races have about the same competitiveness. This is a significant finding, as the comparison is between two “championship” ultramarathons and a known, highly competitive international road race. Based on this comparison it is clear that the most competitive ultramarathons are as competitive as the most competitive road races.

A summary of the data in tabular form is presented below in ranked format. For each event the associated competitive index (CI) (either the exponential factor for years with exponential functionality or the slope for years with linear functionality) is tabulated along with its standard deviation. Also provided are the size of the 125% cohort analyzed, the standard deviation of this value, and the number of event years analyzed.

[Table: Ranked summary of competitive index (CI), its standard deviation, 125% cohort size, and number of event years analyzed for each event.]

Among the races that exhibit exponential functionality, the Boulder Boulder 10 km road race is the most competitive on average; the single most competitive race analyzed in the exponential group, however, was the 2014 Boston Marathon. We see a distinct and significant drop in competitiveness for the trail ultramarathons, where Western States 2001-2006 has the highest competitiveness and Leadville is the least competitive of the ultramarathons studied. Also clear is a significant reduction in the size of the 125% cohort for the trail ultramarathons when compared to the road races. This is certainly due to the much smaller fields in the ultramarathons, but may also reflect a generally shallower competitive field in the 125% cohort.

In the linear functionality group, the North Face Endurance Challenge Championship is the most competitive, with JFK 50, Western States 2007-2014, Falmouth, and the Pikes Peak Marathon in close succession. Given the limited data, the Wasatch and Leadville races that exhibit linear functionality should be considered individually and not as means, although the means are provided for completeness. Of the linear group it is best to limit any conclusive comments to the NFECC, Falmouth, and Western States, where from a statistical perspective each has about the same level of competitiveness. Once again, as expected, the road races are much deeper in the 125% cohort.

Conclusions

At the outset of this series of articles, it was observed that much of any discussion of competitiveness lacked a fundamental analytical basis for comparisons and that such discussions were of limited value as a result. Provided here is a proposed methodology for the development of competitiveness metrics, including assessment of field depth and the relative speed of the winning time. Such metrics can serve to quantify and “calibrate” competitiveness in a way that facilitates comparisons and can lead to constructive discussion of competitiveness in distance, marathon, and ultramarathon running races.

The primary conclusions from this work are:

  1. Generally speaking, finishing time distributions for distance, marathon, and ultramarathon races exhibit exponential functionality. Such functionality is expected from a normal distribution of competitors in the event.
  2. In the case of “manipulated” or “stacked” fields, a linear finishing time distribution can obtain. Such events are typically (although not always) very competitive due to the non-normal quantity of highly competitive runners in the 125% cohort. These linear functionality races are arguably more competitive than the exponential functionality races for the 125% cohort.
  3. The competitive road ultramarathon analyzed here (Comrades) is just as competitive as the most competitive road marathons and distance road races.
  4. The most competitive trail ultramarathons (Western States 2007-2014 and NFECC) are just as competitive as the most competitive distance road race analyzed here (Falmouth).
  5. Points 3 and 4 indicate that ultramarathons (either trail or road) can be just as competitive as any of the most competitive road races.

Addendum – 7 January 2015

There has been some discussion of the relative ranking of the most competitive 2014 ultramarathons over here. Although other races might also be considered, Lake Sonoma 50, Western States Endurance Run, and North Face Endurance Challenge Championship 50 are clearly among the most competitive trail ultras in the US for 2014. Presented below is a competitiveness plot comparing these three 2014 races, all of which have a linear performance distribution for the 125% cohort.

[Figure: Competitiveness plot comparing the 2014 Lake Sonoma 50, Western States, and NFECC races, with an inset table of CI, R^2, and n.]

Inset on the graph is a chart of the competitiveness index (CI), the coefficient of determination (R^2), and n (the number of finishers in the 125% cohort). From both a CI and a depth perspective, NFECC is the most competitive race of this group, with the highest CI and the deepest field. Western States 2014 is next, and Lake Sonoma is the least competitive of the group. One might argue that the NFECC race should be given some sort of proportional weighting to account for the highly competitive nature of this year’s race, and likewise for the other races being considered, when choosing an award like the UROY.

I encourage others to conduct such analyses to ground any position on the competitiveness of a given race. But, as it concerns UROY, in the end an award is an award, and such will always carry a degree of subjectivity that will, hopefully, nucleate some interesting and fruitful discussions.


Competitiveness in Running Races – Part III – 10 km Road

One must learn by doing the thing; for though you think that you know it, you have no certainty until you try.

Sophocles

Parts I and II of this series introduced a methodology and analysis protocol to assess, in an objective manner, the competitiveness of running races, using the road marathon event type as a demonstration example (see Part IV for the extension to ultramarathons). It is found that normalization of the finishing time and rank order data to percentage back from the winning time and cumulative probability, respectively, results in simple exponential functionality. This functionality is expected for athletic performance. In addition, because the data are normalized, the approach allows for reliable event-to-event comparisons. In Part III of the series the analysis is extended to a shorter race length: the road 10 km.

For reasons of continuity, stability, and the relevance of long-term aggregated data, I searched for a 10 km road race that has been run on the same course over an extended period of time. One such race is the Boulder Boulder 10 km road race that takes place each year on Memorial Day in Boulder, CO. This race was co-founded by Olympian Frank Shorter in 1979 and has been held every year since. The period used to analyze this race is 2001-2013. The 2014 results as reported on the Boulder Boulder race website are still preliminary, so they are not included here. Also, in 2011 the race starting point was moved to a new location, so any use of aggregated results must take that into account. In this analysis, aggregated results include only the 2001-2010 datasets.

A second long-running 10 km-type event is also analyzed: the Falmouth Road Race. The Falmouth Road Race was started in 1973 and quickly gained status as one of the most competitive 10 km-type races (the race is actually 7 miles) in the US and the world. Early on, the likes of Bill Rodgers, Marty Liquori, and Frank Shorter were going head to head on Cape Cod; that level of racing continues today, and the race fields one of the most elite-encrusted road events of the year.

Boulder Boulder 10 km

Following the same analysis approach as described in Part I and applied to Marathons in Part II, presented below is an example of the data for the Boulder Boulder 10 km race, in this case the 2003 race.

[Figure: Cumulative probability versus percentage back from the winning time for the 2003 Boulder Boulder 10 km, with fitted exponential function.]

As expected, the data fit nicely to a simple exponential function with a competitiveness index (CI) of about 0.17. Comparison of this value of CI to those found in the “Big 5” marathons reveals that the Boulder Boulder race is as competitive as the marathons. In fact, the 2003 Boulder Boulder is as competitive as any of the “Big 5” marathons over the period 2001-2014. The high CI for the 2003 Boulder Boulder is seen throughout the entire period analyzed (2001-2013). All of the analysis data (competitiveness index (CI), coefficient of determination (R^2), and cohort size (n)) for the Boulder Boulder race are presented in tabular form below, along with the same data for the Boston Marathon for the same period. Note: of the “Big 5” marathons, Boston was, on average, the most competitive of the group.

 

[Table: Boulder Boulder 10 km and Boston Marathon parametrics (CI, R^2, n), 2001-2013.]

 

The Boulder Boulder 10 km race exhibits an average CI for the study period of 0.168, compared to an average CI of 0.147 for the Boston Marathon: a 14% difference. This means that the Boulder Boulder is, on average and in all but one year of the study period, the more competitive event. Also note that there are, on average, many more competitors in the 125% cohort in the Boulder Boulder than in the Boston Marathon. As will be shown in Part IV of this series, it is generally apparent that the cohort size is inversely proportional to the event length, meaning that the longer the event, the smaller the 125% cohort, even when correcting for the total number of entrants in the particular race. This is likely due to multiplicative effects on time differentials, which are larger in longer races.

Falmouth Road Race

This race presents a good example of the “negative” information discussed briefly in Part I: data that in some way do not fit the developed model or thought process and can thereby provide important insight into the details of mechanisms and governing laws.

In contrast to all other datasets analyzed to date, all but one of the datasets for the Falmouth Road Race do not follow an exponential functionality. Rather, the data are well described by a linear functionality. I present here the data for Falmouth and provide a discussion as to why one might observe such linear functionalities. We will also see evidence for linear functionality in some of the ultramarathon data presented in Part IV.

As representative of the linear-functionality datasets from the study period, presented below is the same analysis used throughout this study, applied to the 2009 Falmouth Road Race. Shown are the data fitted to exponential (red) and linear (blue) functions. Clearly the data are best described by a linear function (which exhibits an R^2 value of 0.987) with a slope of 0.039.

[Figure: Cumulative probability versus percentage back for the 2009 Falmouth Road Race, with exponential (red) and linear (blue) fits.]

The exponential function significantly underestimates the proportion of the cohort in the 5-20% back region. The competitors in this performance range are performing at a much higher level (i.e. faster finishing times) than would be expected from an exponential distribution. As an example, this dataset shows that the athlete at the 10% back performance level is at a cumulative probability of about 0.35, or the 35th percentile (where 65% of the finishing times are slower), whereas an exponential distribution would predict this athlete to perform at a cumulative probability of around 0.20 (the 20th percentile, where 80% of the finishing times are slower)*. Thus a greater proportion of the population is faster than would be predicted by the expected exponential distribution: the population represented by this cohort is out-performing the expectation based on an exponential distribution. A direct derivative of this result is that the cohort is likely non-normal. This will be discussed below.
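To make the size of this gap concrete, here is a small numeric sketch. The linear slope (0.039) is taken from the fit above; the linear intercept and the exponential parameters are illustrative values chosen to reproduce the approximate percentiles quoted, not fitted values from the actual data:

```python
import math

# Linear slope from the 2009 Falmouth fit; the intercept and the
# exponential parameters (a, b) below are illustrative assumptions only.
slope, intercept = 0.039, -0.04
a, b = 0.05, 0.14

pct_back = 10.0  # an athlete finishing 10% behind the winner

linear_pred = slope * pct_back + intercept   # cumulative probability ~0.35
exp_pred = a * math.exp(b * pct_back)        # cumulative probability ~0.20

# Under the linear distribution ~65% of the cohort is slower than this
# athlete; the exponential form would have predicted ~80% slower.
print(round(linear_pred, 2), round(exp_pred, 2))
```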

To demonstrate how different the 2009 Falmouth race is when compared to the other races analyzed, presented below is a plot of both the Falmouth 2009 race and a representative dataset from the Boulder Boulder 10 km race (the 2003 race). Note the dramatic difference in the functionality of the competitor distributions for this 125% cohort.

[Figure: Comparison of the 2009 Falmouth Road Race and 2003 Boulder Boulder 10 km competitor distributions for the 125% cohort.]

Why are these two competitor performance distributions so different? Well, in two words, the reason appears to be: East Africans. The Falmouth race offers travel stipends to top runners, there is a nice prize purse, and the race director obviously works hard at assembling the best possible field of world-class runners. As a result this race is consistently and heavily populated by world-class athletes from East Africa. For instance, in the 2012 race eight of the top ten runners were among the best runners from Kenya, and the other two were top runners from Uganda and Ethiopia. As expected, such a field will also draw the best American runners, who want every chance to test their mettle against such a world-class field. In the end you have a “stacked field” that is not representative of the tail of a distribution of all competitive runners. The 2013 race is an interesting counterpoint, as its finishing time distribution fits a simple exponential. This race, contrary to the others in the study period, is not highly populated by East African runners; in fact there are only six East Africans or runners of East African origin in the race. The other races in the study period have at least twice as many, and each of these races exhibits a linear functionality. Presented below are the analysis data in tabular form for the Falmouth Road Race 2001-2013.

 

[Table: Falmouth Road Race parametrics (CI, R^2, n), 2001-2013.]

 

In contrast to these results, the Boulder Boulder results summarized above uniformly follow a simple exponential functionality, and the race does not have the participation of as large a population of world-class athletes. This is surprising, as the Boulder Boulder spends in excess of $200,000 on athlete travel, accommodation, and prizes, a level of support for attracting top talent similar to Falmouth’s. For some reason Falmouth attracts a much more world-class field; I am certain that there is an identifiable reason for this, but not being a road runner I have not been exposed to any background.

Non-exponential Performance Distributions

Why is the performance distribution tending toward linear for many editions of the Falmouth Road Race? It is apparent that when the fast end of the competing population of runners is skewed toward a world-class level (in this case by successful recruitment of such talent by the race director), the 125% cohort tail of the distribution tends toward a linear functionality. This can be partially reasoned from the fact that this portion of the population is out-performing what would have been a normal, exponential distribution and is better characterized by a linear relationship. There is an important takeaway from this observation: even though athletic performance is defined by an exponential relationship, the data presented here suggest that when one analyzes only the highest-performing population, a linear functionality will obtain. This means that if an athlete is able to perform at this level, they are not necessarily facing that “exponential wall” of improvement that was discussed in Part I of this series and in the 10,000 hour rule post. Rather, some such athletes can possibly evolve more rapidly through this, for them, linear space and achieve world-class status. This is an important observation for coaches as well, since it is extremely difficult with high-performance athletes to determine whether they have plateaued or are still improving. Analysis of an athlete’s progression against world-class competition, using percentage back from the winning time as the operative metric, could reveal whether the athlete is on a linear trajectory or is, in fact, hitting the “exponential wall”. This could allow coaches to identify the “true” world-class-capable talent and focus training and racing appropriately. I have personally seen far too many cases of very, very good athletes coming to this “exponential wall” of improvement and spending a lot of time, effort, money, angst, and coach resources only to retire without achieving world-class results. This is in contrast to the few athletes who somehow make it through the “exponential wall” and become top, world-class competitors. Identifying these athletes is a challenge, and the methodology presented here may be one way to help in such identification. I am currently analyzing data that tests this hypothesis retrospectively with results of known world-class competitors- stay tuned.
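As a sketch of how such a progression analysis might look (the season-by-season data and the 0.7 threshold are entirely hypothetical; this is one possible heuristic, not a validated method): compare season-over-season improvements in percentage back from the winning time. Roughly constant gains suggest a linear trajectory, while geometrically shrinking gains suggest the athlete is approaching the “exponential wall”.

```python
def gains(pct_back):
    """Season-over-season improvement in percentage back from the winner."""
    return [p0 - p1 for p0, p1 in zip(pct_back, pct_back[1:])]

def hitting_the_wall(pct_back, shrink_threshold=0.7):
    """Heuristic flag for the 'exponential wall': True when successive
    improvements shrink, on average, to less than `shrink_threshold`
    of the previous season's improvement."""
    g = gains(pct_back)
    ratios = [g1 / g0 for g0, g1 in zip(g, g[1:]) if g0 > 0]
    return sum(ratios) / len(ratios) < shrink_threshold

# Hypothetical athletes: one plateauing, one still improving steadily.
plateauing_athlete = [12.0, 9.5, 7.8, 6.9, 6.4, 6.2]  # gains shrink yearly
linear_improver = [12.0, 10.0, 8.0, 6.0, 4.0]         # steady 2%/yr gains
```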

The bottom line here is that when conducting this type of competitiveness analysis and an approximately linear relationship is found, one can expect to see a stacked field of competitors that skews an otherwise normal distribution of finishing time data. There is some evidence that this is occurring in some ultramarathon events as will be seen in Part IV.

Conclusions

We have extended a methodology for assessing competitiveness in running races from marathons to a shorter race, the road 10 km. It is found that the methodology is extensible, as evidenced by the uniform exponential functionality exhibited by the Boulder Boulder 10 km race. However, it is also found that such exponential functionality can be replaced by a linear functionality if the competitor field includes a recruited population of world-class athletes. This “stacked field” alters the functionality for the 125% cohort in a way that leads to a linear performance distribution. Such a linear distribution of performance consistently out-performs the performance predicted by an exponential distribution, as expected given the superior talent represented in this tail of the general population of competing runners.

Next, in Part IV, we will extend this methodology and analysis to longer races- ultramarathons.

 

*Note: This may confuse some. In the case of finishing times a faster time (lower time value) is ranked higher. Many percentiles are reported on test scores where a higher score is ranked higher. The analysis here is inverted from this more common type.

Competitiveness in Running Races – Part II – Road Marathons

True delight is in the finding out rather than in the knowing.

Isaac Asimov

In Part I of this series, a methodology was developed to analytically assess the competitiveness of running races (see Parts III and IV for extensions to 10 km road races and ultramarathons, respectively). The approach involves normalization of the finishing time data and the finishing rank order data to provide a transformed dataset for any running event that can be analyzed and compared to any other timed running event. This normalization/transformation process utilizes the percentage back from the winning (or best ever) time for normalization of the finishing time data and cumulative probability (percentile rank) for normalization of the finishing rank order data. Once the dataset is transformed, the functionality of the cumulative probability versus the percentage back from the winning time is determined and tabulated using functional parametrics. In the case of running events it has been found that virtually all races that the author has analyzed exhibit a simple exponential functionality of the form:

y = a • exp(b • x)
where:
x = percentage back from winning time for the cohort
y = cumulative probability of the result in the cohort
a = a pre-exponential factor inversely proportional to the excellence of the winning time relative to the cohort
b = the exponential factor directly proportional to the competitiveness of the cohort
Assessment of the value of b, which will hereinafter be called the competitiveness index (CI), allows for an analytical basis for comparison of the competitiveness of events and event types. Specifically, the magnitude of the CI determines how competitive a selected event is.
In addition, the expected magnitude of the CI is calibrated using known competitive events (e.g. the London Marathon), and an expected upper limit to the value of the CI is computed by analysis of the cohort consisting of the fastest 499 marathon times ever recorded (note: the recent world record at the Berlin Marathon is not included in the analysis). These analyses revealed that, for a highly competitive major road marathon, the value of CI is about 0.14 (although it can vary by as much as 40%, as will be shown below), with an upper limit of about 1.25.
Note: all event analyses in this series of posts are of the men’s field, using a cohort consisting of the finishers within 125% of the winning time. Although a finishing time of 125% of the winning time is not considered “competitive”, the value of 125% is used to standardize the analysis and allow for comparisons with ultramarathon events, where it is observed that the finishing times become more spread out, partly due to the much longer event duration (2-3 hours for a marathon versus 6-20 hours for ultramarathons (50 miles to 100 miles)).
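The cohort construction and normalization described in this note can be sketched in a few lines of pure Python. This is a minimal illustration, with invented finishing times and function names of my choosing:

```python
import math

def cohort_125(times):
    """Finishers within 125% of the winning time, sorted fastest first."""
    win = min(times)
    return sorted(t for t in times if t <= 1.25 * win)

def fit_ci(x, y):
    """Least-squares fit of y = a*exp(b*x) on log(y); returns (a, b).

    b is the competitiveness index (CI); a is the pre-exponential factor.
    """
    ly = [math.log(yi) for yi in y]
    n = len(x)
    mx, mly = sum(x) / n, sum(ly) / n
    b = (sum((xi - mx) * (li - mly) for xi, li in zip(x, ly))
         / sum((xi - mx) ** 2 for xi in x))
    return math.exp(mly - b * mx), b

# Hypothetical finishing times in seconds (not real race results):
times = [7800, 7950, 8100, 8400, 8700, 9100, 9600, 10400]
cohort = cohort_125(times)  # 10400 s falls outside 125% of 7800 s
win = cohort[0]
x = [100.0 * (t - win) / win for t in cohort]             # percentage back
y = [(i + 1) / len(cohort) for i in range(len(cohort))]   # cumulative prob.
a, b = fit_ci(x, y)  # b is the CI for this (made-up) cohort
```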

Analysis of the “Big 5” Road Marathons

The “Big 5” marathons (London, Chicago, Berlin, New York, and Boston) serve as a group of events that unarguably represent prototypical competitive running races. Analysis of these events over a significant period of time allows for development of a calibration of the CI for competitive events, and therefore a standard that can be used for comparison with other events and event types.

I show here the analysis for the London marathon for the period 2001-2013 as an example and then provide a tabular figure showing the results for all of the “Big 5” marathons from the period 2001-2014. Presented below are the cumulative probability versus the percent back from the winning time for each year of the London Marathon, the fitted exponential curve, the exponential equation for the fit, and the R^2 value (coefficient of determination) for the fit. Also shown is the aggregated data analysis for all of these years taken together and re-analyzed using the same 125% cohort.

[Figures: Plots of the cumulative probability versus the percentage back from the winning time for the London Marathon events from 2001 to 2013, and a plot of the aggregated data for all of these competitions re-analyzed using the 125% cohort from the aggregated population. Simple exponential functions are fitted to each dataset and the associated parametrics and coefficients of determination are shown.]

It is difficult to see the parametrics in these figures, so the London data, along with the population size for each analysis, are presented below in tabular form:

[Table: London Marathon parametrics and population sizes, 2001-2013.]

Note that all of the R^2 values are about 0.92 or greater, indicating very good fits of the exponential function to the data. The CI varies from a low of 0.119 for the 2012 event to a high of 0.153 for the 2010 event. This represents a difference of about 28%, meaning that the 2012 event was, by this measure, 28% less competitive than the 2010 event. The remaining years show CI values of around 0.130-0.140.

Presented below are the tabular data for all “Big 5” marathons over the period 2001-2014 (or 2001-2013 for those marathon events that had yet to occur in 2014).

[Table: Parametrics for all “Big 5” marathons, 2001-2014.]

There is much to be gleaned from these data and I note here some of the important observations:

  1. The CI for these highly competitive events has general bounds of about 0.120 to about 0.170 or a range of about 40%.
  2. All of the fits to the data are very good- R^2 values are in excess of 0.918.
  3. The population sizes are sufficient to expect a very low error magnitude.
  4. Of the group, the New York marathon is, on average, the least competitive and the Boston Marathon is the most competitive.
  5. Interestingly, the last two Boston Marathons have been the most competitive events of the group by a good margin.

Point 1 (taken together with points 2 and 3) allows us to now have a calibration for expected magnitude for the competitiveness index for highly competitive events. We can now make meaningful comparisons with other marathon events and other running races.

The analyzed population size varies in this dataset, ranging from a low of 63 to a high of 368, a variation of almost a factor of 6. It is important to test the robustness of this analysis approach by determining the extent to which there is a relationship between the computed CI and the analyzed population size. Presented below is a graph of the CI versus the population size. As is clear from the graph, there is no correlated relationship, and this lends additional support to the efficacy of the analysis approach across events with very different populations in the “125% cohort”.
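This robustness check amounts to computing a correlation between CI and cohort size. A quick sketch follows; the CI/size pairs are placeholders for illustration, not the tabulated values from the study:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)

# Placeholder (CI, cohort-size) pairs for illustration only:
ci = [0.119, 0.153, 0.135, 0.141, 0.128, 0.147]
size = [240, 110, 368, 63, 190, 305]
r = pearson(ci, size)  # |r| near zero would support the no-correlation claim
```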

[Figure: CI versus analyzed population size for the “Big 5” marathons.]

“Other” Road Marathons

It seems that there exist almost as many marathon events in the US as there are cities, towns, and villages, meaning that many thousands of marathon events are held each year. I will make no attempt to survey a representative selection of such events, as the task is enormous. I will, however, present analysis of a few marathon event results to begin to establish a “feel” for what one might find in a comprehensive study.

In a quite random way I selected the following “other” Marathons for analysis and comparison to the “Big 5”:

  • Kansas City (MO) Marathon 2012
  • Fox Cities (WI) Marathon 2014
  • Columbus (OH) Marathon 2013
  • Rochester (NY) Marathon 2014
  • Wenatchee (WA) Marathon 2013
  • Vermont City Marathon 2013

This selection of events includes a range of sizes and speeds. Although none of these marathon events have elite-level winning times (< 2:10:00), two have winning times at the sub-elite level (2:10:00 – 2:20:00). Presented below are the cumulative probability versus percentage back from winning time plots for each event, showing the fitted exponential functions and the associated parametrics.

[Figures: Cumulative probability versus percentage back from winning time for each of the six “other” marathons, with fitted exponential functions and parametrics.]

And here is the data in tabular form:

[Table: Parametrics for the six “other” marathons analyzed.]

There are a few interesting observations that merit remarks:

  1. Three of these events (Kansas City, Rochester, and Wenatchee) exhibit very low competitiveness compared to the “Big 5” events.
  2. As noted earlier, an event can be competitive but still have a relatively slow winning time (Fox Cities). This is an important understanding because, although competitiveness and the “fastness” of a given race are not entirely independent, competitiveness can be high even in “slow” races since the computational basis is the cohort in the race.
  3. The two “fastest” races of the group (Columbus and Vermont) show competitiveness on par with the “Big 5” events.
  4. All of the fits to the data are very good- R^2 values are all in excess of 0.92.
  5. The analysis appears to be robust down to very small populations, although the calculated error will be substantially higher for the small populations.

I continue to be encouraged by the robustness of this analysis approach across this very disparate selection of marathon events ranging from the largest and most “elite encrusted” events right down to the “neighborhood”-type events.

Conclusions

Shown here is the application of an analytical competitiveness methodology across a large range of marathon events. The results show consistent adherence to the expected exponential function resulting from normally distributed performance data. This work establishes a new basis for assessment of the competitiveness of a given running event using a very simple and straightforward analysis protocol and should provide an analytical context to evaluations of “competitiveness” in such events.

In Part III of this series we will look at a shorter distance race (10 km) and Part IV will extend the analysis to ultramarathons. There are some very interesting results!

 

 

Competitiveness in Running Races – Part I – Methodology

Try again. Fail again. Fail better.

Samuel Beckett

A recent example of a seemingly never-ending discussion on whether a certain running race was competitive or not has spurred me into writing this post (see parts II, III, and IV for an analysis of competitiveness across distances (10 km through ultramarathons) and type (trail vs. road)).

The discussion presented in the comments of the above-mentioned article is commonplace whenever the subject comes up, particularly as it relates to competitiveness in ultramarathon races. Presumably, much of the sentiment behind claims that ultramarathons are not competitive arises from the typically small fields when compared to other endurance running events (e.g. road marathons) and from “naive” references to “slow” times by those who do not grasp the reality of racing such long distances over extended periods of time.

In my experience, all of these discussions lack any frame of reference with respect to a definition of competitiveness, and therefore lack any logical, arguable, and defensible position from which to derive constructive conclusions. Although there may be other quantifiable metrics for competitiveness, I will offer a data-based approach here and expand upon its application to a variety of running races in succeeding posts. This approach is highly defensible, as it uses only finishing time and rank order placement results for computation of “competitiveness.” Event “shallowness” can also be quantified with the same data.

Shallow vs. Competitive? These are two different things.

We often hear reference to “shallow competition” as a descriptor of a particular race or event type. As will be developed here, the degree to which a race is competitive is significantly (although not entirely) independent of how “deep” the field is. Therefore the term “shallow competition” really has no foundation in communicating anything of substance. It is possible to have a shallow but competitive field as well as a deep but uncompetitive field. The following will provide a definition of and metric for competitiveness in running races, describe a method for assessing what constitutes a “deep” field, and offer tools for anyone to determine the competitiveness and “deepness” of the field of a given running race.

Definition of Competitiveness

A search of the literature has turned up very little work on defining and evaluating “competitiveness” from an analytical perspective in individual timed sporting events. Given the mountains of data of recorded finishing times for such timed events all across the world, it seems odd that no one has taken up the task of defining competitiveness. I may have missed some publications but certainly there is nothing of substance on the subject via a comprehensive search using numerous channels.

It is clear that some events are more competitive than others, that some sports have deep and competitive fields and others do not, and that “new” sports (e.g. cross country mountain biking) become established and, in a relatively short time, demonstrate a transition to much greater competitiveness. However, there exists no basic fundamental analysis that describes and measures competitiveness.

Understanding and potentially measuring competitiveness is useful for numerous reasons. First, a measure for competitiveness can provide the competing athlete with a clear understanding of how competitive their sport is and, additionally, how competitive a particular race is. This understanding will allow the competitor to assess their performance in an objective way. Second, a measure for competitiveness can serve as a basis for “point” accumulation in ranking of competitors for “championship” awards and honors. These “point” accumulations can be adjusted as a function of the competitiveness of individual events to ensure that the greatest point accumulations are by those who compete well in the most competitive events. Third, it is commonly asserted by many among the “running” community that ultramarathons are not “competitive” and an analytic measure of competitiveness can determine whether or not this assertion is, in fact, supportable.

When one considers the concept of a definition of competitiveness for an event it becomes abundantly clear that there are numerous sub-categorical levels of competitiveness. We have the competitiveness of the particular event itself (intra-event competitiveness (IEC)), the competitiveness of a particular event in aggregate over all or a selected portion of the years that the event has been in existence (aggregate intra-event competitiveness (A-IEC)), and the competitiveness of a given event type (e.g. road marathon) as it is compared from event to event over the history of the event type or over some selected time period (inter-event competitiveness (IrEC) and its aggregate (A-IrEC)).

In running races we are fortunate to have well defined results that can be rigorously analyzed without, to first order, any subjectivity. These data are the finishing time and the rank order of finish. This is great but, as we know, running race courses are all at least slightly different even if they are conducted on a track. In addition each event can have very large differences in the number of competitive participants; this is particularly true when comparing ultramarathons to other event types. Therefore it is imperative that we employ some method to normalize the finishing time and rank order data to be able to compare one race to another be it multiple intra-event results, inter-event results, inter event-type results, or aggregate, all-time inter-event results.

Separate from issues having to do with making comparisons, in framing a concept of competitiveness it is important to recognize that the competitiveness of an event is not only defined by how fast the winner runs but also by how other competitors in the race compare to the eventual winner. This means that any robust competitiveness evaluation must also normalize both the finishing time data and the finishing rank order data. The following will summarize the approach taken here.

Normalization of Finishing Time Data

Accepting the reality that every running course is different, that weather and/or atmospheric conditions can play an important role, and that even the same course “runs” differently on different days due to surface conditions, it is crucial to develop a method for normalizing finishing time data in a fashion that accommodates such differences and facilitates a robust analysis of competitiveness for a given event or event type.

For running race finishing time data the most direct way to accomplish this is to calculate the percentage time back from the winning time. The percentage back value is a universal performance metric derived from the finishing time that is substantially independent of the race course, the length of the race, the weather, and other impacting variables, since all competitors face the same conditions on race day. In addition, it is much more informative to assess one’s performance using percentage back rather than raw finishing time (or placement), since improvements are better characterized by percentage changes relative to the winning time than by raw time, and raw finishing times will differ from course to course even for races of the same length. The FIS uses percentage back in assessments for cross country skiing, and national endurance sports organizations like the US Ski Team and many other national ski programs (e.g. Norway, Sweden, France) also use percentage back metrics to assess current and up-and-coming talent. In fact the US Ski Team has often used percentage back metrics in decisions on which team athletes will attend World Cup and World Championship races, and many coaches of endurance athletes use percentage back to evaluate performance. In the following analysis we will use “percentage back from the winning time” as the fundamental normalization of finishing time in the development of competitiveness metrics for running races.
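For readers who want to reproduce this normalization, a minimal sketch in Python follows (the finishing times and the helper name are hypothetical, chosen only for illustration):

```python
# Percentage back from the winning time: a universal, course-independent
# normalization of raw finishing times (values here are hypothetical).

def pct_back(finish_time_s, winning_time_s):
    """Percentage of time back from the winning time."""
    return 100.0 * (finish_time_s - winning_time_s) / winning_time_s

# Hypothetical finishing times in seconds; the winner defines 0% back.
times = [7500, 7620, 7800, 8100]
winner = min(times)
print([pct_back(t, winner) for t in times])  # -> [0.0, 1.6, 4.0, 8.0]
```

Because the result is a ratio, the same function applies unchanged to a 10 km race or a 100-mile ultramarathon.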

Normalization of Finishing Rank Order Data

When assessing performance within a given cohort (or an aggregated collection of comparable cohorts) the concept of “percentile rank” is commonly used. Percentile rank will likely be familiar from its extensive use by the College Board in assessing SAT scores for a given year as well as in comparisons of test performances from year to year. Percentile rank values range from 0-100, and one’s percentile rank for a test indicates where one scored relative to the cohort. For instance a test score at the 85th percentile rank means that 85% of the participants scored lower and 15% scored higher. These percentile rankings serve to normalize the test scores within the cohort and allow for comparisons with other years (cohorts) where the population size may change significantly. Similarly for running races, the percentile rank is a useful metric for comparison not only within a cohort (the performance of a competitive field at a given race) but also between cohorts (the performance of competitors from numerous years of the same event), and it effectively allows for comparisons of races with very different competitive field sizes. The arithmetically related “cumulative probability” will be used here instead of percentile rank for normalization. Cumulative probability values range from 0-1 and represent the probability of a given result within the cohort. For instance a cumulative probability value of 0.1 for a running race result means that this competitor has finished just within the top 10% of the field and has posted a time that is faster than 90% of the competitors in the cohort.
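The rank-order normalization takes only a few lines. Note that the post does not pin down an exact convention (rank/n versus, say, (rank-0.5)/n); rank/n is assumed here as the simple choice consistent with the “0.1 means just within the top 10%” example:

```python
# Cumulative probability from finishing rank order: rank r in a field of n
# maps to r/n (an assumed convention matching the top-10% example above).

def cumulative_probability(rank, field_size):
    return rank / field_size

n = 100                                # top-100 cohort, as in the London data
print(cumulative_probability(10, n))   # 10th place -> 0.1, just inside top 10%
print(cumulative_probability(1, n))    # winner -> 0.01
```

Pairing each finisher’s cumulative probability with their percentage back yields the (x, y) points that make up the performance distribution plots below.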

The Functionality of Running Race Results

As for any fundamental concept, derivation of a functional description is paramount to allowing for utility. In the case presented here for evaluations of running races, it is the functionality of the cumulative probability (percentile rank) versus the percentage back from the winning time that describes the competitiveness of the event. In other words the shape of the curve defined by the cumulative probability versus the percentage back defines the competitiveness and comparisons of the shape of such curves (and appropriate descriptive parametrics) will allow for evaluation of the competitiveness of a given event or event type.

For demonstrative purposes, presented below are two cumulative probability versus percentage back from the winning time plots for the Men’s results of the London Marathon in 2005 (blue) and 2002 (green), using a cohort of the top 100 finishers. The top 100 finishers were used because this cohort typically captures the runners who finished within about 25% of the winning time. Although I would define “competitive” runners as those who finish within about 5% of the winning time, this larger population was chosen to allow for comparison to longer, ultramarathon races with much smaller fields; including results up to about 25% back yields sufficient population sizes for analysis and comparison across all races.

It is inarguable that the London Marathon represents a very competitive event, particularly among the top 100 finishers, so the following analysis is representative of a very competitive running race.

The top 100 finisher cohort of the 2005 London Marathon Men’s race exhibits a steeper ascending functionality than the shallower functionality of the 2002 data.

Slide1

Cumulative probability versus percentage time back from the winning time for the 2005 London Marathon (blue) and the 2002 London marathon (green) Men’s results. The cohorts are composed of the top 100 finishers or those finishers within about 25% of the winning time.

Graphical inspection of the curves reveals that the 2005 Men’s race was more competitive than the 2002 race. The two figures presented below show that at the same percentile rank/cumulative probability, or at the same percentage back from the winning time, there is a considerable difference in the percentage back value and in the proportion of the population, respectively. Specifically, at an arbitrarily selected cumulative probability of 0.20, we see that in the 2005 race this value represents competitors whose finishing times are about 8% back from the winning time, whereas in the 2002 race it represents competitors whose finishing times are about 12.5% back from the winning time- about a 35% difference between the races. Similarly, at an arbitrarily selected percentage back from the winning time of 10%, the results from the 2005 race show that about 28% of the cohort was at or below this finishing time percentage, whereas in the 2002 race only 13% of the cohort was- about a 55% difference between the races. It is clear that, in comparison of the selected cohorts, the 2005 race was more competitive- meaning there is a significantly greater proportion of the cohort of competitors closer to the winning competitor.

Slide3

Slide2

In a more fully analytical approach, one can fit the curves to a function and use the function metrics to characterize the level of competitiveness. In this case (and in all cases of running races studied by the author) the cumulative probability versus percentage back from the winning time data generally fit very well to a simple exponential function. This is expected from a population that follows a normal distribution, as athletic performance does. Presented below is a figure showing the fit of exponential functions to the race data. The fits are quite good although, in this example, they underestimate the differences; the trend, however, is captured. We see that the more competitive 2005 race exhibits an exponential factor of 0.1481, larger than the 0.1349 of the less competitive 2002 race. These exponential factors characterize the steepness of the curves and therefore the level of competitiveness of the race or event. The following section provides a method for utilizing these exponential function parametrics to capture an analytical measure of competitiveness (a competitiveness index).

Slide4

Exponential functions fitted to the 2005 (blue) and 2002 (green) London Marathon Men’s results for a cohort of the top 100 finishers. The fits exhibit high R^2 values but underestimate the differences in this example. The magnitude of the exponential factor of the fitted function is directly proportional to the competitiveness of the race or event.
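The fitting step itself is straightforward. A minimal sketch, assuming synthetic data rather than the actual London results (which are not reproduced here), fits y = a·exp(b·x) by ordinary linear regression on ln(y); a spreadsheet or a nonlinear least-squares routine would serve equally well:

```python
import math

def fit_exponential(x, y):
    """Least-squares fit of y = a*exp(b*x) via linear regression on ln(y)."""
    n = len(x)
    ly = [math.log(v) for v in y]
    sx, sy = sum(x), sum(ly)
    sxx = sum(v * v for v in x)
    sxy = sum(u * v for u, v in zip(x, ly))
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # exponential factor = CI
    a = math.exp((sy - b * sx) / n)                # pre-exponential factor
    return a, b

# Synthetic cohort generated from y = 0.02*exp(0.15*x); the fit should
# recover a ~ 0.02 and b ~ 0.15 (b plays the role of the competitive index).
xs = [i * 0.5 for i in range(1, 51)]
ys = [0.02 * math.exp(0.15 * x) for x in xs]
a, b = fit_exponential(xs, ys)
print(round(a, 4), round(b, 4))  # -> 0.02 0.15
```

Feeding in a race’s (percentage back, cumulative probability) pairs in place of the synthetic data returns that race’s pre-exponential factor and competitive index directly.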

One can also derive a metric for the level of the “deepness” of the field (cohort) from these data by assessing the density of competitors (data points) along the curve. A “deep” field would exhibit a high density of competitive times throughout the high performance end whereas a “shallow” field would show a paucity of competitors (data points) in this same region with large gaps between competitors. I will offer no analytical parametric for this evaluation as it is relatively straightforward to determine a sense of the “deepness” of the field from graphical observations.

Derivation of Competitiveness from the Exponential Data

It is unarguable that the road marathon (and specifically here the London Marathon) is a highly competitive running event in which literally thousands (perhaps even tens of thousands) of elite and sub-elite participants have recorded impressive finishing times over the event’s 100-year recorded history. That these data fit an exponential function is entirely consistent with performance excellence and highly competitive sport. The exponential function describes a finishing time distribution that includes a sparsely populated tail of ethereal performance followed by an increasingly populated distribution of less impressive finishing times. The degree of performance excellence is defined by the high performance tail, and the competitiveness of the event is defined by the “steepness” of the curve (which is proportional to the magnitude of the exponential term of the function). For example, an “other-worldly” performance at the far left of the curve (near or at zero percent back), with very few (or no) other recorded performances near it in the distribution, is the definition of performance excellence. Similarly, the steepness of the curve just beyond the high performance tail defines how close other competitors are to the “netherland” of performance excellence. In other words, the steepness of the performance excellence curve determines how many competitors are “knocking at the door” of entry into the performance excellence club. The greater the number of such individuals, the higher the probability that one of these (very talented and hard-working) competitors will put everything together and score a finishing time in the high performance tail. In the case of a shallower exponential curve (lower magnitude exponential term), performances are more widely distributed and there are therefore far fewer individual competitors who have demonstrated performances close to the high performance tail.
In this case the probability that a competitor will score a finishing time in the high performance tail is much smaller than in the population represented by the steeper distribution. This probability of performance excellence clearly scales with the steepness of the distribution (magnitude of the exponential term) and is a way to define the competitiveness of the event. Presented below are plots of simple exponential functions where only the exponential term varies, showing the change in steepness of the curve as a function of the exponential term. The range of exponential terms in the plot spans the range found in running finishing time data, as will become apparent in subsequent sections.
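The effect of the exponential term can also be seen numerically: for y = a·exp(b·x), the percentage-back span needed to climb between two cumulative probability levels is ln(y2/y1)/b, so it shrinks as b grows. A small sketch (the b values are merely representative of the range discussed in these posts):

```python
import math

# Span of percentage back needed for the cumulative probability to rise
# from 0.1 to 0.5: delta_x = ln(0.5/0.1)/b = ln(5)/b. Steeper curves
# (larger b) compress the field into a narrower percentage-back window.
for b in (0.05, 0.15, 1.25):   # representative competitive index values
    span = math.log(5) / b
    print(f"b={b}: 0.1 -> 0.5 over {span:.1f} percentage points back")
```

With b = 0.15 the middle of the field spans roughly ten percentage points back; with b = 1.25 the same climb happens in little more than one, which is the numerical meaning of a “steep” excellence curve.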

Slide1

From a functional perspective, two performances from an exponential population distribution that are close in linear time (the x axis in this plot) are actually exponentially different in “net performance” (the y axis in this plot- e.g. percentile rank). This means that although one competitor may be linearly “close” in time to another competitor in an event, they are actually exponentially further back from a performance perspective, and the magnitude of the difference is directly proportional to the exponential term that characterizes the fitted data. The steeper the performance excellence curve, the more difficult it is to progress. Many of us have experienced this reality in our own athletic endeavors as we approach our individual limit of ability- exponential improvement is not easy. A shallower curve defines a population where even relatively large changes in finishing time (percentage back) do not lead to substantial changes in percentile rank. Such a population is the result of a sparse competitive field (in some cases due to a sport or event that is new or in a high-growth mode) and/or a current level of performance that is not challenging elite-level human limitations- meaning that most of the current competitors have not fully developed their potential for performance (physiological or technical abilities or both).

Now let’s take a look at this exponential functionality as the pre-exponential term varies. Plotted below is an exponential function with an exponential term similar to that exhibited by the road marathon data (an exponential term of 1.2) but with pre-exponential terms of increasing magnitude (1, 5, 10, 20). Note that as the pre-exponential term is increased, the rapidly increasing portion of the exponential function begins at lower values (lower percentage back, faster finishing time). Since the x values are generated with the fastest time ever as the basis (at 0% back), the lower the pre-exponential term the greater the degree of excellence (the more ethereal the performance) represented by the fastest time ever.

Slide2
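This shift is easy to verify algebraically: for y = a·exp(b·x), the x at which the curve reaches a level y0 is x = ln(y0/a)/b, which decreases as a grows. A sketch, using y0 = 100 as an arbitrary level on the same arbitrary vertical scale as the plotted pre-exponentials:

```python
import math

# For y = a*exp(b*x), the curve reaches level y0 at x = ln(y0/a)/b.
# Raising the pre-exponential a shifts the rise to lower percentage-back
# values, i.e. a less "ethereal" fastest-ever time.
b = 1.2                      # exponential term similar to the marathon data
for a in (1, 5, 10, 20):     # pre-exponential terms from the plot
    x0 = math.log(100 / a) / b
    print(f"a={a}: curve reaches y0=100 at {x0:.2f}% back")
```

The printed x values fall monotonically as a increases, matching the leftward shift of the rapidly rising portion of the plotted curves.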

Taking these two arguments together we can now construct a conceptual equation for performance excellence, combining competitiveness with the degree (magnitude) of comparative excellence associated with the fastest time in the cohort. In general, conceptual, equation form we have:

R ~ (1/E) • C     (equation 1)

where:

R = cumulative probability (percentile rank)

E = magnitude of comparative excellence of fastest (or fastest ever) time

C = exp(bx), where b=competitiveness index (CI) and x=finishing time or percentage back from the fastest time

Conceptually we have a functionality for competitiveness and excellence stating that, for a measured cohort, the higher the magnitude of the exponential factor, the greater the competitiveness; and the higher the magnitude of the pre-exponential factor, the smaller the difference between the best time and the “rest of the best”. What remains is calibration of the parameters as they map onto running event data. This will be addressed in following posts, but an estimate of the upper limit to the competitiveness index is provided below.

Establishing an Upper Limit to the Competitive Index

To calibrate the approach outlined here it is important to establish an upper limiting value for the competitive index (CI). As shown above, this index is defined as the exponential factor of the function fitted to the finishing time data. It is inarguable that the road marathon is one of the most competitive of running events. Application of the analysis protocol developed here to the dataset consisting of the 499 fastest marathon finishing times ever provides a good estimate of the expected upper limit to how competitive the event can be. This is because the cohort represented in these data comprises the all time best finishing times- a cohort of superstars all competing together in one fictional “dream” race of sorts. Since these data are the best efforts of all who have ever run the marathon, they represent the ultimate level of competition as we know it today. Plotted below are the data shown previously for the 2005 and 2002 London Marathon along with the data for the 499 fastest marathon times ever. We see that the data for the fastest times fit very well to a simple exponential function (as expected) and that the competitive index is nearly an order of magnitude larger than that for the individual, single race London Marathon data (CI = 1.2585 for the all time data versus 0.1481 for the 2005 London Marathon). Based on this analysis it is expected that no single event or aggregated event data will be more competitive than the cohort represented by the all time data, and therefore the CI of the all time data represents an upper limit to the value of the CI. Establishing this value will allow for meaningful comparisons in the analysis of numerous other events and event types in follow-on posts.

Slide3

Take-aways from the Analysis

  • The first important take-away here is that running event data fit very nicely to an exponential distribution of finishing time (or percentage back from the fastest time). This exponential behavior is fundamental to the nature of excellence in the sport of running.
  • A second take-away is that via a simple analysis of the distribution of the finishing time data for a running event we can extract functional parameters that define the competitiveness of the event as well as establish a reasonable approximation of the degree of excellence of the fastest time. Should other event data of this type fit an exponential function, the exponential term can be used as a fundamental metric for defining the competitiveness of a given event and allow for comparisons between events. A simple process of calculating the cumulative probability, plotting it against the percentage back data, and then fitting the resulting curve will provide robust metrics defining the competitiveness (steepness of the “excellence curve”) of the event data and therefore yield an analytical basis for comparison.
  • A third take-away is that the factors that have led to such exponential differences in “net performance” are similarly exponential, and arguments (such as that espoused by the “10,000 hour rule” cult) that more practice (training) alone can close performance “gaps” are unfounded. One must introduce some sort of positive non-linearity to the process of improvement, since training time cannot be non-linearly increased by any meaningful magnitude for any meaningful time period. To put this in marathon running terms, a competitor who has progressed to, say, a 2:15 performance standard over some considerable period faces an exponentially more difficult prospect in closing the gap to a 2:10 performance standard.
  • A final important take-away is that the analysis provides perspective on exactly how exceptional performances in the tail of the finishing time (percentage back) distribution are- this is not a linear space, as many seem to assume.

Subsequent posts in this series will analyze finishing time data from numerous distance road running events of varying lengths (10 km-marathon) and from trail ultramarathons. There are some interesting findings.

 

A Note on Scientific Process

Having written on numerous occasions on this subject, and with continued development and refinement of robust analysis approaches for evaluating the “competitiveness” of running races in general and ultramarathons specifically, it is important to point out some parts of the scientific process that are critical to advancement. The first is to establish a null hypothesis and to test against it. In all of this work the working null hypothesis has been that ultrarunning races are just as competitive as other endurance running events. As is the imperative of science, I went about trying to prove this hypothesis wrong; this approach of attempting to falsify a hypothesis is something that is typically not well understood by those who do not engage in scientific inquiry. The following video, which was inspired by a favorite book of mine – “The Black Swan” by Nassim Taleb – does a good job of demonstrating how difficult and seemingly elusive inquiry can be even in the simplest of examples.

The key to progress is to obtain “negative” information, i.e. information that does not fit the null hypothesis and therefore provides positive insight as to what is the underlying law, rule, or function that is the subject of the hypothesis. This is what I have been engaging in with this project.

Second, science and scientific inquiry are not about agreement, and they are certainly not about stasis. Our understanding evolves; it can be upturned by new findings and refined by new insight. All too many consider articles about advancements or discoveries in the popular press to be “definitive”. These same readers lament (sometimes publicly) when they are then exposed to another study that undermines or refutes the prior study’s conclusions. This is scientific inquiry: a constant, jittery series of disagreements, additional study, and resolution that, when successful, describes general progress toward understanding. There are very few fundamental discoveries that lead to sizable jumps in understanding of any complex inquiry; when the popular press presents any study as such, be wary. A contemporary example that the ultrarunning community has been exposed to is the back and forth amongst cardio researchers as to whether running long distances damages the heart- one study concludes that it does, the next that it does not. This is science: no giant leaps forward, just a bunch of back and forth, all the while developing an accumulation of data and interpretation that defines progress but may not fully answer the question at hand. If one is uncomfortable with uncertainty, conflicting data, or alternate interpretations of data, then science is not for you.

 

Excellence and the “10,000 hour rule” – excellence is exponential, “the rule” is not

There has been substantial debate over the past number of years pertaining to the validity of the so-called “10,000 hour rule”  (hereafter referred to as “the rule”) as it applies to development of expertise and excellence in performance. As first asserted by Ericsson, the rule provides that the development of an “expert” or “master” level of accomplishment requires a minimum of about 10,000 hours of “deliberate practice” and that this improvement follows a linear growth rate. “Deliberate practice” is focused (perhaps structured) training where one consciously addresses weaknesses whilst maintaining (and possibly improving) strengths. The 10,000 hours works out to about 10 years of focused training before one can attain an “expert” or “master” level in the endeavor. The underlying supposition is that “nurture” super-dominates “nature”, i.e. as some would say “talent is over-rated”. The egalitarian basis of “the rule” has resonated with a society that values a hard-work ethos that leads to success, something that is perhaps fundamental to any civil society. But reality is, in this case, something very different.

Background

As applied to sport, many have noted that there are numerous examples of athletes who have invested much less than 10,000 hours of focused training yet exhibit “excellent” performance at the international and Olympic level. Similarly, many have also noted numerous examples of athletes who, after investing substantially more than 10,000 hours of focused training, have still not reached (or even come near to) excellence in their respective sports. All of this is, of course, contrary to “the rule” and there have been a number of excellent analyses that disprove the efficacy of “the rule” as a controlling, single factor in the development of expertise and excellence in performance. The best of these analyses that I have been exposed to are well represented by those of Ross Tucker here, here, and here. Tucker concisely and thoroughly shows that, in addition to the glaring lack of attention to the statistical variance in the data as first presented by Ericsson (and subsequently by others), a myriad of arguments and data can be brought forth that detail the many other factors that clearly play significant roles in performance excellence. Not the least of these factors is individual gene expression, the subject of the recent book “The Sports Gene” by David Epstein. Epstein’s thesis is that unique combinations of “hardware” (genes) and “software” (training and opportunity) are what lead to performance excellence- not just training time as Ericsson and his acolytes assert.

In this post I am providing yet another aspect of the debate that has generally been overlooked and not well recognized- that of the statistical rarity of excellence and approaches to defining such excellence.

How to define an “expert” or “performance excellence”?

One of the deficient parts of the debate has been in defining exactly what an “expert” or “excellence” is. For chess, Ericsson uses the “master” level achievement as the definition of “expert”, an earned title based on the performance of chess players in tournaments with other “ranked” players. To first order this is a reasonable approach for something like chess. For sport, other systems can be used, but in the case of running, particularly track and road events, the finishing time is an almost absolute reckoning of the level of excellence of a particular performance. Analytical comparison of an athlete’s best time with the world record provides a sound basis for establishing a scale upon which “levels” of achievement can be placed.

One approach to deriving an analytical basis for the determination of excellence (or “expert” (elite) level) in standard distance, timed events is via statistics. A large collection of finishing times for a particular event (e.g. marathon, mile, 800 m, etc.) can be analyzed for distribution type (normal, log-normal, etc.) and then metrics can be applied to define “levels” of accomplishment. In the case of a normally distributed population of marathon times, for instance, one could use standard deviations from the mean as the analytical metric defining expertise/excellence: for example, “good” = 1-2 standard deviations from the mean (84.1-97.7 percentile), “very good” = 2-3 standard deviations from the mean (97.7-99.9 percentile), and “expert” (elite) = >3 standard deviations (>99.9 percentile) from the mean. Similar distribution metrics can be utilized for other types of distributions, should such non-normal distributions be extant.
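A sketch of such a scheme follows, assuming a purely hypothetical marathon population (mean 4:30:00, standard deviation 45 minutes) and the standard normal-distribution thresholds; the function name and numbers are illustrative only:

```python
# Map a finishing time to a level via standard deviations faster than the
# mean, per the normal-distribution scheme above (hypothetical population).

def level_by_sd(time_s, mean_s, sd_s):
    z = (mean_s - time_s) / sd_s       # SDs faster than the mean
    if z > 3:
        return "expert (elite)"        # beyond the 99.9th percentile
    if z > 2:
        return "very good"             # ~97.7-99.9th percentile
    if z > 1:
        return "good"                  # ~84.1-97.7th percentile
    return "average or below"

# Hypothetical population: mean 4:30:00 (16200 s), SD 45 min (2700 s).
print(level_by_sd(7500, 16200, 2700))   # 2:05:00 -> expert (elite)
```

The skew problem discussed next is visible here: moving the assumed mean or SD (by including or excluding non-“athlete” participants) shifts every boundary.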

A problem with this approach is deciding exactly what population of finishing times to analyze. Using all available times from a particular event will likely skew the data to longer finishing times as many who participate in a given event are not “athletes”- this is particularly true of middle and long distance events (5 km-ultramarathons). Truncation of the population at a certain cutoff finishing time will clearly help (e.g. using only times less than 4 hours for analysis of men’s marathon finishing times) but such a protocol involves a somewhat arbitrary determination and without conducting a sensitivity analysis the results could still be skewed.

The “percentage back” approach

Another, more robust, approach involves a simple process of rank ordering of the best ever finishing times for a particular event and then calculating the percentage time back from the best ever finishing time. The best ever finishing time provides an absolute reference against which any other time can be compared. A plot of cumulative probability (percentile rank) versus percentage back from the best ever finishing time will yield at least two useful things:

  1. “levels” of expertise/excellence can be applied to the data (e.g. “expert” (elite) could be defined as a best result that is less than 5% back from the best ever finishing time, “very good” (sub-elite) could be defined by times less than 10% back, etc.)
  2. the analytic functionality of the “excellence curve” of that particular event can be determined and allow for scaling of a given effort
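The level assignment in item 1 can be sketched directly; the thresholds and labels follow the proposal above, and 2:03:02 (7382 seconds) is the fastest marathon time cited in this post:

```python
# Assign excellence levels from percentage back relative to the best ever
# finishing time (thresholds are this post's proposal, not a standard).

def level_by_pct_back(time_s, best_ever_s):
    pct = 100.0 * (time_s - best_ever_s) / best_ever_s
    if pct < 5:
        return "expert (elite)"
    if pct < 10:
        return "very good (sub-elite)"
    return "beyond sub-elite"

best = 7382  # 2:03:02, the fastest marathon time cited in this post
print(level_by_pct_back(7500, best))   # ~1.6% back -> expert (elite)
print(level_by_pct_back(8000, best))   # ~8.4% back -> very good (sub-elite)
```

Unlike the standard-deviation scheme, this classification needs no assumption about the shape or mean of the full population- only the best ever time.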

Such “percentage back” analysis approaches are utilized regularly in cross country skiing to calculate World Cup points and thereby rank all competitors. One reason it is used is that finishing times in cross country skiing are highly variable for the same distance, as snow and weather conditions play a dominating role in skiing speed (speed for a given race distance (say, 30 km) shows about a 30-40% variability across events depending on course conditions and weather). So for an individual event on a given day, under whatever conditions prevail, the percentage back from the winning time is the most relevant metric for evaluating a performance. Corrections are made for the “quality” of the field at each event to ensure that races with a strong field are more heavily weighted than those with a much lower level of competitiveness.

In the case of running, finishing times are much less affected by weather and prevailing surface conditions, particularly those finishing times that are among the fastest ever recorded. So the “percentage back” approach can be used to make comparisons between events, and one can therefore include all finishing times for an event, independent of when and where it took place. Data sets that include something in excess of about the 500 fastest finishing times ever recorded will accurately establish the “expert” or “elite” tail of the distribution of all recorded times – it is this tail of the distribution that matters for the purposes of this post.

I will suggest here that “excellence” (elites) could be reasonably defined by those finishing times that are less than 5% back from the fastest ever time. Similarly, “very good” (sub-elite) could be finishing times less than 10% back, etc. This is just a proposal, not a proclamation; other defensible choices are possible, but the 5% and 10% thresholds are commonly used in evaluations of talent in cross country skiing.
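The proposed banding can be written as a small classifier. This is purely illustrative (the function name and the “other” label are my own assumptions, not a standard):

```python
# Illustrative classifier for the proposed 5% / 10% bands.
# The function name and the "other" label are assumptions, not a standard.

def performance_band(pct_back):
    """Label a result by its percentage back from the fastest ever time."""
    if pct_back < 5.0:
        return "elite"      # "excellence": less than 5% back
    if pct_back < 10.0:
        return "sub-elite"  # "very good": less than 10% back
    return "other"
```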

The “excellence curve” – Competition in running is an “exponential world”

As an example, presented below is a plot of percentile rank (cumulative probability) of the 499 fastest men’s marathon times ever recorded against percentage back from the fastest ever finishing time (2:03:02, G. Mutai, 4/18/11, Boston). Note that this type of analytic normalization of rank order is used in calculating percentile ranks for the SAT for each cohort taking the test. A truncated population is shown here for the fastest men’s marathon finishing times (i.e. the equivalent of test scores) because we are interested in the “excellence” end of the population, so the expected “S” curve is not extant.

[Figure: percentile rank (cumulative probability) vs. percentage back from the fastest ever men’s marathon time]

Clearly the functionality is non-linear; in fact, it is exponential. This empirical curve is the current “excellence curve” for the men’s marathon in that it defines the functionality and magnitude of time improvement required to progress in the event.

Presented below are the same data as in the first graphic with a fitted exponential function. The equation for the curve is shown on the graph, with an e-base exponential factor (“b” in the equation above) of about 1.26.

 

[Figure: the same data with a fitted exponential function, b ≈ 1.26]

Clearly, K. Ito’s time of 2:07:57 (Beijing, 1/19/86) is exponentially slower than G. Mutai’s 2:03:02 (Boston, 4/18/11). In other words, Mutai is exponentially faster than Ito by a magnitude defined by the percentage back, in this case 3.996%, and Ito would have to improve exponentially to claw his way down the marathon “excellence curve”. There is no linearity in performance excellence for the marathon*. One will find similar exponential results for other distances. This analysis also shows exactly how rare and ethereal the top performers are.
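A fit of this kind can be reproduced, in outline, by ordinary log-linear least squares on the y = a • exp(b • x) form. A minimal sketch follows (numpy assumed available; the synthetic data and the particular a and b values stand in for the real race data):

```python
import numpy as np

def fit_excellence_curve(pct_back, cum_prob):
    """Fit y = a * exp(b * x) by least squares on ln(y) = ln(a) + b * x.

    Returns (a, b); "b" is the competitive index (CI) of the cohort.
    """
    b, ln_a = np.polyfit(np.asarray(pct_back), np.log(cum_prob), 1)
    return float(np.exp(ln_a)), float(b)

# Synthetic check: data generated with a known a and b should recover them.
x = np.linspace(0.0, 4.0, 50)   # percentage back, 0-4%
y = 0.007 * np.exp(1.26 * x)    # illustrative a and b, not the real fit
a, b = fit_excellence_curve(x, y)
```

For noisy real-world data a direct non-linear fit (e.g. `scipy.optimize.curve_fit`) weights the points differently than the log transform does, but for this purpose the log-linear approach is the simpler design choice.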

Using the suggested protocol for defining “excellence” (elite) and “very good” (sub-elite) mentioned above, “elite” marathoners would be those with results faster than 2:09:11 (less than 5% back from the fastest ever time) and “sub-elite” marathoners would be those with results between 2:09:11 and 2:15:20 (less than 10% back but greater than 5% back from the fastest time ever).
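The cutoff arithmetic is simple enough to check directly. A minimal sketch (the helper names are my own, and small rounding differences from quoted clock times are possible):

```python
# Check of the 5% / 10% cutoff arithmetic; helper names are illustrative.

def hms_to_sec(h, m, s):
    return 3600 * h + 60 * m + s

def sec_to_hms(t):
    t = round(t)
    return (t // 3600, (t % 3600) // 60, t % 60)

fastest = hms_to_sec(2, 3, 2)                  # 2:03:02 (G. Mutai, Boston 2011)
elite_cutoff = sec_to_hms(fastest * 1.05)      # less than 5% back
sub_elite_cutoff = sec_to_hms(fastest * 1.10)  # less than 10% back
```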

The “10,000 hour rule” in an exponential world

A fundamental premise underlying the work of Ericsson (and others) who subscribe to the “10,000 hour rule” is that increasing the total volume of deliberate practice singularly leads to greater accomplishment until one reaches the “master” or “expert” (elite) level at total accumulated training times greater than about 10,000 hours. This is the reason that numerous books have been written describing various ways to go about becoming “expert” (elite) using deliberate practice. All of these books center on a basic tenet: more deliberate practice (and only more) is better, necessary, and sufficient to achieve excellence. Examples of books that espouse the “10,000 hour rule” tenet are “The Talent Code: Greatness Isn’t Born. It’s Grown. Here’s How”, “Talent is Overrated: What Really Separates World-Class Performers from Everybody Else”, and “Outliers: The Story of Success”, where the authors gleefully proclaim that anyone can be “expert” or attain “performance excellence” just by grinding away at deliberate practice for long enough.

Applying this principle to the marathon, it follows that if one were to accumulate 10,000 hours in total volume of deliberate practice then one would be “expert” or have results that constitute “performance excellence” (elite). We all know that this is not true, as there are thousands of dedicated, smart-training, 10,000-hour+ marathon runners who will never see the likes of a 2:07 finishing time. Just ask your local 2:15, 10,000-hour+ marathoner exactly what they think about ever finishing a race in 2:07 (note: this result would still only get one to within about 4% of the best time). Additionally, it is not possible to increase training time non-linearly for any meaningful period, as there are only so many hours in the day and only so much training stress that one’s body can take without breaking down physiologically. Any meaningful non-linear increase in training time will rapidly run out of hours in the day and musculoskeletal tolerance**.

What this and the data above show is that, for the men’s marathon (and I note that this holds for other distances as well), linear increases in deliberate practice (training) make it impossible to improve along the exponential “excellence curve”. One must introduce some individual non-linearity into the improvement process in order to ever compete at the highest levels. I will suggest that one origin of such non-linear improvement is what is colloquially called “talent”, i.e. an innate, likely genetic, predisposition to non-linear improvement with deliberate practice in a chosen sport. We have likely all experienced a training partner or fellow competitor who, with a very similar training program and volume, accelerates in performance excellence and leaves “the rest” behind in another category entirely. I’ve seen this not only in sport (tennis, road cycling, mountain biking, and cross country skiing) but also in academics (physics, chemistry, mathematics).

To use the sub-title of one of the books noted above, what really separates world-class performers from everybody else is not deliberate practice alone but rather the combination of deliberate practice and innate ability, as well as other factors such as environment, access and, importantly, motivation (the subject of a future post). All of these elements combine to produce the exponential improvement that populates the high performance tail of the finishing time distribution.

The “10,000 hour rule” is a linear concept which has no singular place in the exponential world of athletic performance in endurance sport; the data are clear.

* A similar analysis including many more marathon finishing times (say, 100,000) may eventually show a linear dependence at some point far out on the finishing time/percentage back scale; however, that part of the curve does not define “excellence”. The “excellence” part of the curve is exponential, as shown here.

**Daniel Coyle, the author of the book “The Talent Code”, argues that under certain situations (he uses the example of a music camp in upstate New York) one can experience non-linear increases in training effectiveness through something that he calls “deep practice”. However, Coyle also notes that this happens only over a limited period of time (7 weeks in the example of the music camp).