One must learn by doing the thing; for though you think that you know it, you have no certainty until you try.
Parts I and II of this series introduced a methodology and analysis protocol to assess, in an objective manner, the competitiveness of running races, using the Road Marathon event type as a demonstration example (see Part IV for the extension to ultramarathons). It is found that normalizing the finishing times to percentage back from the winning time, and the rank order to cumulative probability, results in a simple exponential functionality. This functionality is expected for athletic performance. In addition, because the data are normalized, the approach allows for reliable event-to-event comparisons. In Part III of the series the analysis is extended to a shorter race length: the Road 10 km.
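For readers who want to try the normalization on their own results, here is a minimal sketch of the two steps described above. It is not the exact procedure from Part I: I am assuming the fitted form is cumulative probability = a * exp(CI * percentage back), that the fit is an ordinary least-squares fit on the logarithm of the cumulative probability, and that the cohort is cut off at 25% back from the winner (the "125% cohort"). The function names and the synthetic field are made up for illustration and are not real race data.

```python
import numpy as np

def normalize_results(times_s, cutoff_pct=25.0):
    """Return (pct_back, cum_prob) for finishers within cutoff_pct of the winner."""
    t = np.sort(np.asarray(times_s, dtype=float))
    pct_back = 100.0 * (t - t[0]) / t[0]         # percentage back from the winning time
    pct_back = pct_back[pct_back <= cutoff_pct]  # keep only the 125% cohort
    n = pct_back.size
    cum_prob = np.arange(1, n + 1) / n           # rank / cohort size; fastest is lowest
    return pct_back, cum_prob

def fit_exponential(pct_back, cum_prob):
    """Fit cum_prob = a * exp(ci * pct_back) by least squares on log(cum_prob).

    Assumption: the exponent ci plays the role of the competitiveness index.
    """
    ci, ln_a = np.polyfit(pct_back, np.log(cum_prob), 1)
    return np.exp(ln_a), ci

# Synthetic, made-up field (NOT real race data): 70 finishers placed exactly on
# an exponential with ci = 0.17, plus two slower finishers outside the cohort.
ci_true, winner_s = 0.17, 1800.0
ranks = np.arange(1, 71)
times = winner_s * (1.0 + np.log(ranks) / ci_true / 100.0)
times = np.append(times, [winner_s * 1.30, winner_s * 1.50])

pct_back, cum_prob = normalize_results(times)
a_fit, ci_fit = fit_exponential(pct_back, cum_prob)
print(f"cohort size n = {pct_back.size}, fitted CI = {ci_fit:.3f}")
```

Because both axes are normalized, the same two functions can be applied unchanged to a marathon or a 10 km result list, which is what makes the event-to-event comparison possible.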
For reasons of continuity, stability, and the relevance of long-term aggregated data, I searched for a 10 km road race that has been run on the same course over an extended period of time. One such race is the Boulder Boulder 10 km road race that takes place each year on Memorial Day in Boulder, CO. This race was co-founded by Olympian Frank Shorter in 1979 and has been in existence since. The period used to analyze this race is 2001-2013. The 2014 results as reported on the Boulder Boulder race website are still preliminary, so they are not included here. Also, in 2011 the race starting point was moved to a new location, so one must take that into account when aggregated results are used. In this analysis any aggregated results will only include the 2001-2010 datasets.
A second long-running 10 km-type event is also analyzed: the Falmouth Road Race. The Falmouth Road Race was started in 1973 and quickly gained status as one of the most competitive 10 km-type races (the race is actually 7 miles) in the US and the world. Early on the likes of Bill Rodgers, Marty Liquori, and Frank Shorter were going head to head on Cape Cod. This level of racing continues today, and the race remains one of the most elite-encrusted road events of the year.
Boulder Boulder 10 km
Following the same analysis approach as described in Part I and applied to Marathons in Part II, presented below is an example of the data for the Boulder Boulder 10 km race, in this case the 2003 race.
As expected the data fit nicely to a simple exponential function with a competitiveness index (CI) of about 0.17. Comparison of this value of CI to those found in the “Big 5” Marathons reveals that the Boulder Boulder race is as competitive as the marathons. In fact the 2003 Boulder Boulder is as competitive as any of the “Big 5” marathons over the period 2001-2014. The high CI for the 2003 Boulder Boulder is seen throughout the entire period analyzed (2001-2013). All of the analysis data (competitiveness index (CI), coefficient of determination (R^2), and cohort size (n)) for the Boulder Boulder race are presented in tabular form below along with the same data for the Boston Marathon for the same period. Note: of the “Big 5” Marathons, Boston was, on average, the most competitive of the group.
The Boulder Boulder 10 km race exhibits an average CI for the study period of 0.168 compared to an average CI of 0.147 for the Boston Marathon, a 14% difference. This means that the Boulder Boulder is, on average and in all but one year of the study period, a more competitive event. Also note that there are, on average, many more competitors who make up the 125% cohort in the Boulder Boulder than in the Boston Marathon. As will be shown in Part IV of this series, it is generally apparent that the cohort size is inversely related to the event length: the longer the event, the smaller the 125% cohort, even when correcting for the total number of entrants in the particular race. This is likely due to multiplicative effects on time differentials, which are larger in longer races.
Falmouth Road Race
This race presents data that are a good example of the “negative” information discussed briefly in Part I. These are data that in some way do not fit the developed model or thought process and that can provide important insight into the details of mechanisms and governing laws.
Contrary to all datasets analyzed to date, all but one of the datasets for the Falmouth Road Race do not follow an exponential functionality. Rather the data are well described by a linear functionality. I present here the data for Falmouth and provide a discussion as to why one might observe such linear functionalities. We will also see evidence for linear functionality in some of the ultramarathon data which will be presented in Part IV.
As representative of the linear functionality datasets from the study period, presented below is the same analysis used throughout this study, applied to the 2009 Falmouth Road Race. Shown are the data fitted to exponential (red) and linear (blue) functions. Clearly the data are best described by a linear function (which exhibits an R^2 value of 0.987) with a slope of 0.039.
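The comparison between the exponential and linear fits can be sketched in a few lines. This is only an illustration under stated assumptions: the data are synthetic (a made-up field spaced evenly from 0 to 25% back, not the Falmouth results), the function names are my own, and the R^2 for the exponential is computed on the untransformed cumulative probabilities after a log-linear fit, which may differ from what a spreadsheet trendline reports.

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination in the original (untransformed) units."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def compare_fits(pct_back, cum_prob):
    """Fit linear and exponential models; return (rate, R^2) for each."""
    x = np.asarray(pct_back, dtype=float)
    y = np.asarray(cum_prob, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)           # y = slope * x + intercept
    r2_lin = r_squared(y, slope * x + intercept)
    ci, ln_a = np.polyfit(x, np.log(y), 1)           # y = a * exp(ci * x)
    r2_exp = r_squared(y, np.exp(ln_a + ci * x))
    return {"linear": (slope, r2_lin), "exponential": (ci, r2_exp)}

# Synthetic "stacked field" (NOT real data): 100 finishers spread evenly
# between the winner and 25% back, so cumulative probability grows linearly.
x = np.linspace(0.0, 25.0, 100)
y = np.arange(1, 101) / 100
fits = compare_fits(x, y)
for name, (rate, r2) in fits.items():
    print(f"{name:12s} rate = {rate:.4f}  R^2 = {r2:.4f}")
```

A field that fills the 125% cohort evenly has a slope close to 1/25 = 0.04 per percent back, so a fitted slope near that value is what a linear distribution over this cohort would be expected to show.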
The exponential function significantly underestimates the proportion of the cohort in the 5-20% back region. The competitors in this performance range are performing at a much higher level (i.e., faster finishing times) than would be expected from an exponential distribution. As an example, this dataset shows that the athlete at the 10% back performance level is at a cumulative probability of about 0.35, or the 35th percentile (where 65% of the finishing times are slower), whereas an exponential distribution would predict this athlete to be at a cumulative probability of around 0.20 (the 20th percentile, where 80% of the finishing times are slower)*. Thus a greater proportion of the population is faster than would be predicted by the expected exponential distribution; in other words, the population represented by this cohort is out-performing the exponential expectation. A direct consequence of this result is that the cohort is likely non-normal. This will be discussed below.
To demonstrate how different the 2009 Falmouth race is when compared to the other races analyzed, presented below is a plot of both the Falmouth 2009 race and a representative dataset from the Boulder Boulder 10 km race (the 2003 race). Note the dramatic difference in the functionality of the competitor distributions for this 125% cohort.
Why are these two competitor performance distributions so different? Well, in two words, the reason appears to be: East Africans. The Falmouth Race offers travel stipends to top runners, there is a nice prize purse, and the race director obviously works hard at assembling the best possible field of world-class runners. As a result this race is consistently and highly populated by world-class athletes from East Africa. For instance, in the 2012 race eight of the top ten runners were among the best runners from Kenya and the other two were top runners from Uganda and Ethiopia. As expected, such a field will also draw the best American runners, who want every chance to test their mettle against such a world-class field. In the end you have a “stacked field” that is not representative of the tail of a distribution of all competitive runners. The 2013 race is an interesting counterpoint, as the finishing time distribution fits a simple exponential. This race, contrary to the others in the study period, is not highly populated by East African runners. In fact there are only six East Africans or runners of East African origin in the race. The other races in the study period have at least twice as many, and each of these races exhibits a linear functionality. Presented below are the analysis data in tabular form for the Falmouth Road Race 2001-2013.
In contrast to these results, the Boulder Boulder results summarized above uniformly follow a simple exponential functionality and do not have the participation of as large a population of world-class athletes. This is surprising, as the Boulder Boulder spends in excess of $200,000 for athlete travel, accommodation, and prizes. This level of support for attracting top talent is similar to what Falmouth does. For some reason Falmouth attracts a much more world-class field; I am certain that there is an identifiable reason for this but, not being a road runner, I have not been exposed to any background.
Non-exponential Performance Distributions
Why is the performance distribution tending toward linear for many of the Falmouth Road Race races? It is apparent that when the fast end of the competing population of runners is skewed toward a world-class level (in this case by successful recruitment of such talent by the race director), the 125% cohort tail of the distribution tends toward a linear functionality. This can be partially explained by the fact that this portion of the population is out-performing what would have been a normal, exponential distribution and is better characterized by a linear relationship. There is an important takeaway from this observation: even though athletic performance is defined by an exponential relationship, the data presented here suggest that when one analyzes only the highest performing population, a linear functionality will obtain. This means that if an athlete is able to perform at this level they are not necessarily facing that “exponential wall” of improvement that was discussed in Part I of this series and in the 10,000 hour rule post. Rather, some such athletes can possibly evolve more rapidly through this, for them, linear space and achieve world-class status. This is an important observation also for coaches, as it is extremely difficult with high performance athletes to determine whether they have plateaued or are still improving. Analysis of an athlete's progression against world-class competition, using percentage back from the winning time as the operative metric, could reveal whether the athlete is on a linear trajectory or is, in fact, hitting the “exponential wall”. This could allow coaches to identify the “true” world-class-capable talent and focus training and racing appropriately. I have personally seen way too many cases of very, very good athletes coming to this “exponential wall” of improvement and spending a lot of time, effort, money, angst, and coach resources, only to retire eventually without achieving world-class results.
This is contrasted to the few athletes who somehow make it through the “exponential wall” and become top, world-class competitors. Identifying these athletes is a challenge and the methodology presented here may be one way to help in such identification. I am currently analyzing data that tests this hypothesis retrospectively with results of known world-class competitors- stay tuned.
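As a rough illustration of how the percentage-back metric might be tracked for a single athlete, here is a small sketch. It is a hypothetical heuristic of my own for this example, not a validated method and not the retrospective analysis mentioned above: it simply compares the rate of improvement over the first half of a series of races with the rate over the second half. The two athlete series are invented numbers.

```python
import numpy as np

def percent_back(athlete_time, winning_time):
    """Percentage back from the winning time for one race."""
    return 100.0 * (athlete_time - winning_time) / winning_time

def progression_slopes(pct_back_series):
    """Return (early, late) slopes in percent back per race.

    Needs at least four races so that each half has two or more points.
    A late slope much closer to zero than the early slope hints at a plateau.
    """
    y = np.asarray(pct_back_series, dtype=float)
    x = np.arange(y.size)
    half = y.size // 2
    early = np.polyfit(x[:half], y[:half], 1)[0]
    late = np.polyfit(x[half:], y[half:], 1)[0]
    return early, late

# Invented series of percent back against world-class fields (NOT real athletes).
steady = [12.0, 10.5, 9.0, 7.5, 6.0, 4.5]      # keeps closing the gap
flattening = [12.0, 8.0, 6.0, 5.0, 4.5, 4.25]  # gains shrink race after race

steady_early, steady_late = progression_slopes(steady)
flat_early, flat_late = progression_slopes(flattening)
print("steady:    ", steady_early, steady_late)
print("flattening:", flat_early, flat_late)
```

In practice the races would need to be against comparably strong fields for the percent back values to be comparable from one race to the next, and far more than six data points would be needed before drawing any conclusion.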
The bottom line here is that when this type of competitiveness analysis yields an approximately linear relationship, one can expect to see a stacked field of competitors that skews an otherwise normal distribution of finishing time data. There is some evidence that this is occurring in some ultramarathon events, as will be seen in Part IV.
We have extended a methodology for assessing competitiveness in running races from Marathons to a shorter race, the Road 10 km. It is found that the methodology is extensible, as evidenced by the uniform exponential functionality exhibited by the Boulder Boulder 10 km race. However, it is also found that such exponential functionality can be replaced by a linear functionality if the competitor field includes a recruited population of world-class athletes. This “stacked field” alters the functionality for the 125% cohort in a way that leads to a linear performance distribution. Such a linear distribution of performance consistently out-performs the performance predicted by an exponential distribution, as expected given the superior talent represented in this tail of the general population of competing runners.
Next, in Part IV, we will extend this methodology and analysis to longer races: ultramarathons.
*Note: This may confuse some. In the case of finishing times a faster time (lower time value) is ranked higher. Many percentiles are reported on test scores where a higher score is ranked higher. The analysis here is inverted from this more common type.