Try again. Fail again. Fail better.
A recent example of a seemingly never-ending discussion on whether a certain running race was competitive or not has spurred me into writing this post.
The discussion in the comments of the above-mentioned article is commonplace whenever the subject comes up, particularly where the competitiveness of ultramarathon races is concerned. Presumably many of the claims that ultramarathons are not competitive arise from the typically small fields when compared to other endurance running events (e.g. road marathons) and from “naive” references to “slow” times by those who do not have a grasp of the reality of racing for such long distances over extended periods of time.
In my experience all of these discussions lack any frame of reference for what competitiveness actually means, and they therefore lack a logical, arguable, and defensible position from which to derive constructive conclusions. Although there may be other quantifiable metrics for competitiveness, I will offer a data-based approach here and expand upon its application to a variety of running races in succeeding posts. This approach is highly defensible as it uses only finishing time and rank order placement results for computation of “competitiveness.” Event “shallowness” can also be quantified with the same data.
Shallow Competition? These are two different things.
We often hear reference to “shallow competition” as a descriptor of a particular race or event type. As will be developed here, the degree to which a race is competitive is significantly (although not entirely) independent of how “deep” the field is. Therefore the term “shallow competition” really has no foundation in communicating anything of substance. It is possible to have a shallow but competitive field as well as a deep but not competitive field. The following will provide a definition of and metric for competitiveness in running races, describe a method for assessing what is a “deep” field, and offer tools for anyone to determine the competitiveness and “deepness” of the field of a given running race.
Definition of Competitiveness
A search of the literature has turned up very little work on defining and evaluating “competitiveness” from an analytical perspective in individual timed sporting events. Given the mountains of data of recorded finishing times for such timed events all across the world, it seems odd that no one has taken up the task of defining competitiveness. I may have missed some publications but certainly there is nothing of substance on the subject via a comprehensive search using numerous channels.
It is clear that some events are more competitive than others, that some sports have deep and competitive fields and others do not, and that “new” sports (e.g. cross country mountain biking) become established and, in a relatively short time, demonstrate a transition to much greater competitiveness. However, there exists no basic fundamental analysis that describes and measures competitiveness.
Understanding and potentially measuring competitiveness is useful for numerous reasons. First, a measure for competitiveness can provide the competing athlete with a clear understanding of how competitive their sport is and, additionally, how competitive a particular race is. This understanding will allow the competitor to assess their performance in an objective way. Second, a measure for competitiveness can serve as a basis for “point” accumulation in ranking of competitors for “championship” awards and honors. These “point” accumulations can be adjusted as a function of the competitiveness of individual events to ensure that the greatest point accumulations are by those who compete well in the most competitive events. Third, it is commonly asserted by many among the “running” community that ultramarathons are not “competitive” and an analytic measure of competitiveness can determine whether or not this assertion is, in fact, supportable.
When one considers the concept of a definition of competitiveness for an event it becomes abundantly clear that there are numerous sub-categorical levels of competitiveness. We have the competitiveness of the particular event itself (intra-event competitiveness (IEC)), the competitiveness of a particular event in aggregate over all or a selected portion of years that the event has been in existence (aggregate intra-event competitiveness (A-IEC)), and we have the competitiveness of a given event type (e.g. road marathon) as it is compared from event to event over the history of the event type or over some selected time period (inter-event competitiveness (IrEC) and the aggregate (A-IrEC)).
In running races we are fortunate to have well defined results that can be rigorously analyzed without, to first order, any subjectivity. These data are the finishing time and the rank order of finish. This is great but, as we know, running race courses are all at least slightly different even if they are conducted on a track. In addition each event can have very large differences in the number of competitive participants; this is particularly true when comparing ultramarathons to other event types. Therefore it is imperative that we employ some method to normalize the finishing time and rank order data to be able to compare one race to another be it multiple intra-event results, inter-event results, inter event-type results, or aggregate, all-time inter-event results.
Separate from issues having to do with making comparisons, in framing a concept of competitiveness it is important to recognize that the competitiveness of an event is not only defined by how fast the winner runs but also by how other competitors in the race compare to the eventual winner. This means that any robust competitiveness evaluation must also normalize both the finishing time data and the finishing rank order data. The following will summarize the approach taken here.
Normalization of Finishing Time Data
Accepting the reality that every running course is different, that weather and/or atmospheric conditions can play an important role, and that even the fact that the same running course “runs” differently on different days due to surface conditions, it is crucial that one develop a method for normalizing finishing time data in a fashion that accommodates such differences to facilitate a robust analysis of competitiveness for a given event or event type.
For running race finishing time data the most direct way to accomplish this is via utilization of the calculated percentage time back from the winning time. The percentage back value represents a universal performance metric derived from the finishing time that is substantially independent of the race course, the length of the race, the weather, or other impacting variables that may arise since all competitors face the same conditions on race day. In addition, it is much more informative to assess one’s performance utilizing percentage back rather than raw finishing time (or placement) since improvements are better characterized with percent increases/decreases from the winning time than with raw time. Also the finishing time on various courses will be different on each course even if the race is of the same length. The FIS uses percentage back in assessments for cross country skiing. National endurance sports organizations like the US Ski Team and many other national ski programs (e.g. Norway, Sweden, France, etc.) also use percentage back metrics to assess current and up-coming talent. In fact the US Ski Team has often used percentage back metrics in decisions on which team athletes are to attend World Cup and World Championship races. Also, many coaches of endurance athletes will use percentage back to evaluate performance. In the following analysis we will use “percentage back from the winning time” as the fundamental normalization method for finishing time in the development of competitiveness metrics in running races.
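As a minimal sketch of the percentage back calculation described above, in Python. The function names and the sample finishing times are hypothetical and are not taken from any race discussed in this post.

```python
def to_seconds(hms):
    """Convert an 'H:MM:SS' finishing time string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def percent_back(finish_times):
    """Return each finisher's percentage back from the winning (fastest) time."""
    seconds = sorted(to_seconds(t) for t in finish_times)
    winner = seconds[0]
    return [100.0 * (s - winner) / winner for s in seconds]


# Made-up finishing times, for illustration only.
sample = ["2:10:00", "2:16:30", "2:23:00", "2:42:30"]
print([round(p, 1) for p in percent_back(sample)])  # -> [0.0, 5.0, 10.0, 25.0]
```

Because every value is divided by the winning time of the same race, the output can be compared across courses and distances in the way the paragraph above describes.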
Normalization of Finishing Rank Order Data
When assessing performance of a given cohort (or an aggregated collection of comparable cohorts) the concept of “percentile rank” is commonly used. The percentile rank will likely be familiar to you from the extensive use of this metric by the College Board in assessing SAT scores for a given year as well as in comparisons of test performances from year to year. The percentile rank values range from 0-100 and one’s percentile rank for a test provides data as to what percentile you have scored in relative to your cohort. For instance a test score that yields an 85 percentile rank means that 85% of the participants scored lower and 15% scored higher. These percentile rankings serve to normalize the test scores within the cohort and allow for comparisons with other years (cohorts) where the population size may change significantly. Similarly for running races, the percentile rank is a useful metric for comparison not only within a cohort (the performance of a competitive field at a given race) but also between cohorts (the performance of competitors from numerous years of the same event) and effectively allows for comparisons of races with very different competitive field sizes. The arithmetically related “cumulative probability” will be used here instead of percentile rank for normalization. Cumulative probability values range from 0-1 and represent the probability of a given result within the cohort. For instance a cumulative probability value of 0.1 for a running race result means that this competitor has finished just within the top 10% of the field and has posted a time that is faster than 90% of the competitors in the cohort.
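A small sketch of how cumulative probability could be computed from rank order. It assumes the convention rank / N with finishers ordered fastest first, which matches the 0.1 example above but is not spelled out in the text. Ties are ignored, and the function name is hypothetical.

```python
def cumulative_probability(n_finishers):
    """Return rank / N for ranks 1..N (finishers ordered fastest first)."""
    return [rank / n_finishers for rank in range(1, n_finishers + 1)]


cohort = cumulative_probability(100)
print(cohort[9])   # 10th of 100 finishers -> 0.1
print(cohort[-1])  # last finisher in the cohort -> 1.0
```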
The Functionality of Running Race Results
As for any fundamental concept, derivation of a functional description is paramount to allowing for utility. In the case presented here for evaluations of running races, it is the functionality of the cumulative probability (percentile rank) versus the percentage back from the winning time that describes the competitiveness of the event. In other words the shape of the curve defined by the cumulative probability versus the percentage back defines the competitiveness and comparisons of the shape of such curves (and appropriate descriptive parametrics) will allow for evaluation of the competitiveness of a given event or event type.
For demonstrative purposes, presented below are two cumulative probability versus percentage back from the winning time plots for the Men’s results of the London Marathon in years 2005 (blue) and 2002 (green) using a cohort of the top 100 finishers. The top 100 finishers were used because this cohort typically comprises the runners who have finished within about 25% of the winning time. Although I would define “competitive” runners as those who finish within about 5% of the winning time, the larger cohort was chosen to allow for comparison to longer, ultramarathon races with much smaller populations; including results up to about 25% back yields sufficient population sizes for analysis and comparisons of all races.
It is inarguable that the London Marathon represents a very competitive event, particularly among the top 100 finishers, so the following analysis is representative of a very competitive running race.
The top 100 finisher cohort of the 2005 London Marathon Men’s race exhibits a steeper ascending functionality than the shallower functionality of the 2002 data.
Graphical inspection of the curves reveals that the 2005 Men’s race was more competitive than the 2002 race. The two figures presented below show that at the same percentile rank/cumulative probability or at the same percentage back from the winning time there is a considerable difference in the percentage back value and the proportion of the population, respectively. Specifically, at an arbitrarily selected value of cumulative probability of 0.20, we see that in the 2005 race this value represents competitors whose finishing times are about 8% back from the winning time whereas in the 2002 race this probability value represents competitors whose finishing times are about 12.5% back from the winning time, or about a 35% difference between the races. Similarly, at an arbitrarily selected value of percentage back from the winning time of 10%, the results from the 2005 race show that about 28% of the cohort was at or below this finishing time percentage whereas in the 2002 race only 13% of the cohort was at or below this finishing time percentage, or about a 55% difference between the races. It is clear that, in comparison of the selected cohorts, the 2005 race was more competitive, meaning there is a significantly greater proportion of the cohort of competitors closer to the winning competitor.
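The two quoted differences can be reproduced as follows, assuming (the text does not say) that each difference is taken relative to the larger of the two values:

```latex
\[
\frac{12.5 - 8}{12.5} = 0.36 \approx 35\%,
\qquad
\frac{28 - 13}{28} \approx 0.54 \approx 55\%
\]
```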
In a more fully analytical approach, one can fit the curves to a function and use the function metrics to characterize the level of competitiveness. In this case (and in all cases of running races studied by the author) the cumulative probability versus the percentage back from the winning time data generally fit very well to a simple exponential function. This is expected from a population that follows a normal distribution as athletic performance does. Presented below is a figure showing the fit of exponential functions to the race data. The fits are quite good although, in this example, they underestimate the differences. However the trend is captured. We see that the 2005 race is more competitive and therefore this event exhibits an exponential factor of 0.1481 which is larger than that for the less competitive 2002 race where the exponential factor is 0.1349. These exponential factors characterize the steepness of the curves and therefore the level of competitiveness of the race or event. The following section provides a method for utilization of these exponential function parametrics to capture an analytical measure of competitiveness (a competitiveness index).
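The post does not state how the fits were produced, so the following is only one plausible way to obtain the exponential factor. It is an ordinary least-squares fit of a straight line to ln(y) against x, written in plain Python. The function name and the synthetic data are illustrative assumptions.

```python
import math


def fit_exponential(x, y):
    """Fit y = a * exp(b * x) by least squares on ln(y); return (a, b)."""
    n = len(x)
    log_y = [math.log(v) for v in y]
    mean_x = sum(x) / n
    mean_ly = sum(log_y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (li - mean_ly) for xi, li in zip(x, log_y))
    b = sxy / sxx                       # exponential factor (steepness)
    a = math.exp(mean_ly - b * mean_x)  # pre-exponential factor
    return a, b


# Synthetic, noise-free data generated from a known curve, for illustration only.
x = [2.0 * i for i in range(1, 11)]           # percentage back: 2, 4, ..., 20
y = [0.01 * math.exp(0.15 * xi) for xi in x]  # cumulative probability
a, b = fit_exponential(x, y)
print(round(a, 4), round(b, 4))  # -> 0.01 0.15
```

Fitting on the logarithm weights the small cumulative probabilities near the front of the field more heavily than a direct nonlinear fit would, so the two approaches can give somewhat different exponential factors on real data.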
One can also derive a metric for the level of the “deepness” of the field (cohort) from these data by assessing the density of competitors (data points) along the curve. A “deep” field would exhibit a high density of competitive times throughout the high performance end whereas a “shallow” field would show a paucity of competitors (data points) in this same region with large gaps between competitors. I will offer no analytical parametric for this evaluation as it is relatively straightforward to determine a sense of the “deepness” of the field from graphical observations.
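For readers who want a number to go with the graphical inspection, here is a rough sketch. It is not a metric proposed in this post: it simply counts the finishers inside a chosen percentage back limit and reports the largest gap between consecutive finishers in that region. The 5% default echoes the informal definition of “competitive” runners given earlier. The function name and sample fields are made up.

```python
def field_depth(percent_back, limit=5.0):
    """Count finishers within `limit` percent of the winner and find the
    largest gap (in percentage points) between consecutive finishers there."""
    close = sorted(p for p in percent_back if p <= limit)
    gaps = [later - earlier for earlier, later in zip(close, close[1:])]
    return len(close), max(gaps, default=0.0)


# Made-up percentage back values for two hypothetical fields.
deep = [0.0, 0.4, 0.9, 1.5, 2.2, 2.8, 3.5, 4.1, 4.8, 6.0]
shallow = [0.0, 3.9, 4.7, 9.5, 12.0]
print(field_depth(deep))     # many finishers, small gaps
print(field_depth(shallow))  # few finishers, one large gap
```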
Derivation of Competitiveness from the Exponential Data
It is inarguable that the road marathon (and specifically here the London Marathon) is a highly competitive running event where literally thousands (and perhaps even tens of thousands) of elite and sub-elite participants have recorded impressive finishing times in the 100-year recorded history of the event. That these data fit an exponential function is entirely consistent with performance excellence and highly competitive sport. The exponential function describes a finishing time distribution that includes a sparsely populated tail of ethereal performance followed by an increasingly populated distribution of less impressive finishing times. The degree of performance excellence is defined by the high performance tail and the competitiveness of the event is defined by the “steepness” of the curve (which is proportional to the magnitude of the exponential term of the function). For example, an “other-worldly” performance at the far left of the curve (near or at zero percent back) with very few (or no) other recorded performances near it in the distribution is the definition of performance excellence. Similarly, the steepness of the curve just beyond the high performance tail defines how close other competitors are to the “netherland” of performance excellence. In other words, the steepness of the performance excellence curve determines how many competitors are “knocking at the door” of entry into the performance excellence club. The greater the number of such individuals, the higher the probability that one of these (very talented and hard-working) competitors will put everything together and score a finishing time in the high performance tail. In the case of a shallower exponential curve (lower magnitude exponential term), performances are more widely distributed and there are therefore far fewer individual competitors who have demonstrated performances that are close to the high performance tail.
In this case the probability that a competitor will score a finishing time in the high performance tail is much smaller than in the population represented in the steeper distribution. This probability of performance excellence clearly scales with the steepness of the distribution (magnitude of the exponential term) and is a way to define the competitiveness of the event. Presented below are plots of simple exponential functions where only the exponential term is varying, showing the change in steepness of the curve as a function of the exponential. The range of exponential terms in the plot spans the range of such terms found in running finishing time data as will become apparent in subsequent sections.
From a functional perspective, two performances from an exponential population distribution that are close in linear time (the x axis in this plot) are actually exponentially different in “net performance” (the y axis in this plot, e.g. percentile rank). This means that although one competitor may be linearly “close” in time to another competitor in an event, they are actually exponentially further back from a performance perspective and the magnitude of the difference is directly proportional to the exponential term that characterizes the fitted data. The steeper the performance excellence curve the more difficult it is to progress. Many of us have experienced this reality in our own athletic endeavors as we approach our individual limit of ability: exponential improvement is not easy. A shallow(er) curve defines a population where even relatively large changes in finishing time (percentage back) do not lead to substantial changes in percentile rank. Such a population is the result of a sparse competitive field (in some cases due to a sport or event that is new or in a high-growth mode) and/or a current level of performance that is not challenging elite-level human limitations, meaning that most of the current competitors have not fully developed their potential for performance (either physiological or technical abilities or both).
Now let’s take a look at this exponential functionality as the pre-exponential term varies. Plotted below is an exponential function with an exponential term similar to that exhibited by the road marathon data (an exponent of 1.2x) but with pre-exponential terms of increasing magnitude (1, 5, 10, 20). Note that as the pre-exponential term is increased the rapidly increasing portion of the exponential function begins at lower values (lower percentage back, faster finishing time). Since the x values are generated with a basis of the fastest time ever (at 0% back), the lower the pre-exponential the greater the degree of excellence (the more ethereal the performance) represented by the fastest time ever.
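To put a number on “exponentially different”, assume the fitted form R(x) = A·exp(bx) with x in percentage points back; A is just a label used here for the pre-exponential factor. The ratio of cumulative probabilities for two finishers separated by Δx then depends only on b and Δx. The numerical values use the two London Marathon exponential factors quoted earlier.

```latex
\[
\frac{R(x + \Delta x)}{R(x)}
  = \frac{A\,e^{b(x + \Delta x)}}{A\,e^{bx}}
  = e^{b\,\Delta x}
\]
\[
b = 0.1481,\ \Delta x = 1:\quad e^{0.1481} \approx 1.16
\qquad\qquad
b = 0.1349,\ \Delta x = 1:\quad e^{0.1349} \approx 1.14
\]
```

Under this assumption, each additional percentage point back in the 2005 cohort corresponds to roughly a 16% larger cumulative share of the field, and the multiplier grows rapidly as b increases.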
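A short sketch of the effect described above: for y = a·exp(bx), solving for the x at which the curve reaches a fixed reference level shows that onset moving to lower x as the pre-exponential term a grows. The reference level of 100 and the function name are arbitrary choices for illustration. The exponent 1.2 and the pre-exponential values 1, 5, 10, 20 are those mentioned in the text.

```python
import math


def onset_percent_back(pre_exponential, exponent, level=100.0):
    """Solve pre_exponential * exp(exponent * x) = level for x."""
    return math.log(level / pre_exponential) / exponent


# Larger pre-exponential terms reach the same level at a smaller x.
for a in (1, 5, 10, 20):
    print(a, round(onset_percent_back(a, 1.2), 2))
```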
Taking these two arguments together we now can construct a conceptual equation defining performance excellence: competitiveness and the degree (magnitude) of comparative excellence associated with the fastest time in the cohort. In a general, conceptual, equation form we have:
R ~ (1/E) • C (equation 1)
R = cumulative probability (percentile rank)
E = magnitude of comparative excellence of fastest (or fastest ever) time
C = exp(bx), where b=competitiveness index (CI) and x=finishing time or percentage back from the fastest time
Conceptually we have a functionality for competitiveness and excellence that states that, for a measured cohort, the higher the magnitude of the exponential factor, the greater the competitiveness and the higher the magnitude of the pre-exponential factor, the smaller is the difference between the best time and the “rest of the best”. What remains is calibration of the parameters as they map onto running event data. This will be addressed in following posts but an estimate of the upper limit to the competitiveness index is provided below.
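Reading equation 1 as R ≈ (1/E)·exp(bx) (an interpretation: the grouping is inferred from the description of the pre-exponential term, not stated outright), taking logarithms shows how both parameters would fall out of a straight-line fit:

```latex
\[
R \approx \frac{1}{E}\,e^{bx}
\quad\Longrightarrow\quad
\ln R \approx -\ln E + b\,x
\]
```

On this reading, the slope of ln R plotted against x is the competitiveness index b, and the intercept c gives the excellence term as E = exp(-c).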
Establishing an Upper Limit to the Competitiveness Index
To calibrate the approach outlined here it is important to establish an upper limiting value for the competitiveness index (CI). As shown above, this index is defined as the exponential factor in the function fitted to the finishing time data. It is inarguable that the road marathon is one of the most competitive of running events. Application of the analysis protocol developed here to the dataset consisting of the fastest 499 marathon finishing times ever gives a good estimate of the expected upper limit to how competitive the event can be. This is because the cohort represented in the data is drawn from the all time best finishing times and represents a cohort of superstars all competing together in one “fictional” race, a “dream” race of sorts. Since these data are the best efforts of all who have ever run the marathon event, they represent the ultimate level of competition as we know it today. Plotted below are the data shown previously for the 2005 and 2002 London Marathon along with the data for the fastest 499 marathon times ever. We see that the data for the fastest times fit very well to a simple exponential function (as expected) and that the competitiveness index is nearly an order of magnitude larger than that for the individual, single race data for the London Marathon (CI = 1.2585 for the all time data and 0.1481 for the 2005 London Marathon). Based on this analysis it is expected that no single event or aggregated event data will be more competitive than the cohort represented by the all time data and therefore the CI of the all time data represents an upper limit to the value of the CI. Establishing this value will allow for meaningful comparisons in the analysis of numerous other events and event types in follow-on posts.
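For reference, the ratios behind “nearly an order of magnitude”, using only the exponential factors quoted in this post for the all time data and the two London Marathon races:

```latex
\[
\frac{1.2585}{0.1481} \approx 8.5,
\qquad
\frac{1.2585}{0.1349} \approx 9.3
\]
```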
Take-aways from the Analysis
- The first important take-away here is that running event data fit very nicely to an exponential distribution of finishing time (or percentage back from the fastest time). This exponential behavior is fundamental to the nature of excellence in the sport of running.
- A second take-away is that via a simple analysis of the distribution of the finishing time data for a running event we can extract functional parameters that define the competitiveness of the event as well as establish a reasonable approximation of the degree of excellence of the fastest time. Should other event data of this type fit an exponential function then the exponential term can be used as a fundamental metric for defining the competitiveness of a given event and allow for comparisons between events. A simple process of calculating the cumulative probability, plotting this against the percentage back data, and then fitting the resulting curve will provide robust metrics for defining the competitiveness (steepness of the “excellence curve”) of the event data and therefore yield an analytical basis for comparison.
- A third take-away is that the factors that have led to such exponential differences in “net performance” are similarly exponential, and arguments (such as that espoused by the “10,000 hour rule” cult) that more practice (training) alone can close performance “gaps” are not well founded. One must introduce some sort of positive non-linearity to the process of improvement since training time cannot be non-linearly increased by any meaningful magnitude for any meaningful time period. To put this in marathon running terms, a marathon competitor who has progressed to, say, a 2:15 performance standard over some considerable period is going to face an exponentially more difficult prospect in closing the gap to a 2:10 performance standard.
- A final important take-away is that the analysis provides perspective on exactly how exceptional performances in the tail of the finishing time (percentage back) distribution are. This is not a linear space, as many seem to assume.
Subsequent posts in this series will analyze finishing time data from numerous distance road running events of varying lengths (10 km to marathon) and from trail ultramarathons. There are some interesting findings.
A Note on Scientific Process
Having written on numerous occasions on this subject and with continued development and refinement of robust analysis approaches to evaluate the “competitiveness” of running races in general and ultramarathons specifically, it is important to point out some parts of the scientific process that are critical to advancement. The first is to establish a null hypothesis and to test against it. In all of this work the working null hypothesis has been that ultrarunning races are just as competitive as other endurance running events. As is the imperative of science, I went about proving this hypothesis wrong; this approach of going about proving a hypothesis wrong is something that is typically not well understood by those who do not engage in scientific inquiry. The following video, which was inspired by a favorite book of mine – “The Black Swan” by Nassim Taleb – does a good job of demonstrating how difficult and seemingly elusive inquiry can be even in the most simple of examples.
The key to progress is to obtain “negative” information, i.e. information that does not fit the null hypothesis and therefore provides positive insight as to what is the underlying law, rule, or function that is the subject of the hypothesis. This is what I have been engaging in with this project.
Second, science and scientific inquiry are not about agreement and they are certainly not about stasis. Our understanding evolves and can be upturned due to new findings and refined due to new insight. All too many consider articles about advancements or discoveries in the popular press to be “definitive”. These same readers lament (sometimes publicly) when they then are exposed to another study that may undermine or refute the prior study conclusions. This is scientific inquiry: a constant, jittery series of disagreements, additional study, and resolution that, when successful, describes general progress toward understanding. There are very few fundamental discoveries that lead to sizable jumps in understanding of any complex inquiry. When the popular press presents any study as such, be wary. A contemporary example that the ultrarunning community has been exposed to is the back and forth amongst cardio researchers as to whether running long distances damages the heart. One study concludes that it does, the next that it does not. This is science: no giant leaps forward, just a bunch of back and forth, all the while developing an accumulation of data and interpretation that defines progress but may not fully answer the question at hand. If one is uncomfortable with uncertainty, conflicting data, or alternate interpretations of data, then science is not for you.