True delight is in the finding out rather than in the knowing.
In Part I of this series, a methodology was developed to analytically assess the competitiveness of running races (see Parts III and IV for extensions to 10 km road races and ultramarathons, respectively). The approach normalizes the finishing time data and the finishing rank order data to produce a transformed dataset for any running event, which can then be analyzed and compared to that of any other timed running event. The finishing time data are normalized using the percentage back from the winning (or best ever) time, and the finishing rank order data are normalized using cumulative probability (percentile rank). Once the dataset is transformed, the functional form of cumulative probability versus percentage back from the winning time is determined and its parametrics are tabulated. For running events, virtually all races that the author has analyzed exhibit a simple exponential functionality of the form:
Analysis of the “Big 5” Road Marathons
The “Big 5” marathons (London, Chicago, Berlin, New York, and Boston) serve as a group of events that unarguably represent prototypical competitive running races. Analyzing these events over a significant period of time allows the competitiveness index (CI) to be calibrated for competitive events, providing a standard against which other events and event types can be compared.
I show here the analysis of the London Marathon for the period 2001-2013 as an example and then provide a table of results for all of the “Big 5” marathons for the period 2001-2014. Presented below, for each year of the London Marathon, are plots of cumulative probability versus percentage back from the winning time, the fitted exponential curve, the exponential equation for the fit, and the R^2 value (coefficient of determination) for the fit. Also shown is the aggregated analysis, in which all of these years are pooled and re-analyzed using the same 125% cohort.
It is difficult to see the parametrics in these figures, so the London data, along with the population size for each analysis, are presented below in tabular form:
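The per-year analysis just described can be sketched in Python. This is a minimal sketch, not the author's code, and it rests on assumptions not stated in this part of the series: the “125% cohort” is taken to be every finisher within 25% of the winning time, cumulative probability is taken as rank divided by cohort size, the fitted form is assumed to be P = A·exp(k·x) with x the percentage back from the winning time, and k is assumed to play the role of the CI. The fit is a log-linear least-squares fit, one common way to fit an exponential. The names `normalize` and `fit_exponential` are hypothetical.

```python
import math


def normalize(times_s, cohort_pct=125.0):
    """Transform finishing times into (pct_back, cum_prob) pairs.

    Assumptions (not confirmed by the article): the cohort is every finisher
    within cohort_pct percent of the winning time, and cumulative probability
    is rank / N within that cohort.
    """
    ordered = sorted(times_s)
    t_win = ordered[0]
    limit = cohort_pct - 100.0  # e.g. 25% back for the 125% cohort
    pct_back = [100.0 * (t - t_win) / t_win for t in ordered]
    cohort = [x for x in pct_back if x <= limit]
    n = len(cohort)
    return [(x, (i + 1) / n) for i, x in enumerate(cohort)]


def fit_exponential(points):
    """Fit P = A * exp(k * x) by least squares on ln(P); return (A, k, r2).

    r2 is the coefficient of determination of the log-linear fit, which can
    differ from an R^2 computed on the untransformed data.
    Needs at least two distinct x values.
    """
    xs = [x for x, _ in points]
    ys = [math.log(p) for _, p in points]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    k = sxy / sxx
    b = my - k * mx
    ss_res = sum((y - (b + k * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return math.exp(b), k, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical finishing times in seconds; not real race results.
    # The last time is more than 25% back and is dropped from the cohort.
    demo = [7620, 7655, 7702, 7740, 7810, 7885, 7950,
            8030, 8120, 8300, 8650, 9100, 9400, 9900]
    A, k, r2 = fit_exponential(normalize(demo))
    print(f"A = {A:.4f}, k = {k:.4f}, R^2 = {r2:.3f}")
```

Pooling several years, as in the aggregated analysis, would amount to calling `normalize` on each year's times and concatenating the point lists before fitting, again under the same assumptions.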
Note that all of the R^2 values are about 0.92 or greater, indicating very good exponential fits to the data. The CI varies from a low of 0.119 for the 2012 event to a high of 0.153 for the 2010 event. This represents a difference of about 28% relative to the 2012 value, meaning that the 2010 event was, by this measure, about 28% more competitive than the 2012 event (equivalently, the 2012 event was about 22% less competitive than the 2010 event). The rest of the years show CI values around 0.130-0.140.
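A percentage comparison between two CI values depends on which value is used as the base. The short calculation below, using the two London values quoted above, shows both readings; `relative_difference` is a hypothetical helper.

```python
def relative_difference(a, b):
    """Percentage by which a differs from b, using b as the base."""
    return 100.0 * (a - b) / b


ci_2010, ci_2012 = 0.153, 0.119  # London CI values quoted in the text

higher = relative_difference(ci_2010, ci_2012)  # 2010 measured against 2012
lower = relative_difference(ci_2012, ci_2010)   # 2012 measured against 2010
print(f"2010 vs 2012: {higher:+.1f}%")  # prints 2010 vs 2012: +28.6%
print(f"2012 vs 2010: {lower:+.1f}%")   # prints 2012 vs 2010: -22.2%
```

The same helper applied to the approximate “Big 5” bounds, `relative_difference(0.170, 0.120)`, gives about 41.7%, consistent with the “about 40%” range noted in the observations that follow.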
Presented below is the tabular data for all “Big 5” marathons over the period 2001-2014 (or 2001-2013 for those marathon events that have yet to occur in 2014).
There is much to be gleaned from these data and I note here some of the important observations:
- The CI for these highly competitive events is generally bounded by about 0.120 and about 0.170, a range of about 40%.
- All of the fits to the data are very good: R^2 values all exceed 0.918.
- The population sizes are large enough that we can expect very small errors.
- Of the group, the New York Marathon is, on average, the least competitive and the Boston Marathon is the most competitive.
- Interestingly, the last two Boston Marathons have been the most competitive events of the group by a good margin.
The first observation (taken together with the second and third) gives us a calibration of the expected magnitude of the competitiveness index for highly competitive events. We can now make meaningful comparisons with other marathon events and other running races.
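One way to use such a calibration is to take the spread of the “Big 5” CI values as a reference band and place any other event's CI relative to it. The sketch below assumes the band is simply the minimum and maximum of the reference values; the helper names and the sample values are hypothetical, not data from the tables.

```python
def calibration_band(reference_cis):
    """Return (low, high) bounds from a list of reference CI values."""
    return min(reference_cis), max(reference_cis)


def place_in_band(ci, band):
    """Describe where a CI falls relative to the reference band."""
    low, high = band
    if ci < low:
        return "below the reference band"
    if ci > high:
        return "above the reference band"
    return "within the reference band"


# Hypothetical reference values spanning roughly the bounds quoted in the text.
reference = [0.121, 0.134, 0.140, 0.152, 0.168]
band = calibration_band(reference)
for ci in (0.095, 0.131, 0.175):
    print(ci, place_in_band(ci, band))
```

A min/max band is the simplest choice; a percentile range of the reference values would be less sensitive to a single unusual year.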
The analyzed population size varies across this dataset, from a low of 63 to a high of 368, a variation of almost a factor of 6. It is important to test the robustness of the analysis approach by determining whether the computed CI depends on the analyzed population size. Presented below is a graph of the CI versus the population size. As is clear from the graph, there is no apparent correlation, which gives additional support to the efficacy of the analysis approach across events with very different populations in the “125% cohort”.
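The check for a relationship between CI and population size can also be quantified with a correlation coefficient rather than judged by eye. The sketch below computes Pearson's r from scratch; the sample pairs are hypothetical placeholders, not the values plotted in the graph. An r close to zero would be consistent with the absence of a linear relationship.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical (population size, CI) pairs for illustration only.
population = [63, 120, 185, 240, 300, 368]
ci = [0.141, 0.128, 0.150, 0.133, 0.146, 0.131]
r = pearson_r(population, ci)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
```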
“Other” Road Marathons
It seems that there are almost as many marathon events in the US as there are cities, towns, and villages, meaning that a very large number of marathons are held each year. I will make no attempt to survey a representative selection of such events, as the task is enormous. I will, however, present an analysis of results from a few marathons to begin to establish a “feel” for what one might find in a comprehensive study.
More or less at random, I selected the following “other” marathons for analysis and comparison to the “Big 5”:
- Kansas City (MO) Marathon 2012
- Fox Cities (WI) Marathon 2014
- Columbus (OH) Marathon 2013
- Rochester (NY) Marathon 2014
- Wenatchee (WA) Marathon 2013
- Vermont City Marathon 2013
This selection of events covers a range of sizes and speeds. Although none of these marathons has an elite-level winning time (< 2:10:00), two have winning times at the sub-elite level (2:10:00 – 2:20:00). Presented below, for each event, are plots of cumulative probability versus percentage back from the winning time, showing the fitted exponential functions and the associated parametrics.
And here are the data in tabular form:
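The winning-time levels used here can be expressed as a small helper. This is a sketch: the handling of the boundaries (exactly 2:10:00 counted as sub-elite, and 2:20:00 treated as the slowest sub-elite time) is an assumption, and the function names are hypothetical.

```python
def to_seconds(hms):
    """Convert an 'h:mm:ss' string to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return 3600 * h + 60 * m + s


ELITE_CUTOFF = to_seconds("2:10:00")      # elite: strictly faster than this
SUB_ELITE_CUTOFF = to_seconds("2:20:00")  # assumed inclusive upper bound


def winning_time_level(hms):
    """Classify a winning time using the thresholds quoted in the text."""
    t = to_seconds(hms)
    if t < ELITE_CUTOFF:
        return "elite"
    if t <= SUB_ELITE_CUTOFF:
        return "sub-elite"
    return "below sub-elite"


for wt in ("2:08:30", "2:14:45", "2:24:10"):
    print(wt, winning_time_level(wt))
```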
There are a few interesting observations that merit remarks:
- Three of these events (Kansas City, Rochester, and Wenatchee) exhibit very low competitiveness compared to the “Big 5” events.
- As noted earlier, an event can be competitive and still have a relatively slow winning time (Fox Cities). This is an important point: although competitiveness and the “fastness” of a given race are not entirely independent, competitiveness can be high even in “slow” races, because the calculation is based on each runner's time relative to the winner within that race's own cohort.
- The two “fastest” races of the group (Columbus and Vermont) show competitiveness on par with the “Big 5” events.
- All of the fits to the data are very good: R^2 values all exceed 0.92.
- The analysis appears to be robust down to very small populations, although the calculated error will be substantially higher for the small populations.
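The Fox Cities observation above follows from how the data are normalized: each time is expressed relative to that race's own winning time. The sketch below, with hypothetical times, shows the idealized case in which every finisher is slowed by the same factor; the normalized data, and therefore any fit to them, are unchanged. Real fields do not differ by a pure scale factor, so this only illustrates the mechanism. The name `pct_back` is hypothetical.

```python
def pct_back(times):
    """Percentage back from the winning time, in finishing order."""
    ordered = sorted(times)
    t_win = ordered[0]
    return [100.0 * (t - t_win) / t_win for t in ordered]


# Hypothetical finishing times in seconds for a fast field...
fast = [7600.0, 7700.0, 7900.0, 8300.0]
# ...and the same field with every runner 10% slower.
slow = [1.10 * t for t in fast]

# both lines print [0.0, 1.316, 3.947, 9.211]
print([round(x, 3) for x in pct_back(fast)])
print([round(x, 3) for x in pct_back(slow)])
```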
I continue to be encouraged by the robustness of this analysis approach across this very disparate selection of marathon events ranging from the largest and most “elite encrusted” events right down to the “neighborhood”-type events.
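One way to see why small “neighborhood”-type fields carry larger errors is to bootstrap the fitted exponent: resample the cohort with replacement, refit, and look at the spread of the refitted values. Bootstrapping is a substitute technique chosen for illustration, not necessarily how the errors in this series are calculated. The synthetic data assume an exact exponential shape (cumulative probability A·exp(k·x) with k = 0.13 over 0 to 25% back), the rank-based fit mirrors the assumptions of a log-linear fit, and all names are hypothetical.

```python
import math
import random


def fit_k(pct_back):
    """Slope k of ln(P) against x, with P = rank / n over the sorted values.

    Returns None when all values are identical (slope undefined).
    """
    xs = sorted(pct_back)
    n = len(xs)
    ys = [math.log((i + 1) / n) for i in range(n)]
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return None
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def synthetic_pct_back(n, k=0.13, cutoff=25.0, seed=0):
    """Hypothetical cohort with cumulative distribution min(1, A*exp(k*x)), x >= 0.

    A = exp(-k * cutoff), so the distribution reaches 1 at the cutoff; the
    probability mass A sits at x = 0, playing the role of the winner.
    """
    rng = random.Random(seed)
    a = math.exp(-k * cutoff)
    out = []
    for _ in range(n):
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
        out.append(max(0.0, math.log(u / a) / k))
    return out


def bootstrap_se(pct_back, reps=500, seed=1):
    """Bootstrap standard error of the fitted exponent k."""
    rng = random.Random(seed)
    ks = []
    for _ in range(reps):
        k = fit_k(rng.choices(pct_back, k=len(pct_back)))
        if k is not None:
            ks.append(k)
    mean = sum(ks) / len(ks)
    return math.sqrt(sum((k - mean) ** 2 for k in ks) / (len(ks) - 1))


for n in (20, 80, 300):
    data = synthetic_pct_back(n)
    print(f"n={n:4d}  k={fit_k(data):.3f}  bootstrap SE={bootstrap_se(data):.4f}")
```

The printed standard errors should shrink as the cohort grows, roughly in proportion to one over the square root of the population size.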
Shown here is the application of an analytical competitiveness methodology across a wide range of marathon events. The results show consistent adherence to the expected exponential function resulting from normally distributed performance data. This work establishes a new basis for assessing the competitiveness of a given running event using a very simple and straightforward analysis protocol, and it should provide an analytical context for evaluations of “competitiveness” in such events.
In Part III of this series we will look at a shorter distance race (10 km) and Part IV will extend the analysis to ultramarathons. There are some very interesting results!