Competitiveness in Running Races – Part II – Road Marathons

True delight is in the finding out rather than in the knowing.

Isaac Asimov

In Part I of this series, a methodology was developed to analytically assess the competitiveness of running races. The approach normalizes both the finishing time data and the finishing rank order data to produce a transformed dataset for any running event that can be analyzed and compared to any other timed running event. The normalization uses the percentage back from the winning (or best ever) time for the finishing time data and the cumulative probability (percentile rank) for the finishing rank order data. Once the dataset is transformed, the functional form of the cumulative probability versus the percentage back from the winning time is determined and its fitted parameters are tabulated. In the case of running events, virtually all races that the author has analyzed exhibit a simple exponential functionality of the form:

y = a • exp(b • x)
x = percentage back from winning time for the cohort
y = cumulative probability of the result in the cohort
a = a pre-exponential factor inversely proportional to the excellence of the winning time relative to the cohort
b = the exponential factor directly proportional to the competitiveness of the cohort
Assessment of the value of b, which will hereinafter be called the competitiveness index (CI), allows for an analytical basis for comparison of the competitiveness of events and event types. Specifically, the magnitude of the CI determines how competitive a selected event is.
In addition, a calibration of the expected magnitude of the CI using known competitive events (e.g. the London Marathon) is developed, as well as an expected upper limit to the value of the CI from analysis of the cohort consisting of the fastest 499 marathon times ever recorded (note: the recent world record at the Berlin Marathon is not included in the analysis). These analyses revealed that, for a highly competitive major road marathon, the value of the CI is about 0.14 (although it can vary by as much as 40%, as will be shown below), with an upper limit of about 1.25.
Note: all event analyses in this series of posts are of the men’s field using a cohort consisting of the finishers within a finishing time of 125% of the winning time. Although a finishing time of 125% of the winning time is not considered “competitive”, the value of 125% is used to standardize the analysis and allow comparisons with ultramarathon events, where the finishing times are observed to be more spread out, partly due to the much longer event period (2-3 hours for a marathon versus 6-20 hours for ultramarathons of 50 to 100 miles).
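
The transformation and fit summarized above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the code used for the figures in this series: the function name `competitiveness_index` is hypothetical, cumulative probability is taken as rank divided by cohort size, and the exponential is fitted by ordinary least squares on ln(y), since the fitting tool is not stated here.

```python
import numpy as np

def competitiveness_index(times_s, cutoff=1.25):
    """Fit y = a * exp(b * x) to one race; return (a, b, r2, n).

    times_s : finishing times in seconds (any order)
    cutoff  : cohort limit as a multiple of the winning time (1.25 = 125%)
    """
    t = np.sort(np.asarray(times_s, dtype=float))
    t_win = t[0]
    cohort = t[t <= cutoff * t_win]          # the "125% cohort"
    n = cohort.size

    # x: percentage back from the winning time
    x = 100.0 * (cohort - t_win) / t_win
    # y: cumulative probability, ASSUMED here to be rank / cohort size
    y = np.arange(1, n + 1) / n

    # ASSUMED fitting method: straight-line least squares on ln(y),
    # so that ln(y) = ln(a) + b * x
    log_y = np.log(y)
    b, log_a = np.polyfit(x, log_y, 1)

    # Coefficient of determination, computed on the log scale
    resid = log_y - (log_a + b * x)
    r2 = 1.0 - np.sum(resid**2) / np.sum((log_y - log_y.mean())**2)
    return float(np.exp(log_a)), float(b), float(r2), int(n)
```

Called with an array of finishing times, the function returns a, b (the CI), the R^2 of the fit, and the cohort size. A different percentile convention or a nonlinear fit on the untransformed data would shift a and b somewhat.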

Analysis of the “Big 5” Road Marathons

The “Big 5” marathons (London, Chicago, Berlin, New York, and Boston) serve as a group of events that unarguably represent prototypical competitive running races. Analysis of these events over a significant period of time allows calibration of the CI for competitive events and therefore provides a standard against which other events and event types can be compared.

I show here the analysis for the London marathon for the period 2001-2013 as an example and then provide a tabular figure showing the results for all of the “Big 5” marathons from the period 2001-2014. Presented below are the cumulative probability versus the percent back from the winning time for each year of the London Marathon, the fitted exponential curve, the exponential equation for the fit, and the R^2 value (coefficient of determination) for the fit. Also shown is the aggregated data analysis for all of these years taken together and re-analyzed using the same 125% cohort.




Plots of the cumulative probability versus the percentage back from the winning time for the London Marathon for the events between 2001 and 2013 and a plot for the aggregated data for all of these competitions re-analyzed using the 125% cohort from the aggregated population. Simple exponential functions are fitted to each dataset and the associated parametrics and coefficients of determination are shown.

It is difficult to see the parametrics in these figures so the London data along with the population size for each analysis is presented below in tabular form:

[Table: London Marathon parametrics]

Note that all of the R^2 values are about 0.92 or greater, indicating very good fits of the exponential function to the data. The CI varies from a low of 0.119 for the 2012 event to a high of 0.153 for the 2010 event. This represents a difference of about 28% relative to the lower value, meaning that the 2010 event was, by this measure, about 28% more competitive than the 2012 event (equivalently, the 2012 event was about 22% less competitive than the 2010 event). The CI values for the remaining years are around 0.130-0.140.
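
The size of the percentage difference depends on which year is taken as the baseline. A quick check using only the two CI values quoted above (the variable names are illustrative):

```python
ci_2010 = 0.153  # highest London CI quoted in the text
ci_2012 = 0.119  # lowest London CI quoted in the text

diff = ci_2010 - ci_2012

# Relative to the lower value: how much MORE competitive 2010 was than 2012
more_competitive = diff / ci_2012

# Relative to the higher value: how much LESS competitive 2012 was than 2010
less_competitive = diff / ci_2010

print(f"{more_competitive:.1%} more, {less_competitive:.1%} less")
```

The 28% figure corresponds to the first ratio; measured against the 2010 value, the gap is closer to 22%.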

Presented below is the tabular data for all “Big 5” marathons over the period 2001-2014 (or 2001-2013 for those marathon events that have yet to occur in 2014).

[Table: parametrics for all “Big 5” marathons]

There is much to be gleaned from these data and I note here some of the important observations:

  1. The CI for these highly competitive events has general bounds of about 0.120 to about 0.170 or a range of about 40%.
  2. All of the fits to the data are very good: R^2 values are in excess of 0.918.
  3. The population sizes are sufficient to expect a very low error magnitude.
  4. Of the group, the New York Marathon is, on average, the least competitive and the Boston Marathon is the most competitive.
  5. Interestingly, the last two Boston Marathons have been the most competitive events of the group by a good margin.

Point 1 (taken together with points 2 and 3) allows us to now have a calibration for expected magnitude for the competitiveness index for highly competitive events. We can now make meaningful comparisons with other marathon events and other running races.

The analyzed population size varies in this dataset, ranging from a low of 63 to a high of 368, a variation of almost a factor of 6. It is important to test the robustness of this analysis approach by determining the extent to which the computed CI depends on the analyzed population size. Presented below is a graph of the CI versus the population size. As is clear from the graph, there is no apparent correlation between the two, which gives additional support to the efficacy of the analysis approach across events with very different populations in the “125% cohort”.
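
One way to put a number on this visual check is a Pearson correlation coefficient between cohort size and CI. The sketch below is an assumption about how such a check might be done, not the analysis behind the graph; `pearson_r` is a hypothetical helper, and the inputs are synthetic placeholders drawn from the ranges quoted in the text, not the actual tabulated values.

```python
import numpy as np

def pearson_r(sizes, cis):
    """Pearson correlation between cohort sizes and CI values."""
    sizes = np.asarray(sizes, dtype=float)
    cis = np.asarray(cis, dtype=float)
    return float(np.corrcoef(sizes, cis)[0, 1])

# SYNTHETIC placeholder data (not the values from the tables):
# cohort sizes between 63 and 368, CI values between 0.12 and 0.17
rng = np.random.default_rng(0)
sizes = rng.integers(63, 369, size=60)
cis = rng.uniform(0.12, 0.17, size=60)

print(f"r = {pearson_r(sizes, cis):.3f}")
```

With the real table values substituted for the placeholders, a coefficient near zero would be consistent with the CI being independent of the cohort size.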


“Other” Road Marathons

It seems that there exist almost as many Marathon events in the US as there are cities, towns, and villages- meaning that there are many thousands of Marathon events held each year. I will make no attempt to survey a representative selection of such events as the task is enormous. I will, however, present analysis of a few Marathon event results to begin to establish a “feel” for what one might find in a comprehensive study.

In a quite random way I selected the following “other” Marathons for analysis and comparison to the “Big 5”:

  • Kansas City (MO) Marathon 2012
  • Fox Cities (WI) Marathon 2014
  • Columbus (OH) Marathon 2013
  • Rochester (NY) Marathon 2014
  • Wenatchee (WA) Marathon 2013
  • Vermont City Marathon 2013

This selection of events includes a range of size and speed. Although none of these marathon events have elite level winning times (< 2:10:00) two have winning times in the sub-elite level (2:10:00 – 2:20:00). Presented below are the cumulative probability versus percentage back from winning time plots for each event showing the fitted exponential functions and the associated parametrics.



And here is the data in tabular form:

[Table: “other” marathon parametrics]

There are a few interesting observations that merit remarks:

  1. Three of these events (Kansas City, Rochester, and Wenatchee) exhibit very low competitiveness compared to the “Big 5” events.
  2. As noted earlier, an event can be competitive but still have a relatively slow winning time (Fox Cities). This is an important understanding because, although competitiveness and the “fastness” of a given race are not entirely independent, competitiveness can be high even in “slow” races since the computational basis is the cohort in the race.
  3. The two “fastest” races of the group (Columbus and Vermont) show competitiveness on par with the “Big 5” events.
  4. All of the fits to the data are very good: R^2 values are all in excess of 0.92.
  5. The analysis appears to be robust down to very small populations, although the calculated error will be substantially higher for the small populations.

I continue to be encouraged by the robustness of this analysis approach across this very disparate selection of marathon events ranging from the largest and most “elite encrusted” events right down to the “neighborhood”-type events.


Shown here is the application of an analytical competitiveness methodology across a large range of marathon events. The results show consistent adherence to the expected exponential function resulting from normally distributed performance data. This work establishes a new basis for assessment of the competitiveness of a given running event using a very simple and straightforward analysis protocol and should provide an analytical context to evaluations of “competitiveness” in such events.

In Part III of this series we will look at a shorter distance race (10 km) and Part IV will extend the analysis to ultramarathons. There are some very interesting results!



