Thursday, August 11, 2016

Estimating Ironman Triathlon Time from 70.3 Triathlon Time

Recently, I ran an analysis to estimate 70.3 triathlon finishing time using Olympic distance triathlon finishing time.

In this post, I run a similar analysis to estimate Ironman finishing times using 70.3 race results.

I live in the Chicago area, so I've chosen to estimate finishing times for Ironman Wisconsin using three different 70.3 events that are all relatively close to Chicago.

Because the regressions use only athletes who competed in both a 70.3 race and the Ironman, I needed events in the same geographic area so that the overlap would yield a meaningful number of data points.

All race results used in the analysis are from 2015.

The Races to Compare


IRONMAN Race (Dependent Variable)

Ironman Wisconsin
September 13th, 2015

70.3 Races (Independent Variables)

Ironman 70.3 Muncie
July 11th, 2015

Ironman 70.3 Racine
July 19th, 2015

Ironman 70.3 Steelhead
August 9th, 2015

Scatter Plots and Regression Equations


The plots below were generated by a script that identifies athletes who competed in Ironman Wisconsin and in at least one of the 70.3 races. The script then converts all finishing times to minutes and generates a scatter plot and regression line.

All times in the plots and equations are in minutes.
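
The script itself isn't reproduced here, but the steps described above can be sketched roughly as follows. This is a minimal Python illustration, not the code used for the plots: the function names, the name-based matching, the "H:MM:SS" time format, and the sample results are all assumptions made up for the example.

```python
from math import sqrt


def to_minutes(hms):
    """Convert an 'H:MM:SS' finishing time to minutes."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 60 + minutes + seconds / 60


def match_athletes(half_results, full_results):
    """Return (70.3 minutes, Ironman minutes) pairs for athletes in both races.

    Each argument maps an athlete key to an 'H:MM:SS' string. Matching on
    a single key is an assumption; real results may need name plus age
    group or similar to avoid false matches.
    """
    common = sorted(set(half_results) & set(full_results))
    return [(to_minutes(half_results[k]), to_minutes(full_results[k]))
            for k in common]


def fit_line(pairs):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (intercept, slope, residual_standard_error).
    """
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in pairs)
    # Residual standard error uses n - 2 degrees of freedom for a line.
    return intercept, slope, sqrt(rss / (n - 2))


# Made-up results, for illustration only (not real race data).
half = {"Athlete A": "5:00:00", "Athlete B": "6:00:00",
        "Athlete C": "5:30:00", "Athlete D": "4:45:00"}
full = {"Athlete A": "11:00:00", "Athlete B": "13:30:00",
        "Athlete C": "12:00:00", "Athlete E": "10:00:00"}

pairs = match_athletes(half, full)
intercept, slope, rse = fit_line(pairs)
print(f"{len(pairs)} data points")
print(f"Ironman minutes = {intercept:.1f} + {slope:.2f} * (70.3 minutes)")
print(f"Residual standard error = {rse:.1f}")
```

Athletes D and E each appear in only one race, so they drop out of the fit. That is the same reason the number of data points differs from race to race in the results below.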

MUNCIE RESULTS:


106 data points, Residual Standard Error = 42.0

RACINE RESULTS:


450 data points, Residual Standard Error = 49.5


STEELHEAD RESULTS:

153 data points, Residual Standard Error = 48.7
Note: One extreme outlier was omitted from the analysis.

As a final step, I plotted all three regression lines on a single plot.  You can see the regression lines are very similar.  In fact, for Muncie and Steelhead the regression lines are nearly indistinguishable.

Note that this plot may look wrong because the Racine and Steelhead lines are slightly offset even though the equations above are identical for these two races. This is due to rounding: I plotted the lines from the raw regression output, but the coefficients in the equations shown above are rounded.
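
To see how rounding can hide a visible offset, here is a small Python sketch. The coefficients are hypothetical values chosen for illustration, not the actual regression output for any of the races.

```python
# Hypothetical raw coefficients (intercept, slope) for two fitted lines.
# These are NOT the real regression results from this post.
raw_a = (41.4, 2.0149)
raw_b = (40.6, 2.0051)


def rounded(coefficients):
    """Round the intercept to a whole number and the slope to 2 decimals."""
    intercept, slope = coefficients
    return round(intercept), round(slope, 2)


def predict(coefficients, half_minutes):
    """Predicted Ironman minutes for a given 70.3 time in minutes."""
    intercept, slope = coefficients
    return intercept + slope * half_minutes


half_minutes = 360  # a 6-hour 70.3 finish
gap = predict(raw_a, half_minutes) - predict(raw_b, half_minutes)

print("Rounded equations match:", rounded(raw_a) == rounded(raw_b))
print(f"Gap between raw lines at {half_minutes} minutes: {gap:.1f} minutes")
```

Because the slope is multiplied by a 70.3 time of several hundred minutes, a difference in the third decimal place of the slope is enough to separate the plotted lines by a few minutes.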

Conclusion

I was surprised by how similar the three regressions turned out.

Of course, the standard errors are fairly large, so in practice there is a lot of variation around the estimates given by the equations, and the results would certainly shift somewhat for different races and different conditions. Nevertheless, I think these equations are an interesting reference point.
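
As a rough guide to how much variation that is, one common rule of thumb puts about 95% of athletes within two residual standard errors of the regression estimate. The Python sketch below applies it to the three values reported above. It assumes roughly normal residuals with constant spread and ignores the uncertainty in the fitted coefficients, so treat the bands as approximate; the function name is just for illustration.

```python
# Residual standard errors reported in this post, in minutes.
residual_standard_error = {"Muncie": 42.0, "Racine": 49.5, "Steelhead": 48.7}


def rough_band(rse_minutes, multiplier=2.0):
    """Half-width of an approximate 95% band around a predicted time.

    Assumes roughly normal residuals; the multiplier of 2 is a rule of thumb.
    """
    return multiplier * rse_minutes


for race, rse in residual_standard_error.items():
    half_width = rough_band(rse)
    print(f"{race}: estimate +/- {half_width:.1f} minutes "
          f"(about {half_width / 60:.1f} hours)")
```

In other words, even a well-fitted line leaves a window of well over an hour on either side of the predicted Ironman time.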

Interestingly, my estimates of Ironman time are significantly slower than some similar estimates I found on the web. Part of this might be because Ironman Wisconsin is a difficult course. Also, I think the methodology is a bit different. My approach looks only at athletes who competed in both races, whereas I think the other approach compares the overall distribution of finishing times in each race.

