Thursday, August 11, 2016

Estimating Ironman Triathlon Time from 70.3 Triathlon Time

Recently, I ran an analysis to estimate 70.3 triathlon finishing time using Olympic distance triathlon finishing time.

In this post, I run a similar analysis to estimate Ironman finishing times using 70.3 race results.

I live in the Chicago area, so I've chosen to estimate finishing times for Ironman Wisconsin using three different 70.3 events that are all relatively close to Chicago.

I needed to choose events located in the same geographic area because I'm identifying athletes who competed in both a 70.3 and the Ironman to run the regressions. I needed the races to be near each other so I could get a meaningful number of data points.  

All race results used in the analysis are from 2015.

The Races to Compare


IRONMAN Race (Dependent Variable)

Ironman Wisconsin
September 13th, 2015

70.3 Races (Independent Variables)

Ironman 70.3 Muncie
July 11th, 2015

Ironman 70.3 Racine
July 19th, 2015

Ironman 70.3 Steelhead
August 9th, 2015

Scatter Plots and Regression Equations


The plots below were generated using a script which identifies athletes who competed in at least one of the 70.3 races as well as Ironman Wisconsin. The script then converts all finishing times to minutes and generates a scatterplot and regression line.
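For readers who want to try something similar, here is a minimal Python sketch of the "convert to minutes and fit a line" step. It is not the original script: the `to_minutes` and `fit_line` helpers, the `H:MM:SS` input format, and the five time pairs are all made up for illustration.

```python
def to_minutes(clock: str) -> float:
    """Convert an 'H:MM:SS' finishing time to minutes (assumed input format)."""
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return hours * 60 + minutes + seconds / 60


def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up (70.3 time, Ironman time) pairs for athletes found in both races.
pairs = [
    ("4:45:00", "10:30:00"),
    ("5:10:30", "11:20:00"),
    ("5:40:00", "12:45:15"),
    ("6:15:00", "13:50:00"),
    ("6:50:45", "15:10:00"),
]
half_minutes = [to_minutes(half) for half, _ in pairs]
full_minutes = [to_minutes(full) for _, full in pairs]

intercept, slope = fit_line(half_minutes, full_minutes)
print(f"Ironman minutes = {intercept:.1f} + {slope:.2f} * (70.3 minutes)")
```

With real data, the two lists would hold one entry per athlete who finished both races, in the same order.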

All times in the plots and equations are in minutes.

MUNCIE RESULTS:


106 data points, Residual Standard Error = 42.0

RACINE RESULTS:


450 data points, Residual Standard Error = 49.5


STEELHEAD RESULTS:

153 data points, Residual Standard Error = 48.7
Note: One extreme outlier was omitted from the analysis

As a final step, I plotted all three regression lines on a single plot.  You can see the regression lines are very similar.  In fact, for Muncie and Steelhead the regression lines are nearly indistinguishable.

Note that this plot may look wrong because the Racine and Steelhead lines are offset slightly, even though the equations above are identical for these two races. This is due to rounding: I used the raw regression output to plot the lines, but the values in the equations shown above are rounded.

Conclusion

I was surprised how similarly the three regressions came out. 

Of course, the standard errors are fairly large, so in practice there is a lot of variation around the estimates given by the equations, and the results would certainly shift somewhat for different races and different conditions. Nevertheless, I think these equations are an interesting reference point.

Interestingly, my estimates of Ironman time are significantly slower than some similar estimates I found on the web. Part of this might be because Ironman Wisconsin is a difficult course. I also think the methodology is a bit different: my approach looks only at athletes who competed in both races, whereas I think the other approach compares the overall distribution of times in each race.


Tuesday, August 9, 2016

Naperville Sprint Triathlon 2016 - Enhanced Results Report

This past Sunday was the Naperville Sprint Triathlon.  I didn't compete in this race, but it is a popular race with many participants from my community, so I downloaded the results data to experiment with some updates to my program which generates PDF race reports.

An example output (for a made-up participant I inserted into the actual results) is available here.

I'm interested in getting feedback and enhancing these reports, so if you participated in this race and would like a report with your own results, or if you have any suggestions/comments, please email me at camontgom[at]gmail.com.

Thursday, July 14, 2016

Estimating 70.3 Triathlon Time from Olympic Triathlon Time

Setting goals can be difficult when moving to a new race distance. I recently read an article which had some example calculations for setting half-Ironman goals based on Olympic distance results, and I decided to run some regressions to see what equations I could come up with based on actual race data.

I found two races, an Olympic triathlon and a 70.3, that were reasonably close to Chicago and about one month apart, and I ran a computer script to identify athletes who competed in both.  I then ran some regressions to estimate the expected 70.3 times from the Olympic times.
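The matching step could look something like the Python sketch below. This is only an illustration, not the original script: it assumes each result row is a dict with `name`, `gender`, and `time` fields (hypothetical field names), matches on a normalized name plus gender, and drops any key that appears more than once in either race so that two different people with the same name are not paired by mistake.

```python
def athlete_key(name: str, gender: str) -> tuple[str, str]:
    """Normalize a name so case and spacing differences still match."""
    return (" ".join(name.lower().split()), gender.upper())


def index_by_athlete(rows: list[dict]) -> dict:
    """Group result rows by their normalized athlete key."""
    index = {}
    for row in rows:
        index.setdefault(athlete_key(row["name"], row["gender"]), []).append(row)
    return index


def match_athletes(race_a: list[dict], race_b: list[dict]) -> list[tuple[dict, dict]]:
    """Pair rows that appear exactly once in each race under the same key."""
    index_a = index_by_athlete(race_a)
    index_b = index_by_athlete(race_b)
    matches = []
    for key, rows_a in index_a.items():
        rows_b = index_b.get(key, [])
        if len(rows_a) == 1 and len(rows_b) == 1:
            matches.append((rows_a[0], rows_b[0]))
    return matches


# Made-up result rows for illustration only.
olympic = [
    {"name": "Pat Example", "gender": "F", "time": "2:31:10"},
    {"name": "Sam  Sample", "gender": "m", "time": "2:48:05"},
    {"name": "Lee Onlyone", "gender": "M", "time": "3:02:00"},
]
half = [
    {"name": "pat example", "gender": "f", "time": "5:25:40"},
    {"name": "Sam Sample", "gender": "M", "time": "5:58:12"},
    {"name": "Kim Other", "gender": "F", "time": "6:10:00"},
]

both = match_athletes(olympic, half)
print(f"{len(both)} athletes found in both races")
```

Name matching like this is imperfect (nicknames, typos, name changes), so it is worth eyeballing the matched list before running any regressions on it.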

The Races to Compare


ET Lake Zurich Olympic Triathlon
July 12, 2015
Swim: 1500m
Bike: 24.9 miles
Run: 6.2 Miles
Finishers: 433 (Age Group Category)

Ironman 70.3 Steelhead
August 9th, 2015
Swim: 1.2 miles
Bike: 56 miles
Run: 13.1 miles
Finishers: 2043

In comparing the results of these races, I was able to identify 57 athletes (one was omitted from final analysis) who competed in the "Age Group" category in both races.

Regression Analysis


I ran some simple regressions using the results for the athletes who completed both races.  The Olympic splits were used as the independent variable, and the Half Ironman splits were used as the dependent variable.  Basically, each regression provides an equation to calculate the expected Half Ironman distance time based on the Olympic distance time.

All times in the equations and plots are in minutes!

Swim Regression



The "residual standard error" of this regression is about 2 minutes, so, roughly speaking, about 68% of swim results fall within +/- 2 minutes of the estimate provided by the equation and 95% of results fall within +/- 4 minutes (two times the standard error) of the estimate.
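To make the rule of thumb concrete, here is a small Python sketch. It assumes the usual definition of residual standard error for a one-variable regression (square root of the sum of squared residuals divided by n - 2) and that the residuals are roughly normally distributed, which is what the 68% / 95% figures rely on. The 40-minute estimate in the example is a hypothetical number, not an output of the regression above.

```python
import math


def residual_standard_error(x, y, intercept, slope):
    """Residual standard error for a fitted line y = intercept + slope * x."""
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    degrees_of_freedom = len(x) - 2  # two fitted parameters
    return math.sqrt(sum(r * r for r in residuals) / degrees_of_freedom)


def rough_ranges(estimate, rse):
    """Approximate 68% and 95% ranges around an estimate (normal residuals assumed)."""
    range_68 = (estimate - rse, estimate + rse)
    range_95 = (estimate - 2 * rse, estimate + 2 * rse)
    return range_68, range_95


# Hypothetical example: a 40-minute swim estimate with a 2-minute standard error.
range_68, range_95 = rough_ranges(40.0, 2.0)
print(f"About 68% of results between {range_68[0]:.0f} and {range_68[1]:.0f} minutes")
print(f"About 95% of results between {range_95[0]:.0f} and {range_95[1]:.0f} minutes")
```

These ranges ignore the extra uncertainty in the fitted line itself, so they are slightly optimistic, especially with a small sample.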

Bike Regression



The residual standard error for the bike time estimate is about 10 minutes.

Run Regression



The residual standard error for the run time estimate is about 12 minutes.

Overall Regression



The residual standard error for the overall time estimate is about 18 minutes.

Conclusion


These results ended up matching fairly well with the example calculations in the article noted earlier. Of course, in practice, there are many additional factors which could affect the results when applying these equations to other Olympic and Half Ironman events, so use with caution! I may try to find some other pairs of events to check how similar the estimates turn out to be.  I'd also like to run some similar regressions to compare Half Ironman and full Ironman times.

One other note: I did run some multiple regressions using the swim, bike, and run splits from the Olympic race to estimate the Half Ironman overall time, rather than just directly estimating one overall time from the other.  I expected this might give a better result since the individual components don't all scale by the same amount between the two events.  However, the result was only slightly better than the simple regression equation I posted above, so I opted to stick with the simpler model.
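A comparison like this can be sketched in plain Python as shown below. This is an illustration under stated assumptions, not the original analysis: the eight athletes and their times are made up, the Olympic overall time is taken as swim + bike + run (transitions ignored), and the fit solves the normal equations directly to stay dependency-free. In practice a statistics package would normally do this step.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def least_squares(rows, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk via the normal equations."""
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def residual_standard_error(rows, y, coefs):
    """sqrt(SSR / (n - number of fitted coefficients))."""
    ssr = 0.0
    for row, yi in zip(rows, y):
        predicted = sum(c * v for c, v in zip(coefs, [1.0] + list(row)))
        ssr += (yi - predicted) ** 2
    return math.sqrt(ssr / (len(y) - len(coefs)))


# Made-up Olympic (swim, bike, run) splits and Half Ironman overall times, in minutes.
olympic_splits = [
    (24.0, 68.0, 42.0), (27.5, 72.0, 47.0), (30.0, 70.5, 52.0), (26.0, 80.0, 45.5),
    (33.0, 85.0, 55.0), (29.0, 76.0, 60.0), (35.5, 90.0, 58.0), (31.0, 74.0, 49.0),
]
half_overall = [300.0, 322.0, 338.0, 335.0, 372.0, 361.0, 390.0, 331.0]

# Simple model: one predictor (overall Olympic time).
overall_rows = [(sum(splits),) for splits in olympic_splits]
simple = least_squares(overall_rows, half_overall)
# Multiple model: three predictors (swim, bike, run).
multiple = least_squares(olympic_splits, half_overall)

print("simple RSE:  ", round(residual_standard_error(overall_rows, half_overall, simple), 1))
print("multiple RSE:", round(residual_standard_error(olympic_splits, half_overall, multiple), 1))
```

Because the residual standard error divides by the remaining degrees of freedom, the three-predictor model is not guaranteed to score better on a small sample, which fits the "only slightly better" outcome described above.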



Tuesday, July 12, 2016

Lake Zurich Olympic Triathlon 2016 Race Report

This past weekend was the ET Lake Zurich Triathlon in Lake Zurich, Illinois. This was my first attempt at the Olympic distance, and I've created a report with some stats to evaluate my race. The PDF is available here. It includes all the tables and graphs below plus some additional maps, plots, and stats.

As with the Twin Lakes race, if anyone else who did the Lake Zurich race (Sprint or Olympic) wants a similar report with their own data, just email me at camontgom[at]gmail[dot]com, and I'll generate it and send it to you.

My Results



Performance by Race Segment




Swim


The 1500m swim course was a counter-clockwise triangle. I thought I paced the swim well considering it was my first time racing this distance.  I did veer off in the wrong direction a couple times, so I know my open water sighting and navigation still need some work.  I think I could improve the swim by a minute or more with some better open water skills.

Bike


The bike course was a two loop course which was mostly closed to other traffic. I thought it was relatively flat and fast.

As the bar charts above show, the bike is a big weakness for me. I think the bike leg of this race went ok given my current cycling fitness, but this is an area I need to work on for next year. If I can get my bike up to the level of my swim and run, it will significantly improve my overall finish.

Run


The 10k run was two loops around Lake Zurich, and the weather was hot and humid!  I was hoping to maintain a sub-7-minute mile pace, but couldn't quite do it. I ended up walking through the last couple of water stops so I could get more water down and keep myself hydrated. All things considered, I'm happy with my run performance.

Transitions


I got a bit disoriented coming into T2, and I wasn't initially able to find my stuff. I need to make better note of my exact location in the future, and maybe get a more colorful bag or mat.

Play by Play Graphs



These plots show how my standing evolved throughout the race.  I've also included the standing of the overall winner and the winner of my age group.  The first plot shows the time gap and the second plot shows the position gap.  
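The numbers behind plots like these can be computed from cumulative split times. The Python sketch below is one way to do it, not the code behind the report: it assumes the time gap is measured against whoever is leading at each checkpoint, and the three athletes and their splits are made up.

```python
from itertools import accumulate

SEGMENTS = ["swim", "t1", "bike", "t2", "run"]


def standings(splits: dict[str, list[float]], athlete: str) -> list[tuple[str, float, int]]:
    """Return (segment, minutes behind the leader, position) after each segment."""
    # Running total of elapsed time at the end of each segment, per athlete.
    cumulative = {name: list(accumulate(times)) for name, times in splits.items()}
    result = []
    for i, segment in enumerate(SEGMENTS):
        elapsed = sorted(times[i] for times in cumulative.values())
        mine = cumulative[athlete][i]
        gap = mine - elapsed[0]
        position = elapsed.index(mine) + 1  # ties share the better position
        result.append((segment, gap, position))
    return result


# Made-up split times in minutes: swim, T1, bike, T2, run.
splits = {
    "me": [27.0, 2.0, 75.0, 1.5, 43.0],
    "overall winner": [22.0, 1.0, 60.0, 1.0, 36.0],
    "age group winner": [28.0, 1.5, 64.0, 1.0, 40.0],
}

for segment, gap, position in standings(splits, "me"):
    print(f"after {segment}: {gap:.1f} min behind the leader, position {position}")
```

With a full results file, `splits` would hold every finisher, and the same function could be called for the overall winner and the age group winner to get their lines.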

Obviously, I lost a lot in terms of both time and position during the bike leg, but I was able to make up a bit of ground on the run.



Summary

The Lake Zurich Triathlon was a great race, and I think I learned a lot about what I need to focus on to come back faster next year.

Wednesday, June 29, 2016

Twin Lakes Triathlon 2016 - Time Distributions

This past Sunday was the Twin Lakes Triathlon in Palatine, Illinois. This was my first attempt at a triathlon, and I went a bit overboard in analyzing the race data. 

I wrote a small program which uses the race results to create detailed graphs and generate a personalized PDF report (see this example). For anyone else who participated in this year’s race, I’d be happy to generate a similar PDF format report with your results. Just email me at camontgom[at]gmail.com. 


I've posted a few basic graphs showing some time distributions below.  


Time Distributions by Discipline


The histograms in this section show the distribution of times for the swim, bike, and run segments. The average time, the median time, and the standard deviation for each segment are also noted on the plot.
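The three summary numbers on each plot are easy to reproduce with Python's standard `statistics` module. In this sketch the swim times are made up, and the sample standard deviation (`statistics.stdev`) is an assumption; the plots may have used the population version instead.

```python
import statistics


def summarize(times_minutes):
    """Mean, median, and sample standard deviation of a list of times."""
    return {
        "mean": statistics.mean(times_minutes),
        "median": statistics.median(times_minutes),
        "stdev": statistics.stdev(times_minutes),
    }


# Made-up swim times in minutes.
swim_times = [9.5, 10.2, 11.0, 11.4, 12.1, 12.8, 13.5, 15.9]
for name, value in summarize(swim_times).items():
    print(f"{name}: {value:.1f} min")
```

A mean noticeably above the median, as with race times that have a long tail of slower finishers, is a sign that the distribution is skewed to the right.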

Age Group Data

The graphs below break the discipline splits down by gender and age group.  The "box plot" for each age group shows the median time as the center line, and the colored center box shows the 25th to 75th percentile range.  The top and bottom bars of the "whisker" show the max and min times in the age group.  However, if there are data points that are well outside the "typical" range, then these are shown as individual dots rather than as the bar of the whisker. A more detailed explanation of the box plot can be found on Wikipedia.
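As a rough illustration of how the pieces of a box plot are derived, here is a Python sketch. It assumes the common convention that a point counts as "well outside the typical range" when it lies more than 1.5 times the interquartile range beyond the box; the plotting tool used for the graphs may apply a different rule, and the times are made up.

```python
import statistics


def box_plot_stats(times):
    """Quartiles, whisker ends, and outliers using the 1.5 * IQR convention."""
    data = sorted(times)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [t for t in data if low_fence <= t <= high_fence]
    outliers = [t for t in data if t < low_fence or t > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],    # smallest time that is not an outlier
        "whisker_high": inside[-1],  # largest time that is not an outlier
        "outliers": outliers,        # drawn as individual dots
    }


# Made-up swim times (minutes) for one age group, including one very slow swim.
age_group_times = [10, 11, 12, 13, 14, 15, 16, 17, 40]
stats = box_plot_stats(age_group_times)
print(stats)
```

Here the 40-minute swim would be drawn as a dot, and the top whisker would stop at 17 minutes rather than stretching up to 40.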