December 6

R Studio – Marathon Stats

The idea came to me during the “R Programming” course on Coursera. Apart from my interest in programming, I am also an avid marathon runner, so I decided to analyze marathon results and generate some interesting graphs. I downloaded the Ljubljana Volkswagen Half Marathon 2017 results in PDF format and converted them to the following CSV file:


The file contains the following columns:

  • FullName
  • BirthYear
  • FinishTime
  • Gender

This is an example of the data:







My goal was to generate the following graphs:

  • finish time distribution
  • average finish time by age category
  • number of results better than 1:25h per age category

I created two helper functions (hmsToSeconds and secToHm) and moved them to a separate R file, which is included in the main script. These functions convert a time in h:mm:ss format to a number of seconds and vice versa. This is the content of the time_helper.r script:
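A minimal sketch of what two such helpers could look like. Only the function names come from the text above; the bodies, and the assumption that secToHm returns an h:mm label with the seconds dropped, are mine.

```r
# Convert "h:mm:ss" strings to a number of seconds (vectorised).
hmsToSeconds <- function(hms) {
  parts <- strsplit(as.character(hms), ":", fixed = TRUE)
  vapply(parts, function(p) {
    p <- as.numeric(p)
    p[1] * 3600 + p[2] * 60 + p[3]
  }, numeric(1))
}

# Convert a number of seconds to an "h:mm" label.
# Assumption: leftover seconds are truncated, not rounded.
secToHm <- function(sec) {
  sec <- as.numeric(sec)
  hours <- as.integer(sec %/% 3600)
  minutes <- as.integer((sec %% 3600) %/% 60)
  sprintf("%d:%02d", hours, minutes)
}
```

For example, hmsToSeconds("1:25:00") gives 5100, and secToHm(5100) gives "1:25".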

This example requires installing the following libraries:

The complete script, which plots all three graphs:

If you are familiar with running (half) marathons, these graphs might surprise you. Let's analyze them one by one.

Finish time distribution

The X axis represents finish time divided into three-minute chunks. The Y axis represents the number of participants who fall into each chunk. The graph resembles a normal distribution, which is expected.
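The three-minute binning described above can be sketched in base R as follows. The finish times here are randomly generated placeholders, not the race results, and the variable names are my own.

```r
# Synthetic finish times in seconds (made-up values, not the race data).
set.seed(1)
finishSec <- round(rnorm(500, mean = 6900, sd = 900))

binWidth <- 180  # three minutes, in seconds

# Bin edges aligned to multiples of three minutes, covering the whole range.
breaks <- seq(floor(min(finishSec) / binWidth) * binWidth,
              ceiling(max(finishSec) / binWidth) * binWidth,
              by = binWidth)

# Count participants per bin, then draw the histogram.
h <- hist(finishSec, breaks = breaks, plot = FALSE)
plot(h, main = "Finish time distribution",
     xlab = "Finish time (seconds)", ylab = "Participants")
```

Aligning the bin edges to multiples of 180 seconds keeps every bar exactly three minutes wide, regardless of the fastest and slowest times in the data.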

Average finish time by age category

Unlike the previous graph, this graph contains three dimensions. The X axis is the age range. The Y axis is the number of participants within a particular age range. The size of the black circle marker represents the average finish time: the bigger the circle, the better the average finish time. The 40-44 age group has the largest number of participants. Average finish times are almost identical across age categories from 15-19 to 55-59.
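One way to build such a summary in base R is sketched below. It assumes that age is taken as the race year minus BirthYear and that categories are five-year bands; the sample rows are made up for illustration, and the FinishSec column (finish time already converted to seconds) is hypothetical.

```r
# Made-up sample rows (illustrative only, not actual race results).
results <- data.frame(
  BirthYear = c(1999, 1985, 1983, 1975, 1974, 1973, 1960),
  FinishSec = c(6000, 6600, 7200, 6900, 7500, 6300, 8100)
)

raceYear <- 2017
results$Age <- raceYear - results$BirthYear

# Five-year bands [15,20), [20,25), ... labelled "15-19", "20-24", ...
lower <- seq(15, 85, by = 5)
results$AgeGroup <- cut(results$Age, breaks = c(lower, 90), right = FALSE,
                        labels = paste(lower, lower + 4, sep = "-"))

# Participants and mean finish time per band (both in factor-level order).
counts <- table(results$AgeGroup)
means <- tapply(results$FinishSec, results$AgeGroup, mean)
summaryByAge <- data.frame(AgeGroup = names(means),
                           Participants = as.integer(counts),
                           AvgSec = as.numeric(means))
summaryByAge <- summaryByAge[summaryByAge$Participants > 0, ]

# Marker size grows as the average time gets faster (fastest band = 3).
markerSize <- 3 * min(summaryByAge$AvgSec) / summaryByAge$AvgSec
x <- seq_len(nrow(summaryByAge))
plot(x, summaryByAge$Participants, pch = 19, cex = markerSize, xaxt = "n",
     xlab = "Age category", ylab = "Participants")
axis(1, at = x, labels = summaryByAge$AgeGroup)
```

Scaling the marker by the ratio to the fastest average keeps the "bigger is better" reading while avoiding a division by zero when all bands have the same average.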

Number of results better than 1:25h per age category

The X axis represents the age category, while the Y axis represents the number of participants who achieved a result better than 1:25 h.
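The count behind this graph amounts to a filter followed by a tally, roughly as sketched here. The rows are made up for illustration, the AgeGroup and FinishSec columns are hypothetical names, and "better than 1:25 h" is read as strictly under 5100 seconds.

```r
# Made-up sample rows (illustrative only, not actual race results).
results <- data.frame(
  AgeGroup = c("20-24", "30-34", "30-34", "35-39", "35-39", "40-44"),
  FinishSec = c(5000, 5050, 5400, 4980, 5099, 5100)
)

# 1:25:00 expressed in seconds.
thresholdSec <- 1 * 3600 + 25 * 60

# Keep only results strictly faster than the threshold, then count per group.
fast <- results[results$FinishSec < thresholdSec, ]
fastCounts <- table(fast$AgeGroup)

barplot(fastCounts, xlab = "Age category",
        ylab = "Results better than 1:25 h")
```

With these sample rows, the 40-44 runner who finished in exactly 1:25:00 is excluded, and the 35-39 group has two qualifying results.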