January 6

MySQL – Get top N rows for each group

Following table contains results from athletic 10K race:

It contains following fields:

id – autoincrement

full_name – name of participant

category – can be “Junior”, “Senior” and “35+”

result – finish time in seconds

Goal is to extract top 3 participants for each category using SQL query. Microsoft SQL Server has OVER(PARTITION BY fieldname) clause which can help us achieve this goal. However, MySQL doesn’t have any built-in feature. In this case we will utilize variables in order to calculate position for each group.

Inner query does following:

– orders participants by category and result

– assigns position number by category for each participant

Outer query selects only first three participants from each group.

Final result set is:

This query looks little bit dirty but it can save you doing additional processing at application level.

December 6

R Studio – Marathon Stats

I got an idea during the “R Programming” course on Coursera. Apart from my interests in the programming, I am also avid marathon runner. So I got idea to analyze marathon results and generate various interesting graphs. I downloaded Ljubljana Volkswagen Half Marathon 2017 results in PDF format and converted it to following CSV file:


File contains following columns:

  • FullName
  • BirthYear
  • FinishTime
  • Gender

This is an example of data:







My goal was to generate following graphs:

  • finish time distribution
  • average finish time by age category
  • number of results better than 1:25h per age category

Two helper functions were created (hmsToSeconds and secToHm) and moved to separate R file which is included in main script. These functions convert date from h:mm:ss format to number of seconds and vice versa. This is content of time_helper.r script:

This example requires installing following libraries:

Complete script which plots all three graphs:

If you are familiar to running (half)marathons these graphs might surprise you. We are going to analyze one by one.

Finish time distribution

X axis represents finish time divided into three minutes chunks. Y axis represents number of participants which falls into those chunks. Graph resembles normal form which is expected.

Average finish time by age category

Unlike previous graph this graph contains three dimensions. X axis is age range. Y axis is number of participants within particular age range. Size of black circle marker represents average finish time. The bigger the circle, the better average finish time. 40-44 age group has largest group of participants. Average finish times are almost identical across age categories from 15-19 to 55-59.

Number of results better than 1:25h per age category

X axis represents age category while Y axis represents number of participants that achieved results better than 1:25 h.