23 Nov 2014

Statistics result 10-km run Corre por el niño 2014

#27 Statistics result 10-km run Corre por el niño 2014

On Sun. 9 Nov. I participated in the 4th ed. of the 10K run "Corre por el Niño 2014" (#CorrePorElNiño), together with 1827 other people. (In total there were 7000 runners, the rest participated in the other 4 km run. This run was organized by hospital Niño Jesus (in Madrid), to raise funds for investigation of severe diseases with childeren, like e.g. cancer. For more info  about this road-race, see:

http://correporelnino.wordpress.com

https://twitter.com/CorrePorElNino

https://www.facebook.com/pages/Carrera-Popular-Hospital-Ni%C3%B1o-Jes%C3%BAs/212488895455880

Every participant of the 10k run gets a chip, and in this race the bruto (gun) time and netto (chip) time are measured. The organization decided to use the bruto time as the oficial time (as prescribed by IAAF), although I think the netto time would be better because of the large number of participants, not everybody can start at the start/finish line at the same (gun-)time. For me for example it took 20 sec. after the gun to reach the start/finish line. And the final ranking raised also some doubts: the netto time of the fastest woman (33:47 min, and bruto time 37:03) was better than that of the fastest man (36:05 min. (netto and bruto time)), which probably is incorrect, as was noticed here:

http://www.forofosdelrunning.com/index.php?topic=6709.30  

And I think it´s a pity that in the ranking-table the columns 'gender' and 'category' (age-classes) are missing.
But of course the most important thing was the hospital raised about 70.000 euro (every participant pays 10 euro). For  some of my photos of the race (and that of 2013), see:

https://picasaweb.google.com/103278159654062440102/CarreraDelNino2013Madrid

(one of them has Pedro Delgado on it, the former Tour de France winner, who participated in the run to promoto it).

After the race I wondered how good (or bad) was my ranking (my bruto-time: 53:33 min., netto: 53:13, about half as slow as the world-record for the 10k road-race (26:44 min)...). The ranking was published here:

http://www.cronococa.com/ResultadosRunning.aspx

and I saw I ended at 828th place (my name is in blank because I registered last minute, on the day of the race). And on this site:

http://www.runedia.com/cursa/201418490/carrera-popular-corre-por-el-nino-10k/2014/

I saw that my ranking corresponds to 45%-percentile. And this site also had a nice histogram of the times of the participants, which gave me the idea to do the same in Excel, see fig.1 for the result.

    fig.1: Distribution of finish-times runners

How did I make this? First, I copied the data from the CronoCoca-website, see fig.2, blue columns. Then, I added the orange columns. First, the bruto time in seconds, and the (integer-)values of this column were input for the FREQUENCY-funcion (1st parameter, 'data-array') which calculates the freq. of the values in this data-array. Then I created another table with the 'bins', time-intervals of x minutes (I made 2 versions, with x=2 and x=5), for which I also calculated the equivalent in seconds, which is the 2nd  parameter, 'bins array') of the FREQUENCY-funcion.The results is the column 'Freq.' (see table in fig.1), for which I created a bar-chart (histogram) (see left graph in fig.1). The next column is the cumulative frequency and the last columns the relative cumulative frequenc, or 'percentile rank'. For a more detailed description of Excel and distribution-functions, see:

http://exceluser.com/formulas/frequency-distribution-five-ways.htm

fig.2: Table with all finish-times

For how you can calculate percentiles (quartiles) in Excel (see fig.2, 2nd orange column), using e.g. the  statistical functions PERCENTRANK.INC, PERCENTILE, QUARTILE, see:

http://best-excel-tutorial.com/55-advanced/219-calculate-percentile

http://www.excel-easy.com/examples/percentiles-quartiles.html

As I said, my percentil and ranking was 45% and 828 (see table of fig.1, the orange row). In the histograms with bin-widths of 5 minutes (fig.1, graph Cumul. Rel. Freq.) you can see that my time (53:33 min.) falls between the percentiles of bins 0:51-0:53 and 0:53-0:55 min., so between 41% and 51%. And if you look at the same graph in sheet-2, with the (bigger) bin-width of 5 min., it shows my time is between the 29% (bin 0:45-0:50) and 51% (bin 0:50-0:55). So the smaller the bin-width, the more exact is the percentile-estimation. For how to determine the optimal bin-width, see:

http://stats.stackexchange.com/questions/798/calculating-optimal-number-of-bins-in-a-histogram-for-n-where-n-ranges-from-30

The RunEdia-site also had some other statistics, like mean and standard deviation of the finish-times, things which can also be calculated in Excel with the statistical-functions (AVERAGE, STDEV), see sheet-2 and fig.3 and this site:

http://office.microsoft.com/en-us/excel-help/statistical-functions-HP005203066.aspx


 fig.3: Statistics finish-times

The distribution of the times of the runners looks like a normal distribution  (a bell-shaped curve) and to test this, I used the statistical function NORM.INV to calculate the 3 quartiles Q1, Q2, Q3 and compared them with the 'real' Q1,Q2, Q3, and as you can see, Q1 and Q2 of the normal distribution are quite close to that of the real distribution. Another test for 'normality' of the distribution which I did was the SKEW-function, with a result of 0,3, and according to this site:

http://help.gooddata.com/doc/public/wh/WHAll/Default.htm?#MAQLRefGuide/NormalityTesting-SkewnessAndKurtosis.htm

that indicates the distribution is approximately symmetric (like a normal distribution). The fact the skew is positive means that the distribution-function has a tail, which means in this case (10K run) that there are more runners that are slower than the mean time (54:50 min.) then that there are runners that are faster then the mean time.

For this race I used the Runtastic Android-app to measure my km-times, which you can see here:

http://www.runtastic.com/sport-sessions/347242237

I also included these results (km-times and speed) in my Excel, see fig.4

  fig.4: my km-times and speed

This graphic has besides my speed als the height(differences) during the race (to get the absolute height, you must add ca. 667m, the hight of Madrid (above sea-level). For a description how to combine 2 graphs in 1 graphic (in this case: area-chart for height) and line-chart for speed), see:

http://blogs.office.com/2012/06/21/combining-chart-types-adding-a-second-axis/

For this post I want to thank (Crono)Coca, the company responsable for the time-registration in this 10K run, for answering my questions about the time-ranking table, and José, my running/training-partner, for reviewing the statistics of my Excel.

And to conclude, some interesting websites I found while making this Excel:

- statistics for a 10K run (very profesional!):
http://cnr.lwlss.net/RealData/

-Q/A distribution times 5K run:
http://www.letsrun.com/forum/flat_read.php?thread=4612823

- graphics for distributions:
http://flowingdata.com/2012/05/15/how-to-visualize-and-compare-distributions/

-histograms and (normal) distribution functions in Excel:
http://peltiertech.com/Excel/Charts/Histograms.html
http://exceluser.com/formulas/statsnormal.htm

- race-timing and discussion bruto vs netto finish times and funny anecdote:
http://www.aimsworldrunning.org/race_timing.htm
http://www.iaaf.org/news/news/sometimes-rules-can-be-complicated-to-explain

Downloads:

#Mirror 1 (PDF file):

https://es.scribd.com/doc/247957028/Statistics-10km-Run-CorrePorElNino2014-in-Excel

 #Mirror 2 (Excel file):
NB: this site has MS Onedrive, which has 'Excel-Online', so you can view my Excel-files here if you don´t have MS Excel on your PC

https://onedrive.live.com/redir?resid=3F963E9F1A42D952%21251

#Mirror 3 (Excel and PDF file in 1 zip):

https://drive.google.com/file/d/0BywxxSJoaUYxRkx4RnQwLWNNbVU/view?usp=sharing 



No comments: