Note 20/12/2016: I updated this post with a Power BI solution (with R-Stats and Q&A), see part 2 of this post.
Part 1: Statistics with R
Today the 'business run' ("Carrera de las Empresas") was held here in Madrid. I participated with Raet, a Dutch ISV of a SAAS-solution for Human Resources, which since almost 2 years now has a software development centre in Madrid.
For an earlier post of this run (with more details), see:
http://worktimesheet2014.blogspot.com.es/2015/12/statistics-result-10-km-run-carrera-de.html
This time, there were over 14.000 runners, from 800 companies, for more details, see:
http://www.expansion.com/directivos/deporte-negocio/2016/12/19/58578b2ae2704ef1758b4588.html
The results of the 10K run (in which I participated) I got from here:
http://www.carreradelasempresas.com/sec_clasificaciones.asp
'
and for the statistics, see this page (a social/runners network): http://www.runedia.com/cursa/201628906/carrera-de-las-empresas-10k-actualidad-economica/2016/
The results (10K-Men, 5563 runners) are in PDF format, which I converted to Excel, with help of this nice (free) online tool:
http://www.pdftoexcel.com/
After some 'cleansing' (see added yellow-columns in Excel file) I had the data ready for analysis, for which I used R, the language for statistical computing.
NB: For an earlier post in which I used R, see:
http://worktimesheet2014.blogspot.com.es/2016/02/20minutos-blog-awards-2015-statistics.html
The R-script I made looked basically like this:
sv10km2 <- read.csv("10Km2.csv")
edit(csv10km2)
Tfinish=as.numeric(as.difftime(as.character(csv10km2$Tfinish),format="%H:%M:%S"))
summary(Tfinish)
plot(density(Tfinish,na.rm=TRUE),lwd=3,xlab="Finish Time (mins)", main="Madrid Business Run 2016 10k Results")
View(csv10km2)
boxplot(Tfinish, data=csv10km2)
install.packages("sqldf")
newdata <- sqldf('select * from csv10km2 where Team like %Raet%')
Input for the script has to be in the MyDocument-folder, in my case: D:\Persoonlijk\maartenree\Documents\10Km2.csv
See figures below for the result of my analysis.
And for those who don't have R, but want to do the same as I did for my company (in fig.3), so filter the results, I also added the Excel where the finish-results are in table-format (so easy to filter).
Fig.1: Plot with finish-times and Five-number summary
Fig.2: Boxplot with finish-times
Fig.3: Filter dataset with finish-times for (teams of) company Raet
Part 2: Power BI + R Stats/Charts
Fig.4: Excel with new column Tfinish2 (time in seconds)
And for the final result in Power BI (Desktop), see fig.5.
Note that slicer for team 'dimension' has a pre-filter (in this case with value 'Raet') to limit the number of items in the slicer (very helpful when there are over 1000 teams). And to filter for >1 team (like I did, with 3 Raet-teams 'checked'): press Control + click in checks for filter values. And note that the chart is reacting on the filter (so Power BI filter and R-chart are connected).
Fig.5 Power BI Desktop: report finish-times (in second) filtered for teams of Raet
https://app.powerbi.com/view?r=eyJrIjoiZDQ1YzFkZTAtODExNi00OWRmLWFlOTAtMzI0M2JjZWE4MzQ0IiwidCI6ImI3OWIyMzE3LTM0ZGQtNDNlNS05MWEyLWNkNjZkM2FlMWYwYiIsImMiOjh9
To be able to use R-Stats/Charts in Power BI online, you need to have a Power BI Pro account. About how to get a free trial-account, see:
https://powerbi.microsoft.com/en-us/documentation/powerbi-admin-purchasing-power-bi-pro/#power-bi-pro-60-day-trial
With Power BI Online it is also possible to do Q&A in natural language (English). See fig.6-7. For this example to work, I had to rename the Excel-columns Runner -> RunnerName and Team -> TeamName, see:
https://powerbi.microsoft.com/en-us/documentation/powerbi-service-q-and-a-tips/
Fig.6 Power BI and Q&A 1/2
Fig.7 Power BI and Q&A 2/2
http://cnr.lwlss.net/RealData/
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4317364/
Downloads
R:
v1: https://goo.gl/gdjPRk
PowerBI:
v2:
https://goo.gl/S43i2O
v2b: (better chart than v2)
https://goo.gl/gn4o8k