13 Nov 2017

Statistics result 10-km run Corre por el niño 2017

#60 Statistics result 10-km run Corre por el niño 2017


Note 13/11/2017: updated post and report (see 'Downloads, v2)

On Sun. 5 Nov. I participated in the 7th edition of the 10K run "Corre por el Niño 2017", with this year about 10.000 participants (10 km, 4.5 km and 2 km for kids (whole family)).  This run is organized by hospital Niño Jesús (in Madrid), to raise funds for investigation of severe diseases with childeren, like e.g. cancer. For more info about this road-race, see:

https://www.correporelnino.org/

https://twitter.com/CorrePorElNino

And besides the 10K I also did the 2K 'run' (which started later) with the family. The company Bebeaway sponsored us with a running buggie, for which I'd like to thank them. If you are interested in hiring a running buggie from them, see:  http://www.bebeaway.com/

And now about the 10K race. As in previous editions in which I participated, I made a data analysis of the finish times. In my last post about this race in 2015, see:


I used Excel for the data analysis, but this year I decided to use Microsoft  Power BI, a 'Self-service' Business Intelligence (BI) solution, which could be a good alternative for Excel (for BI-stuff). For the end-result, see fig.1. In this figure you can see that in this dashboard you can filter by (runner-)name and starting-number of the runner, which then shows in the top-table the netto finish time (*1) of the runner, his/her category (gender + age-group) and the ranking (general and per gender and per category), the same as the site of the run with the results, see:


(*1) Note: the official time for this race is the bruto-time, but I chose to use the netto time in my data analysis as I think this is what counts for most runners (who press 'start' on their stopwatch when they cross the start/finish line, which can be some time after the 'start' gun shot, with a field of 1820 runners).

Besides that, I also included the percentile rank in the dashboard, for which I used the system of World Masters Association (WMA), which takes for the 100% percentile the best finish time and so a percentile rank of 74% in this case means: the runner was 26 % removed from the best time (his performance was 74% of the best performance). For more details, see: http://livehealthy.chron.com/fast-average-runner-run-5k-10293.html
BTW: on this site, which also has created statistics of the race:
the percentile rank was defined just the other way around: the fastest runner is in percentile 0% (and the last runner in 100%).


Fig.1: Power BI dashboard for 10K race finish times

The visuals in the part below the table with the result of the selected runner are competition-statistics (so not for an individual runner as the top-part, but for all runners). In fig.1 I selected in the chart '#Runners/Category' the category 'VeC M' (veteran C Male), so the same category of the selected runner, and the report shows there were in total 214 runners in this category, where the time of the fastest runner was 00:38:23, so the the performance of the selected runner (with time 00:49:22)  was 67% of that of this fastest runner.

When I made the report for the 2015-race, the 'category' was not registered, but luckily now it is, which makes the data analysis more interesting and is also good for 'bench-marking' (how well did I do comparing to my 'peer-runners' (same gender and age-group)).

For this race I also use the runner-app 'Strava', for the result, see:

https://www.strava.com/activities/1261820177

NB: I pressed 'start' some time before the race really started, as I wanted to be sure the app was working OK.

That's if for now, maybe later I'll give some more details of how I made this dashboard with Power BI.


V2 (update 13/11/2017):
After having read:
https://powerbi.microsoft.com/en-us/documentation/powerbi-service-visualization-best-practices/

I have improved the design of the dashboard, e.g.: the filters are now left to the table they are applied upon.

I also uploaded my dashboard to the Power BI cloud service:




and in fig.2 you can see how a 'published report' in the cloud looks like.
In this case I first looked at the bottom-left table who was the winner of the run (general ranking), which was Martin Fiz, in 00:33:10 (!), and then I filtered in the upper table on his start-number (click in 'slicer' on '...' > Search and then fill '0123' and click on the loupe) and I learned that he is from category Veteran-D (he is 54 years), and finally I filtered on this category in the bottom-right chart (click on bar 'VeD-M', which then filters all 3 visuals at bottom part of report) and the bottom-left table shows then that rank 2 in this category finished more then 7 min. later. I did not know Fiz, but it appears he is a Spanish former professional marathon runner, and is still in a very good shape it seems. So that could be an inspiration for middle-aged men like me :)


fig.2: Power BI report in the cloud

Another nice feature of Power BI is 'Q&A', which allows you to query the dataset of the report in 'natural language', see fig.3.



fig.3: Q&A


Downloads:

#Mirror 1: Google Drive (zip files with Excel and PDF files):

https://goo.gl/iu1FBV

NB:
- to open the PowerBI (pbix) file, you need to have Power BI desktop, for the (free) download, see:


19 Dec 2016

Carrera de la Empresas 2016 Madrid Statistics made with R

#59: 10K Carrera de las Empresas 2016 Madrid: Statistics made with R + included in Power BI

Note 20/12/2016: I updated this post with a Power BI solution (with R-Stats and Q&A), see part 2 of this post.


Part 1: Statistics with R 

Today the 'business run' ("Carrera de las Empresas") was held here in Madrid. I participated with Raet, a Dutch ISV of a SAAS-solution for Human Resources, which since almost 2 years now has a software development centre in Madrid.

For an earlier post of this run (with more details), see:
http://worktimesheet2014.blogspot.com.es/2015/12/statistics-result-10-km-run-carrera-de.html

This time, there were over 14.000 runners, from 800 companies, for more details, see:

http://www.expansion.com/directivos/deporte-negocio/2016/12/19/58578b2ae2704ef1758b4588.html
The results of the 10K run (in which I participated) I got from here:

http://www.carreradelasempresas.com/sec_clasificaciones.asp
'
and for the statistics, see this page (a social/runners network): http://www.runedia.com/cursa/201628906/carrera-de-las-empresas-10k-actualidad-economica/2016/

The results (10K-Men, 5563 runners)  are in PDF format, which I converted to Excel, with help of this nice (free) online tool:
http://www.pdftoexcel.com/

After some 'cleansing' (see added yellow-columns in Excel file) I had the data ready for analysis, for which I used R, the language for statistical computing.
NB: For an earlier post in which I used R, see:

http://worktimesheet2014.blogspot.com.es/2016/02/20minutos-blog-awards-2015-statistics.html


The R-script I made looked basically like this:


sv10km2 <- read.csv("10Km2.csv")
edit(csv10km2)
Tfinish=as.numeric(as.difftime(as.character(csv10km2$Tfinish),format="%H:%M:%S"))
summary(Tfinish)
plot(density(Tfinish,na.rm=TRUE),lwd=3,xlab="Finish Time (mins)", main="Madrid Business Run 2016 10k Results")
View(csv10km2)
boxplot(Tfinish, data=csv10km2)
install.packages("sqldf")
newdata <-  sqldf('select * from csv10km2 where Team like %Raet%')


Input for the script has to be in the MyDocument-folder, in my case: D:\Persoonlijk\maartenree\Documents\10Km2.csv

See figures below for the result of my analysis.

And for those who don't have R, but want to do the same as I did for my company (in fig.3), so filter the results, I also added the Excel where the finish-results are in table-format (so easy to filter).



Fig.1: Plot with finish-times and Five-number summary


Fig.2: Boxplot  with finish-times



Fig.3: Filter dataset with finish-times for (teams of) company Raet


Part 2: Power BI + R Stats/Charts

Power BI has an option to include R Stats/Charts, and so I wanted to add the above R-chart 'Plot finish-times' (fig.1) to Power BI, and the filtering-option in Power BI would make it easy to 'slice-and-dice' this chart, in this case: filtering for a team, and showing the runners in that team. To be able to do this, I had to convert the Excel-column with the finish-times in 'duration-format' ([h]:mm:ss) to seconds, see column Tfinish2 in attached file 10Km3.xlsx and fig.4.

Fig.4: Excel with new column Tfinish2 (time in seconds)


And for the final result in Power BI (Desktop), see fig.5.
Note that slicer for team 'dimension' has a pre-filter (in this case with value 'Raet') to limit the number of items in the slicer (very helpful when there are over 1000 teams). And to filter for >1 team (like I did, with 3 Raet-teams 'checked'): press Control + click in checks for filter values. And note that the chart is reacting on the filter (so Power BI filter and R-chart are connected).


Fig.5 Power BI Desktop: report finish-times (in second) filtered for teams of Raet



I also uploaded the Power BI Desktop report to Power BI online, see:

https://app.powerbi.com/view?r=eyJrIjoiZDQ1YzFkZTAtODExNi00OWRmLWFlOTAtMzI0M2JjZWE4MzQ0IiwidCI6ImI3OWIyMzE3LTM0ZGQtNDNlNS05MWEyLWNkNjZkM2FlMWYwYiIsImMiOjh9

To be able to use R-Stats/Charts in Power BI online, you need to have a Power BI Pro account. About how to get a free trial-account, see:

https://powerbi.microsoft.com/en-us/documentation/powerbi-admin-purchasing-power-bi-pro/#power-bi-pro-60-day-trial

With Power BI Online it is also possible to do Q&A in natural language (English). See fig.6-7. For this example to work, I had to rename the Excel-columns Runner -> RunnerName and Team -> TeamName, see:

https://powerbi.microsoft.com/en-us/documentation/powerbi-service-q-and-a-tips/

Fig.6 Power BI and Q&A 1/2

Fig.7 Power BI and Q&A 2/2



References

http://cnr.lwlss.net/RealData/

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4317364/


Downloads

R:
v1: https://goo.gl/gdjPRk

PowerBI:

v2:
https://goo.gl/S43i2O

v2b: (better chart than v2)
https://goo.gl/gn4o8k

9 Dec 2016

PowerBI Dashboard for Messi´s worldrecord of 91 goals in one year


#58 PowerBI Dashboard for Messi´s worldrecord of 91 goals in one year

NB: 
- This post is not about an Excel I made (but about a Microsoft Power BI dashboard I made, which is (almost) a copy of an Excel workbook I blogged about earlier), so I tagged it with the 'More' category.

Once I made an Excel with a PowerView dashboard for Messi's worldrecord of #goals in one year (91), see:

http://worktimesheet2014.blogspot.com.es/2015/08/dashboard-for-messis-worldrecord-of-91.html

In this post, I made a new version of this dashboard, but now in Microsoft PowerBI, and also with some improvements, e.g. the treemap-view for all matches in which Messi scored, aggregated by the team and competition where he made his goals. NB: For more info about the Treemap-chart, see my previous post:

http://worktimesheet2014.blogspot.com/2016/07/treemap-and-sunburst-excel2016-charts.html

See fig.1 for the end-result, in PowerBI Desktop.



Fig.1 Messi-Dashboard in PowerBI Desktop

The nice thing of PowerBI-dashboards is that they are suited for any device, so apart from PC also smartphone or tablet, see fig.2, in PowerBI-online (cloud-service), so this is a site for sharing your dashboards, for mine, see:

https://app.powerbi.com/view?r=eyJrIjoiOWYxMTVkNTUtZjJjMy00NTcyLWJlMzctNmNlYzE5ZmQyYjM1IiwidCI6ImI3OWIyMzE3LTM0ZGQtNDNlNS05MWEyLWNkNjZkM2FlMWYwYiIsImMiOjh9

This URL Power BI gave me after choosing menu-option: "File > Publish to web", which should make the dashboard 'public', see:
https://michielquakernaat.com/2016/02/03/want-to-share-your-power-bi-report-to-the-world-yes-we-can/

Or you can use this URL: https://app.powerbi.com/groups/me/dashboards/729276ac-365c-4473-9d42-1969ddd16dac

There is also a Power BI Community page which has a 'Data Stories Gallery', and I posted my 'Messi-dashboard' here:
http://community.powerbi.com/t5/Data-Stories-Gallery/Messi-s-worldrecord-of-total-goals-in-1-year-91-goals-in-2012/m-p/100642#M485

And here I found another nice Messi-dashboard:
http://community.powerbi.com/t5/Data-Stories-Gallery/Incredible-Lionel-Messi/m-p/82225/highlight/false#M386

So this is similar to this page of the Tableau-community:
https://public.tableau.com/en-us/s/gallery



Fig.2: Messi-Dashboard in PowerBI (online) on PC and smartphone


2 Nice build-in features of Power BI are:
- Quick Insight in the dataset (source of dashboard), which generates several 'data-profile' charts, see fig.2b
- Natural language Q&A, user types his question (in English), using some 'keywords' like 'by' to specify the 'slice-and-dice' dimensions, and Power BI answers (with charts) , see fig.2c.

For more details about these features (and Power BI in general), see:
http://www.computerworld.com/article/3088958/data-analytics/free-data-visualization-with-microsoft-power-bi-your-step-by-step-guide-with-video.html



Fig.2b: Quick Insight



Fig.2c: Natural language Q&A

As said in my previous blog-post, my Messi-dashboard was inspired by one made with BI/Dashboard-tool Tableau  (for which PowerBI was Microsofts answer/alternative), see fig.3 for the this version.


Fig.3: original Messi-dashboard in (free trial version of ) Tableau

To see another really nice example of a PowerBI-report with the results of the Spanish soccer League made by a friend, see fig.4 and his site: http://Ligasantanderbi.azurewebsites.net



Fig.4: PowerBI (with Azure) example with results Spanish soccer League

So with Power BI you can make really nice dashboards, and with the frequent updates/releases of Power BI, much more frequent than that of MS Office (Excel), one could wonder if for 'self-service BI' Power BI is the way to go (and not Excel with its 'Power-plugins' like Power View, Power Map etc.) Some say it is, see e.g.:

http://www.cio.com/article/2979725/enterprise-software/why-power-bi-is-the-future-of-excel.html

while others say you can/should use the 'pack' 'Power BI + Exel', see:
https://www.youtube.com/watch?v=jmTedSuKers

And what do you think? Comments are welcome (but please not anonymously, because I get a lot of 'fake' comments (publicity) from anonymous users, so I don't read anonymous comments.

Note 12/12/2016:
Although I haven't got any comments on this post (yet), apparently the 'editor' of The #PowerBI Daily (on www.paper.li), liked this post, because he published it here today, see fig.5.
NB: Paper.li is a site to create your own (virtual) newspaper of tweets of persons you follow, for more info, see:
http://www.takeflyte.com/flyte/2010/12/what-is-paper-li-a-newspaper-of-tweets.html

Fig.5: Blog-post referenced on  The #PowerBI Daily (on www.paper.li) 



References

-1: Good explanation of what is "Business Intelligence (BI): Multidimensional Analysis"
https://youtu.be/IhFkNmVmwn4

-2: A 'BI-solution' actually consists of 2 components: a Data Warehouse (DW), the (invisible) back-room ('the database') and BI, the (visual) front-room (the dashboard), where DW is normally about 80% of the solution, and BI 20 %, so it is the top of the ice-berg, as described here:
https://www.datamart.de/competence/business-intelligence-data-warehouse/Seiten/default.aspx


Downloads
Pbix-file, so PowerBI-report:

https://goo.gl/Ex4XbB






17 Jul 2016

Treemap and Sunburst Excel2016 charts for Tour De France


#57 Treemap and Sunburst Excel2016 charts for Tour De France


In 2015 Tour de France contracted the company Dimension Data to make the event as high tech as possible. As DD said themselves: 


It has completed its big data analytics and digital delivery platform from which the company will deliver real-time information on individual riders for the first time in the history of professional cycling. This major innovation was made possible by Amaury Sport Organisation (ASO) in partnership with the 22 teams participating in the 2015 Tour de France. Highly accurate data will be collected through the use of live trackers under the saddle of each rider. Dimension Data will then process and analyse the data, and make it available to cycling fans, commentators, broadcasters and the media.

When the Tour de France begins on Saturday, the viewing public around the world will be able to follow all 198 riders in 22 teams real-time, and be able to track the speed at which each cyclist is riding, exactly where he’s positioned in the race in relation to other cyclists, and the distance between each rider - all via a beta live tracking website.

And I think they did an excellent job, and also in this year's edition.
On one of their Twitter-posts:
https://twitter.com/letourdata/status/624565612048744448 
 I once saw a graphic (data: total % of time a cyclist was leading in the breakaway group) which I didn't know, and which was something which could not be made with Excel (2013), but a bit later, which the launch of Excel-2016, I learnt that this graph was a Treemap, so now supported by Excel, and I tried to reproduce, see e.g. fig.1


fig.1 Treemap chart wit data from teams and cyclists


Besides the Treemap chart, Excel 2016 had another new graph, the Sunburst chart, see fig.2 for an example




fig.2 Sunburst chart wit data from countries and cyclists


Both the Treemap and the Sunburst-chart are for displaying hierarchical data, so e.g. in Tour de France, cyclists are part of a team, so you can compare the teams performances (total times) and breaking it down to the indivual cyclists performances, as done in fig.1. Another possible grouping is cyclists per country as in fig.2, although this is more artificial, so not something which is considered by the Tour de France, while grouping by teams is of course (there is a prize for the best team). 

And if fig.3 you can see the equivalent of fig.2, but now in Treemap format. Maybe this format is handier if you have less space, like e.g. Dimension Data had, where this chart was part of a dashboard with other charts.


 fig.3 Treemap chart wit data from countries and cyclists

And in fig.4 you can see the original Dimension Data Treemap, which in fact did not have any hierarchical data, just data of cyclists. In this chart it is of course the easiest to compare data of 2 riders, e.g. the 2 best riders in the breakaway-group were Roland and Rodriguez (both in left of chart, with same square-size).

fig.4: Treemap charta with data from cycists.


If you want to read more about Big Data brought by Dimension Data to Tour de France, see e.g.:

And for a video explaining more about Treemap charts in Excel, see: 

For me, the new high-tech dimension (also by using onboard cameras) makes the Tour de France much more interesting to follow, and of course the fact that the Dutch are doing very well in TDF2016 (Tom Dumoulin 2 stage victories and Bouke
Mollema #2 in general ranking).

Downloads

Excel

PDF
https://goo.gl/9rDyKAhttps://goo.gl/9rDyKA





5 Jun 2016

Month-calendar for June 2016 in Excel

#56 Month-calendar for June 2016 in Excel

In this post the new month-calendar for June, see fig.1. The Excel also has the year-calendar for 2016 and a 2016-year-planner. Event of this month is UEFA Euro 2016 France, hold on 10/6 - 10/7. Unfortunately the Netherlands did not qualify, so this time I was not so motivated to make a new version of Match for this Eurocup, or a new Excel for an office-pool, like I did for last FIFA Worldcup. But of course you can do it by yourself, by adapting these 2 files (Access and Excel):

http://worktimesheet2014.blogspot.com.es/2014/05/match-access-app-to-track-world-cup.html

http://worktimesheet2014.blogspot.com.es/2014/05/pool-for-betting-2014-world-cup-soccer.html

For a very nice Excel with a EURO 2016 tracker, which automatically calculates the points, who go to next round etc (by using formulas), see:

http://www.excely.com/football/euro-2016-schedule.shtml




fig.1: Month calendar 201606


Downloads:

https://goo.gl/BA0cJU







2 May 2016

Month-calendar for May 2016 in Excel

#55 Month-calendar for May 2016 in Excel

In this post the new month-calendar for May, see fig.1. The Excel also has the year-calendar for 2016 and a 2016-year-planner. Event of this month is Roland Garros, the French Open ATP tennis tournament, 22/5-5/6.




 fig.1: month-calendar May 2016




Downloads:

https://goo.gl/q3lwNW





2 Apr 2016

Month-calendar for April 2016 in Excel

#54 Month-calendar for April 2016 in Excel

In this post the new month-calendar for April, see fig.1. The Excel also has the year-calendar for 2016 and a 2016-year-planner. Event of this month is 'El Clasico', Barcelona vs Real Madrid, Sat. April 2, in the stadium of Barca, Nou Camp, in which there will be a 'remembrance-mosaic' with the Barca-shirt with number 14´, the number of Johan Cruyff, the Dutch legendary player and coach of this club, who died last March 24. Cruyff was besides for his football also famous for his oracle-like aphorisms, like "every disavantage has its advantage" and his logic like : I do not believe. In Spain  all 22 players make the sign of the cross before they enter the field, and if it would work, it would always be a draw." , see:


--

--
Fig.1: calendar April 2016