Showing posts with label More. Show all posts
Showing posts with label More. Show all posts

19 Dec 2016

Carrera de la Empresas 2016 Madrid Statistics made with R

#59: 10K Carrera de las Empresas 2016 Madrid: Statistics made with R + included in Power BI

Note 20/12/2016: I updated this post with a Power BI solution (with R-Stats and Q&A), see part 2 of this post.


Part 1: Statistics with R 

Today the 'business run' ("Carrera de las Empresas") was held here in Madrid. I participated with Raet, a Dutch ISV of a SAAS-solution for Human Resources, which since almost 2 years now has a software development centre in Madrid.

For an earlier post of this run (with more details), see:
http://worktimesheet2014.blogspot.com.es/2015/12/statistics-result-10-km-run-carrera-de.html

This time, there were over 14.000 runners, from 800 companies, for more details, see:

http://www.expansion.com/directivos/deporte-negocio/2016/12/19/58578b2ae2704ef1758b4588.html
The results of the 10K run (in which I participated) I got from here:

http://www.carreradelasempresas.com/sec_clasificaciones.asp
'
and for the statistics, see this page (a social/runners network): http://www.runedia.com/cursa/201628906/carrera-de-las-empresas-10k-actualidad-economica/2016/

The results (10K-Men, 5563 runners)  are in PDF format, which I converted to Excel, with help of this nice (free) online tool:
http://www.pdftoexcel.com/

After some 'cleansing' (see added yellow-columns in Excel file) I had the data ready for analysis, for which I used R, the language for statistical computing.
NB: For an earlier post in which I used R, see:

http://worktimesheet2014.blogspot.com.es/2016/02/20minutos-blog-awards-2015-statistics.html


The R-script I made looked basically like this:


sv10km2 <- read.csv("10Km2.csv")
edit(csv10km2)
Tfinish=as.numeric(as.difftime(as.character(csv10km2$Tfinish),format="%H:%M:%S"))
summary(Tfinish)
plot(density(Tfinish,na.rm=TRUE),lwd=3,xlab="Finish Time (mins)", main="Madrid Business Run 2016 10k Results")
View(csv10km2)
boxplot(Tfinish, data=csv10km2)
install.packages("sqldf")
newdata <-  sqldf('select * from csv10km2 where Team like %Raet%')


Input for the script has to be in the MyDocument-folder, in my case: D:\Persoonlijk\maartenree\Documents\10Km2.csv

See figures below for the result of my analysis.

And for those who don't have R, but want to do the same as I did for my company (in fig.3), so filter the results, I also added the Excel where the finish-results are in table-format (so easy to filter).



Fig.1: Plot with finish-times and Five-number summary


Fig.2: Boxplot  with finish-times



Fig.3: Filter dataset with finish-times for (teams of) company Raet


Part 2: Power BI + R Stats/Charts

Power BI has an option to include R Stats/Charts, and so I wanted to add the above R-chart 'Plot finish-times' (fig.1) to Power BI, and the filtering-option in Power BI would make it easy to 'slice-and-dice' this chart, in this case: filtering for a team, and showing the runners in that team. To be able to do this, I had to convert the Excel-column with the finish-times in 'duration-format' ([h]:mm:ss) to seconds, see column Tfinish2 in attached file 10Km3.xlsx and fig.4.

Fig.4: Excel with new column Tfinish2 (time in seconds)


And for the final result in Power BI (Desktop), see fig.5.
Note that slicer for team 'dimension' has a pre-filter (in this case with value 'Raet') to limit the number of items in the slicer (very helpful when there are over 1000 teams). And to filter for >1 team (like I did, with 3 Raet-teams 'checked'): press Control + click in checks for filter values. And note that the chart is reacting on the filter (so Power BI filter and R-chart are connected).


Fig.5 Power BI Desktop: report finish-times (in second) filtered for teams of Raet



I also uploaded the Power BI Desktop report to Power BI online, see:

https://app.powerbi.com/view?r=eyJrIjoiZDQ1YzFkZTAtODExNi00OWRmLWFlOTAtMzI0M2JjZWE4MzQ0IiwidCI6ImI3OWIyMzE3LTM0ZGQtNDNlNS05MWEyLWNkNjZkM2FlMWYwYiIsImMiOjh9

To be able to use R-Stats/Charts in Power BI online, you need to have a Power BI Pro account. About how to get a free trial-account, see:

https://powerbi.microsoft.com/en-us/documentation/powerbi-admin-purchasing-power-bi-pro/#power-bi-pro-60-day-trial

With Power BI Online it is also possible to do Q&A in natural language (English). See fig.6-7. For this example to work, I had to rename the Excel-columns Runner -> RunnerName and Team -> TeamName, see:

https://powerbi.microsoft.com/en-us/documentation/powerbi-service-q-and-a-tips/

Fig.6 Power BI and Q&A 1/2

Fig.7 Power BI and Q&A 2/2



References

http://cnr.lwlss.net/RealData/

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4317364/


Downloads

R:
v1: https://goo.gl/gdjPRk

PowerBI:

v2:
https://goo.gl/S43i2O

v2b: (better chart than v2)
https://goo.gl/gn4o8k

9 Dec 2016

PowerBI Dashboard for Messi´s worldrecord of 91 goals in one year


#58 PowerBI Dashboard for Messi´s worldrecord of 91 goals in one year

4/1/2021:
Today I made a new live Messi Goal Tracker dashboard, see:


NB: 
- This post is not about an Excel I made (but about a Microsoft Power BI dashboard I made, which is (almost) a copy of an Excel workbook I blogged about earlier), so I tagged it with the 'More' category.

Once I made an Excel with a PowerView dashboard for Messi's worldrecord of #goals in one year (91), see:

http://worktimesheet2014.blogspot.com.es/2015/08/dashboard-for-messis-worldrecord-of-91.html

In this post, I made a new version of this dashboard, but now in Microsoft PowerBI, and also with some improvements, e.g. the treemap-view for all matches in which Messi scored, aggregated by the team and competition where he made his goals. NB: For more info about the Treemap-chart, see my previous post:

http://worktimesheet2014.blogspot.com/2016/07/treemap-and-sunburst-excel2016-charts.html

See fig.1 for the end-result, in PowerBI Desktop.



Fig.1 Messi-Dashboard in PowerBI Desktop

The nice thing of PowerBI-dashboards is that they are suited for any device, so apart from PC also smartphone or tablet, see fig.2, in PowerBI-online (cloud-service), so this is a site for sharing your dashboards, for mine, see:

https://app.powerbi.com/view?r=eyJrIjoiOWYxMTVkNTUtZjJjMy00NTcyLWJlMzctNmNlYzE5ZmQyYjM1IiwidCI6ImI3OWIyMzE3LTM0ZGQtNDNlNS05MWEyLWNkNjZkM2FlMWYwYiIsImMiOjh9

This URL Power BI gave me after choosing menu-option: "File > Publish to web", which should make the dashboard 'public', see:
https://michielquakernaat.com/2016/02/03/want-to-share-your-power-bi-report-to-the-world-yes-we-can/

Or you can use this URL: https://app.powerbi.com/groups/me/dashboards/729276ac-365c-4473-9d42-1969ddd16dac

There is also a Power BI Community page which has a 'Data Stories Gallery', and I posted my 'Messi-dashboard' here:
http://community.powerbi.com/t5/Data-Stories-Gallery/Messi-s-worldrecord-of-total-goals-in-1-year-91-goals-in-2012/m-p/100642#M485

And here I found another nice Messi-dashboard:
http://community.powerbi.com/t5/Data-Stories-Gallery/Incredible-Lionel-Messi/m-p/82225/highlight/false#M386

So this is similar to this page of the Tableau-community:
https://public.tableau.com/en-us/s/gallery



Fig.2: Messi-Dashboard in PowerBI (online) on PC and smartphone


2 Nice build-in features of Power BI are:
- Quick Insight in the dataset (source of dashboard), which generates several 'data-profile' charts, see fig.2b
- Natural language Q&A, user types his question (in English), using some 'keywords' like 'by' to specify the 'slice-and-dice' dimensions, and Power BI answers (with charts) , see fig.2c.

For more details about these features (and Power BI in general), see:
http://www.computerworld.com/article/3088958/data-analytics/free-data-visualization-with-microsoft-power-bi-your-step-by-step-guide-with-video.html



Fig.2b: Quick Insight



Fig.2c: Natural language Q&A

As said in my previous blog-post, my Messi-dashboard was inspired by one made with BI/Dashboard-tool Tableau  (for which PowerBI was Microsofts answer/alternative), see fig.3 for the this version.


Fig.3: original Messi-dashboard in (free trial version of ) Tableau

To see another really nice example of a PowerBI-report with the results of the Spanish soccer League made by a friend, see fig.4 and his site: http://Ligasantanderbi.azurewebsites.net



Fig.4: PowerBI (with Azure) example with results Spanish soccer League

So with Power BI you can make really nice dashboards, and with the frequent updates/releases of Power BI, much more frequent than that of MS Office (Excel), one could wonder if for 'self-service BI' Power BI is the way to go (and not Excel with its 'Power-plugins' like Power View, Power Map etc.) Some say it is, see e.g.:

http://www.cio.com/article/2979725/enterprise-software/why-power-bi-is-the-future-of-excel.html

while others say you can/should use the 'pack' 'Power BI + Exel', see:
https://www.youtube.com/watch?v=jmTedSuKers

And what do you think? Comments are welcome (but please not anonymously, because I get a lot of 'fake' comments (publicity) from anonymous users, so I don't read anonymous comments.

Note 12/12/2016:
Although I haven't got any comments on this post (yet), apparently the 'editor' of The #PowerBI Daily (on www.paper.li), liked this post, because he published it here today, see fig.5.
NB: Paper.li is a site to create your own (virtual) newspaper of tweets of persons you follow, for more info, see:
http://www.takeflyte.com/flyte/2010/12/what-is-paper-li-a-newspaper-of-tweets.html

Fig.5: Blog-post referenced on  The #PowerBI Daily (on www.paper.li) 



References

-1: Good explanation of what is "Business Intelligence (BI): Multidimensional Analysis"
https://youtu.be/IhFkNmVmwn4

-2: A 'BI-solution' actually consists of 2 components: a Data Warehouse (DW), the (invisible) back-room ('the database') and BI, the (visual) front-room (the dashboard), where DW is normally about 80% of the solution, and BI 20 %, so it is the top of the ice-berg, as described here:
https://www.datamart.de/competence/business-intelligence-data-warehouse/Seiten/default.aspx


Downloads
Pbix-file, so PowerBI-report:

https://goo.gl/Ex4XbB






11 Feb 2015

Excel Statistics for 20Minutos Blog Awards 2014


#32 Excel Statistics for 20Minutos Blog Awards 2014

NB: Si prefieres leer este blog en español, recuerda que con Google-Blogger (el herramiento usado para este blog) es posible traducir este blog de ingles a español. El 'Google Traductor' esta en el menu a la derecha.

The Spanish newspaper 20Minutos, the 2nd newspaper in Spain, with a daily (free) edition of approx. 1 million, organizes the 9th edition of its Blog Awards ("Premios 20Blogs"), the biggest blog-contest in the Spanish speaking world. The 2014 edition has 7223 participating blogs from (mostly) Spanish speaking bloggers, and they are not only from Spain but e.g. also from Latin America (where BTW 20Minutos is also published in 2 countries: USA and Mexico). Until 2nd of March everybody with a 20Minutos account can vote 20 times, one time for every blog-category, and the blog with the most votes wins the public prize, although the jury decides who will be the winner of the 1st prize: 5000 euro.
My blog exists over 1 year now, and although it´s not in Spanish, I also signed-up my blog, and 20Minutos accepted it. Maybe they accepted that my blog is in English because English is so popular in Spain nowadays, there are so many people doing courses, even MP Rajoy seems to have done the course 'learn the 1000 most used English words'. Or maybe my blog was accepted because I have several post which have something to do with Spain, or Madrid where I live, e.g.:

* 2015 year-calendar with holidays in Spain and Madrid ('calendario laboral'), with Spanish names of weekdays and months, see:
http://worktimesheet2014.blogspot.com.es/2015/02/month-calendar-for-january-2015-in-excel.html

* timesheet to track worked hours (´plantilla de control de horas trabajado', 'los partes'), with integrated Spanish calendar, and e.g Spanish summer-schedule (´horario intensivo'), see:
http://worktimesheet2014.blogspot.com.es/2015/01/timesheet-for-worked-hours-2015.html

* Excel with statistics results 10K run 'Corre por el niño' (organized by children hospital Niño Jesus), see:
http://worktimesheet2014.blogspot.com.es/2014/11/statistics-result-10-km-run-corre-por.html

* Infographic win/loss record of Rafa Nadal, see:
http://worktimesheet2014.blogspot.com.es/2014/10/infographic-winloss-record-rafa-nadal.html

* Match schedule 2014 FIBA Worldcup Basketball Spain (´Mundial baloncesto') , see:
http://worktimesheet2014.blogspot.com.es/2014/08/worldcup-basketball-2014-match-schedule.html

20Minutos allows you to promote your own blog to attract votes in the Blog Awards, so: if you like my blog, you can vote for it here:

http://lablogoteca.20minutos.es/excel-ejemplos-para-tu-trabajo-deportes-y-mas-50022/0/

To do so, you have to register here:

http://www.20minutos.es/usuarios/registro/

But enough self-promotion, let´s start with what this blog is about: MS Excel. And I thought it could be interesting for all the participants of the Blog Awards to see some statistics about it. Of course 20Minutos also made one, about the number of participating blogs per year, see:

http://blogs.20minutos.es/premios20blogs/vota-ya-por-tus-favoritos-en-los-premios-20blogs-tienes-7-221-candidaturas-entre-las-que-elegir-y-20-votos-para-repartir/

And this is 1(of 4) statistics in my Excel, see fig.1.
NB: in their graphic, the total of 2013 is not correct (6790), or better said, not corresponding to their data on this page (my source):           
http://lablogoteca.20minutos.es/premios-20blogs/ediciones-anteriores/2013/

This graph shows that the number of blogs participating in the Blog Awards increased almost every year, from a bit more than 2000 in 2006 to over 7000 in 2014.

                                               fig.1


                                      fig.2


The chart in fig.2 shows for the 2014 Blog Awards the number of blogs for all 20 blog-categories (except 'Best Blog' and 'Public Prize', because of course all blogs participate in these 2 'categories'). My blog is in category 'Blogosfera' (a 'Rest'-category, so for blogs which don´t fit in any other category) with 933 blogs, although I thought I signed up for category 'Internet etc.', with 502 blogs, so with 'less' competition then in category 'Blogosfera', but still, 501 competitors... Maybe next year I´ll participate with a blog about motors .. (this category has only 48 blogs).


fig.3


fig.4

The charts in fig.3 and 4 are similar, both have the 'dimensions' (axis) time and blog-category, but they differ on which axis (x or y) these dimensions are placed. Both have a filter: Rank blog-category <= 3, so it shows per year only the top 3 blog-categories (excluding the 'Rest-category'). In fig.3 you can see that there are 5 categories which every year have the most blogs, and that in the beginning the 'Personal-category' was the most popular but this decreased during the years. And in fig.4 you can e.g. see that before 2011 the 'Personal-category' was the one with ranking 1, but that since then, the 'Culture-category' took over.

How did I make this Excel? I used this page as my data-source:
http://lablogoteca.20minutos.es/premios-20blogs/ediciones-anteriores/2013/

Maybe I could have used the 'copy/paste' method to get the data from this (tabbed) HTML-table in Excel, but I thought it would be nicer to use the Power BI option of Excel, and use the function "Get external data from website", but this didn´t work. But from this site:

https://eagereyes.org/data/scrape-tables-using-google-docs

I learned that Google-Sheets (Google´s Excel..) is better in 'scraping' data from websites then Excel. So I created a spreadsheet with Google-Sheets on Google-Drive (so in the cloud) and this function did the job perfectly:

=IMPORTHTML("http://lablogoteca.20minutos.es/premios-20blogs/ediciones-anteriores/2013/", "table",1)

I shared this Google-spreadsheet here:

http://goo.gl/At8RkU

see also fig.5 

fig.5:  Google-spreadsheet with data imported from website 20Minutos

And then I exported (downloaded) this cloud-spreadsheet from Google-Sheets to my 'good-old' desktop Excel (version 2013). And this Excel is the database (´server´) for the Excel with the statistics (´client´), and to 'bind' them, I used an Excel-query, and to get the data of every year, with 1 worksheet (and 'named range') per year in 1 table, I used SQL´s union-operator:

--
SELECT R_2006.Categorias, R_2006.Ganador, R_2006.Concursantes, R_2006.Rank, R_2006.Año
FROM `D:\projects\20Minutos_BlogCompetition2014_v4_input.xlsx`.R_2006 R_2006
union
SELECT R_2007.Categorias, R_2007.Ganador, R_2007.Concursantes, R_2007.Rank, R_2007.Año
FROM `D:\projects\20Minutos_BlogCompetition2014_v4_input.xlsx`.R_2007 R_2007
union


ETC
--

For more info about using Excel as a database, see my post:
http://worktimesheet2014.blogspot.com.es/2014/04/excel-and-relational-databases.html

The 4 statistics in my Excel are made with pivot-tables, about which I wrote in several posts, e.g.:
http://worktimesheet2014.blogspot.com.es/2014/01/timesheet.html


And to conclude this post, I´d like to say good luck to all my fellow-bloggers in the "Premios 20Blogs" Blog Awards.


Downloads:

#Mirror 1: Scribd.com (PDF file):
https://es.scribd.com/doc/255471225/20Minutos-Blog-Awards-2014-some-Statistics

#Mirror 2: MS Onedrive (Excel file):
NB: this site has 'Excel-Online', so you can view my Excel-doc if you don´t have MS Excel on your PC

-server:
http://1drv.ms/1z8iMEn

-client (with statistics):
http://1drv.ms/1MdCR5o

#Mirror 3: Google Drive (1 zip file with Excel and PDF files):
http://goo.gl/30xyB1