20 Nov 2015

SQL Saturday Madrid 2015 statistics with PowerPivot and PowerQuery

#46: SQL Saturday Madrid 2015 statistics with PowerPivot and PowerQuery

On 21-11-2015, so tomorrow, Saturday, I go to this event: SQL Saturday Madrid 2015, #SQLSatMadrid, where several Microsoft SQL Server and Business Intelligence (BI) MVP´s give some talks, for more information see:


and for the time-schedule:


As you can see, there is a total of 27 courses, in 7 time-blocks of 1 hour, in 4 rooms, and for each course the schedule shows this data:
  • Course name
  • MVP name
  • Level (Beginner, Intermediate, Advanced)
  • Theme (Big Data, BI, Cloud, DBA), indicated by cell-color
  • Language (most are in Spanish but when the course-name is in English , I guess the course is in English)
To ´warm up´ for the sessions about PowerBI, I converted this schedule in a pivot-table and graph, using Power Pivot and Power Query, see fig.1 for the end-result.

 fig.1 Pivot table Courses SQLSat.

To make this Excel, first I copied the Schedule (PDF) to Excel (export to Excel in the tool Nitro PDF) and did some cleaning (delete rows with non-schedule data and 'undo' of cell-merges and delete empty columns), see file 'SQLSaturday459_Madrid2015_Schedule_20151120_v2.xlsx' in Downloads below.
And then I made a 2nd Excel with the statistics, and with PowerQuery I loaded the schedule data from Excel-1, see file 'SQLSaturday459_Madrid2015_Schedule_Statistics' in Downloads and fig.2-3 for some of the steps in the Power Query formula language 'M'. An example of an 'M-formula':

= Table.AddColumn(#"Replaced Value3", "Level_1", each Text.Contains([Value], "Beginner"))

which looks in Excel-1 if in a cell there is the word 'Beginner' (level of course) and if so, sets the value in the Level-1 column to True (and if not to False), for more details of this formula, see:


fig.2: Power Query import

 fig.3 Power Query formula to Unpivot all columns (Room 1-4') except 1st column ('Time')

And after the Power Query import I did some other things in Excel-2 like adding a Time-block column and Course-Level-column, hiding come columns etc.) to facilitate creating the final pivot-table.
BTW: The Course-Level column I made manually because this information in Excel-1 is not part of the cell (string) value, but color-coded. But probably this could also be automated (so the conversion from cell-color to a (string-)value, see e.g.:


I keep it short now, tomorrow I have to get up early, the event starts at 08:30, and that for a Saturday..



14 Nov 2015

Statistics result 10-km run Corre por el niño 2015

#45 Statistics result 10-km run Corre por el niño 2015

UPDATE 8-12-2015: I added another Excel with a boxplot with the finishing-times per category (men/women). In par. Downloads below, I marked this Excel with 'V2'.

UPDATE 16-11-2015: I added another Excel with the ranking per category (men/women).

On Sun. 8 Nov. I participated in the 5th edition of the 10K run "Corre por el Niño 2015" (#CorrePorElNiño), this year with over 8000 participants (10km, 4km and 1 km for kids).  This run is organized by hospital Niño Jesús (in Madrid), to raise funds for investigation of severe diseases with childeren, like e.g. cancer. For more info about this road-race, see:





For  some of my photos of the race (and that of 2013), see:


In this run the bruto (gun) time and netto (chip) time are measured, and the bruto-time is the oficial time.

I also participated in this run, and, like last year, I made an Excel with the statistics of the result, see fig. 1 for the result. For more info about this Excel with the 2015-results was made,which is almost the same as the Excel with the 2014 result, see my 2014 blog-post about this 10K run:


And here are 2 others sites with statistics and all finish-times of #CarreraPopular #CorrePorElNiño Madrid 2015:



(this last site used the netto-time in their statistics, although the oficial finish-time is the bruto-time).

The data from this Excel comes (like last year) from this site:


I copied the data from their site to my Excel and added some check-columns (which are hidden), which showed some errors (red cells):
- in the man/women category field ("Sexo") their was one row with value "Masculina" (in stead of "Masculino")
- for some rows, the netto-time was higher then the netto-time
For the result of the 'import' of the time-date of the website into Excel, see fig.2.

This year there were no chips for the people who registered on the day of the run, so my name is not on the list (sheet 'Data'). My time was something between 47 and 48 min. (I made a short stop during the race to say hello to my sister-in-law and niece who were supporting the runners and I didn´t stop my stop-watch during this break), so as you can see in sheet 'Data', this time corresponds to about ranking 240 (of the 1642 10K runners) and percentile 17. That is in the 'All-category' (Men and Women). And to know my percentile in the Men-category, I copied the Excel and deleted in the Data-sheet all rows with finish-time of category Women (and X - Unknow). The array-formula to calculate the Frequency-colums used 'named ranges' so these were automatically (re-)calculated. The array-formula looks like this:


- R_Time2 = range on Time-Bruto column on Data-sheet
- R_Bin1 = range on Time-column on Stats-sheet

And I also made an Excel for the Women-category, and a final one with all Freq.values (so for categories All, Man, Women) with a graph which shows the Percentile-ranks for these 3 datasets, see fig.3, in which you can see e.g that after 55 min. about 50% of the men had finished, and about 20% of the women. And in the Excel for the Men-category I saw my percentile was about 18%.

And in fig.4 you can see a boxplot with the finishing-times per category (Men/Women). This is a stacked barchart based on the differences of the '5 number summary' (min, Q1, Q2 (median), Q3, max.). To create the boxplot, in menu Desgin (of chart) I selected 'Switch rows and columns' and replaced the part of the stached chart for the Min. and Max for the 'whiskers' (implemented by error-bars in chart) of the boxplot. For a more detailed explanation, see:


In the boxplot you can e.g. easily see that 75% of the men (Q3) finished before 50% of the women (Q2, median).

It´s a pity in this run they don´t have more categories using also 'dimension' age-group, e.g. Men-Junior (until 30), Men-Senior A (between 30 and 50), Men-Senior B(older then 50) etc., so you could compare your finishing-time better (with more or less equals (same gender and age-group).

And to conclude, I want to thank hospital Niño Jesús and all the volunteers for making this run a really nice experience and the isotonic drinks and bananas they gave us after the race almost make you forget  the suffering in the last part of the run with some 'nice' slopes...

fig.1: statistics result 10K run "Corre por el Niño 2015" 

fig.2: Imported Time-data

fig.3: Percentile-Rank per Category (Man/Women)

fig.4: Boxplot with finishing-times per Category (Man/Women)


#Mirror 1: Scribd.com (PDF file):


#Mirror 2: Microsoft Onedrive (Excel files):
NB: this site has 'Excel-Online', so you can view my Excel-doc if you don´t have MS Excel on your PC



#Mirror 3: Google Drive (zip files with Excel and PDF files):




4 Nov 2015

Month-calendar for November 2015 in Excel

#44 Month-calendar for November 2015 in Excel

In this post the new month-calendar for November 2015, see fig.1. The Excel also has the year-calendar for 2015 and a 2015-year-planner. Theme of this month is the Science Week, 'la Semana de la Ciencia', in Madrid (2/11-15/11), where universities etc. open their doors to the public, see:


Last year one of the activites of the Science Week was a guided walk in a garden with Bonsai-trees, for some photos I made, see:


And if your are interested in some science about data-visualization and the important role of Exel in this, as I have tried to show in several posts in this blog (with label: 'Data visualization'), see:


--fig.1: Nov. 2015 Calendar