25 Mar 2020

Coronavirus COVID-19 in Spain, a Power BI report

NB: the embedded report is at the bottom of this page. This report is not updated daily. If you would be interested in daily updates, please let me know in a comment on this post.

I was able to make this "Coronavirus (COVID-19) in Spain" Power BI report thanks to the good work done by Datadista, which offers open data on its GitHub page:

https://github.com/datadista/datasets/tree/master/COVID%2019

For the metadata of the datasets on that page, see:

https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/GPFFAQ

Datadista is a data-journalism company in Spain. For a recent publication of theirs about the COVID-19 crisis in Spain, see:

https://datadista.com/coronavirus/camas-uci/

The dataset I used for this report (nacional_covid19_rango_edad.csv) was compiled by Datadista from data published by the Spanish Ministry of Health (see table 2 in this document):

https://www.mscbs.gob.es/profesionales/saludPublica/ccayes/alertasActual/nCov-China/documentos/Actualizacion_55_COVID-19.pdf

The numbers cover a group of approx. 21k coronavirus patients whose age group and gender are known (out of a total of approx. 47k infected persons in Spain on 25/3/2020).
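To put those two figures in perspective, here is a minimal sketch that computes which share of all reported cases the report actually covers. The variable names are my own, and the inputs are the rounded figures quoted above, so the result is only a rough indication.

```python
# Rounded figures quoted in the text (situation on 25 March 2020).
known_age_gender = 21_000  # patients whose age group and gender are known
total_infected = 47_000    # total number of infected persons reported in Spain

# Share of all reported cases that the report can break down by age and gender.
share = known_age_gender / total_infected
print(f"Share of cases covered by the report: {share:.0%}")
```

So the breakdown by age group and gender describes somewhat less than half of all reported cases.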

The Datadista open data is really useful for us 'data engineers', because the Spanish government, like the Dutch, does not deliver its data in an easy-to-process format (it publishes PDFs, for example), and the quality of the data is not always what it should be. See also:

https://www.elconfidencial.com/espana/2020-03-19/coronavirus-comunicacion-datos-ministerio-sanidad_2505867/

The government of Italy does this very well, offering open data:

https://github.com/pcm-dpc/COVID-19/blob/master/dati-regioni/dpc-covid19-ita-regioni.csv

Singapore is also a good example: they have a dashboard that includes the (anonymized) details of all patients, such as gender, age, nationality and the current status of the patient (e.g. hospitalized, discharged, deceased), see:

https://experience.arcgis.com/experience/7e30edc490a5441a874f9efe67bd8b89

Of course, the more 'attributes' of a patient we store, the more insight we can get. For example, if you record whether someone infected with the coronavirus also had another severe disease, and this patient dies, then you can discuss whether Corona or the other disease was really the cause of death. Maybe the patient would have survived with 'just' Corona and without that other disease. You could then decide to exclude this case from the 'Corona deaths' number, and only include people who died of Corona and had no other diseases.
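As a minimal sketch of that idea: the records below are made up purely for illustration, and the field names (deceased, has_other_severe_disease) are hypothetical, not taken from any of the datasets mentioned in this post.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    patient_id: int
    deceased: bool
    has_other_severe_disease: bool  # hypothetical extra attribute


# Made-up example records, only to show the effect of the extra attribute.
patients = [
    Patient(1, deceased=True, has_other_severe_disease=True),
    Patient(2, deceased=True, has_other_severe_disease=False),
    Patient(3, deceased=False, has_other_severe_disease=True),
    Patient(4, deceased=True, has_other_severe_disease=False),
]

# All deceased patients, regardless of other diseases.
all_deaths = [p for p in patients if p.deceased]

# Stricter definition: deceased patients without another severe disease.
deaths_without_other_disease = [
    p for p in all_deaths if not p.has_other_severe_disease
]

print(len(all_deaths), len(deaths_without_other_disease))
```

Without the extra attribute only the first number can be reported. With it, both definitions can be shown side by side and the reader can choose.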

Datadista offers a dataset (nacional_covid19_rango_edad.csv) that includes dimensions such as gender and age band of the patients, something missing in the Dutch dataset (as RIVM does not provide this).
In fig.1 you can see the report with these dimensions.
NB: when I made this report, I found a small error in the file, which I reported here:
https://github.com/datadista/datasets/issues/37
and it was fixed within minutes. Offering open data on GitHub therefore lets the GitHub community (of which I am a member, user: mvanreek) help make the open-source 'product' (in this case the Spanish COVID-19 open data) better.
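
For readers who want to explore the gender and age-band dimensions outside Power BI, here is a minimal sketch using only the Python standard library. The column names (fecha, rango_edad, sexo, casos_confirmados) are my assumption about the layout of the file, and the sample rows are made up for illustration. Check both against the actual CSV before relying on this.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows. The header is an assumption about the file layout.
SAMPLE_CSV = """fecha,rango_edad,sexo,casos_confirmados
2020-03-25,0-9,mujeres,10
2020-03-25,0-9,hombres,12
2020-03-25,80 y +,mujeres,30
2020-03-25,80 y +,hombres,25
"""


def cases_by_age_and_gender(csv_text):
    """Sum confirmed cases per (age band, gender) combination."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["rango_edad"], row["sexo"])
        totals[key] += int(row["casos_confirmados"])
    return dict(totals)


totals = cases_by_age_and_gender(SAMPLE_CSV)
for (age_band, gender), cases in sorted(totals.items()):
    print(age_band, gender, cases)
```

To use it on the real data, read the downloaded file's text and pass it to cases_by_age_and_gender instead of SAMPLE_CSV.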




fig.1 Power BI report with stats about COVID-19 cases in Spain, incl. details on gender and age group.


Downloads

Mirror #1

http://tiny.cc/6cazlz
