In this tutorial, you will learn how to explore and summarize data that are paired.
The captopril dataset
The captopril dataset stems from a small experiment with 15 patients with hypertension. For each patient, systolic and diastolic blood pressure measurements were taken before and after administering captopril.
Before we can start visualizing the data, we must load the required libraries.
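The chunk below loads the tidyverse, which bundles the packages used in the rest of this tutorial (dplyr, tidyr and ggplot2, among others). It assumes the package is already installed; wrapping the call in `suppressPackageStartupMessages()` is an optional choice that only hides the startup banner.

```r
# Load the tidyverse (dplyr, tidyr, ggplot2, readr, ...) without the startup banner
suppressPackageStartupMessages(library(tidyverse))
```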
Import the data
Have a first look at the data
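The tutorial leaves the import and the first inspection as an exercise and does not state where the data file lives, so the following is only a sketch under assumptions. To stay self-contained, it writes a tiny toy file to a temporary location and reads it back. The values are made up, the column names other than `id` (`SBPb`, `DBPb`, `SBPa`, `DBPa`, with `b` for before and `a` for after) are assumed, and `path` is a hypothetical placeholder for your own file location.

```r
library(tidyverse)

# Toy stand-in for the real file: made-up values for 3 hypothetical patients.
# Column names other than `id` are assumptions (b = before, a = after).
toy <- tibble(
  id   = 1:3,
  SBPb = c(200, 180, 190),
  DBPb = c(120, 110, 115),
  SBPa = c(185, 170, 172),
  DBPa = c(110, 100, 108)
)
path <- tempfile(fileext = ".csv")  # hypothetical location; replace with your own path
write_csv(toy, path)

# Import the data
captopril <- read_csv(path, show_col_types = FALSE)

# First look: dimensions, column types and the first rows
dim(captopril)
glimpse(captopril)
head(captopril)
```

With the real dataset, only the `read_csv()` call and the inspection lines are needed.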
Data visualization
We first want to visualize the data. One easy way to visualize the four types of blood pressure values is to use the `gather` function from the tidyverse. It reshapes the data frame from wide to long format, so that we get two variables: `type`, which indicates one of the four blood pressure types, and `bp`, which holds the measured value of that type for each patient.
captopril %>%
  gather(type, bp, -id)
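To make the reshape concrete, here is a small sketch on toy data. The values are made up and the column names other than `id` are assumptions. It also shows `pivot_longer()`, which supersedes `gather()` in tidyr 1.0.0 and later and gives the same long table, only in a different row order.

```r
library(tidyverse)

# Made-up values for 2 hypothetical patients; column names other than `id` are assumed
captopril_toy <- tibble(
  id   = 1:2,
  SBPb = c(200, 180), DBPb = c(120, 110),
  SBPa = c(185, 170), DBPa = c(110, 100)
)

# Wide to long with gather(): 2 patients x 4 types = 8 rows,
# with columns id, type and bp
long_gather <- captopril_toy %>%
  gather(type, bp, -id)

# Same reshape with pivot_longer(); rows are ordered per patient instead of per type
long_pivot <- captopril_toy %>%
  pivot_longer(-id, names_to = "type", values_to = "bp")

long_gather
```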
Barplot
A barplot is a plot that you will commonly find in papers. The code for generating such a barplot is provided below:
captopril %>%
  gather(type, bp, -id) %>%   # wide to long: one row per patient and type
  group_by(type) %>%
  # mean, standard deviation and number of non-missing values per type
  summarize_at("bp",
               list(mean = ~mean(., na.rm = TRUE),
                    sd = ~sd(., na.rm = TRUE),
                    n = ~sum(!is.na(.)))) %>%
  mutate(se = sd / sqrt(n)) %>%   # standard error of the mean
  ggplot(aes(x = type, y = mean, fill = type)) +
  scale_fill_brewer(palette = "RdGy") +
  theme_bw() +
  geom_bar(stat = "identity") +
  # error bars: mean plus or minus one standard error
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se), width = .2) +
  ggtitle("Barplot of different blood pressure measures") +
  ylab("blood pressure (mmHg)")
A barplot, however, is not very informative. The height of the bars only shows the mean blood pressure, and the error bars show the standard error of that mean. We do not see the actual underlying values, so we have, for instance, no direct view of the spread of the data. It is usually more informative to represent the underlying values as raw as possible. Note that it is possible to add the raw data to the barplot, but we still would not see any summary measures of the spread, such as the interquartile range.
Based on this criticism, can you think of a better visualization strategy for the captopril data?
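One possible answer, offered as a sketch and not as the only solution, is a boxplot with the raw measurements drawn on top: the box shows the median and interquartile range, and the dots show every individual value. The toy values below are made up and the column names other than `id` are assumptions.

```r
library(tidyverse)

# Made-up values for 5 hypothetical patients; column names other than `id` are assumed
captopril_toy <- tibble(
  id   = 1:5,
  SBPb = c(200, 180, 190, 175, 210), DBPb = c(120, 110, 115, 105, 125),
  SBPa = c(185, 170, 172, 160, 190), DBPa = c(110, 100, 108, 98, 112)
)

p <- captopril_toy %>%
  gather(type, bp, -id) %>%
  ggplot(aes(x = type, y = bp, fill = type)) +
  scale_fill_brewer(palette = "RdGy") +
  theme_bw() +
  geom_boxplot(outlier.shape = NA) +  # hide outlier symbols: all raw points are drawn below
  geom_jitter(width = 0.2) +          # one dot per measurement
  ggtitle("Boxplot of different blood pressure measures") +
  ylab("blood pressure (mmHg)")

p
```

Because the measurements are paired, a plot of the within-patient differences (after minus before) would be another natural choice.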
Descriptive statistics
- Provide a code chunk to calculate useful summary statistics for the captopril data!
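A possible starting point for this exercise is sketched below, again on made-up toy values with assumed column names (everything except `id`). It first summarizes each blood pressure type separately and then, because the data are paired, summarizes the within-patient change in systolic blood pressure. The name `deltaSBP` is a hypothetical helper column introduced here for illustration.

```r
library(tidyverse)

# Made-up values for 5 hypothetical patients; column names other than `id` are assumed
captopril_toy <- tibble(
  id   = 1:5,
  SBPb = c(200, 180, 190, 175, 210), DBPb = c(120, 110, 115, 105, 125),
  SBPa = c(185, 170, 172, 160, 190), DBPa = c(110, 100, 108, 98, 112)
)

# Location and spread per blood pressure type
type_summary <- captopril_toy %>%
  gather(type, bp, -id) %>%
  group_by(type) %>%
  summarize(
    mean   = mean(bp, na.rm = TRUE),
    sd     = sd(bp, na.rm = TRUE),
    median = median(bp, na.rm = TRUE),
    iqr    = IQR(bp, na.rm = TRUE),
    n      = sum(!is.na(bp))
  )

# Paired view: change in systolic blood pressure within each patient
diff_summary <- captopril_toy %>%
  mutate(deltaSBP = SBPa - SBPb) %>%   # hypothetical helper column: after minus before
  summarize(
    mean_delta = mean(deltaSBP),
    sd_delta   = sd(deltaSBP),
    n          = n(),
    se_delta   = sd_delta / sqrt(n)
  )

type_summary
diff_summary
```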