In this tutorial, you will learn how to import, tidy, wrangle, and visualize data yourself. You will work with one specific dataset: the FEV dataset.
The FEV dataset
FEV, an acronym for forced expiratory volume, is a measure of how much air a person can exhale (in liters) during a forced breath. In this dataset, the FEV of 606 children between the ages of 6 and 17 was measured. The dataset also provides additional information on these children: their `age`, their `height`, their `gender` and, most importantly, whether the child is a smoker or a non-smoker.
The goal of this experiment was to find out whether or not smoking has an effect on the FEV of children.
Note: to analyse this dataset properly, we will need some relatively advanced modeling techniques. At the end of this week, you will have seen all three required steps to analyse such a dataset! For now, we will limit ourselves to exploring the data.
Load the required libraries
Import the data
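The location of the data file is not given here, so the following is only a minimal sketch of the import step. It writes a few made-up rows to a temporary tab-separated file and reads them back with base R's `read.delim()`. The column name `fev`, the file layout, and the coding of `gender` and `smoking` are assumptions for illustration, not the tutorial's actual file.

```r
# Made-up rows standing in for the real file. Column names follow the
# variables named in the text; `fev` and the value codings are assumptions.
toy <- data.frame(
  age     = c(9, 12, 15),
  fev     = c(1.7, 2.9, 3.8),
  height  = c(57, 63, 68),
  gender  = c("f", "m", "m"),
  smoking = c(0, 0, 1)
)

# Write the toy table to a temporary tab-separated file ...
path <- tempfile(fileext = ".txt")
write.table(toy, path, sep = "\t", row.names = FALSE, quote = FALSE)

# ... and read it back the way a real file would be imported.
fev_data <- read.delim(path, header = TRUE)

dim(fev_data)    # number of rows and columns
names(fev_data)  # column names picked up from the header line
```

For the real data, only the `path` would change; the separator may differ as well, depending on how the file is stored.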
Have a first look at the data
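A first look usually means checking the first rows, the column types, and a quick summary of each column. The sketch below does this with base R on a small made-up data frame; the values, the column name `fev`, and the codings of `gender` and `smoking` are assumptions, not the real data.

```r
# Made-up stand-in for the imported data (values and codings are assumptions).
fev_data <- data.frame(
  age     = c(9, 12, 15, 8, 14, 11),
  fev     = c(1.7, 2.9, 3.8, 1.6, 3.1, 2.4),
  height  = c(57, 63, 68, 55, 66, 61),
  gender  = c("f", "m", "m", "f", "f", "m"),
  smoking = c(0, 0, 1, 0, 1, 0)
)

head(fev_data, 3)        # first three rows
str(fev_data)            # column types: which columns are numeric, which are text
summary(fev_data)        # per-column summaries
table(fev_data$smoking)  # how many children fall in each smoking category
```

Checking the types with `str()` is what reveals formatting issues such as categorical variables stored as numbers or text.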
There are a few things in the formatting of the data that can be improved upon:
- Both the `gender` and `smoking` variables can be transformed to factors.
- The `height` variable is recorded in inches. Assuming that this audience is mainly Portuguese/Belgian, inches are hard to interpret. Let’s add a new column, `height_cm`, with the values converted to centimeters.
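The two formatting steps above can be sketched as follows, using base R on a small made-up data frame. The values, the 0/1 coding of `smoking`, and the factor labels are assumptions for illustration. The conversion itself relies only on the fact that one inch equals 2.54 centimeters.

```r
# Made-up stand-in for the imported data (values and codings are assumptions).
fev_data <- data.frame(
  age     = c(9, 12, 15, 8),
  fev     = c(1.7, 2.9, 3.8, 1.6),
  height  = c(57, 63, 68, 55),   # inches
  gender  = c("f", "m", "m", "f"),
  smoking = c(0, 0, 1, 0)
)

# Turn the two categorical variables into factors.
fev_data$gender  <- factor(fev_data$gender)
fev_data$smoking <- factor(fev_data$smoking,
                           levels = c(0, 1),
                           labels = c("non-smoker", "smoker"))

# Add the height in centimeters (1 inch = 2.54 cm).
fev_data$height_cm <- fev_data$height * 2.54

str(fev_data)
```

Giving the `smoking` levels readable labels is optional, but it makes later plot axes and legends self-explanatory.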
That’s better!
Now, let’s make a first exploratory plot, showing only the FEV for both smoking categories.
Which type of plot do you suggest? Generate a good-looking, informative representation of the data.
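One reasonable choice for comparing a numeric variable across two categories is a boxplot with the individual observations drawn on top. The sketch below uses base R graphics so that it needs no extra packages; the tutorial may intend a different plotting library. The data are made up and the column name `fev` is an assumption, so the picture says nothing about the real dataset.

```r
# Made-up stand-in data (values are assumptions, not the real measurements).
fev_data <- data.frame(
  fev     = c(1.7, 2.1, 2.4, 2.9, 3.3, 2.8, 3.1, 3.6, 3.8),
  smoking = factor(c(0, 0, 0, 0, 0, 1, 1, 1, 1),
                   levels = c(0, 1),
                   labels = c("non-smoker", "smoker"))
)

# Boxplot of FEV per smoking category; the returned summary is kept in `bp`.
bp <- boxplot(fev ~ smoking, data = fev_data,
              xlab = "Smoking status", ylab = "FEV (liters)",
              main = "FEV by smoking status (toy data)")

# Overlay the raw observations with a little horizontal jitter.
stripchart(fev ~ smoking, data = fev_data,
           vertical = TRUE, method = "jitter", pch = 16, add = TRUE)
```

Showing the raw points alongside the boxes makes the group sizes visible, which matters when one category is much smaller than the other.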
Did you expect these results?
Maybe there is something else going on in the data. By taking more of the information in the dataset into account, can you provide a more detailed and accurate visualization of the variables that affect the FEV?
...
## Try to get a visualization that describes the data as well as possible!
...
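One possible direction, offered only as a sketch, is to bring a second variable such as `age` into the plot and distinguish the smoking categories by color. The example uses base R graphics on made-up data; the values, the column name `fev`, and the colors are assumptions, and the toy picture should not be read as a result about the real dataset.

```r
# Made-up stand-in data (values are assumptions, not the real measurements).
fev_data <- data.frame(
  age     = c(7, 9, 10, 12, 14, 13, 15, 16, 17),
  fev     = c(1.7, 2.1, 2.4, 2.9, 3.3, 2.8, 3.1, 3.6, 3.8),
  smoking = factor(c(0, 0, 0, 0, 0, 1, 1, 1, 1),
                   levels = c(0, 1),
                   labels = c("non-smoker", "smoker"))
)

# One color per smoking category (arbitrary choice).
cols <- c("non-smoker" = "steelblue", "smoker" = "firebrick")

# FEV against age, with points colored by smoking status.
plot(fev ~ age, data = fev_data,
     col = cols[as.character(fev_data$smoking)], pch = 16,
     xlab = "Age (years)", ylab = "FEV (liters)",
     main = "FEV by age and smoking status (toy data)")
legend("topleft", legend = names(cols), col = cols, pch = 16,
       title = "Smoking status")

# Mean age per smoking category, as a numeric companion to the plot.
mean_age <- tapply(fev_data$age, fev_data$smoking, mean)
mean_age
```

The same pattern works with `height_cm` or `gender` in place of `age`, which allows several candidate explanations to be compared side by side.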