As an exercise on linear regression, we will analyse the FEV dataset.

1 The FEV dataset

The FEV, short for forced expiratory volume, is a measure of how much air a person can exhale (in litres) during a forced breath. In this dataset, the FEV of 606 children between the ages of 6 and 17 was measured. The dataset also provides additional information on these children: their age, their height, their gender and, most importantly, whether the child is a smoker or a non-smoker.

The overarching goal of this experiment was to find out whether smoking has an effect on the FEV of children.

2 Load the required libraries

library(tidyverse)

3 Import the data

fev <- read_tsv("https://raw.githubusercontent.com/GTPB/PSLS20/master/data/fev.txt")
## Parsed with column specification:
## cols(
##   age = col_double(),
##   fev = col_double(),
##   height = col_double(),
##   gender = col_character(),
##   smoking = col_double()
## )
head(fev)
## # A tibble: 6 x 5
##     age   fev height gender smoking
##   <dbl> <dbl>  <dbl> <chr>    <dbl>
## 1     9  1.71   57   f            0
## 2     8  1.72   67.5 f            0
## 3     7  1.72   54.5 f            0
## 4     9  1.56   53   m            0
## 5     9  1.90   57   m            0
## 6     8  2.34   61   f            0

4 Tidy the data

There are a few things in the formatting of the data that can be improved upon:

  1. Both gender and smoking can be converted to factors.
  2. The height variable is recorded in inches. Assuming that this audience is mainly Portuguese/Belgian, inches are hard to interpret. Let’s add a new column, height_cm, with the values converted to centimetres using the mutate function.
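
One possible way to carry out both steps is sketched below. To keep the example runnable without a network connection, it uses a stand-in tibble holding only the six (rounded) rows printed by head(fev) above; with the full dataset, the tribble() call is simply skipped. The conversion factor is 1 inch = 2.54 cm.

```r
library(tidyverse)

# Stand-in for the imported data: the six rows printed by head(fev) above
# (values as printed, so rounded). Skip this step when fev is already loaded.
fev <- tribble(
  ~age, ~fev, ~height, ~gender, ~smoking,
  9,    1.71, 57,      "f",     0,
  8,    1.72, 67.5,    "f",     0,
  7,    1.72, 54.5,    "f",     0,
  9,    1.56, 53,      "m",     0,
  9,    1.90, 57,      "m",     0,
  8,    2.34, 61,      "f",     0
)

fev <- fev %>%
  mutate(
    gender = as.factor(gender),    # character -> factor
    smoking = as.factor(smoking),  # 0/1 coding -> factor
    height_cm = height * 2.54      # inches -> centimetres
  )

head(fev)
```

The original height column is kept, so both units remain available for later steps.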

5 Data Exploration

Explore the data. Visualise the FEV for smokers versus non-smokers:
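
As a starting point, a boxplot with the individual observations overlaid is one reasonable choice here. The sketch below wraps it in a helper, plot_fev_by_smoking, which is a hypothetical name introduced for this illustration and not part of the tutorial. It only assumes that the data has the columns fev and smoking.

```r
library(tidyverse)

# Hypothetical helper (not part of the tutorial): boxplot of the FEV
# per smoking status, with the raw observations overlaid as points.
plot_fev_by_smoking <- function(data) {
  data %>%
    mutate(smoking = factor(smoking)) %>%   # no-op if already a factor
    ggplot(aes(x = smoking, y = fev)) +
    geom_boxplot(outlier.shape = NA) +      # outliers are shown by the jitter layer
    geom_jitter(width = 0.2, alpha = 0.4) +
    labs(x = "Smoking status", y = "FEV (litres)") +
    theme_bw()
}

# Usage, with the data imported in section 3:
# plot_fev_by_smoking(fev)
```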

Did you expect these results? Can you explain what we observe (and why)? Additionally, can you provide an even better visualisation of the data, taking into account more useful explanatory variables with respect to the FEV?
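
One way to bring more explanatory variables into the picture is to plot the FEV against age, colour the points by smoking status and draw one panel per gender. The helper name plot_fev_by_age below is hypothetical, and the choice of age and gender is only one of several sensible options (height would work equally well on the x-axis).

```r
library(tidyverse)

# Hypothetical helper (not part of the tutorial): FEV versus age,
# coloured by smoking status, with one panel per gender.
plot_fev_by_age <- function(data) {
  data %>%
    mutate(smoking = factor(smoking)) %>%     # no-op if already a factor
    ggplot(aes(x = age, y = fev, colour = smoking)) +
    geom_point(alpha = 0.5) +
    geom_smooth(method = "lm", se = FALSE) +  # one fitted line per smoking group
    facet_wrap(~gender) +
    labs(x = "Age (years)", y = "FEV (litres)", colour = "Smoking") +
    theme_bw()
}

# Usage, with the data imported in section 3:
# plot_fev_by_age(fev)
```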

6 Analysis

As stated above, the overarching goal of this experiment was to assess the impact of smoking on the FEV of children. In principle, multiple variables can affect the FEV. We have, however, not yet learned how to model a response based on multiple predictors. Answering this research question properly requires more advanced modelling techniques, which we will cover in the tutorial on multiple regression, where we will come back to this dataset!

For now, we can already assess other (less complex) research questions:

  1. Is there a linear association between the FEV and the height of non-smoking females?

  2. Is there a linear association between the FEV and the age of non-smoking females?

  3. Is there a linear association between the FEV and the height of non-smoking males?

  4. Is there a linear association between the FEV and the age of non-smoking males?
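
All four questions follow the same pattern: select one gender/smoking subgroup, then regress the FEV on a single predictor. A minimal sketch of that pattern is given below. The helper fit_subgroup is a hypothetical name introduced here, and the code assumes that smoking is coded 0 for non-smokers, which should be verified against the data documentation.

```r
library(tidyverse)

# Hypothetical helper (not part of the tutorial): fit a simple linear
# regression of fev on one predictor within one gender/smoking subgroup.
# Assumption: smoking == 0 codes the non-smokers.
fit_subgroup <- function(data, predictor, gender_value, smoking_value = 0) {
  subgroup <- data %>%
    filter(gender == gender_value, smoking == smoking_value)
  # reformulate() builds the formula fev ~ <predictor> from a string
  lm(reformulate(predictor, response = "fev"), data = subgroup)
}

# Usage, with the data imported in section 3:
# fit_f_height <- fit_subgroup(fev, "height", "f")  # question 1
# fit_m_age    <- fit_subgroup(fev, "age", "m")     # question 4
```

Passing "height_cm" instead of "height" works in the same way once that column has been added.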

For each of these short research questions, you should:

  1. Check the assumptions of the linear model, and analyse the data accordingly.

  2. Interpret the output, focusing on the interpretation of the intercept and the slope parameters of the model.

  3. Formulate a conclusion for each research hypothesis.
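
These three steps can be sketched in base R for any fitted lm object. The helper name check_linear_model is hypothetical. The diagnostic plots address linearity (residuals versus fitted values), normality of the residuals (QQ-plot) and equal variance (scale-location plot); independence has to be judged from the study design. In the summary, the intercept is the expected FEV when the predictor equals zero (often an extrapolation outside the observed range), the slope is the expected change in FEV per unit increase of the predictor, and the t-test on the slope tests the null hypothesis of no linear association.

```r
# Hypothetical helper (not part of the tutorial): diagnostics, summary and
# 95% confidence intervals for a fitted simple linear regression model.
check_linear_model <- function(fit) {
  old_par <- par(mfrow = c(2, 2))  # show the four diagnostic plots on one page
  on.exit(par(old_par))            # restore the plotting layout afterwards
  plot(fit)                        # step 1: check the model assumptions
  print(summary(fit))              # step 2: estimates, standard errors, t-tests
  confint(fit)                     # step 3: 95% CIs to support the conclusion
}

# Usage, with a model fitted on one of the subgroups:
# check_linear_model(lm(fev ~ height, data = fev))
```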
