As an exercise on linear regression, we will analyse the FEV dataset.
The FEV dataset
The FEV, an acronym for forced expiratory volume, is a measure of how much air a person can exhale (in litres) during a forced breath. In this dataset, the FEV of 606 children between the ages of 6 and 17 was measured. The dataset also provides additional information on these children: their age, their height, their gender and, most importantly, whether the child is a smoker or a non-smoker.
The overarching goal of this experiment was to find out whether or not smoking has an effect on the FEV of children.
Load the required libraries
library(tidyverse)
Import the data
fev <- read_tsv("https://raw.githubusercontent.com/GTPB/PSLS20/master/data/fev.txt")
head(fev)
## Parsed with column specification:
## cols(
##   age = col_double(),
##   fev = col_double(),
##   height = col_double(),
##   gender = col_character(),
##   smoking = col_double()
## )
## # A tibble: 6 x 5
##     age   fev height gender smoking
##   <dbl> <dbl>  <dbl> <chr>    <dbl>
## 1     9  1.71   57   f            0
## 2     8  1.72   67.5 f            0
## 3     7  1.72   54.5 f            0
## 4     9  1.56   53   m            0
## 5     9  1.90   57   m            0
## 6     8  2.34   61   f            0
Tidy the data
There are a few things in the formatting of the data that can be improved:
- Both gender and smoking can be converted to factors.
- The height variable is recorded in inches. Assuming that this audience is mainly Portuguese/Belgian, inches are hard to interpret. Let’s add a new column, height_cm, with the values converted to centimeters using the mutate function.
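One possible way to carry out both steps is sketched below. This is only an illustration, not the reference solution: the helper name tidy_fev and the small data frame fev_head are hypothetical, and fev_head simply copies the six (rounded) rows printed by head(fev) above so that the sketch runs without downloading the file.

```r
library(dplyr)

# First six rows as printed by head(fev) above (values are rounded as
# displayed), so the sketch runs without downloading the full file.
fev_head <- data.frame(
  age     = c(9, 8, 7, 9, 9, 8),
  fev     = c(1.71, 1.72, 1.72, 1.56, 1.90, 2.34),
  height  = c(57, 67.5, 54.5, 53, 57, 61),
  gender  = c("f", "f", "f", "m", "m", "f"),
  smoking = c(0, 0, 0, 0, 0, 0)
)

# Hypothetical helper: convert gender and smoking to factors and add
# height_cm, using 1 inch = 2.54 cm.
tidy_fev <- function(data) {
  data %>%
    mutate(
      gender    = as.factor(gender),
      smoking   = as.factor(smoking),
      height_cm = height * 2.54
    )
}

fev_tidy <- tidy_fev(fev_head)
str(fev_tidy)
```

With the full dataset loaded, the same helper could be applied as fev <- tidy_fev(fev).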
Data Exploration
Explore the data. Visualise the FEV for smokers versus non-smokers:
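A boxplot per smoking status is one common choice for this comparison. The sketch below is a starting point rather than the intended answer; the helper name plot_fev_by_smoking is hypothetical, and it assumes the data has the fev and smoking columns shown in the import step.

```r
library(ggplot2)

# Hypothetical helper: boxplot of FEV for each smoking status.
# Assumes `data` has a numeric `fev` column and a `smoking` column.
plot_fev_by_smoking <- function(data) {
  ggplot(data, aes(x = factor(smoking), y = fev)) +
    geom_boxplot() +
    labs(x = "smoking", y = "FEV (litres)")
}

# Usage, assuming `fev` has been imported as shown earlier:
# plot_fev_by_smoking(fev)
```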
Did you expect these results? Can you explain what we observe (and why)? Additionally, can you provide an even better visualisation of the data, taking into account more useful explanatory variables with respect to the FEV?
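As a hint for the second question, one option is to plot the FEV against age, colour the points by smoking status and split the panels by gender. This is a sketch under the assumption that the columns are named as in the import step; the helper name plot_fev_by_age is hypothetical, and other choices (for example height on the x-axis) are equally defensible.

```r
library(ggplot2)

# Hypothetical helper: FEV versus age, coloured by smoking status,
# with one panel per gender.
plot_fev_by_age <- function(data) {
  ggplot(data, aes(x = age, y = fev, colour = factor(smoking))) +
    geom_point() +
    facet_wrap(~ gender) +
    labs(x = "age (years)", y = "FEV (litres)", colour = "smoking")
}

# Usage, assuming `fev` has been imported as shown earlier:
# plot_fev_by_age(fev)
```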
Analysis
As stated above, the overarching goal of this experiment was to assess the impact of smoking on the FEV of children. In principle, multiple variables can affect the FEV. We have not yet learned, however, how to model a response based on multiple predictors, so answering this research question properly requires more advanced modelling techniques. In the tutorial on multiple regression, we will learn those and come back to this dataset!
For now, we can already assess other (less complex) research questions:
- Is there a linear association between the FEV and the height of non-smoking females?
- Is there a linear association between the FEV and the age of non-smoking females?
- Is there a linear association between the FEV and the height of non-smoking males?
- Is there a linear association between the FEV and the age of non-smoking males?
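All four questions share the same structure: subset the data to non-smokers of one gender, then regress the FEV on a single predictor. A minimal sketch of that pattern follows. The helper name fit_fev is hypothetical, and it assumes that smoking is coded 0 for non-smokers and that gender takes the values "f" and "m", as the head(fev) output suggests.

```r
library(dplyr)

# Hypothetical helper: simple linear regression of FEV on one predictor,
# restricted to non-smokers of one gender.
# Assumptions: smoking == 0 marks non-smokers; gender is "f" or "m".
fit_fev <- function(data, predictor, gender_code) {
  subset_data <- data %>%
    filter(gender == gender_code, smoking == 0)
  # Builds the formula fev ~ <predictor> from the column name.
  lm(reformulate(predictor, response = "fev"), data = subset_data)
}

# Usage, assuming `fev` has been imported as shown earlier:
# model_height_f <- fit_fev(fev, "height", "f")
# model_age_m    <- fit_fev(fev, "age", "m")
```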
For each of these short research questions, you should:
- Check the assumptions of the linear model, and analyse the data accordingly.
- Interpret the output, focusing on the interpretation of the intercept and the slope parameters of the model.
- Formulate a conclusion for each research hypothesis.
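The first two steps can be sketched with base R tools for a model fitted with lm. The helper names diagnose_fit and summarise_fit are hypothetical, and the 95% confidence level is an assumption; judging the diagnostic plots and wording the conclusion remain part of the exercise.

```r
# Hypothetical helper: diagnostic plots for a fitted lm object.
# Plot 1 (residuals vs fitted) helps judge linearity and equal variance;
# plot 2 (normal QQ-plot) helps judge normality of the residuals.
diagnose_fit <- function(model) {
  plot(model, which = 1:2)
}

# Hypothetical helper: coefficient table and confidence intervals.
summarise_fit <- function(model, level = 0.95) {
  list(
    # Estimate, standard error, t value and p-value per parameter.
    coefficients = summary(model)$coefficients,
    # Confidence intervals for the intercept and the slope.
    conf_int     = confint(model, level = level)
  )
}

# Usage, assuming `model` is a fitted simple linear regression:
# diagnose_fit(model)
# summarise_fit(model)
```

The slope estimates the change in mean FEV (in litres) per unit increase of the predictor, while the intercept is the estimated mean FEV at a predictor value of zero, which often lies far outside the observed range.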