As an exercise on multiple regression, we will analyse the fish tank dataset.

1 Fish tank dataset

In this experiment, 96 fish (dojofish, goldfish and zebrafish) were placed separately in a tank with two liters of water and a given dose (in mg) of the poison EI-43,064. The resistance of the fish against the poison was measured as the number of minutes the fish survived after the poison was added (Surv_time, in minutes). Additionally, the weight of each fish was measured.

2 Goal

In this tutorial, we will study the association between dose and survival time, while correcting for weight and species, by using a multiple regression model.

Load the required libraries

library(tidyverse)

3 Import the data

poison <- read_csv("https://raw.githubusercontent.com/GTPB/PSLS20/master/data/poison.csv")

4 Data tidying

head(poison)
## # A tibble: 6 x 4
##   species Weight  Dose Surv_time
##     <dbl>  <dbl> <dbl>     <dbl>
## 1       0   1.88   1        3.46
## 2       2   1.73   1.1      2.11
## 3       1   2.83   1.2     11.4 
## 4       2   1.75   1.3      1.82
## 5       0   2.11   1.4      3.28
## 6       0   1.85   1.5      2.96

We can see a couple of things in the data that can be improved upon:

  1. Capitalize the first column name
  2. Set the Species column as a factor
  3. Change the Species factor levels from 0, 1 and 2 to Dojofish, Goldfish and Zebrafish. Hint: use the fct_recode function.
  4. Add the variable log.Surv_time: we already saw in previous tutorials that this transformation is required to obtain approximately normally distributed data.
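
The four tidying steps above can be sketched as follows. This is only one possible solution, not necessarily the intended one. To keep the block runnable offline it retypes the six rows printed by head(poison) into a tibble called poison_head (a name chosen here for illustration); in the tutorial the same pipeline would be applied to the full poison object. The use of the natural logarithm for log.Surv_time is an assumption, since the text does not state the base.

```r
library(dplyr)    # rename(), mutate(), %>%
library(forcats)  # fct_recode()

# The six rows shown by head(poison), retyped so the sketch runs offline.
# `poison_head` is an illustrative name; use `poison` for the full data.
poison_head <- tibble::tibble(
  species   = c(0, 2, 1, 2, 0, 0),
  Weight    = c(1.88, 1.73, 2.83, 1.75, 2.11, 1.85),
  Dose      = c(1, 1.1, 1.2, 1.3, 1.4, 1.5),
  Surv_time = c(3.46, 2.11, 11.4, 1.82, 3.28, 2.96)
)

poison_tidy <- poison_head %>%
  rename(Species = species) %>%            # 1. capitalize the column name
  mutate(
    Species = factor(Species),             # 2. numeric code -> factor
    Species = fct_recode(Species,          # 3. readable level names
      Dojofish = "0", Goldfish = "1", Zebrafish = "2"
    ),
    log.Surv_time = log(Surv_time)         # 4. natural log (assumed base)
  )

poison_tidy
```

fct_recode takes pairs of the form new = "old", so the old levels are written as strings even though they started out as numbers.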

5 Data exploration

In a previous tutorial, we already studied the effect of dose on (the logarithm of) survival time. There, we did not account for the fact that fish of the same species and/or fish of similar weight will probably react similarly to the poison.

When we inspect the data more closely, we see that these factors do indeed matter.

library(GGally)
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
## 
## Attaching package: 'GGally'
## The following object is masked from 'package:dplyr':
## 
##     nasa
#poison %>% select(-Surv_time) %>% ggpairs()

Interpret the correlations with respect to the survival time.

Some additional visualizations:

  • Plot the log survival time as a function of weight. Color by species.

Interpret the observed association.

  • Plot the fish weights as a function of dose, colored by species.

Interpret the observed association.

  • Plot the log of survival time as a function of dose, colored by species.

Interpret the observed association.

  • Plot the relationship between log survival time and species.

Interpret the observed association.
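
The four plots requested above could be produced along these lines. This is a minimal sketch, assuming the data have been tidied so that the columns are called Species, Weight, Dose, Surv_time and log.Surv_time. The helper scatter_by_species and the object names are hypothetical, and the six rows from head(poison) stand in for the full data so that the block runs on its own.

```r
library(ggplot2)

# Six tidied rows from head(poison); replace with the full tidied data.
poison_tidy <- data.frame(
  Species   = factor(c("Dojofish", "Zebrafish", "Goldfish",
                       "Zebrafish", "Dojofish", "Dojofish")),
  Weight    = c(1.88, 1.73, 2.83, 1.75, 2.11, 1.85),
  Dose      = c(1, 1.1, 1.2, 1.3, 1.4, 1.5),
  Surv_time = c(3.46, 2.11, 11.4, 1.82, 3.28, 2.96)
)
poison_tidy$log.Surv_time <- log(poison_tidy$Surv_time)  # natural log (assumed)

# Hypothetical helper: scatter plot of column `y` against column `x`,
# with points colored by species.
scatter_by_species <- function(data, x, y) {
  ggplot(data, aes(x = .data[[x]], y = .data[[y]], color = Species)) +
    geom_point() +
    labs(x = x, y = y)
}

p_weight    <- scatter_by_species(poison_tidy, "Weight", "log.Surv_time")
p_dose_wt   <- scatter_by_species(poison_tidy, "Dose", "Weight")
p_dose_surv <- scatter_by_species(poison_tidy, "Dose", "log.Surv_time")

# Species is categorical, so a boxplot per species replaces the scatter plot.
p_species <- ggplot(poison_tidy,
                    aes(x = Species, y = log.Surv_time, fill = Species)) +
  geom_boxplot()
```

Printing any of the objects (for example p_weight) draws the plot. Adding geom_smooth(method = "lm") to a scatter plot is one way to make a species-specific trend easier to see.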

The researchers assume, based on the data exploration, that variables other than the dose of the poison also affect the survival time of the fish.

In addition, the effects of the predictors (dose, species and weight) appear to depend on each other. We therefore construct a multiple regression model that contains all main effects as well as their interaction terms.

6 Multiple regression model

Fit the multiple regression model with all the main effects:
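
As a sketch of the model syntax, the block below fits a main-effects model and a model that adds all two-way interactions. The data frame sim is a simulated stand-in with the tidied column names, not the real poison data, and its coefficients are arbitrary, so the fitted values carry no biological meaning. Restricting the interactions to two-way terms is an assumption; Dose * Weight * Species would also include the three-way term.

```r
# Simulated stand-in (NOT the real poison data) so the calls run offline.
# Ranges and coefficients are arbitrary choices for illustration.
set.seed(1)
n <- 96
sim <- data.frame(
  Species = factor(rep(c("Dojofish", "Goldfish", "Zebrafish"), each = n / 3)),
  Dose    = runif(n, min = 1, max = 2),
  Weight  = runif(n, min = 1.5, max = 3)
)
sim$log.Surv_time <- 1 - 0.5 * sim$Dose + 0.4 * sim$Weight + rnorm(n, sd = 0.3)

# Main effects only: one slope per continuous predictor,
# plus one offset per non-reference species.
fit_main <- lm(log.Surv_time ~ Dose + Weight + Species, data = sim)

# Main effects plus all two-way interactions.
fit_int <- lm(log.Surv_time ~ (Dose + Weight + Species)^2, data = sim)

names(coef(fit_main))
names(coef(fit_int))
```

With the real data, sim would be replaced by the tidied poison object.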

7 Assess the model assumptions
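
One common way to check linearity, normality of the residuals and constant variance is the set of default diagnostic plots for an lm object. The sketch below again uses a simulated stand-in (sim, with arbitrary coefficients) instead of the real data, and the object name fit is hypothetical.

```r
# Simulated stand-in (NOT the real poison data); see the column names above.
set.seed(1)
n <- 96
sim <- data.frame(
  Species = factor(rep(c("Dojofish", "Goldfish", "Zebrafish"), each = n / 3)),
  Dose    = runif(n, min = 1, max = 2),
  Weight  = runif(n, min = 1.5, max = 3)
)
sim$log.Surv_time <- 1 - 0.5 * sim$Dose + 0.4 * sim$Weight + rnorm(n, sd = 0.3)

fit <- lm(log.Surv_time ~ Dose + Weight + Species, data = sim)

# Default diagnostics: residuals vs fitted, normal QQ plot,
# scale-location, and residuals vs leverage.
old_par <- par(mfrow = c(2, 2))
plot(fit)
par(old_par)  # restore the previous plotting layout
```

A pattern in the residuals-vs-fitted panel points to non-linearity, a funnel shape in the scale-location panel points to unequal variance, and systematic departures from the line in the QQ plot point to non-normal residuals.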

8 Interpret the model parameters

library(car)
## Loading required package: carData
## 
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
## 
##     recode
## The following object is masked from 'package:purrr':
## 
##     some
#Anova(..., type = "III") ## type three in presence of important interactions
#summary(...)
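
The commented hints above could be filled in roughly as follows. This is a sketch on a simulated stand-in (sim, arbitrary coefficients), not the real data, and the model with a single Dose-by-Species interaction is chosen only to keep the output short. Sum-to-zero contrasts are set for Species because type III tests on main effects are generally only interpretable with such contrasts when interactions are present.

```r
library(car)  # Anova()

# Simulated stand-in (NOT the real poison data) so the calls run offline.
set.seed(1)
n <- 96
sim <- data.frame(
  Species = factor(rep(c("Dojofish", "Goldfish", "Zebrafish"), each = n / 3)),
  Dose    = runif(n, min = 1, max = 2),
  Weight  = runif(n, min = 1.5, max = 3)
)
sim$log.Surv_time <- 1 - 0.5 * sim$Dose + 0.4 * sim$Weight + rnorm(n, sd = 0.3)

# Illustrative model: main effects plus a Dose-by-Species interaction,
# with sum-to-zero contrasts for the factor.
fit_int <- lm(log.Surv_time ~ Dose * Species + Weight, data = sim,
              contrasts = list(Species = contr.sum))

anova_tab <- Anova(fit_int, type = "III")  # one F-test per model term
anova_tab
summary(fit_int)                           # coefficient-level estimates and t-tests
```

Anova gives one test per term (useful for the three-level Species factor), whereas summary reports each coefficient separately.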

9 Conclusion

Formulate a conclusion.
