Assess genotoxicity of 1,2-dimethylhydrazine dihydrochloride (DMH) (EU directive)
high
Are there differences in DNA damage due to DMH dose?
dna <- read_delim("https://raw.githubusercontent.com/GTPB/PSLS20/master/data/dna.txt",delim=" ")
dna$dose <- as.factor(dna$dose)
dna
# A tibble: 24 x 3
id length dose
<chr> <dbl> <fct>
1 Rat1 19.3 0
2 Rat2 18.9 0
3 Rat3 18.6 0
4 Rat4 19.0 0
5 Rat5 17.1 0
6 Rat6 18.8 0
7 Rat7 55.4 1.25
8 Rat8 59.2 1.25
9 Rat9 59.1 1.25
10 Rat10 52.1 1.25
# ... with 14 more rows
dna %>%
ggplot(aes(x=dose,y=length,fill=dose))+
geom_boxplot() +
geom_point(position="jitter")
dna %>%
ggplot(aes(sample=length)) +
geom_qq() +
geom_qq_line() +
facet_wrap(~dose)
plot(lm(length~dose,data=dna))
The Kruskal-Wallis Rank Test (KW-test) is a non-parameteric alternative for ANOVA F-test.
with \(g\) the number of groups.
SSTot depends only on outcomes \(\mathbf{y}\) and will not vary in permutation test.
SST can be used as statistic \[\text{SST}=\sum_{j=1}^t n_j(\bar{Y}_j-\bar{Y})^2\]
The factor \(\frac{12}{n(n+1)}\) is used so that \(KW\) has a simple asymptotic null distribution. In particular under \(H_0\), given thart \(\min(n_1,\ldots, n_g)\rightarrow \infty\), \[ KW \rightarrow \chi^2_{t-1}. \]
The exact KW-test can be executed by calculating the permutation null distribution (that only depends on \(n_1, \ldots, n_g\)) to test \[H_0: f_1=\ldots=f_g \text{ vs } H_1: \text{ at least two means are different}.\]
If location-shift is invalid, we have to formulate \(H_1\) in terms of probabilistic indices: \[H_0: f_1=\ldots=f_g \text{ vs } H_1: \exists\ j,k \in \{1,\ldots,g\} : \text{P}\left[Y_j\geq Y_k\right]\neq 0.5\]
kruskal.test(length~dose,data=dna)
Kruskal-Wallis rank sum test
data: length by dose
Kruskal-Wallis chi-squared = 14, df = 3, p-value = 0.002905
On the \(5\%\) level of significance we can reject the null hypothesis.
R-functie kruskal.test
only returns the asymptotic approximation for \(p\)-values.
With only 6 observaties per groep, this is not a good approximation of the \(p\)-value
With the coin
R package we can calculate the exacte \(p\)-value
library(coin)
kwPerm <- kruskal_test(length~dose,data=dna,
distribution=approximate(B=100000))
kwPerm
Approximative Kruskal-Wallis Test
data: length by dose (0, 1.25, 2.5, 5)
chi-squared = 14, p-value = 0.00039
We conclude that the difference in the distribution of the DNA damages due to the DMH dose is extremely significantly different.
Posthoc analysis with WMW tests.
pairwise.wilcox.test(dna$length,dna$dose)
Pairwise comparisons using Wilcoxon rank sum test
data: dna$length and dna$dose
0 1.25 2.5
1.25 0.013 - -
2.5 0.013 0.818 -
5 0.013 0.721 0.788
P value adjustment method: holm
pairwise.wilcox.test
output. Point estimate on probability on higher DNA-damage?nGroup <- table(dna$dose)
probInd <- combn(levels(dna$dose),2,function(x)
{
test <- wilcox.test(length~dose,subset(dna,dose%in%x))
return(test$statistic/prod(nGroup[x]))
}
)
names(probInd) <- combn(levels(dna$dose),2,paste,collapse="vs")
probInd
0vs1.25 0vs2.5 0vs5 1.25vs2.5 1.25vs5 2.5vs5
0.0000000 0.0000000 0.0000000 0.4444444 0.2777778 0.3333333
Because there are doubts on the location-shift model we draw our conclusions in terms of the probabilistic index.
we assume that no ties are available↩