Common Statistical Hypothesis Tests in R (Part 2)

Grubbs’ Test

Grubbs’ Test is a statistical test used to identify the presence of an outlier in a dataset. To use this test, the dataset should be approximately normally distributed and have at least 7 observations.

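library(outliers)  # the package name is lowercase
data <- c(5, 14, 15, 15, 14, 13, 19, 17, 16, 20, 22, 8, 21, 28, 11, 9, 29, 40)
grubbs.test(data)                 # tests the value farthest from the mean (here, the maximum)
grubbs.test(data, opposite=TRUE)  # tests the value at the other end (here, the minimum)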
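
To make the test statistic less of a black box, here is a minimal base R sketch that computes Grubbs' statistic G by hand for the same data, together with the usual one-sided critical value at the 5% level. The names `G`, `suspect`, and `G_crit` are illustrative only, and the critical-value formula is the standard textbook one, not code taken from the outliers package.

```r
# Same data as in the example above
data <- c(5, 14, 15, 15, 14, 13, 19, 17, 16, 20, 22, 8, 21, 28, 11, 9, 29, 40)
n <- length(data)

# Grubbs' statistic: largest absolute deviation from the mean, in units of sd
dev <- abs(data - mean(data))
G <- max(dev) / sd(data)

# The observation that produces G is the outlier candidate
suspect <- data[which.max(dev)]

# One-sided critical value at alpha = 0.05 (standard formula, stated as an assumption)
alpha <- 0.05
t_crit <- qt(1 - alpha / n, df = n - 2)
G_crit <- (n - 1) / sqrt(n) * sqrt(t_crit^2 / (n - 2 + t_crit^2))

G
suspect
G_crit
```

If G exceeds the critical value, the candidate observation would be flagged as an outlier at the chosen level.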

Binomial Test

binom.test(9, 24, 1/6)                          # two-tailed: 9 successes in 24 trials, H0: p = 1/6
binom.test(11, 30, 0.5, alternative="less")     # left-tailed, H1: p < 0.5
binom.test(46, 50, 0.8, alternative="greater")  # right-tailed, H1: p > 0.8
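
The one-sided p-values above are plain binomial tail probabilities, so they can be cross-checked with `pbinom()`. The sketch below is only a sanity check under that reading; `p_less` and `p_greater` are illustrative names.

```r
# Left-tailed: P(X <= 11) for X ~ Binomial(30, 0.5)
p_less <- pbinom(11, size = 30, prob = 0.5)

# Right-tailed: P(X >= 46) for X ~ Binomial(50, 0.8)
p_greater <- pbinom(45, size = 50, prob = 0.8, lower.tail = FALSE)

# Both should agree with the p-values reported by binom.test()
all.equal(p_less, binom.test(11, 30, 0.5, alternative = "less")$p.value)
all.equal(p_greater, binom.test(46, 50, 0.8, alternative = "greater")$p.value)
```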

Mood’s Median Test

Mood’s Median Test is used to compare the medians of two or more independent groups.

library(coin)
# median_test() expects the grouping variable to be a factor
method <- factor(rep(c('method1', 'method2'), each=10))
score <- c(75, 77, 78, 83, 83, 85, 89, 90, 91, 97, 77, 80, 84, 84, 85, 90, 92, 92, 94, 95)
examData <- data.frame(method, score)
median_test(score~method, data = examData)

Runs Test

A runs test is a statistical test used to determine whether or not a dataset comes from a random process.

library(randtests)
data <- c(12, 16, 16, 15, 14, 18, 19, 21, 13, 13)
runs.test(data)

Test for Normality

  • Method 1: Histogram
set.seed(0)
normal_data <- rnorm(200)

non_normal_data <- rexp(200, rate=3)

par(mfrow=c(1,2)) 
hist(normal_data, col='steelblue', main='Normal')
hist(non_normal_data, col='steelblue', main='Non-normal')
  • Method 2: Q-Q Plot
set.seed(0)
normal_data <- rnorm(200)
non_normal_data <- rexp(200, rate=3)


par(mfrow=c(1,2)) 

qqnorm(normal_data, main='Normal')
qqline(normal_data)

qqnorm(non_normal_data, main='Non-normal')
qqline(non_normal_data)
  • Method 3: Shapiro-Wilk Test
set.seed(0)
normal_data <- rnorm(200)
shapiro.test(normal_data)
  • Method 4: Kolmogorov-Smirnov Test
set.seed(0)
normal_data <- rnorm(200)
ks.test(normal_data, 'pnorm')
  • Method 5: Cramér-von Mises Test
library(goftest)
set.seed(0)
normal_data <- rnorm(200)
cvm.test(normal_data, 'pnorm')

Multivariate Normality Tests

  • Method 1: Mardia’s Test
library(QuantPsyc)
set.seed(0)
data <- data.frame(x1 = rnorm(50),
                   x2 = rnorm(50),
                   x3 = rnorm(50))
mult.norm(data)$mult.test
  • Method 2: Energy Test
library(energy)
set.seed(0)
data <- data.frame(x1 = rnorm(50),
                   x2 = rnorm(50),
                   x3 = rnorm(50))
mvnorm.etest(data, R=100)

Correlation Test

x <- c(2, 3, 3, 5, 6, 9, 14, 15, 19, 21, 22, 23)
y <- c(23, 24, 24, 23, 17, 28, 38, 34, 35, 39, 41, 43)
cor.test(x, y)
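
By default `cor.test()` runs a Pearson correlation test, whose statistic follows a t distribution with n - 2 degrees of freedom. The sketch below recomputes that statistic and the two-sided p-value by hand as a cross-check; `r`, `t_stat`, and `p_value` are illustrative names.

```r
x <- c(2, 3, 3, 5, 6, 9, 14, 15, 19, 21, 22, 23)
y <- c(23, 24, 24, 23, 17, 28, 38, 34, 35, 39, 41, 43)

r <- cor(x, y)   # Pearson correlation coefficient
n <- length(x)

# t statistic for H0: true correlation is 0
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution with n - 2 degrees of freedom
p_value <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)

res <- cor.test(x, y)
all.equal(t_stat, unname(res$statistic))
all.equal(p_value, res$p.value)
```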

Chi-Square Test of Independence

A Chi-Square Test of Independence is used to determine whether or not there is a significant association between two categorical variables.

data <- matrix(c(120, 90, 40, 110, 95, 45), ncol=3, byrow=TRUE)
colnames(data) <- c("Rep","Dem","Ind")
rownames(data) <- c("Male","Female")
data <- as.table(data)

chisq.test(data)
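
To show what the test does under the hood, this sketch computes the expected counts under independence and the chi-square statistic by hand for the same table. The variable names are illustrative; for tables larger than 2 x 2, `chisq.test()` applies no continuity correction, so the numbers should agree.

```r
observed <- matrix(c(120, 90, 40, 110, 95, 45), ncol = 3, byrow = TRUE)

# Expected count in each cell under independence: row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson chi-square statistic and its degrees of freedom
chi_sq <- sum((observed - expected)^2 / expected)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

res <- chisq.test(observed)
all.equal(chi_sq, unname(res$statistic))
all.equal(p_value, res$p.value)
```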

Chi-Square Goodness of Fit Test

A Chi-Square Goodness of Fit Test is used to determine whether or not a categorical variable follows a hypothesized distribution.

observed <- c(50, 60, 40, 47, 53)      # observed counts per category
expected <- c(.2, .2, .2, .2, .2)      # hypothesized proportions (must sum to 1)
chisq.test(x=observed, p=expected)
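
The goodness of fit statistic is small enough to verify by hand. In this sketch the hypothesized proportions are turned into expected counts first; `p_hyp`, `expected_counts`, and `chi_sq` are illustrative names.

```r
observed <- c(50, 60, 40, 47, 53)
p_hyp <- rep(0.2, 5)                      # hypothesized proportions

# Expected counts: total sample size times each hypothesized proportion
expected_counts <- sum(observed) * p_hyp  # 250 * 0.2 = 50 per category

# Pearson chi-square statistic with (number of categories - 1) degrees of freedom
chi_sq <- sum((observed - expected_counts)^2 / expected_counts)
p_value <- pchisq(chi_sq, df = length(observed) - 1, lower.tail = FALSE)

chi_sq   # (0 + 100 + 100 + 9 + 9) / 50 = 4.36
p_value
```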

Likelihood Ratio Test

library(lmtest)
#fit full model
model_full <- lm(mpg ~ disp + carb + hp + cyl, data = mtcars)
#fit reduced model
model_reduced <- lm(mpg ~ disp + carb, data = mtcars)

lrtest(model_full, model_reduced)
# H0: The full model and the nested model fit the data equally well.
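
For intuition, the likelihood ratio statistic can also be sketched in base R from the two log-likelihoods, assuming the usual chi-square reference distribution with degrees of freedom equal to the difference in the number of estimated parameters. The names `lr_stat`, `df_diff`, and `p_value` are illustrative.

```r
model_full    <- lm(mpg ~ disp + carb + hp + cyl, data = mtcars)
model_reduced <- lm(mpg ~ disp + carb, data = mtcars)

# Test statistic: twice the difference in log-likelihood
lr_stat <- as.numeric(2 * (logLik(model_full) - logLik(model_reduced)))

# Degrees of freedom: difference in the number of estimated parameters
df_diff <- attr(logLik(model_full), "df") - attr(logLik(model_reduced), "df")

p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)

lr_stat
df_diff
p_value
```

A large p-value suggests the extra predictors in the full model do not improve the fit significantly.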

Cramér’s V

Cramér’s V is a measure of the strength of association between two nominal variables. It ranges from 0 (no association) to 1 (perfect association).

library(rcompanion)
data = matrix(c(7,9,12,8), nrow = 2)
cramerV(data, ci = TRUE)

# Table larger than 2 x 2 (here 2 x 3)
data = matrix(c(6, 9, 8, 5, 12, 9), nrow = 2)
cramerV(data, ci = TRUE)
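
Cramér's V is a rescaled chi-square statistic, which the following base R sketch makes explicit for the 2 x 2 table above. It assumes the chi-square statistic is computed without continuity correction; `tab`, `k`, and `V` are illustrative names.

```r
tab <- matrix(c(7, 9, 12, 8), nrow = 2)

# Pearson chi-square statistic without Yates' continuity correction
chi_sq <- unname(chisq.test(tab, correct = FALSE)$statistic)

n <- sum(tab)            # total sample size
k <- min(dim(tab)) - 1   # min(rows, columns) - 1

V <- sqrt(chi_sq / (n * k))
V
```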

Phi Coefficient ($\Phi$)

A Phi Coefficient (sometimes called a mean square contingency coefficient) is a measure of the association between two binary variables.

library(psych)  # provides phi(); rcompanion also exports a phi() function
data = matrix(c(4, 8, 9, 4), nrow = 2)
phi(data)
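
For a 2 x 2 table the Phi coefficient has a closed form, sketched here in base R for the same table. The cell names `n11`, `n12`, `n21`, and `n22` are illustrative.

```r
tab <- matrix(c(4, 8, 9, 4), nrow = 2)

# Cells of the 2 x 2 table (matrix() fills column by column)
n11 <- tab[1, 1]; n12 <- tab[1, 2]
n21 <- tab[2, 1]; n22 <- tab[2, 2]

# Phi = (n11*n22 - n12*n21) / sqrt(product of the four marginal totals)
phi_manual <- (n11 * n22 - n12 * n21) /
  sqrt((n11 + n12) * (n21 + n22) * (n11 + n21) * (n12 + n22))

phi_manual   # (16 - 72) / 156, roughly -0.36
```

The negative sign indicates that the two binary variables are negatively associated.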

Gini Coefficient

The Gini coefficient measures how unequally income is distributed in a population. It ranges from 0 (perfect equality) to 1 (perfect inequality).

library(DescTools)
x <- c(50, 50, 70, 70, 70, 90, 150, 150, 150, 150)
Gini(x, unbiased=FALSE)

# With frequencies specified for each value
x <- c(10, 20, 25, 55, 70, 90, 110, 115, 130)
n <- c(6, 7, 7, 14, 22, 20, 8, 4, 1)
Gini(x, n, unbiased=FALSE)
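
One common definition of the Gini coefficient is half the relative mean absolute difference between all pairs of values. The base R sketch below applies that definition to the first dataset; it is meant as a cross-check of the `unbiased=FALSE` result, and `gini_manual` is an illustrative name.

```r
x <- c(50, 50, 70, 70, 70, 90, 150, 150, 150, 150)
n <- length(x)

# Sum of |x_i - x_j| over all ordered pairs (i, j)
abs_diff_sum <- sum(abs(outer(x, x, "-")))

# Gini = sum of absolute differences / (2 * n^2 * mean)
gini_manual <- abs_diff_sum / (2 * n^2 * mean(x))
gini_manual   # 4520 / 20000 = 0.226
```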

Chow Test

A Chow test is used to test whether the coefficients of two regression models fitted on different datasets are equal. It is commonly used to check for a structural break at a known point in the data.

library(strucchange)
data <- data.frame(x = c(1, 1, 2, 3, 4, 4, 5, 5, 6, 7, 7, 8, 8, 9, 10, 10,
                         11, 12, 12, 13, 14, 15, 15, 16, 17, 18, 18, 19, 20, 20),
                   y = c(3, 5, 6, 10, 13, 15, 17, 14, 20, 23, 25, 27, 30, 30, 31,
                         33, 32, 32, 30, 32, 34, 34, 37, 35, 34, 36, 34, 37, 38, 36))

sctest(data$y ~ data$x, type = "Chow", point = 10)
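
The Chow statistic compares one pooled regression against two separate regressions, one on each side of the break. The sketch below computes the classical F statistic in base R, assuming the break point 10 means that the first segment is observations 1 to 10. The helper `rss()` and the other names are illustrative and not part of strucchange.

```r
data <- data.frame(x = c(1, 1, 2, 3, 4, 4, 5, 5, 6, 7, 7, 8, 8, 9, 10, 10,
                         11, 12, 12, 13, 14, 15, 15, 16, 17, 18, 18, 19, 20, 20),
                   y = c(3, 5, 6, 10, 13, 15, 17, 14, 20, 23, 25, 27, 30, 30, 31,
                         33, 32, 32, 30, 32, 34, 34, 37, 35, 34, 36, 34, 37, 38, 36))

n  <- nrow(data)
k  <- 2    # parameters per regression: intercept and slope
bp <- 10   # assumed break point: segment 1 is rows 1..10

# Residual sum of squares of a simple linear regression on a data subset
rss <- function(d) sum(resid(lm(y ~ x, data = d))^2)

rss_pooled <- rss(data)
rss_split  <- rss(data[1:bp, ]) + rss(data[(bp + 1):n, ])

# Classical Chow F statistic with (k, n - 2k) degrees of freedom
F_stat  <- ((rss_pooled - rss_split) / k) / (rss_split / (n - 2 * k))
p_value <- pf(F_stat, df1 = k, df2 = n - 2 * k, lower.tail = FALSE)

F_stat
p_value
```

A small p-value would indicate that a single regression line does not describe both segments equally well.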

Granger-Causality Test

The Granger Causality test is used to determine whether or not one time series is useful for forecasting another.

library(lmtest)
data(ChickEgg)
grangertest(chicken ~ egg, order = 3, data = ChickEgg)
grangertest(egg ~ chicken, order = 3, data = ChickEgg)

# H0: Time series x does not Granger-cause time series y

Bartlett’s Test

Bartlett’s test is a statistical test used to determine whether or not the variances of several groups are equal.

df <- data.frame(group = rep(c('A','B', 'C'), each=10),
                 score = c(85, 86, 88, 75, 78, 94, 98, 79, 71, 80,
                           91, 92, 93, 85, 87, 84, 82, 88, 95, 96,
                           79, 78, 88, 94, 92, 85, 83, 85, 82, 81))
bartlett.test(score ~ group, data = df)

Log Rank Test

library(survival)
head(ovarian)
survdiff(Surv(futime, fustat) ~ rx, data=ovarian)
# H0: There is no difference in survival between the two groups.
