Common Statistical Hypothesis Tests in R (Part 2)

Grubbs’ Test

Grubbs’ Test is a statistical test for detecting an outlier in a dataset. To use it, the data should be approximately normally distributed and the sample should contain at least 7 observations.

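library(outliers)  # the package name is lowercase
data <- c(5, 14, 15, 15, 14, 13, 19, 17, 16, 20, 22, 8, 21, 28, 11, 9, 29, 40)
grubbs.test(data)                 # test the value farthest from the mean
grubbs.test(data, opposite=TRUE)  # test the extreme value at the other end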
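
For reference, the statistic behind the test can be computed directly in base R. This is a minimal sketch that assumes the textbook two-sided form, G = max|x_i - mean| / sd, with a critical value derived from the t distribution. The names `G`, `G_crit`, `suspect` and `alpha` are illustrative and are not part of the `outliers` package, whose default settings may differ.

```r
# Base-R sketch of the two-sided Grubbs statistic for the sample used above.
x <- c(5, 14, 15, 15, 14, 13, 19, 17, 16, 20, 22, 8, 21, 28, 11, 9, 29, 40)
n <- length(x)

# G is the largest absolute deviation from the mean, in units of the sample sd.
dev <- abs(x - mean(x))
suspect <- x[which.max(dev)]
G <- max(dev) / sd(x)

# Two-sided critical value at level alpha (assumed 0.05 here).
alpha <- 0.05
t_crit <- qt(alpha / (2 * n), df = n - 2, lower.tail = FALSE)
G_crit <- (n - 1) / sqrt(n) * sqrt(t_crit^2 / (n - 2 + t_crit^2))

cat("suspect value:", suspect, "\n")
cat("G =", round(G, 3), " critical G =", round(G_crit, 3), "\n")
```

If `G` exceeds `G_crit`, the suspect value would be flagged as an outlier at the chosen level.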

Binomial Test

# binom.test(number of successes, number of trials, hypothesized probability)
binom.test(9, 24, 1/6)                            # two-tailed
binom.test(11, 30, 0.5, alternative="less")       # left-tailed
binom.test(46, 50, 0.8, alternative="greater")    # right-tailed
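
The one-tailed p-values above are plain binomial tail probabilities, so they can be cross-checked with `pbinom()`. The sketch below uses only base R; the variable names `p_less` and `p_greater` are illustrative.

```r
# Left-tailed: P(X <= 11) for X ~ Binomial(30, 0.5)
p_less <- pbinom(11, size = 30, prob = 0.5)

# Right-tailed: P(X >= 46) = P(X > 45) for X ~ Binomial(50, 0.8)
p_greater <- pbinom(45, size = 50, prob = 0.8, lower.tail = FALSE)

# Compare with the p-values reported by binom.test()
all.equal(p_less, binom.test(11, 30, 0.5, alternative = "less")$p.value)
all.equal(p_greater, binom.test(46, 50, 0.8, alternative = "greater")$p.value)
```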

Mood’s Median Test

Mood’s Median Test is used to compare the medians of two or more independent groups.

library(coin)
# the grouping variable must be a factor for coin's formula interface
method <- factor(rep(c('method1', 'method2'), each=10))
score <- c(75, 77, 78, 83, 83, 85, 89, 90, 91, 97, 77, 80, 84, 84, 85, 90, 92, 92, 94, 95)
examData <- data.frame(method, score)
median_test(score ~ method, data = examData)

Runs Test

A runs test checks whether the order of the observations in a dataset is consistent with a random process.

library(randtests)
data <- c(12, 16, 16, 15, 14, 18, 19, 21, 13, 13)
runs.test(data)
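
To see what the test counts, the runs can be tallied by hand in base R. This is a sketch under stated assumptions: the median is the threshold, values equal to it are dropped, and the large-sample normal approximation gives the p-value. The names `runs`, `n1`, `n2` and `z` are illustrative.

```r
x <- c(12, 16, 16, 15, 14, 18, 19, 21, 13, 13)

# Classify each observation as above or below the median,
# dropping any value equal to the median.
med <- median(x)
kept <- x[x != med]
above <- kept > med

# A run is a maximal stretch of consecutive values on the same side.
runs <- length(rle(above)$lengths)
n1 <- sum(above)
n2 <- sum(!above)

# Mean and variance of the number of runs under randomness.
mu <- 2 * n1 * n2 / (n1 + n2) + 1
sigma2 <- 2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) /
  ((n1 + n2)^2 * (n1 + n2 - 1))

z <- (runs - mu) / sqrt(sigma2)
p_value <- 2 * pnorm(-abs(z))
cat("runs =", runs, " z =", round(z, 4), " p =", round(p_value, 4), "\n")
```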

Test for Normality

Method 1: Histogram

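set.seed(0)
normal_data <- rnorm(200)              # sample from a normal distribution
non_normal_data <- rexp(200, rate=3)   # sample from an exponential distribution

par(mfrow=c(1,2))                      # two plots side by side
hist(normal_data, col='steelblue', main='Normal')
hist(non_normal_data, col='steelblue', main='Non-normal')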

Method 2: Q-Q Plot

set.seed(0)
normal_data <- rnorm(200)
non_normal_data <- rexp(200, rate=3)

par(mfrow=c(1,2))

# points close to the reference line suggest normality
qqnorm(normal_data, main='Normal')
qqline(normal_data)

qqnorm(non_normal_data, main='Non-normal')
qqline(non_normal_data)

Method 3: Shapiro-Wilk Test

set.seed(0)
normal_data <- rnorm(200)
shapiro.test(normal_data)

Method 4: Kolmogorov-Smirnov Test

set.seed(0)
normal_data <- rnorm(200)
# 'pnorm' with no extra arguments compares against the standard normal (mean 0, sd 1)
ks.test(normal_data, 'pnorm')

Method 5: Cramér-von Mises Test

library(goftest)
set.seed(0)
normal_data <- rnorm(200)
# test the simulated sample (not the unrelated `data` object from earlier sections)
cvm.test(normal_data, 'pnorm')

Multivariate Normality Tests

Method 1: Mardia’s Test

library(QuantPsyc)
set.seed(0)
data <- data.frame(x1 = rnorm(50),
                   x2 = rnorm(50),
                   x3 = rnorm(50))
mult.norm(data)$mult.test

Method 2: Energy Test

library(energy)
set.seed(0)
data <- data.frame(x1 = rnorm(50),
                   x2 = rnorm(50),
                   x3 = rnorm(50))
mvnorm.etest(data, R=100)

Chi-Square Test of Independence

A Chi-Square Test of Independence is used to determine whether or not there is a significant association between two categorical variables.

data <- matrix(c(120, 90, 40, 110, 95, 45), ncol=3, byrow=TRUE)
colnames(data) <- c("Rep","Dem","Ind")
rownames(data) <- c("Male","Female")
data <- as.table(data)

chisq.test(data)
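
The statistic reported by `chisq.test()` can be reproduced from the expected counts. The sketch below uses only base R and rebuilds the same table; the names `observed`, `expected`, `stat` and `dof` are illustrative.

```r
observed <- matrix(c(120, 90, 40, 110, 95, 45), ncol = 3, byrow = TRUE,
                   dimnames = list(c("Male", "Female"), c("Rep", "Dem", "Ind")))

# Expected count in each cell under independence:
# (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson chi-square statistic and its degrees of freedom
stat <- sum((observed - expected)^2 / expected)
dof <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_value <- pchisq(stat, df = dof, lower.tail = FALSE)

cat("X-squared =", round(stat, 4), " df =", dof, " p =", round(p_value, 4), "\n")
```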

Chi-Square Goodness of Fit Test

A Chi-Square Goodness of Fit Test is used to determine whether or not a categorical variable follows a hypothesized distribution.

observed <- c(50, 60, 40, 47, 53)       # observed counts per category
expected <- c(.2, .2, .2, .2, .2)       # hypothesized proportions (must sum to 1)
chisq.test(x=observed, p=expected)
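
As a cross-check, the goodness-of-fit statistic can be worked out by hand in base R. In this sketch `expected_counts`, `stat` and `dof` are illustrative names.

```r
observed <- c(50, 60, 40, 47, 53)
proportions <- c(.2, .2, .2, .2, .2)

# Expected counts are the hypothesized proportions scaled to the sample size.
expected_counts <- sum(observed) * proportions

# Pearson statistic with (number of categories - 1) degrees of freedom
stat <- sum((observed - expected_counts)^2 / expected_counts)
dof <- length(observed) - 1
p_value <- pchisq(stat, df = dof, lower.tail = FALSE)

cat("X-squared =", stat, " df =", dof, " p =", round(p_value, 4), "\n")
```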

Likelihood Ratio Test

library(lmtest)
#fit full model
model_full <- lm(mpg ~ disp + carb + hp + cyl, data = mtcars)
#fit reduced model
model_reduced <- lm(mpg ~ disp + carb, data = mtcars)

lrtest(model_full, model_reduced)
# H0: The full model and the nested model fit the data equally well.
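
The same comparison can be sketched without `lmtest`, assuming the usual form of the statistic: twice the difference in log-likelihoods, referred to a chi-square distribution with degrees of freedom equal to the difference in parameter counts. The names `lr_stat` and `df_diff` are illustrative.

```r
model_full <- lm(mpg ~ disp + carb + hp + cyl, data = mtcars)
model_reduced <- lm(mpg ~ disp + carb, data = mtcars)

ll_full <- logLik(model_full)
ll_reduced <- logLik(model_reduced)

# Likelihood ratio statistic: 2 * (logLik of full - logLik of reduced)
lr_stat <- 2 * (as.numeric(ll_full) - as.numeric(ll_reduced))

# Degrees of freedom: number of extra parameters in the full model
df_diff <- attr(ll_full, "df") - attr(ll_reduced, "df")

p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)
cat("LR statistic =", round(lr_stat, 4), " df =", df_diff,
    " p =", round(p_value, 4), "\n")
```

A small p-value would suggest that the extra predictors in the full model improve the fit.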

Cramér’s V

Cramér’s V measures the strength of association between two nominal variables. It ranges from 0 (no association) to 1 (perfect association).

library(rcompanion)
# 2 x 2 table
data <- matrix(c(7,9,12,8), nrow = 2)
cramerV(data, ci = TRUE)

# larger table (2 x 3)
data <- matrix(c(6, 9, 8, 5, 12, 9), nrow = 2)
cramerV(data, ci = TRUE)
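
Cramér’s V is a rescaled chi-square statistic, V = sqrt(X² / (n · min(r − 1, c − 1))). The base-R sketch below assumes the uncorrected chi-square statistic (no continuity correction); `cramers_v` is a hypothetical helper, not a function from `rcompanion`.

```r
# Hypothetical helper: Cramér's V from a contingency table
cramers_v <- function(tab) {
  # Pearson chi-square without continuity correction
  chi2 <- unname(chisq.test(tab, correct = FALSE)$statistic)
  n <- sum(tab)
  k <- min(dim(tab)) - 1   # min(rows - 1, columns - 1)
  sqrt(chi2 / (n * k))
}

v_2x2 <- cramers_v(matrix(c(7, 9, 12, 8), nrow = 2))
v_2x3 <- cramers_v(matrix(c(6, 9, 8, 5, 12, 9), nrow = 2))
cat("2 x 2 table: V =", round(v_2x2, 4), "\n")
cat("2 x 3 table: V =", round(v_2x3, 4), "\n")
```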

Phi Coefficient ($\Phi$)

A Phi Coefficient (sometimes called a mean square contingency coefficient) is a measure of the association between two binary variables.

library(psych)  # phi() is taken from psych here
data <- matrix(c(4, 8, 9, 4), nrow = 2)
phi(data)
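
For a 2 x 2 table with cells a, b (first row) and c, d (second row), the coefficient is (ad − bc) / sqrt of the product of the row and column totals. The base-R sketch below applies that formula to the same table; `phi_manual` is an illustrative name.

```r
tab <- matrix(c(4, 8, 9, 4), nrow = 2)   # filled column by column

a <- tab[1, 1]; b <- tab[1, 2]
c <- tab[2, 1]; d <- tab[2, 2]

# (ad - bc) divided by the square root of the product of all four margins
phi_manual <- (a * d - b * c) /
  sqrt(prod(rowSums(tab)) * prod(colSums(tab)))

cat("phi =", round(phi_manual, 4), "\n")
```

A negative value indicates that the two binary variables tend to take opposite values.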

Gini Coefficient

The Gini coefficient measures how unequally income is distributed across a population. It ranges from 0 (perfect equality) to 1 (perfect inequality).

library(DescTools)
x <- c(50, 50, 70, 70, 70, 90, 150, 150, 150, 150)
Gini(x, unbiased=FALSE)

# specify frequencies
x <- c(10, 20, 25, 55, 70, 90, 110, 115, 130)
n <- c(6, 7, 7, 14, 22, 20, 8, 4, 1)
Gini(x, n, unbiased=FALSE)
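
One common definition of the (biased) Gini coefficient is the mean absolute difference between all pairs of values, divided by twice the mean. The base-R sketch below assumes that definition; `gini_manual` is a hypothetical helper, and the second call treats `n` as frequencies by expanding the data with `rep()`.

```r
# Hypothetical helper: Gini coefficient from pairwise absolute differences
gini_manual <- function(x) {
  n <- length(x)
  sum(abs(outer(x, x, "-"))) / (2 * n^2 * mean(x))
}

x <- c(50, 50, 70, 70, 70, 90, 150, 150, 150, 150)
g_simple <- gini_manual(x)

# Frequency data: repeat each value by its count, then apply the same formula
values <- c(10, 20, 25, 55, 70, 90, 110, 115, 130)
counts <- c(6, 7, 7, 14, 22, 20, 8, 4, 1)
g_freq <- gini_manual(rep(values, counts))

cat("Gini (raw values):", round(g_simple, 4), "\n")
cat("Gini (expanded frequencies):", round(g_freq, 4), "\n")
```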

Chow Test

A Chow test is used to test whether the coefficients in two different regression models on different datasets are equal.

library(strucchange)
data <- data.frame(x = c(1, 1, 2, 3, 4, 4, 5, 5, 6, 7, 7, 8, 8, 9, 10, 10,
                         11, 12, 12, 13, 14, 15, 15, 16, 17, 18, 18, 19, 20, 20),
                   y = c(3, 5, 6, 10, 13, 15, 17, 14, 20, 23, 25, 27, 30, 30, 31,
                         33, 32, 32, 30, 32, 34, 34, 37, 35, 34, 36, 34, 37, 38, 36))

sctest(data$y ~ data$x, type = "Chow", point = 10)
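
The idea behind the test can be sketched in base R by comparing one pooled regression with two separate regressions. This assumes the classical Chow F statistic and a break after observation 10; whether `sctest()` splits the sample at exactly the same position is an assumption here. The names `rss`, `F_stat` and `first` are illustrative.

```r
x <- c(1, 1, 2, 3, 4, 4, 5, 5, 6, 7, 7, 8, 8, 9, 10, 10,
       11, 12, 12, 13, 14, 15, 15, 16, 17, 18, 18, 19, 20, 20)
y <- c(3, 5, 6, 10, 13, 15, 17, 14, 20, 23, 25, 27, 30, 30, 31,
       33, 32, 32, 30, 32, 34, 34, 37, 35, 34, 36, 34, 37, 38, 36)

n <- length(y)
k <- 2                       # parameters per regression: intercept and slope
first <- seq_len(n) <= 10    # assumed break after observation 10

# Residual sum of squares of a fitted model
rss <- function(fit) sum(residuals(fit)^2)

rss_pooled <- rss(lm(y ~ x))
rss_split <- rss(lm(y[first] ~ x[first])) + rss(lm(y[!first] ~ x[!first]))

# Chow F statistic: improvement from splitting, relative to the split fit
F_stat <- ((rss_pooled - rss_split) / k) / (rss_split / (n - 2 * k))
p_value <- pf(F_stat, df1 = k, df2 = n - 2 * k, lower.tail = FALSE)

cat("F =", round(F_stat, 3), " p =", signif(p_value, 3), "\n")
```

A large F statistic suggests that the regression coefficients differ between the two segments.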

Granger-Causality Test

The Granger Causality test is used to determine whether or not one time series is useful for forecasting another.

library(lmtest)
data(ChickEgg)
grangertest(chicken ~ egg, order = 3, data = ChickEgg)
grangertest(egg ~ chicken, order = 3, data = ChickEgg)

# H0 for grangertest(y ~ x): time series x does not Granger-cause time series y

Bartlett’s Test

Bartlett’s test is a statistical test that is used to determine whether or not the variances between several groups are equal.

df <- data.frame(group = rep(c('A','B', 'C'), each=10),
                score = c(85, 86, 88, 75, 78, 94, 98, 79, 71, 80,
                          91, 92, 93, 85, 87, 84, 82, 88, 95, 96,
                          79, 78, 88, 94, 92, 85, 83, 85, 82, 81))
bartlett.test(score ~ group, data = df)

Log Rank Test

library(survival)
head(ovarian)
survdiff(Surv(futime, fustat) ~ rx, data=ovarian)
# H0: There is no difference in survival between the two groups.
