Tukey's Five-Number Summary in R: fivenum() Function

Last modified: 2023-12-03 15:20:41.226000    Author: Mango

Introduction

Tukey's five-number summary describes a dataset with five values: the minimum, the lower hinge (approximately the first quartile), the median, the upper hinge (approximately the third quartile), and the maximum. It should not be confused with Tukey's fences, which are outlier cutoffs derived from the quartiles. Together the five numbers give a compact picture of a dataset's center, spread, and skewness. In R, the summary is computed with fivenum(), which is part of the stats package that is loaded by default.

Syntax

The syntax for the fivenum() function is as follows:

fivenum(x, na.rm = TRUE)

where x is the numeric vector for which we want to calculate Tukey's five-number summary, and na.rm is a logical value that controls how missing values are handled. With the default na.rm = TRUE, NA and NaN values are removed before the summary is computed; with na.rm = FALSE, any missing value in x makes all five results NA.
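To make the effect of na.rm concrete, here is a minimal sketch. The vector scores is a made-up example, not data from this article, and na.rm is passed explicitly in both calls so the behavior does not depend on the default.

```r
# Hypothetical sample vector containing one missing value.
scores <- c(4, 8, NA, 15, 16, 23, 42)

# na.rm = TRUE: the NA is dropped, then the summary is computed
# on the remaining six values.
with_removal <- fivenum(scores, na.rm = TRUE)

# na.rm = FALSE: a missing value in the input makes every result NA.
without_removal <- fivenum(scores, na.rm = FALSE)

print(with_removal)
print(without_removal)
```

If NA results show up unexpectedly, checking anyNA(x) before calling fivenum() is a quick way to confirm whether missing values are the cause.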

Example

Let's consider the following dataset:

data <- c(10, 18, 12, 14, 22, 13, 18, 20, 16, 19, 15, 17, 21, 10, 11)

To calculate Tukey's five-number summary for this dataset, we can use the fivenum() function as follows:

fivenum(data)

The output will be:

[1] 10.0 12.5 16.0 18.5 22.0

which means that the minimum is 10.0, the lower hinge (first quartile) is 12.5, the median is 16.0, the upper hinge (third quartile) is 18.5, and the maximum is 22.0.
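fivenum() returns a plain, unnamed numeric vector, so it can help to label the result. The sketch below does that with setNames(); the label strings are illustrative choices, not something fivenum() produces. It also compares the second and fourth values with the default quantile() quartiles on a small even-length vector (1:6, chosen only for illustration), since the two definitions need not agree.

```r
data <- c(10, 18, 12, 14, 22, 13, 18, 20, 16, 19, 15, 17, 21, 10, 11)

# Attach labels to the five values (names are our own, for readability).
five <- setNames(
  fivenum(data),
  c("min", "lower_hinge", "median", "upper_hinge", "max")
)
print(five)

# Hinges versus default quantile() quartiles on an even-length vector.
hinges <- fivenum(1:6)[c(2, 4)]
quartiles <- unname(quantile(1:6, c(0.25, 0.75)))
print(hinges)
print(quartiles)
```

For this article's dataset the hinges and the default quartiles happen to coincide; on 1:6 they differ slightly, which is worth keeping in mind when comparing fivenum() with summary() or quantile().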

Interpretation

Tukey's five-number summary provides a quick way to understand the center and spread of a dataset. The values can be read as follows:

  • The minimum value is the smallest observation in the dataset.
  • The first quartile (lower hinge) is the point below which roughly 25% of the observations lie.
  • The median is the middle value: half of the observations lie at or below it and half at or above it.
  • The third quartile (upper hinge) is the point below which roughly 75% of the observations lie.
  • The maximum value is the largest observation in the dataset.
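The five values listed above can be reproduced by hand, which shows where the second and fourth numbers come from. The helper below, hinges_by_hand(), is a hypothetical function written for this illustration, not part of R. It assumes a non-empty numeric vector and takes the two middle values as the medians of the lower and upper halves of the sorted data, with the middle observation shared by both halves when the length is odd.

```r
# Hypothetical helper (not part of R): five-number summary from medians
# of the lower and upper halves. Assumes at least one non-missing value.
hinges_by_hand <- function(x) {
  x <- sort(x[!is.na(x)])
  n <- length(x)
  half <- ceiling(n / 2)            # size of each half; halves overlap
                                    # in the middle value when n is odd
  lower <- x[seq_len(half)]         # smallest `half` values
  upper <- x[(n - half + 1):n]      # largest `half` values
  c(min(x), median(lower), median(x), median(upper), max(x))
}

data <- c(10, 18, 12, 14, 22, 13, 18, 20, 16, 19, 15, 17, 21, 10, 11)
by_hand <- hinges_by_hand(data)
built_in <- fivenum(data)

print(by_hand)
print(built_in)
```

Comparing the two printed vectors is a simple way to check one's understanding of the definition against the built-in function.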

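Using these five numbers, we can get a rough idea of the range and spread of the data: the distance between the minimum and maximum is the range, and the distance between the first and third quartile values (the hinges) covers roughly the middle half of the observations. The summary does not flag outliers by itself, since no observation can fall outside the minimum and maximum. A common convention, Tukey's fences, treats values lying more than 1.5 times the hinge spread below the lower hinge or above the upper hinge as potential outliers.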
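One common way to go from the five-number summary to outlier candidates is Tukey's fences, which place cutoffs a fixed multiple of the hinge spread beyond each hinge. The sketch below assumes the conventional multiplier of 1.5; tukey_outliers() is a hypothetical helper written for this example, and the value 45 is an artificial observation added only to show a flagged point.

```r
# Hypothetical helper (not part of R): values beyond Tukey's fences.
# k = 1.5 is the conventional multiplier, assumed here.
tukey_outliers <- function(x, k = 1.5) {
  five <- fivenum(x)
  spread <- five[4] - five[2]       # distance between the two hinges
  lower_fence <- five[2] - k * spread
  upper_fence <- five[4] + k * spread
  x[!is.na(x) & (x < lower_fence | x > upper_fence)]
}

data <- c(10, 18, 12, 14, 22, 13, 18, 20, 16, 19, 15, 17, 21, 10, 11)

# The article's dataset, unchanged.
flagged_original <- tukey_outliers(data)

# The same dataset with one artificial extreme value appended.
flagged_extended <- tukey_outliers(c(data, 45))

print(flagged_original)
print(flagged_extended)
```

In practice, boxplot.stats(x)$out applies the same kind of rule and may be a more convenient choice than a hand-written helper.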

Conclusion

The fivenum() function in R quickly computes Tukey's five-number summary for a dataset. The summary gives a compact view of the center and spread of the data, and its first and third quartile values (the hinges) provide the starting point for flagging potential outliers.