📜  Descriptive Statistics in Julia

📅  Last modified: 2021-11-25 04:45:56             🧑  Author: Mango

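Julia is a programming language well suited to data analysis. Its standard library and package ecosystem provide a range of statistical functions that support descriptive statistics. Descriptive statistics help you understand the characteristics of a dataset and summarize it quickly.

Packages needed to perform descriptive statistics in Julia: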

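  • Distributions.jl: provides a large collection of probability distributions and related functions, such as sampling, moments, entropy, probability density and log-density, maximum likelihood estimation, and composition of distributions.
  • StatsBase.jl: provides basic support for statistics. It contains various statistics-related functions, such as scalar statistics, high-order moment computation, counting, ranking, covariances, sampling, and empirical density estimation.
  • CSV.jl: reads and writes comma-separated values (CSV) files.
  • DataFrames.jl: provides tabular data structures (data frames).
  • StatsPlots.jl: draws various statistical plots.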
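
To make the Distributions.jl entry above more concrete, here is a minimal sketch of the features it lists (sampling, moments, entropy, density, and maximum likelihood estimation). The distribution parameters (mean 50, standard deviation 10) and the sample size are assumptions chosen only for illustration.

```julia
using Distributions
using Random

Random.seed!(42)                  # fixed seed so the run is repeatable

# Hypothetical distribution: normal with mean 50 and standard deviation 10
d = Normal(50, 10)

x = rand(d, 1_000)                # sampling: 1000 draws from d

println("mean       = ", mean(d))       # first moment of the distribution
println("variance   = ", var(d))        # second central moment
println("entropy    = ", entropy(d))    # differential entropy
println("pdf(50)    = ", pdf(d, 50))    # probability density at 50
println("logpdf(50) = ", logpdf(d, 50)) # logarithm of that density

# Maximum likelihood estimate of a Normal fitted to the sample
fitted = fit_mle(Normal, x)
println(fitted)
```

The fitted parameters should land close to 50 and 10, but they will not match exactly because they are estimated from a finite sample.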

Steps to perform descriptive statistics in Julia:

Step 1: Install the required packages

The following commands install the required packages:

using Pkg
Pkg.add("Distributions")
Pkg.add("StatsBase")
Pkg.add("CSV")
Pkg.add("DataFrames")
Pkg.add("StatsPlots")

Step 2: Import the required packages

Julia
# Descriptive Statistics in Julia
# Importing required packages 
# to perform descriptive statistics
  
# For random variable creation
using Distributions 
  
# For basic statistical operations
using StatsBase
  
# For reading and writing CSV files
using CSV 
  
# For creation of Data Structures 
using DataFrames  
  
# For representing various plots
using StatsPlots


Step 3: Create simulated data (random variables)

Let's create a few variables filled with random values.

Example:

Julia

# Descriptive Statistics in Julia
# Importing required packages 
# to perform descriptive statistics
  
# For random variable creation
using Distributions 
  
# For basic statistical operations
using StatsBase
  
# For reading and writing CSV files
using CSV 
  
# For creation of Data Structures 
using DataFrames  
  
# For representing various plots
using StatsPlots 
  
# Uniform Distribution
Age = rand(10:95, 100);  
  
# Categorical variable: each blood group is drawn with equal probability
BloodGrp = rand(["A", "B", "O", "AB"], 100);
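
Calling rand on a collection draws every element with equal probability. If unequal blood-group frequencies are wanted, a sketch along the following lines could be used. It relies on sample and Weights from StatsBase; the weights and the seed are made-up values for illustration, not real population frequencies.

```julia
using Random
using StatsBase

Random.seed!(2021)                # assumed seed, only for reproducibility

blood_groups = ["A", "B", "O", "AB"]

# Hypothetical relative frequencies, one per blood group (they sum to 1)
group_weights = Weights([0.30, 0.10, 0.45, 0.15])

# Ages drawn uniformly from the integers 10 to 95
Age = rand(10:95, 100)

# Blood groups drawn with the unequal weights defined above
BloodGrp = sample(blood_groups, group_weights, 100)

println(first(Age, 5))
println(first(BloodGrp, 5))
```

Setting a seed makes the simulated data identical on every run, which helps when comparing the summary statistics computed in later steps.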

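Step 4: Perform descriptive statistics

Julia's Statistics standard library provides the common functions mean(), median(), var(), and std(), which compute the mean, median, variance, and standard deviation of the data. The StatsBase package adds the more convenient describe() and summarystats() functions, which report several descriptive statistics in a single call.

Example:

Julia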

# Descriptive Statistics in Julia
# Importing required packages 
# to perform descriptive statistics
  
# For random variable creation
using Distributions  
  
# For basic statistical operations
using StatsBase
  
# For reading and writing CSV files
using CSV  
  
# For creation of Data Structures  
using DataFrames  
  
# For representing various plots
using StatsPlots  
  
# Uniform Distribution
Age = rand(10:95, 100);  
  
# Categorical variable: each blood group is drawn with equal probability
BloodGrp = rand(["A", "B", "O", "AB"], 100); 
  
# mean of Age variable
mean(Age)
  
# median of Age variable
median(Age)
  
# Variance of Age variable
var(Age)
  
# Standard deviation of Age variable
std(Age)
  
# Descriptive statistics of Age variable
describe(Age)
  
# summarystats returns the statistics as an object (element type not shown)
summarystats(Age)
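
To show what these functions compute, the following sketch recomputes the mean, variance, and standard deviation by hand on a small made-up sample and compares them with the library results. Note that var() and std() use the sample (n - 1) denominator by default.

```julia
using Statistics

ages = [23, 35, 47, 51, 64]              # small made-up sample

n = length(ages)

# Mean: sum of the values divided by their count
m = sum(ages) / n                        # 220 / 5 = 44.0

# Sample variance: squared deviations from the mean, divided by n - 1
v = sum((ages .- m) .^ 2) / (n - 1)      # 980 / 4 = 245.0

# Standard deviation: square root of the variance
s = sqrt(v)

println("mean:     ", m, " vs ", mean(ages))
println("variance: ", v, " vs ", var(ages))
println("std:      ", s, " vs ", std(ages))
println("median:   ", median(ages))      # middle value of the sorted data
```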

Step 5: Create a data frame from the simulated data

Store the simulated data in a DataFrame object so that it is easy to manipulate.

Example:

Julia

# Descriptive Statistics in Julia
# Importing required packages 
# to perform descriptive statistics
  
# For random variable creation
using Distributions  
  
# For basic statistical operations
using StatsBase
  
# For reading and writing CSV files
using CSV  
  
# For creation of Data Structures  
using DataFrames  
  
# For representing various plots
using StatsPlots  
  
# Uniform Distribution
Age = rand(10:95, 100);  
  
# Categorical variable: each blood group is drawn with equal probability
BloodGrp = rand(["A", "B", "O", "AB"], 100);  
  
# Creation of data frame
DF = DataFrame(AGE = Age, BGRP = BloodGrp);
  
# number of rows and columns
size(DF)
  
# First 5 rows
first(DF, 5)
  
# Last 5 rows
last(DF, 5)
  
# Selecting specific data only
# Rows in which BGRP == "AB"
DFAB = DF[DF.BGRP .== "AB", :]
  
# Rows in which AGE > 50
DF50 = DF[DF.AGE .> 50, :]
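
Row selection can also be written with the filter and subset functions from DataFrames.jl instead of boolean indexing. The sketch below uses a tiny hand-written data frame (the values are made up) so that the result of each selection is easy to check.

```julia
using DataFrames

# Small made-up data frame with the same column names as in the example
DF = DataFrame(AGE = [25, 67, 91, 42, 58],
               BGRP = ["A", "AB", "O", "AB", "B"])

# Rows whose blood group is "AB"
DFAB = filter(:BGRP => ==("AB"), DF)

# Rows whose age is above 50; ByRow applies the test to each value
DF50 = subset(DF, :AGE => ByRow(>(50)))

# Both conditions at once, using a row-wise predicate
both = filter(row -> row.BGRP == "AB" && row.AGE > 50, DF)

println(DFAB)
println(DF50)
println(both)
```

All three calls return a new data frame and leave DF unchanged.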

Step 6: Descriptive statistics on the DataFrame object

  • The describe() function computes descriptive statistics for every column of a data frame.

Example:

Julia

# Descriptive Statistics in Julia
# Importing required packages 
# to perform descriptive statistics
  
# For random variable creation
using Distributions  
  
# For basic statistical operations
using StatsBase
  
# For reading and writing CSV files
using CSV  
  
# For creation of Data Structures  
using DataFrames  
  
# For representing various plots
using StatsPlots  
  
# Uniform Distribution
Age = rand(10:95, 100);  
  
# Categorical variable: each blood group is drawn with equal probability
BloodGrp = rand(["A", "B", "O", "AB"], 100);  
  
# Creation of data frame
DF = DataFrame(AGE = Age, BGRP = BloodGrp);
  
# Perform descriptive statistics of data frame
describe(DF)
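
describe() also accepts a list of statistics and a column selector, which is handy when only a few numbers are needed. The following is a minimal sketch on made-up data; the :range column is a custom statistic added here for illustration.

```julia
using DataFrames

# Small made-up data frame with the same column names as in the example
DF = DataFrame(AGE = [25, 67, 91, 42, 58],
               BGRP = ["A", "AB", "O", "AB", "B"])

# Request selected statistics for the AGE column only.
# A `function => :name` pair adds a custom statistic (here the range).
stats = describe(DF, :mean, :std, :min, :max, :nmissing,
                 (x -> maximum(x) - minimum(x)) => :range;
                 cols = :AGE)

println(stats)
```

The result is itself a data frame with one row per described column, so it can be filtered or saved like any other table.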

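  • Counting the number of rows in each category of a categorical variable is done with groupby() and combine(), which replace the by() function of older DataFrames.jl releases.

Example:

Julia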

# Descriptive Statistics in Julia
# Importing required packages 
# to perform descriptive statistics
  
# For random variable creation
using Distributions  
  
# For basic statistical operations
using StatsBase
  
# For reading and writing CSV files
using CSV  
  
# For creation of Data Structures  
using DataFrames  
  
# For representing various plots
using StatsPlots  
  
# Uniform Distribution
Age = rand(10:95, 100);  
  
# Categorical variable: each blood group is drawn with equal probability
BloodGrp = rand(["A", "B", "O", "AB"], 100);  
  
# Creation of data frame
DF = DataFrame(AGE = Age, BGRP = BloodGrp);
  
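# Counting the number of rows 
# with blood groups A, B, O, AB
# (by() was removed in DataFrames.jl 1.0; groupby + combine replace it)
combine(groupby(DF, :BGRP), nrow => :Total)
  
# Counting the rows and columns
# of each blood group using size
combine(groupby(DF, :BGRP), sdf -> (Rows = size(sdf, 1), Cols = size(sdf, 2)))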
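
When only the counts are needed and no data frame is involved, countmap and proportionmap from StatsBase give the same information as dictionaries. A minimal sketch on a short made-up vector:

```julia
using StatsBase

# Short made-up vector of blood groups
BloodGrp = ["A", "AB", "O", "AB", "B", "O", "O"]

# Number of occurrences of each distinct value
counts = countmap(BloodGrp)

# Share of each distinct value (the shares sum to 1)
shares = proportionmap(BloodGrp)

for group in sort(collect(keys(counts)))
    println(group, ": ", counts[group], " rows, share ", round(shares[group], digits = 3))
end
```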

  • Descriptive statistics of a numeric variable can be computed separately for each level of a categorical variable.

Example:

Julia

# Descriptive Statistics in Julia
# Importing required packages 
# to perform descriptive statistics
  
# For random variable creation
using Distributions  
  
# For basic statistical operations
using StatsBase
  
# For reading and writing CSV files
using CSV  
  
# For creation of Data Structures  
using DataFrames  
  
# For representing various plots
using StatsPlots  
  
# Uniform Distribution
Age = rand(10:95, 100);  
  
# Categorical variable: each blood group is drawn with equal probability
BloodGrp = rand(["A", "B", "O", "AB"], 100);  
  
# Creation of data frame
DF = DataFrame(AGE = Age, BGRP = BloodGrp);
  
# Mean AGE of Blood groups A, B, AB, O
combine(groupby(DF, :BGRP), :AGE => mean)
  
# Using the describe function 
# we can get the complete descriptive statistics
combine(groupby(DF, :BGRP), sdf -> describe(sdf, cols = :AGE))
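
Several per-group statistics can be collected into one summary table by passing multiple column => function => name pairs to combine. The sketch below uses a small made-up data frame; the output column names (Total, MeanAge, MinAge, MaxAge) are chosen here for illustration.

```julia
using DataFrames
using Statistics

# Small made-up data frame with the same column names as in the example
DF = DataFrame(AGE = [25, 67, 91, 42, 58, 33],
               BGRP = ["A", "AB", "O", "AB", "B", "A"])

grouped = groupby(DF, :BGRP)

# One row per blood group, one column per requested statistic
summary_table = combine(grouped,
    nrow => :Total,
    :AGE => mean => :MeanAge,
    :AGE => minimum => :MinAge,
    :AGE => maximum => :MaxAge)

println(summary_table)
```

Naming each output column explicitly keeps the table readable when it is later joined or exported.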

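Step 7: Visualize the data with plots

The DataFrames package works well with the Plots package through the @df macro provided by StatsPlots, which lets columns be referred to by symbol inside a plotting call.

  • Let's analyze the age distribution for blood groups A, B, AB, and O:

Example:

Julia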

# Descriptive Statistics in Julia
# Importing required packages 
# to perform descriptive statistics
  
# For random variable creation
using Distributions  
  
# For basic statistical operations
using StatsBase
  
# For reading and writing CSV files
using CSV  
  
# For creation of Data Structures  
using DataFrames  
  
# For representing various plots
using StatsPlots  
  
# Uniform Distribution
Age = rand(10:95, 100);  
  
# Categorical variable: each blood group is drawn with equal probability
BloodGrp = rand(["A", "B", "O", "AB"], 100);  
  
# Creation of data frame
DF = DataFrame(AGE = Age, BGRP = BloodGrp);
  
# Plotting density plot
@df DF density(
   :AGE,
   group = :BGRP,
   xlab = "Age",
   ylab = "Distribution"    
)
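
As a numeric counterpart to the density plot, the ages can be binned into a histogram with fit(Histogram, ...) from StatsBase, which returns the bin edges and the counts without drawing anything. The sample values and the bin width of 10 years are assumptions made for this sketch.

```julia
using StatsBase

# Small made-up sample of ages
ages = [12, 18, 25, 31, 38, 44, 47, 52, 66, 95]

# Bins of width 10: [10, 20), [20, 30), ..., [90, 100)
h = fit(Histogram, ages, 10:10:100)

edges = collect(h.edges[1])

# Print the count that falls into each bin
for i in eachindex(h.weights)
    println("[", edges[i], ", ", edges[i + 1], "): ", h.weights[i])
end
```

Bins are closed on the left by default, so an age of exactly 20 would be counted in the [20, 30) bin.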

  • Let's create a box-and-whisker plot of Age:

Example:

Julia

# Descriptive Statistics in Julia
# Importing required packages to perform descriptive statistics
  
# For random variable creation
using Distributions  
  
# For basic statistical operations
using StatsBase
  
# For reading and writing CSV files
using CSV  
  
# For creation of Data Structures  
using DataFrames  
  
# For representing various plots
using StatsPlots  
  
# Uniform Distribution
Age = rand(10:95, 100);  
  
# Categorical variable: each blood group is drawn with equal probability
BloodGrp = rand(["A", "B", "O", "AB"], 100);  
  
# Creation of data frame
DF = DataFrame(AGE = Age, BGRP = BloodGrp);
  
# Plotting Box plot
@df DF boxplot(
  :AGE,
  xlab = "Age",
  ylab = "Distribution"    
)
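
To compare the age distribution across blood groups, the categorical column can be passed as the first argument so that one box is drawn per group. This is a sketch under the same simulated-data assumptions as above; the output file name is hypothetical.

```julia
using DataFrames
using StatsPlots

# Simulated data, generated the same way as in the earlier steps
DF = DataFrame(AGE = rand(10:95, 100),
               BGRP = rand(["A", "B", "O", "AB"], 100))

# One box per blood group: x is the category, y is the numeric variable
p = @df DF boxplot(
    :BGRP,
    :AGE,
    xlabel = "Blood group",
    ylabel = "Age",
    legend = false
)

# Save the figure to disk (hypothetical file name)
savefig(p, "age_by_blood_group.png")
```

Because the data are drawn at random, the boxes will differ from run to run unless a seed is set with Random.seed!.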
