📜  R中的DataFrame操作

📅  最后修改于: 2022-05-13 01:55:03.200000             🧑  作者: Mango

R中的DataFrame操作

DataFrames是 R 的通用数据对象,用于存储表格数据。数据框被认为是 R 编程中最流行的数据对象,因为以表格形式分析数据更加舒适。数据框架也可以作为床垫教授,其中矩阵的每一列都可以是不同的数据类型。 DataFrame 由三个主要组件组成,即数据、行和列。

可以对 DataFrame 执行的操作有:

  • 创建数据框
  • 访问行和列
  • 选择数据框的子集
  • 编辑数据框
  • 向数据框中添加额外的行和列
  • 根据现有变量向数据框添加新变量
  • 删除数据框中的行和列

创建数据框

在现实世界中,将通过从现有存储中加载数据集来创建 DataFrame,存储可以是 SQL 数据库、CSV 文件和 Excel 文件。 DataFrame 也可以从 R 中的向量创建。以下是一些可用于创建 DataFrame 的各种方法:

使用向量创建数据框:要创建数据框,我们使用 R 中的data.frame()函数。要创建数据框,请使用data.frame()命令,然后将您创建的每个向量作为参数传递给函数。

例子:

Python3
# R program to illustrate dataframe
 
# A vector which is a character vector
Name = c("Amiya", "Raj", "Asish")
 
# A vector which is a character vector
Language = c("R", "Python", "Java")
 
# A vector which is a numeric vector
Age = c(22, 25, 45)
 
# To create dataframe use data.frame command and
# then pass each of the vectors
# we have created as arguments
# to the function data.frame()
df = data.frame(Name, Language, Age)
 
print(df)


Python3
# R program to illustrate operations
# on a data frame
 
# Creating a dataframe
df = data.frame(
  "Name" = c("Amiya", "Raj", "Asish"),
  "Language" = c("R", "Python", "Java"),
  "Age" = c(22, 25, 45)
)
print(df)
 
# Accessing first and second row
cat("Accessing first and second row\n")
print(df[1:2, ])


Python3
# R program to illustrate operations
# on a data frame
 
# Creating a dataframe
df = data.frame(
  "Name" = c("Amiya", "Raj", "Asish"),
  "Language" = c("R", "Python", "Java"),
  "Age" = c(22, 25, 45)
)
print(df)
 
# Accessing first and second column
cat("Accessing first and second column\n")
print(df[, 1:2])


Python3
# R program to illustrate operations
# on a data frame
 
# Creating a dataframe
df = data.frame(
  "Name" = c("Amiya", "Raj", "Asish"),
  "Language" = c("R", "Python", "Java"),
  "Age" = c(22, 25, 45)
)
print(df)
 
# Selecting the subset of the data frame
# where Name is equal to Amiya
# OR age is greater than 30
newDf = subset(df, Name =="Amiya"|Age>30)
 
cat("After Selecting the subset of the data frame\n")
print(newDf)


Python3
# R program to illustrate operation on a data frame
 
# Creating a dataframe
df = data.frame(
  "Name" = c("Amiya", "Raj", "Asish"),
  "Language" = c("R", "Python", "Java"),
  "Age" = c(22, 25, 45)
)
cat("Before editing the dataframe\n")
print(df)
 
# Editing dataframes by direct assignments
# [[3]] accessing the top level components
# Here Age in this case
# [[3]][3] accessing inner level components
# Here Age of Asish in this case
df[[3]][3] = 30
 
cat("After edited the dataframe\n")
print(df)


Python3
# R program to illustrate operation on a data frame
 
# Creating a dataframe
df = data.frame(
  "Name" = c("Amiya", "Raj", "Asish"),
  "Language" = c("R", "Python", "Java"),
  "Age" = c(22, 25, 45)
)
cat("Before adding row\n")
print(df)
 
# Add a new row using rbind()
newDf = rbind(df, data.frame(Name = "Sandeep",
                            Language = "C",
                            Age = 23
                           ))
cat("After Added a row\n")
print(newDf)


Python3
# R program to illustrate operation on a data frame
 
# Creating a dataframe
df = data.frame(
  "Name" = c("Amiya", "Raj", "Asish"),
  "Language" = c("R", "Python", "Java"),
  "Age" = c(22, 25, 45)
)
cat("Before adding column\n")
print(df)
 
# Add a new column using cbind()
newDf = cbind(df, Rank=c(3, 5, 1))
 
cat("After Added a column\n")
print(newDf)


Python3
# R program to illustrate operation on a data frame
 
# Importing the dplyr library
library(dplyr)
 
# Creating a dataframe
df = data.frame(
  "Name" = c("Amiya", "Raj", "Asish"),
  "Language" = c("R", "Python", "Java"),
  "Age" = c(22, 25, 45)
)
cat("Original Dataframe\n")
print(df)
 
# Creating an extra variable column
# "log_Age" which is log of variable column "Age"
# Using mutate() command
newDf = mutate(df, log_Age = log(Age))
 
cat("After creating extra variable column\n")
print(newDf)


Python3
# R program to illustrate operation on a data frame
 
# Creating a dataframe
df = data.frame(
  "Name" = c("Amiya", "Raj", "Asish"),
  "Language" = c("R", "Python", "Java"),
  "Age" = c(22, 25, 45)
)
cat("Before deleting the 3rd row and 2nd column\n")
print(df)
 
# delete the third row and the second column
newDF = df[-3, -2]
 
cat("After Deleted the 3rd row and 2nd column\n")
print(newDF)


输出:

Name  Language  Age
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  45

使用文件中的数据创建数据框:也可以通过从文件中导入数据来创建数据框。为此,您必须使用名为“ read.table() ”的函数。

句法:

newDF = read.table(path="Path of the file")

要从 R 中的 CSV 文件创建数据框:

句法:

newDF = read.csv("FileName.csv")

访问行和列

下面给出了访问行和列的语法,

df[val1, val2]

df = dataframe object
val1 = rows of a data frame
val2 = columns of a data frame

因此,这个 ' val1 ' 和 ' val2 ' 可以是一个值数组,例如“1:2”或“2:3”等。如果您仅指定df[val2]这仅指列的集合,您需要从数据框访问。

示例:行选择

Python3

# R program to illustrate operations
# on a data frame
 
# Creating a dataframe
df = data.frame(
  "Name" = c("Amiya", "Raj", "Asish"),
  "Language" = c("R", "Python", "Java"),
  "Age" = c(22, 25, 45)
)
print(df)
 
# Accessing first and second row
cat("Accessing first and second row\n")
print(df[1:2, ])

输出:

Name Language Age
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  45

Accessing first and second row
   Name Language Age
1 Amiya        R  22
2   Raj   Python  25

示例:列选择

Python3

# R program to illustrate operations
# on a data frame
 
# Creating a dataframe
df = data.frame(
  "Name" = c("Amiya", "Raj", "Asish"),
  "Language" = c("R", "Python", "Java"),
  "Age" = c(22, 25, 45)
)
print(df)
 
# Accessing first and second column
cat("Accessing first and second column\n")
print(df[, 1:2])

输出:

Name Language Age
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  45

Accessing first and second column
   Name Language
1 Amiya        R
2   Raj   Python
3 Asish     Java

选择 DataFrame 的子集

借助以下语法,也可以根据某些条件创建 DataFrame 的子集。

例子:

Python3

# R program to illustrate operations
# on a data frame
 
# Creating a dataframe
df = data.frame(
  "Name" = c("Amiya", "Raj", "Asish"),
  "Language" = c("R", "Python", "Java"),
  "Age" = c(22, 25, 45)
)
print(df)
 
# Selecting the subset of the data frame
# where Name is equal to Amiya
# OR age is greater than 30
newDf = subset(df, Name =="Amiya"|Age>30)
 
cat("After Selecting the subset of the data frame\n")
print(newDf)

输出:

Name Language Age
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  45

After Selecting the subset of the data frame
   Name Language Age
1 Amiya        R  22
3 Asish     Java  45

编辑数据框

在 R 中,可以通过两种方式编辑 DataFrame:
通过直接分配编辑数据框:与 R 中的列表非常相似,您可以通过直接分配编辑数据框。

例子:

Python3

# R program to illustrate operation on a data frame
 
# Creating a dataframe
df = data.frame(
  "Name" = c("Amiya", "Raj", "Asish"),
  "Language" = c("R", "Python", "Java"),
  "Age" = c(22, 25, 45)
)
cat("Before editing the dataframe\n")
print(df)
 
# Editing dataframes by direct assignments
# [[3]] accessing the top level components
# Here Age in this case
# [[3]][3] accessing inner level components
# Here Age of Asish in this case
df[[3]][3] = 30
 
cat("After edited the dataframe\n")
print(df)

输出:

Before editing the data frame
   Name Language Age
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  45

After edited the data frame
   Name Language Age
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  30

使用 edit() 命令编辑数据框:
按照给定的步骤编辑 DataFrame:

步骤 1 : 所以,你需要做的是你必须创建一个数据框的实例,例如,你可以看到这里创建了一个数据框的实例,并使用命令data命名为“myTable” .frame()这将创建一个空数据框。

第 2 步:接下来我们将使用编辑函数启动查看器。请注意,“myTable”数据框被传递回“myTable”对象,这样我们对该模块所做的更改将保存到原始对象中。

因此,当执行上述命令时,它会弹出一个像这样的窗口,

第 3 步:现在,这个小名单已经完成了表格。

请注意,通过单击它们的标签并键入您的更改来更改变量名称。变量也可以设置为数字或字符。一旦 DataFrame 中的数据如上所示,关闭表格。更改会自动保存。

第 4 步:通过打印检查结果数据框。

Name Language Age
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  45 

向数据框中添加行和列

添加额外的行:我们可以使用命令rbind()添加额外的行。下面给出了它的语法,

请注意,您必须在使用rbind()时添加新行的条目,因为每个列条目中的数据类型应该等于已经存在的行的数据类型。

例子:

Python3

# R program to illustrate operation on a data frame
 
# Creating a dataframe
df = data.frame(
  "Name" = c("Amiya", "Raj", "Asish"),
  "Language" = c("R", "Python", "Java"),
  "Age" = c(22, 25, 45)
)
cat("Before adding row\n")
print(df)
 
# Add a new row using rbind()
newDf = rbind(df, data.frame(Name = "Sandeep",
                            Language = "C",
                            Age = 23
                           ))
cat("After Added a row\n")
print(newDf)

输出:

Before adding row
   Name Language Age
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  45

After Added a row
     Name Language Age
1   Amiya        R  22
2     Raj   Python  25
3   Asish     Java  45
4 Sandeep        C  23

添加额外的列:我们可以使用命令cbind()添加额外的列。下面给出了它的语法,

例子:

Python3

# R program to illustrate operation on a data frame
 
# Creating a dataframe
df = data.frame(
  "Name" = c("Amiya", "Raj", "Asish"),
  "Language" = c("R", "Python", "Java"),
  "Age" = c(22, 25, 45)
)
cat("Before adding column\n")
print(df)
 
# Add a new column using cbind()
newDf = cbind(df, Rank=c(3, 5, 1))
 
cat("After Added a column\n")
print(newDf)

输出:

Before adding column
   Name Language Age
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  45

After Added a column
   Name Language Age Rank
1 Amiya        R  22    3
2   Raj   Python  25    5
3 Asish     Java  45    1

向 DataFrame 添加新变量

在 R 中,我们可以在现有变量的基础上将新变量添加到数据框中。为此,我们必须首先使用命令library()调用dplyr库。然后调用mutate()函数将在现有变量的基础上添加额外的变量列。

句法:

例子:

Python3

# R program to illustrate operation on a data frame
 
# Importing the dplyr library
library(dplyr)
 
# Creating a dataframe
df = data.frame(
  "Name" = c("Amiya", "Raj", "Asish"),
  "Language" = c("R", "Python", "Java"),
  "Age" = c(22, 25, 45)
)
cat("Original Dataframe\n")
print(df)
 
# Creating an extra variable column
# "log_Age" which is log of variable column "Age"
# Using mutate() command
newDf = mutate(df, log_Age = log(Age))
 
cat("After creating extra variable column\n")
print(newDf)

输出:

Original Dataframe
   Name Language Age
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  45

After creating extra variable column
   Name Language Age  log_Age
1 Amiya        R  22 3.091042
2   Raj   Python  25 3.218876
3 Asish     Java  45 3.806662

从数据框中删除行和列

要删除行或列,首先,您需要访问该行或列,然后在该行或列之前插入一个负号。它表明您必须删除该行或列。

句法:

例子:

Python3

# R program to illustrate operation on a data frame
 
# Creating a dataframe
df = data.frame(
  "Name" = c("Amiya", "Raj", "Asish"),
  "Language" = c("R", "Python", "Java"),
  "Age" = c(22, 25, 45)
)
cat("Before deleting the 3rd row and 2nd column\n")
print(df)
 
# delete the third row and the second column
newDF = df[-3, -2]
 
cat("After Deleted the 3rd row and 2nd column\n")
print(newDF)

输出:

Before deleting the 3rd row and 2nd column
   Name Language Age
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  45
After Deleted the 3rd row and 2nd column
   Name Age
1 Amiya  22
2   Raj  25