DataFrame Operations in R

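DataFrames are generic data objects in R used to store tabular data. The data frame is arguably the most widely used data object in R programming, because analyzing data in tabular form is convenient. A data frame can also be thought of as a matrix in which each column may hold a different data type. A DataFrame consists of three main components: the data, the rows, and the columns.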

The following operations can be performed on a DataFrame:

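Creating a data frame
Accessing rows and columns
Selecting a subset of a data frame
Editing a data frame
Adding extra rows and columns to a data frame
Adding new variables to a data frame based on existing variables
Deleting rows and columns from a data frame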

Creating a Data Frame

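In practice, a DataFrame is usually created by loading a dataset from existing storage, such as an SQL database, a CSV file, or an Excel file. A DataFrame can also be created from vectors in R. Here are some of the ways to create a DataFrame:

Creating a data frame from vectors: use R's data.frame() function, passing each vector you have created as an argument.

Example: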

# R program to illustrate dataframe
 
# A vector which is a character vector
Name = c("Amiya", "Raj", "Asish")
 
# A vector which is a character vector
Language = c("R", "Python", "Java")
 
# A vector which is a numeric vector
Age = c(22, 25, 45)
 
# To create dataframe use data.frame command and
# then pass each of the vectors
# we have created as arguments
# to the function data.frame()
df = data.frame(Name, Language, Age)
 
print(df)

Output:

   Name Language Age
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  45

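Creating a data frame from a file: A data frame can also be created by importing data from a file, using the read.table() function (set header = TRUE when the first line of the file holds the column names).

Syntax:

newDF = read.table(file = "Path of the file", header = TRUE)

To create a data frame from a CSV file in R:

Syntax:

newDF = read.csv("FileName.csv")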
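To try both functions without an existing file, the sketch below writes the example data frame to a temporary CSV file with write.csv() and reads it back. The use of tempfile() and the variable names are illustrative choices, not part of the original example. Note that read.table() needs header = TRUE and sep = "," to read the same CSV that read.csv() handles with its defaults.

```r
# Example data frame (same values as in the examples above)
df <- data.frame(
  Name = c("Amiya", "Raj", "Asish"),
  Language = c("R", "Python", "Java"),
  Age = c(22, 25, 45)
)

# Write it to a temporary CSV file so the example needs no existing file
path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE)  # row.names = FALSE: no extra index column

# read.csv() assumes a header line and comma-separated fields by default
fromCsv <- read.csv(path)

# read.table() reads the same file once header and separator are spelled out
fromTable <- read.table(file = path, header = TRUE, sep = ",")

print(fromCsv)
str(fromTable)
```

Because whole numbers are written without decimals, the Age column comes back as integer rather than double; the values themselves are unchanged.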

Accessing Rows and Columns

The syntax for accessing rows and columns is:

df[val1, val2]

df = dataframe object
val1 = rows of a data frame
val2 = columns of a data frame

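Here, val1 and val2 can each be a vector of indices, such as 1:2 or 2:3. If you specify only df[val2], with a single index, it refers only to the set of columns you want to access from the data frame.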
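The two indexing forms return different kinds of objects, which matters when the result is passed on to other functions. The sketch below, assuming R 4.0 or later, contrasts df["Age"] (a one-column data frame) with df[, "Age"] (a plain vector), shows drop = FALSE for keeping the data-frame shape, and mixes a row position with a column name. The variable names are illustrative.

```r
# Example data frame used throughout this article
df <- data.frame(
  Name = c("Amiya", "Raj", "Asish"),
  Language = c("R", "Python", "Java"),
  Age = c(22, 25, 45)
)

# One index inside [ ]: selects columns; the result stays a data frame
ageDf <- df["Age"]

# [row, column] form with the row index left empty: a single column
# is simplified to a plain vector by default
ageVec <- df[, "Age"]

# drop = FALSE keeps the data-frame shape even for a single column
ageDf2 <- df[, "Age", drop = FALSE]

# $ also extracts one column as a vector
ageVec2 <- df$Age

# Rows and columns can be mixed: position for the row, name for the column
cell <- df[2, "Language"]

print(class(ageDf))   # [1] "data.frame"
print(class(ageVec))  # [1] "numeric"
```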

Example: Row selection

# R program to illustrate operations
# on a data frame
 
# Creating a dataframe
df = data.frame(
  "Name" = c("Amiya", "Raj", "Asish"),
  "Language" = c("R", "Python", "Java"),
  "Age" = c(22, 25, 45)
)
print(df)
 
# Accessing first and second row
cat("Accessing first and second row\n")
print(df[1:2, ])

Output:

   Name Language Age
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  45

Accessing first and second row
   Name Language Age
1 Amiya        R  22
2   Raj   Python  25

Example: Column selection

# R program to illustrate operations
# on a data frame
 
# Creating a dataframe
df = data.frame(
  "Name" = c("Amiya", "Raj", "Asish"),
  "Language" = c("R", "Python", "Java"),
  "Age" = c(22, 25, 45)
)
print(df)
 
# Accessing first and second column
cat("Accessing first and second column\n")
print(df[, 1:2])

Output:

   Name Language Age
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  45

Accessing first and second column
   Name Language
1 Amiya        R
2   Raj   Python
3 Asish     Java

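Selecting a Subset of a DataFrame

A subset of a DataFrame can also be created based on certain conditions, using the following syntax:

newDF = subset(df, conditions)
df = Original dataframe
conditions = A logical expression on the columns; only rows where it is TRUE are kept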

Example:

# R program to illustrate operations
# on a data frame
 
# Creating a dataframe
df = data.frame(
  "Name" = c("Amiya", "Raj", "Asish"),
  "Language" = c("R", "Python", "Java"),
  "Age" = c(22, 25, 45)
)
print(df)
 
# Selecting the subset of the data frame
# where Name is equal to Amiya
# OR age is greater than 30
newDf = subset(df, Name =="Amiya"|Age>30)
 
cat("After Selecting the subset of the data frame\n")
print(newDf)

Output:

   Name Language Age
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  45

After Selecting the subset of the data frame
   Name Language Age
1 Amiya        R  22
3 Asish     Java  45
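subset() also accepts a select argument that keeps only the listed columns. The sketch below compares it with the equivalent bracket indexing. One difference worth knowing: subset() drops rows where the condition evaluates to NA, whereas plain logical indexing returns a row of NAs for them. The variable names are illustrative.

```r
df <- data.frame(
  Name = c("Amiya", "Raj", "Asish"),
  Language = c("R", "Python", "Java"),
  Age = c(22, 25, 45)
)

# Rows where Age > 24, keeping only the Name and Age columns
older <- subset(df, Age > 24, select = c(Name, Age))

# The same selection written with logical indexing and column names
older2 <- df[df$Age > 24, c("Name", "Age")]

print(older)
```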

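Editing a Data Frame

In R, a DataFrame can be edited in two ways:
Editing a data frame by direct assignment: much like a list in R, a data frame can be edited by assigning directly to its components.

Example: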

# R program to illustrate operation on a data frame
 
# Creating a dataframe
df = data.frame(
  "Name" = c("Amiya", "Raj", "Asish"),
  "Language" = c("R", "Python", "Java"),
  "Age" = c(22, 25, 45)
)
cat("Before editing the dataframe\n")
print(df)
 
# Editing dataframes by direct assignments
# [[3]] accessing the top level components
# Here Age in this case
# [[3]][3] accessing inner level components
# Here Age of Asish in this case
df[[3]][3] = 30
 
cat("After editing the dataframe\n")
print(df)

Output:

Before editing the dataframe
   Name Language Age
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  45

After editing the dataframe
   Name Language Age
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  30
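The positional form df[[3]][3] relies on Age being the third column and Asish being the third row. The sketch below makes the same edit three ways: by position, by column name, and by a condition on another column. The copies byPosition, byName, and byCondition are illustrative names so the results can be compared.

```r
df <- data.frame(
  Name = c("Amiya", "Raj", "Asish"),
  Language = c("R", "Python", "Java"),
  Age = c(22, 25, 45)
)

# Positional edit, same cell as df[[3]][3]: row 3 of the third column
byPosition <- df
byPosition[3, 3] <- 30

# Edit by column name: keeps working if the column order changes
byName <- df
byName[3, "Age"] <- 30

# Edit by condition: set Age to 30 wherever Name is "Asish"
byCondition <- df
byCondition$Age[byCondition$Name == "Asish"] <- 30

print(byCondition)
```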

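Editing a data frame with the edit() command:
Follow these steps to edit a DataFrame:

Step 1: Create an instance of a data frame. Here, an instance named "myTable" is created with the command data.frame(), which creates an empty data frame.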

myTable = data.frame()

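Step 2: Next, launch the viewer with the edit() function. edit() returns the edited copy instead of changing the object in place, so its result is assigned back to myTable; that way, the changes made in the editor are saved to the original object.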

myTable = edit(myTable)

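When the command above is executed, it pops up a spreadsheet-style data editor window.

Step 3: Now fill in the table with the small roster of names, languages, and ages.

Note that you can rename a variable by clicking its label and typing the new name. A variable can also be set to numeric or character. Once the data has been entered, close the editor window; the changes are saved to myTable automatically.

Step 4: Check the resulting data frame by printing it.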

> myTable

   Name Language Age
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  45

Adding Rows and Columns to a Data Frame

Adding extra rows: Extra rows can be added with the rbind() command. Its syntax is:

newDF = rbind(df, the entries for the new row you want to add)
df = Original data frame

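Note that the entries for the new row must be consistent with the existing rows: the data type of each entry should match the type of the corresponding column. When the new row is supplied as a data frame, its column names must also match those of df.

Example: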

# R program to illustrate operation on a data frame
 
# Creating a dataframe
df = data.frame(
  "Name" = c("Amiya", "Raj", "Asish"),
  "Language" = c("R", "Python", "Java"),
  "Age" = c(22, 25, 45)
)
cat("Before adding row\n")
print(df)
 
# Add a new row using rbind()
newDf = rbind(df, data.frame(Name = "Sandeep",
                            Language = "C",
                            Age = 23
                           ))
cat("After Added a row\n")
print(newDf)

Output:

Before adding row
   Name Language Age
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  45

After Added a row
     Name Language Age
1   Amiya        R  22
2     Raj   Python  25
3   Asish     Java  45
4 Sandeep        C  23
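The sketch below illustrates why the types matter. The first call adds a row whose Age is numeric; the second, a deliberate mistake, supplies Age as the string "23". Because a column can hold only one type, R coerces the whole Age column to character, so numeric operations on it no longer behave as expected. stringsAsFactors = FALSE is set explicitly so the result is the same on R versions before 4.0; ok and bad are illustrative names.

```r
df <- data.frame(
  Name = c("Amiya", "Raj", "Asish"),
  Language = c("R", "Python", "Java"),
  Age = c(22, 25, 45),
  stringsAsFactors = FALSE
)

# New row whose types match the existing columns
ok <- rbind(df, data.frame(Name = "Sandeep", Language = "C", Age = 23,
                           stringsAsFactors = FALSE))

# New row with Age given as a string: the whole Age column becomes character
bad <- rbind(df, data.frame(Name = "Sandeep", Language = "C", Age = "23",
                            stringsAsFactors = FALSE))

print(sapply(ok, class))
print(sapply(bad, class))
```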

Adding extra columns: Extra columns can be added with the cbind() command. Its syntax is:

newDF = cbind(df, the entries for the new column you want to add)
df = Original data frame

Example:

# R program to illustrate operation on a data frame
 
# Creating a dataframe
df = data.frame(
  "Name" = c("Amiya", "Raj", "Asish"),
  "Language" = c("R", "Python", "Java"),
  "Age" = c(22, 25, 45)
)
cat("Before adding column\n")
print(df)
 
# Add a new column using cbind()
newDf = cbind(df, Rank=c(3, 5, 1))
 
cat("After Added a column\n")
print(newDf)

Output:

Before adding column
   Name Language Age
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  45

After Added a column
   Name Language Age Rank
1 Amiya        R  22    3
2   Raj   Python  25    5
3 Asish     Java  45    1
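Besides cbind(), a column can be added by assigning to a new column name. Unlike the cbind() example, this modifies df itself rather than returning a new object. The new column needs one value per row: a single value is recycled to every row, shorter vectors are recycled only when their length divides the number of rows evenly, and otherwise R raises an error. The Active column is a hypothetical addition for illustration.

```r
df <- data.frame(
  Name = c("Amiya", "Raj", "Asish"),
  Language = c("R", "Python", "Java"),
  Age = c(22, 25, 45)
)

# $<- adds a new column (or overwrites an existing one with the same name)
df$Rank <- c(3, 5, 1)

# [[<- with a column name does the same; a single value is recycled to every row
df[["Active"]] <- TRUE

print(df)
```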

Adding New Variables to a DataFrame

In R, new variables can be added to a data frame based on existing ones. To do this, first load the dplyr library with library(); then call mutate(), which appends extra variable columns computed from existing variables.

Syntax:

library(dplyr)
newDF = mutate(df, new_var = expression)
df = original data frame
new_var = Name of the new variable
expression = The transformation applied to an existing variable (e.g. log(Age), Age * 10)

Example:

# R program to illustrate operation on a data frame
 
# Importing the dplyr library
library(dplyr)
 
# Creating a dataframe
df = data.frame(
  "Name" = c("Amiya", "Raj", "Asish"),
  "Language" = c("R", "Python", "Java"),
  "Age" = c(22, 25, 45)
)
cat("Original Dataframe\n")
print(df)
 
# Creating an extra variable column
# "log_Age" which is log of variable column "Age"
# Using mutate() command
newDf = mutate(df, log_Age = log(Age))
 
cat("After creating extra variable column\n")
print(newDf)

Output:

Original Dataframe
   Name Language Age
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  45

After creating extra variable column
   Name Language Age  log_Age
1 Amiya        R  22 3.091042
2   Raj   Python  25 3.218876
3 Asish     Java  45 3.806662
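If dplyr is not installed, base R can produce the same column. transform() evaluates its expressions with the data frame's columns in scope, much like mutate(), and assigning with $ is the other common option. This sketch uses only base R; newDf and df2 are illustrative names.

```r
df <- data.frame(
  Name = c("Amiya", "Raj", "Asish"),
  Language = c("R", "Python", "Java"),
  Age = c(22, 25, 45)
)

# Base R counterpart of mutate(df, log_Age = log(Age))
newDf <- transform(df, log_Age = log(Age))

# Equivalent result by assigning the new column directly
df2 <- df
df2$log_Age <- log(df2$Age)

print(newDf)
```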

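Deleting Rows and Columns from a Data Frame

To delete a row or column, refer to it by its index and put a minus sign in front of the index; a negative index tells R to drop that row or column.

Syntax:

newDF = df[-rowNo, -colNo]
df = original data frame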

Example:

# R program to illustrate operation on a data frame
 
# Creating a dataframe
df = data.frame(
  "Name" = c("Amiya", "Raj", "Asish"),
  "Language" = c("R", "Python", "Java"),
  "Age" = c(22, 25, 45)
)
cat("Before deleting the 3rd row and 2nd column\n")
print(df)
 
# delete the third row and the second column
newDF = df[-3, -2]
 
cat("After Deleted the 3rd row and 2nd column\n")
print(newDF)

Output:

Before deleting the 3rd row and 2nd column
   Name Language Age
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  45
After Deleted the 3rd row and 2nd column
   Name Age
1 Amiya  22
2   Raj  25
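Negative indices depend on the positions of rows and columns. Columns can also be dropped by name and rows by a condition, which is less fragile when the layout of the data changes. A minus sign does not work with column names (df[, -"Language"] is an error), so the sketch below assigns NULL or uses a logical index over names(df) instead. The names noLang, noLang2, and young are illustrative.

```r
df <- data.frame(
  Name = c("Amiya", "Raj", "Asish"),
  Language = c("R", "Python", "Java"),
  Age = c(22, 25, 45)
)

# Drop a column by name by assigning NULL to it
noLang <- df
noLang$Language <- NULL

# Drop columns by name with a logical index over names(df)
noLang2 <- df[, !(names(df) %in% c("Language")), drop = FALSE]

# Drop rows by condition instead of position: keep rows with Age <= 30
young <- df[df$Age <= 30, ]

print(young)
```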