R中的DataFrame操作
DataFrames是 R 的通用数据对象,用于存储表格数据。数据框被认为是 R 编程中最流行的数据对象,因为以表格形式分析数据更加舒适。数据框架也可以作为床垫教授,其中矩阵的每一列都可以是不同的数据类型。 DataFrame 由三个主要组件组成,即数据、行和列。
可以对 DataFrame 执行的操作有:
- 创建数据框
- 访问行和列
- 选择数据框的子集
- 编辑数据框
- 向数据框中添加额外的行和列
- 根据现有变量向数据框添加新变量
- 删除数据框中的行和列
创建数据框
在现实世界中,将通过从现有存储中加载数据集来创建 DataFrame,存储可以是 SQL 数据库、CSV 文件和 Excel 文件。 DataFrame 也可以从 R 中的向量创建。以下是一些可用于创建 DataFrame 的各种方法:
使用向量创建数据框:要创建数据框,我们使用 R 中的data.frame()函数。要创建数据框,请使用data.frame()命令,然后将您创建的每个向量作为参数传递给函数。
例子:
Python3
# R program to illustrate dataframe
# A vector which is a character vector
Name = c("Amiya", "Raj", "Asish")
# A vector which is a character vector
Language = c("R", "Python", "Java")
# A vector which is a numeric vector
Age = c(22, 25, 45)
# To create dataframe use data.frame command and
# then pass each of the vectors
# we have created as arguments
# to the function data.frame()
df = data.frame(Name, Language, Age)
print(df)
Python3
# R program to illustrate operations
# on a data frame
# Creating a dataframe
df = data.frame(
"Name" = c("Amiya", "Raj", "Asish"),
"Language" = c("R", "Python", "Java"),
"Age" = c(22, 25, 45)
)
print(df)
# Accessing first and second row
cat("Accessing first and second row\n")
print(df[1:2, ])
Python3
# R program to illustrate operations
# on a data frame
# Creating a dataframe
df = data.frame(
"Name" = c("Amiya", "Raj", "Asish"),
"Language" = c("R", "Python", "Java"),
"Age" = c(22, 25, 45)
)
print(df)
# Accessing first and second column
cat("Accessing first and second column\n")
print(df[, 1:2])
Python3
# R program to illustrate operations
# on a data frame
# Creating a dataframe
df = data.frame(
"Name" = c("Amiya", "Raj", "Asish"),
"Language" = c("R", "Python", "Java"),
"Age" = c(22, 25, 45)
)
print(df)
# Selecting the subset of the data frame
# where Name is equal to Amiya
# OR age is greater than 30
newDf = subset(df, Name =="Amiya"|Age>30)
cat("After Selecting the subset of the data frame\n")
print(newDf)
Python3
# R program to illustrate operation on a data frame
# Creating a dataframe
df = data.frame(
"Name" = c("Amiya", "Raj", "Asish"),
"Language" = c("R", "Python", "Java"),
"Age" = c(22, 25, 45)
)
cat("Before editing the dataframe\n")
print(df)
# Editing dataframes by direct assignments
# [[3]] accessing the top level components
# Here Age in this case
# [[3]][3] accessing inner level components
# Here Age of Asish in this case
df[[3]][3] = 30
cat("After edited the dataframe\n")
print(df)
Python3
# R program to illustrate operation on a data frame
# Creating a dataframe
df = data.frame(
"Name" = c("Amiya", "Raj", "Asish"),
"Language" = c("R", "Python", "Java"),
"Age" = c(22, 25, 45)
)
cat("Before adding row\n")
print(df)
# Add a new row using rbind()
newDf = rbind(df, data.frame(Name = "Sandeep",
Language = "C",
Age = 23
))
cat("After Added a row\n")
print(newDf)
Python3
# R program to illustrate operation on a data frame
# Creating a dataframe
df = data.frame(
"Name" = c("Amiya", "Raj", "Asish"),
"Language" = c("R", "Python", "Java"),
"Age" = c(22, 25, 45)
)
cat("Before adding column\n")
print(df)
# Add a new column using cbind()
newDf = cbind(df, Rank=c(3, 5, 1))
cat("After Added a column\n")
print(newDf)
Python3
# R program to illustrate operation on a data frame
# Importing the dplyr library
library(dplyr)
# Creating a dataframe
df = data.frame(
"Name" = c("Amiya", "Raj", "Asish"),
"Language" = c("R", "Python", "Java"),
"Age" = c(22, 25, 45)
)
cat("Original Dataframe\n")
print(df)
# Creating an extra variable column
# "log_Age" which is log of variable column "Age"
# Using mutate() command
newDf = mutate(df, log_Age = log(Age))
cat("After creating extra variable column\n")
print(newDf)
Python3
# R program to illustrate operation on a data frame
# Creating a dataframe
df = data.frame(
"Name" = c("Amiya", "Raj", "Asish"),
"Language" = c("R", "Python", "Java"),
"Age" = c(22, 25, 45)
)
cat("Before deleting the 3rd row and 2nd column\n")
print(df)
# delete the third row and the second column
newDF = df[-3, -2]
cat("After Deleted the 3rd row and 2nd column\n")
print(newDF)
输出:
Name Language Age
1 Amiya R 22
2 Raj Python 25
3 Asish Java 45
使用文件中的数据创建数据框:也可以通过从文件中导入数据来创建数据框。为此,您必须使用名为“ read.table() ”的函数。
句法:
newDF = read.table(path="Path of the file")
要从 R 中的 CSV 文件创建数据框:
句法:
newDF = read.csv("FileName.csv")
访问行和列
下面给出了访问行和列的语法,
df[val1, val2]
df = dataframe object
val1 = rows of a data frame
val2 = columns of a data frame
因此,这个 ' val1 ' 和 ' val2 ' 可以是一个值数组,例如“1:2”或“2:3”等。如果您仅指定df[val2]这仅指列的集合,您需要从数据框访问。
示例:行选择
Python3
# R program to illustrate operations
# on a data frame
# Creating a dataframe
df = data.frame(
"Name" = c("Amiya", "Raj", "Asish"),
"Language" = c("R", "Python", "Java"),
"Age" = c(22, 25, 45)
)
print(df)
# Accessing first and second row
cat("Accessing first and second row\n")
print(df[1:2, ])
输出:
Name Language Age
1 Amiya R 22
2 Raj Python 25
3 Asish Java 45
Accessing first and second row
Name Language Age
1 Amiya R 22
2 Raj Python 25
示例:列选择
Python3
# R program to illustrate operations
# on a data frame
# Creating a dataframe
df = data.frame(
"Name" = c("Amiya", "Raj", "Asish"),
"Language" = c("R", "Python", "Java"),
"Age" = c(22, 25, 45)
)
print(df)
# Accessing first and second column
cat("Accessing first and second column\n")
print(df[, 1:2])
输出:
Name Language Age
1 Amiya R 22
2 Raj Python 25
3 Asish Java 45
Accessing first and second column
Name Language
1 Amiya R
2 Raj Python
3 Asish Java
选择 DataFrame 的子集
借助以下语法,也可以根据某些条件创建 DataFrame 的子集。
newDF = subset(df, conditions)
df = Original dataframe
conditions = Certain conditions
例子:
Python3
# R program to illustrate operations
# on a data frame
# Creating a dataframe
df = data.frame(
"Name" = c("Amiya", "Raj", "Asish"),
"Language" = c("R", "Python", "Java"),
"Age" = c(22, 25, 45)
)
print(df)
# Selecting the subset of the data frame
# where Name is equal to Amiya
# OR age is greater than 30
newDf = subset(df, Name =="Amiya"|Age>30)
cat("After Selecting the subset of the data frame\n")
print(newDf)
输出:
Name Language Age
1 Amiya R 22
2 Raj Python 25
3 Asish Java 45
After Selecting the subset of the data frame
Name Language Age
1 Amiya R 22
3 Asish Java 45
编辑数据框
在 R 中,可以通过两种方式编辑 DataFrame:
通过直接分配编辑数据框:与 R 中的列表非常相似,您可以通过直接分配编辑数据框。
例子:
Python3
# R program to illustrate operation on a data frame
# Creating a dataframe
df = data.frame(
"Name" = c("Amiya", "Raj", "Asish"),
"Language" = c("R", "Python", "Java"),
"Age" = c(22, 25, 45)
)
cat("Before editing the dataframe\n")
print(df)
# Editing dataframes by direct assignments
# [[3]] accessing the top level components
# Here Age in this case
# [[3]][3] accessing inner level components
# Here Age of Asish in this case
df[[3]][3] = 30
cat("After edited the dataframe\n")
print(df)
输出:
Before editing the data frame
Name Language Age
1 Amiya R 22
2 Raj Python 25
3 Asish Java 45
After edited the data frame
Name Language Age
1 Amiya R 22
2 Raj Python 25
3 Asish Java 30
使用 edit() 命令编辑数据框:
按照给定的步骤编辑 DataFrame:
步骤 1 : 所以,你需要做的是你必须创建一个数据框的实例,例如,你可以看到这里创建了一个数据框的实例,并使用命令data命名为“myTable” .frame()这将创建一个空数据框。
myTable = data.frame()
第 2 步:接下来我们将使用编辑函数启动查看器。请注意,“myTable”数据框被传递回“myTable”对象,这样我们对该模块所做的更改将保存到原始对象中。
myTable = edit(myTable)
因此,当执行上述命令时,它会弹出一个像这样的窗口,
第 3 步:现在,这个小名单已经完成了表格。
请注意,通过单击它们的标签并键入您的更改来更改变量名称。变量也可以设置为数字或字符。一旦 DataFrame 中的数据如上所示,关闭表格。更改会自动保存。
第 4 步:通过打印检查结果数据框。
> myTable
Name Language Age
1 Amiya R 22
2 Raj Python 25
3 Asish Java 45
向数据框中添加行和列
添加额外的行:我们可以使用命令rbind()添加额外的行。下面给出了它的语法,
newDF = rbind(df, the entries for the new row you have to add )
df = Original data frame
请注意,您必须在使用rbind()时添加新行的条目,因为每个列条目中的数据类型应该等于已经存在的行的数据类型。
例子:
Python3
# R program to illustrate operation on a data frame
# Creating a dataframe
df = data.frame(
"Name" = c("Amiya", "Raj", "Asish"),
"Language" = c("R", "Python", "Java"),
"Age" = c(22, 25, 45)
)
cat("Before adding row\n")
print(df)
# Add a new row using rbind()
newDf = rbind(df, data.frame(Name = "Sandeep",
Language = "C",
Age = 23
))
cat("After Added a row\n")
print(newDf)
输出:
Before adding row
Name Language Age
1 Amiya R 22
2 Raj Python 25
3 Asish Java 45
After Added a row
Name Language Age
1 Amiya R 22
2 Raj Python 25
3 Asish Java 45
4 Sandeep C 23
添加额外的列:我们可以使用命令cbind()添加额外的列。下面给出了它的语法,
newDF = cbind(df, the entries for the new column you have to add )
df = Original data frame
例子:
Python3
# R program to illustrate operation on a data frame
# Creating a dataframe
df = data.frame(
"Name" = c("Amiya", "Raj", "Asish"),
"Language" = c("R", "Python", "Java"),
"Age" = c(22, 25, 45)
)
cat("Before adding column\n")
print(df)
# Add a new column using cbind()
newDf = cbind(df, Rank=c(3, 5, 1))
cat("After Added a column\n")
print(newDf)
输出:
Before adding column
Name Language Age
1 Amiya R 22
2 Raj Python 25
3 Asish Java 45
After Added a column
Name Language Age Rank
1 Amiya R 22 3
2 Raj Python 25 5
3 Asish Java 45 1
向 DataFrame 添加新变量
在 R 中,我们可以在现有变量的基础上将新变量添加到数据框中。为此,我们必须首先使用命令library()调用dplyr库。然后调用mutate()函数将在现有变量的基础上添加额外的变量列。
句法:
library(dplyr)
newDF = mutate(df, new_var=[existing_var])
df = original data frame
new_var = Name of the new variable
existing_var = The modify action you are taking(e.g log value, multiply by 10)
例子:
Python3
# R program to illustrate operation on a data frame
# Importing the dplyr library
library(dplyr)
# Creating a dataframe
df = data.frame(
"Name" = c("Amiya", "Raj", "Asish"),
"Language" = c("R", "Python", "Java"),
"Age" = c(22, 25, 45)
)
cat("Original Dataframe\n")
print(df)
# Creating an extra variable column
# "log_Age" which is log of variable column "Age"
# Using mutate() command
newDf = mutate(df, log_Age = log(Age))
cat("After creating extra variable column\n")
print(newDf)
输出:
Original Dataframe
Name Language Age
1 Amiya R 22
2 Raj Python 25
3 Asish Java 45
After creating extra variable column
Name Language Age log_Age
1 Amiya R 22 3.091042
2 Raj Python 25 3.218876
3 Asish Java 45 3.806662
从数据框中删除行和列
要删除行或列,首先,您需要访问该行或列,然后在该行或列之前插入一个负号。它表明您必须删除该行或列。
句法:
newDF = df[-rowNo, -colNo]
df = original data frame
例子:
Python3
# R program to illustrate operation on a data frame
# Creating a dataframe
df = data.frame(
"Name" = c("Amiya", "Raj", "Asish"),
"Language" = c("R", "Python", "Java"),
"Age" = c(22, 25, 45)
)
cat("Before deleting the 3rd row and 2nd column\n")
print(df)
# delete the third row and the second column
newDF = df[-3, -2]
cat("After Deleted the 3rd row and 2nd column\n")
print(newDF)
输出:
Before deleting the 3rd row and 2nd column
Name Language Age
1 Amiya R 22
2 Raj Python 25
3 Asish Java 45
After Deleted the 3rd row and 2nd column
Name Age
1 Amiya 22
2 Raj 25