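Working with Binary Files in R Programming
In computing, a text file contains data that humans can read easily: letters, digits, and other printable characters. A binary file, by contrast, stores data as raw bytes in the form the computer uses internally. Its contents are not meant to be read by people, because when the bytes are displayed as text they turn into characters and symbols that include many non-printable ones.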
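To make the difference concrete, the following minimal sketch encodes the same number once as text and once as a binary integer. The value and the variable names are chosen here purely for illustration, and the byte order is fixed to little-endian so the result does not depend on the platform.

```r
# The same number stored as text and as binary
x <- 756083L

# Text form: one byte per digit character
text_bytes <- charToRaw(as.character(x))
print(text_bytes)

# Binary form: a single 4-byte integer (little-endian byte order)
bin_bytes <- writeBin(x, raw(), endian = "little")
print(bin_bytes)

# Decoding the binary form gives the original number back
decoded <- readBin(bin_bytes, integer(), endian = "little")
print(decoded)
```

The text form needs six bytes (one per digit), while the binary form always takes four bytes regardless of how many digits the integer has.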
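Sometimes data produced by another program has to be processed in R as a binary file. R may also need to create binary files that can be shared with other programs. The four most important operations on a binary file are:
- Creating and writing to a binary file
- Reading from a binary file
- Appending to a binary file
- Deleting a binary file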
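Creating and Writing to a Binary File
A binary file can be created and written with the single function writeBin(), after opening the file in "wb" mode, where w stands for write and b for binary mode.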
Syntax: writeBin(object, con)
Parameters:
object: an R object to be written to the connection
con: a connection object or a character string naming a file or a raw vector.
Example:
# R program to illustrate
# working with binary file
# Creating a data frame
# (stringsAsFactors = FALSE keeps Name as character in R versions before 4.0)
df = data.frame(
  "ID" = c(1, 2, 3, 4),
  "Name" = c("Tony", "Thor", "Loki", "Hulk"),
  "Age" = c(20, 34, 24, 40),
  "Pin" = c(756083, 756001, 751003, 110011),
  stringsAsFactors = FALSE
)
# Creating a connection object
# to write the binary file using mode "wb"
con = file("myfile.dat", "wb")
# Write the column names of the data frame
# to the connection object
writeBin(colnames(df), con)
# Write the records in each of the columns to the file.
# c() coerces the mixed numeric and character columns to a single
# character vector, so all 16 values are stored as null-terminated strings
writeBin(c(df$ID, df$Name, df$Age, df$Pin), con)
# Close the connection object
close(con)
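How many bytes writeBin() produces depends on the type of the object it is given, which is why mixing types in one call matters. The sketch below is an illustration only: it writes to an in-memory raw vector instead of a file and compares the encoded lengths. The variable names are made up for this example.

```r
# writeBin() returns the encoded bytes when con is a raw vector
int_bytes   <- writeBin(1:4, raw())             # integers: 4 bytes each
dbl_bytes   <- writeBin(c(1, 2, 3, 4), raw())   # doubles: 8 bytes each
chr_bytes   <- writeBin("Tony", raw())          # 4 characters plus a terminating nul
short_bytes <- writeBin(1:4, raw(), size = 2)   # integers forced to 2 bytes each

print(length(int_bytes))
print(length(dbl_bytes))
print(length(chr_bytes))
print(length(short_bytes))
```

A reader has to know which of these encodings was used, because the file itself carries no type information.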
Reading from a Binary File
A binary file can be read with the function readBin(), after opening the file in "rb" mode, where r stands for read and b for binary mode.
Syntax: readBin(con, what, n)
Parameters:
con: a connection object or a character string naming a file or a raw vector
what: either an object whose mode will give the mode of the vector to be read or a character vector of length one describing the mode: one of “numeric”, “double”, “integer”, “int”, “logical”, “complex”, “character”, “raw”
n: the (maximal) number of records to be read
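The following short sketch illustrates the what and n parameters using an in-memory raw vector rather than a file. The data and variable names are assumptions made for this illustration.

```r
# Encode three integers into a raw vector to read from
bytes <- writeBin(c(10L, 20L, 30L), raw())

# 'what' can be an object of the target type ...
by_object <- readBin(bytes, what = integer(), n = 3)

# ... or a string naming the type
by_name <- readBin(bytes, what = "integer", n = 3)

# n is only an upper bound: asking for more than is available
# returns just the values that exist
more <- readBin(bytes, what = "integer", n = 100)

# With the default n = 1, only the first value is read
first <- readBin(bytes, what = "integer")

print(by_object)
print(length(more))
print(first)
```

Because n is a maximum, it is common to pass a generous upper bound when the exact number of records is not known in advance.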
Example:
# R program to illustrate
# working with binary file
# Creating a connection object
# to read the file in binary mode using "rb"
con = file("myfile.dat", "rb")
# Read the column names
# n = 4 as there are 4 columns
colname = readBin(con, character(), n = 4)
# Read the column values from the same connection, which now
# points just past the column names
# n = 16 as there are 4 columns of 4 values, all stored as strings
bindata = readBin(con, character(), n = 16)
# Close the connection object
close(con)
# Elements 1 to 4 hold the values of the ID column
ID = as.numeric(bindata[1:4])
# Elements 5 to 8 hold the values of the Name column
Name = bindata[5:8]
# Elements 9 to 12 hold the values of the Age column
Age = as.numeric(bindata[9:12])
# Elements 13 to 16 hold the values of the Pin column
PinCode = as.numeric(bindata[13:16])
# Combining all the values into a data frame
finaldata = data.frame(ID, Name, Age, PinCode)
colnames(finaldata) = colname
print(finaldata)
Output:
  ID Name Age    Pin
1  1 Tony  20 756083
2  2 Thor  34 756001
3  3 Loki  24 751003
4  4 Hulk  40 110011
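Storing every value as a string is simple but loses the numeric types. One possible alternative, sketched here under assumptions of its own (a temporary file, a leading record count, and the hypothetical variable names below), is to write each column with its own type and read it back with the matching what argument.

```r
# Scratch file used only for this illustration
path <- tempfile(fileext = ".dat")

ids    <- c(1L, 2L, 3L, 4L)
people <- c("Tony", "Thor", "Loki", "Hulk")

# Write a record count, then each column in its own type
con <- file(path, "wb")
writeBin(length(ids), con)   # record count as one 4-byte integer
writeBin(ids, con)           # four 4-byte integers
writeBin(people, con)        # four null-terminated strings
close(con)

# Read the pieces back in the same order and with the same types
con <- file(path, "rb")
n_rec   <- readBin(con, integer(), n = 1)
ids2    <- readBin(con, integer(), n = n_rec)
people2 <- readBin(con, character(), n = n_rec)
close(con)

restored <- data.frame(ID = ids2, Name = people2)
print(restored)

# Clean up the scratch file
unlink(path)
```

The reader must consume the data in exactly the order and types the writer used, since the file holds only bytes and no description of its layout.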
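Appending to a Binary File
Appending to a binary file uses the same function, writeBin(), after opening the file in "ab" mode, where a stands for append and b for binary mode.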
Example:
# R program to illustrate
# working with binary file
# Creating another data frame
# to append to the existing file
df = data.frame(
  "Salary" = c(100, 200, 300, 400),
  "Experience" = c(3, 5, 10, 4)
)
# Creating a connection object
# to append to the binary file using mode "ab"
con = file("myfile.dat", "ab")
# Append the column names of the data frame
# to the connection object
writeBin(colnames(df), con)
# Append the records in each of the columns to the file.
# Both columns are numeric, so the 8 values are stored as 8-byte doubles
writeBin(c(df$Salary, df$Experience), con)
# Close the connection object
close(con)
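Since appending produces no visible output, one way to confirm that it worked is to compare the file size before and after and then read everything back. The sketch below does this on a temporary file; the path and values are assumptions made for this illustration.

```r
# Scratch file used only for this illustration
path <- tempfile(fileext = ".dat")

# Create the file with two doubles
con <- file(path, "wb")
writeBin(c(100, 200), con)
close(con)
size_before <- file.size(path)

# Append two more doubles using "ab" mode
con <- file(path, "ab")
writeBin(c(300, 400), con)
close(con)
size_after <- file.size(path)

# Read all four values back from the start of the file
con <- file(path, "rb")
all_values <- readBin(con, numeric(), n = 4)
close(con)

print(size_before)
print(size_after)
print(all_values)

# Clean up the scratch file
unlink(path)
```

Opening the same file with "wb" instead of "ab" for the second write would truncate it and discard the first two values.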
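Deleting a Binary File
In R, a binary file can be deleted with file.remove(). The function unlink() also deletes files (and, with recursive = TRUE, directories); calling it on a file that no longer exists simply does nothing.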
Example:
# R program to illustrate
# working with binary file
# Define the name of the file that will be deleted
fileName <- "myfile.dat"
# Check its existence
if (file.exists(fileName)) {
  # Delete the file if it exists
  file.remove(fileName)
}
# unlink() is an alternative way to delete the file;
# here it is a harmless no-op because the file is already gone
unlink(fileName)
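The two functions report success differently, which can be checked with a small sketch. It works on a temporary file so nothing real is deleted; the path and variable names are assumptions made for this illustration.

```r
# Scratch file used only for this illustration
path <- tempfile(fileext = ".dat")
file.create(path)

# file.remove() returns TRUE when the file was deleted
removed <- file.remove(path)
still_there <- file.exists(path)

# unlink() returns 0 on success; a file that does not exist
# is not treated as a failure
status <- unlink(path)

print(removed)
print(still_there)
print(status)
```

Checking these return values is a simple way for a script to confirm that the cleanup step actually happened.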