R Concise Tutorial

R - Binary Files

A binary file stores information purely as bits and bytes (0s and 1s). It is not human-readable: when the bytes are interpreted as text, many of them map to symbols or non-printable characters. Opening a binary file in a text editor shows characters such as Ø and ð.

A binary file has to be read by a program that understands its format. For example, a binary file produced by Microsoft Word can be turned back into human-readable form only by Word. This shows that, besides the readable text, the file holds a lot of other information, such as character formatting and page numbers, stored alongside the alphanumeric characters. Finally, a binary file is one continuous sequence of bytes. Even the line break we see in a text file is just a character that joins one line to the next.
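As a small illustration of the points above, the following sketch shows the bytes behind a short piece of text and behind a single integer. The variable names are made up for this example, and the byte order is fixed to little-endian so the output does not depend on the machine.

```r
# Text: every character, including the line break "\n", is stored as a byte.
text_bytes <- charToRaw("ab\ncd")
print(text_bytes)
# [1] 61 62 0a 63 64

# Binary: the integer 1 stored in 4 bytes (little-endian requested explicitly).
# Passing raw() as the connection makes writeBin() return the bytes directly.
int_bytes <- writeBin(1L, raw(), endian = "little")
print(int_bytes)
# [1] 01 00 00 00
```

The line break appears as the byte 0a between the two lines of text, while the integer occupies four bytes, three of which have no printable character form.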

Sometimes R needs to process data that another program has produced as a binary file. R may also need to create binary files that can be shared with other programs.

R has two functions, writeBin() and readBin(), to create and read binary files.

Syntax

writeBin(object, con)
readBin(con, what, n)

The parameters are described below:

  1. con is the connection object used to read or write the binary file.

  2. object is the R object (an atomic vector such as integer, numeric or character) to be written to the binary file.

  3. what is the mode, such as character or integer, used to interpret the bytes being read.

  4. n is the maximum number of elements (not bytes) to read from the binary file.
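The parameters above can be tried out with a short round trip. This is a minimal sketch that assumes a scratch file created with tempfile(); the variable names are illustrative only.

```r
path <- tempfile(fileext = ".bin")   # scratch file, assumed for this sketch

# object: an integer vector; each element is written as 4 bytes.
con <- file(path, "wb")
writeBin(c(10L, 20L, 30L), con)
close(con)

size <- file.size(path)              # 3 elements * 4 bytes

# what: interpret the bytes as integers; n: maximum number of elements.
con <- file(path, "rb")
first_two <- readBin(con, what = integer(), n = 2)
rest      <- readBin(con, what = integer(), n = 10)  # only 1 element is left
close(con)
unlink(path)

print(first_two)
print(rest)
print(size)
```

Asking for more elements than remain is not an error: readBin() returns however many it could read, which is why n is best thought of as an upper bound.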

Example

We use R's built-in data set "mtcars". First we write it to a csv file, then store part of it in a binary file on disk. Next we read that binary file back into R.

Writing the Binary File

We write the data frame "mtcars" to a csv file, read a few records back from it, and then write them to disk as a binary file.

# Write only the columns "cyl", "am" and "gear" of the "mtcars" data frame
# to a csv file.
write.table(mtcars[, c("cyl", "am", "gear")], file = "mtcars.csv",
   row.names = FALSE, na = "", col.names = TRUE, sep = ",")

# Store 5 records from the csv file as a new data frame.
new.mtcars <- read.table("mtcars.csv", sep = ",", header = TRUE, nrows = 5)

# Create a connection object to write the binary file using mode "wb".
write.filename = file("/web/com/binmtcars.dat", "wb")

# Write the column names of the data frame to the connection object.
writeBin(colnames(new.mtcars), write.filename)

# Write the values of the "cyl", "am" and "gear" columns to the file,
# one column after another.
writeBin(c(new.mtcars$cyl, new.mtcars$am, new.mtcars$gear), write.filename)

# Close the file so that it can be read by other programs.
close(write.filename)
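To see how such a file is laid out, the sketch below writes three column names and fifteen integer values to a scratch file and compares the file size with the expected byte count. It is self-contained and uses tempfile() and hard-coded values rather than the path and data frame above; those choices are assumptions made for illustration.

```r
path <- tempfile(fileext = ".dat")   # scratch file, assumed for this sketch
col_names <- c("cyl", "am", "gear")
values <- c(6L, 6L, 4L, 6L, 8L,      # cyl
            1L, 1L, 1L, 0L, 0L,      # am
            4L, 4L, 4L, 3L, 3L)      # gear

con <- file(path, "wb")
writeBin(col_names, con)             # each string is followed by a nul byte
writeBin(values, con)                # each integer takes 4 bytes
close(con)

name_bytes  <- sum(nchar(col_names, type = "bytes") + 1)  # +1 for each nul
value_bytes <- length(values) * 4
total_bytes <- file.size(path)
unlink(path)

print(c(names = name_bytes, values = value_bytes, file = total_bytes))
```

The names take 12 bytes and the values 60, so the file holds 72 bytes with no separators or structure beyond the order in which things were written.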

Reading the Binary File

The binary file created above stores all the data as one continuous run of bytes. So we read it back by asking for the right number of column names and the right number of column values.

# Create a connection object to read the file in binary mode using "rb".
read.filename <- file("/web/com/binmtcars.dat", "rb")

# First read the column names. n = 3 as we have 3 columns.
column.names <- readBin(read.filename, character(), n = 3)
close(read.filename)

# Next read the file again from the start as integers. n = 18 because the
# 3 column names occupy 12 bytes (3 integer-sized slots) and are followed
# by 15 values.
read.filename <- file("/web/com/binmtcars.dat", "rb")
bindata <- readBin(read.filename, integer(), n = 18)
close(read.filename)

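# Print the data. The first 3 elements are the column-name bytes
# interpreted as integers, so they are not meaningful numbers.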
print(bindata)

# Take the 4th to 8th elements, which represent "cyl".
cyldata = bindata[4:8]
print(cyldata)

# Take the 9th to 13th elements, which represent "am".
amdata = bindata[9:13]
print(amdata)

# Take the 14th to 18th elements, which represent "gear".
geardata = bindata[14:18]
print(geardata)

# Combine all the values into one matrix and label its columns.
finaldata = cbind(cyldata, amdata, geardata)
colnames(finaldata) = column.names
print(finaldata)

When we execute the above code, it produces the following result:

 [1]    7108963 1728081249    7496037          6          6          4
 [7]          6          8          1          1          1          0
[13]          0          4          4          4          3          3

[1] 6 6 4 6 8

[1] 1 1 1 0 0

[1] 4 4 4 3 3

     cyl am gear
[1,]   6  1    4
[2,]   6  1    4
[3,]   4  1    4
[4,]   6  0    3
[5,]   8  0    3

As we can see, we got the original data back by reading the binary file in R.
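The example above reads the file twice and skips the first three integer slots, which works only because the column names happen to fill exactly 12 bytes. A sketch of an alternative is to read everything in sequence from a single connection: first the names as character, then the values as integer. The scratch file, the hard-coded values and the variable names here are assumptions made so the block runs on its own.

```r
# Recreate a small file with the same layout: 3 names, then 15 integers.
path <- tempfile(fileext = ".dat")   # scratch file, assumed for this sketch
con <- file(path, "wb")
writeBin(c("cyl", "am", "gear"), con)
writeBin(c(6L, 6L, 4L, 6L, 8L,
           1L, 1L, 1L, 0L, 0L,
           4L, 4L, 4L, 3L, 3L), con)
close(con)

# Read sequentially: the connection remembers its position, so the integer
# read starts right after the last column name.
con <- file(path, "rb")
column_names <- readBin(con, character(), n = 3)
values       <- readBin(con, integer(), n = 15)
close(con)
unlink(path)

# matrix() fills column by column, matching the order the data was written.
result <- as.data.frame(
  matrix(values, ncol = 3, dimnames = list(NULL, column_names))
)
print(result)
```

Reading in the same order and with the same types used for writing avoids any dependence on how many bytes the names occupy.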