R 简明教程

R - Databases

数据关系数据库系统以规范化格式存储。因此,要执行统计计算,我们需要非常高级和复杂的 SQL 查询。但 R 能轻松连接到许多关系数据库,如 MySQL、Oracle、SQL Server 等,并从这些数据库中获取记录,作为数据框。一旦数据可用于 R 环境,它就成为正常的 R 数据集,可以利用所有强大的程序包和函数来对其进行处理或分析。

The data is Relational database systems are stored in a normalized format. So, to carry out statistical computing we will need very advanced and complex Sql queries. But R can connect easily to many relational databases like MySql, Oracle, Sql server etc. and fetch records from them as a data frame. Once the data is available in the R environment, it becomes a normal R data set and can be manipulated or analyzed using all the powerful packages and functions.

在本教程中,我们将使用 MySQL 作为连接到 R 的参考数据库。

In this tutorial we will be using MySql as our reference database for connecting to R.

RMySQL Package

R 有一个名为“RMySQL”的内置程序包,该程序包提供与 MySQL 数据库之间的原生连接。您可以在 R 环境中使用以下命令安装此程序包。

R has a built-in package named "RMySQL" which provides native connectivity between with MySql database. You can install this package in the R environment using the following command.

install.packages("RMySQL")

Connecting R to MySql

安装完程序包后,我们在 R 中创建一个连接对象,以连接到数据库。它将用户名、密码、数据库名和主机名作为输入。

Once the package is installed we create a connection object in R to connect to the database. It takes the username, password, database name and host name as input.

# Create a connection Object to MySQL database.
# We will connect to the sampel database named "sakila" that comes with MySql installation.
mysqlconnection = dbConnect(MySQL(), user = 'root', password = '', dbname = 'sakila',
   host = 'localhost')

# List the tables available in this database.
 dbListTables(mysqlconnection)

当我们执行上述代码时,会产生以下结果 -

When we execute the above code, it produces the following result −

 [1] "actor"                      "actor_info"
 [3] "address"                    "category"
 [5] "city"                       "country"
 [7] "customer"                   "customer_list"
 [9] "film"                       "film_actor"
[11] "film_category"              "film_list"
[13] "film_text"                  "inventory"
[15] "language"                   "nicer_but_slower_film_list"
[17] "payment"                    "rental"
[19] "sales_by_film_category"     "sales_by_store"
[21] "staff"                      "staff_list"
[23] "store"

Querying the Tables

我们可以使用函数 dbSendQuery() 查询 MySQL 中的数据库表。查询在 MySQL 中执行,并使用 R fetch() 函数返回结果集。最后,它作为数据框存储在 R 中。

We can query the database tables in MySql using the function dbSendQuery(). The query gets executed in MySql and the result set is returned using the R fetch() function. Finally it is stored as a data frame in R.

# Query the "actor" tables to get all the rows.
result = dbSendQuery(mysqlconnection, "select * from actor")

# Store the result in a R data frame object. n = 5 is used to fetch first 5 rows.
data.frame = fetch(result, n = 5)
print(data.fame)

当我们执行上述代码时,会产生以下结果 -

When we execute the above code, it produces the following result −

        actor_id   first_name    last_name         last_update
1        1         PENELOPE      GUINESS           2006-02-15 04:34:33
2        2         NICK          WAHLBERG          2006-02-15 04:34:33
3        3         ED            CHASE             2006-02-15 04:34:33
4        4         JENNIFER      DAVIS             2006-02-15 04:34:33
5        5         JOHNNY        LOLLOBRIGIDA      2006-02-15 04:34:33

Query with Filter Clause

我们可以传递任何有效的 select 查询来获取结果。

We can pass any valid select query to get the result.

result = dbSendQuery(mysqlconnection, "select * from actor where last_name = 'TORN'")

# Fetch all the records(with n = -1) and store it as a data frame.
data.frame = fetch(result, n = -1)
print(data)

当我们执行上述代码时,会产生以下结果 -

When we execute the above code, it produces the following result −

        actor_id    first_name     last_name         last_update
1        18         DAN            TORN              2006-02-15 04:34:33
2        94         KENNETH        TORN              2006-02-15 04:34:33
3       102         WALTER         TORN              2006-02-15 04:34:33

Updating Rows in the Tables

我们可以通过将 update 查询传递到 dbSendQuery() 函数中来更新 Mysql 表中的行。

We can update the rows in a Mysql table by passing the update query to the dbSendQuery() function.

dbSendQuery(mysqlconnection, "update mtcars set disp = 168.5 where hp = 110")

在执行上述代码后,我们可以看到在 MySql 环境中更新的表。

After executing the above code we can see the table updated in the MySql Environment.

Inserting Data into the Tables

dbSendQuery(mysqlconnection,
   "insert into mtcars(row_names, mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb)
   values('New Mazda RX4 Wag', 21, 6, 168.5, 110, 3.9, 2.875, 17.02, 0, 1, 4, 4)"
)

在执行上述代码后,我们可以看到插入到 MySql 环境中的表的行。

After executing the above code we can see the row inserted into the table in the MySql Environment.

Creating Tables in MySql

我们可以使用函数 dbWriteTable() 在 MySql 中创建表。如果表已经存在,它将覆盖该表,并采用数据框作为输入。

We can create tables in the MySql using the function dbWriteTable(). It overwrites the table if it already exists and takes a data frame as input.

# Create the connection object to the database where we want to create the table.
mysqlconnection = dbConnect(MySQL(), user = 'root', password = '', dbname = 'sakila',
   host = 'localhost')

# Use the R data frame "mtcars" to create the table in MySql.
# All the rows of mtcars are taken inot MySql.
dbWriteTable(mysqlconnection, "mtcars", mtcars[, ], overwrite = TRUE)

在执行上述代码后,我们可以看到在 MySql 环境中创建的表。

After executing the above code we can see the table created in the MySql Environment.

Dropping Tables in MySql

我们可以通过将 drop table 语句传递到 dbSendQuery() 中来删除 MySql 数据库中的表,就像我们用于从表中查询数据一样。

We can drop the tables in MySql database passing the drop table statement into the dbSendQuery() in the same way we used it for querying data from tables.

dbSendQuery(mysqlconnection, 'drop table if exists mtcars')

在执行上述代码后,我们可以看到在 MySql 环境中删除了该表。

After executing the above code we can see the table is dropped in the MySql Environment.