ggplot2 Concise Tutorial

ggplot2 - Time Series

A time series is a sequence of data points recorded in time order, usually at successive, equally spaced points in time, so it can be treated as discrete-time data. It is typically displayed as a line plot with time on the x-axis. This chapter uses the “economics” dataset that ships with ggplot2, which contains monthly US economic time series.

The data frame has the following columns:

date       Month of data collection
psavert    Personal savings rate
pce        Personal consumption expenditures, in billions of dollars
unemploy   Number of unemployed, in thousands
uempmed    Median duration of unemployment, in weeks
pop        Total population, in thousands

Load the required packages and set the default theme before plotting the time series.

> library(ggplot2)
> theme_set(theme_minimal())
> # Demo dataset
> head(economics)
# A tibble: 6 x 6
  date         pce    pop psavert uempmed unemploy
  <date>     <dbl>  <dbl>   <dbl>   <dbl>    <dbl>
1 1967-07-01  507. 198712    12.6     4.5     2944
2 1967-08-01  510. 198911    12.6     4.7     2945
3 1967-09-01  516. 199113    11.9     4.6     2958
4 1967-10-01  512. 199311    12.9     4.9     3143
5 1967-11-01  517. 199498    12.8     4.7     3066
6 1967-12-01  525. 199657    11.8     4.8     3018
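
As a quick check of the "equally spaced" property described above, the following sketch inspects the date column. It assumes the version of economics bundled with ggplot2, where each row is one month; the variable name gaps is introduced here for illustration only.

```r
library(ggplot2)

# The date column should be of class Date for ggplot2 to draw a time axis
class(economics$date)

# Number of days between consecutive observations (illustrative name: gaps)
gaps <- as.numeric(diff(economics$date))

# Monthly data: every gap should be one calendar month, i.e. 28 to 31 days
range(gaps)
```

If the gaps fell outside that range, the series would have missing months and the line plot would silently bridge them.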

Create a basic line plot of the population (pop) over time.

> # Basic line plot
> ggplot(data = economics, aes(x = date, y = pop))+
+ geom_line(color = "#00AFBB", size = 2)
[Figure: time series structure]
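
The default date axis can be adjusted with scale_x_date(). The sketch below is one possible variant of the plot above, not part of the original example: the break interval, the label format and the axis titles are illustrative choices, and linewidth assumes ggplot2 3.4.0 or later (earlier versions use size, as in the command above).

```r
library(ggplot2)

p <- ggplot(data = economics, aes(x = date, y = pop)) +
  # linewidth replaces size for lines in ggplot2 >= 3.4.0 (assumption)
  geom_line(color = "#00AFBB", linewidth = 1) +
  # One tick every 10 years, labelled with the four-digit year
  scale_x_date(date_breaks = "10 years", date_labels = "%Y") +
  labs(x = "Date", y = "Population (thousands)")

print(p)
```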

We can plot a subset of the data using the following commands:

> # Plot a subset of the data
> ss <- subset(economics, date > as.Date("2006-1-1"))
> ggplot(data = ss, aes(x = date, y = pop)) +
+ geom_line(color = "#FC4E07", size = 2)
[Figure: subset of data]
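
Note that date > as.Date("2006-1-1") is a strict comparison, so the January 2006 observation is excluded. A minimal sketch of a two-sided, inclusive window follows; the end date and the name ss_window are arbitrary choices for illustration.

```r
library(ggplot2)

# Keep observations from January 2006 through December 2010, inclusive
ss_window <- subset(
  economics,
  date >= as.Date("2006-01-01") & date <= as.Date("2010-12-31")
)

ggplot(data = ss_window, aes(x = date, y = pop)) +
  geom_line(color = "#FC4E07")
```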

Creating Time Series

Here we plot the variables psavert and uempmed against date. To do so, we first reshape the data with the tidyr package, collapsing the psavert and uempmed values into a single new column with the gather() function from tidyr. The key column of the result then serves as a grouping variable with the values psavert and uempmed.

> library(tidyr)
> library(dplyr)
Attaching package: ‘dplyr’
The following object is masked from ‘package:ggplot2’: vars
The following objects are masked from ‘package:stats’: filter, lag
The following objects are masked from ‘package:base’: intersect, setdiff, setequal, union
> df <- economics %>%
+    select(date, psavert, uempmed) %>%
+    gather(key = "variable", value = "value", -date)
> head(df, 3)
# A tibble: 3 x 3
  date       variable value
  <date>     <chr>    <dbl>
1 1967-07-01 psavert   12.6
2 1967-08-01 psavert   12.6
3 1967-09-01 psavert   11.9
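
In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(). The sketch below is an equivalent reshape under that version assumption; df_long is a name introduced here. The rows come out in a different order than with gather() (interleaved by date rather than stacked by variable), which does not affect the plot.

```r
library(ggplot2)  # provides the economics dataset
library(dplyr)
library(tidyr)

df_long <- economics %>%
  select(date, psavert, uempmed) %>%
  # Collapse the two measurement columns into name/value pairs
  pivot_longer(
    cols = c(psavert, uempmed),
    names_to = "variable",
    values_to = "value"
  )

head(df_long, 3)
```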

Create a plot with one line per variable using the following command, to compare “psavert” and “uempmed” over time:

> ggplot(df, aes(x = date, y = value)) +
+    geom_line(aes(color = variable), size = 1) +
+    scale_color_manual(values = c("#00AFBB", "#E7B800")) +
+    theme_minimal()
[Figure: multiple line plots]
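
Because psavert is a percentage and uempmed is measured in weeks, sharing one y-axis can make the comparison harder to read. One alternative, sketched here as a suggestion rather than as part of the original tutorial, is to give each variable its own panel with facet_wrap(). The reshaping step is repeated so the block runs on its own; p_facets is an illustrative name.

```r
library(ggplot2)
library(dplyr)
library(tidyr)

# Same long-format reshape as in the tutorial
df <- economics %>%
  select(date, psavert, uempmed) %>%
  gather(key = "variable", value = "value", -date)

p_facets <- ggplot(df, aes(x = date, y = value)) +
  geom_line(aes(color = variable)) +
  scale_color_manual(values = c("#00AFBB", "#E7B800")) +
  # One panel per variable, each with its own y-axis range
  facet_wrap(~ variable, ncol = 1, scales = "free_y") +
  theme_minimal()

print(p_facets)
```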