ggplot2 Concise Tutorial
ggplot2 - Time Series
A time series is a sequence of data points recorded in time order, usually at successive, equally spaced points in time, so it can be treated as discrete-time data. A time series plot displays such a sequence against time. The dataset used in this chapter is the “economics” dataset that ships with ggplot2, which contains several monthly US economic time series.
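To make the "equally spaced, discrete-time" idea concrete, the sketch below checks that consecutive dates in economics are exactly one calendar month apart, then wraps one column as a base-R ts object with 12 observations per year. It assumes only that ggplot2 is installed. The names d, expected, is_monthly and psavert_ts are illustrative.

```r
library(ggplot2)  # provides the economics data frame

d <- economics$date

# The dates we would expect if observations were taken every month without gaps
expected <- seq(d[1], by = "month", length.out = length(d))

# TRUE when every observation falls exactly one month after the previous one
is_monthly <- all(d == expected)
print(is_monthly)

# Represent one variable as a discrete-time series: 12 observations per year
start_year  <- as.integer(format(d[1], "%Y"))
start_month <- as.integer(format(d[1], "%m"))
psavert_ts  <- ts(economics$psavert, start = c(start_year, start_month), frequency = 12)
print(frequency(psavert_ts))
```

A ts object is not needed for ggplot2, which works directly with the date column; it is shown only to illustrate the discrete-time view of the data.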
The data frame contains the following variables −
Variable | Description
date | Month of data collection
psavert | Personal savings rate
pce | Personal consumption expenditures, in billions of dollars
unemploy | Number of unemployed, in thousands
uempmed | Median duration of unemployment, in weeks
pop | Total population, in thousands
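The variable names above are the exact column names used in the code that follows. A quick way to confirm them, and to see how each column is stored, is sketched below; it assumes ggplot2 is installed and uses only base R functions on the economics data frame.

```r
library(ggplot2)  # provides the economics data frame

# Column names in the order they are stored
print(names(economics))

# Storage class of each column: date is a Date, the others are numeric
col_classes <- vapply(economics, function(col) class(col)[1], character(1))
print(col_classes)
```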
Load the required packages and set the default theme before creating the time series plots.
> library(ggplot2)
> theme_set(theme_minimal())
> # Demo dataset
> head(economics)
# A tibble: 6 x 6
  date         pce    pop psavert uempmed unemploy
  <date>     <dbl>  <dbl>   <dbl>   <dbl>    <dbl>
1 1967-07-01  507. 198712    12.6     4.5     2944
2 1967-08-01  510. 198911    12.6     4.7     2945
3 1967-09-01  516. 199113    11.9     4.6     2958
4 1967-10-01  512. 199311    12.9     4.9     3143
5 1967-11-01  517. 199498    12.8     4.7     3066
6 1967-12-01  525. 199657    11.8     4.8     3018
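theme_set() changes the default theme for every plot created afterwards in the session, and it returns the previously active theme. The minimal sketch below keeps that return value so the original default can be restored when you are done; the names old_theme and p are illustrative.

```r
library(ggplot2)

# theme_set() returns the theme that was active before the call
old_theme <- theme_set(theme_minimal())

# Plots built now use theme_minimal() unless they add their own theme
p <- ggplot(economics, aes(x = date, y = pop)) + geom_line()

# Restore the previous default theme when finished
theme_set(old_theme)
```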
Create a basic line plot of a single variable (here pop) against date.
> # Basic line plot
> # (linewidth requires ggplot2 3.4.0 or later; older versions use size)
> ggplot(data = economics, aes(x = date, y = pop)) +
+   geom_line(color = "#00AFBB", linewidth = 2)
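Because date is stored as a Date, ggplot2 uses a date scale for the x-axis automatically. The sketch below shows one way to control the tick spacing and label format with scale_x_date(); the 10-year breaks, the "%Y" label format and the axis titles are illustrative choices, and linewidth assumes ggplot2 3.4.0 or later.

```r
library(ggplot2)

# Same basic line plot, with explicit control over the date axis
p <- ggplot(data = economics, aes(x = date, y = pop)) +
  geom_line(color = "#00AFBB", linewidth = 1) +
  scale_x_date(date_breaks = "10 years", date_labels = "%Y") +  # one tick per decade, year only
  labs(x = "Year", y = "Population (thousands)")

print(p)
```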

To plot only part of the series, subset the data first. The following commands keep the observations after 1 January 2006 −
> # Plot a subset of the data: rows dated after 2006-01-01
> ss <- subset(economics, date > as.Date("2006-01-01"))
> ggplot(data = ss, aes(x = date, y = pop)) +
+   geom_line(color = "#FC4E07", linewidth = 2)
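subset() is base R. The sketch below shows two alternatives, assuming dplyr is installed: dplyr's filter() selects the same rows, while coord_cartesian() keeps every row and only zooms the x-axis to the same period. The names cutoff, ss_filter and p_zoom are illustrative.

```r
library(ggplot2)
library(dplyr)

cutoff <- as.Date("2006-01-01")

# Same rows as subset(economics, date > cutoff)
ss_filter <- filter(economics, date > cutoff)

# Alternative: keep all the data but zoom the x-axis to the same window
p_zoom <- ggplot(data = economics, aes(x = date, y = pop)) +
  geom_line(color = "#FC4E07", linewidth = 2) +
  coord_cartesian(xlim = c(cutoff, max(economics$date)))

print(p_zoom)
```

Zooming with coord_cartesian() clips the drawn line at the panel edge, whereas setting limits on the scale itself would turn out-of-range values into missing values before plotting.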

Creating Time Series
Here we plot the variables psavert and uempmed against date on the same chart. This first requires reshaping the data with the tidyr package: the gather() function from tidyr collapses the psavert and uempmed values into a single new column (value) and adds a grouping column (variable) whose levels are psavert and uempmed.
> library(tidyr)
> library(dplyr)
Attaching package: ‘dplyr’
The following object is masked from ‘package:ggplot2’: vars
The following objects are masked from ‘package:stats’: filter, lag
The following objects are masked from ‘package:base’: intersect, setdiff, setequal, union
> df <- economics %>%
+ select(date, psavert, uempmed) %>%
+ gather(key = "variable", value = "value", -date)
> head(df, 3)
# A tibble: 3 x 3
  date       variable value
  <date>     <chr>    <dbl>
1 1967-07-01 psavert   12.6
2 1967-08-01 psavert   12.6
3 1967-09-01 psavert   11.9
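In current versions of tidyr, gather() is superseded by pivot_longer(). The sketch below produces the same long-format data with pivot_longer() and then turns the grouping column into an explicit factor with the two levels; df_long is an illustrative name.

```r
library(dplyr)
library(tidyr)
library(ggplot2)  # provides the economics data frame

# pivot_longer() stacks psavert and uempmed into one value column
df_long <- economics %>%
  select(date, psavert, uempmed) %>%
  pivot_longer(cols = -date, names_to = "variable", values_to = "value")

# Make the grouping variable an explicit factor with two levels
df_long$variable <- factor(df_long$variable, levels = c("psavert", "uempmed"))

print(head(df_long, 4))
```

The row order differs from gather(): pivot_longer() interleaves the two variables date by date, while gather() stacks all psavert rows before all uempmed rows. The plots below are unaffected, because ggplot2 groups the lines by variable.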
Use the following command to draw one line per variable and compare how “psavert” and “uempmed” change over time −
> # One line per level of variable, colored manually
> ggplot(df, aes(x = date, y = value)) +
+   geom_line(aes(color = variable), linewidth = 1) +
+   scale_color_manual(values = c("#00AFBB", "#E7B800")) +
+   theme_minimal()
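psavert is a percentage and uempmed is measured in weeks, so a single shared y-axis can make one series hard to read. One option, sketched below, is to give each variable its own panel with facet_wrap(scales = "free_y"). It rebuilds df exactly as above, and the legend is hidden on the assumption that the panel strips already label each series.

```r
library(ggplot2)
library(dplyr)
library(tidyr)

df <- economics %>%
  select(date, psavert, uempmed) %>%
  gather(key = "variable", value = "value", -date)

# One panel per variable, each with its own y-axis range
p_facet <- ggplot(df, aes(x = date, y = value, color = variable)) +
  geom_line(linewidth = 1) +
  scale_color_manual(values = c("#00AFBB", "#E7B800")) +
  facet_wrap(~ variable, ncol = 1, scales = "free_y") +
  theme_minimal() +
  theme(legend.position = "none")

print(p_facet)
```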
