Python Pandas Concise Tutorial
Python Pandas - Concatenation
Pandas provides various facilities for easily combining Series and DataFrame objects. Older releases also supported Panel objects, but Panel was removed in pandas 0.25.
pd.concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False)
- objs − A sequence or mapping of Series or DataFrame objects (older releases also accepted Panel objects).
- axis − {0, 1, …}, default 0. The axis to concatenate along.
- join − {'inner', 'outer'}, default 'outer'. How to handle the indexes on the other axis (or axes): 'outer' takes their union and 'inner' their intersection.
- ignore_index − boolean, default False. If True, do not use the index values on the concatenation axis. The resulting axis will be labeled 0, …, n - 1.
- join_axes − A list of Index objects. Specific indexes to use for the other (n-1) axes instead of performing inner/outer set logic. This parameter was deprecated in pandas 0.25 and removed in 1.0; on current versions, reindex the result instead.
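The effect of join is easiest to see when the inputs share only some of their columns. The following is a minimal sketch, assuming two small made-up frames (the names left and right and their contents are illustrative and not part of this tutorial's data):

```python
import pandas as pd

# Hypothetical inputs: both have 'Name', but each has one column the other lacks.
left = pd.DataFrame({'Name': ['Alex', 'Amy'], 'Marks_scored': [98, 90]})
right = pd.DataFrame({'Name': ['Billy', 'Brian'], 'subject_id': ['sub2', 'sub4']})

# join='outer' (the default) keeps the union of the columns; gaps are filled with NaN.
outer = pd.concat([left, right], join='outer', ignore_index=True)

# join='inner' keeps only the columns that every input has.
inner = pd.concat([left, right], join='inner', ignore_index=True)

print(outer)
print(inner)
```

Because ignore_index=True is passed, both results are labeled 0 to 3 rather than repeating the labels of the inputs.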
Concatenating Objects
The concat function does all of the heavy lifting of performing concatenation operations along an axis. Let us create different objects and concatenate them.
import pandas as pd
# Two DataFrames with the same columns and the same index labels (1 to 5)
one = pd.DataFrame({
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
    'Marks_scored': [98, 90, 87, 69, 78]},
    index=[1, 2, 3, 4, 5])
two = pd.DataFrame({
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
    'Marks_scored': [89, 80, 79, 97, 88]},
    index=[1, 2, 3, 4, 5])
# Stack the rows of two below the rows of one (axis=0 is the default)
print(pd.concat([one, two]))
Its output is as follows −
   Marks_scored    Name  subject_id
1            98    Alex        sub1
2            90     Amy        sub2
3            87   Allen        sub4
4            69   Alice        sub6
5            78  Ayoung        sub5
1            89   Billy        sub2
2            80   Brian        sub4
3            79    Bran        sub3
4            97   Bryce        sub6
5            88   Betty        sub5
Suppose we want to associate a specific key with each of the pieces being concatenated. We can do this by using the keys argument −
import pandas as pd
one = pd.DataFrame({
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
    'Marks_scored': [98, 90, 87, 69, 78]},
    index=[1, 2, 3, 4, 5])
two = pd.DataFrame({
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
    'Marks_scored': [89, 80, 79, 97, 88]},
    index=[1, 2, 3, 4, 5])
# Label the rows that come from one with 'x' and those from two with 'y'
print(pd.concat([one, two], keys=['x', 'y']))
Its output is as follows −
     Marks_scored    Name  subject_id
x 1            98    Alex        sub1
  2            90     Amy        sub2
  3            87   Allen        sub4
  4            69   Alice        sub6
  5            78  Ayoung        sub5
y 1            89   Billy        sub2
  2            80   Brian        sub4
  3            79    Bran        sub3
  4            97   Bryce        sub6
  5            88   Betty        sub5
The original index labels are duplicated in the result; each label from 1 to 5 appears twice, once under x and once under y.
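When keys is given, the keys form the outer level of a two-level (hierarchical) index, so each piece can be selected again by its key. A minimal sketch, assuming shortened two-row versions of one and two (the trimmed data and the names combined and piece_y are illustrative):

```python
import pandas as pd

# Shortened, illustrative versions of the tutorial's frames.
one = pd.DataFrame({'Name': ['Alex', 'Amy'], 'Marks_scored': [98, 90]}, index=[1, 2])
two = pd.DataFrame({'Name': ['Billy', 'Brian'], 'Marks_scored': [89, 80]}, index=[1, 2])

combined = pd.concat([one, two], keys=['x', 'y'])

# The keys become the outer index level; the original labels form the inner level.
print(combined.index.nlevels)

# Selecting by key returns the rows that came from that input.
piece_y = combined.loc['y']
print(piece_y)

# A (key, label) pair addresses a single row.
print(combined.loc[('x', 2), 'Name'])
```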
If the result should get a fresh index of its own, set ignore_index to True.
import pandas as pd
one = pd.DataFrame({
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
    'Marks_scored': [98, 90, 87, 69, 78]},
    index=[1, 2, 3, 4, 5])
two = pd.DataFrame({
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
    'Marks_scored': [89, 80, 79, 97, 88]},
    index=[1, 2, 3, 4, 5])
# ignore_index=True relabels the rows 0..n-1, so the keys have no effect here
print(pd.concat([one, two], keys=['x', 'y'], ignore_index=True))
Its output is as follows −
   Marks_scored    Name  subject_id
0            98    Alex        sub1
1            90     Amy        sub2
2            87   Allen        sub4
3            69   Alice        sub6
4            78  Ayoung        sub5
5            89   Billy        sub2
6            80   Brian        sub4
7            79    Bran        sub3
8            97   Bryce        sub6
9            88   Betty        sub5
Observe that the index is replaced completely and the keys are discarded as well.
If two objects are concatenated along axis=1, the columns of the second object are appended to the right of the first.
import pandas as pd
one = pd.DataFrame({
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
    'Marks_scored': [98, 90, 87, 69, 78]},
    index=[1, 2, 3, 4, 5])
two = pd.DataFrame({
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
    'Marks_scored': [89, 80, 79, 97, 88]},
    index=[1, 2, 3, 4, 5])
# axis=1 places the frames side by side, matching rows by index label
print(pd.concat([one, two], axis=1))
Its output is as follows −
   Marks_scored    Name  subject_id  Marks_scored   Name  subject_id
1            98    Alex        sub1            89  Billy        sub2
2            90     Amy        sub2            80  Brian        sub4
3            87   Allen        sub4            79   Bran        sub3
4            69   Alice        sub6            97  Bryce        sub6
5            78  Ayoung        sub5            88  Betty        sub5
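In the example above both frames carry the same index labels, so every row finds a partner. When the labels only partly overlap, concatenation along axis=1 aligns rows by label and fills the gaps with NaN. A minimal sketch, assuming two small made-up frames (marks, subjects, and their labels are illustrative):

```python
import pandas as pd

# Hypothetical inputs whose index labels overlap only on 2 and 3.
marks = pd.DataFrame({'Marks_scored': [98, 90, 87]}, index=[1, 2, 3])
subjects = pd.DataFrame({'subject_id': ['sub2', 'sub4', 'sub3']}, index=[2, 3, 4])

# Rows are matched by index label; a label missing from one input yields NaN there.
side_by_side = pd.concat([marks, subjects], axis=1)

# join='inner' keeps only the labels present in both inputs.
common = pd.concat([marks, subjects], axis=1, join='inner')

print(side_by_side)
print(common)
```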
Concatenating Using append
A useful shortcut to concat is the append instance method on Series and DataFrame. These methods actually predate concat. They concatenate along axis=0, namely the index. Note that append was deprecated in pandas 1.4 and removed in pandas 2.0, so the following examples require an older version −
import pandas as pd
one = pd.DataFrame({
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
    'Marks_scored': [98, 90, 87, 69, 78]},
    index=[1, 2, 3, 4, 5])
two = pd.DataFrame({
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
    'Marks_scored': [89, 80, 79, 97, 88]},
    index=[1, 2, 3, 4, 5])
# Requires pandas older than 2.0, where DataFrame.append still exists
print(one.append(two))
Its output is as follows −
   Marks_scored    Name  subject_id
1            98    Alex        sub1
2            90     Amy        sub2
3            87   Allen        sub4
4            69   Alice        sub6
5            78  Ayoung        sub5
1            89   Billy        sub2
2            80   Brian        sub4
3            79    Bran        sub3
4            97   Bryce        sub6
5            88   Betty        sub5
The append method can take multiple objects as well −
import pandas as pd
one = pd.DataFrame({
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
    'Marks_scored': [98, 90, 87, 69, 78]},
    index=[1, 2, 3, 4, 5])
two = pd.DataFrame({
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
    'Marks_scored': [89, 80, 79, 97, 88]},
    index=[1, 2, 3, 4, 5])
# Requires pandas older than 2.0; appends two, one, and two again below one
print(one.append([two, one, two]))
Its output is as follows −
   Marks_scored    Name  subject_id
1            98    Alex        sub1
2            90     Amy        sub2
3            87   Allen        sub4
4            69   Alice        sub6
5            78  Ayoung        sub5
1            89   Billy        sub2
2            80   Brian        sub4
3            79    Bran        sub3
4            97   Bryce        sub6
5            88   Betty        sub5
1            98    Alex        sub1
2            90     Amy        sub2
3            87   Allen        sub4
4            69   Alice        sub6
5            78  Ayoung        sub5
1            89   Billy        sub2
2            80   Brian        sub4
3            79    Bran        sub3
4            97   Bryce        sub6
5            88   Betty        sub5
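DataFrame.append and Series.append were deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions the same results are obtained with pd.concat. A minimal sketch of the two equivalents, assuming shortened two-row versions of one and two (the trimmed data and the result names are illustrative):

```python
import pandas as pd

# Shortened, illustrative versions of the tutorial's frames.
one = pd.DataFrame({'Name': ['Alex', 'Amy'], 'Marks_scored': [98, 90]}, index=[1, 2])
two = pd.DataFrame({'Name': ['Billy', 'Brian'], 'Marks_scored': [89, 80]}, index=[1, 2])

# Counterpart of one.append(two)
appended = pd.concat([one, two])

# Counterpart of one.append([two, one, two])
appended_many = pd.concat([one, two, one, two])

print(appended)
print(appended_many)
```

As with append, the original index labels are kept, so they repeat in the result unless ignore_index=True is passed.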
Time Series
Pandas provides robust tools for working with time series data, which is especially common in the financial sector. While working with time series data, we frequently come across the following tasks −
- Generating a sequence of times
- Converting a time series to a different frequency
Pandas provides a relatively compact and self-contained set of tools for performing the above tasks.
Get Current Time
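pd.Timestamp.now() gives you the current date and time. Older tutorials use pd.datetime.now(), but pd.datetime (an alias of the standard library's datetime class) was deprecated in pandas 1.0 and is no longer available in recent releases.
import pandas as pd
print(pd.Timestamp.now())
Its output is as follows −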
2017-05-11 06:10:13.393147
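The current time can also be requested in a specific time zone or reduced to the current date. A minimal sketch, assuming the variable names below (they are illustrative); the printed values depend on when the code is run:

```python
import pandas as pd

# Current local time, without time zone information.
now_local = pd.Timestamp.now()

# Current time as a time-zone-aware Timestamp in UTC.
now_utc = pd.Timestamp.now(tz="UTC")

# normalize() resets the time of day to midnight, leaving only the date part.
today = now_local.normalize()

print(now_local)
print(now_utc)
print(today)
```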
Create a Timestamp
Time-stamped data is the most basic type of time series data; it associates values with points in time. In pandas, a single point in time is represented by a Timestamp object. Let's take an example −
import pandas as pd
print(pd.Timestamp('2017-03-01'))
Its output is as follows −
2017-03-01 00:00:00
It is also possible to convert integer or float epoch times. The default unit for these is nanoseconds (since that is how Timestamps are stored). However, epochs are often stored in another unit, which can be specified with the unit argument. Let's take another example −
import pandas as pd
# Interpret the integer as seconds since 1970-01-01 00:00:00 UTC
print(pd.Timestamp(1587687255, unit='s'))
Its output is as follows −
2020-04-24 00:14:15
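The unit matters a great deal: the same integer read in the wrong unit lands on a completely different date. A minimal sketch using the epoch value from the example above (the variable names are illustrative):

```python
import pandas as pd

seconds = 1587687255

# The same instant expressed in seconds and in milliseconds.
from_seconds = pd.Timestamp(seconds, unit='s')
from_millis = pd.Timestamp(seconds * 1000, unit='ms')

# Without a unit the integer is read as nanoseconds since the epoch,
# which places it less than two seconds after 1970-01-01 00:00:00.
from_nanos = pd.Timestamp(seconds)

print(from_seconds)
print(from_millis)
print(from_nanos)
```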
Create a Range of Time
import pandas as pd
# Times from 11:00 to 13:30 in 30-minute steps; .time keeps only the time of day
print(pd.date_range("11:00", "13:30", freq="30min").time)
Its output is as follows −
[datetime.time(11, 0) datetime.time(11, 30) datetime.time(12, 0)
datetime.time(12, 30) datetime.time(13, 0) datetime.time(13, 30)]
Change the Frequency of Time
import pandas as pd
# Same range in hourly steps; pandas 2.2 and later prefer the lowercase alias "h"
print(pd.date_range("11:00", "13:30", freq="H").time)
Its output is as follows −
[datetime.time(11, 0) datetime.time(12, 0) datetime.time(13, 0)]
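date_range builds evenly spaced points from a start, plus either an end or a number of periods, at a given frequency. The sketch below fixes a date so the results are reproducible (the date 2017-03-01 and the variable names are illustrative assumptions) and writes the hourly step as "60min", since pandas 2.2 deprecated the uppercase "H" alias in favor of "h":

```python
import pandas as pd

# Half-hourly points between two times on a fixed, illustrative date.
half_hourly = pd.date_range("2017-03-01 11:00", "2017-03-01 13:30", freq="30min")

# Same window in hourly steps; 13:30 is not on the hourly grid, so the range stops at 13:00.
hourly = pd.date_range("2017-03-01 11:00", "2017-03-01 13:30", freq="60min")

# Instead of an end point, ask for a fixed number of periods.
six_points = pd.date_range("2017-03-01 11:00", periods=6, freq="30min")

print(half_hourly)
print(hourly)
print(six_points)
```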
Converting to Timestamps
To convert a Series or list-like object of date-like objects, for example strings, epochs, or a mixture, you can use the to_datetime function. When passed a Series, it returns a Series (with the same index), while a list-like is converted to a DatetimeIndex. Take a look at the following example −
import pandas as pd
# The strings use different formats; pandas 2.0 and later need format='mixed' for that
# (omit the argument on older versions, which do not accept it)
print(pd.to_datetime(pd.Series(['Jul 31, 2009', '2010-01-10', None]), format='mixed'))
Its output is as follows −
0   2009-07-31
1   2010-01-10
2          NaT
dtype: datetime64[ns]
NaT means Not a Time; it is the datetime equivalent of NaN.
Let's take another example.
import pandas as pd
# A plain list is converted to a DatetimeIndex; format='mixed' is needed on pandas 2.0 and later
print(pd.to_datetime(['2005/11/23', '2010.12.31', None], format='mixed'))
Its output is as follows −
DatetimeIndex(['2005-11-23', '2010-12-31', 'NaT'], dtype='datetime64[ns]', freq=None)
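Real data often contains entries that are not dates at all, or dates whose day and month order is ambiguous. A minimal sketch of two to_datetime options that help here, assuming made-up input values (raw, parsed, and explicit are illustrative names):

```python
import pandas as pd

# Hypothetical input with one valid date, one invalid string, and one missing value.
raw = pd.Series(['2009-07-31', 'not a date', None])

# errors='coerce' turns entries that cannot be parsed into NaT instead of raising an error.
parsed = pd.to_datetime(raw, errors='coerce')

# An explicit format removes ambiguity, here day/month/year.
explicit = pd.to_datetime('23/11/2005', format='%d/%m/%Y')

print(parsed)
print(explicit)
```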