Pandas Chinese Reference Guide
Time deltas
Timedeltas are differences in times, expressed in units such as days, hours, minutes, and seconds. They can be either positive or negative.
Timedelta is a subclass of datetime.timedelta and behaves in a similar manner. It is also compatible with np.timedelta64 types, and it adds a host of custom representation, parsing, and attributes.
Parsing
You can construct a Timedelta scalar from various arguments, including ISO 8601 Duration strings.
In [1]: import datetime
# strings
In [2]: pd.Timedelta("1 days")
Out[2]: Timedelta('1 days 00:00:00')
In [3]: pd.Timedelta("1 days 00:00:00")
Out[3]: Timedelta('1 days 00:00:00')
In [4]: pd.Timedelta("1 days 2 hours")
Out[4]: Timedelta('1 days 02:00:00')
In [5]: pd.Timedelta("-1 days 2 min 3us")
Out[5]: Timedelta('-2 days +23:57:59.999997')
# like datetime.timedelta
# note: these MUST be specified as keyword arguments
In [6]: pd.Timedelta(days=1, seconds=1)
Out[6]: Timedelta('1 days 00:00:01')
# integers with a unit
In [7]: pd.Timedelta(1, unit="d")
Out[7]: Timedelta('1 days 00:00:00')
# from a datetime.timedelta/np.timedelta64
In [8]: pd.Timedelta(datetime.timedelta(days=1, seconds=1))
Out[8]: Timedelta('1 days 00:00:01')
In [9]: pd.Timedelta(np.timedelta64(1, "ms"))
Out[9]: Timedelta('0 days 00:00:00.001000')
# negative Timedeltas have this string repr
# to be more consistent with datetime.timedelta conventions
In [10]: pd.Timedelta("-1us")
Out[10]: Timedelta('-1 days +23:59:59.999999')
# a NaT
In [11]: pd.Timedelta("nan")
Out[11]: NaT
In [12]: pd.Timedelta("nat")
Out[12]: NaT
# ISO 8601 Duration strings
In [13]: pd.Timedelta("P0DT0H1M0S")
Out[13]: Timedelta('0 days 00:01:00')
In [14]: pd.Timedelta("P0DT0H0M0.000000123S")
Out[14]: Timedelta('0 days 00:00:00.000000123')
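To show that these construction paths are interchangeable, the sketch below builds one duration five different ways and checks that all the results are equal. It also confirms that the result is a datetime.timedelta. It assumes pandas and NumPy are installed. The list name `candidates` is only illustrative.

```python
import datetime

import numpy as np
import pandas as pd

# Five spellings of the same duration: 1 day, 2 hours, 3 seconds (93603 s)
candidates = [
    pd.Timedelta("1 days 2 hours 3 seconds"),                     # free-form string
    pd.Timedelta(days=1, hours=2, seconds=3),                      # keyword arguments
    pd.Timedelta("P1DT2H0M3S"),                                    # ISO 8601 duration
    pd.Timedelta(datetime.timedelta(days=1, hours=2, seconds=3)),  # stdlib object
    pd.Timedelta(np.timedelta64(93603, "s")),                      # NumPy scalar
]

# Every construction path should produce the same value
all_equal = all(td == candidates[0] for td in candidates)

# Timedelta subclasses datetime.timedelta, so stdlib code accepts it as-is
is_stdlib_timedelta = isinstance(candidates[0], datetime.timedelta)
print(all_equal, is_stdlib_timedelta)
```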
DateOffsets (Day, Hour, Minute, Second, Milli, Micro, Nano) can also be used in construction.
In [15]: pd.Timedelta(pd.offsets.Second(2))
Out[15]: Timedelta('0 days 00:00:02')
Operations among these scalars yield another scalar Timedelta.
In [16]: pd.Timedelta(pd.offsets.Day(2)) + pd.Timedelta(pd.offsets.Second(2)) + pd.Timedelta(
....: "00:00:00.000123"
....: )
....:
Out[16]: Timedelta('2 days 00:00:02.000123')
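The sketch below illustrates both points: a fixed-length offset converts to the equivalent Timedelta, and a sum of Timedelta scalars is still a Timedelta. It assumes pandas is installed. It uses Hour(48) instead of Day(2) only to stay with a fixed-length offset. The variable names are illustrative.

```python
import pandas as pd

# A fixed-length (Tick) offset converts to the equivalent Timedelta
two_days = pd.Timedelta(pd.offsets.Hour(48))

# Arithmetic among Timedelta scalars stays a Timedelta
total = two_days + pd.Timedelta(pd.offsets.Second(2)) + pd.Timedelta("00:00:00.000123")
print(type(total).__name__, total)
```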
to_timedelta
Using the top-level pd.to_timedelta, you can convert a scalar, array, list, or Series from a recognized timedelta format or value into a Timedelta type. It returns a Series if the input is a Series, a scalar if the input is scalar-like, and a TimedeltaIndex otherwise.
You can parse a single string to a Timedelta:
In [17]: pd.to_timedelta("1 days 06:05:01.00003")
Out[17]: Timedelta('1 days 06:05:01.000030')
In [18]: pd.to_timedelta("15.5us")
Out[18]: Timedelta('0 days 00:00:00.000015500')
or a list or array of strings:
In [19]: pd.to_timedelta(["1 days 06:05:01.00003", "15.5us", "nan"])
Out[19]: TimedeltaIndex(['1 days 06:05:01.000030', '0 days 00:00:00.000015500', NaT], dtype='timedelta64[ns]', freq=None)
If the input is numeric, the unit keyword argument specifies the unit of the Timedelta:
In [20]: pd.to_timedelta(np.arange(5), unit="s")
Out[20]:
TimedeltaIndex(['0 days 00:00:00', '0 days 00:00:01', '0 days 00:00:02',
'0 days 00:00:03', '0 days 00:00:04'],
dtype='timedelta64[ns]', freq=None)
In [21]: pd.to_timedelta(np.arange(5), unit="d")
Out[21]: TimedeltaIndex(['0 days', '1 days', '2 days', '3 days', '4 days'], dtype='timedelta64[ns]', freq=None)
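The return type of pd.to_timedelta follows the shape of its input. The sketch below checks all three cases. It assumes pandas is installed, and the variable names are illustrative.

```python
import pandas as pd

scalar = pd.to_timedelta("1 days 06:05:01.00003")           # scalar-like -> Timedelta
index = pd.to_timedelta(["1 days", "15.5us", "nan"])        # list -> TimedeltaIndex
series = pd.to_timedelta(pd.Series(["1 days", "2 days"]))   # Series -> Series

print(type(scalar).__name__, type(index).__name__, type(series).__name__)
```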
Warning
If a string or array of strings is passed as input, the unit keyword argument is ignored. If a string without units is passed, the default unit of nanoseconds is assumed.
Timedelta limitations
pandas represents Timedeltas in nanosecond resolution using 64-bit integers. As a result, the limits of a 64-bit integer determine the Timedelta limits.
In [22]: pd.Timedelta.min
Out[22]: Timedelta('-106752 days +00:12:43.145224193')
In [23]: pd.Timedelta.max
Out[23]: Timedelta('106751 days 23:47:16.854775807')
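The sketch below makes the link to 64-bit integers concrete. It reads the underlying nanosecond counts through `.value` and compares them with the int64 range. The minimum is one nanosecond above the int64 minimum because pandas reserves -2**63 as its NaT sentinel. It assumes pandas is installed.

```python
import pandas as pd

# The bounds are int64 nanosecond counts; -2**63 itself is reserved for NaT,
# which is why the range is symmetric
max_ns = pd.Timedelta.max.value
min_ns = pd.Timedelta.min.value
print(max_ns == 2**63 - 1, min_ns == -(2**63 - 1))

# Expressed in years, the span is roughly +/- 292 years
print(pd.Timedelta.max.days / 365.25)
```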
Operations
You can operate on Series and DataFrames, and you can construct timedelta64[ns] Series by subtracting datetime64[ns] Series or Timestamps.
In [24]: s = pd.Series(pd.date_range("2012-1-1", periods=3, freq="D"))
In [25]: td = pd.Series([pd.Timedelta(days=i) for i in range(3)])
In [26]: df = pd.DataFrame({"A": s, "B": td})
In [27]: df
Out[27]:
A B
0 2012-01-01 0 days
1 2012-01-02 1 days
2 2012-01-03 2 days
In [28]: df["C"] = df["A"] + df["B"]
In [29]: df
Out[29]:
A B C
0 2012-01-01 0 days 2012-01-01
1 2012-01-02 1 days 2012-01-03
2 2012-01-03 2 days 2012-01-05
In [30]: df.dtypes
Out[30]:
A datetime64[ns]
B timedelta64[ns]
C datetime64[ns]
dtype: object
In [31]: s - s.max()
Out[31]:
0 -2 days
1 -1 days
2 0 days
dtype: timedelta64[ns]
In [32]: s - datetime.datetime(2011, 1, 1, 3, 5)
Out[32]:
0 364 days 20:55:00
1 365 days 20:55:00
2 366 days 20:55:00
dtype: timedelta64[ns]
In [33]: s + datetime.timedelta(minutes=5)
Out[33]:
0 2012-01-01 00:05:00
1 2012-01-02 00:05:00
2 2012-01-03 00:05:00
dtype: datetime64[ns]
In [34]: s + pd.offsets.Minute(5)
Out[34]:
0 2012-01-01 00:05:00
1 2012-01-02 00:05:00
2 2012-01-03 00:05:00
dtype: datetime64[ns]
In [35]: s + pd.offsets.Minute(5) + pd.offsets.Milli(5)
Out[35]:
0 2012-01-01 00:05:00.005
1 2012-01-02 00:05:00.005
2 2012-01-03 00:05:00.005
dtype: datetime64[ns]
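The rule behind these outputs is that datetime minus datetime gives a timedelta, while datetime plus timedelta gives a datetime. Reversing the operands of a subtraction only flips the sign. The sketch below checks these facts through the dtype kind codes ("m" for timedelta64, "M" for datetime64). It assumes pandas is installed, and the variable names are illustrative.

```python
import pandas as pd

s = pd.Series(pd.date_range("2012-01-01", periods=3, freq="D"))

elapsed = s - s.iloc[0]              # datetime - datetime -> timedelta
shifted = s + pd.Timedelta(hours=6)  # datetime + timedelta -> datetime

# Swapping the operands of a subtraction negates the result
reverse_matches = bool((s.max() - s == -(s - s.max())).all())

print(elapsed.dtype.kind, shifted.dtype.kind, reverse_matches)
```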
Operations with scalars from a timedelta64[ns] Series:
In [36]: y = s - s[0]
In [37]: y
Out[37]:
0 0 days
1 1 days
2 2 days
dtype: timedelta64[ns]
Series of timedeltas with NaT values are supported:
In [38]: y = s - s.shift()
In [39]: y
Out[39]:
0 NaT
1 1 days
2 1 days
dtype: timedelta64[ns]
As with datetimes, elements can be set to NaT using np.nan:
In [40]: y[1] = np.nan
In [41]: y
Out[41]:
0 NaT
1 NaT
2 1 days
dtype: timedelta64[ns]
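NaT behaves as a missing value in later operations as well. The sketch below checks that assigning np.nan keeps the timedelta dtype and that NaT propagates through arithmetic. It assumes pandas and NumPy are installed, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series(pd.date_range("2012-01-01", periods=3, freq="D"))

gaps = s - s.shift()      # the first element has no predecessor -> NaT
gaps.iloc[2] = np.nan     # np.nan is stored as NaT; the dtype stays timedelta64
plus_hour = gaps + pd.Timedelta(hours=1)   # NaT propagates through arithmetic

print(gaps.isna().tolist(), plus_hour.isna().tolist())
```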
Operands can also appear in reversed order (a single object operated on with a Series):
In [42]: s.max() - s
Out[42]:
0 2 days
1 1 days
2 0 days
dtype: timedelta64[ns]
In [43]: datetime.datetime(2011, 1, 1, 3, 5) - s
Out[43]:
0 -365 days +03:05:00
1 -366 days +03:05:00
2 -367 days +03:05:00
dtype: timedelta64[ns]
In [44]: datetime.timedelta(minutes=5) + s
Out[44]:
0 2012-01-01 00:05:00
1 2012-01-02 00:05:00
2 2012-01-03 00:05:00
dtype: datetime64[ns]
min, max, and the corresponding idxmin and idxmax operations are supported on DataFrames:
In [45]: A = s - pd.Timestamp("20120101") - pd.Timedelta("00:05:05")
In [46]: B = s - pd.Series(pd.date_range("2012-1-2", periods=3, freq="D"))
In [47]: df = pd.DataFrame({"A": A, "B": B})
In [48]: df
Out[48]:
A B
0 -1 days +23:54:55 -1 days
1 0 days 23:54:55 -1 days
2 1 days 23:54:55 -1 days
In [49]: df.min()
Out[49]:
A -1 days +23:54:55
B -1 days +00:00:00
dtype: timedelta64[ns]
In [50]: df.min(axis=1)
Out[50]:
0 -1 days
1 -1 days
2 -1 days
dtype: timedelta64[ns]
In [51]: df.idxmin()
Out[51]:
A 0
B 0
dtype: int64
In [52]: df.idxmax()
Out[52]:
A 2
B 0
dtype: int64
min, max, idxmin, and idxmax operations are supported on Series as well. A scalar result will be a Timedelta.
In [53]: df.min().max()
Out[53]: Timedelta('-1 days +23:54:55')
In [54]: df.min(axis=1).min()
Out[54]: Timedelta('-1 days +00:00:00')
In [55]: df.min().idxmax()
Out[55]: 'A'
In [56]: df.min(axis=1).idxmin()
Out[56]: 0
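The sketch below rebuilds the frame from above and checks the result types. A reduction down to a single value gives a Timedelta, while idx* methods give index labels. It assumes pandas is installed, and the variable names are illustrative.

```python
import pandas as pd

s = pd.Series(pd.date_range("2012-01-01", periods=3, freq="D"))
df = pd.DataFrame(
    {
        "A": s - pd.Timestamp("2012-01-01") - pd.Timedelta("00:05:05"),
        "B": s - pd.Series(pd.date_range("2012-01-02", periods=3, freq="D")),
    }
)

smallest_of_column_minima = df.min().max()   # reduces to a Timedelta scalar
label_of_largest_a = df["A"].idxmax()        # an index label, not a duration
print(type(smallest_of_column_minima).__name__, smallest_of_column_minima, label_of_largest_a)
```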
You can use fillna on timedeltas, passing a timedelta as the fill value:
In [57]: y.fillna(pd.Timedelta(0))
Out[57]:
0 0 days
1 0 days
2 1 days
dtype: timedelta64[ns]
In [58]: y.fillna(pd.Timedelta(10, unit="s"))
Out[58]:
0 0 days 00:00:10
1 0 days 00:00:10
2 1 days 00:00:00
dtype: timedelta64[ns]
In [59]: y.fillna(pd.Timedelta("-1 days, 00:00:05"))
Out[59]:
0 -1 days +00:00:05
1 -1 days +00:00:05
2 1 days 00:00:00
dtype: timedelta64[ns]
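The fill value can also be computed from the data. The sketch below fills the gap with the Series' own median, which skips NaT. The result is still a timedelta64 Series. It assumes pandas is installed, and the variable names are illustrative.

```python
import pandas as pd

y = pd.Series(pd.to_timedelta(["1 days", "nat", "3 days"]))

filled_zero = y.fillna(pd.Timedelta(0))   # constant fill
filled_median = y.fillna(y.median())      # data-driven fill; the median skips NaT

print(filled_zero.tolist(), filled_median.tolist())
```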
You can also negate, multiply, and use abs on Timedeltas:
In [60]: td1 = pd.Timedelta("-1 days 2 hours 3 seconds")
In [61]: td1
Out[61]: Timedelta('-2 days +21:59:57')
In [62]: -1 * td1
Out[62]: Timedelta('1 days 02:00:03')
In [63]: -td1
Out[63]: Timedelta('1 days 02:00:03')
In [64]: abs(td1)
Out[64]: Timedelta('1 days 02:00:03')
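The same operations work element-wise on a timedelta Series. The sketch below applies abs, negation, and scaling. It assumes pandas is installed, and the variable names are illustrative.

```python
import pandas as pd

deltas = pd.Series(pd.to_timedelta(["-1 days 2 hours 3 seconds", "3 hours"]))

magnitudes = deltas.abs()   # element-wise absolute value
negated = -deltas           # element-wise negation
doubled = deltas * 2        # integer scaling keeps the timedelta dtype

print(magnitudes.tolist(), negated.tolist(), doubled.tolist())
```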
Reductions
Numeric reduction operations on timedelta64[ns] return Timedelta objects. As usual, NaT values are skipped during evaluation.
In [65]: y2 = pd.Series(
....: pd.to_timedelta(["-1 days +00:00:05", "nat", "-1 days +00:00:05", "1 days"])
....: )
....:
In [66]: y2
Out[66]:
0 -1 days +00:00:05
1 NaT
2 -1 days +00:00:05
3 1 days 00:00:00
dtype: timedelta64[ns]
In [67]: y2.mean()
Out[67]: Timedelta('-1 days +16:00:03.333333334')
In [68]: y2.median()
Out[68]: Timedelta('-1 days +00:00:05')
In [69]: y2.quantile(0.1)
Out[69]: Timedelta('-1 days +00:00:05')
In [70]: y2.sum()
Out[70]: Timedelta('-1 days +00:00:10')
Frequency conversion
A Timedelta Series, a TimedeltaIndex, and a Timedelta can be converted to other resolutions by casting (astype) to a specific timedelta dtype.
In [71]: december = pd.Series(pd.date_range("20121201", periods=4))
In [72]: january = pd.Series(pd.date_range("20130101", periods=4))
In [73]: td = january - december
In [74]: td[2] += datetime.timedelta(minutes=5, seconds=3)
In [75]: td[3] = np.nan
In [76]: td
Out[76]:
0 31 days 00:00:00
1 31 days 00:00:00
2 31 days 00:05:03
3 NaT
dtype: timedelta64[ns]
# to seconds
In [77]: td.astype("timedelta64[s]")
Out[77]:
0 31 days 00:00:00
1 31 days 00:00:00
2 31 days 00:05:03
3 NaT
dtype: timedelta64[s]
For timedelta64 resolutions other than the supported "s", "ms", "us", and "ns", an alternative is to divide by another timedelta object. Note that division by the NumPy scalar is true division, while astyping is equivalent to floor division.
# to days
In [78]: td / np.timedelta64(1, "D")
Out[78]:
0 31.000000
1 31.000000
2 31.003507
3 NaN
dtype: float64
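The difference between true division and floor division is easiest to see side by side. The sketch below expresses the same durations in fractional days and in whole days. It assumes pandas and NumPy are installed, and the variable names are illustrative. No NaT is included, so the floor-division result keeps an integer dtype.

```python
import numpy as np
import pandas as pd

td = pd.Series(pd.to_timedelta(["31 days", "31 days 00:05:03"]))

as_float_days = td / np.timedelta64(1, "D")   # true division -> float64
as_whole_days = td // pd.Timedelta(days=1)    # floor division -> whole days

print(as_float_days.round(6).tolist(), as_whole_days.tolist())
```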
Dividing or multiplying a timedelta64[ns] Series by an integer or an integer Series yields another timedelta64[ns] Series.
In [79]: td * -1
Out[79]:
0 -31 days +00:00:00
1 -31 days +00:00:00
2 -32 days +23:54:57
3 NaT
dtype: timedelta64[ns]
In [80]: td * pd.Series([1, 2, 3, 4])
Out[80]:
0 31 days 00:00:00
1 62 days 00:00:00
2 93 days 00:15:09
3 NaT
dtype: timedelta64[ns]
Floor division of a timedelta64[ns] Series by a scalar Timedelta gives a Series of integers. In the output below they appear as float64 because the NaT entry becomes NaN.
In [81]: td // pd.Timedelta(days=3, hours=4)
Out[81]:
0 9.0
1 9.0
2 9.0
3 NaN
dtype: float64
In [82]: pd.Timedelta(days=3, hours=4) // td
Out[82]:
0 0.0
1 0.0
2 0.0
3 NaN
dtype: float64
The mod (%) and divmod operations are defined for Timedelta when operating with another timedelta-like or with a numeric argument.
In [83]: pd.Timedelta(hours=37) % datetime.timedelta(hours=2)
Out[83]: Timedelta('0 days 01:00:00')
# divmod against a timedelta-like returns a pair (int, Timedelta)
In [84]: divmod(datetime.timedelta(hours=2), pd.Timedelta(minutes=11))
Out[84]: (10, Timedelta('0 days 00:10:00'))
# divmod against a numeric returns a pair (Timedelta, Timedelta)
In [85]: divmod(pd.Timedelta(hours=25), 86400000000000)
Out[85]: (Timedelta('0 days 00:00:00.000000001'), Timedelta('0 days 01:00:00'))
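The sketch below checks the identity that ties the two results of divmod together: quotient times divisor plus remainder gives back the dividend. The divisor is a datetime.timedelta. It assumes pandas is installed, and the variable names are illustrative.

```python
import datetime

import pandas as pd

dividend = pd.Timedelta(hours=37)
divisor = datetime.timedelta(hours=2)

quotient, remainder = divmod(dividend, divisor)   # (int, Timedelta) for a timedelta-like divisor

# quotient * divisor + remainder reconstructs the original duration
reconstructed = quotient * pd.Timedelta(divisor) + remainder
print(quotient, remainder, reconstructed == dividend)
```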
Attributes
You can access various components of a Timedelta or TimedeltaIndex directly using the attributes days, seconds, microseconds, and nanoseconds. These are identical to the values returned by datetime.timedelta: for example, the .seconds attribute represents the number of seconds >= 0 and < 1 day. As with datetime.timedelta, the sign of a negative Timedelta is carried by .days, while the remaining attributes stay non-negative.
These attributes can also be accessed directly through the .dt accessor of a Series.
Note that the attributes are NOT the displayed values of the Timedelta. Use .components to retrieve the displayed values.
For a Series:
In [86]: td.dt.days
Out[86]:
0 31.0
1 31.0
2 31.0
3 NaN
dtype: float64
In [87]: td.dt.seconds
Out[87]:
0 0.0
1 0.0
2 303.0
3 NaN
dtype: float64
You can access the field values of a scalar Timedelta directly.
In [88]: tds = pd.Timedelta("31 days 5 min 3 sec")
In [89]: tds.days
Out[89]: 31
In [90]: tds.seconds
Out[90]: 303
In [91]: (-tds).seconds
Out[91]: 86097
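The value 86097 comes from the same normalization that datetime.timedelta uses: the sign goes into .days, and .seconds stays between 0 and 86399. The sketch below checks this against the standard library and rebuilds the signed total from the two fields. It assumes pandas is installed, and the variable names are illustrative.

```python
import datetime

import pandas as pd

tds = pd.Timedelta("31 days 5 min 3 sec")
neg = -tds

# The stdlib normalizes the same negative duration the same way
std_neg = -datetime.timedelta(days=31, minutes=5, seconds=3)

# days * 86400 + seconds recovers the signed total (no sub-second part here)
reconstructed = neg.days * 86400 + neg.seconds
print(neg.days, neg.seconds, reconstructed, neg.total_seconds())
```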
You can use the .components property to access a reduced form of the timedelta. For a Series, this returns a DataFrame indexed like the Series. These are the displayed values of the Timedelta.
In [92]: td.dt.components
Out[92]:
days hours minutes seconds milliseconds microseconds nanoseconds
0 31.0 0.0 0.0 0.0 0.0 0.0 0.0
1 31.0 0.0 0.0 0.0 0.0 0.0 0.0
2 31.0 0.0 5.0 3.0 0.0 0.0 0.0
3 NaN NaN NaN NaN NaN NaN NaN
In [93]: td.dt.components.seconds
Out[93]:
0 0.0
1 0.0
2 3.0
3 NaN
Name: seconds, dtype: float64
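A scalar Timedelta also has .components, which returns a named tuple. The sketch below contrasts it with the .seconds attribute. .seconds counts every second within the day (303), while the components split that into 5 minutes and 3 seconds, matching the display. It assumes pandas is installed, and the variable names are illustrative.

```python
import pandas as pd

tds = pd.Timedelta("31 days 5 min 3 sec")
parts = tds.components   # named tuple of the displayed fields

# .seconds counts all seconds within the day; components split them up
print(tds.seconds, parts.days, parts.minutes, parts.seconds)
```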
You can convert a Timedelta to an ISO 8601 Duration string with the .isoformat method:
In [94]: pd.Timedelta(
....: days=6, minutes=50, seconds=3, milliseconds=10, microseconds=10, nanoseconds=12
....: ).isoformat()
....:
Out[94]: 'P6DT0H50M3.010010012S'
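The Timedelta constructor accepts ISO 8601 Duration strings, so the output of .isoformat should parse back to the original value, including the nanosecond part. The sketch below checks that round trip. It assumes pandas is installed, and the variable names are illustrative.

```python
import pandas as pd

original = pd.Timedelta(
    days=6, minutes=50, seconds=3, milliseconds=10, microseconds=10, nanoseconds=12
)
iso = original.isoformat()     # serialize to an ISO 8601 duration
round_trip = pd.Timedelta(iso) # parse it back

print(iso, round_trip == original)
```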
TimedeltaIndex
To generate an index of timedeltas, you can use either the TimedeltaIndex or the timedelta_range() constructor.
Using TimedeltaIndex, you can pass string-like, Timedelta, timedelta, or np.timedelta64 objects. Passing np.nan, pd.NaT, or "nat" represents missing values.
In [95]: pd.TimedeltaIndex(
....: [
....: "1 days",
....: "1 days, 00:00:05",
....: np.timedelta64(2, "D"),
....: datetime.timedelta(days=2, seconds=2),
....: ]
....: )
....:
Out[95]:
TimedeltaIndex(['1 days 00:00:00', '1 days 00:00:05', '2 days 00:00:00',
'2 days 00:00:02'],
dtype='timedelta64[ns]', freq=None)
The string "infer" can be passed to set the frequency of the index to the inferred frequency upon creation:
In [96]: pd.TimedeltaIndex(["0 days", "10 days", "20 days"], freq="infer")
Out[96]: TimedeltaIndex(['0 days', '10 days', '20 days'], dtype='timedelta64[ns]', freq='10D')
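The sketch below covers the two points from this subsection. The three spellings of a missing value all become NaT, and an index built without freq="infer" has no freq, although pandas can still report the regular spacing through inferred_freq. It assumes pandas and NumPy are installed, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# np.nan, pd.NaT and the string "nat" are all read as missing
with_missing = pd.TimedeltaIndex(["1 days", np.nan, pd.NaT, "nat"])

# Without freq="infer" the frequency is not set, but it can still be inferred
regular = pd.TimedeltaIndex(["0 days", "10 days", "20 days"])

print(list(with_missing.isna()), regular.freq, regular.inferred_freq)
```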
Generating ranges of time deltas
Similar to date_range(), you can construct regular ranges of a TimedeltaIndex using timedelta_range(). The default frequency for timedelta_range is calendar day:
In [97]: pd.timedelta_range(start="1 days", periods=5)
Out[97]: TimedeltaIndex(['1 days', '2 days', '3 days', '4 days', '5 days'], dtype='timedelta64[ns]', freq='D')
Various combinations of start, end, and periods can be used with timedelta_range:
In [98]: pd.timedelta_range(start="1 days", end="5 days")
Out[98]: TimedeltaIndex(['1 days', '2 days', '3 days', '4 days', '5 days'], dtype='timedelta64[ns]', freq='D')
In [99]: pd.timedelta_range(end="10 days", periods=4)
Out[99]: TimedeltaIndex(['7 days', '8 days', '9 days', '10 days'], dtype='timedelta64[ns]', freq='D')
The freq parameter can be passed a variety of frequency aliases:
In [100]: pd.timedelta_range(start="1 days", end="2 days", freq="30min")
Out[100]:
TimedeltaIndex(['1 days 00:00:00', '1 days 00:30:00', '1 days 01:00:00',
'1 days 01:30:00', '1 days 02:00:00', '1 days 02:30:00',
'1 days 03:00:00', '1 days 03:30:00', '1 days 04:00:00',
'1 days 04:30:00', '1 days 05:00:00', '1 days 05:30:00',
'1 days 06:00:00', '1 days 06:30:00', '1 days 07:00:00',
'1 days 07:30:00', '1 days 08:00:00', '1 days 08:30:00',
'1 days 09:00:00', '1 days 09:30:00', '1 days 10:00:00',
'1 days 10:30:00', '1 days 11:00:00', '1 days 11:30:00',
'1 days 12:00:00', '1 days 12:30:00', '1 days 13:00:00',
'1 days 13:30:00', '1 days 14:00:00', '1 days 14:30:00',
'1 days 15:00:00', '1 days 15:30:00', '1 days 16:00:00',
'1 days 16:30:00', '1 days 17:00:00', '1 days 17:30:00',
'1 days 18:00:00', '1 days 18:30:00', '1 days 19:00:00',
'1 days 19:30:00', '1 days 20:00:00', '1 days 20:30:00',
'1 days 21:00:00', '1 days 21:30:00', '1 days 22:00:00',
'1 days 22:30:00', '1 days 23:00:00', '1 days 23:30:00',
'2 days 00:00:00'],
dtype='timedelta64[ns]', freq='30min')
In [101]: pd.timedelta_range(start="1 days", periods=5, freq="2D5h")
Out[101]:
TimedeltaIndex(['1 days 00:00:00', '3 days 05:00:00', '5 days 10:00:00',
'7 days 15:00:00', '9 days 20:00:00'],
dtype='timedelta64[ns]', freq='53h')
Specifying start, end, and periods generates a range of evenly spaced timedeltas from start to end inclusive, with the number of elements in the resulting TimedeltaIndex given by periods:
In [102]: pd.timedelta_range("0 days", "4 days", periods=5)
Out[102]: TimedeltaIndex(['0 days', '1 days', '2 days', '3 days', '4 days'], dtype='timedelta64[ns]', freq=None)
In [103]: pd.timedelta_range("0 days", "4 days", periods=10)
Out[103]:
TimedeltaIndex(['0 days 00:00:00', '0 days 10:40:00', '0 days 21:20:00',
'1 days 08:00:00', '1 days 18:40:00', '2 days 05:20:00',
'2 days 16:00:00', '3 days 02:40:00', '3 days 13:20:00',
'4 days 00:00:00'],
dtype='timedelta64[ns]', freq=None)
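The sketch below checks the size and spacing of these ranges. A 30-minute frequency over one day, endpoints included, gives 49 elements. Ten evenly spaced points from 0 to 4 days are 4 days / 9 = 640 minutes apart. It assumes pandas is installed, and the variable names are illustrative.

```python
import pandas as pd

half_hours = pd.timedelta_range(start="1 days", end="2 days", freq="30min")

# start, end and periods together: the spacing is derived and both ends are included
rng = pd.timedelta_range("0 days", "4 days", periods=10)
steps = pd.Series(rng).diff().dropna()

print(len(half_hours), len(rng), steps.iloc[0])
```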
Using the TimedeltaIndex
Like the other datetime-like indices, DatetimeIndex and PeriodIndex, a TimedeltaIndex can be used as the index of pandas objects.
In [104]: s = pd.Series(
.....: np.arange(100),
.....: index=pd.timedelta_range("1 days", periods=100, freq="h"),
.....: )
.....:
In [105]: s
Out[105]:
1 days 00:00:00 0
1 days 01:00:00 1
1 days 02:00:00 2
1 days 03:00:00 3
1 days 04:00:00 4
..
4 days 23:00:00 95
5 days 00:00:00 96
5 days 01:00:00 97
5 days 02:00:00 98
5 days 03:00:00 99
Freq: h, Length: 100, dtype: int64
Selections work similarly, with coercion on string-likes and slices:
In [106]: s["1 day":"2 day"]
Out[106]:
1 days 00:00:00 0
1 days 01:00:00 1
1 days 02:00:00 2
1 days 03:00:00 3
1 days 04:00:00 4
..
2 days 19:00:00 43
2 days 20:00:00 44
2 days 21:00:00 45
2 days 22:00:00 46
2 days 23:00:00 47
Freq: h, Length: 48, dtype: int64
In [107]: s["1 day 01:00:00"]
Out[107]: 1
In [108]: s[pd.Timedelta("1 day 1h")]
Out[108]: 1
Furthermore, you can use partial string selection, and the range will be inferred:
In [109]: s["1 day":"1 day 5 hours"]
Out[109]:
1 days 00:00:00 0
1 days 01:00:00 1
1 days 02:00:00 2
1 days 03:00:00 3
1 days 04:00:00 4
1 days 05:00:00 5
Freq: h, dtype: int64
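Explicit Timedelta objects work as well as strings for selection. The sketch below shows that label slicing with .loc includes both endpoints. It assumes pandas and NumPy are installed. It builds the hourly index with the "60min" alias, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(100), index=pd.timedelta_range("1 days", periods=100, freq="60min"))

# Label slicing with Timedelta objects includes both endpoints
window = s.loc[pd.Timedelta("1 day"):pd.Timedelta("1 day 5 hours")]

# A single Timedelta label returns the scalar stored there
value = s.loc[pd.Timedelta(days=1, hours=1)]

print(window.tolist(), value)
```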
Operations
Finally, combining a TimedeltaIndex with a DatetimeIndex allows certain operations that preserve NaT:
In [110]: tdi = pd.TimedeltaIndex(["1 days", pd.NaT, "2 days"])
In [111]: tdi.to_list()
Out[111]: [Timedelta('1 days 00:00:00'), NaT, Timedelta('2 days 00:00:00')]
In [112]: dti = pd.date_range("20130101", periods=3)
In [113]: dti.to_list()
Out[113]:
[Timestamp('2013-01-01 00:00:00'),
Timestamp('2013-01-02 00:00:00'),
Timestamp('2013-01-03 00:00:00')]
In [114]: (dti + tdi).to_list()
Out[114]: [Timestamp('2013-01-02 00:00:00'), NaT, Timestamp('2013-01-05 00:00:00')]
In [115]: (dti - tdi).to_list()
Out[115]: [Timestamp('2012-12-31 00:00:00'), NaT, Timestamp('2013-01-01 00:00:00')]
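The sketch below checks the pieces of this result. The sum is a DatetimeIndex, the combination is element-wise, and a NaT in either operand gives NaT at that position. It assumes pandas is installed, and the variable names are illustrative.

```python
import pandas as pd

tdi = pd.TimedeltaIndex(["1 days", pd.NaT, "2 days"])
dti = pd.date_range("2013-01-01", periods=3)

shifted = dti + tdi   # element-wise; NaT in either operand yields NaT

print(type(shifted).__name__, list(shifted.isna()))
```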
Conversions
As with frequency conversion on a Series above, you can convert these indices to yield another Index.
In [116]: tdi / np.timedelta64(1, "s")
Out[116]: Index([86400.0, nan, 172800.0], dtype='float64')
In [117]: tdi.astype("timedelta64[s]")
Out[117]: TimedeltaIndex(['1 days', NaT, '2 days'], dtype='timedelta64[s]', freq=None)
Scalar operations work as well. These can return a different type of index.
# adding a timedelta and a date -> datelike
In [118]: tdi + pd.Timestamp("20130101")
Out[118]: DatetimeIndex(['2013-01-02', 'NaT', '2013-01-03'], dtype='datetime64[ns]', freq=None)
# subtracting a timedelta from a date -> datelike
# note that trying to subtract a date from a Timedelta will raise an exception
In [119]: (pd.Timestamp("20130101") - tdi).to_list()
Out[119]: [Timestamp('2012-12-31 00:00:00'), NaT, Timestamp('2012-12-30 00:00:00')]
# timedelta + timedelta -> timedelta
In [120]: tdi + pd.Timedelta("10 days")
Out[120]: TimedeltaIndex(['11 days', NaT, '12 days'], dtype='timedelta64[ns]', freq=None)
# division results in a TimedeltaIndex if the divisor is an integer
In [121]: tdi / 2
Out[121]: TimedeltaIndex(['0 days 12:00:00', NaT, '1 days 00:00:00'], dtype='timedelta64[ns]', freq=None)
# or a float64 Index if the divisor is a Timedelta
In [122]: tdi / tdi[0]
Out[122]: Index([1.0, nan, 2.0], dtype='float64')
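The result type depends on the other operand, and the sketch below lists each case in one place. It assumes pandas and NumPy are installed. The dictionary labels are illustrative.

```python
import numpy as np
import pandas as pd

tdi = pd.TimedeltaIndex(["1 days", pd.NaT, "2 days"])

results = {
    "tdi + Timestamp": tdi + pd.Timestamp("2013-01-01"),  # -> DatetimeIndex
    "tdi + Timedelta": tdi + pd.Timedelta("10 days"),     # -> TimedeltaIndex
    "tdi / 2": tdi / 2,                                   # -> TimedeltaIndex
    "tdi / tdi[0]": tdi / tdi[0],                         # -> float64 Index
    "tdi / 1 second": tdi / np.timedelta64(1, "s"),       # -> float64 Index
}
for label, result in results.items():
    print(f"{label:>16}: {type(result).__name__} {result.dtype}")
```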
Resampling
Similar to time series resampling, you can resample with a TimedeltaIndex.
In [123]: s.resample("D").mean()
Out[123]:
1 days 11.5
2 days 35.5
3 days 59.5
4 days 83.5
5 days 97.5
Freq: D, dtype: float64
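The last daily bin averages only 4 values: hours 96 to 99, starting at "5 days 00:00". The sketch below rebuilds the hourly Series and reports the count per bin next to the mean. It assumes pandas and NumPy are installed, uses the "60min" alias for the hourly frequency, and its variable names are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(100), index=pd.timedelta_range("1 days", periods=100, freq="60min"))

daily = s.resample("D")
means = daily.mean()
counts = daily.count()   # the last bin holds only 4 hourly values

print(pd.DataFrame({"mean": means, "count": counts}))
```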