Python Pandas: A Concise Tutorial
Python Pandas - Iteration
The behavior of basic iteration over Pandas objects depends on the type. A Series is treated as array-like, so basic iteration produces its values. Other data structures, such as DataFrame and Panel, follow the dict-like convention of iterating over the keys of the object. (Panel was removed in pandas 0.25, so it applies only to older versions.)
In short, basic iteration (for i in object) produces:
- Series: values
- DataFrame: column labels
- Panel: item labels
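The difference between the two conventions can be seen in a minimal sketch. The Series and DataFrame contents below are made-up values chosen only for illustration; Panel is left out because it no longer exists in current pandas releases.

```python
import pandas as pd

# Hypothetical data, chosen so the result is easy to read.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])
df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})

# Iterating a Series yields its values, not its index labels.
series_items = [value for value in s]

# Iterating a DataFrame yields its column labels, like iterating a dict.
frame_items = [label for label in df]

print(series_items)  # [10, 20, 30]
print(frame_items)   # ['col1', 'col2']
```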
Iterating a DataFrame
Iterating over a DataFrame yields its column names. Consider the following example.
import pandas as pd
import numpy as np

N = 20
df = pd.DataFrame({
    'A': pd.date_range(start='2016-01-01', periods=N, freq='D'),
    'x': np.linspace(0, stop=N-1, num=N),
    'y': np.random.rand(N),
    'C': np.random.choice(['Low', 'Medium', 'High'], N).tolist(),
    'D': np.random.normal(100, 10, size=(N)).tolist()
})

# Iterating over the DataFrame itself yields the column labels
for col in df:
    print(col)
Its output is as follows (on Python 3.7+ with a recent pandas, columns keep the order in which they were defined):
A
x
y
C
D
To iterate over the contents of a DataFrame, we can use the following methods:
- items() (named iteritems() before pandas 2.0): iterate over the columns as (label, Series) pairs
- iterrows(): iterate over the rows as (index, Series) pairs
- itertuples(): iterate over the rows as namedtuples
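items() (iteritems() before pandas 2.0)
Iterates over the columns as (key, value) pairs, with the column label as the key and the column values as a Series object. The older name iteritems() was deprecated in pandas 1.5 and removed in pandas 2.0.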
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(4, 3), columns=['col1', 'col2', 'col3'])

# items() replaces iteritems(), which was removed in pandas 2.0
for key, value in df.items():
    print(key, value)
Its output is similar to the following (the values are random and change on each run):
col1 0    0.802390
1    0.324060
2    0.256811
3    0.839186
Name: col1, dtype: float64
col2 0    1.624313
1   -1.033582
2    1.796663
3    1.856277
Name: col2, dtype: float64
col3 0   -0.022142
1   -0.230820
2    1.160691
3   -0.830279
Name: col3, dtype: float64
Observe that each column is yielded separately as a (label, Series) pair.
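Because each column arrives as a full Series, Series methods can be called on it inside the loop. The following is a small sketch with made-up, deterministic data; the name column_max is hypothetical and used only for this illustration.

```python
import pandas as pd

# Hypothetical data, fixed so the result is reproducible.
df = pd.DataFrame({"col1": [1, 2, 3], "col2": [10.0, 20.0, 30.0]})

column_max = {}
for label, column in df.items():
    # `column` is a pandas Series, so Series methods such as max() are available.
    column_max[label] = float(column.max())

print(column_max)  # {'col1': 3.0, 'col2': 30.0}
```

For a simple aggregate like this, calling df.max() directly is shorter; the loop is shown only to make the (label, Series) pairing concrete.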
iterrows()
iterrows() returns an iterator that yields each index value together with a Series containing the data in that row.
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(4, 3), columns=['col1', 'col2', 'col3'])

# Each iteration yields the row label and the row data as a Series
for row_index, row in df.iterrows():
    print(row_index, row)
Its output is similar to the following (the values are random and change on each run):
0 col1    1.529759
col2    0.762811
col3   -0.634691
Name: 0, dtype: float64
1 col1   -0.944087
col2    1.420919
col3   -0.507895
Name: 1, dtype: float64
2 col1   -0.077287
col2   -0.858556
col3   -0.663385
Name: 2, dtype: float64
3 col1   -1.638578
col2    0.059866
col3    0.493482
Name: 3, dtype: float64
Note: Because iterrows() returns each row as a Series, it does not preserve dtypes across the row; values from columns of different types are upcast to a common type. In the output above, 0, 1, 2 and 3 are the row index labels, and col1, col2 and col3 are the column labels.
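The dtype behavior is easier to see with columns of different types. The sketch below assumes a made-up frame with one integer and one float column (the column names are hypothetical) and compares what iterrows() and itertuples() hand back for the first row.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one integer column and one float column.
df = pd.DataFrame({
    "int_col": np.array([1, 2], dtype="int64"),
    "float_col": np.array([1.5, 2.5], dtype="float64"),
})

# iterrows() packs each row into one Series, so the integer is upcast to float64.
_, first_row = next(df.iterrows())
print(df["int_col"].dtype)  # int64
print(first_row.dtype)      # float64

# itertuples() keeps every value in its own field, so the integer stays an integer.
first_tuple = next(df.itertuples())
print(first_tuple.int_col, first_tuple.float_col)
```

When the original types matter, itertuples() is usually the safer choice of the two.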
itertuples()
The itertuples() method returns an iterator that yields a named tuple for each row in the DataFrame. The first element of the tuple is the row's index value, and the remaining elements are the row's values.
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(4, 3), columns=['col1', 'col2', 'col3'])

# Each row is yielded as a namedtuple whose first field is the index
for row in df.itertuples():
    print(row)
Its output is similar to the following (the values are random and change on each run):
Pandas(Index=0, col1=1.5297586201375899, col2=0.76281127433814944, col3=-0.6346908238310438)
Pandas(Index=1, col1=-0.94408735763808649, col2=1.4209186418359423, col3=-0.50789517967096232)
Pandas(Index=2, col1=-0.07728664756791935, col2=-0.85855574139699076, col3=-0.6633852507207626)
Pandas(Index=3, col1=0.65734942534106289, col2=-0.95057710432604969, col3=0.80344487462316527)
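The fields of each named tuple can be read by attribute or by position. The following sketch uses made-up data and also shows the index and name parameters of itertuples(); the tuple name "Row" is an arbitrary choice for this example.

```python
import pandas as pd

# Hypothetical data with string index labels.
df = pd.DataFrame({"col1": [1.0, 2.0], "col2": [3.0, 4.0]}, index=["a", "b"])

# Default behavior: the index is the first field, named Index.
first = next(df.itertuples())
print(first.Index, first.col1, first[2])  # a 1.0 3.0

# index=False drops the index field; name sets the namedtuple's type name.
rows = list(df.itertuples(index=False, name="Row"))
print(rows[0]._fields)  # ('col1', 'col2')
```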
Note: Do not try to modify an object while iterating over it. Iteration is meant for reading; depending on the data types, the iterator returns a copy of the data rather than a view, so changes made to it are not guaranteed to be reflected in the original object.
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(4, 3), columns=['col1', 'col2', 'col3'])

for index, row in df.iterrows():
    # Adds an entry 'a' to the row Series only; df itself is not touched
    row['a'] = 10
print(df)
Its output is similar to the following (the values are random and change on each run):
       col1      col2      col3
0 -1.739815  0.735595 -0.295589
1  0.635485  0.106803  1.527922
2 -0.939064  0.547095  0.038585
3 -1.016509 -0.116580 -0.523158
Observe that the changes are not reflected in the DataFrame: no column 'a' was added.
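When the goal is to change the data, one common approach is to assign on the DataFrame itself rather than on the objects produced by an iterator. The sketch below uses made-up values and shows two such assignments; it is one possible way to do this, not the only one.

```python
import pandas as pd

# Hypothetical data, fixed so the result is reproducible.
df = pd.DataFrame({"col1": [1.0, -2.0, 3.0], "col2": [4.0, 5.0, -6.0]})

# Add a new column by assigning to the DataFrame directly.
df["a"] = 10

# Update selected rows of an existing column with a boolean mask and .loc.
df.loc[df["col1"] < 0, "col1"] = 0.0

print(df)
```

Both assignments act on df directly, so the changes are visible in the printed DataFrame.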