Python Pandas: A Concise Tutorial

Python Pandas - Reindexing

Reindexing changes the row labels and column labels of a DataFrame. To reindex means to conform the data to a given set of labels along a particular axis.

Reindexing can accomplish several operations, such as −

  1. Reorder the existing data to match a new set of labels.

  2. Insert missing value (NA) markers in label locations where no data for the label existed.

Example

import pandas as pd
import numpy as np

N=20

df = pd.DataFrame({
   'A': pd.date_range(start='2016-01-01',periods=N,freq='D'),
   'x': np.linspace(0,stop=N-1,num=N),
   'y': np.random.rand(N),
   'C': np.random.choice(['Low','Medium','High'],N).tolist(),
   'D': np.random.normal(100, 10, size=(N)).tolist()
})

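# Reindex the DataFrame: keep rows 0, 2 and 5 and columns A, C and B.
# Column 'B' does not exist in df, so it is filled with NaN.
df_reindexed = df.reindex(index=[0,2,5], columns=['A', 'C', 'B'])

print(df_reindexed)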

Its output is as follows (the values in column C are chosen at random, so they vary between runs) −

            A    C     B
0  2016-01-01  Low   NaN
2  2016-01-03  High  NaN
5  2016-01-06  Low   NaN

Reindex to Align with Other Objects

You may wish to take an object and reindex its axes so that they carry the same labels as another object. Consider the following example.

Example

import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.randn(10,3),columns=['col1','col2','col3'])
df2 = pd.DataFrame(np.random.randn(7,3),columns=['col1','col2','col3'])

df1 = df1.reindex_like(df2)
print(df1)

Its output is as follows (the values are random, so they vary between runs) −

          col1         col2         col3
0    -2.467652    -1.211687    -0.391761
1    -0.287396     0.522350     0.562512
2    -0.255409    -0.483250     1.866258
3    -1.150467    -0.646493    -0.222462
4     0.152768    -2.056643     1.877233
5    -1.155997     1.528719    -1.343719
6    -1.015606    -1.245936    -0.295275

Note − Here, df1 is reindexed to match df2, so only the rows labeled 0 to 6 remain. reindex_like() returns a new object, which the example assigns back to df1. The column names should match; any column label of df2 that is missing from df1 is filled entirely with NaN.
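To make the effect of mismatched column names concrete, here is a minimal sketch. The frames left and right and the column name other are hypothetical and chosen only for illustration; they share just the column col1.

```python
import numpy as np
import pandas as pd

# Hypothetical frames: 'left' and 'right' share only the column 'col1'.
left = pd.DataFrame(np.arange(6.0).reshape(3, 2), columns=['col1', 'col2'])
right = pd.DataFrame(np.zeros((2, 2)), columns=['col1', 'other'])

# reindex_like takes both the row labels and the column labels from 'right'.
aligned = left.reindex_like(right)

# 'col1' keeps the data of 'left' for rows 0 and 1; 'col2' is dropped;
# 'other' has no counterpart in 'left', so it contains only NaN.
print(aligned)
```

Only the labels of right are used; its values play no role in the result.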

Filling while Reindexing

reindex() takes an optional parameter, method, which selects how missing entries are filled. Its values are as follows −

  1. pad/ffill − Fill values forward

  2. bfill/backfill − Fill values backward

  3. nearest − Fill from the nearest index values

Example

import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.randn(6,3),columns=['col1','col2','col3'])
df2 = pd.DataFrame(np.random.randn(2,3),columns=['col1','col2','col3'])

# Rows 2 to 5 do not exist in df2, so they are filled with NaN
print(df2.reindex_like(df1))

# Now fill the NaN rows with the preceding values
print("Data Frame with Forward Fill:")
print(df2.reindex_like(df1,method='ffill'))

Its output is as follows −

         col1        col2       col3
0    1.311620   -0.707176   0.599863
1   -0.423455   -0.700265   1.133371
2         NaN         NaN        NaN
3         NaN         NaN        NaN
4         NaN         NaN        NaN
5         NaN         NaN        NaN

Data Frame with Forward Fill:
         col1        col2        col3
0    1.311620   -0.707176    0.599863
1   -0.423455   -0.700265    1.133371
2   -0.423455   -0.700265    1.133371
3   -0.423455   -0.700265    1.133371
4   -0.423455   -0.700265    1.133371
5   -0.423455   -0.700265    1.133371

Note − The last four rows are padded with the values of row 1.
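The three fill methods can be compared side by side. The following is a minimal sketch on a small hypothetical Series, calling reindex() directly with fixed values so that the result does not depend on random data. The names s and new_index are illustrative only.

```python
import pandas as pd

# Hypothetical Series with known values at labels 0, 5 and 10.
s = pd.Series([10.0, 20.0, 30.0], index=[0, 5, 10])
new_index = [0, 2, 4, 6, 8, 10]

# ffill: each new label takes the last known value at or before it.
forward = s.reindex(new_index, method='ffill')

# bfill: each new label takes the next known value at or after it.
backward = s.reindex(new_index, method='bfill')

# nearest: each new label takes the value of the closest original label.
nearest = s.reindex(new_index, method='nearest')

print(pd.DataFrame({'ffill': forward, 'bfill': backward, 'nearest': nearest}))
```

Filling with method requires the original index to be sorted (monotonically increasing or decreasing), which is the case here.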

Limits on Filling while Reindexing

The limit argument provides additional control over filling while reindexing. It specifies the maximum number of consecutive missing labels to fill. Consider the following example −

Example

import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.randn(6,3),columns=['col1','col2','col3'])
df2 = pd.DataFrame(np.random.randn(2,3),columns=['col1','col2','col3'])

# Rows 2 to 5 do not exist in df2, so they are filled with NaN
print(df2.reindex_like(df1))

# Now forward fill, but fill at most one consecutive missing row
print("Data Frame with Forward Fill limiting to 1:")
print(df2.reindex_like(df1,method='ffill',limit=1))

Its output is as follows −

         col1        col2        col3
0    0.247784    2.128727    0.702576
1   -0.055713   -0.021732   -0.174577
2         NaN         NaN         NaN
3         NaN         NaN         NaN
4         NaN         NaN         NaN
5         NaN         NaN         NaN

Data Frame with Forward Fill limiting to 1:
         col1        col2        col3
0    0.247784    2.128727    0.702576
1   -0.055713   -0.021732   -0.174577
2   -0.055713   -0.021732   -0.174577
3         NaN         NaN         NaN
4         NaN         NaN         NaN
5         NaN         NaN         NaN

Note − Observe that only row 2 is filled from the preceding row 1. The remaining rows are left as NaN.
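For a deterministic illustration of limit, the following minimal sketch calls reindex() directly on a small hypothetical frame with fixed values. The name small and the choice of limit=2 are assumptions made for the example.

```python
import pandas as pd

# Hypothetical frame with data only at row labels 0 and 1.
small = pd.DataFrame({'col1': [1.0, 2.0]})

# Extend to labels 0..5, forward filling at most two consecutive new rows.
filled = small.reindex(range(6), method='ffill', limit=2)

# Rows 2 and 3 take the value of row 1; rows 4 and 5 stay NaN.
print(filled)
```

Labels that already exist in the original index do not count toward the limit; only the newly introduced labels do.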

Renaming

The rename() method allows you to relabel an axis based on a mapping (a dict or Series) or an arbitrary function.

Consider the following example −

import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.randn(6,3),columns=['col1','col2','col3'])
print(df1)

# Labels that are not in the mapping (col3, rows 3 to 5) are left unchanged
print("After renaming the rows and columns:")
print(df1.rename(columns={'col1' : 'c1', 'col2' : 'c2'},
                 index={0 : 'apple', 1 : 'banana', 2 : 'durian'}))

Its output is as follows −

         col1        col2        col3
0    0.486791    0.105759    1.540122
1   -0.990237    1.007885   -0.217896
2   -0.483855   -1.645027   -1.194113
3   -0.122316    0.566277   -0.366028
4   -0.231524   -0.721172   -0.112007
5    0.438810    0.000225    0.435479

After renaming the rows and columns:
                c1          c2        col3
apple     0.486791    0.105759    1.540122
banana   -0.990237    1.007885   -0.217896
durian   -0.483855   -1.645027   -1.194113
3        -0.122316    0.566277   -0.366028
4        -0.231524   -0.721172   -0.112007
5         0.438810    0.000225    0.435479
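The example above uses dict mappings only. As a minimal sketch of the function form, the following passes a callable for each axis; the frame and the 'row' prefix are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical frame with two columns and the default row labels 0 and 1.
df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

# The function is applied to every label on the given axis.
renamed = df.rename(columns=str.upper,
                    index=lambda label: 'row' + str(label))

print(renamed)
```

Unlike a dict, which changes only the labels it lists, a function is applied to every label on the axis.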

The rename() method provides an inplace named parameter. It is False by default, in which case rename() returns a new object and leaves the original unchanged. Pass inplace=True to rename the data in place.
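The difference between the two modes can be sketched as follows, using a small hypothetical frame. Note that with inplace=True the call returns None, so its result should not be assigned back to the variable.

```python
import pandas as pd

# Hypothetical frame used only to show the two modes.
df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

# Default (inplace=False): a new object is returned, df is unchanged.
renamed = df.rename(columns={'col1': 'c1'})
columns_before = list(df.columns)

# inplace=True: df itself is modified and the call returns None.
result = df.rename(columns={'col1': 'c1'}, inplace=True)
columns_after = list(df.columns)

print(columns_before, columns_after, result)
```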