Python Pandas Concise Tutorial

Python Pandas - Function Application

To apply your own functions, or functions from another library, to Pandas objects, you should know three important methods, discussed below. Which one to use depends on whether your function expects to operate on an entire DataFrame, on each row or column, or on each element.

Table-wise Function Application: pipe()
Row- or Column-wise Function Application: apply()
Element-wise Function Application: applymap() (renamed to DataFrame.map() in pandas 2.1)
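As a rough side-by-side sketch of the three methods listed above, the snippet below applies each one to a small hand-made DataFrame. The frame contents and the add_constant helper are illustrative assumptions, not part of the original tutorial; the hasattr check is only there so the sketch runs on pandas versions before and after the applymap() to DataFrame.map() rename.

```python
import pandas as pd

# Small deterministic frame (illustrative data) so results can be checked by hand.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})


def add_constant(frame, value):
    """Hypothetical helper: receives the whole DataFrame at once."""
    return frame + value


# pipe(): the function is called once with the entire DataFrame.
table_wise = df.pipe(add_constant, 2)

# apply(): the function is called once per column (axis=0 is the default).
column_wise = df.apply(lambda col: col.max() - col.min())

# Element-wise: the function is called once per cell.
# DataFrame.map exists from pandas 2.1; applymap is the older name.
elementwise = df.map if hasattr(df, "map") else df.applymap
element_wise = elementwise(lambda x: x * 100)

print(table_wise)
print(column_wise)
print(element_wise)
```

Each call returns a new object and leaves df untouched, so the result has to be assigned or printed to be seen.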

Table-wise Function Application

Custom operations can be performed by passing a function, together with any additional arguments it needs, to pipe(). The function is called once, with the whole DataFrame as its first argument, so the operation is performed on the entire DataFrame.

For example, suppose we want to add the value 2 to every element in the DataFrame. We first define a function that does the addition.

adder function

The adder function takes two values as parameters and returns their sum. When it is called through pipe(), the first parameter is the DataFrame itself.

def adder(ele1,ele2):
   return ele1+ele2

We will now use the custom function to perform the operation on the DataFrame.

df = pd.DataFrame(np.random.randn(5,3),columns=['col1','col2','col3'])
df.pipe(adder,2)

Let's see the full program:

import pandas as pd
import numpy as np

def adder(ele1,ele2):
   return ele1+ele2

df = pd.DataFrame(np.random.randn(5,3),columns=['col1','col2','col3'])
# pipe() returns a new DataFrame; df itself is left unchanged
print(df.pipe(adder, 2))

Its output is as follows (the values differ on every run because the data is random):

        col1       col2       col3
0   2.176704   2.219691   1.509360
1   2.222378   2.422167   3.953921
2   2.241096   1.135424   2.696432
3   2.355763   0.376672   1.182570
4   2.308743   2.714767   2.130288
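One practical reason to prefer pipe() over calling the function directly is that several table-wise steps can be chained left to right. The following is a minimal sketch under assumed inputs: the scale function and its factor keyword are hypothetical additions for illustration, and the data is fixed rather than random so the result can be verified by hand.

```python
import pandas as pd

# Fixed illustrative data instead of random numbers.
df = pd.DataFrame({"col1": [1.0, 2.0, 3.0], "col2": [4.0, 5.0, 6.0]})


def adder(frame, value):
    # Same idea as the tutorial's adder: frame + value, applied to every element.
    return frame + value


def scale(frame, factor=1.0):
    # Hypothetical second step, used only to show chaining and keyword arguments.
    return frame * factor


# Nested calls read inside-out ...
nested = scale(adder(df, 2), factor=10)

# ... while pipe() reads left to right and produces the same result.
piped = df.pipe(adder, 2).pipe(scale, factor=10)

print(piped)
print(piped.equals(nested))
```

Positional and keyword arguments given after the function are forwarded to it unchanged, and the original df is not modified by either call.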

Row- or Column-wise Function Application

Arbitrary functions can be applied along the axes of a DataFrame using the apply() method, which, like the descriptive statistics methods, takes an optional axis argument. (Panel, mentioned in older material, was removed in pandas 0.25.) By default, the operation is performed column-wise: the function receives each column in turn as an array-like Series.

Example 1

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(5,3),columns=['col1','col2','col3'])
# np.mean is called once per column (axis=0 is the default)
print(df.apply(np.mean))

Its output is as follows (the values differ on every run because the data is random):

col1   -0.288022
col2    1.044839
col3   -0.187009
dtype: float64

By passing axis=1, the operation can be performed row-wise instead.

Example 2

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(5,3),columns=['col1','col2','col3'])
# axis=1 applies np.mean to each row instead of each column
print(df.apply(np.mean, axis=1))

The output is a Series of five row means, indexed 0 to 4, with dtype float64. The values differ on every run because the data is random.
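To make the effect of the axis argument easier to verify, here is a small sketch that uses fixed data instead of random numbers. The frame contents are an illustrative assumption; only the apply() calls mirror the examples above.

```python
import numpy as np
import pandas as pd

# Two rows, three columns of hand-picked values.
df = pd.DataFrame({"col1": [1.0, 2.0], "col2": [3.0, 4.0], "col3": [5.0, 6.0]})

# Default axis=0: one result per column, indexed by column name.
col_means = df.apply(np.mean)

# axis=1: one result per row, indexed by row label.
row_means = df.apply(np.mean, axis=1)

print(col_means)
print(row_means)
```

The column-wise result is indexed by col1, col2, col3, while the row-wise result is indexed by the row labels 0 and 1, which is a quick way to tell which axis was used.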

Example 3

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(5,3),columns=['col1','col2','col3'])
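# The lambda receives each column as a Series and returns its range (max - min)
print(df.apply(lambda x: x.max() - x.min()))

The output is a Series of three values, indexed col1, col2 and col3, with dtype float64. Each value is the range of one column, so it is never negative. The values differ on every run because the data is random.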
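The function passed to apply() does not have to return a single number. As a hedged illustration, the sketch below first computes the per-column range on fixed data, then uses a hypothetical min_max helper that returns a small Series per column, in which case apply() assembles the pieces into a DataFrame. The data and helper names are assumptions made for this example.

```python
import pandas as pd

# Fixed illustrative data.
df = pd.DataFrame({"col1": [1.0, 4.0, 2.0], "col2": [10.0, 7.0, 9.0]})


def value_range(col):
    # Reduces a column (Series) to a single scalar.
    return col.max() - col.min()


def min_max(col):
    # Hypothetical helper: returns two labelled values per column.
    return pd.Series({"min": col.min(), "max": col.max()})


ranges = df.apply(value_range)  # Series indexed by column name
summary = df.apply(min_max)     # DataFrame with rows "min" and "max"

print(ranges)
print(summary)
```

When the function returns a scalar the result is a Series; when it returns a Series the labels of that Series become the row index of the resulting DataFrame.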

Element-wise Function Application

Not all functions can be vectorized, that is, written to accept a NumPy array and return another array or a value. For such functions, the applymap() method on DataFrame and, analogously, the map() method on Series accept any Python function that takes a single value and returns a single value. In pandas 2.1 and later, applymap() is deprecated in favor of DataFrame.map().

Example 1

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(5,3),columns=['col1','col2','col3'])

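# My custom function: map() calls the lambda once per element of col1
print(df['col1'].map(lambda x: x * 100))

The output is a Series of five values, indexed 0 to 4 and named col1, with dtype float64. Each value is the corresponding element of col1 multiplied by 100. The values differ on every run because the data is random.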
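Multiplying by 100 could also be written as a vectorized expression, so a function with plain Python branching may be a clearer case for map(). The sketch below assumes a hypothetical sign_label function and fixed data; it also shows that the lambda from the example above gives the same values as the vectorized form.

```python
import pandas as pd

# Fixed illustrative data.
s = pd.Series([-1.5, 0.0, 2.5], name="col1")


def sign_label(x):
    # Plain Python branching on one scalar; not written for whole arrays.
    if x < 0:
        return "negative"
    if x > 0:
        return "positive"
    return "zero"


labels = s.map(sign_label)          # called once per element
scaled = s.map(lambda x: x * 100)   # element-wise, one call per value
same_scaled = s * 100               # vectorized equivalent of the line above

print(labels)
print(scaled.equals(same_scaled))
```

When a vectorized expression such as s * 100 exists, it is usually the faster choice; map() is most useful for functions like sign_label that only make sense on one value at a time.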

Example 2

import pandas as pd
import numpy as np

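df = pd.DataFrame(np.random.randn(5,3),columns=['col1','col2','col3'])

# My custom function: applymap() calls the lambda once per element of df
# (in pandas 2.1 and later, df.map(...) is the preferred name)
print(df.applymap(lambda x: x * 100))

The output is a DataFrame with the same five rows and three columns as df, in which every element has been multiplied by 100. The values differ on every run because the data is random.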