pandas Chinese Reference Guide
Nullable Boolean data type
BooleanArray is currently experimental. Its API or implementation may change without warning.
Indexing with NA values
pandas allows indexing with a boolean array that contains NA values; the NA entries are treated as False.
In [1]: s = pd.Series([1, 2, 3])
In [2]: mask = pd.array([True, False, pd.NA], dtype="boolean")
In [3]: s[mask]
Out[3]:
0    1
dtype: int64
If you would prefer to keep the NA values, you can manually fill them with fillna(True).
In [4]: s[mask.fillna(True)]
Out[4]:
0    1
2    3
dtype: int64
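As a minimal sketch of the behaviour described above, the following script checks that indexing with a mask containing NA gives the same result as first filling the NA with False, and that filling with True keeps the NA position instead. The variable names dropped, explicit_false, and kept are illustrative only.

```python
import pandas as pd

s = pd.Series([1, 2, 3])
mask = pd.array([True, False, pd.NA], dtype="boolean")

# Indexing with the raw mask skips the NA position, as if it were False.
dropped = s[mask]
explicit_false = s[mask.fillna(False)]

# Filling the mask with True selects the NA position instead.
kept = s[mask.fillna(True)]

print(dropped.tolist())
print(kept.tolist())
```

Filling the mask explicitly, with either False or True, makes the intended handling of missing values visible at the call site.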
Kleene logical operations
arrays.BooleanArray implements Kleene logic (sometimes called three-valued logic) for logical operations such as & (and), | (or), and ^ (exclusive-or).
This table shows the result for every combination. These operations are symmetric, so swapping the left- and right-hand sides makes no difference to the result.
Expression       Result
True & True      True
True & False     False
True & NA        NA
False & False    False
False & NA       False
NA & NA          NA
True | True      True
True | False     True
True | NA        True
False | False    False
False | NA       NA
NA | NA          NA
True ^ True      False
True ^ False     True
True ^ NA        NA
False ^ False    False
False ^ NA       NA
NA ^ NA          NA
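The combinations listed above can be regenerated directly from pandas. The sketch below builds every pair of True, False, and NA as two aligned BooleanArrays and applies the three operators element-wise. The column names left, right, and, or, and xor are chosen here for illustration and are not part of any pandas API.

```python
import pandas as pd

# The three possible values of a nullable boolean.
values = [True, False, pd.NA]

# Build every (left, right) pair as two aligned BooleanArrays.
left = pd.array([a for a in values for _ in values], dtype="boolean")
right = pd.array([b for _ in values for b in values], dtype="boolean")

# Apply the Kleene operators element-wise; column names are illustrative.
table = pd.DataFrame(
    {
        "left": left,
        "right": right,
        "and": left & right,
        "or": left | right,
        "xor": left ^ right,
    }
)
print(table)
```

Because the operations are symmetric, the nine rows include each mixed pair twice, once in each order.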
When an NA is present in an operation, the output value is NA only if the result cannot be determined from the other input alone. For example, True | NA is True, because both True | True and True | False are True. In that case, we don’t actually need to consider the value of the NA.
On the other hand, True & NA is NA. The result depends on whether the NA really is True or False: True & True is True, but True & False is False, so we can’t determine the output.
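The rule "NA only when the other input cannot decide the result" can be sketched in plain Python. This is an illustrative model that uses None to stand in for NA. It is not how pandas implements BooleanArray, and the function names kleene_and, kleene_or, and kleene_xor are hypothetical.

```python
from typing import Optional

# None stands in for NA in this illustrative model.
Tri = Optional[bool]


def kleene_and(a: Tri, b: Tri) -> Tri:
    # A single False decides the result, whatever the other operand is.
    if a is False or b is False:
        return False
    # Otherwise an unknown operand leaves the result unknown.
    if a is None or b is None:
        return None
    return True


def kleene_or(a: Tri, b: Tri) -> Tri:
    # A single True decides the result, whatever the other operand is.
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def kleene_xor(a: Tri, b: Tri) -> Tri:
    # Exclusive-or always depends on both operands, so any unknown propagates.
    if a is None or b is None:
        return None
    return a != b


print(kleene_or(True, None))
print(kleene_and(True, None))
print(kleene_and(False, None))
```

Each function checks the "deciding" value first, which mirrors the reasoning in the two paragraphs above: True short-circuits or, False short-circuits and, and nothing short-circuits exclusive-or.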
This differs from how np.nan behaves in logical operations: pandas treats np.nan as always False in the output.
In or:
In [5]: pd.Series([True, False, np.nan], dtype="object") | True
Out[5]:
0     True
1     True
2    False
dtype: bool
In [6]: pd.Series([True, False, np.nan], dtype="boolean") | True
Out[6]:
0    True
1    True
2    True
dtype: boolean
In and:
In [7]: pd.Series([True, False, np.nan], dtype="object") & True
Out[7]:
0     True
1    False
2    False
dtype: bool
In [8]: pd.Series([True, False, np.nan], dtype="boolean") & True
Out[8]:
0     True
1    False
2     <NA>
dtype: boolean