Python | Pandas dataframe.mask()

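Python is a great language for data analysis, largely because of its ecosystem of data-centric packages. Pandas is one of those packages, and it makes importing and analyzing data much easier.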

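The Pandas dataframe.mask() function returns an object with the same shape as self. Its entries come from self where cond is False and from other where cond is True. other can be a scalar, a Series, a DataFrame, or a callable. The mask method is an application of the if-then idiom: for each element in the calling DataFrame, if cond is False the element is kept; otherwise the corresponding element from other is used.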
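To make the if-then rule concrete, here is a small sketch (the frame and values are illustrative, not from the original article) comparing mask() with two formulations that should produce the same values: df.where(~cond, other), since where() keeps entries where its condition is True, and numpy.where(cond, other, df).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 20, 3], "B": [40, 5, 60]})
cond = df > 10     # boolean DataFrame with the same shape as df
other = -df        # replacement values with the same shape as df

# mask(): keep df where cond is False, take the value from other where cond is True
masked = df.mask(cond, other)

# where() keeps values where its condition is True, so the condition is inverted
via_where = df.where(~cond, other)

# numpy.where(condition, value_if_true, value_if_false) returns a plain ndarray
via_numpy = np.where(cond, other, df)

print(masked)
print(masked.equals(via_where))                # True
print((masked.to_numpy() == via_numpy).all())  # True
```

The where() form is handy when the condition is more naturally expressed as "keep these" rather than "replace these".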

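Syntax: DataFrame.mask(cond, other=nan, inplace=False, axis=None, level=None, errors='raise', try_cast=False, raise_on_error=None)

This is the signature from older pandas releases. raise_on_error, errors and try_cast have since been deprecated and removed; pandas 2.x accepts only cond, other, inplace, axis and level, with everything after other keyword-only.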
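Because other defaults to NaN, calling mask() with only cond blanks out the matching entries. A minimal sketch with made-up data; note that the integer columns come back as float64, because NaN cannot be stored in an int64 column:

```python
import pandas as pd

df = pd.DataFrame({"A": [12, 4, 5], "B": [5, 2, 54]})

# Only cond is passed, so entries where cond is True become NaN
result = df.mask(df > 10)
print(result)

# Both columns were int64; they are upcast to float64 so they can hold NaN
print(result.dtypes)
```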

Parameters :
cond : Where cond is False, keep the original value. Where it is True, replace it with the corresponding value from other. If cond is callable, it is computed on the Series/DataFrame and should return a boolean Series/DataFrame or array. The callable must not change the input (pandas does not check this).
other : Entries where cond is True are replaced with the corresponding value from other. If other is callable, it is computed on the Series/DataFrame and should return a scalar, Series or DataFrame. The callable must not change the input (pandas does not check this).
inplace : Whether to perform the operation in place on the data. Default False.
axis : Alignment axis, if needed. Default None.
level : Alignment level, if needed. Default None.
errors : str, {'raise', 'ignore'}, default 'raise'. 'raise' allows exceptions to be raised; 'ignore' suppresses exceptions and returns the original object on error. Note that currently this parameter does not affect the results: the result is always coerced to a suitable dtype.
try_cast : Try to cast the result back to the input type, if possible. Default False.

Returns : same type as caller, or None if inplace=True
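A sketch of the less obvious parameters on a small illustrative frame: callables for cond and other, a Series other aligned on the row index with axis=0, and inplace=True, which modifies the frame and returns None. The lambdas and the row-mean replacement are illustrative choices, not part of the original article.

```python
import pandas as pd

df = pd.DataFrame({"A": [12, 4, 5], "B": [5, 2, 54]})

# cond and other as callables: each receives the DataFrame itself.
# cond must return a boolean frame; other returns the replacement values.
result = df.mask(lambda d: d > 10, lambda d: d * 10)
print(result)            # 12 -> 120 and 54 -> 540, everything else unchanged

# other as a Series aligned on the row index (axis=0):
# entries greater than 10 are replaced by the mean of their row.
row_means = df.mean(axis=1)   # 8.5, 3.0, 29.5
aligned = df.mask(df > 10, row_means, axis=0)
print(aligned)

# inplace=True modifies df directly and returns None
ret = df.mask(df < 5, 0, inplace=True)
print(ret)               # None
print(df)                # 4 and 2 are now 0
```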


Example #1: Use the mask() function to replace all values greater than 10 in the dataframe with -25

# importing pandas as pd
import pandas as pd
  
# Creating the dataframe 
df = pd.DataFrame({"A":[12, 4, 5, 44, 1],
                   "B":[5, 2, 54, 3, 2],
                   "C":[20, 16, 7, 3, 8],
                   "D":[14, 3, 17, 2, 6]})
  
# Print the dataframe
df

Let's use the dataframe.mask() function to replace all values greater than 10 with -25.

# replace values greater than 10 with -25
df.mask(df > 10, -25)

Output:
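Every value greater than 10 is replaced with -25, and every other value is left unchanged. The following self-contained check builds the expected frame by hand from the data above and compares it with the result of mask():

```python
import pandas as pd

df = pd.DataFrame({"A": [12, 4, 5, 44, 1],
                   "B": [5, 2, 54, 3, 2],
                   "C": [20, 16, 7, 3, 8],
                   "D": [14, 3, 17, 2, 6]})

result = df.mask(df > 10, -25)

# Expected values, written out by hand: each entry > 10 becomes -25
expected = pd.DataFrame({"A": [-25, 4, 5, -25, 1],
                         "B": [5, 2, -25, 3, 2],
                         "C": [-25, -25, 7, 3, 8],
                         "D": [-25, 3, -25, 2, 6]})
print(result)
print(result.equals(expected))  # True
```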

Example #2: Use the mask() function with a callable to replace all NaN values with 1000.

# importing pandas as pd
import pandas as pd
  
# Creating the dataframe 
df = pd.DataFrame({"A":[12, 4, 5, None, 1],
                   "B":[7, 2, 54, 3, None],
                   "C":[20, 16, 11, 3, 8],
                   "D":[14, 3, None, 2, 6]})
  
# replace the NaN values with 1000; the callable receives df and returns a boolean mask
df.mask(lambda x: x.isna(), 1000)

Output:
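The callable receives the DataFrame and returns a boolean frame that is True exactly at the missing entries, so those entries become 1000 (shown as 1000.0, because the columns that contained None are float64). A self-contained check; the comparison with fillna(1000) is added here for illustration and is not part of the original example:

```python
import pandas as pd

df = pd.DataFrame({"A": [12, 4, 5, None, 1],
                   "B": [7, 2, 54, 3, None],
                   "C": [20, 16, 11, 3, 8],
                   "D": [14, 3, None, 2, 6]})

# The callable receives df and returns a boolean frame that is True at missing entries
result = df.mask(lambda x: x.isna(), 1000)
print(result)

# No missing values remain
print(result.isna().any().any())                # False

# For this data, the values match fillna(1000)
print((result == df.fillna(1000)).all().all())  # True
```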