📅  Last modified: 2023-12-03 15:03:18.778000             🧑  Author: Mango
NumPy is a popular library for numerical computing in Python. Sometimes you may have data with NaN (Not a Number) values, and you may want to remove the rows that contain NaN. In this article, we will go through the different ways of removing NaN rows in NumPy.
Boolean indexing is a powerful feature in NumPy that allows you to filter arrays based on a condition. We can use boolean indexing to remove the rows that contain NaN values in a NumPy array.
import numpy as np
# Create a 2D array with NaN values
arr = np.array([[1, 2, 3], [4, np.nan, 6], [7, 8, np.nan], [np.nan, np.nan, np.nan]])
# Create a boolean mask for the NaN values
mask = np.isnan(arr)
# Remove the rows that contain NaN values
arr_clean = arr[~np.any(mask, axis=1)]
print(arr_clean)
Output:
[[1. 2. 3.]]
In the example above, we created a 2D array with NaN values and then created a boolean mask for the NaN values using np.isnan. We then used np.any with axis=1 to determine which rows contain NaN values and negated that row mask with ~ to select the rows that do not contain NaN values.
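The same mask logic can be wrapped in a small helper. The sketch below is only illustrative: the function name drop_nan_rows and its how parameter are hypothetical (they are not part of NumPy), and it assumes a 2D floating-point array.

```python
import numpy as np


def drop_nan_rows(arr, how="any"):
    """Return a copy of a 2D float array without its NaN rows.

    how="any" drops rows with at least one NaN;
    how="all" drops only rows made up entirely of NaN.
    (Hypothetical helper for illustration; not a NumPy function.)
    """
    nan_mask = np.isnan(arr)  # True where an element is NaN
    if how == "any":
        bad_rows = nan_mask.any(axis=1)
    elif how == "all":
        bad_rows = nan_mask.all(axis=1)
    else:
        raise ValueError("how must be 'any' or 'all'")
    return arr[~bad_rows]  # boolean indexing returns a new array


arr = np.array([[1, 2, 3], [4, np.nan, 6], [7, 8, np.nan], [np.nan, np.nan, np.nan]])
rows_without_any_nan = drop_nan_rows(arr)          # keeps only [1, 2, 3]
rows_not_all_nan = drop_nan_rows(arr, how="all")   # keeps the first three rows
print(rows_without_any_nan)
print(rows_not_all_nan)
```

Switching from any to all is useful when a partially filled row is still worth keeping and only completely empty rows should be discarded.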
np.nan_to_num
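np.nan_to_num is a NumPy function that replaces NaN values with a specified value (0.0 by default; the nan keyword requires NumPy 1.17 or later). We can use this function to replace NaN values and then remove the rows that contain 0 values. Be aware that this also removes rows that held a genuine 0 before the replacement, so the approach is only safe when 0 cannot occur in valid data.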
import numpy as np
# Create a 2D array with NaN values
arr = np.array([[1, 2, 3], [4, np.nan, 6], [7, 8, np.nan], [np.nan, np.nan, np.nan]])
# Replace NaN values with 0
arr_clean = np.nan_to_num(arr, nan=0)
# Remove the rows that contain 0 values
arr_clean = arr_clean[~np.any(arr_clean == 0, axis=1)]
print(arr_clean)
Output:
[[1. 2. 3.]]
In the example above, we created a 2D array with NaN values and then replaced the NaN values with 0 using np.nan_to_num. We then used np.any to determine which rows contain 0 values and negated that row mask with ~ to select the rows that do not contain 0 values.
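One caveat of the replace-then-filter approach is that a 0 already present in the data cannot be told apart from a replaced NaN. The following sketch uses made-up data (the array values are an assumption chosen to show the effect) and compares the result with filtering on np.isnan directly.

```python
import numpy as np

# Hypothetical data: the first row contains a real 0 and no NaN.
arr = np.array([[0.0, 2.0, 3.0], [4.0, np.nan, 6.0], [7.0, 8.0, 9.0]])

# Replace-then-filter: the genuine 0 in row 0 looks like a replaced NaN.
filled = np.nan_to_num(arr, nan=0)
via_nan_to_num = filled[~np.any(filled == 0, axis=1)]

# Filtering on np.isnan directly keeps the row with the real 0.
via_isnan = arr[~np.isnan(arr).any(axis=1)]

print(via_nan_to_num)  # only the row [7, 8, 9] survives
print(via_isnan)       # rows [0, 2, 3] and [7, 8, 9] survive
```

If 0 is a legitimate value in your data, filtering on the np.isnan mask is likely the safer choice.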
np.isnan and np.logical_not
np.isnan is a NumPy function that returns a boolean mask indicating NaN values in an array. We can use this function to determine which rows contain NaN values and then invert the row mask with np.logical_not to select the rows that do not contain NaN values.
import numpy as np
# Create a 2D array with NaN values
arr = np.array([[1, 2, 3], [4, np.nan, 6], [7, 8, np.nan], [np.nan, np.nan, np.nan]])
# Create a boolean mask for the NaN values
mask = np.isnan(arr)
# Remove the rows that contain NaN values (np.logical_not already inverts the row mask, so no extra ~ is needed)
arr_clean = arr[np.logical_not(np.any(mask, axis=1))]
print(arr_clean)
Output:
[[1. 2. 3.]]
In the example above, we created a 2D array with NaN values and then created a boolean mask for the NaN values using np.isnan. We then used np.any to determine which rows contain NaN values and inverted that row mask with np.logical_not to select the rows that do not contain NaN values.
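On a boolean array, ~ and np.logical_not produce the same result, so only one of them should be applied; using both inverts the mask twice and selects the NaN rows instead. A minimal sketch of that behavior (variable names are illustrative):

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, np.nan, 6], [7, 8, np.nan], [np.nan, np.nan, np.nan]])
has_nan = np.any(np.isnan(arr), axis=1)  # [False, True, True, True]

# Both spellings invert the boolean row mask element-wise.
keep_with_tilde = ~has_nan
keep_with_logical_not = np.logical_not(has_nan)
same_mask = np.array_equal(keep_with_tilde, keep_with_logical_not)

clean_rows = arr[keep_with_logical_not]    # the single row without NaN
nan_rows = arr[~np.logical_not(has_nan)]   # double inversion: the rows WITH NaN

print(same_mask)
print(clean_rows)
print(nan_rows)
```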
In this article, we went over the different ways of removing NaN rows in NumPy. We used boolean indexing, np.nan_to_num, and np.isnan with np.logical_not to achieve this. Remember that NaN values can be tricky to handle, and it's important to choose the appropriate method based on your data and use case.
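One example of why NaN can be tricky: NaN never compares equal to anything, including itself, so an equality test cannot find it. The short sketch below (with an illustrative array) shows why every method above relies on np.isnan rather than ==.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, np.nan, 6]])

# Equality against NaN is always False, even at the NaN position.
eq_mask = arr == np.nan
# np.isnan is the reliable way to locate NaN values.
isnan_mask = np.isnan(arr)

print(eq_mask.any())     # False
print(isnan_mask.sum())  # 1
print(arr.dtype)         # float64: an array holding NaN is floating-point
```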