📅 Last modified: 2023-12-03 14:45:02.559000 🧑 Author: Mango
In data analysis and manipulation, it is common to deal with missing values, represented as NaN (Not a Number) in pandas dataframes. Filtering non-NaN values is a frequent operation that allows you to remove rows or columns containing missing data. In this guide, we will explore different methods to filter non-NaN values using pandas in Python.
To begin, let's first create a pandas dataframe with some missing values.
```python
import pandas as pd
import numpy as np

data = {
    'A': [1, 2, np.nan, 4, np.nan],
    'B': ['foo', np.nan, 'bar', np.nan, 'baz'],
    'C': [np.nan, 'qux', 'quux', 'quuz', np.nan]
}
df = pd.DataFrame(data)
```
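The resulting dataframe `df` looks like this. Because column A contains NaN, pandas stores it as floating point, so its integers display as 1.0, 2.0, and 4.0:

|   | A   | B   | C    |
|---|-----|-----|------|
| 0 | 1.0 | foo | NaN  |
| 1 | 2.0 | NaN | qux  |
| 2 | NaN | bar | quux |
| 3 | 4.0 | NaN | quuz |
| 4 | NaN | baz | NaN  |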
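Before filtering, it can help to see where the missing values are. The following is a minimal sketch using `isna()` and `notna()` on the sample data; the variable names `present`, `missing_per_column`, and `present_per_row` are illustrative only.

```python
import numpy as np
import pandas as pd

# Same sample data as in the dataframe above.
df = pd.DataFrame({
    'A': [1, 2, np.nan, 4, np.nan],
    'B': ['foo', np.nan, 'bar', np.nan, 'baz'],
    'C': [np.nan, 'qux', 'quux', 'quuz', np.nan],
})

# Boolean mask: True where a value is present, False where it is NaN.
present = df.notna()

# Number of missing values in each column.
missing_per_column = df.isna().sum()

# Number of non-missing values in each row.
present_per_row = present.sum(axis=1)

print(missing_per_column.to_dict())  # {'A': 2, 'B': 2, 'C': 2}
print(present_per_row.tolist())      # [2, 2, 2, 2, 1]
```

Every column and every row of this sample has at least one NaN, which matters for how `dropna()` behaves with its default settings.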
To drop every row that contains a NaN in any column, use the `dropna()` method with its default settings.

```python
filtered_rows = df.dropna()
```

The `filtered_rows` dataframe keeps only the rows in which every column is non-NaN. In this sample each row has at least one missing value (row 0 lacks C, rows 1 and 3 lack B, rows 2 and 4 lack A), so the result is an empty dataframe with columns A, B, and C and no rows.
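The default `dropna()` discards a row as soon as any column is missing, which can remove more data than intended. As a minimal sketch, the `subset`, `how`, and `thresh` parameters of `DataFrame.dropna` relax that rule; the result variable names here are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, np.nan, 4, np.nan],
    'B': ['foo', np.nan, 'bar', np.nan, 'baz'],
    'C': [np.nan, 'qux', 'quux', 'quuz', np.nan],
})

# Only look at column A: NaN in B or C does not cause a row to be dropped.
rows_with_a = df.dropna(subset=['A'])

# Drop a row only when all of its columns are NaN (no such row in this sample).
rows_not_all_nan = df.dropna(how='all')

# Keep rows that have at least 2 non-NaN values.
rows_min_two = df.dropna(thresh=2)

print(rows_with_a.index.tolist())       # [0, 1, 3]
print(rows_not_all_nan.index.tolist())  # [0, 1, 2, 3, 4]
print(rows_min_two.index.tolist())      # [0, 1, 2, 3]
```

Row 4 is the only one removed by `thresh=2`, because `baz` in column B is its single non-NaN value.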
To drop every column that contains a NaN in any row, use the `dropna()` method with the `axis` parameter set to `1`.

```python
filtered_columns = df.dropna(axis=1)
```

The `filtered_columns` dataframe keeps only the columns with no NaN values. In this sample all three columns contain at least one NaN, so the result keeps the five index labels (0 to 4) but has no columns.
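To see column filtering keep something, the sketch below adds a hypothetical fully populated column `D` (not part of the original example) and then applies `dropna(axis=1)`. It also shows `thresh` and an equivalent boolean-mask form; all names other than `df` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, np.nan, 4, np.nan],
    'B': ['foo', np.nan, 'bar', np.nan, 'baz'],
    'C': [np.nan, 'qux', 'quux', 'quuz', np.nan],
})

# Hypothetical extra column with no missing values, added only for this demo.
df_with_d = df.assign(D=[10, 20, 30, 40, 50])

# Keep only the columns that have no NaN at all.
complete_columns = df_with_d.dropna(axis=1)

# Keep columns that have at least 3 non-NaN values.
mostly_complete = df_with_d.dropna(axis=1, thresh=3)

# Same result as dropna(axis=1), written as a boolean column mask.
masked_columns = df_with_d.loc[:, df_with_d.notna().all()]

print(complete_columns.columns.tolist())  # ['D']
print(mostly_complete.columns.tolist())   # ['A', 'B', 'C', 'D']
print(masked_columns.columns.tolist())    # ['D']
```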
If you only care about missing values in specific columns, you can use the `notnull()` method together with boolean indexing.

```python
filtered_values = df[df['A'].notnull() & df['B'].notnull()]
```

The `filtered_values` dataframe contains only the rows where both columns A and B are non-NaN. NaN values in other columns, such as C, are left in place:

|   | A   | B   | C   |
|---|-----|-----|-----|
| 0 | 1.0 | foo | NaN |
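The same selection can be written in a few equivalent ways. This is a minimal sketch, not the only approach: `notna()` is an alias of `notnull()`, and `dropna(subset=...)` expresses the same filter without an explicit mask. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, np.nan, 4, np.nan],
    'B': ['foo', np.nan, 'bar', np.nan, 'baz'],
    'C': [np.nan, 'qux', 'quux', 'quuz', np.nan],
})

# Rows where both A and B are present, using dropna with a column subset.
via_dropna = df.dropna(subset=['A', 'B'])

# The same rows, using one mask built over several columns at once.
via_mask = df[df[['A', 'B']].notna().all(axis=1)]

# Filtering a single column (a Series) down to its non-NaN values.
a_values = df.loc[df['A'].notna(), 'A']

print(via_dropna.index.tolist())    # [0]
print(via_dropna.equals(via_mask))  # True
print(a_values.tolist())            # [1.0, 2.0, 4.0]
```

The mask form scales more easily to conditions that mix missing-value checks with other comparisons, while `dropna(subset=...)` is shorter when the only condition is "not missing".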
Filtering non-NaN values is an important step in data preprocessing for further analysis or model training. In this guide, we explored different methods to filter non-NaN values in pandas dataframes. We learned how to filter rows, columns, and specific values. Using these techniques, you can efficiently handle missing data and ensure the quality of your data analysis results.
Refer to the pandas documentation for additional details and more advanced techniques.