📅  Last modified: 2023-12-03 14:45:02.542000             🧑  Author: Mango
Pandas is a popular Python library for data manipulation and analysis. This article explores how to get distinct values in Pandas.
Pandas has no method literally named distinct. The equivalent operation is drop_duplicates, which returns the unique values of a column or the unique rows across one or more columns of a DataFrame. It is similar to the DISTINCT keyword in SQL queries.
The syntax of drop_duplicates, the method that performs this distinct operation, is:
DataFrame.drop_duplicates(subset=None, keep='first', inplace=False)
The three parameters shown in the signature are all optional:
subset: the column label, or list of column labels, used to identify duplicates. By default, all columns are used.
keep: which occurrence of a duplicate to keep. Possible values are 'first' (the default), 'last', and False, which drops every row that has a duplicate.
inplace: if True, modify the original DataFrame and return None. If False (the default), leave the original unchanged and return a new DataFrame with the duplicates removed.
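To make the three keep options concrete, here is a minimal sketch on a small made-up DataFrame (the sample names are only illustrative). It compares which row labels survive under each setting.

```python
import pandas as pd

# Illustrative sample data: 'John' and 'Mary' each appear twice.
df = pd.DataFrame({'Name': ['John', 'Mary', 'Peter', 'John', 'David', 'Mary']})

# keep='first' (the default) keeps the first occurrence of each value.
first = df.drop_duplicates(subset=['Name'], keep='first')

# keep='last' keeps the last occurrence instead.
last = df.drop_duplicates(subset=['Name'], keep='last')

# keep=False drops every row whose value appears more than once.
none_kept = df.drop_duplicates(subset=['Name'], keep=False)

print(first.index.tolist())      # [0, 1, 2, 4]
print(last.index.tolist())       # [2, 3, 4, 5]
print(none_kept.index.tolist())  # [2, 4]
```

Note that the surviving rows keep their original index labels, which is why the labels are not consecutive.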
Let's look at some examples of finding distinct values with drop_duplicates.
import pandas as pd
# create a DataFrame
df = pd.DataFrame({'Name': ['John', 'Mary', 'Peter', 'John', 'David', 'Mary']})
# find unique values in the 'Name' column
unique_names = df['Name'].drop_duplicates()
print(unique_names)
Output:
0     John
1     Mary
2    Peter
4    David
Name: Name, dtype: object
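If only the values are needed and the index does not matter, Series.unique() and Series.nunique() are closely related alternatives. The sketch below, using the same illustrative names, contrasts them with drop_duplicates.

```python
import pandas as pd

names = pd.Series(['John', 'Mary', 'Peter', 'John', 'David', 'Mary'], name='Name')

# drop_duplicates returns a Series and keeps the original index labels.
as_series = names.drop_duplicates()

# unique() returns a plain array of values in order of first appearance,
# with no index attached.
as_array = names.unique()

# nunique() returns just the number of distinct values.
count = names.nunique()

print(as_series.index.tolist())
print(list(as_array))
print(count)
```

drop_duplicates is the better fit when the result should stay aligned with the original rows; unique() is convenient when a simple list of values is enough.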
import pandas as pd
# create a DataFrame
df = pd.DataFrame({'Name': ['John', 'Mary', 'Peter', 'David'],
                   'City': ['New York', 'London', 'Paris', 'London']})
# drop rows that duplicate an earlier row across both 'Name' and 'City'
# (no row in this sample repeats in full, so all four rows are kept)
unique_values = df.drop_duplicates()
print(unique_values)
Output:
    Name      City
0   John  New York
1   Mary    London
2  Peter     Paris
3  David    London
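Because none of the rows above is a full duplicate, the output is identical to the input. The following sketch uses a slightly different, made-up DataFrame that does contain a repeated row, and shows how subset changes which rows count as duplicates.

```python
import pandas as pd

# Illustrative sample data: row 3 repeats row 0 exactly, and 'London'
# appears in two rows that otherwise differ.
df = pd.DataFrame({
    'Name': ['John', 'Mary', 'Peter', 'John', 'David'],
    'City': ['New York', 'London', 'Paris', 'New York', 'London'],
})

# No subset: a row is dropped only if every column matches an earlier row.
unique_rows = df.drop_duplicates()

# subset=['City']: rows are compared on 'City' alone.
unique_cities = df.drop_duplicates(subset=['City'])

print(unique_rows.index.tolist())    # [0, 1, 2, 4]
print(unique_cities.index.tolist())  # [0, 1, 2]
```

With no subset, only the second John / New York row is removed. With subset=['City'], David's row is removed as well because London has already appeared.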
import pandas as pd
# create a DataFrame
df = pd.DataFrame({'Name': ['John', 'Mary', 'Peter', 'John', 'David', 'Mary']})
# remove duplicates from the 'Name' column
df.drop_duplicates(subset=['Name'], inplace=True)
print(df)
Output:
    Name
0   John
1   Mary
2  Peter
4  David
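The output above skips index label 3 because the surviving rows keep their original labels. As a hedged sketch, the snippet below contrasts the default inplace=False with inplace=True, and also uses the ignore_index parameter, which is not covered in this article and is available in pandas 1.0 and later, to renumber the result.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Mary', 'Peter', 'John', 'David', 'Mary']})

# inplace=False (the default): df is left untouched, a new DataFrame is returned.
deduped = df.drop_duplicates(subset=['Name'])
rows_before = len(df)  # still 6

# ignore_index=True relabels the result 0, 1, ..., n-1.
renumbered = df.drop_duplicates(subset=['Name'], ignore_index=True)

# inplace=True: df itself is modified and the call returns None.
result = df.drop_duplicates(subset=['Name'], inplace=True)

print(deduped.index.tolist())     # [0, 1, 2, 4]
print(renumbered.index.tolist())  # [0, 1, 2, 3]
print(result)                     # None
print(len(df))                    # 4
```

Since inplace=True returns None, assigning its result back to df would replace the DataFrame with None, so use either reassignment or inplace=True, not both.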
drop_duplicates is a useful tool for finding distinct values in a DataFrame. It can be used to filter out duplicate rows or to prepare a set of unique values for further analysis.