📅  Last modified: 2023-12-03 14:47:27.190000             🧑  Author: Mango
Pandas is a popular data manipulation library in Python, used extensively in data science and data analysis. A common preprocessing step in data science is shuffling the rows of a dataframe. In this guide, we will discuss how to shuffle rows in a Pandas dataframe using the sample method.
Shuffling rows in a dataframe refers to randomly reordering the rows of a dataframe. This operation is commonly used in data preprocessing to randomize the order of the data before splitting it into training and test sets or for any other operation where randomness is required.
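As a rough illustration of the shuffle-then-split use case, the sketch below shuffles a small made-up dataframe and then cuts it into two parts. The column names, the 80/20 ratio, and the fixed seed are assumptions chosen for the example, not part of any prescribed workflow.

```python
import pandas as pd

# Made-up data: 10 rows with a feature column and a label column.
df = pd.DataFrame({'feature': range(10), 'label': [0, 1] * 5})

# Shuffle all rows, then renumber the index so positional slicing is easy to follow.
shuffled = df.sample(frac=1, random_state=0).reset_index(drop=True)

# Hypothetical 80/20 split: the first 80% of the shuffled rows go to training.
split_at = int(len(shuffled) * 0.8)
train = shuffled.iloc[:split_at]
test = shuffled.iloc[split_at:]

print(len(train), len(test))
```

Because the rows are shuffled before slicing, any ordering present in the original data (for example, rows sorted by label) should not carry over into the two parts.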
To shuffle rows in a Pandas dataframe, we can use the sample method. It returns a random sample of rows and, by default, samples without replacement, so requesting every row yields a random reordering of the dataframe. Its signature is as follows:
df.sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, ignore_index=False)
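The n and frac parameters are mutually exclusive. If both are None, sample returns a single row, because n defaults to 1. If both are given, pandas raises a ValueError. To shuffle the whole dataframe, pass frac=1, which returns all rows in random order.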
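The way n and frac interact can be checked directly. The following is a minimal sketch on a made-up ten-row dataframe; the column name x and the variable names are placeholders for illustration.

```python
import pandas as pd

df = pd.DataFrame({'x': range(10)})

# With neither n nor frac given, sample() returns a single row (n defaults to 1).
one_row = df.sample()

# frac=1 returns every row in random order; frac=0.5 returns half of them.
all_rows = df.sample(frac=1)
half_rows = df.sample(frac=0.5)

# Passing both n and frac is rejected with a ValueError.
try:
    df.sample(n=3, frac=0.5)
    both_error = None
except ValueError as exc:
    both_error = exc

print(len(one_row), len(all_rows), len(half_rows))
print(type(both_error).__name__)
```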
Here's an example of how to shuffle rows in a Pandas dataframe:
import pandas as pd
# create a sample dataframe
df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
                   'age': [20, 30, 25, 22, 28],
                   'gender': ['female', 'male', 'male', 'male', 'female']})
# shuffle the rows of the dataframe
shuffled_df = df.sample(frac=1)
print(shuffled_df)
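The shuffled_df variable here contains the shuffled dataframe. The original df is left unchanged, and each row in shuffled_df keeps its original index label, so the index appears out of order. Because the shuffle is random, the printed order will generally differ from run to run.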
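Two follow-up steps are often useful after shuffling: making the result repeatable and renumbering the index. The sketch below shows one way to do both, using the random_state parameter of sample and the reset_index method; the seed value 42 is an arbitrary choice for the example.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
                   'age': [20, 30, 25, 22, 28]})

# random_state fixes the seed, so the same order comes back on every call.
first = df.sample(frac=1, random_state=42)
second = df.sample(frac=1, random_state=42)
print(first.equals(second))

# The shuffled frame keeps the original index labels; reset_index(drop=True)
# replaces them with a fresh 0..n-1 index and discards the old labels.
renumbered = first.reset_index(drop=True)
print(renumbered)
```

In recent pandas versions, passing ignore_index=True to sample should have the same renumbering effect in a single call; reset_index is used here because it does not depend on the pandas version.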
Shuffling rows in a Pandas dataframe is a common operation in data preprocessing. Using the sample method with frac=1, we can shuffle the rows of a Pandas dataframe in a single line of code.