📅 Last modified: 2023-12-03 14:47:27.188000 🧑 Author: Mango
Shuffling a DataFrame means randomly reordering its rows. This is useful when you want to draw a random sample from a dataset or split it into training and test sets. In this tutorial, we will explore different ways of shuffling a pandas DataFrame in Python.
Using the `shuffle` function from the `random` module
The `random` module in Python's standard library has a `shuffle` function that randomly reorders a list in place. We can use it to shuffle a list of the DataFrame's index labels and then pass that list to the `loc` accessor to extract the rows in the shuffled order.
```python
import random
import pandas as pd

# create a small DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# shuffle the index
index = list(df.index)
random.shuffle(index)

# extract rows in shuffled order
shuffled_df = df.loc[index]
print(shuffled_df)
```
Because the order is random, the output changes from run to run. One possible result is:
```
   A  B  C
2  3  6  9
1  2  5  8
0  1  4  7
```
Note that we first copied the DataFrame index into a list, because `random.shuffle` modifies its argument in place and a pandas `Index` is immutable. After shuffling the list, we used it to select the rows from the original DataFrame, which itself stays unchanged.
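If you need the same shuffled order on every run, one option is to give the shuffle its own seeded generator. The sketch below wraps this in a hypothetical helper, `shuffle_rows`, which is not part of pandas or the standard library. As a design choice it shuffles row positions and selects them with `iloc` rather than `loc`, because `loc` with a list of labels returns every matching row, so an index with repeated labels would duplicate rows.

```python
import random
from typing import Optional

import pandas as pd


def shuffle_rows(df: pd.DataFrame, seed: Optional[int] = None) -> pd.DataFrame:
    """Return a new DataFrame with the rows of df in random order (hypothetical helper)."""
    # A dedicated Random instance keeps the seed local to this call
    rng = random.Random(seed)
    positions = list(range(len(df)))
    rng.shuffle(positions)  # shuffles the list in place and returns None
    # iloc selects by position, so repeated index labels cannot duplicate rows
    return df.iloc[positions]


df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
first = shuffle_rows(df, seed=42)
second = shuffle_rows(df, seed=42)
# The same seed yields the same row order
print(first.index.equals(second.index))  # True
```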
Using the `sample` method
Another way to shuffle a DataFrame is the `sample` method. It randomly selects either a given number of rows (`n`) or a given fraction of them (`frac`). Setting `frac=1.0` selects every row exactly once, in random order.
```python
# df is the DataFrame created in the first example
shuffled_df = df.sample(frac=1.0)
print(shuffled_df)
```
Each row keeps its original index label, and the order again varies between runs. One possible result is:
```
   A  B  C
2  3  6  9
1  2  5  8
0  1  4  7
```
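The `sample` method also accepts a `random_state` argument for reproducible results. Because the shuffled frame keeps its original index labels, a common follow-up, sketched below, is to call `reset_index(drop=True)` so the rows are numbered 0 to n-1 again. The seed value 42 is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# random_state fixes the seed, so the same order is produced on every run
shuffled_df = df.sample(frac=1.0, random_state=42)

# sample() keeps the original index labels; drop them to get a fresh 0..n-1 index
shuffled_df = shuffled_df.reset_index(drop=True)
print(shuffled_df)
```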
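Using the `numpy.random.permutation` function
The `numpy.random.permutation` function can also be used to shuffle a DataFrame. Given an array, such as the DataFrame's index, it returns a randomly permuted copy; given an integer `n`, it returns a random permutation of `np.arange(n)`. As in the first approach, we then pass the shuffled labels to `loc`.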
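```python
import numpy as np

# df is the DataFrame created in the first example
# shuffle the index (returns a permuted copy; df.index is not modified)
index = np.random.permutation(df.index)

# extract rows in shuffled order
shuffled_df = df.loc[index]
print(shuffled_df)
```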
As before, the order varies between runs. One possible result is:
```
   A  B  C
2  3  6  9
1  2  5  8
0  1  4  7
```
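NumPy also offers a newer Generator API through `numpy.random.default_rng`. The following sketch is an alternative to the code above, not a replacement: it passes the number of rows to `permutation`, which returns shuffled row positions instead of labels, and selects them with `iloc`. The seed `0` is arbitrary.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# A seeded Generator makes the shuffle reproducible
rng = np.random.default_rng(seed=0)

# permutation(n) returns a shuffled copy of np.arange(n): row positions, not labels
positions = rng.permutation(len(df))

# iloc selects by position, so this also works when index labels repeat
shuffled_df = df.iloc[positions]
print(shuffled_df)
```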
In this tutorial, we explored three ways of shuffling a DataFrame in Python: the `shuffle` function from the `random` module, the DataFrame `sample` method, and the `numpy.random.permutation` function. Shuffling is a common first step when drawing a random sample from a dataset or splitting it into training and test sets.
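As an illustration of the train/test use case, here is a minimal sketch that shuffles with `sample` and then slices the result by position. The 80/20 ratio, the seed, and the example columns `x` and `y` are assumptions chosen only for this example.

```python
import pandas as pd

# Hypothetical example data: 10 rows
df = pd.DataFrame({'x': range(10), 'y': range(10, 20)})

# Shuffle all rows reproducibly, then cut the shuffled frame into two parts
shuffled = df.sample(frac=1.0, random_state=0)
cut = int(len(shuffled) * 0.8)  # 80% train / 20% test (assumed ratio)

train_df = shuffled.iloc[:cut]
test_df = shuffled.iloc[cut:]
print(len(train_df), len(test_df))  # 8 2
```

Because the slices come from a single shuffled frame, every row lands in exactly one of the two sets.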