📜  如何从 Pandas DataFrame 中随机选择行

📅  最后修改于: 2022-05-13 01:54:23.300000             🧑  作者: Mango

如何从 Pandas DataFrame 中随机选择行

让我们讨论如何从 Pandas DataFrame 中随机选择行。可以通过不同的方式从 DataFrame 中随机选择行。
使用列表字典创建一个简单的数据框。

Python3
# Import pandas package
import pandas as pd
  
# Define a dictionary containing employee data
data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
        'Age':[27, 24, 22, 32, 15],
        'Address':['Delhi', 'Kanpur', 'Allahabad', 'Kannauj', 'Noida'],
        'Qualification':['Msc', 'MA', 'MCA', 'Phd', '10th']}
 
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
 
# select all columns
df


Python3
# Selects one row randomly using sample()
# without give any parameters.
 
# Import pandas package
import pandas as pd
  
# Define a dictionary containing employee data
data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
        'Age':[27, 24, 22, 32, 15],
        'Address':['Delhi', 'Kanpur', 'Allahabad', 'Kannauj', 'Noida'],
        'Qualification':['Msc', 'MA', 'MCA', 'Phd', '10th']}
 
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
 
# Select one row randomly using sample()
# without give any parameters
df.sample()


Python3
# To get 3 random rows
# each time it gives 3 different rows
 
# df.sample(3) or
df.sample(n = 3)


Python3
# Fraction of rows
 
# here you get .50 % of the rows
df.sample(frac = 0.5)


Python3
# fraction of rows
 
# here you get 70 % row from the df
# make put into another dataframe df1
df1 = df.sample(frac =.7)
 
# Now select 50 % rows from df1
df1.sample(frac =.50)


Python3
# Dataframe df has only 4 rows
 
# if we try to select more than 4 row then will come error
# Cannot take a larger sample than population when 'replace = False'
df1.sample(n = 3, replace = False)


Python3
# Select more than rows with using replace
# default it is False
df1.sample(n = 6, replace = True)


Python3
# Weights will be re-normalized automatically
test_weights = [0.2, 0.2, 0.2, 0.4]
 
df1.sample(n = 3, weights = test_weights)


Python3
# Accepts axis number or name.
 
# sample also allows users to sample columns
# instead of rows using the axis argument.
df1.sample(axis = 0)


Python3
# With a given seed, the sample will always draw the same rows.
 
# If random_state is None or np.random,
# then a randomly-initialized
# RandomState object is returned.
df1.sample(n = 2, random_state = 2)


Python3
# Import pandas & Numpy package
import numpy as np
import pandas as pd
  
# Define a dictionary containing employee data
data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
        'Age':[27, 24, 22, 32, 15],
        'Address':['Delhi', 'Kanpur', 'Allahabad', 'Kannauj', 'Noida'],
        'Qualification':['Msc', 'MA', 'MCA', 'Phd', '10th']}
 
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
 
# Choose how many index include for random selection
chosen_idx = np.random.choice(4, replace = True, size = 6)
 
df2 = df.iloc[chosen_idx]
 
df2


方法 #1:使用sample()方法

示例 1:

Python3

# Selects one row randomly using sample()
# without give any parameters.
 
# Import pandas package
import pandas as pd
  
# Define a dictionary containing employee data
data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
        'Age':[27, 24, 22, 32, 15],
        'Address':['Delhi', 'Kanpur', 'Allahabad', 'Kannauj', 'Noida'],
        'Qualification':['Msc', 'MA', 'MCA', 'Phd', '10th']}
 
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
 
# Select one row randomly using sample()
# without give any parameters
df.sample()

输出:

示例 2:使用参数n ,它随机选择n行。
使用 sample(n) 或 sample(n=n) 随机选择n行。每次你运行这个,你会得到 n 不同的行。

Python3

# To get 3 random rows
# each time it gives 3 different rows
 
# df.sample(3) or
df.sample(n = 3)

输出:

示例 3:使用 frac 参数。
一个人可以做部分轴项目并获得行。例如,如果 frac= .5 则 sample 方法返回 50% 的行。

Python3

# Fraction of rows
 
# here you get .50 % of the rows
df.sample(frac = 0.5)

输出:

示例 4:
首先选择整个df数据帧的 70% 行,然后放入另一个数据帧df1 ,然后我们从df1中选择 50% frac 。

Python3

# fraction of rows
 
# here you get 70 % row from the df
# make put into another dataframe df1
df1 = df.sample(frac =.7)
 
# Now select 50 % rows from df1
df1.sample(frac =.50)

输出:

示例 5:使用 replace = false 随机选择一些行
参数替换允许多次选择一行(如)。 sample() 方法的 replace 参数的默认值为 False,因此您永远不会选择超过总行数。

Python3

# Dataframe df has only 4 rows
 
# if we try to select more than 4 row then will come error
# Cannot take a larger sample than population when 'replace = False'
df1.sample(n = 3, replace = False)

输出:

示例 6:在替换的帮助下选择多于n行,其中n是总行数。

Python3

# Select more than rows with using replace
# default it is False
df1.sample(n = 6, replace = True)

输出:

示例 7:使用权重

Python3

# Weights will be re-normalized automatically
test_weights = [0.2, 0.2, 0.2, 0.4]
 
df1.sample(n = 3, weights = test_weights)

输出:

示例 8:使用轴
轴接受数字或名称。 sample() 方法还允许用户使用轴参数对列而不是行进行采样。

Python3

# Accepts axis number or name.
 
# sample also allows users to sample columns
# instead of rows using the axis argument.
df1.sample(axis = 0)

输出:

示例 9:使用 random_state
对于给定的 DataFrame,样本将始终获取相同的行。如果 random_state 为 None 或 np.random,则返回一个随机初始化的 RandomState 对象。

Python3

# With a given seed, the sample will always draw the same rows.
 
# If random_state is None or np.random,
# then a randomly-initialized
# RandomState object is returned.
df1.sample(n = 2, random_state = 2)

输出:


方法 #2:使用 NumPy
Numpy 选择包含多少索引以进行随机选择,我们可以允许替换。

Python3

# Import pandas & Numpy package
import numpy as np
import pandas as pd
  
# Define a dictionary containing employee data
data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
        'Age':[27, 24, 22, 32, 15],
        'Address':['Delhi', 'Kanpur', 'Allahabad', 'Kannauj', 'Noida'],
        'Qualification':['Msc', 'MA', 'MCA', 'Phd', '10th']}
 
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
 
# Choose how many index include for random selection
chosen_idx = np.random.choice(4, replace = True, size = 6)
 
df2 = df.iloc[chosen_idx]
 
df2

输出: