如何从 Pandas DataFrame 中随机选择行

让我们讨论如何从 Pandas DataFrame 中随机选择行。可以通过不同的方式从 DataFrame 中随机选择行。
使用列表字典创建一个简单的数据框。

Python3

# Import pandas package
import pandas as pd
  
# Define a dictionary containing employee data
data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
        'Age':[27, 24, 22, 32, 15],
        'Address':['Delhi', 'Kanpur', 'Allahabad', 'Kannauj', 'Noida'],
        'Qualification':['Msc', 'MA', 'MCA', 'Phd', '10th']}
 
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
 
# select all columns
df

Python3

# Selects one row randomly using sample()
# without give any parameters.
 
# Import pandas package
import pandas as pd
  
# Define a dictionary containing employee data
data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
        'Age':[27, 24, 22, 32, 15],
        'Address':['Delhi', 'Kanpur', 'Allahabad', 'Kannauj', 'Noida'],
        'Qualification':['Msc', 'MA', 'MCA', 'Phd', '10th']}
 
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
 
# Select one row randomly using sample()
# without give any parameters
df.sample()

Python3

# To get 3 random rows
# each time it gives 3 different rows
 
# df.sample(3) or
df.sample(n = 3)

Python3

# Fraction of rows
 
# here you get .50 % of the rows
df.sample(frac = 0.5)

Python3

# fraction of rows
 
# here you get 70 % row from the df
# make put into another dataframe df1
df1 = df.sample(frac =.7)
 
# Now select 50 % rows from df1
df1.sample(frac =.50)

Python3

# Dataframe df has only 4 rows
 
# if we try to select more than 4 row then will come error
# Cannot take a larger sample than population when 'replace = False'
df1.sample(n = 3, replace = False)

Python3

# Select more than rows with using replace
# default it is False
df1.sample(n = 6, replace = True)

Python3

# Weights will be re-normalized automatically
test_weights = [0.2, 0.2, 0.2, 0.4]
 
df1.sample(n = 3, weights = test_weights)

Python3

# Accepts axis number or name.
 
# sample also allows users to sample columns
# instead of rows using the axis argument.
df1.sample(axis = 0)

Python3

# With a given seed, the sample will always draw the same rows.
 
# If random_state is None or np.random,
# then a randomly-initialized
# RandomState object is returned.
df1.sample(n = 2, random_state = 2)

Python3

# Import pandas & Numpy package
import numpy as np
import pandas as pd
  
# Define a dictionary containing employee data
data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
        'Age':[27, 24, 22, 32, 15],
        'Address':['Delhi', 'Kanpur', 'Allahabad', 'Kannauj', 'Noida'],
        'Qualification':['Msc', 'MA', 'MCA', 'Phd', '10th']}
 
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
 
# Choose how many index include for random selection
chosen_idx = np.random.choice(4, replace = True, size = 6)
 
df2 = df.iloc[chosen_idx]
 
df2

方法 #1：使用sample()方法

Sample method returns a random sample of items from an axis of object and this object of same type as your caller.

编程需要懂一点英语

示例 1：

Python3

# Selects one row randomly using sample()
# without give any parameters.
 
# Import pandas package
import pandas as pd
  
# Define a dictionary containing employee data
data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
        'Age':[27, 24, 22, 32, 15],
        'Address':['Delhi', 'Kanpur', 'Allahabad', 'Kannauj', 'Noida'],
        'Qualification':['Msc', 'MA', 'MCA', 'Phd', '10th']}
 
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
 
# Select one row randomly using sample()
# without give any parameters
df.sample()

输出：

示例 2：使用参数n ，它随机选择n行。
使用 sample(n) 或 sample(n=n) 随机选择n行。每次你运行这个，你会得到 n 不同的行。

Python3

# To get 3 random rows
# each time it gives 3 different rows
 
# df.sample(3) or
df.sample(n = 3)

输出：

示例 3：使用 frac 参数。
一个人可以做部分轴项目并获得行。例如，如果 frac= .5 则 sample 方法返回 50% 的行。

Python3

# Fraction of rows
 
# here you get .50 % of the rows
df.sample(frac = 0.5)

输出：

示例 4：
首先选择整个df数据帧的 70% 行，然后放入另一个数据帧df1 ，然后我们从df1中选择 50% frac 。

Python3

# fraction of rows
 
# here you get 70 % row from the df
# make put into another dataframe df1
df1 = df.sample(frac =.7)
 
# Now select 50 % rows from df1
df1.sample(frac =.50)

输出：

示例 5：使用 replace = false 随机选择一些行
参数替换允许多次选择一行（如）。 sample() 方法的 replace 参数的默认值为 False，因此您永远不会选择超过总行数。

Python3

# Dataframe df has only 4 rows
 
# if we try to select more than 4 row then will come error
# Cannot take a larger sample than population when 'replace = False'
df1.sample(n = 3, replace = False)

输出：

示例 6：在替换的帮助下选择多于n行，其中n是总行数。

Python3

# Select more than rows with using replace
# default it is False
df1.sample(n = 6, replace = True)

输出：

示例 7：使用权重

Python3

# Weights will be re-normalized automatically
test_weights = [0.2, 0.2, 0.2, 0.4]
 
df1.sample(n = 3, weights = test_weights)

输出：

示例 8：使用轴
轴接受数字或名称。 sample() 方法还允许用户使用轴参数对列而不是行进行采样。

Python3

# Accepts axis number or name.
 
# sample also allows users to sample columns
# instead of rows using the axis argument.
df1.sample(axis = 0)

输出：

示例 9：使用 random_state
对于给定的 DataFrame，样本将始终获取相同的行。如果 random_state 为 None 或 np.random，则返回一个随机初始化的 RandomState 对象。

Python3

# With a given seed, the sample will always draw the same rows.
 
# If random_state is None or np.random,
# then a randomly-initialized
# RandomState object is returned.
df1.sample(n = 2, random_state = 2)

输出：

方法 #2：使用 NumPy
Numpy 选择包含多少索引以进行随机选择，我们可以允许替换。

Python3

# Import pandas & Numpy package
import numpy as np
import pandas as pd
  
# Define a dictionary containing employee data
data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
        'Age':[27, 24, 22, 32, 15],
        'Address':['Delhi', 'Kanpur', 'Allahabad', 'Kannauj', 'Noida'],
        'Qualification':['Msc', 'MA', 'MCA', 'Phd', '10th']}
 
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
 
# Choose how many index include for random selection
chosen_idx = np.random.choice(4, replace = True, size = 6)
 
df2 = df.iloc[chosen_idx]
 
df2

输出：