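Filter rows of a PySpark DataFrame based on matching values from a list
In this article, we use isin() on a PySpark DataFrame to filter rows based on matching values from a list.
isin(): checks whether each value of a column is contained in the given elements and returns a boolean condition that is true for the matching rows.
Syntax: isin([element1, element2, ..., elementN])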
Creating a DataFrame for demonstration:
Python3
# importing module
import pyspark
# importing sparksession
from pyspark.sql import SparkSession
# creating sparksession
# and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
# list of student records: ID, name, and college
data = [[1, "sravan", "vignan"],
        [2, "ramya", "vvit"],
        [3, "rohith", "klu"],
        [4, "sridevi", "vignan"],
        [5, "gnanesh", "iit"]]
# specify column names
columns = ['ID', 'NAME', 'college']
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)
dataframe.show()
Output:
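Method 1: Using the filter() method
filter() checks a condition and returns only the rows that satisfy it. where() is an alias of filter(), so the two behave identically.
Syntax: dataframe.filter(condition)
where condition is a boolean expression on the DataFrame's columns.
Combined with isin(), the overall syntax is:
Syntax: dataframe.filter((dataframe.column_name).isin([list_of_elements])).show()
where,
- column_name is the column to check
- list_of_elements are the values to match in that column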
- show() is used to show the resultant dataframe
Example 1: Get specific IDs using filter().
Python3
# get the rows with ID 1, 2, or 3
dataframe.filter((dataframe.ID).isin([1, 2, 3])).show()
Output: the rows with ID 1, 2, and 3 (sravan, ramya, rohith).
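Example 2: Get the rows whose ID is neither 1 nor 3.
Python3
# get the rows whose ID is not in [1, 3]; ~ negates the condition
dataframe.filter(~(dataframe.ID).isin([1, 3])).show()
Output: the rows with ID 2, 4, and 5 (ramya, sridevi, gnanesh).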
Example 3: Get rows by name.
Python3
# get the rows whose NAME is sravan
dataframe.filter((dataframe.NAME).isin(['sravan'])).show()
Output: the row with ID 1 (sravan, vignan).
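Method 2: Using the where() method
where() checks a condition and returns only the rows that satisfy it, in the same way as filter().
Syntax: dataframe.where(condition)
where condition is a boolean expression on the DataFrame's columns.
Overall syntax with where():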
dataframe.where((dataframe.column_name).isin([elements])).show()
where,
- column_name is the column
- elements are the values that are present in the column
- show() is used to show the resultant dataframe
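Example: Get the rows for a specific college using where().
Python3
# get the rows whose college is vignan
dataframe.where((dataframe.college).isin(['vignan'])).show()
Output: the rows with ID 1 and 4 (sravan, sridevi).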