Create PySpark DataFrame from list of tuples
In this article, we will discuss how to create a PySpark DataFrame from a list of tuples.
To do this, we use the createDataFrame() method of SparkSession, which builds a DataFrame from an RDD, a list, or a pandas DataFrame. Here, the data is a list of tuples and the columns are a list of column names.
Syntax:
dataframe = spark.createDataFrame(data, columns)
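As a quick illustration of the three input types mentioned above, here is a minimal sketch; the variable names and sample values are made up for illustration and are not part of the examples below.

import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('sparkdf').getOrCreate()

rows = [("alice", 21), ("bob", 25)]      # a plain Python list of tuples
cols = ["name", "age"]

# from a list of tuples plus column names
df_from_list = spark.createDataFrame(rows, cols)

# from an RDD of tuples
rdd = spark.sparkContext.parallelize(rows)
df_from_rdd = spark.createDataFrame(rdd, cols)

# from a pandas DataFrame (column names are taken from the pandas frame)
pdf = pd.DataFrame(rows, columns=cols)
df_from_pandas = spark.createDataFrame(pdf)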
Example 1:
Python3
# importing module
import pyspark

# importing sparksession from
# pyspark.sql module
from pyspark.sql import SparkSession

# creating sparksession and giving
# an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of tuples of college data
data = [("sravan", "IT", 80),
        ("jyothika", "CSE", 85),
        ("harsha", "ECE", 60),
        ("thanmai", "IT", 65),
        ("durga", "IT", 91)]

# giving column names of dataframe
columns = ["Name", "Branch", "Percentage"]

# creating a dataframe
dataframe = spark.createDataFrame(data, columns)

# show data frame
dataframe.show()
Output:
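Because only column names are supplied, Spark infers each column's data type from the tuple values. A minimal sketch, assuming the dataframe from Example 1 above is still in scope, to inspect the inferred schema:

# inspect the schema Spark inferred from the tuples
# (Name and Branch should come out as string, Percentage as long)
dataframe.printSchema()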
Example 2:
Python3
# importing module
import pyspark

# importing sparksession from
# pyspark.sql module
from pyspark.sql import SparkSession

# creating sparksession and giving
# an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of tuples of plants data
data = [("mango", "AP", "Guntur"),
        ("mango", "AP", "Chittor"),
        ("sugar cane", "AP", "amaravathi"),
        ("paddy", "TS", "adilabad"),
        ("wheat", "AP", "nellore")]

# giving column names of dataframe
columns = ["Crop Name", "State", "District"]

# creating a dataframe
dataframe = spark.createDataFrame(data, columns)

# show data frame
dataframe.show()
Output:
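By default, show() prints up to 20 rows and truncates long cell values. Both can be adjusted; a small sketch in which the argument values are chosen only for illustration:

# show only the first 3 rows and do not truncate long cell values
dataframe.show(n=3, truncate=False)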
Example 3:
Python code to count the records (tuples) in the list.
Python3
# importing module
import pyspark

# importing sparksession from
# pyspark.sql module
from pyspark.sql import SparkSession

# creating sparksession and giving
# an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of tuples of plants data
data = [("mango", "AP", "Guntur"),
        ("mango", "AP", "Chittor"),
        ("sugar cane", "AP", "amaravathi"),
        ("paddy", "TS", "adilabad"),
        ("wheat", "AP", "nellore")]

# giving column names of dataframe
columns = ["Crop Name", "State", "District"]

# creating a dataframe
dataframe = spark.createDataFrame(data, columns)

# count the records (rows) and print the result
print(dataframe.count())
Output:
5
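count() is an action: it triggers the computation and returns the number of rows as a plain Python integer, which for a DataFrame built from a local list matches the length of that list. A minimal sketch, assuming data and dataframe from Example 3:

# count() returns an int, so it can be stored and compared directly
num_rows = dataframe.count()
print(num_rows == len(data))   # expected: True (both are 5)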