📅  Last modified: 2020-11-06 05:17:30             🧑  Author: Mango
StorageLevel decides how an RDD should be stored. In Apache Spark, StorageLevel determines whether an RDD should be stored in memory, on disk, or both. It also decides whether the RDD should be serialized and whether its partitions should be replicated.
The following code block shows the class definition of StorageLevel -
class pyspark.StorageLevel(useDisk, useMemory, useOffHeap, deserialized, replication = 1)
Now, to decide the storage of an RDD, there are different storage levels, as given below -
DISK_ONLY = StorageLevel(True, False, False, False, 1)
DISK_ONLY_2 = StorageLevel(True, False, False, False, 2)
MEMORY_AND_DISK = StorageLevel(True, True, False, False, 1)
MEMORY_AND_DISK_2 = StorageLevel(True, True, False, False, 2)
MEMORY_AND_DISK_SER = StorageLevel(True, True, False, False, 1)
MEMORY_AND_DISK_SER_2 = StorageLevel(True, True, False, False, 2)
MEMORY_ONLY = StorageLevel(False, True, False, False, 1)
MEMORY_ONLY_2 = StorageLevel(False, True, False, False, 2)
MEMORY_ONLY_SER = StorageLevel(False, True, False, False, 1)
MEMORY_ONLY_SER_2 = StorageLevel(False, True, False, False, 2)
OFF_HEAP = StorageLevel(True, True, True, False, 1)
Let us consider the following example of StorageLevel, where we use the storage level MEMORY_AND_DISK_2, which means the RDD partitions will be replicated twice.
------------------------------------storagelevel.py-------------------------------------
from pyspark import SparkContext
import pyspark

sc = SparkContext("local", "storagelevel app")
rdd1 = sc.parallelize([1, 2])

# Persist the RDD in memory and on disk, with each partition replicated twice.
rdd1.persist(pyspark.StorageLevel.MEMORY_AND_DISK_2)
print(rdd1.getStorageLevel())
------------------------------------storagelevel.py-------------------------------------
Command - The command is as follows -
$SPARK_HOME/bin/spark-submit storagelevel.py
Output - The output of the above command is as follows -
Disk Memory Serialized 2x Replicated