📜  Label Encoder in PySpark - Code Example

📅  Last modified: 2022-03-11 14:59:30.093000    🧑  Author: Mango

Code example 1
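from pyspark.sql import SparkSession
from pyspark.ml.feature import StringIndexer

# Assumes Spark 2.0+; SparkSession replaces the legacy sqlContext entry point
spark = SparkSession.builder.appName("label-encoder").getOrCreate()

df = spark.createDataFrame(
    [(0, "a"), (1, "b"), (2, "c"), (3, "a"), (4, "a"), (5, "c")],
    ["id", "category"])
# Map each distinct string in "category" to a numeric index in "categoryIndex"
indexer = StringIndexer(inputCol="category", outputCol="categoryIndex")
# fit() learns the label-to-index mapping; transform() appends the index column
indexed = indexer.fit(df).transform(df)
indexed.show()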
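
Code example 2
To make it easier to see which index each label receives, here is a minimal plain-Python sketch of the ordering that StringIndexer uses by default (stringOrderType="frequencyDesc"): the most frequent label gets index 0.0. The helper name fit_label_index is hypothetical and is not part of PySpark, and the alphabetical tie-break is an assumption based on the documented Spark 3.x behavior. It only illustrates the idea and is not Spark's implementation.

```python
from collections import Counter


def fit_label_index(values):
    """Map each distinct label to a float index, most frequent label first.

    Hypothetical helper, not a PySpark API.
    """
    counts = Counter(values)
    # Sort by descending count, then alphabetically to break ties
    # (the tie-break is an assumption mirroring Spark 3.x).
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: float(i) for i, label in enumerate(ordered)}


# Same "category" values as in the DataFrame example
categories = ["a", "b", "c", "a", "a", "c"]
mapping = fit_label_index(categories)
encoded = [mapping[c] for c in categories]
print(mapping)
print(encoded)
```

With this data, "a" occurs three times, "c" twice, and "b" once, so the sketch assigns a = 0.0, c = 1.0, b = 2.0.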