RDD → DataFrame
There are two ways.
Method 1: Reflection
Map each element of the RDD[T] to a case class object, then call toDF().
case class Person(name: String, age: Long)

val peopleDF = spark.sparkContext
  .textFile("examples/src/main/resources/people.txt")
  .map(_.split(","))
  .map(attributes => Person(attributes(0), attributes(1).trim.toInt))
  .toDF()
An RDD can also be converted directly to a Dataset; this requires importing the implicit conversions first: import spark.implicits._
If the converted elements are tuples, the resulting columns are named _1, _2, ...
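A minimal sketch of the tuple case, assuming a local SparkSession (the object name and inline data are illustrative, not from the original post). Tuples get default column names _1, _2, ..., and passing names to toDF overrides them:

```scala
import org.apache.spark.sql.SparkSession

object TupleToDF {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("TupleToDF")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // A tuple RDD converted with toDF() gets default column names _1, _2
    val df = spark.sparkContext
      .parallelize(Seq(("Alice", 30), ("Bob", 25)))
      .toDF()
    df.printSchema()

    // Explicit names passed to toDF replace the defaults
    val named = df.toDF("name", "age")
    named.printSchema()

    spark.stop()
  }
}
```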
Method 2: Programmatically specifying the schema
Combine the column metadata (a StructType schema) with the RDD via createDataFrame.
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val peopleRDD = spark.sparkContext.textFile("examples/src/main/resources/people.txt")

// The schema is encoded in a string
val schemaString = "name age"

// Generate the schema based on the string of schema
val fields = schemaString.split(" ")
  .map(fieldName => StructField(fieldName, StringType, nullable = true))
val schema = StructType(fields)

val rowRDD = peopleRDD
  .map(_.split(","))
  .map(attributes => Row(attributes(0), attributes(1).trim))

// Apply the schema to the RDD
val peopleDF = spark.createDataFrame(rowRDD, schema)

// Creates a temporary view using the DataFrame
peopleDF.createOrReplaceTempView("people")
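Once the temporary view is registered, it can be queried with spark.sql. A self-contained sketch of the whole flow, assuming a local SparkSession and inline data standing in for people.txt (the object name and sample rows are illustrative):

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object SchemaQueryExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SchemaQueryExample")
      .master("local[*]")
      .getOrCreate()

    // Inline lines standing in for people.txt ("name, age" per line)
    val peopleRDD = spark.sparkContext
      .parallelize(Seq("Michael, 29", "Andy, 30", "Justin, 19"))

    // Build the schema from a string, as in the snippet above
    val schema = StructType(
      "name age".split(" ").map(StructField(_, StringType, nullable = true)))

    val rowRDD = peopleRDD.map(_.split(",")).map(a => Row(a(0), a(1).trim))
    val peopleDF = spark.createDataFrame(rowRDD, schema)
    peopleDF.createOrReplaceTempView("people")

    // SQL queries now run against the registered view
    spark.sql("SELECT name, age FROM people").show()

    spark.stop()
  }
}
```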
DataFrame → RDD
val tt = teenagersDF.rdd // yields an RDD[Row]
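The .rdd accessor returns an RDD[Row], whose fields are read by position or by name. A minimal sketch, assuming a local SparkSession (the object name and sample data are illustrative):

```scala
import org.apache.spark.sql.{Row, SparkSession}

object DfToRdd {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DfToRdd")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val teenagersDF = Seq(("Justin", 19), ("Nina", 15)).toDF("name", "age")

    // .rdd gives RDD[Row]; read fields with getAs[T](name) or by index
    val names = teenagersDF.rdd
      .map((row: Row) => row.getAs[String]("name"))
      .collect()
    println(names.mkString(","))

    spark.stop()
  }
}
```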
Reposted from: http://fylei.baihongyu.com/