Initial Setup

Set the table name, the base path, and a data generator that produces the records used in the examples:

// spark-shell
import org.apache.hudi.QuickstartUtils._
import scala.collection.JavaConversions._
import org.apache.spark.sql.SaveMode._
import org.apache.hudi.DataSourceReadOptions._
import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.config.HoodieWriteConfig._

val tableName = "hudi_trips_cow"
val basePath = "hdfs://xueai8:8020/hudi/hudi_trips_cow"
val dataGen = new DataGenerator

// Print a sample of the generated JSON records
convertToStringList(dataGen.generateInserts(2)).foreach(println)

Running the code above prints two JSON records produced by Hudi's data generator, in the following format:

{"ts": 1647196090688, "uuid": "421b6078-2d2c-4e23-a6f5-b64713bdf81d", "rider": "rider-284", "driver": "driver-284", "begin_lat": 0.7340133901254792, "begin_lon": 0.5142184937933181, "end_lat": 0.7814655558162802, "end_lon": 0.6592596683641996, ......
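The raw JSON strings are hard to scan, so it can help to load a small sample into a DataFrame and look at the inferred schema. The following is a minimal sketch, assuming it runs in a spark-shell session started with the Hudi Spark bundle on the classpath (so that `spark` is available); the names `sampleGen`, `sampleJson`, and `sampleDF` are illustrative and do not come from the original text.

```scala
// spark-shell (assumes the Hudi Spark bundle is on the classpath)
import org.apache.hudi.QuickstartUtils._
import scala.collection.JavaConversions._

// A separate generator for this sketch, so that dataGen above is left untouched
val sampleGen = new DataGenerator

// convertToStringList returns a java.util.List[String]; the JavaConversions
// import converts it implicitly to the Scala Seq that parallelize expects
val sampleJson = convertToStringList(sampleGen.generateInserts(2))

// Let Spark infer the schema from the JSON strings
val sampleDF = spark.read.json(spark.sparkContext.parallelize(sampleJson, 2))

// Show the inferred column names and types
sampleDF.printSchema()

// Show a few of the fields that appear in the sample record above
sampleDF.select("uuid", "ts", "rider", "driver").show(false)
```

The record values are generated randomly, so the rows printed will differ from the sample shown above.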
          

......


