PySpark SQL: Adding Constant Columns to a DataFrame
Published: 2021-11-04 | Author: 小白学苑
1. Construct a DataFrame
# Sample data: a list of dicts
data = [{"Category": 'Category A', "ID": 1, "Value": 12.40},
        {"Category": 'Category B', "ID": 2, "Value": 30.10},
        {"Category": 'Category C', "ID": 3, "Value": 100.01}]
# Create the DataFrame (assumes an active SparkSession bound to `spark`)
df = spark.createDataFrame(data)
df.show()
df.printSchema()
Running this code prints:
+----------+---+------+
|  Category| ID| Value|
+----------+---+------+
|Category A|  1|  12.4|
|Category B|  2|  30.1|
|Category C|  3|100.01|
+----------+---+------+

root
 |-- Category: string (nullable = true)
 |-- ID: long (nullable = true)
 |-- Value: double (nullable = true)
2. Add constant columns with the lit function
The lit function wraps a literal value so it can be added to a DataFrame as a constant column.
from datetime import date
from pyspark.sql.functions import lit
df1 = df.withColumn('ConstantColumn1', lit(1)) \
        .withColumn('ConstantColumn2', lit(date.today()))
df1.show()
Running this code prints:
+----------+---+------+---------------+---------------+
|  Category| ID| Value|ConstantColumn1|ConstantColumn2|
+----------+---+------+---------------+---------------+
|Category A|  1|  12.4|              1|     2020-08-11|
|Category B|  2|  30.1|              1|     2020-08-11|
|Category C|  3|100.01|              1|     2020-08-11|
+----------+---+------+---------------+---------------+
3. Add constant columns via Spark SQL
df.createOrReplaceTempView("tb1")
df2 = spark.sql("select *, 1 as ConstantColumn1, current_date as ConstantColumn2 from tb1")
df2.show()
Running this code prints:
+----------+---+------+---------------+---------------+
|  Category| ID| Value|ConstantColumn1|ConstantColumn2|
+----------+---+------+---------------+---------------+
|Category A|  1|  12.4|              1|     2020-08-11|
|Category B|  2|  30.1|              1|     2020-08-11|
|Category C|  3|100.01|              1|     2020-08-11|
+----------+---+------+---------------+---------------+
4. Add a constant column via a UDF
from pyspark.sql.functions import udf
@udf("int")
def const_col():
    return 1
df1 = df.withColumn('ConstantColumn1', const_col())
df1.show()
Running this code prints:
+----------+---+------+---------------+
|  Category| ID| Value|ConstantColumn1|
+----------+---+------+---------------+
|Category A|  1|  12.4|              1|
|Category B|  2|  30.1|              1|
|Category C|  3|100.01|              1|
+----------+---+------+---------------+
Tags
Spark