Spark

Forward fill: propagate the last known value

Use last(ignorenulls=True) over a window that is unbounded on the preceding side and ends at the current row to fill the gaps in a series. This is the distributed counterpart of pandas' ffill.
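To make the semantics concrete, here is a minimal plain-Python sketch of what the window computes for one ordered partition. The function name `forward_fill` is hypothetical and only for illustration; Spark evaluates the same logic per partition rather than over a Python list.

```python
from typing import List, Optional


def forward_fill(values: List[Optional[float]]) -> List[Optional[float]]:
    """Carry the last non-null value forward through an ordered list."""
    filled: List[Optional[float]] = []
    last_seen: Optional[float] = None  # nothing to propagate before the first non-null value
    for value in values:
        if value is not None:
            last_seen = value
        filled.append(last_seen)
    return filled


readings = [None, 21.4, None, None, 22.1]
print(forward_fill(readings))  # [None, 21.4, 21.4, 21.4, 22.1]
```

Note that a null at the start of the series stays null, since there is no earlier value to carry forward.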

Requirements

PySpark 3.x

Python
from pyspark.sql import functions as F
from pyspark.sql.window import Window

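# Running window per sensor, in timestamp order: every row from the start
# of the partition up to and including the current row.
w_ffill = (
    Window.partitionBy("sensor_id")
    .orderBy("reading_ts")
    .rowsBetween(Window.unboundedPreceding, Window.currentRow)
)

# last(..., ignorenulls=True) returns the most recent non-null temperature
# in that window; leading nulls stay null because nothing precedes them.
df_filled = df.withColumn(
    "temperature_filled",
    F.last("temperature", ignorenulls=True).over(w_ffill),
)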

# Backward fill: use first(ignorenulls=True) over the mirrored window:
# .rowsBetween(Window.currentRow, Window.unboundedFollowing)

Result

+---------+-------------------+-----------+------------------+
|sensor_id|         reading_ts|temperature|temperature_filled|
+---------+-------------------+-----------+------------------+
|     S-01|2026-06-09 10:00:00|       21.4|              21.4|
|     S-01|2026-06-09 10:05:00|       null|              21.4|
|     S-01|2026-06-09 10:10:00|       null|              21.4|
|     S-01|2026-06-09 10:15:00|       22.1|              22.1|
+---------+-------------------+-----------+------------------+
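For a quick cross-check on a small sample, the same result can be reproduced with pandas. This is only a sketch: the DataFrame below re-creates the four rows shown in the table by hand, and the variable name `pdf` is an assumption for illustration.

```python
import pandas as pd

# Hand-built copy of the four sample rows from the result table.
pdf = pd.DataFrame(
    {
        "sensor_id": ["S-01"] * 4,
        "reading_ts": pd.to_datetime(
            [
                "2026-06-09 10:00:00",
                "2026-06-09 10:05:00",
                "2026-06-09 10:10:00",
                "2026-06-09 10:15:00",
            ]
        ),
        "temperature": [21.4, None, None, 22.1],
    }
)

# Sort first, then forward fill within each sensor, mirroring
# partitionBy("sensor_id").orderBy("reading_ts") in the Spark window.
pdf = pdf.sort_values(["sensor_id", "reading_ts"])
pdf["temperature_filled"] = pdf.groupby("sensor_id")["temperature"].ffill()
print(pdf)
```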
Tags: PySpark, Window, Forward fill, Imputation
