Spark

نگهداشت (retention) بر اساس کوهورت ماهانه

هر مشتری به کوهورت خودش (اولین ماه خرید) وصل میشه، سن به ماه pivot میشه و ستون‌های M1/M2/M3 نگهداشت رو به‌صورت درصدی از کوهورت اولیه نشون میدن.

کاربرد

پیگیری اینکه مشتری‌هایی که توی حراج جذب شدن، به اندازه بقیه برمی‌گردن یا نه.

پیش‌نیازها

PySpark 3.x

Python

from pyspark.sql import functions as F

base = commandes.withColumn("mois", F.trunc("order_date", "month"))
cohorte = base.groupBy("customer_id").agg(F.min("mois").alias("cohorte"))

retention = (
    base.join(cohorte, "customer_id")
        .withColumn("age_mois", F.months_between("mois", "cohorte").cast("int"))
        .groupBy("cohorte").pivot("age_mois", [0, 1, 2, 3])
        .agg(F.countDistinct("customer_id"))
)

for m in [1, 2, 3]:
    retention = retention.withColumn(
        f"M{m}", F.round(F.col(str(m)) / F.col("0") * 100, 1))

retention.select("cohorte", F.col("0").alias("clients"), "M1", "M2", "M3") \
         .orderBy("cohorte").show()

نتیجه

+----------+-------+----+----+----+
|   cohorte|clients|  M1|  M2|  M3|
+----------+-------+----+----+----+
|2026-01-01|  8,204|34.1|22.8|18.5|
|2026-02-01|  6,917|31.7|21.2|17.9|
|2026-03-01| 12,480|19.4|11.0| 8.2|
|2026-04-01|  7,308|33.2|22.5|null|
|2026-05-01|  7,841|32.9|null|null|
+----------+-------+----+----+----+

La cohorte de mars (soldes, 12 480 clients) retient 2x moins : acquis par
le prix, partis avec lui — recalibrer le coût d'acquisition de l'opération

PySparkCohorteRétentionpivot

اسنیپت‌های مرتبط

← بازگشت به آزمایشگاه داده