Python

Pareto analysis: value_counts and cumulative share

Counts, percentages, and cumulative share in a few lines of pandas, to see how many categories fall within the first 80% of the volume.

Prerequisites

Python 3.9+, pandas

Python
import pandas as pd

# Assumes df is already loaded: one row per return, with the reason in "motif_retour".
# value_counts() sorts by descending frequency, the order a Pareto table needs.
counts = df["motif_retour"].value_counts()
pareto = pd.DataFrame({
    "nb": counts,
    "part_pct": (counts / counts.sum() * 100).round(1),
})
pareto["cumul_pct"] = pareto["part_pct"].cumsum().round(1)

# Keep the categories whose cumulative share stays at or below 80%.
top = pareto[pareto["cumul_pct"] <= 80]
print(pareto.head(6))
print(f"\n{len(top)} reasons fall within the first 80% of returns")

Result

                    nb  part_pct  cumul_pct
motif_retour
taille_incorrecte  612      38.3       38.3
article_endommage  389      24.3       62.6
ne_correspond_pas  201      12.6       75.2
change_avis        118       7.4       82.6
erreur_livraison    97       6.1       88.7
autre               83       5.2       93.9

3 reasons fall within the first 80% of returns
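
The filter `cumul_pct <= 80` keeps the categories that stay under the threshold (75.2% for the first three rows above), so it can stop one category short of actually reaching 80%. The sketch below is one possible variant, not the snippet's own method: it counts the categories needed to reach the threshold, and it builds the cumulative share from the raw counts so that rounding does not accumulate. The sample data and the names `threshold`, `n_within` and `n_needed` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration only.
df = pd.DataFrame({
    "motif_retour": ["taille"] * 6 + ["endommage"] * 3 + ["avis"] * 2 + ["autre"] * 1
})

threshold = 80  # assumed cut-off, in percent

counts = df["motif_retour"].value_counts()

# Cumulative share from the integer counts: nothing is rounded before summing.
cumul = counts.cumsum() / counts.sum() * 100

# Categories that stay at or below the threshold (same logic as the snippet).
n_within = int((cumul <= threshold).sum())

# Categories needed to reach the threshold: all those strictly below it,
# plus the one that crosses it.
n_needed = int((cumul < threshold).sum()) + 1

print(cumul.round(1))
print(f"{n_within} categories stay within {threshold}%")
print(f"{n_needed} categories are needed to reach {threshold}%")
```

Which count to report depends on the question being asked: `n_within` answers "what fits under 80%", while `n_needed` answers "how many categories does it take to cover 80%".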
Tags: pandas, value_counts, Pareto, analysis
