
Gathering player metrics
I’m thinking about the various metrics to provide, like goals, expected goals (xG), assists, critical contributions, and more. I should define "关键球员" ("key players"), likely referring to players who make clutch contributions during key moments of the game. I need to ask clarifying questions, such as the league and season the user is interested in, as well as their preference for language and whether they want a live ranking. If network access is an issue, I might suggest an offline approach using a CSV file and a pandas script for calculations.

Planning player ranking tasks
I think the instruction to use a planning tool feels unnecessary since this is quite straightforward. I should keep my answer concise by asking just a couple of questions and offering some default suggestions. Instead of creating a list with hypothetical names, which could be misleading, I'll provide a template with sample placeholder rows. I also aim to generate a reusable notebook or Python script that calculates the ranking from a CSV file of match events, including required columns, a weighting scheme, and recommended visualizations.

Clarifying user intent for ranking
The user only provided a phrase, so they might be looking for a ranked list. I should ask some clarifying questions: What league or season are they interested in? What metrics should I focus on for determining "key contributions"? Do they want an automated script for rankings or a manually curated list/template? If they don’t respond, I could suggest using the EPL 2023-24 season and provide a formula combining expected points added, non-penalty G+A per 90, and other relevant metrics, along with an implementation offer and a code snippet as an example.

Creating a pandas function for player metrics
I need to develop a small pandas function that expects several columns, including player statistics like minutes, goals, xG, and more. Computing expected points (xP) usually requires event-level timelines, which might not always be available, so we can approximate with go-ahead goals instead. I'll write code to read a CSV with the necessary columns, compute per-90 rates, and create a contribution index based on player position with corresponding weights.
I'll normalize the metrics using z-scores or min-max methods and generate an overall "关键指数" ("key index"), outputting the top 20 players in the process. Since it specifies "ASCII only," I’ll ensure the code is clear and the explanation concise.

Providing player ranking methodology
I’ll start by saying, “Sure! Here’s a definition, calculation method, and runnable script for a 'key player performance ranking'.” I’ll quickly ask clarifying questions.
I’ll include bullet points defining components of the "关键指数" ("key index"). I'll provide a Python script in a code block for practical use and explain how to run it. The user should save their CSV and run the script, which will output key_players_rank.csv.
I'll invite them to share the league and season or their dataset, or ask if I can fetch data if allowed. I'll make sure to keep the style concise!

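Which league and season should the ranking cover? And which definition of "key" do you prefer:
- Match-winning and points-earning: based on expected points added (xP added), plus match-winning and go-ahead goals and assists
- All-round contribution: goals/assists, xG/xA, progression, pressing and tackling, availability
- Attack only, or by position (forwards, midfielders, defenders, and goalkeepers evaluated separately)
If you would rather not specify yet, I can give you a generic "key index" calculation and a runnable script; feed it your data CSV and it produces the ranking.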
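Key index (generic version, position-weighted)
- Forwards: 0.50 goals and npxG + 0.25 chance creation (xA/SCA) + 0.15 progression (progressive passes/carries) + 0.10 on-pitch impact (on/off xG difference) + 0.05 clutch moments
- Midfielders: 0.30 goals and npxG + 0.35 chance creation + 0.20 progression + 0.10 defensive actions (pressures/tackles + interceptions) + 0.05 impact
- Defenders: 0.15 goals and npxG + 0.20 chance creation + 0.25 progression + 0.30 defensive actions + 0.10 impact
- Goalkeepers: mainly PSxG-GA, supplemented by sweeper actions, cross claims, and passing metrics
- All metrics are converted to per-90 values, standardized with robust z-scores, and lightly penalized for low playing time (players under 900 minutes are down-weighted)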
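To make the last bullet concrete, here is a minimal sketch of the three steps (per-90 conversion, robust z-score, low-minutes penalty) on a toy table. The five players and their numbers are hypothetical placeholders, not real data, and the square-root ramp that reaches 1.0 at 900 minutes is one assumed way to implement a "light" penalty.

```python
import numpy as np
import pandas as pd

def robust_z(s: pd.Series) -> pd.Series:
    """Median/MAD z-score; 0.6745 rescales MAD to roughly one normal std."""
    s = s.astype(float)
    med = s.median()
    mad = (s - med).abs().median()
    if mad == 0:  # degenerate spread: fall back to a plain z-score
        std = s.std(ddof=0)
        return (s - s.mean()) / (std if std else 1.0)
    return 0.6745 * (s - med) / mad

def availability(minutes: pd.Series, full_at: float = 900.0) -> pd.Series:
    """Square-root ramp (assumed form): 1.0 at `full_at` minutes and above."""
    return np.sqrt((minutes.clip(lower=0) / full_at).clip(upper=1.0))

# Hypothetical season totals for five placeholder players.
toy = pd.DataFrame({
    "player": ["A", "B", "C", "D", "E"],
    "minutes": [225, 900, 1800, 2700, 450],
    "npxg": [1.0, 2.0, 6.0, 9.0, 4.0],
})
toy["npxg90"] = toy["npxg"] / (toy["minutes"] / 90.0)  # per-90 rate
toy["z"] = robust_z(toy["npxg90"])                     # standardized rate
toy["avail"] = availability(toy["minutes"])            # low-minutes multiplier
toy["score"] = toy["z"] * toy["avail"]
print(toy.round(3).to_string(index=False))
```

Player E has the highest per-90 rate but only 450 minutes, so the multiplier of about 0.71 pulls the score down without removing the player from the table.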
Runnable script (feed it your season-aggregate CSV)
- Required columns (as many as possible): player, team, position, minutes, goals, pens_made, xg, npxg, assists, xa, sca, gca, prog_passes, prog_carries, pressures, tackles, interceptions, on_off_xg_diff
- Optional columns (used if present): match_winners, go_ahead_ga, gk_psxg, gk_ga, gk_cross_stops, sweeper_actions, pass_launch_cmp
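Before running the script, it can help to check which of these columns a CSV actually has. The sketch below is one hypothetical way to do that; `check_columns` and the `demo` frame are illustrative names, not part of the script, and `go_ahead_ga` is the spelling the script reads.

```python
import pandas as pd

# Column names copied from the two lists above.
REQUIRED = [
    "player", "team", "position", "minutes", "goals", "pens_made", "xg",
    "npxg", "assists", "xa", "sca", "gca", "prog_passes", "prog_carries",
    "pressures", "tackles", "interceptions", "on_off_xg_diff",
]
OPTIONAL = [
    "match_winners", "go_ahead_ga", "gk_psxg", "gk_ga", "gk_cross_stops",
    "sweeper_actions", "pass_launch_cmp",
]

def check_columns(df: pd.DataFrame) -> dict:
    """Report missing required columns and which optional ones are present."""
    cols = set(df.columns)
    return {
        "missing_required": [c for c in REQUIRED if c not in cols],
        "optional_present": [c for c in OPTIONAL if c in cols],
    }

# Hypothetical header-only frame standing in for pd.read_csv("your_season.csv").
demo = pd.DataFrame(
    columns=["player", "team", "position", "minutes", "goals", "match_winners"]
)
report = check_columns(demo)
print("missing required:", report["missing_required"])
print("optional present:", report["optional_present"])
```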
Save it as rank_key_players.py and adjust the input/output paths as needed:
```python
# rank_key_players.py
import numpy as np
import pandas as pd


def robust_z(s: pd.Series) -> pd.Series:
    """Median/MAD z-score; falls back to a plain z-score when MAD is zero."""
    s = s.astype(float)
    med = s.median()
    mad = (s - med).abs().median()
    if mad == 0:
        std = s.std(ddof=0)
        return (s - s.mean()) / (std if std else 1.0)
    return 0.6745 * (s - med) / mad


def per90(df, col, minutes_col="minutes"):
    """Per-90 rate; players with zero minutes get NaN instead of inf."""
    mins = df[minutes_col].replace(0, np.nan)
    return df[col] / (mins / 90.0)


def safe_col(df, name, default=0.0):
    """Return the column if present, otherwise a constant series."""
    return df[name] if name in df.columns else pd.Series(default, index=df.index, dtype=float)


def compute_index(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    numeric_cols = [
        "minutes", "goals", "pens_made", "xg", "npxg", "assists", "xa", "sca", "gca",
        "prog_passes", "prog_carries", "pressures", "tackles", "interceptions",
        "on_off_xg_diff", "gk_psxg", "gk_ga", "gk_cross_stops",
        "sweeper_actions", "pass_launch_cmp", "match_winners", "go_ahead_ga",
    ]
    for c in numeric_cols:
        if c in df.columns:
            df[c] = pd.to_numeric(df[c], errors="coerce").fillna(0.0)

    # Per-90 rates
    df["npg"] = (df["goals"] - safe_col(df, "pens_made")).clip(lower=0)
    df["npg90"] = per90(df, "npg")
    df["npxg90"] = per90(df, "npxg") if "npxg" in df.columns else per90(df, "xg")
    df["xa90"] = per90(df, "xa")
    df["sca90"] = per90(df, "sca")
    df["gca90"] = per90(df, "gca")
    df["prog_pass90"] = per90(df, "prog_passes")
    df["prog_carry90"] = per90(df, "prog_carries")
    df["press90"] = per90(df, "pressures")
    df["tack_int90"] = per90(df, "tackles") + per90(df, "interceptions")
    df["impact90"] = per90(df, "on_off_xg_diff") if "on_off_xg_diff" in df.columns else 0.0
    df["clutch90"] = 0.0
    if "match_winners" in df.columns:
        df["clutch90"] += per90(df, "match_winners")
    if "go_ahead_ga" in df.columns:
        df["clutch90"] += 0.5 * per90(df, "go_ahead_ga")

    # Sub-scores (robust z)
    scoring = robust_z(0.55 * df["npxg90"] + 0.45 * df["npg90"])
    creation = robust_z(0.5 * df["xa90"] + 0.3 * df["sca90"] + 0.2 * df["gca90"])
    progression = robust_z(0.6 * df["prog_pass90"] + 0.4 * df["prog_carry90"])
    defensive = robust_z(0.6 * df["press90"] + 0.4 * df["tack_int90"])
    impact = robust_z(df["impact90"])
    clutch = robust_z(df["clutch90"])

    # Position groups; the forward pattern must not match wing-backs (RWB/LWB)
    pos = df["position"].str.upper().fillna("")
    is_gk = pos.str.contains("GK")
    is_def = pos.str.contains("CB|RB|LB|RWB|LWB|DF|FB")
    is_mid = pos.str.contains("DM|CM|AM|RM|LM|MF|MID")
    is_fwd = pos.str.contains(r"FW|ST|CF|[LR]?W(?!B)")

    # Outfield index by position weights (forward weights as given sum to 1.05,
    # the other groups to 1.00); unmatched positions get the last, generic mix
    outfield_idx = np.where(
        is_fwd, 0.50 * scoring + 0.25 * creation + 0.15 * progression + 0.10 * impact + 0.05 * clutch,
        np.where(
            is_mid, 0.30 * scoring + 0.35 * creation + 0.20 * progression + 0.10 * defensive + 0.05 * impact,
            np.where(
                is_def, 0.15 * scoring + 0.20 * creation + 0.25 * progression + 0.30 * defensive + 0.10 * impact,
                0.30 * scoring + 0.30 * creation + 0.20 * progression + 0.15 * defensive + 0.05 * impact,
            ),
        ),
    )

    # Goalkeeper index
    if "gk_psxg" in df.columns and "gk_ga" in df.columns:
        psxg_ga = df["gk_psxg"] - df["gk_ga"]  # positive is good
        gk_core = robust_z(psxg_ga)
    else:
        gk_core = pd.Series(0.0, index=df.index)
    gk_cross = robust_z(safe_col(df, "gk_cross_stops"))
    gk_sweeper = (
        robust_z(per90(df, "sweeper_actions"))
        if "sweeper_actions" in df.columns
        else pd.Series(0.0, index=df.index)
    )
    gk_pass = robust_z(safe_col(df, "pass_launch_cmp"))
    gk_idx = gk_core + 0.25 * gk_cross + 0.25 * gk_sweeper + 0.20 * gk_pass
    base_idx = np.where(is_gk, gk_idx, outfield_idx)

    # Availability penalty: square-root ramp that reaches 1.0 at 900 minutes
    mins = df["minutes"].clip(lower=0)
    avail = np.minimum(1.0, mins / 900.0) ** 0.5
    df["key_index"] = base_idx * avail

    # Secondary sort helper
    df["g_a90"] = per90(df, "goals") + per90(df, "assists")
    cols = [
        "player", "team", "position", "minutes", "key_index", "g_a90", "npg90",
        "npxg90", "xa90", "sca90", "prog_pass90", "prog_carry90", "press90",
        "tack_int90", "impact90",
    ]
    out = df[cols].copy()
    return out.sort_values(["key_index", "g_a90", "minutes"], ascending=[False, False, False])


if __name__ == "__main__":
    import argparse

    ap = argparse.ArgumentParser()
    ap.add_argument("--in", dest="inp", required=True, help="Input CSV with season aggregates")
    ap.add_argument("--out", dest="outp", default="key_players_rank.csv", help="Output CSV")
    ap.add_argument("--top", dest="topk", type=int, default=50, help="Rows to print")
    args = ap.parse_args()
    df = pd.read_csv(args.inp)
    ranked = compute_index(df)
    ranked.to_csv(args.outp, index=False)
    print(ranked.head(args.topk).to_string(index=False, float_format=lambda x: f"{x:,.3f}"))
```
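Usage
- Prepare a season-aggregate CSV (one row per player) with the column names listed above.
- Run: python rank_key_players.py --in your_season.csv --out key_players_rank.csv --top 25
- The output file contains the key index and the main per-90 metrics, sorted by key index in descending order by default.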
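Once the output file exists, a per-position view is often easier to read than one mixed list. The following is a minimal sketch of that step; `top_n_per_position` is a hypothetical helper, and the small `ranked` frame with placeholder players stands in for `pd.read_csv("key_players_rank.csv")`.

```python
import pandas as pd

def top_n_per_position(ranked: pd.DataFrame, n: int = 3) -> pd.DataFrame:
    """Keep the n highest key_index rows within each position label."""
    ordered = ranked.sort_values("key_index", ascending=False)
    top = ordered.groupby("position").head(n)
    return top.sort_values(
        ["position", "key_index"], ascending=[True, False]
    ).reset_index(drop=True)

# Hypothetical rows in the shape of the script's output (placeholder values).
ranked = pd.DataFrame({
    "player": ["P1", "P2", "P3", "P4", "P5", "P6"],
    "position": ["FW", "FW", "FW", "MF", "MF", "GK"],
    "key_index": [1.2, 0.4, 0.9, 0.7, -0.1, 0.3],
})
top = top_n_per_position(ranked, n=2)
print(top.to_string(index=False))
```

This groups on the raw position label, so labels such as "FW,MF" form their own group unless they are mapped to a position group first.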
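Would you like me to:
- give you the top-20 list for a specific league and season directly; or
- adapt the script to the column names and definitions in your existing data; or
- pull public data (FBref/Understat) and run a first version (tell me the league and season).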
