Methodology

How Wayfinder's deck-strength and player-rating metrics are computed.

The Four Rating Systems

Wayfinder maintains parallel rating systems because each one answers a different competitive question.

| Rating | Question | Updates | Best For |
|---|---|---|---|
| ELO | What is the next match's most likely outcome? | Every match | Match-outcome prediction |
| Glicko-2 | What is a player's true skill, with confidence? | Once per tournament event, with Swiss batched and elimination rounds separated | Leaderboards, long-term skill estimation, and trends |
| α | How strong is this deck at equal pilot skill? | Weekly per era from the M2 logistic-regression fit | Deck strength tier lists |
| γ | How much does pilot skill matter for this deck? | Weekly per era from a per-deck refit | Skill ceiling and deck difficulty |

The Rating Tables

Each system is split into independent tables by subject (player or archetype), format (Premier or Limited), and source (in-person melee.gg tournaments or online Karabast). Ratings never cross between sources — a Karabast result can never move your in-person sanctioned rating, and a tournament result can never move your Karabast rating. They're separate ladders that happen to share a math engine.

| System | Subject | Format | Source | Notes |
|---|---|---|---|---|
| ELO | Player | Premier | melee.gg | In-person Premier: every tier of tournament, all weighted equally. |
| ELO | Player | Premier | Karabast | Online Premier: separate ladder, never bleeds into the in-person rating. |
| ELO | Player | Limited | melee.gg | In-person Sealed + Draft tournaments. |
| ELO | Player | Limited | Karabast | Online Limited (PTP). |
| ELO | Archetype | - | melee.gg | Format is encoded in the archetype name itself, e.g. “Han Solo (Limited)”. |
| ELO | Archetype | - | Karabast | - |
| Glicko-2 | Player | Premier | melee.gg | Sanctioned tiers only (PQ, MA1, SQ, MA2, RQ, GC, GO). Locals excluded. |
| Glicko-2 | Player | Premier | Karabast | All Karabast Premier; never visible on the sanctioned ladder. |
| Glicko-2 | Player | Limited | melee.gg | Sanctioned tiers only. |
| Glicko-2 | Player | Limited | Karabast | All Karabast Limited. |
| Glicko-2 | Archetype | Premier | melee.gg | Current era only; the α/γ refit each era closes the prior table. |
| Glicko-2 | Archetype | Premier | Karabast | Current era only. |

Twelve tables in all — six ELO and six Glicko-2. The skill model (α, γ) is fit once over the Premier sanctioned melee.gg pool and does not have separate Karabast or Limited variants today.
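As a rough illustration of how the twelve tables break down, the sketch below enumerates one key per table. The TableKey type, the field names, and the rating_tables helper are hypothetical names chosen for this example, not Wayfinder's actual schema.

```python
from itertools import product
from typing import List, NamedTuple, Optional


class TableKey(NamedTuple):
    """Hypothetical identifier for one rating table."""
    system: str          # "ELO" or "Glicko-2"
    subject: str         # "Player" or "Archetype"
    fmt: Optional[str]   # None where the format lives in the archetype name
    source: str          # "melee.gg" or "Karabast"


SOURCES = ("melee.gg", "Karabast")


def rating_tables() -> List[TableKey]:
    keys = []
    # Player tables: both systems, both formats, both sources (8 tables).
    for system in ("ELO", "Glicko-2"):
        for fmt, source in product(("Premier", "Limited"), SOURCES):
            keys.append(TableKey(system, "Player", fmt, source))
    # Archetype tables: one per system per source (4 tables).
    for source in SOURCES:
        keys.append(TableKey("ELO", "Archetype", None, source))
        keys.append(TableKey("Glicko-2", "Archetype", "Premier", source))
    return keys


def tables_for_source(source: str) -> List[TableKey]:
    """A result may only touch tables that share its source."""
    return [k for k in rating_tables() if k.source == source]
```

Filtering by source before any update is one simple way to guarantee that a Karabast result can never reach a melee.gg table.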

The Win Rate Metrics

| Metric | Stripped Out | Left In | Use |
|---|---|---|---|
| WR (raw win rate) | Nothing | Pilot skill, matchup luck, opponent strength | Easy-to-explain anchor metric |
| MWR (matchup-weighted) | Random matchup luck | Pilot skill | What this deck would win into the current field |
| MαWR (skill-adjusted, meta-weighted) | Pilot skill and matchup noise | Pure deck strength against the field mix | Current power rankings |
| αWR (skill-adjusted) | Pilot skill | Deck strength plus true matchup distribution | Honest tier-list comparisons |

The gap between MWR and MαWR (MWR minus MαWR) is informative. A big positive gap means strong pilots are inflating the deck's results. A big negative gap means the deck is better than its raw results suggest, because weaker-than-average pilots are holding those results down.
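To make the difference between WR and MWR concrete, here is a minimal sketch. It assumes that "matchup-weighted" means averaging per-matchup win rates by each opponent's share of the field; that reading, the function names, and the sample numbers are all assumptions for illustration, not Wayfinder's published formula.

```python
from typing import Dict, Tuple

Record = Tuple[int, int]  # (wins, losses) against one opposing archetype


def raw_win_rate(records: Dict[str, Record]) -> float:
    """Plain wins / games, whatever matchups the deck happened to face."""
    wins = sum(w for w, _ in records.values())
    games = sum(w + l for w, l in records.values())
    return wins / games


def matchup_weighted_win_rate(records: Dict[str, Record],
                              field_share: Dict[str, float]) -> float:
    """Average per-matchup win rates, weighted by field share (assumed definition)."""
    weighted, covered = 0.0, 0.0
    for opponent, (w, l) in records.items():
        share = field_share.get(opponent, 0.0)
        if w + l == 0 or share == 0.0:
            continue  # no data or no field presence for this matchup
        weighted += share * w / (w + l)
        covered += share
    if covered == 0.0:
        raise ValueError("no overlap between records and field shares")
    return weighted / covered


# Made-up numbers: the deck faced A and B equally often,
# but A makes up 80% of the field.
records = {"Deck A": (6, 4), "Deck B": (3, 7)}
field_share = {"Deck A": 0.8, "Deck B": 0.2}
wr = raw_win_rate(records)
mwr = matchup_weighted_win_rate(records, field_share)
```

In this toy case the raw win rate is 45%, while the field-weighted figure is higher because the deck's good matchup is the common one.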

The ELO Details

Wayfinder runs Arpad Elo's original update with a single product-specific knob: a per-event-tier K multiplier so that a Galactic match moves ratings more than a local FNM match. There is no bonus for winning the event itself — top-cut, semi-final, and final matches use the same K as Swiss matches at the same tier.

  • Starting rating 1500. Floor 1000 — no rating drops below that.
  • Expected score: 1 / (1 + 10^((R_B − R_A) / 400))
  • New rating: max(1000, round(R + K × (actual − expected))), where actual is 1 for a win, 0.5 for a draw, and 0 for a loss.
  • Intentional draws (the 0-0-3 "ID into Top 8" line) are excluded entirely. Played draws use standard ELO draw math.
  • Final K = baseK(matchCount) × tierMultiplier. Base K shrinks as the rating stabilises:
| Matches | Base K | Note |
|---|---|---|
| < 20 (player) | 40 | Fresh rating; moves fast to find the right neighbourhood. |
| 20 – 80 (player) | 32 | Settling. Still responsive to streaks. |
| > 80 (player) | 24 | Stable. A single result is a small fraction of the rating. |
| < 50 (archetype) | 32 | Archetypes see far more matches than people. |
| 50 – 200 (archetype) | 24 | |
| > 200 (archetype) | 16 | Archetype meta is slow to revalue; set rotation moves it more than any single game. |
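The update rule and base-K schedule above can be sketched in a few lines. The function names and the subject argument are assumptions made for this example. Note also that Python's round() sends exact .5 ties to the nearest even integer, which may differ from the rounding the production code uses.

```python
RATING_FLOOR = 1000


def base_k(match_count: int, subject: str = "player") -> int:
    """Base K from the schedule above; shrinks as the rating stabilises."""
    if subject == "player":
        if match_count < 20:
            return 40
        if match_count <= 80:
            return 32
        return 24
    # Archetype schedule
    if match_count < 50:
        return 32
    if match_count <= 200:
        return 24
    return 16


def expected_score(r_a: float, r_b: float) -> float:
    """Expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def elo_update(rating: float, opponent: float, actual: float,
               match_count: int, subject: str = "player",
               tier_multiplier: float = 1.0) -> int:
    """actual is 1 for a win, 0.5 for a played draw, 0 for a loss."""
    k = base_k(match_count, subject) * tier_multiplier
    new_rating = round(rating + k * (actual - expected_score(rating, opponent)))
    return max(RATING_FLOOR, new_rating)


# A brand-new player beating an equally rated opponent gains half of K = 40.
print(elo_update(1500, 1500, 1.0, match_count=0))  # 1520
```

Intentional draws would simply never be passed to elo_update, since they are excluded before the rating step.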

Karabast matches use the default 1.0× K. There used to be a 0.25× discount for anonymous opponents and a 0.75× discount for registered Karabast opponents (intended to reflect identity confidence). Those are gone — Karabast is online play, not "less real" play, and the rating system shouldn't try to compensate for who might be on the other side. The separation that matters is structural: Karabast results feed only the Karabast rating tables. They never update the in-person sanctioned ratings.

Event Tier Multipliers

The K-factor on each sanctioned in-person match is scaled by the tier code that swuapi stores on the tournament. Higher-tier events have stronger fields, so a result there carries more information about a player's skill.

| Tier | K × | Why |
|---|---|---|
| GC (Galactic Championship) | 1.50 | Top of the sanctioned ladder: full-rated field, peak stakes. |
| GO (Galactic Open) | 1.50 | Open-format Galactic event; same field-strength as GC. |
| RQ (Regional Qualifier) | 1.50 | Multi-region; field is stronger than a single Planetary/Sector pool. |
| PQ (Planetary Qualifier) | 1.25 | Largest weekend sanctioned tier by volume. |
| SQ (Sector Qualifier) | 1.25 | Sector-level sanctioned qualifier. |
| MA1 / MA2 (Majors) | 1.25 | Large multi-day non-qualifier events on the major calendar. |
| Everything else (local, store showdown, casual) | 1.00 | Default. No tier code → no boost. |

Glicko-2 doesn't use a multiplier — instead it gates inclusion. Only sanctioned tiers on melee.gg (PQ, MA1, SQ, MA2, RQ, GC, GO) count for the sanctioned Glicko-2 ladder; locals and casual side events are excluded so the ladder stays comparable across regions. Karabast results form their own independent Glicko-2 ladders by format and never appear on the sanctioned one.
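The two roles of the tier code, scaling K for ELO and gating inclusion for Glicko-2, could be expressed roughly as follows. The dictionary and function names are hypothetical; only the tier codes and multipliers come from the table above.

```python
from typing import Optional

# K multipliers per sanctioned tier code.
TIER_K_MULTIPLIER = {
    "GC": 1.50, "GO": 1.50, "RQ": 1.50,
    "PQ": 1.25, "SQ": 1.25, "MA1": 1.25, "MA2": 1.25,
}
SANCTIONED_TIERS = frozenset(TIER_K_MULTIPLIER)


def tier_multiplier(tier_code: Optional[str]) -> float:
    """ELO: unknown or missing tier codes fall back to the default 1.0."""
    return TIER_K_MULTIPLIER.get(tier_code, 1.0)


def counts_for_sanctioned_glicko(source: str, tier_code: Optional[str]) -> bool:
    """Glicko-2: only sanctioned melee.gg tiers reach the sanctioned ladder."""
    return source == "melee.gg" and tier_code in SANCTIONED_TIERS


# Final K for a player with 30 rated matches (base K 32) at a Planetary Qualifier.
final_k = 32 * tier_multiplier("PQ")  # 40.0
```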

The Glicko-2 Details

Wayfinder follows Glickman's 2012 spec with a few product-specific choices.

  • Default rating 1500, default RD 250, default volatility 0.06, system constant τ = 0.5.
  • Why RD 250 and not the textbook 350? Glickman's paper uses 350 to mean "the system knows nothing about a new player." Wayfinder's prior is that a player who shows up at a sanctioned event already has non-trivial skill, so 250 is a tighter starting confidence band.
  • Rating periods are per-event, not per-match. All Swiss rounds in a tournament collapse into one rating period. Each top-cut single-elimination round (QF, SF, F) is its own period so the inter-round dependencies in the bracket are respected. This is the batching Glickman's paper actually recommends; per-match Glicko-2 over-shrinks RD and produces overconfident leaderboards.
  • Premier and Limited ratings are separate buckets per player and source.
  • Conservative ranking uses rating − 2 × RD, so an uncertain rating does not outrank a stable one until it has earned its confidence.
  • RD inflates with inactivity — the confidence band widens when a player stops playing, and shrinks again when they return.
  • The convergence loop on volatility (σ) is the Illinois algorithm with tolerance ε = 10⁻⁶.
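For readers who want to see the moving parts, here is a compact sketch of one Glicko-2 rating-period update following the steps in Glickman's paper, including the Illinois iteration for volatility and the RD growth for a period with no games. The function names and the shape of the results argument are assumptions for this example. Under per-event batching, all Swiss results from one tournament would be passed in a single call.

```python
import math
from typing import List, Tuple

SCALE = 173.7178   # conversion factor between Glicko and Glicko-2 scales
TAU = 0.5          # system constant
EPSILON = 1e-6     # convergence tolerance for the volatility iteration


def glicko2_update(rating: float, rd: float, sigma: float,
                   results: List[Tuple[float, float, float]],
                   tau: float = TAU) -> Tuple[float, float, float]:
    """results holds (opponent_rating, opponent_rd, score) for one period."""
    mu, phi = (rating - 1500.0) / SCALE, rd / SCALE
    if not results:
        # Inactive period: rating and volatility stay, the deviation widens.
        return rating, math.sqrt(phi ** 2 + sigma ** 2) * SCALE, sigma

    v_inv, score_sum = 0.0, 0.0
    for opp_rating, opp_rd, score in results:
        mu_j, phi_j = (opp_rating - 1500.0) / SCALE, opp_rd / SCALE
        g = 1.0 / math.sqrt(1.0 + 3.0 * phi_j ** 2 / math.pi ** 2)
        e = 1.0 / (1.0 + math.exp(-g * (mu - mu_j)))
        v_inv += g * g * e * (1.0 - e)
        score_sum += g * (score - e)
    v = 1.0 / v_inv
    delta = v * score_sum

    a = math.log(sigma ** 2)

    def f(x: float) -> float:
        ex = math.exp(x)
        num = ex * (delta ** 2 - phi ** 2 - v - ex)
        den = 2.0 * (phi ** 2 + v + ex) ** 2
        return num / den - (x - a) / tau ** 2

    # Bracket the root, then iterate with the Illinois variant of regula falsi.
    lo = a
    if delta ** 2 > phi ** 2 + v:
        hi = math.log(delta ** 2 - phi ** 2 - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        hi = a - k * tau
    f_lo, f_hi = f(lo), f(hi)
    while abs(hi - lo) > EPSILON:
        mid = lo + (lo - hi) * f_lo / (f_hi - f_lo)
        f_mid = f(mid)
        if f_mid * f_hi <= 0:
            lo, f_lo = hi, f_hi
        else:
            f_lo /= 2.0
        hi, f_hi = mid, f_mid
    new_sigma = math.exp(lo / 2.0)

    phi_star = math.sqrt(phi ** 2 + new_sigma ** 2)
    new_phi = 1.0 / math.sqrt(1.0 / phi_star ** 2 + 1.0 / v)
    new_mu = mu + new_phi ** 2 * score_sum
    return new_mu * SCALE + 1500.0, new_phi * SCALE, new_sigma


def conservative_score(rating: float, rd: float) -> float:
    """Ranking key: rating minus two deviations."""
    return rating - 2.0 * rd
```

With Wayfinder's defaults, a new player starts at a conservative score of 1500 − 2 × 250 = 1000, so they cannot outrank an established player until their RD has come down.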

Leaderboard Eligibility

Two match-count thresholds and one source rule keep noise off the public boards:

  • Minimum 30 matches on the chosen source + format before a player appears on the ELO or Glicko-2 leaderboard. Fewer than 30 and the rating is still tracked, just hidden from the ladder.
  • Percentile bands and "your rating vs. the field" widgets use a softer floor of 5 matches — enough to place someone roughly, not enough to claim a rank.
  • Karabast online play is rated but never enters the sanctioned Glicko-2 ladder. It has its own ELO leaderboard per source.
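The two match-count floors amount to a small visibility check. The constant and function names below are hypothetical, and match_count is assumed to be counted per source and format as described above.

```python
LEADERBOARD_MIN_MATCHES = 30  # floor for appearing on a leaderboard
PERCENTILE_MIN_MATCHES = 5    # softer floor for percentile widgets


def visibility(match_count: int) -> dict:
    """Which public surfaces a rating may appear on; it is tracked either way."""
    return {
        "leaderboard": match_count >= LEADERBOARD_MIN_MATCHES,
        "percentile": match_count >= PERCENTILE_MIN_MATCHES,
    }


print(visibility(12))  # {'leaderboard': False, 'percentile': True}
```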

The Skill Model

α and γ come from a logistic-regression fit on sanctioned matches in the era.

P(deck D beats deck E | elo_D, elo_E)
  = sigmoid(α_D - α_E + γ_global * (elo_D - elo_E) * κ)

where κ = ln(10) / 400

α is the per-deck intercept, anchored so zero is average for the field. Positive α means the deck wins more than the matchup math says it should at equal pilot ELO.

γ captures how much pilot rating matters. The fit above uses a single γ_global shared by every deck; the per-deck γ comes from the weekly per-deck refit. High values mark decks with a high skill ceiling; low values mark more forgiving decks.
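As a quick sanity check on the formula, the sketch below evaluates the predicted win probability for given parameters. The function name and argument order are assumptions. The fitting step itself is not shown.

```python
import math

KAPPA = math.log(10) / 400  # converts an ELO gap into log-odds


def win_probability(alpha_d: float, alpha_e: float,
                    elo_d: float, elo_e: float,
                    gamma: float = 1.0) -> float:
    """P(deck D beats deck E) given deck intercepts and pilot ELOs."""
    z = alpha_d - alpha_e + gamma * (elo_d - elo_e) * KAPPA
    return 1.0 / (1.0 + math.exp(-z))


# Equal decks, equal pilots: a coin flip.
p_even = win_probability(0.0, 0.0, 1500, 1500)
# Equal decks, gamma = 1, 400-point pilot edge: matches the ELO expectation 10/11.
p_edge = win_probability(0.0, 0.0, 1900, 1500, gamma=1.0)
```

With equal α and γ = 1 the model reduces to the standard ELO expected score, which is why κ = ln(10) / 400 appears in the formula. A γ below 1 flattens the effect of the pilot rating gap, and a γ above 1 sharpens it.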

Time Scales

Every metric is computed at three granularities, stored in snapshots and skill-adjusted ratings.

  • Weekly — one snapshot per Monday, driving sparklines and deltas.
  • Cumulative — era-start through that Monday, driving current dashboards.
  • Per-era — one closed row per era, with the active era updated weekly.

Data Sources & Cross-References

Wayfinder ingests competitive results from melee.gg (in-person tournaments, via the swuapi mirror), Karabast (online play, two distinct streams), and swuapi's archetype catalog (canonical archetype definitions). Each carries a different selection bias, and surfacing them separately matters:

  • melee.gg pool — tournament results, the gold standard for in-person sanctioned play. Bias: only players who entered ranked events.
  • Karabast public-games poll — unauthenticated 20–35s poll of /api/ongoing-games and /api/available-lobbies. Bias: players who chose public lobbies (skews toward streamers, content creators, and people not using quick-match). No outcomes — matchups only.
  • Karabast extension stream — recorded by Wayfinder users running the browser extension. Bias: a self-selected, competitive-leaning cohort. Provides outcomes + handles. Surfaced under the Plugin source in the universal filter bar.
  • Karabast spectator captures — authenticated Socket.IO spectator stream, third-person observation of public lobbies. Bias: same public-lobbies cohort as the poll, but with full game state (winners, handles, replay frames). Capped by Karabast's one-game-at-a-time spectator constraint.

The Karabast Meta surface compares against karameta.pages.dev, a community-built Karabast prevalence tracker, as a third-party benchmark. When our public-games poll and karameta's rollup disagree by more than the expected sampling noise, that's a gap-analysis signal — usually an archetype-catalog or resolver gap on one side, and a useful prompt to investigate.

References

  • Elo, A. The Rating of Chessplayers, Past and Present (1978) — original K-factor system.
  • Glickman, M. Example of the Glicko-2 System (2012) — paper Wayfinder's update step follows step-for-step.
  • karameta.pages.dev — community Karabast prevalence tracker; used as a third-party cross-reference for the online cohort.