
Can the Massey Method Predict the World Cup Winner? A Data‑Driven Ranking Study

This article explains the Massey ranking method, shows how to build the required matrices and vectors from World Cup match data, implements the model in Python, and compares three scoring strategies to forecast whether Argentina or France will win the tournament.

Model Perspective

Massey’s Method in Mathematics

The Massey method, originally proposed by K. Massey in 1997 for ranking American college football teams, uses a least‑squares solution of a linear system to assign strength scores to teams based on match outcomes and cumulative score differences.

Predicting the World Cup with the Massey Method

To forecast the 2022 World Cup final between Argentina and France, the author applies the Massey method to all matches played before the final. The core idea is that the difference between two teams' ratings predicts the margin by which the stronger team wins when they meet.
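In symbols, the idea reads as follows. The notation (ratings r, game matrix X, margin vector y, Massey matrix M, advantage vector p) is introduced here for illustration; it follows the usual textbook presentation of the method rather than anything named in the article's code.

```latex
\begin{aligned}
r_i - r_j &= y_k && \text{(match $k$: team $i$ beats team $j$ by margin $y_k$)} \\
X r &= y && \text{(one such equation per match, stacked)} \\
X^{\mathsf T} X \, r &= X^{\mathsf T} y \;\Longleftrightarrow\; M r = p && \text{(least-squares normal equations)}
\end{aligned}
```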

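The model starts from a sparse game matrix X in which each row corresponds to a match: a 1 appears in the column of the winning team and a −1 in the column of the losing team. The Massey matrix M = XᵀX is square, with one row and one column per team: each diagonal entry equals the number of matches that team has played, and each off‑diagonal entry equals the negative of the number of matches played between the two teams. The right‑hand vector p contains the cumulative score difference (or another defined advantage) for each team.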
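The construction can be sketched on a toy schedule. The four teams and their results below are hypothetical, not World Cup data, and the variable names are chosen only for this illustration.

```python
import numpy as np

# Hypothetical results, not real World Cup data: (winner, loser, goal margin).
matches = [("A", "B", 2), ("B", "C", 1), ("A", "C", 3), ("C", "D", 1)]
teams = sorted({t for w, l, _ in matches for t in (w, l)})
col = {t: j for j, t in enumerate(teams)}

# Game matrix X: one row per match, +1 for the winner, -1 for the loser.
X = np.zeros((len(matches), len(teams)))
y = np.zeros(len(matches))
for k, (winner, loser, margin) in enumerate(matches):
    X[k, col[winner]] = 1.0
    X[k, col[loser]] = -1.0
    y[k] = margin

M = X.T @ X   # Massey matrix: games played on the diagonal, -(games between) elsewhere
p = X.T @ y   # cumulative goal difference per team

print(M)
print(p)
```

For a drawn match the margin is zero, so it does not matter which of the two teams receives the +1 in that row.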

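Because every row of the Massey matrix sums to zero, the matrix is singular and the system has no unique solution. A constraint is therefore added by replacing one row with all ones and setting the corresponding element of the right‑hand vector to zero. This forces the ratings to sum to zero and, provided every team is linked to every other team through a chain of matches, yields a full‑rank matrix.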
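A minimal sketch of this step, using a hypothetical four-team system (the numbers are made up for illustration), shows the rank before and after the row replacement and checks that the ratings sum to zero.

```python
import numpy as np

# Hypothetical 4-team Massey system M r = p (not World Cup data).
M = np.array([[ 2., -1., -1.,  0.],
              [-1.,  2., -1.,  0.],
              [-1., -1.,  3., -1.],
              [ 0.,  0., -1.,  1.]])
p = np.array([5., -1., -3., -1.])

rank_before = np.linalg.matrix_rank(M)   # singular: rows sum to zero

# Replace the last equation with "sum of all ratings = 0".
M1, p1 = M.copy(), p.copy()
M1[-1, :] = 1.0
p1[-1] = 0.0
rank_after = np.linalg.matrix_rank(M1)

r = np.linalg.solve(M1, p1)
print(rank_before, rank_after)
print(r, r.sum())
```

In this example the entries of p sum to zero, so the ratings found from the modified system also satisfy the equation that was dropped.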

Data

The match data from the group stage to the semi‑finals were collected manually and stored in an Excel file. The data include the two teams in each match and their respective scores.

Code

<code># packages
import pandas as pd
import numpy as np

# Columns: 队伍1 / 队伍2 are the two teams, 得分1 / 得分2 are their scores
data = pd.read_excel('data/2022worldCup.xlsx')

# all teams
teams1 = set(data['队伍1'].unique())
teams2 = set(data['队伍2'].unique())
teams = teams1 | teams2
team_list = sorted(teams)  # sorted so the team order is the same on every run

# collect the pair of teams in each match
games_array = data[['队伍1','队伍2']].values
games_list = [{games_array[n,0],games_array[n,1]} for n in range(len(games_array))]

# construct dataframe of competition
df_comp = pd.DataFrame(data=np.zeros((len(team_list),len(team_list))), index=team_list, columns=team_list)
df_comp1 = df_comp.copy()

# fill off-diagonal entries: subtract 1 for every match between two teams
for a, b in games_array:
    df_comp1.loc[a,b] -= 1
    df_comp1.loc[b,a] -= 1

# Fill in the diagonal values
df_comp2 = df_comp1.copy()
for t in df_comp.index:
    df_comp2.loc[t,t] = - (df_comp1.loc[t,:]).sum()

total_number_game_list = [df_comp2.loc[t,t] for t in df_comp.index]

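# strategy 1: goal difference (goals scored minus goals conceded)
# goal_list is assumed to hold each team's net goals, in team_list order
goal_dict = {t:0 for t in team_list}
for i in data.index:
    gd = data.loc[i,'得分1'] - data.loc[i,'得分2']
    goal_dict[data.loc[i,'队伍1']] += gd
    goal_dict[data.loc[i,'队伍2']] -= gd
goal_list = [goal_dict[t] for t in team_list]

# strategy 2: points (3 for a win, 1 for a draw, 0 for a loss)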
win_dict = {t:0 for t in team_list}
for t in team_list:
    for i in data.index:
        if t == data.loc[i,'队伍1']:
            tgoal = data.loc[i,'得分1']
            ta = data.loc[i,'得分2']
            if tgoal > ta:
                win_dict[t] += 3
            elif tgoal == ta:
                win_dict[t] += 1
        if t == data.loc[i,'队伍2']:
            tgoal = data.loc[i,'得分2']
            ta = data.loc[i,'得分1']
            if tgoal > ta:
                win_dict[t] += 3
            elif tgoal == ta:
                win_dict[t] += 1

# strategy 3: win/loss record (+1 for a win, 0 for a draw, -1 for a loss)
win_dict2 = {t:0 for t in team_list}
for t in team_list:
    for i in data.index:
        if t == data.loc[i,'队伍1']:
            tgoal = data.loc[i,'得分1']
            ta = data.loc[i,'得分2']
            if tgoal > ta:
                win_dict2[t] += 1
            elif tgoal < ta:
                win_dict2[t] -= 1
        if t == data.loc[i,'队伍2']:
            tgoal = data.loc[i,'得分2']
            ta = data.loc[i,'得分1']
            if tgoal > ta:
                win_dict2[t] += 1
            elif tgoal < ta:
                win_dict2[t] -= 1

# calculate result
def get_result(plist=goal_list):
    '''
    plist: score list used for calculating cumulated advantage
    '''
    M0 = df_comp2.values
    p0 = np.array(plist).reshape(-1,1)
    M1 = M0.copy()
    M1[-1,:] = 1
    p1 = p0.copy()
    p1[-1] = 0
    r = np.linalg.solve(M1, p1)  # solve M1 r = p1 without forming the inverse
    df_result = pd.DataFrame({'Team':df_comp.index,'Total_Number_Games':total_number_game_list,'Score':r.flatten()})
    return df_result.sort_values(by='Score',ascending=False)
</code>
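The three advantage vectors used in the results can also be computed in one pass with pandas. The sketch below is an alternative illustration, not the article's code: the data are hypothetical and the English column names (team1, team2, score1, score2) are stand-ins for the team and score columns of the Excel file.

```python
import pandas as pd

# Hypothetical results; column names are illustrative stand-ins.
df = pd.DataFrame({
    "team1":  ["A", "B", "A", "C"],
    "team2":  ["B", "C", "C", "D"],
    "score1": [2, 1, 0, 1],
    "score2": [0, 1, 3, 0],
})

# One row per (match, team): gf = goals for, ga = goals against.
side1 = df.rename(columns={"team1": "team", "score1": "gf", "score2": "ga"})
side2 = df.rename(columns={"team2": "team", "score2": "gf", "score1": "ga"})
long = pd.concat([side1[["team", "gf", "ga"]], side2[["team", "gf", "ga"]]],
                 ignore_index=True)

diff = long["gf"] - long["ga"]
long["goal_diff"] = diff                                   # strategy 1
long["points_310"] = diff.gt(0) * 3 + diff.eq(0) * 1       # strategy 2
long["win_loss"] = diff.gt(0).astype(int) - diff.lt(0).astype(int)  # strategy 3

advantage = long.groupby("team")[["goal_diff", "points_310", "win_loss"]].sum()
print(advantage)
```

The goal-difference and win/loss vectors sum to zero across teams by construction, while the 3‑1‑0 points vector does not. For that vector the singular system has no exact solution, so the ratings obtained after the row replacement depend on which row is replaced.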

Results

The computed rankings for each strategy are visualized below.

Strategy 1: Using Goal Difference as Advantage

France ranks slightly higher, but the margin is small and neither team dominates the overall ranking.

Strategy 2: Using Points (3‑1‑0) as Advantage

Again France leads marginally, yet the ranking does not show a clear superiority.

Strategy 3: Using Win/Draw/Loss Points (1, 0, −1) as Advantage

Argentina moves ahead, producing a ranking that aligns better with expectations, though the gap remains narrow.

Based on these analyses, the author predicts a narrow Argentine victory in the final, noting that the result is for entertainment purposes only.

Reference

A. N. Langville and C. D. Meyer, "Who's #1? The Science of Rating and Ranking", Chinese edition, China Machine Press, 2014.

Figure: Strategy 1 result chart
Figure: Strategy 2 result chart
Figure: Strategy 3 result chart
Written by

Model Perspective

Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".
