Call of Duty is a popular video game series published by Activision. Recently, its free-to-play title Warzone has surged in popularity alongside the broader rise of battle royale games. With that rise has come a large, highly competitive community. These players began to notice that the game's lack of a visible ranking system did not match their matchmaking experience, and so they started to wonder whether a hidden skill-based matchmaking system is present in the game.
In general, gamers care about their gaming experience. It's obviously not fun to consistently lose, but it's also not fun to consistently win; finding the balance is important to staying interested in a game over the long term. So, from both a consumer and a developer standpoint, matchmaking is integral to keeping a video game relevant. At the same time, not being able to see your performance in relation to matchmaking removes a significant part of the experience. Other very popular video games such as League of Legends, Apex Legends, Valorant, and even FIFA all show players their ranking and progression through a ranked tier system. In Warzone, no such system exists, which leads players to question the skill levels of their opponents and themselves.
In addition, Activision may have a financial incentive to modify matchmaking, especially at the content-creator level: content creators hold great influence over potential customers, and giving them a good experience might attract more players.
Skill-based matchmaking (SBMM) is a system that matches players in a game based on some ranking. This ranking can be whatever metric the developers choose, but it is often implemented as an Elo score or a custom MMR (matchmaking rating) score tailored to the game's qualities.
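To make the idea concrete, here is a minimal sketch of a classic Elo update. To be clear, Warzone's internal rating (if one exists) is not public; the function, the K-factor, and the 400-point scale below are just the standard chess defaults, used purely for illustration.

```python
# Illustrative only: a generic Elo update, NOT Activision's actual formula
# (which is not public). K = 32 and the 400-point scale are chess defaults.
def elo_update(rating_a, rating_b, score_a, k=32):
    """Return updated ratings after a match; score_a is 1 for a win, 0 for a loss."""
    expected_a = 1 / (1 + 10 ** ((rating_b - rating_a) / 400))
    rating_a_new = rating_a + k * (score_a - expected_a)
    rating_b_new = rating_b + k * ((1 - score_a) - (1 - expected_a))
    return rating_a_new, rating_b_new

# Two evenly rated players: the winner gains exactly what the loser drops.
print(elo_update(1000, 1000, 1))  # (1016.0, 984.0)
```

A matchmaker built on such a rating would then fill lobbies with players whose scores fall in a narrow band around each other, which is exactly the behavior we try to detect below.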
We will be exploring two main questions for this project.
The first question will be "Is there skill-based matchmaking in Call of Duty: Warzone?". This question is a common one among the COD user base, and as members of that user base, we wanted an answer.
The second question is arguably juicier because it puts Activision in the hot seat. We will be trying to answer "Does Activision purposefully lower the matchmaking difficulty of content creators?".
Because the game has a very passionate community, the answers to these questions could inform the feedback fans give to developers. As for content creators, players who are also avid viewers of COD on Twitch or YouTube might rethink their opinions of whoever they watch.
Due to the specific nature of this project, we had to find creative ways to collect data about Call of Duty matchmaking. Luckily, there is a Call of Duty API that lets developers look at past match data and stats for specific players. However, the official API requires accounts to set their visibility to public for their profiles to be viewable. Some third-party APIs aggregate data across games and paint a clearer picture of players' statistics; one of these is WZStats.gg (Warzone Stats), which shows detailed per-match data. By using their website and its API, we are able to gather data from many players to inform our research questions.
Because setting a profile to public visibility is a manual process, better-skilled players are likely to be the ones with visible profiles, which potentially limits our view of the player skill spectrum. If there is skill-based matchmaking, the initial accounts we analyze, and their respective game histories, will be biased toward higher skill tiers, because these players care much more about their stats than lower-skilled players and are more likely to set their profiles to public. However, if there is no skill-based matchmaking, the game lobbies will be entirely random as far as skill is concerned (though other factors such as network latency and geographic location could matter). With random lobbies, we should hypothetically be able to tap into the entire spectrum of players if we analyze enough games.
As mentioned, there is no existing database, so we needed to write code to create one. We started by assembling a list of profiles with public visibility, including some of our own accounts as well as those of pro players and content creators. As mentioned previously, looking at content creators' accounts could provide insight into our second question: whether the matchmaking skill level of content creators' lobbies is lower.
The data collection process ended up being quite complicated. In fact, we spent 6 hours on it and had to restart about seven times. Oops. So what went wrong? Our initial collection process was built on viewing this problem through graph theory. Specifically, we wanted to perform a breadth-first traversal over many accounts to sample the player base as effectively as possible. The plan was to start with the 10 seed accounts described earlier, treat each player as a new node (eliminating those already visited), and continue analyzing each person's previous 20 matches. This analysis, or rather data gathering, included capturing all (up to) 150 players per lobby and the lifetime KDs of each player in the lobby. Where was our logic faulty? Breadth-first search only works under the assumption that there is no skill-based matchmaking. If there is none, the breadth-first search lets us branch away from the current lobby and reach various skill levels quite quickly, with few degrees of separation, if any at all. However, if there is skill-based matchmaking, we would be stuck in the same region of the skill distribution and unable to reach the rest of the player base unless our initial seed accounts were perfectly distributed across the spectrum of players, which they are not.
So, we moved to method two. Method two pivots away from plain breadth-first search toward a randomized graph traversal with a split factor of 2. More specifically, to achieve better sampling, we start at one player and randomly sample 2 players from their most recent match lobby. If we can successfully sample 2 accounts with public data settings, we add them to our queue and repeat the process we ran on the initial player. We aim to repeat this until we reach 11 degrees of separation from the original account. We arbitrarily selected a professional content creator's account, NICKMERCS, as the initial account and let the program run overnight, sampling up to 2^11 accounts. The following diagram helps show the method we designed to collect data.
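The traversal above can be sketched in a few lines. Everything here is a simplified illustration: `get_last_match_lobby` is a hypothetical stub standing in for the real WZStats API calls shown later, and the depth and split parameters mirror the numbers described above.

```python
import random
from collections import deque

def get_last_match_lobby(player):
    # Hypothetical stub: the real version would query the WZStats API and
    # return only the accounts in the lobby with public data settings.
    return [f"{player}-child{i}" for i in range(150)]

def sample_players(seed, depth=11, split=2):
    """Traverse lobbies outward from `seed`, sampling `split` new players per lobby."""
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        player, d = queue.popleft()
        if d >= depth:
            continue  # stop expanding once we hit the target degree of separation
        lobby = [p for p in get_last_match_lobby(player) if p not in seen]
        for nxt in random.sample(lobby, min(split, len(lobby))):
            seen.add(nxt)
            queue.append((nxt, d + 1))
    return seen
```

With split=2 and depth=11 this visits at most 1 + 2 + 4 + ... + 2^11 accounts; the queue-based bookkeeping here is just one convenient way to organize the traversal.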
The following code makes up "wzstats.py", the script we wrote to scrape data from wzstats.gg, the aggregation site mentioned previously in this writeup:
import requests
from bs4 import BeautifulSoup
import time
import csv
import pandas as pd
import unidecode
import matplotlib.pyplot as plt
# Platform identifiers used by the WZStats API:
# Xbox = xbl, Battle.net = battle, Playstation = psn
XBOX = 'xbl'
BNET = 'battle'
PSN = 'psn'
# Gets Kill/Death Ratio of given user.
def getKD(user, platform):
    params = {
        'username': user,
        'platform': platform
    }
    r = requests.get('https://app.wzstats.gg/v2/player', params=params)
    return r.json()['data']['lifetime']['mode']['br']['properties']['kdRatio']

# Gets # of wins for given user.
def getWins(user, platform):
    params = {
        'username': user,
        'platform': platform
    }
    r = requests.get('https://app.wzstats.gg/v2/player', params=params)
    j = r.json()
    return j['data']['lifetime']['mode']['br']['properties']['wins']

# Gets win percentage of given player.
def getWinPct(user, platform):
    params = {
        'username': user,
        'platform': platform
    }
    r = requests.get('https://app.wzstats.gg/v2/player', params=params)
    j = r.json()
    return ((j['data']['lifetime']['mode']['br']['properties']['wins'] / j['data']['lifetime']['mode']['br']['properties']['gamesPlayed']) * 100)

# Gets total kills from given player.
def getKills(user, platform):
    params = {
        'username': user,
        'platform': platform
    }
    r = requests.get('https://app.wzstats.gg/v2/player', params=params)
    j = r.json()
    return j['data']['lifetime']['mode']['br']['properties']['kills']

# Gets average kills per game from given player.
def getKillsPerGame(user, platform):
    params = {
        'username': user,
        'platform': platform
    }
    r = requests.get('https://app.wzstats.gg/v2/player', params=params)
    j = r.json()
    return (j['data']['lifetime']['mode']['br']['properties']['kills'] / j['data']['lifetime']['mode']['br']['properties']['gamesPlayed'])

# Gets the Gulag win percentage of last 100 games (the Gulag is a one vs. one battle a player goes to after death).
def getGulagLast100(user, platform):
    params = {
        'username': user,
        'platform': platform
    }
    r = requests.get('https://app.wzstats.gg/v2/player', params=params)
    j = r.json()
    return j['last100games']['gulagWinPercentage']

# Gets headshot accuracy of last 100 games.
def getHSLast100(user, platform):
    params = {
        'username': user,
        'platform': platform
    }
    r = requests.get('https://app.wzstats.gg/v2/player', params=params)
    j = r.json()
    return (j['last100games']['headshots'] / j['last100games']['kills'])

# Gets KD ratio of last 100 games.
def getKDLast100(user, platform):
    params = {
        'username': user,
        'platform': platform
    }
    r = requests.get('https://app.wzstats.gg/v2/player', params=params)
    j = r.json()
    return (j['last100games']['kills'] / j['last100games']['deaths'])

# Gets list of last 20 matches' IDs.
def getLast20Matches(user, platform):
    params = {
        'username': user,
        'platform': platform
    }
    r = requests.get('https://app.wzstats.gg/v2/player/match', params=params)
    j = r.json()
    matches = []
    for m in j:
        matches.append(m['id'])
    return matches

# Gets average KD of this match.
def getAvgKDMatch(match):
    params = {
        'matchId': match
    }
    r = requests.get('https://app.wzstats.gg/v2/', params=params)
    j = r.json()
    return j['matchStatData']['playerAverage']

# Gets median KD of this match.
def getMedianKDMatch(match):
    params = {
        'matchId': match
    }
    r = requests.get('https://app.wzstats.gg/v2/', params=params)
    j = r.json()
    return j['matchStatData']['playerMedian']

# Gets average KD by team of this match.
def getAvgTeamKDMatch(match):
    params = {
        'matchId': match
    }
    r = requests.get('https://app.wzstats.gg/v2/', params=params)
    j = r.json()
    return j['matchStatData']['teamAverage']

# Gets median KD by team of this match.
def getMedianTeamKDMatch(match):
    params = {
        'matchId': match
    }
    r = requests.get('https://app.wzstats.gg/v2/', params=params)
    j = r.json()
    return j['matchStatData']['teamMedian']
seen_accounts = []

# Gets stats of players from this lobby.
def getLobbyStats(match, unique=False):
    params = {
        'matchId': match
    }
    # Some requests for specific games cannot be converted to JSON
    try:
        r = requests.get('https://app.wzstats.gg/v2/', params=params)
        j = r.json()
    except (requests.RequestException, ValueError):
        if unique:
            return [], []
        return []
    players = j['data']['players']
    unseen_players = []
    results = []
    for p in players:
        stat = p['playerStat']
        if stat is not None:
            account = ""
            platform = ""
            # Do they have a public account? Check if their account is linked to psn, battle, or xbox live
            if stat['battle'] is not None or stat['psn'] is not None or stat['xbl'] is not None:
                if stat['battle'] is not None:
                    account = stat['battle']
                    platform = 'battle'
                elif stat['psn'] is not None:
                    account = stat['psn']
                    platform = 'psn'
                else:
                    account = stat['xbl']
                    platform = 'xbl'
                if unique and (account not in seen_accounts):
                    unseen_players.append({'username': account, 'platform': platform})
                lifetime_kd = stat['lifetime']['mode']['br']['properties']['kdRatio']
                results.append({'id': match, 'username': account, 'platform': platform, 'lifetime_kd': lifetime_kd})
            else:
                print("Account not old enough/no decisive data.")
    if unique:
        return results, unseen_players
    return results
# Starts with list of accounts, and loads n matches.
def loadAccounts(file_name, n):
    df = pd.read_csv(file_name)
    queue = []
    for index, account in df.iterrows():
        if not account["username"] in seen_accounts:
            queue.append(account)
    for i in range(n):
        print(i)
        next_user = queue.pop(0)
        while next_user["username"] in seen_accounts:
            next_user = queue.pop(0)
        seen_accounts.append(next_user['username'])
        matches = getLast20Matches(next_user["username"], next_user["platform"])
        for match in matches:
            match_players, new_players = getLobbyStats(match, unique=True)
            queue += new_players
            # Open data set and write new line
            with open('./dataset/gen_pop_games.csv', 'a', newline='') as csvfile:
                writer = csv.writer(csvfile, delimiter=',', quotechar='\'', quoting=csv.QUOTE_MINIMAL)
                # For each player, get their stats in the game
                for match_player in match_players:
                    # Some player names have foreign characters that csv cannot handle, we need to transliterate to ASCII
                    match_player['username'] = unidecode.unidecode(match_player['username'])
                    # Converting the entire data values from a player and writing it as a singular line
                    # Format is id,username,platform,lifetime_kd
                    line = list(match_player.values())
                    writer.writerow(line)
# Gets top players from WZStats' featured player list.
def getTopPlayers():
    r = requests.get('https://app.wzstats.gg/player/top')
    return r.json()

# The line below is what we used to create our data; it takes roughly 4-6 hours to run, so run at your own risk.
# loadAccounts("./config/accounts.csv", 125)
The following is a snippet of the dataset that we built.
# Games
df = pd.read_csv('./dataset/gen_pop_games.csv')
df.head()
| | id | username | platform | lifetime_kd |
|---|---|---|---|---|
| 0 | 377175583563765943 | NaN | NaN | 0.786765 |
| 1 | 377175583563765943 | NaN | NaN | 0.751709 |
| 2 | 377175583563765943 | NaN | NaN | 0.727273 |
| 3 | 377175583563765943 | NaN | NaN | 0.220339 |
| 4 | 377175583563765943 | NaN | NaN | 0.548535 |
For our second question, we want to look specifically at the games of pro players and content creators. The following code iterates through the top accounts and gets their games. Luckily, we wrote a function above called getTopPlayers() that uses the Warzone Stats API to fetch the top-rated players in the Warzone system. Let's get their data and write it to a CSV for future use.
# Commented so that it doesn't run every time.
# You should only run this code to collect the data; to save time we wrote it to a file called top_players_games.csv, and then read from that for the future.
import utils.wzstats as wz

accounts = wz.getTopPlayers()
games = []
for a in accounts:
    account = ""
    platform = ""
    if a['battle'] is not None:
        account = a['battle']
        platform = 'battle'
    elif a['xbl'] is not None:
        account = a['xbl']
        platform = 'xbl'
    else:
        account = a['psn']
        platform = 'psn'
    print(f'Fetching data for {account}')
    last20 = wz.getLast20Matches(account, platform)
    for game in last20:
        kd = wz.getAvgKDMatch(game)
        games.append((game, kd))
Fetching data for iron#11745
Fetching data for nickmercs#11526
Fetching data for jgod#11463
Fetching data for teepee#1840
Fetching data for truegamedata#1375
Fetching data for zlaner#1345
Fetching data for icemanisaac#1815
Fetching data for aydan#11691
Fetching data for huskerrs#1343
Fetching data for averagejoewo#1438
Fetching data for almond#11120
Fetching data for tommey#21329
Fetching data for superevan#11680
Fetching data for shadedstep#1738
Fetching data for yeet#11987
Fetching data for opmarked#1818
Fetching data for devious#11655
Fetching data for lenun#21968
Fetching data for jaredfps#1454
Fetching data for frozone#11329
Fetching data for warsz#2905
Fetching data for picnick#11353
Fetching data for flexz#2541
Fetching data for rated#21620
Fetching data for destroy#12878
Fetching data for stu#11800
Fetching data for clutchbelk#1526
Fetching data for bbreadman#1673
Fetching data for metaphor#11972
Fetching data for soki#21161
Fetching data for newbz#11184
Fetching data for finessen#1762
Fetching data for jukeyz#2681
Fetching data for ahtract#1570
Fetching data for intechs#1266
Fetching data for nickool#1437
# Open data set and write new line
import csv

# Open csv and write game data of pro players.
with open('./dataset/top_players_games.csv', 'a', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter=',', quotechar='\'', quoting=csv.QUOTE_MINIMAL)
    for tup in games:
        line = [tup[0], tup[1]]
        writer.writerow(line)
# Reads the games and imports them to a dataframe.
prodf = pd.read_csv('./dataset/top_players_games.csv')
prodf.head()
| | id | avg_lobby_lifetime_kd |
|---|---|---|
| 0 | 15379849324100702267 | 1.085038 |
| 1 | 13602511679897282652 | 1.147214 |
| 2 | 6161366157749200230 | 1.065039 |
| 3 | 5821957974244316095 | 1.116253 |
| 4 | 15541493233622371097 | 1.167926 |
From looking at the head of our dataset, we can immediately see that there are lots of NaN values. This might be alarming at first, but there is a good reason for it. As mentioned previously, not all accounts have public data available: a player must go into their account settings and toggle visibility for every console linked to their Activision/Call of Duty account. This means that people who care about their stats are the ones likely to go through the trouble of enabling the setting so that websites like Warzone Stats can display their history in an aggregated manner.
This brings up the question of what type of missing data this is. Our initial hunch is that it is Missing at Random, specifically that the missing username and platform are related to a player's lifetime_kd. We believe this is plausible because players who care about their statistics probably also play the game a lot, and thus might have a higher lifetime_kd. Let's check this theory with some code!
import numpy as np
import statistics as st

kds_of_md = []        # Will store the KDs of players with missing data
kds_not_missing = []  # Will store the KDs of players without missing data

# Check if a player's name or platform is NaN; if so they are private, else public.
# private --> missing data, public --> not missing data
for index, x in df.iterrows():
    # private
    if pd.isna(x.username) or pd.isna(x.platform):
        if pd.isna(x.lifetime_kd):
            print(index)
        else:
            kds_of_md.append(x.lifetime_kd)
    # public
    else:
        kds_not_missing.append(x.lifetime_kd)
We currently have two distinct datasets: entries with missing data, which are private, and entries without missing data, which are public. Since we created this database, we know the missing data results from whether someone has toggled their data privacy settings to public within the Call of Duty account settings. However, we believe there is a deeper correlation: there could be a relationship between a player's skill and whether they have public data. From a practical standpoint, it makes sense that players who play the game a lot care about their stats being publicly available. Playing a lot and being passionate about the game does not necessarily mean someone is good at it, but there is an argument to be made that skill scales with time played, at least to an extent (some people might forever be bad at the game, unfortunately).
So how do we decide whether two datasets are different? We will use a T-test, since we are effectively comparing the mean KDs of players who are public with those of players who are private.
What are the assumptions made by a T-test?
A T-test makes three assumptions: (1) the samples are independent, (2) each sample is approximately normally distributed, and (3) the samples have similar variances. Regarding our data sets, we know they are independent because a player cannot be both public and private. We will also assume the KDs are approximately normally distributed. This assumption is more of a stretch, as the KDs are somewhat skewed, but there is no long, heavy tail and KDs are heavily concentrated within 3 standard deviations of the mean, which lets us be more confident in it. We will write a little code to test this assumption below. The third assumption concerns the variance of each data set, which we can also check below with some simple code.
plt.hist(kds_of_md)
plt.title('Histogram of KDs of Private Players')
plt.xlabel('Lifetime KD')
plt.ylabel('Number of Players')
plt.show()
plt.close()
plt.hist(kds_not_missing)
plt.title('Histogram of KDs of Public Players')
plt.xlabel('Lifetime KD')
plt.ylabel('Number of Players')
plt.show()
plt.close()
avg_missing_kd = st.mean(kds_of_md)
avg_not_missing_kd = st.mean(kds_not_missing)
std_missing_kd = st.stdev(kds_of_md)
std_not_missing_kd = st.stdev(kds_not_missing)
var_missing_kd = st.variance(kds_of_md)
var_not_missing_kd = st.variance(kds_not_missing)
print(f'Average KD of Private Players is: {avg_missing_kd}')
print(f'Average KD of Public Players is: {avg_not_missing_kd}')
print(f'Std Dev of KD of Private Players is: {std_missing_kd}')
print(f'Std Dev of KD of Public Players is: {std_not_missing_kd}')
print(f'Variance of KD of Private Players is: {var_missing_kd}')
print(f'Variance of KD of Public Players is: {var_not_missing_kd}')
Average KD of Private Players is: 0.9586685429154905
Average KD of Public Players is: 1.3580333113942
Std Dev of KD of Private Players is: 0.4725462437744462
Std Dev of KD of Public Players is: 0.5610117987298344
Variance of KD of Private Players is: 0.22329995250533835
Variance of KD of Public Players is: 0.3147342383140842
We can see that the two variances (roughly 0.223 and 0.315) are of similar magnitude, and the standard deviations are close as well, which supports the third assumption. Next we can check whether the Empirical Rule (that 95% of data lies within 2 standard deviations of the mean) holds.
# Check private players first
priv_inside = 0
for kd in kds_of_md:
    if abs(avg_missing_kd - kd) < (2 * std_missing_kd):
        priv_inside += 1
priv_percent = 100 * (priv_inside / len(kds_of_md))
print(f'{priv_percent}% of Private Players data is within 2 standard deviations of the mean.')

# Check public players now
pub_inside = 0
for kd in kds_not_missing:
    if abs(avg_not_missing_kd - kd) < (2 * std_not_missing_kd):
        pub_inside += 1
pub_percent = 100 * (pub_inside / len(kds_not_missing))
print(f'{pub_percent}% of Public Players data is within 2 standard deviations of the mean.')
97.10278036911299% of Private Players data is within 2 standard deviations of the mean.
95.77899961074348% of Public Players data is within 2 standard deviations of the mean.
From the output, we can see that the Empirical Rule holds; in fact, if you test the other components of the rule, over 70% of our data is within 1 standard deviation of the mean and just over 99% is within 3 standard deviations.
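The 2-standard-deviation check above generalizes naturally to any k. As a quick standalone illustration (run here on a synthetic normal sample rather than our actual KD arrays, which live earlier in the notebook), here is a sketch:

```python
import random
import statistics as st

# Generic version of the 2-sigma check: percent of data within k standard
# deviations of the mean.
def pct_within(data, k):
    mu, sigma = st.mean(data), st.stdev(data)
    return 100 * sum(abs(x - mu) < k * sigma for x in data) / len(data)

# Synthetic normal sample standing in for the KD arrays.
random.seed(0)
sample = [random.gauss(1.0, 0.5) for _ in range(10_000)]
for k in (1, 2, 3):
    print(f'{pct_within(sample, k):.1f}% within {k} standard deviation(s)')
# Roughly 68%, 95%, and 99.7% for a normal distribution.
```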
With these three assumptions checked, let's establish some summary statistics before moving on to the T-test.
For additional resources regarding T-tests and other concepts for this section, see the following links:
https://www.investopedia.com/terms/e/empirical-rule.asp
https://www.investopedia.com/terms/t/t-test.asp
https://www.statisticshowto.com/probability-and-statistics/t-test/
https://www.scribbr.com/statistics/t-test/
https://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm
import math

avg_kd = st.mean(df['lifetime_kd'])
std_kd = st.stdev(df['lifetime_kd'])
mdlen = len(kds_of_md)           # Length of private players' KDs array
notmdlen = len(kds_not_missing)  # Length of public players' KDs array

# z-score: difference (subset mean minus overall mean) divided by (stdev divided by sqrt of the number of users).
# Since our data sets are samples of the main set, by the Central Limit Theorem their means can be assumed to be
# approximately normally distributed. We can then use the Square Root of N law to find z-scores with regard to
# the original data set.
missing_zscore = (avg_missing_kd - avg_kd) / (std_kd / math.sqrt(mdlen))
not_missing_zscore = (avg_not_missing_kd - avg_kd) / (std_kd / math.sqrt(notmdlen))

print(f'The average lifetime KD of a player in our dataset is {avg_kd}')
print(f'The average lifetime KD of a player whose data settings are PRIVATE is {avg_missing_kd}, which is {missing_zscore} standard deviations from the mean')
print(f'The average lifetime KD of a player whose data settings are PUBLIC is {avg_not_missing_kd}, which is {not_missing_zscore} standard deviations from the mean')
The average lifetime KD of a player in our dataset is 1.024577792192975
The average lifetime KD of a player whose data settings are PRIVATE is 0.9586685429154905, which is -58.90356957344962 standard deviations from the mean
The average lifetime KD of a player whose data settings are PUBLIC is 1.3580333113942, which is 132.49130766135403 standard deviations from the mean
Here comes the T-test! The final step is to decide which type of T-test we want: the options are paired, two-sample, and one-sample. We will use a two-sample test, since our sets are independent (specifically Welch's version, which does not assume equal sample sizes or exactly equal variances). Since we only need to show that the two sets differ, specifically that their means are different enough, we will run a two-tailed T-test. See the Scribbr link above for clearer explanations of when to use each type of test.
import math
from scipy import stats as scistats

T = (avg_missing_kd - avg_not_missing_kd) / math.sqrt((std_missing_kd**2)/mdlen + (std_not_missing_kd**2)/notmdlen)
print(f'T-distribution value: {T}')

# Welch-Satterthwaite approximation for the degrees of freedom
deg_of_freedom = (((std_not_missing_kd**2)/notmdlen + (std_missing_kd**2)/mdlen)**2) / \
    ((((std_not_missing_kd**2)/notmdlen)**2)/(notmdlen-1) + (((std_missing_kd**2)/mdlen)**2)/(mdlen-1))
print(f'Degrees of Freedom: {deg_of_freedom}')

# Two-tailed p-value (T is negative, so the lower-tail CDF is doubled)
p_value = 2 * scistats.t.cdf(T, deg_of_freedom, loc=0, scale=1)
print(f'Probability value: {p_value}')
T-distribution value: -135.15833033968178
Degrees of Freedom: 53232.472835692744
Probability value: 0.0
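As a sanity check on the manual Welch computation, scipy ships the same test as `scistats.ttest_ind` with `equal_var=False`. The snippet below runs both on synthetic samples (the means and standard deviations roughly mirror our two groups, but the sample sizes are invented for illustration) and shows the two statistics agree:

```python
import math
import random
import statistics as st
from scipy import stats as scistats

# Synthetic stand-ins for kds_of_md / kds_not_missing; sample sizes are made up.
random.seed(1)
private_kds = [random.gauss(0.96, 0.47) for _ in range(5000)]
public_kds = [random.gauss(1.36, 0.56) for _ in range(5000)]

# Manual Welch T statistic, same formula as in the cell above
t_manual = (st.mean(private_kds) - st.mean(public_kds)) / math.sqrt(
    st.variance(private_kds) / len(private_kds) + st.variance(public_kds) / len(public_kds))

# scipy's built-in Welch test computes the identical statistic and p-value
t_scipy, p_scipy = scistats.ttest_ind(private_kds, public_kds, equal_var=False)
print(t_manual, t_scipy, p_scipy)
```

Using the library version on the real arrays would be a quick way to confirm the hand-rolled numbers above.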
From analyzing the average KDs of users with and without missing data, we can see that the averages are very different. But just how significant is the difference? We performed a two-sided T-test on our two sets of data and found a T score of -135.158 with 53232.47 degrees of freedom for the difference between the means. This corresponds to a p-value indistinguishable from zero, which is extremely strong evidence that the average KDs of players with public and private data settings differ. In conclusion, the missing data is Missing at Random (MAR).
plt.hist(df['lifetime_kd'])
plt.title('Histogram of lifetime KDs for sample')
plt.xlabel('Lifetime KD')
plt.ylabel('Number of Players')
A basic histogram doesn't actually show us much. We can see that most KDs fall between 0 and 5, but there are definitely some significantly higher outliers. Practically, this could mean that in the 2048 games we analyzed, we came upon players who are either insanely good (better than any professional ever) or players who are hacking. Sustaining an incredibly high lifetime KD is very difficult because of the randomness of games; even some of the best players have KDs around the 6-10 mark.
plt.hist(df.loc[df['lifetime_kd'] >= 6]['lifetime_kd'])
plt.title('Histogram of lifetime KDs (>= 6) for sample')
plt.xlabel('Lifetime KD')
plt.ylabel('Number of Players')
When we look at players whose KDs are over 6, we still see a high concentration in the 6-10 KD range. However, we also see a second cluster of players at the 20+ KD mark. Because of the very low frequency of these players, we can treat them as outliers. A single such player can shift a lobby's average KD by up to roughly 35/150 ≈ 0.23, which is a large skew, but because we only see 15 of them across 2048 games, the overall effect is negligible.
over10df = df.loc[df['lifetime_kd'] >= 10]
plt.hist(over10df['lifetime_kd'])
plt.title('Histogram of Player\'s Lifetime KDs (over 10 only)')
plt.xlabel('Lifetime KD')
plt.ylabel('Number of Players')
print(f'Number of players with over a 10 lifetime kd: {len(over10df)}')

count = 1
for index, x in over10df.iterrows():
    if pd.isna(x.username) or pd.isna(x.platform):
        print(f'{count} There is a PRIVATE user with an over 10 lifetime kd in match {x.id}')
    else:
        print(f'{count} There is a PUBLIC user with an over 10 lifetime kd in match {x.id}')
    count += 1
Number of players with over a 10 lifetime kd: 15
1 There is a PRIVATE user with an over 10 lifetime kd in match 15312029568279681814
2 There is a PRIVATE user with an over 10 lifetime kd in match 4673868877075267553
3 There is a PRIVATE user with an over 10 lifetime kd in match 6573573280802323490
4 There is a PRIVATE user with an over 10 lifetime kd in match 526039276483101153
5 There is a PRIVATE user with an over 10 lifetime kd in match 15204777370665016310
6 There is a PRIVATE user with an over 10 lifetime kd in match 14455535145959848534
7 There is a PRIVATE user with an over 10 lifetime kd in match 17803083994292567640
8 There is a PRIVATE user with an over 10 lifetime kd in match 10527309497664288941
9 There is a PRIVATE user with an over 10 lifetime kd in match 3287462810166929906
10 There is a PRIVATE user with an over 10 lifetime kd in match 1031614374727749052
11 There is a PRIVATE user with an over 10 lifetime kd in match 7282482404698762792
12 There is a PRIVATE user with an over 10 lifetime kd in match 4082030459319508710
13 There is a PRIVATE user with an over 10 lifetime kd in match 10887373374029236929
14 There is a PRIVATE user with an over 10 lifetime kd in match 11571758250919960897
15 There is a PRIVATE user with an over 10 lifetime kd in match 17784864676679737013
We can see that there are 15 users with a lifetime KD over 10, and all 15 are private accounts. This is quite interesting because we previously showed that players with higher KDs tend to set their data to public. These players appear to be the best 15 by far, yet their privacy settings are still private. While it is entirely possible that some extremely good players do not care enough to change the setting, it is unlikely that ALL 15 follow the same logic.
Moreover, in practicality, having a KD that high as a LIFETIME KD, not a SINGLE-GAME KD, is extremely unlikely. It would require players to drop 10+, 20+, or even 30+ kills per game consistently while limiting their deaths to 1 or 2. Note that for KD calculations, to avoid divide-by-zero errors, COD counts 0 deaths as 1 death (i.e., 35 kills and 0 deaths is treated as 35 kills and 1 death). Because even professional players and content creators cannot sustain this level of success, we can reasonably assume these 15 players are one of two things: either brand-new accounts with one or two insanely good games (so their lifetime KD and single-game KDs are very similar), or hackers who rack up lots of kills over many games using cheats like aimbot.

Note also that we said brand-new account, not brand-new player. Players can have numerous accounts, and it is possible that a professional player, a content creator, or anyone else created a new account and had one or more very good first games. But the odds of this are low for another reason: in Call of Duty: Warzone, one way to improve your chances of winning is to level up your guns to unlock attachments and other perks (https://www.dexerto.com/call-of-duty/best-warzone-loadouts-class-setup-1342383/). This can only be done by playing for an extensive amount of time, usually requiring tens if not hundreds of games to complete the necessary challenges. A very good player on a brand-new account still faces this hurdle and is severely disadvantaged when entering a game for the first time. Thus, from a practicality standpoint, it is more likely that these users are hackers or bots rather than real, legitimate players.
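For clarity, the zero-death convention described above amounts to a one-line formula. This is our sketch of the rule as stated; we have not seen Activision's exact implementation:

```python
# Lifetime KD with the zero-death convention: 0 deaths counted as 1 death
# to avoid dividing by zero (our reading of the rule, not Activision's code).
def kd_ratio(kills, deaths):
    return kills / max(deaths, 1)

print(kd_ratio(35, 0))    # 35.0 -- a 35-kill deathless game
print(kd_ratio(120, 80))  # 1.5
```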
Now that we have talked about the outliers, let's look back towards the more realistic end of the player spectrum.
plt.hist(df.loc[df['lifetime_kd'] <= 6]['lifetime_kd'])
plt.title('Histogram of Players (Lifetime KD <= 6)')
plt.xlabel('Lifetime KD')
plt.ylabel('Number of Players')
We are curious about the frequency of specific KD values: are some KDs very common while others are rare? The histogram shows that KDs cluster around 1, but let's see if a more granular view gives a clearer picture.
kds = dict()
# Define a subset of our original dataframe using this syntax
realdf = df.loc[df['lifetime_kd'] <= 6]
for index, x in realdf.iterrows():
    if x.lifetime_kd in kds:
        kds[x.lifetime_kd] += 1
    else:
        kds[x.lifetime_kd] = 1
plt.scatter(kds.keys(), kds.values())
plt.title('Number of people with a specific KD')
plt.xlabel('Lifetime KD')
plt.ylabel('Number of People')
We can see two major outliers here, where roughly 250 people share the exact same KD, which is very unlikely to happen at this scale. Let's take a look at what those KDs are.
for k, v in kds.items():
    if v > 100:
        print(f'There are {v} users with KD = {k}')
There are 250 users with KD = 1.0
There are 108 users with KD = 0.3333333333333333
There are 252 users with KD = 0.5
There are 109 users with KD = 0.0
There are 148 users with KD = 0.6666666666666666
We looked for all KD values shared by over 100 people. Very surprisingly, the most common KDs by far are the simple fractions {0, 1/3, 1/2, 2/3, 1}. In practice, this is probably a sign of players who have played only their first few games, since these ratios are the only ones achievable with a handful of kills and deaths. Another possibility is some rounding on the API side for players with limited data. Interestingly, if you expand the query to KD values shared by over 75 people, the additional KDs that appear are also simple fractions.
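One way to see why these particular values dominate: with only a handful of total kills and deaths, very few KD ratios are even possible. A quick enumeration with a hypothetical helper, using the same 0-deaths-counted-as-1 convention:

```python
from fractions import Fraction

def reachable_kds(max_events: int) -> set:
    """Every exact KD value reachable with at most `max_events` total
    kills + deaths (a proxy for a brand-new account's tiny sample)."""
    kds = set()
    for kills in range(max_events + 1):
        for deaths in range(max_events + 1 - kills):
            kds.add(Fraction(kills, max(deaths, 1)))
    return kds

small = reachable_kds(5)
# 0, 1/3, 1/2, 2/3, and 1 all appear within just five total events,
# which is why they dominate among players with very few games.
print(sorted(k for k in small if k <= 1))
```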
# Mean KD per game
gameIDs = df.id.unique()
lf_kd_by_game = {}
for game in gameIDs:
    game_df = df.loc[df['id'] == game]
    lf_kd = game_df.lifetime_kd.mean()
    lf_kd_by_game[game] = lf_kd
plt.hist(lf_kd_by_game.values())
plt.title('Avg Lifetime KD Per Game')
plt.xlabel('Avg Lifetime KD')
plt.ylabel('Number of Lobbies')
basicmean = st.mean(lf_kd_by_game.values())
basicstdev = st.stdev(lf_kd_by_game.values())
print(f'Average General Player Lobby Lifetime KD: {basicmean}')
print(f'Standard Deviation of General Player Lobby Lifetime KD: {basicstdev}')
Average General Player Lobby Lifetime KD: 1.012254568012402
Standard Deviation of General Player Lobby Lifetime KD: 0.16818777931045018
Let's also do a bit of analysis specifically on the lobbies of high-skill players' games.
plt.hist(prodf['avg_lobby_lifetime_kd'])
plt.xlabel('Lifetime KD')
plt.ylabel('Number of Games')
plt.title('Average Lifetime KD of Pro Players\' Lobbies')
promean = st.mean(prodf['avg_lobby_lifetime_kd'])
prostdev = st.stdev(prodf['avg_lobby_lifetime_kd'])
print(f'Average pro player lobby lifetime kd: {promean}')
print(f'Standard Deviation of pro player lobby lifetime kd: {prostdev}')
Average pro player lobby lifetime kd: 1.1463587618335986
Standard Deviation of pro player lobby lifetime kd: 0.14089705204015202
If we assume that COD Warzone does not matchmake lobbies on the basis of skill, or KD, we should expect that lobbies are a random sampling of individuals from the distribution of KD. We will call this the null model. We will analyze the correspondence of actual observations to this theoretical model to determine the fitness of this null model.
# By convolving the pmf of two random variables, we obtain the pmf of their sum.
# To find the pmf of a sample of size n from the distribution of KDs, the theoretical distribution of
# lobbies in our null model we auto-correlate (convolution of a distribution with itself) the pmf of the distribution
# of KDs in our dataset n-1 times to find the pmf of the sum of n observances, and then rescale the support
# of the resultant random variable by 1/n. The result is the theoretical pmf of the average KD of a lobby of size n
# under our null model.
# Takes the pmf of a single KD observation, and convolves it with itself n-1 times.
# The result is the pmf of the sum of n independent KD observations
def n_auto_correlation(pmf, n):
    result = pmf
    for _ in range(n-1):
        result = np.convolve(result, pmf)
    return result
# This uses the previous function to find the pmf of the sum of n independent KD observations
# And then rescales the support of the distribution to 1/n times its original support
# giving the pmf of the mean of a random sample of n KDs
def pmf_lobby_avg(densities, n):
    bins = densities[1]
    for _ in range(n-1):
        bins = np.concatenate((bins[:-2], densities[1] + bins[-2]))
    bins = bins / n
    pmf = n_auto_correlation(densities[0], n)
    return (pmf, bins)
# The cdf of an average lobby KD value x is the probability that an observation is less than or equal to x
# This will be important later.
def pmf_cdf_lobby_avg(densities, n):
    pmf, bins = pmf_lobby_avg(densities, n)
    cdf = np.cumsum(pmf)
    return pmf, cdf, bins
To read more about convolutions and the statistical theory behind them, see the following links:
https://www.statlect.com/glossary/convolutions
https://www.youtube.com/watch?v=P3ZcJEy84ps
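Before trusting `n_auto_correlation`, it helps to check the identity it relies on with toy numbers (not our dataset): convolving a pmf with itself gives the pmf of the sum of two independent draws, so the mean of the sum must be exactly twice the single-draw mean.

```python
import numpy as np

# Toy pmf on support {0.0, 0.5, 1.0, 1.5, 2.0}
support = np.arange(5) * 0.5
pmf = np.array([0.1, 0.2, 0.4, 0.2, 0.1])

# pmf of X + Y for independent X, Y each distributed by `pmf`
pmf_sum = np.convolve(pmf, pmf)
support_sum = np.arange(len(pmf_sum)) * 0.5  # support runs 0.0 .. 4.0

mean_single = (support * pmf).sum()
mean_pair = (support_sum * pmf_sum).sum()
print(mean_single, mean_pair)  # the pair mean is twice the single mean
```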
# Verifying that our correlation produces the expected distribution. First we simulate the underlying distribution
# of KDs by producing a histogram and normalizing by the total number of observations (~250,000) to produce a
# discrete pmf.
import seaborn as sb
q1, q3 = np.percentile(df['lifetime_kd'], [25, 75])
fmdc = 2 * (q3 - q1) / (len(df['lifetime_kd'])**(1/3))
n_bins = int(10 / fmdc)
print(fmdc)
print(n_bins)
densities = np.histogram(df['lifetime_kd'], bins = n_bins, range = (0, 10))
densities = (densities[0]/len(df['lifetime_kd']), densities[1])
cmf = (np.cumsum(densities[0]), densities[1])
plt.plot(densities[1][:-1], densities[0])
plt.title('PMF of Lifetime KD Ratios')
plt.ylabel('Probability Mass')
plt.xlabel('Lifetime KD')
plt.show()
plt.plot(cmf[1][:-1], cmf[0])
plt.title('CDF of Lifetime KD Ratios')
plt.ylabel('Probability KD <= x ')
plt.xlabel('Lifetime KD')
plt.show()
# Verifying that autocorrelation creates a pmf centered on the center of the underlying KD pmf.
densities_150 = pmf_lobby_avg(densities, 150)
plt.plot(densities_150[1][:-1], densities_150[0])
plt.title('PMF of Average Lobby Lifetime KD')
plt.xlabel('Average Lifetime KD')
plt.ylabel('Probability Mass')
plt.show()
cmf = (np.cumsum(densities_150[0]), densities_150[1])
plt.plot(cmf[1][:-1], cmf[0])
plt.title('CDF of Lobby Average Lifetime KD Ratios')
plt.ylabel('Probability AVG KD <= x ')
plt.xlabel('Average Lifetime KD')
plt.show()
0.0169973515457261
588
# Mean KD per game
gameIDs = df.id.unique()
lf_kd_by_game = {}
for game in gameIDs:
    game_df = df.loc[df['id'] == game]
    lf_kd = game_df.lifetime_kd.mean()
    lf_kd_by_game[game] = (lf_kd, len(game_df))
lobby_kds = list(map(lambda x: x[0], lf_kd_by_game.values()))
q1, q3 = np.percentile(lobby_kds, [25, 75])
fmdc = 2 * (q3 - q1) / (len(lobby_kds)**(1/3))
n_bins = int(10 / fmdc)
lobby_kd_densities = np.histogram(lobby_kds, bins = n_bins, range = (0, 10))
lobby_kd_densities = (lobby_kd_densities[0]/len(lobby_kds), lobby_kd_densities[1])
plt.plot(lobby_kd_densities[1][:-1], lobby_kd_densities[0])
plt.show()
Given our null model, if the ~2000 actual observations of lobby average lifetime KDs fall under this model, they should be uniformly distributed over the percentiles (the CDF at each observed value) given by the model. To test this, we calculate the percentile of each observation by computing the theoretical distribution for each lobby given its size, and then using that distribution to find the CDF at the actual lobby average KD observed.
# Note: this cell takes roughly 2 hours to run, so we load precomputed results below.
# sum_likelihoods = {}
# for game in lf_kd_by_game.keys():
#     game_avg = lf_kd_by_game[game][0]
#     n_players = lf_kd_by_game[game][1]
#     pmf, cdf, bins = pmf_cdf_lobby_avg(densities, n_players)
#     index = np.argmax(bins > game_avg) - 1
#     sum_likelihood = cdf[index]
#     sum_likelihoods[game] = (game_avg, n_players, sum_likelihood)
sumlike = pd.read_csv('./dataset/match_sum_likelihood.csv')
sumlike.head()
|   | gameid | avg_kd | lobby_size | sum_likelihood |
|---|---|---|---|---|
| 0 | 377175583563765943 | 1.091592 | 136 | 0.953550 |
| 1 | 1867167520034580454 | 1.054870 | 151 | 0.834745 |
| 2 | 13063882378137964531 | 1.475153 | 34 | 0.997929 |
| 3 | 8648205854600033849 | 1.219164 | 146 | 0.991239 |
| 4 | 3981297559543080699 | 1.130915 | 148 | 0.987540 |
Again, if the null model is correct, and lobbies are a simple random sample of the population of players, we should see uniformly distributed percentiles according to this model over the sample of lobby average KDs. The idea behind this method is that if F(x) is the cdf of random variable X, then F(X) is a uniformly distributed random variable over [0,1]. This means that for a sample X1, ... , Xn, F(X1), ... , F(Xn) should be a sample of the uniform distribution over [0,1].
https://math.stackexchange.com/questions/868400/showing-that-y-has-a-uniform-distribution-if-y-fx-where-f-is-the-cdf-of-contin
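The linked result (the probability integral transform) can be checked empirically in a few lines. This is an illustrative simulation with a known distribution, not our lobby data:

```python
import numpy as np
from scipy import stats

# Probability integral transform: if X has continuous CDF F, then F(X) is
# uniform on [0, 1]. Simulate X from a known normal, apply its true CDF,
# and check that the transformed values behave like Uniform[0, 1].
rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=0.2, size=10_000)  # stand-in for lobby average KDs
u = stats.norm.cdf(x, loc=1.0, scale=0.2)        # F(X)

print(u.min() >= 0 and u.max() <= 1)  # True: always within [0, 1]
print(abs(u.mean() - 0.5) < 0.02)     # True: close to the Uniform[0, 1] mean
```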
plt.hist(sumlike['sum_likelihood'])
plt.xlabel('Cumulative Probability of Occurrence')
plt.ylabel('Number of Games')
These percentiles are clearly not uniformly distributed, so it is extremely unlikely that lobbies are generated by randomly sampling the population of players, i.e. by ignoring skill.
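This visual judgment can also be backed by a formal uniformity test, for example a one-sample Kolmogorov-Smirnov test. The sketch below uses a synthetic stand-in shaped roughly like our histogram (percentiles piled up near 1) rather than the real `sum_likelihood` column:

```python
import numpy as np
from scipy import stats

# One-sample KS test of observed percentiles against Uniform[0, 1].
# Synthetic stand-in for sumlike['sum_likelihood']: values skewed toward 1,
# roughly like our histogram of ~2000 lobbies.
rng = np.random.default_rng(1)
percentiles = rng.beta(a=8, b=1.5, size=2000)

stat, p_value = stats.kstest(percentiles, 'uniform')
print(stat, p_value)
# A tiny p-value rejects uniformity: lobbies would then be unlikely to be
# simple random samples of the player base.
```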
So, from the above statistical analysis, we can see that, in regard to question 1, there is definitely some factor influencing the creation of game lobbies. This does not prove skill-based matchmaking; rather, it shows that since lobbies are not random samples of the player base, something else is going on, and that something could well be skill-based matchmaking.
Now, let's do some EDA regarding question two.
Let's refresh our memories as to how the Top Players' DataFrame and the General Population DataFrame are formatted.
prodf.head()
|   | id | avg_lobby_lifetime_kd |
|---|---|---|
| 0 | 15379849324100702267 | 1.085038 |
| 1 | 13602511679897282652 | 1.147214 |
| 2 | 6161366157749200230 | 1.065039 |
| 3 | 5821957974244316095 | 1.116253 |
| 4 | 15541493233622371097 | 1.167926 |
genlobdf = pd.DataFrame(index = lf_kd_by_game.keys(), data = lf_kd_by_game.values())
genlobdf.columns = ['avg_lobby_lifetime_kd', 'num_players']
genlobdf.head()
|   | avg_lobby_lifetime_kd | num_players |
|---|---|---|
| 377175583563765943 | 1.091592 | 136 |
| 1867167520034580454 | 1.054870 | 151 |
| 13063882378137964531 | 1.475153 | 34 |
| 8648205854600033849 | 1.219164 | 146 |
| 3981297559543080699 | 1.130915 | 148 |
Similar to the procedure we performed for the missing data section, we again want to test whether we can confidently determine that the Top Players' data differs from the General Population's data. Specifically, we want to see whether Top Players' games differ from the General Population's games, and whether this can be determined with confidence (i.e. the probability that the difference arose by chance is very low).
So we will resort to another T-test. Let's go over the assumptions again.
By the construction of our sample, these data sets are independent.
avg_pro_kd = st.mean(prodf['avg_lobby_lifetime_kd'])
avg_gen_kd = st.mean(genlobdf['avg_lobby_lifetime_kd'])
std_pro_kd = st.stdev(prodf['avg_lobby_lifetime_kd'])
std_gen_kd = st.stdev(genlobdf['avg_lobby_lifetime_kd'])
var_pro_kd = st.variance(prodf['avg_lobby_lifetime_kd'])
var_gen_kd = st.variance(genlobdf['avg_lobby_lifetime_kd'])
avg_kd = st.mean(df['lifetime_kd'])
std_kd = st.stdev(df['lifetime_kd'])
prolen = len(prodf['avg_lobby_lifetime_kd'])
genlen = len(genlobdf['avg_lobby_lifetime_kd'])
print(f'Average KD of Top Player\'s Games is: {avg_pro_kd}')
print(f'Average KD of General Player\'s Games is: {avg_gen_kd}')
print(f'Std Dev of KD of Top Player\'s Games is: {std_pro_kd}')
print(f'Std Dev of KD of General Player\'s is: {std_gen_kd}')
print(f'Variance of KD of Top Player\'s Games is: {var_pro_kd}')
print(f'Variance of KD of General Player\'s is: {var_gen_kd}')
Average KD of Top Player's Games is: 1.1463587618335986
Average KD of General Player's Games is: 1.012254568012402
Std Dev of KD of Top Player's Games is: 0.14089705204015202
Std Dev of KD of General Player's is: 0.16818777931045018
Variance of KD of Top Player's Games is: 0.019851979273605307
Variance of KD of General Player's is: 0.028287129109380693
pro_inside = 0
for kd in prodf['avg_lobby_lifetime_kd']:
    if abs(avg_pro_kd - kd) < (2 * std_pro_kd):
        pro_inside += 1
pro_percent = 100 * (pro_inside / len(prodf['avg_lobby_lifetime_kd']))
print(f'{pro_percent}% of Top Player\'s data is within 2 standard deviations of the mean.')
gen_inside = 0
for kd in genlobdf['avg_lobby_lifetime_kd']:
    if abs(avg_gen_kd - kd) < (2 * std_gen_kd):
        gen_inside += 1
gen_percent = 100 * (gen_inside / len(genlobdf['avg_lobby_lifetime_kd']))
print(f'{gen_percent}% of General Player\'s data is within 2 standard deviations of the mean.')
96.759941089838% of Top Player's data is within 2 standard deviations of the mean.
96.9208211143695% of General Player's data is within 2 standard deviations of the mean.
import math  # the statistics module has no sqrt, so we use math.sqrt

# z-score: (group mean - overall mean) divided by the standard error (stdev / sqrt(n))
pro_zscore = (avg_pro_kd - avg_kd) / (std_kd / math.sqrt(prolen))
gen_zscore = (avg_gen_kd - avg_kd) / (std_kd / math.sqrt(genlen))
print(f'The average lifetime KD of a player in our dataset is {avg_kd}')
print(f'The average lifetime KD of a player in a TOP PLAYER\'S game is {avg_pro_kd}, which is {pro_zscore} standard deviations from the mean')
print(f'The average lifetime KD of a player in a GEN POP game is {avg_gen_kd}, which is {gen_zscore} standard deviations from the mean')
The average lifetime KD of a player in our dataset is 1.024577792192975
The average lifetime KD of a player in a TOP PLAYER'S game is 1.1463587618335986, which is 8.795018263209576 standard deviations from the mean
The average lifetime KD of a player in a GEN POP game is 1.012254568012402, which is -1.092407307924713 standard deviations from the mean
We can see now at face value that the means and standard deviations are very different when we compare top players' games to the general population's games. Let's now conduct a T-test again to check if we can confidently say that there is enough difference between the two data sets to make a confident conclusion.
import math  # the statistics module has no sqrt, so we use math.sqrt
T = (avg_pro_kd - avg_gen_kd) / math.sqrt((std_pro_kd**2)/prolen + (std_gen_kd**2)/genlen)
print(f'T-distribution value: {T}')
deg_of_freedom = (((std_gen_kd**2)/genlen + (std_pro_kd**2)/prolen)**2)/((((std_gen_kd**2)/genlen)**2)/(genlen-1)+(((std_pro_kd**2)/prolen)**2)/(prolen-1))
print(f'Degrees of Freedom: {deg_of_freedom}')
# For a two-sided test, the p-value is 2 * P(t <= -|T|). Using +T here would give
# 2 * cdf(T), which approaches 2 (i.e. "200%") and is not a valid probability.
p_value = 2*scistats.t.cdf(-1*T, deg_of_freedom, loc=0, scale=1)
print(f'Probability value: {p_value}')
T-distribution value: 25.144680744771893
Degrees of Freedom: 3224.004288059503
Probability value: 1.4902752992955875e-127
Based on these T-test results, we can say with high confidence that the Top Players' games and the General Population's games are significantly different. We will address this more directly in the conclusion.
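As a cross-check on the manual Welch's t-test above, scipy can compute the same statistic directly from summary statistics. Note that the lobby counts below (1358 top-player lobbies, 2046 general lobbies) are inferred from the printed results, so treat them as assumptions:

```python
from scipy import stats

# Welch's (unequal-variance) two-sample t-test from summary statistics,
# using the means and standard deviations printed above.
res = stats.ttest_ind_from_stats(
    mean1=1.1463587618335986, std1=0.14089705204015202, nobs1=1358,  # top players' lobbies (count assumed)
    mean2=1.012254568012402,  std2=0.16818777931045018, nobs2=2046,  # general lobbies (count assumed)
    equal_var=False,  # Welch's test, matching the manual formula
)
print(res.statistic, res.pvalue)  # ~25.14 and ~1.5e-127, matching the manual computation
```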
So, now that we have done all of this analysis and exploration, what are the results?
We started the project off, being inspired by two main questions:
1) Is there skill based matchmaking in Call of Duty: Warzone lobbies?
2) Do Top Players and Content Creators face easier lobbies than the general population, helping Activision boost sales?
This created the two null hypotheses:
1) Lobbies are random samples of the general population
2) Top Players and Content Creators are placed into easier lobbies (as measured by average lobby lifetime KD)
In order to address these questions, we first looked at our datasets and analyzed them for missing data. We learned about the difference between public and private data settings on a player's account and how that creates missing data in our dataset. We were curious about the relationship of KD to the missing data, so after looking at some basic statistics about the two populations of players, we performed a T-test to see if we could confidently say that the two sets were independent and different.
Once we confirmed the type of missing data, we moved on to Exploratory Data Analysis. This mostly involved creating histograms for various frames of our dataset. We noticed a really long right tail in the data and decided to look more closely, since most of the data was around 1 but we had KDs over 20. We discussed what this means practically and discovered the (most likely) presence of data from hackers in our dataset. While we could have discarded this data, hackers are unfortunately a part of the game and can make real lobbies much harder; they are also not impossible to defeat, as very good players can outplay hackers frequently. So we decided to leave them in the dataset, since they are still valid data points, albeit ones that ruin the game, which does not materially affect our analysis.
We then looked at the presumably non-hacker data with KDs <= 6. We created a scatter plot of the number of people at each unique KD and found that the most common KDs were simple fractions. We explained that this is likely caused by newer players with fewer games, and then took a brief graphical look at the Top Players' games before moving on to hypothesis testing.
In order to examine the null hypothesis that lobbies were not matchmade using skill-based criteria, we first constructed a null model for lobby construction that did not consider skill, one that randomly sampled from the entire population of KDs. We calculated the percentile of each observed lobby within this model and found that these percentiles were not uniformly distributed over 0%-100%, so it is likely that lobbies were made using criteria that consider skill. In essence, the model poorly predicted the observed lobbies, which indicates that some factor influences matchmaking, a factor that may or may not directly involve skill. Thus, we rejected the null hypothesis, since we showed that lobbies are not actually random samples of the player base.
After concluding the hypothesis testing for question 1, we moved to question 2 to see if we could develop some insights there. Similar to the missing data T-test procedure, we performed another T-test, this time looking to conclude that the mean lobby lifetime KDs of Top Players' games are in fact different from those of General Players' games. Our T-test showed that this is the case. Comparing the averages of the Top Players' games and the General Players' games, we saw that Top Players actually face significantly more difficult opponents, since their lobbies had a much higher average lifetime KD. Thus, we rejected the null hypothesis, since we can now confidently determine that Top Players actually face harder lobbies than the general population. Interestingly, this also further supports our conclusion to question 1, since we have shown that being higher skilled yields higher skilled lobbies as well. While this does not allow us to conclude that the matchmaking factor is actually skill, it certainly supports that argument and could serve as a basis for further testing.
Important Note
Throughout these conclusions, you will notice that we never directly claim that skill-based matchmaking exists, but rather argue that some form of matchmaking bias does exist. The reason is that, while KD is the main metric we analyzed, we cannot actually prove that KD alone explains the distribution and differences between lobbies. For example, the time of day when a game occurs could heavily influence the spectrum of players that is online. More specifically, a lobby created during the late-night/early-morning hours is likely to contain better players staying up late to grind the game, whereas a lobby created during the early evening is likely to be a better sampling of the player base, since more casual players are online then. Alternatively, weeknight games might have significantly fewer casual players, who may only play on weekends. Unfortunately, we do not have datetime data for the lobbies we sampled, so analyzing this aspect was beyond the scope of our analysis. However, it remains grounds for further work, because we have certainly shown that some factor heavily influences the creation of game lobbies despite Activision Blizzard claiming that there is no skill-based matchmaking.
Thank you!
Written by Alex Coppens, Luke Stuart, and Sandeep Ramesh