Mobile Legends analysis

Preliminaries

Mobile Legends: Bang Bang is a MOBA (multiplayer online battle arena) game for Android and iOS mobile devices, developed by Shanghai Moonton. The game was originally released in Asia on 11 June 2016.

In the game there are 2 opposing teams of 5 players each. Before the game starts, every player chooses the character (champion) they will play with. At the moment there are around 90 champions to choose from. Each character is unique and may be used for different purposes depending on their skills and abilities, which is how one can distinguish mages, assassins, fighters, supports, tanks and marksmen. The main task is to destroy the enemy's defence towers and ultimately conquer their base.

The game has been attracting more and more attention in Poland for a couple of years now. The graph below presents interest over time in the Google queries “Mobile Legends” and “MOBA” in Poland. As you can see, around 2017 there was a huge increase in the popularity of Mobile Legends, while interest in MOBA games in general has been gradually falling over the past 5 years. However, in March and April 2020 they experienced a rapid renaissance, which we can probably attribute, at least in part, to the lockdown caused by the COVID-19 outbreak.

As stated above, there are several types of characters in the game, so we will try to check whether these types are reflected in the data or whether the characters are labelled artificially. To do so, we are going to apply Principal Component Analysis (PCA) to reduce dimensionality and then a hierarchical algorithm to cluster the characters. Although the labels are already known, this analysis may be helpful for:

choosing an alternative character if the one you want to play with is unavailable
discovering the underlying forces that generate the skills
keeping the characters' skill sets balanced

Data

First we have to collect the data. As there is no official site with data on the champions' characteristics, we will scrape it from the Mobile Legends wiki on Fandom. Let's check the robots.txt file before we start.

library(robotstxt)  # provides paths_allowed()
paths_allowed("https://mobile-legends.fandom.com/wiki/Mobile_Legends_Wiki")

The command above returns TRUE, which means we are allowed to scrape the data. For that purpose we will combine the rvest package with the SelectorGadget tool. The whole scraping/wrangling code is provided in a separate Rmd file in the GitHub repository.
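`paths_allowed()` (from the R robotstxt package) answers the question by reading the site's robots.txt rules. To illustrate what such a check involves, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The robots.txt content below is a made-up example, not Fandom's actual file, and the helper name `allowed` is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt rules for illustration only; the real file may differ.
ROBOTS_TXT = """\
User-agent: *
Disallow: /api.php
"""

def allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Mark the rules as loaded: can_fetch() refuses everything until this timestamp
    # is set (recent Python versions already set it inside parse()).
    parser.modified()
    return parser.can_fetch(user_agent, url)

print(allowed(ROBOTS_TXT, "https://mobile-legends.fandom.com/wiki/Mobile_Legends_Wiki"))  # True
print(allowed(ROBOTS_TXT, "https://mobile-legends.fandom.com/api.php"))                   # False
```

Paths that match no rule are allowed by default, which is why the wiki page passes while `/api.php` is blocked.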

Let's have a look at our data. The table below lists all characters in alphabetical order.

One important remark: although the table below presents all characters playable at the moment, we will treat it as a sample, since the set of characters is constantly being extended with new ones. In that sense statistical inference can be justified.

Id	Hero	Movement speed	Magic Resistance	Mana	HP Regen Rate	Physical Attack	Armor	Health points	Attack speed	Mana regen rate	Role
1	Akai	260	10	422	42	115	24	2769	0.8500	12	Tank
2	Aldous	260	10	405	45	129	22	2718	0.8360	18	Fighter
3	Alice	240	10	493	36	114	21	2573	0.8000	18	Mage
4	Alpha	260	10	453	39	121	20	2646	0.9160	16	Fighter
5	Alucard	260	10	0	39	123	21	2821	0.9000	0	Fighter
6	Angela	240	10	515	34	115	15	2421	0.7920	18	Support
7	Argus	260	10	0	40	124	21	2628	0.9160	0	Fighter
8	Atlas	240	10	440	42	135	0	2819	0.7860	15	Tank
9	Aurora	245	10	500	34	105	17	2441	0.8000	23	Mage
10	Badang	255	10	0	40	119	23	2708	0.9080	0	Fighter
11	Balmond	260	10	0	47	119	25	2836	0.8500	0	Fighter
12	Bane	260	10	433	42	117	23	2659	0.8500	12	Fighter
13	Belerick	250	10	450	62	110	20	3109	0.8100	12	Tank
14	Bruno	240	10	439	30	128	17	2522	0.8500	15	Marksman
15	Carmilla	197	13	477	45	118	10	2378	NA	34	Support
16	Cecilion	265	15	574	32	165	23	2425	NA	26	Mage
17	Change	240	10	505	34	115	16	2301	0.8080	21	Mage
18	Chou	260	10	0	39	121	23	2708	0.8840	0	Fighter
19	Claude	240	10	450	40	137	14	2370	0.8260	15	Marksman
20	Clint	240	10	450	36	115	20	2530	0.8420	15	Marksman
21	Cyclops	240	10	500	38	112	18	2521	0.8000	20	Mage
22	Diggie	250	10	490	36	115	18	2351	0.8000	20	Support
23	Dyrroth	266	10	0	41	117	19	2758	0.9160	0	Fighter
24	Esmeralda	240	10	502	36	114	21	2573	0.8000	20	Mage
25	Estes	240	10	545	36	115	13	2161	0.8000	18	Support
26	Eudora	250	10	468	38	112	19	2524	0.8000	16	Mage
27	Fanny	265	10	0	33	126	17	2526	0.8940	0	Assassin
28	Faramis	260	10	0	39	222	36	3700	0.9400	19	Support
29	Franco	260	10	440	46	116	25	2709	0.8260	10	Tank
30	Freya	260	10	462	49	109	22	2801	0.8760	14	Fighter
31	Gatotkaca	260	10	440	42	120	20	2709	0.8180	12	Tank
32	Gord	240	10	570	32	110	13	2478	0.7720	25	Mage
33	Granger	240	10	0	27	125	15	2490	0.8180	0	Marksman
34	Grock	260	10	430	42	135	21	2819	0.8100	42	Tank
35	Guinevere	260	10	0	39	126	18	2528	0.9160	0	Fighter
36	Gusion	260	10	469	39	119	18	2578	0.8920	16	Assassin
37	Hanabi	245	10	390	30	115	17	2510	0.8500	15	Marksman
38	Hanzo	260	10	0	35	118	17	2594	0.8700	0	Assassin
39	Harith	240	10	490	36	114	19	2701	0.8400	18	Mage
40	Harley	240	10	490	36	114	19	2501	0.8480	18	Mage
41	Hayabusa	260	10	0	37	117	17	2629	0.8540	0	Assassin
42	Helcurt	255	10	440	35	121	17	2559	0.8700	16	Assassin
43	Hilda	260	10	0	42	123	24	2709	0.8420	0	Fighter
44	Hylos	260	10	430	42	105	17	3309	0.8360	12	Tank
45	Irithel	260	10	438	35	110	17	2540	0.8260	15	Marksman
46	Jawhead	255	10	430	39	119	24	2778	0.9000	16	Fighter
47	Johnson	255	10	0	42	112	27	2809	0.8260	12	Tank
48	Kadita	240	10	495	34	105	18	2491	0.8000	18	Mage
49	Kagura	240	10	519	35	118	19	2556	0.8160	21	Mage
50	Kaja	270	10	400	52	120	30	2609	0.8420	12	Fighter
51	Karina	260	10	431	39	121	20	2633	0.9000	16	Assassin
52	Karrie	240	10	440	40	112	17	2498	0.8396	15	Marksman
53	Khufra	255	0	460	47	117	19	2709	0.7860	15	Tank
54	Kimmy	245	10	100	40	104	22	2450	0.8260	0	Marksman
55	Lancelot	260	10	450	35	124	16	2549	0.8700	16	Assassin
56	Lapu-Lapu	260	10	0	35	119	21	2628	0.9000	16	Fighter
57	Layla	240	10	424	27	130	15	2500	0.8500	14	Marksman
58	Leomord	240	10	0	35	128	25	2738	0.8440	0	Fighter
59	Lesley	240	10	0	36	115	14	2490	0.8260	0	Marksman
60	Ling	260	10	0	39	119	18	2578	0.8920	0	Assassin
61	Lolita	260	10	480	48	115	27	2679	0.7860	12	Tank
62	Lunox	240	10	540	34	115	15	2521	0.8080	23	Mage
63	Lylia	245	10	500	34	113	17	2501	0.8080	19	Mage
64	Martis	260	10	405	35	128	25	2738	0.8680	16	Fighter
65	Masha	312	10	101	19	NA	12	1948	NA	0	Fighter
66	Minotaur	260	10	0	44	123	23	2759	0.7300	0	Tank
67	Minsitthar	260	10	380	37	121	23	2698	0.8520	16	Fighter
68	Miya	240	10	445	30	129	17	2524	0.8500	15	Marksman
69	Moskov	240	10	420	32	125	16	2530	0.8140	15	Marksman
70	Nana	250	10	510	34	115	17	2501	0.8640	18	Mage
71	Natalia	260	10	486	35	121	18	2589	0.9020	16	Assassin
72	Odette	240	10	495	34	105	18	2491	0.8000	23	Mage
73	Pharsa	240	10	490	34	109	15	2421	0.7900	18	Mage
74	Rafaela	245	10	545	36	117	15	2441	0.7920	23	Support
75	Roger	240	10	450	36	128	22	2730	0.8420	15	Fighter
76	Ruby	260	10	430	30	114	23	2859	0.8580	14	Fighter
77	Saber	260	10	443	35	118	17	2599	0.8700	16	Assassin
78	Selena	240	10	490	34	110	15	2401	0.8040	18	Assassin
79	Silvanna	255	10	430	39	126	22	2828	0.9160	16	Fighter
80	Sun	260	10	400	41	114	23	2758	0.9160	16	Fighter
81	Terizla	255	10	0	54	129	19	2728	0.8200	0	Fighter
82	Thamuz	255	10	0	39	123	24	2758	0.8600	0	Fighter
83	Tigreal	260	10	450	42	112	25	2890	0.8260	12	Tank
84	Uranus	260	10	455	32	115	20	2689	0.8340	12	Tank
85	Vale	250	10	490	34	115	15	2401	0.8000	21	Mage
86	Valir	245	10	495	34	105	18	2516	0.8000	18	Mage
87	Vexana	245	10	490	38	112	17	2421	0.8000	20	Mage
88	Wanwan	240	0	424	27	100	0	2540	0.8260	14	Marksman
89	X.Borg	260	10	0	39	117	25	1138	0.8680	0	Fighter
90	Yi_Sun-Shin	240	10	438	36	110	18	2520	0.8000	15	Marksman
91	Zhask	240	10	490	34	107	15	2401	0.8000	20	Mage
92	Zilong	265	10	405	35	123	25	2689	0.9640	16	Fighter

One important thing to look at is the variability of the champions' characteristics: if there is no variability at all, or very little, even the most sophisticated analysis would be pointless. Below you can see the coefficient of variation (in %) of each characteristic.

Movement speed	Magic Resistance	Mana	HP Regen Rate	Physical Attack	Armor	Health points	Attack speed	Mana regen rate
5.04	16.19	59.37	15.99	11.76	26.35	10.18	5.18	63.65

The variability of mana and mana regeneration exceeds 60%, which is exactly what we were looking for! Health points regeneration and armor vary by about 16% and 26% respectively: not that much, but still fine. Although the coefficient for magic resistance is also around 16%, the value of that attribute is the same (10) for almost every character, so we will drop that variable from further analysis anyway. In any case we will have to check the data for outliers, as some of those values might be inflated by just one or two observations. The remaining variables vary only a little (between about 5% and 12%).
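For reference, the coefficient of variation is the standard deviation divided by the mean, times 100. A minimal Python sketch, assuming the sample standard deviation (denominator n - 1, as in R's `sd()`) and dropping missing values the way `na.rm = TRUE` would. It is applied to only five rows of the table, so it does not reproduce the figures above.

```python
import statistics

def coefficient_of_variation(values):
    """Coefficient of variation in %, using the sample standard deviation (n - 1).

    Missing values (None, the table's NA) are dropped first.
    """
    observed = [v for v in values if v is not None]
    return 100 * statistics.stdev(observed) / statistics.mean(observed)

# Movement speed of the first five heroes in the table (Akai, Aldous, Alice, Alpha, Alucard)
print(round(coefficient_of_variation([260, 260, 240, 260, 260]), 2))   # 3.49
```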

Now let's look at some possible relationships and check the distributions of the variables.

We can see some relationships, e.g. mana vs. mana regeneration and health points vs. armor, among others; we will investigate them soon.

The density functions of movement speed, mana and mana regeneration seem to be bimodal. This suggests there are some subpopulations in our “sample”, so it is reasonable to proceed with cluster analysis.

There are some outliers: note a champion whose health point regeneration (62) is far above the rest of the sample. We can also see a champion whose health points and physical attack are extremely high. For the sake of the analysis we will remove both of them from our “sample” so that they do not distort the clustering results. Let's find out who they are.

Id	Hero	Movement speed	Magic Resistance	Mana	HP Regen Rate	Physical Attack	Armor	Health points	Attack speed	Mana regen rate	Role
13	Belerick	250	10	450	62	110	20	3109	0.81	12	Tank
28	Faramis	260	10	0	39	222	36	3700	0.94	19	Support
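The two outliers above were spotted by eye in the plots. One possible numeric cross-check (an assumption, not the procedure used in this analysis) is a z-score rule that flags any value more than three standard deviations from the mean. To keep it short, the Python sketch below applies it only to the first 20 rows of the HP regen column.

```python
import statistics

# HP regen rate of the first 20 heroes in the table
hp_regen = {
    "Akai": 42, "Aldous": 45, "Alice": 36, "Alpha": 39, "Alucard": 39,
    "Angela": 34, "Argus": 40, "Atlas": 42, "Aurora": 34, "Badang": 40,
    "Balmond": 47, "Bane": 42, "Belerick": 62, "Bruno": 30, "Carmilla": 45,
    "Cecilion": 32, "Change": 34, "Chou": 39, "Claude": 40, "Clint": 36,
}

def flag_outliers(values, threshold=3.0):
    """Return the names whose z-score (distance from the mean in standard deviations) exceeds threshold."""
    data = list(values.values())
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    return [name for name, v in values.items() if abs(v - mean) / sd > threshold]

print(flag_outliers(hp_regen))   # ['Belerick']
```

A single extreme value also inflates the standard deviation it is measured against, so in small samples a robust alternative (e.g. the median and interquartile range) can be preferable.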

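The last thing we can do is check the correlations and their significance, just to get a general view, bearing in mind that Simpson's paradox might be present (a correlation computed on the pooled sample can differ from, or even reverse, the correlations within the individual character types).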
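Correlation significance is usually assessed with the t statistic \(t = r\sqrt{(n-2)/(1-r^2)}\), which is what R's `cor.test()` uses for Pearson's r. The Python sketch below computes r and t for mana vs. mana regeneration on the first ten heroes only, as an illustrative subset. Turning t into a p-value would need a t distribution with n - 2 degrees of freedom, which the standard library does not provide. The sketch also shows why Simpson's paradox is worth keeping in mind: much of the correlation comes from heroes without mana (0 in both columns), and it drops noticeably once they are excluded.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for H0: no correlation; compare with a t distribution with n - 2 df."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Mana and mana regen rate of the first ten heroes in the table
mana = [422, 405, 493, 453, 0, 515, 0, 440, 500, 0]
mana_regen = [12, 18, 18, 16, 0, 18, 0, 15, 23, 0]
r = pearson_r(mana, mana_regen)
t = t_statistic(r, len(mana))

# The same correlation without the mana-less heroes (0 in both columns)
with_mana = [(a, b) for a, b in zip(mana, mana_regen) if a > 0]
r_with_mana = pearson_r([a for a, _ in with_mana], [b for _, b in with_mana])
print(round(r, 2), round(t, 2), round(r_with_mana, 2))
```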

Principal Component Analysis

Dealing with high-dimensional data can be challenging and can lead to several problems. However, in most cases it is possible to reduce the number of dimensions while retaining most of the information stored in the data. One of the most widely used methods that allow us to do so is Principal Component Analysis. What we basically want to do is project our data matrix onto a reduced feature space using a linear transformation, while retaining as much information as possible. And that is exactly what PCA does!

What does the math look like?

Let's assume we have a data matrix \(X\) consisting of \(n\) variables and \(m\) observations, so \(X \in \mathbb{R}^{n \times m}\), and that every variable (row of \(X\)) has been centered to zero mean. We can think of the variance-covariance matrix as a representation of the information in our data: \[\Sigma = \frac{1}{m}XX^T, \text{ where } \Sigma \in \mathbb{R}^{n\times n}.\] We want to find a linear transformation \(U\) that projects \(X\) onto fewer dimensions: \[Z = U^TX, \text{ where } Z \in \mathbb{R}^{d \times m}, U \in \mathbb{R}^{n \times d} \text{ and } d<n.\] At the same time we want to minimize the information loss, i.e. keep as much variance as possible. The variance-covariance matrix of the transformed data is \[\frac{1}{m}ZZ^T = \frac{1}{m}U^TXX^TU = U^T\Sigma U,\] so searching for our transformation becomes the following optimisation problem, in which we maximize the total variance (the trace): \[\max_U \operatorname{tr}\left(U^T\Sigma U\right), \text{ subject to } U^TU = I.\] Note that we have to add the normalization condition to make sure all of the vectors have unit length, because otherwise the objective would have no upper bound and the problem could not be solved. One possible way to solve such problems is the method of Lagrange multipliers.

For a single direction \(u\) (a column of \(U\)) we construct the Lagrangian as follows: \[F(u,\lambda)=u^T\Sigma u + \lambda(1-u^Tu).\]

Then we differentiate it with respect to \(u\) and set the derivative equal to 0, as it has to vanish at an extremum: \[\frac{\partial F}{\partial u}=2\Sigma u-2\lambda u = 0.\]

We can rewrite it as \[\Sigma u=\lambda u.\] Multiplying both sides by \(u^T\) and using \(u^Tu=1\) gives \(u^T\Sigma u = \lambda\), so the variance captured along \(u\) equals the corresponding eigenvalue.

The equation \(\Sigma u=\lambda u\) is exactly the eigenvector equation, so what we do is diagonalize the variance-covariance matrix (eigendecomposition) to obtain its eigenvectors and corresponding eigenvalues. Because \(\Sigma\) is symmetric, its eigenvectors can be chosen orthonormal, so with \(V\) holding all \(n\) eigenvectors as columns \[\Sigma = V \Lambda V^{T}.\]

Then we sort the eigenvector-eigenvalue pairs by eigenvalue in descending order and choose the top \(d\) pairs. The resulting set of \(d\) eigenvectors retains the following share of the total variance: \[\frac{\sum_{i=1}^{d} \lambda_i}{\sum_{i=1}^{n} \lambda_i}.\]

The transformation \(U\) that we are looking for is composed of the selected eigenvectors: \[U = [u_1, \dots, u_d].\]
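The derivation translates almost line by line into code. Below is a minimal NumPy sketch, assuming the variables are stored in rows as in the derivation. The function name `pca` and the random input are hypothetical, and the original analysis was done in R, where one would normally call a library routine instead.

```python
import numpy as np

def pca(X, d):
    """PCA following the eigendecomposition derivation.

    X: array of shape (n_variables, m_observations); d: number of components to keep.
    Returns U (n x d), all eigenvalues in descending order, Z = U^T X (d x m),
    and the retained share of variance.
    """
    Xc = X - X.mean(axis=1, keepdims=True)      # center every variable (row)
    m = Xc.shape[1]
    sigma = Xc @ Xc.T / m                       # Sigma = (1/m) X X^T, shape (n, n)
    eigvals, eigvecs = np.linalg.eigh(sigma)    # symmetric matrix: real eigenvalues, ascending
    order = np.argsort(eigvals)[::-1]           # sort the pairs in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    U = eigvecs[:, :d]                          # U = [u_1, ..., u_d]
    Z = U.T @ Xc                                # projected data, shape (d, m)
    share = eigvals[:d].sum() / eigvals.sum()   # top-d eigenvalues over all eigenvalues
    return U, eigvals, Z, share

# Hypothetical input: 8 random "skills" for 90 "heroes", standardized row by row,
# which turns Sigma into the correlation matrix (PCA on standardized data).
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 90))
X = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
U, eigvals, Z, share = pca(X, 3)
print(f"retained share of variance: {share:.1%}")
```

A useful sanity check is that the covariance matrix of `Z` comes out diagonal, with the top eigenvalues on the diagonal, exactly as the derivation predicts.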

Back to our analysis

First, let's determine the relevant principal components using standardized data. As the scree plot would not tell us much, we choose the number of components based on the eigenvalue rule of thumb (the Kaiser criterion). Each of the top three components has an eigenvalue greater than 1, i.e. it “contains more information than a single variable” (every standardized variable has variance 1).

Component	Eigenvalue	Variance (%)	Cumulative variance (%)
PC1	3.19	39.84	39.84
PC2	1.40	17.54	57.38
PC3	1.02	12.73	70.11
PC4	0.83	10.38	80.49
PC5	0.74	9.21	89.70
PC6	0.45	5.62	95.32
PC7	0.25	3.13	98.45
PC8	0.12	1.55	100.00

As you can see in the table above, the top three components account for about 70.1% of the data variability. That is not as much as we expected, but it's fine: we kept only 3 of the 8 components and still retained over 70% of the variance.
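The eigenvalue rule of thumb and the variance percentages can be recomputed from the eigenvalues in the table. Because those eigenvalues are rounded to two decimals, the recomputed percentages differ slightly from the table (about 39.88% instead of 39.84% for PC1, for example).

```python
from itertools import accumulate

# Eigenvalues from the table above (rounded to two decimals)
eigenvalues = [3.19, 1.40, 1.02, 0.83, 0.74, 0.45, 0.25, 0.12]

total = sum(eigenvalues)                      # about 8: one unit per standardized variable
shares = [100 * ev / total for ev in eigenvalues]
cumulative = list(accumulate(shares))

# Kaiser rule of thumb: keep the components whose eigenvalue exceeds 1
kept = [i + 1 for i, ev in enumerate(eigenvalues) if ev > 1]
print(kept)                                   # [1, 2, 3]
print(f"{cumulative[len(kept) - 1]:.1f}%")    # 70.1%
```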

Now let's have a look at the PCA loadings so we can think of some reasonable interpretations.

Original loadings
Variable	PC1	PC2	PC3	PC4	PC5	PC6	PC7	PC8
MV_SPD	-0.45	0.22	-0.06	0.28	0	0.44	-0.67	0.1
MANA	0.41	0.47	-0.23	0.17	0.1	0.11	0.15	0.7
HP_RGN	-0.33	0.43	0.36	-0.31	0.27	0.43	0.46	-0.11
P_ATK	-0.23	-0.14	-0.62	-0.49	0.53	-0.1	-0.09	0.1
P_DFN	-0.36	0.33	0.2	0.32	0.32	-0.72	0	0.05
HP	-0.22	0.43	-0.25	-0.4	-0.7	-0.24	0	0
ATK_SPD	-0.36	-0.14	-0.49	0.53	-0.14	0.14	0.54	-0.1
MANA_RGN	0.4	0.46	-0.3	0.14	0.19	0.02	-0.11	-0.68

Rotated loadings
Variable	PC1	PC2	PC3
MV_SPD	-0.71	-0.27	-0.4
MANA	0.14	0.91	0.22
HP_RGN	-0.83	-0.15	0.12
P_ATK	-0.01	-0.09	-0.76
P_DFN	-0.75	-0.2	-0.07
HP	-0.57	0.23	-0.34
ATK_SPD	-0.18	-0.28	-0.75
MANA_RGN	0.15	0.92	0.14

As it is hard to interpret the principal components in this form, we may want to rotate the whole system to obtain more intuitive interpretations. For that purpose we take the top 3 principal components and apply the orthogonal VARIMAX rotation. The data themselves do not change; we only rotate the orthogonal basis spanned by the retained components. VARIMAX chooses the rotation that maximizes the variance of the squared loadings within each component, so that each variable loads strongly on as few components as possible. Because the rotation is orthogonal, the total variance explained by the three components stays the same. The two tables above show the loadings before and after rotation, respectively.
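VARIMAX itself is a short iterative algorithm. The Python sketch below uses the common SVD-based formulation with Kaiser row normalization (R's `stats::varimax()` also normalizes by default). As input it takes the first three columns of the original loadings table scaled by the square root of their eigenvalues. That scaling is an assumption, the table entries are rounded, and rotated columns may come out in a different order or with flipped signs, so the result need not match the rotated table exactly.

```python
import numpy as np

def varimax(loadings, normalize=True, max_iter=100, tol=1e-6):
    """Orthogonal VARIMAX rotation of a (variables x components) loading matrix.

    Returns the rotated loadings and the orthogonal rotation matrix R.
    """
    A = np.asarray(loadings, dtype=float)
    if normalize:
        # Kaiser normalization: rotate rows scaled to unit length, scale back afterwards
        h = np.sqrt((A ** 2).sum(axis=1, keepdims=True))
        A = A / h
    p, k = A.shape
    R = np.eye(k)
    crit = 0.0
    for _ in range(max_iter):
        L = A @ R
        # Gradient of the varimax criterion; its SVD gives the best orthogonal update
        G = A.T @ (L ** 3 - L @ np.diag((L ** 2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(G)
        R = u @ vt
        new_crit = s.sum()
        if new_crit < crit * (1 + tol):   # stop once the criterion stops improving
            break
        crit = new_crit
    rotated = A @ R
    if normalize:
        rotated = rotated * h
    return rotated, R

# First three columns of the "Original loadings" table, scaled by sqrt(eigenvalue)
eigenvectors = np.array([
    [-0.45,  0.22, -0.06],   # MV_SPD
    [ 0.41,  0.47, -0.23],   # MANA
    [-0.33,  0.43,  0.36],   # HP_RGN
    [-0.23, -0.14, -0.62],   # P_ATK
    [-0.36,  0.33,  0.20],   # P_DFN
    [-0.22,  0.43, -0.25],   # HP
    [-0.36, -0.14, -0.49],   # ATK_SPD
    [ 0.40,  0.46, -0.30],   # MANA_RGN
])
A0 = eigenvectors * np.sqrt([3.19, 1.40, 1.02])
rotated, R = varimax(A0)
print(np.round(rotated, 2))
```

Each variable's communality (its row sum of squared loadings) is unchanged by the rotation, which is another way of saying that the explained variance stays the same.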

PC2 has by far the most obvious interpretation. Its loadings on mana and mana regeneration are very high (0.91 and 0.92), so the underlying force here is magic.

PC1 has relatively high loadings (in absolute value) on health points regeneration (-0.83) and armor (-0.75), so we would lean towards some kind of durability interpretation.

PC3 is driven mostly by physical attack (-0.76) and attack speed (-0.75), so one can interpret it as readiness to fight.

Now let's check whether it is possible to distinguish any clusters just by looking at the rotated score plots. We can see two clusters, or maybe three… The next section will help us understand what is going on.

Clusters

Now that we have reduced the dimensionality, we can proceed to the most exciting part of our analysis: identifying clusters. In this part we will apply a hierarchical algorithm to see whether there is an underlying data structure to discover.

First, let's compute the distances between observations. For that purpose we will use the Minkowski metric of order 2, i.e. the Euclidean distance. Below you can see a visualization of the resulting distance matrix.
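For two observations \(a\) and \(b\), the Minkowski distance of order \(p\) is \(\left(\sum_i |a_i - b_i|^p\right)^{1/p}\). Setting \(p = 2\) gives the Euclidean distance and \(p = 1\) the Manhattan distance. Here is a small Python sketch of the distance matrix; the function names are illustrative.

```python
def minkowski(a, b, p=2):
    """Minkowski distance of order p (p=2: Euclidean, p=1: Manhattan)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def distance_matrix(points, p=2):
    """Symmetric matrix of pairwise distances with zeros on the diagonal."""
    return [[minkowski(a, b, p) for b in points] for a in points]

print(minkowski((0, 0), (3, 4)))        # 5.0
print(minkowski((0, 0), (3, 4), p=1))   # 7.0
```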

There are, in general, two types of hierarchical clustering methods. In the first one, every data point starts as a separate cluster. We then repeatedly merge the closest clusters, based on the chosen distance metric and linkage criterion (see below), until all data points are in one cluster. Because of that, we often refer to this approach as agglomerative or bottom-up clustering. The second type works the opposite way: at the beginning all of the data points are in one big cluster, which we keep splitting until every cluster consists of just one data point. This approach is called divisive or top-down clustering.

Another issue is how to measure the distance between two clusters, i.e. which data points we calculate the distance between (the linkage criterion). Here, too, there are several possibilities, and the most widely used ones are single linkage, complete linkage, average linkage and Ward's method. In single linkage, the distance between two sets is the distance between their closest pair of points. Complete linkage works the opposite way: it uses the distance between the farthest pair of points. Average linkage is a compromise between these two approaches and uses the mean distance over all pairs of points. Ward's method is a bit different from the previous ones. At each step it merges the pair of clusters whose union causes the smallest increase in the within-cluster sum of squares, so the resulting clusters have minimal within-group variance.
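To make the linkage criteria concrete, here is a naive Python sketch of agglomerative clustering with interchangeable linkage functions. It is written for readability, not speed, and its cost grows very quickly with the number of points. Real implementations such as R's `hclust()` use more efficient distance-update formulas. Stopping the merges once k clusters remain corresponds to cutting the dendrogram into k clusters.

```python
import math
from itertools import combinations

def single(A, B):
    """Single linkage: distance between the closest pair of points."""
    return min(math.dist(a, b) for a in A for b in B)

def complete(A, B):
    """Complete linkage: distance between the farthest pair of points."""
    return max(math.dist(a, b) for a in A for b in B)

def average(A, B):
    """Average linkage: mean distance over all pairs of points."""
    return sum(math.dist(a, b) for a in A for b in B) / (len(A) * len(B))

def centroid(C):
    return [sum(coords) / len(C) for coords in zip(*C)]

def ward(A, B):
    """Ward: increase in the total within-cluster sum of squares caused by merging A and B."""
    return len(A) * len(B) / (len(A) + len(B)) * math.dist(centroid(A), centroid(B)) ** 2

def agglomerate(points, k, linkage=ward):
    """Bottom-up clustering: start from singletons, merge the closest pair until k clusters remain."""
    clusters = [[tuple(p)] for p in points]
    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]   # i < j, so deleting j leaves i in place
        del clusters[j]
    return clusters

points = [(0, 0), (0, 1), (10, 10), (10, 11), (5, 5)]
print(agglomerate(points, 2))
```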

In our analysis we will use the agglomerative algorithm with the Euclidean metric and Ward's linkage method.

One huge advantage of hierarchical clustering methods over, for instance, the k-means algorithm is that we can gain valuable insight into the data structure by looking at the so-called dendrogram. Domain knowledge is very helpful at this point, as it can make it much easier to work out the number of clusters and their possible names. Below you can see a dendrogram based on the methodology we chose, presenting 4 different clusters.

Although the choice of the threshold (i.e. how many clusters we want to distinguish) is arbitrary, we can look at some metrics that reflect something similar to goodness of fit in the clustering framework. For instance, we can compute the so-called silhouette width for every observation. It measures how similar the observation is to the cluster it was assigned to compared with the nearest other cluster: values close to 1 mean a good fit, while negative values suggest a wrong assignment. If we average the silhouette widths for each number of clusters and plot the results, we can get a nice insight into what is going on in the data.
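The silhouette width of observation \(i\) is \(s(i) = (b - a)/\max(a, b)\). Here \(a\) is its mean distance to the other members of its own cluster and \(b\) is its smallest mean distance to the members of any other cluster. Below is a minimal Python sketch; the function names are hypothetical, it needs at least two clusters, and singletons get 0 by the usual convention.

```python
import math

def mean_dist(p, members):
    return sum(math.dist(p, q) for q in members) / len(members)

def silhouette_widths(points, labels):
    """Silhouette width s(i) = (b - a) / max(a, b) of every observation."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    widths = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        own = [q for j, q in enumerate(points) if labels[j] == lab and j != i]
        if not own:                  # singleton cluster: width set to 0 by convention
            widths.append(0.0)
            continue
        a = mean_dist(p, own)        # cohesion: mean distance within the own cluster
        b = min(mean_dist(p, members)           # separation: nearest other cluster
                for other, members in clusters.items() if other != lab)
        widths.append((b - a) / max(a, b))
    return widths

def average_silhouette(points, labels):
    widths = silhouette_widths(points, labels)
    return sum(widths) / len(widths)
```

Computing `average_silhouette` for the labels obtained with 2, 3, 4, ... clusters and plotting the values gives a silhouette plot of the kind discussed next.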

As you can see above, the silhouette plots for different linkage methods indicate different numbers of clusters. The plots for complete and average linkage suggest there are around seven to nine different clusters in our data. On the other hand, the plot for single linkage indicates there are just two distinct groups. Which one should we trust? That's a tough question with no obvious answer, so let's focus on the plot for Ward's method. It suggests somewhere between 4 and 7 clusters, so let's first examine the data structure for the coarsest division, as it is of most interest to us.

It seems quite reasonable to distinguish four clusters. On the right-hand side you can see a radar plot with the average skills of the characters within each cluster, rescaled by dividing by the highest value. What we can observe:

The blue cluster is the most durable one: lots of defense points, well-developed health point regeneration and very little magic. These are probably tanks.

On the other hand, the red one is characterized by strong magical skills, like mana and mana regeneration, and at the same time is very weak physically. These are most likely mages and/or other champions with some magical skills.

The third cluster, the yellow one, is the most balanced one, but with the highest attack points. Here we have characters we could probably fight with on the front line.

The last group, the green one, is the most peculiar. These are the least magical characters, with quite high attack speed and attack points. Here we have some kind of assassins or something similar.
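The rescaling used for the radar plot can be sketched in Python as follows. We assume here that each variable's cluster averages are divided by that variable's highest average across clusters, so the strongest cluster on each axis scores 1. The cluster names and numbers in the usage example are made up.

```python
def rescale_by_max(cluster_means):
    """Divide each variable's cluster average by that variable's highest average across clusters.

    cluster_means: {cluster: {variable: average}}, assuming positive averages.
    """
    variables = next(iter(cluster_means.values())).keys()
    top = {v: max(means[v] for means in cluster_means.values()) for v in variables}
    return {c: {v: means[v] / top[v] for v in variables} for c, means in cluster_means.items()}

# Made-up averages for two clusters, for illustration only
example = {"blue": {"mana": 200, "armor": 25}, "red": {"mana": 500, "armor": 16}}
print(rescale_by_max(example))
# {'blue': {'mana': 0.4, 'armor': 1.0}, 'red': {'mana': 1.0, 'armor': 0.64}}
```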

This four-cluster division is quite satisfying, so we are going to stay with it, as more clusters do not give much more insight into the data structure. There are some subgroups, for instance in the green cluster, but we don't find them interesting enough to show here.

Of course, we actually knew the characters' labels the whole time, but the purpose of the study was to see what the data say about the different groups of characters, not to classify them based on their characteristics. Anyway, it might be interesting to see which types of characters the clusters consist of. Below you can see a plot answering that question.

As we can see, the cluster we associated with durability consists mainly of tanks and some (probably) strong fighters. The magical cluster (cluster 3) captures all of the mages and all of the support characters, so the support champions' abilities are probably also quite “magical”. Clusters 2 and 4 capture most of the fighters and assassins, as we thought earlier, so those are the characters you would play with to fight on the front line or to launch a blitz attack, respectively.
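The plot comparing clusters with the official roles is essentially a cross-tabulation. Here is a short Python sketch with made-up inputs; in R, `table(cluster, role)` does the same job.

```python
from collections import Counter, defaultdict

def crosstab(clusters, roles):
    """Count how many characters of each role fall into each cluster."""
    table = defaultdict(Counter)
    for cluster, role in zip(clusters, roles):
        table[cluster][role] += 1
    return {c: dict(counts) for c, counts in table.items()}

# Made-up assignments for four characters
print(crosstab([1, 1, 3, 3], ["Tank", "Fighter", "Mage", "Support"]))
# {1: {'Tank': 1, 'Fighter': 1}, 3: {'Mage': 1, 'Support': 1}}
```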

Conclusions

As we discovered, there are 3 underlying forces driving the characters' skills: durability, magic and readiness to fight.

There are 4 clusters of characters you can play with: magical ones, durable ones and two suited to fighting, either head-on or in a sneaky way.

Different types of characters might be useful for the same purpose.

There are some outliers in the data - either they are mistakes, some weird unbalanced characters or maybe there is just something about them we don’t know ( ͡° ͜ʖ ͡°).