





Further regressions were carried out to include the other intrinsic variables. To capture the effect of different categories, dummy variables were created to test whether each category has a significant relation with goal scoring and whether that relation differs from the other categories. The standard rule for dummy variables is that the category with the highest frequency in a variable is not assigned a dummy, while every other category is (Data and Statistical Services, Princeton University). Each dummy coefficient then measures the difference in the dependent variable relative to this omitted reference category. For example, strikers are the most numerous position in the data, so the model has no dummy for the striker position but includes one for every other position. The coefficient on the goalkeeper dummy can therefore be interpreted as the difference in goal scoring between goalkeepers and strikers. If the coefficient is positive and statistically significant, the position is associated with significantly more goals than the striker position. The same interpretation applies to all other positions.
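The coding rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `dummy_encode` and the sample list of positions are hypothetical and do not come from the analysis data. The sketch picks the most frequent category as the reference and builds a 0/1 column for every other category.

```python
from collections import Counter


def dummy_encode(values):
    """Return (reference, columns) for a categorical variable.

    The most frequent category is the reference and gets no dummy;
    every other category gets one 0/1 column.
    """
    counts = Counter(values)
    reference = counts.most_common(1)[0][0]
    columns = {
        category: [1 if v == category else 0 for v in values]
        for category in sorted(counts)
        if category != reference
    }
    return reference, columns


# Hypothetical sample, not taken from the analysis data.
positions = [
    "striker", "striker", "midfielder", "goalkeeper",
    "striker", "defender", "midfielder",
]
reference, dummies = dummy_encode(positions)

print("reference category:", reference)
for category, column in dummies.items():
    print(category, column)
```

In a regression on these columns, the coefficient of each dummy would be read relative to the reference category, which is the striker position in this sample.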

Following the same logic, dummy variables were used to estimate the effect of other variables such as country and team. The data used in the analysis were sourced from the website Ultimate A-League.

The age groups in which the most goals are scored can be identified with histograms. The analysis groups the data into 5-year age bands starting at 15 years, so the age groups are 15 to 19 years, 20 to 24 years, and so on. The analysis was carried out for the overall data as well as for some of the groups that have a large number of data points.
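The age grouping can be illustrated with a minimal sketch. The helper names `age_group` and `goals_by_age_group` and the `(age, goals)` records are assumptions made for this example, not part of the original analysis. The sketch assigns each age to a 5-year band starting at 15 and sums the goals per band, which gives the bar heights a histogram would show.

```python
from collections import defaultdict


def age_group(age, start=15, width=5):
    """Label the band an age falls into, e.g. 17 -> '15 to 19'."""
    lower = start + ((age - start) // width) * width
    return f"{lower} to {lower + width - 1}"


def goals_by_age_group(records, start=15, width=5):
    """Sum goals per age band; records are (age, goals) pairs."""
    totals = defaultdict(int)
    for age, goals in records:
        if age < start:
            continue  # ages below the first band are skipped in this sketch
        totals[age_group(age, start, width)] += goals
    return dict(totals)


# Hypothetical (age, goals) records, not taken from the analysis data.
records = [(17, 1), (19, 0), (22, 5), (24, 3), (27, 4), (31, 2)]
totals = goals_by_age_group(records)
top_group = max(totals, key=totals.get)

for group, goals in totals.items():
    print(group, goals)
print("band with most goals:", top_group)
```

The same function could be applied to a subset of the records, for example one position or one team, to repeat the comparison for an individual group.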