Journal:
Open Quantitative Sociology & Political Science
Authors:
Noah Carl
Title:
A Global Analysis of Islamist Terrorism
Abstract:
This paper examines the relationship between percentage of Muslims in the population (logged) and two separate measures of Islamist terrorism for a large cross-section of countries (n = 168). The first measure of Islamist terrorism is the number of Islamist terror attacks 2001–2016 (logged); the second is the number of casualties from Islamist terrorism 2001–2016 (logged). Percentage Muslim was strongly associated with both measures of Islamist terrorism (β = .49–.50). These associations were not disproportionately driven by co-variation within one or two global regions: positive associations were found within Sub-Saharan Africa (β = .35–.38), South & East Asia (β = .49–.50), Eurasia (β = .31–.37), and the West (β = .32–.50). The raw associations within Latin America & Caribbean (β = .06–.13) were very weak, and those within Middle East & North Africa were negative (β = –.18 to –.21). Yet the results for Middle East & North Africa were attributable to Israel being a major outlier; when Israel was omitted, strong positive associations emerged (β = .65–.69). In a multiple regression analysis, both associations were robust to controlling for region fixed-effects, land area (logged), absolute latitude, average elevation, terrain roughness, legal origin, GDP per capita (logged), democracy, and ethnic fractionalisation (β = .32–.33). Consistent with a previous study, both percentage Muslim (β = .21–.56) and indicators of military intervention in the Middle East (β = .17–.80) were associated with Islamist terrorism across Western countries.
Key words:
Islamist terrorism; percentage Muslim; military intervention; multiple regression
Length:
~4,100 words, 19 pages.
Files:
https://osf.io/43gbe/
Please find attached a commented PDF.
Many thanks to Emil for the comments. My responses are as follows:
Section 2.1:
1. I have removed "every" from the text.
2. I have added the statement, "Note that both of these measures are strongly correlated with other measures of Islamist terrorism that are available for Western countries (see Carl, 2016; Appendix A of this paper)"
3. I have noted that the percentage Muslim data are from 2010.
4. I have stated in a footnote that the exact transformation applied was log(1 + x).
5. I have utilised GDP per capita for 2010, rather than for 2000.
6. I have added the statement, "details of each measure can be found in Section F of that paper’s Online Appendix, which is available for download at the website of the American Economic Review"
Section 2.2:
1. I have added the statement, "Note that Eurasia encompasses South Eastern Europe, Russia, the Caucasus, and the central Asian ‘stans’"
Section 2.3:
1. I have cited studies from the conflict literature which have used the approach of controlling for log population.
2. I have noted that the original approach produced small and inconsistent betas.
3. I have added a table of descriptive statistics for the three main variables.
Section 3.1:
1. I would prefer not to combine the two measures. In this regard, I have noted that within-region associations between the two measures––although strong––are far from unity (e.g., beta = .75 for the West).
2. I have given a description of robust standard errors, and cited two sources from econometrics where they are described in further detail.
3. I have utilised log percentage Muslim, rather than residuals of log percentage Muslim, in all charts.
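As a rough illustration of what the robust standard errors in item 2 involve, here is a minimal numpy sketch of the HC1 (Huber–White) sandwich estimator. It is fitted to z-scored variables, so the slopes are standardized betas like those reported in the paper. The thread does not say which robust variant the paper used, so HC1 is an assumption, and the data are simulated.

```python
import numpy as np

def ols_hc1(X, y):
    """OLS coefficients with HC1 heteroskedasticity-robust standard errors.

    X: (n, k) design matrix that already includes an intercept column.
    y: (n,) outcome vector.
    """
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    # "Meat" of the sandwich: sum over i of e_i^2 * x_i x_i'
    meat = (X * resid[:, None] ** 2).T @ X
    # HC1 applies the small-sample correction n / (n - k)
    cov = xtx_inv @ meat @ xtx_inv * n / (n - k)
    return beta, np.sqrt(np.diag(cov))

def zscore(v):
    return (v - v.mean()) / v.std(ddof=1)

# Simulated data for illustration only (not the paper's data)
rng = np.random.default_rng(1)
n = 168
pct_muslim = rng.uniform(0, 100, n)
log_pop = rng.normal(16, 1.5, n)
# Error spread grows with pct_muslim, i.e. the errors are heteroskedastic
attacks = 0.02 * pct_muslim + 0.3 * log_pop + rng.normal(0, 0.5 + 0.02 * pct_muslim, n)

# Standardizing outcome and predictors turns the slopes into standardized betas
X = np.column_stack([np.ones(n), zscore(pct_muslim), zscore(log_pop)])
beta, se = ols_hc1(X, zscore(attacks))
print("standardized betas:", beta[1:].round(2))
print("HC1 robust SEs:   ", se[1:].round(2))
```

The point estimates are ordinary OLS; only the standard errors change. That is why robust SEs affect significance levels but not the betas themselves.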
Section 3.2:
1. I would prefer to report 10% significance levels, since all significance levels are arbitrary anyway. Note that I have distinguished them by using the symbol '+', rather than '*'.
2. I have added the statement, "Note that both the percentage of Muslims and the incidence of Islamist terrorism are very low in Latin America & Caribbean (see Table 1)."
3. I have noted the reason Israel is an outlier, namely that it is a major target for Islamist terrorism, due to its ongoing conflict with Palestine. I have also cited public opinion evidence that a majority of Palestinians support attacks against Israel.
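The '+' convention in item 1 amounts to a small lookup from p-values to markers. The sketch below uses the 10% level for '+', as stated. The cut-offs for '*', '**' and '***' follow a common convention and are an assumption, not taken from the paper.

```python
def sig_marker(p: float) -> str:
    """Map a p-value to a significance marker, using '+' for the 10% level.

    The 5%, 1% and 0.1% thresholds are a common convention (assumed here).
    """
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    if p < 0.10:
        return "+"
    return ""

for p in (0.0004, 0.003, 0.02, 0.07, 0.3):
    print(p, sig_marker(p) or "(none)")
```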
Section 3.3:
1. I have explained why I included the basic controls and additional controls. I have also cited papers that have utilised them before.
Section 3.4:
1. I have changed the language, so as to put more emphasis on effect sizes, and less on statistical significance.
Appendix A:
1. The analyses in the first version of the paper were affected by a coding error: I had forgotten to take the log of 1 + arrests for religious terrorism, and had instead simply used 1 + arrests for religious terrorism. The results pertaining to this variable are now much stronger: its associations with other measures of Islamist terrorism are stronger, and both percentage Muslim and part of anti-ISIS coalition have stronger associations with it.
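The difference between the erroneous coding (1 + x) and the intended one, log(1 + x), can be seen on a handful of made-up counts. The arrest numbers below are hypothetical. The "largest case's share of the total" is just a crude way of showing how much a single extreme country dominates each version.

```python
import numpy as np

# Hypothetical arrest counts for a handful of countries (not the paper's data)
arrests = np.array([0, 1, 3, 12, 40, 250, 1800])

wrong = 1 + arrests          # the erroneous coding: 1 + x without the log
right = np.log1p(arrests)    # log(1 + x); log1p is accurate and defined at x = 0

print(right.round(2))
# Crude dominance check: how much of the total comes from the single largest case
print("largest case's share, 1 + x:     ", round(wrong.max() / wrong.sum(), 2))
print("largest case's share, log(1 + x):", round(right.max() / right.sum(), 2))
```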
New files are available here.
Good edits, Noah.
The dependent variable
Critics will attack this variable. You note that it is highly correlated with other measures, but leave readers to check this for themselves. I suggest explicitly reporting the correlations and naming the other measures in the text. This makes for a stronger defense of this variable.
One or two dependent variables
You decline to merge your two dependent variables into one, noting that they do not correlate at 1.00. But given some amount of noise (and you have plenty), one would not expect them to. In general, it is better to use aggregated variables than multiple single outcomes, because aggregates are more reliable. See e.g.:
http://psycnet.apa.org/index.cfm?fa=buy.optionToBuy&id=1984-00952-001
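The reliability argument can be made concrete with a small simulation. It assumes a measurement model in which both outcomes are the same latent quantity plus independent noise. That is a simplifying assumption, not a claim about the actual terror data. The composite is the mean of the two z-scored measures. Under this model, its correlation with the latent variable should match the square root of the Spearman–Brown reliability.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Assumed measurement model: each observed outcome = latent + independent noise
latent = rng.normal(size=n)
attacks = latent + rng.normal(size=n)
casualties = latent + rng.normal(size=n)

def z(v):
    return (v - v.mean()) / v.std()

# Composite: mean of the two z-scored measures
composite = (z(attacks) + z(casualties)) / 2

r_single = np.corrcoef(latent, attacks)[0, 1]
r_comp = np.corrcoef(latent, composite)[0, 1]
r_items = np.corrcoef(attacks, casualties)[0, 1]

# Spearman-Brown: reliability of a 2-item composite from the inter-item correlation
sb = 2 * r_items / (1 + r_items)
print(f"single measure vs latent: {r_single:.2f}")
print(f"composite vs latent:      {r_comp:.2f}")
print(f"Spearman-Brown reliability: {sb:.2f} (sqrt = {np.sqrt(sb):.2f})")
```

Even though the two measures correlate only about .5 here, the composite tracks the latent variable more closely than either measure alone.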
Regional breakdown
What about adding a map showing the regions? Alternatively, you could add a table with the coding to the appendix.
ISO3 codes
Romania was given ROM as its ISO3 code, but this is outdated: the current code is ROU.
The same is true for Zaire (now the Democratic Republic of the Congo), which now uses COD.
https://en.wikipedia.org/wiki/ISO_3166-1_alpha-3
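A recode step applied before merging datasets might look like the sketch below. It assumes the dataset stored Zaire under its former ISO code, ZAR. The mapping covers only the two cases raised here, and the function name is hypothetical.

```python
# Outdated ISO 3166-1 alpha-3 codes and their current replacements
ISO3_UPDATES = {
    "ROM": "ROU",  # Romania (code changed in 2002)
    "ZAR": "COD",  # Zaire -> Democratic Republic of the Congo (changed in 1997)
}

def update_iso3(code: str) -> str:
    """Return the current ISO3 code, leaving codes that are already current unchanged."""
    return ISO3_UPDATES.get(code, code)

codes = ["GBR", "ROM", "ZAR", "FRA"]
print([update_iso3(c) for c in codes])
```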
Per capita vs. absolute numbers
I skimmed the cited examples and yes, it seems to be a common choice to use population as a predictor instead of adjusting for it directly. I examined the two approaches by comparing their distributions and correlations. The per capita version is very problematic, which is presumably why you got nonsensical results. I also checked whether it could be improved by taking the log, which in fact does nothing at all. See the analytic replication output.
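One likely reason why taking the log of the per capita variable "does nothing at all" is that per capita attack rates are tiny. For small x, log(1 + x) ≈ x. This is an interpretation offered for illustration, not something stated in the replication output. It also assumes the log(1 + x) transform used elsewhere in the paper. The counts and populations below are hypothetical.

```python
import numpy as np

# Hypothetical attack counts and populations (illustrative only)
attacks = np.array([0, 2, 15, 120, 900, 3000])
population = np.array([5e6, 1.2e7, 4e7, 8e7, 2e8, 1.9e8])

per_capita = attacks / population
log_per_capita = np.log1p(per_capita)

# For tiny x, log(1 + x) is approximately x, so the transform barely changes the values
print(np.corrcoef(per_capita, log_per_capita)[0, 1])
print(np.max(np.abs(log_per_capita - per_capita)))
```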
Log percent Muslim
There does not seem to be any reason to use the log of this variable. It's a ratio and therefore does not suffer from the order-of-magnitude differences that, e.g., population size does. I used the untransformed version and it gave about the same results. It also produces more interpretable plots. I suggest using the untransformed variable everywhere.
Spatial autocorrelation
You used a regional fixed-effects approach to control for spatial autocorrelation (SAC). This assumes that the countries can be divided into discrete groups and that this accounts for all the correlated errors. A better approach is to use a 'spatial lag' predictor in the model. This is a prediction of the dependent variable based on its values in the country's nearest neighbors, usually the k = 3 nearest.
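A minimal sketch of a k = 3 spatial lag predictor follows. Each country gets the mean outcome of its three nearest neighbours, measured by great-circle distance between coordinates. The exact implementation used in the replication is not described here. The distance metric and the simple unweighted mean are assumptions, and the coordinates and outcome values are a toy example.

```python
import numpy as np

def haversine_km(lat, lon):
    """Pairwise great-circle distances (km) between points given in degrees."""
    lat, lon = np.radians(lat), np.radians(lon)
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = np.sin(dlat / 2) ** 2 + np.cos(lat[:, None]) * np.cos(lat[None, :]) * np.sin(dlon / 2) ** 2
    return 2 * 6371.0 * np.arcsin(np.sqrt(np.clip(a, 0, 1)))

def spatial_lag(y, lat, lon, k=3):
    """Mean of y over each unit's k nearest neighbours (excluding the unit itself)."""
    d = haversine_km(np.asarray(lat, float), np.asarray(lon, float))
    np.fill_diagonal(d, np.inf)           # a unit is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]     # indices of the k nearest units
    return np.asarray(y, float)[nn].mean(axis=1)

# Toy example: approximate coordinates of six European capitals, made-up outcomes
lat = [59.3, 60.2, 55.7, 52.5, 48.9, 41.9]
lon = [18.1, 24.9, 12.6, 13.4, 2.35, 12.5]
y = [1.0, 0.2, 1.5, 2.0, 3.0, 0.8]
lag = spatial_lag(y, lat, lon, k=3)
print(lag.round(2))
```

The resulting column would then be added as an extra predictor alongside percentage Muslim and the controls.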
However, to examine whether there are remaining SAC errors, one can examine the model residuals. I chose world model 3 as the primary model. It uses the number of attacks as the dependent variable and includes a diverse set of predictors. The results show that the residuals are only weakly spatially autocorrelated: the KNSN approach gave a correlation of .27, and Moran's I was .03. As such, it does not seem necessary to control for SAC. Controlling for SAC using a spatial lag predictor did not affect the results. I conclude that SAC is not a problem for this model.
http://rpubs.com/EmilOWK/IslamWorldTerror2017
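The residual check above can be illustrated with Moran's I, using row-standardised k-nearest-neighbour weights. The weighting scheme is an assumption; the replication may have used a different one. The locations are simulated points on a plane rather than countries. The two test series show what near-zero and strongly positive autocorrelation look like.

```python
import numpy as np

def morans_i(values, neighbours):
    """Moran's I with row-standardised k-nearest-neighbour weights.

    values: (n,) array, e.g. regression residuals.
    neighbours: (n, k) integer array of each unit's neighbour indices.
    """
    z = np.asarray(values, float)
    z = z - z.mean()
    n, k = neighbours.shape
    W = np.zeros((n, n))
    W[np.arange(n)[:, None], neighbours] = 1.0 / k   # each row sums to 1
    return (n / W.sum()) * (z @ W @ z) / (z @ z)

rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(200, 2))           # hypothetical planar locations
d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
np.fill_diagonal(d, np.inf)
nn = np.argsort(d, axis=1)[:, :3]                    # k = 3 nearest neighbours

noise = rng.normal(size=200)                                 # no spatial structure
smooth = np.sin(coords[:, 0]) + 0.1 * rng.normal(size=200)   # strong spatial structure
print(round(morans_i(noise, nn), 3), round(morans_i(smooth, nn), 3))
```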
NHST
I really, really don't like the inclusion of 'borderline significance'!
Latin American results
You note that the variation in Muslim% is smaller for this region. However, according to your own table, it is even smaller in the West, yet for the West the results work out nicely. This seems odd.
Approval
I cannot give my approval until the discrepancies in the analytic replication are resolved. These are: (1) the results for MENA without Israel, and (2) one of the subplots.
Thanks again to Emil for further comments and suggestions.
Dependent variable:
I have added the betas for the associations between measures of Islamist terrorism to Section 2.1.
One or two dependent variables:
I have added a new Appendix (Appendix B) that contains additional robustness and sensitivity checks. Here I report the association between percentage Muslim and a combined measure of Islamist terrorism. The reason I would prefer not to combine the two variables in all analyses is that, within certain regions, they are not correlated near unity (as noted in Section 3.1).
Regional breakdown:
I have added a map showing this.
ISO codes:
Yes, some codes were changed in order to merge data.
Log percent Muslim:
I have utilised percentage Muslim in all analyses.
Latin American results:
I now simply state that the incidence of Islamist terrorism is low in Latin America.
Approval:
I believe these issues have now been resolved. The discrepancy related to MENA without Israel was due to a coding error on my part.
Other changes:
I have changed a few of the charts (e.g., adding colours for different regions), and––as noted above––added a new Appendix containing additional robustness/sensitivity checks.
Files here.
Figure 2. Nice figure. What about adding names for some of the cases? In ggplot2 (in R), there's a way to add labels only for cases that don't overlap each other, such as the blue MENA case in the top right.
--
Figure 3a and 3b. You state that the x-axis is a percentage, but you actually use fractions. All you need to do is modify the labels on the x-axis.
--
"Note that both the incidence of Islamist terrorism is very low in Latin America & Caribbean"
It seems you wanted to mention the population % too, but forgot.
--
"Indeed, opinion polls indicate that the majority of Palestinians support attacks against Israel (PCPO, 2014; PCR, 2015)."
Perhaps it is a good idea to report the exact numbers. There is a large difference between 51% and 99%, but both are majorities.
--
"Appendix B presents additional robustness/sensitivity checks."
I think it makes more sense to include some of this in the main text. In general, the structure of your paper is a bit different from how I would do it. I would structure it like this:
3.1 - Associations between the two terrorism measures from religionofpeace.com. Form a composite measure (e.g. mean of Z values).
3.2 - World associations using the composite measure, with and without controls.
3.3 - Within region associations.
3.4 - Subtype dependent measures.
3.5 - Additional robustness checks.
This approach maximizes the chance of finding, and giving weight to, the main effects with the largest sample size. Focusing on regions or outcome subtypes increases the chance of finding what appear to be 'interesting' interactions, such as one subtype outcome having p > .05 in one region and the other having p < .05.
You don't mention the spatial autocorrelation analysis I carried out. It may be a good idea to do so (just 1–2 lines is fine). Of course, a large spatial confound is unlikely because you already include absolute latitude in your models.
--
Is there some neat way of visualizing the main result? Perhaps residualize the composite outcome on population size, then plot it with Muslim% on the x-axis. Color by region. Add some labels. This is not quite the analysis you do, but adding the other predictors and using the residuals from that risks omitted variable bias which would remove your association (this is equivalent to a sequential analysis with Muslim% entered last).
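A minimal sketch of the suggested residualization step follows. It regresses the composite on log population, keeps the residuals, and pairs them with Muslim% for plotting. The data are simulated. The plotting call in the comment is illustrative only, and `region_codes` is a hypothetical variable.

```python
import numpy as np

def residualize(y, x):
    """Residuals from an OLS regression of y on x (with intercept)."""
    X = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef

# Simulated inputs, for illustration only
rng = np.random.default_rng(7)
n = 168
log_pop = rng.normal(16, 1.5, n)
pct_muslim = rng.uniform(0, 100, n)
composite = 0.4 * log_pop + 0.02 * pct_muslim + rng.normal(0, 1, n)

resid = residualize(composite, log_pop)
# Plot pct_muslim (x-axis) against resid (y-axis); with matplotlib one might write
# plt.scatter(pct_muslim, resid, c=region_codes)  # region_codes is hypothetical
print(np.corrcoef(pct_muslim, resid)[0, 1].round(2))
```

By construction the residuals are uncorrelated with log population, so any remaining slope against Muslim% is not driven by country size.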
--
I have no more important complaints. So I give my approval.
I have been asked to review the latest version of this paper. I have two concerns with the current version.
Firstly, in section 2.1 the author lists an array of covariates (land area (logged), absolute latitude, average elevation, terrain roughness, legal origin, GDP per capita (logged), democracy, and ethnic fractionalisation), but no justification is given for why this particular set of covariates is employed. I would like to see an explicit theoretical rationale for this choice of covariates. It is also worth noting that ethnolinguistic fractionalisation measures computed using different metrics often do not correlate strongly with one another. Perhaps the author could examine alternative operationalizations of this variable in order to determine the robustness of his regression results.
Secondly, in section 2.3 the author mentions that the use of log-transformation "resulted in substantially more normally distributed residuals for the present sample of countries". How was normality determined here? Skewness? Q-Q plots? Some other method? Please be specific, and also report the 'before' and 'after' values, so that the impact of the transformation can be quantified.
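One way to make the requested "before" and "after" comparison concrete is to report the skewness of the residuals with and without the log(1 + x) transform. A Q-Q plot could complement it. The sketch uses simulated right-skewed counts. The thread does not say which diagnostic the paper relied on.

```python
import numpy as np

def skewness(v):
    """Sample skewness (Fisher-Pearson coefficient, no small-sample adjustment)."""
    d = np.asarray(v, float) - np.mean(v)
    return np.mean(d ** 3) / np.mean(d ** 2) ** 1.5

def ols_residuals(y, x):
    X = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef

# Simulated right-skewed count outcome, for illustration only
rng = np.random.default_rng(3)
x = rng.uniform(0, 1, 500)
counts = rng.poisson(np.exp(1 + 3 * x) * rng.lognormal(0, 1, 500))

before = skewness(ols_residuals(counts, x))
after = skewness(ols_residuals(np.log1p(counts), x))
print(f"residual skewness: raw = {before:.2f}, log(1 + x) = {after:.2f}")
```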
Thanks to Emil for further comments. And thanks to Michael for the review.
Labels on axes of graphs for percentages, rather than fractions:
These have been changed.
Incomplete sentence in Section 3.2:
This has been changed.
Percentages from Palestinian opinion polls:
I have now reported exact percentages.
Spatial autocorrelation analysis:
I have mentioned Emil's spatial autocorrelation analysis in a footnote on p. 10.
Covariates used in multiple regression analysis:
I believe I have already provided justification for the covariates in the text on p. 14:
"The basic controls comprise variables that are largely exogenous to contemporary social and economic development. Land area, absolute latitude, average elevation and terrain roughness capture aspects of geography and climate that might influence the risk of civil conflict. For example, suppose that countries nearer the equator tend to experience more civil conflict for climatic reasons. In that case, if countries nearer the equator tend to have larger Muslim populations, percentage Muslim will be confounded by absolute latitude. These variables––especially absolute latitude, land area, and legal origin––have become standard in cross-country regression analyses (see Ashraf & Galor, 2013; Spolaore & Wacziarg, 2013; Alesina et al., 2016). The additional controls comprise variables that capture contemporary social and economic development, as well as historical and contemporary migration patterns. For example, any effect of percentage Muslim on Islamist terrorism might be attributable to a tendency for countries with large Muslim populations to have higher levels of ethnolinguistic fractionalisation."
Measures of ethnic fractionalisation:
I have added additional models to Table B.1 on p. 25, which test whether the results are robust to alternative measures of ethnic fractionalisation. Two models use religious fractionalisation instead of ethnic fractionalisation, and two models use a principal component of fractionalisation extracted from 5 measures. The results are robust to these alternative measures.
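A principal component of several fractionalisation measures can be extracted roughly as in the sketch below. The columns are standardised (so the PCA works on the correlation matrix) and the first right-singular vector is used as the loadings. The five simulated measures share one common factor. They stand in for the actual measures, which are not named in the thread.

```python
import numpy as np

def first_principal_component(M):
    """Scores and loadings for the first principal component of the columns of M.

    Columns are standardised first, so the PCA is based on the correlation matrix.
    """
    Z = (M - M.mean(axis=0)) / M.std(axis=0, ddof=1)
    # Rows of Vt are the principal axes; the first has the largest variance
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[0]
    if loadings.sum() < 0:   # fix the arbitrary sign so higher = more fractionalised
        loadings = -loadings
    return Z @ loadings, loadings

# Hypothetical: five fractionalisation measures sharing one common factor
rng = np.random.default_rng(5)
common = rng.uniform(0, 1, 150)
M = np.column_stack([common + rng.normal(0, 0.15, 150) for _ in range(5)])
scores, loadings = first_principal_component(M)
print(loadings.round(2))
```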
Modelling strategy:
I have added a footnote on p. 7, which describes how the residuals and betas were affected by utilising log of 1 + Islamist terror attacks per capita, rather than log of 1 + Islamist terror attacks residualized on log population:
"When log of 1 + Islamist terror attacks per capita was regressed on percentage Muslim and the residuals were saved, the distribution was heavily right-skewed, and the single largest residual accounted for 67% of the variance. When log of 1 + Islamist terror attacks per capita was regressed on percentage Muslim as well as region fixed-effects and covariates, the standardized beta for percentage Muslim was highly inconsistent across the various specifications. It ranged from β = .31 in the unconditional model to β = .07 in the model with region fixed-effects, basic controls and additional controls."
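A figure like "the single largest residual accounted for 67% of the variance" can be computed as the largest squared residual divided by the residual sum of squares. That this is the quantity meant is an assumption. The per capita outcome below is simulated to contain one extreme case, so its value is not comparable to the paper's 67%.

```python
import numpy as np

def largest_residual_share(y, x):
    """Share of the residual sum of squares due to the single largest residual."""
    X = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ coef
    return np.max(e ** 2) / np.sum(e ** 2)

# Hypothetical per capita outcome: near zero everywhere except one extreme case
rng = np.random.default_rng(11)
x = rng.uniform(0, 100, 168)
y = rng.exponential(1e-7, 168)
y[0] = 5e-5                       # one extreme outlier
share = largest_residual_share(y, x)
print(round(share, 2))
```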
Other changes:
I have now utilised all the data on Islamist terrorist attacks up to March 2017. Because attacks in 2017 are a small percentage of total attacks since 2001, the results were largely unaffected. Nonetheless, the analysis is now more complete.
Files here.
I'm happy with the changes.
Given that the data I utilise include honour killings by Islamists as well as acts of overt political violence, I have decided that it is more accurate to speak of Islamist violence, rather than Islamist terrorism. I have changed the title and language of the paper accordingly.
I have also corrected a number of typos.
Files here.
I am satisfied with the changes that have been made and recommend publication.
The analysis seems to be straightforward, and the results are unsurprising. It is predictable that countries without Muslims have no Muslim violence. That countries that occupy or bomb Muslim countries are prime targets for Muslim violence is also pretty predictable. Within these modest aims, the paper is okay, and I see no methodological flaws. It would be nice to extend this research in the future to find out why some countries have a lot of problems with their Muslim populations while others don't.
Gerhard,
I agree that the findings are straightforward ('captain obvious science'), but given that many researchers don't think these simple compositional models are plausible, one needs to do the studies. As you say, once the basics have been established and validated with multiple studies, one can look for other predictors. Maybe Muslims are particularly terror-prone in countries with specific other religions (e.g. Christians), or in countries that are very different from typical Muslim-majority countries (one could use the World Values Survey to calculate an overall cultural distance between pairs of countries, e.g. via the Mahalanobis distance).
Unfortunately, there are only so many countries in the world, so the statistical power of these kinds of regressions is limited. However, it should be possible to collect lower-level data. Noah and I have been discussing trying to get data for the EU NUTS units.
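The Mahalanobis distance mentioned above measures how far apart two countries' survey profiles are. It uses the covariance of the items across countries, so correlated items are not double-counted. The three "items" and five country profiles below are hypothetical placeholders, not actual World Values Survey variables. With real data, the covariance would be estimated from many more countries.

```python
import numpy as np

def mahalanobis(u, v, cov):
    """Mahalanobis distance between profiles u and v given a covariance matrix."""
    d = np.asarray(u, float) - np.asarray(v, float)
    return float(np.sqrt(d @ np.linalg.solve(cov, d)))

# Hypothetical country means on three survey items (rows = countries)
profiles = np.array([
    [2.1, 3.4, 0.6],
    [2.3, 3.1, 0.5],
    [3.8, 1.9, 0.2],
    [3.5, 2.2, 0.3],
    [2.0, 3.6, 0.7],
])
cov = np.cov(profiles, rowvar=False)   # covariance of items across countries
print(round(mahalanobis(profiles[0], profiles[2], cov), 2))
```

With an identity covariance matrix, the distance reduces to the ordinary Euclidean distance.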
https://en.wikipedia.org/wiki/Nomenclature_of_Territorial_Units_for_Statistics
There are 3 NUTS levels below the country level:
- Countries: 28
- NUTS-1: 98
- NUTS-2: 273
- NUTS-3: 1324
Some thoughts about this:
The smaller the units one uses, the noisier the data become, because acts of terrorism are inherently rare events. Nothing can be done about this except wait for more terror to happen.
One has to determine the NUTS unit from the terrorism data, and these units are not given explicitly. However, the data usually give the city, and the EU has a table of every major city and its NUTS regions, so this should be doable.
The EU also publishes social and economic data for the NUTS units.
It is not very easy to estimate the number of Muslims by NUTS region. The EU does not publish detailed country-of-origin data. It uses categories such as 'non-EU born', which can mean different things depending on which country one is looking at. For instance, in northern Italy, it can mean Swiss. In Sweden, it can mean Norwegian. In the eastern EU, it can mean Russian or Ukrainian. And of course, it can mean MENAP or US origin, and so on.
So we thought of perhaps only collecting data for the largest cities in the terrorism dataset. Terrorism almost always happens in cities, so one would ignore the countryside regions. It is usually possible to estimate the Muslim population % by city, at least with some precision. The downside is that it's not easy to find good S (socioeconomic) data by city, so it's hard to control for confounds. But one can of course control for, e.g., geoposition (longitude, latitude), climate, and country-level variables. If one sticks with EU cities, one can, I think, get semi-detailed S data from Eurostat. However, the approach requires a lot of manual labor in estimating the Muslim %. Wikipedia only has a small list to begin from.
https://en.wikipedia.org/wiki/List_of_cities_in_the_European_Union_by_Muslim_population
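The city-to-NUTS step described above reduces to a lookup followed by aggregation. In the sketch below, the city names and NUTS-2 codes are placeholders; a real version would use the EU's city-to-NUTS table. Cities missing from the lookup are collected for manual coding rather than dropped silently.

```python
from collections import Counter

# Hypothetical lookup table from city to NUTS-2 code (names and codes are placeholders)
CITY_TO_NUTS2 = {
    "City A": "XX11",
    "City B": "XX11",
    "City C": "XX12",
    "City D": "YY21",
}

# Hypothetical attack records as (city, casualties)
attacks = [("City A", 3), ("City C", 0), ("City A", 12), ("City D", 1), ("City E", 2)]

n_attacks = Counter()
n_casualties = Counter()
unmatched = []
for city, casualties in attacks:
    region = CITY_TO_NUTS2.get(city)
    if region is None:
        unmatched.append(city)   # flag for manual coding rather than dropping silently
        continue
    n_attacks[region] += 1
    n_casualties[region] += casualties

print(dict(n_attacks))
print(dict(n_casualties))
print("needs manual coding:", unmatched)
```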