Hi all,
I have aggregated count data of criminal incidents in each neighbourhood of a particular city. I also have the population of each of those neighbourhoods. I want to create a crime per capita ratio (in other words, the number of crimes in each neighbourhood divided by the population of that neighbourhood). However, for two of the neighbourhoods this ratio is very high compared with the remaining neighbourhoods, which makes them outliers.
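To make the ratio concrete, here is a minimal sketch in Python. The neighbourhood names, counts and populations are invented for illustration and are not my real data; expressing the rate per 1,000 residents is just a readability choice.

```python
# Hypothetical data: names, crime counts and populations are made up.
neighbourhoods = {
    "A": {"crimes": 120, "population": 8000},
    "B": {"crimes": 95, "population": 6500},
    "C": {"crimes": 300, "population": 400},  # business zone with few residents
}


def crime_rate_per_1000(crimes, population):
    """Return the number of crimes per 1,000 residents."""
    return 1000 * crimes / population


rates = {
    name: crime_rate_per_1000(d["crimes"], d["population"])
    for name, d in neighbourhoods.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} crimes per 1,000 residents")
```

In this toy example neighbourhood C gets a rate far above the others simply because its resident population is small, which is the situation I describe.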
My objective is to run a linear regression using several other independent variables. Scatterplots show that these correlate reasonably well with crime rates, although the relationship is distorted by the two outliers. I know the reason for these outliers: the very high per capita crime rates in those neighbourhoods are due to their function. For instance, they are employment zones where few people live and most of the buildings are used for services and business, which inflates their crime rates.
If I could use population plus workers as the denominator, that might resolve the problem, but I don't have such data. I am thinking of changing the values of those two outliers to the next-highest crime rate, but I do not know if that is the right thing to do. I have also ruled out removing the two outliers, because I have a limited number of observations.
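To be concrete about what I mean by changing the outlier values, here is a rough sketch of capping the two largest rates at the next-highest rate (I believe this is a form of winsorizing). The function name `cap_top_k` and the numbers are hypothetical, and whether this is statistically appropriate is exactly what I am asking.

```python
def cap_top_k(values, k=2):
    """Replace the k largest values with the (k+1)-th largest value."""
    if k <= 0 or k >= len(values):
        # Nothing sensible to cap; return an unchanged copy.
        return list(values)
    cap = sorted(values, reverse=True)[k]  # the (k+1)-th largest value
    return [min(v, cap) for v in values]


# Hypothetical crime rates per 1,000 residents; the last two are the outliers.
rates = [15.0, 14.6, 22.1, 18.3, 750.0, 410.0]
capped = cap_top_k(rates, k=2)
print(capped)
```

With these made-up numbers the two extreme rates are pulled down to 22.1, and all other values are left unchanged.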
I am hoping someone can give me some advice or a solution to this problem.
Thank you in advance!