I need to statistically compare two maps in order to determine whether the spatial distributions of their data are correlated. Any suggestions? Thanks!
I am replying to this question since I recently came across a similar issue. I will give my two cents here, bearing in mind that this solution applies to the specific issue at hand.
I have two rasters, each representing a path system (actually, two least-cost path networks). Each cell belonging to a path is given a value of 1; the off-path cells are given 0. The rasters have the same resolution and spatial extent.
I wanted to quantify if and to what extent they can be considered correlated, that is how "strong" is the overlap between them. I focused on the Jaccard coefficient (e.g., http://people.revoledu.com/kardi/tutorial/Similarity/Jaccard.html).
This coefficient is equal to: the INTERSECTION between the two rasters divided by the UNION between the two rasters.
Now, in terms of this specific example, the INTERSECTION is the number of cells that are path cells in both rasters (i.e., the number of overlapping path cells). The UNION is the total number of path cells (belonging to either of the two rasters).
In ArcGIS, we can use RASTER CALCULATOR to compute the INTERSECTION and the UNION.
To get the INTERSECTION, we just feed the following formula into RASTER CALCULATOR: "RASTER A" & "RASTER B" (where Raster A and Raster B are the names of the two rasters being analysed).
The same for UNION: "RASTER A" | "RASTER B"
Once we have obtained the two new output rasters, to get the Jaccard coefficient we simply open the attribute table of each raster, take note of the count of cells with value equal to 1, and divide them accordingly (remember: INTERSECTION divided by UNION).
In my case, the count of cells with value 1 in the INTERSECTION raster is 22,822, while in the UNION raster it is 37,716. The Jaccard coefficient turns out to be about 0.61 (22,822 / 37,716).
I hope this quite long reply will be useful to anyone that will jump here in the future.
A similar approach (in Matlab) is provided here: http://kawahara.ca/matlab-jaccard-similarity-coefficient-between-images/
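For readers without ArcGIS, the same calculation can be sketched in Python with NumPy. This is only a minimal illustration, assuming the two rasters have already been loaded as arrays of 0s and 1s with identical shape; the function name `jaccard_binary` and the small example arrays are made up for the illustration.

```python
import numpy as np

def jaccard_binary(a, b):
    """Jaccard coefficient of two binary rasters (1 = path cell, 0 = off-path)."""
    a = np.asarray(a).astype(bool)
    b = np.asarray(b).astype(bool)
    if a.shape != b.shape:
        raise ValueError("rasters must share the same shape (resolution and extent)")
    intersection = np.logical_and(a, b).sum()  # path cells present in both rasters
    union = np.logical_or(a, b).sum()          # path cells present in either raster
    if union == 0:
        return float("nan")  # no path cells at all: coefficient is undefined
    return intersection / union

# Hypothetical 3x3 rasters, only to show the call
raster_a = np.array([[1, 1, 0],
                     [0, 1, 0],
                     [0, 0, 1]])
raster_b = np.array([[1, 0, 0],
                     [0, 1, 1],
                     [0, 0, 1]])
j = jaccard_binary(raster_a, raster_b)  # 3 shared cells / 5 cells in the union

# The cell counts reported above give the same ratio
j_reported = 22822 / 37716
```

The logical AND and OR play the role of the `&` and `|` expressions in RASTER CALCULATOR, and the two sums replace reading the cell counts from the attribute tables.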
Sofia - one challenge is that no matter what you do, the results will likely be significant. An approach that I've taken in the past is to subtract successive timesteps from one another to quantify where differences exist through time. You can then estimate the fractal dimension (Df) of the differences. Df is obtained by regressing the log of perimeter against the log of area. You can test whether the slopes of the regression lines differ using ANCOVA. I realize this is kind of an inverted way of testing your question, but it does make your type I error rate more manageable if you can make it work. Good luck.
Article Response to disturbance in a highly managed alluvial river: ...
Use Mantel or partial Mantel tests. The test builds two matrices, one of differences in data values and one of distances between locations, and measures the correlation between them. It then randomizes the locations and repeats the analysis, so that each iteration gives a new correlation. Based on where the actual correlation falls within the resulting distribution, it gives you the probability that the spatial arrangement is significant in terms of the data. FYI, consider standardizing the data on each map prior to analysis in case one location has higher or lower values or the variables are scaled differently. There is free software called PASSaGE.
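To make the procedure concrete, here is a rough Python sketch of a simple (not partial) Mantel test. It is not what PASSaGE does internally, just an illustration under my own assumptions: Euclidean distances, Pearson correlation between the upper triangles of the two matrices, and a one-sided permutation p-value. The function names and the simulated data are hypothetical.

```python
import numpy as np

def pairwise_distances(x):
    """Euclidean distance matrix between rows of x (a 1-D input is treated as one column)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def mantel_test(dist_a, dist_b, n_perm=999, seed=0):
    """Simple Mantel test: correlation of two distance matrices plus a permutation p-value."""
    dist_a = np.asarray(dist_a, dtype=float)
    dist_b = np.asarray(dist_b, dtype=float)
    n = dist_a.shape[0]
    upper = np.triu_indices(n, k=1)          # each pair of locations counted once
    a = dist_a[upper]
    r_obs = np.corrcoef(a, dist_b[upper])[0, 1]
    rng = np.random.default_rng(seed)
    exceed = 0
    for _ in range(n_perm):
        p = rng.permutation(n)               # shuffle the locations of one matrix
        b_perm = dist_b[np.ix_(p, p)][upper]
        if np.corrcoef(a, b_perm)[0, 1] >= r_obs:
            exceed += 1
    p_value = (exceed + 1) / (n_perm + 1)    # one-sided: is r_obs unusually large?
    return r_obs, p_value

# Simulated example: 30 locations whose values follow the x coordinate plus noise
rng = np.random.default_rng(42)
coords = rng.random((30, 2))
values = coords[:, 0] + 0.05 * rng.standard_normal(30)
spatial_dist = pairwise_distances(coords)
value_dist = pairwise_distances(values)
r_obs, p_value = mantel_test(spatial_dist, value_dist)
```

To compare two maps rather than one map against geographic distance, the second matrix would hold the value differences from the second map at the same locations.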
The approach you can use depends on the problem: there is no 'universal' answer to the problem of map comparison. What kind of maps are you dealing with? Are you comparing two different regions or the same region at different times? You must clarify the problem.
Interesting question and this is something I have been thinking about for a while also. As others have said, it depends on the specific question, but assuming you are talking about comparing two different distributions in the same location, here are a couple of ideas that might help:
Spatial overlay of polygons and calculation of the proportion of the area of one which overlaps with the other. (example here: http://www.nature.com/nature/journal/v365/n6444/abs/365335a0.html)
Cross-covariance analysis allows you to calculate the correlation between two datasets at the same spatial location, while also accounting for correlations with neighbouring locations. Co-regionalisation type methods produce spatial models which highlight areas with shared spatial patterns. (Another couple of examples: http://biomet.oxfordjournals.org/content/100/3/539.abstract and http://www.jstor.org/stable/2937096?origin=JSTOR-pdf). I hope these will at least give you a couple of ideas to look up.
Sofia, if you want to find out whether two spatial data sets are correlated or not, you can use the regression option in the Idrisi image processing software; it will report the correlation and the regression line. For that analysis, your two data sets should be in raster format and have the same pixel size. You can see an example in one of my published papers (http://www.sciencedirect.com/science/article/pii/S0304380009002634). If you need further help, feel free to contact me.
I am assuming that your spatial data is formatted as a point distribution. In that case there are several ways to compare the spatial distributions:
By comparing how the points are spatially dispersed. The spatial dispersion, tendency and direction can be summarized as a standard deviational ellipse (based on a certain p-value).
By modelling the relationship between the spatial position and the non-spatial value of each of your data points. One good way to do this is by using Geographically Weighted Regression (GWR).
Both can be done rather easily in ArcInfo/Arcmap software.
I am assuming you are dealing with point data on a map of some sort. If that is the case you can use the Geostatistical Analyst extension in ArcGIS to examine spatial autocorrelation in the data. Output will include measured versus predicted values. You can look at the correlation between the two predicted data sets to see if they are the same.
Assuming both maps cover the same place (so every location has 2 values, one on each map), you can calculate correlation (continuous variables) or similarity (discrete variables) directly. For continuous variables you may try:
a) Pearson linear correlation (http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient)
b) Reflective correlation (variant of Pearson, also on the link above)
c) Spearman rank correlation (http://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient)
For discrete variables you may try:
a) Jaccard index (http://en.wikipedia.org/wiki/Jaccard_index)
b) Sorensen-Dice coefficient (http://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient)
c) Hamming distance (http://en.wikipedia.org/wiki/Hamming_distance)
These will give you numbers (each coefficient has its own criteria so some are better than others on certain fields) and there are many others. If you need something more wide than just a number you can try the P-P plot (http://en.wikipedia.org/wiki/P%E2%80%93P_plot) or the Q-Q plot (http://en.wikipedia.org/wiki/Q%E2%80%93Q_plot).
This assumes that you've already gone for the common scatterplot (http://en.wikipedia.org/wiki/Scatter_plot).
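A few of these coefficients can be sketched in Python with NumPy alone. This is a minimal illustration, assuming both maps are arrays on the same grid; the helper names and the tiny example arrays are invented for the example, and the Spearman version simply applies Pearson to average ranks.

```python
import numpy as np

def pearson(a, b):
    """Pearson linear correlation between two co-located maps."""
    return np.corrcoef(np.ravel(a), np.ravel(b))[0, 1]

def average_ranks(v):
    """Ranks starting at 1, with tied values sharing their average rank."""
    v = np.asarray(v, dtype=float).ravel()
    order = np.argsort(v, kind="mergesort")
    ranks = np.empty(len(v), dtype=float)
    ranks[order] = np.arange(1, len(v) + 1)
    _, inverse, counts = np.unique(v, return_inverse=True, return_counts=True)
    rank_sums = np.bincount(inverse, weights=ranks)
    return (rank_sums / counts)[inverse]

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(a), average_ranks(b))

def dice(a, b):
    """Sorensen-Dice coefficient for binary maps (1 = present, 0 = absent)."""
    a = np.asarray(a).astype(bool)
    b = np.asarray(b).astype(bool)
    return 2 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def hamming(a, b):
    """Hamming distance: number of cells whose class differs between the maps."""
    return int((np.asarray(a) != np.asarray(b)).sum())

# Continuous example: map2 is a monotonic but non-linear function of map1
map1 = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
map2 = map1 ** 2
r_pearson = pearson(map1, map2)
r_spearman = spearman(map1, map2)

# Discrete (binary) example
class1 = np.array([[1, 1, 0], [0, 1, 0]])
class2 = np.array([[1, 0, 0], [0, 1, 1]])
d = dice(class1, class2)     # 2 shared cells, 3 + 3 cells in total
h = hamming(class1, class2)  # 2 cells differ
```

The continuous example shows why the choice matters: the rank correlation is perfect because the relationship is monotonic, while the linear correlation is slightly lower.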
It will be helpful to define the nature of the data shown on the map. Comparing topographic maps will be different from comparing thematic maps. Are the data values accompanied by uncertainty estimates?
Dr. Bajocco, Comparing two maps displaying polygonal info is much easier when the choropleth classes use the same interval definition method -- i.e., quantile-based class intervals, and the same number of classes on each map. Even if these are not two maps of the same variable at different times, a straightforward "transition matrix" can be constructed. From that you can employ a variety of non-parametric methods of correlation related to the general chi-squared methodology -- i.e., phi coefficients for 2x2 matrices, but there are others. See the very old, but time-tested volume by Hubert Blalock, Social Statistics (published in many editions.) Best, David
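As an illustration of the transition-matrix idea, here is a small Python sketch. It assumes both maps have been classified into integer class codes 0..k-1 on the same set of units and that every class occurs on both maps (otherwise the expected counts contain zeros). It uses Cramér's V as the chi-squared-based coefficient, which for a 2x2 table equals the absolute value of phi. Function names and data are hypothetical.

```python
import numpy as np

def transition_matrix(map1, map2, n_classes):
    """Cross-tabulate class codes: rows = class on map1, columns = class on map2."""
    m1 = np.asarray(map1).ravel()
    m2 = np.asarray(map2).ravel()
    table = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(table, (m1, m2), 1)  # count units for every (class1, class2) pair
    return table

def chi_squared(table):
    """Pearson chi-squared statistic of a contingency table."""
    table = np.asarray(table, dtype=float)
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    return ((table - expected) ** 2 / expected).sum()

def cramers_v(table):
    """Cramér's V; equals |phi| for a 2x2 table."""
    table = np.asarray(table, dtype=float)
    k = min(table.shape) - 1
    return np.sqrt(chi_squared(table) / (table.sum() * k))

# Two-class example on 8 units
map_a = np.array([[0, 0, 1, 1], [0, 0, 1, 1]])
map_same = map_a.copy()                               # identical classification
map_unrelated = np.array([[0, 1, 0, 1], [0, 1, 0, 1]])  # no association with map_a

v_same = cramers_v(transition_matrix(map_a, map_same, 2))
v_unrelated = cramers_v(transition_matrix(map_a, map_unrelated, 2))
```

Identical maps put all counts on the diagonal of the matrix, and the coefficient reaches its maximum; when the off-diagonal cells are as full as the diagonal ones, it drops to zero.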
Assuming you are dealing with raster data, try the Pontius method of budgeting the sources of error (discordance between two maps) in terms of location and quantity. IDRISI has a module to calculate them. Luck! Henrique
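For those without IDRISI, the quantity/location split can be sketched from a cross-tabulation of the two maps. The sketch below assumes the formulation in terms of quantity disagreement and allocation disagreement; I cannot say whether the IDRISI module computes exactly these quantities. Class codes are assumed to be integers 0..k-1, and the function name is made up.

```python
import numpy as np

def disagreement_budget(comparison, reference, n_classes):
    """Split total disagreement between two categorical maps into
    quantity disagreement and allocation (location) disagreement."""
    c = np.asarray(comparison).ravel()
    r = np.asarray(reference).ravel()
    table = np.zeros((n_classes, n_classes), dtype=float)
    np.add.at(table, (c, r), 1)          # rows: comparison map, columns: reference map
    p = table / table.sum()              # proportions of the study area
    row = p.sum(axis=1)                  # class proportions on the comparison map
    col = p.sum(axis=0)                  # class proportions on the reference map
    diag = np.diag(p)                    # proportion of agreement per class
    quantity = 0.5 * np.abs(row - col).sum()
    allocation = 0.5 * (2 * np.minimum(row - diag, col - diag)).sum()
    total = 1.0 - diag.sum()
    return quantity, allocation, total

reference = np.array([0, 0, 1, 1])

# Same class proportions, but half the cells are in the wrong place
q1, a1, t1 = disagreement_budget(np.array([0, 1, 0, 1]), reference, 2)

# Different class proportions: class 1 is under-represented
q2, a2, t2 = disagreement_budget(np.array([0, 0, 0, 1]), reference, 2)
```

In the first comparison all of the disagreement is due to location; in the second it is all due to quantity, since no rearrangement of cells could fix the shortage of class 1.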
It is possible to compare two maps by means of spatial statistics tools such as Patch Analyst. You can find this small piece of software through Google and add it as an extension in ArcGIS 10; you will then generate lots of statistical information related to your map, in either raster or vector format.
You can convert the two maps to raster data by using the ArcGIS tool 'polygon to raster', then use other raster tools, such as 'raster calculator'.
I think you have several entries to the issue depending on the type of methods used to portray the data, the general goal and the particular objectives, the context of use and so on. One of these entries is particularly interesting from a semiological viewpoint, that is, the way the data is represented on the map and therefore the way one map could be 'different' from another one.
I would compare how the ht-index varies from one map to another. The ht-index captures the complexity of maps.
Jiang B. and Yin J. (2014), Ht-index for quantifying the fractal or scaling structure of geographic features, Annals of the Association of American Geographers, 104(3), 530–541.
Jiang B. and Miao Y. (2015), The evolution of natural cities from the perspective of location-based social media, The Professional Geographer, 67(2), 295 - 306.
The Map Comparison Kit is free software that addresses the problem of comparing maps; it offers different indices according to the type of spatial information and how the maps are defined. Link:
A Statistical Test for a Difference between the Spatial Distributions of Two Populations, ECOLOGY 77(1):75 · DECEMBER 1995. An example is included in the paper. You can apply it easily with R.
The first question is why you want to compare the two maps, i.e. what will you use the comparison for? What is being "mapped" in each map? If your situation is as described by Corrriea, which may be the simplest case, there is still the question of what the maps are being used for and why you want to compare them.
E.g., do you want to decide which is the better "quality" map, or which has the most features?
If you want to compare many maps containing data series (statistics, historical data, etc.), look at Kumbi. This algorithm sorts a set of data series according to a given set of criteria. If the set of criteria is equal to one of the data series, the remaining series will be sorted according to their degree of similarity to this series.
The image of statistics compared in this way (for Poland) is presented in the attached kumbi_examp01.jpg.
Presentation and a simple demo are available here: http://kumbi.co > Applications > Maps
I was reading through this thread and I wonder if it is possible to compare two historical hurricane track maps (in raster format) using the Jaccard Coefficient (considering it is a measure of similarity, as discussed by Gianmarco Alberti) and Change Vector Analysis (usually applied for land cover change detection).
I can now tell you what I did in my cases. In a first case, I had to compare a remotely-sensed fuel map (map1) with a climatic map (map2); what I did was build a contingency table with the number of wildfires falling in each combination of categories (categories from map1 vs categories from map2) and then test the degree of association through a permutational chi-square test in order to see if the association was statistically significant. For further details and explanation, here is the corresponding paper: Bajocco, S., Dragozi, E., Gitas, I., Smiraglia, D., Salvati, L., Ricotta, C. Mapping Forest Fuels through Vegetation Phenology: The Role of Coarse-Resolution Satellite Time-Series (2015) PLoS ONE 10(3): e0119811. doi:10.1371/journal.pone.0119811.
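For anyone who wants to try the same kind of test, here is a rough Python sketch of a permutational chi-square test on a contingency table. It is only an illustration and not the exact procedure of the paper: it assumes one record per wildfire carrying the category from each map, shuffles one set of labels to build the null distribution, and runs on simulated data. All names are hypothetical.

```python
import numpy as np

def chi_square_stat(table):
    """Pearson chi-squared statistic of a contingency table."""
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    return ((table - expected) ** 2 / expected).sum()

def crosstab(a_codes, b_codes, shape):
    """Count records for every combination of the two category codes."""
    table = np.zeros(shape)
    np.add.at(table, (a_codes, b_codes), 1)
    return table

def permutation_chi_square(cat_a, cat_b, n_perm=999, seed=0):
    """Chi-squared association between two categorical labels, with a permutation p-value."""
    a_codes = np.unique(np.asarray(cat_a).ravel(), return_inverse=True)[1]
    b_codes = np.unique(np.asarray(cat_b).ravel(), return_inverse=True)[1]
    shape = (a_codes.max() + 1, b_codes.max() + 1)
    chi2_obs = chi_square_stat(crosstab(a_codes, b_codes, shape))
    rng = np.random.default_rng(seed)
    exceed = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(b_codes)  # break any link between the two maps
        if chi_square_stat(crosstab(a_codes, shuffled, shape)) >= chi2_obs - 1e-12:
            exceed += 1
    return chi2_obs, (exceed + 1) / (n_perm + 1)

# Simulated example: 200 fires, 3 fuel classes, climate class tied to fuel class 70% of the time
rng = np.random.default_rng(1)
fuel = rng.integers(0, 3, size=200)
random_class = rng.integers(0, 3, size=200)
climate = np.where(rng.random(200) < 0.7, fuel, random_class)
chi2_obs, p_value = permutation_chi_square(fuel, climate)
```

Shuffling keeps the row and column totals of the table fixed, so only the association between the two classifications is being tested.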
Another time, I performed a correspondence analysis (CA) between the map1 and a fire hotspots map that I derived (map3). Correspondence analysis is used to characterize the relationships between two nominal variables; in our study, categories from map1 vs categories from map3. This is the related paper: Bajocco, S., Koutsias, N., Ricotta, C., Linking fire ignitions hotspots and fuel phenology: The importance of being seasonal (2017) Ecological Indicators, 82, pp. 433-440. Maybe also the selectivity analysis we performed in the same paper could be considered to this aim.
Are both maps built on the same Earth model, or do you assume that the mathematical model is the same? I think it is necessary to focus on the geodetics of both maps before any statistical analysis in order to eliminate all possible systematic errors.
You can statistically compare only scalar parameters, so you should identify the parameter(s) to compare. Area of polygons? Length of lines? Share of the same objects in the layer? The answer will depend on the selection.
The two maps should be in the same coordinate system (horizontal and vertical) and at the same scale for comparison. The comparison can be made in three ways: 1) calculating the positional differences between the two maps (RMSE x, y) at well-defined features in the maps; 2) comparing the information content of the two maps (how many layers, etc.); 3) calculating areas for polygon features and linear distances between well-defined points.
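The positional comparison in point 1) can be sketched in a few lines of Python. This assumes you have already identified the same well-defined features on both maps and have their coordinates in the same coordinate system and units; the function name and the coordinates are invented for the example.

```python
import numpy as np

def positional_rmse(points_map1, points_map2):
    """RMSE in x, in y, and combined, for matched points given as (x, y) pairs."""
    a = np.asarray(points_map1, dtype=float)
    b = np.asarray(points_map2, dtype=float)
    if a.shape != b.shape:
        raise ValueError("both maps need the same matched points")
    diff = a - b
    rmse_x = np.sqrt(np.mean(diff[:, 0] ** 2))
    rmse_y = np.sqrt(np.mean(diff[:, 1] ** 2))
    rmse_xy = np.sqrt(rmse_x ** 2 + rmse_y ** 2)  # horizontal (radial) RMSE
    return rmse_x, rmse_y, rmse_xy

# Three matched features; only the first one is displaced (by 3 in x and 4 in y)
map1_points = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
map2_points = [(3.0, 4.0), (10.0, 0.0), (10.0, 10.0)]
rmse_x, rmse_y, rmse_xy = positional_rmse(map1_points, map2_points)
```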
As indicated by Anteneh Zewdie Abiy, use Cross Tabulate Tool. https://gis.stackexchange.com/questions/45020/land-use-land-cover-change-post-classification-in-arcgis-for-cross-tabulation
The first question is "what kind of maps?", i.e. are these contour maps, road maps, oceanographic charts, aeronautical charts, soil type maps or what. It appears that each respondent has some particular kind of maps in mind but doesn't say what that is. The original poser also does not really say anything about what the maps are, i.e. what kind of information is presented on the maps. The original poser also refers to "spatial correlation" but this could pertain to correlation "within" a map as opposed to correlation "between" the two maps. This reference also suggests that the maps plot numerical values but are those pixel values (if so, what size?), values at the nodes of a grid? Before rushing in and suggesting particular algorithms or software it is necessary to ask fundamental questions first. Remember that correlation has both a theoretical meaning and an empirical meaning.
I would look into the features from each map (possibly after converting to points or polygons).
I would start with a univariate K-function to identify the spatial distribution of each feature of interest (for each map separately). Then I would do a bivariate K-function analysis to study the spatial relationship between features from different maps. This will allow you to identify attraction, repulsion, or complete spatial randomness of one group (from one map) with respect to the second group (from the second map).
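A naive version of the bivariate (cross) K-function can be sketched in Python as follows. This is a simplified estimator with no edge correction, so it is only meant to show the idea; dedicated point-pattern software applies corrections and simulation envelopes. The function name, the study-area value and the points are made up.

```python
import numpy as np

def cross_k(points1, points2, radii, area):
    """Naive bivariate K-function (no edge correction):
    K12(r) = area / (n1 * n2) * number of (i, j) pairs with distance <= r."""
    p1 = np.asarray(points1, dtype=float)
    p2 = np.asarray(points2, dtype=float)
    radii = np.asarray(radii, dtype=float)
    diff = p1[:, None, :] - p2[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))  # distances between the two groups
    counts = (dist[None, :, :] <= radii[:, None, None]).sum(axis=(1, 2))
    return area * counts / (len(p1) * len(p2))

# Tiny example: two features from each map inside a 2 x 2 study area
group1 = [(0.0, 0.0), (1.0, 0.0)]
group2 = [(0.0, 0.5), (1.0, 0.5)]
radii = np.array([0.4, 0.6, 2.0])
k12 = cross_k(group1, group2, radii, area=4.0)

# Reference value if the two groups were independent
k_independent = np.pi * radii ** 2
```

Values of K12(r) well above pi * r^2 suggest attraction between the two groups at that distance, and values well below suggest repulsion; judging what counts as "well above" normally requires simulated envelopes.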
Hi Sofia. There are of course several ways to do it, and ultimately the method you choose depends on what you are trying to answer.
One suggestion is to use spatial cross-correlation. The method can quantify differences between spatial locations or properties and also quantify the difference across scales, i.e. how this changes as the distance increases.
Assuming that you have x1, y1, z1 and x1, y1, z2, where z is the value on the map and x, y are the locations, you could do that in R using the ncf package, among others.
An application addressing the question of whether invasive species are more common inside or outside protected areas (2 maps with the same x, y, where z1 is protected surface area and z2 is alien species richness, calculated across distances...) is here:
Article Sampling alien species inside and outside protected areas: D...
If your 2 maps are identical then the spatial cross-correlation will be 1 at distance unit 1. If one map is a mirror of the other then the spatial cross-correlation will be -1 at distance unit 1.
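The idea can also be sketched outside R. The Python below is not the ncf implementation, only a simplified cross-correlogram under my own assumptions: both variables are standardized, the value at the same location (lag 0) is the ordinary Pearson correlation, and for each distance class the products of z1 at one location and z2 at another are averaged. "Mirror" is interpreted here as the sign-inverted map. All names and data are hypothetical.

```python
import numpy as np

def cross_correlogram(x, y, z1, z2, bin_edges):
    """Simplified spatial cross-correlation between z1 and z2 measured at the same (x, y).

    Returns the lag-0 correlation and one mean cross-product per distance class."""
    coords = np.column_stack([x, y]).astype(float)
    z1 = np.asarray(z1, dtype=float)
    z2 = np.asarray(z2, dtype=float)
    a = (z1 - z1.mean()) / z1.std()
    b = (z2 - z2.mean()) / z2.std()
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    products = np.outer(a, b)                  # a_i * b_j for every pair of locations
    lag0 = np.mean(np.diag(products))          # same location: Pearson correlation
    off_diagonal = ~np.eye(len(a), dtype=bool)
    by_distance = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        mask = off_diagonal & (dist >= lo) & (dist < hi)
        by_distance.append(products[mask].mean() if mask.any() else np.nan)
    return lag0, np.array(by_distance)

# 5 x 5 grid with a smooth gradient as the first map
gx, gy = np.meshgrid(np.arange(5), np.arange(5))
x, y = gx.ravel(), gy.ravel()
z1 = x + 2.0 * y
edges = [0.5, 1.5, 2.5]

lag0_same, corr_same = cross_correlogram(x, y, z1, z1, edges)      # identical maps
lag0_mirror, corr_mirror = cross_correlogram(x, y, z1, -z1, edges)  # sign-inverted map
```

With identical maps the same-location value is 1 and with the inverted map it is -1; the values per distance class then show how quickly that relationship decays as locations get further apart.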