GeoDa
GeoDa is a free software package for spatial data analysis, geovisualization, spatial autocorrelation analysis and spatial modeling. It was developed by the Spatial Analysis Laboratory (SAL) under the direction of Dr. Luc Anselin.
As of February 2007, GeoDa had been downloaded by more than 17,000 users around the world. The package supports spatial analysis, multivariate exploratory data analysis, global and local spatial autocorrelation, basic linear regression, and spatial models such as the spatial lag model and the spatial error model, both estimated efficiently by maximum likelihood.
GeoDa is the latest incarnation of what was called DynESDA, a module that ran under the old ArcView 3.x to perform exploratory spatial data analysis (ESDA).
The latest release of GeoDa no longer depends on ArcView or any other GIS package: users do not need a GIS installed on their computer to conduct analyses.
Creating a project
A project in GeoDa basically consists of a shapefile that defines the lattice data. The shapefile comes with an attribute table in .dbf format. GeoDa can edit that table: users can perform unary and binary calculations on the variables (columns) in the data.
GeoDa can also produce histograms, box plots and scatter plots for simple exploratory analysis of the data. Most importantly, it can map these statistical displays and link them to the spatial distribution of the phenomenon under study.
Exploratory spatial data analysis (ESDA)
The package specializes in ESDA and geovisualization, exploiting techniques for dynamic linking and brushing. This means that a project can have multiple views or windows; when an object is selected in one of them, every other representation of that object is highlighted in the other windows.
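The linking idea can be illustrated with a minimal sketch in which every view shares one selection object. This is not GeoDa's implementation: the class names, method names and record IDs below are hypothetical and chosen only for illustration.

```python
class SharedSelection:
    """Holds the selected record IDs and notifies every registered view."""

    def __init__(self):
        self._views = []
        self.selected = set()

    def register(self, view):
        self._views.append(view)

    def select(self, record_ids):
        # Brushing in any one view replaces the shared selection ...
        self.selected = set(record_ids)
        # ... and every linked view is told to update its highlights.
        for view in self._views:
            view.highlight(self.selected)


class View:
    """Stand-in for a map, histogram or scatter plot window (hypothetical)."""

    def __init__(self, name, selection):
        self.name = name
        self.highlighted = set()
        self._selection = selection
        selection.register(self)

    def brush(self, record_ids):
        # The user selects some objects in this view.
        self._selection.select(record_ids)

    def highlight(self, record_ids):
        self.highlighted = set(record_ids)


selection = SharedSelection()
map_view = View("map", selection)
histogram = View("histogram", selection)
scatter = View("scatter plot", selection)

# Hypothetical record IDs selected on the map.
map_view.brush({3, 7, 12})
```

After the call to `brush`, the histogram and the scatter plot hold the same highlighted IDs as the map, which is the behavior the paragraph describes.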
Multivariate exploratory data analysis in GeoDa
The picture displays a choropleth map of Colombia in which the municipalities with high rates of malaria incidence are highlighted. The histogram shows the same municipalities in terms of their degree of urbanization: the highlighted portions of the bars indicate the level of urbanization that characterizes them. The scatter plot highlights in yellow the points selected in the map, which makes it possible to explore the average level of school attendance in those municipalities (on the x-axis). This type of geovisualization is therefore very useful for exploring correlation between variables and, most importantly, for exploring spatial patterns in the occurrence of a specific phenomenon, e.g. malaria incidence or violent attacks by outlaws.
Moran scatter plot
A useful device for exploring global patterns of spatial autocorrelation is the Moran scatter plot. This graph plots a standardized variable on the x-axis against the spatial lag of that standardized variable on the y-axis. The spatial lag is simply a summary of the values in the neighboring spatial units. That summary is obtained by means of a spatial weights matrix, which can take various forms; a very commonly used one is the contiguity matrix. The contiguity matrix is an array with a value of one in position (i, j) whenever spatial unit j is contiguous to unit i, and zero otherwise. For convenience, the matrix is row-standardized: each value is divided by the row sum of the original matrix, so that every row sums to one.
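As a minimal sketch of row standardization and the spatial lag, consider four hypothetical spatial units arranged in a line, where each unit is contiguous only to its immediate neighbors. The function names and the data are assumptions made for illustration, not part of GeoDa.

```python
# Hypothetical layout: four units in a row (0-1-2-3).
# contiguity[i][j] is 1 when unit j is contiguous to unit i.
contiguity = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]


def row_standardize(matrix):
    """Divide each entry by its row sum so that every non-empty row sums to one."""
    standardized = []
    for row in matrix:
        total = sum(row)
        # A unit without neighbors keeps a row of zeros.
        standardized.append([value / total if total else 0.0 for value in row])
    return standardized


def spatial_lag(weights, values):
    """For each unit, the weighted average of the values in its neighbors."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]


weights = row_standardize(contiguity)
values = [10.0, 20.0, 30.0, 40.0]  # made-up attribute values
lag = spatial_lag(weights, values)
print(lag)  # [20.0, 20.0, 30.0, 30.0]
```

With row-standardized weights the lag of each unit is the plain average of its neighbors' values: unit 1, for example, has neighbors with values 10 and 30, so its lag is 20.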
Moran's I
In essence, the Moran scatter plot shows the relation between the value of the variable at location i and the values of that variable at the neighboring locations. By construction, the slope of the regression line in the scatter plot equals the Moran's I coefficient, a well-known statistic that measures global spatial autocorrelation. If the slope is positive, there is positive spatial autocorrelation: high values of the variable at location i tend to cluster with high values of the same variable at neighboring locations, and low values with low values. If the slope is negative, there is a checkerboard-like pattern, or a kind of spatial competition, in which high values at location i tend to be co-located with low values at the neighboring locations.
In the Moran scatter plot, the slope of the fitted line is calculated and displayed at the top of the graph. In this case the value is positive, which means that areas with high crime rates tend to have neighbors with high rates as well.
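The link between the slope and Moran's I can be sketched as follows, assuming row-standardized contiguity weights. Because the standardized variable has mean zero, the least-squares slope of the spatial lag on the variable reduces to the ratio computed below. The function name and the data are hypothetical.

```python
from statistics import mean, pstdev


def morans_i(values, contiguity):
    """Moran's I as the slope of the spatial lag regressed on the standardized variable.

    Assumes the contiguity matrix is row-standardized on the fly.
    """
    mu, sd = mean(values), pstdev(values)
    z = [(v - mu) / sd for v in values]

    # Spatial lag of z: the average of z over each unit's neighbors.
    lag = []
    for row in contiguity:
        total = sum(row)
        lag.append(sum(w * zj for w, zj in zip(row, z)) / total if total else 0.0)

    # Slope of lag on z; z has mean zero, so no intercept correction is needed.
    return sum(zi * li for zi, li in zip(z, lag)) / sum(zi * zi for zi in z)


# Hypothetical layout: four units in a row (0-1-2-3).
line = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = morans_i([10, 20, 30, 40], line)   # similar values sit side by side
alternating = morans_i([1, 0, 1, 0], line)     # checkerboard-like pattern
print(clustered, alternating)
```

In this toy example the smoothly increasing values give a positive slope (0.4), while the alternating values give a slope of -1, matching the two cases described above.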
Global versus local
At the global level we can talk about clustering, i.e. the general tendency of the map to be clustered; at the local level we can talk about clusters, i.e. we are able to pinpoint where the clusters are. The latter can be assessed by means of Local Indicators of Spatial Association (LISA). LISA analysis identifies the areas where high values of a variable are surrounded by high values in the neighboring areas, the so-called high-high clusters. Likewise, the low-low clusters are also identified by this analysis.
Another type of phenomenon that is important to analyze in this context is the existence of spatial outliers: high values of the variable at a given location surrounded by low values at the neighboring locations.
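A minimal sketch of how each location could be classified is shown below, using the local Moran statistic (the standardized value at a location multiplied by its spatial lag) and the signs of the two factors. The function name, the data and the "low-high" label are assumptions for illustration; the labels alone say nothing about statistical significance.

```python
from statistics import mean, pstdev


def local_moran(values, contiguity):
    """Return (local statistic, label) for each unit, using row-standardized weights."""
    mu, sd = mean(values), pstdev(values)
    z = [(v - mu) / sd for v in values]

    results = []
    for zi, row in zip(z, contiguity):
        total = sum(row)
        lag = sum(w * zj for w, zj in zip(row, z)) / total if total else 0.0
        # Values exactly at the mean are labeled "low" here, an arbitrary choice.
        own = "high" if zi > 0 else "low"
        around = "high" if lag > 0 else "low"
        results.append((zi * lag, f"{own}-{around}"))
    return results


# Hypothetical layout: six units in a row, high values on the left, low on the right.
n = 6
line = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]
values = [9, 8, 9, 1, 2, 1]

results = local_moran(values, line)
labels = [label for _, label in results]
print(labels)
```

Here the two ends of the row come out as high-high and low-low, while the two units at the boundary between the groups come out as high-low and low-high, with negative local statistics.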
Note that a value being high in comparison with the values at neighboring locations does not necessarily mean that it is an outlier, because the statistical significance of that relationship must be assessed. In other words, there may be areas that appear to be clusters or outliers but turn out not to be statistically significant once the statistical procedures are carried out. The procedure used to assess statistical significance consists of a Monte Carlo simulation of different arrangements of the data and the construction of an empirical distribution of simulated statistics. The value originally obtained is then compared with the distribution of simulated values; if it exceeds the 95th percentile, the relation found is said to be significant at the 5% level.
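The Monte Carlo procedure can be sketched as a permutation test. The version below shuffles the values across locations and recomputes the global Moran's I each time; it is a simplified illustration, not GeoDa's actual routine, and the function names, the number of permutations and the seed are assumptions.

```python
import random
from statistics import mean, pstdev


def morans_i(values, contiguity):
    """Global Moran's I with row-standardized contiguity weights."""
    mu, sd = mean(values), pstdev(values)
    z = [(v - mu) / sd for v in values]
    lag = []
    for row in contiguity:
        total = sum(row)
        lag.append(sum(w * zj for w, zj in zip(row, z)) / total if total else 0.0)
    return sum(zi * li for zi, li in zip(z, lag)) / sum(zi * zi for zi in z)


def permutation_p_value(values, contiguity, permutations=999, seed=0):
    """One-sided pseudo p-value: share of random arrangements with I >= observed."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = morans_i(values, contiguity)
    shuffled = list(values)
    at_least_as_large = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # a different spatial arrangement of the same data
        if morans_i(shuffled, contiguity) >= observed:
            at_least_as_large += 1
    # The observed arrangement counts as one of the possible arrangements.
    return observed, (at_least_as_large + 1) / (permutations + 1)


# Hypothetical data: 20 units in a row with steadily increasing values.
n = 20
line = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]
values = list(range(n))

observed, p_value = permutation_p_value(values, line)
print(observed, p_value)
```

A pseudo p-value below 0.05 corresponds to the observed statistic exceeding the 95th percentile of the simulated distribution.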
Dynamic linking and brushing
Dynamic linking and brushing are powerful devices because they allow users to interactively discover or confirm suspected patterns in the spatial arrangement of the data, or to rule them out. They allow users to extract information from spatially arranged data that might otherwise require computationally heavy routines before yielding any statistical results. Such routines may also be costly in terms of expert knowledge and software capabilities; GeoDa has the advantage of being free.
References
- Anselin, Luc (1995). "Local indicators of spatial association – LISA". Geographical Analysis, 27, 93-115.
- Anselin, Luc (2005). "Exploring Spatial Data with GeoDa™: A Workbook". Spatial Analysis Laboratory. p. 138.
- Anselin, Luc; Syabri, Ibnu; Kho, Youngihn (2006). "GeoDa: An introduction to spatial data analysis". Geographical Analysis, 38, 5-22.