Implementing the nearest-neighbor cluster method

1. Introduction: summarize the sense-making process described in the attached paper by Pirolli and Card (2005).

2. Then link the analysis explained in the following section to the sense-making process.

Sense-making process

The first step is collecting the data and processing it for analysis.

I started by using ArcGIS.

The crime dataset for Washington, DC has a record for each crime, and each crime is classified by offense. It contains full information about the crime location: for example, the district, the neighborhood, and even the geographic coordinates (longitude and latitude) of the exact point where the crime occurred, plus some extra information. On its own, however, this data reveals only limited information, so I joined it with census data for Washington, DC. The figure shows which areas have the highest crime rate after normalizing the number of crimes by population.
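The normalization step can be sketched in a few lines. This is a minimal illustration, not the ArcGIS workflow itself: it assumes each crime record carries the TRACT code used for the join and that the census table provides a total population per tract. The `population` mapping and the per-1,000 scaling are hypothetical choices made for the example.

```python
from collections import Counter

def crime_rate_per_1000(crimes, population):
    """Crimes per 1,000 residents for each census tract.

    crimes: iterable of dicts, each with a "TRACT" key (the join attribute).
    population: dict mapping TRACT -> total population (hypothetical field).
    Tracts with zero population are skipped to avoid division by zero.
    """
    counts = Counter(record["TRACT"] for record in crimes)
    return {
        tract: 1000 * counts.get(tract, 0) / pop
        for tract, pop in population.items()
        if pop > 0
    }

crimes = [{"TRACT": "000100"}, {"TRACT": "000100"}, {"TRACT": "000200"}]
population = {"000100": 2000, "000200": 500}
print(crime_rate_per_1000(crimes, population))
# → {'000100': 1.0, '000200': 2.0}
```

Normalizing by population keeps densely populated tracts from looking dangerous simply because more people live there.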

Implementing the nearest-neighbor cluster method

Creating size-graduated point markers in ArcGIS, as shown in the following figure, gives some visual evidence of spatial clustering: some areas have noticeably large bubbles.

Then I applied the nearest-neighbor statistical method to test whether the clusters are real, that is, whether the null hypothesis can be rejected.

The null hypothesis is that crimes are randomly distributed in space. The alternative hypothesis is that there is significant clustering, which means the crimes are not spread out randomly or by chance.

The result provides strong evidence that crime in Washington, DC is heavily clustered, because the observed mean distance from each crime to its nearest neighboring crime (222.45) is significantly less than the expected mean distance under the null hypothesis that crimes are uniformly and randomly distributed.
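For reference, the comparison of observed and expected mean distance can be written out. The formulas below follow the form documented for the ArcGIS Average Nearest Neighbor tool; that this exact tool produced the numbers here is an assumption. $d_i$ is the distance from crime $i$ to its nearest neighboring crime, $n$ is the number of crimes, and $A$ is the area of the study region:

```latex
\bar{D}_O = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad
\bar{D}_E = \frac{0.5}{\sqrt{n/A}}, \qquad
\mathrm{ANN} = \frac{\bar{D}_O}{\bar{D}_E}

SE = \frac{0.26136}{\sqrt{n^2/A}}, \qquad
z = \frac{\bar{D}_O - \bar{D}_E}{SE}
```

A ratio below 1 together with a strongly negative $z$ indicates clustering; a ratio above 1 indicates dispersion.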

The z-score, an important measure for deciding whether to reject the null hypothesis, is -14.7. This is far below the usual critical values of about -1.96 (95% confidence) and -2.58 (99% confidence), which gives strong evidence to reject the null hypothesis: the clustering is very unlikely to have occurred by chance.
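The whole test can be reproduced outside ArcGIS as a rough cross-check. This sketch follows the formulas used by the ArcGIS Average Nearest Neighbor tool (an assumption about the tool used). It also assumes projected (x, y) coordinates in consistent units and a known study-area size `area`. The brute-force nearest-neighbor search is fine for small samples but slow for a full citywide dataset. The two-sided p-value relies on the same normal approximation as the z-score.

```python
import math

def average_nearest_neighbor(points, area):
    """Average Nearest Neighbor statistic for a list of (x, y) points.

    points: (x, y) tuples in a projected coordinate system.
    area: size of the study region in the same units squared.
    Returns (observed_mean, expected_mean, ratio, z, p_two_sided).
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    # Observed mean nearest-neighbor distance (brute force, O(n^2)).
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(math.hypot(xi - xj, yi - yj)
                     for j, (xj, yj) in enumerate(points) if j != i)
    d_obs = total / n
    # Expected mean distance and standard error under spatial randomness.
    d_exp = 0.5 / math.sqrt(n / area)
    se = 0.26136 / math.sqrt(n * n / area)
    z = (d_obs - d_exp) / se
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return d_obs, d_exp, d_obs / d_exp, z, p

# Two tight groups of four points in a 1000 x 1000 study area.
pts = [(0, 0), (1, 0), (0, 1), (1, 1),
       (500, 500), (501, 500), (500, 501), (501, 501)]
d_obs, d_exp, ratio, z, p = average_nearest_neighbor(pts, 1000 * 1000)
print(f"ratio={ratio:.3f}, z={z:.2f}")
# → ratio=0.006, z=-5.38
```

A negative z-score below the critical value, as with -14.7 for the DC data, corresponds to a tiny p-value and therefore to rejecting the null hypothesis of random placement.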

After determining that there is strong evidence of a pattern in the crime data, I applied additional statistical methods to analyze and reveal that pattern, in order to find which attributes are associated with the crime.

Here I reuse the sense-making principle. I needed to clean the dataset by removing any unnecessary attributes that could negatively affect the result, keeping the attributes best suited to support the analysis. For example, I removed all location attributes such as latitude and longitude. Only TRACT, the census tract code for DC, was kept, because I joined the census data and the crime data on this attribute and it serves as the reference for the location of each crime.
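The cleaning step can be sketched as a simple filter over the joined records. The location field names below are hypothetical placeholders for the latitude, longitude, district, neighborhood, and tract columns described earlier. TRACT is kept by default because it is the join key between the crime and census data.

```python
# Hypothetical names for the location attributes; the real column names may differ.
LOCATION_FIELDS = {"LATITUDE", "LONGITUDE", "DISTRICT", "NEIGHBORHOOD", "TRACT"}

def drop_location_fields(records, keep=("TRACT",)):
    """Return copies of the records without location attributes.

    Fields listed in `keep` are retained even though they are location fields.
    """
    to_drop = LOCATION_FIELDS - set(keep)
    return [{k: v for k, v in r.items() if k not in to_drop} for r in records]

rows = [{"TRACT": "000100", "LATITUDE": 38.9, "LONGITUDE": -77.0, "OFFENSE": "THEFT"}]
print(drop_location_fields(rows))
# → [{'TRACT': '000100', 'OFFENSE': 'THEFT'}]
```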

Using the best subset selection method to select the attributes most likely related to the crime offense revealed the following attributes:

BLACK, HISPANIC, MALES, AGE_25_34, AVE_HH_SZ, HSEHLD_1_M and MARHH_NO_C
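Best subset selection fits a model for every combination of candidate attributes and keeps the one with the best score. The original analysis does not state the model or the selection criterion. This sketch therefore assumes a linear model with a numeric response (for example, a per-tract crime rate) and scores each subset by BIC. It uses NumPy and is only practical for a modest number of attributes, because the number of subsets grows as 2^p.

```python
from itertools import combinations

import numpy as np

def best_subset(X, y, names, max_size=None):
    """Exhaustive best subset selection for a linear model, scored by BIC.

    X: (n, p) array of candidate attributes; y: numeric response of length n.
    Returns the names of the attributes in the lowest-BIC subset.
    """
    n, p = X.shape
    max_size = max_size or p
    best_bic, best_cols = np.inf, ()
    for k in range(1, max_size + 1):
        for cols in combinations(range(p), k):
            # Design matrix: intercept column plus the chosen attributes.
            A = np.column_stack([np.ones(n), X[:, list(cols)]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = float(np.sum((y - A @ coef) ** 2))
            rss = max(rss, 1e-12)  # guard against log(0) on a perfect fit
            bic = n * np.log(rss / n) + (k + 1) * np.log(n)
            if bic < best_bic:
                best_bic, best_cols = bic, cols
    return [names[i] for i in best_cols]
```

For example, `best_subset(X, y, ["BLACK", "HISPANIC", "MALES"])` would return the subset of those three columns of `X` with the lowest BIC. BIC is used here because it penalizes extra attributes more strongly than adjusted R², which favors smaller subsets.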

All of these attributes appear to affect the offense type.

Another statistical method was applied to compare the results.

The random forest method revealed the following result:

CENSUS_TRA, MARHH_CHD, WHITE, ASIAN, MARHH_NO_C, BLACK, FHH_CHILD and OWNER_OCC are the top attributes that affect the offense type.
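A random forest ranking like this is usually read from the forest's variable-importance scores. This sketch assumes scikit-learn's `RandomForestClassifier`, with the offense type as the class label and the census attributes as features. It ranks attributes by scikit-learn's impurity-based `feature_importances_`, which may differ from the importance measure used in the original analysis.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def rank_attributes(X, y, names, n_trees=500, seed=0):
    """Fit a random forest and return (name, importance) pairs, highest first.

    X: 2-D array of attribute values (one row per record).
    y: offense-type labels.
    Importance is scikit-learn's impurity-based score, normalized to sum to 1.
    """
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    forest.fit(X, y)
    importances = forest.feature_importances_
    order = np.argsort(importances)[::-1]
    return [(names[i], float(importances[i])) for i in order]
```

Comparing this ranking with the best subset result is one way to see which attributes both methods agree on, such as BLACK and MARHH_NO_C above.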

MARHH_NO_C: Married households with no children

MARHH_CHD : Married households with children

FHH_CHILD : Female householder, no husband present, with children

OWNER_OCC: Units occupied by owner

TRACT: Tract code

Finally, in the book Fundamentals of Crime Mapping (second edition) by Bryan Hill and Rebecca Paynich, page 26 ("Social Efficacy") explains some social factors that affect crime. I want to link these factors to the attributes identified above through further analysis of the results of this study.