Following up on last week’s regressions and correlations, this week I will create categories from the Boston census data to look at how mixed use zoning relates to different groups across the city. The first two categories come from the proportion white variable, which gives the proportion of the population that is white, from zero to one, in every subdistrict. These proportions were aggregated to the neighborhood statistical area (NSA) level, and each neighborhood was then labeled white or nonwhite depending on whether its white proportion was a majority, that is, over 0.5. The 0.5 split point seems reasonable, as the mean proportion white across neighborhoods is about 0.48. Creating these two groups will help address whether there is a racial component to mixed use zoning, possibly revealing zoning that is disproportionate in either direction. To make this comparison, I will use a t test to compare the mean mixed use scores of the two groups.
> propwhiteNSA <- …
> View(propwhiteNSA)
> propwhiteNSA$majority <- ifelse(… > .50,c("white"),c("nonwhite"))
> NSAmixeduse<-merge(NSAmixeduse, propwhiteNSA, by="NSA_NAME")
> View(NSAmixeduse)
> t.test(TotalMean~majority,data=NSAmixeduse)
Welch Two Sample t-test
data: TotalMean by majority
t = -3.1923, df = 38.323, p-value = 0.002818
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
mean in group nonwhite mean in group white
The two means indicate that majority white neighborhoods have more mixed use zoning than majority nonwhite neighborhoods, with their mean mixed use scores differing by about 0.11. For reference, the median mixed use score across all neighborhoods is 0.18 and the third quartile is 0.24, and the score’s distribution is heavily skewed right. The p-value of .0028 indicates that a difference this large would be very unlikely if the two groups truly had the same mean.
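To make the mechanics of the comparison above concrete, here is a minimal sketch of the same two steps, labeling neighborhoods at the 0.5 cutoff and running a Welch t test, with the statistic also computed by hand. The data are synthetic and the column names `prop_white` and `mixed_use` are hypothetical stand-ins, not the names used in the actual Boston data.

```r
# Synthetic stand-in data: every value here is invented for illustration.
set.seed(42)
nsa <- data.frame(
  prop_white = runif(60),                        # hypothetical proportion white
  mixed_use  = rnorm(60, mean = 0.2, sd = 0.1)   # hypothetical mixed use score
)

# Label each neighborhood by whether its white proportion exceeds 0.5
nsa$majority <- ifelse(nsa$prop_white > 0.5, "white", "nonwhite")

# t.test() runs the Welch (unequal variance) version by default
fit <- t.test(mixed_use ~ majority, data = nsa)

# The same quantities by hand; groups are ordered alphabetically,
# so the difference is nonwhite minus white
x  <- nsa$mixed_use[nsa$majority == "nonwhite"]
y  <- nsa$mixed_use[nsa$majority == "white"]
vx <- var(x) / length(x)   # squared standard error of each group mean
vy <- var(y) / length(y)

t_manual  <- (mean(x) - mean(y)) / sqrt(vx + vy)
# Welch-Satterthwaite approximation to the degrees of freedom
df_manual <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
p_manual  <- 2 * pt(-abs(t_manual), df_manual)   # two-sided p-value

print(c(t = t_manual, df = df_manual, p = p_manual))
```

Because the nonwhite group comes first, a negative t value means the nonwhite group has the lower mean, which is how the sign of the reported statistic should be read.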
These results provide evidence of more mixed use zoning in majority white neighborhoods. Because last week’s analysis found a positive correlation between home values and mixed use zoning, this could reflect socioeconomic barriers as well as a history of segregation by both race and income in Boston. While the factors leading to this divide are complex, there does appear to be a racial component to how amenities are zoned in Boston.
The next test I will use is ANOVA, or analysis of variance, which compares the mixed use zoning scores of more than two groups at once. To find new categories, I moved back to the subdistrict level data, which classifies each subdistrict as one of the types Business, Harborpark, Industrial, Miscellaneous, Mixed Use, Open Space, Other, Residential, and Waterfront. Comparing these categories can also serve as a check on how well the scores track the types, since certain types should produce different values, e.g. Mixed Use versus Residential.
> typeanova <- aov(TotalMean~subdistric,data=mixedmeans)
> summary(typeanova)
              Df Sum Sq Mean Sq F value Pr(>F)
subdistric     9  70.48   7.831   359.6
Residuals   1627  35.43   0.022
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
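The F value in this table can be recomputed from its own columns: each mean square is a sum of squares divided by its degrees of freedom, and F is the ratio of the two mean squares. The short sketch below does that arithmetic with the numbers copied from the table. The last line adds eta squared, the share of total variation attributable to type; that quantity is not part of the original output and is included only as a rough sense of scale.

```r
# Values copied from the ANOVA table
ss_between <- 70.48; df_between <- 9      # subdistric row
ss_within  <- 35.43; df_within  <- 1627   # Residuals row

ms_between <- ss_between / df_between     # mean square between types
ms_within  <- ss_within / df_within       # mean square within types
f_value    <- ms_between / ms_within      # F statistic

# Upper-tail probability of an F this large under the null hypothesis
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

# Share of total variation associated with type (not in the original output)
eta_sq <- ss_between / (ss_between + ss_within)

print(round(c(MS_between = ms_between, MS_within = ms_within,
              F = f_value, eta_sq = eta_sq), 3))
```

The recomputed ratio matches the reported 359.6 up to rounding, and about two thirds of the total variation in scores is between types rather than within them.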
The above F value indicates that the variation between the type means is about 360 times the variation within types, a ratio that would have an infinitesimal probability of arising by pure chance. These results show strong evidence that mixed use zoning scores differ across neighborhood types. To further explore these neighborhood types, I then melted and aggregated the data to get the mean mixed use score for each category. After calculating the standard error of each mean to show its uncertainty, I plotted the results as a bar graph with standard error bars.
> library(reshape2) # melt() is assumed to come from reshape2
> library(ggplot2)
> melttype <- melt(mixedmeans[c(7,18)],id.vars=c("subdistric"))
> typemeans <- aggregate(value~subdistric,data=melttype,mean)
> names(typemeans)[2] <- "mean" # rename only the value column, keep subdistric for the merge
> ses <- aggregate(value~subdistric,data=melttype,function(x) sd(x, na.rm=TRUE)/sqrt(sum(!is.na(x))))
> names(ses)[2] <- "se"
> typemeans <- merge(typemeans,ses,by='subdistric')
> typemeans <- transform(typemeans, lower=mean-se,upper=mean+se)
> typebar <- ggplot(data=typemeans,aes(x=subdistric,y=mean))+geom_bar(stat="identity",position="dodge",fill="blue") + ylab("Mixed Use Zoning Mean") + xlab("Neighborhood Type")
> typebar+geom_errorbar(aes(ymax=upper,ymin=lower),position=position_dodge(.9))
The graph shows how mixed use scores vary across the neighborhood type categories. Predictably, Mixed Use, Other, and Business have the most diversity in their zoned amenities, as these are areas with a variety of activity, while Waterfront, Residential, and Open Space have the least.
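The error bars in the graph are one standard error on either side of each mean, where the standard error is the standard deviation divided by the square root of the number of observations. Here is a minimal sketch of that calculation on made-up values; the helper name `se_mean` and the example scores are hypothetical. The one subtlety it illustrates is that the count should cover only non-missing values, which `sum(!is.na(x))` gives and `length()` does not.

```r
# Standard error of the mean, ignoring missing values
se_mean <- function(x) {
  n <- sum(!is.na(x))            # number of non-missing observations
  sd(x, na.rm = TRUE) / sqrt(n)
}

# Made-up scores with one missing value
scores <- c(0.10, 0.20, NA, 0.30, 0.40)

n_all     <- length(!is.na(scores))   # counts every element, NA included
n_present <- sum(!is.na(scores))      # counts only the non-missing ones

se   <- se_mean(scores)
bars <- c(lower = mean(scores, na.rm = TRUE) - se,
          upper = mean(scores, na.rm = TRUE) + se)

print(c(n_all = n_all, n_present = n_present))
print(bars)
```

Dividing by the larger count would make the error bars slightly too narrow whenever a category contains missing scores.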