Comparing Groups: Liquor License Types and Social Fabric

I was working on finding a meaningful metric to explain alcohol access in neighborhoods. One of the assumptions I made was that different types of alcohol outlets have different implications in social characteristics and economic strength of a neighborhood. I am curious to find out whether this assumption is supported by statistics.

Prepare the Dataset

For this task, I am associating each liquor license with a few selected social economic metrics at the census block group level, i.e. assigning each liquor license an value of the indicator based on which census block group it is located; then the metrics are aggregated by census block groups, and finally, I use t-test and ANOVA to analyze the differences in these metrics between each liquor license type.

I have picked median house income and the percentage of white residents as the two indicators for the social fabric of the neighborhoods.

source("src/preparation.R")
source("src/alcohol.R")

bizz.cols <- c("CT_ID_10", "ALC_BIZ_TYPE", "ALC_TYPE")
acs.cols <- c("CT_ID_10", "MedHouseIncome", "White")

dat <- merge(bizz.alc[, bizz.cols], acs.ct[, acs.cols], all.x = TRUE)

t-test

When categorized by type of alcohol allowed, the licenses can be divided into two group: malt & wine, and all alcoholic beverages. I used a Student’s t-test to see whether census indicators for these two groups are significantly different from each other.

First, the median house income:

library(broom)
library(knitr)
dat.t <- t.test(MedHouseIncome ~ ALC_TYPE, dat) %>% tidy()
dat.t <- dat.t[, c(1:6)]
colnames(dat.t) <- c(
  "mean difference",
  "mean in group (All Alcohol)",
  "mean in group (Malte & Wine)",
  "t statistic",
  "p value",
  "df"
)
row.names(dat.t) <- c("value")
round(dat.t, 2) %>% t() %>% kable()
value
mean difference 10765.01
mean in group (All Alcohol) 76030.47
mean in group (Malte & Wine) 65265.46
t statistic 5.76
p value 0.00
df 801.18

We get a t-test of 5.76, and the p value is as low as 1.1e-8 indicating a very strong difference. On average, a census block group containing full liquor licenses have a $10,765 higher median house income than those with malt & wine only liquor licenses. Note that each group has different number of members (liquor licenses)

This makes sense because full liquor licenses are much harder to get and have higher prices on the market. They naturally would be more distributed among rich neighborhoods.

Then let’s look at race:

dat.t <- t.test(White ~ ALC_TYPE, dat) %>% tidy()
dat.t <- dat.t[, c(1:6)]
colnames(dat.t) <- c(
  "mean difference",
  "mean in group (All Alcohol)",
  "mean in group (Malte & Wine)",
  "t statistic",
  "p value",
  "df"
)
row.names(dat.t) <- c("value")
round(dat.t, 2) %>% t() %>% kable()
value
mean difference 0.02
mean in group (All Alcohol) 0.66
mean in group (Malte & Wine) 0.64
t statistic 1.23
p value 0.22
df 714.32

The p value is 0.22, not so significant. Statistically speaking, race is not related to the distribution of different alcohol types.

ANOVA

We have 6 different business types designated in the categorization of liquor licenses. An ANOVA test can be used to test whether the implications of these business types are statistically significant.

dat.anova <- aov(MedHouseIncome ~ ALC_BIZ_TYPE, dat)
broom::tidy(dat.anova) %>% knitr::kable()
term df sumsq meansq statistic p.value
ALC_BIZ_TYPE 5 17121124445 3424224889 3.806447 0.0020108
Residuals 1097 986845509189 899585697 NA NA

With F value’s p value as low as 0.00201, it is statistically significant that at least one business type is different than others.

And based on Tukey’s “Honest Significant Difference” method, three group combinations were identified with significant differences.

broom::tidy(TukeyHSD(dat.anova))[2:6] %>% knitr::kable()
comparison estimate conf.low conf.high adj.p.value
Common Vectualler-Club 4563.333 -7309.544 16436.210 0.8825740
Farmer Distillery-Club -4402.655 -66038.275 57232.966 0.9999516
General On-Premises-Club -10090.611 -31352.054 11170.831 0.7539779
Hotel-Club 17154.345 1467.223 32841.468 0.0226704
Tavern-Club -12757.655 -74393.275 48877.966 0.9916508
Farmer Distillery-Common Vectualler -8965.987 -69573.924 51641.949 0.9982903
General On-Premises-Common Vectualler -14653.944 -32721.074 3413.186 0.1886058
Hotel-Common Vectualler 12591.013 1615.674 23566.351 0.0138650
Tavern-Common Vectualler -17320.987 -77928.924 43286.949 0.9646641
General On-Premises-Farmer Distillery -5687.957 -68810.105 57434.192 0.9998475
Hotel-Farmer Distillery 21557.000 -39912.037 83026.037 0.9176134
Tavern-Farmer Distillery -8355.000 -93978.049 77268.049 0.9997743
Hotel-General On-Premises 27244.957 6471.373 48018.540 0.0026154
Tavern-General On-Premises -2667.043 -65789.192 60455.105 0.9999964
Tavern-Hotel -29912.000 -91381.037 31557.037 0.7336351

They are Hotel-Club, Hotel-CommonVectualler, and Hotel-GeneralOnPremises.

broom::tidy(TukeyHSD(dat.anova))[c(4, 8, 13), 2:6] %>% knitr::kable()
comparison estimate conf.low conf.high adj.p.value
4 Hotel-Club 17154.35 1467.223 32841.47 0.0226704
8 Hotel-Common Vectualler 12591.01 1615.674 23566.35 0.0138650
13 Hotel-General On-Premises 27244.96 6471.373 48018.54 0.0026154

They are all related to hotels, which indicates that liquor licenses issues to hotels might be the outlier here, i.e., hotels offering on-premise consumption alcohol beverages tend to be located in richer neighborhoods.

The differences between hotel and the remaining two categories, Farmer Distillery and Tavern, are not statistically significant. However, this might just be because of the very small number of licenses in these two categories (with just 2 and 6 each)–the degree of freedom is really low, making it impossible to get a significant result.

Explained by Plots

Using bar chart and error bars, the mean differences of median house income between different liquor licenses can be plotted as following:

library(reshape2)

dat.melted <- melt(dat[, c("ALC_BIZ_TYPE", "MedHouseIncome")],
               id.vars = "ALC_BIZ_TYPE")
dat.means <- aggregate(value ~ ALC_BIZ_TYPE + variable, dat.melted, FUN = mean)
names(dat.means)[3]<-"mean"
dat.ses <- aggregate(value ~ ALC_BIZ_TYPE + variable, dat.melted,
                     function(x) sd(x, na.rm = TRUE) / sqrt(length(!is.na(x)))
                     )
names(dat.ses)[3]<-"se"

dat.means <- merge(dat.means, dat.ses, by = c("ALC_BIZ_TYPE", "variable"))
dat.means <- transform(dat.means, lower = mean - se, upper = mean + se)

ggplot(data = dat.means,
       aes(x = ALC_BIZ_TYPE, y = mean)) +
  geom_bar(stat = "identity", position = "dodge", fill = "tomato2") +
  geom_errorbar(
    aes(ymax = upper, ymin = lower),
    position = position_dodge(.9)) +
  theme(axis.text.x = element_text(angle = 70, hjust = 1)) +
  ylab("Mean") + xlab("Business Type")

p1

The high income level for liquor licenses in hotels is very obvious.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s