Pulse of the City: Understanding the Distribution of Restaurant Violations by Location and Category of Restaurant – City of Boston, MA

Introduction

This paper focuses on the Boston restaurant violation dataset. It examines how location (“city”) and type of restaurant (“descript”) relate to violations.

Overview of Violations

The dataset contains 762,709 rows and 26 columns of data on restaurant regulation violations. The “viollevel” column indicates three levels of violation, marked by asterisks from * (the lowest level) to *** (the highest level). Of these rows, 711,692 are * (one-asterisk) violations, which make up 72.3% of total violations. A further 3,854 violations (0.5%) have no defined level. According to the City of Boston’s An Introduction to Restaurant Grading (2016), restaurant violations are divided into three categories: foodborne critical violations (the most severe), critical violations (medium), and non-critical violations (the most minor). Because the one-asterisk, non-critical violations dominate, the inspection team should identify the most common types of these violations and focus on them.
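The shares quoted above can be recomputed directly from the level counts. The sketch below uses base R on a small made-up vector standing in for the viollevel column (the toy values are an assumption for illustration, not the Boston data) and treats blank entries as undefined levels. It is worth running the same check on the real column: 711,692 out of 762,709 rows works out to roughly 93.3%, so either that count or the 72.3% share may need to be re-verified.

```r
# Toy stand-in for restau$viollevel (hypothetical values, not the real data)
viollevel <- c("*", "*", "*", "**", "***", "*", "", "*", "**", "*")

# Label blank or missing entries explicitly so they are counted, not dropped
viollevel[is.na(viollevel) | viollevel == ""] <- "undefined"

# Absolute counts per level, and each level's share in percent
level_counts <- table(viollevel)
level_share  <- round(100 * prop.table(level_counts), 1)

print(level_counts)
print(level_share)
```

On the real data, replacing the toy vector with restau$viollevel should give the counts and percentages reported in this section.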

#Code 1: Violation level divided by category of restaurant

library(readr)
library(ggplot2)

restau <- read_csv("/Users/xymeng/Desktop/23 Fall Academics/5232 BD4C/restau_viol_copy.csv")

# Recode the asterisk levels as an ordered factor labelled 1 (lowest) to 3 (highest);
# any other value (blank, "1919") becomes NA
restau$viollevel <- factor(restau$viollevel, levels = c("*", "**", "***"), labels = c(1, 2, 3), ordered = TRUE)

# Bar chart of the number of violation records in each restaurant category.
# geom_smooth(method = lm) is dropped: a linear fit needs numeric x and y.
base <- ggplot(data = restau, aes(x = descript)) + geom_bar() + xlab("descript") + ylab("Number of violations")

base

The Top 3 Districts by Number of Violation Cases

Boston (340,272) and Dorchester (103,510) together account for more than half of the violations. These two areas should be the focus of the food inspection authority. Further studies may consider the number and density of restaurants in each area to estimate the violation rate per restaurant in each district.
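The violation rate suggested here could be approximated as violations per distinct restaurant within each district. The sketch below is a minimal illustration on toy data; it assumes the dataset has a column identifying each establishment, here called licenseno (a hypothetical name that should be checked against the actual columns).

```r
library(dplyr)

# Toy data: one row per violation (hypothetical values for illustration)
toy <- data.frame(
  city      = c("boston", "boston", "boston", "boston",
                "dorchester", "dorchester", "roxbury"),
  licenseno = c(1, 1, 2, 3, 10, 10, 20)  # assumed establishment identifier
)

violation_rate <- toy %>%
  group_by(city) %>%
  summarise(
    violations     = n(),                    # violation records in the district
    restaurants    = n_distinct(licenseno),  # distinct establishments seen
    per_restaurant = violations / restaurants
  ) %>%
  arrange(desc(per_restaurant))

print(violation_rate)
```

This only counts restaurants that appear in the violation records at least once. A true rate or density would need a complete list of licensed restaurants per district from another source.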

#Code 2: Frequency counts for location (city), category of restaurant (descript), and violation status (violstatus)

library(readr)

restau <- read_csv("/Users/xymeng/Desktop/23 Fall Academics/5232 BD4C/restau_viol_copy.csv")

# Normalise case and surrounding whitespace so spelling variants are counted together
restau$city <- tolower(trimws(restau$city))
unique_cities <- unique(restau$city)  # distinct values, for inspection
city_counts <- table(restau$city)
print(city_counts)

restau$descript <- tolower(trimws(restau$descript))
unique_descript <- unique(restau$descript)
descript_counts <- table(restau$descript)
print(descript_counts)

restau$violstatus <- tolower(trimws(restau$violstatus))
unique_violstatus <- unique(restau$violstatus)
violstatus_counts <- table(restau$violstatus)
print(violstatus_counts)

A ggplot2 column chart shows the top 3 districts by violation count, as follows.

#Code 3: ggplot graph of the top 3 districts by number of violations, using the dplyr and ggplot2 packages

library(readr)
library(dplyr)
library(ggplot2)

restau <- read_csv("/Users/xymeng/Desktop/23 Fall Academics/5232 BD4C/restau_viol_copy.csv")

# Count violations per city (normalised as in Code 2) and keep the three largest
top_cities <- restau %>%
  mutate(city = tolower(trimws(city))) %>%
  count(city) %>%
  top_n(3, wt = n)

# Column chart ordered by count, with comma-formatted y-axis labels
base <- ggplot(data = top_cities, aes(x = reorder(city, n), y = n)) +
  geom_col() +
  xlab("City") +
  ylab("Count") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_y_continuous(labels = scales::comma)

base

Type of restaurant

Violation counts differ across the four types of restaurant below. Inspectors and policymakers may focus on “eating & drinking” and “eating & drinking w/ take out”. For a more precise understanding, they could calculate the violation rate of each type relative to the total number of restaurants of that type across the City of Boston.

The code is given in Code 2.

#Code 4: Violation level divided by category of restaurant

library(readr)
library(ggplot2)

restau <- read_csv("/Users/xymeng/Desktop/23 Fall Academics/5232 BD4C/restau_viol_copy.csv")

# Recode the asterisk levels as an ordered factor labelled 1 (lowest) to 3 (highest);
# any other value (blank, "1919") becomes NA
restau$viollevel <- factor(restau$viollevel, levels = c("*", "**", "***"), labels = c(1, 2, 3), ordered = TRUE)

# Bar chart of the number of violation records in each restaurant category.
# geom_smooth(method = lm) is dropped: a linear fit needs numeric x and y.
base <- ggplot(data = restau, aes(x = descript)) + geom_bar() + xlab("descript") + ylab("Number of violations")

base

Limitations

The ggplot2 graph of violation level (viollevel) by category of restaurant (descript) shows only the total number of violations in each type of restaurant, not the number at each of the three levels. Because both variables are categorical, they cannot be plotted directly against each other as a point or line graph, and a linear fit between them is not meaningful. Policymakers are advised to adopt a numeric marking system, e.g., a percentage or a 10-point scale, which would make the violation level more quantifiable and easier to compare.
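One possible way to work around this limitation is to count each combination of descript and viollevel first, and to code the asterisk levels as the numbers 1 to 3 for simple summaries. The sketch below shows both steps on toy data; the values and the level_num column are assumptions for illustration, and treating the levels as equally spaced numbers is itself a simplification.

```r
library(dplyr)

# Toy data: one row per violation (hypothetical values for illustration)
toy <- data.frame(
  descript  = c("eating & drinking", "eating & drinking", "eating & drinking",
                "eating & drinking w/ take out", "eating & drinking w/ take out"),
  viollevel = c("*", "*", "***", "*", "**")
)

# Number of violations for every type/level combination
level_by_type <- toy %>% count(descript, viollevel)

# Numeric coding: 1, 2, 3 for *, **, ***; anything else becomes NA
toy$level_num <- match(toy$viollevel, c("*", "**", "***"))

# Average coded level per restaurant type, ignoring undefined levels
mean_level <- toy %>%
  group_by(descript) %>%
  summarise(mean_level = mean(level_num, na.rm = TRUE))

print(level_by_type)
print(mean_level)
```

The counts in level_by_type could then feed a grouped column chart, for example by mapping viollevel to the fill aesthetic of geom_col().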

Regarding the level of violations, 3,854 violations (0.5%) are undefined and 1 violation is recorded as “1919”. Both are minor figures in such a large dataset. However, it is unknown how they would affect future statistical tests, e.g., a regression to examine correlation or a t-test for inference about the population.
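A simple way to gauge the effect of these records is to separate them out before any test and run the analysis with and without them. The sketch below, on a made-up vector (not the real data), keeps only the three defined levels and reports how many rows were set aside.

```r
# Toy stand-in for the viollevel column (hypothetical values for illustration)
toy <- data.frame(
  viollevel = c("*", "**", "", "1919", "***", NA, "*"),
  stringsAsFactors = FALSE
)

valid_levels <- c("*", "**", "***")

# TRUE only for the three defined levels; blanks, NA and "1919" are FALSE
is_valid <- toy$viollevel %in% valid_levels
cat("Dropped rows:", sum(!is_valid), "of", nrow(toy), "\n")

# Keep the valid rows and store the level as an ordered factor
clean <- toy[is_valid, , drop = FALSE]
clean$viollevel <- factor(clean$viollevel, levels = valid_levels, ordered = TRUE)

print(table(clean$viollevel))
```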

Conclusion

This paper found that Boston, Dorchester, and Roxbury are the top 3 districts with the most restaurant violation incidents. Further inspections should focus on them. To better understand the phenomenon, the City of Boston is advised to investigate the total number and density of restaurants in the 3 areas in order to estimate the violation rate per restaurant in each district.

One-asterisk violations (*) dominate overall, accounting for 72.3% of violations, while 0.5% of total violations are undefined. Inspection officers should identify the common types of one-asterisk violations. It is suggested that the level of violation be recorded on a scale or interval measure, such as a percentage or mark, so that the correlation between the level of violation and other variables can be quantified.

Reference

City of Boston. (2016, October). An Introduction to Restaurant Grading. https://www.boston.gov/sites/default/files/file/document_files/2016/12/restaurant_grade_info_sheet_final.pdf

