City Exploration 2 – Downtown Boston and the Food Industry

Introduction to the Latent Construct of ‘Performance’ for Boston Food Establishments

For my city walk assignment, I decided to use my latent construct of “performance” as the filter variable for choosing my neighborhood. Performance is a latent construct derived from multiple continuous numerical variables within the dataset. The 11 variables I have chosen are:

  • Inspection Pass Count
  • Inspection Fail Count
  • Level 1 Violation Count
  • Level 2 Violation Count
  • Level 3 Violation Count
  • Violation Pass Count
  • Violation Fail Count
  • Total Review Count
  • Year Count
  • Average Rating
  • Price (after converting to numerical)

I selected these variables for the PCA because my goal is to identify the factor(s) with an eigenvalue above the 1.0 cutoff (the Kaiser criterion), which helps me judge whether my variables are unidimensional, that is, whether they consistently measure a single underlying construct across the dataset.
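The eigenvalue cutoff can be illustrated with a short sketch. It assumes the chosen variables sit in a numeric array with one row per establishment and one column per variable, and it runs PCA on the correlation matrix with NumPy. The function names are hypothetical, and this is not necessarily the exact procedure or software behind the scores in this post.

```python
import numpy as np

def standardize(X):
    """Z-score each column (variable) so counts, ratings and price levels
    contribute on a comparable scale."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def kaiser_pca(X, cutoff=1.0):
    """PCA on the correlation matrix of X (rows = establishments,
    columns = variables). Returns eigenvalues (largest first), the matching
    eigenvectors (as columns), and how many eigenvalues exceed `cutoff`."""
    Z = standardize(X)
    corr = Z.T @ Z / (Z.shape[0] - 1)        # correlation matrix of X
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]        # reorder to largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    n_retained = int(np.sum(eigvals > cutoff))
    return eigvals, eigvecs, n_retained

def first_component_scores(X, eigvecs):
    """Project each establishment onto the first component. The sign of a
    component is arbitrary, so which direction means 'high' must be checked."""
    return standardize(X) @ eigvecs[:, 0]
```

If only the first eigenvalue clears 1.0, that is evidence for a single dimension; several eigenvalues above 1.0 would suggest the variables capture more than one construct.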

Note that the PCA weights the performance variable heavily toward the violation data, so the results are effectively inverted: a higher performance score indicates lower ‘true’ performance, while a lower score indicates better ‘true’ performance. In other words, the establishment with the highest score on this variable is the lowest performing in ‘true’ terms, and the establishment with the lowest score is the highest performing.
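Because the sign of a principal component is arbitrary, one option is simply to multiply the scores by -1. If the scores should stay positive, a sketch like the one below (assuming a plain list of scores; `reverse_scale` is a hypothetical helper) instead mirrors them within their observed range, so that a higher value would mean better ‘true’ performance.

```python
def reverse_scale(scores):
    """Mirror scores within their observed range (new = max + min - old),
    so the lowest score becomes the highest and vice versa."""
    lo, hi = min(scores), max(scores)
    return [hi + lo - s for s in scores]
```

With the Downtown range reported later (0.47 to 113), a score of 113 would map to 0.47 and a score of 0.47 to 113, keeping the same range while flipping its direction.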

I created the performance variable, added it to the dataset I am working with, and then aggregated mean performance scores by Neighborhood Statistical Area (NSA). This showed that:

  • Jamaica/Central/South Sumner (Jamaica) NSA has the highest average score, indicating that, on average, the worst-performing establishments are in this NSA.
  • Downtown/Central/West End (Downtown) NSA has the lowest average score, indicating that, on average, the best-performing establishments are in this NSA.
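The aggregation step can be sketched with the standard library alone. The `(nsa, score)` record layout and the function names are assumptions for illustration; keeping the number of establishments next to each mean makes very small groups easy to spot.

```python
from collections import defaultdict
from statistics import mean

def mean_score_by_nsa(records):
    """records: iterable of (nsa_name, performance_score) pairs.
    Returns {nsa_name: (mean_score, number_of_establishments)}."""
    groups = defaultdict(list)
    for nsa, score in records:
        groups[nsa].append(score)
    return {nsa: (mean(scores), len(scores)) for nsa, scores in groups.items()}

def extremes(summary):
    """Return (NSA with the highest mean score, NSA with the lowest)."""
    highest = max(summary, key=lambda nsa: summary[nsa][0])
    lowest = min(summary, key=lambda nsa: summary[nsa][0])
    return highest, lowest
```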

The following is a map of all the establishments that had sufficient data points to calculate the performance score. Establishments in the same NSA share the same shape so that I could distinguish between NSAs on the map, and the size of each point (representing an individual establishment) is scaled by its performance score: the larger the point, the worse the ‘true’ performance of that establishment.

Map of all establishments utilized in PCA, sizes based on performance score, shapes based on NSA.
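Point sizing like this can be produced with a linear rescaling of the scores onto a range of marker sizes. This is a sketch: the 10 to 200 size range is an arbitrary assumption, and `marker_sizes` is a hypothetical helper. If the result is passed to matplotlib's `scatter` as `s`, the values are treated as marker areas in points squared, so larger scores grow the area rather than the radius.

```python
def marker_sizes(scores, min_size=10.0, max_size=200.0):
    """Map performance scores linearly onto marker sizes, so the lowest
    score gets min_size and the highest gets max_size."""
    lo, hi = min(scores), max(scores)
    if hi == lo:  # all scores equal: use a single mid-range size
        return [(min_size + max_size) / 2] * len(scores)
    span = hi - lo
    return [min_size + (s - lo) / span * (max_size - min_size) for s in scores]
```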

The following map shows the establishments in the Jamaica NSA, with the size of each establishment's marker relative to its performance score. Following the earlier principle that a higher performance score indicates lower ‘true’ performance, the larger the diamond, the worse the establishment performs.

Establishments with point sizes by score – Jamaica NSA.

Moving over to the Downtown NSA, the same sizing principle applies: the smallest diamond corresponds to the establishment with the highest ‘true’ performance.

Establishments with point sizes by score – Downtown NSA.

There are a couple of interesting points here:

  • The Jamaica NSA's high average score is the product of one extremely poorly performing restaurant skewing a very small set of only 3 establishments in that direction.
  • The Downtown NSA has a number of establishments with high scores (low ‘true’ performance), making it a better area on which to focus further exploration of the underlying reasons for this trend.
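The first point can be illustrated with made-up numbers (not values from the dataset): replace one typical score with an extreme one and compare how far the mean and the median move for a group of 3, as in Jamaica, versus 84, as in Downtown. The helper name is hypothetical.

```python
from statistics import mean, median

def outlier_shift(typical_score, n, outlier_score):
    """Start from n establishments sharing typical_score, replace one with
    outlier_score, and return (change in mean, change in median)."""
    base = [typical_score] * n
    skewed = base[:-1] + [outlier_score]
    return mean(skewed) - mean(base), median(skewed) - median(base)
```

With a typical score of 5 and an outlier of 100, the mean rises by about 31.7 for n = 3 but only about 1.1 for n = 84, while the median does not move in either case.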

City Exploration 2 – Downtown/Central/West End NSA

The first reason for choosing this NSA is that it performs best according to the latent construct of performance; however, the scores within it vary widely. The highest value is 113 and the lowest is 0.47, indicating a broad disparity in restaurant performance across the variables I had previously selected for my PCA.
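The minimum and maximum each come from a single establishment. A hedged sketch of a fuller summary, using Python's `statistics.quantiles` (Python 3.8+) on whatever list of scores is available for the NSA, shows where the bulk of the scores lies; `spread_summary` is a hypothetical helper.

```python
from statistics import quantiles

def spread_summary(scores):
    """Five-number-style summary of a list of performance scores (needs at
    least two scores). Quartiles use statistics.quantiles' default
    'exclusive' method."""
    q1, q2, q3 = quantiles(scores, n=4)
    lo, hi = min(scores), max(scores)
    return {"min": lo, "q1": q1, "median": q2, "q3": q3, "max": hi,
            "range": hi - lo, "iqr": q3 - q1}
```

A small interquartile range next to a range of roughly 113 would indicate that a few extreme establishments, rather than the whole NSA, drive the disparity.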

The second reason for choosing this area of Boston is its variety of land uses. Ranging from commercial to residential to transportation, this NSA is a prime example of mixed-use land within Boston. By looking at how establishment scores vary across the sub-divisions within this NSA, it is possible to form assumptions and inferences about the links between the ‘type’ of activity in a neighborhood and how that is reflected in food establishment performance.

The third reason for choosing this NSA is to test the validity of my latent construct: according to my PCA work, this NSA should, on average, have the best-performing restaurants with the lowest average scores. How is this distributed across the 84 unique establishments I have mapped? Are higher- or lower-scoring establishments (on my created scale) clustered around particular sub-divisions within the NSA?
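Answering the clustering question requires assigning each establishment to a sub-division. A minimal sketch, assuming each sub-division is available as a simple polygon of (longitude, latitude) vertices, uses the standard ray-casting test. The sub-division names and boundaries in any usage are hypothetical, and at the scale of one NSA, treating longitude and latitude as planar coordinates is a reasonable approximation. Real boundary files would more likely be handled in GIS software.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: cast a horizontal ray to the right of (x, y) and
    count how many polygon edges it crosses; an odd count means inside."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the polygon
        if (y1 > y) != (y2 > y):       # edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def assign_subdivision(point, subdivisions):
    """Return the name of the first sub-division polygon containing point,
    or None if it falls in none of them."""
    for name, polygon in subdivisions.items():
        if point_in_polygon(point[0], point[1], polygon):
            return name
    return None
```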

Visualizing the Limits of Downtown/Central/West End

Street Map of Downtown NSA (red border)

Which areas of Boston does this NSA cover? Using the above map, it becomes clear that the NSA lives up to its name and covers 3 main areas of the city – Downtown, West End, and Central Boston.

Downtown/Central/West End NSA – Perceptions of the Food Industry and Beyond

Housing and Racial Characteristics – A large portion of this NSA is commercial and does not contain any residential properties. The following map helps distinguish between mixed-use and purely commercial areas (the darker the color, the greater the percentage of white residents). Within the limited residential demographic data, this NSA shows a broader variety of racial composition than the neighboring areas of Beacon Hill and the North End, driven primarily by the Chinatown region in the Central area of the city.

Map of White resident density – darker shades indicate higher white resident proportions.

Downtown Food Establishments in the News –

Headlines on establishment turnover in Boston.

Two headlines from the same period point to the turnover of food establishments in this region. Studies and reports on the restaurant industry have indicated that as space becomes scarcer (and more expensive) in downtown/commercial areas of urban centers, higher rates of business failures and new openings become more likely amongst establishments.

Downtown NSA in (extremely) current events – 

Headlines on developments in the Downtown Area over the past week.

As seen above, the unrest and uncertainty surrounding the 2020 presidential election have caused businesses to close and shutter their storefronts to prevent property loss or damage. This further complicates operations for eating establishments, as reduced capacities due to COVID-19 have already affected a significant portion of them.

Establishments by Score

Within this NSA, there are clearly distinct clusters of commercial or mixed activity. This leads me to wonder whether the commercial areas have higher or lower scores according to my variable –

Left – Low Score, Right – High Score

Based on the above maps, the clusters of the lowest- and highest-scoring restaurants fall in the same areas of the NSA, so location within the NSA does not, on its own, distinguish high performers from low performers.
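A rough numerical check of this visual impression is to split establishments at a score threshold and measure how far apart the mean centers of the two groups are. This sketch assumes `(lat, lon, score)` tuples and a threshold of one's choosing; the names are hypothetical. A small gap is consistent with overlapping clusters, though two separate clusters can also share a center, so it complements rather than replaces the maps.

```python
from math import asin, cos, radians, sin, sqrt
from statistics import mean

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km, using a mean Earth radius of 6371 km."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi = p2 - p1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def group_centroid_gap(establishments, threshold):
    """establishments: list of (lat, lon, score). Split at threshold and
    return the distance (km) between the mean centers of the two groups."""
    low = [(la, lo) for la, lo, s in establishments if s < threshold]
    high = [(la, lo) for la, lo, s in establishments if s >= threshold]
    if not low or not high:
        raise ValueError("threshold must leave establishments on both sides")
    c_low = (mean(p[0] for p in low), mean(p[1] for p in low))
    c_high = (mean(p[0] for p in high), mean(p[1] for p in high))
    return haversine_km(*c_low, *c_high)
```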

Conclusion –

There were several challenges in conducting a thorough exploration of this NSA, the COVID-19 pandemic foremost amongst them. The dataset, as it currently exists on Dataverse, does not fully capture recent events or how violation checks and health inspections have been conducted across restaurants, which means that my calculated score is more historical than current. The implications of this challenge are that a) it is difficult to capture shifts in performance from pre-2020 to 2020 levels, and b) the frequency of inspections has significantly decreased as a result of the pandemic. The lack of inspection data is especially visible in my previous analysis for the first 2 months of complete closure during the pandemic.
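A gap like the one described for the first months of closure can be surfaced by counting inspections per month and listing the months with none. This sketch assumes inspection dates are available as `datetime.date` objects; the function names and any example dates are illustrative only.

```python
from collections import Counter
from datetime import date

def monthly_counts(inspection_dates, start, end):
    """Count inspections per (year, month) from start's month through end's
    month, reporting months with no inspections as 0."""
    counts = Counter((d.year, d.month) for d in inspection_dates)
    out = {}
    y, m = start.year, start.month
    while (y, m) <= (end.year, end.month):
        out[(y, m)] = counts.get((y, m), 0)
        m += 1
        if m > 12:
            y, m = y + 1, 1
    return out

def empty_months(counts):
    """List the (year, month) keys that recorded zero inspections."""
    return [ym for ym, c in counts.items() if c == 0]
```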

Headline on struggles within the restaurant industry in Boston.

The second challenge is the diversity of this NSA in terms of residential and commercial activity. The broad variety of price ranges and establishment sizes among the 84 establishments with sufficient data for the latent variable construction makes it difficult to create a complete analysis of the links between various environmental, social, and economic factors and the performance of establishments.

Despite these challenges, there are positive developments in my overall understanding of the dataset. Foremost is seeing the law of large numbers at work in the maps (somewhat amended for the purpose of this assignment). With one of the highest densities of establishments, this NSA's average score is far less sensitive to any single extreme establishment than the averages of NSAs with only a few establishments, such as Jamaica with its 3. This led to my previous identification of the Downtown NSA as the ‘best’ performing NSA on average, which, despite being a valid deduction, does not reveal the wide range of performance behind that average score. Therefore, for the next steps of my analysis, I will isolate the higher-scoring establishments and analyze them separately to find any commonalities in the violation failure trends they exhibit.
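The planned next step could start from something like the sketch below, which keeps roughly the top quarter of establishments by score and tallies their failed violation codes. The record layout, the `passed` flag, the 25% cutoff, and any codes used with it are assumptions, not the dataset's actual schema.

```python
from collections import Counter

def top_violation_codes(establishments, violations, top_share=0.25, k=5):
    """establishments: {establishment_id: score};
    violations: list of (establishment_id, violation_code, passed) tuples.
    Returns the k most common failed violation codes among roughly the
    top `top_share` of establishments by score (at least one establishment)."""
    ranked = sorted(establishments, key=establishments.get, reverse=True)
    n_top = max(1, round(len(ranked) * top_share))
    top_ids = set(ranked[:n_top])
    fails = Counter(code for est, code, passed in violations
                    if est in top_ids and not passed)
    return fails.most_common(k)
```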

The other positive development is my realization that this data is primarily historical and will not be sufficient to capture the developments of 2020 effectively. I intend to focus my work on a year-over-year analysis of violations from 2010 to 2019, with the decision to exclude 2020 supported by the lack of inspection records for that period. Based on this city exploration, a more interesting final analysis would observe changes in violation occurrences in space and time across 2010–19 to identify pre-pandemic conditions in Boston’s food industry. This analysis could later be used to observe changes to these trends in the post-pandemic era.
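A year-over-year count restricted to 2010 through 2019 could look like the following sketch. It assumes one recorded year per violation; years outside the window, such as 2020, are dropped, and the percent change is left as `None` for the first year and whenever the previous year had no violations, to avoid dividing by zero. The function name is hypothetical.

```python
from collections import Counter

def yearly_violation_change(violation_years, first=2010, last=2019):
    """violation_years: iterable of the year each violation was recorded.
    Returns {year: (count, percent change from previous year or None)}
    for first..last, ignoring years outside that window."""
    counts = Counter(y for y in violation_years if first <= y <= last)
    result = {}
    prev = None
    for year in range(first, last + 1):
        c = counts.get(year, 0)
        pct = None if not prev else (c - prev) / prev * 100  # None if prev is None or 0
        result[year] = (c, pct)
        prev = c
    return result
```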

