What does Boston offer foodies?

Business establishments are an important indicator of city dwellers’ characteristics: Do they drink much? What kind of entertainment do they enjoy, casual drinks in bars or live music as well? What food is most convenient for them, fast food or posh cuisine? I will try to answer some of these questions for Boston by digging into the Boston Business Licenses Database.

After a quick glance at the data files in this database, I was immediately drawn to the active food establishment file. As a newcomer to Boston and a foodie myself, I am interested in finding out what Boston has to offer.

Let’s load the data and do a quick examination first:

foodPlaces <- read.table(
  "business_licenses_data/Active_Food_Establishment_Licenses.tab",
  header = TRUE, sep = "\t")
dim(foodPlaces)
## [1] 2930   13
names(foodPlaces)
##  [1] "BusinessName"   "DBAName"        "Address"        "City"          
##  [5] "State"          "Zip"            "LICSTATUS"      "LICENSECAT"    
##  [9] "DESCRIPT"       "LicenseAddDtTm" "DAYPHN"         "Property_ID"   
## [13] "Location"
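A small aside on the loading step: by default, read.table treats both " and ' as quote characters, so business names containing an apostrophe (the file has a few, such as “Santarpio's Cafe Inc.”) can confuse the parser. The sketch below uses a made-up three-row sample instead of the real file, and the extra arguments are my suggestion rather than something the original load needed; it only shows how to switch quoting off.

```r
# Toy tab-separated sample (not the real file); two names contain apostrophes.
raw <- paste(
  "BusinessName\tLICENSECAT",
  "Santarpio's Cafe Inc.\tFS",
  "Comella's\tFT",
  "Taj Boston\tFS",
  sep = "\n")

toy <- read.table(
  text = raw,
  header = TRUE,
  sep = "\t",
  quote = "",                # treat ' and " as ordinary characters
  comment.char = "",         # do not cut a line at "#"
  stringsAsFactors = FALSE)  # keep text columns as character

dim(toy)
toy$BusinessName
```

If the row count of the real file changes once quoting is switched off, the apostrophes were swallowing rows; if it stays at 2,930, nothing was lost.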

This dataset contains 2,930 records, each with 13 columns, including the name, address, geolocation, phone number and license date of the business. These licenses were added to the database between December 2006 and August 2016.

foodPlaces$LicenseAddDtTm <- as.Date(foodPlaces$LicenseAddDtTm,
                                     "%m/%d/%Y %T %p")
summary(foodPlaces$LicenseAddDtTm)
##         Min.      1st Qu.       Median         Mean      3rd Qu.
## "2006-12-05" "2006-12-07" "2008-11-16" "2010-01-11" "2013-01-31"
##         Max.
## "2016-08-10"
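Since only the date part is kept, the time and AM/PM markers do not need to be parsed at all: as.Date ignores any characters that follow the part matched by the format. Here is a minimal sketch; the two timestamp strings are made up, and their layout is my guess based on the format string used above.

```r
# Hypothetical timestamps in the month/day/year layout assumed above.
stamps <- c("12/05/2006 01:15:00 PM", "08/10/2016 09:30:00 AM")

# Only the date portion is described; the trailing time text is ignored.
dates <- as.Date(stamps, format = "%m/%d/%Y")

range(dates)  # earliest and latest license dates
```

This also avoids %p, whose AM/PM matching depends on the locale.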

The raw data contains a few columns that add little to a quantitative analysis, so I decided to drop them before moving forward.

foodPlaces[c("LICSTATUS",  # all records are "active"
             "State",  # all records are "MA"
             "DESCRIPT",  # description of license category
             "DAYPHN",  # day phone
             "Property_ID")] <-  NULL

LICENSECAT (license category code) and DESCRIPT (a readable description of the license category) carry essentially the same information. I removed DESCRIPT and kept LICENSECAT, simply because the values are shorter and easier to type (for future filtering). FS is equivalent to “Eating & Drinking”; FT is “Eating & Drinking w/ Take Out”.
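Before dropping columns like these, it is cheap to confirm that they really are constant or redundant. The following is only a sketch on a toy data frame (the four rows are invented, though the column names and category labels come from the dataset): it lists columns with a single distinct value and checks that each license code maps to exactly one description.

```r
# Toy stand-in for foodPlaces; the rows are invented for illustration.
toy <- data.frame(
  LICSTATUS  = c("Active", "Active", "Active", "Active"),
  State      = c("MA", "MA", "MA", "MA"),
  LICENSECAT = c("FS", "FT", "FT", "FS"),
  DESCRIPT   = c("Eating & Drinking",
                 "Eating & Drinking w/ Take Out",
                 "Eating & Drinking w/ Take Out",
                 "Eating & Drinking"),
  stringsAsFactors = FALSE)

# Columns holding a single distinct value carry no information.
is_constant <- vapply(toy, function(col) length(unique(col)) == 1, logical(1))
constant_cols <- names(toy)[is_constant]

# The code/description pairing is one-to-one if neither side repeats
# among the distinct pairs.
pairs <- unique(toy[c("LICENSECAT", "DESCRIPT")])
one_to_one <- !anyDuplicated(pairs$LICENSECAT) && !anyDuplicated(pairs$DESCRIPT)

constant_cols
one_to_one
```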

One column is empty for most rows but still tells us something valuable: “DBAName”, the “doing business as” name of the business, shows how a company brands its service under a different name in the hope of attracting more customers.

foodPlaceAliases <- foodPlaces[foodPlaces$DBAName != "", ]
foodPlaceAliases <- foodPlaceAliases[c("BusinessName", "DBAName")]
foodPlaceAliases
##                                BusinessName                        DBAName
## 336              Charlies Pizza and Kitchen           Pantry Pizza Kitchen
## 394  Comedy Connection @ The Wilbur Theater                     Re: 3 Bars
## 395                               Comella's                      1844 Inc.
## 419                      Cosi South Station         Hearthstone Associates
## 575                                   Towne         Hynes Fine Dining  LLC
## 635                            Extreme Pita  Trustees of Boston University
## 856                  Jade Garden Restaurant                      LCY  Inc.
## 1344                         Rustic Kitchen                 SMC Stuart LLC
## 1545                             Taj Boston                       IHMS LLC
## 1569                         Teriyaki House             T.H. Boylston Inc.
## 1603                       The Greatest Bar             The Next Place LLC
## 1653                        Top of The Hill Top of the Hill Seafood & Subs
## 1707                           Viva Burrito       Viva Burrito Boston Inc.
## 1821                         Icon aka Rumor                      Paga Inc.
## 1829             Morgan Lewis & Bockius LLP             Flik International
## 1926                  Santarpio's Cafe Inc.                       FJS INC.
## 2151                            CO3/BO3/YO3            Aramark Corporation
## 2668                            PO6/BO7/PBD            Aramark Corporation
## 2823                           Dudly Coffee              Shanti Boston LLC
## 2898                                LY4/LB4            Aramark Corporation
## 2899                                    V10            Aramark Corporation

Only 21 out of 2,930 businesses operate under another name, less than 1 percent. Besides the usual pairing of a food-related name with a generic company name, e.g. “Rustic Kitchen” vs “SMC Stuart LLC”, we can also see a few weird names tied to the company “Aramark Corporation”: CO3/BO3/YO3, PO6/BO7/PBD, LY4/LB4, V10. What’s up with these mysterious codes?

It turns out that Aramark Corporation is a company providing food services to large groups of people in places like event venues or workplaces; the four establishments in question are all at 4 Yawkey Way, the address of Fenway Park. At least now we know where Red Sox fans get their hot dogs and beer!
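Both observations can be checked with a couple of lines. This is a rough sketch on a toy data frame, not the real data: the address strings are placeholders I made up, and only the two counts (21 and 2,930) are taken from the output above.

```r
# Share of businesses with a DBA name, using the counts reported above.
reported_share <- sprintf("%.2f%%", 100 * 21 / 2930)

# Toy stand-in for foodPlaces; the Address values are placeholders.
toy <- data.frame(
  BusinessName = c("V10", "LY4/LB4", "Taj Boston", "Some Cafe"),
  DBAName      = c("Aramark Corporation", "Aramark Corporation", "IHMS LLC", ""),
  Address      = c("4 Yawkey", "4 Yawkey", "placeholder A", "placeholder B"),
  stringsAsFactors = FALSE)

# The same share computed from a data frame: fraction of non-empty DBA names.
alias_share <- mean(toy$DBAName != "")

# Where do the establishments run by one company sit?
aramark <- toy[toy$DBAName == "Aramark Corporation", c("BusinessName", "Address")]
aramark_addresses <- unique(aramark$Address)

reported_share
aramark_addresses
```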

This dataset classifies food establishments with and without takeout under two different categories. It’s natural to wonder: how many food places in Boston actually offer takeout?

ftRatio <- nrow(foodPlaces[foodPlaces$LICENSECAT == "FT", ]) / nrow(foodPlaces)
sprintf("%.2f%% of food establishments accept takeout orders.", ftRatio * 100)
## [1] "47.10% of food establishments accept takeout orders."
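The same ratio can be read off a frequency table, which also shows the other category at a glance. A minimal sketch on a made-up vector of license codes (the five values are invented, so the 40% here is not the real figure):

```r
# Invented license codes standing in for foodPlaces$LICENSECAT.
license_cat <- c("FS", "FT", "FT", "FS", "FS")

counts <- table(license_cat)   # number of establishments per category
shares <- prop.table(counts)   # the same as fractions of the total

# Equivalent one-liner: the mean of a logical vector is the share of TRUEs.
ft_share <- mean(license_cat == "FT")

counts
round(100 * shares, 2)
```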

That is almost half of them! It would be interesting to see which neighborhood has the highest density of these businesses, because it gives a rough impression of how livable each area is. No one would complain about having a big selection of takeout just around the corner.

However, a first look at the City column shows that the data is not quite ready for that analysis.

unique(foodPlaces$City)
##  [1] West Roxbury       Hyde Park          Boston            
##  [4] Dorchester         Boston/            Jamaica Plain     
##  [7] Charlestown        Roslindale         Allston           
## [10] South Boston       Mission Hill       Brighton          
## [13] Mattapan           Roxbury            East Boston       
## [16] BOSTON             EAST BOSTON        South End         
## [19] East BOSTON        DORCHESTER         BRIGHTON          
## [22] CHARLESTOWN        ROSLINDALE         South Boston/     
## [25] HYDE PARK          CHESTNUT HILL      Roslindale/       
## [28] ALLSTON            WEST ROXBURY       SOUTH BOSTON      
## [31] Financial District ROXBURY            Mission Hill/     
## [34] Brighton/          MATTAPAN           East Boston/      
## [37] Boston/Fenway      JAMAICA PLAIN                        
## [40] Fenway/            Charlestown/       Mattapan/         
## [43] Dorchester/        roxbury            East  Boston      
## [46] Fenway            
## 46 Levels:  Allston ALLSTON Boston BOSTON Boston/ ... WEST ROXBURY

Some names are duplicated in different capitalization, some contain extraneous characters (such as the “/” in “Boston/”), and while some entries name a specific neighborhood, others only say “Boston”. I can consolidate the format, but it would be impossible to infer the neighborhood from the city name “Boston” alone.

library("stringi")
# normalize the capitalization with a function provided by the
# stringi library, then strip the first "/" found in each name
foodPlaces$City <- sub("/", "", stri_trans_totitle(foodPlaces$City))
unique(foodPlaces$City)  # consolidated names
##  [1] "West Roxbury"       "Hyde Park"          "Boston"            
##  [4] "Dorchester"         "Jamaica Plain"      "Charlestown"       
##  [7] "Roslindale"         "Allston"            "South Boston"      
## [10] "Mission Hill"       "Brighton"           "Mattapan"          
## [13] "Roxbury"            "East Boston"        "South End"         
## [16] "Chestnut Hill"      "Financial District" "BostonFenway"      
## [19] ""                   "Fenway"             "East  Boston"

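Consolidating the names cuts the number of unique values from 46 to 21, although the list still contains an empty string, a stray “East  Boston” with a double space, and a fused “BostonFenway”.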
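The leftovers visible in the output (the empty string, the double space in “East  Boston”, the fused “BostonFenway”) could be handled with a slightly more careful cleanup. Below is one possible sketch in base R rather than stringi; normalize_city is a hypothetical helper of mine, and keeping “Boston/Fenway” as a single label is an arbitrary choice, not something the data dictates.

```r
# Hypothetical helper: tidy whitespace, slashes and capitalization.
normalize_city <- function(x) {
  x <- trimws(as.character(x))
  x <- gsub("\\s+", " ", x)   # collapse runs of whitespace
  x <- sub("/$", "", x)       # drop a trailing slash only
  x <- tolower(x)
  # upper-case the first letter of each word (after start, space or slash)
  x <- gsub("(^|[ /])([a-z])", "\\1\\U\\2", x, perl = TRUE)
  x[!is.na(x) & x == ""] <- NA_character_  # empty names become missing
  x
}

# Sample values taken from the unique() output above.
messy <- c("BOSTON", "Boston/", "East  Boston", "East BOSTON",
           "roxbury", "Boston/Fenway", "")
clean <- normalize_city(messy)
clean
```

Stripping only a trailing slash keeps “Boston/Fenway” readable instead of gluing the two words together.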

A more reliable approach would be to use the zip code or street address to look up the neighborhood each business belongs to, or to plot the businesses on a map directly. I won’t go further here since it requires an additional dataset and much more work. Let’s save the best for next time. 🙂
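To give a flavor of the zip code idea, here is a rough sketch. Everything in it is illustrative: the rows are invented, zip_lookup is a tiny hand-written table covering four zip codes rather than a real reference dataset, and I am assuming Zip was read as a number, which drops the leading zero of Boston zip codes. Zip codes and neighborhoods do not always line up one-to-one, so a real lookup would need more care.

```r
# Toy stand-in for foodPlaces; Zip is numeric on the assumption that
# read.table parsed it as a number and dropped the leading zero.
places <- data.frame(
  BusinessName = c("A", "B", "C", "D"),
  City         = c("Boston", "Boston", "Boston", "Boston"),
  Zip          = c(2128, 2129, 2134, 2199),
  stringsAsFactors = FALSE)

# Tiny illustrative lookup, far from complete.
zip_lookup <- c("02128" = "East Boston",
                "02129" = "Charlestown",
                "02134" = "Allston",
                "02135" = "Brighton")

# Restore the five-digit form, then look each zip up by name.
places$Zip5 <- sprintf("%05d", as.integer(places$Zip))
places$Neighborhood <- unname(zip_lookup[places$Zip5])

places[c("BusinessName", "Zip5", "Neighborhood")]
```

Zip codes missing from the lookup come back as NA, which makes the remaining gaps easy to count.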
