Exploratory Data Assignment 2: Pulse of the City

Hyun Lee – September 27, 2023

This week, I go deeper into the building permits data and start creating data visualizations that may provide insight into the dynamics of the city.

Again, I read in the building permits data.

building_permits_2023 <- read.csv('Permits.Records.Geocoded.2023.csv')

Looking for patterns in the data, I made a scatter plot of permit duration compared with total fees. Does longer permit duration coincide with greater fees? This does not seem to be the case.

library(tidyverse)

## Warning: package ‘tidyverse’ was built under R version 4.2.3

## Warning: package ‘ggplot2’ was built under R version 4.2.3

## Warning: package ‘tibble’ was built under R version 4.2.3

## Warning: package ‘tidyr’ was built under R version 4.2.3

## Warning: package ‘readr’ was built under R version 4.2.3

## Warning: package ‘purrr’ was built under R version 4.2.3

## Warning: package ‘dplyr’ was built under R version 4.2.3

## Warning: package ‘stringr’ was built under R version 4.2.3

## Warning: package ‘forcats’ was built under R version 4.2.3

## Warning: package ‘lubridate’ was built under R version 4.2.3

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2    
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

install.packages("ggplot2")

## Warning: package ‘ggplot2’ is in use and will not be installed

library(ggplot2)

ggplot(data = building_permits_2023) +
  geom_point(mapping = aes(x = PermitDuration, y = total_fees))

## Warning: Removed 25984 rows containing missing values (`geom_point()`).

#seemingly no correlation between permit duration and total fees

median(building_permits_2023$total_fees, na.rm = TRUE) # $70

## [1] 70

mean(building_permits_2023$total_fees, na.rm = TRUE) # $4110.77

## [1] 4110.773

max(building_permits_2023$total_fees, na.rm = TRUE) # $625,000,606

## [1] 625000606

min(building_permits_2023$total_fees, na.rm = TRUE) # $0

## [1] 0

#total fees skewed heavily by outliers, permit duration more spread out
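
Because total_fees is so heavily skewed, a visual read of the scatter plot can be backed up with a numeric check. The following is a minimal sketch on a small made-up data frame: `toy` and all of its values are invented for illustration and are not taken from the permits file. It computes both a Pearson and a Spearman correlation, since the rank-based Spearman version is less sensitive to extreme outliers.

```r
# Toy data standing in for PermitDuration and total_fees (values are invented)
toy <- data.frame(
  PermitDuration = c(180, 182, 183, 365, NA, 181, 90),
  total_fees     = c(35, 220, 70, 5000, 60, NA, 0)
)

# Pearson correlation, using only rows where both values are present
pearson <- cor(toy$PermitDuration, toy$total_fees, use = "complete.obs")

# Spearman rank correlation on the same complete rows
spearman <- cor(toy$PermitDuration, toy$total_fees,
                use = "complete.obs", method = "spearman")

print(c(pearson = pearson, spearman = spearman))
```

On the real data, the same two calls would presumably take building_permits_2023$PermitDuration and building_permits_2023$total_fees in place of the toy columns.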

There was also no correlation found between declared valuation and total fees.

ggplot(data = building_permits_2023) +
  geom_point(mapping = aes(x = DECLARED_VALUATION, y = total_fees))

## Warning: Removed 10 rows containing missing values (`geom_point()`).

I made a bar graph of the top five occupancy types and their average square footage. Looking at the graph and the results of the by() function, the occupancy types that cover larger buildings (Comm, Mixed, Multi) do have higher average square footage than the one-to-three-family types.

In this case, I manually entered the results from the by() function into a data frame, which I then turned into a bar graph. In the future, I hope to convert the by() results directly into a table or vector that I can then pipe into ggplot. I can also probably make the bar graph less ugly.

I tried running the following code from ChatGPT, but it failed me 😦

result <- by(building_permits_2023$sq_feet, building_permits_2023$OCCUPANCY, mean, na.rm=TRUE)

result_table <- data.frame(
  Category = names(result), # Extract category names
  Mean = unlist(result) # Extract mean values and convert to a vector
)
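
One possible way to skip the manual step is sketched here on a made-up data frame. Only the column names OCCUPANCY and sq_feet come from the data set; the `toy` rows and the `mean_sqfeet` name are invented for illustration. The idea is that tapply() returns a named array, which converts to a two-column data frame directly.

```r
# Toy data with the same column names as the permits file (values are invented)
toy <- data.frame(
  OCCUPANCY = c("Comm", "Comm", "1-2FAM", "1-2FAM", "Multi"),
  sq_feet   = c(1000, 3000, 200, NA, 1500)
)

# Mean square footage per occupancy type, ignoring missing values
means <- tapply(toy$sq_feet, toy$OCCUPANCY, mean, na.rm = TRUE)

# Convert the named array into a two-column data frame
mean_sqfeet <- data.frame(
  Occupancy = names(means),
  Sq_ft     = as.vector(means)
)

print(mean_sqfeet)
```

A data frame built this way should be usable in the same ggplot() and geom_col() call as a hand-typed one, so the plotting code would not need to change.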

library(dplyr)

building_permits_2023 %>%
  dplyr::count(OCCUPANCY) %>%
  arrange(desc(n))

##    OCCUPANCY      n
## 1     1-2FAM 182253
## 2       Comm 171000
## 3      Multi  61748
## 4     1-3FAM  60064
## 5      Mixed  23420
## 6      Other  21019
## 7             16762
## 8     1-4FAM  16352
## 9      1Unit   6046
## 10     VacLd   5509
## 11     7More   4792
## 12    1-7FAM   2051
## 13     3unit    712
## 14     2unit    613
## 15     4unit    375
## 16     6unit    299
## 17     5unit    280
## 18      COMM    185
## 19     7unit     76
## 20     4Unit      7
## 21     6Unit      4
## 22     MIXED      2
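
The counts above include several labels that differ only in capitalization (Comm/COMM, Mixed/MIXED, 4unit/4Unit, 6unit/6Unit), plus one blank label. A minimal sketch of merging them before counting follows, using a short made-up vector: the labels are copied from the output above, but the frequencies are invented. Upper-casing everything is only one possible convention, and treating the blank label as missing is an assumption.

```r
# A few labels from the count table (frequencies here are invented)
occ <- c("Comm", "COMM", "Mixed", "MIXED", "4unit", "4Unit", "1-2FAM", "")

# Trim whitespace and unify case so variants collapse into one label
occ_clean <- toupper(trimws(occ))

# Treat empty strings as missing rather than as their own category
occ_clean[occ_clean == ""] <- NA

# Count the cleaned labels, most frequent first
counts <- sort(table(occ_clean, useNA = "ifany"), decreasing = TRUE)
print(counts)
```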

# mean sq_feet for the top five occupancy types, entered by hand from the by() results
mean_sqfeet_by_occupancy <- data.frame(
  Occupancy = c("1-2FAM", "Comm", "Multi", "1-3FAM", "Mixed"),
  Sq_ft = c(235.715, 61337.19, 1182.355, 542.9253, 7689.422)
)

mean_sqfeet_by_occupancy %>%
  ggplot() +
  geom_col(mapping = aes(x = Occupancy, y = Sq_ft))

Some declared valuations are negative, which does not make sense. More research should be done into why this is the case.

summary(building_permits_2023[c('DECLARED_VALUATION', 'total_fees', 'sq_feet', 'PermitDuration')]) # summarize selected columns

##  DECLARED_VALUATION     total_fees           sq_feet          PermitDuration 
##  Min.   :  -1000000   Min.   :        0   Min.   :0.000e+00   Min.   :   0.0 
##  1st Qu.:      1200   1st Qu.:       35   1st Qu.:0.000e+00   1st Qu.: 180.7 
##  Median :      5029   Median :       70   Median :0.000e+00   Median : 182.4 
##  Mean   :    218397   Mean   :     4111   Mean   :1.919e+04   Mean   : 179.8 
##  3rd Qu.:     20187   3rd Qu.:      220   3rd Qu.:0.000e+00   3rd Qu.: 183.4 
##  Max.   :2100000000   Max.   :625000606   Max.   :1.000e+10   Max.   :3838.5 
##                       NA’s   :10                              NA’s   :25984

apply(building_permits_2023[c('DECLARED_VALUATION', 'total_fees', 'sq_feet', 'PermitDuration')], 2, mean, na.rm=TRUE) # alternative method

## DECLARED_VALUATION         total_fees            sq_feet     PermitDuration
##        218397.3994          4110.7734         19187.5936           179.7601

# min declared valuation is -1000000

filter(building_permits_2023, DECLARED_VALUATION < 0) # filter for negative declared valuations

##      X.1 PermitNumber WORKTYPE          permittypedescr
## 1  18957      A613433    OTHER Amendment to a Long Form
## 2  19184      A687464   INTREN Amendment to a Long Form
## 3  19333      A748553   INTREN Amendment to a Long Form
## 4  19401      A771335   INTREN Amendment to a Long Form
## 5 525332      A781148    OTHER Amendment to a Long Form
##                  description
## 1                      Other
## 2 Renovations – Interior NSC
## 3 Renovations – Interior NSC
## 4 Renovations – Interior NSC
## 5                      Other
##                                                                                                                                                                    NOTES
## 1                                                                                                                                                  Revised HVAC system.;
## 2                                                                                                     Diminished scope at the sixth floor. Less partitions and MEP work.
## 3                                                                  Cosmetic Upgrades of the 5th floor Menino Building at Boston Medical Center.; Amendment to ALT543370.
## 4 Amendment to ALT700716 to reduce scope of work. There will no longer be an addition built in the rear  it is now just an interior renovation as depicted in new plans.
## 5                                                     Amendment to ERT557131.; Revisions to basement and first floor levels and change in structural engineer of record.
##               APPLICANT DECLARED_VALUATION total_fees         ISSUED_DATE
## 1      Nicholas Stewart            -250000        520 2016-07-29 09:15:31
## 2        Michael Harris            -900000        155 2017-04-12 06:35:41
## 3           sean lawton           -1000000       1812 2017-09-22 09:33:20
## 4             Ryan Hunt             -40000         41 2018-05-21 15:20:55
## 5 Christopher P Desisto            -200000        333 2018-01-22 14:00:20
##   EXPIRATION_DATE STATUS                                 owner OCCUPANCY
## 1            <NA>   Open                            MATOV ALEX      Comm
## 2      2017-10-12 Closed                       CFS SEAPORT LLC      Comm
## 3      2018-03-22   Open     BOSTON MEDICAL CENTER CORPORATION     Other
## 4      2018-11-21 Closed                        JCG REALTY LLC    1-2FAM
## 5      2018-07-22   Open THIRTY-1 NORTH BEACON ST LLC MASS LLC     Mixed
##   sq_feet                  ADDRESS         CITY STATE   ZIP Property_ID
## 1       0 1505    Commonwealth AVE     Brighton    MA 02135       37777
## 2       0    33-39   Farnsworth ST       Boston    MA 02210       56395
## 3       0     840     Harrison AVE      Roxbury    MA 02118      166049
## 4       0      299-301   Silver ST South Boston    MA 02127      127425
## 5       0           1   Everett ST      Allston    MA 02134      419828
##       GIS_ID parcel_num         X        Y Land_Parcel_ID     TLID    Blk_ID_10
## 1 2101830005 2101830005 -71.14154 42.34711     2101830005 85695564 2.502500e+14
## 2  602660010  602660001 -71.04796 42.35172      602660010 85730627 2.502506e+14
## 3  801420000  801420000 -71.07416 42.33481      801420000 85728735 2.502507e+14
## 4  601915000  601915000 -71.04736 42.33581      601915000 85712754 2.502506e+14
## 5 2201752000 2201752000 -71.13874 42.35399     2201752000 85727128 2.502500e+14
##       BG_ID_10    CT_ID_10                   NSA_NAME           BRA_PD newcon
## 1 250250006023 25025000602  Brighton – St Elizabeth’s Allston/Brighton      0
## 2 250250606001 25025060600     D Street/West Broadway     South Boston      0
## 3 250250711012 25025071101 South End – Harrison Lenox        South End      0
## 4 250250608001 25025060800     D Street/West Broadway     South Boston      0
## 5 250250008022 25025000802                    Allston Allston/Brighton      0
##   addition demo reno PermitDuration government
## 1        1    0    0             NA          0
## 2        1    0    0       182.7252          0
## 3        1    0    0       180.6019          0
## 4        1    0    0       183.3605          0
## 5        1    0    0       180.4164          0

Many building permits are also labeled open, even though they are past their expiration date. I examined some of the oldest permits (expiration date before 2010-06-15) and compared those that were open to those that were closed to see if I could find any patterns or discern why some were closed while others remained open. However, I wasn’t able to find anything significant at this time. There also doesn’t seem to be a relationship between status and permit duration.

building_permits_2023 %>%
  filter(STATUS == 'Open', EXPIRATION_DATE < '2010-06-15') %>%
  View()

#337,080 obs that are open with an expiration date before 2023-01-01, omitted for brevity

#Instead looked at some of the oldest permits that still haven’t been closed

building_permits_2023 %>%
  filter(STATUS == 'Closed', EXPIRATION_DATE < '2010-06-15') %>%
  View()

#filtered for permits that expired before 2010-06-15 and were closed
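
The filters above compare EXPIRATION_DATE as text, which gives the right order only while every value is in year-month-day form. A hedged sketch of the same check with parsed dates follows, on a made-up data frame: the rows and the names `toy`, `expires`, `cutoff`, `old_open` and `expired` are invented for illustration, and the date format is assumed from the sample output.

```r
# Toy permits with the same column names as the data set (rows are invented)
toy <- data.frame(
  STATUS          = c("Open", "Closed", "Open", "Open"),
  EXPIRATION_DATE = c("2009-03-01", "2010-01-15", "2018-07-22", NA),
  stringsAsFactors = FALSE
)

# Parse the text column into Date objects (missing values stay NA)
toy$expires <- as.Date(toy$EXPIRATION_DATE, format = "%Y-%m-%d")
cutoff <- as.Date("2010-06-15")

# Open permits whose expiration date falls before the cutoff
old_open <- toy[!is.na(toy$expires) & toy$STATUS == "Open" & toy$expires < cutoff, ]
print(old_open)

# Share of each status among all permits that expired before the cutoff
expired <- toy[!is.na(toy$expires) & toy$expires < cutoff, ]
print(prop.table(table(expired$STATUS)))
```

Comparing the status shares among expired permits gives a single number to track, which may be easier to reason about than scanning both filtered tables in View().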

by(building_permits_2023$PermitDuration, building_permits_2023$STATUS, mean, na.rm=TRUE) # find average permit duration for each permit status

## building_permits_2023$STATUS: Closed
## [1] 182.2818
## ————————————————————
## building_permits_2023$STATUS: Issued
## [1] 182.1304
## ————————————————————
## building_permits_2023$STATUS: Open
## [1] 178.3673
## ————————————————————
## building_permits_2023$STATUS: Stop Work
## [1] 183.0753

Conclusion

To summarize the results of this week’s assignment, I found no apparent correlation between permit duration and total fees, or between declared valuation and total fees.

The top 5 occupancy types were found to be 1-2FAM, Comm, Multi, 1-3FAM, and Mixed, although there is some overlap in the nomenclature that should be addressed, e.g., Comm vs COMM. Generally, larger occupancy types had larger sq_feet values, which is what one would expect and suggests the field is at least broadly plausible.

Looking into things in the data set that do not make sense, I found that there were five cases of negative declared valuations, which may have just been human error. Also, many expired building permits were still labeled open.

Discrepancies like these may have to be omitted for the data set to be studied properly. The fact that so many permits are still open may be concerning for Boston’s communities, or it may not have much significance.

