Hyun Lee – September 27, 2023
This week, I am digging deeper into the building permits data and starting to create data visualizations that may provide insight into the dynamics of the city.
Again, I read in the building permits data.
building_permits_2023 <- read.csv('Permits.Records.Geocoded.2023.csv')
Looking for patterns in the data, I made a scatter plot of permit duration compared with total fees. Does longer permit duration coincide with greater fees? This does not seem to be the case.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
install.packages("ggplot2")
## Warning: package ‘ggplot2’ is in use and will not be installed
library(ggplot2)
ggplot(data = building_permits_2023) +
geom_point(mapping = aes(x = PermitDuration, y = total_fees))
## Warning: Removed 25984 rows containing missing values (`geom_point()`).
#seemingly no correlation between permit duration and total fees
median(building_permits_2023$total_fees, na.rm = TRUE) # $70
## [1] 70
mean(building_permits_2023$total_fees, na.rm = TRUE) # $4110.77
## [1] 4110.773
max(building_permits_2023$total_fees, na.rm = TRUE) # $625,000,606
## [1] 625000606
min(building_permits_2023$total_fees, na.rm = TRUE) # $0
## [1] 0
#total fees skewed heavily by outliers, permit duration more spread out
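Because total_fees is so heavily skewed, a few extreme values can dominate both the scatter plot and an ordinary correlation coefficient. As a rough check, the sketch below compares a Pearson correlation with a rank-based Spearman correlation, which is less sensitive to outliers. The data frame `toy_permits` and all of its values are invented for illustration; only the column names PermitDuration and total_fees come from the data set.

```r
# Hypothetical stand-in for building_permits_2023; all values are invented
set.seed(1)
toy_permits <- data.frame(
  PermitDuration = c(rexp(200, rate = 1 / 180), 180, NA),
  total_fees     = c(rlnorm(200, meanlog = 4, sdlog = 2), 625000606, 70)
)

# Pearson correlation works on raw values, so one extreme fee can dominate it
r_pearson <- cor(toy_permits$PermitDuration, toy_permits$total_fees,
                 use = "complete.obs")

# Spearman correlation works on ranks, which limits the pull of outliers
r_spearman <- cor(toy_permits$PermitDuration, toy_permits$total_fees,
                  method = "spearman", use = "complete.obs")

print(c(pearson = r_pearson, spearman = r_spearman))
```

On the real data, the same two calls could be run on building_permits_2023 directly. For the plot itself, adding scale_y_log10() to the ggplot call may make the bulk of the points easier to see, although zero fees would need separate handling on a log scale.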
There was also no correlation found between declared valuation and total fees.
ggplot(data = building_permits_2023) +
geom_point(mapping = aes(x = DECLARED_VALUATION, y = total_fees))
## Warning: Removed 10 rows containing missing values (`geom_point()`).
I made a bar graph of the top five occupancy types and their average square footage. Looking at the graph and the results of the by() function, occupancy types for larger buildings do seem to have greater average square footage: Multi averages more than 1-3FAM, which averages more than 1-2FAM, and Comm is far larger than any of the residential types.
In this case, I manually entered the results of the by() function into a data frame, which I then turned into a bar graph. In the future, I hope to convert the by() results directly into a table or vector that I can pipe into ggplot. I can also probably make the bar graph less ugly.
I tried running the following code from ChatGPT, but it failed me 😦
result <- by(building_permits_2023$sq_feet, building_permits_2023$OCCUPANCY, mean, na.rm=TRUE)
result_table <- data.frame(
  Category = names(result), # Extract category names
  Mean = unlist(result) # Extract mean values and convert to a vector
)
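One possible way to avoid entering the by() results by hand is tapply(), which returns a named numeric vector that converts cleanly into a data frame. This is only a sketch: `toy_permits` and its values are made up, and only the column names OCCUPANCY and sq_feet come from the data set.

```r
# Hypothetical stand-in for building_permits_2023; all values are invented
toy_permits <- data.frame(
  OCCUPANCY = c("1-2FAM", "1-2FAM", "Comm", "Comm", "Multi", "Multi"),
  sq_feet   = c(200, 300, 50000, NA, 1000, 1400)
)

# tapply() returns one mean per occupancy type, named by the group label
means <- tapply(toy_permits$sq_feet, toy_permits$OCCUPANCY, mean, na.rm = TRUE)

# Turn the named vector into a two-column data frame
mean_sqfeet <- data.frame(
  Occupancy = names(means),
  Sq_ft     = as.numeric(means)
)
print(mean_sqfeet)
```

A data frame built this way can be piped into ggplot() with geom_col() without retyping any numbers. With dplyr loaded, group_by(OCCUPANCY) followed by summarise(Sq_ft = mean(sq_feet, na.rm = TRUE)) should give an equivalent table.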
library(dplyr)
building_permits_2023 %>%
dplyr::count(OCCUPANCY) %>%
arrange(desc(n))
## OCCUPANCY n
## 1 1-2FAM 182253
## 2 Comm 171000
## 3 Multi 61748
## 4 1-3FAM 60064
## 5 Mixed 23420
## 6 Other 21019
## 7 16762
## 8 1-4FAM 16352
## 9 1Unit 6046
## 10 VacLd 5509
## 11 7More 4792
## 12 1-7FAM 2051
## 13 3unit 712
## 14 2unit 613
## 15 4unit 375
## 16 6unit 299
## 17 5unit 280
## 18 COMM 185
## 19 7unit 76
## 20 4Unit 7
## 21 6Unit 4
## 22 MIXED 2
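The counts above include labels that differ only in capitalization (Comm/COMM, 4unit/4Unit, 6unit/6Unit, Mixed/MIXED) as well as one blank label. A minimal sketch of how such labels might be consolidated before counting is shown below. The sample vector is invented, and treating blank labels as missing is an assumption rather than something the data documentation confirms.

```r
# Hypothetical sample of OCCUPANCY labels, including case variants and a blank
occupancy <- c("Comm", "COMM", "4unit", "4Unit", "Mixed", "MIXED", "1-2FAM", "")

# Upper-case everything so labels that differ only by case collapse together
occupancy_clean <- toupper(trimws(occupancy))

# Assumption: a blank label means the occupancy type was not recorded
occupancy_clean[occupancy_clean == ""] <- NA

# Count the cleaned labels, keeping missing values visible
counts <- sort(table(occupancy_clean, useNA = "ifany"), decreasing = TRUE)
print(counts)
```

Whether labels such as 1-3FAM and 3unit should also be merged is a separate judgment call that would need the data set's documentation.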
mean_sqfeet_by_occupancy <- data.frame(
  Occupancy = c("1-2FAM", "Comm", "Multi", "1-3FAM", "Mixed"),
  Sq_ft = c(235.715, 61337.19, 1182.355, 542.9253, 7689.422)
)
mean_sqfeet_by_occupancy %>%
  ggplot() +
  geom_col(mapping = aes(x = Occupancy, y = Sq_ft))
Some declared valuations are negative, which does not make sense. More research should be done into why this is the case.
summary(building_permits_2023[c('DECLARED_VALUATION', 'total_fees', 'sq_feet', 'PermitDuration')]) # summarize selected columns
## DECLARED_VALUATION total_fees sq_feet PermitDuration
## Min. : -1000000 Min. : 0 Min. :0.000e+00 Min. : 0.0
## 1st Qu.: 1200 1st Qu.: 35 1st Qu.:0.000e+00 1st Qu.: 180.7
## Median : 5029 Median : 70 Median :0.000e+00 Median : 182.4
## Mean : 218397 Mean : 4111 Mean :1.919e+04 Mean : 179.8
## 3rd Qu.: 20187 3rd Qu.: 220 3rd Qu.:0.000e+00 3rd Qu.: 183.4
## Max. :2100000000 Max. :625000606 Max. :1.000e+10 Max. :3838.5
## NA’s :10 NA’s :25984
apply(building_permits_2023[c('DECLARED_VALUATION', 'total_fees', 'sq_feet', 'PermitDuration')], 2, mean, na.rm=TRUE) # alternative method
## DECLARED_VALUATION total_fees sq_feet PermitDuration
## 218397.3994 4110.7734 19187.5936 179.7601
# min declared valuation is -1000000
filter(building_permits_2023, DECLARED_VALUATION < 0) # filter for negative declared valuations
## X.1 PermitNumber WORKTYPE permittypedescr
## 1 18957 A613433 OTHER Amendment to a Long Form
## 2 19184 A687464 INTREN Amendment to a Long Form
## 3 19333 A748553 INTREN Amendment to a Long Form
## 4 19401 A771335 INTREN Amendment to a Long Form
## 5 525332 A781148 OTHER Amendment to a Long Form
## description
## 1 Other
## 2 Renovations – Interior NSC
## 3 Renovations – Interior NSC
## 4 Renovations – Interior NSC
## 5 Other
## NOTES
## 1 Revised HVAC system.;
## 2 Diminished scope at the sixth floor. Less partitions and MEP work.
## 3 Cosmetic Upgrades of the 5th floor Menino Building at Boston Medical Center.; Amendment to ALT543370.
## 4 Amendment to ALT700716 to reduce scope of work. There will no longer be an addition built in the rear it is now just an interior renovation as depicted in new plans.
## 5 Amendment to ERT557131.; Revisions to basement and first floor levels and change in structural engineer of record.
## APPLICANT DECLARED_VALUATION total_fees ISSUED_DATE
## 1 Nicholas Stewart -250000 520 2016-07-29 09:15:31
## 2 Michael Harris -900000 155 2017-04-12 06:35:41
## 3 sean lawton -1000000 1812 2017-09-22 09:33:20
## 4 Ryan Hunt -40000 41 2018-05-21 15:20:55
## 5 Christopher P Desisto -200000 333 2018-01-22 14:00:20
## EXPIRATION_DATE STATUS owner OCCUPANCY
## 1 <NA> Open MATOV ALEX Comm
## 2 2017-10-12 Closed CFS SEAPORT LLC Comm
## 3 2018-03-22 Open BOSTON MEDICAL CENTER CORPORATION Other
## 4 2018-11-21 Closed JCG REALTY LLC 1-2FAM
## 5 2018-07-22 Open THIRTY-1 NORTH BEACON ST LLC MASS LLC Mixed
## sq_feet ADDRESS CITY STATE ZIP Property_ID
## 1 0 1505 Commonwealth AVE Brighton MA 02135 37777
## 2 0 33-39 Farnsworth ST Boston MA 02210 56395
## 3 0 840 Harrison AVE Roxbury MA 02118 166049
## 4 0 299-301 Silver ST South Boston MA 02127 127425
## 5 0 1 Everett ST Allston MA 02134 419828
## GIS_ID parcel_num X Y Land_Parcel_ID TLID Blk_ID_10
## 1 2101830005 2101830005 -71.14154 42.34711 2101830005 85695564 2.502500e+14
## 2 602660010 602660001 -71.04796 42.35172 602660010 85730627 2.502506e+14
## 3 801420000 801420000 -71.07416 42.33481 801420000 85728735 2.502507e+14
## 4 601915000 601915000 -71.04736 42.33581 601915000 85712754 2.502506e+14
## 5 2201752000 2201752000 -71.13874 42.35399 2201752000 85727128 2.502500e+14
## BG_ID_10 CT_ID_10 NSA_NAME BRA_PD newcon
## 1 250250006023 25025000602 Brighton – St Elizabeth’s Allston/Brighton 0
## 2 250250606001 25025060600 D Street/West Broadway South Boston 0
## 3 250250711012 25025071101 South End – Harrison Lenox South End 0
## 4 250250608001 25025060800 D Street/West Broadway South Boston 0
## 5 250250008022 25025000802 Allston Allston/Brighton 0
## addition demo reno PermitDuration government
## 1 1 0 0 NA 0
## 2 1 0 0 182.7252 0
## 3 1 0 0 180.6019 0
## 4 1 0 0 183.3605 0
## 5 1 0 0 180.4164 0
Many building permits are also labeled open even though their expiration date has passed. I examined some of the oldest permits (expiration date before 2010-06-15) and compared those that were open with those that were closed to see whether I could find any patterns or discern why some were closed while others remained open. However, I wasn’t able to find anything significant at this time. There also doesn’t seem to be a relationship between status and permit duration.
building_permits_2023 %>%
  filter(STATUS == 'Open', EXPIRATION_DATE < '2010-06-15') %>%
  View()
#337,080 obs that are open with an expiration date before 2023-01-01, omitted for brevity
#Instead looked at some of the oldest permits that still haven’t been closed
building_permits_2023 %>%
  filter(STATUS == 'Closed', EXPIRATION_DATE < '2010-06-15') %>%
  View()
#filtered for permits that expired before 2010-06-15 and were closed
by(building_permits_2023$PermitDuration, building_permits_2023$STATUS, mean, na.rm=TRUE) # find average permit duration for each permit status
## building_permits_2023$STATUS: Closed
## [1] 182.2818
## ————————————————————
## building_permits_2023$STATUS: Issued
## [1] 182.1304
## ————————————————————
## building_permits_2023$STATUS: Open
## [1] 178.3673
## ————————————————————
## building_permits_2023$STATUS: Stop Work
## [1] 183.0753
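To put a number on how many expired permits are still open, one option is to flag each permit as expired and cross-tabulate that flag against STATUS. The sketch below uses invented rows and an assumed cutoff date of 2023-01-01; only the column names STATUS and EXPIRATION_DATE come from the data set.

```r
# Hypothetical rows; all values are invented for illustration
toy_permits <- data.frame(
  STATUS          = c("Open", "Closed", "Open", "Open", "Closed"),
  EXPIRATION_DATE = c("2009-03-01", "2009-07-15", "2024-01-10", NA, "2018-11-21")
)

# Assumed reference date for deciding whether a permit has expired
cutoff <- as.Date("2023-01-01")

# Parse the text dates so the comparison is a true date comparison
expiry <- as.Date(toy_permits$EXPIRATION_DATE)

# Permits with no expiration date are treated as not expired
toy_permits$expired <- !is.na(expiry) & expiry < cutoff

# Rows are statuses, columns are the expired flag (FALSE / TRUE)
status_by_expired <- table(toy_permits$STATUS, toy_permits$expired)
print(status_by_expired)
```

The same table on the full data set would show what share of each status has passed its expiration date, which may be easier to compare than scanning rows in View().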
Conclusion
To summarize the results of this week’s assignment, I found no apparent correlation between permit duration and total fees, or between declared valuation and total fees.
The top five occupancy types were 1-2FAM, Comm, Multi, 1-3FAM, and Mixed, although some labels overlap and should be consolidated, e.g., Comm vs. COMM. Generally, occupancy types for larger buildings had larger sq_feet values, which suggests the square footage data are at least broadly plausible.
Looking into things in the data set that do not make sense, I found that there were five cases of negative declared valuations, which may have just been human error. Also, many expired building permits were still labeled open.
Discrepancies like these may have to be omitted before the data set can be studied properly. The fact that so many permits are still open may be concerning for Boston’s communities, or it may not have much significance.