Building Latent Constructs: Property Assessment

Last week I proposed creating a latent construct in the property assessment dataset: an Energy Burden Score. This score would combine several fields in the property assessment data as manifest variables to assess how the risk of higher energy burden is distributed across property characteristics. Measuring energy burden directly would require two relatively inaccessible pieces of information: the income of each household and the energy consumption of each household. My proposed latent construct instead estimates the risk of higher energy burden from what I believe are good predictors within the property assessment data, chiefly property characteristics that might raise a family’s energy bill.

Year built, square footage, AC type, and overall condition are some of these potential manifest variables. In the following sections, I begin constructing the latent variable for energy burden in two steps: 1) assigning scores to air conditioning type, year built, and overall condition according to how they might affect energy burden (where higher score = higher burden), and 2) aggregating the dataset to examine the relationships among these scores broadly. Note that these three variables are not the only manifest variables I have identified that might contribute to an effective latent construct.

Loading & Subsetting the Data

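library(dplyr)    # filter()
library(sqldf)    # SQL-style aggregation used later
library(ggplot2)  # plotting used later

padcross <- read.csv(file = "PADCross.Record.2021.csv", header = TRUE, sep = ",", stringsAsFactors = FALSE)

# Remove records whose CITY is in the not_BOS list, then keep residential land-use codes only
not_BOS <- c('DEDHAM', 'READVILLE', 'BROOKLINE', 'CHESTNUT HILL')
padcross <- filter(padcross, !(CITY %in% not_BOS))

# Separate name for the code vector so it is not confused with the resi data frame
resi_codes <- c('A', 'CD', 'R1', 'R2', 'R3', 'R4')
resi <- filter(padcross, LU %in% resi_codes)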

Cleaning & Scoring the Air Conditioning Field
My first manifest variable of interest is whether the property has central air conditioning. Although central air conditioning technically increases overall energy consumption at a property, I hypothesize that it lowers energy burden. Central AC systems are generally more energy efficient than self-installed individual air conditioners, so cooling a home centrally should produce a lower energy bill than cooling it with individual units. As summers become hotter, cooling is nearly a necessity, so for the purposes of this analysis I assume that residents of properties without central AC install their own AC units and may therefore have an increased energy burden.

Cleaning the Entries to Say either YES or NO

resi$AC_TYPE[resi$AC_TYPE == "Y - Yes"] <- "YES"
resi$AC_TYPE[resi$AC_TYPE == ""]<- "NO"
resi$AC_TYPE[resi$AC_TYPE == "N - None"] <- "NO"
resi$AC_TYPE[resi$AC_TYPE == "C - Central AC"] <- "YES"
resi$AC_TYPE[resi$AC_TYPE == "D - Ductless AC"] <- "YES"
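The same recoding can also be expressed as a single lookup table, which keeps the raw-label-to-category mapping in one place. The following is a minimal sketch on toy data, not the code used for the results in this post; `ac_lookup`, `toy`, and `AC_CLEAN` are hypothetical names, and the labels are assumed to be exactly the five values recoded above.

```r
# Hypothetical lookup table: raw AC_TYPE label -> cleaned YES/NO value.
# The blank label "" is mapped to "NO", mirroring the recoding lines above.
ac_lookup <- data.frame(
  raw   = c("Y - Yes", "C - Central AC", "D - Ductless AC", "N - None", ""),
  clean = c("YES", "YES", "YES", "NO", "NO"),
  stringsAsFactors = FALSE
)

# Toy stand-in for resi$AC_TYPE (illustrative values, not real records).
toy <- data.frame(
  AC_TYPE = c("C - Central AC", "N - None", "", "D - Ductless AC", "Y - Yes"),
  stringsAsFactors = FALSE
)

# match() finds the row of the lookup table for each raw label;
# any label missing from the table becomes NA instead of being kept silently.
toy$AC_CLEAN <- ac_lookup$clean[match(toy$AC_TYPE, ac_lookup$raw)]

# Same scoring rule as in the post: YES = 0, NO = 1, anything else = NA.
toy$acscore <- ifelse(toy$AC_CLEAN == "YES", 0, ifelse(toy$AC_CLEAN == "NO", 1, NA))

print(toy)
```

One design consequence: an unexpected label surfaces as `NA` in `AC_CLEAN`, so `table(toy$AC_CLEAN, useNA = "ifany")` gives a quick check that nothing was missed.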

Establishing an AC Score

resi$acscore <- ifelse(resi$AC_TYPE == 'YES', 0, ifelse(resi$AC_TYPE == 'NO',1,NA))

Aggregating AC Score by City and Census Tract ID

ac_burden <- aggregate(resi$acscore, list(resi$CITY, resi$CT_ID_10), mean, na.rm=TRUE)

colnames(ac_burden) = c("city","CT_ID", "acscore")

print(ac_burden)
## city CT_ID acscore
## 1 ALLSTON 25025000100 0.79448622
## 2 BRIGHTON 25025000100 0.62125341
## 3 ALLSTON 25025000201 0.50000000
## 4 BRIGHTON 25025000201 0.83251232
## 5 BRIGHTON 25025000202 0.85078910
## 6 ALLSTON 25025000301 0.00000000
## 7 BRIGHTON 25025000301 0.64066496
## 8 BRIGHTON 25025000302 0.52039801
## 9 ALLSTON 25025000401 0.88888889
## 10 BRIGHTON 25025000401 0.89840470
## 11 BRIGHTON 25025000402 0.77590090
## 12 ALLSTON 25025000502 0.83333333
## 13 BOSTON 25025000502 1.00000000
## 14 BRIGHTON 25025000502 0.67002237
## 15 BRIGHTON 25025000503 0.96384040
## 16 ALLSTON 25025000504 1.00000000
## 17 BOSTON 25025000504 1.00000000
## 18 BRIGHTON 25025000504 0.84811166
## 19 ALLSTON 25025000601 0.54237288
## 20 BRIGHTON 25025000601 0.81559220
## 21 ALLSTON 25025000602 0.80995475
## 22 BRIGHTON 25025000602 0.25274725
## 23 ALLSTON 25025000701 0.80727273
## 24 BRIGHTON 25025000701 0.86666667
## 25 ALLSTON 25025000703 0.97549020
## 26 ALLSTON 25025000704 0.81334623
## 27 ALLSTON 25025000802 0.68841912
## 28 BRIGHTON 25025000802 0.88888889
## 29 ALLSTON 25025000803 0.90909091
## 30 BOSTON 25025010103 1.00000000
## 31 BOSTON 25025010104 0.37998056
## 32 BOSTON 25025010203 0.99083770
## 33 BOSTON 25025010204 0.70800000
## 34 BOSTON 25025010300 0.88194444
## 35 BOSTON 25025010403 0.61832061
## 36 BOSTON 25025010404 0.54280822
## 37 BOSTON 25025010405 0.93159609
## 38 BOSTON 25025010408 0.86397059
## 39 BOSTON 25025010500 0.40525328
## 40 BOSTON 25025010600 0.36091298
## 41 BOSTON 25025010701 0.55326877
## 42 BOSTON 25025010702 0.52508961
## 43 BOSTON 25025010801 0.53367125
## 44 BOSTON 25025010802 0.56144890
## 45 BOSTON 25025020101 0.57927171
## 46 BOSTON 25025020200 0.75961538
## 47 BOSTON 25025020301 0.01039501
## 48 BOSTON 25025020302 0.62962963
## 49 BOSTON 25025020303 0.01707317
## 50 BOSTON 25025030100 0.79044118
## 51 BOSTON 25025030200 0.43750000
## 52 BOSTON 25025030300 0.23091725
## 53 BOSTON 25025030400 0.66877971
## 54 BOSTON 25025030500 0.26476190
## 55 CHARLESTOWN 25025040100 0.39975248
## 56 CHARLESTOWN 25025040200 0.49056604
## 57 CHARLESTOWN 25025040300 0.53929539
## 58 CHARLESTOWN 25025040401 0.50079239
## 59 CHARLESTOWN 25025040600 0.41795367
## 60 CHARLESTOWN 25025040801 0.06300115
## 61 BOSTON 25025050101 1.00000000
## 62 EAST BOSTON 25025050101 0.83381924
## 63 EAST BOSTON 25025050200 0.74101611
## 64 EAST BOSTON 25025050300 0.55223881
## 65 EAST BOSTON 25025050400 0.72874494
## 66 EAST BOSTON 25025050500 0.68211921
## 67 EAST BOSTON 25025050600 0.55454545
## 68 EAST BOSTON 25025050700 0.77259036
## 69 EAST BOSTON 25025050901 0.79629630
## 70 EAST BOSTON 25025051000 0.74246575
## 71 EAST BOSTON 25025051101 0.68473351
## 72 EAST BOSTON 25025051200 0.60882801
## 73 BOSTON 25025060101 0.00000000
## 74 SOUTH BOSTON 25025060101 0.52508651
## 75 SOUTH BOSTON 25025060200 0.53902798
## 76 SOUTH BOSTON 25025060301 0.52930728
## 77 DORCHESTER 25025060400 0.00000000
## 78 SOUTH BOSTON 25025060400 0.51543793
## 79 SOUTH BOSTON 25025060501 0.35502211
## 80 BOSTON 25025060600 0.01298701
## 81 SOUTH BOSTON 25025060600 0.28832630
## 82 BOSTON 25025060800 0.00000000
## 83 SOUTH BOSTON 25025060800 0.42375785
## 84 DORCHESTER 25025061000 1.00000000
## 85 SOUTH BOSTON 25025061000 0.42995169
## 86 SOUTH BOSTON 25025061101 0.50000000
## 87 BOSTON 25025061200 0.03103448
## 88 SOUTH BOSTON 25025061200 0.23781213
## 89 BOSTON 25025070101 0.04310345
## 90 BOSTON 25025070200 0.19365079
## 91 BOSTON 25025070300 0.40133038
## 92 BOSTON 25025070402 0.01587302
## 93 BOSTON 25025070500 0.46691635
## 94 BOSTON 25025070600 0.44468085
## 95 BOSTON 25025070700 0.41422594
## 96 BOSTON 25025070800 0.52848723
## 97 ROXBURY 25025070800 1.00000000
## 98 SOUTH BOSTON 25025070800 1.00000000
## 99 BOSTON 25025070900 0.56363636
## 100 BOSTON 25025071101 0.41666667
## 101 BOSTON 25025071201 0.13935681
## 102 BOSTON 25025080100 1.00000000
## 103 ROXBURY 25025080100 0.91039427
## 104 DORCHESTER 25025080300 1.00000000
## 105 ROXBURY 25025080300 0.84645669
## 106 BOSTON 25025080401 0.26086957
## 107 ROXBURY 25025080401 0.69148936
## 108 ROXBURY CROSSIN 25025080401 1.00000000
## 109 BOSTON 25025080500 1.00000000
## 110 ROXBURY 25025080500 1.00000000
## 111 ROXBURY CROSSIN 25025080500 0.80733945
## 112 BOSTON 25025080601 0.02564103
## 113 ROXBURY 25025080601 0.45679012
## 114 ROXBURY CROSSIN 25025080601 0.37500000
## 115 ROXBURY CROSSIN 25025080801 0.83333333
## 116 BOSTON 25025080900 1.00000000
## 117 ROXBURY CROSSIN 25025080900 0.73937677
## 118 BOSTON 25025081001 0.64634146
## 119 ROXBURY 25025081001 1.00000000
## 120 ROXBURY CROSSIN 25025081001 0.71111111
## 121 BOSTON 25025081100 0.97169811
## 122 JAMAICA PLAIN 25025081100 0.71962617
## 123 ROXBURY CROSSIN 25025081100 0.48115942
## 124 JAMAICA PLAIN 25025081200 0.78542510
## 125 JAMAICA PLAIN 25025081300 1.00000000
## 126 ROXBURY 25025081300 0.75423729
## 127 ROXBURY 25025081400 0.69968553
## 128 ROXBURY CROSSIN 25025081400 0.86585366
## 129 ROXBURY 25025081500 0.61737089
## 130 BOSTON 25025081700 1.00000000
## 131 ROXBURY 25025081700 0.81894737
## 132 DORCHESTER 25025081800 0.93750000
## 133 ROXBURY 25025081800 0.84567901
## 134 DORCHESTER 25025081900 0.87434555
## 135 ROXBURY 25025081900 0.88333333
## 136 DORCHESTER 25025082000 0.88009050
## 137 DORCHESTER 25025082100 0.92307692
## 138 DORCHESTER 25025090100 0.92970123
## 139 DORCHESTER 25025090200 0.85520362
## 140 DORCHESTER 25025090300 0.90697674
## 141 DORCHESTER 25025090400 0.86932849
## 142 ROXBURY 25025090400 0.76470588
## 143 DORCHESTER 25025090600 0.71428571
## 144 ROXBURY 25025090600 0.87066246
## 145 DORCHESTER 25025090700 0.57479508
## 146 SOUTH BOSTON 25025090700 1.00000000
## 147 DORCHESTER 25025091001 0.76569038
## 148 DORCHESTER 25025091100 0.67067530
## 149 DORCHESTER 25025091200 0.71694915
## 150 DORCHESTER 25025091300 0.80780781
## 151 ROXBURY 25025091300 1.00000000
## 152 DORCHESTER 25025091400 0.90654206
## 153 BOSTON 25025091500 0.85714286
## 154 DORCHESTER 25025091500 0.74862637
## 155 DORCHESTER 25025091600 0.82086614
## 156 BOSTON 25025091700 0.87500000
## 157 DORCHESTER 25025091700 0.92558140
## 158 DORCHESTER 25025091800 0.91860465
## 159 DORCHESTER 25025091900 0.89889706
## 160 DORCHESTER 25025092000 0.92465753
## 161 BOSTON 25025092101 0.72727273
## 162 DORCHESTER 25025092101 0.68792711
## 163 DORCHESTER 25025092200 0.85905045
## 164 DORCHESTER 25025092300 0.92607803
## 165 MATTAPAN 25025092300 1.00000000
## 166 DORCHESTER 25025092400 0.88984509
## 167 DORCHESTER 25025100100 0.84253247
## 168 DORCHESTER 25025100200 0.84347826
## 169 DORCHESTER 25025100300 0.90136571
## 170 MATTAPAN 25025100300 1.00000000
## 171 DORCHESTER 25025100400 0.87711864
## 172 DORCHESTER 25025100500 0.79511278
## 173 ROSLINDALE 25025100500 0.00000000
## 174 BOSTON 25025100601 0.87500000
## 175 DORCHESTER 25025100601 0.83633516
## 176 BOSTON 25025100603 0.80000000
## 177 DORCHESTER 25025100603 0.59550562
## 178 BOSTON 25025100700 0.63636364
## 179 DORCHESTER 25025100700 0.77777778
## 180 BOSTON 25025100800 0.96000000
## 181 DORCHESTER 25025100800 0.78488372
## 182 DORCHESTER 25025100900 0.53512397
## 183 MATTAPAN 25025100900 0.89358372
## 184 HYDE PARK 25025101001 1.00000000
## 185 MATTAPAN 25025101001 0.86044444
## 186 MATTAPAN 25025101002 0.91538462
## 187 MATTAPAN 25025101101 0.90643275
## 188 MATTAPAN 25025101102 0.90578512
## 189 JAMAICA PLAIN 25025110103 0.75296443
## 190 ROSLINDALE 25025110103 0.88725490
## 191 ROSLINDALE 25025110201 0.82552083
## 192 JAMAICA PLAIN 25025110301 0.84375000
## 193 ROSLINDALE 25025110301 0.73486088
## 194 JAMAICA PLAIN 25025110401 0.00000000
## 195 ROSLINDALE 25025110401 0.69544365
## 196 JAMAICA PLAIN 25025110403 1.00000000
## 197 ROSLINDALE 25025110403 0.76794258
## 198 JAMAICA PLAIN 25025110501 1.00000000
## 199 ROSLINDALE 25025110501 0.84355828
## 200 WEST ROXBURY 25025110501 0.64285714
## 201 JAMAICA PLAIN 25025110502 1.00000000
## 202 ROSLINDALE 25025110502 0.80430712
## 203 WEST ROXBURY 25025110502 1.00000000
## 204 JAMAICA PLAIN 25025110601 0.53333333
## 205 ROSLINDALE 25025110601 0.85714286
## 206 WEST ROXBURY 25025110601 0.75131579
## 207 JAMAICA PLAIN 25025110607 1.00000000
## 208 ROSLINDALE 25025110607 0.77740642
## 209 WEST ROXBURY 25025110607 0.67741935
## 210 JAMAICA PLAIN 25025120103 0.60634328
## 211 JAMAICA PLAIN 25025120104 0.78386167
## 212 JAMAICA PLAIN 25025120105 0.48143982
## 213 JAMAICA PLAIN 25025120201 0.70286396
## 214 ROSLINDALE 25025120201 1.00000000
## 215 JAMAICA PLAIN 25025120301 0.66843783
## 216 ROXBURY 25025120301 0.89051095
## 217 JAMAICA PLAIN 25025120400 0.62946429
## 218 JAMAICA PLAIN 25025120500 0.60737527
## 219 JAMAICA PLAIN 25025120600 0.74254317
## 220 BOSTON 25025120700 0.82000000
## 221 JAMAICA PLAIN 25025120700 0.78877005
## 222 BRIGHTON 25025130100 0.00000000
## 223 JAMAICA PLAIN 25025130100 0.00000000
## 224 ROSLINDALE 25025130100 0.00000000
## 225 WEST ROXBURY 25025130100 0.69854586
## 226 WEST ROXBURY 25025130200 0.78102664
## 227 WEST ROXBURY 25025130300 0.77908497
## 228 WEST ROXBURY 25025130402 0.71589487
## 229 JAMAICA PLAIN 25025130404 1.00000000
## 230 ROSLINDALE 25025130404 0.00000000
## 231 WEST ROXBURY 25025130404 0.72302558
## 232 HYDE PARK 25025130406 1.00000000
## 233 WEST ROXBURY 25025130406 0.62297496
## 234 BOSTON 25025140102 0.00000000
## 235 HYDE PARK 25025140102 0.70928463
## 236 MATTAPAN 25025140102 0.00000000
## 237 HYDE PARK 25025140105 0.88429752
## 238 JAMAICA PLAIN 25025140105 1.00000000
## 239 ROSLINDALE 25025140105 0.81016949
## 240 ROSLINDALE 25025140106 0.81707317
## 241 HYDE PARK 25025140107 0.85218703
## 242 HYDE PARK 25025140201 0.84470588
## 243 HYDE PARK 25025140202 0.83211679
## 244 ROSLINDALE 25025140202 1.00000000
## 245 BRIGHTON 25025140300 0.00000000
## 246 HYDE PARK 25025140300 0.78913738
## 247 BOSTON 25025140400 1.00000000
## 248 HYDE PARK 25025140400 0.85544890
## 249 MATTAPAN 25025140400 0.86597938
## 250 ROSLINDALE 25025140400 0.88732394
## 251 JAMAICA PLAIN 25025981000 1.00000000
## 252 DORCHESTER 25025981100 1.00000000
## 253 MATTAPAN 25025981100 0.05000000
## 254 EAST BOSTON 25025981300 0.04135338

This table displays AC Score aggregated by City and Census Tract ID. Because the score is 1 for a property without AC and 0 for a property with it, each mean is the proportion of scored properties in that group that may have an increased energy burden due to lacking central AC. For example, based on the City field alone, properties in Allston are more burdened overall than those in South Boston. However, there are differences within these neighborhoods at the census tract level. In Allston specifically, AC Scores range from 0 to 1 across tracts. Neighborhoods can be large and contain a diversity of properties, so breaking them down by census tract is necessary to learn more about the relationships between my variables. (This also avoids over-smoothing my data.)
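A mean of exactly 0 or 1 may simply reflect a group with very few scored properties, so it can help to report the group size next to each mean. The sketch below shows one way to do that in base R on toy data; `toy`, `means`, `counts`, and `n_scored` are hypothetical names, and the values are illustrative rather than taken from the assessment file.

```r
# Toy stand-in for resi (illustrative values, not real records).
toy <- data.frame(
  CITY     = c("ALLSTON", "ALLSTON", "ALLSTON", "BOSTON", "BRIGHTON", "BRIGHTON"),
  CT_ID_10 = c("25025000100", "25025000100", "25025000100",
               "25025000502", "25025000100", "25025000100"),
  acscore  = c(1, 1, 0, 1, 0, NA),
  stringsAsFactors = FALSE
)

# The formula interface drops rows with a missing acscore before grouping,
# so both the mean and the count refer to scored properties only.
means  <- aggregate(acscore ~ CITY + CT_ID_10, data = toy, FUN = mean)
counts <- aggregate(acscore ~ CITY + CT_ID_10, data = toy, FUN = length)
names(counts)[3] <- "n_scored"

# One row per CITY x tract group: proportion without AC plus group size.
ac_summary <- merge(means, counts, by = c("CITY", "CT_ID_10"))
print(ac_summary)
```

Sorting or filtering on `n_scored` then makes it easy to set aside groups that are too small to interpret.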

Scoring the Year Built Field

Construction period has a large impact on building energy efficiency. Older buildings are generally more poorly insulated, more likely to have less energy efficient heating and cooling systems, and more prone to energy leakage. Here, I hypothesize that living in a newer building will result in a lower energy burden. I use the YR_BUILT field to define six construction periods, with older periods receiving higher scores.

On a related note, the property assessment data does have a field for remodel year, but unless a remodel has almost entirely reconstructed, re-insulated, and re-clad a building, it will have less of an impact on energy intensity at the aggregate level than year built. Remodel year is also extensively missing in the dataset, and homes that have been remodeled could easily lack this information. Future projects using builtscore may still consider introducing remodel year.

Establishing a Score

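# Older construction periods receive higher scores (higher score = higher burden).
# The first range ends at 1850 so that it does not overlap the 1851-1899 period.
resi$builtscore <- ifelse(resi$YR_BUILT %in% 1700:1850, 6,
                   ifelse(resi$YR_BUILT %in% 1851:1899, 5,
                   ifelse(resi$YR_BUILT %in% 1900:1950, 4,
                   ifelse(resi$YR_BUILT %in% 1951:1980, 3,
                   ifelse(resi$YR_BUILT %in% 1981:2000, 2,
                   ifelse(resi$YR_BUILT %in% 2001:2021, 1, NA))))))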
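One way to guarantee that construction periods never overlap is to define the period boundaries once and let `cut()` assign each year to exactly one interval. The sketch below assumes six periods ending in 1850, 1899, 1950, 1980, 2000, and 2021; `breaks`, `scores`, and `score_year_built` are hypothetical names introduced only for illustration.

```r
# Assumed period boundaries. cut() builds intervals of the form (lower, upper],
# so 1850 falls in the oldest period and 1851 in the next one.
breaks <- c(1699, 1850, 1899, 1950, 1980, 2000, 2021)

# Score for each period, oldest first (higher score = higher assumed burden).
scores <- c(6, 5, 4, 3, 2, 1)

score_year_built <- function(yr) {
  # labels = FALSE returns the integer index of the interval,
  # or NA when the year is missing or outside all intervals.
  period <- cut(yr, breaks = breaks, labels = FALSE)
  scores[period]
}

# Boundary years, an implausible year (0), and a missing value.
toy_years <- c(1700, 1850, 1851, 1899, 1900, 1950, 1951, 1980,
               1981, 2000, 2001, 2021, 0, NA)
print(data.frame(YR_BUILT = toy_years, builtscore = score_year_built(toy_years)))
```

Because each year can land in only one interval, a mistake in one boundary cannot make a later period unreachable.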

Aggregating Year Built Score by Neighborhood

built <- sqldf("select CITY as Neighborhood,
           avg(builtscore) as MeanYearBuiltScore
            from resi
           group by CITY")
built
## Neighborhood MeanYearBuiltScore
## 1 ALLSTON 3.844673
## 2 BOSTON 4.049940
## 3 BRIGHTON 3.720896
## 4 CHARLESTOWN 4.502111
## 5 DORCHESTER 3.881207
## 6 EAST BOSTON 3.741800
## 7 HYDE PARK 3.615681
## 8 JAMAICA PLAIN 3.814107
## 9 MATTAPAN 3.728725
## 10 ROSLINDALE 3.727821
## 11 ROXBURY 3.960935
## 12 ROXBURY CROSSIN 3.671300
## 13 SOUTH BOSTON 3.823661
## 14 WEST ROXBURY 3.598462

The average of these scores, grouped by neighborhood, reveals some interesting points about where newer buildings may be distributed across Boston. Boston as a whole has a large proportion of properties built in the early 1900s, which is evidenced here by the mean scores in each neighborhood generally clustering around 4, the score I assigned to properties built between 1900 and 1950. However, some neighborhoods have more old properties than others, which may affect the energy burden of residents in these areas to a greater extent. The highest scores are in Charlestown, Boston (central), and Roxbury.

Scoring the Overall Condition Field
Similar to the justification of scoring the year built field, the overall condition of a property will have an impact on potential energy burden. Poorly maintained properties in below average condition may be associated with lower energy efficiency and higher energy burden due to increased bills.

# Higher score = higher assumed burden; labels that match none of the lists become NA
resi$condscore <- ifelse(resi$OVERALL_COND %in% c('E - Excellent', 'EX - Excellent'), 1,
                  ifelse(resi$OVERALL_COND %in% c('VG - Very Good'), 2,
                  ifelse(resi$OVERALL_COND %in% c('G - Good'), 3,
                  ifelse(resi$OVERALL_COND %in% c('F - Fair'), 4,
                  ifelse(resi$OVERALL_COND %in% c('A - Average', 'AVG - Default - Average'), 5,
                  ifelse(resi$OVERALL_COND %in% c('P - Poor', 'US - Unsound'), 6, NA))))))

Aggregated: Average Year Built and Average Condition per Neighborhood

Using the aggregated scores, I plotted the average Year Built score against the average condition score, labeling each point with its neighborhood name (the CITY field) and showing the average AC score as a color gradient. Because the query below groups by CITY only, each point summarizes every census tract in that neighborhood.
While no direct conclusions can be drawn from such a broad look, there are still some interesting insights.

Charlestown has the lowest burden from overall condition and a low AC burden, but it has the highest Year Built score, meaning the oldest building stock (visible as its position furthest right on the x axis). Allston has one of the higher scores for condition (recall that a higher score = worse condition), but it still ranks below neighborhoods such as Dorchester, Roxbury, Charlestown, and Boston on the Year Built score.

Burden <- sqldf("select CITY, avg(condscore), avg(acscore), avg(builtscore) from resi group by CITY")

colnames(Burden) <- c("city", "Avg_Condition_Score", "Avg_AC_Score", "Avg_YrBuilt_Score")
print(Burden)
## city Avg_Condition_Score Avg_AC_Score Avg_YrBuilt_Score
## 1 ALLSTON 4.766767 0.7759134 3.844673
## 2 BOSTON 4.355235 0.4302501 4.049940
## 3 BRIGHTON 4.780862 0.7588687 3.720896
## 4 CHARLESTOWN 4.209968 0.3943986 4.502111
## 5 DORCHESTER 4.593674 0.8011020 3.881207
## 6 EAST BOSTON 4.619138 0.6849414 3.741800
## 7 HYDE PARK 4.743108 0.8163022 3.615681
## 8 JAMAICA PLAIN 4.422317 0.6742627 3.814107
## 9 MATTAPAN 4.778459 0.8769271 3.728725
## 10 ROSLINDALE 4.658074 0.7822710 3.727821
## 11 ROXBURY 4.498523 0.7687483 3.960935
## 12 ROXBURY CROSSIN 4.467372 0.6604235 3.671300
## 13 SOUTH BOSTON 4.421399 0.4198359 3.823661
## 14 WEST ROXBURY 4.612523 0.7305477 3.598462
ggplot(Burden, aes(x = Avg_YrBuilt_Score, y = Avg_Condition_Score, color = Avg_AC_Score)) +
  geom_point() +
  geom_text(aes(label = city), size = 2, nudge_y = 0.022) +
  ggtitle("Average Overall Condition Score and Average Year Built Score by Neighborhood")

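[Figure: scatter plot of Avg_YrBuilt_Score (x axis) against Avg_Condition_Score (y axis), with points colored by Avg_AC_Score and labeled by city]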
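The Burden query groups by CITY alone, so it returns one row per neighborhood. To look at the same three averages for each census tract within a neighborhood, the grouping would need both fields. The following is a minimal base R sketch on toy data, offered as one possible approach rather than the method used above; `toy` and `tract_burden` are hypothetical names and the values are illustrative.

```r
# Toy stand-in for resi with the three scores already assigned
# (illustrative values, not real records).
toy <- data.frame(
  CITY       = c("ALLSTON", "ALLSTON", "ALLSTON", "BRIGHTON", "BRIGHTON"),
  CT_ID_10   = c("25025000100", "25025000100", "25025000201",
                 "25025000201", "25025000201"),
  condscore  = c(5, 3, 4, 5, 5),
  acscore    = c(1, 0, 1, 1, 0),
  builtscore = c(4, 6, 3, 4, 2),
  stringsAsFactors = FALSE
)

# cbind() on the left-hand side averages all three scores at once;
# CITY + CT_ID_10 on the right-hand side yields one row per city-tract pair.
# Note: the formula interface drops any row with a missing value in any score.
tract_burden <- aggregate(cbind(condscore, acscore, builtscore) ~ CITY + CT_ID_10,
                          data = toy, FUN = mean)
print(tract_burden)
```

A data frame shaped like `tract_burden` could be passed to the same ggplot call after renaming its columns, which would put one point on the plot per city-tract pair.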

