City Exploration 3 : Urban Hike at MIT / Cambridge

Introduction

In the bustling landscape of urban mobility, understanding the factors influencing the distances covered in bike-sharing rides is paramount for optimizing transportation services. This exploration delves into the dynamic relationship between age, gender, and trip distances using Blue Bikes data. Focusing on the iconic MIT/Cambridge area, this study aims to unravel patterns that transcend numerical correlations, offering insights into the nuanced interplay between demographic characteristics and commuting behaviors. As we navigate the statistical intricacies, we seek to bridge the gap between data analysis and real-world urban experiences, with MIT/Cambridge serving as a microcosm for these inquiries.

Rationale for Choosing MIT / Cambridge

The rationale for selecting MIT / Cambridge stems from a regression of trip distances on age and gender, combined with station-level averages of trip distance. When we examined the station-level data, the MIT / Cambridge area stood out because its starting stations had the smallest average distances covered. Notably, stations like Discovery Park, MIT Vassar St, MIT Stata Center at Vassar St / Main, MIT at Mass Ave / Amherst and MIT Pacific St at Purrington St exhibited remarkably short average trip distances. This called for a closer examination of the MIT / Cambridge locale as a lens through which to understand the relationship between demographic variables and commuting behaviors.
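
The station-level ranking described above comes down to a grouped mean followed by a sort. The sketch below is a minimal illustration in base R on a made-up table. The column names `start_name` and `distances` follow the code used later in this post, while the station labels and numbers are invented for the example and are not Blue Bikes data.

```r
# Toy trip table: invented values, for illustration only (not Blue Bikes data)
trips <- data.frame(
  start_name = c("Station A", "Station A", "Station B",
                 "Station B", "Station C", "Station C"),
  distances  = c(0.8, 1.2, 2.5, 3.5, 1.5, 1.7)
)

# Mean trip distance per starting station
station_means <- aggregate(distances ~ start_name, data = trips, FUN = mean)
names(station_means)[2] <- "mean_distance"

# Sort ascending so the stations with the shortest average trips come first
station_means <- station_means[order(station_means$mean_distance), ]
print(station_means)
```

With real data, the first rows of `station_means` would be the candidates for a closer look on the ground.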

Data Analysis Discoveries

In the preliminary exploration of the Blue Bikes dataset, we examined the correlations among key variables. The analysis showed statistically significant but weak correlations between the distance of bike-sharing trips (‘distances’) and two demographic factors: age and gender. The correlation coefficient between distances and age was -0.1077, a mild negative association: on average, as age increases, the distance covered tends to decrease slightly. The correlation between distances and gender was 0.0346, a very weak positive correlation whose sign depends on how gender is coded numerically. The p-values indicate that both correlations are statistically significant, which means they are distinguishable from zero, not that they are strong. This groundwork motivates the regression analysis that follows, which examines the joint effect of age and gender on the distances covered in bike-sharing rides.

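# Load dplyr for the pipe, the join and the grouping steps
library(dplyr)
# Read Blue Bikes trip and station data
bike_data <- read.csv("202001-bluebikes-tripdata.csv")
stations <- read.csv("current_bluebikes_stations.csv")
# The derived columns used below (start_name, distances, age) are assumed
# to have been added to bike_data in a step not shown here
# Join station attributes onto each trip by starting station name
joined_data <- bike_data %>%
  left_join(stations, by = c("start_name" = "Name"))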
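# Correlation results
library(Hmisc)   # rcorr()
library(GGally)  # ggpairs()
# Keep one row per distinct (distances, age, gender) combination.
# dplyr:: is explicit because Hmisc also exports a summarize() function.
correlation_result <- joined_data %>%
  group_by(distances, age, gender) %>%
  dplyr::summarize(total = n(), .groups = "drop")
# Convert variables to numeric values before computing correlations
correlation_result$distances <- as.numeric(correlation_result$distances)
correlation_result$gender <- as.numeric(correlation_result$gender)
correlation_result$age <- as.numeric(correlation_result$age)
rcorr_matrix <- correlation_result %>%
  select(distances, age, gender) %>%
  as.matrix() %>%
  rcorr()

rcorr_matrix$r  # correlation coefficients
rcorr_matrix$n  # number of observations used
rcorr_matrix$P  # p-values
# Correlation plot
selected_columns <- c(1, 2, 3)
ggpairs(data = correlation_result, columns = selected_columns)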
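
A weak correlation can still be statistically significant when the sample is large. The sketch below applies the standard t-test for a Pearson correlation to the reported coefficient of -0.1077. The sample size `n` is a hypothetical placeholder, since this post does not report the number of observations.

```r
# Reported correlation between distances and age (from the text above)
r <- -0.1077

# Hypothetical sample size: the post does not state n, so this is an assumption
n <- 10000

# t statistic for testing H0: true correlation = 0, with n - 2 degrees of freedom
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

# Share of variance in one variable linearly associated with the other
r_squared <- r^2

cat("t =", t_stat, " p =", p_value, " r^2 =", r_squared, "\n")
```

Under this assumed `n` the p-value falls far below 0.05, yet `r_squared` is only about 0.012, so age is linearly associated with roughly 1% of the variation in distance.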

In examining the overall Blue Bikes dataset, a linear regression was conducted to understand the relationships between bike-sharing trip distances and demographic variables, namely age and gender. The model revealed statistically significant but weak associations. The regression coefficients indicated that, on average, being male was associated with longer trip distances, while increasing age showed a slight tendency for shorter distances. However, the low R-squared value suggested that the model, incorporating age and gender, explained only a small proportion of the variance in trip distances. This implies that other unexplored factors contribute to the observed variations in bike-sharing trip lengths.

The observed results align with intuitive expectations about bike-sharing behavior. The positive coefficient for ‘genderMale’ (0.303894) suggests that, holding age constant, males cover approximately 0.3 distance units more per trip than females. This is consistent with the common understanding that, on average, males tend to have greater physical strength and endurance than females. The negative coefficient of -0.014256 for ‘age’ indicates that, holding gender constant, the estimated distance decreases by about 0.014 units for each additional year of age. This matches the anticipated trend that younger individuals generally have greater endurance and physical fitness than older ones, so older riders may have slightly less capacity for longer rides in the bike-sharing system. These intuitive connections between the coefficients and real-world characteristics make the findings of the regression model more plausible, although the low R-squared means they describe only a small part of the variation in trip distance.

# Perform regression
reg <- lm(distances ~ gender + age, data = correlation_result)
summary(reg)
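
To make the coefficient names and the low R-squared concrete, the sketch below fits the same model form on simulated data. Everything here is synthetic: the sample size, the intercept and the noise level are assumptions chosen for illustration, and only the two slopes are taken from the coefficients quoted above.

```r
set.seed(42)

# Simulated riders (synthetic data, not Blue Bikes records)
n <- 5000
sim <- data.frame(
  age    = sample(18:75, n, replace = TRUE),
  gender = factor(sample(c("Female", "Male"), n, replace = TRUE),
                  levels = c("Female", "Male"))  # Female is the reference level
)

# Generate distances using the slopes quoted in the text; the intercept (2.5)
# and the noise standard deviation (2) are assumptions
sim$distances <- 2.5 + 0.303894 * (sim$gender == "Male") -
  0.014256 * sim$age + rnorm(n, sd = 2)

reg_sim <- lm(distances ~ gender + age, data = sim)

# With gender stored as a factor, R names the dummy coefficient "genderMale"
print(coef(reg_sim))

# Proportion of variance explained by the model
print(summary(reg_sim)$r.squared)

# Implied difference between a 60-year-old and a 20-year-old of the same gender
print(40 * coef(reg_sim)[["age"]])
```

Using the quoted slope directly, a 40-year age gap corresponds to 40 × -0.014256 ≈ -0.57 distance units, which is small next to the trip-to-trip noise. That gap between a real but small effect and large unexplained variation is what a low R-squared expresses.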

Cambridge-Specific Analysis

In contrast to the broader analysis encompassing all districts in Boston, the Cambridge-only analysis showed some differences in the correlation and regression results. The association between trip distances and age was weak and negative in both cases, but its magnitude was smaller in Cambridge: the age coefficient was -0.014 for the entire dataset and -0.0024 for Cambridge. The correlation with gender in Cambridge, although still weak, had the opposite sign, indicating that being male was associated with shorter trip distances. This is a deviation from the weak positive correlation observed across all districts in Boston. The regression results reinforced these distinctions: the estimated effects of gender and age on trip distance in Cambridge differ from those in the overall dataset. The disparities between the two analyses highlight the importance of considering local contexts and variations when interpreting the factors influencing transportation patterns.
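
Comparing the full dataset with Cambridge amounts to fitting the same model on a subset of rows. The sketch below assumes the joined table carries a column identifying the municipality of the start station, here called `District`. That column name, the helper `fit_coefs` and all values are assumptions for illustration, not the data or code behind the results above.

```r
set.seed(1)

# Synthetic joined table (invented values); `District` is an assumed column name
n <- 2000
joined_toy <- data.frame(
  District = sample(c("Boston", "Cambridge", "Somerville"), n, replace = TRUE),
  age      = sample(18:75, n, replace = TRUE),
  gender   = factor(sample(c("Female", "Male"), n, replace = TRUE))
)
joined_toy$distances <- 2 - 0.01 * joined_toy$age + rnorm(n)

# Hypothetical helper: fit the model and return its coefficients
fit_coefs <- function(df) coef(lm(distances ~ gender + age, data = df))

# Same model on all rows and on the Cambridge rows only
coefs_all       <- fit_coefs(joined_toy)
coefs_cambridge <- fit_coefs(subset(joined_toy, District == "Cambridge"))

# Side-by-side comparison of the estimates
print(cbind(all = coefs_all, cambridge = coefs_cambridge))
```

Putting the two coefficient vectors side by side makes sign changes and differences in magnitude easy to spot.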

The regression plot below shows the following:

Age Distribution: The majority of individuals in the dataset fall between 0 and 75 years of age.

Distance Distribution: The high density of points in the 0 to 10 distance range shows that most trips are relatively short.

Left-Side Concentration: The majority of points are on the left side of the graph and don’t follow a clear regression line, which suggests that there is no strong linear relationship between age and distance. In other words, changes in age are not consistently associated with changes in distance.

# Regression plot
library(ggplot2)

base <- ggplot(data = correlation_result, aes(x = age, y = distances)) +
  geom_point() +
  xlab("Age") +
  ylab("Distance")

# Overlay the fitted linear regression line with its confidence band
base + geom_smooth(method = "lm")

Creating maps of median age and sex ratio in Cambridge’s Census Tracts serves as a complementary approach to regression analysis by providing spatial context and visual representation of demographic characteristics. While regression models quantify relationships between variables, maps offer a geographic perspective, allowing us to observe how these relationships manifest across different locations.

The results obtained from the maps of median age and sex ratio in the MIT area provide valuable insights that complement the regression findings. The observation that the median age in the MIT vicinity falls within the 20-25 range, representing the lowest bin compared to other census tracts in Cambridge, aligns with the regression results. The regression analysis indicated a weaker association between age and trip distances in Cambridge, with the age coefficient being notably smaller (-0.0024) than the overall dataset (-0.014). This suggests that, indeed, the MIT area, characterized by a younger population, exhibits a less pronounced impact of age on bike-sharing trip distances compared to the broader context of Boston.

Similarly, the sex ratio map, which shows that men outnumber women in the MIT area (sex ratio between 100 and 120), is relevant to the unexpected negative association between being male and trip distance in the regression analysis. While the broader dataset displayed a weak positive correlation between gender and trip distances, Cambridge showed a weak negative one: being male was associated with shorter trip distances in this local context. These parallels between the map results and regression findings emphasize the importance of considering local demographic characteristics when interpreting transportation patterns.
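
The sex ratio mapped here is read as males per 100 females, so values above 100 mean men outnumber women. The sketch below computes the ratio and bins tracts into map classes. The tract labels, counts and break points are invented for illustration and are not Census figures.

```r
# Invented tract counts, for illustration only (not Census data)
tracts <- data.frame(
  tract   = c("T1", "T2", "T3"),
  males   = c(2200, 1500, 1800),
  females = c(2000, 1875, 1800)
)

# Sex ratio: males per 100 females
tracts$sex_ratio <- 100 * tracts$males / tracts$females

# Bin into classes for a choropleth legend; the break points are assumptions.
# right = FALSE makes each class include its lower bound: [100, 120)
tracts$ratio_class <- cut(tracts$sex_ratio,
                          breaks = c(60, 80, 100, 120),
                          right = FALSE)
print(tracts)
```

A tract with equal numbers of men and women has a ratio of exactly 100, so whether that value lands in the upper or lower class depends on which side of each bin is closed.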

MIT (Cambridge) Area Discoveries

In my exploration of the MIT area, I made several observations that shed light on bike-sharing activity in this academic hub. The predominant demographic in the vicinity was young students, reflecting the youthful atmosphere of the MIT community and confirming the results shown in the maps above. The MIT area also has a dense network of bike lanes, which makes it well suited to bike-sharing. A free or discounted membership program for MIT students further contributes to the popularity of bike-sharing and may influence usage patterns and trip distances. During my visit, I specifically monitored the bike-sharing stations affiliated with MIT that had the smallest average trip distances in the Cambridge dataset: MIT Vassar St, MIT Stata Center at Vassar St / Main, MIT at Mass Ave / Amherst, and MIT Pacific St at Purrington St. These stations showed consistently high activity, with many users both in the morning and in the evening. How full the stations were varied from place to place, with some about half full and others nearly full. This observation hints at the potential need for station-specific analyses to understand the factors influencing bike-sharing demand in different parts of the MIT area.

Conclusion

The comprehensive analysis of various areas, including Salem in previous City Explorations with a focus on tourist behaviors and now MIT and Cambridge with an emphasis on the academic community, provides a well-rounded understanding of bike-sharing dynamics across diverse settings. The observation of distinct user groups, encompassing both tourists and permanent residents or students, emphasizes the importance of tailoring urban planning strategies to accommodate the specific needs and preferences of different demographics.

In Salem, where tourism plays a significant role, the focus may be on optimizing bike-sharing services to enhance the visitor experience, contributing to the local economy. On the other hand, the MIT area and Cambridge highlight the potential to integrate sustainable transportation solutions into housing considerations for a community dominated by young students.

This evolution in exploration signifies a broader perspective on the relevance of data analyses for communities. It emphasizes the need to recognize the unique characteristics of each locality and user group, guiding the development of nuanced policies that address housing, transportation, and community well-being. By incorporating insights from both tourist-centric and academic areas, planners can create holistic strategies that cater to diverse populations, fostering more inclusive and sustainable urban environments.

