In this week’s post, I’ll try to define a latent construct for some of the questions raised in my previous work. This will lay the groundwork for more advanced analysis later.
Pick a Question
We have identified that our Business Licenses data set contains only the following limited set of license categories: food service, liquor retail, entertainment, swimming pools, and day camps.
To me, the liquor license is the most interesting category. First, liquor licenses are issued to many different kinds of businesses, be it a restaurant, a bar, a night club, or a retail liquor store. This gives us an opportunity to explore various faces of the city while answering one main question. Second, regulations around alcoholic beverages are notably complex, and disentangling this intricate system is a good way to walk through the nitty-gritty of city administration.
Additionally, we have noticed that the Allston/Brighton district seems to have fewer liquor licenses than the city average. Are people in Allston particularly uninterested in alcohol consumption? Answering questions like this may reveal something about the nature of the neighborhood and potentially contribute to economic planning or public health work.
The Latent Construct and Its Measurement
We have picked alcohol consumption as our latent construct. So how do we measure it?
The number of liquor licenses can be used as an important indicator of alcohol consumption in a neighborhood. The Massachusetts Alcoholic Beverages Control Commission (ABCC) divides liquor licenses into two categories: on-premise and off-premise. The former is for businesses where customers consume the beverage on the premises, for example, restaurants, bars, and clubs; the latter is for businesses where customers buy beverages in sealed packaging and consume them elsewhere, for example, supermarkets, package stores, and convenience stores.
With the business license data alone, we can categorize our measurements as follows:
Unfortunately, the data set we were given includes only on-premise licenses. We will not be able to analyze off-premise consumption unless more data becomes available.
Knowing the degree of consumption in a neighborhood is not enough. Naturally, we would wonder what makes some neighborhoods consume more and others less.
First of all, as with all consumer behavior, it is about the people who consume. Ethnic background, inclination toward social events, and economic status all influence one’s alcohol consumption habits. To generalize, the following demographic metrics are the most relevant for investigation:
- Socioeconomic status
- Immigration background
Second, alcohol consumption also takes different forms. Some drink with meals; some drink at parties; some drink with their colleagues in bars; some drink alone at home. Identifying the nature of the businesses that sell alcohol is important for telling the story of which kinds of alcohol are consumed on which occasions. The previously discussed distinction between “on-premise” and “off-premise” can be a good source of inspiration for such analysis.
Existing Observable Variables
What we can infer from the data set is the number of alcohol licenses in each neighborhood, and by identifying unique businesses, we can also relate the different licenses held by one particular business establishment. It is then possible to tell which businesses are selling alcohol, what type of business they are, and where they are selling it (within the on-premise category, of course).
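As a rough illustration of the idea above, here is a minimal sketch of how licenses might be grouped by establishment and how liquor-licensed establishments might be counted per neighborhood. The field names (`business`, `address`, `neighborhood`, `category`), the `"liquor"` label, and the sample records are all made up for illustration; they are not the real schema or contents of the Business Licenses data set, and matching on a normalized name and address is only one possible way to identify a unique business.

```python
from collections import Counter, defaultdict

# Made-up sample records. Field names and values are assumptions for
# illustration only, not the actual columns of the Business Licenses data.
licenses = [
    {"business": "Cafe A", "address": "1 Main St",
     "neighborhood": "Allston/Brighton", "category": "food service"},
    {"business": "Cafe A ", "address": "1 main st",
     "neighborhood": "Allston/Brighton", "category": "liquor"},
    {"business": "Club B", "address": "2 High St",
     "neighborhood": "Downtown", "category": "liquor"},
    {"business": "Club B", "address": "2 High St",
     "neighborhood": "Downtown", "category": "entertainment"},
    {"business": "Bar C", "address": "3 Low St",
     "neighborhood": "Downtown", "category": "liquor"},
]

LIQUOR = "liquor"  # assumed category label


def business_key(record):
    """Normalize name and address so several licenses map to one establishment."""
    return (record["business"].strip().lower(),
            record["address"].strip().lower())


def licenses_by_business(records):
    """Map each establishment to the set of license categories it holds."""
    grouped = defaultdict(set)
    for record in records:
        grouped[business_key(record)].add(record["category"])
    return dict(grouped)


def liquor_establishments_per_neighborhood(records):
    """Count unique establishments with a liquor license in each neighborhood."""
    seen = set()
    counts = Counter()
    for record in records:
        if record["category"] != LIQUOR:
            continue
        key = business_key(record)
        if key in seen:
            continue  # do not count the same establishment twice
        seen.add(key)
        counts[record["neighborhood"]] += 1
    return dict(counts)
```

With the grouping in hand, an establishment that holds both a liquor and an entertainment license could be treated differently from one that holds a liquor and a food service license, which is one way to approximate "what type of business" it is.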
Based on our work over the past weeks, we can almost conclude that the license validity duration data (EXPDTTM) is unreliable, although the issuing dates alone may still be suitable for certain types of longitudinal analysis.
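One simple longitudinal view that relies only on issuing dates is a count of licenses issued per year. The sketch below assumes each record carries an issue date under a hypothetical `issued` field in `YYYY-MM-DD` form; the real column name and date format in the data set may differ. Records with missing or unparseable dates are skipped rather than guessed at.

```python
from collections import Counter
from datetime import datetime


def licenses_issued_per_year(records, date_field="issued",
                             date_format="%Y-%m-%d"):
    """Count licenses by issue year, ignoring records without a usable date.

    `date_field` and `date_format` are assumptions for illustration.
    """
    counts = Counter()
    for record in records:
        raw = record.get(date_field)
        if not raw:
            continue  # missing issue date
        try:
            year = datetime.strptime(raw, date_format).year
        except ValueError:
            continue  # malformed date; skip instead of guessing
        counts[year] += 1
    return dict(sorted(counts.items()))


# Made-up records for illustration only.
sample = [
    {"issued": "2010-03-15"},
    {"issued": "2010-11-02"},
    {"issued": "2012-06-30"},
    {"issued": "not a date"},
    {"issued": None},
]
yearly_counts = licenses_issued_per_year(sample)
```

The expiration field is deliberately left out, since its reliability is in doubt.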
What Additional Data Do We Need?
It would be most interesting to get data for off-premise licenses. Beyond that, we would need various demographic data. The ratio of property types relative to the resident population of a region may also help us understand who is consuming the beverages.