Methods

Our Project

This project develops neighborhood typologies in Philadelphia based on available amenities. Our analysis aims to provide a contemporary amenity profile of the city to aid urban planning, research, and enhance the understanding of the city for residents and tourists.

Data Acquisition and Formatting

Our analysis draws from two primary data sources: Yelp Fusion API business listings and Zillow home listings. We used the Yelp Fusion API’s Business Search endpoint, utilizing parameters like location, category, limit, and offset to acquire a comprehensive dataset. The yelpr R package facilitated this collection, with custom functions enhancing the retrieval process. To overcome Yelp’s 50 result limit per call, we performed 480 API calls per category across Philadelphia’s 48 ZIP codes. The resulting data, encompassing various broad categories, was then geocoded into GeoJSON files for subsequent cleaning and analysis in Python.

Zillow Webscraping

Zillow data was scraped using selenium, selenium-stealth, and Beautiful Soup, capturing titles, text descriptions, neighborhoods, and other property-specific details. The information, including property titles and descriptions, was compiled into a CSV for further analysis, such as generating word clouds and conducting cluster analyses, similar to our approach with Yelp data.

Cluster Analysis

In Python, we refined our Yelp data categorization using the alias field to extract meaningful sub-categories, creating three descriptive columns from the aliases. This cleaned data, aligned with census tracts and neighborhood boundaries, underpinned our exploratory and k-means cluster analyses. After iterative adjustments, we identified seven meaningful clusters of neighborhoods sharing similar amenity compositions.