New York AirBNB Exploratory and Sentiment Analysis¶
Importing Packages and preparing data¶
In [1]:
## Data source: https://insideairbnb.com/get-the-data/
In [2]:
import os
# Checking the current directory before switching to the project folder
current_directory = os.getcwd()
print(current_directory)
os.chdir('/Users/smarr/Documents/Data Sets/AirBNBAnalysis')
/Users/smarr/Documents/Data Sets/Churn Analysis
In [3]:
# Verifying the directory was switched
os.getcwd()
Out[3]:
'/Users/smarr/Documents/Data Sets/AirBNBAnalysis'
In [4]:
import pandas as pd
In [5]:
## Importing listings data and cleaning the price column so it can be analyzed numerically
listings = pd.read_csv('../AirBNBAnalysis/data/listings_clean.csv')
listings['price'] = listings['price'].replace(r'[\$,]', '', regex=True).astype(float)
listings.to_csv('listings_clean.csv', index=False)
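The price-cleaning step can be sketched on a toy Series (values are made up). Using a raw string for the pattern avoids Python's invalid-escape `SyntaxWarning`:

```python
import pandas as pd

# Toy price column in the same "$1,234.00" text format as the listings data
prices = pd.Series(['$150.00', '$1,200.00', '$75.50'])

# Strip dollar signs and thousands separators, then cast to float.
# r'[\$,]' is a raw string, so '\$' is not treated as a Python escape.
cleaned = prices.replace(r'[\$,]', '', regex=True).astype(float)
```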
In [6]:
# Testing
listings.head()
Out[6]:
| id | listing_url | scrape_id | last_scraped | source | name | description | neighborhood_overview | picture_url | host_id | ... | review_scores_communication | review_scores_location | review_scores_value | license | instant_bookable | calculated_host_listings_count | calculated_host_listings_count_entire_homes | calculated_host_listings_count_private_rooms | calculated_host_listings_count_shared_rooms | reviews_per_month | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 36121 | https://www.airbnb.com/rooms/36121 | 20250301213336 | 2025-03-03 | city scrape | Lg Rm in Historic Prospect Heights | Cozy space share in the heart of a great neigh... | Full of tree-lined streets and beautiful brown... | https://a0.muscache.com/pictures/8776354/35b32... | 62165 | ... | 5.00 | 5.00 | 5.00 | NaN | f | 1 | 0 | 1 | 0 | 0.05 |
| 1 | 36647 | https://www.airbnb.com/rooms/36647 | 20250301213336 | 2025-03-03 | city scrape | 1 Bedroom & your own Bathroom, Elevator Apartment | Private bedroom with your own bathroom in a 2 ... | Manhattan, SE corner of 2nd Ave/ E. 110th street | https://a0.muscache.com/pictures/miso/Hosting-... | 157798 | ... | 4.90 | 4.38 | 4.71 | NaN | f | 1 | 0 | 1 | 0 | 0.58 |
| 2 | 38663 | https://www.airbnb.com/rooms/38663 | 20250301213336 | 2025-03-02 | city scrape | Luxury Brownstone in Boerum Hill | Beautiful, large home in great hipster neighbo... | diverse, lively, hip, cool: loaded with restau... | https://a0.muscache.com/pictures/miso/Hosting-... | 165789 | ... | 4.88 | 4.86 | 4.62 | OSE-STRREG-0001784 | f | 1 | 0 | 1 | 0 | 0.28 |
| 3 | 38833 | https://www.airbnb.com/rooms/38833 | 20250301213336 | 2025-03-03 | city scrape | Spectacular West Harlem Garden Apt | This is a very large and unique space. An inc... | West Harlem is now packed with great restauran... | https://a0.muscache.com/pictures/7554f9e5-4ab0... | 166532 | ... | 4.96 | 4.79 | 4.82 | OSE-STRREG-0000476 | f | 1 | 1 | 0 | 0 | 1.36 |
| 4 | 39282 | https://www.airbnb.com/rooms/39282 | 20250301213336 | 2025-03-02 | city scrape | “Work-from-home” from OUR home. | *Monthly Discount will automatically apply <br... | THE NEIGHBORHOOD:<br />Our apartment is locate... | https://a0.muscache.com/pictures/ef8f43ad-d967... | 168525 | ... | 4.88 | 4.85 | 4.78 | OSE-STRREG-0001150 | f | 2 | 0 | 2 | 0 | 1.54 |
5 rows × 79 columns
In [7]:
# Importing SQLAlchemy to work with the PostgreSQL database, establishing a connection between Python and SQL
from sqlalchemy import create_engine
engine = create_engine('postgresql://smarr@localhost:5432/airbnb_analysis', echo=False)
conn = engine.connect()
In [8]:
# Writing the listings DataFrame to the SQL database
listings.to_sql('listings', engine, if_exists='replace', index=False)
Out[8]:
264
In [11]:
## The calendar dataset is extremely large, so we read it in chunks and sample from each chunk rather than loading it all into Jupyter at once
chunksize = 500000
sample_size = 50000
samples = []
for chunk in pd.read_csv('../AirBNBAnalysis/data/calendar_sampled.csv', chunksize=chunksize):
# Random sample 2% from each chunk (adjust if needed)
sample = chunk.sample(frac=0.02, random_state=42)
samples.append(sample)
# Stop once we hit the target
if sum(len(s) for s in samples) >= sample_size:
break
df_sampled = pd.concat(samples).head(sample_size)
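The chunked-sampling pattern above can be demonstrated end to end with an in-memory CSV; the sizes here are toy stand-ins for the notebook's 500,000-row chunks and 50,000-row target:

```python
import io
import pandas as pd

# A toy 1,000-row CSV standing in for the huge calendar file
csv_text = "listing_id,price\n" + "\n".join(f"{i},{100 + i % 50}" for i in range(1000))

chunksize = 200    # rows per chunk (the notebook uses 500,000)
sample_size = 40   # target sample size (the notebook uses 50,000)
samples = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=chunksize):
    samples.append(chunk.sample(frac=0.05, random_state=42))  # 5% of each chunk
    # Stop once the accumulated sample reaches the target
    if sum(len(s) for s in samples) >= sample_size:
        break

df_sampled = pd.concat(samples).head(sample_size)
```

Sampling a fixed fraction of every chunk keeps the result roughly uniform over the portion of the file that was read, without ever holding the full file in memory.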
In [12]:
## Saving the sample to a smaller CSV
df_sampled.to_csv('../AirBNBAnalysis/data/calendar_sampled.csv', index=False)
In [13]:
df_sampled['price'] = df_sampled['price'].replace(r'[\$,]', '', regex=True).astype(float)
In [14]:
df_sampled.to_sql('calendar', engine, if_exists='replace', index=False)
Out[14]:
1000
In [15]:
# Verifying the cleaned price column in the sampled data frame
df_sampled.head()
Out[15]:
| listing_id | date | available | price | adjusted_price | minimum_nights | maximum_nights | |
|---|---|---|---|---|---|---|---|
| 33553 | 9116831 | 2025-03-22 | f | 165.0 | NaN | 30 | 1125 |
| 9427 | 763600 | 2025-06-19 | f | 75.0 | NaN | 30 | 1125 |
| 199 | 1989731 | 2025-03-14 | f | 137.0 | NaN | 30 | 1125 |
| 12447 | 2620837 | 2025-12-11 | f | 52.0 | NaN | 30 | 62 |
| 39489 | 10369038 | 2025-07-21 | f | 50.0 | NaN | 30 | 365 |
In [16]:
## Call engine.dispose() or make sure .close() is used on connections when finished, to prevent table locks in PostgreSQL
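A minimal sketch of that connection hygiene, using an in-memory SQLite engine as a stand-in for the PostgreSQL one (the table and values are illustrative):

```python
import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine('sqlite://')  # in-memory stand-in for the postgres engine

# A context manager closes the connection even if a query raises,
# so no locks are left behind.
with engine.connect() as conn:
    pd.DataFrame({'x': [1, 2, 3]}).to_sql('demo', conn, index=False)
    total = conn.execute(text('SELECT SUM(x) FROM demo')).scalar()

engine.dispose()  # release all pooled connections once completely done
```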
Pivot table for understanding price change over time across neighbourhoods¶
In [17]:
df = pd.read_sql_query("""
SELECT
TO_CHAR(c.date::DATE, 'YYYY-MM') AS month,
l.neighbourhood_cleansed,
ROUND(AVG(c.price)::numeric, 2) AS avg_price
FROM calendar c
JOIN listings l ON c.listing_id = l.id
WHERE c.available = 't'
GROUP BY TO_CHAR(c.date::DATE, 'YYYY-MM'), l.neighbourhood_cleansed
""", conn)
In [18]:
pivot_df = df.pivot(index='neighbourhood_cleansed', columns='month', values='avg_price')
pivot_df.head()
Out[18]:
| month | 2025-03 | 2025-04 | 2025-05 | 2025-06 | 2025-07 | 2025-08 | 2025-09 | 2025-10 | 2025-11 | 2025-12 | 2026-01 | 2026-02 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| neighbourhood_cleansed | ||||||||||||
| Arverne | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 299.0 | NaN | NaN | NaN |
| Astoria | NaN | NaN | NaN | NaN | 180.0 | NaN | NaN | 95.00 | NaN | 48.0 | NaN | NaN |
| Bedford-Stuyvesant | 69.5 | 171.0 | NaN | 200.0 | 221.0 | 192.5 | 225.0 | 216.33 | 55.0 | 134.5 | 139.67 | NaN |
| Boerum Hill | 199.0 | NaN | 175.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| Borough Park | NaN | NaN | 300.0 | NaN | NaN | NaN | NaN | 61.00 | NaN | NaN | NaN | NaN |
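`pivot` reshapes the long month/neighbourhood rows into one row per neighbourhood with a column per month; a minimal sketch with made-up values shows where the NaNs come from:

```python
import pandas as pd

df_long = pd.DataFrame({
    'neighbourhood_cleansed': ['Astoria', 'Astoria', 'Chelsea'],
    'month': ['2025-03', '2025-04', '2025-03'],
    'avg_price': [110.0, 130.0, 200.0],
})
pivot_toy = df_long.pivot(index='neighbourhood_cleansed',
                          columns='month', values='avg_price')
# Chelsea has no 2025-04 row in the long data, so that cell is NaN after pivoting
```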
In [19]:
pivot_df_cleaned = pivot_df.dropna()
pivot_df_cleaned
## Note: with this small sample, every neighbourhood is missing at least one month, so dropna() leaves no rows
## Some color would look great on this
Out[19]:
| month | 2025-03 | 2025-04 | 2025-05 | 2025-06 | 2025-07 | 2025-08 | 2025-09 | 2025-10 | 2025-11 | 2025-12 | 2026-01 | 2026-02 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| neighbourhood_cleansed |
In [20]:
styled_df=pivot_df_cleaned.style.background_gradient(
cmap='RdYlGn_r', # Red for high, green for low
axis=1 # Apply column-wise
)
styled_df
Out[20]:
| month | 2025-03 | 2025-04 | 2025-05 | 2025-06 | 2025-07 | 2025-08 | 2025-09 | 2025-10 | 2025-11 | 2025-12 | 2026-01 | 2026-02 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| neighbourhood_cleansed |
Adding reviews csv file for sentiment analysis (Python and PostgreSQL)¶
In [21]:
dfReviews = pd.read_csv("../AirBNBAnalysis/data/reviews.csv", low_memory=False)
In [22]:
dfReviews.to_sql('reviews', engine, if_exists='replace', index=False)
Out[22]:
919
In [23]:
dfReviews.head()
Out[23]:
| listing_id | id | date | reviewer_id | reviewer_name | comments | |
|---|---|---|---|---|---|---|
| 0 | 2595 | 17857 | 2009-11-21 | 50679 | Jean | Notre séjour de trois nuits.\r<br/>Nous avons ... |
| 1 | 2595 | 19176 | 2009-12-05 | 53267 | Cate | Great experience. |
| 2 | 2595 | 19760 | 2009-12-10 | 38960 | Anita | I've stayed with my friend at the Midtown Cast... |
| 3 | 2595 | 34320 | 2010-04-09 | 71130 | Kai-Uwe | We've been staying here for about 9 nights, en... |
| 4 | 2595 | 46312 | 2010-05-25 | 117113 | Alicia | We had a wonderful stay at Jennifer's charming... |
In [24]:
## Drop null comments, since missing text would complicate sentiment analysis
dfReviews=dfReviews.dropna(subset=['comments'])
In [25]:
## Checking length of Dataframe for verification purposes
len(dfReviews)
Out[25]:
970678
In [26]:
## Lower-case all comments for standardization
dfReviews['comments']=dfReviews['comments'].str.lower()
Sentiment Scoring¶
In [27]:
from textblob import TextBlob
In [28]:
## Define a helper that returns TextBlob polarity, a score from -1 (negative) to 1 (positive)
def get_sentiment(text):
    return TextBlob(text).sentiment.polarity
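TextBlob's polarity is a lexicon-based score in [-1, 1]. A toy stand-in (not TextBlob's actual algorithm, just an illustration of the idea, with a hypothetical mini-lexicon) averages per-word scores:

```python
# Hypothetical mini-lexicon; TextBlob draws on a much larger pattern lexicon
LEXICON = {'great': 0.8, 'wonderful': 1.0, 'clean': 0.4,
           'dirty': -0.6, 'terrible': -1.0}

def toy_polarity(text):
    """Average the lexicon scores of known words; 0.0 if none are known."""
    scores = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

# 'great clean apartment' averages 0.8 and 0.4 -> positive
# 'terrible dirty place' averages -1.0 and -0.6 -> negative
```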
In [29]:
## Apply it to Airbnb comments
dfReviews['Sentiment']=dfReviews['comments'].apply(get_sentiment)
In [30]:
dfReviews[['listing_id','Sentiment']].head()
## Doesn't quite get us our average sentiment by listing
Out[30]:
| listing_id | Sentiment | |
|---|---|---|
| 0 | 2595 | 0.000000 |
| 1 | 2595 | 0.800000 |
| 2 | 2595 | 0.410124 |
| 3 | 2595 | 0.266875 |
| 4 | 2595 | 0.518750 |
Average sentiment and count by listing ID¶
In [31]:
sentimentAvgandCount_by_listing = dfReviews.groupby('listing_id').agg(
    avg_sentiment=('Sentiment', 'mean'),
    sentiment_count=('Sentiment', 'count')
).reset_index()
sentimentAvgandCount_by_listing
Out[31]:
| listing_id | avg_sentiment | sentiment_count | |
|---|---|---|---|
| 0 | 2595 | 0.333892 | 49 |
| 1 | 6848 | 0.386196 | 194 |
| 2 | 6872 | 0.559630 | 1 |
| 3 | 6990 | 0.392311 | 251 |
| 4 | 7064 | 0.476511 | 13 |
| ... | ... | ... | ... |
| 25642 | 1343696908645827246 | -0.244413 | 1 |
| 25643 | 1344203989320440460 | 0.246717 | 1 |
| 25644 | 1344298115530569589 | 0.392778 | 2 |
| 25645 | 1348032300167221882 | 0.258333 | 2 |
| 25646 | 1348081441771732943 | 0.191255 | 2 |
25647 rows × 3 columns
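The per-listing aggregation can be sketched on a toy review table (values are made up):

```python
import pandas as pd

toy_reviews = pd.DataFrame({
    'listing_id': [1, 1, 2],
    'Sentiment': [0.5, 0.7, -0.2],
})

# Named aggregations: one row per listing with mean sentiment and review count
per_listing = toy_reviews.groupby('listing_id').agg(
    avg_sentiment=('Sentiment', 'mean'),
    sentiment_count=('Sentiment', 'count'),
).reset_index()
```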
In [32]:
## Merge with Listings data
CombinedDF = listings.merge(sentimentAvgandCount_by_listing, left_on='id', right_on='listing_id')
In [33]:
CombinedDF
Out[33]:
| id | listing_url | scrape_id | last_scraped | source | name | description | neighborhood_overview | picture_url | host_id | ... | license | instant_bookable | calculated_host_listings_count | calculated_host_listings_count_entire_homes | calculated_host_listings_count_private_rooms | calculated_host_listings_count_shared_rooms | reviews_per_month | listing_id | avg_sentiment | sentiment_count | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 36121 | https://www.airbnb.com/rooms/36121 | 20250301213336 | 2025-03-03 | city scrape | Lg Rm in Historic Prospect Heights | Cozy space share in the heart of a great neigh... | Full of tree-lined streets and beautiful brown... | https://a0.muscache.com/pictures/8776354/35b32... | 62165 | ... | NaN | f | 1 | 0 | 1 | 0 | 0.05 | 36121 | 0.484124 | 9 |
| 1 | 36647 | https://www.airbnb.com/rooms/36647 | 20250301213336 | 2025-03-03 | city scrape | 1 Bedroom & your own Bathroom, Elevator Apartment | Private bedroom with your own bathroom in a 2 ... | Manhattan, SE corner of 2nd Ave/ E. 110th street | https://a0.muscache.com/pictures/miso/Hosting-... | 157798 | ... | NaN | f | 1 | 0 | 1 | 0 | 0.58 | 36647 | 0.330830 | 102 |
| 2 | 38663 | https://www.airbnb.com/rooms/38663 | 20250301213336 | 2025-03-02 | city scrape | Luxury Brownstone in Boerum Hill | Beautiful, large home in great hipster neighbo... | diverse, lively, hip, cool: loaded with restau... | https://a0.muscache.com/pictures/miso/Hosting-... | 165789 | ... | OSE-STRREG-0001784 | f | 1 | 0 | 1 | 0 | 0.28 | 38663 | 0.413804 | 43 |
| 3 | 38833 | https://www.airbnb.com/rooms/38833 | 20250301213336 | 2025-03-03 | city scrape | Spectacular West Harlem Garden Apt | This is a very large and unique space. An inc... | West Harlem is now packed with great restauran... | https://a0.muscache.com/pictures/7554f9e5-4ab0... | 166532 | ... | OSE-STRREG-0000476 | f | 1 | 1 | 0 | 0 | 1.36 | 38833 | 0.352103 | 241 |
| 4 | 39282 | https://www.airbnb.com/rooms/39282 | 20250301213336 | 2025-03-02 | city scrape | “Work-from-home” from OUR home. | *Monthly Discount will automatically apply <br... | THE NEIGHBORHOOD:<br />Our apartment is locate... | https://a0.muscache.com/pictures/ef8f43ad-d967... | 168525 | ... | OSE-STRREG-0001150 | f | 2 | 0 | 2 | 0 | 1.54 | 39282 | 0.420833 | 274 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 25642 | 1339828135232751364 | https://www.airbnb.com/rooms/1339828135232751364 | 20250301213336 | 2025-03-02 | city scrape | Cozy Retreat NYC Apartment Fit for Families! | Welcome to our modern and cozy 2-bedroom, 1-ba... | NaN | https://a0.muscache.com/pictures/prohost-api/H... | 516682555 | ... | Exempt | t | 2 | 2 | 0 | 0 | 4.00 | 1339828135232751364 | 0.329360 | 4 |
| 25643 | 1339838938668106394 | https://www.airbnb.com/rooms/1339838938668106394 | 20250301213336 | 2025-03-02 | city scrape | RARE Luxurious Apartment 1 Block From Time Squ... | Stay in the center of it all at this stylish 2... | NaN | https://a0.muscache.com/pictures/prohost-api/H... | 567273633 | ... | Exempt | t | 2 | 2 | 0 | 0 | 3.00 | 1339838938668106394 | 0.505807 | 3 |
| 25644 | 1339852338095933370 | https://www.airbnb.com/rooms/1339852338095933370 | 20250301213336 | 2025-03-02 | city scrape | Spacious 3BR Family Haven in Lower Manhattan | Welcome to our spacious and family-friendly 3-... | The Seaport district is home to a variety of r... | https://a0.muscache.com/pictures/prohost-api/H... | 567273633 | ... | Exempt | t | 2 | 2 | 0 | 0 | 2.00 | 1339852338095933370 | 0.453013 | 2 |
| 25645 | 1348032300167221882 | https://www.airbnb.com/rooms/1348032300167221882 | 20250301213336 | 2025-03-02 | city scrape | Upscale Comfort: Stylish 2BR in the Heart of NYC | Experience the best of Midtown Manhattan at th... | The neighborhood surrounding area offers the p... | https://a0.muscache.com/pictures/prohost-api/H... | 552857810 | ... | Exempt | t | 1 | 1 | 0 | 0 | 2.00 | 1348032300167221882 | 0.258333 | 2 |
| 25646 | 1348081441771732943 | https://www.airbnb.com/rooms/1348081441771732943 | 20250301213336 | 2025-03-02 | city scrape | NYC Skyline Penthouse 2Br/2Bath | Welcome to your dream stay in the heart of New... | NaN | https://a0.muscache.com/pictures/hosting/Hosti... | 676667930 | ... | Exempt | t | 1 | 1 | 0 | 0 | 2.00 | 1348081441771732943 | 0.191255 | 2 |
25647 rows × 82 columns
In [34]:
sentimentAvgandCount_by_listing.sort_values(by='avg_sentiment', ascending=False).head(10)
## Filter to listings with a sentiment count greater than 10, for a more substantive analysis
Out[34]:
| listing_id | avg_sentiment | sentiment_count | |
|---|---|---|---|
| 12823 | 42757120 | 1.0 | 1 |
| 6714 | 18996738 | 1.0 | 1 |
| 15318 | 51485535 | 1.0 | 1 |
| 21005 | 848890927389228890 | 1.0 | 1 |
| 25274 | 1242992636408069849 | 1.0 | 1 |
| 25490 | 1279026320984012410 | 1.0 | 1 |
| 25278 | 1243527377526506606 | 1.0 | 1 |
| 3939 | 9545498 | 1.0 | 1 |
| 15295 | 51421349 | 1.0 | 2 |
| 15289 | 51402678 | 1.0 | 1 |
In [35]:
sentimentAvgandCount_by_listing = sentimentAvgandCount_by_listing[sentimentAvgandCount_by_listing['sentiment_count']>10]
In [36]:
sentimentAvgandCount_by_listing.sort_values(by='avg_sentiment', ascending=False).head(10)
## Listing ID is pretty abstract. I want to see the URL and the neighbourhood of each listing.
Out[36]:
| listing_id | avg_sentiment | sentiment_count | |
|---|---|---|---|
| 14220 | 48257754 | 0.658674 | 11 |
| 10103 | 31216852 | 0.655640 | 15 |
| 13262 | 44651142 | 0.653958 | 12 |
| 17759 | 638404496657149559 | 0.646865 | 14 |
| 4239 | 10089727 | 0.645863 | 21 |
| 14209 | 48251757 | 0.642101 | 19 |
| 11205 | 36224443 | 0.640738 | 11 |
| 15573 | 52169805 | 0.638703 | 11 |
| 10658 | 34071447 | 0.629037 | 14 |
| 9189 | 27955820 | 0.628515 | 14 |
In [37]:
sentimentAvgandCount_by_listingNew = sentimentAvgandCount_by_listing.merge(
    listings[['neighbourhood_cleansed', 'listing_url', 'id']],
    left_on='listing_id', right_on='id'
)
In [38]:
## Now let's rank again to get a better idea of the area and general quality of each listing
sentimentAvgandCount_by_listingNew.sort_values(by='avg_sentiment', ascending=False).head(10).drop(columns=['id'], errors='ignore')
Out[38]:
| listing_id | avg_sentiment | sentiment_count | neighbourhood_cleansed | listing_url | |
|---|---|---|---|---|---|
| 7776 | 48257754 | 0.658674 | 11 | Financial District | https://www.airbnb.com/rooms/48257754 |
| 5636 | 31216852 | 0.655640 | 15 | Chelsea | https://www.airbnb.com/rooms/31216852 |
| 7277 | 44651142 | 0.653958 | 12 | Chelsea | https://www.airbnb.com/rooms/44651142 |
| 9774 | 638404496657149559 | 0.646865 | 14 | Cambria Heights | https://www.airbnb.com/rooms/638404496657149559 |
| 2433 | 10089727 | 0.645863 | 21 | Midtown | https://www.airbnb.com/rooms/10089727 |
| 7769 | 48251757 | 0.642101 | 19 | Financial District | https://www.airbnb.com/rooms/48251757 |
| 6283 | 36224443 | 0.640738 | 11 | Theater District | https://www.airbnb.com/rooms/36224443 |
| 8526 | 52169805 | 0.638703 | 11 | Chelsea | https://www.airbnb.com/rooms/52169805 |
| 5980 | 34071447 | 0.629037 | 14 | SoHo | https://www.airbnb.com/rooms/34071447 |
| 5157 | 27955820 | 0.628515 | 14 | Hell's Kitchen | https://www.airbnb.com/rooms/27955820 |
In [39]:
## A quick look at the listing URLs above demonstrates why each scores fairly high on the average sentiment column.
Average sentiment and count by neighbourhood (Additional Merge is needed)¶
In [40]:
CombinedDF_NeighbourhoodGroupby = dfReviews.merge(
    listings[['id', 'neighbourhood_cleansed']],
    left_on='listing_id', right_on='id'
)
In [41]:
CombinedDF_NeighbourhoodGroupby
Out[41]:
| listing_id | id_x | date | reviewer_id | reviewer_name | comments | Sentiment | id_y | neighbourhood_cleansed | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2595 | 17857 | 2009-11-21 | 50679 | Jean | notre séjour de trois nuits.\r<br/>nous avons ... | 0.000000 | 2595 | Midtown |
| 1 | 2595 | 19176 | 2009-12-05 | 53267 | Cate | great experience. | 0.800000 | 2595 | Midtown |
| 2 | 2595 | 19760 | 2009-12-10 | 38960 | Anita | i've stayed with my friend at the midtown cast... | 0.410124 | 2595 | Midtown |
| 3 | 2595 | 34320 | 2010-04-09 | 71130 | Kai-Uwe | we've been staying here for about 9 nights, en... | 0.266875 | 2595 | Midtown |
| 4 | 2595 | 46312 | 2010-05-25 | 117113 | Alicia | we had a wonderful stay at jennifer's charming... | 0.518750 | 2595 | Midtown |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 970673 | 1339852338095933370 | 1351610651278738398 | 2025-02-07 | 212287994 | Nancy-Rita | the apartment was exactly what i was hoping fo... | 0.324359 | 1339852338095933370 | Financial District |
| 970674 | 1339852338095933370 | 1353747692513293975 | 2025-02-10 | 7677238 | Victorio Nicolas | excellent place. very clean and tidy. host was... | 0.581667 | 1339852338095933370 | Financial District |
| 970675 | 1341630436070686432 | 1351590803012454069 | 2025-02-07 | 212287994 | Nancy-Rita | i had a fantastic stay! the apartment was exa... | 0.313636 | 1341630436070686432 | Midtown |
| 970676 | 1341630436070686432 | 1355257116539436530 | 2025-02-12 | 102021327 | Genevieve | great location. responsive host. place was ver... | 0.625556 | 1341630436070686432 | Midtown |
| 970677 | 1341630436070686432 | 1363160918748600500 | 2025-02-23 | 318854382 | Luz | awesome spot, walking distance to time square!... | 0.527778 | 1341630436070686432 | Midtown |
970678 rows × 9 columns
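The `id_x` / `id_y` columns are pandas' default collision handling when both frames share a column name; a toy sketch (review id and neighbourhood values are made up):

```python
import pandas as pd

# Left frame: a review row, where 'id' is the review's own id
left = pd.DataFrame({'listing_id': [1], 'id': [99]})
# Right frame: a listing row, where 'id' is the listing id
right = pd.DataFrame({'id': [1], 'neighbourhood_cleansed': ['Midtown']})

# Both frames have an 'id' column, so merge suffixes them as id_x / id_y
merged = left.merge(right, left_on='listing_id', right_on='id')
```

Dropping both suffixed columns afterwards is safe here because `listing_id` already carries the join key.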
In [42]:
## Drop the redundant id_x and id_y columns produced by the merge
CombinedDF_NeighbourhoodGroupby = CombinedDF_NeighbourhoodGroupby.drop(columns=['id_x', 'id_y'], errors='ignore')
In [43]:
CombinedDF_NeighbourhoodGroupby
Out[43]:
| listing_id | date | reviewer_id | reviewer_name | comments | Sentiment | neighbourhood_cleansed | |
|---|---|---|---|---|---|---|---|
| 0 | 2595 | 2009-11-21 | 50679 | Jean | notre séjour de trois nuits.\r<br/>nous avons ... | 0.000000 | Midtown |
| 1 | 2595 | 2009-12-05 | 53267 | Cate | great experience. | 0.800000 | Midtown |
| 2 | 2595 | 2009-12-10 | 38960 | Anita | i've stayed with my friend at the midtown cast... | 0.410124 | Midtown |
| 3 | 2595 | 2010-04-09 | 71130 | Kai-Uwe | we've been staying here for about 9 nights, en... | 0.266875 | Midtown |
| 4 | 2595 | 2010-05-25 | 117113 | Alicia | we had a wonderful stay at jennifer's charming... | 0.518750 | Midtown |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 970673 | 1339852338095933370 | 2025-02-07 | 212287994 | Nancy-Rita | the apartment was exactly what i was hoping fo... | 0.324359 | Financial District |
| 970674 | 1339852338095933370 | 2025-02-10 | 7677238 | Victorio Nicolas | excellent place. very clean and tidy. host was... | 0.581667 | Financial District |
| 970675 | 1341630436070686432 | 2025-02-07 | 212287994 | Nancy-Rita | i had a fantastic stay! the apartment was exa... | 0.313636 | Midtown |
| 970676 | 1341630436070686432 | 2025-02-12 | 102021327 | Genevieve | great location. responsive host. place was ver... | 0.625556 | Midtown |
| 970677 | 1341630436070686432 | 2025-02-23 | 318854382 | Luz | awesome spot, walking distance to time square!... | 0.527778 | Midtown |
970678 rows × 7 columns
In [44]:
sentimentAvgandCount_by_neighbourhood = CombinedDF_NeighbourhoodGroupby.groupby('neighbourhood_cleansed').agg(
    avg_sentiment=('Sentiment', 'mean'),
    sentiment_count=('Sentiment', 'count')
).reset_index()
In [45]:
sentimentAvgandCount_by_neighbourhood.sort_values(by='avg_sentiment', ascending=False).head(10)
Out[45]:
| neighbourhood_cleansed | avg_sentiment | sentiment_count | |
|---|---|---|---|
| 218 | Woodrow | 0.671979 | 4 |
| 9 | Bay Terrace, Staten Island | 0.639286 | 1 |
| 140 | Navy Yard | 0.500429 | 9 |
| 52 | Country Club | 0.481688 | 29 |
| 194 | Tottenville | 0.474173 | 213 |
| 35 | Chelsea, Staten Island | 0.472478 | 2 |
| 21 | Breezy Point | 0.463310 | 116 |
| 203 | Vinegar Hill | 0.462069 | 768 |
| 149 | Oakwood | 0.458333 | 2 |
| 14 | Belle Harbor | 0.458162 | 245 |
In [46]:
# For a more reliable analysis, let's include only neighbourhoods containing more than 50 sentiment_count entries
sentimentAvgandCount_by_neighbourhood = sentimentAvgandCount_by_neighbourhood[sentimentAvgandCount_by_neighbourhood['sentiment_count']>50]
In [47]:
sentimentAvgandCount_by_neighbourhood.sort_values(by='avg_sentiment', ascending=False).head(10)
## Not surprisingly, neighbourhoods like Vinegar Hill, Financial District, and Rockaway Beach score relatively highly.
## I'm sure Tottenville is a great place too - I've yet to see Staten Island
Out[47]:
| neighbourhood_cleansed | avg_sentiment | sentiment_count | |
|---|---|---|---|
| 194 | Tottenville | 0.474173 | 213 |
| 21 | Breezy Point | 0.463310 | 116 |
| 203 | Vinegar Hill | 0.462069 | 768 |
| 14 | Belle Harbor | 0.458162 | 245 |
| 143 | New Dorp Beach | 0.451780 | 483 |
| 75 | Financial District | 0.444972 | 14616 |
| 176 | Silver Lake | 0.442924 | 151 |
| 44 | Cobble Hill | 0.436239 | 2358 |
| 85 | Gerritsen Beach | 0.433977 | 125 |
| 168 | Rockaway Beach | 0.431654 | 2617 |