Nightlife Analytics: Has COVID changed the way the city moves at night
Author
Apu Datta
Published
December 18, 2025
New York City nightlife transportation: Time_Square
Executive Summary
COVID reshaped NYC night time mobility. Before COVID, night travel peaked around midnight but collapsed in 2020. Post COVID activity shifted earlier in the evening while late night travel remained weaker. Nightlife intensive zones saw the largest declines and recoveries, and rideshare services strengthened their dominance as Yellow Taxi activity lagged. Statistical evidence shows these changes are large, significant, and persistent, indicating a lasting shift in the timing and geography of night time travel in New York City.
Introduction
New York City’s nightlife supports jobs, transportation, and the sense of safety created by active late night streets. This project combines TLC night time trips (8 PM–4 AM) with Yelp nightlife venues to examine how COVID-19 reshaped night mobility. The overarching question is: How does nightlife activity - bars, entertainment, and rideshares drive urban economies and affect city safety? To address this, I focus on the following specific question: Has COVID changed the way the city moves at night (8 PM – 4 AM)? Using 2019–2023 Yellow Taxi and FHV (Uber/Lyft) trips linked to nightlife zones, the analysis tracks spatial and temporal shifts across pre-COVID, during-COVID, and post-COVID periods. Through cleaning, visualization, and statistical inference, the project reveals structural changes in NYC’s nightlife and night-time movement ecosystem.
Note
Key takeaway: COVID caused a sharp collapse in NYC night travel and the recovery is incomplete post-COVID trips shift earlier (8–11 PM) while late-night (12–4 AM) remains persistently weaker than 2019.
Dataset and Methods: TLC Trips + Yelp Nightlife
This project combines NYC TLC night time taxi and rideshare trips from 2019–2023 with Yelp nightlife venue data. Night trips were filtered to the 8 PM–4 AM window, while nightlife venues were geocoded and assigned to taxi zones. These sources were merged into a zone level panel dataset linking nightlife intensity to night time mobility across NYC and COVID periods.
Reproducibility Note
This project is reproducible with access to NYC TLC trip data. Yellow Taxi and FHV parquet files from 2019 to 2023 should be placed in the dataset_optional/ folder and are processed once into a local DuckDB database (data/night_trips.duckdb) for reuse. Yelp nightlife data are loaded from a cached CSV file by default, and the Yelp API is used only when refetching is enabled.
Warning
Interpret carefully: Yelp nightlife counts are an approximation (API limits + snapshot), so results reflect relative nightlife intensity rather than a full census.
Project Initialization and Environment Setup
This section sets up a portable, reproducible project environment by defining folders, loading required R packages, and configuring file paths to ensure consistent execution across systems.
TLC Yellow Taxi and FHVHV parquet files from 2019 to 2023 are processed in DuckDB and filtered to night hours (8 PM to 4 AM). Trips are stored in a standardized table (tlc_night) with timestamps, pickup zones, distances, and fares. For analysis, 2019, 2020, and 2023 represent pre COVID, during COVID, and post COVID periods, while 2021 and 2022 are excluded as transitional years.
Code
library(DBI)library(duckdb)library(dplyr)library(lubridate)library(glue)# Connect to a persistent DuckDB database on diskduckdb_path <-file.path(base_dir, "night_trips.duckdb")con <-dbConnect(duckdb::duckdb(), dbdir = duckdb_path)# PRAGMA settingsDBI::dbExecute(con, "PRAGMA disable_progress_bar;") # Turn off progress bar# DBI::dbExecute(con, "PRAGMA max_temp_directory_size='20GB';") # Increase temp storageDBI::dbExecute(con, "PRAGMA max_temp_directory_size='10GB';")# cat("DuckDB database:", duckdb_path, "\n")# Helper: list of all months and insert function# All months from 2019-01 to 2023-12months_all <-format(seq.Date(from =as.Date("2019-01-01"),to =as.Date("2023-12-01"),by ="month"),"%Y-%m")# Insert one month for one provider directly into tlc_nightinsert_tlc_night_month <-function(provider, ym, con) {# Local file path under dataset_optional/ local_path <-file.path( tlc_dir,glue("{provider}_tripdata_{ym}.parquet") ) local_path <-normalizePath(local_path, winslash ="/", mustWork =FALSE)if (!file.exists(local_path)) {stop("❌ TLC parquet file not found locally: ", local_path,"\nPlease make sure this file exists in dataset_optional.") }message("\n--- Processing ", provider, " ", ym, " ---")message("Local file: ", local_path)# Choose the correct column names based on provider schema pickup_col <- dplyr::case_when( provider =="yellow"~"tpep_pickup_datetime", provider =="fhvhv"~"pickup_datetime",TRUE~"pickup_datetime" ) dropoff_col <- dplyr::case_when( provider =="yellow"~"tpep_dropoff_datetime", provider =="fhvhv"~"dropoff_datetime",TRUE~"dropoff_datetime" ) distance_col <- dplyr::case_when( provider =="yellow"~"trip_distance", provider =="fhvhv"~"trip_miles",TRUE~"trip_distance" ) total_col <- dplyr::case_when( provider =="yellow"~"total_amount", provider =="fhvhv"~"base_passenger_fare",TRUE~"total_amount" ) sql <-glue(" INSERT INTO tlc_night SELECT '{provider}' AS provider, '{ym}' AS ym, {pickup_col} AS pickup_time, {dropoff_col} AS dropoff_time, PULocationID, DOLocationID, {distance_col} AS distance_miles, {total_col} AS total_fare FROM read_parquet('{local_path}') WHERE ( EXTRACT(hour FROM {pickup_col}) BETWEEN 20 AND 23 OR EXTRACT(hour FROM {pickup_col}) BETWEEN 0 AND 4 ); ") DBI::dbExecute(con, sql)message(" ✅ Inserted night trips for ", provider, " ", ym, " ")}# Build tlc_night ONCE from all local TLC parquet filesif (!DBI::dbExistsTable(con, "tlc_night")) {cat("[1] 0\n")cat("⏳ Building tlc_night from local TLC parquet files in:\n ", tlc_dir, "\n\n")# Empty table definition DBI::dbExecute( con," CREATE TABLE tlc_night ( provider TEXT, ym TEXT, pickup_time TIMESTAMP, dropoff_time TIMESTAMP, PULocationID INTEGER, DOLocationID INTEGER, distance_miles DOUBLE, total_fare DOUBLE ); " )# yellow: 2019-01 to 2023-12# fhvhv: 2019-02 to 2023-12 (no 2019-01 file) plan <- dplyr::bind_rows( tibble::tibble(provider ="yellow",ym = months_all ), tibble::tibble(provider ="fhvhv",ym = months_all[months_all >="2019-02"] ) )for (i inseq_len(nrow(plan))) {insert_tlc_night_month(provider = plan$provider[i],ym = plan$ym[i],con = con ) }} else {cat("✔ Using existing tlc_night table in DuckDB; not re-reading parquet files.\n")}# Use tlc_night as our main table in Rn_rows <-dbGetQuery(con, "SELECT COUNT(*) AS n FROM tlc_night")$n# cat("✔ tlc_night table ready in DuckDB (rows:", n_rows, ")\n")# lazy table (doesn't pull all rows into R)night_trips_raw <- dplyr::tbl(con, "tlc_night")# Small preview: a few rows per provider# 5 sample rows for YELLOW tripsyellow_preview <- DBI::dbGetQuery( con," SELECT * FROM tlc_night WHERE provider = 'yellow' ORDER BY ym, pickup_time LIMIT 5; ")# 5 sample rows for FHVHV tripsfhvhv_preview <- DBI::dbGetQuery( con," SELECT * FROM tlc_night WHERE provider = 'fhvhv' ORDER BY ym, pickup_time LIMIT 5; ")# Nice HTML table for Yellowknitr::kable( yellow_preview,format ="html",caption ="Sample Night Trips – Yellow (5 rows)")# Nice HTML table for fhvhvknitr::kable( fhvhv_preview,format ="html",caption ="Sample Night Trips – FHVHV (5 rows)")
Data Validation
To verify coverage after loading and filtering, I summarize the total number of night time trips by calendar year stored in the DuckDB table.
Code
library(knitr)library(kableExtra)library(scales)DBI::dbGetQuery( con," SELECT EXTRACT(year FROM pickup_time) AS year, COUNT(*) AS n_trips FROM tlc_night GROUP BY year ORDER BY year; ") %>% dplyr::mutate(`Night-time trips (8 PM–4 AM)`= scales::comma(n_trips) ) %>% dplyr::select(Year = year,`Night-time trips (8 PM–4 AM)` ) %>% knitr::kable(format ="html",caption ="Total Night-Time TLC Trips by Year",align =c("c", "r") ) %>% kableExtra::kable_styling(full_width =FALSE,bootstrap_options =c("striped", "hover", "condensed"),position ="center" ) %>% kableExtra::row_spec(0, bold =TRUE)
Total Night-Time TLC Trips by Year
Year
Night-time trips (8 PM–4 AM)
2019
103,761,090
2020
48,672,294
2023
86,662,278
TLC Data Load Validation
As a final sanity check, I confirm that the consolidated night time trips table was successfully created in DuckDB.
Code
DBI::dbExistsTable(con, "tlc_night")
COVID Period Filtering (One time Setup)
The analysis restricts the DuckDB table to three benchmark years: 2019 as pre COVID, 2020 as during COVID, and 2023 as post COVID. The years 2021 and 2022 are excluded as transitional recovery periods rather than stable behavioral regimes. This filtering is performed once for analysis, and the original TLC parquet files remain unchanged.
Code
library(DBI)cat("Keeping ONLY years 2019, 2020, 2023 in tlc_night (parquet files stay safe)...\n")DBI::dbExecute( con," DELETE FROM tlc_night WHERE EXTRACT(year FROM pickup_time) NOT IN (2019, 2020, 2023); ")cat("Checking remaining years in tlc_night...\n")remaining_years <- DBI::dbGetQuery( con," SELECT EXTRACT(year FROM pickup_time) AS year, COUNT(*) AS n_trips FROM tlc_night GROUP BY year ORDER BY year; ")print(remaining_years)cat("✅ Done. tlc_night now has only 2019, 2020, 2023.\n")
Fetching Yelp Nightlife Data via Yelp API
Nightlife venues were collected using the Yelp Fusion API across selected NYC neighborhoods and nightlife categories. Due to API pagination limits, the data represent a snapshot rather than a full census. The resulting dataset contains approximately 2,087 unique venues with usable geographic coordinates and is cached locally to ensure reproducibility without repeated API calls.
Code
library(httr2)library(dplyr)library(purrr)library(readr)library(tidyr)# Make sure Yelp folder existsdir.create(yelp_dir, recursive =TRUE, showWarnings =FALSE)# Helper: fetch nightlife venues from Yelp using neighborhood strategy# - 10 NYC nightlife neighborhoods# - 5 categories# - 2 pages per (hood, category) (offset = 0 and 50)fetch_yelp_venues <-function( api_key,save_path =file.path(yelp_dir, "nightlife_nyc_neighborhoods.csv"),refetch =FALSE) {# ✅ 1) If cache exists and refetch = FALSE, use it (NO API key needed)if (file.exists(save_path) &&!refetch) {message("✔ Using existing Yelp nightlife file: ", save_path) cached <- readr::read_csv(save_path, show_col_types =FALSE)return(cached)}# ✅ 2) If cache missing and refetch = FALSE, stop with clear instructionif (!file.exists(save_path) &&!refetch) {stop("❌ Yelp cached CSV not found: ", save_path, "\n","Fix: Put nightlife_nyc_neighborhoods.csv in data/yelp/ OR set refetch = TRUE (needs API key + internet)." )}# ✅ 3) Only require API key when refetch = TRUEif (identical(api_key, "")) {stop("❌ Missing Yelp API key. Set YELP_API_KEY in your .Renviron (refetch = TRUE).")} base_url <-"https://api.yelp.com/v3/businesses/search" all_venues <- dplyr::tibble()# 10 key NYC neighborhoods for nightlife coverage neighborhoods <-c("East Village, NY", "West Village, NY", "Lower East Side, NY","Williamsburg, Brooklyn", "Midtown Manhattan, NY", "Hell's Kitchen, NY","Astoria, Queens", "Park Slope, Brooklyn", "SoHo, NY", "Bushwick, Brooklyn" )# 5 main nightlife categories categories <-c("bars", "lounges", "danceclubs", "musicvenues", "cocktailbars")# Two pages per search: offset 0 and 50 → up to 100 venues per hood/category offsets <-c(0, 50)message("⏳ Fetching Yelp nightlife venues by neighborhood + category...")for (hood in neighborhoods) {for (cat in categories) {for (off in offsets) {message(" → ", hood, " | ", cat, " | offset = ", off)# Safe request using httr2 resp <-tryCatch( {request(base_url) |>req_headers(Authorization =paste("Bearer", api_key)) |>req_url_query(location = hood,categories = cat,limit =50,offset = off ) |>req_timeout(30) |>req_perform() },error =function(e) {message(" ⚠️ Error at ", hood, " - ", cat," (offset ", off, "): ", conditionMessage(e))return(NULL) } )# If request failed, move to the next combinationif (is.null(resp)) next# Stop paging if Yelp returns non-200 status (e.g. 400)if (resp_status(resp) !=200) {message(" ⚠️ Yelp status ", resp_status(resp)," → stopping further pages for this hood/category.")break } json <-resp_body_json(resp, simplifyVector =FALSE) businesses <- json$businesses# If no more businesses → stop paging for this hood/categoryif (length(businesses) ==0) {message(" ℹ️ No more businesses at this offset.")break }# Parse businesses into a tibble cleaned <- purrr::map_dfr(businesses, function(b) { tibble::tibble(id = b$id,name = b$name,neighborhood = hood,yelp_category= cat,rating = b$rating,review_count = b$review_count,price =ifelse(is.null(b$price), NA, b$price),categories =paste( purrr::map_chr(b$categories, "title"),collapse =", " ),latitude = b$coordinates$latitude,longitude = b$coordinates$longitude,city = b$location$city,state = b$location$state,zip_code = b$location$zip_code,url = b$url,is_closed = b$is_closed ) }) all_venues <- dplyr::bind_rows(all_venues, cleaned)# Be nice to the APISys.sleep(0.3) } } }# Deduplicate by Yelp business id final <- all_venues %>% dplyr::distinct(id, .keep_all =TRUE) readr::write_csv(final, save_path)message("✅ Saved Yelp nightlife snapshot to: ", save_path) final}# Use the function with your API key from .Renvironyelp_df <-suppressMessages(fetch_yelp_venues(api_key =Sys.getenv("YELP_API_KEY"),save_path =file.path(yelp_dir, "nightlife_nyc_neighborhoods.csv"),refetch =FALSE ))# Clean up for Quarto (avoid LaTeX issues with "$")yelp_preview <- yelp_df %>%mutate(price =gsub("\\$", "$ ", price) ) %>%select( name, neighborhood, yelp_category, rating, review_count, price, categories, latitude, longitude, city, state ) %>%head(5)knitr::kable( yelp_preview,caption ="Preview of Yelp Nightlife Venues by Neighborhood (Sample)")
COVID Period Classification
A helper function maps each year month identifier to its corresponding COVID period pre, during, or post COVID to ensure consistent labeling throughout the analysis.
Code
covid_period_from_ym <-function(ym) { year <-as.integer(substr(ym, 1, 4)) dplyr::case_when( year ==2019~"pre_covid", year ==2020~"during_covid", year ==2023~"post_covid",TRUE~NA_character_ )}
Night-Time TLC Trip Summary
Full Load: 2019–2023; Analysis Focus: 2019, 2020, 2023
This section provides a high level snapshot of the full TLC night trip table loaded into DuckDB. The main period analysis later focuses on 2019 (pre), 2020 (during), and 2023 (post).
Monthly Trends (By Provider)
Night time travel in NYC is dominated by FHVs, recording far more monthly trips than Yellow taxis. FHVs also have longer average trip distances, while Yellow taxis maintain higher average fares.
Hourly Night-Time Activity
Night travel peaks between 12 AM and 2 AM, with FHVs providing most trips. Yellow taxis remain active at lower volumes, and trips for both providers decline sharply after 3 AM.
Night trips by month and provider (first 5 rows)
ym
provider
n_trips
avg_distance_miles
avg_total_fare
2019-01
yellow
2,089,816
3.18
16.19
2019-02
fhvhv
6,747,686
5.01
15.57
2019-02
yellow
1,994,606
3.16
18.82
2019-03
fhvhv
8,023,674
5.06
15.44
2019-03
yellow
2,333,655
3.26
19.36
Data Quality Overview by COVID Period
The TLC night trip dataset shows excellent data completeness across all COVID periods examined. Key fields pickup location, trip distance, and fare amount have 0% missing values for all night time trips, This indicates high data reliability and minimizes concerns about missing data bias in the analysis.
Code
# Data quality by COVID period, computed fully in DuckDB (no big collect)qc_sql <-"WITH base AS ( SELECT ym, CASE WHEN CAST(substr(ym, 1, 4) AS INTEGER) = 2019 THEN 'pre_covid' WHEN CAST(substr(ym, 1, 4) AS INTEGER) = 2020 THEN 'during_covid' WHEN CAST(substr(ym, 1, 4) AS INTEGER) = 2023 THEN 'post_covid' ELSE NULL END AS period, PULocationID, distance_miles, total_fare FROM tlc_night WHERE CAST(substr(ym, 1, 4) AS INTEGER) IN (2019, 2020, 2023))SELECT period, COUNT(*) AS n_trips_sample, 100.0 * SUM(CASE WHEN PULocationID IS NULL THEN 1 ELSE 0 END) / COUNT(*) AS pct_na_pu_location, 100.0 * SUM(CASE WHEN distance_miles IS NULL THEN 1 ELSE 0 END) / COUNT(*) AS pct_na_distance, 100.0 * SUM(CASE WHEN total_fare IS NULL THEN 1 ELSE 0 END) / COUNT(*) AS pct_na_total_fareFROM baseGROUP BY periodORDER BY period;"data_quality_period <- DBI::dbGetQuery(con, qc_sql)knitr::kable( data_quality_period,format ="html",caption ="Night-Time TLC Data Quality by COVID Period (Full Data, DuckDB)",digits =2)
Night-Time TLC Data Quality by COVID Period (Full Data, DuckDB)
period
n_trips_sample
pct_na_pu_location
pct_na_distance
pct_na_total_fare
during_covid
48672206
0
0
0
post_covid
86662273
0
0
0
pre_covid
103761169
0
0
0
Linking TLC Night Trips with Yelp Nightlife Data
TLC night trips are linked to Yelp nightlife data at the taxi zone level, aggregating trips by zone, month, COVID period, and provider to form a zone level panel for analysis.
Code
library(sf)library(dplyr)library(tidyr)library(stringr)library(knitr)# TLC taxi zones geometry# I define where the TLC taxi-zone zip should live and make sure the folder existstaxi_zip <-file.path(base_dir, "tlc_zones.zip")taxi_dir <-file.path(base_dir, "tlc_zones")invisible({dir.create(taxi_dir, showWarnings =FALSE)# I download the official TLC taxi-zone shapefile once and reuse it laterif (!file.exists(taxi_zip)) {cat("\n[download] TLC taxi zone shapefile\n")download.file("https://d37ci6vzurychx.cloudfront.net/misc/taxi_zones.zip", taxi_zip,mode ="wb" )unzip(taxi_zip, exdir = taxi_dir)} else {# cat("\n[skip] Using existing TLC taxi zone zip\n")}})# I read the taxi zones as sf polygons and keep only the fields I needzones_sf <-st_read(dsn = taxi_dir,layer ="taxi_zones",quiet =TRUE) %>%st_transform(4326) %>%select(LocationID, borough = borough, zone = zone) %>%filter(borough !="EWR") # drop Newark Airport# cat("\nSample of TLC taxi zones:\n")# knitr::kable(# head(zones_sf, 6),# format = "html",# caption = "Sample of TLC Taxi Zones"# )# Yelp nightlife venues as sf points# I convert my Yelp nightlife dataframe to sf points using longitude/latitude# (columns from yelp_api_loader: latitude, longitude, categories, etc.)yelp_sf <- yelp_df %>%filter(!is.na(latitude), !is.na(longitude)) %>%st_as_sf(coords =c("longitude", "latitude"),crs =4326,remove =FALSE )if (interactive()) cat("\nSample of Yelp nightlife venues (sf points):\n")# Clean preview table: no geometry, no raw "$$"yelp_sf_preview <- yelp_sf %>% sf::st_drop_geometry() %>%mutate(# Replace "$" with "USD " so Quarto does not think "$$" is mathprice =gsub("\\$", "$ ", price) ) %>%select( name, rating, review_count, price, categories, # column created in yelp_api_loader city, state, zip_code ) %>%head(6)# knitr::kable(# yelp_sf_preview,# caption = "Sample of Yelp Nightlife Venues (sf points)"# )# Spatial join: assign each Yelp venue to a taxi zoneyelp_zoned <-st_join(yelp_sf, zones_sf, join = st_within)if (interactive()) cat("\nYelp venues with assigned taxi zones (if any):\n")# Clean preview: drop geometry, fix price, keep only key columnsyelp_zoned_preview <- yelp_zoned %>% sf::st_drop_geometry() %>%mutate(price =gsub("\\$", "$ ", price) ) %>%select( name, rating, review_count, price, borough, zone ) %>%head(6)# knitr::kable(# yelp_zoned_preview,# caption = "Yelp Venues with Assigned Taxi Zones"# )# For aggregation, I drop the geometry and keep only attributesyelp_zoned_df <- yelp_zoned %>% sf::st_drop_geometry()# Nightlife intensity per taxi zone + nightlife zone definitionnightlife_by_zone <- yelp_zoned_df %>%group_by(LocationID) %>%summarize(n_nightlife =n(),avg_rating =mean(rating, na.rm =TRUE),.groups ="drop" ) %>%mutate(# My definition: all taxi zones with more than 3 nightlife venuesis_nightlife_zone = n_nightlife >3,nightlife_zone =if_else( is_nightlife_zone,"Nightlife zone (≥4 venues)","Non-nightlife zone (≤3 venues)" ) )if (interactive()) cat("\nNightlife venues per taxi zone:\n")knitr::kable(head(nightlife_by_zone, 6),format ="html",caption ="Nightlife Venues per Taxi Zone (with Nightlife-Zone Flag)")
Nightlife Venues per Taxi Zone (with Nightlife-Zone Flag)
LocationID
n_nightlife
avg_rating
is_nightlife_zone
nightlife_zone
4
25
3.900000
TRUE
Nightlife zone (≥4 venues)
7
91
3.694505
TRUE
Nightlife zone (≥4 venues)
13
2
2.050000
FALSE
Non-nightlife zone (≤3 venues)
14
3
2.766667
FALSE
Non-nightlife zone (≤3 venues)
17
6
4.483333
TRUE
Nightlife zone (≥4 venues)
24
1
0.000000
FALSE
Non-nightlife zone (≤3 venues)
Code
# Aggregate TLC night trips (FULL data) to zone-month-period-provider# using DuckDB SQL (avoids difftime() translation issues)# Make sure the DuckDB connection from tlc_loader existsif (!exists("con")) {stop("DuckDB connection `con` not found. Run tlc_loader chunk first.")}trips_sql <-" SELECT PULocationID AS LocationID, ym, CASE WHEN CAST(substr(ym, 1, 4) AS INTEGER) = 2019 THEN 'pre_covid' WHEN CAST(substr(ym, 1, 4) AS INTEGER) = 2020 THEN 'during_covid' WHEN CAST(substr(ym, 1, 4) AS INTEGER) = 2023 THEN 'post_covid' ELSE NULL END AS period, provider, COUNT(*) AS n_trips, AVG(distance_miles) AS avg_distance, AVG(total_fare) AS avg_fare FROM tlc_night WHERE PULocationID IS NOT NULL AND distance_miles IS NOT NULL AND distance_miles > 0 AND total_fare IS NOT NULL AND total_fare > 0 AND date_diff('minute', pickup_time, dropoff_time) BETWEEN 1 AND 180 GROUP BY PULocationID, ym, period, provider"trips_by_zone_period <- DBI::dbGetQuery(con, trips_sql)# knitr::kable(# head(trips_by_zone_period, 6),# format = "html",# caption = "Night Trips per Zone, Month, Period, Provider (Full Data, DuckDB SQL)"# )# Build the zone-level panel: TLC trips + nightlife intensityzone_panel <- trips_by_zone_period %>%left_join(nightlife_by_zone, by ="LocationID") %>%mutate(n_nightlife =replace_na(n_nightlife, 0L),nightlife_bin =case_when( n_nightlife ==0~"No nightlife venues", n_nightlife <=5~"1–5 venues", n_nightlife <=15~"6–15 venues", n_nightlife >15~"16+ venues",TRUE~NA_character_ ),# Re-apply my nightlife-zone definition after replacing NAsis_nightlife_zone = n_nightlife >3,nightlife_zone =if_else( is_nightlife_zone,"Nightlife zone (≥4 venues)","Non-nightlife zone (≤3 venues)" ) )if (interactive()) cat("\nCombined zone-level panel (TLC trips + nightlife intensity):\n")# knitr::kable(# head(zone_panel, 6),# format = "html",# caption = "Zone-Level Panel: TLC Trips + Nightlife Intensity"# )zone_panel_path <-file.path(base_dir, "tlc_zone_panel.rds")saveRDS(zone_panel, zone_panel_path)if (interactive()) cat("\nSaved zone_panel to ", zone_panel_path, "\n")invisible(NULL)
Data Suitability: TLC Trips and Yelp Nightlife
Night time TLC trips appear in 264 taxi zones, indicating broad citywide coverage. Yelp nightlife venues appear in 112 zones, reflecting the concentration of nightlife activity. Together, these datasets are well aligned for analyzing the relationship between nightlife intensity and night time mobility.
Code
# Basic coverage checks on the zone-level panelzones_with_trips <- zone_panel %>%filter(n_trips >0) %>%summarize(n_zones_with_trips =n_distinct(LocationID))zones_with_nightlife <- zone_panel %>%filter(n_trips >0, n_nightlife >0) %>%summarize(n_zones_with_trips_and_nightlife =n_distinct(LocationID))coverage_summary <-bind_cols(zones_with_trips, zones_with_nightlife)# cat("\nCoverage of TLC night trips and Yelp nightlife across taxi zones:\n")# print(coverage_summary)knitr::kable( coverage_summary,format ="html",caption ="Coverage of TLC Night Trips and Yelp Nightlife Across Taxi Zones",digits =0)
Coverage of TLC Night Trips and Yelp Nightlife Across Taxi Zones
n_zones_with_trips
n_zones_with_trips_and_nightlife
264
112
Hourly Night Movement by COVID Period
Code
library(DBI)library(duckdb)library(dplyr)library(ggplot2)library(plotly)library(scales)# Ensure DuckDB connection exists (optional safety)if (!exists("con") ||is.null(con) ||!DBI::dbIsValid(con)) { duckdb_path <-file.path(base_dir, "night_trips.duckdb") con <- DBI::dbConnect(duckdb::duckdb(), dbdir = duckdb_path)}# 1) Build hourly counts by COVID period directly in DuckDB (same logic)hourly_counts_wrap <- DBI::dbGetQuery( con," WITH trips AS ( SELECT -- Wrap night hours: 20–23 stay 20–23, 0–4 become 24–28 CASE WHEN EXTRACT(hour FROM pickup_time) BETWEEN 20 AND 23 THEN EXTRACT(hour FROM pickup_time) WHEN EXTRACT(hour FROM pickup_time) BETWEEN 0 AND 4 THEN EXTRACT(hour FROM pickup_time) + 24 ELSE NULL END AS hour_wrap, CASE WHEN CAST(substr(ym, 1, 4) AS INTEGER) < 2020 THEN 'pre_covid' WHEN CAST(substr(ym, 1, 4) AS INTEGER) = 2020 THEN 'during_covid' ELSE 'post_covid' END AS period FROM tlc_night WHERE CAST(substr(ym, 1, 4) AS INTEGER) IN (2019, 2020, 2023) AND PULocationID IS NOT NULL AND distance_miles IS NOT NULL AND distance_miles > 0 AND total_fare IS NOT NULL AND total_fare > 0 AND date_diff('minute', pickup_time, dropoff_time) BETWEEN 1 AND 180 ) SELECT period, hour_wrap AS hour, COUNT(*) AS n_trips FROM trips WHERE hour_wrap IS NOT NULL GROUP BY period, hour ORDER BY period, hour ") %>%mutate(period =factor( period,levels =c("pre_covid", "during_covid", "post_covid"),labels =c("Pre-COVID", "During COVID", "Post-COVID") ) )# 2) Plot: polished title, y-axis in millions, nicer themep_hourly <-ggplot( hourly_counts_wrap,aes(x = hour, y = n_trips, color = period)) +geom_line(linewidth =1) +scale_x_continuous(limits =c(20, 28),breaks =c(20,21,22,23,24,25,26,27,28),labels =c("8 PM","9 PM","10 PM","11 PM","12 AM","1 AM","2 AM","3 AM","4 AM") ) +scale_y_continuous(limits =c(0, 20e6), labels =label_number(scale =1e-6, suffix ="M"),expand =expansion(mult =c(0, 0.05)) ) +labs(title ="Hourly Night Movement\nCOVID Period (8 PM–4 AM)",x ="Hour of Night",y ="Number of Trips (millions)",color ="Period" ) +theme_minimal(base_size =14) +theme(plot.title =element_text(face ="bold", size =18, hjust =0.5),axis.title.x =element_text(face ="bold"),axis.title.y =element_text(face ="bold"),axis.text.x =element_text(face ="bold"),panel.border =element_rect(color ="black", fill =NA, linewidth =0.8),panel.grid.major.x =element_blank(),panel.grid.minor =element_blank(),legend.position ="right" )ggplotly( p_hourly,tooltip =c("hour", "n_trips", "period"))
COVID sharply reduced night travel at all hours, with the largest drop after midnight. Post COVID trips rebound earlier in the evening, peaking from eight to eleven PM, while late night travel after two AM remains weaker than before, indicating a lasting shift toward earlier nights.
Weekend vs Weekday Night Movement
COVID reshaped night mobility but did not permanently weaken NYC’s weekend nightlife.
Weekend night trips fell sharply in 2020, rebounded fastest in the post COVID period, and restored the traditional weekend and weekday gap, reinforcing nightlife as a key driver of night time travel.
Trip Distance and Duration by COVID Period
Code
library(dplyr)distance_duration_summary <- DBI::dbGetQuery( con," WITH base AS ( SELECT CASE WHEN CAST(substr(ym, 1, 4) AS INTEGER) < 2020 THEN 'pre_covid' WHEN CAST(substr(ym, 1, 4) AS INTEGER) = 2020 THEN 'during_covid' ELSE 'post_covid' END AS period, distance_miles, date_diff('minute', pickup_time, dropoff_time) AS trip_minutes FROM tlc_night WHERE PULocationID IS NOT NULL AND distance_miles IS NOT NULL AND distance_miles > 0 AND total_fare IS NOT NULL AND total_fare > 0 AND date_diff('minute', pickup_time, dropoff_time) BETWEEN 1 AND 180 ) SELECT period, AVG(distance_miles) AS mean_distance, AVG(trip_minutes) AS mean_minutes FROM base GROUP BY period ORDER BY period; ")distance_duration_summary$period <-factor( distance_duration_summary$period,levels =c("pre_covid", "during_covid", "post_covid"))knitr::kable( distance_duration_summary,format ="html",caption ="Average Night Trip Distance and Duration by COVID Period (Full Data)",digits =2)
Average Night Trip Distance and Duration by COVID Period (Full Data)
period
mean_distance
mean_minutes
during_covid
4.72
15.07
post_covid
5.21
17.58
pre_covid
4.66
16.51
• During COVID (2020)
Night trips were shortest in both distance and duration, indicating reduced travel radius and localized movement.
• *Pre COVID (2019)
Trips were slightly shorter in distance but longer in duration, reflecting normal congestion and established nightlife patterns.
• Post COVID (2023)
Trips became longer than both pre- and during-COVID in distance and duration, suggesting a broader spatial spread of night activity.
• Overall Insight
COVID temporarily compressed night time travel. After reopening, night mobility rebounded with longer and wider-reaching trips than before the pandemic.
Provider Mix: Yellow vs Rideshare Across COVID Periods
Rideshare services became a dominant mode of night-time travel — strongly reflected in TLC trip data.
The provider mix reveals that rideshare (FHVHV) dominated NYC night trips in all periods and became even more dominant during COVID.
Pre COVID (2019): Yellow taxis handled about 23% of night-time trips, while rideshare providers accounted for roughly 77%.
During COVID (2020): Yellow’s share fell sharply to about 12%, as rideshare climbed to nearly 88% of all night trips.
Post COVID (2023): Yellow taxis remain at roughly 12%, while rideshare continues to dominate with about 88% of night trips.
Overall, the provider mix indicates a structural shift in NYC’s nightlife mobility, with Uber/Lyft style rideshare services remaining the primary mode of night time travel even after the city reopened.
Tip
Why it matters: The post COVID night economy is now overwhelmingly rideshare driven. Even after reopening, Yellow taxi share stays low, signaling a structural not temporary mode shift in late night mobility.
Nightlife Hotspots and Night Trip Intensity
Nightlife Density (Yelp venues): Nightlife is highly concentrated in a small set of Manhattan zones (e.g., East Village, Lower East Side, Midtown West). Most zones outside Manhattan have few or no nightlife venues, highlighting a tightly clustered late night economy.