
Searchable.City

The first open-vocabulary semantic atlas of New York City, created by running a Vision Language Model on hundreds of thousands of images to build a searchable visual index.

Project Details

By processing street view imagery with a Vision Language Model (VLM), this project moves beyond traditional maps to create a searchable visual index of Manhattan. It translates the visual noise of the street into structured data, allowing queries for concepts like 'Gothic architecture' or 'scaffolding' to reveal patterns and systems that are invisible on conventional maps. This work explores how we can map the invisible systems (culture, wealth, infrastructure) that define the urban experience.
[Image: Searchable.City header]

Inspiration

Every ten years, New York City conducts a massive, manual census of its street trees. Thousands of volunteers walk every block with clipboards, counting and identifying every oak and maple across the five boroughs. They do it because the digital map does not know the trees exist.

To Google or Apple, the city is a grid of addresses and listings. The rest of the world gets flattened. Not because it is invisible, but because it was never entered into a database. The map can tell you where a pharmacy is. It cannot tell you where the fire escapes are. Where the murals are. Where the awnings begin. Where the street trees actually cast shade. Where the scaffolding still hangs. That is not a New York problem. It is a mapping problem.

Overview

Standard maps rely on rigid databases. We used a supercomputer to "watch" the city instead. By generating hundreds of tags for every street view image in Manhattan, we created a searchable visual index of the city.
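The write-up doesn't pin down the exact model or tagging scheme, so treat this as an illustration only: the tagging pass amounts to scoring every frame against an open vocabulary and keeping the strongest matches. A minimal sketch using CLIP-style image-text matching as a stand-in (the model name, the six-word vocabulary, and `tag_image` are assumptions, not the project's actual pipeline):

```python
# Sketch of the tagging pass: score each street view frame against an
# open vocabulary and keep the highest-confidence tags. The model and
# the tiny vocabulary here are illustrative stand-ins; the real index
# carried hundreds of tags per image.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

VOCABULARY = [
    "scaffolding", "fire escape", "mural", "street tree",
    "window air conditioner", "gothic architecture",
]

def tag_image(path: str, top_k: int = 3) -> list[tuple[str, float]]:
    """Return the top_k (tag, probability) pairs for one frame."""
    inputs = processor(
        text=VOCABULARY, images=Image.open(path),
        return_tensors="pt", padding=True,
    )
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # shape: (1, len(VOCABULARY))
    probs = logits.softmax(dim=-1)[0]
    scores, idx = probs.topk(top_k)
    return [(VOCABULARY[i], s.item()) for i, s in zip(idx.tolist(), scores)]
```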

[Image: heatmap for the query "Chinese"]
When we query "Chinese," it successfully delineates Chinatown without knowing a single zip code.

When we query "Chinese," the AI identifies architectural patterns, signage density, and color palettes. It successfully delineates Chinatown without knowing a single zip code. When we query "Gothic," it reveals the 19th-century spine of the city (churches, universities, and older civic buildings) separating the historic from the modern glass towers.

[Image: heatmap for the query "Gothic"]
Querying "Gothic" reveals the historic spine of Manhattan, distinct from the glass of Midtown.

The Ghost in the Machine

This was unexpected. When we queried "East" vs "West," the model accurately lit up the respective sides of the island. Is it reading street signs? Shadows? The model somehow figured out which way it was facing just by analyzing the image data.

[Images: heatmaps for the queries "West" and "East"]

The Decoded City

When you stop looking for addresses and start looking for patterns, the invisible becomes obvious.

[Image: an in-depth look at the query "scaffolding"]

[Image: an in-depth look at the query "conditioning"]

Perpetual Construction

Mapping scaffolding is effectively a way to map change. It highlights exactly where money is being spent on renovation, and where Local Law 11 is forcing facade repairs. It is the temporary city, frozen in 2025.

The Air Conditioner

Consider the air conditioner. As modern HVAC retrofits sweep the skyline, the window unit becomes a marker of building age and socioeconomic strata. A semantic query can instantly light up every wall sleeve or hanging unit across the boroughs, revealing the city's pace of renovation in real time.

The Visual Language

We found over 3,000 unique descriptive tags. Here are some of the weirdest correlations (a short sketch of how that tag count is tallied follows the table):

VisualizationQuery & Observation
bagel

BAGEL

The breakfast of champions. Note the complete absence in industrial zones.

beer

BEER

Identifies bars, advertisements, and bodegas with neon signage.

trash

TRASH

Correlates with commercial density and foot traffic.

graffiti

GRAFFITI

The unauthorized art layer of the Lower East Side.

baseball

BASEBALL

Reveals the hidden green spaces of the city, from sandlots to stadiums.
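The 3,000-tag figure above is cheap to reproduce under the same hypothetical schema sketched earlier: count the distinct tag values in the table.

```python
# Tallying the vocabulary that emerged from the tagging pass, under the
# same hypothetical (lat, lon, tag, score) schema as before.
import pandas as pd

df = pd.read_parquet("tags.parquet")
print(df["tag"].nunique(), "unique tags")  # the write-up reports 3,000+
print(df["tag"].value_counts().head(10))   # the most common descriptors city-wide
```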

The Blind Spots

However, this approach has inherent limitations. It is bound by the same physics as the human eye. A fire hydrant can vanish behind a double‑parked delivery truck. A basement entrance can dissolve into darkness. A sidewalk ramp can be present, and effectively invisible, if the frame catches it at the wrong angle.

And then there are the structural blind spots: what the camera never sees. Courtyards. Lobbies. Rooftops. The private city behind the street wall. Street View is not “the city.” It is a particular pass, from a particular height, on a particular day, along routes that a platform chose to drive and update.

Unlike ground-truth datasets provided by the city, a visual index carries the biases of its vantage point. It sees what the Street View car sees: no more, no less. This map represents probabilities, not absolute facts. A missing tag doesn't prove a missing object. In fact, the empty spaces on the map often reveal more about the limitations of data collection than they do about the city itself.


The Searchable Future

Imagine a city you can Ctrl+F. Not a list of addresses: a living surface you can query. Search: “scaffolding.” Search: “shade.” Search: “flood risk.” Search: “closed storefront.” Search: “stoops where people actually sit.”

We’re heading toward a continuous, searchable reality. As cameras multiply and refresh cycles compress, the map stops being a document and becomes a question you can ask at any moment. The interface is simple (a search bar), but what it returns is new: a city organized by meaning instead of coordinates. This is what open-vocabulary mapping unlocks. Not just navigation, but perception at scale: the ability to see how the city changes as if you stood on every corner at once.


Special Thanks

Imagery from Google Maps. © 2025 Google LLC, used under fair use.

Technologies Used:

AI
Computer Vision
VLM
Data Visualization
Urban Data
Big Data