AI

OpenCityAI

The first large-scale chatbot using city data (March 2023), trained on over 200,000 open city datasets for optimized question answering.

Project Details

OpenCityAI is an AI chat model trained on a large corpus of open data from more than 200 U.S. cities, harmonized from disparate portal platforms such as Socrata, ArcGIS, and CKAN. To address fragmentation in urban data, the project built an ingestion and schema-harmonization pipeline (inspired by the OpenCityCorpus work) that produced a unified corpus of over 200,000 city datasets (~200 GB) for training. The model is a large language model trained in PyTorch, with SQuAD 2.0 used to optimize it for question answering. Retrieval-augmented generation lets the model give up-to-date answers: the system ingests new city data daily and supports over 30 languages. In validation, OpenCityAI outperformed commercial systems such as Google Bard and Microsoft Bing at delivering accurate, city-specific information. The project’s goal is to democratize access to urban data for planners and residents.
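As an illustration of what schema harmonization across portals can look like, here is a minimal sketch in Python. It is not the project's actual pipeline: the unified `DatasetRecord` schema, the `FIELD_MAPS` table, and the `harmonize` function are hypothetical, and the per-portal field names are simplified assumptions loosely modeled on the kind of metadata each platform exposes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical unified schema for one dataset's metadata."""
    source: str
    dataset_id: str
    title: str
    description: str
    updated: str


# Assumed mappings from unified field name -> portal-specific field name.
# Real portal responses are nested and vary by version; these are simplified.
FIELD_MAPS = {
    "socrata": {"dataset_id": "id", "title": "name",
                "description": "description", "updated": "updatedAt"},
    "ckan": {"dataset_id": "id", "title": "title",
             "description": "notes", "updated": "metadata_modified"},
    "arcgis": {"dataset_id": "identifier", "title": "title",
               "description": "description", "updated": "modified"},
}


def harmonize(source: str, raw: dict) -> DatasetRecord:
    """Map one raw metadata dict onto the unified schema.

    Missing or null fields become empty strings; an unknown source
    raises KeyError so unmapped portals are noticed early.
    """
    mapping = FIELD_MAPS[source]
    fields = {
        unified: str(raw.get(portal_key) or "").strip()
        for unified, portal_key in mapping.items()
    }
    return DatasetRecord(source=source, **fields)


ckan_raw = {"id": "abc", "title": " Street Trees ",
            "notes": "Tree inventory", "metadata_modified": "2023-03-01"}
socrata_raw = {"id": "x1y2-z3", "name": "Street Trees",
               "updatedAt": "2023-03-02"}

records = [harmonize("ckan", ckan_raw), harmonize("socrata", socrata_raw)]
```

Keeping the mappings in a plain table means adding a new portal type only requires a new entry, while everything downstream (indexing, retrieval, training) sees a single record shape.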

Technologies Used:

Python
LLM
RAG
Urban Data