2026 Engineering Project · Real Estate · PropTech · Gemini Multimodal
Gemini Multimodal · Satellite Imagery · MLS Data · Zoning PDFs · 60% Faster Valuations · Climate Risk Assessment

The Market Intelligence Conversational Analyst —
Fusing Transactions, Satellite Imagery & Zoning Documents

How Vipra Software built the industry's first truly multimodal property intelligence engine — simultaneously reading satellite imagery for construction changes, parsing zoning regulation PDFs for permissible use, and querying 10 years of MLS transaction history — all through a single conversational interface that any broker, analyst, or investor can use in plain English.

MLS Listings · Satellite Change Detection · Zoning Regulations · Climate Risk Scoring · Permit Applications · Environmental Reports
Industry: Real Estate & PropTech
Duration: 32 Weeks
AI Modality: Vision + Text + Structured
LLM Model: Gemini 1.5 Pro (Multimodal)
Properties Covered: 2M+ Listings
Revenue Uplift: $5M+ for Clients

60% Faster Property Valuation
30% More Accurate Price Predictions
$5M+ Additional Brokerage Revenue
2M+ Property Listings Indexed
3× Data Sources Fused Simultaneously

The Challenge

A national residential and commercial brokerage with operations in 22 cities was sitting on the most comprehensive property data set in their markets — and couldn't use any of it. Their CRM held 2M+ MLS listings with 10 years of pricing history. Their GIS team had negotiated satellite imagery contracts covering every ZIP code. Their compliance department had digitized 8,000 pages of municipal zoning regulations. And their environmental team had climate risk reports for flood zones, wildfire corridors, and soil remediation sites across their coverage area.

None of these data sources could talk to each other. A senior analyst wanting to identify under-valued mixed-use development opportunities had to query the MLS database manually, open the GIS viewer to check satellite imagery separately, download the relevant zoning PDFs and search them by hand, cross-reference county permit databases in a browser, and then build a spreadsheet combining the findings. This process took 3–5 days per ZIP code, and the analysts' institutional knowledge (which zoning board decisions were pending, which neighborhoods had upcoming infrastructure investment) lived entirely in their heads and was not captured anywhere.

As climate risk assessment became mandatory for property insurance and commercial lending under evolving ESG disclosure frameworks, the brokerage faced a new problem: lenders were now requiring climate-adjusted valuations at underwriting time. Their existing AVMs (Automated Valuation Models) had no climate risk layer whatsoever.

The Core Problem

Property intelligence is fundamentally multimodal: the value of a parcel is determined by what's on it (satellite imagery), what can be built on it (zoning documents), what comparable sales show (structured transaction history), and what risks surround it (environmental/climate reports). No existing system could process all four simultaneously. Human analysts spent 80% of their time gathering data and 20% making decisions — the ratio needed to be flipped.

The Gemini Multimodal Advantage

Gemini 1.5 Pro is the first production model capable of processing satellite imagery (vision), zoning regulation PDFs (long-context text), structured transaction tables, and natural-language market commentary simultaneously — in a single inference call. Previous approaches required three separate specialized models with complex orchestration logic between them. Gemini collapses this into a unified reasoning engine that understands "this parcel" in every dimension at once.

Challenge 1

Satellite → Structured Bridge

Satellite imagery is raw pixel data. Transaction databases are structured records. No existing system could link "new construction visible in satellite image dated March 2026" to "permit application filed February 2026" to "comparable sales within 800m" automatically.

Challenge 2

Zoning Document Complexity

Municipal zoning codes are among the most complex legal documents in existence — 200–800 pages each, with conditional use tables, overlay districts, setback calculations, and FAR formulas. Parsing them for specific parcel-level answers required legal expertise no analytics system could replicate.

Challenge 3

Temporal Drift in Property Data

Property data decays fast. Satellite imagery is typically 3–6 months stale. Zoning codes change quarterly. MLS data has 24-hour freshness requirements for listing accuracy. Building a system with consistent temporal reasoning across all three was an unsolved engineering problem.

Challenge 4

Climate Risk Integration

FEMA flood maps, wildfire risk scores, and sea-level projection models are produced in incompatible formats (SHP, GeoTIFF, PDF, XML) across dozens of government agencies. No commercial AVM had ever ingested all of these into a coherent risk-adjusted valuation layer.

System Architecture

The Market Intelligence Conversational Analyst is a five-layer architecture designed around the fundamental insight that property intelligence is multimodal. The structured layer handles transactional facts; the vision layer processes imagery; the document layer digests regulations; the unified store makes them queryable together; and Gemini 1.5 Pro acts as the reasoning surface that understands all three simultaneously.

Architecture — PropTech Market Intelligence Conversational Analyst
Data sources
  Structured Data: MLS Listings (2M+ properties) · Transaction DB (10yr price history) · Demographics (Census, income) · Mortgage Rates (Fed, lender API) · Permit Applications · County Records
  Satellite & Vision: Satellite Imagery (Planet, Sentinel-2) · Street View (condition scoring) · GIS Shapefiles (parcel boundaries) · Change Detection (NDVI, NDBI) · Flood Maps · Wildfire Risk · Climate Models
  Documents & Text: Zoning Codes (8K+ pages, 22 cities) · Environmental (FEMA, EPA PDFs) · News & Reviews (local media, Yelp) · HOA Documents (rules, financials) · Inspector Reports · Title Docs · Appraisals

Ingestion
  Structured Ingestion: BigQuery · Spark · dbt · CDC · MLS API Sync · County SOAP Bridge
  Vision Pipeline: Cloud Vision · Change Model · GeoTIFF → Vertex AI Embeddings
  Document Processing: Document AI · OCR · Layout · Zoning Table Parser · NER

Unified Geospatial Intelligence Store
  BigQuery: 2M+ property records
  Vertex AI Vector Search: multimodal embeddings · PostGIS parcel index
  Knowledge Store: zoning rules · climate scores · AVM

Gemini 1.5 Pro: Multimodal Property Reasoning Engine
  Vision (satellite) + long-context text (zoning PDFs) + structured SQL in a single inference call
  RAG over property corpus · Geospatial NL → multi-source retrieval · Climate-adjusted AVM output

Consumption Layer
  Conversational UI: NL property queries · Map overlay answers · Evidence pack export
  AVM + Risk Scores: Climate-adjusted value · Flood / Fire / Liquidity · Lender-ready reports
  Development Pipeline: Zoning opportunity map · Permit tracker · Satellite change alerts
  Market Intelligence: Trend forecasting · Neighbourhood scoring · Competitive analysis

Multimodal Query Execution Flow

What makes the Market Intelligence Analyst technically unique is its parallel multimodal retrieval — a single conversational query fans out into three simultaneous retrieval streams: a structured SQL path against the property transaction database, a vision retrieval path against satellite image embeddings, and a semantic document path against zoning/environmental text. These converge in a RAG fusion step before Gemini synthesizes the final answer.

Step 1: Natural Language Query Ingestion (Input · Avg response target: 5.2s)
Broker or analyst submits a query via the conversational UI, API, or voice interface. Example: "Show me all commercial properties in ZIP 94103 where satellite imagery shows new construction in the last 6 months, zoning allows mixed-use, and comparable sales have appreciated >15%." The query is received by the orchestration layer as plain text.

Step 2: Query Parsing & Modality Routing (Gemini Intent Decomposition · ~0.5s)
Gemini decomposes the query into three typed retrieval sub-queries: (A) a geospatial SQL predicate for the structured layer: zip=94103, use_type=commercial, sale_appreciation>15%; (B) a vision retrieval query for satellite change detection: new construction signal, last 180 days; (C) a semantic document query against the zoning code: mixed-use permitted, parcel overlay district. Each sub-query is routed to its specialist retrieval path simultaneously.

Step 3: Triple-Stream Retrieval (Parallel Retrieval · ~1.8s, parallel, not serial)
Stream A (Structured): BigQuery SQL against 2M+ property records with PostGIS geospatial predicates. Returns matching parcels with coordinates, pricing history, and permit status.
Stream B (Vision): Vertex AI Image Search retrieves satellite thumbnails for each candidate parcel, scored by the change-detection model for construction signals.
Stream C (Document): Semantic vector search over zoning code embeddings, returning relevant zoning table rows and overlay district rules for each candidate parcel's jurisdiction.

Step 4: Geospatial-Aware Context Assembly (Fusion · ~0.7s)
Results from all three streams are intersected by parcel ID. Only properties appearing in all three streams (matching the structured criteria AND showing satellite construction signals AND confirmed as mixed-use permitted by zoning) pass the filter. Satellite thumbnails, relevant zoning excerpts, and pricing data are assembled into a single multimodal context package per qualifying parcel.

Step 5: Simultaneous Vision + Text + Structured Synthesis (Gemini Multimodal Reasoning · ~2.2s)
Gemini 1.5 Pro receives the assembled context as a multimodal prompt: satellite imagery tiles (vision tokens), zoning document excerpts (text tokens), and structured pricing tables (text tokens in tabular format). It generates a natural-language response with ranked property recommendations, cited zoning article references, described construction observations from satellite, and climate risk scores, all in a single inference call.

Step 6: Contextual Response + Map Overlay + Evidence Pack (Output · Total avg 5.2s)
The broker receives: a ranked list of qualifying properties with narrative explanations, a map view with qualifying parcels highlighted, cited zoning code sections with page references, satellite change-detection confidence scores, comparable sales summaries, and a downloadable evidence pack suitable for client presentation or lender submission. The entire output that previously took a senior analyst 3–5 days is produced in 5.2 seconds.
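
The fan-out and parcel-ID intersection described in the retrieval and fusion steps can be sketched roughly as follows. This is a minimal illustration, not the production orchestration layer: the three `retrieve_*` coroutines, the `ParcelContext` type, and the canned parcel IDs are all hypothetical stand-ins for the warehouse, image index, and vector store calls.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class ParcelContext:
    """Per-parcel context package assembled after fusion (hypothetical shape)."""
    parcel_id: str
    pricing: dict
    change_score: float
    zoning_excerpts: list = field(default_factory=list)


# Stand-in retrievers returning canned data keyed by parcel ID. Real versions
# would query the structured store, the image index, and the zoning embeddings.
async def retrieve_structured(query: str) -> dict:
    return {"P1": {"appreciation": 0.18}, "P2": {"appreciation": 0.21}}


async def retrieve_vision(query: str) -> dict:
    return {"P1": 0.91, "P3": 0.77}


async def retrieve_zoning(query: str) -> dict:
    return {"P1": ["mixed-use permitted"], "P2": ["mixed-use permitted"]}


async def fan_out_and_fuse(query: str) -> list:
    # Run the three streams concurrently, so wall time tracks the slowest one.
    structured, vision, zoning = await asyncio.gather(
        retrieve_structured(query), retrieve_vision(query), retrieve_zoning(query)
    )
    # Keep only parcels that appear in every stream (intersection on parcel ID).
    qualifying = set(structured) & set(vision) & set(zoning)
    return [
        ParcelContext(pid, structured[pid], vision[pid], zoning[pid])
        for pid in sorted(qualifying)
    ]


contexts = asyncio.run(fan_out_and_fuse("mixed-use construction in 94103"))
```

In this toy run only parcel P1 is present in all three result sets, so it is the only one that would be passed on to the reasoning step.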

Conversational Query Examples

The conversational interface is designed for non-technical real estate professionals. Queries can be as specific or as exploratory as the user needs — the system handles the complexity of multi-source retrieval invisibly.

Example Query 1 — Development Opportunity Identification

"Show me all commercial properties in ZIP 94103 where satellite imagery shows new construction in the last 6 months, zoning allows mixed-use, and comparable sales have appreciated >15%."

Example Query 2 — Climate Risk Due Diligence

"For this portfolio of 40 properties, generate a climate-adjusted valuation showing how a FEMA 100-year flood event, projected sea-level rise by 2050 under a 2°C warming scenario, and the current California wildfire risk rating affect the appraised value of each asset."

Example Query 3 — Neighbourhood Trajectory Analysis

"Compare satellite imagery change signals, new business permit applications, and median price trends for these three adjacent ZIP codes over the last 24 months — and tell me which one shows the strongest early-stage gentrification pattern."

Example Query 4 — Zoning Arbitrage Discovery

"Find all single-family parcels within 500m of a light-rail station where the current zoning ordinance was updated in the last 18 months to permit higher-density residential, and the parcel has not yet had a development permit filed."

Implementation — Key Components

1. Satellite Change Detection Pipeline

# Satellite image ingestion → change detection → parcel scoring
from google.cloud import aiplatform, storage, vision

# SatConfig, BoundingBox, ChangeReport and PostGISClient are project-local
# types that are not shown in this excerpt.


class SatelliteChangeDetectionPipeline:
    def __init__(self, config: SatConfig):
        self.vision_client = vision.ImageAnnotatorClient()
        self.change_model = aiplatform.Endpoint('construction-detect-v2')
        self.geo_index = PostGISClient(config.db_url)
        self.image_store = storage.Client().bucket('sat-imagery-archive')

    def process_parcel(self, parcel_id: str, bbox: BoundingBox) -> ChangeReport:
        # Fetch imagery at t-0 and t-180d for the parcel bounding box
        img_now = self._fetch_imagery(bbox, date='latest')
        img_past = self._fetch_imagery(bbox, date='-180d')

        # Compute NDBI (Normalized Difference Built-up Index) delta
        ndbi_delta = self._compute_ndbi_delta(img_now, img_past)

        # Run construction-detection model (fine-tuned EfficientNet)
        change_score = self.change_model.predict(
            instances=[{'before': img_past.b64, 'after': img_now.b64}]
        ).predictions[0]['construction_confidence']

        # Cross-reference with permit database. ST_Intersects matches permits
        # whose geometry falls inside or overlaps the parcel bounding box;
        # ST_Contains(geom, bbox) would only match permits that enclose it.
        permits = self.geo_index.query(
            f"SELECT * FROM permits WHERE ST_Intersects(geom, '{bbox.wkt}'::geometry)"
            f" AND filed_date > NOW() - INTERVAL '180 days'"
        )

        return ChangeReport(
            parcel_id=parcel_id,
            change_score=change_score,
            ndbi_delta=ndbi_delta,
            has_permit=len(permits) > 0,
            imagery_before=img_past.gcs_uri,
            imagery_after=img_now.gcs_uri,
        )
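
The pipeline above calls a `_compute_ndbi_delta` helper that is not shown. As a rough sketch of what such a helper might compute: NDBI is defined per pixel as (SWIR − NIR) / (SWIR + NIR), and the delta is the change in its mean between two captures. The function names and the dict-of-bands input shape below are assumptions for illustration, and plain Python lists are used instead of raster arrays to keep the example dependency-free.

```python
def ndbi(swir: float, nir: float) -> float:
    """NDBI for one pixel: (SWIR - NIR) / (SWIR + NIR)."""
    total = swir + nir
    return 0.0 if total == 0 else (swir - nir) / total


def mean_ndbi(swir_band, nir_band) -> float:
    """Mean NDBI over two equally sized 2-D reflectance grids (lists of rows)."""
    values = [
        ndbi(s, n)
        for swir_row, nir_row in zip(swir_band, nir_band)
        for s, n in zip(swir_row, nir_row)
    ]
    return sum(values) / len(values)


def ndbi_delta(now: dict, past: dict) -> float:
    """Positive values suggest more built-up surface in the newer capture."""
    return mean_ndbi(now["swir"], now["nir"]) - mean_ndbi(past["swir"], past["nir"])


# Toy 1x2 tiles: the older capture is NIR-bright (vegetation-like),
# the newer one is SWIR-bright (built-up-like).
past = {"swir": [[0.25, 0.25]], "nir": [[0.75, 0.75]]}
now = {"swir": [[0.75, 0.75]], "nir": [[0.25, 0.25]]}
delta = ndbi_delta(now, past)
```

For Sentinel-2, the SWIR and NIR inputs would typically come from bands B11 and B8; which bands the production pipeline uses is not stated in the source.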

2. Zoning Document Intelligence Parser

# Parse municipal zoning codes → structured knowledge + embeddings
from vertexai.language_models import TextEmbeddingInput

# ZoningKnowledge and the self.docai_client / self.docai_processor /
# self.embedding_model attributes are set up elsewhere and not shown here.


class ZoningDocumentParser:
    # Zoning codes are among the most complex PDFs in existence:
    # conditional use tables, overlay districts, FAR calculations

    def parse_zoning_code(self, pdf_path: str, municipality: str) -> ZoningKnowledge:
        # Read the PDF with a context manager so the file handle is closed
        with open(pdf_path, 'rb') as f:
            pdf_bytes = f.read()

        # Document AI layout parser for multi-column zoning tables
        raw = self.docai_client.process_document(
            request={'raw_document': {'content': pdf_bytes,
                                      'mime_type': 'application/pdf'},
                     'name': self.docai_processor}
        )

        # Extract use tables: district → permitted / conditional / prohibited
        use_table = self._extract_use_table(raw.document)

        # Parse setback rules, FAR limits, height restrictions per district
        dimensional_rules = self._extract_dimensional_standards(raw.document)

        # Chunk into 512-token passages with municipality + district metadata
        chunks = self._chunk_with_context(
            raw.document.text,
            metadata={'municipality': municipality,
                      'effective_date': self._extract_effective_date(raw.document)}
        )

        # Embed all chunks for semantic retrieval. Embedding endpoints limit the
        # number of inputs per request, so large codes may need to be batched.
        embeddings = self.embedding_model.get_embeddings(
            [TextEmbeddingInput(c.text, 'RETRIEVAL_DOCUMENT') for c in chunks]
        )

        return ZoningKnowledge(
            use_table=use_table,
            dimensional_rules=dimensional_rules,
            chunks_embedded=list(zip(chunks, embeddings)),
            municipality=municipality,
        )

3. Gemini Multimodal Property Query

# Single Gemini call processing imagery + zoning text + structured data
from vertexai.generative_models import GenerationConfig

# PropertyCandidate and PropertyResponse are project-local types; self.model is
# assumed to be a Gemini 1.5 Pro GenerativeModel created elsewhere.


class MultimodalPropertyReasoningEngine:
    SYSTEM_PROMPT = """You are an expert real estate analyst with deep knowledge of
property valuation, zoning law, and satellite imagery interpretation.
You have access to: (1) satellite imagery before/after pairs,
(2) zoning code excerpts, (3) property transaction data.
Always cite zoning ordinance section numbers.
Rate construction confidence as HIGH/MEDIUM/LOW.
Never fabricate comparable sales."""

    def query(self, nl_query: str, candidates: list[PropertyCandidate]) -> PropertyResponse:
        parts = [{"text": self.SYSTEM_PROMPT},
                 {"text": f"USER QUERY: {nl_query}\n\nCANDIDATE PROPERTIES:"}]

        for p in candidates[:10]:  # top-10 post-fusion
            # Satellite imagery: vision tokens
            parts.append({"inline_data": {
                "mime_type": "image/jpeg", "data": p.satellite_before_b64}})
            parts.append({"inline_data": {
                "mime_type": "image/jpeg", "data": p.satellite_after_b64}})

            # Zoning context + transaction data: text tokens
            parts.append({"text": f"""
Property {p.parcel_id} ({p.address}):
ZONING EXCERPTS: {p.zoning_context}
TRANSACTION DATA: {p.comparable_sales_json}
PERMIT STATUS: {p.permit_status}
CLIMATE RISK: Flood={p.flood_risk}, Fire={p.fire_risk}
---"""})

        response = self.model.generate_content(
            contents=[{"role": "user", "parts": parts}],
            generation_config=GenerationConfig(temperature=0.1,
                                               max_output_tokens=8192)
        )
        return self._parse_property_response(response.text)

32-Week Implementation Timeline

Wk 1–4: Data Landscape Audit & Source Contracts
Inventory all MLS, county, satellite, and environmental data sources. Negotiate imagery API contracts (Planet/Maxar). Map 22-city zoning code inventory: identify PDFs, web portals, and update cadences.

Wk 5–10: Structured Property Data Lake
Build BigQuery ingestion for 2M+ MLS listings, 10-year pricing history, mortgage rate feeds, census demographics, and county property records. Establish PostGIS geospatial index on parcel coordinates. MLS API daily sync with delta detection.

Wk 11–16: Satellite Vision Pipeline
GeoTIFF ingestion from Planet and Sentinel-2. Change detection model training on construction labels (fine-tuned EfficientNet-B4). NDBI delta computation pipeline. Parcel-level imagery index in Vertex AI Image Search. Street View condition scoring model.

Wk 17–22: Zoning Document Intelligence
22-city zoning code PDF corpus ingestion via Document AI. Use table extraction, dimensional standards parsing, overlay district mapping. Embedding pipeline for semantic retrieval. Quarterly update automation from municipal RSS/API feeds.

Wk 23–27: Gemini Multimodal Integration & RAG Fusion
Multimodal prompt engineering for property reasoning. Triple-stream parallel retrieval (SQL + Vision + Document). Geospatial-aware context assembly (parcel-ID intersection). Climate-adjusted AVM model integration. Broker-facing conversational UI build.

Wk 28–32: UAT, Climate Risk Layer & Production Launch
200+ broker query scenarios for UAT. FEMA flood, wildfire, and sea-level model integration into climate-adjusted valuation. Lender-ready evidence pack export. Production cutover with 60-day parallel run against manual analyst workflow.

Engineering Challenges & Solutions

Challenge

Imagery Temporal Misalignment

Planet satellite imagery has 3–5 day revisit cycles; Sentinel-2 is 5–10 days. Construction activity visible in one provider's image might not appear in another's for a week. Comparing imagery across providers introduced false change signals.

Solution

Provider-Normalised Change Detection

Built a temporal alignment layer that normalises all imagery to the same 30-day window using radiometric calibration. Change scores are computed on normalised composites, not raw captures. Confidence intervals account for provider-specific revisit gaps.

Challenge

Zoning Code Versioning

Municipal zoning codes are amended 2–4 times per year per city. A query about a 2024 transaction must retrieve the zoning rules valid in 2024, not the 2026 amendment. Temporal accuracy in regulatory documents has direct legal implications for clients.

Solution

Effective-Date Versioned Embeddings

Every zoning document chunk is embedded with an effective_date metadata field. Retrieval filters on effective_date ≤ query_date to ensure temporally correct results. Quarterly amendment ingestion is automated via municipal portal monitoring.
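
A minimal sketch of the effective-date filter, under stated assumptions: `ZoningChunk` and `chunks_valid_on` are hypothetical names, and chunks are keyed by an ordinance section identifier. Note that filtering on effective_date ≤ query_date alone would also return superseded versions, so this sketch additionally keeps only the most recent version of each section that was in force on the query date; whether the production system does the same is not stated in the source.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ZoningChunk:
    section: str          # ordinance section identifier (hypothetical key)
    text: str
    effective_date: date


def chunks_valid_on(chunks, query_date: date) -> list:
    """Return, per section, the latest version in force on query_date."""
    latest = {}
    for chunk in chunks:
        if chunk.effective_date > query_date:
            continue  # amendment was not yet in force on the query date
        current = latest.get(chunk.section)
        if current is None or chunk.effective_date > current.effective_date:
            latest[chunk.section] = chunk
    return list(latest.values())


corpus = [
    ZoningChunk("4.2", "version A", date(2022, 1, 1)),
    ZoningChunk("4.2", "version B", date(2026, 1, 1)),
    ZoningChunk("5.1", "version C", date(2023, 6, 1)),
]
valid_2024 = chunks_valid_on(corpus, date(2024, 6, 30))
```

Here a query dated mid-2024 sees version A of section 4.2 rather than the 2026 amendment, which is the behaviour the challenge description calls for.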

Challenge

Parcel ID Inconsistency

MLS, county records, satellite imagery metadata, and zoning maps each use different parcel identifier formats — APN, FIPS, MapBlock, and proprietary MLS IDs. Joining across these was the primary integration blocker for the first 6 weeks.

Solution

Canonical Parcel Registry

Built a master parcel registry using county assessor APN as the canonical ID. All other identifier formats map to APN via lookup tables maintained in BigQuery. New source onboarding requires an APN mapping before data can enter the unified store.
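
The registry idea can be illustrated with an in-memory lookup. This is only a sketch: the source keeps the mapping tables in BigQuery, while the class name, the normalisation rule, and the sample identifiers below are invented for illustration.

```python
class CanonicalParcelRegistry:
    """Maps source-specific parcel identifiers to a canonical assessor APN."""

    def __init__(self):
        self._to_apn = {}  # (source, normalised source ID) -> APN

    @staticmethod
    def _normalise(raw_id: str) -> str:
        # Assumed normalisation: ignore case, spaces, and hyphens.
        return raw_id.replace("-", "").replace(" ", "").upper()

    def register(self, source: str, source_id: str, apn: str) -> None:
        self._to_apn[(source, self._normalise(source_id))] = apn

    def resolve(self, source: str, source_id: str) -> str:
        key = (source, self._normalise(source_id))
        if key not in self._to_apn:
            # Unmapped records are rejected rather than guessed at.
            raise KeyError(f"no APN mapping for {source}:{source_id}")
        return self._to_apn[key]


registry = CanonicalParcelRegistry()
registry.register("mls", "ML-8812", apn="3705-041")
registry.register("fips", "06075 0001", apn="3705-041")
```

Raising on an unmapped identifier mirrors the rule that a new source needs an APN mapping before its data can enter the unified store.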

Challenge

Gemini Context Window for Vision

Including high-resolution before/after satellite imagery tiles for 10+ candidate properties in a single Gemini context exceeded token limits and significantly increased latency. Initial query times were 18–22 seconds — unacceptable for a conversational interface.

Solution

Two-Stage Vision Retrieval

Pre-filter candidates with the lightweight change-detection model (Stage 1: fast, runs on all parcels). Only the top-10 candidates by change score receive full satellite imagery tiles in the Gemini context (Stage 2). Average latency dropped from 18–22s to 5.2s.
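
The Stage 1 shortlist can be sketched as a simple top-k selection over change scores. The function name and the `min_score` floor are assumptions added for illustration; the source only specifies the top-10 cut.

```python
import heapq


def select_for_llm(change_scores: dict, k: int = 10, min_score: float = 0.5) -> list:
    """Return up to k (parcel_id, score) pairs, highest change score first."""
    eligible = [
        (parcel_id, score)
        for parcel_id, score in change_scores.items()
        if score >= min_score  # assumed floor: drop parcels with weak signals
    ]
    return heapq.nlargest(k, eligible, key=lambda item: item[1])


stage1_scores = {"P1": 0.92, "P2": 0.35, "P3": 0.71, "P4": 0.88}
shortlist = select_for_llm(stage1_scores, k=2)
```

Only the parcels in the shortlist would then have their full before/after imagery tiles attached to the multimodal prompt.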

Engineering Best Practices

Geospatial First, Not Data-Type First

Organise all data by parcel ID and geographic coordinate as the primary key — not by source system. Every satellite tile, zoning excerpt, and MLS record must be indexable by location. Geospatial joins are your unifying primitive across all three modalities.

Two-Stage Candidate Filtering

Never pass raw retrieval results directly to a multimodal LLM. Use lightweight specialist models (change detection, geospatial filter, keyword filter) to pre-select a small candidate set. LLM reasoning is expensive; save it for the final synthesis step only.

Temporal Metadata on Every Asset

Every data asset — satellite image, zoning document chunk, MLS listing, environmental report — must carry a capture_date or effective_date. Without this, temporal queries ("as of 2024") are impossible to answer accurately, creating legal liability for clients making property decisions.

Citation-Mandatory Outputs

Zoning recommendations must cite specific ordinance section numbers and page references. Satellite change observations must cite image dates and confidence scores. In a real estate context, a hallucinated zoning interpretation could lead a client to make a multi-million dollar acquisition decision on false information.
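
One way to enforce this is a post-generation check that rejects any recommendation missing its citations. The sketch below assumes the model response has already been parsed into a structured record; the `Recommendation` fields and the `missing_citations` helper are hypothetical and not taken from the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Recommendation:
    parcel_id: str
    zoning_section: Optional[str] = None      # cited ordinance section number
    zoning_page: Optional[int] = None         # page reference in the zoning code
    image_date: Optional[str] = None          # capture date of the cited imagery
    change_confidence: Optional[float] = None


def missing_citations(rec: Recommendation) -> list:
    """Return the names of required citation fields that are absent."""
    required = {
        "zoning_section": rec.zoning_section,
        "zoning_page": rec.zoning_page,
        "image_date": rec.image_date,
        "change_confidence": rec.change_confidence,
    }
    return [name for name, value in required.items() if value is None or value == ""]


complete = Recommendation("P1", "4.2", 118, "2026-03-14", 0.91)
incomplete = Recommendation("P2", None, 118, "2026-03-14", 0.64)
```

A response containing any recommendation with a non-empty result could be regenerated or withheld rather than shown to the broker.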

Climate Risk as a First-Class Data Layer

Integrate climate risk scores (flood, fire, heat, sea-level) into the property record at ingestion time — not as an afterthought overlay. Lenders and insurers are now requiring climate-adjusted valuations at underwriting. Systems that cannot produce these natively will be non-compliant by 2027.

Municipality-Aware Prompt Routing

Zoning codes vary dramatically by municipality — what is "mixed-use permitted by right" in San Francisco is "conditional use only" in adjacent Daly City. The system must identify the governing municipality from parcel coordinates and route to the correct zoning knowledge base before any regulatory answer is generated.
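
The routing step can be sketched as a point-in-polygon lookup against municipal boundaries. This example uses a standard ray-casting test on toy square boundaries with made-up names; a production system would more likely use the PostGIS parcel index and real boundary shapefiles.

```python
def point_in_polygon(lon: float, lat: float, polygon: list) -> bool:
    """Ray-casting test: count boundary crossings of a ray cast toward +x."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def governing_municipality(lon: float, lat: float, boundaries: dict):
    """Return the first municipality whose boundary contains the point, else None."""
    for name, polygon in boundaries.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None


# Toy boundaries (not real geography): two adjacent unit-2 squares.
boundaries = {
    "City A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "City B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
```

The returned name would then select which municipality's zoning knowledge base is queried, and a `None` result would block any regulatory answer.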

Why This Matters: PropTech 2026–2030

PropTech is the fastest-growing AI vertical in enterprise software, and three structural forces are accelerating the need for exactly the capability this system provides — multimodal property intelligence at conversational speed.

  • Climate Risk Becomes Mandatory: The SEC adopted climate disclosure rules in March 2024 (subsequently stayed pending litigation), and FHFA guidance to Fannie/Freddie requires lenders to document climate risk in property valuations. By 2027, any AVM that cannot produce a climate-adjusted valuation will not be acceptable for conforming loan underwriting. This system's climate risk layer is not an optional feature; it is a compliance requirement arriving at speed.
  • PropTech AI Investment Acceleration: Global PropTech AI investment hit $8.2B in 2025, with the plurality going to data intelligence platforms. HouseEazy, HouseCanary, and Compass have all announced multimodal AI valuation products. Brokerage firms without equivalent capability will lose institutional mandates to those that have it.
  • Satellite Imagery Cost Collapse: Planet Labs and Synthetic Aperture Radar (SAR) providers have driven per-image costs down 80% since 2022. Daily revisit at 3m resolution for entire metropolitan areas is now economically viable — unlocking near-real-time construction and vacancy monitoring as a standard data product, not a premium.
  • NL Interface Becoming the Standard: Real estate professionals are not SQL users. The brokerage firm that can give their brokers a natural-language interface to the full depth of their data — without training, without dashboards, without SQL — will see it adopted immediately and uniformly. The conversational interface is the distribution mechanism that makes the technology actually reach the people who need it.

Forward Look

By 2028, Vipra Software forecasts that 70% of institutional property transactions above $10M will require a machine-generated intelligence report combining satellite change analysis, zoning compliance confirmation, and climate risk scoring as a condition of financing. The firms building this infrastructure today are creating the underwriting standard for the next decade of real estate capital markets.
