VantageDash

Architecture Overview

System architecture, tech stack, data flow, and deployment overview

VantageDash is a competitive intelligence SaaS that tracks competitor pricing, matches products using AI, and provides market insights. It follows a multi-tenant architecture with strict data isolation.

High-Level Architecture

┌─────────────────────────────┐     ┌──────────────────────────────┐
│     Frontend (Vercel)       │     │     Backend (Coolify)        │
│   Next.js 16 + App Router   │────▶│   FastAPI + Python Scripts   │
│   shadcn/ui v4 + Tailwind v4│     │   JWT Auth + Background Tasks│
│   Supabase Auth (client)    │     │   Service Role Key           │
└──────────┬──────────────────┘     └──────────┬───────────────────┘
           │                                    │
           │  Supabase JS SDK                   │  Supabase Python SDK
           │  (RLS-scoped reads)                │  (service role for background ops)
           ▼                                    ▼
┌────────────────────────────────────────────────────────────────────┐
│                    Supabase (PostgreSQL)                            │
│   RLS Policies ──▶ get_user_tenant_id() ──▶ tenant_id scoping     │
│   pgvector extension ──▶ HNSW indexes ──▶ semantic product search  │
│   15 tables ──▶ all FK'd to tenants(id)                            │
│   Auth trigger ──▶ handle_new_user() auto-provisions tenants       │
└────────────────────────────────────────────────────────────────────┘

Frontend Stack

Technology | Purpose
Next.js 16 | App Router, server/client components, TypeScript strict mode
shadcn/ui v4 | Component library (uses @base-ui/react, NOT Radix; render prop, not asChild)
Tailwind v4 | CSS-first config, no tailwind.config.js
Supabase JS | Auth + database queries from server/client components
next-themes | Light/dark mode toggle
lucide-react | Icons
recharts | Charts (analytics, price history)
sonner | Toast notifications
jspdf + html2canvas | PDF/CSV export

Frontend Directory Layout

frontend/src/
├── app/(auth)/           # Login, signup, callback pages
├── app/(dashboard)/      # All 9 dashboard pages (sidebar + header layout)
│   ├── overview/         # Main dashboard with metrics
│   ├── competitors/      # Add/delete competitors
│   ├── products/         # Brand products from Shopify
│   ├── comparison/       # Side-by-side product matches
│   ├── price-history/    # Price trend charts per competitor
│   ├── analytics/        # Recharts analytics dashboard
│   ├── alerts/           # Alert config + history
│   ├── logs/             # Scrape/sync/matching session logs
│   ├── settings/         # General + Matching Profile tabs
│   ├── products/[id]/    # Product detail page
│   └── competitors/[id]/ # Competitor report page
├── components/ui/        # shadcn components (auto-generated)
├── components/dashboard/ # Custom: MetricCard, StatusBadge, etc.
├── components/layout/    # Sidebar, header, mobile nav
├── lib/supabase/         # Client + server helpers
├── lib/api/              # backend.ts — typed FastAPI client
├── lib/utils/            # config.ts, price.ts, export.ts
└── lib/types/            # database.ts (TypeScript interfaces)

Key Frontend Patterns

  • Server components by default: "use client" only for forms, interactivity, hooks
  • Data fetching: Server components use createClient() from @/lib/supabase/server; client components use @/lib/supabase/client
  • Every page has: loading state, empty state, error handling (try-catch with error UI cards + retry buttons)
  • Multi-tenant: All Supabase queries automatically scoped by RLS
  • Backend calls: lib/api/backend.ts provides typed helpers (startScrape(), startMatch(), getProfile(), etc.)

Backend Stack

Technology | Purpose
FastAPI | Async API framework with JWT auth
Pydantic | Settings validation, request/response models
supabase-py | Database access (per-request client with user JWT, service client for background)
RapidFuzz | Fuzzy string matching
OpenAI | GPT-4o-mini for AI matching, text-embedding-3-small for vector embeddings
BeautifulSoup | HTML parsing for product extraction
cryptography | Fernet encryption for tenant credentials
Stripe | Subscription billing (Checkout, Portal, webhooks)

Backend Directory Layout

backend/
├── app/
│   ├── config.py         # Pydantic BaseSettings (env vars, get_secret())
│   ├── dependencies.py   # Auth: get_current_user, get_supabase_client, get_tenant_id
│   ├── models.py         # Request/response Pydantic schemas
│   ├── crypto.py         # Fernet encryption for tenant secrets
│   ├── middleware/
│   │   ├── audit.py      # Request ID + structured logging (AU-2, AU-3)
│   │   ├── rate_limit.py # Token bucket per IP (SC-5)
│   │   └── security_headers.py  # CSP, HSTS, etc. (SC-8)
│   ├── routers/
│   │   ├── health.py     # GET /api/health
│   │   ├── scrape.py     # POST /api/scrape + status polling
│   │   ├── sync.py       # POST /api/sync + status polling
│   │   ├── match.py      # POST /api/match (ai/fuzzy/hybrid/embed/validate)
│   │   ├── profile.py    # GET/PUT/DELETE /api/profile
│   │   ├── credentials.py # GET/PUT/DELETE /api/tenant/credentials
│   │   └── data_lifecycle.py # GET/DELETE /api/tenant/data
│   └── services/
│       ├── scrape_service.py  # Async wrapper for scraper.py
│       ├── sync_service.py    # Async wrapper for shopify_sync.py
│       ├── match_service.py   # Async wrapper for matching scripts
│       ├── alert_service.py   # Price change detection + alerts
│       ├── notification_service.py  # Slack + webhook delivery
│       ├── billing_service.py # Stripe subscription + plan limits
│       └── scheduler_service.py   # Background scrape scheduling
├── tests/                # ~1,357 pytest tests across 50 files
├── Dockerfile            # Python 3.12 slim, non-root user
└── requirements.txt      # Production dependencies

Root Python Scripts

These scripts predate the FastAPI backend and run standalone or are imported by backend services:

Script | Purpose
scraper.py | Multi-strategy competitor scraping (Shopify, Uline, WooCommerce, Magento, Wix, Ecwid, Playwright+AI)
ai_matcher.py | GPT-4o-mini semantic product matching
product_matcher.py | RapidFuzz fuzzy matching
shopify_sync.py | Shopify Admin API product sync
embedding_service.py | OpenAI text-embedding-3-small for pgvector
industry_profile.py | IndustryProfile dataclass + DB loader with caching
industry_templates.py | 8 pre-built industry profile templates
price_utils.py | PPU (price-per-unit) calculations

All entry points accept db=None, tenant_id=None for FastAPI injection while remaining backward-compatible for standalone use.
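
The optional-injection convention can be sketched as follows. This is a minimal illustration, not the scripts' actual code: run_matcher and _default_client are hypothetical names, and only the db=None, tenant_id=None signature comes from this page.

```python
from typing import Any, Optional


def _default_client() -> dict:
    """Hypothetical stand-in for the client a script builds when run on its own."""
    return {"kind": "standalone"}


def run_matcher(db: Optional[Any] = None, tenant_id: Optional[str] = None) -> dict:
    """Illustrative entry point following the db=None, tenant_id=None convention."""
    # Standalone use: nothing was injected, so fall back to a locally built client.
    client = db if db is not None else _default_client()
    # Injected use: the backend passes tenant_id so every query can be scoped to it.
    scope = tenant_id if tenant_id is not None else "standalone"
    return {"client": client, "scope": scope}
```

Called with no arguments the function behaves as a standalone script would; a backend service would pass both arguments explicitly.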

New Shared Components (Session 44)

Component | Purpose
confidence-bar.tsx | Reusable confidence percentage bar (extracted from comparison page)
competitor-avatar.tsx | Shows competitor favicon via Google's favicon service
competitor-link.tsx | Competitor name with avatar linking to /competitors/[id]
searchable-product-select.tsx | Product combobox using @base-ui/react/combobox for Edit/Create match dialogs

Scrape Progress (Session 44)

The RunTaskButton component now shows live progress while a scrape is running:

  • Polls progress_percent + progress_message from the scrape session
  • Displays a progress bar with percentage and descriptive message
  • Updates in real-time until the scrape completes or fails

Authentication & Multi-Tenancy

Auth Flow

  1. User signs up via Supabase Auth (email/password)
  2. DB trigger handle_new_user() auto-creates a tenants row + user_tenants mapping
  3. Frontend stores Supabase JWT in cookies
  4. Backend validates JWT via get_current_user() dependency
  5. RLS policies use get_user_tenant_id() SQL function to scope all queries

Request Flow (Backend)

Request → CORS → SecurityHeaders → RateLimit → Audit → Router

                                               get_current_user()
                                               (validates Supabase JWT)

                                               get_supabase_client()
                                               (per-request, RLS-scoped)

                                               get_tenant_id()
                                               (reads user_tenants)

                                               get_user_role() [optional]
                                               require_role() guard

                                               Background Task
                                               (service role client +
                                                explicit tenant_id)

Two Client Patterns

  1. Per-request client (get_supabase_client): Created with user's JWT token, RLS automatically scopes all queries to the tenant. Used for synchronous reads.
  2. Service client (get_service_client): Created with service role key, bypasses RLS. Used for background tasks where the user's JWT may expire. Callers must filter by tenant_id explicitly.
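
The difference between the two patterns can be modelled without a database. The sketch below is an assumption-laden toy: COMPETITORS, per_request_read, and service_read are hypothetical names, and the list stands in for a Supabase table. It only shows why the service path must demand a tenant_id.

```python
from typing import Dict, List

Row = Dict[str, str]

# Stand-in for a table; the real data lives in Supabase.
COMPETITORS: List[Row] = [
    {"tenant_id": "tenant-a", "name": "Alpha Packaging"},
    {"tenant_id": "tenant-b", "name": "Beta Supply"},
]


def per_request_read(rows: List[Row], jwt_tenant_id: str) -> List[Row]:
    """Pattern 1: RLS applies the tenant filter; the helper models that filter."""
    return [row for row in rows if row["tenant_id"] == jwt_tenant_id]


def service_read(rows: List[Row], tenant_id: str) -> List[Row]:
    """Pattern 2: nothing is filtered automatically, so tenant_id is mandatory."""
    if not tenant_id:
        # Refuse to run unscoped: a missing filter would expose every tenant's rows.
        raise ValueError("tenant_id is required when RLS is bypassed")
    return [row for row in rows if row["tenant_id"] == tenant_id]
```

Failing loudly on a missing tenant_id is one way to keep a forgotten filter in a background task from turning into a cross-tenant read.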

Role-Based Access Control

The require_role() dependency factory enforces endpoint-level authorization:

  • owner: Full control — manage team, settings, credentials, data lifecycle
  • admin: Invite/remove members (except owner), configure settings, run pipelines
  • member: View all data, run scrapes and matches, add competitors
  • viewer: Read-only — view dashboards, export data

The get_user_role() dependency reads the current user's role from user_tenants (scoped by RLS). Frontend components use GET /api/team/me to conditionally render admin-only UI.
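
A dependency factory of this kind might look like the sketch below. It is framework-free so it runs on its own, and it assumes the four roles form a strict hierarchy (viewer < member < admin < owner), which the list above suggests but does not state. ROLE_RANK and the guard signature are illustrative, not the backend's actual code.

```python
from typing import Callable

# Assumed ordering, least to most privileged.
ROLE_RANK = {"viewer": 0, "member": 1, "admin": 2, "owner": 3}


def require_role(minimum: str) -> Callable[[str], str]:
    """Return a guard that accepts any role at or above `minimum`."""
    if minimum not in ROLE_RANK:
        raise ValueError(f"unknown role: {minimum}")

    def guard(user_role: str) -> str:
        # Unknown roles rank below viewer, so they are always rejected.
        if ROLE_RANK.get(user_role, -1) < ROLE_RANK[minimum]:
            raise PermissionError(f"{minimum} role or higher required")
        return user_role

    return guard
```

In a FastAPI app the guard would typically take the role from another dependency and raise an HTTP 403 instead of PermissionError; that wiring is left out here.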

Invite Flow

Admin sends invite → team_invitations row created → Supabase invite email sent

Recipient clicks link → auth callback → exchangeCodeForSession

                                              handle_new_user() trigger fires
                                              (checks team_invitations for pending invite)

                                    ┌─── Found: joins existing tenant with invited role
                                    └─── Not found: creates new tenant as owner

Auth callback detects invited users (non-owner role) and skips the onboarding wizard.

Resilient Startup

The backend is designed to start even with missing config:

  1. config.py stores settings_error instead of crashing if env vars are missing
  2. main.py starts with only the health router if settings fail to load
  3. Health endpoint reports config_error and env_hint fields for debugging
  4. All operational routers only mount if settings loaded successfully
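
The four steps above can be sketched as follows. The env var names, LoadResult, and the router list are assumptions for illustration; only the settings_error idea and the health-only fallback come from this page.

```python
from dataclasses import dataclass
from typing import List, Mapping, Optional

# Assumed variable names; the real Settings class may require different ones.
REQUIRED_VARS = ("SUPABASE_URL", "SUPABASE_SERVICE_ROLE_KEY")


@dataclass
class LoadResult:
    settings: Optional[dict]
    settings_error: Optional[str]


def load_settings(env: Mapping[str, str]) -> LoadResult:
    """Record what is missing instead of raising at import time."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        return LoadResult(None, "missing env vars: " + ", ".join(missing))
    return LoadResult({name: env[name] for name in REQUIRED_VARS}, None)


def routers_to_mount(result: LoadResult) -> List[str]:
    """The health router always mounts; operational routers need valid settings."""
    routers = ["health"]
    if result.settings_error is None:
        routers += ["scrape", "sync", "match", "profile"]
    return routers
```

With this shape a misconfigured deployment still answers health checks, and the stored error string can be surfaced in the health response.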

Billing & Monetization

Status: Stripe billing integration is live as of 2026-03-17.

VantageDash uses Stripe for subscription management with a 3-tier pricing model:

Plan | Price | Competitors | Monthly Budgets | Key Features
Free | $0/mo | 2 | 5 scrapes, 0 AI, 0 embeddings | Manual scraping, fuzzy matching
Pro | $49/mo | 10 | 100 scrapes, 50 AI, 0 embeddings | AI matching, 24h auto-scrape, webhooks
Enterprise | $199/mo | Unlimited | Unlimited scrapes/AI, 100 embeddings | Vector embeddings, 1h auto-scrape, priority support

Billing Flow

User clicks "Upgrade" → POST /api/billing/checkout

                   Stripe Checkout session created
                   (with tenant_id in metadata)

                   User completes payment on Stripe

            Stripe sends checkout.session.completed webhook

                   POST /api/billing/webhook
                   (signature-verified, no JWT)

                   billing_service upserts subscription row
                   + logs billing_event

                   Plan limits enforced on next API call
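
The "signature-verified" step in the flow above follows Stripe's documented scheme: the Stripe-Signature header carries a timestamp (t) and one or more v1 signatures, each an HMAC-SHA256 of "timestamp.raw_body" keyed with the endpoint secret. This page does not say how the backend performs the check (the official stripe library offers a helper for it), so the code below is only a standard-library sketch of the scheme with hypothetical function names.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign(payload: bytes, secret: str, timestamp: int) -> str:
    """Compute the v1 signature for a raw request body."""
    signed_payload = f"{timestamp}.".encode() + payload
    return hmac.new(secret.encode(), signed_payload, hashlib.sha256).hexdigest()


def verify(payload: bytes, header: str, secret: str,
           tolerance: int = 300, now: Optional[float] = None) -> bool:
    """Check a Stripe-Signature style header against the raw body."""
    pairs = [item.strip().split("=", 1) for item in header.split(",") if "=" in item]
    timestamps = [value for key, value in pairs if key == "t"]
    signatures = [value for key, value in pairs if key == "v1"]
    if not timestamps or not signatures:
        return False
    try:
        timestamp = int(timestamps[0])
    except ValueError:
        return False
    # Reject stale events to limit replay attacks.
    current = time.time() if now is None else now
    if abs(current - timestamp) > tolerance:
        return False
    expected = sign(payload, secret, timestamp)
    # compare_digest avoids leaking the match position through timing.
    return any(hmac.compare_digest(expected, candidate) for candidate in signatures)
```

The check must run on the raw request bytes; re-serialising parsed JSON changes the body and makes a valid signature fail.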

Key implementation details:

  • Webhook-driven: All subscription state changes flow through Stripe webhooks, not client-side confirmation
  • Feature gating: billing_service.py checks plan limits before allowing competitor additions, AI matching, embedding generation, and auto-scrape scheduling
  • Stripe Customer Portal: Users manage payment methods, view invoices, and cancel/switch plans via Stripe's hosted portal
  • Database tables: subscriptions (plan state per tenant) and billing_events (webhook audit log), both RLS-enforced
  • Free by default: New tenants start on the Free plan with no Stripe interaction required
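
Feature gating against the pricing table can be sketched as a lookup. PLAN_LIMITS and is_allowed are hypothetical names, not the contents of billing_service.py, and the Free row is read here as 2 competitors and 5 scrapes per month.

```python
from typing import Dict, Optional

# None means unlimited. Values mirror the pricing table on this page.
PLAN_LIMITS: Dict[str, Dict[str, Optional[int]]] = {
    "free": {"competitors": 2, "scrapes": 5, "ai": 0, "embeddings": 0},
    "pro": {"competitors": 10, "scrapes": 100, "ai": 50, "embeddings": 0},
    "enterprise": {"competitors": None, "scrapes": None, "ai": None, "embeddings": 100},
}


def is_allowed(plan: str, resource: str, used: int) -> bool:
    """Return True if one more unit of `resource` fits within the plan."""
    limit = PLAN_LIMITS[plan][resource]
    return limit is None or used < limit
```

A limit of 0 blocks the feature outright, which is how the Free plan ends up without AI matching in this sketch.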

Design Theme

Warm Neon Aero — a warm glass + golden neon aesthetic with light/dark mode support:

Color Palette

Token | Light Mode | Dark Mode | Purpose
--background | #faf8f5 (warm cream) | #0d0b08 (warm black) | Page background
--primary | #b07a00 (deep gold) | #f0a500 (bright gold) | Buttons, active states, links
--aero-gold | #f0a500 | #f0a500 | Brand accent color
--aero-honey | #ffd166 | #ffd166 | Highlight / glow
--aero-amber | #e07c24 | #e07c24 | Secondary accent
--aero-bronze | #b8723b | #b8723b | Borders, shadows
--aero-ember | #c44d1a | #c44d1a | Warm emphasis
--aero-green | #69f0ae | #69f0ae | Success indicators
--aero-pink | #ff4081 | #ff4081 | Destructive / alert

Visual Effects

  • Glassmorphism cards: backdrop-filter: blur(20px) saturate(180%) with golden border glow on hover
  • Ambient gradient orbs: Warm golden radial gradients fixed to the body background
  • Light mode: Cream/ivory base with bronze-tinted glass cards and muted gold accents
  • Dark mode: Deep warm black base (#0d0b08) with golden neon edge glow on cards

Typography

Font | CSS Variable | Usage
Instrument Sans | --font-heading | Headings, nav links, labels, card titles, buttons
Space Grotesk | --font-space | Body text (set on <body>)
Geist Mono | --font-geist-mono | Code, SKUs, monospace data

Instrument Sans replaced Sora in session 44 for a cleaner, more professional feel. Logo gradient darkened to #c07800→#e09520. Landing page buttons use rounded-md (not rounded-lg).

Light/Dark Mode Toggle

Implemented via next-themes (ThemeProvider wraps the app in layout.tsx):

  • attribute="class" — toggles .dark class on <html>
  • defaultTheme="dark" — dark mode by default
  • enableSystem — respects OS preference
  • Toggle component in the dashboard header

Accent Color Customization

Tenants can set a brand color in Settings that overrides the default gold accent:

  • AccentColorProvider reads brandColor from tenant settings
  • theme.ts utilities (applyAccentColor, clearAccentColor) set CSS custom properties on <html>
  • Overrides --primary, --ring, --sidebar-primary, --chart-1, and the --color-aero-* tokens
  • Automatic WCAG contrast calculation for foreground text via getContrastForeground()
  • Reverts to default palette when the provider unmounts
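
The contrast step can be illustrated with the WCAG 2.x relative-luminance and contrast-ratio formulas. The real getContrastForeground() lives in theme.ts and its exact behaviour is not shown on this page; the Python sketch below assumes it picks whichever of black or white contrasts more with the accent color.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(a: str, b: str) -> float:
    """Ratio of the lighter to the darker luminance, each offset by 0.05."""
    lighter, darker = sorted((relative_luminance(a), relative_luminance(b)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def contrast_foreground(background: str) -> str:
    """Pick black or white text, whichever contrasts more with the background."""
    black, white = "#000000", "#ffffff"
    if contrast_ratio(background, black) >= contrast_ratio(background, white):
        return black
    return white
```

For the default gold (#f0a500) this picks black text, since gold is much closer to white in luminance than to black.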

New files: components/providers/theme-provider.tsx, components/providers/accent-color-provider.tsx, lib/utils/theme.ts

Observability (Session 37)

Sentry — Error Tracking + Performance

  • Frontend: @sentry/nextjs with client, server, edge configs. instrumentation.ts hooks for Next.js 16.
  • Backend: sentry-sdk[fastapi] initialized in main.py before FastAPI creation.
  • Traces sample rate: 10% in production, 100% in development.
  • Session replay: 10% sessions, 100% on error.

PostHog — Product Analytics

  • posthog-js with PostHogProvider in root layout (app/layout.tsx).
  • Auto page view tracking via usePathname().
  • Custom events: scrape_started, match_started, competitor_added.
  • Helper: lib/analytics.ts (trackEvent(), identifyUser()).

Env Vars Required

Variable | Where | Purpose
NEXT_PUBLIC_SENTRY_DSN | Vercel | Frontend Sentry
SENTRY_AUTH_TOKEN | Vercel (build) | Source map uploads
NEXT_PUBLIC_POSTHOG_KEY | Vercel | PostHog analytics
NEXT_PUBLIC_POSTHOG_HOST | Vercel | PostHog host (default: us.i.posthog.com)
SENTRY_DSN | Railway | Backend Sentry

Super-Admin Dashboard (Session 37)

Architecture

  • super_admins table: user_id UUID PK; RLS enabled with no policies (service_role only).
  • is_super_admin: true in user app_metadata; checked by frontend layout.
  • Backend: require_super_admin() in dependencies_admin.py validates against the super_admins table via service_role.
  • Router: routers/admin.py — 6 endpoints under /api/admin, all require super_admin.

Admin Endpoints

Endpoint | Returns
GET /api/admin/stats | Total tenants, active tenants, users, scrape sessions, success rate, MRR
GET /api/admin/tenants?page=N | Paginated tenant list with competitor/product counts
GET /api/admin/tenants/{id} | Tenant drill-down: members, competitors, recent scrapes
GET /api/admin/activity | Cross-tenant activity feed (last 50 sessions)
GET /api/admin/errors | Failed sessions with error messages
GET /api/admin/billing | Subscribers by plan, MRR calculation

Frontend Routes

Route | Page
/admin | Platform KPIs, billing overview, errors, activity feed
/admin/tenants | Paginated tenant table with search and drill-down
/admin/tenants/[id] | Tenant detail: members, competitors, scrapes, subscription
/admin/activity | Cross-tenant scrape/match/sync activity feed

Access Control

  • Header user dropdown shows "Admin Dashboard" link only when isSuperAdmin prop is true.
  • Dashboard layout checks user.app_metadata.is_super_admin.
  • Backend enforces via super_admins table lookup on every admin endpoint.

Code Quality & Shared Utilities (Session 39)

Python Linting

All Python code is linted and formatted with ruff (configured in pyproject.toml at repo root).

  • Rules: E/F/W (pycodestyle/pyflakes), I (isort), UP (pyupgrade), B (bugbear), SIM (simplify), RUF
  • Pre-commit: .husky/pre-commit runs ruff check and ruff format --check on staged .py files
  • Dev dependency: ruff>=0.8.0 in backend/requirements-dev.txt

Shared Backend Utilities

Module | Purpose
backend/app/utils/sessions.py | update_session_status(): single helper replacing 23+ duplicated session-update try/except blocks across services and scripts
backend/app/constants.py | Centralized magic numbers: timeouts, pagination limits, matcher thresholds, AI params, webhook config

Shared Frontend Utilities

Module | Purpose
frontend/src/lib/utils/tenant.ts | getCurrentTenantId(supabase): replaces 8 duplicated user_tenants fetch patterns
frontend/src/lib/utils/date.ts | timeAgo(), getWeekKey(): shared date formatting
frontend/src/lib/constants.ts | POLLING, CONFIDENCE_TIERS, CHART_COLORS: app-wide constants

Analytics Chart Components

The analytics page is split into focused components in frontend/src/components/dashboard/analytics/:

Component | Renders
CompetitorPriceDiffChart | Horizontal bar chart of avg price difference by competitor
PriceCompetitivenessChart | Donut chart of win/loss price distribution
CompetitorTrendChart | Line chart of price diff trends over time
ConfidenceChart | Bar chart of match confidence distribution
CategoryBreakdown | Pie chart of matches by product category
BiggestPriceGaps | List of products with largest price differences

Contributor Onboarding

  • CONTRIBUTING.md — Dev setup, testing, code style, branch naming, PR checklist
  • backend/README.md — FastAPI structure, auth flow, how to add endpoints, test patterns
  • README.md — Updated to reflect current Next.js + FastAPI stack (was outdated Streamlit-era)

Error Handling Policy

All production code follows: never silently swallow exceptions. Every except block either:

  1. Re-raises the exception, or
  2. Logs a logger.warning() with context before continuing
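
Both allowed shapes of an except block are shown in the sketch below. The functions and the logger name are hypothetical and exist only to illustrate the policy.

```python
import logging
from typing import Optional

logger = logging.getLogger("vantagedash.example")  # assumed logger name


def parse_price(raw: str) -> float:
    """Option 1: re-raise, adding context about what failed."""
    try:
        return float(raw.replace("$", "").replace(",", ""))
    except ValueError as exc:
        raise ValueError(f"unparseable price: {raw!r}") from exc


def parse_price_or_none(raw: str, sku: str) -> Optional[float]:
    """Option 2: log a warning with context, then continue with a fallback."""
    try:
        return parse_price(raw)
    except ValueError as exc:
        logger.warning("skipping price for sku %s: %s", sku, exc)
        return None
```

The pattern the policy rules out is a bare `except Exception: pass`, which hides the failure from both the caller and the logs.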

Recent Changes (Sessions 38–44)

Last updated: 2026-03-20

Session 38 — Quality Fixes

  • Fixed 7 vitest test errors across frontend test suite
  • Billing URL config: FRONTEND_URL env var added to backend Settings for Stripe checkout redirect URLs
  • Admin error boundary: added error handling to super-admin pages to prevent white-screen crashes
  • Logs page type safety: fixed TypeScript type errors in scrape/sync/matching session log displays

Session 39 — Code Maintainability Overhaul

  • Ruff linting fully integrated: pyproject.toml at repo root configures rules (E/F/W/I/UP/B/SIM/RUF), .husky/pre-commit runs ruff check + ruff format --check on staged .py files
  • Shared backend utilities: backend/app/utils/sessions.py (update_session_status() replacing 23+ duplicated blocks), backend/app/constants.py (centralized magic numbers)
  • Shared frontend utilities: lib/utils/tenant.ts (getCurrentTenantId()), lib/utils/date.ts (timeAgo(), getWeekKey()), lib/constants.ts (POLLING, CONFIDENCE_TIERS, CHART_COLORS)
  • Error handling policy: all except blocks now either re-raise or log logger.warning() with context — no more silent swallows
  • Contributor docs: CONTRIBUTING.md, backend/README.md, updated root README.md

Session 40 — Monitoring Activation

  • Sentry activated: Env vars set in both Vercel (NEXT_PUBLIC_SENTRY_DSN, SENTRY_AUTH_TOKEN, SENTRY_ORG, SENTRY_PROJECT) and Railway (SENTRY_DSN). Error tracking + performance monitoring live in production.
  • PostHog activated: NEXT_PUBLIC_POSTHOG_KEY set in Vercel. Product analytics, session replay, and custom event tracking active.
  • 8 new Shopify competitors scraped: Dragon Chewer, Green Tech Packaging, Marijuana Packaging, CannaSup Co, Grove Bags, Sana Packaging, Flush Packaging, 420 Science. Total: 17 competitors, 7,648 products.

Session 41 — CI & Stripe Finalization

  • GitHub Actions set to manual-only: All 4 workflows (ci.yml, backend CI, Playwright, security) changed to workflow_dispatch only. No more auto-triggers on push/PR (free minutes exhausted). Re-enable by restoring on: push/pull_request triggers.
  • Stripe setup wizard completed: Tax collection enabled (automatic mode, SaaS tax code txcd_10000000, Maryland registered, NAICS 541512). Checkout portal configured.

Session 42 — Scraper Resilience & Marketing Page

Scraper Retry Logic

  • Per-competitor retry with exponential backoff: Each competitor scrape now retries up to 3 times on failure with backoff delays, preventing a single timeout from killing an entire batch scrape
  • Session-level finally block: scrape_sessions status always gets updated (completed/failed) even on unexpected exceptions
  • Single-competitor session tracking: POST /api/scrape/{competitor_id} now creates its own scrape session (previously only batch scrapes had sessions)
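
Retry with exponential backoff can be sketched as below. It reads "up to 3 times" as three attempts in total, and the 1-second base delay is an assumption; retry_with_backoff is a hypothetical helper, not the scraper's actual code.

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def retry_with_backoff(task: Callable[[], T], attempts: int = 3,
                       base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `task`, doubling the wait after each failure (1s, 2s, 4s, ...)."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception as exc:
            if attempt == attempts:
                raise  # out of attempts: let the caller mark the session failed
            delay = base_delay * 2 ** (attempt - 1)
            logger.warning("attempt %d/%d failed (%s); retrying in %.1fs",
                           attempt, attempts, exc, delay)
            sleep(delay)
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper testable without real waiting. In a batch scrape each competitor would get its own call, so one failing site does not abort the rest.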

Frontend Route Restructuring

  • Public landing/marketing page: New (marketing) route group with a public page at / — the root URL now shows a marketing/landing page instead of redirecting to login
  • Dashboard moved to /overview: The main dashboard page moved from / to /overview within the (dashboard) route group
  • Updated directory layout:
frontend/src/app/
├── (marketing)/       # Public pages (landing page at /)
├── (auth)/            # Login, signup, callback
├── (dashboard)/       # All dashboard pages (requires auth)
│   ├── overview/      # Main dashboard (was previously at /)
│   ├── competitors/
│   ├── products/
│   └── ...
└── (admin)/           # Super-admin pages

Session 43 — Warm Neon Aero Redesign

Visual Redesign

  • Color palette overhaul: Replaced cyan/purple Alienware palette with warm gold/amber/bronze ("Warm Neon Aero"). New CSS custom properties: --aero-gold (#f0a500), --aero-honey (#ffd166), --aero-amber (#e07c24), --aero-bronze (#b8723b), --aero-ember (#c44d1a).
  • Heading font: Sora replaced Exo 2 (--font-heading variable). Warmer geometric sans-serif that pairs better with the gold accent palette.
  • Light/dark mode: Added next-themes ThemeProvider in root layout. Light mode uses warm cream base (#faf8f5), dark mode uses warm black (#0d0b08). Both modes have matching glassmorphism card effects and ambient gradient orbs in gold/bronze tones.
  • Landing page bento grid: Marketing page at / redesigned with a 2+4 bento grid layout (2 large feature cards + 4 compact cards), "How it Works" numbered steps, and pricing cards. All using the warm neon aero palette.

Accent Color Customization

  • AccentColorProvider (components/providers/accent-color-provider.tsx): Reads tenant's brand color from settings and injects it as CSS custom properties via applyAccentColor().
  • theme.ts (lib/utils/theme.ts): Utilities for hex-to-RGB conversion, WCAG contrast calculation, color lightening/darkening, and applying/clearing accent color overrides on <html>.
  • Overridden tokens: --primary, --primary-foreground, --ring, --sidebar-primary, --chart-1, --color-aero-gold/honey/bronze.

Product Variant Expansion

  • Products page now shows expandable variant sub-rows. Products with multiple variants display a chevron toggle and "N variants" badge.
  • Clicking a product row expands inline sub-rows showing each variant's title, SKU, price, pack quantity, and calculated PPU.
  • Variant sub-rows use extractPackQuantity() from price.ts to parse pack sizes from variant titles and compute per-unit pricing.
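
Pack-size parsing and per-unit pricing can be sketched as below. The real extractPackQuantity() is a TypeScript helper in price.ts whose patterns are not shown on this page; the regular expressions and function names here are assumptions chosen for illustration.

```python
import re

# Hypothetical patterns: "100 Pack", "12-pack", "50 ct", "Case of 1,000".
_PACK_PATTERNS = (
    re.compile(r"(\d[\d,]*)\s*(?:-\s*)?(?:pack|pk|count|ct|pcs|pieces)\b", re.IGNORECASE),
    re.compile(r"\b(?:pack|case|box)\s+of\s+(\d[\d,]*)", re.IGNORECASE),
)


def extract_pack_quantity(title: str) -> int:
    """Return the pack size found in a variant title, or 1 if none is found."""
    for pattern in _PACK_PATTERNS:
        match = pattern.search(title)
        if match:
            quantity = int(match.group(1).replace(",", ""))
            return max(quantity, 1)  # guard against "0 pack" style noise
    return 1


def price_per_unit(price: float, title: str) -> float:
    """Divide the variant price by its parsed pack size."""
    return round(price / extract_pack_quantity(title), 4)
```

Defaulting to a quantity of 1 means a variant with no recognisable pack size keeps its listed price as its PPU.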

Updated Test Counts

Suite | Count
vitest (frontend) | 651 tests (53 files)
pytest (backend) | 1,427+ tests (50+ files)
Playwright (e2e) | ~170 tests (8 files)
Total | ~2,240+

Recent Updates (Sessions 38–42)

Session 38 — Quality Fixes

  • Fixed 7 vitest test errors (type mismatches, missing mocks)
  • Billing URL configuration fix (backend URL from env vars)
  • Admin page error boundary added
  • Logs page type safety improvements

Session 39 — Code Maintainability Overhaul

  • Ruff linting: Configured in pyproject.toml, enforced via pre-commit hook
  • Shared utilities: backend/app/utils/sessions.py, backend/app/constants.py, frontend/src/lib/utils/tenant.ts, frontend/src/lib/utils/date.ts, frontend/src/lib/constants.ts
  • Error handling: All production except Exception: pass replaced with logger.warning()
  • Type hints: Return types on all 9 critical Python entry points
  • Component architecture: Analytics page split into 6 extracted chart components
  • Contributor docs: CONTRIBUTING.md, backend/README.md, updated root README.md

Session 40 — Monitoring & Scraping

  • Sentry activated: Env vars set in Vercel (NEXT_PUBLIC_SENTRY_DSN, SENTRY_AUTH_TOKEN) and Railway (SENTRY_DSN)
  • PostHog activated: NEXT_PUBLIC_POSTHOG_KEY set in Vercel — product analytics + session replay live
  • 8 new Shopify competitors scraped (17 total, 7,648 products)

Session 41 — CI & Stripe

  • GitHub Actions: All 4 workflows set to workflow_dispatch only (manual trigger, no failure emails)
  • Stripe setup wizard: Completed — Stripe Tax enabled, checkout portal configured, SaaS tax code applied

Session 42 — Scraper Resilience, Landing Page, Wiki Update

  • Scraper resilience: Per-competitor retry with backoff (1 retry, 3s delay), session-level finally block guarantees sessions are never stuck on "running", single-competitor scrapes now create session rows for progress tracking
  • Landing/marketing page: New (marketing) route group with public landing page at /. Dashboard overview moved from / to /overview. Hero, features grid, pricing cards, CTA sections. Middleware updated to allow / without auth.
  • Route changes: All sidebar/nav links updated from / to /overview. Auth callbacks redirect to /overview.
  • Test count: 651 vitest + 1,389+ pytest + ~170 Playwright = ~2,210+ total (zero failures)

Session 43 — Warm Neon Aero Redesign

  • Color palette: Warm gold/amber/bronze replacing cyan/purple Alienware palette. New --aero-gold/honey/amber/bronze/ember CSS custom properties.
  • Typography: Sora font replacing Exo 2 for headings (--font-heading). Later replaced by Instrument Sans in session 44.
  • Light/dark mode: next-themes ThemeProvider added to root layout. Light mode cream base, dark mode warm black.
  • Accent color customization: AccentColorProvider + theme.ts allow per-tenant brand color overrides with WCAG contrast calculation.
  • Product variants: Expandable variant sub-rows on Products page with pack quantity and PPU per variant.
  • Landing page redesign: Bento grid layout (2 large + 4 compact feature cards), numbered "How it Works" steps, warm palette pricing cards.
  • New files: theme-provider.tsx, accent-color-provider.tsx, theme.ts

Session 44 — UI Polish, New Pages, Playwright Scraping

  • Typography: Heading font changed from Sora to Instrument Sans for a cleaner, more professional look.
  • Logo: Gradient darkened to #c07800→#e09520.
  • Landing page: Buttons changed from rounded-lg to rounded-md.
  • New pages: /products/[id] (product detail page) and /competitors/[id] (competitor report page).
  • New shared components: confidence-bar.tsx (extracted), competitor-avatar.tsx (Google favicon service), competitor-link.tsx, searchable-product-select.tsx (@base-ui/react/combobox).
  • Edit/Create match dialogs: Now use searchable product combobox instead of plain select.
  • CompetitorAvatar: Displays favicons from competitor URLs via Google's favicon service.
  • Scrape progress bar: RunTaskButton shows live progress_percent + progress_message during scraping.
  • Scraper fallback chain expanded to 7 steps: Shopify → Uline → WooCommerce API → WooCommerce Sitemap → HTML Listing → Playwright (JS) → Firecrawl+AI.
  • Playwright headless browser: New step 6 in the fallback chain for JS-rendered sites (Squarespace+Ecwid, React SPAs). Uses Playwright with Chromium to render pages and extract product data.
  • Design & Customize: Identified as Squarespace+Ecwid (not WooCommerce) — requires Playwright to scrape prices.

Session 46 — SEO, Blog, Ecwid Scraper

SEO

  • metadataBase: Set to https://vantagedash.io in root layout — all relative OG URLs resolve correctly.
  • Open Graph + Twitter cards: Title, description, and image on all pages.
  • JSON-LD structured data: Organization + SoftwareApplication schemas in root layout.
  • Dynamic OG image: opengraph-image.tsx generates a branded 1200x630 image at build time.
  • sitemap.ts: Auto-generates sitemap with all static pages + blog posts.
  • robots.ts: Allows all crawlers, points to sitemap URL.

Blog (/blog)

  • Route under (marketing) layout (standalone nav + footer, no sidebar).
  • Markdown posts stored in frontend/content/blog/ with YAML frontmatter (title, date, tags, excerpt).
  • Blog index page (/blog) shows post cards with excerpts, reading time, and tags.
  • Individual post pages (/blog/[slug]) render markdown with Tailwind typography (@tailwindcss/typography).
  • CTA footer on each post drives signup conversions.
  • Blog link added to marketing navigation bar.
  • 3 initial posts: competitor price tracking, AI vs fuzzy matching, PPU comparison.
  • New files: lib/blog/index.ts (markdown loading, frontmatter parsing), content/blog/*.md.

Ecwid Scraper

  • New scrape_ecwid_store() function in scraper.py.
  • Auto-detects Ecwid store ID from page HTML (regex patterns for ecwid.com or Ecwid. references).
  • Uses Ecwid public REST API (app.ecwid.com/api/v3/{store_id}/products) — no auth token needed.
  • Pagination support (100 products per page).
  • Falls back to Playwright DOM scraping if API returns errors.
  • Fallback chain updated to 8 steps: Shopify → Uline → WooCommerce API → WooCommerce Sitemap → HTML Listing → Playwright (JS) → Ecwid API → Firecrawl+AI.

Session 47 — Firecrawl Eliminated, Ecwid Fixed, Blog Expansion, Email Drip

Firecrawl Eliminated

The paid Firecrawl dependency has been completely removed:

  • Removed firecrawl-py from backend/requirements.txt, replaced with beautifulsoup4.
  • Deprecated scrape_non_shopify_store() — emits DeprecationWarning, no longer in fallback chain.
  • New scrape_with_playwright_and_ai() — uses free Playwright rendering + OpenAI text extraction. Cost: ~$0.003/site (vs $0.30 with Firecrawl).

Ecwid Scraper Fixed

  • _detect_ecwid_store_id() now returns (store_id, token) tuple.
  • fetch_ecwid_products() tries 3 token strategies: extracted token, public_{store_id}, no-token fallback.
  • XHR interception: Playwright captures Ecwid API responses during browser rendering.

Enhanced Playwright Scraper

  • XHR interception captures product data from JSON API responses.
  • Scroll-to-load: Auto-scrolls 3x viewport height to trigger lazy loading.
  • More CSS selectors: WooCommerce, Ecwid v2+, BigCommerce, generic.
  • _extract_product_from_card() DRY helper and _parse_price_text() handles price ranges.
  • Wait time increased from 3s to 5s.

BeautifulSoup HTML Scraper

scrape_product_listing_html() gains a new Strategy 4 that uses schema.org microdata and CSS class patterns.

Updated Fallback Chain

  1. Shopify /products.json (FREE)
  2. Uline dedicated scraper (FREE)
  3. WooCommerce Store API (FREE)
  4. Generic Sitemap + JSON-LD (FREE)
  5. HTML Listing + BeautifulSoup (FREE, enhanced)
  6. Ecwid API with token auth (FREE, fixed)
  7. Playwright JS renderer (FREE, enhanced)
  8. Playwright + OpenAI extraction (~$0.003/site, replaces Firecrawl)
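
A chain like the one above is commonly implemented as an ordered list of strategies that stops at the first non-empty result. The sketch below shows that pattern under stated assumptions: the function name, the `(name, scraper)` tuple shape, and the blanket exception handling are all illustrative rather than taken from the codebase.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Scraper = Callable[[str], List[dict]]


def scrape_with_fallbacks(
    url: str,
    chain: Sequence[Tuple[str, Scraper]],
) -> Tuple[Optional[str], List[dict]]:
    """Run scrapers in order and return (strategy_name, products).

    Cheaper strategies go first, so the paid step only runs when every
    free strategy has returned nothing or failed.
    """
    for name, scraper in chain:
        try:
            products = scraper(url)
        except Exception:
            continue  # a failing strategy should not abort the chain
        if products:
            return name, products
    return None, []
```

Returning the winning strategy name alongside the products makes it easy to log which step succeeded for each store.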

Blog Expansion

7 new SEO-targeted posts (10 total): competitive pricing, packaging industry, Shopify analysis, price alerts, supplements, web scraping legal, product matching algorithms.

Email Drip System

  • email_drip_log table + pg_cron + pg_net extensions enabled.
  • Edge Function send-drip-email deployed (Resend free tier).
  • 4-email sequence: Day 0 welcome, Day 1 add competitor, Day 3 landscape, Day 7 upgrade.
  • pg_cron runs daily at 2:07pm UTC. Set RESEND_API_KEY in Supabase secrets to activate.
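
The Day 0/1/3/7 schedule can be expressed as a simple lookup. The real sender is a Supabase Edge Function triggered by pg_cron; the Python below only illustrates the scheduling rule, and the step names and the `already_sent` set (standing in for rows in email_drip_log) are hypothetical.

```python
from datetime import date
from typing import Optional, Set

# Days since signup mapped to the email step (step names are illustrative).
DRIP_SCHEDULE = {0: "welcome", 1: "add_competitor", 3: "landscape", 7: "upgrade"}


def email_due(signup: date, today: date, already_sent: Set[str]) -> Optional[str]:
    """Return the drip step due today, or None if nothing should be sent."""
    step = DRIP_SCHEDULE.get((today - signup).days)
    if step is None or step in already_sent:
        return None
    return step
```

Checking the log before sending keeps a re-run of the daily job from delivering the same email twice.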

Tests: 1,417 pytest + 651 vitest = 2,068+ total, zero failures.

Session 48 — RLS Bug Fix, Blog Public, Scraper Platform Expansion

Critical Bug Fix: Single-Competitor Scrape RLS Bypass

POST /api/scrape/{competitor_id} was saving 0 products because RLS was rejecting its writes. The single-competitor code path did not inject the auth-scoped DB client into scrape_and_save_store(), so it fell through to the anon key via _ensure_supabase(), and RLS silently blocked all product_tracking inserts. The batch scraper (POST /api/scrape) worked because run_scraper receives the DB client directly.

Fix: Inject the service-role DB client in run_scrape_single + add _ensure_supabase safety net in scrape_and_save_store.

Blog Made Public

/blog and /blog/* routes were behind auth middleware, breaking SEO crawling. Added to the public paths list in frontend/src/middleware.ts.

Scraper Platform Improvements

  • Magento: Added _extract_product_from_microdata() fallback using BeautifulSoup for Magento pages that use schema.org microdata instead of JSON-LD. Sitemap URL quality check now validates microdata alongside JSON-LD.
  • Wix: Added Wix-specific DOM selectors (data-hook, ProductItem) for Playwright. Added Wix Stores API interception in XHR handler.
  • Ecwid: Expanded shop_paths with /shop-all, /all-products, /all for Ecwid stores.
  • RSS/WP Feed: Accept all URLs from WordPress product feeds (not just /product/ paths).
  • DRY refactoring: Extracted _xhr_items_to_products() and _extract_price_from_xhr_item() helpers.

Scrape Logging for Single-Competitor Scrapes

Single-competitor scrapes (POST /api/scrape/{id}) now write scrape_logs entries for debugging, matching the behavior of batch scrapes.

Email Template Assets

Added stacked windows SVG at /email/stacked-windows.svg for use in drip email templates, because email clients strip CSS positioning.

RLS Policy Hardening

Replaced tautology RLS policies (USING true) in supabase_enable_rls.sql migration script with real get_user_tenant_id() enforcement matching the live Supabase state. All 20 public tables confirmed to have proper tenant isolation.

Marketing Claims Updated

  • "NIST 800-53 compliance" changed to "NIST 800-53 aligned" (no third-party audit conducted)
  • "Firecrawl+AI" feature badge replaced with "Ecwid" (Firecrawl eliminated in session 47)

Session 50 — Scraper Accuracy Testing & Fixes (2026-03-21)

Intensive scraper testing against all 17 live competitor sites with Chrome visual verification. Fixed 6 critical data quality bugs.

Bugs Fixed

| Bug | Affected Store | Fix |
| --- | --- | --- |
| Sitemap sampling grabbed category pages first | ClearBags (Magento) | Smart sampling prefers .html URLs and SKU-like slugs over category pages |
| <dialog> "Product modal" extracted as product name | ClearBags | Remove <dialog> elements before BS4 parsing; blocklist UI text |
| Playwright extracted "Quick View" overlay text | Mylar Legends (Wix) | Added [data-hook='product-item-name'] and [data-hook='product-item-price-to-pay'] as priority selectors |
| All $0 prices on Wix stores | Mylar Legends | Added handlers for convertedPriceData, formattedPrice, variant nested price formats |
| Uline keyword substring false positives | Uline | Word-boundary regex matching ("tin" no longer matches "counting", "cart" no longer matches "carton") |
| Duplicate products across scrape sessions | All stores | Delete existing competitor products before inserting fresh snapshot (each scrape is a full replacement) |
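
The word-boundary fix for keyword matching can be illustrated with Python's `re` module. This is a minimal sketch of the technique, with a hypothetical function name; it is not the Uline scraper's actual code.

```python
import re
from typing import Iterable, List


def matching_keywords(title: str, keywords: Iterable[str]) -> List[str]:
    """Return keywords that appear in `title` as whole words.

    `\\b` anchors each keyword at word boundaries, so "tin" matches
    "Tin Container" but not "Counting Scale".
    """
    hits: List[str] = []
    for keyword in keywords:
        pattern = rf"\b{re.escape(keyword)}\b"
        if re.search(pattern, title, flags=re.IGNORECASE):
            hits.append(keyword)
    return hits
```

One trade-off of strict boundaries is that plural forms such as "tins" no longer match "tin", so plurals would need to be listed as separate keywords.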

Additional Improvements

  • WooCommerce price_range fallback for variable products (range-priced items)
  • Googlebot UA fallback when Chrome UA gets 403
  • Sitemap product cap raised from 300 to 500, with .html URLs prioritized
  • Magento-specific CSS selectors added to both BeautifulSoup and Playwright
  • [data-hook='product-item-root'] Wix DOM selector added
  • 10 new scraper tests (175 total scraper tests, 183 scrape-related total)
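
The sitemap prioritization and cap could look something like the sketch below. The ranking tiers follow the description above (.html URLs first, then SKU-like slugs, then everything else), but the `SKU_LIKE` pattern and the function name are hypothetical heuristics chosen for illustration.

```python
import re
from typing import List
from urllib.parse import urlparse

# Hypothetical heuristic: letters followed by at least two digits, e.g. "sku-1234".
SKU_LIKE = re.compile(r"[a-z]+-?\d{2,}", re.IGNORECASE)


def prioritize_sitemap_urls(urls: List[str], cap: int = 500) -> List[str]:
    """Order sitemap URLs so likely product pages come first, then apply the cap."""

    def rank(url: str) -> int:
        path = urlparse(url).path
        slug = path.rstrip("/").rsplit("/", 1)[-1]
        if path.endswith(".html"):
            return 0  # most likely a product page
        if SKU_LIKE.search(slug):
            return 1  # slug looks like a SKU
        return 2  # probably a category or content page

    # sorted() is stable, so the original order is kept within each tier.
    return sorted(urls, key=rank)[:cap]
```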

Verified Results

  • ClearBags: 0 products → 500 products with real names and prices (verified in Chrome: exact match)
  • Mylar Legends: 21 products at $0.00 → 10 products with real prices ($0.11-$3.00)
  • Biohazard Inc: 6,898 rows (3,440 unique) → ~3,440 on next scrape via dedup fix

Known Limitations

  • Cannaline: sgcaptcha bot protection blocks ALL HTTP requests (even Googlebot). Requires captcha-solving or headful browser approach.
  • Design & Customize: Ecwid API returns 403; prices only visible on individual product detail pages (not listings). 16/77 product names extracted, 0 prices.

Session 49 — Operational Completion (2026-03-20)

  • Resend API key activated: Email drip system now live — RESEND_API_KEY set in Supabase Edge Function secrets. 4-email onboarding sequence (welcome → add competitor → features → upgrade) runs daily at 2:07pm UTC via pg_cron, sent from onboarding@vantagedash.io.
  • 3 WooCommerce competitors re-scraped: Mylar Legends (21 products), ClearBags (13 products), Design & Customize (4 products) — all scraped successfully with enhanced Playwright + BeautifulSoup scrapers (previously failed due to exhausted Firecrawl credits).
  • All operational items complete: Product is feature-complete with all infrastructure active. Remaining items: go-to-market (customer acquisition) and Shopify App Store (deferred).
  • SVG favicon: Orange gradient circle with white lightning bolt (icon.svg), replaces old favicon.ico. Matches the .vd-logo used across nav/sidebar/auth pages.
  • Monthly usage budget enforcement: PLAN_LIMITS now includes monthly_scrapes, monthly_ai_matches, monthly_embeddings. check_monthly_limit() in billing_service.py checks against session tables within billing period. Scrape router returns 429 at limit, match router returns 403 for disallowed methods + 429 at limit. Frontend "Monthly Budget" card shows usage bars.
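
The monthly budget check can be sketched as below. The plan names, limit values, and function names are hypothetical and chosen only for illustration; the actual logic lives in check_monthly_limit() in billing_service.py and may differ. The only behaviour taken from the notes above is that a scrape request at the limit is answered with 429.

```python
from typing import Dict, Optional

# Hypothetical plans and numbers; None means "no limit" in this sketch.
EXAMPLE_PLAN_LIMITS: Dict[str, Dict[str, Optional[int]]] = {
    "starter": {
        "monthly_scrapes": 100,
        "monthly_ai_matches": 50,
        "monthly_embeddings": 1000,
    },
    "unlimited": {
        "monthly_scrapes": None,
        "monthly_ai_matches": None,
        "monthly_embeddings": None,
    },
}


def within_monthly_limit(plan: str, metric: str, used_this_period: int) -> bool:
    """Return True while usage in the current billing period is below the limit."""
    limit = EXAMPLE_PLAN_LIMITS[plan][metric]
    return limit is None or used_this_period < limit


def scrape_status(plan: str, used_this_period: int) -> int:
    """Return the HTTP status a scrape request would get: 200 or 429 at the limit."""
    if within_monthly_limit(plan, "monthly_scrapes", used_this_period):
        return 200
    return 429
```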

Session 51 — Stealth Playwright, Pagination, Enrichment, New Competitors (2026-03-21)

  • Session 52 (2026-03-21): Railway→Coolify doc cleanup complete (10 files), SKS Bottle scraping fixed (BS4 Strategy C + sitemap listing fallback), 13 new tests (200 scraper total, 2,278+ overall)

Features Added

| Feature | Impact |
| --- | --- |
| playwright-stealth | Bypasses bot detection (Cannaline sgcaptcha solved: 0 → 365 products) |
| Pagination | Follows next-page links up to 15 pages (Cannaline: 29 → 365) |
| Detail page enrichment | Visits product URLs for prices when listings hide them (D&C: 0 → 14 prices) |
| Shop page preference | Tries /shop when root has <10 products (D&C: 4 → 14 products) |
| WooCommerce detail selectors | wait_for_selector for async-loaded variation prices |
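
Following next-page links with a page cap can be sketched as a bounded loop. This is an illustration under stated assumptions: `scrape_page` is a hypothetical callable that returns the products on one page plus the next-page URL (or None), and the visited-URL guard is an addition not mentioned in the notes above.

```python
from typing import Callable, List, Optional, Set, Tuple

MAX_PAGES = 15  # the feature list states a 15-page limit

PageScraper = Callable[[str], Tuple[List[dict], Optional[str]]]


def scrape_paginated(
    start_url: str,
    scrape_page: PageScraper,
    max_pages: int = MAX_PAGES,
) -> List[dict]:
    """Follow next-page links from `start_url`, visiting at most `max_pages` pages."""
    products: List[dict] = []
    seen: Set[str] = set()
    url: Optional[str] = start_url
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)  # guards against pagination loops
        items, url = scrape_page(url)
        products.extend(items)
    return products
```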

5 New Competitors (22 Total, 7 Platforms)

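| Competitor | Platform | Products | Coverage |
| --- | --- | --- | --- |
| PackFreshUSA | BigCommerce | 104 | 100% |
| 420 Stock | WooCommerce | 482 | 100% |
| SKS Bottle & Packaging | Custom PHP | 4 | 100% |
| CannaZip | WooCommerce | 84 | 88% |
| Specialty Bottle | BigCommerce | 500 | ~100% |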

Previously "Impossible" Stores Now Working

  • Cannaline: 365 products, 98.4% with prices (sgcaptcha bypassed by stealth)
  • Design & Customize: 14 products, 100% with prices (detail enrichment + shop preference)

Deployment Change

  • Backend migrated from Railway to Coolify (Docker-based, same Dockerfile)
  • api.vantagedash.io now points to Coolify instance
  • Dockerfile is platform-agnostic (curl added for Coolify healthcheck)

Test Counts

  • 200 scraper tests (175 + 12 new in Session 51 + 13 new in Session 52), 2,278+ total tests, zero failures
