FreightSense

AI-powered shipment delay intervention engine. Deterministic risk scoring · LLM reasoning · Human-override audit

GenAI Feb 27, 2026 6 min read
Groq LLM FastAPI Docker GCP

What is FreightSense?#

FreightSense is a two-layer AI decision system that helps operations teams decide exactly what to do when a shipment is at risk of arriving late. Instead of manually reviewing every delayed order, it:

  1. Scores the risk deterministically using historical benchmarks (delay magnitude, financial exposure, category/market late-delivery rates).
  2. Reasons with an LLM (LLaMA 3.3 70B via Groq) to classify the recommended intervention and estimate cost savings.
  3. Audits every decision — including human overrides — in a persistent log, so nothing is a black box.

The result: a one-click decision dashboard where analysts see the risk, the AI's recommendation, the reasoning, and any past overrides — all in one place.


Architecture#

┌─────────────────────────────────────────┐
│ Browser UI │
│ (Single-page, vanilla JS + CSS) │
└──────────────────┬──────────────────────┘
│ REST / JSON
┌──────────────────▼──────────────────────┐
│ FastAPI (main.py) │
│ POST /api/evaluate │
│ POST /api/evaluate/{id}/override │
│ GET /api/audit │
│ GET /api/audit/{id}/overrides │
│ GET /api/meta │
└───────────┬──────────────┬─────────────┘
│ │
┌───────────────────────▼──┐ ┌──────▼──────────────────┐
│ Layer 1 — Deterministic│ │ Layer 2 — LLM │
│ deterministic.py │ │ llm_evaluator.py │
│ │ │ │
│ • Delay days │───▶│ • Groq llama-3.3-70b │
│ • Financial exposure │ │ • Structured JSON prompt │
│ • Risk score (0–100) │ │ • Confidence score │
│ • Guardrail flags │ │ • Cost-saving estimate │
│ • Benchmark lookup │ │ • Free-text reasoning │
└──────────────────────────┘ └──────────────────────────┘
┌───────────▼──────────────┐
│ aiosqlite Database │
│ evaluations + overrides │
└──────────────────────────┘

Decision matrix#

Risk score   Recommendation   Trigger condition
≥ 75         EXPEDITE         High delay + high exposure
50–74        DISCOUNT         Moderate delay, retention risk
25–49        MONITOR          Low delay, watch required
< 25         NO_ACTION        Within acceptable variance

When Layer 1 and the LLM disagree, the UI surfaces a disagreement badge and shows both recommendations side-by-side so the human can make the final call.
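The thresholds above translate directly into a small pure function. This is an illustrative sketch of the matrix, not the actual `deterministic.py` implementation:

```python
def layer1_recommendation(risk_score: float) -> str:
    """Map a 0-100 risk score onto the intervention matrix."""
    if risk_score >= 75:
        return "EXPEDITE"    # high delay + high exposure
    if risk_score >= 50:
        return "DISCOUNT"    # moderate delay, retention risk
    if risk_score >= 25:
        return "MONITOR"     # low delay, watch required
    return "NO_ACTION"       # within acceptable variance

# e.g. layer1_recommendation(71.3) → "DISCOUNT"
```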


Quick Start#

Prerequisites#

  • Python 3.12+ and uv
  • A Groq API key (for the Layer 2 LLM)
  • Docker and a GCP project (only needed for Cloud Run deployment)

1 — Clone & install#

git clone https://github.com/your-org/freightsense.git
cd freightsense
uv sync # creates .venv and installs all dependencies

2 — Configure#

cp .env.example .env # then edit .env
.env
GROQ_API_KEY=gsk_...
DATABASE_URL=./freightsense.db # SQLite path (or /tmp/freightsense.db in Cloud Run)
GROQ_MODEL=llama-3.3-70b-versatile

3 — Run#

uv run uvicorn main:app --reload
# → http://localhost:8000

Or open the interactive notebook UI:

uv run marimo edit main.py

The API docs are at http://localhost:8000/docs and the operational dashboard at http://localhost:8000.


API Reference#

POST /api/evaluate#

Submit a shipment for risk scoring and LLM intervention recommendation.

// Request
{
  "order_id": "ORD-00123",           // optional — auto-generated if blank
  "customer_segment": "Corporate",   // Consumer | Corporate | Home Office
  "market": "USCA",                  // USCA | Europe | LATAM | Pacific Asia | Africa
  "category_name": "Electronics",
  "shipping_mode": "Standard Class",
  "days_scheduled": 5,
  "days_actual_estimate": 9,
  "order_item_total": 1200.00,
  "profit_ratio": 0.18
}

// Response — 201 Created
{
  "evaluation_id": 42,
  "delay_days": 4.0,
  "risk_score": 71.3,
  "financial_exposure": 249.48,
  "confidence_tier": "HIGH",
  "layer1_recommendation": "DISCOUNT",
  "layer1_intervention_cost": 120.0,
  "llm_recommendation": "DISCOUNT",
  "confidence_score": 0.87,
  "reasoning": "A 4-day delay on a Corporate Electronics order in USCA …",
  "estimated_cost_saving": 180.0,
  "guardrail_flags": [],
  "llm_available": true,
  "layers_disagree": false
}
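A minimal client sketch using only the standard library. The `BASE_URL`, the default field values, and a locally running instance are assumptions, not part of the API contract:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumes a local dev instance

def build_payload(days_scheduled: int, days_actual_estimate: int,
                  order_item_total: float, profit_ratio: float,
                  **extra: str) -> dict:
    """Assemble a request body for POST /api/evaluate (sample defaults)."""
    payload = {
        "customer_segment": "Corporate",
        "market": "USCA",
        "category_name": "Electronics",
        "shipping_mode": "Standard Class",
        "days_scheduled": days_scheduled,
        "days_actual_estimate": days_actual_estimate,
        "order_item_total": order_item_total,
        "profit_ratio": profit_ratio,
    }
    payload.update(extra)  # e.g. order_id="ORD-00123"
    return payload

def evaluate(payload: dict) -> dict:
    """POST the shipment to /api/evaluate and return the parsed response."""
    req = urllib.request.Request(
        f"{BASE_URL}/api/evaluate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Usage (against a running server):
# result = evaluate(build_payload(5, 9, 1200.00, 0.18))
# result["risk_score"], result["llm_recommendation"], result["reasoning"]
```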

POST /api/evaluate/{id}/override#

Record a human override decision against any evaluation (supports multiple revisions).

{
  "override_decision": "ACCEPT",   // ACCEPT | REJECT | CUSTOM
  "override_reason": "Customer is a strategic account — expedite instead.",
  "outcome_notes": ""
}

GET /api/audit#

Paginated audit log of all evaluations with their latest override status.

GET /api/audit?skip=0&limit=50

GET /api/audit/{id}/overrides#

Full override history for a single evaluation (shows every revision in order).

GET /api/meta#

Returns available categories and markets for populating UI dropdowns.


Benchmarks#

FreightSense ships with pre-computed benchmark statistics derived from ~180 k real supply-chain records (data/DataCoSupplyChainDataset.csv). These are compiled once into data/benchmarks.json and loaded at startup — no database query at inference time.

# Regenerate benchmarks after updating the source CSV
uv run python scripts/build_benchmarks.py

Each benchmark group (category × market) stores:

  • Average scheduled days
  • Average delay days
  • Late delivery rate
  • Average profit ratio
  • Sample size
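The aggregation behind those fields can be sketched in plain Python. The row shape and column names here are illustrative, not the actual `DataCoSupplyChainDataset.csv` schema or the logic of `scripts/build_benchmarks.py`:

```python
import json
from collections import defaultdict

# Illustrative rows; the real input is data/DataCoSupplyChainDataset.csv
# and its column names may differ.
rows = [
    {"category": "Electronics", "market": "USCA",
     "scheduled": 5, "actual": 9, "profit_ratio": 0.18},
    {"category": "Electronics", "market": "USCA",
     "scheduled": 4, "actual": 4, "profit_ratio": 0.22},
]

# Group by (category, market) — one benchmark entry per group.
groups = defaultdict(list)
for r in rows:
    groups[(r["category"], r["market"])].append(r)

benchmarks = {}
for (category, market), rs in groups.items():
    n = len(rs)
    delays = [max(r["actual"] - r["scheduled"], 0) for r in rs]
    benchmarks[f"{category}|{market}"] = {
        "avg_scheduled_days": sum(r["scheduled"] for r in rs) / n,
        "avg_delay_days": sum(delays) / n,
        "late_delivery_rate": sum(d > 0 for d in delays) / n,
        "avg_profit_ratio": sum(r["profit_ratio"] for r in rs) / n,
        "sample_size": n,
    }

print(json.dumps(benchmarks, indent=2))
```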

Deployment — Google Cloud Run#

FreightSense is designed for Cloud Run with a single warm instance (minScale=1) so the SQLite audit log stays hot between requests.
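In `service.yaml` terms, the warm instance corresponds to the standard Knative `minScale` annotation. A fragment (the service name is illustrative):

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: freightsense
spec:
  template:
    metadata:
      annotations:
        # Keep one instance warm so the SQLite audit log stays hot
        autoscaling.knative.dev/minScale: "1"
```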

One-time GCP setup#

# Set your GitHub details and run
GITHUB_ORG=your-org GITHUB_REPO=freightsense bash scripts/setup_gcp.sh

This script:

  1. Enables all required GCP APIs
  2. Creates an Artifact Registry Docker repo
  3. Creates a Service Account for GitHub Actions
  4. Configures Workload Identity Federation (no long-lived keys)
  5. Stores GROQ_API_KEY in Secret Manager
  6. Prints the three GitHub Variables to set

GitHub Variables to add#

After running the setup script, add these under Settings → Secrets and variables → Actions → Variables in your repo:

Variable                         Value
GCP_PROJECT_ID                   your GCP project ID
GCP_WORKLOAD_IDENTITY_PROVIDER   printed by setup script
GCP_SERVICE_ACCOUNT              printed by setup script

Continuous deployment#

Every push to main automatically runs:

  1. Checkout → Auth (Workload Identity) → Build image → Push to Artifact Registry
  2. gcloud run services replace service.yaml (image tagged with $GITHUB_SHA)
  3. Cloud Run deploys the new revision and drains the old one

Manual deploys are also available via the Actions → Run workflow button in the GitHub UI.


Project Structure#

freightsense/
├── main.py # FastAPI app entry point
├── service.yaml # Cloud Run service spec
├── Dockerfile
├── pyproject.toml
├── app/
│ ├── api/
│ │ ├── routes.py # All 5 endpoints
│ │ └── schemas.py # Pydantic I/O models
│ ├── core/
│ │ ├── benchmarks.py # Benchmark store (in-memory dict)
│ │ ├── deterministic.py # Layer 1 — risk scoring engine
│ │ ├── llm_evaluator.py # Layer 2 — Groq LLM integration
│ │ └── config.py # Settings (pydantic-settings)
│ ├── db/
│ │ ├── database.py # aiosqlite init + connection
│ │ └── models.py # Async CRUD helpers
│ └── static/
│ ├── index.html # Single-page operations dashboard
│ └── styles.css
├── data/
│ ├── benchmarks.json # Pre-computed benchmark stats (~180 k records)
│ └── DataCoSupplyChainDataset.csv
├── scripts/
│ ├── build_benchmarks.py # Regenerate benchmarks.json
│ ├── setup_gcp.sh # One-time GCP resource bootstrap
│ └── test_groq.py # Smoke-test Groq connectivity
└── .github/
└── workflows/
└── deploy.yml # GitHub Actions CI/CD

Tech Stack#

Layer             Technology
API framework     FastAPI 0.115
LLM inference     Groq API — LLaMA 3.3 70B Versatile
Async database    aiosqlite 0.22 + raw SQL
Data processing   pandas 2.2
Config            pydantic-settings
Runtime           Python 3.12, uvicorn
Container         Docker (python:3.12-slim, non-root)
CI/CD             GitHub Actions
Cloud             Google Cloud Run + Artifact Registry + Secret Manager