# FreightSense

AI-powered shipment delay intervention engine.
*Deterministic risk scoring · LLM reasoning · Human-override audit*
## What is FreightSense?
FreightSense is a two-layer AI decision system that helps operations teams decide exactly what to do when a shipment is at risk of arriving late. Instead of manually reviewing every delayed order, it:
- Scores the risk deterministically using historical benchmarks (delay magnitude, financial exposure, category/market late-delivery rates).
- Reasons with an LLM (LLaMA 3.3 70B via Groq) to classify the recommended intervention and estimate cost savings.
- Audits every decision — including human overrides — in a persistent log, so nothing is a black box.
The result: a one-click decision dashboard where analysts see the risk, the AI's recommendation, the reasoning, and any past overrides — all in one place.
## Architecture
```
┌─────────────────────────────────────────┐
│               Browser UI                │
│     (Single-page, vanilla JS + CSS)     │
└──────────────────┬──────────────────────┘
                   │ REST / JSON
┌──────────────────▼──────────────────────┐
│           FastAPI (main.py)             │
│   POST /api/evaluate                    │
│   POST /api/evaluate/{id}/override      │
│   GET  /api/audit                       │
│   GET  /api/audit/{id}/overrides        │
│   GET  /api/meta                        │
└───────────┬─────────────────┬───────────┘
            │                 │
┌───────────▼──────────────┐ ┌▼─────────────────────────┐
│ Layer 1 — Deterministic  │ │ Layer 2 — LLM            │
│ deterministic.py         │ │ llm_evaluator.py         │
│                          │ │                          │
│ • Delay days             │▶│ • Groq llama-3.3-70b     │
│ • Financial exposure     │ │ • Structured JSON prompt │
│ • Risk score (0–100)     │ │ • Confidence score       │
│ • Guardrail flags        │ │ • Cost-saving estimate   │
│ • Benchmark lookup       │ │ • Free-text reasoning    │
└───────────┬──────────────┘ └──────────────────────────┘
            │
┌───────────▼──────────────┐
│   aiosqlite Database     │
│  evaluations + overrides │
└──────────────────────────┘
```

## Decision matrix
| Risk Score | Recommendation | Trigger condition |
|---|---|---|
| ≥ 75 | EXPEDITE | High delay + high exposure |
| 50–74 | DISCOUNT | Moderate delay, retention risk |
| 25–49 | MONITOR | Low delay, watch required |
| < 25 | NO_ACTION | Within acceptable variance |
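The matrix maps directly onto a small threshold function. A minimal sketch (the authoritative thresholds live in `deterministic.py`):

```python
def layer1_recommendation(risk_score: float) -> str:
    """Map a deterministic risk score (0-100) to an intervention,
    following the decision matrix above."""
    if risk_score >= 75:
        return "EXPEDITE"
    if risk_score >= 50:
        return "DISCOUNT"
    if risk_score >= 25:
        return "MONITOR"
    return "NO_ACTION"

print(layer1_recommendation(71.3))  # → DISCOUNT
```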
When Layer 1 and the LLM disagree, the UI surfaces a disagreement badge and shows both recommendations side-by-side so the human can make the final call.
## Quick Start
### Prerequisites
- Python 3.12+
- uv (recommended) or pip
- A Groq API key
### 1 — Clone & install
```bash
git clone https://github.com/your-org/freightsense.git
cd freightsense
uv sync   # creates .venv and installs all dependencies
```

### 2 — Configure
```bash
cp .env.example .env   # then edit .env
```

```ini
GROQ_API_KEY=gsk_...
DATABASE_URL=./freightsense.db   # SQLite path (or /tmp/freightsense.db in Cloud Run)
GROQ_MODEL=llama-3.3-70b-versatile
```

### 3 — Run
```bash
uv run uvicorn main:app --reload
# → http://localhost:8000
```

Or open the interactive notebook UI:
```bash
uv run marimo edit main.py
```

The API docs are at http://localhost:8000/docs and the operational dashboard at http://localhost:8000.
## API Reference
### POST /api/evaluate
Submit a shipment for risk scoring and LLM intervention recommendation.
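As a quick illustration, the endpoint can be called with nothing but the Python standard library (a sketch — the URL assumes a local dev server, and the commented-out lines are what actually send the request):

```python
import json
from urllib import request

API_URL = "http://localhost:8000/api/evaluate"  # assumes a local `uv run uvicorn` instance

payload = {
    "customer_segment": "Corporate",
    "market": "USCA",
    "category_name": "Electronics",
    "shipping_mode": "Standard Class",
    "days_scheduled": 5,
    "days_actual_estimate": 9,
    "order_item_total": 1200.00,
    "profit_ratio": 0.18,
}

req = request.Request(
    API_URL,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Uncomment to send against a running server:
# with request.urlopen(req) as resp:
#     result = json.load(resp)
#     print(result["risk_score"], result["llm_recommendation"])
```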
```jsonc
// Request
{
  "order_id": "ORD-00123",           // optional — auto-generated if blank
  "customer_segment": "Corporate",   // Consumer | Corporate | Home Office
  "market": "USCA",                  // USCA | Europe | LATAM | Pacific Asia | Africa
  "category_name": "Electronics",
  "shipping_mode": "Standard Class",
  "days_scheduled": 5,
  "days_actual_estimate": 9,
  "order_item_total": 1200.00,
  "profit_ratio": 0.18
}
```

```jsonc
// Response — 201 Created
{
  "evaluation_id": 42,
  "delay_days": 4.0,
  "risk_score": 71.3,
  "financial_exposure": 249.48,
  "confidence_tier": "HIGH",
  "layer1_recommendation": "DISCOUNT",
  "layer1_intervention_cost": 120.0,
  "llm_recommendation": "DISCOUNT",
  "confidence_score": 0.87,
  "reasoning": "A 4-day delay on a Corporate Electronics order in USCA …",
  "estimated_cost_saving": 180.0,
  "guardrail_flags": [],
  "llm_available": true,
  "layers_disagree": false
}
```

### POST /api/evaluate/{id}/override
Record a human override decision against any evaluation (supports multiple revisions).
```jsonc
{
  "override_decision": "ACCEPT",   // ACCEPT | REJECT | CUSTOM
  "override_reason": "Customer is a strategic account — expedite instead.",
  "outcome_notes": ""
}
```

### GET /api/audit
Paginated audit log of all evaluations with their latest override status.
```
GET /api/audit?skip=0&limit=50
```

### GET /api/audit/{id}/overrides
Full override history for a single evaluation (shows every revision in order).
### GET /api/meta
Returns available categories and markets for populating UI dropdowns.
## Benchmarks
FreightSense ships with pre-computed benchmark statistics derived from ~180 k real supply-chain records (`data/DataCoSupplyChainDataset.csv`). These are compiled once into `data/benchmarks.json` and loaded at startup — no database query at inference time.
```bash
# Regenerate benchmarks after updating the source CSV
uv run python scripts/build_benchmarks.py
```

Each benchmark group (category × market) stores:
- Average scheduled days
- Average delay days
- Late delivery rate
- Average profit ratio
- Sample size
## Deployment — Google Cloud Run
FreightSense is designed for Cloud Run with a single warm instance (`minScale=1`) so the SQLite audit log stays hot between requests.
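In a Knative/Cloud Run service spec, that warm instance is pinned with the `minScale` autoscaling annotation. A sketch (the repo's `service.yaml` has the actual values):

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: freightsense
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "1"  # keep one instance warm for the SQLite log
```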
### One-time GCP setup
```bash
# Set your GitHub details and run
GITHUB_ORG=your-org GITHUB_REPO=freightsense bash scripts/setup_gcp.sh
```

This script:
- Enables all required GCP APIs
- Creates an Artifact Registry Docker repo
- Creates a Service Account for GitHub Actions
- Configures Workload Identity Federation (no long-lived keys)
- Stores `GROQ_API_KEY` in Secret Manager
- Prints the three GitHub Variables to set
### GitHub Variables to add
After running the setup script, add these under Settings → Secrets and variables → Actions → Variables in your repo:
| Variable | Value |
|---|---|
| `GCP_PROJECT_ID` | your GCP project ID |
| `GCP_WORKLOAD_IDENTITY_PROVIDER` | printed by setup script |
| `GCP_SERVICE_ACCOUNT` | printed by setup script |
### Continuous deployment
Every push to main automatically:
```
push to main
     │
     ▼
Checkout → Auth (Workload Identity) → Build image → Push to Artifact Registry
     │
     ▼
gcloud run services replace service.yaml   (tagged with $GITHUB_SHA)
     │
     ▼
Cloud Run deploys new revision, old revision drains
```

Manual deploys are also available via the Actions → Run workflow button in the GitHub UI.
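The manual deploy button comes from GitHub Actions' `workflow_dispatch` trigger; the workflow's trigger block likely resembles this sketch (the real `deploy.yml` in the repo is authoritative):

```yaml
# .github/workflows/deploy.yml — trigger section (illustrative)
on:
  push:
    branches: [main]    # every push to main deploys automatically
  workflow_dispatch:    # adds the "Run workflow" button in the Actions tab
```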
## Project Structure
```
freightsense/
├── main.py                    # FastAPI app entry point
├── service.yaml               # Cloud Run service spec
├── Dockerfile
├── pyproject.toml
│
├── app/
│   ├── api/
│   │   ├── routes.py          # All 5 endpoints
│   │   └── schemas.py         # Pydantic I/O models
│   ├── core/
│   │   ├── benchmarks.py      # Benchmark store (in-memory dict)
│   │   ├── deterministic.py   # Layer 1 — risk scoring engine
│   │   ├── llm_evaluator.py   # Layer 2 — Groq LLM integration
│   │   └── config.py          # Settings (pydantic-settings)
│   ├── db/
│   │   ├── database.py        # aiosqlite init + connection
│   │   └── models.py          # Async CRUD helpers
│   └── static/
│       ├── index.html         # Single-page operations dashboard
│       └── styles.css
│
├── data/
│   ├── benchmarks.json        # Pre-computed benchmark stats (~180 k records)
│   └── DataCoSupplyChainDataset.csv
│
├── scripts/
│   ├── build_benchmarks.py    # Regenerate benchmarks.json
│   ├── setup_gcp.sh           # One-time GCP resource bootstrap
│   └── test_groq.py           # Smoke-test Groq connectivity
│
└── .github/
    └── workflows/
        └── deploy.yml         # GitHub Actions CI/CD
```

## Tech Stack
| Layer | Technology |
|---|---|
| API framework | FastAPI 0.115 |
| LLM inference | Groq API — LLaMA 3.3 70B Versatile |
| Async database | aiosqlite 0.22 + raw SQL |
| Data processing | pandas 2.2 |
| Config | pydantic-settings |
| Runtime | Python 3.12, uvicorn |
| Container | Docker (python:3.12-slim, non-root) |
| CI/CD | GitHub Actions |
| Cloud | Google Cloud Run + Artifact Registry + Secret Manager |