System Architecture

Technical design, data flow, and infrastructure overview

Design Principles

RayRay is built around four core principles that support compliance, auditability, and operational efficiency:

Spatial Evidence Traceability — Every AI extraction maintains coordinate-based links to source text via bounding boxes
Mandatory Human-in-the-Loop — LangGraph checkpoints enforce human review before any database write
Immutable Audit Logs — Blockchain-style hash chains provide tamper-evident event history
Labor Optimization Metrics — Automated time-savings calculation for agency ROI reporting

System Overview

┌─────────────────────────────────────────────────────────────────────┐
│                         RAYRAY PLATFORM                              │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  ┌──────────────┐     ┌──────────────┐     ┌──────────────┐        │
│  │   Frontend   │────▶│   Backend    │────▶│  PostgreSQL  │        │
│  │  Next.js 14  │     │   FastAPI    │     │   Database   │        │
│  │  React 18    │◀────│  Python 3.11 │◀────│   (Async)    │        │
│  └──────────────┘     └──────────────┘     └──────────────┘        │
│         │                    │                    │                 │
│         │                    ▼                    │                 │
│         │            ┌──────────────┐            │                 │
│         │            │  LangGraph   │            │                 │
│         │            │  Workflow    │            │                 │
│         │            └──────────────┘            │                 │
│         │                    │                    │                 │
│         │                    ▼                    │                 │
│         │            ┌──────────────┐            │                 │
│         │            │  LLM APIs    │            │                 │
│         │            │ Anthropic/   │            │                 │
│         │            │ OpenAI       │            │                 │
│         │            └──────────────┘            │                 │
│         │                                         │                 │
│         └────────────▶┌──────────────┐◀──────────┘                 │
│                        │  Audit Log   │                             │
│                        │  (Append-Only│                             │
│                        │   SHA-256)   │                             │
│                        └──────────────┘                             │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

Extraction Workflow

The extraction pipeline enforces compliance with OMB Memorandum M-25-21 through a multi-stage workflow with a mandatory human checkpoint before any commit:

Stage 1: UPLOAD
    └── PDF received → Stored with metadata → Audit logged

Stage 2: PARSE
    └── PyMuPDF extracts text blocks with bounding boxes
    └── Each block: {id, text, x0, y0, x1, y1, page_number}

Stage 3: EXTRACT (AI-Assisted)
    └── LLM processes full document text
    └── Returns structured extractions with confidence scores
    └── Each extraction linked to source text blocks

Stage 4: VALIDATE (Redundancy Check)
    └── Confidence threshold check (< 60% = flagged)
    └── Duplicate detection (similarity > 80%)
    └── Required field validation
    └── Source coverage check

Stage 5: CHECKPOINT (Human Review) ◀── M-25-21 MANDATORY
    └── Workflow pauses
    └── Human reviews flagged and unflagged items
    └── Actions: APPROVE / MODIFY / REJECT

Stage 6: COMMIT
    └── Only after human approval
    └── Original values preserved if modified
    └── Audit log updated with reviewer info
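
The Stage 4 checks can be sketched in a few lines of Python. This is a minimal illustration, not the platform's actual validator: the function name validate_extractions, the flag names, the required-field list, and the source_block_ids key are assumptions, and difflib's SequenceMatcher stands in for whatever similarity measure the pipeline really uses. Only the 60% and 80% thresholds come from the stage description above.

```python
from difflib import SequenceMatcher
from typing import Dict, List

CONFIDENCE_THRESHOLD = 0.60  # Stage 4: below this, the item is flagged
SIMILARITY_THRESHOLD = 0.80  # Stage 4: above this, treat as a likely duplicate
REQUIRED_FIELDS = ("label", "value")  # hypothetical required fields


def validate_extractions(extractions: List[Dict]) -> List[Dict]:
    """Return one report per extraction listing the checks it failed."""
    reports = []
    for index, item in enumerate(extractions):
        flags = []
        if item.get("confidence", 0.0) < CONFIDENCE_THRESHOLD:
            flags.append("low_confidence")
        if any(not item.get(field) for field in REQUIRED_FIELDS):
            flags.append("missing_required_field")
        if not item.get("source_block_ids"):
            flags.append("no_source_coverage")
        # Compare against earlier items only, so the first occurrence stays clean.
        value = str(item.get("value") or "")
        for earlier in extractions[:index]:
            other = str(earlier.get("value") or "")
            if value and SequenceMatcher(None, value, other).ratio() > SIMILARITY_THRESHOLD:
                flags.append("possible_duplicate")
                break
        reports.append({"index": index, "flags": flags})
    return reports
```

Flags only annotate items for the reviewer. Nothing is dropped at this stage, because Stage 5 requires a human to see flagged and unflagged items alike.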

Component Architecture

Frontend Stack

Component	Technology	Purpose
Framework	Next.js 14 (App Router)	SSR, routing, API routes
UI Library	React 18	Component model
Styling	Tailwind CSS	Utility-first CSS
PDF Viewer	react-pdf-viewer	Document display with overlays
State	React Context	Auth state, theme

Backend Stack

Component	Technology	Purpose
Framework	FastAPI	Async API server
ORM	SQLAlchemy 2.0	Async database access
Workflow	LangGraph	HITL checkpoint enforcement
PDF Processing	PyMuPDF	Text extraction with coordinates
Auth	python-jose + passlib	JWT + bcrypt
MFA	pyotp	TOTP (RFC 6238)
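
To make the PyMuPDF row concrete, the sketch below converts the tuples returned by PyMuPDF's page.get_text("blocks") into the block shape listed in Stage 2. The function names and the id scheme are hypothetical, and the real parser may use a different extraction mode; only the output keys are taken from this document.

```python
from typing import Dict, Iterable, List, Tuple

# One entry from page.get_text("blocks"):
# (x0, y0, x1, y1, text, block_no, block_type), where block_type 0 means text.
RawBlock = Tuple[float, float, float, float, str, int, int]


def to_text_blocks(raw_blocks: Iterable[RawBlock], page_number: int) -> List[Dict]:
    """Keep non-empty text blocks and attach the page number to each."""
    blocks = []
    for x0, y0, x1, y1, text, block_no, block_type in raw_blocks:
        if block_type != 0 or not text.strip():
            continue  # skip image blocks and whitespace-only text
        blocks.append({
            "id": f"p{page_number}-b{block_no}",  # hypothetical id scheme
            "text": text.strip(),
            "x0": x0, "y0": y0, "x1": x1, "y1": y1,
            "page_number": page_number,
        })
    return blocks


def parse_pdf(path: str) -> List[Dict]:
    """Extract text blocks with bounding boxes from every page of a PDF."""
    import fitz  # PyMuPDF, imported lazily so to_text_blocks has no dependency

    blocks: List[Dict] = []
    with fitz.open(path) as doc:
        for index, page in enumerate(doc):
            blocks.extend(to_text_blocks(page.get_text("blocks"), index + 1))
    return blocks
```

Keeping the tuple conversion separate from file access makes the mapping easy to unit-test without a PDF on disk.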

Data Model

Core Entities

Document
├── id: int (PK)
├── filename: str
├── file_path: str
├── status: enum [pending, processing, reviewed, committed]
├── uploaded_by: int (FK → User)
└── created_at: datetime

Page
├── id: int (PK)
├── document_id: int (FK)
├── page_number: int
└── dimensions: {width, height}

TextBlock
├── id: int (PK)
├── page_id: int (FK)
├── text: str
├── x0, y0, x1, y1: float  ◀── Bounding box coordinates
└── confidence: float

Extraction
├── id: int (PK)
├── document_id: int (FK)
├── label: str
├── value: str
├── original_value: str      ◀── AI output (preserved)
├── confidence: float
├── review_status: enum [pending, approved, rejected]
├── reviewed_by: int (FK)    ◀── M-25-21: Human verifier
├── reviewed_at: datetime
├── workflow_run_id: str     ◀── Traceability
└── workflow_checkpoint: str

ExtractionSource (link table)
├── extraction_id: int (FK)
├── text_block_id: int (FK)  ◀── Spatial evidence link
└── relevance_score: float

AuditLog
├── _id: uuid
├── _timestamp: datetime
├── _prev_hash: str          ◀── Chain integrity
├── _hash: str
├── event_type: str
├── user_id: int
├── resource_type: str
├── resource_id: int
├── outcome: enum [success, failure]
└── details: json
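
The _prev_hash and _hash fields are what make the AuditLog tamper-evident. The sketch below shows one way such a chain could be built and verified using SHA-256 over a canonical JSON serialization. The helper names, the all-zero genesis value, and the serialization format are assumptions made for illustration; the document does not specify how RayRay computes its hashes.

```python
import hashlib
import json
import uuid
from datetime import datetime, timezone
from typing import Dict, List, Optional

GENESIS_HASH = "0" * 64  # assumed _prev_hash for the first entry


def _digest(entry: Dict) -> str:
    """SHA-256 over every field except _hash, serialized deterministically."""
    body = {key: value for key, value in entry.items() if key != "_hash"}
    payload = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_event(log: List[Dict], event_type: str, user_id: int,
                 resource_type: str, resource_id: int,
                 outcome: str = "success", details: Optional[Dict] = None) -> Dict:
    """Append an entry whose _prev_hash points at the current tail of the log."""
    entry = {
        "_id": str(uuid.uuid4()),
        "_timestamp": datetime.now(timezone.utc).isoformat(),
        "_prev_hash": log[-1]["_hash"] if log else GENESIS_HASH,
        "event_type": event_type,
        "user_id": user_id,
        "resource_type": resource_type,
        "resource_id": resource_id,
        "outcome": outcome,
        "details": details or {},
    }
    entry["_hash"] = _digest(entry)
    log.append(entry)
    return entry


def verify_chain(log: List[Dict]) -> bool:
    """Recompute every hash and check each link back to the genesis value."""
    prev = GENESIS_HASH
    for entry in log:
        if entry["_prev_hash"] != prev or entry["_hash"] != _digest(entry):
            return False
        prev = entry["_hash"]
    return True
```

Because each entry's hash covers the previous entry's hash, editing or deleting any past entry invalidates every entry after it. Detecting truncation of the most recent entries would additionally require anchoring the latest hash somewhere outside the log.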

Relationship Diagram

Document 1──────* Page 1──────* TextBlock
    1                              *
    │                              │
    └──────* Extraction *──────────┘
                  *         (via ExtractionSource)
                  │
                  1
               Reviewer (User)

User 1──────* AuditLog
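
Following these relationships from an Extraction back to page coordinates is what lets the viewer draw evidence overlays. A minimal sketch of that lookup, using plain dictionaries in place of ORM rows, is shown here. The function name and the dictionary-keyed inputs are assumptions; the field names follow the data model above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def highlight_boxes(extraction_id: int,
                    extraction_sources: List[Dict],
                    text_blocks: Dict[int, Dict],
                    pages: Dict[int, Dict]) -> Dict[int, List[BBox]]:
    """Map page_number to the bounding boxes that support one extraction.

    text_blocks and pages are keyed by primary key (hypothetical layout).
    """
    boxes: Dict[int, List[BBox]] = defaultdict(list)
    for link in extraction_sources:
        if link["extraction_id"] != extraction_id:
            continue
        # ExtractionSource -> TextBlock -> Page
        block = text_blocks[link["text_block_id"]]
        page_number = pages[block["page_id"]]["page_number"]
        boxes[page_number].append(
            (block["x0"], block["y0"], block["x1"], block["y1"])
        )
    return dict(boxes)
```

In a real deployment this would more likely be a single SQL join across the three tables; the loop form is used only to make each hop explicit.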

LangGraph Workflow Implementation

from typing import Dict, List, Optional, TypedDict

from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import END, StateGraph

# The node functions (extract_from_pdf, redundancy_check_node,
# human_review_node, commit_to_database) and the should_commit router
# are defined elsewhere in the backend.

# Define state
class ExtractionState(TypedDict):
    document_id: int
    extractions: List[Dict]
    review_status: str  # pending, approved, rejected
    reviewer_id: Optional[str]
    ...  # additional fields elided

# Build workflow
workflow = StateGraph(ExtractionState)
workflow.add_node("extract", extract_from_pdf)
workflow.add_node("redundancy_check", redundancy_check_node)
workflow.add_node("human_review", human_review_node)  # CHECKPOINT
workflow.add_node("commit", commit_to_database)

# Edges
workflow.set_entry_point("extract")  # the graph needs an entry point to compile
workflow.add_edge("extract", "redundancy_check")
workflow.add_edge("redundancy_check", "human_review")
workflow.add_conditional_edges("human_review", should_commit, {
    "commit": "commit",
    "reject": END,
    "pending": END,  # No decision yet: stop here with state checkpointed
})
workflow.add_edge("commit", END)

# Compile with checkpointing
checkpointer = MemorySaver()  # In-memory only; use PostgresSaver in production
app = workflow.compile(checkpointer=checkpointer)

Key Implementation Details

Checkpoint Enforcement: The workflow pauses at human_review and cannot proceed to commit until a reviewer decision is submitted through the review API.
State Persistence: MemorySaver keeps checkpoints in process memory only, so production deployments should use PostgresSaver for durable checkpoint storage.
Redundancy Checks: Pre-review validation flags low-confidence items before they reach the reviewer; the thresholds are configurable.
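
The workflow code refers to a should_commit router without showing it. A plausible minimal version, assuming the router reads only review_status and reviewer_id from the state, might look like the following. The requirement that an approval carry a reviewer id is an assumption chosen to match the human-in-the-loop principle, not a documented rule.

```python
from typing import Dict


def should_commit(state: Dict) -> str:
    """Route after human_review: returns 'commit', 'reject', or 'pending'."""
    status = state.get("review_status", "pending")
    # Assumption: an approval without an identified reviewer is treated as
    # still pending, so nothing is committed without a human on record.
    if status == "approved" and state.get("reviewer_id") is not None:
        return "commit"
    if status == "rejected":
        return "reject"
    return "pending"
```

Defaulting every unrecognized or missing status to "pending" means a malformed state can never fall through to the commit node.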

Frontend Route Structure

apps/web/src/app/
├── layout.tsx              # Root layout (fonts, providers)
├── (marketing)/            # Public pages — no auth required
│   ├── page.tsx            # Landing (/)
│   └── login/page.tsx      # Sign in (/login)
├── (app)/                  # Protected pages — auth required
│   ├── layout.tsx          # App layout with nav + footer
│   ├── dashboard/          # Main dashboard
│   ├── documents/          # Document list + detail
│   ├── compare/            # Document comparison
│   ├── settings/           # User settings + MFA
│   └── profile/            # User profile
└── docs/                   # Documentation (this site)

Security Architecture

Authentication Flow

1. User submits credentials (email + password)
2. Backend verifies against bcrypt hash
3. If MFA enabled:
   a. Return partial JWT with mfa=false
   b. User submits TOTP code
   c. Verify against stored secret
   d. Issue full JWT with mfa=true
4. JWT stored in httpOnly cookie + localStorage
5. Subsequent requests include Authorization header
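
Step 3 of the flow can be illustrated with a standard-library TOTP check. The backend described here uses pyotp for this; the version below only shows the RFC 6238 mechanism and the shape of the mfa claim. The function names and the sub claim are assumptions, and token signing is deliberately left out.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Dict, Optional


def totp(secret_b32: str, at: Optional[float] = None,
         digits: int = 6, step: int = 30) -> str:
    """Compute an RFC 6238 TOTP code (HMAC-SHA1) for the given Unix time."""
    key = base64.b32decode(secret_b32, casefold=True)
    counter = int((time.time() if at is None else at) // step)
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation, RFC 4226 section 5.3
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify_totp(secret_b32: str, submitted: str,
                at: Optional[float] = None) -> bool:
    """Constant-time comparison of the submitted code with the expected one."""
    return hmac.compare_digest(totp(secret_b32, at), submitted)


def claims_for(user_id: int, mfa_verified: bool) -> Dict:
    """Claims for step 3a (mfa=False) or step 3d (mfa=True); signing omitted."""
    return {"sub": str(user_id), "mfa": mfa_verified}
```

Protected endpoints would then reject any token whose mfa claim is false, so the partial token from step 3a is only useful for submitting the TOTP code. A production check usually also accepts the adjacent time step to tolerate clock drift.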

Authorization Model

Role	Can Commit	Can Manage Users	NIST Mapping
Observer	No	No	AC-3 (Read-only)
Analyst	No	No	AC-3 (Drafts only)
Reviewer	Yes	No	AC-3 (Full CRUD)
Admin	Yes	Yes	AC-3 (Privileged)
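
The two permission columns in this table translate directly into a lookup. The sketch below is one way to encode them; the action names commit and manage_users and the function is_allowed are hypothetical, and any finer-grained rights implied by the NIST notes (such as draft editing for Analysts) are not modeled.

```python
from typing import Dict, FrozenSet

# Mirrors the "Can Commit" and "Can Manage Users" columns of the table above.
ROLE_PERMISSIONS: Dict[str, FrozenSet[str]] = {
    "observer": frozenset(),
    "analyst": frozenset(),
    "reviewer": frozenset({"commit"}),
    "admin": frozenset({"commit", "manage_users"}),
}


def is_allowed(role: str, action: str) -> bool:
    """Return True only if the role is known and explicitly grants the action."""
    return action in ROLE_PERMISSIONS.get(role.lower(), frozenset())
```

Unknown roles and unknown actions both resolve to False, which keeps the check deny-by-default.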

Infrastructure Requirements

Development

# Minimum requirements
- Node.js 18+
- Python 3.11+
- PostgreSQL 15+
- 4GB RAM

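# Start services (run each command from the repository root;
# the API and web servers each need their own terminal)
docker-compose up -d db
cd apps/api && uvicorn app.main:app --reload
cd apps/web && npm run dev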

Production

Component	Recommendation	Notes
Database	Managed PostgreSQL (RDS, Cloud SQL)	Enable point-in-time recovery
API Hosting	Containerized (ECS, Cloud Run)	Auto-scaling recommended
Frontend	Vercel, Cloudflare Pages	Edge caching for static assets
File Storage	S3, GCS with encryption	Server-side encryption required
Secrets	Vault, AWS Secrets Manager	Never commit to code