Document Comparison & Scoring

Multi-Criteria Decision Analysis with Full Audit Trail

Overview

RayRay's comparison engine enables intelligence analysts and decision makers to evaluate multiple systems against configurable scoring profiles. The engine produces ranked results with full transparency into score calculations, supporting M-25-21 compliance through human-in-the-loop verification and immutable audit trails.

Key Capabilities

  • Configurable Scoring Profiles — Define what matters: weights, normalization methods, and target values
  • Multi-Document Comparison — Compare 2-4 systems simultaneously
  • Score Breakdown — Full transparency into how each score was calculated
  • Export Formats — JSON, CSV, and PowerPoint briefing exports
  • Profile Snapshots — Audit trail preserves scoring configuration at comparison time

Supported Comparison Types

| Comparison Type | Use Case | Example Attributes |
|---|---|---|
| System Performance | Comparing technical specifications | Range, accuracy, reliability, power output |
| Cost-Benefit Analysis | Budget-constrained procurement | Unit cost, maintenance cost, lifecycle cost |
| Operational Suitability | Deployment feasibility | Weight, deployment time, crew requirements |
| Mission-Specific | Custom weighted profiles | User-defined attributes and weights |

Scoring Engine Architecture

The scoring engine (ScoringService) implements a three-phase calculation pipeline that transforms raw extracted values into normalized, weighted scores.

Phase 1: Value Parsing

Raw extraction values (strings like "50km" or "$2.5M") are parsed into numeric values with unit normalization. The parser handles multiple unit formats:

# Distance units
"50km" → 50.0 (kilometers)
"30 miles" → 48.28 (converted to km)
"100nm" → 185.2 (nautical miles to km)

# Currency units (normalized to millions)
"$2,500,000" → 2.5
"$2.5M" → 2.5
"$500K" → 0.5

# Time units (normalized to months)
"6 months" → 6.0
"2 years" → 24.0
"18 mo" → 18.0
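
The parsing rules above can be sketched as a small helper. This is an illustrative simplification, not the actual ScoringService parser; the conversion tables are a hypothetical subset of the formats it handles:

```python
import re

# Conversion factors to canonical units (km for distance).
# A hypothetical subset of the formats the parser supports.
DISTANCE_TO_KM = {"km": 1.0, "miles": 1.60934, "nm": 1.852}

def parse_distance(raw: str) -> float:
    """Parse strings like '50km', '30 miles', or '100nm' into kilometers."""
    m = re.match(r"([\d.,]+)\s*([a-z]+)", raw.strip().lower())
    value, unit = float(m.group(1).replace(",", "")), m.group(2)
    return round(value * DISTANCE_TO_KM[unit], 2)

def parse_currency(raw: str) -> float:
    """Parse '$2,500,000', '$2.5M', or '$500K' into millions of dollars."""
    s = raw.strip().lstrip("$").replace(",", "").upper()
    if s.endswith("M"):
        return float(s[:-1])
    if s.endswith("K"):
        return float(s[:-1]) / 1000.0
    return float(s) / 1_000_000.0

parse_distance("30 miles")   # → 48.28
parse_currency("$500K")      # → 0.5
```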

Phase 2: Normalization

Raw numeric values are normalized to a 0-1 scale using the configured method. RayRay supports three normalization modes:

MAX Normalization (Higher is Better)

Used when higher values indicate better performance (e.g., range, accuracy).

normalized = (value - min_value) / (max_value - min_value)

Example: Detection range comparison
  System A: 150km → normalized = 0.50
  System B: 120km → normalized = 0.00 (worst)
  System C: 180km → normalized = 1.00 (best)

Where min=120, max=180

MIN Normalization (Lower is Better)

Used when lower values indicate better performance (e.g., cost, weight, deployment time).

normalized = (max_value - value) / (max_value - min_value)

Example: Unit cost comparison
  System A: $8.5M  → normalized = 0.60
  System B: $6.2M  → normalized = 1.00 (best)
  System C: $12.0M → normalized = 0.00 (worst)

Where min=6.2, max=12.0

TARGET Normalization (Closer to Target is Better)

Used when an optimal value exists and deviation in either direction is undesirable.

normalized = 1.0 - |value - target| / range

Example: Operating frequency with target 9.5 GHz
  System A: 9.2 GHz → normalized = 0.80
  System B: 9.5 GHz → normalized = 1.00 (optimal)
  System C: 10.1 GHz → normalized = 0.60

Where target=9.5, min=9.0, max=10.5, range=1.5
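
All three modes fit in one small function. This is an illustrative sketch, not the ScoringService implementation; clamping out-of-range results to [0, 1] is an assumption:

```python
def normalize(value, method, min_v, max_v, target=None):
    """Normalize a raw value to [0, 1] using the configured method.

    Sketch of the three modes above. Clamping to [0, 1] for
    out-of-range inputs is an assumption.
    """
    span = max_v - min_v
    if method == "max":          # higher is better
        score = (value - min_v) / span
    elif method == "min":        # lower is better
        score = (max_v - value) / span
    elif method == "target":     # closer to target is better
        score = 1.0 - abs(value - target) / span
    else:
        raise ValueError(f"unknown normalize method: {method}")
    return round(max(0.0, min(1.0, score)), 2)

normalize(150, "max", 120, 180)            # detection range → 0.5
normalize(8.5, "min", 6.2, 12.0)           # unit cost → 0.6
normalize(9.2, "target", 9.0, 10.5, 9.5)   # frequency → 0.8
```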

Phase 3: Weighted Score Calculation

Normalized values are multiplied by attribute weights and summed to produce the final score on a 0-100 scale:

final_score = Σ(normalized_value × weight) × 100

Example: "Mission Priority" profile
  detection_range: weight=0.30, normalized=0.71 → weighted=0.213
  accuracy:        weight=0.25, normalized=0.95 → weighted=0.238
  reliability:     weight=0.10, normalized=0.80 → weighted=0.080
  cost:            weight=0.15, normalized=0.64 → weighted=0.096
  weight_kg:       weight=0.05, normalized=0.60 → weighted=0.030
  power_output:    weight=0.10, normalized=0.80 → weighted=0.080
  frequency_band:  weight=0.05, normalized=0.50 → weighted=0.025
                                    Total = 0.762
                              Final Score = 76.2
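
The same calculation as a runnable sketch. Note the breakdown above rounds each weighted term to three decimals for display (e.g., accuracy contributes exactly 0.2375, shown as 0.238), so the exact total is 0.7615:

```python
# Weighted-sum calculation for the "Mission Priority" example above.
weights = {
    "detection_range": 0.30, "accuracy": 0.25, "reliability": 0.10,
    "cost": 0.15, "weight_kg": 0.05, "power_output": 0.10,
    "frequency_band": 0.05,
}
normalized = {
    "detection_range": 0.71, "accuracy": 0.95, "reliability": 0.80,
    "cost": 0.64, "weight_kg": 0.60, "power_output": 0.80,
    "frequency_band": 0.50,
}

# Weights must sum to 1.0 for the 0-100 scale to be meaningful.
assert abs(sum(weights.values()) - 1.0) < 1e-9

final_score = sum(normalized[a] * w for a, w in weights.items()) * 100
# final_score ≈ 76.2 (exactly 76.15; the breakdown rounds each term)
```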

Confidence Adjustment

When extraction confidence falls below 70%, the scoring engine applies a graduated penalty to account for data quality uncertainty:

if confidence < 0.70:
    penalty = 1.0 - (0.1 × (1.0 - confidence))
    normalized = normalized × penalty

Example:
  confidence = 0.50 → penalty = 0.95 → 5% reduction
  confidence = 0.30 → penalty = 0.93 → 7% reduction
  confidence = 0.60 → penalty = 0.96 → 4% reduction

Note: The maximum penalty is capped at 10% to prevent over-penalizing uncertain but potentially valuable data.
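
The penalty rule can be expressed directly from the pseudocode above (a minimal sketch; rounding to four decimals is an assumption):

```python
def apply_confidence_penalty(normalized: float, confidence: float) -> float:
    """Apply the graduated low-confidence penalty described above.

    The penalty scales linearly with (1 - confidence) and is capped
    at 10%, which is reached only at confidence = 0.
    """
    if confidence < 0.70:
        penalty = 1.0 - 0.1 * (1.0 - confidence)
        normalized *= penalty
    return round(normalized, 4)

apply_confidence_penalty(0.80, 0.50)  # 0.80 × 0.95 → 0.76
apply_confidence_penalty(0.80, 0.90)  # no penalty → 0.8
```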

Scoring Profiles

Scoring profiles define the "recipe" for comparison. Each profile specifies:

| Property | Type | Description |
|---|---|---|
| name | string | Profile identifier (e.g., "Mission Priority") |
| method | enum | Scoring algorithm: weighted_sum, topsis, ahp, custom |
| attributes | array | List of ScoringAttribute configurations |
| is_default | boolean | Whether this is the default profile for new comparisons |

Attribute Configuration

{
  "attribute_name": "detection_range",  // Matches Extraction.label
  "display_name": "Detection Range",    // Human-readable label
  "weight": 0.30,                       // 30% of final score
  "normalize_method": "max",            // Higher is better
  "unit": "km",                         // Display unit
  "priority": 10                        // Sort order (lower = first)
}

Target Value Configuration

{
  "attribute_name": "operating_frequency",
  "display_name": "Operating Frequency",
  "weight": 0.15,
  "normalize_method": "target",
  "target_value": 9.5,                  // Optimal frequency in GHz
  "scale_min": 8.0,                     // Minimum expected
  "scale_max": 12.0,                    // Maximum expected
  "unit": "GHz"
}
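
Before running a comparison, a profile can be sanity-checked against these constraints. The helper below is hypothetical (not a RayRay API); it verifies that weights sum to 1.0 and that target-mode attributes carry a target_value within their scale bounds:

```python
def validate_profile(attributes: list[dict]) -> list[str]:
    """Return a list of problems found in a profile's attribute list.

    Hypothetical validation sketch using the field names shown in the
    configuration examples above.
    """
    problems = []
    total = sum(a.get("weight", 0.0) for a in attributes)
    if abs(total - 1.0) > 1e-6:
        problems.append(f"weights sum to {total:.2f}, expected 1.00")
    for a in attributes:
        if a.get("normalize_method") == "target":
            t = a.get("target_value")
            lo, hi = a.get("scale_min"), a.get("scale_max")
            if t is None:
                problems.append(f"{a['attribute_name']}: target mode requires target_value")
            elif lo is not None and hi is not None and not (lo <= t <= hi):
                problems.append(f"{a['attribute_name']}: target_value outside scale bounds")
    return problems
```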

Comparison Workflow

Step 1: Select Documents

Choose 2-4 documents with approved extractions. Only extractions with review_status: "approved" are included in scoring calculations.

Step 2: Select Scoring Profile

Choose a pre-configured profile or create a custom one. The profile's attribute list determines which extraction labels are compared.

Step 3: Run Comparison

The scoring engine calculates normalized scores for each attribute, applies weights, and produces ranked results.

Step 4: Review Results

Results include final scores, rankings, and full score breakdowns showing how each attribute contributed to the final score.

Delta Analysis (Version Comparison)

Track changes across document versions by comparing extractions from different revisions of the same source material:

| Delta Type | Trigger | Significance |
|---|---|---|
| Value Change | Numeric value differs by more than 5% | May indicate specification update |
| New Attribute | Attribute present in new version only | Added capability or data |
| Removed Attribute | Attribute present in old version only | Deprecated or removed capability |
| Confidence Change | Confidence score changed significantly | Data quality improvement or degradation |

Delta comparisons are useful for:

  • Tracking specification changes across contract amendments
  • Validating data extraction consistency
  • Identifying document version drift
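
The delta classification can be sketched as a function over two versions' extracted values (a hypothetical illustration; the 5% value-change threshold matches the trigger described above):

```python
def compute_deltas(old: dict, new: dict, threshold: float = 0.05) -> list[dict]:
    """Classify changes between two document versions' extracted values.

    `old` and `new` map attribute names to numeric values. A sketch of
    the delta types above (confidence-change detection omitted).
    """
    deltas = []
    for name in old.keys() - new.keys():
        deltas.append({"attribute": name, "type": "removed_attribute"})
    for name in new.keys() - old.keys():
        deltas.append({"attribute": name, "type": "new_attribute"})
    for name in old.keys() & new.keys():
        if old[name] and abs(new[name] - old[name]) / abs(old[name]) > threshold:
            deltas.append({"attribute": name, "type": "value_change",
                           "old": old[name], "new": new[name]})
    return deltas
```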

Radar Visualization

Comparison results can be visualized as radar (spider) charts, providing an intuitive view of system capabilities across multiple dimensions:

  • Axes — One axis per scoring attribute
  • Scale — 0-100 normalized scale for all axes
  • Overlay — Multiple systems displayed simultaneously
  • Area — Larger area indicates better overall performance

Radar visualizations are particularly effective for:

  • Quick visual comparison of 2-3 systems
  • Identifying strengths and weaknesses at a glance
  • Briefing presentations for decision makers
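
The "larger area is better" heuristic can be made concrete: with N attribute scores plotted as radii at equal angles, the radar polygon decomposes into N triangles, each with area ½·r₁·r₂·sin θ. A pure-math sketch (not part of the product API):

```python
import math

def radar_area(scores: list[float]) -> float:
    """Area of the radar polygon formed by scores plotted at equal angles.

    Each score is a radius on the 0-100 scale; adjacent radii r1 and r2
    separated by angle theta contribute a triangle of area
    0.5 * r1 * r2 * sin(theta).
    """
    n = len(scores)
    theta = 2 * math.pi / n
    return sum(0.5 * scores[i] * scores[(i + 1) % n] * math.sin(theta)
               for i in range(n))

# A system scoring 100 on all 4 axes: 4 × ½ × 100 × 100 × sin(90°) = 20000
radar_area([100, 100, 100, 100])
```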

API Endpoints

Create Comparison

POST /api/comparisons/
Content-Type: application/json
Authorization: Bearer {token}

{
  "name": "Q1 Radar Comparison",
  "documentIds": [1, 2, 3],
  "profileId": 1,
  "createdBy": "analyst@example.com"
}

Response: 200 OK
{
  "id": 42,
  "name": "Q1 Radar Comparison",
  "profileId": 1,
  "status": "calculated",
  "results": [
    {
      "documentId": 3,
      "documentName": "EL_M_2084_Datasheet.pdf",
      "finalScore": 82.5,
      "rank": 1,
      "scoreBreakdown": {...},
      "rawValues": {...}
    },
    ...
  ]
}

List Comparisons

GET /api/comparisons/?status=calculated&limit=50
Authorization: Bearer {token}

Response: 200 OK
[
  {
    "id": 42,
    "name": "Q1 Radar Comparison",
    "profileId": 1,
    "documentIds": [1, 2, 3],
    "status": "calculated",
    "createdAt": "2026-03-12T10:00:00Z"
  },
  ...
]

Get Comparison Details

GET /api/comparisons/{comparison_id}
Authorization: Bearer {token}

Response: 200 OK
{
  "id": 42,
  "name": "Q1 Radar Comparison",
  "profileSnapshot": {
    "name": "Mission Priority",
    "method": "weighted_sum",
    "attributes": [...]
  },
  "results": [...]
}

Recalculate Comparison

POST /api/comparisons/{comparison_id}/recalculate
Authorization: Bearer {token}

Response: 200 OK
{
  "id": 42,
  "status": "calculated",
  "results": [...]  // Updated results
}

Delete Comparison

DELETE /api/comparisons/{comparison_id}
Authorization: Bearer {token}

Response: 200 OK
{
  "message": "Comparison deleted",
  "id": 42
}

Exporting Comparison Results

JSON Export

GET /api/comparisons/{comparison_id}/export?format=json
Authorization: Bearer {token}

Response: application/json
{
  "comparison": {
    "id": 42,
    "name": "Q1 Radar Comparison",
    "profileSnapshot": {...}
  },
  "results": [
    {
      "rank": 1,
      "documentName": "EL_M_2084_Datasheet.pdf",
      "finalScore": 82.5,
      "scoreBreakdown": {...},
      "rawValues": {...}
    },
    ...
  ]
}

CSV Export

GET /api/comparisons/{comparison_id}/export?format=csv
Authorization: Bearer {token}

Response: text/csv
Rank,Document,Final Score,Detection Range,Accuracy,Reliability,...
1,EL_M_2084_Datasheet.pdf,82.5,180 km,92%,10000 hrs,...
2,AN_TPQ-53_SpecSheet.pdf,76.2,150 km,95%,8000 hrs,...
3,GM200_MMIA.pdf,71.8,135 km,90%,7500 hrs,...

PowerPoint Briefing Export

GET /api/comparisons/{comparison_id}/export/pptx
Authorization: Bearer {token}

Response: application/vnd.openxmlformats-officedocument.presentationml.presentation

Slides included:
1. Title slide (comparison name, date, author)
2. Rankings summary with bar chart
3. Per-system score breakdown (one slide each)
4. Side-by-side specifications table
5. Methodology / audit trail slide

Time Savings Estimation

GET /api/comparisons/{comparison_id}/time-savings?sessions_per_year=50
Authorization: Bearer {token}

Response: 200 OK
{
  "comparisonId": 42,
  "documentCount": 3,
  "attributeCount": 7,
  "manualMinutes": 240,
  "aiMinutes": 45,
  "savedMinutes": 195,
  "savedPercent": 81.25,
  "annualSavedHours": 162.5,
  "annualSavedFteDays": 20.3,
  "m2521Compliant": true
}
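
The response fields follow directly from the inputs. A worked sketch of the arithmetic (8-hour FTE days assumed):

```python
# Reproduce the time-savings arithmetic from the response above.
manual_minutes, ai_minutes = 240, 45
sessions_per_year = 50

saved_minutes = manual_minutes - ai_minutes                   # 195
saved_percent = 100 * saved_minutes / manual_minutes          # 81.25
annual_saved_hours = saved_minutes * sessions_per_year / 60   # 162.5
annual_saved_fte_days = round(annual_saved_hours / 8, 1)      # 20.3
```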

Role-Based Permissions

| Role | Create | View | Recalculate | Delete | Export |
|---|---|---|---|---|---|
| Observer | | | | | |
| Analyst | | | | | |
| Reviewer | | | | | |
| Admin | | | | | |

Audit Trail

Every comparison operation is logged with:

  • Profile Snapshot — Full scoring configuration preserved at comparison time
  • Raw Values — Original extracted values stored for verification
  • Score Breakdown — Step-by-step calculation transparency
  • User Attribution — Who created/recalculated the comparison
  • Timestamp — When each operation occurred

This audit trail ensures comparisons can be defended and reproduced, supporting M-25-21 requirements for AI-assisted decision documentation.

Best Practices

Profile Design

  • Ensure weights sum to 1.0 (100%) for meaningful final scores
  • Use TARGET normalization when optimal values are known (e.g., frequency bands)
  • Set scale_min/scale_max to bound normalization when data ranges are predictable
  • Document the rationale for weight distributions in profile descriptions

Data Quality

  • Only compare documents with approved extractions
  • Review low-confidence extractions before including in comparisons
  • Flag missing attributes for manual investigation
  • Use delta analysis to validate extraction consistency

Decision Documentation

  • Export comparison results before briefings for offline reference
  • Include profile snapshot in decision documentation
  • Archive comparisons rather than deleting for audit purposes
  • Use time savings metrics to demonstrate ROI to leadership

Related