Document Comparison & Scoring
Multi-Criteria Decision Analysis with Full Audit Trail
Overview
RayRay's comparison engine enables intelligence analysts and decision makers to evaluate multiple systems against configurable scoring profiles. The engine produces ranked results with full transparency into score calculations, supporting M-25-21 compliance through human-in-the-loop verification and immutable audit trails.
Key Capabilities
- Configurable Scoring Profiles — Define what matters: weights, normalization methods, and target values
- Multi-Document Comparison — Compare 2-4 systems simultaneously
- Score Breakdown — Full transparency into how each score was calculated
- Export Formats — JSON, CSV, and PowerPoint briefing exports
- Profile Snapshots — Audit trail preserves scoring configuration at comparison time
Supported Comparison Types
| Comparison Type | Use Case | Example Attributes |
|---|---|---|
| System Performance | Comparing technical specifications | Range, accuracy, reliability, power output |
| Cost-Benefit Analysis | Budget-constrained procurement | Unit cost, maintenance cost, lifecycle cost |
| Operational Suitability | Deployment feasibility | Weight, deployment time, crew requirements |
| Mission-Specific | Custom weighted profiles | User-defined attributes and weights |
Scoring Engine Architecture
The scoring engine (ScoringService) implements a three-phase calculation pipeline that transforms raw extracted values into normalized, weighted scores.
Phase 1: Value Parsing
Raw extraction values (strings like "50km" or "$2.5M") are parsed into numeric values with unit normalization. The parser handles multiple unit formats:
# Distance units
"50km" → 50.0 (kilometers)
"30 miles" → 48.28 (converted to km)
"100nm" → 185.2 (nautical miles to km)
# Currency units (normalized to millions)
"$2,500,000" → 2.5
"$2.5M" → 2.5
"$500K" → 0.5
# Time units (normalized to months)
"6 months" → 6.0
"2 years" → 24.0
"18 mo" → 18.0
Phase 2: Normalization
Raw numeric values are normalized to a 0-1 scale using the configured method. RayRay supports three normalization modes:
MAX Normalization (Higher is Better)
Used when higher values indicate better performance (e.g., range, accuracy).
normalized = (value - min_value) / (max_value - min_value)
Example: Detection range comparison
System A: 150km → normalized = 0.50
System B: 120km → normalized = 0.00
System C: 180km → normalized = 1.00
Where min=120, max=180
MIN Normalization (Lower is Better)
Used when lower values indicate better performance (e.g., cost, weight, deployment time).
normalized = (max_value - value) / (max_value - min_value)
Example: Unit cost comparison
System A: $8.5M → normalized = 0.60
System B: $6.2M → normalized = 1.00 (best)
System C: $12.0M → normalized = 0.00 (worst)
Where min=6.2, max=12.0
TARGET Normalization (Closer to Target is Better)
Used when an optimal value exists and deviation in either direction is undesirable.
normalized = 1.0 - |value - target| / range
Example: Operating frequency with target 9.5 GHz
System A: 9.2 GHz → normalized = 0.80
System B: 9.5 GHz → normalized = 1.00 (optimal)
System C: 10.1 GHz → normalized = 0.60
Where target=9.5, min=9.0, max=10.5, range=1.5
Phase 3: Weighted Score Calculation
Normalized values are multiplied by attribute weights and summed to produce the final score on a 0-100 scale:
final_score = Σ(normalized_value × weight) × 100
Example: "Mission Priority" profile
detection_range: weight=0.30, normalized=0.71 → weighted=0.213
accuracy: weight=0.25, normalized=0.95 → weighted=0.238
reliability: weight=0.10, normalized=0.80 → weighted=0.080
cost: weight=0.15, normalized=0.64 → weighted=0.096
weight_kg: weight=0.05, normalized=0.60 → weighted=0.030
power_output: weight=0.10, normalized=0.80 → weighted=0.080
frequency_band: weight=0.05, normalized=0.50 → weighted=0.025
Total = 0.762
Final Score = 76.2
Confidence Adjustment
When extraction confidence falls below 70%, the scoring engine applies a graduated penalty to account for data quality uncertainty:
if confidence < 0.70:
penalty = 1.0 - (0.1 × (1.0 - confidence))
normalized = normalized × penalty
Example:
confidence = 0.50 → penalty = 0.95 → 5% reduction
confidence = 0.30 → penalty = 0.93 → 7% reduction
confidence = 0.60 → penalty = 0.96 → 4% reduction
Note: The maximum penalty is capped at 10% to prevent over-penalizing uncertain but potentially valuable data.
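Taken together, the normalization, weighting, and confidence-adjustment rules above can be sketched in a few lines of Python. This is a simplified illustration of the pipeline, not the actual ScoringService code; the sample values come from the examples in this section:

```python
def normalize(value, min_v, max_v, method, target=None):
    """Scale a raw numeric value to 0-1 using the configured method."""
    rng = max_v - min_v
    if method == "max":     # higher is better
        return (value - min_v) / rng
    if method == "min":     # lower is better
        return (max_v - value) / rng
    if method == "target":  # closer to target is better
        return 1.0 - abs(value - target) / rng
    raise ValueError(f"unknown method: {method}")

def apply_confidence(normalized, confidence):
    """Graduated penalty for extraction confidence below 70% (capped at 10%)."""
    if confidence < 0.70:
        return normalized * (1.0 - 0.1 * (1.0 - confidence))
    return normalized

def final_score(attrs):
    """Weighted sum of normalized values, scaled to 0-100."""
    return sum(a["normalized"] * a["weight"] for a in attrs) * 100

# Unit cost example from the MIN normalization section
cost_norm = normalize(8.5, 6.2, 12.0, "min")   # ≈ 0.60
# A low-confidence extraction receives a small penalty
adjusted = apply_confidence(cost_norm, 0.60)   # ≈ 0.60 × 0.96
```

The functions are deliberately pure: given the same raw values and profile, they always reproduce the same score, which is what makes the audit trail described later reproducible.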
Scoring Profiles
Scoring profiles define the "recipe" for comparison. Each profile specifies:
| Property | Type | Description |
|---|---|---|
| name | string | Profile identifier (e.g., "Mission Priority") |
| method | enum | Scoring algorithm: weighted_sum, topsis, ahp, custom |
| attributes | array | List of ScoringAttribute configurations |
| is_default | boolean | Whether this is the default profile for new comparisons |
Attribute Configuration
{
"attribute_name": "detection_range", // Matches Extraction.label
"display_name": "Detection Range", // Human-readable label
"weight": 0.30, // 30% of final score
"normalize_method": "max", // Higher is better
"unit": "km", // Display unit
"priority": 10 // Sort order (lower = first)
}
Target Value Configuration
{
"attribute_name": "operating_frequency",
"display_name": "Operating Frequency",
"weight": 0.15,
"normalize_method": "target",
"target_value": 9.5, // Optimal frequency in GHz
"scale_min": 8.0, // Minimum expected
"scale_max": 12.0, // Maximum expected
"unit": "GHz"
}
Comparison Workflow
Step 1: Select Documents
Choose 2-4 documents with approved extractions. Only extractions with review_status: "approved" are included in scoring calculations.
Step 2: Select Scoring Profile
Choose a pre-configured profile or create a custom one. The profile's attribute list determines which extraction labels are compared.
Step 3: Run Comparison
The scoring engine calculates normalized scores for each attribute, applies weights, and produces ranked results.
Step 4: Review Results
Results include final scores, rankings, and full score breakdowns showing how each attribute contributed to the final score.
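The approval filter from Step 1 can be sketched as a simple predicate over extraction records (the record shapes here are hypothetical and reduced to the relevant fields):

```python
extractions = [
    {"label": "detection_range", "value": "150km", "review_status": "approved"},
    {"label": "accuracy", "value": "92%", "review_status": "pending"},
    {"label": "unit_cost", "value": "$8.5M", "review_status": "approved"},
]

# Only human-approved extractions enter the scoring calculation
scorable = [e for e in extractions if e["review_status"] == "approved"]
labels = [e["label"] for e in scorable]  # ['detection_range', 'unit_cost']
```

Pending or rejected extractions are simply excluded, which is what keeps the human-in-the-loop review step meaningful for the final scores.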
Delta Analysis (Version Comparison)
Track changes across document versions by comparing extractions from different revisions of the same source material:
| Delta Type | Trigger | Significance |
|---|---|---|
| Value Change | Numeric value differs by more than 5% | May indicate specification update |
| New Attribute | Attribute present in new version only | Added capability or data |
| Removed Attribute | Attribute present in old version only | Deprecated or removed capability |
| Confidence Change | Confidence score changed significantly | Data quality improvement or degradation |
Delta comparisons are useful for:
- Tracking specification changes across contract amendments
- Validating data extraction consistency
- Identifying document version drift
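The delta rules in the table above could be implemented roughly as follows (an illustrative sketch; the 5% threshold matches the table, and the flat attribute-to-value maps are an assumption about the data shape):

```python
def classify_deltas(old, new, threshold=0.05):
    """Compare two {attribute: numeric value} maps, emitting (attribute, delta_type) pairs."""
    deltas = []
    for attr in sorted(old.keys() | new.keys()):
        if attr not in new:
            deltas.append((attr, "removed_attribute"))
        elif attr not in old:
            deltas.append((attr, "new_attribute"))
        elif old[attr] != 0 and abs(new[attr] - old[attr]) / abs(old[attr]) > threshold:
            deltas.append((attr, "value_change"))  # differs by more than 5%
    return deltas

old = {"detection_range": 150.0, "unit_cost": 8.5}
new = {"detection_range": 180.0, "unit_cost": 8.5, "crew_size": 4.0}
classify_deltas(old, new)
# [('crew_size', 'new_attribute'), ('detection_range', 'value_change')]
```

Unchanged attributes (unit_cost above) produce no delta, so a clean version comparison yields an empty list.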
Radar Visualization
Comparison results can be visualized as radar (spider) charts, providing an intuitive view of system capabilities across multiple dimensions:
- Axes — One axis per scoring attribute
- Scale — 0-100 normalized scale for all axes
- Overlay — Multiple systems displayed simultaneously
- Area — Larger area indicates better overall performance
Radar visualizations are particularly effective for:
- Quick visual comparison of 2-3 systems
- Identifying strengths and weaknesses at a glance
- Briefing presentations for decision makers
API Endpoints
Create Comparison
POST /api/comparisons/
Content-Type: application/json
Authorization: Bearer {token}
{
"name": "Q1 Radar Comparison",
"documentIds": [1, 2, 3],
"profileId": 1,
"createdBy": "analyst@example.com"
}
Response: 200 OK
{
"id": 42,
"name": "Q1 Radar Comparison",
"profileId": 1,
"status": "calculated",
"results": [
{
"documentId": 3,
"documentName": "EL_M_2084_Datasheet.pdf",
"finalScore": 82.5,
"rank": 1,
"scoreBreakdown": {...},
"rawValues": {...}
},
...
]
}
List Comparisons
GET /api/comparisons/?status=calculated&limit=50
Authorization: Bearer {token}
Response: 200 OK
[
{
"id": 42,
"name": "Q1 Radar Comparison",
"profileId": 1,
"documentIds": [1, 2, 3],
"status": "calculated",
"createdAt": "2026-03-12T10:00:00Z"
},
...
]
Get Comparison Details
GET /api/comparisons/{comparison_id}
Authorization: Bearer {token}
Response: 200 OK
{
"id": 42,
"name": "Q1 Radar Comparison",
"profileSnapshot": {
"name": "Mission Priority",
"method": "weighted_sum",
"attributes": [...]
},
"results": [...]
}
Recalculate Comparison
POST /api/comparisons/{comparison_id}/recalculate
Authorization: Bearer {token}
Response: 200 OK
{
"id": 42,
"status": "calculated",
"results": [...] // Updated results
}
Delete Comparison
DELETE /api/comparisons/{comparison_id}
Authorization: Bearer {token}
Response: 200 OK
{
"message": "Comparison deleted",
"id": 42
}
Exporting Comparison Results
JSON Export
GET /api/comparisons/{comparison_id}/export?format=json
Authorization: Bearer {token}
Response: application/json
{
"comparison": {
"id": 42,
"name": "Q1 Radar Comparison",
"profileSnapshot": {...}
},
"results": [
{
"rank": 1,
"documentName": "EL_M_2084_Datasheet.pdf",
"finalScore": 82.5,
"scoreBreakdown": {...},
"rawValues": {...}
},
...
]
}
CSV Export
GET /api/comparisons/{comparison_id}/export?format=csv
Authorization: Bearer {token}
Response: text/csv
Rank,Document,Final Score,Detection Range,Accuracy,Reliability,...
1,EL_M_2084_Datasheet.pdf,82.5,180 km,92%,10000 hrs,...
2,AN_TPQ-53_SpecSheet.pdf,76.2,150 km,95%,8000 hrs,...
3,GM200_MMIA.pdf,71.8,135 km,90%,7500 hrs,...
PowerPoint Briefing Export
GET /api/comparisons/{comparison_id}/export/pptx
Authorization: Bearer {token}
Response: application/vnd.openxmlformats-officedocument.presentationml.presentation
Slides included:
1. Title slide (comparison name, date, author)
2. Rankings summary with bar chart
3. Per-system score breakdown (one slide each)
4. Side-by-side specifications table
5. Methodology / audit trail slide
Time Savings Estimation
GET /api/comparisons/{comparison_id}/time-savings?sessions_per_year=50
Authorization: Bearer {token}
Response: 200 OK
{
"comparisonId": 42,
"documentCount": 3,
"attributeCount": 7,
"manualMinutes": 240,
"aiMinutes": 45,
"savedMinutes": 195,
"savedPercent": 81.25,
"annualSavedHours": 162.5,
"annualSavedFteDays": 20.3,
"m2521Compliant": true
}
Role-Based Permissions
| Role | Create | View | Recalculate | Delete | Export |
|---|---|---|---|---|---|
| Observer | — | — | — | — | — |
| Analyst | — | ✓ | — | — | ✓ |
| Reviewer | ✓ | ✓ | ✓ | — | ✓ |
| Admin | ✓ | ✓ | ✓ | ✓ | ✓ |
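The permission matrix above could be enforced with a simple lookup; this is an illustrative sketch, not the actual authorization code:

```python
# Role -> allowed comparison actions, mirroring the table above
PERMISSIONS = {
    "observer": set(),
    "analyst": {"view", "export"},
    "reviewer": {"create", "view", "recalculate", "export"},
    "admin": {"create", "view", "recalculate", "delete", "export"},
}

def can(role, action):
    """Return True if the role is allowed to perform the action."""
    return action in PERMISSIONS.get(role, set())

can("analyst", "delete")  # False: only Admins may delete comparisons
```

Keeping the matrix as data rather than scattered conditionals makes it easy to audit the policy against the documented table.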
Audit Trail
Every comparison operation is logged with:
- Profile Snapshot — Full scoring configuration preserved at comparison time
- Raw Values — Original extracted values stored for verification
- Score Breakdown — Step-by-step calculation transparency
- User Attribution — Who created/recalculated the comparison
- Timestamp — When each operation occurred
This audit trail ensures comparisons can be defended and reproduced, supporting M-25-21 requirements for AI-assisted decision documentation.
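Concretely, a single audit entry might carry fields like the following (the names are illustrative, mirroring the items listed above, and are not the actual log schema):

```python
import json

audit_entry = {
    "comparison_id": 42,
    "action": "recalculate",
    "user": "analyst@example.com",        # user attribution
    "timestamp": "2026-03-12T10:00:00Z",  # when the operation occurred
    "profile_snapshot": {                 # configuration preserved at comparison time
        "name": "Mission Priority",
        "method": "weighted_sum",
    },
    "raw_values": {"detection_range": "150km"},     # stored for verification
    "score_breakdown": {"detection_range": 0.213},  # calculation transparency
}

# Entries are serialized for the immutable audit log
serialized = json.dumps(audit_entry, sort_keys=True)
```

Because the profile snapshot and raw values travel with the entry, a reviewer can rerun the calculation by hand without consulting any mutable system state.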
Best Practices
Profile Design
- Ensure weights sum to 1.0 (100%) for meaningful final scores
- Use TARGET normalization when optimal values are known (e.g., frequency bands)
- Set scale_min/scale_max to bound normalization when data ranges are predictable
- Document the rationale for weight distributions in profile descriptions
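The first rule above (weights must sum to 1.0) is easy to enforce before a profile is saved. A minimal check, where the tolerance value is an assumption:

```python
def validate_weights(attributes, tolerance=1e-6):
    """Raise if attribute weights do not sum to 1.0 (within tolerance)."""
    total = sum(a["weight"] for a in attributes)
    if abs(total - 1.0) > tolerance:
        raise ValueError(f"weights sum to {total:.4f}, expected 1.0")

# The "Mission Priority" weights from the scoring example
validate_weights([
    {"attribute_name": "detection_range", "weight": 0.30},
    {"attribute_name": "accuracy", "weight": 0.25},
    {"attribute_name": "reliability", "weight": 0.10},
    {"attribute_name": "cost", "weight": 0.15},
    {"attribute_name": "weight_kg", "weight": 0.05},
    {"attribute_name": "power_output", "weight": 0.10},
    {"attribute_name": "frequency_band", "weight": 0.05},
])  # passes silently
```

Rejecting malformed profiles at save time is cheaper than explaining why final scores exceed 100 in a briefing.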
Data Quality
- Only compare documents with approved extractions
- Review low-confidence extractions before including in comparisons
- Flag missing attributes for manual investigation
- Use delta analysis to validate extraction consistency
Decision Documentation
- Export comparison results before briefings for offline reference
- Include profile snapshot in decision documentation
- Archive comparisons rather than deleting for audit purposes
- Use time savings metrics to demonstrate ROI to leadership
Related
- Extraction Workflow — How extraction data is generated
- API Reference — Full endpoint documentation
- Compliance — M-25-21 audit requirements
- Architecture — System design overview