How 805 patent documents become a navigable knowledge graph inside the Sov_OS knowledge ring.
Using existing nodes: a_library → a_dna → a_canon → a_index
The patent collection contains 805 markdown files with hash-chain provenance but zero semantic structure. Files are organized by iCloud archive origin, not by concept. There are no tags, no domain classifications, no cross-references between related patents. Implicit patent families exist (Logos, Kronos, Guardian Stones, SAGE) but they're invisible to any automated system.
The goal: ingest every document into the Sov_OS knowledge ring so that meaning becomes geometry — each patent gets a DNA helix encoding, a 12D shape tensor, vector embeddings, and rich metadata that enables cross-domain discovery by conceptual similarity rather than filename matching.
Four existing knowledge nodes form a pipeline. Each document flows through five stages — the same pipeline that processes any markdown entering the system:
| Stage | Node | Input | Output | Purpose |
|---|---|---|---|---|
| ingest (library.ingest) | a_library | Raw markdown + path | FrontMatter + body + links | Parse YAML frontmatter, extract wiki-links and markdown links |
| hash (library.hash) | a_library | Content bytes + path | ContentHash (BLAKE3) | Location-bound identity: the same content at different paths gets different hashes |
| encode (dna.v1_encode) | a_dna | Document text | Vec<[f64; 3]> helix | Encode the full text as a cylindrical helix; losslessly reversible |
| shape (dna.shape_tensor) | a_dna | Helix point cloud | [f32; 12] tensor | Compress the helix to centroid, variance, curvature, torsion, length, radius |
| index (canon.rank) | a_canon | Shape tensor + metadata | Indexed entry | Insert into the spatial index for nearest-neighbor discovery |
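
The location-bound identity in the hash stage can be illustrated with a small sketch. It assumes the path is mixed into the digest before the content, and it swaps BLAKE3 for a hand-written FNV-1a so that the example needs no external crates. The function names `fnv1a` and `location_bound_hash` are hypothetical and are not the library.hash API.

```rust
/// FNV-1a 64-bit step function. Assumption: a non-cryptographic stand-in
/// for BLAKE3, used only to keep this sketch dependency-free.
fn fnv1a(state: u64, bytes: &[u8]) -> u64 {
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    bytes
        .iter()
        .fold(state, |h, &b| (h ^ u64::from(b)).wrapping_mul(PRIME))
}

/// Hypothetical location-bound hash: the path is absorbed before the
/// content, so identical bytes stored at two paths get two identities.
pub fn location_bound_hash(path: &str, content: &[u8]) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    // Length-prefix the path so the path/content boundary is unambiguous.
    let h = fnv1a(OFFSET_BASIS, &(path.len() as u64).to_le_bytes());
    let h = fnv1a(h, path.as_bytes());
    fnv1a(h, content)
}
```

One consequence of this design is that moving a file changes its identity, so a reorganization has to re-hash every document it relocates.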
These nodes are already wired in knowledge.toml.
The current patent files have only conversion metadata (source hash, chain index). The a_library node already parses FrontMatter with title, tags, created_at, updated_at. We extend this with domain-specific fields:

    ---
    title: "Real-Time Semantic Drift Firewall"
    domain: governance
    system: sage
    kind: provisional_patent
    status: filed
    tags:
      - semantic-firewall
      - hysteresis
      - authorization
      - drift-detection
      - schmitt-trigger
    related:
      - guardian_stones/stone_1
      - sage/trust_coefficient
      - logos/semantic_identity
    sov_concepts:
      - AuthState
      - Warrant
      - NarrativeMass
      - TrustDecay
    layers:
      - L3-mesh       # where this lives in the shape stack
      - L4-topology   # what it governs
    source_file: "Real_Time_Semantic_Drift_Firewall_Patent_Draft.md"
    source_hash: "a1b2c3..."
    chain_index: 42
    created_at: 2026-03-16
    updated_at: 2026-03-16
    ---

| Field | Type | Purpose |
|---|---|---|
| domain | enum | One of 10 conceptual domains (see §4). Primary classification. |
| system | string | Which Sov system this belongs to: sage, logos, kronos, hermes, rita, etc. |
| kind | enum | Document type: provisional_patent, white_paper, specification, appendix, figure, charter, research |
| status | enum | Lifecycle: draft, filed, published, superseded |
| tags | Vec<String> | Free-form concept tags for full-text and faceted search |
| related | Vec<String> | Explicit cross-references to other documents by system/slug path |
| sov_concepts | Vec<String> | Named types from the Sov lexicon; links the document to the codebase |
| layers | Vec<String> | Shape layer classification (L0–L5) for stack positioning |
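
The field table above maps naturally onto Rust types. The following is a minimal sketch, assuming the enum variants follow the listed `kind` and `status` values; the names `Kind`, `Status`, and `PatentFrontMatter` are hypothetical and do not describe the existing FrontMatter type in a_library. The `domain` field stays a plain string here because the ten domain names are not all listed in this section.

```rust
use std::str::FromStr;

/// Document type, mirroring the `kind` values in the field table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    ProvisionalPatent,
    WhitePaper,
    Specification,
    Appendix,
    Figure,
    Charter,
    Research,
}

impl FromStr for Kind {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "provisional_patent" => Ok(Kind::ProvisionalPatent),
            "white_paper" => Ok(Kind::WhitePaper),
            "specification" => Ok(Kind::Specification),
            "appendix" => Ok(Kind::Appendix),
            "figure" => Ok(Kind::Figure),
            "charter" => Ok(Kind::Charter),
            "research" => Ok(Kind::Research),
            other => Err(format!("unknown kind: {}", other)),
        }
    }
}

/// Lifecycle state, mirroring the `status` values in the field table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Draft,
    Filed,
    Published,
    Superseded,
}

impl FromStr for Status {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "draft" => Ok(Status::Draft),
            "filed" => Ok(Status::Filed),
            "published" => Ok(Status::Published),
            "superseded" => Ok(Status::Superseded),
            other => Err(format!("unknown status: {}", other)),
        }
    }
}

/// Hypothetical container for the domain-specific frontmatter fields.
#[derive(Debug, Clone, PartialEq)]
pub struct PatentFrontMatter {
    pub title: String,
    pub domain: String, // one of the 10 domains; left untyped in this sketch
    pub system: String,
    pub kind: Kind,
    pub status: Status,
    pub tags: Vec<String>,
    pub related: Vec<String>,
    pub sov_concepts: Vec<String>,
    pub layers: Vec<String>,
}
```

Parsing the enums strictly means a misspelled `kind` or `status` fails at ingest instead of silently creating a new category.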
Ten domains emerged from the patent corpus. Each maps to one or more Sov systems and rings.
Patent documents live inside the knowledge ring as a managed collection within a_library. The enriched frontmatter makes them queryable by every field. The physical layout:

    storage/a_knowledge/
    ├── nodes/
    │   └── a_library/
    │       ├── library.toml          ← pipeline config (existing)
    │       ├── crates/               ← Rust pipeline code (existing)
    │       └── vault/                ← NEW: document storage
    │           ├── _index.toml       ← master index (auto-generated)
    │           ├── _domains.toml     ← domain → document mapping
    │           ├── _graph.toml       ← cross-reference adjacency list
    │           │
    │           ├── identity/         ← domain directory
    │           │   ├── logos/
    │           │   │   ├── provisional_patent.md
    │           │   │   ├── multi_modal_auth.md
    │           │   │   └── voice_derived_object.md
    │           │   └── wisp/
    │           │       ├── continuation_provisional.md
    │           │       └── semantic_identity_object.md
    │           │
    │           ├── governance/
    │           │   ├── sage/
    │           │   │   ├── semantic_firewall.md
    │           │   │   ├── governed_access.md
    │           │   │   └── trust_coefficient.md
    │           │   └── providence/
    │           │       └── constraint_enforcement.md
    │           │
    │           ├── temporal/
    │           │   └── kronos/
    │           │       ├── temporal_intelligence.md
    │           │       ├── trust_decay_functions.md
    │           │       └── kronos_spiral.md
    │           │
    │           ├── security/
    │           │   └── guardian_stones/
    │           │       ├── stone_1_semantic_firewall.md
    │           │       ├── stone_2_behavioral_drift.md
    │           │       └── ...
    │           │
    │           └── ... (10 domains total)

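
The layout follows a vault/{domain}/{system}/ pattern, which a small helper can compute. This is a sketch only: `vault_path` and `graph_key` are hypothetical names, and the slug is assumed to be chosen by a human or an earlier enrichment step, not derived from the source filename.

```rust
use std::path::PathBuf;

/// Hypothetical helper: builds `vault/{domain}/{system}/{slug}.md`,
/// relative to the a_library node directory.
pub fn vault_path(domain: &str, system: &str, slug: &str) -> PathBuf {
    let mut path = PathBuf::from("vault");
    path.push(domain);
    path.push(system);
    path.push(format!("{}.md", slug));
    path
}

/// Hypothetical helper: the extension-free `{domain}/{system}/{slug}` key,
/// matching the form used for `from` and `to` in `_graph.toml`.
pub fn graph_key(domain: &str, system: &str, slug: &str) -> String {
    format!("{}/{}/{}", domain, system, slug)
}
```

For example, `vault_path("governance", "sage", "semantic_firewall")` yields the same location that the tree shows for the Semantic Firewall patent.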
Each document gets an entry in _index.toml:

    [documents.sage_semantic_firewall]
    path = "governance/sage/semantic_firewall.md"
    hash = "a1b2c3d4..."
    domain = "governance"
    system = "sage"
    kind = "provisional_patent"
    tags = ["semantic-firewall", "hysteresis", "authorization"]
    shape_tensor = [0.12, -0.34, ...]   # 12D geometric fingerprint
    helix_length = 14823                # V1 helix point count

Cross-references between documents are recorded in _graph.toml:

    # Edges are bidirectional, weighted by relationship type
    [[edge]]
    from = "governance/sage/semantic_firewall"
    to = "security/guardian_stones/stone_1"
    weight = "implements"   # implements | extends | references | supersedes

    [[edge]]
    from = "governance/sage/semantic_firewall"
    to = "identity/logos/multi_modal_auth"
    weight = "references"

    [[edge]]
    from = "governance/sage/trust_coefficient"
    to = "temporal/kronos/trust_decay_functions"
    weight = "extends"

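
Because the edges are bidirectional, a question like "what connects governance to identity?" becomes a shortest-path search over an undirected adjacency list. The sketch below assumes the edges have already been parsed out of _graph.toml; the `Edge` struct and both function names are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// One `[[edge]]` record; `relation` mirrors the `weight` key.
pub struct Edge<'a> {
    pub from: &'a str,
    pub to: &'a str,
    pub relation: &'a str,
}

/// Builds an undirected adjacency list: every edge is stored both ways.
pub fn adjacency<'a>(edges: &[Edge<'a>]) -> BTreeMap<&'a str, BTreeSet<&'a str>> {
    let mut adj: BTreeMap<&str, BTreeSet<&str>> = BTreeMap::new();
    for e in edges {
        adj.entry(e.from).or_default().insert(e.to);
        adj.entry(e.to).or_default().insert(e.from);
    }
    adj
}

/// Breadth-first search; returns the shortest chain of documents from
/// `start` to `goal`, or `None` when they are not connected.
pub fn shortest_path<'a>(
    adj: &BTreeMap<&'a str, BTreeSet<&'a str>>,
    start: &'a str,
    goal: &'a str,
) -> Option<Vec<&'a str>> {
    let mut prev: BTreeMap<&str, &str> = BTreeMap::new();
    let mut seen: BTreeSet<&str> = BTreeSet::new();
    let mut queue: VecDeque<&str> = VecDeque::new();
    seen.insert(start);
    queue.push_back(start);
    while let Some(node) = queue.pop_front() {
        if node == goal {
            // Walk the predecessor links back to the start.
            let mut path = vec![goal];
            let mut cur = goal;
            while let Some(&p) = prev.get(cur) {
                path.push(p);
                cur = p;
            }
            path.reverse();
            return Some(path);
        }
        for &next in adj.get(node).into_iter().flatten() {
            if seen.insert(next) {
                prev.insert(next, node);
                queue.push_back(next);
            }
        }
    }
    None
}
```

With the three example edges, the Guardian Stone and the Logos authentication patent are connected through the Semantic Firewall, while the Kronos trust-decay document sits in a separate component.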
Three layers of cross-referencing, from explicit to emergent:
The related field in frontmatter and the _graph.toml adjacency list capture known relationships between documents. These are human-curated and precise: "this patent implements that specification." The extract_links function in library-functions already parses wiki-links and markdown links from document bodies — these become automatic graph edges.
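
To make the automatic edges concrete, here is a rough sketch of wiki-link extraction. It is not the extract_links implementation from library-functions, whose signature is not shown here; `wiki_link_targets` is a hypothetical stand-in that assumes the `[[target]]` and `[[target|label]]` forms.

```rust
/// Hypothetical stand-in for link extraction: collects the targets of
/// `[[target]]` and `[[target|label]]` wiki-links in document order.
pub fn wiki_link_targets(body: &str) -> Vec<String> {
    let mut targets = Vec::new();
    let mut rest = body;
    while let Some(open) = rest.find("[[") {
        let after = &rest[open + 2..];
        match after.find("]]") {
            Some(close) => {
                let inner = &after[..close];
                // For `[[target|label]]`, keep only the part before the pipe.
                let target = inner.split('|').next().unwrap_or("").trim();
                if !target.is_empty() {
                    targets.push(target.to_string());
                }
                rest = &after[close + 2..];
            }
            // An unclosed `[[` ends the scan.
            None => break,
        }
    }
    targets
}
```

Each returned target would become one `references` edge from the containing document, under the assumption that targets already use the system/slug form.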
The sov_concepts field maps each document to named types in the Sov lexicon (AuthState, Warrant, NarrativeMass). Any two documents sharing a concept are automatically linked. This bridges the gap between the patent library and the running codebase — you can ask "which patents describe the AuthState type?" and get immediate answers.
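
Concept linking amounts to an inverted index from concept name to documents. The sketch below assumes each document is given as an identifier plus its sov_concepts list; `concept_index` and `linked_pairs` are hypothetical names, and apart from the Semantic Firewall's concepts, any concept assignments used with it are illustrative.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Inverted index: concept name -> documents listing it in `sov_concepts`.
pub fn concept_index<'a>(
    docs: &[(&'a str, Vec<&'a str>)],
) -> BTreeMap<&'a str, BTreeSet<&'a str>> {
    let mut index: BTreeMap<&str, BTreeSet<&str>> = BTreeMap::new();
    for (doc, concepts) in docs {
        for concept in concepts {
            index.entry(*concept).or_default().insert(*doc);
        }
    }
    index
}

/// Every unordered pair of documents that shares at least one concept.
pub fn linked_pairs<'a>(
    index: &BTreeMap<&'a str, BTreeSet<&'a str>>,
) -> BTreeSet<(&'a str, &'a str)> {
    let mut pairs = BTreeSet::new();
    for docs in index.values() {
        // BTreeSet iterates in sorted order, so each pair is emitted once
        // with the smaller identifier first.
        let docs: Vec<&str> = docs.iter().copied().collect();
        for (i, a) in docs.iter().enumerate() {
            for b in &docs[i + 1..] {
                pairs.insert((*a, *b));
            }
        }
    }
    pairs
}
```

Looking up `index["AuthState"]` then answers "which patents describe the AuthState type?" directly.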
Every document gets a V1 DNA helix encoding and a 12D shape tensor. Documents with similar textual geometry — similar vocabulary patterns, sentence rhythms, structural organization — cluster together in shape space regardless of explicit metadata. The motif_classify function finds the nearest geometric neighbors. This discovers hidden relationships: two patents that never reference each other but describe the same concept from different angles.
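
Geometric discovery reduces to nearest-neighbor search over 12D tensors. The following sketch uses an exhaustive Euclidean scan, which is an assumption made for illustration: it is not the motif_classify function, and the actual distance metric and index structure are not specified in this section. `ShapeTensor`, `squared_distance`, and `nearest` are hypothetical names.

```rust
/// 12D shape tensor, matching the `[f32; 12]` output of the shape stage.
pub type ShapeTensor = [f32; 12];

/// Squared Euclidean distance; the square root is skipped because it
/// does not change the ranking.
pub fn squared_distance(a: &ShapeTensor, b: &ShapeTensor) -> f32 {
    a.iter().zip(b.iter()).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Brute-force k-nearest neighbors: scores every document, sorts by
/// distance, and returns the identifiers of the k closest.
pub fn nearest<'a>(
    query: &ShapeTensor,
    corpus: &[(&'a str, ShapeTensor)],
    k: usize,
) -> Vec<&'a str> {
    let mut scored: Vec<(f32, &str)> = corpus
        .iter()
        .map(|(id, tensor)| (squared_distance(query, tensor), *id))
        .collect();
    scored.sort_by(|a, b| a.0.total_cmp(&b.0));
    scored.into_iter().take(k).map(|(_, id)| id).collect()
}
```

At 805 documents a linear scan is cheap; an approximate index only starts to matter for much larger collections. Note that a query tensor taken from the corpus will return its own document first.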
Once the documents are ingested, the knowledge ring supports these discovery modes:
| Query | Mechanism | Example |
|---|---|---|
| Domain browse | Frontmatter filter on domain | "Show all governance patents" |
| System trace | Filter on system + sort by kind | "Every document about SAGE, grouped by type" |
| Concept map | Join on sov_concepts | "Which patents mention AuthState?" |
| Similarity search | 12D shape tensor KNN via a_canon | "Find patents structurally similar to the Semantic Firewall" |
| Cross-domain | Graph traversal on _graph.toml | "What connects governance to identity?" |
| Layer view | Filter on layers | "All L3-mesh layer documents" |
| Lineage trace | Hash chain in original provenance metadata | "What's the conversion history of this file?" |
| Diff detection | library.diff stage | "What changed between v1 and v2 of this patent?" |
The end state is a navigable manifold where each patent is a point in shape space. The existing Manifold Explorer already renders shape signatures as an interactive graph — it needs only to accept document shape tensors as input alongside code shapes.
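
Two of the simpler discovery modes in the query table, domain browse and system trace, are plain filters over index entries. The sketch below assumes the entries have already been loaded from _index.toml; `IndexEntry`, `by_domain`, and `system_by_kind` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Subset of an `_index.toml` entry needed for filtering.
pub struct IndexEntry {
    pub path: String,
    pub domain: String,
    pub system: String,
    pub kind: String,
}

/// Convenience constructor for the sketch.
pub fn entry(path: &str, domain: &str, system: &str, kind: &str) -> IndexEntry {
    IndexEntry {
        path: path.to_string(),
        domain: domain.to_string(),
        system: system.to_string(),
        kind: kind.to_string(),
    }
}

/// Domain browse: every entry whose `domain` matches.
pub fn by_domain<'a>(entries: &'a [IndexEntry], domain: &str) -> Vec<&'a IndexEntry> {
    entries.iter().filter(|e| e.domain == domain).collect()
}

/// System trace: paths for one system, grouped by `kind` (sorted by kind).
pub fn system_by_kind<'a>(
    entries: &'a [IndexEntry],
    system: &str,
) -> BTreeMap<&'a str, Vec<&'a str>> {
    let mut groups: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for e in entries.iter().filter(|e| e.system == system) {
        groups.entry(e.kind.as_str()).or_default().push(e.path.as_str());
    }
    groups
}
```

Calling `by_domain(&entries, "governance")` corresponds to "Show all governance patents", and `system_by_kind(&entries, "sage")` to "Every document about SAGE, grouped by type".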
| Phase | Scope | Outcome |
|---|---|---|
| Phase 1: Classify & Enrich | Add enriched frontmatter to all 805 documents. Classify each into one of 10 domains. Map system and kind fields. Extract sov_concepts where applicable. | Every document has domain, system, kind, tags, and concept links. |
| Phase 2: Reorganize | Move files from flat iCloud archives into vault/{domain}/{system}/ hierarchy. Generate _index.toml and _domains.toml. Build _graph.toml from related fields and extracted links. | Browsable, structured patent library with explicit cross-references. |
| Phase 3: Encode | Run every document through dna.v1_encode → dna.shape_tensor. Store 12D shape tensors in _index.toml. Build motif index for geometric similarity search. | Every document has a geometric fingerprint. Similarity search works. |
| Phase 4: Visualize | Feed document shape tensors to the Manifold Explorer. Build a domain-colored interactive knowledge graph in the Mirror UI. Wire to the Sov menu for navigation. | Full visual knowledge manifold: browse, search, discover by geometry. |
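
For Phase 3, it may help to see what compressing a helix point cloud into summary statistics looks like. The sketch below computes only three of the named components (centroid, per-axis variance, and path length) using their standard definitions. The layout of all 12 components and the curvature and torsion estimators are not specified in this document, so this is an illustration and not the dna.shape_tensor algorithm; the function names are hypothetical.

```rust
/// Mean position of the point cloud, or `None` for an empty helix.
pub fn centroid(points: &[[f64; 3]]) -> Option<[f64; 3]> {
    if points.is_empty() {
        return None;
    }
    let n = points.len() as f64;
    let mut sum = [0.0; 3];
    for p in points {
        for axis in 0..3 {
            sum[axis] += p[axis];
        }
    }
    Some([sum[0] / n, sum[1] / n, sum[2] / n])
}

/// Population variance along each axis, measured around centroid `c`.
pub fn variance(points: &[[f64; 3]], c: [f64; 3]) -> [f64; 3] {
    let n = points.len().max(1) as f64;
    let mut acc = [0.0; 3];
    for p in points {
        for axis in 0..3 {
            let d = p[axis] - c[axis];
            acc[axis] += d * d;
        }
    }
    [acc[0] / n, acc[1] / n, acc[2] / n]
}

/// Total polyline length: the sum of distances between consecutive points.
pub fn path_length(points: &[[f64; 3]]) -> f64 {
    points
        .windows(2)
        .map(|w| {
            let d = [w[1][0] - w[0][0], w[1][1] - w[0][1], w[1][2] - w[0][2]];
            (d[0] * d[0] + d[1] * d[1] + d[2] * d[2]).sqrt()
        })
        .sum()
}
```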
This design uses zero new nodes. Everything maps to existing code:
| Capability | Node / Crate | Status |
|---|---|---|
| Frontmatter parsing | library-functions/parse.rs | ✓ Exists (136 LOC, 6 tests) |
| BLAKE3 hashing | library-functions/hash.rs | ✓ Exists (51 LOC, 4 tests) |
| Link extraction | library-functions/parse.rs | ✓ Exists (wiki + markdown links) |
| Index building | library-functions/index.rs | ✓ Exists (82 LOC, 4 tests) |
| Diff detection | library-functions/diff.rs | ✓ Exists (131 LOC, 6 tests) |
| V1 helix encoding | dna-functions/strand.rs | ✓ Exists (363 LOC, 28 tests) |
| 12D shape tensor | dna-functions/geometry.rs | ✓ Exists (385 LOC, 67 tests) |
| Motif classification | dna-functions/motif.rs | ✓ Exists (260 LOC) |
| Vector KNN search | a_canon + a_index | ✓ Exists (HNSW + brute) |
| Audit trail | a_shadow | ✓ Exists (Merkle chains) |
| Hard-wire dna→canon | knowledge.toml | ✓ Wired (capacity: 128) |
| Manifold visualization | ui/a_mirror/static/manifold/ | ✓ Exists (interactive graph) |
The only new code needed is: (1) an enrichment script that adds domain/system/kind metadata to each markdown file, (2) the vault/ directory structure with index files, and (3) a feed adapter that pushes document shape tensors into the Manifold Explorer. The pipeline itself — ingest, hash, encode, shape, index — is already built and tested.
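
As a rough idea of the first item, the enrichment step could render a frontmatter block and prepend it to the document body. This is a minimal sketch with hypothetical names (`Enrichment`, `enrich`); it assumes the body carries no frontmatter of its own, so merging the existing conversion metadata (source hash, chain index) would need separate handling.

```rust
/// Hypothetical classification result for one document.
pub struct Enrichment<'a> {
    pub title: &'a str,
    pub domain: &'a str,
    pub system: &'a str,
    pub kind: &'a str,
    pub tags: &'a [&'a str],
}

/// Renders YAML frontmatter from `meta` and prepends it to `body`.
/// Assumption: `body` does not already start with a frontmatter block.
pub fn enrich(body: &str, meta: &Enrichment<'_>) -> String {
    // Escape backslashes and double quotes so the quoted title stays valid.
    let title = meta.title.replace('\\', "\\\\").replace('"', "\\\"");
    let mut out = String::from("---\n");
    out.push_str(&format!("title: \"{}\"\n", title));
    out.push_str(&format!("domain: {}\n", meta.domain));
    out.push_str(&format!("system: {}\n", meta.system));
    out.push_str(&format!("kind: {}\n", meta.kind));
    out.push_str("tags:\n");
    for tag in meta.tags {
        out.push_str(&format!("  - {}\n", tag));
    }
    out.push_str("---\n");
    out.push_str(body);
    out
}
```

Keeping the step a pure string transformation makes it easy to dry-run across all 805 files and review the output before anything is written back.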