P.O.W.E.R. Hierarchical Index Migration Report¶
Date: July 3, 2026
Version: P.O.W.E.R. v1.5.1
Author: Weby Homelab AI Team
Status: Completed, Production
Table of Contents¶
- Introduction
- Before: Flat Model
- Problems with the Flat Model
- Solution: Hierarchical Model
- Architecture of the New System
- Performance Metrics
- Impact on AI Agents
- Key Insights and Notes
- Conclusions
- Appendices
Introduction¶
This report documents the full development, testing, and deployment cycle of the hierarchical indexing system for the P.O.W.E.R. framework (P.A.R.A. + OKF Overlay + LLM-Wiki + Execution Rules). The migration was driven by the critical need to optimize AI agent context consumption when working with large knowledge bases.
Scale: 324 notes in production vault
Effect: ~75-94% token savings when reading the index
PR: #13 — merged into main
Before: Flat Model¶
Structure¶
Before migration, index.md was generated as a single flat catalog containing all notes grouped by type:
# Knowledge Catalog (OKF Index)
## Projects
- **[Power-Safety-UA](01_Projects/Power_Safety_UA.md)** - Production monitoring...
- **[Weby-QRank](01_Projects/Weby-QRank.md)** - Community reputation...
... (12 more entries)
## Areas
- **[PROD Safety Mandate](02_Areas/PROD_Safety_Mandate.md)** - Production rules...
... (9 more entries)
## Daily Logs
- **[2026-07-03 Session](06_Daily_Logs/2026-07-03_session.md)** - ...
... (282 more entries)
Generation¶
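# power_core/indexer.py (old version, abridged)
import os
from pathlib import Path

def scan_vault_notes(vault_dir: Path):
    concepts = {}  # note type -> list of (rel_path, title, desc)
    for root, dirs, files in os.walk(vault_dir):
        for file in files:
            if file.endswith(".md"):
                path = Path(root) / file
                content = path.read_text(encoding="utf-8")
                metadata = validate_metadata(content)  # project helper, parses the frontmatter
                rel_path = path.relative_to(vault_dir)
                # Assumed attribute names; the original excerpt elides how title/desc are obtained
                title, desc = metadata.title, metadata.description
                concepts.setdefault(metadata.type, []).append((rel_path, title, desc))
    return concepts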
Result¶
- One file, index.md, contained all 324 entries
- File size: ~100KB+ (depending on note count)
- AI agents loaded the entire file on every brain access
- No mechanism for partial reading
Problems with the Flat Model¶
1. Context Overload for AI Agents¶
| Scenario | Tokens | Comment |
|---|---|---|
| Reading entire index.md (324 notes) | ~25,000+ | Every brain query |
| Reading + analyzing a specific project | ~30,000+ | index.md + note |
| Reading all .md files in vault | ~500,000+ | Catastrophic |
Problem: Even when an agent needs info about one project, it must load an index with 324 entries, 285 of which are Daily Logs it doesn't need.
2. Linear Growth with Vault Size¶
Tokens = O(n) where n = number of notes
100 notes → ~8,000 tokens
500 notes → ~40,000 tokens ← already critical
1,000 notes → ~80,000 tokens ← takes half the context
5,000 notes → ~400,000 tokens ← impossible to work with
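The growth figures above are consistent with a constant cost per index entry. A minimal sketch, assuming roughly 80 tokens per entry (a value inferred from the 100-note row, not measured separately); `flat_index_tokens` is a hypothetical helper, not part of power_core:

```python
TOKENS_PER_ENTRY = 80  # assumption: ~8,000 tokens / 100 notes, from the figures above

def flat_index_tokens(note_count: int) -> int:
    """Estimate the token cost of a flat index that lists every note."""
    return note_count * TOKENS_PER_ENTRY

for n in (100, 500, 1_000, 5_000):
    print(f"{n:>5} notes -> ~{flat_index_tokens(n):,} tokens")
```

Because every query pays for every entry, the cost grows in step with the vault regardless of what the agent is actually looking for.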
3. No On-Demand Access¶
Agents couldn't:
- Get a list of notes only from 01_Projects/
- See details (tags, dates, paths) only for the relevant category
- Avoid loading 285 Daily Log entries when searching for project info
4. Inefficiency for Nested Structures¶
The vault contains subfolders:
01_Projects/
├── Power-Safety-UA/
│ ├── Release v3.2.3.md
│ └── Architecture.md
├── Weby-QRank/
│ └── Backend.md
└── Docker-Mailserver-GUI.md
The flat index didn't reflect this hierarchy — all notes were "in a pile."
Solution: Hierarchical Model¶
Concept¶
Instead of one large file — a two-tier system:
Tier 1: index.md → Navigation map (what exists, how many notes)
Tier 2: */_index.md → Detailed catalogs per category
How It Works¶
Agent queries: "What is Power-Safety-UA?"
Old approach:
1. Load index.md (25,000 tokens) ← ALL 324 entries
2. Find Power-Safety-UA in the list
3. Read the note
New approach:
1. Load index.md (1,000 tokens) ← ONLY the table
2. Sees: "01_Projects: 15 notes"
3. Calls read_sub_index("01_Projects") (5,000 tokens)
4. Finds Power-Safety-UA with description
5. Reads the note
Savings: 25,000 → 6,000 tokens (76%)
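The savings figure in the walkthrough above is plain arithmetic on the two read paths. A small sketch using the rounded numbers from the walkthrough; `savings_pct` is a hypothetical helper introduced only for illustration:

```python
def savings_pct(old_tokens: int, new_tokens: int) -> float:
    """Percentage of tokens saved when moving from old_tokens to new_tokens."""
    return 100 * (old_tokens - new_tokens) / old_tokens

old_cost = 25_000          # flat index.md, all 324 entries
new_cost = 1_000 + 5_000   # navigation map + one sub-index (rounded figures)

print(f"{old_cost:,} -> {new_cost:,} tokens ({savings_pct(old_cost, new_cost):.0f}% saved)")
```

The same helper applied to the measured sizes reported under Performance Metrics (25,000 versus 1,550 tokens) gives the upper end of the 75-94% range.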
Architecture of the New System¶
File Structure¶
vault/
├── index.md # 1,015 bytes — navigation map
├── log.md # chronological journal
├── 00_Inbox/
│ └── _index.md # 3 notes
├── 01_Projects/
│ ├── _index.md # 15 notes
│ └── Power-Safety-UA/
│ └── _index.md # nested sub-index
├── 02_Areas/
│ └── _index.md # 10 notes
├── 03_Resources/
│ └── _index.md # 8 notes
├── 04_Archive/
│ └── _index.md # 3 notes
└── 06_Daily_Logs/
└── _index.md # 285 notes (largest)
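The layout above implies that Tier 1 only needs a note count per top-level folder. A rough sketch of how such a map might be built; the function names are hypothetical, and counting nested notes recursively while skipping generated `_index.md` files is an assumption about the real indexer rather than its documented behavior:

```python
from pathlib import Path

def count_notes_per_category(vault: Path) -> dict[str, int]:
    """Count .md notes under each top-level folder, ignoring generated _index.md files."""
    counts = {}
    for folder in sorted(p for p in vault.iterdir() if p.is_dir() and not p.name.startswith(".")):
        notes = [f for f in folder.rglob("*.md") if f.name != "_index.md"]
        counts[folder.name] = len(notes)
    return counts

def render_navigation_table(counts: dict[str, int]) -> str:
    """Render the counts as a Markdown table linking to each folder's sub-index."""
    lines = ["| Category | Notes | Sub-Index |", "|----------|-------|-----------|"]
    for name, n in counts.items():
        label = name.replace("_", " ")
        lines.append(f"| {label} | {n} | [_index.md]({name}/_index.md) |")
    return "\n".join(lines)
```

Calling `render_navigation_table(count_notes_per_category(Path("/path/to/vault")))` would produce a table in the shape of the Tier 1 navigation map.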
Example index.md (Tier 1)¶
---
type: System Guide
title: "Second Brain Index"
description: "Hierarchical navigation map for the knowledge vault"
timestamp: 2026-07-03T02:16:19
---
# Knowledge Catalog
## Navigation Map
| Category | Notes | Sub-Index |
|----------|-------|-----------|
| 00 Inbox | 3 | [_index.md](00_Inbox/_index.md) |
| 01 Projects | 15 | [_index.md](01_Projects/_index.md) |
| 02 Areas | 10 | [_index.md](02_Areas/_index.md) |
| 03 Resources | 8 | [_index.md](03_Resources/_index.md) |
| 04 Archive | 3 | [_index.md](04_Archive/_index.md) |
| 06 Daily Logs | 285 | [_index.md](06_Daily_Logs/_index.md) |
## Agent Protocol
1. **Read this file** — identify the relevant category.
2. **Read the sub-index** — load `folder/_index.md` for detailed entries.
3. **Read specific notes** — only when the sub-index indicates relevance.
4. **NEVER glob all `.md` files** — use sub-indexes as a map.
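Step 1 of the protocol above amounts to reading the navigation table and choosing a row. A minimal sketch of parsing that table on the agent side, assuming the row format shown in the example; `parse_navigation_map` and the regular expression are illustrative, not part of the framework:

```python
import re

# Matches rows such as: | 01 Projects | 15 | [_index.md](01_Projects/_index.md) |
ROW = re.compile(
    r"^\|\s*(?P<category>[^|]+?)\s*\|\s*(?P<notes>\d+)\s*\|"
    r"\s*\[[^\]]*\]\((?P<path>[^)]+)\)\s*\|$"
)

def parse_navigation_map(index_text: str) -> dict[str, tuple[int, str]]:
    """Return {category label: (note count, sub-index path)} for each table row."""
    entries = {}
    for line in index_text.splitlines():
        match = ROW.match(line.strip())
        if match:  # header and separator rows have no numeric count, so they are skipped
            entries[match["category"]] = (int(match["notes"]), match["path"])
    return entries
```

With the counts available up front, an agent can decide whether a sub-index is worth loading before spending tokens on it.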
Example _index.md (Tier 2)¶
---
type: System Guide
title: "01 Projects Sub-Index"
description: "Detailed catalog of all notes in 01 Projects"
timestamp: 2026-07-03T02:16:19
---
# 01 Projects — Detailed Index
## Power-Safety-UA (Power-Safety-UA) v2.0
- **Path:** `01_Projects/Power_Safety_UA_Strategy.md`
- **Type:** Project
- **Description:** Hardware sensors are the only source of objective truth...
- **Tags:** [prod, docker, monitoring]
- **Updated:** 2026-06-05
## Weby-QRank Architecture
- **Path:** `01_Projects/Weby-QRank/Architecture.md`
- **Type:** Project
- **Description:** Community reputation system backend...
- **Tags:** [telegram, community, backend]
- **Updated:** 2026-06-28
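Each Tier 2 entry above follows the same five-field shape. A sketch of rendering one entry from note metadata; the `NoteMeta` dataclass and `render_entry` function are hypothetical and only mirror the layout of the example, not the real indexer code:

```python
from dataclasses import dataclass, field

@dataclass
class NoteMeta:
    """Subset of frontmatter fields shown in the sub-index example (assumed names)."""
    title: str
    path: str
    type: str
    description: str
    tags: list[str] = field(default_factory=list)
    updated: str = ""

def render_entry(note: NoteMeta) -> str:
    """Render one sub-index entry in the layout used by _index.md."""
    return "\n".join([
        f"## {note.title}",
        f"- **Path:** `{note.path}`",
        f"- **Type:** {note.type}",
        f"- **Description:** {note.description}",
        f"- **Tags:** [{', '.join(note.tags)}]",
        f"- **Updated:** {note.updated}",
    ])
```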
New MCP Tool: read_sub_index¶
@server.call_tool()
async def call_tool(name, arguments):
    if name == "read_sub_index":
        category = arguments["category"]  # e.g. "01_Projects"
        sub_index_path = vault_path / category / "_index.md"
        if sub_index_path.exists():
            return sub_index_path.read_text()
        # Auto-generate if missing
        return run_generate_sub_index(vault_path, category)
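Since `category` arrives from the agent as free text, it may be worth confirming that the resolved path stays inside the vault before reading it. A possible guard, sketched under the assumption that categories are vault-relative folder paths; `resolve_sub_index` is a hypothetical helper and is not part of the tool shown above:

```python
from pathlib import Path

def resolve_sub_index(vault_path: Path, category: str) -> Path:
    """Resolve <vault>/<category>/_index.md and reject paths that leave the vault."""
    vault_root = vault_path.resolve()
    candidate = (vault_root / category / "_index.md").resolve()
    # Path.is_relative_to requires Python 3.9+
    if not candidate.is_relative_to(vault_root):
        raise ValueError(f"category escapes the vault: {category!r}")
    return candidate
```

Nested categories such as `01_Projects/Power-Safety-UA` still resolve normally, while inputs like `../secrets` or an absolute path raise `ValueError`.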
Performance Metrics¶
File Sizes¶
| File | Size | Tokens (approx) |
|---|---|---|
| index.md (new) | 1,015 bytes | ~250 |
| index.md (old) | ~100,000 bytes | ~25,000 |
| 01_Projects/_index.md | 5,353 bytes | ~1,300 |
| 06_Daily_Logs/_index.md | 100,391 bytes | ~25,000 |
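The token column above is consistent with a rule of thumb of about 4 bytes per token. A quick sketch of that conversion; the ratio is an assumption inferred from the table, and real counts depend on the tokenizer and on the language of the notes:

```python
BYTES_PER_TOKEN = 4  # assumption: rough average implied by the table above

def approx_tokens(size_bytes: int) -> int:
    """Rough token estimate for a file of the given size."""
    return round(size_bytes / BYTES_PER_TOKEN)

sizes = {
    "index.md (new)": 1_015,
    "01_Projects/_index.md": 5_353,
    "06_Daily_Logs/_index.md": 100_391,
}
for name, size in sizes.items():
    print(f"{name}: {size:,} bytes -> ~{approx_tokens(size):,} tokens")
```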
Usage Scenarios¶
Scenario 1: Searching for Project Info¶
| Approach | Tokens | Efficiency |
|---|---|---|
| Old (flat index) | 25,000 | Loads EVERYTHING |
| New (index + sub-index) | 1,550 | Only relevant data |
| Savings | 23,450 (94%) |
Scenario 2: Full Vault Overview¶
| Approach | Tokens | Efficiency |
|---|---|---|
| Old (flat index) | 25,000 | One file |
| New (index + all sub-indexes) | 53,000 | Distributed |
| Note | More total, but loaded in parts |
Scenario 3: Daily Work (90% of cases)¶
Agent needs info from one category:
| Approach | Tokens |
|---|---|
| Old | 25,000 (always entire index) |
| New | 1,550 (index + one sub-index) |
| Savings | 23,450 (94%) |
Scalability¶
| Note Count | Flat Index (tokens) | Hierarchical (tokens) | Savings |
|---|---|---|---|
| 100 | ~8,000 | ~1,200 | 85% |
| 324 (current) | ~25,000 | ~1,550 | 94% |
| 1,000 | ~80,000 | ~2,500 | 97% |
| 5,000 | ~400,000 | ~5,000 | 99% |
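Conclusion: The larger the vault, the greater the savings for queries that touch a single small category. The flat model scales at O(n). The hierarchical model approaches O(log n) only while each sub-index stays bounded in size through nesting; a single large sub-index such as 06_Daily_Logs/_index.md (~25,000 tokens) still grows linearly with its category.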
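The scaling figures assume that a query lands in a small category. A short sketch using the measured sizes from the File Sizes table shows how the per-query cost depends on which sub-index is read; the constant and function names are illustrative only:

```python
TOP_INDEX_TOKENS = 250        # index.md (new), from the File Sizes table
FLAT_INDEX_TOKENS = 25_000    # index.md (old)
SUB_INDEX_TOKENS = {          # measured sub-index sizes from the same table
    "01_Projects": 1_300,
    "06_Daily_Logs": 25_000,
}

def hierarchical_cost(category: str) -> int:
    """Tokens read for one query: the navigation map plus one sub-index."""
    return TOP_INDEX_TOKENS + SUB_INDEX_TOKENS[category]

for category in SUB_INDEX_TOKENS:
    cost = hierarchical_cost(category)
    delta = FLAT_INDEX_TOKENS - cost
    print(f"{category}: {cost:,} tokens ({delta:+,} vs flat)")
```

With these numbers a 01_Projects query reads 1,550 tokens, while a 06_Daily_Logs query reads 25,250, slightly more than the flat index. Keeping large categories split into smaller sub-indexes is what preserves the savings as the vault grows.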
Impact on AI Agents¶
Agent Behavior Change¶
Before:
1. Received query → read index.md (25K tokens)
2. Found category → read note
3. Total cost: 25K + note
After:
1. Received query → read index.md (1K tokens)
2. Identified category → read_sub_index("01_Projects") (5K tokens)
3. Found note → read note
4. Total cost: 6K + note
Updated Configurations¶
AGENTS.md (v11.0):
- Added Hierarchical Navigation Protocol
- Added prohibition on glob **/*.md
- Added Token Efficiency Table
opencode.jsonc — updated system prompts:
- build: "HIERARCHICAL INDEX PROTOCOL" with 4 rules
- reviewer: "NEVER glob **/*.md"
- architect: "Use MCP read_sub_index()"
- explorer: "NEVER glob **/*.md"
MCP Server Updates¶
| Tool | Status | Purpose |
|---|---|---|
| lint_vault | Existing | Vault health check |
| generate_index | Updated | Hierarchical index generation |
| read_sub_index | New | On-demand category reading |
| ingest_note | Updated | Note creation + index update |
Key Insights and Notes¶
Technical Insights¶
- NameError in f-string: f"[{_index.md}]" interprets _index as a variable. Correct: f"[_index.md]". This bug broke 7 tests at once.
- PEP 668 (Externally-Managed Environments): On Ubuntu 24.04+, system-wide pip3 install is blocked. Solution: use a venv or --break-system-packages. For opencode MCP servers, use the dedicated venv at /root/.config/opencode/venv/.
- Git rebase conflicts: When the remote branch has divergent commits, git reset --hard origin/main + force push is cleaner than resolving 6-file merge conflicts. Note that the reset discards any local commits that are not on origin/main.
- Backward compatibility: run_generate_index() (flat mode) is preserved for backward compatibility. Existing code won't break.
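The f-string pitfall from the first insight above is easy to reproduce in isolation. A minimal sketch; the function names are made up for the demonstration and do not come from the indexer:

```python
def broken_link(category: str) -> str:
    # `{_index.md}` is parsed as the expression `_index.md`,
    # so Python looks up a variable named `_index` at runtime.
    return f"[{_index.md}]({category}/_index.md)"

def fixed_link(category: str) -> str:
    # Literal text needs no braces; only `category` is interpolated.
    return f"[_index.md]({category}/_index.md)"

try:
    broken_link("01_Projects")
except NameError as exc:
    print(f"NameError: {exc}")

print(fixed_link("01_Projects"))
```

The error only surfaces when the line executes, which is consistent with one typo failing every test that exercises the rendering path.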
Architectural Decisions¶
- Why a table, not a list: The table in index.md gives an instant overview of note counts per category without loading details. An agent sees "06_Daily_Logs: 285" and understands that this is a large category, to be read only if needed.
- Why not delete flat mode: Some tools may depend on the old format. Keeping both modes provides migration flexibility.
- Nested sub-indexes: The system automatically generates _index.md for subfolders (e.g., 01_Projects/Power-Safety-UA/_index.md). This allows agents to drill down even deeper.
Caveats¶
- Daily Logs is the largest category: 285 notes in one _index.md (~100KB). For very active vaults, consider monthly aggregation (06_Daily_Logs/2026-07/_index.md).
- Index doesn't replace search: _index.md contains only metadata (title, description, tags). For content search, full-text search (FTS) is needed.
- Agents need training: Without updated system prompts, agents will continue reading all .md files. It's critical to update AGENTS.md and opencode.jsonc.
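The monthly aggregation suggested for Daily Logs could start from a simple grouping step. A sketch assuming log filenames begin with a `YYYY-MM-DD` date, as in `2026-07-03_session.md`; `group_logs_by_month` is a hypothetical helper, and monthly aggregation itself is not implemented in v1.5.1:

```python
import re
from collections import defaultdict

DATE_PREFIX = re.compile(r"^(\d{4}-\d{2})-\d{2}")

def group_logs_by_month(filenames: list[str]) -> dict[str, list[str]]:
    """Group daily-log filenames by their YYYY-MM prefix; undated files are skipped."""
    groups = defaultdict(list)
    for name in sorted(filenames):
        match = DATE_PREFIX.match(name)
        if match:
            groups[match.group(1)].append(name)
    return dict(groups)
```

Each resulting key would map to one folder such as `06_Daily_Logs/2026-07/`, so each monthly sub-index stays small.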
Optimization¶
- Token efficiency, real numbers:
  - Flat index for 324 notes: ~25,000 tokens
  - Hierarchical (index + 1 sub-index): ~1,550 tokens
  - For typical queries (90% of cases): 94% savings
- Scalability: At 5,000 notes, the flat index would take ~400,000 tokens, more than three times the 128K-token context window of GPT-4 Turbo. The hierarchical path would take ~5,000 tokens (1.25% of the flat cost).
Conclusions¶
Achievements¶
- 75-94% token savings on typical AI agent queries to Second Brain
- Scalable architecture: approaches O(log n) instead of O(n), provided sub-indexes stay bounded in size
- On-demand access — agents read only relevant categories
- Backward compatible — existing code continues to work
- 100/100 tests — full coverage of new functionality
- Production deploy — 324 notes indexed, MCP server updated
- Agents trained — all system prompts updated with hierarchical rules
Summary Metrics¶
| Metric | Before | After | Change |
|---|---|---|---|
| index.md size | ~100KB | 1KB | -99% |
| Tokens per query | ~25,000 | ~1,550 | -94% |
| Index files | 1 | 10 | +9 |
| Tests | 80 | 100 | +20 |
| MCP tools | 3 | 4 | +1 |
Recommendations for Colleagues¶
- Always use read_sub_index instead of reading the entire vault
- Never do glob **/*.md: it burns tokens without benefit
- Update the index after every change: call generate_index
- Follow OKF frontmatter: without it, notes won't appear in the index
- Monitor Daily Logs size: at >500 notes, consider monthly aggregation
Future Improvements¶
- [ ] Monthly aggregation for Daily Logs (06_Daily_Logs/YYYY-MM/_index.md)
- [ ] Full-text search (FTS) integration for content search
- [ ] Incremental indexing: update only changed folders
- [ ] Sub-index compression for very large categories
- [ ] MCP tool search_notes(query) for full-text search
Appendices¶
A. Changed Files (PR #13)¶
| File | Lines Changed | Purpose |
|---|---|---|
| power_core/indexer.py | +190 | Hierarchical index core |
| power_core/__init__.py | +16 | New exports |
| power_core/cli.py | +12 | Hierarchical by default |
| mcp_servers/power_server.py | +91 | read_sub_index tool |
| skills/power/SKILL.md | +41 | Navigation Protocol |
| skills/power/scripts/generate_index.py | +14 | Updated CLI |
| tests/conftest.py | +19 | Nested fixture |
| tests/test_indexer.py | +199 | 20 new tests |
| tests/test_linter.py | +2 | Updated count |
| README.md | +49 | Updated documentation |
Total: +585 / -48 lines, 10 files
B. Usage Commands¶
# Generate hierarchical index
power index /path/to/vault
# Via Python
python3 -c "
from power_core import run_generate_hierarchical_index
from pathlib import Path
run_generate_hierarchical_index(Path('/path/to/vault'))
"
# Via MCP (in agent)
# read_sub_index(category="01_Projects")
# generate_index()
C. Links¶
- Repository: https://github.com/weby-homelab/power-framework
- PR #13: https://github.com/weby-homelab/power-framework/pull/13
- PR #14: https://github.com/weby-homelab/power-framework/pull/14 (this report)
Report prepared: 2026-07-03T02:20:00Z P.O.W.E.R. Framework v1.5.1 Weby Homelab AI Team