LexTreeRAG¶

What is this?

LexTreeRAG is a vectorless legal RAG system that answers questions about EU law by searching the live EUR-Lex CELLAR database — without needing a vector database or pre-embedded corpus.

How it works¶

User question
  │
  ├─ 1. Classify question ──────── Claude Haiku
  ├─ 2. Extract keywords + year ── Claude Haiku
  ├─ 3. Search EUR-Lex SPARQL ──── publications.europa.eu
  ├─ 4. Fetch documents ────────── HTML → Markdown  /  PDF → text
  ├─ 5. Build reasoning trees ──── parse Articles → summarise → cache JSON
  ├─ 6. Navigate trees ─────────── Claude Haiku picks relevant articles
  ├─ 7. Python reranking ───────── obligation score + list score
  ├─ 8. Generate answer ────────── Claude Opus 4.6 (streaming)
  └─ 9. Confidence scoring ──────── Claude Haiku → retry if < 50%

Architecture¶

Pattern B — Unified Streamlit. There is no separate frontend and backend.

Layer	File	Role
UI + Orchestration	`app.py`	Streamlit interface, calls pipeline modules directly
CLI	`main.py`	Terminal entry point, same pipeline
Pipeline	`pipeline/*.py`	Keyword extraction, EUR-Lex search, tree building, navigation, ranking, answer generation
Cache	`data/cache/`	JSON reasoning trees per document
PDF Archive	`data/pdfs/`	Raw PDFs downloaded from CELLAR

External dependencies¶

Service	Purpose
Anthropic API — Claude Haiku 4.5	Keyword extraction, question classification, article summarisation, tree navigation, confidence scoring
Anthropic API — Claude Opus 4.6	Final answer generation with adaptive thinking
EUR-Lex CELLAR SPARQL	Live document discovery (`publications.europa.eu/webapi/rdf/sparql`)
EUR-Lex CELLAR content	HTML and PDF document fetch
Google Gemini (optional)	Hybrid second-opinion answer with Google Search grounding

Pages¶

Chat — Ask EUR-Lex — main chat interface
References Tab — article reference cards
Cache Manager — browse and manage cached trees
Pipeline Architecture — full technical reference