Retrieval is where most RAG systems quietly break down. Traditional pipelines rely on vector similarity: they embed queries and document chunks in the same space and fetch the “closest” matches. But similarity is a poor proxy for what we actually need, which is relevance established through reasoning. In long professional documents such as financial reports, research papers, or legal texts, the correct answer is often not in the passage that is most semantically similar to the query. Finding it requires navigating the document's structure, understanding context, and performing multi-step reasoning across sections. This is exactly where vector-based RAG starts to fall short.
PageIndex is designed to close this gap by rethinking retrieval from first principles. Instead of chunking documents and searching over embeddings, it builds a hierarchical, table-of-contents-style tree index and uses an LLM to reason over that structure, much like a human expert scanning sections, extracting key points, and connecting ideas. The result is a vectorless, reasoning-based retrieval process that is more explainable, more traceable, and closer to how knowledge is actually extracted from complex documents. By replacing similarity search with structured exploration and tree-based reasoning, PageIndex reports substantially higher retrieval accuracy, including strong results on benchmarks such as FinanceBench, which makes it particularly well suited to domains that demand precision and deep understanding.
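To make the idea of a table-of-contents tree concrete, here is a minimal Python sketch. The `Node` class, the `outline` function, and the toy tree are illustrative assumptions, not PageIndex's schema, and the summaries are placeholders rather than PageIndex output. The point is only that the model reasons over titles and summaries arranged by depth, not over embedding distances.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One section in a table-of-contents style tree (hypothetical shape, not PageIndex's schema)."""
    node_id: str
    title: str
    summary: str
    text: str = ""  # full section text; deliberately left out of the outline below
    children: list = field(default_factory=list)


def outline(node, depth=0):
    """Return one indented line per node with its ID, title, and summary, depth-first."""
    lines = [f"{'  ' * depth}[{node.node_id}] {node.title}: {node.summary}"]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


# Toy tree with placeholder summaries, for illustration only
paper = Node("0000", "Attention Is All You Need", "Proposes the Transformer.", children=[
    Node("0001", "Introduction", "Limitations of recurrent models."),
    Node("0002", "Model Architecture", "Encoder and decoder stacks.", children=[
        Node("0003", "Attention", "Scaled dot-product and multi-head attention."),
    ]),
])

print("\n".join(outline(paper)))
```

A vector retriever would compare the query embedding against each chunk independently; here the hierarchy itself (which subsection sits under which section) is part of what the model sees.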


In this article, we use PageIndex to index the original Transformer paper, “Attention Is All You Need”, and run two in-depth queries against it without a single vector or embedding. Instead of chunking the PDF and retrieving chunks by similarity, PageIndex builds a hierarchical tree of the document's sections, and GPT-5.4 then reasons over the node summaries to identify exactly which sections contain the answer, before reading a single word of the full text.

Setting up dependencies
For this tutorial, you'll need both a PageIndex API key and an OpenAI API key. You can get them from https://dash.pageindex.ai/api-keys and https://platform.openai.com/api-keys respectively.
pip install pageindex openai requests
from pageindex import PageIndexClient
import pageindex.utils as utils
import os
from getpass import getpass
PAGEINDEX_API_KEY = getpass('Enter PageIndex API Key: ')
pi_client = PageIndexClient(api_key=PAGEINDEX_API_KEY)
We import the OpenAI library and read the API key that gives access to the LLMs. Next, we define an asynchronous helper function that creates a client, sends a prompt to the model, and returns the generated response.
import openai
OPENAI_API_KEY = getpass('Enter OpenAI API Key: ')
async def call_llm(prompt, model="gpt-5.4", temperature=0):
    # Send a single user message and return the model's text reply
    client = openai.AsyncOpenAI(api_key=OPENAI_API_KEY)
    response = await client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature
    )
    return response.choices[0].message.content.strip()
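The snippets in this tutorial use `await` at the top level, which works in Jupyter and Colab because the notebook already runs an event loop. In a plain Python script, you would wrap the calls in a coroutine and start it with `asyncio.run`. The sketch below shows that pattern with `fake_call_llm`, a hypothetical offline stand-in that mirrors `call_llm`'s signature and returns a canned JSON reply. The same trick is handy for testing the pipeline without spending API calls.

```python
import asyncio
import json


async def fake_call_llm(prompt, model="gpt-5.4", temperature=0):
    """Hypothetical offline stand-in for call_llm: ignores the prompt and returns canned JSON."""
    await asyncio.sleep(0)  # yield to the event loop, as a real network call would
    return json.dumps({"thinking": "stub", "node_list": ["0003"]})


async def main():
    reply = await fake_call_llm("Which nodes discuss attention?")
    return json.loads(reply)


# In a script there is no running loop, so start one with asyncio.run.
# In Jupyter/Colab, `await main()` in a cell does the same job.
result = asyncio.run(main())
print(result["node_list"])
```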
Building a PageIndex tree
In this part, we download the Transformer paper directly from arXiv and submit it to PageIndex, which processes the PDF and builds a hierarchical tree of its sections; each node stores the section's title, a summary, and its full text. Once the tree is ready, we print it to inspect the structure PageIndex inferred: each chapter, subsection, and heading becomes a node in the tree, preserving the document's natural organization as the authors laid it out.
# ─────────────────────────────────────────────
# Step 1: Build the PageIndex Tree
# ─────────────────────────────────────────────
# 1.1 Download the Transformer paper and submit it
import os, requests
pdf_url = "https://arxiv.org/pdf/1706.03762.pdf"
pdf_path = os.path.join("data", pdf_url.split("/")[-1])
os.makedirs("data", exist_ok=True)
print("Downloading 'Attention Is All You Need'...")
response = requests.get(pdf_url, timeout=60)
response.raise_for_status()  # fail early instead of saving an HTTP error page as the PDF
with open(pdf_path, "wb") as f:
    f.write(response.content)
print(f"✅ Saved to {pdf_path}")
doc_id = pi_client.submit_document(pdf_path)["doc_id"]
print(f"📄 Document submitted. doc_id: {doc_id}")
# 1.2 Retrieve the tree (poll until ready)
import time
print("\nWaiting for PageIndex tree to be ready", end="")
while not pi_client.is_retrieval_ready(doc_id):
    print(".", end="", flush=True)
    time.sleep(5)
# Fetch the finished tree, including a summary for every node
tree = pi_client.get_tree(doc_id, node_summary=True)["result"]
print("\n\n📂 Document Tree Structure:")
utils.print_tree(tree)
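The polling loop above has no upper bound, so a stalled job would make it wait forever. One option is a small helper with a timeout. `wait_until` below is a hypothetical utility (not part of the PageIndex SDK), demonstrated with a fake readiness check so it runs offline.

```python
import time


def wait_until(predicate, interval=5.0, timeout=600.0):
    """Call predicate() every `interval` seconds until it returns True; raise after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Offline demo: a fake readiness check that reports "ready" on its third call
calls = {"n": 0}


def fake_ready():
    calls["n"] += 1
    return calls["n"] >= 3


wait_until(fake_ready, interval=0.01, timeout=1.0)
print(f"ready after {calls['n']} checks")
```

With the real client, the loop would become `wait_until(lambda: pi_client.is_retrieval_ready(doc_id), interval=5, timeout=600)`; the 600-second limit is an arbitrary choice.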


Reasoning-based retrieval
With the tree built, we now deliberately run a cross-cutting query, one that cannot be answered from any single section of the paper. We strip the full text from every node, leaving only titles and summaries, and pass the entire tree structure to GPT-5.4. The model reasons over these summaries to identify every node likely to contain relevant information, and returns both its step-by-step reasoning and a list of matching node IDs. This is the essence of what makes PageIndex different: the LLM decides where to look before any full text is loaded.
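Step 2 leans on two helpers from `pageindex.utils`: `remove_fields`, which strips the full text so only titles and summaries reach the model, and `create_node_mapping`, which lets retrieved IDs be looked up in the original tree. PageIndex's own versions may differ; the sketch below reimplements the idea under the assumption that the tree is a list of dicts with children stored under a `nodes` key. Building new dicts rather than editing nodes in place matters here, because Step 3 still needs the full `text` from the original `tree`.

```python
def remove_fields(tree, fields=("text",)):
    """Return a new tree without the given keys; the input tree is left unchanged."""
    def strip(node):
        clean = {k: v for k, v in node.items() if k not in fields and k != "nodes"}
        if node.get("nodes"):
            clean["nodes"] = [strip(child) for child in node["nodes"]]
        return clean
    return [strip(node) for node in tree]


def create_node_mapping(tree):
    """Flatten the tree into a {node_id: node} dict; values are the original node dicts."""
    mapping = {}
    stack = list(tree)
    while stack:
        node = stack.pop()
        mapping[node["node_id"]] = node
        stack.extend(node.get("nodes", []))
    return mapping


# Toy tree in the assumed shape
toy_tree = [{
    "node_id": "0001", "title": "Model Architecture", "summary": "Encoder and decoder stacks.",
    "text": "full text of section 3",
    "nodes": [{"node_id": "0002", "title": "Attention", "summary": "Attention functions.",
               "text": "full text of section 3.2"}],
}]
slim = remove_fields(toy_tree, fields=["text"])
toy_map = create_node_mapping(toy_tree)
print(slim)
print(sorted(toy_map))
```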
# ─────────────────────────────────────────────
# Step 2: Reasoning-Based Retrieval
# ─────────────────────────────────────────────
# 2.1 Define a query that requires navigating across sections
import json
# This query is intentionally cross-cutting: it can't be answered
# by a single section, which is where tree search shines over top-k.
query = "Why did the authors choose self-attention over recurrence, and what are the complexity trade-offs they compared?"
tree_without_text = utils.remove_fields(tree.copy(), fields=["text"])
search_prompt = f"""
You are given a question and a hierarchical tree structure of a research paper.
Each node has a node_id, title, and a summary of its content.
Your task: identify ALL nodes that are likely to contain information relevant to answering the question.
Think carefully -- the answer may be spread across multiple sections.
Question: {query}
Document tree:
{json.dumps(tree_without_text, indent=2)}
Reply ONLY in this JSON format, no preamble:
{{
"thinking": "<your step-by-step reasoning about which nodes are relevant>",
"node_list": ["node_id_1", "node_id_2", ...]
}}
"""
print(f'🔍 Query: "{query}"\n')
print("Running tree search with GPT-5.4...")
tree_search_result = await call_llm(search_prompt)
# 2.2 Inspect the retrieval reasoning and matched nodes
node_map = utils.create_node_mapping(tree)
result_json = json.loads(tree_search_result)
print("\n🧠 LLM Reasoning:")
utils.print_wrapped(result_json["thinking"])
print("\n📌 Retrieved Nodes:")
for node_id in result_json["node_list"]:
    node = node_map[node_id]
    print(f" • [{node['node_id']}] Page {node['page_index']:>2} -- {node['title']}")
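`json.loads(tree_search_result)` assumes the model returns bare JSON. Despite the "Reply ONLY in this JSON format" instruction, a model may wrap its reply in a Markdown code fence or add a sentence before it, and it may occasionally list an ID that is not in the tree, which would raise a `KeyError` at `node_map[node_id]`. Here is a defensive sketch; `parse_llm_json` and `valid_node_ids` are hypothetical helper names.

```python
import json


def parse_llm_json(raw):
    """Parse the outermost {...} object in an LLM reply, ignoring any fence or preamble around it."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in the model's reply")
    return json.loads(raw[start:end + 1])


def valid_node_ids(result, node_map):
    """Keep node IDs that exist in node_map, in their original order, without duplicates."""
    kept = []
    for nid in result.get("node_list", []):
        if nid in node_map and nid not in kept:
            kept.append(nid)
    return kept


# Example: a fenced reply that also contains an unknown ID and a repeat
fence = "`" * 3
reply = fence + 'json\n{"thinking": "sections 3 and 4", "node_list": ["0004", "9999", "0004"]}\n' + fence
parsed = parse_llm_json(reply)
print(valid_node_ids(parsed, {"0004": {"title": "Why Self-Attention"}}))
```

In the tutorial's flow, `result_json = parse_llm_json(tree_search_result)` followed by a loop over `valid_node_ids(result_json, node_map)` would replace the direct `json.loads` and lookup.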


Answer generation
Once the relevant nodes are identified, we pull their full text and merge it into a single context block, labeling each section clearly so the model knows where every piece of information comes from. This combined context is then sent to GPT-5.4 with a structured prompt that asks for the core motivation, the specific complexity figures, and any caveats the authors acknowledge. The model answers using only the retrieved text, grounding each claim directly in the body of the paper.
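The stitching step can be sketched as a standalone function. The optional `max_chars` budget is an addition not present in the tutorial: the merged context grows with every retrieved node, and a character cap is one crude safeguard (characters are only a rough proxy for tokens). Sections beyond the budget are dropped in list order, which assumes the earlier IDs matter more; the search prompt does not guarantee that ordering.

```python
def build_context(node_ids, node_map, max_chars=None, sep="\n\n---\n\n"):
    """Join '[Section: title]' + text blocks; stop before the total would exceed max_chars."""
    parts, used = [], 0
    for nid in node_ids:
        node = node_map[nid]
        block = f"[Section: {node['title']}]\n{node['text']}"
        cost = len(block) + (len(sep) if parts else 0)  # a separator is only needed after the first block
        if max_chars is not None and used + cost > max_chars:
            break  # drop this section and every later one
        parts.append(block)
        used += cost
    return sep.join(parts)


toy_map = {"a": {"title": "A", "text": "x" * 10}, "b": {"title": "B", "text": "y" * 10}}
print(build_context(["a", "b"], toy_map))
print(build_context(["a", "b"], toy_map, max_chars=30))
```

With `max_chars=None` the output matches the tutorial's `"\n\n---\n\n".join(...)` expression exactly.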
# ─────────────────────────────────────────────
# Step 3: Answer Generation
# ─────────────────────────────────────────────
# 3.1 Stitch together context from all retrieved nodes
node_list = result_json["node_list"]
relevant_content = "\n\n---\n\n".join(
f"[Section: {node_map[nid]['title']}]\n{node_map[nid]['text']}"
for nid in node_list
)
print(f"\n📖 Retrieved Context Preview (first 1200 chars):\n")
utils.print_wrapped(relevant_content[:1200] + "...\n")
# 3.2 Generate a structured answer grounded in the retrieved sections
answer_prompt = f"""
You are a technical assistant. Answer the question below using ONLY the provided context.
Be specific -- reference actual design choices, numbers, and trade-offs mentioned in the text.
Question: {query}
Context:
{relevant_content}
Structure your answer as:
1. The core motivation for choosing self-attention
2. The specific complexity comparisons made (include any tables or numbers)
3. Any caveats or limitations the authors acknowledged
"""
print("💬 Generating answer...\n")
answer = await call_llm(answer_prompt)
print("─" * 60)
print("✅ Final Answer:\n")
utils.print_wrapped(answer)
print("─" * 60)




Testing with a second query
To show that the tree is built once and then reused without re-indexing, we run a second query, this time targeting a specific mechanism rather than a paper-wide design decision. The same text-free tree is passed to GPT-5.4, which narrows its search to the relevant subsections; we then retrieve their full text and generate a clear explanation of how multi-head attention works and why the scaling factor matters. No re-indexing, no re-embedding: just a new question against the same tree.
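Within one notebook session, reusing the tree simply means reusing the `tree` variable. To keep that benefit across sessions, one option is to cache the tree as JSON on disk. `load_or_build_tree` is a hypothetical helper, demonstrated here with a fake build function that counts how often it runs.

```python
import json
import os
import tempfile


def load_or_build_tree(cache_path, build):
    """Return the tree cached at cache_path, or call build() once and save its result there."""
    if os.path.exists(cache_path):
        with open(cache_path, "r", encoding="utf-8") as f:
            return json.load(f)
    tree = build()
    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump(tree, f, ensure_ascii=False)
    return tree


# Offline demo: count how many times the (fake) build step actually runs
builds = {"n": 0}


def fake_build():
    builds["n"] += 1
    return [{"node_id": "0001", "title": "Introduction", "nodes": []}]


cache = os.path.join(tempfile.mkdtemp(), "tree.json")
first = load_or_build_tree(cache, fake_build)
second = load_or_build_tree(cache, fake_build)
print(builds["n"], first == second)
```

With the real client, `build` could be `lambda: pi_client.get_tree(doc_id, node_summary=True)["result"]`; the cache file name is up to you.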
query2 = "How does the multi-head attention mechanism work, and what is the role of scaling in dot-product attention?"
search_prompt2 = f"""
You are given a question and a hierarchical tree structure of a research paper.
Identify all nodes likely to contain the answer.
Question: {query2}
Document tree:
{json.dumps(tree_without_text, indent=2)}
Reply ONLY in this JSON format:
{{
"thinking": "<your reasoning about which nodes are relevant>",
"node_list": ["node_id_1", ...]
}}
"""
print(f'\n\n🔍 Second Query: "{query2}"\n')
result2_raw = await call_llm(search_prompt2)
result2 = json.loads(result2_raw)
print("🧠 Reasoning:")
utils.print_wrapped(result2["thinking"])
relevant_content2 = "\n\n---\n\n".join(
f"[Section: {node_map[nid]['title']}]\n{node_map[nid]['text']}"
for nid in result2["node_list"]
)
answer_prompt2 = f"""
Answer the following question using ONLY the provided context.
Explain the mechanism clearly, as if for a technical blog post.
Question: {query2}
Context: {relevant_content2}
"""
answer2 = await call_llm(answer_prompt2)
print("\n✅ Answer:\n")
utils.print_wrapped(answer2)
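Both queries repeat the same search-then-answer pattern, so one way to tidy it up is a single coroutine that takes the LLM function as a parameter. `ask` and `SEARCH_TEMPLATE` are hypothetical names, the prompts are shortened versions of the ones used above, and `fake_llm` is an offline stub so the sketch runs without API keys.

```python
import asyncio
import json

SEARCH_TEMPLATE = """You are given a question and a hierarchical tree structure of a research paper.
Identify all nodes likely to contain the answer.
Question: {query}
Document tree:
{tree}
Reply ONLY in this JSON format:
{{"thinking": "<reasoning>", "node_list": ["node_id_1", ...]}}"""


async def ask(query, tree_without_text, node_map, llm):
    """Two LLM calls: pick node IDs from the text-free tree, then answer from their full text."""
    search = await llm(SEARCH_TEMPLATE.format(query=query, tree=json.dumps(tree_without_text, indent=2)))
    node_ids = [nid for nid in json.loads(search)["node_list"] if nid in node_map]  # skip unknown IDs
    context = "\n\n---\n\n".join(f"[Section: {node_map[n]['title']}]\n{node_map[n]['text']}" for n in node_ids)
    answer = await llm(f"Answer using ONLY the provided context.\nQuestion: {query}\nContext: {context}")
    return node_ids, answer


async def fake_llm(prompt):
    """Offline stub: canned JSON for the search prompt, a fixed string for the answer prompt."""
    if prompt.startswith("You are given a question"):
        return json.dumps({"thinking": "stub", "node_list": ["0003", "missing"]})
    return "stub answer"


toy_map = {"0003": {"title": "Attention", "text": "Scaled dot-product attention ..."}}
ids, ans = asyncio.run(ask("How does attention work?", [{"node_id": "0003", "title": "Attention"}], toy_map, fake_llm))
print(ids, ans)
```

In the notebook, `node_ids, answer2 = await ask(query2, tree_without_text, node_map, call_llm)` would run the second query in a single call.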



I am a graduate of Civil Engineering (2022) from Jamia Millia Islamia University, New Delhi, and I have a keen interest in data science, especially neural networks and their applications in various fields.