Here's a common misconception about RAG!

Most people think RAG works like this: index a document → retrieve that same document.

But indexing ≠ retrieval. What you index doesn't have to be what you feed the LLM. Once you understand this, you can build RAG systems that actually work.

Here are 4 indexing strategies that separate good RAG from great RAG:

1) Chunk Indexing

↳ This is the standard approach. Split documents into chunks, embed them, store them in a vector database, and retrieve the closest matches (see the first sketch below).

↳ Simple and effective, but large or noisy chunks will hurt your precision.

2) Sub-chunk Indexing

↳ Break your chunks into smaller sub-chunks for indexing, but retrieve the full chunk for context (see the second sketch below).

↳ This is powerful when a single section covers multiple concepts. You get better query matching without losing the surrounding context your LLM needs.

3) Query Indexing

↳ Instead of indexing the raw text, generate hypothetical questions the chunk could answer, and index those questions (see the third sketch below).

↳ User queries naturally align better with questions than with raw document text. This closes the semantic gap between what users ask and what you've stored.

...
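Here's a minimal sketch of strategy 1, chunk indexing. Everything in it is illustrative, not any specific library's API: embed() is a toy stand-in for a real embedding model, and the "vector database" is just a Python list, so the example runs offline.

```python
import math

def embed(text: str) -> list[float]:
    # Toy stand-in for a real embedding model: hashes characters into a
    # small normalized vector so the sketch runs without any API calls.
    vec = [0.0] * 16
    for i, ch in enumerate(text.lower()):
        vec[i % 16] += ord(ch)
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors from embed() are unit-length, so a dot product is cosine similarity.
    return sum(x * y for x, y in zip(a, b))

def chunk(document: str, size: int = 200) -> list[str]:
    # Naive fixed-width splitting; real pipelines split on sentences or sections.
    return [document[i:i + size] for i in range(0, len(document), size)]

document = "RAG stands for retrieval-augmented generation. " * 20  # sample text

# Index: one embedding per chunk, stored alongside the chunk itself.
chunk_index = [(embed(c), c) for c in chunk(document)]

def retrieve_chunks(query: str, k: int = 3) -> list[str]:
    # Retrieval: rank every chunk by similarity to the query, return the top k.
    q = embed(query)
    ranked = sorted(chunk_index, key=lambda pair: cosine(q, pair[0]), reverse=True)
    return [text for _, text in ranked[:k]]
```

Note that what gets indexed and what gets retrieved are the same object here; the next two strategies break that apart.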
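Strategy 2 as a sketch, reusing embed(), cosine(), chunk(), and document from the first sketch. The key move: embeddings are computed over small sub-chunks, but each index entry points back to its full parent chunk, and the parent is what gets returned.

```python
def sub_chunks(chunk_text: str, size: int = 50) -> list[str]:
    return [chunk_text[i:i + size] for i in range(0, len(chunk_text), size)]

# Each entry pairs a sub-chunk embedding with its FULL parent chunk.
sub_index = []
for parent in chunk(document):
    for sub in sub_chunks(parent):
        sub_index.append((embed(sub), parent))

def retrieve_full_chunks(query: str, k: int = 3) -> list[str]:
    # Match on the small sub-chunk, but feed the LLM the full parent chunk.
    q = embed(query)
    ranked = sorted(sub_index, key=lambda pair: cosine(q, pair[0]), reverse=True)
    seen, results = set(), []
    for _, parent in ranked:
        # Dedupe: several sub-chunks can share the same parent.
        if parent not in seen:
            seen.add(parent)
            results.append(parent)
        if len(results) == k:
            break
    return results
```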
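And strategy 3, again reusing the helpers from the first sketch. hypothetical_questions() is hard-coded here so the sketch stays runnable offline; in a real system you'd prompt an LLM to generate a few questions per chunk at indexing time.

```python
def hypothetical_questions(chunk_text: str) -> list[str]:
    # Placeholder: a real system would ask an LLM something like
    # "Write 3 questions this passage answers." Hard-coded for the sketch.
    return [f"What does this passage say about {chunk_text[:40]!r}?"]

# Embed the generated questions, but store the original chunk as the payload.
question_index = []
for parent in chunk(document):
    for question in hypothetical_questions(parent):
        question_index.append((embed(question), parent))

def retrieve_by_question(query: str, k: int = 3) -> list[str]:
    # Both sides of the comparison are now question-shaped, which is
    # exactly how this strategy closes the semantic gap described above.
    q = embed(query)
    ranked = sorted(question_index, key=lambda pair: cosine(q, pair[0]), reverse=True)
    return [parent for _, parent in ranked[:k]]
```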