Created at 7am, Jan 19
benjaminArtificial Intelligence
0
Foundations of Vector Retrieval by Sebastian Bruch
5UU6Q0ymbm8Sw0z7qmm3UyubmUbJJIQLOnQDCwXJyRU
File Type
PDF
Entry Count
519
Embed. Model
jina_embeddings_v2_base_en
Index Type
hnsw

\'Vectors are universal mathematical objects that can represent text, images, speech, or a mix of these data modalities. That happens regardless of whether data is represented by hand-crafted features or learnt embeddings. Collect a large enough quantity of such vectors and the question of retrieval becomes urgently relevant: Finding vectors that are more similar to a query vector. This monograph is concerned with the question above and covers fundamental concepts along with advanced data structures and algorithms for vector retrieval. In doing so, it recaps this fascinating topic and lowers barriers of entry into this rich area of research.\' - Sebastian Bruchhttps://doi.org/10.48550/arXiv.2401.09350

We must add that, the detail above is generally satisfied in practice. Importantly, if the vectors in our collection are independent and identically distributed, then the collection is almost surely in general position. That is why we often take that technicality for granted. So from now on, we assume that the Delaunay graph of a collection of points is unique.
id: 796fe3ec9a9e2c72b75931aef08b187e - page: 96
6.2.3 Top-1 Retrieval We can immediately recognize the importance of Voronoi regions: They geometrically capture the set of queries for which a point from the collection is the solution to the top-1 retrieval problem. But what is the significance of the dual representation of this geometrical concept? How does the Delaunay graph help us solve the top-1 retrieval problem? For one, the Delaunay graph is a compact representation of the Voronoi diagram. Instead of describing polytopes, we need only to record edges between neighboring nodes. But, more crucially, as the following claim shows, we can traverse the Delaunay graph greedily, and reach the optimal top-1 solution from any node. In other words, the Delaunay graph has the desired property we described in Section 6.1.
id: aa53c5d19c3438d0a058d1ba6c46b6ba - page: 96
Theorem 6.1 Let G = (V, E) be a graph that contains the Delaunay graph of m vectors X Rd. The best-first-search algorithm over G gives the optimal solution to the top-1 retrieval problem for any arbitrary query q if (, ) is proper. The proof of the result above relies on an important property of the Delaunay graph, which we state first.
id: 98ab2d5625a5551892d2fb6f359f5c04 - page: 96
6.2 The Delaunay Graph Fig. 6.3: Illustration of the second case in the proof of Lemma 6.1. Lemma 6.1 Let G = (V, E) be the Delaunay graph of a collection of points X Rd, and let B be a ball centered at that contains two points u, v X , with radius r = min((, u), (, v)), for a continuous and proper distance function (, ). Then either (u, v) E or there exists a third point in X that is contained in B. Proof. Suppose there is no other point in X that is contained in B. We must show that, in that case, (u, v) E. There are two cases. The first and easy case is when u and v are on the surface of B. Clearly, u and v are equidistant from . Because there are no other points in B, we can conclude that lies in the intersection of Ru and Rv, the Voronoi regions associated with u and v. That implies (u, v) E.
id: 83185a0e4310badbbbfeab071cccfa66 - page: 97
How to Retrieve?
# Search

curl -X POST "https://search.dria.co/hnsw/search" \
-H "x-api-key: <YOUR_API_KEY>" \
-H "Content-Type: application/json" \
-d '{"rerank": true, "top_n": 10, "contract_id": "5UU6Q0ymbm8Sw0z7qmm3UyubmUbJJIQLOnQDCwXJyRU", "query": "What is alexanDRIA library?"}'
        
# Query

curl -X POST "https://search.dria.co/hnsw/query" \
-H "x-api-key: <YOUR_API_KEY>" \
-H "Content-Type: application/json" \
-d '{"vector": [0.123, 0.5236], "top_n": 10, "contract_id": "5UU6Q0ymbm8Sw0z7qmm3UyubmUbJJIQLOnQDCwXJyRU", "level": 2}'