Created at 11am, Feb 27
firstbatch · General
Name: wikipedia.20220301.en
Transaction ID: uaBIB4kh7gYh6vSNL7V2eygfbyRu9vGZ_nJ6jKVn_x8
File Type: CUSTOM
Entry Count: 5,600,000
Embedding Model: BAAI/bge-large-en-v1.5
Index Type: HNSW

Wikipedia[en] is a RAG index built from the '20220301.en' subset of the HuggingFace wikipedia dataset. It contains ~6M cleaned English articles, embedded with the BAAI/bge-large-en-v1.5 model. We embedded full article text rather than titles alone and summarized long chunks, which significantly improves retrieval performance.

The index also supports filtering the retrieved context through an optional “query” input consisting of a few keywords. Along with the vectors you send, these keywords are used to extract the exact passage from the retrieved context with the BM25 method, cutting noise so that only the most relevant part is returned.
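The page names BM25 but does not describe how the passage is selected. As a rough illustration only, the sketch below splits a retrieved context into sentence-level passages, scores each against the query keywords with standard Okapi BM25, and returns the top-scoring one. The sentence splitter, tokenizer, parameter values (k1=1.5, b=0.75), and the helper names `bm25_scores` and `extract_passage` are hypothetical assumptions, not Dria's implementation.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase and keep alphanumeric runs; a real system may differ.
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, passages: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each passage against the query keywords with Okapi BM25."""
    docs = [tokenize(p) for p in passages]
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n if n else 0.0
    # Document frequency: how many passages contain each term.
    df = Counter(term for d in docs for term in set(d))
    query_terms = tokenize(query)
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Non-negative IDF variant (as used by Lucene), so common terms never subtract.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            # Length normalization: longer-than-average passages are penalized via b.
            denom = freq + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * freq * (k1 + 1) / denom
        scores.append(score)
    return scores


def extract_passage(query: str, context: str) -> str:
    """Hypothetical helper: return the sentence of `context` that best matches `query`."""
    # Assumption: passages are sentences split on terminal punctuation.
    passages = [s.strip() for s in re.split(r"(?<=[.!?])\s+", context) if s.strip()]
    if not passages:
        return ""
    scores = bm25_scores(query, passages)
    best = max(range(len(passages)), key=scores.__getitem__)
    return passages[best]
```

For example, with the context "The Eiffel Tower is in Paris. It was completed in 1889. Paris is the capital of France.", the keywords "when was it completed" select "It was completed in 1889." Keyword scoring of this kind complements the dense vector search: the vectors find the right article, and the keywords narrow it down to the sentence that actually answers the question.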

How to Retrieve?

# Installation
Dria CLI requires Node.js (>= 18.0.0) and Docker to be installed on your machine. It is available on npm and can be installed with:

npm i -g dria-cli

# Fetch Index with Transaction ID
dria fetch uaBIB4kh7gYh6vSNL7V2eygfbyRu9vGZ_nJ6jKVn_x8

# Serve Index with Name
dria serve wikipedia.20220301.en