
# Run Local Wikipedia Index with Dria

## Overview

With Dria's Docker setup, users can host Dria knowledge databases directly on their local hard drives. This allows extensive retrieval indexes to be used entirely offline, without an internet connection.

Wikipedia is the largest knowledge base humans can access. With Dria, we upload the entire index to Arweave and make it usable for everyone on their own devices.

Wikipedia Index

## Setup

### Install Docker

To install Docker, follow the instructions on the official Docker website:

URL: Docker Installation Guide

This guide provides detailed installation steps for various operating systems, including Windows, macOS, and Linux. It walks you through downloading Docker Desktop or Docker Engine, depending on your operating system, and setting it up for your development needs.

### Install Node.js

To install Node.js, use a package manager that works with your operating system. This method simplifies the installation and usually takes care of setting up the environment variables for you.

URL: Node.js Package Manager Installation Guide

The Node.js website provides instructions for installing Node.js via a package manager on a variety of platforms, including Linux, macOS, and Windows. This guide will help you install Node.js and npm (the Node package manager), which is used to install packages from the npm registry.

By following these guides, you will have the environment needed to run and develop applications that use Docker containers and the Node.js runtime.
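As an optional sanity check, you can confirm that both tools are available on your PATH before continuing (these standard version flags only print the installed versions, which will differ on your machine):

docker --version

node --version

npm --version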

### Install Required Libraries

To set up the environment needed to run the code in this guide, you must install specific Python and Node.js libraries.

Use the command below to install the Dria Python library, which is used to query the index locally.

The torch and transformers packages are needed to embed the queries.

pip install dria transformers torch

Additionally, execute the command below to globally install the dria-cli package with npm, enabling command-line interactions with Dria for managing data and queries.

npm i -g dria-cli

### Fetch the Knowledge Base from the Blockchain

dria-cli is useful for accessing and managing Dria indexes on the blockchain.

We will use its fetch command to download the Wikipedia index.

dria fetch uaBIB4kh7gYh6vSNL7V2eygfbyRu9vGZ_nJ6jKVn_x8 # Transaction/Contract ID of Wikipedia
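Depending on your dria-cli version, fetching may only download the index to local storage. If the local queries later in this guide cannot reach the index, you may also need to serve it; the command below is an assumption based on dria-cli's typical usage rather than something specified in this guide, so check dria --help for the exact syntax on your version.

dria serve uaBIB4kh7gYh6vSNL7V2eygfbyRu9vGZ_nJ6jKVn_x8 # Assumed command to serve the fetched index locally; verify with dria --help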

## Initializing an Embedding Model for Querying the Dria Index

To perform queries on the Dria index, such as the one built from Wikipedia data, you must generate embeddings for the query text with the same model that was used to create the index. In this case, that is the BAAI/bge-large-en model.

from transformers import AutoTokenizer, AutoModel
import torch
 
# Load the BAAI/bge-large-en tokenizer and model from Hugging Face
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-large-en")
model = AutoModel.from_pretrained("BAAI/bge-large-en")

## Setting Up Dria for Local Use

To use the local version of the Dria index, instantiate the DriaLocal class from the dria library.

# Import DriaLocal for querying the locally served index (the Dria class is used for hosted API access)
from dria import DriaLocal
 
dria = DriaLocal()

## Executing the Query

Following the previous process, we will transform our query into an embedding and then query Dria with it.

def get_embeddings(texts):
    # Tokenize the input texts, ensuring they are padded and truncated as necessary.
    encoded_input = tokenizer(texts, return_tensors='pt', padding=True,
                              truncation=True, max_length=512)
    # Generate embeddings without updating model weights.
    with torch.no_grad():
        model_output = model(**encoded_input)
    # Calculate the mean of the last hidden state to get the embeddings.
    embeddings = model_output.last_hidden_state.mean(dim=1)
    return embeddings
 
# The query we wish to run.
query = "What is the AGI?"
# Generate the embedding for our query.
embedded_query = get_embeddings(query)
# Convert the tensor to a list of floats for the query.
embedded_query = [float(i) for i in embedded_query[0]]
 
# Execute the query against Dria, retrieving the top 3 results.
query_results = dria.query(embedded_query, top_n=3)
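The result schema is not documented in this guide, but based on how the results are consumed in the RAG section below, each item appears to carry a relevance score and the matched passage under its metadata. A minimal, hedged way to inspect what came back (field names assumed from the later snippet):

# Inspect the results; the "score" and "metadata"["text"] fields are
# assumed from how the results are used in the RAG section below.
for item in query_results:
    print(round(item["score"], 3), item["metadata"]["text"][:120])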

## Integrating Ollama with Dria for Retrieval-Augmented Generation (RAG)

To incorporate query results into large language model (LLM) responses, we will use Ollama to run LLMs locally. Ollama facilitates running these models on your own machine, improving efficiency and reducing reliance on external services.

We need to install Ollama first. Then we'll look at how to use it in our code alongside Dria to build a seamless RAG application.

### Setting Up Ollama

First, download the version of Ollama compatible with your operating system from the Ollama website.

Next, select the model you wish to use. For our purposes, we will use the Llama 2 7B model.

For a comprehensive list of available models and their capabilities, refer to the Ollama model library.

ollama run llama2:7b

Then install the Python library so we can use Ollama from code.

pip install ollama
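As an optional smoke test (a minimal sketch, assuming the Ollama server is running and the llama2:7b model pulled above is available), you can confirm the Python client can reach the local model before wiring it into the RAG pipeline. Note that the RAG snippet below uses the llama2:7b-chat tag; whichever tag you pass to ollama.chat must already be pulled locally (for example with ollama pull).

import ollama
 
# Minimal smoke test: send a trivial chat message to the locally running model.
reply = ollama.chat(model='llama2:7b', messages=[{'role': 'user', 'content': 'Reply with one word: hello'}])
print(reply)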

### Implementing the RAG Pipeline Using Ollama

Dria returns texts along with their metadata, which serve as valuable inputs for constructing Retrieval-Augmented Generation (RAG) pipelines.

We'll process the results to include only those with a relevance score above a set threshold, in this case, 0.5.

Following this filtering, we'll leverage Ollama, augmenting our prompt with the selected context, to generate a comprehensive response.

import ollama
 
# Set the minimum score for query results to be considered
query_threshold = 0.5
 
# Filter the query results based on the score threshold
filtered_query_results = [item["metadata"]["text"] for item in query_results if item["score"] > query_threshold]
 
# Define the user's prompt; here we simply reuse the query from the earlier step
prompt = query
 
# Construct the prompt for Ollama, incorporating the filtered query results and the user's prompt
ollama_input = f"""
  Incorporate the following context when responding to the user's prompt:
  Context: {filtered_query_results}
 
  User's prompt: {prompt}
"""
 
# Generate a response using Ollama with the specified model and the constructed prompt
response = ollama.chat(model='llama2:7b-chat', messages=[{'role': 'user', 'content': ollama_input}])
 
# Output the response from Ollama
print(response)
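Note that ollama.chat returns a structured response rather than plain text. Depending on your version of the ollama Python library, the generated text is typically found under the message content field, so you may prefer to print only that part (the exact field access below is an assumption that can vary between library versions).

# Print only the generated text; the field layout may vary by ollama library version
print(response['message']['content'])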