
Building an Offline RAG Chatbot

Step-by-step tutorial on creating a local physics chatbot that works completely offline using RAG architecture.

Md. Mahamudul Hasan
December 15, 2025

What is RAG?

Retrieval-Augmented Generation (RAG) is a powerful technique that combines the strengths of large language models with external knowledge retrieval. Instead of relying solely on what the model learned during training, RAG allows the AI to access and reference specific documents—like textbooks, manuals, or any custom knowledge base.
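At its core, RAG is just "retrieve relevant text, then stuff it into the prompt." A minimal sketch of that idea, using a toy keyword-overlap retriever (the function names and documents here are purely illustrative, not part of any library):

```python
# Toy illustration of the RAG idea: retrieve a relevant document,
# then augment the prompt with it before calling an LLM.
docs = [
    "Newton's first law: an object stays at rest or in uniform motion unless acted on by a force.",
    "Ohm's law relates voltage, current, and resistance: V = I * R.",
]

def retrieve(question, documents):
    """Pick the document sharing the most words with the question (toy scoring)."""
    q_words = set(question.lower().split())
    return max(documents, key=lambda d: len(q_words & set(d.lower().split())))

def build_prompt(question, documents):
    """Prepend the retrieved context so the LLM answers from it, not from memory."""
    context = retrieve(question, documents)
    return f"Context: {context}\n\nQuestion: {question}\nAnswer:"

print(build_prompt("What is Newton's first law?", docs))
```

Real systems replace the keyword overlap with vector similarity search, which is exactly what we build below.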

"RAG enables AI to be grounded in real, verifiable information rather than just its training data. It's like giving the AI a library card!"

In this tutorial, I'll show you how to build a physics chatbot that can answer questions based on your physics textbooks—and it works completely offline!

Why Build an Offline Chatbot?

🔒

Privacy

All your data stays on your local machine. No information is sent to external servers.

💰

Cost-Free

No API costs! After the initial setup, you can run as many queries as you like without paying.

🌐

No Internet Required

Perfect for areas with poor connectivity or secure environments.

⚡

Fast Response

No network latency—responses come directly from your machine.

Architecture Overview

The RAG pipeline consists of several key components working together:

RAG Pipeline Architecture

📚 PDF Documents
📄 Text Chunking
🔢 Vector Embeddings
🗄️ Vector Database (FAISS)
🔍 Semantic Search
🤖 LLM Response Generation
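The stages above can be sketched as a chain of small functions. This is a toy sketch only (word-count vectors instead of learned embeddings, brute-force search instead of FAISS); each stage is handled by a real library in the steps that follow:

```python
# Toy end-to-end sketch of the pipeline stages — real implementations
# (pypdf, text splitter, sentence-transformers, FAISS, Ollama) follow below.

def chunk(text, size=40):
    """Stage 2: split raw text into fixed-size chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def tokens(text):
    """Lowercase and strip basic punctuation."""
    return text.lower().replace(".", " ").replace("?", " ").split()

def embed(text, vocab):
    """Stage 3: turn text into a word-count vector (stand-in for a real model)."""
    ws = tokens(text)
    return [ws.count(v) for v in vocab]

def search(query, chunks, vocab, k=1):
    """Stage 5: rank chunks by dot-product similarity to the query vector."""
    q = embed(query, vocab)
    return sorted(chunks,
                  key=lambda c: -sum(a * b for a, b in zip(q, embed(c, vocab))))[:k]

text = "Force equals mass times acceleration. Energy is conserved in a closed system."
pieces = chunk(text)
vocab = sorted(set(tokens(text)))
print(search("What is mass times acceleration?", pieces, vocab))
```

The missing stage is generation: the top-ranked chunk gets pasted into the LLM prompt, as in step 5.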

Step-by-Step Implementation

1. Install Dependencies

Terminal
pip install langchain langchain-community
pip install sentence-transformers
pip install faiss-cpu
pip install pypdf
pip install ollama

2. Load and Process Documents

Python - Document Loading
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load PDF documents
loader = PyPDFLoader("physics_textbook.pdf")
documents = loader.load()

# Split into chunks for better retrieval
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len
)
chunks = text_splitter.split_documents(documents)

print(f"Created {len(chunks)} chunks from the document")

3. Create Vector Embeddings

Python - Embeddings
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

# Use a local embedding model (downloaded once on first use, then cached
# locally — after that it works fully offline)
embeddings = HuggingFaceEmbeddings(
    model_name="all-MiniLM-L6-v2",
    model_kwargs={'device': 'cpu'}
)

# Create vector store
vectorstore = FAISS.from_documents(chunks, embeddings)

# Save for later use
vectorstore.save_local("physics_vectorstore")
print("Vector store created and saved!")
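Under the hood, FAISS indexes the embedding vectors and returns the nearest neighbours of a query vector. A pure-Python illustration of that cosine ranking (illustrative only — FAISS does the same thing far faster and at much larger scale):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors: dot product over magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Pretend these are chunk embeddings and a query embedding
chunk_vecs = {"inertia chunk": [0.9, 0.1, 0.0], "energy chunk": [0.1, 0.9, 0.2]}
query_vec = [0.8, 0.2, 0.1]

# Rank chunks by similarity to the query — this is "semantic search"
ranked = sorted(chunk_vecs, key=lambda name: -cosine(query_vec, chunk_vecs[name]))
print(ranked[0])  # the chunk most similar to the query
```

You can also query the real store directly with `vectorstore.similarity_search("your question", k=3)` to inspect which chunks the retriever would hand to the LLM.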

4. Set Up Local LLM with Ollama

Terminal - Install Ollama Model
# Install Ollama first from ollama.ai
# Then pull a model (e.g., Llama 2 or Mistral)
ollama pull llama2
# or for a smaller model
ollama pull phi

5. Build the RAG Chain

Python - Complete RAG Implementation
from langchain_community.llms import Ollama
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

# Recreate the same embedding model used in step 3
embeddings = HuggingFaceEmbeddings(
    model_name="all-MiniLM-L6-v2",
    model_kwargs={'device': 'cpu'}
)

# Load the vector store
vectorstore = FAISS.load_local(
    "physics_vectorstore", 
    embeddings,
    allow_dangerous_deserialization=True
)

# Initialize local LLM
llm = Ollama(model="llama2", temperature=0.7)

# Create custom prompt
template = """You are a helpful physics tutor. Use the following context 
to answer the question. If you don't know the answer based on the context, 
say so honestly.

Context: {context}

Question: {question}

Answer: """

prompt = PromptTemplate(
    template=template,
    input_variables=["context", "question"]
)

# Create RAG chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
    chain_type_kwargs={"prompt": prompt}
)

# Ask a question!
def ask_physics(question):
    response = qa_chain.invoke({"query": question})
    return response["result"]

# Example usage
print(ask_physics("What is Newton's first law of motion?"))
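To turn the single-question function into a chatbot, wrap it in a simple read-answer loop. A minimal sketch — `answer_fn` is any callable mapping a question string to an answer string (e.g. the `ask_physics` function above); the loop itself is my own scaffolding, not part of LangChain:

```python
# Minimal chat loop; input_fn/output_fn are injectable so it is easy to test.
def chat_loop(answer_fn, input_fn=input, output_fn=print):
    output_fn("Physics chatbot ready. Type 'quit' to exit.")
    while True:
        question = input_fn("You: ").strip()
        if question.lower() in {"quit", "exit"}:
            break
        if question:
            output_fn(f"Bot: {answer_fn(question)}")

# chat_loop(ask_physics)  # start an interactive session
```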

Example Results

Q

What is Newton's first law of motion?

A

Newton's first law of motion, also known as the law of inertia, states that an object at rest stays at rest and an object in motion stays in motion with the same speed and in the same direction unless acted upon by an unbalanced force. This means that objects naturally resist changes to their state of motion.

Pro Tips

  • Chunk size matters: Smaller chunks (500-1000 chars) work better for precise answers
  • Overlap is important: 10-20% overlap helps maintain context across chunks
  • Try different models: Mistral and Phi are great alternatives to Llama 2
  • GPU acceleration: If you have a GPU, use it for faster inference
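The overlap tip is easy to see with a toy sliding-window chunker (the real `RecursiveCharacterTextSplitter` additionally tries to break on paragraph, sentence, and word boundaries):

```python
# Toy sliding-window chunker showing how overlap preserves context at
# chunk boundaries. Requires overlap < chunk_size so the window advances.
def chunk_with_overlap(text, chunk_size, overlap):
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

text = "The law of inertia says objects resist changes to their motion."
for c in chunk_with_overlap(text, chunk_size=30, overlap=6):
    print(repr(c))
# Each chunk repeats the last 6 characters of the previous one, so text
# cut at a boundary still appears intact inside at least one chunk.
```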

🔗 Get the Complete Project

The full source code with additional features like conversation history and a Streamlit UI is available on GitHub.

View on GitHub
#RAG #LangChain #NLP #Ollama #Python #LocalLLM
Md. Mahamudul Hasan

Final-year Computer Science student passionate about Machine Learning, Computer Vision, and building accessible AI applications.