Project Synopsis

[Figure: n8n workflow]

What is the problem being addressed? 

The project addresses deficiencies common to traditional academic document retrieval systems, focusing on the university's digital repository.

Limited Relevance: Traditional keyword-based search engines cannot reliably capture the semantic intent behind users' queries, so they often return inaccurate or incomplete results.

Information Overload: Given the sheer volume of scholarly literature available, users often struggle to find concise, relevant resources quickly.

Lack of Intelligent Support: No unified system exists that can summarize documents, answer research questions, and track sources and citations, so research remains time-consuming and fragmented.

What is your project idea and how will it work (what are its components, etc.)?

The project will implement an intelligent, AI-driven Retrieval-Augmented Generation (RAG) system on top of the Department of Computer Science's existing DSpace repository. The system will provide semantic document search, intelligent question answering, and citation-backed answers in one unified system.

Key Elements

1. Document Ingestion & Preprocessing

  • Retrieve ~1,000 papers from the archive (PDF/Word).
  • Run OCR on scanned documents, clean the extracted text, and split it into smaller chunks so it can be embedded efficiently.
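
The cleaning and chunking steps can be sketched as follows. This is a minimal illustration, not the project's pipeline. It assumes the text has already been extracted, so OCR is omitted. The helper names `clean_text` and `chunk_words` are hypothetical, and so is the word-based window of 200 words with a 50-word overlap. The overlap means text near a chunk boundary appears in two neighboring chunks, so it stays retrievable.

```python
import re


def clean_text(raw: str) -> str:
    """Re-join words hyphenated across line breaks and normalize whitespace."""
    text = re.sub(r"-\n(\w)", r"\1", raw)  # "retrie-\nval" -> "retrieval"
    text = re.sub(r"\s+", " ", text)       # collapse newlines, tabs, repeated spaces
    return text.strip()


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of `chunk_size` words that share `overlap` words.

    The defaults are illustrative assumptions, not tuned values.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):  # last window reached the end
            break
    return chunks
```

For example, a 10-word text with `chunk_size=4` and `overlap=1` yields three chunks, and adjacent chunks share one word at their edges.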

2. Semantic Embeddings & Vector Search

  • Use embedding models (e.g., from DeepSeek, LLaMA, or OpenAI) to generate a vector embedding for each chunk.
  • Store the embeddings in a vector database or index (such as Qdrant or FAISS) to enable fast semantic search.
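
The sketch below makes the search step concrete. It implements exact cosine-similarity search over an in-memory NumPy array. This is only a stand-in for what Qdrant or FAISS would do at scale, and it does not use their APIs. The class `InMemoryVectorIndex` is hypothetical. The toy 3-dimensional vectors in any example replace real embeddings, which would come from whichever embedding model is chosen.

```python
import numpy as np


class InMemoryVectorIndex:
    """Hypothetical toy stand-in for a vector database: exact cosine search."""

    def __init__(self, dim: int):
        self.dim = dim
        self.vectors = np.empty((0, dim), dtype=np.float32)
        self.payloads: list[dict] = []

    def add(self, vector, payload: dict) -> None:
        v = np.asarray(vector, dtype=np.float32)
        if v.shape != (self.dim,):
            raise ValueError(f"expected shape ({self.dim},), got {v.shape}")
        v = v / np.linalg.norm(v)  # store unit vectors so dot product == cosine
        self.vectors = np.vstack([self.vectors, v])
        self.payloads.append(payload)

    def search(self, query, k: int = 3) -> list[tuple[float, dict]]:
        q = np.asarray(query, dtype=np.float32)
        q = q / np.linalg.norm(q)
        scores = self.vectors @ q          # cosine similarity to every stored vector
        top = np.argsort(-scores)[:k]      # indices of the k highest scores
        return [(float(scores[i]), self.payloads[i]) for i in top]
```

Because the vectors are normalized at insertion time, each query costs one matrix-vector product. Dedicated libraries and databases replace this brute-force scan with approximate nearest-neighbor indexes once the collection grows.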

3. AI Chatbot Interface

  • Users submit questions in natural language.
  • A backend pipeline retrieves the most relevant document chunks and passes them to a large language model (LLM), which produces a summary or a direct answer.
  • Every response includes source tracing (citations and references back to the source documents).
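
One way to implement source tracing is a three-step approach. Number the retrieved chunks in the prompt, ask the LLM to cite those numbers, and then map the numbers back to documents. The sketch below assumes each chunk carries hypothetical `title`, `page`, and `text` fields. The actual LLM call is left out because it depends on which model is selected.

```python
import re


def build_prompt(question: str, chunks: list[dict]) -> str:
    """Number each retrieved chunk so the LLM can cite it as [n]."""
    context = "\n\n".join(
        f"[{i}] ({c['title']}, p. {c['page']})\n{c['text']}"
        for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the numbered sources below. "
        "Cite every claim with its source number, e.g. [1].\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def resolve_citations(answer: str, chunks: list[dict]) -> list[str]:
    """Map [n] markers in the model's answer back to source documents."""
    cited = sorted({int(n) for n in re.findall(r"\[(\d+)\]", answer)})
    return [
        f"[{n}] {chunks[n - 1]['title']}, p. {chunks[n - 1]['page']}"
        for n in cited
        if 1 <= n <= len(chunks)  # drop numbers that match no retrieved chunk
    ]
```

Discarding numbers outside the retrieved range is a cheap guard against citations to sources the model never saw.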

4. Comparative Model Assessment

  • Compare open-source and commercial LLMs (e.g., Gemini 2.0, GPT-4o, DeepSeek R1, LLaMA 3) on cost, response speed, and user satisfaction.
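
The comparison could record one trial per test query and then aggregate the trials per model, as in the sketch below. The `Trial` record, its field names, and the per-million-token pricing scheme are assumptions for illustration. Real prices must come from each provider's current price list. Any model names and numbers used to exercise this code are placeholders, not measured results.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trial:
    """One test query answered by one model (hypothetical record)."""
    model: str
    latency_s: float    # wall-clock time until the full answer is returned
    input_tokens: int
    output_tokens: int
    rating: int         # user satisfaction score, e.g. on a 1-5 scale


def summarize(
    trials: list[Trial], prices: dict[str, tuple[float, float]]
) -> dict[str, dict[str, float]]:
    """Per-model mean latency, mean cost per query, and mean rating.

    `prices` maps model -> (USD per 1M input tokens, USD per 1M output tokens).
    """
    by_model: dict[str, list[Trial]] = {}
    for t in trials:
        by_model.setdefault(t.model, []).append(t)

    report = {}
    for model, ts in by_model.items():
        p_in, p_out = prices[model]
        costs = [(t.input_tokens * p_in + t.output_tokens * p_out) / 1_000_000 for t in ts]
        report[model] = {
            "mean_latency_s": mean(t.latency_s for t in ts),
            "mean_cost_usd": mean(costs),
            "mean_rating": mean(t.rating for t in ts),
        }
    return report
```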

5. User Interface (UI)

  • An interactive web interface with:
    • Chat window & search bar
    • Summarized results with links
    • Filters for departments, document types, etc.
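
The filters could map onto metadata stored alongside each chunk. This sketch applies them in Python to hits that have already been retrieved. The metadata keys `department` and `doc_type` and the function `filter_hits` are hypothetical. A vector database such as Qdrant can instead apply payload filters as part of the search itself.

```python
from typing import Iterable, Optional


def filter_hits(
    hits: Iterable[dict],
    department: Optional[str] = None,
    doc_types: Optional[set[str]] = None,
) -> list[dict]:
    """Keep only hits whose metadata match the UI filter selections.

    A filter left as None is not applied, so no selection returns every hit.
    """
    selected = []
    for hit in hits:
        if department is not None and hit.get("department") != department:
            continue
        if doc_types is not None and hit.get("doc_type") not in doc_types:
            continue
        selected.append(hit)
    return selected
```
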

[Figures: RAG for researchers interface; Perplexity chat]

[Figure: n8n workflow, part 2]

[Figure: Data flow diagram]