What is the problem being addressed?
The project addresses common deficiencies of traditional academic document retrieval systems, focusing on the university's digital repository.
Inadequate Search Quality: Traditional keyword-based search engines cannot capture the semantic intent behind users' queries, so they return incorrect or incomplete results.
Information Overload: With the vast amount of scholarly literature available, users often struggle to find concise, relevant resources quickly.
Lack of Intelligent Support: There is no unified system that can summarize documents, answer research questions, and track sources and citations, which makes research time-consuming and fragmented.
What is your project idea and how will it work (what are its components etc)?
The project will implement an intelligent, AI-driven Retrieval-Augmented Generation (RAG) system on top of the existing DSpace repository of the Department of Computer Science. The system will provide semantic document search, intelligent question answering, and citation-backed answers in a single unified interface.
Key Elements
1. Document Ingestion & Preprocessing
- Retrieve ~1,000 papers from the archive (PDF/Word).
- Run OCR on scanned documents, clean the extracted text, and split it into smaller chunks that can be embedded efficiently.
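
The chunking step above could be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the function name `chunk_text`, the word-based splitting, and the default sizes are all assumptions chosen for clarity. A real system might chunk by tokens, sentences, or document sections instead.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into word-based chunks.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    chunk boundary still appears whole in at least one chunk.
    Sizes are illustrative assumptions, not tuned values.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current chunk already reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks
```

Overlap trades some storage for recall: a larger overlap stores more duplicate text but lowers the chance that a relevant passage is split across two chunks.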
2. Semantic Embeddings & Vector Search
- Use models such as DeepSeek, LLaMA, or those offered by OpenAI to create vector embeddings of the text chunks.
- Store the embeddings in a vector store (such as Qdrant or FAISS) to enable fast semantic search.
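
To illustrate the ranking step behind vector search, here is a small self-contained sketch. It makes two loud assumptions: `embed` is a toy bag-of-words stand-in for a real embedding model, and the in-memory `search` function stands in for a vector store such as Qdrant or FAISS. Because the toy embedding only matches shared words, it shows how cosine-similarity ranking works, not true semantic matching.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(query: str, chunks: list[str], top_k: int = 3) -> list[tuple[float, str]]:
    """Return the top_k chunks ranked by similarity to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

In a real deployment the brute-force loop in `search` would be replaced by the vector store's own nearest-neighbour query, which is what makes search fast over many thousands of chunks.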
3. AI Chatbot Interface
- Users submit questions in natural language.
- A backend pipeline retrieves the relevant document chunks and passes them to a large language model (LLM) to produce summaries or direct answers.
- Every response includes source tracing (citations and references to the source documents).
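
The retrieve-then-generate flow with source tracing might look like the sketch below. The `Chunk` class, the prompt wording, and the `llm` callable are hypothetical placeholders: `llm` stands for whichever model client the project ends up using, and the `source` field stands for whatever identifier the repository provides for a document.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Chunk:
    source: str  # hypothetical citation label, e.g. a document title
    text: str


def build_prompt(question: str, chunks: list[Chunk]) -> str:
    """Number each retrieved excerpt so the model can cite it as [n]."""
    context = "\n".join(
        f"[{i}] ({c.source}) {c.text}" for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the numbered excerpts below. "
        "Cite excerpts as [n].\n\n"
        f"Excerpts:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def answer(question: str, chunks: list[Chunk], llm: Callable[[str], str]) -> dict:
    """Call the model and return its answer together with the sources used."""
    prompt = build_prompt(question, chunks)
    return {"answer": llm(prompt), "sources": [c.source for c in chunks]}
```

Returning the source list alongside the answer lets the interface show references even when the model forgets to cite an excerpt inline.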
4. Comparative Model Assessment
- Compare open-source and commercial LLMs (such as Gemini 2.0, GPT-4o, DeepSeek R1, and LLaMA 3) on cost, response speed, and user satisfaction.
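
One possible way to collect the speed and cost figures for this comparison is sketched below. Everything here is an assumption for illustration: each model is represented by a plain callable, and `cost_per_call` is a flat hypothetical price rather than real token-based pricing. User satisfaction is not computed, since it would have to come from ratings gathered from actual users.

```python
import time
from statistics import mean
from typing import Callable


def benchmark(
    models: dict[str, Callable[[str], str]],
    questions: list[str],
    cost_per_call: dict[str, float],
) -> dict[str, dict[str, float]]:
    """Measure mean latency and total (assumed flat) cost per model."""
    report: dict[str, dict[str, float]] = {}
    for name, ask in models.items():
        latencies = []
        for question in questions:
            start = time.perf_counter()
            ask(question)  # the answer itself is ignored in this sketch
            latencies.append(time.perf_counter() - start)
        report[name] = {
            "mean_latency_s": mean(latencies) if latencies else 0.0,
            "total_cost": cost_per_call.get(name, 0.0) * len(questions),
        }
    return report
```

Running every model on the same fixed question set keeps the latency and cost numbers comparable across models.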
5. User Interface (UI)
- An interactive web interface with:
  - Chat window and search bar
  - Summarized results with links
  - Filters for departments, document types, etc.



