Science Education Platform

Science Textbook Q&A System

A RAG-powered Q&A platform for 100-page science textbooks, built with React, FastAPI, and OpenAI. It provides grounded answers with page-based citations, exam-style questions, and an ELI10 mode.

PDF Upload & Ingestion

Upload science textbooks and automatically extract text with page attribution

RAG-Powered Q&A

Semantic search with OpenAI embeddings and grounded answer generation

Multiple Chat Modes

Default, exam-style, and ELI10 modes for personalized learning experiences

Feature Overview

Comprehensive science textbook Q&A platform with RAG-powered intelligence and multiple learning modes

Core Features

  • Upload and store 100-page science textbook PDFs
  • Extract text with page-level attribution
  • Chunk content into 500-800 token segments
  • OpenAI embeddings with vector indexing in Qdrant
  • Semantic retrieval with top-K similarity search
  • Grounded answers with page-based citations
  • Three chat modes: default, exam-style, ELI10
  • Multi-turn conversation history and context
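
To make the top-K similarity search above concrete, here is a minimal sketch in plain Python. In the described system this ranking is handled by Qdrant; the function names `cosine` and `top_k` and the toy 3-dimensional vectors are assumptions for illustration, not part of the platform.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunks, k=3):
    """Return the k (score, chunk_id) pairs most similar to query_vec, best first."""
    scored = [(cosine(query_vec, vec), chunk_id) for chunk_id, vec in chunks.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Toy vectors standing in for real embeddings (hypothetical data).
chunks = {
    "c1": [1.0, 0.0, 0.0],
    "c2": [0.0, 1.0, 0.0],
    "c3": [0.7, 0.7, 0.0],
}
results = top_k([1.0, 0.1, 0.0], chunks, k=2)
```

A vector database performs the same ranking with an approximate index, so it does not need to score every chunk.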

Advanced Features

  • Async ingestion with progress tracking
  • Confidence scoring with 'not found' handling
  • SSE streaming for real-time responses
  • Hybrid retrieval with reranking
  • User-isolated textbook access control
  • JWT authentication with per-user data
  • Audit logs for uploads and Q&A usage
  • Structured logging with trace IDs
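
One possible reading of confidence scoring with 'not found' handling is a score threshold on the retrieved chunks: if nothing clears it, the system declines to answer instead of guessing. The sketch below assumes this approach; the threshold `0.35`, the `max_chunks` limit, and the names `select_context` and `respond` are hypothetical.

```python
NOT_FOUND = "I could not find this in the textbook."

def select_context(scored_chunks, min_score=0.35, max_chunks=4):
    """Keep the best chunks whose retrieval score clears the threshold.

    scored_chunks: list of (score, text) pairs in any order.
    Returns (confidence, texts); confidence is the best kept score,
    or 0.0 when no chunk qualifies.
    """
    kept = sorted(
        (pair for pair in scored_chunks if pair[0] >= min_score),
        key=lambda pair: pair[0],
        reverse=True,
    )[:max_chunks]
    if not kept:
        return 0.0, []
    return kept[0][0], [text for _, text in kept]

def respond(scored_chunks):
    """Return a 'not found' payload, or the context to hand to the LLM."""
    confidence, context = select_context(scored_chunks)
    if not context:
        return {"answer": NOT_FOUND, "confidence": 0.0, "context": []}
    # In a real system the context would be sent to the model here.
    return {"answer": None, "confidence": confidence, "context": context}
```

The threshold would need tuning against real questions; a fixed value is only a starting point.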

System Architecture

Four-tier architecture with React frontend, FastAPI backend, async workers, and a data layer

React Frontend

Upload UI, status tracking, chat interface with SSE streaming, and citation source panel

Next.js, TypeScript, Tailwind CSS, Shadcn UI

FastAPI Backend

Auth, upload handling, ingestion job APIs, chat endpoints, and OpenAI integration

FastAPI, Python, Pydantic, JWT Auth

Worker Process

Async PDF extraction, chunking, embedding generation, and vector index updates

Python, Celery, PyPDF2, OpenAI SDK

Data Layer

PostgreSQL for metadata, Qdrant for vectors, S3 for PDF storage

PostgreSQL, Qdrant, S3/Local, pgvector

Request Flow Diagram

1. Upload PDF: FastAPI validates & stores
2. Ingestion Job: Worker extracts, chunks, embeds
3. Ask Question: Retrieve chunks → OpenAI → answer
4. Stream Response: SSE tokens + final citations
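
The last step, streaming tokens and then sending citations, can be sketched with the standard Server-Sent Events frame format (an `event:` line, a `data:` line, and a blank line). The event names `token` and `citations` and the payload shapes below are assumptions, not the platform's documented protocol.

```python
import json

def sse_event(event, data):
    """Format one SSE frame: event name, JSON data line, terminating blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

def stream_answer(tokens, citations):
    """Yield one 'token' frame per token, then a single final 'citations' frame."""
    for token in tokens:
        yield sse_event("token", {"text": token})
    yield sse_event("citations", {"citations": citations})

frames = list(
    stream_answer(["Photo", "synthesis"], [{"page_start": 12, "page_end": 13}])
)
```

A generator like `stream_answer` could be wrapped in a streaming HTTP response served with the `text/event-stream` content type.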

RAG Ingestion Pipeline

Six-step async pipeline from PDF upload to vector-indexed searchable content

1. PDF Upload

User uploads science textbook PDF via React interface

  • File validation (size, type)
  • Storage to S3 or local disk
  • Create textbook + job records
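
A minimal sketch of the size and type checks, assuming a 50 MB limit (the source does not state one) and a hypothetical helper name `validate_pdf_upload`. It checks the file extension, the declared content type, and the `%PDF-` signature that PDF files begin with.

```python
MAX_UPLOAD_BYTES = 50 * 1024 * 1024  # assumed limit, not from the source

def validate_pdf_upload(filename, content_type, data):
    """Return a list of validation errors; an empty list means the upload is acceptable."""
    errors = []
    if not filename.lower().endswith(".pdf"):
        errors.append("file name must end with .pdf")
    if content_type != "application/pdf":
        errors.append("content type must be application/pdf")
    if len(data) == 0:
        errors.append("file is empty")
    elif len(data) > MAX_UPLOAD_BYTES:
        errors.append("file exceeds size limit")
    # Declared types can be wrong, so also check the PDF signature.
    if data and not data.startswith(b"%PDF-"):
        errors.append("file does not look like a PDF")
    return errors
```

Returning a list of errors rather than raising on the first one lets the UI show every problem at once.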

2. Text Extraction

Worker extracts text with page-level attribution

  • PyPDF2 or pdfplumber per page
  • Clean and normalize text
  • Preserve page boundaries
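
The cleaning step could look roughly like the sketch below, which takes per-page strings (as a PDF library would return them) and normalizes each one while keeping 1-based page numbers. The function names are hypothetical, and rejoining hyphenated line breaks is a heuristic that will occasionally merge genuinely hyphenated words.

```python
import re

def normalize_page_text(raw):
    """Clean one page of extracted text."""
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    # Heuristic: rejoin words split across a line break ("photo-\nsynthesis").
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Single line breaks become spaces; blank lines stay as paragraph breaks.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces and tabs, and runs of three or more newlines.
    text = re.sub(r"[ \t]+", " ", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()

def normalize_pages(raw_pages):
    """Map raw page strings to [{'page': n, 'text': ...}] with 1-based page numbers."""
    return [
        {"page": number, "text": normalize_page_text(raw)}
        for number, raw in enumerate(raw_pages, start=1)
    ]
```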

3. Chunking

Split text into overlapping semantic segments

  • 500-800 tokens per chunk
  • 100-token overlap
  • Attach page_start and page_end
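
The chunking rules above can be sketched as a sliding window that remembers which page each token came from. This sketch uses whitespace-separated words as a stand-in for model tokens (an assumption; a real pipeline would count tokens with the model's tokenizer), and `chunk_pages` is a hypothetical name.

```python
def chunk_pages(pages, chunk_size=600, overlap=100):
    """Split page texts into overlapping chunks that record their page range.

    pages: list of {"page": int, "text": str} in reading order.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    # Flatten to (token, page) pairs so every token keeps its page of origin.
    tokens = [(tok, p["page"]) for p in pages for tok in p["text"].split()]
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        chunks.append({
            "text": " ".join(tok for tok, _ in window),
            "page_start": window[0][1],
            "page_end": window[-1][1],
        })
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks
```

Because `page_start` and `page_end` come from the first and last token of each window, a chunk that straddles a page break is cited as a page range.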

4. Embeddings

Generate OpenAI embeddings for semantic search

  • text-embedding-3-small
  • Batch processing
  • Store in Qdrant vector DB
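
Batch processing can be sketched independently of the embedding provider. Below, `embed_batch` is a hypothetical callable that takes a list of strings and returns one vector per string; in the described system it would wrap a request for `text-embedding-3-small` embeddings. The batch size of 64 and the `fake_embed` stand-in are assumptions.

```python
def batched(items, batch_size):
    """Yield consecutive slices of items with at most batch_size elements each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def embed_all(texts, embed_batch, batch_size=64):
    """Embed texts in batches and return the vectors in the original order."""
    vectors = []
    for batch in batched(texts, batch_size):
        result = embed_batch(batch)
        if len(result) != len(batch):
            raise ValueError("embedder returned a different number of vectors")
        vectors.extend(result)
    return vectors

def fake_embed(batch):
    """Stand-in embedder for local testing: one 1-dimensional vector per text."""
    return [[float(len(text))] for text in batch]
```

Passing the embedder in as a parameter keeps the batching logic testable without network access.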

5. Index Storage

Persist vectors and metadata for retrieval

  • Qdrant with textbook_id filter
  • Chunk metadata in PostgreSQL
  • Content hash for dedup
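
A content hash for dedup might be computed as in the sketch below, which hashes whitespace- and case-normalized chunk text with SHA-256 and skips chunks already seen. The normalization rule and the names `content_hash` and `dedup_chunks` are assumptions; the source does not say how the hash is derived.

```python
import hashlib

def content_hash(text):
    """SHA-256 hex digest of the text after whitespace and case normalization."""
    canonical = " ".join(text.split()).lower()
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def dedup_chunks(chunks, seen_hashes=None):
    """Return chunks with new content only, each annotated with its content_hash.

    seen_hashes: optional set of hashes already indexed; it is updated in place.
    """
    seen = set() if seen_hashes is None else seen_hashes
    unique = []
    for chunk in chunks:
        digest = content_hash(chunk["text"])
        if digest in seen:
            continue  # identical content was already indexed
        seen.add(digest)
        unique.append({**chunk, "content_hash": digest})
    return unique
```

Skipping duplicates before embedding also avoids paying for vectors that would never be retrieved separately.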

6. Ready State

Mark job as READY and enable chat interface

  • Update job status
  • Show success in UI
  • Enable Q&A functionality
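
Job status and progress tracking can be modeled as a small ordered state machine. Only the READY state is named in the source; the other state names, the FAILED state, and the even percentage split in this sketch are assumptions.

```python
from enum import Enum

class JobStatus(str, Enum):
    QUEUED = "QUEUED"
    EXTRACTING = "EXTRACTING"
    CHUNKING = "CHUNKING"
    EMBEDDING = "EMBEDDING"
    INDEXING = "INDEXING"
    READY = "READY"
    FAILED = "FAILED"

# Happy-path order of the pipeline stages (FAILED sits outside it).
_ORDER = [
    JobStatus.QUEUED,
    JobStatus.EXTRACTING,
    JobStatus.CHUNKING,
    JobStatus.EMBEDDING,
    JobStatus.INDEXING,
    JobStatus.READY,
]

def advance(status):
    """Return the next status on the happy path; READY and FAILED are terminal."""
    if status in (JobStatus.READY, JobStatus.FAILED):
        raise ValueError(f"{status.value} is a terminal state")
    return _ORDER[_ORDER.index(status) + 1]

def progress_percent(status):
    """Rough progress for the UI, or None for a failed job."""
    if status is JobStatus.FAILED:
        return None
    return round(100 * _ORDER.index(status) / (len(_ORDER) - 1))
```

The chat interface would be enabled only once the job reports `JobStatus.READY`.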

Multiple Chat Modes

Three specialized modes for different learning scenarios and student needs

Default Mode

Standard Q&A with grounded answers and citations

  • Direct answers from textbook content
  • Page citations for verification
  • Multi-turn conversation context
  • Confidence scoring

Exam-Style Mode

Practice questions and test preparation support

  • Generate practice questions
  • Explain question-solving strategy
  • Identify key concepts to review
  • Prepare for assessments

ELI10 Mode

Simplified explanations for younger students

  • Age-appropriate language
  • Analogies and simple examples
  • Break down complex concepts
  • Encourage curiosity
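
One way to implement the three modes is to keep retrieval identical and vary only the instructions sent to the model. The prompt wording, the mode keys (`default`, `exam`, `eli10`), and the function name below are illustrative assumptions, not the platform's actual prompts.

```python
MODE_INSTRUCTIONS = {
    "default": "Answer directly and concisely.",
    "exam": (
        "Write a practice question on the topic, explain how to solve it, "
        "and list the key concepts to review."
    ),
    "eli10": "Explain as if to a 10-year-old, using simple words and an analogy.",
}

def build_system_prompt(mode, context_chunks):
    """Combine grounding rules, the mode instruction, and the retrieved excerpts."""
    if mode not in MODE_INSTRUCTIONS:
        raise ValueError(f"unknown mode: {mode}")
    sources = "\n".join(
        f"[pp. {c['page_start']}-{c['page_end']}] {c['text']}" for c in context_chunks
    )
    return (
        "Use only the textbook excerpts below and cite page numbers. "
        "If the excerpts do not contain the answer, say it was not found.\n"
        f"{MODE_INSTRUCTIONS[mode]}\n\n"
        f"Excerpts:\n{sources}"
    )
```

Keeping the grounding rules outside the per-mode text means every mode cites pages the same way.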

Chat Features

User Experience

  • Message list with user and assistant roles
  • Mode selector for easy switching
  • Streaming response indicator
  • Citation chips for source references

Technical Details

  • SSE streaming for real-time tokens
  • Conversation persistence in PostgreSQL
  • Citation tracking with page ranges
  • Source panel with snippet preview
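
Citation tracking with page ranges could be sketched as follows: the page ranges of the chunks used in an answer are merged where they overlap or touch, then turned into short labels for citation chips. The label format (`p.` / `pp.`) and the function names are assumptions.

```python
def merge_page_ranges(ranges):
    """Merge overlapping or adjacent (start, end) page ranges, sorted ascending."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            # Extend the previous range instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def citation_labels(ranges):
    """Format merged ranges as 'p. N' for single pages or 'pp. N-M' for spans."""
    return [
        f"p. {start}" if start == end else f"pp. {start}-{end}"
        for start, end in merge_page_ranges(ranges)
    ]

labels = citation_labels([(12, 13), (13, 14), (30, 30), (16, 16)])
```

Merging keeps the citation row short when several retrieved chunks come from the same stretch of the book.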

Key Benefits

Transform textbook learning with instant, accurate, and personalized question answering

Instant Answers

Students get immediate responses to textbook questions without flipping through pages

90% faster than manual search

Grounded & Accurate

All answers cite specific textbook pages, which reduces hallucinations and lets students verify each answer against the source

100% citation coverage

Adaptive Learning

Three modes support different learning styles from exam prep to simplified explanations

3× engagement increase

Multi-User Support

Per-user textbook isolation with JWT auth ensures private and secure learning environments

Full data privacy

Ready to build your own science textbook Q&A system?

Implementation timeline: 4-6 weeks
