Science Textbook Q&A System
RAG-powered Q&A platform for 100-page science textbooks with grounded answers, exam-style questions, ELI10 mode, and page-based citations using React, FastAPI, and OpenAI.
PDF Upload & Ingestion
Upload science textbooks and automatically extract text with page attribution
RAG-Powered Q&A
Semantic search with OpenAI embeddings and grounded answer generation
Multiple Chat Modes
Default, exam-style, and ELI10 modes for personalized learning experiences
Feature Overview
Science textbook Q&A platform built on retrieval-augmented generation (RAG), with three learning modes
Core Features
- Upload and store 100-page science textbook PDFs
- Extract text with page-level attribution
- Chunk content into 500-800 token segments
- OpenAI embeddings with vector indexing in Qdrant
- Semantic retrieval with top-K similarity search
- Grounded answers with page-based citations
- Three chat modes: default, exam-style, ELI10
- Multi-turn conversation history and context
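The top-K retrieval step listed above can be sketched in plain Python. This is a minimal illustration that assumes in-memory vectors; in the system described here Qdrant would run the search. The function names `cosine` and `top_k` and the chunk dictionary layout are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, chunks, textbook_id, k=3):
    """Return up to k (score, chunk) pairs for one textbook, best first."""
    # Restrict the search to the requested textbook before scoring.
    candidates = [c for c in chunks if c["textbook_id"] == textbook_id]
    scored = [(cosine(query_vec, c["vector"]), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

Filtering by `textbook_id` before scoring mirrors the idea that a question is only ever answered from the textbook it was asked about.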
Advanced Features
- Async ingestion with progress tracking
- Confidence scoring with 'not found' handling
- SSE streaming for real-time responses
- Hybrid retrieval with reranking
- User-isolated textbook access control
- JWT authentication with per-user data
- Audit logs for uploads and Q&A usage
- Structured logging with trace IDs
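One way to picture confidence scoring with 'not found' handling is a threshold on the best retrieval score. The sketch below is an assumption, not the platform's actual logic: the threshold value, the `NOT_FOUND` message, and the function name are all hypothetical.

```python
NOT_FOUND = "I could not find this in the textbook."

def answer_or_not_found(scored_chunks, min_score=0.35):
    """Decide whether retrieval is strong enough to attempt an answer.

    scored_chunks: list of (similarity, chunk) pairs, best first.
    min_score: hypothetical threshold; it would need tuning on real questions.
    """
    best = scored_chunks[0][0] if scored_chunks else 0.0
    if best < min_score:
        # Refuse rather than let the model answer without support.
        return {"found": False, "confidence": round(best, 2), "message": NOT_FOUND}
    return {
        "found": True,
        "confidence": round(best, 2),
        "chunks": [chunk for _, chunk in scored_chunks],
    }
```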
System Architecture
Four-tier architecture: React frontend, FastAPI backend, async worker process, and a data layer (PostgreSQL, Qdrant, S3)
React Frontend
Upload UI, status tracking, chat interface with SSE streaming, and citation source panel
FastAPI Backend
Auth, upload handling, ingestion job APIs, chat endpoints, and OpenAI integration
Worker Process
Async PDF extraction, chunking, embedding generation, and vector index updates
Data Layer
PostgreSQL for metadata, Qdrant for vectors, S3 for PDF storage
Request Flow Diagram
RAG Ingestion Pipeline
Six-step async pipeline from PDF upload to vector-indexed searchable content
1. PDF Upload
User uploads science textbook PDF via React interface
- File validation (size, type)
- Storage to S3 or local disk
- Create textbook + job records
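The file validation in this step might look like the following sketch. The 50 MB limit and the function name `validate_pdf` are assumptions; the source states only that size and type are checked.

```python
MAX_BYTES = 50 * 1024 * 1024  # assumed 50 MB cap; not stated in the source

def validate_pdf(filename, data):
    """Return a list of validation errors; an empty list means the file is accepted."""
    errors = []
    if not filename.lower().endswith(".pdf"):
        errors.append("file extension must be .pdf")
    if not data:
        errors.append("file is empty")
    elif not data.startswith(b"%PDF-"):
        # PDF files begin with a "%PDF-" header followed by the version.
        errors.append("file does not start with the PDF header")
    if len(data) > MAX_BYTES:
        errors.append("file exceeds size limit")
    return errors
```

Checking the header bytes as well as the extension catches renamed files that are not PDFs.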
2. Text Extraction
Worker extracts text with page-level attribution
- PyPDF2 or pdfplumber per page
- Clean and normalize text
- Preserve page boundaries
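The cleaning and page-boundary steps can be illustrated without a PDF library. The sketch below assumes the extractor has already produced one raw string per page; the specific normalization rules and the names `clean_page` and `clean_pages` are hypothetical.

```python
import re
import unicodedata

def clean_page(text):
    """Normalize one page of extracted text."""
    text = unicodedata.normalize("NFKC", text)  # fold ligatures and width variants
    text = re.sub(r"-\n(?=[a-z])", "", text)    # rejoin words hyphenated across lines
    text = re.sub(r"[ \t]+", " ", text)         # collapse runs of spaces and tabs
    text = re.sub(r"\s*\n\s*", "\n", text)      # trim whitespace around line breaks
    return text.strip()

def clean_pages(raw_pages):
    """Keep one record per page so page numbers survive cleaning."""
    return [
        {"page": number, "text": clean_page(text)}
        for number, text in enumerate(raw_pages, start=1)
    ]
```

Keeping a `page` field on every record is what later lets chunks and citations point back to page numbers.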
3. Chunking
Split text into overlapping semantic segments
- 500-800 tokens per chunk
- 100-token overlap
- Attach page_start and page_end
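A sliding window is one simple way to produce overlapping chunks that carry `page_start` and `page_end`. In this sketch tokens are whitespace-separated words, which only approximates model tokens; a real pipeline would likely count tokens with the embedding model's tokenizer. The function name and input layout are assumptions.

```python
def chunk_pages(pages, max_tokens=800, overlap=100):
    """Split page texts into overlapping chunks that remember their page range.

    pages: list of {"page": int, "text": str} records in reading order.
    """
    if overlap >= max_tokens:
        raise ValueError("overlap must be smaller than max_tokens")
    # Flatten to (token, page) pairs so each chunk can report its page span.
    tokens = [(tok, p["page"]) for p in pages for tok in p["text"].split()]
    chunks = []
    step = max_tokens - overlap
    for start in range(0, len(tokens), step):
        window = tokens[start:start + max_tokens]
        chunks.append({
            "text": " ".join(tok for tok, _ in window),
            "page_start": window[0][1],
            "page_end": window[-1][1],
        })
        if start + max_tokens >= len(tokens):
            break  # this window already reached the end of the text
    return chunks
```

Each chunk repeats the last `overlap` tokens of the previous one, so a sentence that straddles a boundary appears whole in at least one chunk.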
4. Embeddings
Generate OpenAI embeddings for semantic search
- text-embedding-3-small
- Batch processing
- Store in Qdrant vector DB
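Batch processing here means sending many chunk texts per embedding request instead of one at a time. The sketch below keeps the API call abstract: `embed_fn` is a hypothetical callable standing in for a request to the embeddings endpoint with `text-embedding-3-small`, and the batch size of 64 is an assumption.

```python
def batched(items, size):
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def embed_chunks(chunks, embed_fn, batch_size=64):
    """Return copies of the chunks with a "vector" field added.

    embed_fn: takes a list of strings and returns one vector per string,
    in the same order. It is called once per batch.
    """
    out = []
    for batch in batched(chunks, batch_size):
        vectors = embed_fn([c["text"] for c in batch])
        if len(vectors) != len(batch):
            raise ValueError("embedding count does not match batch size")
        out.extend({**c, "vector": v} for c, v in zip(batch, vectors))
    return out
```

Passing the embedding call in as a parameter also makes the worker easy to test with a fake function.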
5. Index Storage
Persist vectors and metadata for retrieval
- Qdrant with textbook_id filter
- Chunk metadata in PostgreSQL
- Content hash for dedup
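A content hash for deduplication could be computed as in the sketch below. The choice of SHA-256 over whitespace-normalized text is an assumption, as are the function names; the source says only that a content hash is used.

```python
import hashlib

def content_hash(text):
    """SHA-256 of whitespace-normalized text, as a 64-character hex string."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def dedupe(chunks):
    """Drop chunks whose normalized text was already seen, keeping first occurrences."""
    seen, unique = set(), []
    for chunk in chunks:
        digest = content_hash(chunk["text"])
        if digest not in seen:
            seen.add(digest)
            unique.append({**chunk, "content_hash": digest})
    return unique
```

Storing the hash alongside the chunk metadata would also let a re-upload of the same textbook skip chunks that are already indexed.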
6. Ready State
Mark job as READY and enable chat interface
- Update job status
- Show success in UI
- Enable Q&A functionality
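The job lifecycle across the six steps can be modeled as a small state machine. Only the READY status is named in the source; the other status names, the FAILED state, and the helper functions below are assumptions made for illustration.

```python
from enum import Enum

class JobStatus(str, Enum):
    QUEUED = "QUEUED"
    EXTRACTING = "EXTRACTING"
    CHUNKING = "CHUNKING"
    EMBEDDING = "EMBEDDING"
    INDEXING = "INDEXING"
    READY = "READY"
    FAILED = "FAILED"

# Happy-path order; FAILED could be entered from any non-terminal state.
PIPELINE = [
    JobStatus.QUEUED, JobStatus.EXTRACTING, JobStatus.CHUNKING,
    JobStatus.EMBEDDING, JobStatus.INDEXING, JobStatus.READY,
]

def advance(status):
    """Return the next pipeline status; READY and FAILED are terminal."""
    if status in (JobStatus.READY, JobStatus.FAILED):
        raise ValueError(f"{status.value} is a terminal state")
    return PIPELINE[PIPELINE.index(status) + 1]

def progress_percent(status):
    """Rough progress figure for the upload UI; FAILED reports 0."""
    if status is JobStatus.FAILED:
        return 0
    return round(100 * PIPELINE.index(status) / (len(PIPELINE) - 1))

def chat_enabled(status):
    """The chat interface is unlocked only once ingestion has finished."""
    return status is JobStatus.READY
```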
Multiple Chat Modes
Three specialized modes for different learning scenarios and student needs
Default Mode
Standard Q&A with grounded answers and citations
- Direct answers from textbook content
- Page citations for verification
- Multi-turn conversation context
- Confidence scoring
Exam-Style Mode
Practice questions and test preparation support
- Generate practice questions
- Explain question-solving strategy
- Identify key concepts to review
- Prepare for assessments
ELI10 Mode
Simplified explanations for younger students
- Age-appropriate language
- Analogies and simple examples
- Break down complex concepts
- Encourage curiosity
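A common way to implement modes like these is to vary the system prompt while keeping retrieval the same. The sketch below is an assumption about how that could work: the instruction wording, the mode keys, and `build_system_prompt` are all hypothetical.

```python
MODE_INSTRUCTIONS = {
    "default": "Answer the question directly using only the sources below.",
    "exam": (
        "Write exam-style practice questions from the sources below, "
        "explain how to approach each one, and list key concepts to review."
    ),
    "eli10": (
        "Explain the answer for a ten-year-old using simple words, "
        "short sentences, and an everyday analogy."
    ),
}

def build_system_prompt(mode, chunks):
    """Combine mode-specific instructions with the retrieved, page-labeled sources."""
    if mode not in MODE_INSTRUCTIONS:
        raise ValueError(f"unknown chat mode: {mode}")
    sources = "\n\n".join(
        f"[pp. {c['page_start']}-{c['page_end']}] {c['text']}" for c in chunks
    )
    return (
        f"{MODE_INSTRUCTIONS[mode]}\n"
        "Cite page numbers for every claim. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{sources}"
    )
```

Because every mode shares the same citation rule and source block, switching modes changes the tone of the answer without loosening its grounding.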
Chat Features
User Experience
- Message list with user and assistant roles
- Mode selector for easy switching
- Streaming response indicator
- Citation chips for source references
Technical Details
- SSE streaming for real-time tokens
- Conversation persistence in PostgreSQL
- Citation tracking with page ranges
- Source panel with snippet preview
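SSE streaming sends the answer as a sequence of text frames, each made of an `event:` line, a `data:` line, and a blank line. The sketch below shows one possible framing; the event names (`token`, `citations`, `done`) and the payload shapes are assumptions, not the platform's documented protocol.

```python
import json

def sse_event(event, payload):
    """Encode one Server-Sent Events frame with a JSON payload."""
    # json.dumps escapes newlines, so the payload always fits on one data line.
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"

def stream_answer(tokens, citations):
    """Yield token frames as they arrive, then the citations, then a done marker."""
    for token in tokens:
        yield sse_event("token", {"text": token})
    yield sse_event("citations", {"citations": citations})
    yield sse_event("done", {})
```

In a FastAPI backend a generator like this could be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`, and the frontend could render citation chips when the `citations` event arrives.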
Key Benefits
Transform textbook learning with instant, accurate, and personalized question answering
Instant Answers
Students get immediate responses to textbook questions without flipping through pages
Grounded & Accurate
Every answer cites specific textbook pages, so students can verify it against the source; grounding answers in retrieved text reduces the risk of hallucination
Adaptive Learning
Three modes support different learning styles from exam prep to simplified explanations
Multi-User Support
Per-user textbook isolation with JWT auth ensures private and secure learning environments