Semantic Search for Lookalike Discovery
Build a powerful similarity search engine that finds lookalike companies and leads using embeddings and vector databases. Enable "similar accounts", semantic search, and smart recommendations.
Scope Overview
A small, focused service that stores embeddings for company and lead text in a vector database, enabling semantic similarity search for lookalike discovery and smart recommendations.
Must-Have Features
- Create embeddings for text (company summary, description, title)
- Upsert vectors with metadata (company_id, domain, industry, country)
- Query vectors to find top K similar results
- Filter by metadata (country, industry, employee range)
- Basic management endpoints (delete, reindex)
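The metadata filtering listed above can be sketched as a small translation step. This is a minimal sketch, assuming filters arrive as a mapping from field name to a list of allowed values (the shape used in the search request example under FastAPI Endpoints) and that the service passes Qdrant a filter in its JSON form with `must` conditions and `match`/`any`. The function name `build_qdrant_filter` is hypothetical.

```python
from typing import Any, Mapping, Optional, Sequence


def build_qdrant_filter(
    filters: Mapping[str, Sequence[str]],
) -> Optional[dict[str, Any]]:
    """Translate {"country": ["US"], ...} into a Qdrant-style filter dict.

    Every field becomes one `must` condition, so all fields have to match
    (logical AND), while any value inside a field's list is accepted.
    """
    must = [
        {"key": field, "match": {"any": list(values)}}
        for field, values in filters.items()
        if values  # assumption: an empty list means "no restriction"
    ]
    # Returning None lets the caller skip the filter entirely.
    return {"must": must} if must else None


if __name__ == "__main__":
    print(build_qdrant_filter({"country": ["US"], "industry": ["SaaS"]}))
```

An employee range filter would work the same way if the range is stored as a string bucket such as `employee_bucket`.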
Nice-to-Have (Later)
- Hybrid search combining keyword and vector search
- Multi-collection support (companies vs people)
- Reranking and deduplication of results
- Diversity scoring for varied recommendations
Architecture
A simple three-tier architecture with FastAPI as the API layer, Qdrant for vector storage, and flexible embedding model integration.
FastAPI Service
Vector API wrapper
Qdrant VectorDB
Simple and fast vector database
Embedding Model
Text to vector conversion
MySQL (Optional)
Store entities and canonical text
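One way to keep the embedding model swappable is to hide it behind a small interface. The sketch below is an assumption, not part of the described service: `Embedder` and `HashingEmbedder` are hypothetical names, and the hashing embedder is only a deterministic stand-in for local testing. It does not capture meaning the way a real embedding model does.

```python
import hashlib
import math
from typing import Protocol


class Embedder(Protocol):
    """Anything with an embed(text) method that returns a float vector."""

    def embed(self, text: str) -> list[float]: ...


class HashingEmbedder:
    """Toy embedder: hashes each token into a bucket, then L2-normalizes."""

    def __init__(self, dim: int = 64) -> None:
        self.dim = dim

    def embed(self, text: str) -> list[float]:
        vec = [0.0] * self.dim
        for token in text.lower().split():
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            bucket = int.from_bytes(digest[:4], "big") % self.dim
            vec[bucket] += 1.0
        norm = math.sqrt(sum(v * v for v in vec))
        # Empty text yields an all-zero vector rather than dividing by zero.
        return [v / norm for v in vec] if norm else vec


if __name__ == "__main__":
    embedder: Embedder = HashingEmbedder(dim=8)
    print(embedder.embed("B2B SaaS analytics platform"))
```

A hosted or local model can then be wrapped in a class with the same `embed` method without touching the rest of the service.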
Data You Embed
Formatting company and lead text the same way every time keeps embeddings comparable, which improves similarity search results.
Company Embedding Text
Fields to concatenate:
company_name, domain, summary, industry, keywords, tech_stack
Lead Embedding Text
Fields to concatenate:
title, seniority, department, company_summary, skills/keywords
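The field concatenation described above could look like the following. This is a minimal sketch under assumptions: the "label: value" line format, the fixed field order, and the helper name `build_embedding_text` are illustrative choices, and the lead field `skills` stands in for "skills/keywords".

```python
from typing import Any, Iterable, Mapping

# Field order is fixed so the same record always produces the same text.
COMPANY_FIELDS = ("company_name", "domain", "summary", "industry", "keywords", "tech_stack")
LEAD_FIELDS = ("title", "seniority", "department", "company_summary", "skills")


def _as_text(value: Any) -> str:
    """Render a field value as text; lists become comma-separated strings."""
    if value is None:
        return ""
    if isinstance(value, (list, tuple)):
        return ", ".join(str(v).strip() for v in value if str(v).strip())
    return str(value).strip()


def build_embedding_text(record: Mapping[str, Any], fields: Iterable[str]) -> str:
    """Join the chosen fields as 'name: value' lines, skipping empty ones."""
    lines = []
    for name in fields:
        text = _as_text(record.get(name))
        if text:
            lines.append(f"{name}: {text}")
    return "\n".join(lines)


if __name__ == "__main__":
    company = {
        "company_name": "Acme",
        "domain": "acme.com",
        "summary": "Analytics platform for mid-market teams",
        "industry": "SaaS",
        "keywords": ["b2b", "analytics"],
    }
    print(build_embedding_text(company, COMPANY_FIELDS))
```

Skipping empty fields avoids embedding bare labels, which would otherwise make sparse records look artificially similar to each other.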
Qdrant Collections
Define vector collections with appropriate payload metadata for filtering and retrieval.
companies_vectors
Store company embeddings with business metadata
Payload Metadata:
- company_id (string): Unique company identifier
- domain (string): Company website domain
- industry (string): Industry classification
- country (string): Company location
- employee_bucket (string): Size range (e.g., '51-200')
- source (string): Data source identifier
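Qdrant point IDs must be unsigned integers or UUIDs, so a string identifier such as "c123" cannot be used as the point ID directly. The sketch below shows one possible approach: derive a deterministic UUID from company_id and keep the original ID in the payload. The namespace value and the names `CompanyPayload`, `point_id_for`, and `to_point` are assumptions for illustration.

```python
import uuid
from dataclasses import asdict, dataclass
from typing import Any

# Hypothetical namespace; fixed so a company_id always maps to the same point ID.
COMPANY_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "companies_vectors.example")


@dataclass(frozen=True)
class CompanyPayload:
    company_id: str
    domain: str
    industry: str
    country: str
    employee_bucket: str
    source: str


def point_id_for(company_id: str) -> str:
    """Deterministic UUID (version 5) derived from the company ID."""
    return str(uuid.uuid5(COMPANY_NAMESPACE, company_id))


def to_point(payload: CompanyPayload, vector: list[float]) -> dict[str, Any]:
    """Build a point as a plain dict: id, vector, and payload metadata."""
    return {
        "id": point_id_for(payload.company_id),
        "vector": vector,
        "payload": asdict(payload),
    }


if __name__ == "__main__":
    acme = CompanyPayload("c123", "acme.com", "SaaS", "US", "51-200", "crm")
    print(to_point(acme, [0.1, 0.2, 0.3]))
```

Because the ID is derived rather than random, upserting the same company twice overwrites the existing point instead of creating a duplicate.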
leads_vectors
Store lead/person embeddings with contact metadata (optional)
Payload Metadata:
- lead_id (string): Unique lead identifier
- email_domain (string): Work email domain
- title (string): Job title
- country (string): Lead location
- score_tier (string): ICP tier (A/B/C/D)
FastAPI Endpoints
RESTful API for indexing, searching, and managing vector embeddings with metadata filtering.
/vectors/companies/upsert
Index a company text with metadata
Request:
{
  "company_id": "c123",
  "text": "Company: Acme... Summary: ...",
  "metadata": {
    "domain": "acme.com",
    "industry": "SaaS",
    "country": "US",
    "employee_bucket": "51-200"
  }
}
Response:
{
  "ok": true
}
/vectors/companies/search
Semantic search with filters
Request:
{
  "query_text": "B2B SaaS analytics platform for mid-market",
  "top_k": 10,
  "filters": {
    "country": ["US"],
    "industry": ["SaaS"]
  }
}
Response:
{
  "matches": [
    {
      "company_id": "c123",
      "score": 0.82,
      "metadata": {
        "domain": "acme.com"
      }
    }
  ]
}
/vectors/companies/similar/:company_id
Find companies similar to a given ID
Response:
{
  "matches": [
    {
      "company_id": "c456",
      "score": 0.78,
      "metadata": {
        "domain": "analytics-co.com"
      }
    }
  ]
}
/vectors/companies/:company_id
Remove company vector from index
Response:
{
  "ok": true,
  "deleted": "c123"
}
Embedding Flow
Simple six-step process from text input to similarity search results with metadata filtering.
Receive Text
FastAPI receives company/lead text input from client
Create Embedding
Call embed(text) to convert text into float vector
Upsert to Qdrant
Store vector in Qdrant with payload metadata
Search Query
Embed query text and perform vector similarity search
Apply Filters
Filter results by metadata (country, industry, etc.)
Return Results
Return top K similar entities with scores
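The upsert, search, filter, and return steps above can be sketched with a tiny in-memory index. This is a stand-in for Qdrant under stated assumptions: the class name `InMemoryIndex`, the brute-force cosine scoring, and the filter shape (field name to list of allowed values) are all illustrative, and vectors are assumed to come from an embed(text) call.

```python
import math
from typing import Any, Mapping, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class InMemoryIndex:
    """Brute-force stand-in for a vector collection (not for production)."""

    def __init__(self) -> None:
        self._points: dict[str, tuple[list[float], dict[str, Any]]] = {}

    def upsert(self, point_id: str, vector: Sequence[float], payload: Mapping[str, Any]) -> None:
        self._points[point_id] = (list(vector), dict(payload))

    def delete(self, point_id: str) -> bool:
        return self._points.pop(point_id, None) is not None

    def search(
        self,
        vector: Sequence[float],
        top_k: int = 10,
        filters: Optional[Mapping[str, Sequence[str]]] = None,
        exclude_id: Optional[str] = None,
    ) -> list[dict[str, Any]]:
        matches = []
        for pid, (vec, payload) in self._points.items():
            if pid == exclude_id:
                continue
            # Filter before ranking so top_k is filled from eligible points only.
            if filters and any(
                values and payload.get(field) not in values
                for field, values in filters.items()
            ):
                continue
            matches.append({"company_id": pid, "score": cosine(vector, vec), "metadata": payload})
        matches.sort(key=lambda m: m["score"], reverse=True)
        return matches[:top_k]

    def similar(self, point_id: str, top_k: int = 10, filters=None) -> list[dict[str, Any]]:
        """Lookalikes for a stored point, excluding the point itself."""
        vector, _ = self._points[point_id]
        return self.search(vector, top_k, filters, exclude_id=point_id)


if __name__ == "__main__":
    index = InMemoryIndex()
    index.upsert("c123", [1.0, 0.0], {"country": "US", "industry": "SaaS"})
    index.upsert("c456", [0.9, 0.1], {"country": "US", "industry": "SaaS"})
    print(index.similar("c123", top_k=1, filters={"country": ["US"]}))
```

Filtering before ranking matters: filtering only after taking the top K could return fewer than K results even when enough eligible points exist.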
Optional React UI
A minimal testing interface for querying vectors and viewing similarity results.
UI Components
Query Input
Example: "B2B SaaS analytics platform for mid-market"
Filter Controls
Results Display
Similar Companies Button
Production Benefits
Deploy a production-ready vector similarity search engine that powers lookalike discovery and smart recommendations.
Key Benefits
- Index 1,000+ company texts with consistent embeddings
- Semantic search returns contextually relevant similar companies
- Metadata filters work correctly (country, industry, size)
- Find similar companies by ID for lookalike discovery
- Fast query performance with Qdrant vector database
- Flexible embedding model integration (OpenAI, local, etc.)
- Optional lead/person similarity search support
- RESTful API ready for integration with existing systems
Done Criteria
- Successfully index 1,000 company texts
- Search returns relevant similar companies
- Filters work correctly (e.g., only US SaaS)
- Similar companies endpoint works by company_id
Ready to Get Started?
A Docker Compose setup with FastAPI, Qdrant, and sample data is available, so similarity search can be tested immediately.