Vector Similarity Search

Semantic Search for Lookalike Discovery

Build a powerful similarity search engine that finds lookalike companies and leads using embeddings and vector databases. Enable "similar accounts", semantic search, and smart recommendations.

Embedding-Powered
Qdrant VectorDB
Metadata Filtering
Query: "B2B SaaS analytics" (top 10 similar)

  • Acme Analytics (SaaS): similarity 0.92
  • DataFlow Inc (SaaS): similarity 0.88
  • MetricCo (SaaS): similarity 0.85

Scope Overview

A small, focused service that stores embeddings for company and lead text in a vector database, enabling semantic similarity search for lookalike discovery and smart recommendations.

Must-Have Features

  • Create embeddings for text (company summary, description, title)
  • Upsert vectors with metadata (company_id, domain, industry, country)
  • Query vectors to find top K similar results
  • Filter by metadata (country, industry, employee range)
  • Basic management endpoints (delete, reindex)

Nice-to-Have (Later)

  • Hybrid search combining keyword and vector search
  • Multi-collection support (companies vs people)
  • Reranking and deduplication of results
  • Diversity scoring for varied recommendations

Architecture

A simple architecture with three core components: FastAPI as the API layer, Qdrant for vector storage, and a pluggable embedding model. MySQL is an optional fourth component for storing entities and their canonical text.

FastAPI Service

Vector API wrapper

FastAPI
Python
REST API

Qdrant VectorDB

Simple and fast vector database

Qdrant
Vector Storage
Similarity Search

Embedding Model

Text to vector conversion

OpenAI
Local Models
Sentence Transformers

MySQL (Optional)

Store entities and canonical text

MySQL
Entity Storage
Text Corpus

Data You Embed

Formatting company and lead text the same way for every record keeps the embeddings comparable, which improves similarity search results.

Company Embedding Text

Example concatenation:
Company: Acme Analytics
Domain: acme.com
Summary: Leading B2B analytics platform...
Industry: SaaS / Analytics
Keywords: data pipeline, BI, dashboards
Tech: aws, snowflake, dbt

Fields to concatenate:

  • company_name
  • domain
  • summary
  • industry
  • keywords
  • tech_stack
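
The concatenation above could be produced by a small helper like the following. This is a minimal sketch: the function name `build_company_text` and the choice to skip empty fields are assumptions, not part of the original spec. The labels and field order follow the example.

```python
from typing import Mapping

# (label, field) pairs in the order used by the example concatenation.
COMPANY_FIELDS = (
    ("Company", "company_name"),
    ("Domain", "domain"),
    ("Summary", "summary"),
    ("Industry", "industry"),
    ("Keywords", "keywords"),
    ("Tech", "tech_stack"),
)


def build_company_text(company: Mapping[str, object]) -> str:
    """Join the known fields into one newline-separated string.

    Missing or empty fields are skipped (an assumption) so that absent
    data does not add noise such as "Keywords: " to the embedded text.
    """
    lines = []
    for label, key in COMPANY_FIELDS:
        value = company.get(key)
        if isinstance(value, (list, tuple)):
            # Lists such as keywords become a comma-separated string.
            value = ", ".join(str(v).strip() for v in value if str(v).strip())
        elif value is not None:
            # Collapse runs of whitespace, including newlines.
            value = " ".join(str(value).split())
        if value:
            lines.append(f"{label}: {value}")
    return "\n".join(lines)
```

Using one helper for both indexing and reindexing keeps every stored vector built from the same template.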

Lead Embedding Text

Optional second collection:
Title: VP of Engineering
Seniority: Director+
Department: Engineering
Company: Acme Analytics (SaaS)
Skills: cloud architecture, AWS

Fields to concatenate:

  • title
  • seniority
  • department
  • company_summary
  • skills/keywords

Qdrant Collections

Define vector collections with appropriate payload metadata for filtering and retrieval.

companies_vectors

Store company embeddings with business metadata

Vector size: Embedding dimension, fixed by the chosen model (e.g., 1536 for OpenAI's text-embedding-3-small)

Payload Metadata:

  • company_id (string): Unique company identifier
  • domain (string): Company website domain
  • industry (string): Industry classification
  • country (string): Company location
  • employee_bucket (string): Size range (e.g., '51-200')
  • source (string): Data source identifier
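
As a rough illustration, the collection could be created through Qdrant's REST API with requests like the ones built below. The helper names are hypothetical, and three points are assumptions rather than part of the spec: cosine distance, the 1536 dimension, and keyword payload indexes on the filterable fields. The sketch only builds the request data, so any HTTP client can send it.

```python
from typing import Iterable, List, Tuple

Request = Tuple[str, str, dict]  # (HTTP method, path, JSON body)


def create_collection_request(name: str, vector_size: int,
                              distance: str = "Cosine") -> Request:
    """Build the create-collection call for a single unnamed vector."""
    if vector_size <= 0:
        raise ValueError("vector_size must be positive")
    body = {"vectors": {"size": vector_size, "distance": distance}}
    return "PUT", f"/collections/{name}", body


def payload_index_requests(name: str, fields: Iterable[str]) -> List[Request]:
    """Build one keyword-index call per payload field used in filters."""
    return [
        ("PUT", f"/collections/{name}/index",
         {"field_name": field, "field_schema": "keyword"})
        for field in fields
    ]


# Assumed setup: 1536-dimensional vectors and the three filterable fields.
setup = [create_collection_request("companies_vectors", 1536)]
setup += payload_index_requests(
    "companies_vectors", ["country", "industry", "employee_bucket"]
)
```

The vector size has to match the embedding model exactly. Switching to a model with a different dimension means creating a new collection and reindexing.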

leads_vectors

Store lead/person embeddings with contact metadata (optional)

Vector size: Embedding dimension (the same as company vectors when both collections use the same model)

Payload Metadata:

  • lead_id (string): Unique lead identifier
  • email_domain (string): Work email domain
  • title (string): Job title
  • country (string): Lead location
  • score_tier (string): ICP tier (A/B/C/D)

FastAPI Endpoints

RESTful API for indexing, searching, and managing vector embeddings with metadata filtering.

POST /vectors/companies/upsert

Index a company text with metadata

Request:
{
  "company_id": "c123",
  "text": "Company: Acme... Summary: ...",
  "metadata": {
    "domain": "acme.com",
    "industry": "SaaS",
    "country": "US",
    "employee_bucket": "51-200"
  }
}
Response:
{
  "ok": true
}
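
One detail worth sketching for the upsert path: Qdrant point IDs must be unsigned integers or UUIDs, so a string such as "c123" cannot be used as the point ID directly. The sketch below derives a deterministic UUID from `company_id` and keeps the original ID in the payload. The namespace and helper names are hypothetical, and the vector is assumed to come from whichever embedding model is configured.

```python
import uuid
from typing import List

# Hypothetical fixed namespace: the same company_id always maps to the
# same point ID, so repeated upserts overwrite instead of duplicating.
COMPANY_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "companies_vectors")


def point_id_for(company_id: str) -> str:
    """Map an application-level company_id to a stable UUID string."""
    return str(uuid.uuid5(COMPANY_NAMESPACE, company_id))


def build_upsert_body(request: dict, vector: List[float]) -> dict:
    """Turn the API request shown above into a Qdrant upsert body.

    The body is intended for PUT /collections/companies_vectors/points.
    """
    company_id = request["company_id"]
    payload = {"company_id": company_id, **request.get("metadata", {})}
    return {
        "points": [
            {"id": point_id_for(company_id), "vector": vector, "payload": payload}
        ]
    }
```

Storing `company_id` in the payload lets search responses return the original identifier rather than the derived UUID.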
POST /vectors/companies/search

Semantic search with filters

Request:
{
  "query_text": "B2B SaaS analytics platform for mid-market",
  "top_k": 10,
  "filters": {
    "country": [
      "US"
    ],
    "industry": [
      "SaaS"
    ]
  }
}
Response:
{
  "matches": [
    {
      "company_id": "c123",
      "score": 0.82,
      "metadata": {
        "domain": "acme.com"
      }
    }
  ]
}
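
The `filters` object in the request maps naturally onto a Qdrant filter. A minimal sketch of that translation follows. The function names are hypothetical, and it assumes each filter field holds a list of accepted values, that multiple fields are combined with AND, and that the query text has already been embedded.

```python
from typing import Dict, List, Optional, Sequence


def build_qdrant_filter(filters: Optional[Dict[str, List[str]]]) -> Optional[dict]:
    """Translate {"country": ["US"], ...} into a Qdrant `must` filter.

    Each field becomes a "match any of these values" condition.
    Fields with an empty list are ignored.
    """
    must = [
        {"key": field, "match": {"any": list(values)}}
        for field, values in (filters or {}).items()
        if values
    ]
    return {"must": must} if must else None


def build_search_body(query_vector: Sequence[float], top_k: int = 10,
                      filters: Optional[Dict[str, List[str]]] = None) -> dict:
    """Build a body for POST /collections/companies_vectors/points/search."""
    body = {"vector": list(query_vector), "limit": top_k, "with_payload": True}
    qdrant_filter = build_qdrant_filter(filters)
    if qdrant_filter is not None:
        body["filter"] = qdrant_filter
    return body
```

For the request above, `build_search_body(vec, 10, {"country": ["US"], "industry": ["SaaS"]})` yields two `must` conditions, so only US SaaS companies are considered.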
POST /vectors/companies/similar/:company_id

Find companies similar to a given ID

Response:
{
  "matches": [
    {
      "company_id": "c456",
      "score": 0.78,
      "metadata": {
        "domain": "analytics-co.com"
      }
    }
  ]
}
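
One possible way to back this endpoint is Qdrant's recommend call, which searches using the stored vector of an existing point, so the source company does not need to be embedded again. The sketch below only builds the request. The helper names are hypothetical, the ID mapping is assumed to be the same deterministic UUID scheme used when the points were upserted, and the endpoint should be checked against the Qdrant version in use.

```python
import uuid
from typing import Tuple

# Must match the namespace used at upsert time (hypothetical value).
COMPANY_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "companies_vectors")


def point_id_for(company_id: str) -> str:
    """Map a company_id to the UUID under which its vector was stored."""
    return str(uuid.uuid5(COMPANY_NAMESPACE, company_id))


def build_similar_request(company_id: str, top_k: int = 10,
                          collection: str = "companies_vectors") -> Tuple[str, str, dict]:
    """Build (method, path, body) for a 'similar to this company' lookup."""
    body = {
        "positive": [point_id_for(company_id)],  # the example to search from
        "limit": top_k,
        "with_payload": True,  # payload carries the original company_id
    }
    return "POST", f"/collections/{collection}/points/recommend", body
```

An alternative with the same result is to fetch the stored vector for the company and pass it to the regular search call, dropping the source company from the matches.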
DELETE /vectors/companies/:company_id

Remove company vector from index

Response:
{
  "ok": true,
  "deleted": "c123"
}

Embedding Flow

Simple six-step process from text input to similarity search results with metadata filtering.

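1. Receive Text

FastAPI receives company or lead text from the client.

2. Create Embedding

Call embed(text) to convert the text into a vector of floats.

3. Upsert to Qdrant

Store the vector in Qdrant together with its payload metadata.

4. Search Query

Embed the query text and run a vector similarity search.

5. Apply Filters

Restrict matches by metadata (country, industry, etc.). In Qdrant the filter is passed along with the search request, so it is applied during the search rather than as a separate pass over the results.

6. Return Results

Return the top K similar entities with their scores.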
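
To make the flow concrete, here is a self-contained toy version that runs without Qdrant or a real model. Everything in it is a stand-in: `embed` is a hashed bag-of-words placeholder for an actual embedding model, and `InMemoryIndex` is a hypothetical class that imitates upsert and filtered search with a plain dictionary.

```python
import hashlib
import math
from typing import Dict, List, Optional, Tuple

DIM = 256  # toy dimension; a real model fixes this (e.g., 1536)


def embed(text: str) -> List[float]:
    """Toy embedding: hash each token into a bucket, then L2-normalise."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


class InMemoryIndex:
    """Stand-in for a vector collection: entity_id -> (vector, metadata)."""

    def __init__(self) -> None:
        self._points: Dict[str, Tuple[List[float], dict]] = {}

    def upsert(self, entity_id: str, text: str, metadata: dict) -> None:
        # Steps 1-3: receive text, embed it, store vector plus payload.
        self._points[entity_id] = (embed(text), dict(metadata))

    def search(self, query_text: str, top_k: int = 10,
               filters: Optional[Dict[str, List[str]]] = None) -> List[dict]:
        query = embed(query_text)  # step 4: embed the query
        matches = []
        for entity_id, (vector, metadata) in self._points.items():
            # Step 5: skip points whose metadata fails any filter.
            if filters and any(metadata.get(key) not in allowed
                               for key, allowed in filters.items()):
                continue
            # Dot product equals cosine similarity for unit-length vectors.
            score = sum(a * b for a, b in zip(query, vector))
            matches.append({"company_id": entity_id,
                            "score": round(score, 4),
                            "metadata": metadata})
        # Step 6: highest scores first, truncated to top K.
        matches.sort(key=lambda m: m["score"], reverse=True)
        return matches[:top_k]


index = InMemoryIndex()
index.upsert("c123", "B2B SaaS analytics platform", {"country": "US", "industry": "SaaS"})
index.upsert("c456", "Organic bakery and cafe", {"country": "US", "industry": "Food"})
index.upsert("c789", "B2B SaaS analytics dashboards", {"country": "DE", "industry": "SaaS"})
results = index.search("B2B SaaS analytics platform", top_k=2, filters={"country": ["US"]})
```

Here the filter removes the German company before ranking, and the US company whose text matches the query ranks first. A real deployment would replace `embed` with a model call and `InMemoryIndex` with Qdrant.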

Optional React UI

A minimal testing interface for querying vectors and viewing similarity results.

UI Components

Query Input

Large textarea for entering semantic search query
Example: "B2B SaaS analytics platform for mid-market"

Filter Controls

Dropdown filters for country, industry, employee size
Country: US
Industry: SaaS
Size: 51-200

Results Display

List view showing company_id, domain, and similarity score
c123 - acme.com (0.92)
c456 - dataflow.io (0.88)

Similar Companies Button

"Find similar to company X" button for lookalike discovery

Production Benefits

Deploy a production-ready vector similarity search engine that powers lookalike discovery and smart recommendations.

Key Benefits

  • Index 1,000+ company texts with consistent embeddings
  • Semantic search returns contextually relevant similar companies
  • Metadata filters work correctly (country, industry, size)
  • Find similar companies by ID for lookalike discovery
  • Fast query performance with Qdrant vector database
  • Flexible embedding model integration (OpenAI, local, etc.)
  • Optional lead/person similarity search support
  • RESTful API ready for integration with existing systems

Done Criteria

  • Successfully index 1,000 company texts
  • Search returns relevant similar companies
  • Filters work correctly (e.g., only US SaaS)
  • Similar companies endpoint works by company_id

Ready to Get Started?

A Docker Compose setup is available with FastAPI, Qdrant, and sample data, so similarity search can be tested immediately.
