
ZeroTrace OSINT

Photo Clustering

Perceptual-hash clustering across many photos to find "same image, different source" matches — without face recognition.

Photo clustering takes a set of photos (URLs, file paths, or both) and groups them by perceptual-hash distance. Photos whose hashes are close to each other are visually similar: usually the same image, possibly after re-encoding, cropping, or minor modification.
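To make "perceptual hash" concrete, here is a minimal sketch of two of the hashes this page mentions, aHash and dHash, written against a tiny grayscale grid. It assumes the image has already been decoded, converted to grayscale, and downscaled (8x8 for aHash, 8 rows by 9 columns for dHash), which in practice needs an image library. The function names are illustrative, and the toolkit's own implementation may differ. pHash, which uses a frequency transform, is left out to keep the sketch short.

```python
def ahash(gray):
    """Average hash: one bit per pixel, set when the pixel is above the mean.

    `gray` is an 8x8 grid (list of rows) of brightness values.
    """
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def dhash(gray):
    """Difference hash: one bit per horizontal neighbour pair.

    `gray` is an 8-row by 9-column grid, giving 8 x 8 = 64 comparisons.
    """
    bits = 0
    for row in gray:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a, b):
    """Number of bit positions in which two integer hashes differ."""
    return bin(a ^ b).count("1")


# A left-to-right gradient, and the same gradient uniformly brightened.
gradient = [[(8 - c) * 10 + r for c in range(9)] for r in range(8)]
brighter = [[p + 10 for p in row] for row in gradient]

# dHash only looks at relative brightness, so the two hash identically.
print(hamming(dhash(gradient), dhash(brighter)))  # 0
```

Because dHash encodes only whether each pixel is brighter than its neighbour, a uniform brightness shift leaves the hash unchanged, which is the kind of robustness that lets re-encoded or lightly adjusted copies land close together.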

It is the toolkit's answer to "is this the same person on these different platforms?" — without face recognition, without biometrics, just image-content hashing.

What you get

For a set of photos:

| Section | What it surfaces |
| --- | --- |
| Cluster table | Photos grouped by perceptual-hash distance |
| Per-photo hashes | pHash, dHash, aHash for each photo |
| Pairwise distance matrix | Hamming distance between every pair |
| Cross-source clusters | Highlighted clusters that span multiple URLs or platforms (the high-value matches) |
| Side-by-side compare | Open any cluster in a side-by-side viewer |

Why hash-based, not face-based

Face recognition is the obvious tool for "same person across photos." The toolkit does not ship it for two reasons:

  1. Legal and ethical. Face recognition is regulated in many jurisdictions and ethically fraught everywhere. The toolkit is for finding information people have published, not for identifying people from photos.
  2. Hash-based works for the most useful case. People reuse the same selfie across platforms because uploading a new one is friction. Hash-based matching catches the reuse cases that matter most for cross-platform identity confirmation.

Hash-based matching does not work for "different photos of the same face." That is the face-recognition case the toolkit deliberately does not address.

Threshold

The default Hamming-distance threshold (8 bits, on a 64-bit hash) catches:

  • Identical images.
  • Images re-encoded at the same or similar quality.
  • Images cropped slightly.
  • Images with minor colour adjustments.

The threshold is configurable. A tighter threshold (4 bits) returns only near-identical matches. A looser threshold (16 bits) catches more dramatic variations but increases the false-positive rate.
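As a rough illustration of how a Hamming-distance threshold turns hashes into clusters, the sketch below groups photos whose 64-bit hashes differ by at most `threshold` bits. The single-linkage grouping via union-find, the inclusive `<=` comparison, and the names `hamming` and `cluster` are assumptions made for the example, not a description of the tool's actual algorithm.

```python
def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def cluster(hashes, threshold=8):
    """Group photos whose hashes are within `threshold` bits of each other.

    `hashes` maps a photo identifier to its 64-bit hash (as an int).
    Single-linkage grouping (assumed): a chain of close pairs ends up
    in one cluster even if its two ends are further apart.
    """
    parent = {name: name for name in hashes}

    def find(x):
        # Walk up to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    names = list(hashes)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if hamming(hashes[a], hashes[b]) <= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    # Largest clusters first.
    return sorted(groups.values(), key=len, reverse=True)


hashes = {
    "a.jpg": 0xFFFF0000FFFF0000,
    "b.jpg": 0xFFFF0000FFFF0007,  # 3 bits away from a.jpg
    "c.jpg": 0x0000FFFF0000FFFF,  # 64 bits away from a.jpg
}
print(cluster(hashes))               # [['a.jpg', 'b.jpg'], ['c.jpg']]
print(cluster(hashes, threshold=2))  # three singleton clusters
```

With the default of 8 bits, the 3-bit pair is grouped. Tightening the threshold to 2 splits it apart, which mirrors the trade-off between missed matches and false positives described above.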

Cross-source clusters — the high-value finds

A cluster of two photos from the same Instagram account is uninteresting (the user uploaded the same photo twice).

A cluster of two photos — one from Instagram, one from a personal blog — is high-signal evidence of cross-platform identity overlap.

The tool sorts clusters so cross-source ones surface first.
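One way to picture that ordering is the sketch below. It treats the URL host as the "source" and labels every file path as `local`. Both choices are assumptions for illustration; the tool may define a source differently, for example by mapping CDN hosts back to their platform. The helper names are hypothetical.

```python
from urllib.parse import urlparse


def source_of(item):
    """Return the host for a URL, or 'local' for a file path."""
    host = urlparse(item).hostname
    return host.lower() if host else "local"


def is_cross_source(cluster):
    """True when the cluster's photos come from more than one source."""
    return len({source_of(item) for item in cluster}) > 1


def rank_clusters(clusters):
    """Cross-source clusters first, then larger clusters first."""
    return sorted(clusters, key=lambda c: (not is_cross_source(c), -len(c)))


clusters = [
    ["https://instagram.com/a.jpg", "https://instagram.com/b.jpg",
     "https://instagram.com/c.jpg"],
    ["https://instagram.com/a.jpg", "https://blog.example.org/me.png"],
]
for c in rank_clusters(clusters):
    print(is_cross_source(c), c)
```

The two-photo cluster spanning Instagram and the blog is listed ahead of the larger single-host cluster, because the cross-source flag is the first sort key.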

Photo clustering pairs naturally with the username search. The username-search tool captures profile photos for found accounts; pasting that list of profile-photo URLs into the clustering tool tells you which platforms share images.

Inputs

| Input mode | Use when |
| --- | --- |
| Local files | You have downloaded the photos already |
| URLs | Profile photos from username-search results, photo URLs from any web source |
| Mixed | Drop in a folder + paste a URL list |

The tool fetches URL inputs over HTTPS (with rate-limiting per host). Local files are read from disk.
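The following sketch shows one plausible way to separate a mixed input list and to space out requests per host. It performs no network access: it only plans start offsets. The one-second delay, the decision to set aside plain `http://` inputs, and the function names are all assumptions for the example, since this page does not state the tool's actual limits or behaviour.

```python
from urllib.parse import urlparse


def split_inputs(items):
    """Split inputs into HTTPS URLs, local paths, and skipped non-HTTPS URLs."""
    urls, files, skipped = [], [], []
    for item in items:
        parsed = urlparse(item)
        if parsed.scheme == "https" and parsed.hostname:
            urls.append(item)
        elif parsed.scheme == "http":
            skipped.append(item)  # assumed policy: HTTPS only
        else:
            files.append(item)
    return urls, files, skipped


def schedule(urls, per_host_delay=1.0):
    """Give each URL a start offset in seconds so that requests to the
    same host are at least `per_host_delay` apart."""
    next_slot = {}
    plan = []
    for url in urls:
        host = urlparse(url).hostname
        start = next_slot.get(host, 0.0)
        plan.append((start, url))
        next_slot[host] = start + per_host_delay
    return sorted(plan, key=lambda entry: entry[0])


urls, files, skipped = split_inputs([
    "https://a.example/1.jpg",
    "https://a.example/2.jpg",
    "https://b.example/3.jpg",
    "photos/me.png",
])
for start, url in schedule(urls, per_host_delay=2.0):
    print(start, url)
```

Requests to different hosts can start at the same time, while the second request to `a.example` waits for its host's delay to elapse.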

Pairwise distance matrix

For small input sets (under 50 photos), the tool renders a pairwise-distance heatmap. Cells are colour-graded by distance: dark means close (likely the same image), light means far (different images).

For larger input sets, the matrix is too large to render usefully; the cluster table replaces it.
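A minimal sketch of the matrix behind the heatmap, assuming hashes are held as integers. The cutoff constant reflects the "under 50 photos" rule stated above; the function names are illustrative and not the tool's API.

```python
def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def distance_matrix(hashes):
    """Symmetric matrix of Hamming distances for a list of integer hashes."""
    n = len(hashes)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = hamming(hashes[i], hashes[j])
            matrix[i][j] = matrix[j][i] = d  # distance is symmetric
    return matrix


HEATMAP_LIMIT = 50  # heatmap is rendered only for fewer photos than this


def should_render_heatmap(n_photos, limit=HEATMAP_LIMIT):
    return n_photos < limit


print(distance_matrix([0b0000, 0b0011, 0b1111]))
# [[0, 2, 4], [2, 0, 2], [4, 2, 0]]
```

The number of distinct pairs grows as n(n-1)/2, so 50 photos already mean 1,225 cells to read, which is why a grouped cluster table scales better than the full matrix.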

Side-by-side compare

Click any cluster to open a side-by-side viewer. The full-resolution images appear next to each other so you can verify the match visually. This is particularly useful when two images cluster but their hash distance sits at the edge of the threshold.

Pivots

| Click on... | Pivot to |
| --- | --- |
| Photo URL | Reverse image composer (run reverse-search on the matched image) |
| Image metadata (if URL hits a host that exposes EXIF) | Image metadata |
| Source URL | Site analysis on the source host |

Sources

  • All hashing runs locally.
  • URL inputs are fetched via direct HTTPS (rate-limited per host).
  • The clustering algorithm runs locally.

No external API is queried for the matching itself. The pivots from a match (reverse-image, site analysis) use their respective sources.
