ZeroTrace OSINT
Photo Clustering
Perceptual-hash clustering across many photos to find "same image, different source" matches — without face recognition.
Photo clustering takes a set of photos (URLs, file paths, or both) and groups them by perceptual-hash distance. Photos that hash close to each other are visually similar: usually the same image, or the same image after re-encoding, cropping, or minor modification.
It is the toolkit's answer to "is this the same person on these different platforms?" — without face recognition, without biometrics, just image-content hashing.
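The core primitive behind this is a perceptual hash plus a Hamming distance. As an illustrative sketch only (not the toolkit's actual code), here is a pure-Python 64-bit difference hash (dHash) over an 8×9 grayscale grid; a real pipeline would first decode and downscale each photo (e.g. with Pillow) to produce that grid:

```python
# Illustrative sketch, not the toolkit's implementation: a 64-bit dHash
# and the Hamming distance used to compare two hashes.

def dhash(pixels: list[list[int]]) -> int:
    """64-bit dHash: each bit records whether a pixel is brighter than
    its right-hand neighbour, over an 8-row x 9-column grid."""
    assert len(pixels) == 8 and all(len(row) == 9 for row in pixels)
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

# A left-to-right brightness ramp hashes to all-zero bits; the mirrored
# ramp flips every comparison, giving the maximum distance of 64.
ramp = [[c * 28 for c in range(9)] for _ in range(8)]
assert hamming(dhash(ramp), dhash(ramp)) == 0
assert hamming(dhash(ramp), dhash([row[::-1] for row in ramp])) == 64
```

Small changes to an image flip only a few of the 64 bits, which is why thresholding the Hamming distance separates "same image, minor edits" from "different images".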
What you get
For a set of photos:
| Section | What it surfaces |
|---|---|
| Cluster table | Photos grouped by perceptual-hash distance |
| Per-photo hashes | pHash, dHash, aHash for each photo |
| Pairwise distance matrix | Hamming distance between every pair |
| Cross-source clusters | Highlighted: clusters that span multiple URLs / platforms — the high-value matches |
| Side-by-side compare | Open any cluster in a side-by-side viewer |
Why hash-based, not face-based
Face recognition is the obvious tool for "same person across photos." The toolkit does not ship it for two reasons:
- Legal and ethical. Face recognition is regulated in many jurisdictions and ethically fraught everywhere. The toolkit is for finding information people have published, not for identifying people from photos.
- Hash-based works for the most useful case. People reuse the same selfie across platforms because uploading a new one is friction. Hash-based matching catches the reuse cases that matter most for cross-platform identity confirmation.
Hash-based does not work for "different photos of the same face." That is the face-recognition case the toolkit deliberately does not address.
Threshold
The default Hamming-distance threshold (8 bits, on a 64-bit hash) catches:
- Identical images.
- Images re-encoded at the same or similar quality.
- Images cropped slightly.
- Images with minor colour adjustments.
The threshold is configurable. A tighter threshold (4 bits) returns only near-identical matches. A looser threshold (16 bits) catches more dramatic variations but increases the false-positive rate.
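The grouping the threshold drives can be sketched as single-link clustering over pairwise Hamming distances, here via union-find. The hash values are invented for the example, and the toolkit's actual clustering algorithm is not specified here:

```python
# Sketch: group hashes so that any pair within `threshold` Hamming bits
# ends up in the same cluster (single-link clustering via union-find).

def cluster(hashes: list[int], threshold: int = 8) -> list[list[int]]:
    parent = list(range(len(hashes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in range(len(hashes)):
        for j in range(i + 1, len(hashes)):
            if bin(hashes[i] ^ hashes[j]).count("1") <= threshold:
                parent[find(j)] = find(i)

    groups: dict[int, list[int]] = {}
    for i in range(len(hashes)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

# a and b differ by 3 bits (clustered at the default threshold of 8);
# c is far from both and lands in its own cluster.
a, b, c = 0b1111_0000, 0b1111_0111, (1 << 60) - 1
print(cluster([a, b, c]))   # → [[0, 1], [2]]
```

Single-link behaviour means clusters can chain: if A is within threshold of B, and B of C, all three merge even when A and C are further apart, which is one reason a looser threshold inflates false positives.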
Cross-source clusters — the high-value finds
A cluster of two photos from the same Instagram account is uninteresting (the user uploaded the same photo twice).
A cluster of two photos — one from Instagram, one from a personal blog — is high-signal evidence of cross-platform identity overlap.
The tool sorts clusters so cross-source ones surface first.
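That ordering can be sketched as a sort on the number of distinct source hosts a cluster spans. The cluster representation (a list of photo URLs) and the tie-break on cluster size are assumptions for illustration, not the toolkit's data model:

```python
# Sketch of "cross-source first": rank clusters by distinct source hosts
# (descending), breaking ties by cluster size. Representation is assumed.
from urllib.parse import urlparse

def sort_clusters(clusters: list[list[str]]) -> list[list[str]]:
    """Each cluster is a list of photo URLs."""
    return sorted(
        clusters,
        key=lambda c: (len({urlparse(u).netloc for u in c}), len(c)),
        reverse=True,
    )

same_site = ["https://instagram.com/a.jpg", "https://instagram.com/b.jpg"]
cross_site = ["https://instagram.com/a.jpg", "https://blog.example/me.jpg"]
ranked = sort_clusters([same_site, cross_site])
# cross_site spans two hosts, so it surfaces first
```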
Photo clustering pairs naturally with the username search. The username-search tool captures profile photos for found accounts; pasting that list of profile-photo URLs into the clustering tool tells you which platforms share images.
Inputs
| Input mode | Use when |
|---|---|
| Local files | You have downloaded the photos already |
| URLs | Profile photos from username-search results, photo URLs from any web source |
| Mixed | Drop in a folder + paste a URL list |
The tool fetches URL inputs over HTTPS (with rate-limiting per host). Local files are read from disk.
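One way to implement the per-host rate limiting is to track the last request time per host and sleep until a minimum interval has elapsed. The interval value and the class shape below are assumptions, not the toolkit's implementation:

```python
# Sketch: per-host rate limiting for URL fetches. The 1-second default
# interval is an assumed value for illustration.
import time
from urllib.parse import urlparse

class HostRateLimiter:
    def __init__(self, min_interval: float = 1.0):
        self.min_interval = min_interval
        self._last: dict[str, float] = {}

    def wait(self, url: str) -> None:
        """Block until this URL's host is allowed another request."""
        host = urlparse(url).netloc
        now = time.monotonic()
        earliest = self._last.get(host, 0.0) + self.min_interval
        if now < earliest:
            time.sleep(earliest - now)
        self._last[host] = time.monotonic()

limiter = HostRateLimiter(min_interval=0.05)
for url in ["https://a.example/1.jpg", "https://a.example/2.jpg",
            "https://b.example/3.jpg"]:
    limiter.wait(url)   # second a.example request waits; b.example does not
    # the HTTPS fetch (e.g. urllib.request.urlopen) would go here
```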
Pairwise distance matrix
For small input sets (under 50 photos), the tool renders a pairwise-distance heatmap. Cells are colour-graded by distance: dark means close (likely the same image), light means far (different images).
For larger input sets, the matrix is too large to render usefully; the cluster table replaces it.
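The matrix behind the heatmap is simply an N×N grid of Hamming distances. A minimal sketch, with invented hash values:

```python
# Sketch: the pairwise Hamming-distance matrix a heatmap colour-grades.

def distance_matrix(hashes: list[int]) -> list[list[int]]:
    """N x N Hamming distances between all pairs of hashes."""
    return [
        [bin(a ^ b).count("1") for b in hashes]
        for a in hashes
    ]

# The diagonal is zero and the matrix is symmetric, which is why the
# rendered heatmap mirrors across its diagonal.
m = distance_matrix([0b1010, 0b1011, 0b0101])
```

The O(N²) pairwise computation is also why the matrix stops being rendered for large inputs: at a few hundred photos the grid has tens of thousands of cells.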
Side-by-side compare
Click any cluster to open a side-by-side viewer. The full-resolution images appear next to each other so you can verify the match visually. Particularly useful when two images cluster but the hash distance is at the threshold edge.
Pivots
| Click on... | Pivot to |
|---|---|
| Photo URL | Reverse image composer (run reverse-search on the matched image) |
| Image metadata (if the URL hits a host that exposes EXIF) | Image metadata |
| Source URL | Site analysis on the source host |
Sources
- All hashing runs locally.
- URL inputs are fetched via direct HTTPS (rate-limited per host).
- The clustering algorithm runs locally.
No external API is queried for the matching itself. The pivots from a match (reverse-image, site analysis) use their respective sources.