ZeroTrace OSINT
Robots.txt & Sitemap
Crawl rules, disallowed paths, sitemap-index recursion, and the "interesting by omission" view.
The robots.txt & sitemap tool fetches both files for a site and analyses the paths the operator has told crawlers about: the paths to include (the sitemap) and the paths to exclude (robots.txt Disallow rules).
For OSINT purposes, the disallow paths are often more interesting than the sitemap. They tell you which directories the operator considers sensitive enough to ask crawlers not to index.
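A minimal sketch of that first step, using only the Python standard library. The parsing and grouping behaviour here are illustrative, not the tool's actual implementation, and `https://example.com` is a placeholder target:

```python
import urllib.request
from collections import defaultdict

def fetch_robots_rules(base_url: str) -> dict:
    """Fetch robots.txt and group directives by user agent."""
    with urllib.request.urlopen(f"{base_url}/robots.txt", timeout=10) as resp:
        text = resp.read().decode("utf-8", errors="replace")

    rules = defaultdict(list)
    agents = ["*"]      # directives before any User-agent line apply to all
    new_group = True    # consecutive User-agent lines share one rule group
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            agents = [value] if new_group else agents + [value]
            new_group = False
        else:
            new_group = True
            if field in ("allow", "disallow", "crawl-delay"):
                for agent in agents:
                    rules[agent].append((field, value))
            elif field == "sitemap":
                rules["(sitemaps)"].append((field, value))  # Sitemap: is global
    return dict(rules)

rules = fetch_robots_rules("https://example.com")  # placeholder target
for directive, value in rules.get("*", []):
    print(directive, value)
```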
What you get
| Section | What it surfaces |
|---|---|
| robots.txt rules | Per-user-agent rules: Allow, Disallow, Crawl-delay |
| Sitemap URLs | Sitemaps linked from robots.txt, plus standard locations |
| Sitemap entries | Every URL listed in the sitemap (or sitemaps, when recursing) |
| Sitemap-index recursion | Sitemap-index files unwrap into their child sitemaps automatically |
| lastmod histogram | When the sitemap entries were last updated, grouped by month |
| Diff against archived | Snapshot diff of robots.txt across Wayback captures |
Disallow paths — interesting by omission
A path that appears in Disallow: is a path the operator does not want indexed. Common reasons:
- Admin panels (`/admin`, `/wp-admin`).
- Private API endpoints (`/api/internal`, `/private`).
- Account areas (`/account`, `/dashboard`, `/profile`).
- Search results pages (`/search?`).
- Staging / preview (`/staging`, `/preview`).
For reconnaissance, these paths are known to exist (otherwise the operator would not have written a rule for them) and known to be sensitive (otherwise the operator would not have asked for them to stay out of search engines).
The tool sorts disallow paths by uniqueness across the site — paths that appear in the disallow list but not in the sitemap are highlighted as "interesting by omission."
"Interesting by omission" is the OSINT investigator's friend. The page that the site does not advertise is often the page that matters most.
Sitemap recursion
A sitemap-index file lists other sitemap files. The tool detects index files and recurses into the children, returning the merged URL list with a source-sitemap column so you can see which sitemap each URL came from.
For very large sites (e-commerce catalogs, news archives, large CMSes), recursion can return tens of thousands of URLs. The tool paginates the result; CSV export gives you the full list.
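A minimal recursive walker, assuming plain (non-gzipped), well-formed XML under the standard sitemaps.org namespace; `walk_sitemap` is a sketch, not the tool's implementation:

```python
from urllib.request import urlopen
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def walk_sitemap(url: str) -> list:
    """Return (source_sitemap, page_url) pairs, recursing into index files."""
    with urlopen(url, timeout=10) as resp:
        root = ET.fromstring(resp.read())

    if root.tag == f"{NS}sitemapindex":
        # Index file: recurse into each child sitemap it lists.
        entries = []
        for loc in root.iter(f"{NS}loc"):
            entries.extend(walk_sitemap(loc.text.strip()))
        return entries

    # Ordinary urlset: every <loc> is a page URL sourced from this sitemap.
    return [(url, loc.text.strip()) for loc in root.iter(f"{NS}loc")]
```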
lastmod histogram
The <lastmod> tag on each sitemap entry tells search engines when the page was last updated. Aggregating lastmod into a histogram tells you:
- Bursts of activity — periods when the site published heavily.
- Quiet periods — periods when the site was inactive.
- Recent changes — what the operator updated yesterday.
For investigative reporting, the recent changes column is the immediate value: "what has this site changed in the last week?"
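The aggregation itself is simple; a sketch, assuming each value starts with an ISO `YYYY-MM-DD` date (true for both the date-only and full W3C datetime forms of `<lastmod>`), with made-up example dates:

```python
from collections import Counter

def lastmod_histogram(lastmod_values):
    """Count sitemap entries per YYYY-MM month bucket."""
    return Counter(v[:7] for v in lastmod_values if len(v) >= 7)

hist = lastmod_histogram(["2024-05-01", "2024-05-19T08:30:00Z", "2024-06-02"])
print(hist.most_common())  # [('2024-05', 2), ('2024-06', 1)]
```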
Diff against archived robots.txt
A toggle pulls the most recent archived robots.txt from the Wayback Machine and diffs it against the live one. Useful for spotting:
- Newly added disallow paths (new sensitive areas).
- Removed disallow paths (sensitive areas opened up to indexing).
- Crawl-delay changes (a signal that crawler pressure on the site has changed).
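The same diff can be reproduced by hand with the Wayback Machine's public availability API and the standard-library differ; `diff_against_archived` is a sketch with error handling omitted, not the tool's code:

```python
import difflib
import json
from urllib.request import urlopen

def diff_against_archived(base_url: str, live_robots: str) -> str:
    """Unified diff of the newest archived robots.txt against the live one."""
    api = f"https://archive.org/wayback/available?url={base_url}/robots.txt"
    with urlopen(api, timeout=10) as resp:
        snapshot = json.load(resp)["archived_snapshots"].get("closest")
    if not snapshot:
        return "(no archived capture found)"

    with urlopen(snapshot["url"], timeout=10) as resp:
        archived = resp.read().decode("utf-8", errors="replace")

    return "\n".join(difflib.unified_diff(
        archived.splitlines(), live_robots.splitlines(),
        fromfile=f"archived ({snapshot['timestamp']})", tofile="live",
        lineterm="",
    ))
```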
Pivots
| Click on... | Pivot to |
|---|---|
| Disallow path | Web crawler (target the disallow path), site analysis |
| Sitemap URL | URL parser |
| Sitemap entry URL | Site analysis, redirect analyzer, Wayback |
| Crawl-delay value | (no pivot — informational) |
Pre-fetch quick view
For very large sitemaps, a "first 50 entries" quick view loads instantly while the full recursion runs in the background. You see something useful immediately without waiting for the full crawl.
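One way to get that behaviour is a generator variant of the recursive walker sketched earlier: entries stream out as each child sitemap is parsed, so the first 50 can be rendered before the recursion finishes. A sketch under the same assumptions (placeholder URL, plain well-formed XML):

```python
from itertools import islice
from urllib.request import urlopen
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def iter_sitemap(url: str):
    """Lazily yield (source_sitemap, page_url) pairs, recursing into indexes."""
    with urlopen(url, timeout=10) as resp:
        root = ET.fromstring(resp.read())
    if root.tag == f"{NS}sitemapindex":
        for loc in root.iter(f"{NS}loc"):
            yield from iter_sitemap(loc.text.strip())
    else:
        for loc in root.iter(f"{NS}loc"):
            yield (url, loc.text.strip())

# The quick view materializes only the first 50 entries; the generator
# fetches further child sitemaps only as more entries are consumed.
first_50 = list(islice(iter_sitemap("https://example.com/sitemap.xml"), 50))
```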
Sources
- The site's own `robots.txt` and `sitemap.xml` (and any other sitemap URLs they list).
- The Wayback Machine for the archived-diff feature.
Every source is named on the result.