The Internet Archive discovers and captures web pages through many different web crawls.
At any given time, several distinct crawls are running, some for months, and some every day.
View the web archive through the Wayback Machine.
Collection: Survey Crawl Number 6: Sep 11th, 2017 - running now
The seeds for this crawl came from:
251 million domains that had at least one link from a different domain in the Wayback Machine, across all time
~300 million domains that we had in the Wayback, across all time
55,945,067 domains from
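The first seed criterion above (a domain with at least one link from a different domain) can be illustrated with a minimal sketch. It is not the Internet Archive's actual pipeline: the function names and the input format (pairs of source and target URLs) are hypothetical, and it treats the URL hostname as the "domain", whereas a real system would likely group hosts by registered domain.

```python
from urllib.parse import urlsplit


def host(url):
    """Return the lowercased hostname of a URL, or '' if it has none."""
    return (urlsplit(url).hostname or "").lower()


def externally_linked_domains(links):
    """Collect target hosts that receive a link from a different host.

    `links` is an iterable of (source_url, target_url) pairs. This input
    shape is an assumption made for illustration only.
    """
    seeds = set()
    for source, target in links:
        source_host, target_host = host(source), host(target)
        # Keep the target only when the link crosses a host boundary.
        if source_host and target_host and source_host != target_host:
            seeds.add(target_host)
    return seeds


example_links = [
    ("http://a.example/x", "http://b.example/"),     # cross-host link
    ("http://b.example/1", "http://b.example/2"),    # same host, ignored
    ("http://c.example/", "http://a.example/y"),     # cross-host link
]
seed_hosts = externally_linked_domains(example_links)
```

In this example `c.example` is not selected, because nothing links to it from another host.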
This crawl was run with a Heritrix setting of "maxHops=0" (URLs including their embeds).
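To make the effect of a zero hop limit concrete, here is a toy simulation. It only models the behavior described above (seed URLs plus their embeds, with no links followed) and is not Heritrix code or configuration: the function `crawl_scope` and the `outlinks` and `embeds` dictionaries are hypothetical names introduced for illustration, and the sketch assumes embeds do not count toward the hop limit.

```python
from collections import deque


def crawl_scope(seeds, outlinks, embeds, max_hops=0):
    """Return the URLs captured under a simple hop-limit model.

    Assumptions of this sketch: following a navigational link adds one
    hop, while fetching an embedded resource (image, script, stylesheet)
    adds none.
    """
    captured = []
    seen = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        url, hops = queue.popleft()
        captured.append(url)
        # Embedded resources are always in scope at the same hop count.
        for embedded in embeds.get(url, []):
            if embedded not in seen:
                seen.add(embedded)
                queue.append((embedded, hops))
        # Links are followed only while the hop budget allows it.
        if hops < max_hops:
            for link in outlinks.get(url, []):
                if link not in seen:
                    seen.add(link)
                    queue.append((link, hops + 1))
    return captured


seeds = ["http://a.example/"]
embeds = {"http://a.example/": ["http://a.example/logo.png"]}
outlinks = {"http://a.example/": ["http://b.example/"]}

zero_hops = crawl_scope(seeds, outlinks, embeds, max_hops=0)
one_hop = crawl_scope(seeds, outlinks, embeds, max_hops=1)
```

With `max_hops=0` only the seed page and its logo are captured. Raising the limit to 1 also brings in the linked page on `b.example`.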
The WARC files associated with this crawl are not currently available to the general public.