We want to write our own search engine to find text on the web that matches specific regular expressions of interest.
However, we don't want to crawl the whole web ourselves, as that would take many years on a single computer!
Instead, we want to access a preexisting web index and run our customized search engine front-end against it.
Has anyone pulled off anything like this before, and can anyone recommend where to find accessible web indexes?
It would be great if it were possible to tap into Google's indexes for research purposes, but if they don't allow that, are there any others out there that can be used?
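
To make the goal concrete, here is a rough sketch of the kind of front-end we have in mind. It assumes the pre-crawled pages are already available as plain `(url, page_text)` records. The names `regex_search`, `Match`, and `sample_records` are made up for illustration and do not correspond to any real index's API; how the records would actually be obtained is exactly what we are asking about.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Tuple


class Match(NamedTuple):
    url: str    # page the match was found on
    start: int  # character offset of the match within the page text
    text: str   # the matched text itself


def regex_search(records: Iterable[Tuple[str, str]], pattern: str) -> Iterator[Match]:
    """Yield every regex match found in an iterable of (url, page_text) records."""
    compiled = re.compile(pattern)
    for url, page_text in records:
        for m in compiled.finditer(page_text):
            yield Match(url, m.start(), m.group(0))


# Hypothetical stand-in for records read from a preexisting crawl or index.
sample_records = [
    ("http://example.com/a", "Contact: alice@example.com or bob@example.org"),
    ("http://example.com/b", "No addresses here."),
]

# Example pattern: a deliberately loose email-like expression.
hits = list(regex_search(sample_records, r"[\w.]+@[\w.]+\.\w+"))
for hit in hits:
    print(hit.url, hit.start, hit.text)
```

Because `regex_search` is a generator over an arbitrary iterable, it could in principle stream over a very large set of pages without loading them all into memory, so the only missing piece is the source of the records.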