To me, the simplest difference is that there are established ways to tell Google (Bing, DuckDuckGo, etc.) that a page should not be crawled or indexed, and search engines (at least all the major ones) respect those signals.
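For context, the usual mechanisms are a site's `robots.txt` file (which controls crawling) and a `noindex` robots meta tag or header (which controls indexing). Both are voluntary: the crawler itself has to check them. Below is a minimal sketch of a "well-behaved" crawler's check using Python's standard `urllib.robotparser`. The `robots.txt` contents, the `example.com` URLs, and the helper name `is_allowed` are made up for illustration:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot may crawl everything except /private/,
# and every other crawler is asked to stay out entirely.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the (hypothetical) robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed("Googlebot", "https://example.com/public"))
    print(is_allowed("Googlebot", "https://example.com/private/page"))
    print(is_allowed("SomeOtherBot", "https://example.com/public"))
```

Nothing enforces this. A crawler that skips the `can_fetch` check simply fetches the page anyway, which is why it comes down to whether the operator chooses to respect it.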
I would imagine there are some more ethical LLMs that follow...