What would a good fediverse crawler have to look like?
Ignore robots.txt
It would have to be separate from the database and easy to set up elsewhere in case someone bans its IP or even its whole IP range
Fake the user agent and possibly even pretend it's a human
That's a shitton of work though, so why even bother?
@matrix
Use the latest Firefox or Chrome user agent.
Use Tor, or any number of other methods, to get a large number of residential IP addresses.
Distribute packages of archived data over BitTorrent
Those packages can be read by open server software, so a theoretically limitless number of mirrors can exist.
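A rough sketch of what such a package could look like, so any mirror can check it before serving it. Everything here is made up for illustration: the `posts.jsonl` and `manifest.json` file names, the record fields, and the `write_package` / `read_package` helpers are assumptions, not an existing format. BitTorrent already verifies pieces in transit; the manifest is only there so server software can re-check a package it finds on disk.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def write_package(records, package_dir):
    """Write records as JSON Lines plus a manifest with the SHA-256 digest.

    File names and manifest fields are hypothetical, chosen for this sketch.
    """
    package_dir = Path(package_dir)
    package_dir.mkdir(parents=True, exist_ok=True)
    data_path = package_dir / "posts.jsonl"
    # newline="\n" keeps the bytes identical across platforms.
    with data_path.open("w", encoding="utf-8", newline="\n") as f:
        for record in records:
            f.write(json.dumps(record, sort_keys=True) + "\n")
    digest = hashlib.sha256(data_path.read_bytes()).hexdigest()
    manifest = {"file": data_path.name, "sha256": digest, "count": len(records)}
    (package_dir / "manifest.json").write_text(json.dumps(manifest), encoding="utf-8")
    return manifest


def read_package(package_dir):
    """Verify the digest, then return the records; raise ValueError on mismatch."""
    package_dir = Path(package_dir)
    manifest = json.loads((package_dir / "manifest.json").read_text(encoding="utf-8"))
    raw = (package_dir / manifest["file"]).read_bytes()
    if hashlib.sha256(raw).hexdigest() != manifest["sha256"]:
        raise ValueError("package data does not match manifest digest")
    return [json.loads(line) for line in raw.decode("utf-8").splitlines() if line]


if __name__ == "__main__":
    # Example records; the fields are placeholders, not a real schema.
    sample = [{"id": 1, "content": "hello"}, {"id": 2, "content": "world"}]
    with tempfile.TemporaryDirectory() as tmp:
        write_package(sample, tmp)
        print(read_package(tmp) == sample)
```

The package directory is what you would seed as a torrent. A mirror that downloads it calls something like `read_package` and refuses to serve anything that fails the check.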