-
s-crypt: Found this: github.com/skyzh/canvas_grab
-
hexa-: inb4: alexa.com sold to amazon for undisclosed amount of gold-pressed latinum
-
hexa-: oh
-
hexa-: already "an amazon company"
-
OrIdow6: Yes, the sale to Amazon was an important part of the Internet Archive's founding from what I understand
-
OrIdow6: But sort of ot
-
tech234a: Also sort of ot but I thought Alexa contributed crawl data to WBM
-
monika: looks like they stopped donating data about a year ago?
-
tech234a: There's supposed to be an "embargo period", not sure how long that period is
-
akrillic: Sorry to bother everyone, but I'm new to archiving. How would you guys suggest finding the subdomains of a site (Tripod in this case) that doesn't have them listed on a sitemap?
-
jodizzle: Sublist3r is one that I've used before: github.com/aboul3la/Sublist3r
-
jodizzle: I believe there are several webapps that people commonly use.
-
jodizzle: No guarantees of fetching all the subdomains, of course.
-
JAA: Search engines, Twitter searches, and the WBM are also sources I use typically.
-
JAA: Well, CDX API, not sure it's possible through the WBM directly.
-
akrillic: i was afraid that would be the answer
-
akrillic: ^^ sublist3r only turned up about 500 domains, unfortunately
-
JAA: Yeah, unless there's a sitemap, easy way to enumerate, or open AXFR (lol as if anyone does that), everything's going to be messy and typically very incomplete.
-
JAA: Oh yeah, Rapid7's Sonar data is also very useful.
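
[Editor's sketch] The CDX API route mentioned above could look roughly like the following. This is a minimal sketch, not anyone's actual tooling from the channel: it assumes the public Wayback Machine CDX endpoint at web.archive.org/cdx/search/cdx with its `matchType=domain`, `fl`, `collapse`, `limit`, and `output=json` parameters, and the helper names (`build_cdx_url`, `hostnames_from_urls`, `fetch_subdomains`) are made up for illustration. As noted in the discussion, the result will only contain hosts the WBM has captured, so it is likely to be incomplete, and the `limit` cap truncates large domains further.

```python
import json
from urllib.parse import urlencode, urlsplit
from urllib.request import urlopen

# Assumed endpoint of the public Wayback Machine CDX server.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(domain, limit=10000):
    """Build a CDX query for captures on `domain` and all of its subdomains."""
    params = {
        "url": domain,
        "matchType": "domain",   # include subdomains, not just the exact host
        "fl": "original",        # return only the originally captured URL
        "collapse": "urlkey",    # collapse adjacent captures of the same URL
        "output": "json",
        "limit": str(limit),     # cap the response; large sites need paging
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def hostnames_from_urls(urls, domain):
    """Reduce a list of captured URLs to the sorted set of hosts under `domain`."""
    domain = domain.lower()
    suffix = "." + domain
    hosts = set()
    for url in urls:
        if "://" not in url:
            url = "http://" + url  # let urlsplit see a network location
        try:
            host = urlsplit(url).hostname
        except ValueError:
            continue  # skip malformed captures
        if not host:
            continue
        host = host.lower().rstrip(".")
        if host == domain or host.endswith(suffix):
            hosts.add(host)
    return sorted(hosts)


def fetch_subdomains(domain, limit=10000):
    """Query the CDX API (network access required) and list the hosts seen."""
    with urlopen(build_cdx_url(domain, limit), timeout=60) as resp:
        rows = json.load(resp)
    # With output=json the first row is a header naming the fields.
    urls = [row[0] for row in rows[1:]]
    return hostnames_from_urls(urls, domain)
```

Calling `fetch_subdomains("tripod.com")` would then return whatever hosts appear in the first `limit` CDX rows; the output is best merged with the other sources named in the discussion (Sublist3r, search engines, Rapid7's Sonar data) rather than treated as a complete list.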