-
fireonlive
nitter.net having issues?
-
Nulo|m
how does grab-site get lists of urls to scrape, apart from crawling the site itself?
-
JAA
Sitemaps
-
JAA
fireonlive: Not that I'm aware of. I think it was more of a 'this domain is neat' comment.
-
Nulo|m
JAA: omg i can't believe i missed that, thanks
-
pokechu22
Nulo|m: and more generally (at least assuming the same behavior as archivebot), it gets it both by trying example.com/sitemap.xml and by looking for Sitemap: directives in example.com/robots.txt
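A minimal sketch of that discovery step in Python. It assumes the robots.txt body has already been fetched. The function names `sitemap_candidates` and `urls_from_sitemap` are hypothetical, and this is not grab-site's or ArchiveBot's actual code. It tries the conventional `/sitemap.xml`, adds any `Sitemap:` directives from robots.txt, and pulls the `<loc>` entries out of a sitemap document:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Namespace used by sitemaps following the sitemaps.org 0.9 schema.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_candidates(base_url: str, robots_txt: str) -> list[str]:
    """Return sitemap URLs to try: /sitemap.xml first, then every
    'Sitemap:' directive in robots.txt (key matched case-insensitively)."""
    candidates = [urljoin(base_url, "/sitemap.xml")]
    for raw in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        # Split only at the first colon so the URL's own colons survive.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            url = urljoin(base_url, value.strip())
            if url not in candidates:
                candidates.append(url)
    return candidates

def urls_from_sitemap(xml_text: str) -> list[str]:
    """Collect every <loc> value. For a sitemap index these are nested
    sitemaps (fetch and parse those too); for a urlset they are pages."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]

robots = "User-agent: *\nDisallow: /private\nSitemap: https://example.com/news-sitemap.xml\n"
print(sitemap_candidates("https://example.com/", robots))
```

Relative `Sitemap:` values are resolved against the site URL with `urljoin`, and a directive that just repeats `/sitemap.xml` is not queued twice.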
-
fireonlive
ahh ok :3
-
h2ibot
Inti83 edited Argentina (+81, /* Educational Websites */):
wiki.archiveteam.org/?diff=51435&oldid=51354
-
h2ibot
Dylan edited SubRocks (+379, updated URLs, added more information and some…):
wiki.archiveteam.org/?diff=51436&oldid=50515
-
ctag
Does grab-site not stop crawling?
-
ctag
I started it against that astronomy website, and for a while the web UI shows "0 crawl" but now it's back
-
ctag
Has been running for over a day
-
that_lurker
You can check the todo database to see if it is still going. Most sites tend to take a few days
-
that_lurker
also it's a good idea to monitor the grab every now and then, in case it's stuck in a loop or on some other spam URL, so you can add those to the ignores
-
ctag
Oh, it looks like it's stuck crawling the calendar widget
-
ctag
This site is tiny, shouldn't take too long to grab all the pages.
-
ctag
I'm running it from docker, will try to figure out the todo database, thanks lurker
-
joepie91|m
calendars, wrecking site grabs since web 2.0
-
ctag
Should I just let it crunch? The calendar widget probably has a finite range, right?
-
ctag
.. Right?
-
joepie91|m
nope
-
ctag
:|
-
joepie91|m
most calendar widgets are infinite and generated on-demand
-
joepie91|m
they're a famous tarpit for archival efforts
-
ctag
It appears to be crawling backwards; I'll let it get to the early era of the site and then cancel it
-
ctag
Thanks joepie
-
ctag
Oh, I was going to try wget again too
-
ctag
And the redirect might work, if anyone with AB can get it for me
-
that_lurker
just add the calendar url to the ignores with regex
-
that_lurker
so grab-site will ignore it and still grab everything else
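A minimal sketch, in Python, of how a regex ignore keeps a crawl out of a calendar tarpit. grab-site's ignore patterns are regular expressions, but this is not grab-site's own code. The site URLs and the `month`/`year` query parameters are hypothetical, since the actual calendar URL wasn't given in the chat. Each queued URL is dropped if any ignore pattern matches anywhere in it (`re.search`):

```python
import re

# Hypothetical ignore pattern: matches calendar pages that carry a month or
# year query parameter, e.g. /calendar?month=3&year=1987 or /calendar/?year=1986.
CALENDAR_IGNORE = re.compile(r"/calendar/?\?(?:.*&)?(?:month|year)=\d+")

def keep_url(url: str, ignores: list[re.Pattern[str]]) -> bool:
    """Return True if no ignore pattern matches anywhere in the URL."""
    return not any(p.search(url) for p in ignores)

queue = [
    "https://astro.example.org/",
    "https://astro.example.org/calendar?month=3&year=1987",
    "https://astro.example.org/calendar/?year=1986",
    "https://astro.example.org/articles/comets",
]
kept = [u for u in queue if keep_url(u, [CALENDAR_IGNORE])]
print(kept)
```

The pattern targets only the parameterised month/year views, so a bare `/calendar` landing page would still be fetched. The endlessly generated previous/next permutations are what get skipped.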
-
ctag
Awesome, that got it back out of it, thanks!
-
fireonlive
DLoader pls
-
fireonlive
ah it was matrix all along
-
nulldata
Yay splitting of nets!
-
fireonlive
matrix says hello and goodbye and hello and goodbye and hello
-
fireonlive
:p
-
fireonlive
(i don’t have the scroll back to see the actual number)
-
nulldata
I believe someone mentioned a ddos in #hackint, but with the number of joins and leaves I can't see it anymore lol
-
fireonlive
ahh boo 😒
-
fireonlive
ddosing irc is so 90s
-
hexa-
nulldata: clients in 2023 should really support smart filtering of join/parts
-
nulldata
hexa- The Lounge collapses them, but they still count toward the history limit loaded in memory
-
nulldata
And for some reason my history limit is set low, fixing that now
-
hexa-
rough
-
hexa-
but yeah, udp ddos against a closed port
-
nulldata
Fixed - now I can see that it was you that posted it was a ddos in hackint :P
-
fireonlive
ahh lame :(
-
fireonlive
and yeah TL is special that way lol
-
nicolas17
ugh
-
nicolas17
can someone politely tell DLoader (*cough* mode +b) to fix their connection
-
fireonlive
b with a redirect to #fixyourshit ?
-
fireonlive
:3
-
fireonlive
.op
-
fireonlive
i tried :D
-
nulldata
I guess DLoader isn't *🕶️* DoneLoading
-
fireonlive
yeeeeaaaaaahhhhhhhh 🎶
-
that_lurker
One of the reasons why znc's flood protection is nice
-
katia
i have a by-no-means-complete dump of some 37c3 FTPs: 468,162 files, 4,753.5 GB
-
JAA
arkiver: ^ Think IA would take that?
-
fireonlive
**151.217.62.31** :(
-
fireonlive
:P
-
hexa-
fireonlive: is this the gay porn ftp ip? :D
-
fireonlive
yeah :D
-
hexa-
hehe
-
fireonlive
:3