00:04:37 nitter.net having issues?
00:29:17 how does grab-site get lists of urls to scrape apart from crawling the site itself?
00:30:10 Sitemaps
00:31:03 fireonlive: Not that I'm aware of. I think it was more of a 'this domain is neat' comment.
00:31:38 JAA: omg i can't believe i missed that, thanks
00:32:41 Nulo|m: and more generally (at least assuming the same behavior as archivebot), it gets it both by trying example.com/sitemap.xml and by looking for Sitemap: directives in example.com/robots.txt
00:36:36 ahh ok :3
09:36:38 Inti83 edited Argentina (+81, /* Educational Websites */): https://wiki.archiveteam.org/?diff=51435&oldid=51354
09:36:39 Dylan edited SubRocks (+379, updated URLs, added more information and some…): https://wiki.archiveteam.org/?diff=51436&oldid=50515
09:41:39 Exorcism edited SubRocks (-4): https://wiki.archiveteam.org/?diff=51437&oldid=51436
13:50:30 Exorcism edited Arto (+36): https://wiki.archiveteam.org/?diff=51438&oldid=25934
14:15:35 Exorcism uploaded File:Facebook-screenshot.png: https://wiki.archiveteam.org/?title=File%3AFacebook-screenshot.png
14:15:36 Exorcism edited Facebook (+33): https://wiki.archiveteam.org/?diff=51440&oldid=49532
14:25:46 Does grab-site not stop crawling?
14:26:10 I started it against that astronomy website, and for a while the web UI shows "0 crawl" but now it's back
14:26:16 Has been running for over a day
14:38:18 You can check the todo database to see if it is still going. Most sites tend to take a few days
14:58:36 also it's a good idea to monitor the grab every now and then in case it's stuck in a loop or some other spam url so you can add them to the ignore
15:37:06 Oh, it looks like it's stuck crawling the calendar widget
15:37:19 This site is tiny, shouldn't take too long to grab all the pages.
15:37:45 I'm running it from docker, will try to figure out the todo database, thanks lurker
15:40:53 calendars, wrecking site grabs since web 2.0
15:44:06 Should I just let it crunch? The calendar widget probably has a finite range, right?
15:44:10 .. Right?
15:44:37 nope
15:44:43 :|
15:44:52 most calendar widgets are infinite and generated on-demand
15:45:01 they're a famous tarpit for archival efforts
15:45:18 It appears to be crawling backwards, I'll let it get to the early era of the site and then cancel it
15:45:24 Thanks joepie
15:45:49 Oh, I was going to try wget again too
15:46:04 And the redirect might work, if anyone with AB can get it for me
15:46:21 just add the calendar url to the ignores with regex
15:46:57 so grab-site will ignore it and still grab everything else
15:53:25 Awesome, that got it back out of it, thanks!
17:09:17 may have already come up here, but: https://bitbang.social/@Geekman/111666278406724944
17:48:03 DLoader pls
17:48:28 ah it was matrix all along
17:57:32 Yay splitting of nets!
17:58:09 matrix says hello and goodbye and hello and goodbye and hello
17:58:11 :p
17:58:40 (i don’t have the scroll back to see the actual number)
18:01:15 I believe someone mentioned a ddos in #hackint, but with the number of joins and leaves I can't see it anymore lol
18:04:25 ahh boo 😒
18:04:45 ddosing irc is so 90s
18:07:23 nulldata: clients in 2023 should really support smart filtering of join/parts
18:08:50 hexa- The Lounge collapses them, but they still count toward the history limit loaded in memory
18:09:27 And for some reason my history limit is set low, fixing that now
18:11:32 rough
18:11:52 but yeah, udp ddos against a closed port
18:14:01 Fixed - now I can see that it was you that posted it was a ddos in hackint :P
18:26:43 ahh lame :(
18:26:58 and yeah TL is special that way lol
18:42:00 ugh
18:43:29 can someone politely tell DLoader (*cough* mode +b) to fix their connection
18:45:53 b with a redirect to #fixyourshit ?
18:45:59 :3
18:46:28 .op
18:46:35 i tried :D
19:30:00 I guess DLoader isn't *🕶️* DoneLoading
19:35:33 yeeeeaaaaaahhhhhhhh 🎶
22:33:07 One of the reasons why znc's flood protection is nice
23:46:04 i have a by-no-means-complete dump of some 37c3 ftps, 468,162 files, 4,753.5 GB
23:46:41 some of the exclusions here might have been followed: https://github.com/katlol/37c3-ftp-mirror/blob/main/exclude.txt
23:48:43 arkiver: ^ Think IA would take that?
23:51:12 **** :(
23:51:14 :P
23:59:26 fireonlive: is this the gay porn ftp ip? :D
23:59:42 yeah :D
23:59:46 hehe
23:59:53 :3