00:00:45 JAABot edited CurrentWarriorProject (+2): https://wiki.archiveteam.org/?diff=51228&oldid=51227
00:45:43 smol change: the twitter2nitter, transferinliner, and karma systems now ignore lines starting with !, so they won't go off if you're using a bot command (thanks project10); also, 'known bots' (h2ibot, botifico, and Aramaki) are skipped by them
06:33:15 Sanqui: Just a brief update, all of those linked Webzdarma jobs total 4.5 TiB, so it'll take a while to download them all, even at the 60 MB/s I'm getting from right next to IA.
06:36:10 Arctic Circle System edited Alive... OR ARE THEY (+383, /* Endangered */ Added Kirby's Rainbow Resort): https://wiki.archiveteam.org/?diff=51229&oldid=51031
09:51:44 Thanks JAA. Problem is the ones that had offsite links; sadly, not enough foresight there. In the long term we will be making and keeping our own copies
14:04:10 Sanqui: would you like me to run that through the Common Crawl CDX? I have it lying around, and from a quick spot check there are some matching links in there
14:51:30 imer: yes please, ^https?://(www.)?(uloz.to|ulozto.cz|ulozto.sk|ulozto.net|zachowajto.pl)
14:58:31 I could try the fdns data set I have
16:00:54 Sanqui: ack, it will be a few days to run through it all
16:26:01 imer: the deadline is tomorrow, so probably no need then
16:26:08 thanks though
16:26:19 maybe if it's possible to run it on a subset of .cz sites
16:26:22 (and .sk)
16:26:24 it would make sense
16:27:02 oh. oops
16:27:18 i'll toss the partial results over to you as I get them, then
17:28:18 Sanqui: Sometime in the future, all AB jobs' databases should be kept, and then this wouldn't be an issue. wpull still extracts all links when running with --no-offsite-links; it just ignores them silently, so they only appear in the DB.
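
For reference, a minimal sketch of the Common Crawl CDX filtering discussed above (14:04–16:27), assuming a local directory of gzipped CDXJ shards where each line is "<SURT> <timestamp> <JSON>" with the original URL inside the JSON blob. The paths are placeholders, and the dots from Sanqui's pattern are escaped here, since the pattern as pasted would also match stray characters in place of the dots:

    #!/usr/bin/env python3
    # Filter local Common Crawl CDXJ shards against Sanqui's uloz.to
    # pattern (14:51:30). Directory layout and file naming are assumptions.
    import gzip
    import json
    import re
    import sys
    from pathlib import Path

    PATTERN = re.compile(
        r"^https?://(www\.)?(uloz\.to|ulozto\.cz|ulozto\.sk|ulozto\.net|zachowajto\.pl)"
    )

    def scan(cdx_dir: str) -> None:
        for path in sorted(Path(cdx_dir).glob("*.gz")):
            with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
                for line in fh:
                    try:
                        record = json.loads(line.split(" ", 2)[2])
                    except (IndexError, json.JSONDecodeError):
                        continue  # skip malformed lines
                    url = record.get("url", "")
                    if PATTERN.match(url):
                        print(url)

    if __name__ == "__main__":
        scan(sys.argv[1])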
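Similarly, a sketch of what "they only appear in the DB" (17:28:18) could look like in practice: pulling the silently-ignored offsite links back out of a wpull SQLite database. The table name, column names, and the 'skipped' status value are assumptions (wpull's schema differs between versions), so check the real layout with `sqlite3 job.db .schema` first:

    #!/usr/bin/env python3
    # Dump offsite links that wpull recorded but never fetched when run
    # with --no-offsite-links. Schema details below are assumptions.
    import sqlite3
    import sys

    def dump_skipped(db_path: str) -> None:
        con = sqlite3.connect(db_path)
        for (url,) in con.execute("SELECT url FROM urls WHERE status = 'skipped'"):
            print(url)
        con.close()

    if __name__ == "__main__":
        dump_skipped(sys.argv[1])
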
17:35:25 Can these sorts of links be put into AB? This person passed away, and if possible I'd like to have these pages saved. Also, can AB grab a YouTube channel? Just the pages, not the videos. I already put it into downthetube
17:35:25 https://www.instagram.com/chesyarts
17:35:25 https://ko-fi.com/chesyarts
17:35:25 https://www.tiktok.com/@chesyarts0w0
17:35:25 https://www.youtube.com/@chesyarts1691
17:37:11 Vokun: I don't think any of those work properly in AB; all of those sites have strict rate-limiting and are JS-based, and AB will only get 429s
17:37:56 rip
17:39:33 youtube can go to #down-the-tube as long as it's in scope https://wiki.archiveteam.org/index.php/YouTube#Scope (someone dying is)
17:48:46 I put it in. Thanks
17:50:00 :)
18:30:45 Pokechu22 edited DokuWiki (+472, mention taskrunner): https://wiki.archiveteam.org/?diff=51230&oldid=51010
19:14:06 the archiveteam wiki page on Bluesky is very short, has anything been done about that?
19:22:39 hello everyone. I might have something for the archivebot if anyone has time to put it in the queue: https://www.summoners-inn.de is the biggest and probably one of the oldest German League of Legends news sites, with articles going back to 2013. today, they announced the end of Summoner's Inn after their parent company Freaks4U lost their partnership
19:22:40 to host the official German League of Legends broadcast.
19:25:05 polduran: I've queued it, though I'm not sure how well it'll run, as they don't seem to have a sitemap
19:26:10 I also queued https://www.freaks4u.de
19:30:31 let's hope for the best^^ thank you. and yeah, good idea ^-^" maybe also the german LoL league?
https://www.primeleague.gg/ not sure if there is anything interesting on there, or whether the situation also affects it, but the website is hosted and copyrighted by freaks4u
19:32:24 Alright
19:33:05 thanks again and have a nice day :D
19:45:01 continuing the discussion from #//; imer: what would be the best way to handle this JS mess?
19:45:27 I can probably write a scraper that'll generate a list of URLs from these downloaders; there isn't much metadata to be saved anyway, so IMO saving just the ZIPs is a good starting point
19:48:09 imer: hey, also, can you verify whether downloader3.html still works? I.. think I crashed it
19:48:28 did you check with devtools how the EULA acceptance is handled?
19:48:30 checked from two IPs and several browsers, no dice
19:48:39 masterX244: on some of them there's no EULA at all
19:48:48 with some luck that can be faked with some headers/constant request stuff
19:48:49 so I'm focusing on that right now
19:49:57 had a site once that had an ad-intercept on the first download under a session; fooled it by "wasting" that request on a URL-parametered URL before the real crawl started
19:50:27 https://f.sakamoto.pl/UwUMicKuA.png ,_,
19:51:01 2 "wasted" requests in the WARC, but better than a lost one. POST sucks for archivebot though
19:52:04 masterX244: no, no; i'm not getting any responses anymore
19:52:07 oh, it's back now
19:52:26 so what I did was.. I tried a wildcard instead of the version number, just to check what would happen
19:52:41 ahh, poking around for shortcuts
19:52:43 and it seems that it crashed their entire API for a solid minute
19:52:56 so. uh. we need to be careful around this one XD
19:54:28 cockroach-infested area :(, that sucks
19:55:39 btw, how does WARC work? I know that I can run a mitm proxy for myself, but how would I go about handing it over to IA? what are the steps/precautions/who do I need to talk to...? :p
19:55:54 "you don't"
19:56:44 you can upload WARC files to archive.org, but they won't be used by web.archive.org, because there's no way to know whether they actually match the website you mirrored or whether you messed with the content (accidentally or intentionally)
19:56:54 yes, that I know
19:57:25 i was more asking about... what steps do I take to actually get the content preserved with y'all's help?
20:04:00 a project/mini-project proposal, let's say :3
20:05:56 figured out how the EULA stuff works! it's a static JS function that takes params from the current URL
20:06:05 so this is very much possible to automate
20:06:27 function in question: https://pastebin.com/9bsxLDLu
20:38:02 sdomi: sorry, stepped away for a bit, I have not the slightest idea how to do this - although I am probably not the person to ask haha
20:38:15 imer: writing a scraper as we speak :p
20:38:18 nice
20:57:52 could you help me find an archive of this video https://www.youtube.com/watch?v=V3gbrP2U10A ?
21:05:05 #youtubearchive would be a fitting channel for that question
21:06:26 alright, thank you!
23:22:56 https://pastebin.com/gAwF2bwc URLs
23:34:55 https://f.sakamoto.pl/nvidia_rescue.tar.gz here's the code I wrote
23:36:49 turns out that most docs URLs are completely dead already, or point to generic sites that have likely been archived for ages. i'm downloading the real "data" locally right now, gonna upload it as an item onto IA later ^-^
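
A sketch of masterX244's ad-intercept workaround from 19:49:57: burn the session's first download on a throwaway URL-parametered request so the real fetch goes through clean. The site behavior and the query parameter name are illustrative assumptions, not taken from any specific target:

    #!/usr/bin/env python3
    # "Waste" the first request of a session on a junk variant of the URL
    # so a first-download ad-intercept fires on that instead of the real
    # fetch. Both requests still end up in the capture, as noted at 19:51:01.
    import requests

    def fetch_with_waste(real_url: str) -> bytes:
        session = requests.Session()  # one session = one ad-intercept to burn
        # Naive: assumes real_url has no query string of its own.
        session.get(real_url + "?warmup=1", timeout=30)
        response = session.get(real_url, timeout=30)
        response.raise_for_status()
        return response.content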
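On the 19:55:39 question of how WARC works: a WARC file is a sequence of records (request, response, metadata, ...), each with its own WARC headers, and the warcio library can write them directly. Below is a minimal sketch wrapping one fetched page into a response record; the output file name is arbitrary, and, as stated at 19:56:44, such self-made WARCs can be uploaded to archive.org but won't be played back by web.archive.org:

    #!/usr/bin/env python3
    # Write a single fetched page into a gzipped WARC response record
    # using warcio (https://github.com/webrecorder/warcio).
    import requests
    from warcio.statusandheaders import StatusAndHeaders
    from warcio.warcwriter import WARCWriter

    def save_one_page(url: str, out_path: str = "example.warc.gz") -> None:
        resp = requests.get(url, stream=True)
        with open(out_path, "wb") as fh:
            writer = WARCWriter(fh, gzip=True)
            http_headers = StatusAndHeaders(
                f"{resp.status_code} {resp.reason}",
                resp.raw.headers.items(),
                protocol="HTTP/1.1",
            )
            record = writer.create_warc_record(
                url, "response", payload=resp.raw, http_headers=http_headers
            )
            writer.write_record(record)

    if __name__ == "__main__":
        save_one_page("https://example.com/")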
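And the rough shape of the EULA automation sdomi worked out at 20:05:56. The actual acceptance function is the JS at https://pastebin.com/9bsxLDLu and is not reproduced here; token_for() and the header it feeds are hypothetical stand-ins marking where a Python port of that function would slot into a scraper:

    #!/usr/bin/env python3
    # Outline: since the EULA acceptance is a pure function of the current
    # URL's parameters, a scraper can compute it offline. token_for() is a
    # placeholder for a port of the real JS; the header name is invented.
    from urllib.parse import parse_qs, urlsplit

    import requests

    def token_for(params: dict) -> str:
        # Port the static JS function from the pastebin here.
        raise NotImplementedError

    def download(downloader_url: str) -> bytes:
        query = urlsplit(downloader_url).query
        params = {k: v[0] for k, v in parse_qs(query).items()}
        response = requests.get(
            downloader_url,
            headers={"X-Eula-Token": token_for(params)},  # hypothetical header
            timeout=60,
        )
        response.raise_for_status()
        return response.content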