00:39:49 qwertyasdfuiopghjkl arkiver - btw channel for that is #justsolve, JAA pinged the admin too
00:40:28 I agree with adding it to #nodeping, and that doesn't even need to wait for the site to be back
00:49:13 nicolas17: I think I've seen notifications for it in #nodeping before, it's just not detecting the outage this time because it's still giving a 200 OK status code.
00:49:39 oh
00:49:41 oof
00:49:43 fucking PHP
00:50:35 how is "require(): Unable to allocate memory for pool" not a fatal error returning 500?
01:06:26 https://i.imgflip.com/888wmc.jpg
01:20:09 JAA: ^
01:27:08 thanks mediawiki for that 200 :p
01:28:26 good thing argenteam supports content-encoding because that was 3.14GiB of HTML
01:50:51 Nicolas17v2 created ARGENTeaM (+2540, Create page): https://wiki.archiveteam.org/?title=ARGENTeaM
01:50:52 JustAnotherArchivist changed the user rights of User:Nicolas17v2
01:54:56 Could someone put this in AB? https://everynoise.com The owner said he's not sure how much longer he's going to run the site
01:55:02 My source https://www.reddit.com/r/Archiveteam/comments/18az1cu/everynoisecom_might_go_down_soon
02:06:00 Done but it looks like there was a job in 2022, will check what happened with that
02:25:44 for project channels with h2ibot or similar (blogger/pastebin/youtube/telegram/etc), would it be a good idea to have ‘the bot’ in a separate side channel (per project) while discussion for the project continues in the main channel? especially for when the bot can get somewhat "noisy" for some with a bunch of additions and some discussion may be
02:25:45 missed cc JAA arkiver Vokun
02:26:56 i've contemplated that myself
02:26:57 my +1 on that fwiw ... but I might just be 'holding it wrong'
02:26:57 e.g. something like #telegrab-queue or #telegrab-bot
02:26:58 JustAnotherArchivist edited ARGENTeaM (+7, Datetimeify): https://wiki.archiveteam.org/?diff=51256&oldid=51255
02:27:40 maybe the bot should reply in NOTICEs too?
02:28:27 perhaps… but would still clutter i suppose (and require more code changes than just moving a channel)
02:28:58 Nicolas17v2 created Argenteam (+23, Add redirect): https://wiki.archiveteam.org/?title=Argenteam
02:32:08 Moving the channel is a one-line diff in a config file.
02:32:20 NOTICE isn't supported at all by the code currently.
02:32:27 oof
02:32:48 i filter several of the bots
02:32:53 but (a) that doesn't work for everyone
02:32:59 and (b) the ones that have h2ibot talk to itself mostly have to be stateful, so either i let them potentially interfere with one another or i make a new bit for every bot and copy/paste all the triggers
02:33:24 will probably have to write a full-on script at some point
02:33:38 indeed re: filters not working for everyone (so notice may not make a difference there)
02:34:03 even just people popping in to be a bit buried
02:34:12 thuban: is this for weechat?
02:34:20 project10: yeah
02:35:18 ever since thuban brought it up i try to prefix everything eggdrop says haha
02:35:49 It'd be nice if we could associate responses directly with the bot command message, but that's not something hackint (or most clients) currently supports.
02:36:12 yeah… would be nice to leap forward ircv3 a bit
02:42:27 (if we don't move channels, it might be nice to have the bot accept a 'cc' argument to repeat back along with the name of the submitted item, then cc the original submitter's nick when it submits to itself)
02:50:23 That would be something for arkiver to implement. h2ibot just forwards messages, it doesn't do any message handling itself.
02:51:22 yes, that's what i meant by 'the bot'. sorry for unclearness
02:51:45 oh hey argenteam has an API...
02:54:27 we love to see it
03:05:49 That question just comes from my pastebin stuff XD
04:33:23 Nicolas17v2 edited ARGENTeaM (+1071, Explain API and tvshows): https://wiki.archiveteam.org/?diff=51258&oldid=51256
04:54:24 * nicolas17 reads AB manual
04:54:41 so !ao gets *only* the given URL, it doesn't even recurse into images or stylesheets?
04:55:05 url + dependencies i believe
04:55:14 URL + page requisites, yes
04:55:43 if I "!ao < somelist" and the list has multiple URLs with the same prerequisites, I would hope it gets those prerequisites only once
04:56:03 but independent "!ao someurl" would get those multiple times right?
04:56:08 Correct
04:58:25 so I could make a huge list with URLs like https://argenteam.net/movie/148927 and then !ao< the list, it won't recurse into links but it will get images and stylesheets, and it will get them only once
04:58:54 what about redirects? if I add A and B, and A redirects to B, will it request B only once?
05:01:37 No, redirects aren't deduped.
05:02:00 Cf. https://github.com/ArchiveTeam/wpull/issues/431
05:25:19 However, the no-parent rule can lead to some complications with that too: if the initial URL was https://example.com/dira/url and https://example.com/dira/url redirects to https://example.com/dirb/url then that redirect's target *won't* be saved. This applies to !ao https://example.com/dira/url or !a https://example.com/dira/url or an !ao < list or an !a < list (but if it's
05:25:22 discovered from https://example.com/ or a similar URL without a subdirectory, or if it's offsite, then the no-parent rule doesn't matter)
05:38:57 okay!
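
[Editor's sketch] The no-parent behaviour described at 05:25:19 can be modelled roughly as below. This is a simplified illustration built only from that description, not the actual wpull or ArchiveBot code; the helper names `parent_dir` and `blocked_by_no_parent` are hypothetical, and real crawlers likely apply further rules (scheme, port, page requisites) that are ignored here.

```python
from urllib.parse import urlsplit
import posixpath


def parent_dir(path: str) -> str:
    """Return the directory part of a URL path, always ending in '/'."""
    if not path:
        return "/"
    if path.endswith("/"):
        return path
    return posixpath.dirname(path).rstrip("/") + "/"


def blocked_by_no_parent(seed_url: str, target_url: str) -> bool:
    """Hypothetical model: True if a redirect target would be skipped.

    Assumption taken from the chat: a same-host URL outside the seed's
    directory is rejected; offsite targets are not affected by the rule.
    """
    seed = urlsplit(seed_url)
    target = urlsplit(target_url)
    if seed.hostname != target.hostname:
        return False  # offsite: the no-parent rule doesn't matter
    return not (target.path or "/").startswith(parent_dir(seed.path))


# Seed in /dira/, redirect to /dirb/: target is outside the parent directory.
print(blocked_by_no_parent("https://example.com/dira/url",
                           "https://example.com/dirb/url"))
# Seed at the site root: every same-host path is "below" it.
print(blocked_by_no_parent("https://example.com/",
                           "https://example.com/dirb/url"))
```

Under this model, listing both the redirecting URL and its target explicitly is one way to make sure the target still gets fetched.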
05:39:13 JAA: I think here's all webpages https://transfer.archivete.am/L9wLu/argenteam.net_webpages.txt.zst
05:40:28 /tv/$id URLs redirect to the first episode of the TV show, so that will be fetched twice, but seems acceptable
06:05:47 One of Mozilla's (partially) public telemetry websites is being replaced with a private version on December 15 https://groups.google.com/a/mozilla.org/g/firefox-dev/c/kNuk69n7nhc
06:06:21 The attached doc says the data will be available elsewhere but I can't seem to find it in the other location yet
06:08:09 ok time to pass out
06:08:47 Tech234a edited Deathwatch (+221, /* 2023 */ Add Mozilla telemetry): https://wiki.archiveteam.org/?diff=51259&oldid=51253
06:09:05 oh I forgot to add argenteam to deathwatch
06:12:20 nicolas17: AB job for that is running now.
06:14:49 FireonLive edited Mailman2 (+1224, add asterisk (most/(all?) to be discontinued…): https://wiki.archiveteam.org/?diff=51260&oldid=51145
06:17:14 * pabs goes to AB the digium lists fireonlive
06:17:23 fireonlive: can you also add them to deathwatch?
06:17:35 ah sure :)
06:21:50 FireonLive edited Deathwatch (+400, add Sangoma/Digium/Asterisk): https://wiki.archiveteam.org/?diff=51261&oldid=51259
06:23:17 I was wondering why the queue kept *growing* on that AB job... prerequisites :P
06:25:30 page requisites*
06:25:37 that
06:30:07 note that to discover these URLs I ran multiple threads as fast as they could go and reached 50 req/s
06:31:23 50 req/s would probably be way too much here, just saying if you want to raise it up a notch... the server can cope :P
06:33:56 con 99999999
07:02:05 PaulWise edited Mailman2 (-1233, digium lists in progress): https://wiki.archiveteam.org/?diff=51262&oldid=51260
07:05:45 =] tks
07:16:40 nicolas17: So the /tv/ID URLs redirect to /episode/something, which means the redirect isn't followed as pokechu22 explained above. But since the latter URL is also in the list, that should be fine.
07:25:02 JAA: did we make WARCs of uloz.to ?
07:25:35 arkiver: It didn't sound like WARCs were being made since pro accounts were involved.
07:25:38 Sanqui: ^
07:25:46 ah i see
07:25:51 where is the data now?
07:50:47 Transition of power in Argentina is on Sunday, so we should probably start doing something with that.
07:51:07 nicolas17: Any progress on setting something up for the geofenced sites?
08:10:37 JAA: i guess that "something" would be archiving various government sites in #archivebot ?
08:10:58 especially the departments he wants to cut out or greatly reduce funds for
08:11:39 arkiver: Yeah, plus special handling for sites that are being annoying, like Buttflare or geofencing to Argentina.
08:34:56 arkiver: we downloaded with pro accounts (otherwise the download speeds were abysmal and captchas were all over), using an off the shelf tool because there wasn't time for a warc friendly solution. we currently have 40+ TB of data across multiple people and computers and are coordinating deduplication and concatenation using a few scripts, it's a work in progress. I'll try to get a wiki article up though
08:36:52 Sanqui: sounds pretty good, do you have examples of the data?
08:37:10 it sounds like you're still doing some cleanup, so examples might come later i guess
08:38:36 arkiver: https://www.ejha.cz/ulozto/list.php
08:39:22 spoiler: there's a lot of warez
08:40:02 but also a lot of videos, music, video game saves and mods, other legal data.
08:40:38 page's not loading for me now, can write again when it is
09:02:27 Sanqui: hah, i see
09:02:31 but still good it was saved
09:02:42 seems to be loading now
09:02:49 what were your further plans for this data?
09:03:09 good question
09:04:28 we (czechoslovak game archive) are considering subtly hinting that we downloaded a crapload of data from ulož.to and that if somebody is looking for something specific for research purposes to contact us
09:05:03 and of course, if it's suitable/desirable we would also put it on IA
09:05:27 would have to determine the best process+formats for that
12:25:43 hello, I'm trying to download this video, and I noticed that in "about this capture", it is claimed that this video is archived by archiveteam. However, the wayback machine claims that this video is not archived. I would like to ask if there's any way to fetch the full video for downloading, as I'm interested in its content.
12:25:53 https://web.archive.org/web/20220103091055/https://www.youtube.com/watch?v=OQktVBtbygI
12:26:12 forgot to post the video link
13:09:04 cas: Looks like that video's watch page was saved, but not the video itself.
13:27:25 Hi, re requests for EndOfTerm, they are already announcing they will stop financing a set of human rights sites, ex clandestine concentration camps which are now cultural centers for memory and justice. The sites are https://www.exccdolimpo.org.ar and https://memoriaexatletico.blogspot.com. I added them and a couple more to the Argentine Wiki page
13:27:26 as well as some comments
13:28:41 The new president takes office on 10th december. They also announced to stop financing the ministry for gender and lgtb+ which has a lot of resources - specifically this editorial https://editorial.mingeneros.gob.ar/
17:22:58 oh no
17:23:20 https://argenteam.net/movie/148671/Pain.Hustlers.%282023%29 --> https://foro.argenteam.net/viewtopic.php?f=11&t=185066 --> https://argenteam.net/movie/148671
17:23:30 the official blog posts have links without the slug
17:24:04 er forum posts
17:24:07 I'm not awake yet
17:28:20 🐌
19:43:32 Sanqui: FYI, I just tried to archive the page with AB, and it went down again immediately.
I think it's the people here using The ~~Lounge~~ Clownge with link prefetching enabled.
19:44:15 *shrug* it's not mine anyway, but I suppose I should've PM'd it to arkiver instead :)
19:45:50 It does seem to be hosted on a Z3 or something.
19:47:02 can you add it in a https://transfer.archivete.am/ txt file so that the url is not posted in the channel?
20:23:59 I guess that would be a workaround, but ugh.
20:26:36 I can disable link prefetching
20:27:03 every single user in the channel using thelounge would need to disable it
20:27:29 Right
20:27:50 I wonder if ircv3 could add link embeds so this wasn't necessary
20:28:02 I'll just grab-site it.
20:33:24 tfw JAA disses your IRC client again
20:33:30 :'(
20:34:04 Well I tried making a feature request, but they did not get my point apparently https://github.com/thelounge/thelounge/issues/4805
20:48:04 How's the argentina end-of-term project going?
20:51:12 At least it's interesting to know that thelounge makes link preview requests as twitterbot :P
20:55:09 https://dl.fireon.live/irc/tltest
20:55:29 Mozilla/5.0 (compatible; The Lounge IRC Client; +https://github.com/thelounge/thelounge) facebookexternalhit/1.1 Twitterbot/1.0
20:56:55 Pedrosso: It isn't yet, but I'll get started on it tonight.
20:57:08 Awesome
20:58:00 I completely forgot how hard it is to argue the point of your feature requests in the land of github :P
20:58:29 (44 requests total to that url btw)
20:58:43 (39 from The Lounge)
20:58:52 we need 25 more request
20:58:53 s
20:58:53 now post it on mastodon and see how much worse it gets
20:59:08 that_lurker: indeed :3
22:38:42 https://argenteam.net/api/v1/movie/596 aaaugh
22:55:58 https://argenteam.net/api/v1/movie/148898 many affected
22:56:40 oh no
23:01:10 Oof