00:33:55 Arkiver edited Deathwatch (+245, SAPO Videos deleting data on September 17): https://wiki.archiveteam.org/?diff=50616&oldid=50603 02:53:26 FireonLive edited Current Projects (+2, fix comments (i've removed these auto…): https://wiki.archiveteam.org/?diff=50617&oldid=50613 02:54:21 that link has a virus 02:54:26 no one click it 02:54:28 thanks 03:00:46 Still wrestling with https://pkg.fig.io/ and it's such a mess. There's a script at https://pkg.fig.io/install.sh that you're supposed to pipe to a shell (because of course), which then invokes other scripts, adds their repo to your package manager, and installs stuff. I haven't been able to get my hands on the actual .deb or .rpm files though, just get 403s there. Maybe I'm doing something wrong. 03:05:11 found it 03:05:22 well I didn't check if the deb actually downloads 03:06:00 oh, I see 03:06:06 :-) 03:06:08 it 403s in the end 03:06:51 It might also be the old way of installing it. https://repo.fig.io/ has far newer versions. 03:07:18 what's the latest version? 03:07:41 Not a clue, but repo has 2.16.0 builds. 03:08:25 The files are all helpfully named 'fig.$packageManagerExtension' without a version, and the download page link goes to 'latest'. 03:08:36 open bucket listing nice 03:08:45 (The download page also says there are no Linux builds.) 03:09:11 I found hints somewhere that there are also Windows builds. Haven't seen those in the bucket. 03:10:23 hmm even setting a user-agent to apt doesn't seem to work, though i'm just guessing at the correct value lol 03:10:32 yeah I think this script is just outdated 03:10:34 curl -A 'debian APT-HTTP/1.3 (2.6.1)' https://pkg-cdn.fig.io/2.5.3/linux/x86_64/fig.deb 03:10:39 :( 03:11:08 and no open bucket either 03:13:03 I wanted to try running the script in a Debian container to see what'd happen, but there are some libseccomp2 issues on my test machine. 
03:13:18 I did that 03:13:22 Ah 03:13:25 Thanks 03:13:29 it's not letting me install the package because it says the repository signature expired 03:13:39 Heh 03:13:59 libfaketime time? 03:14:49 Or just --allow-unauthenticated 03:15:04 E: Failed to fetch https://pkg-cdn.fig.io/2.5.3/linux/x86_64/fig.deb 403 Forbidden [IP: 13.227.83.128 443] 03:15:07 this repo is just broken 03:15:18 nothing to see here move along 03:15:22 Welp 03:15:25 :( 03:15:36 I'll collect the scripts and stuff at least. 03:15:47 The redirects will be broken in AB anyway thanks to the 307 bug. 03:16:18 ah it doesn't like the new 307 hotness? 03:17:18 It's a bit overzealous at preserving the request: https://github.com/ArchiveTeam/wpull/issues/425 03:18:48 ahh i see 03:21:44 >2019 03:23:24 >https://github.com/search?q=org%3AArchiveTeam+author%3Anicolas17&type=pullrequests 03:23:38 :P 03:24:16 just ignore the spider webs (lol webs) in https://github.com/ArchiveTeam/wpull/pulls 03:25:48 pkg.fig.io was still in use under a year ago per the GitHub issues: https://github.com/withfig/fig/issues?q=is%3Aissue+pkg.fig.io 03:26:16 hmm 03:46:09 lol: https://pkg.fig.io/install-headless.sh 03:49:06 lol 05:04:56 JAA: so, what projects are currently running? 05:05:55 nicolas17: Xuite, Gfycat, Telegram, not sure what else. 05:08:20 urlteam2, mediafire maybe?, github maybe? 05:10:23 mediafire has no work 05:10:36 JAA: what's keeping reddit and urls paused? 05:10:40 i mean it's technically running though 05:10:42 :p 05:10:54 just needs some sweet !a lovin' 05:11:24 reddit -> arkiver verifying i.reddit.com; urls... i think just sheer size 05:11:43 i.redd.it* but yeah 05:11:45 or 'load shedding' for the latter 05:11:49 ah yeah 05:12:03 i.reddit.com is the now-killed web interface 05:12:16 i did correct myself from i.imgur.com before i hit enter though :D 05:12:16 oh yeah they made the image links even worse 05:12:21 indeed!
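For context on the "307 bug" above: unlike 301/302/303 redirects, where clients commonly (or, for 303, mandatorily) switch to GET, a 307/308 requires the client to replay the original method and body (RFC 7231 §6.4.7, RFC 7238). A minimal lookup sketch of that per-status behaviour (the function name is made up for illustration):

```shell
# Method a client should use when following a redirect of a POST request.
# 307/308 preserve the original method and body; for 301/302 most clients
# historically rewrite to GET, and 303 requires GET.
redirect_method() {
  case "$1" in
    307|308)     echo "POST" ;;  # original method and body preserved
    301|302|303) echo "GET" ;;   # common (or mandated) client behaviour
    *)           echo "n/a" ;;   # not a redirect status we handle here
  esac
}
redirect_method 307   # POST
redirect_method 303   # GET
```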
05:12:24 it's awful :D 05:12:46 it used to be that clicking an image link showed me the webpage, and I had to right click the image and open in a new tab to see the actual image with usable zooming 05:13:01 now if I open the image in a new tab it loads the goddamn webpage too 05:13:02 now right click does noooothing 05:13:04 :D 05:13:25 For a while, I could use view-source to get the image itself. No idea why, never bothered to look into it. 05:14:04 i decided to follow the thot leaders at reddit and host my very own image: https://mkx9delh5a.execute-api.ca-central-1.amazonaws.com/uploads/a-very-nice-image.png 05:14:18 Every time I click on an image now, I get redirected to https://old.reddit.com/r/funny/comments/media/nice_hat/ due to my URL rewrites from www.reddit.com to old.reddit.com. 05:14:36 lol 05:14:37 if workers are bored we could resume imgur at a low rate >.> 05:14:54 at least you can see gaga's hat 05:15:29 It redirects to https://www.reddit.com/media?url=..., but on old.reddit.com, that redirects to the post with ID 'media' instead. :-) 05:18:08 I think I'm done configuring allll the Apple update assets in my script... now I have 600MB of json responses 05:45:56 tracker taking a nap? "Tracker returned status code 500. The tracker has probably malfunctioned. Retrying after 80 seconds.." 05:51:38 looks like tracker isn't happy 05:52:06 cc JAA 06:01:34 https://mkx9delh5a.execute-api.ca-central-1.amazonaws.com/uploads/b20a08951272ce78/fix-it.gif 06:01:56 Fusl: hi! 
tracker has been erroring out on item requests/backfeed for the last ~15min 06:02:49 seems to be back up 06:02:50 looks to be recovering, just have to start pinging people :D 06:07:13 the Fusl-bat-phone 06:18:10 Someone on a game-alpha Discord noticed that Epic Games cleared from their servers the UT assets used by the cancelled UT game that is/was on GitHub for UE license holders; luckily, i got a full mirror of that data 06:48:29 fireonlive: https://lounge.kuhaon.fun/folder/b912317d19b4b8b6/JaaSignal.png 06:48:40 😂 06:48:44 yes. 06:54:24 https://media.tenor.com/KJYhAJa46UYAAAAC/old-school-batman.gif 06:54:49 bonus points if you hear the sound effect while viewing 11:03:06 Tracker /backfeed unhappy again 11:03:19 the tracker returns 500 and failed to accept backfeeds again 12:10:03 re 12:11:31 :) pls repeat for those not on #archiveteam plcp_ 12:11:38 okok 12:11:42 buckle up 12:12:46 All personal websites on the personal-webpages service of the main telco operator in France are going offline by September 5th; they have a registry here https://annuaire-pp.orange.fr/accueil 12:13:19 https://pages.perso.orange.fr/ 12:13:27 The announcement (in French, sorry) 12:14:09 pokechu22 flashfire42 JAA have been doing some ArchiveBot jobs for orange 12:14:10 I can help translate if need be 12:14:22 all *.pagesperso-orange.fr and all *.monsite-orange.fr 12:15:23 pokechu22: did your orange !a < jobs cover https://telecommunications.monsite-orange.fr/ ? plcp_ mentioned that as an example 12:15:39 I'm worried, especially because it's composed mostly of non-tech-savvy people, non-profits and older folks, who for the most part built tens of thousands of pages on topics they're passionate about, and won't be migrated anywhere 12:17:37 yeah, ISP hosting is quite endangered in general https://wiki.archiveteam.org/index.php?title=ISP_Hosting 12:18:33 Pixnet (https://pixnet.net/), the largest remaining blog service provider in Taiwan, accepted the migration from Yahoo!
Blog, Wretch, yam天空部落 and Xuite, announced it will delete accounts inactive since before 2020-01-01 on 2023-12-01: https://admin.pixnet.net/blog/post/49016232 12:19:49 I consider that Pixnet is partially endangered and it's going to be another large DPoS project 12:21:46 seems like something to mention on the announce channel #archiveteam too 12:22:07 and add it to deathwatch https://wiki.archiveteam.org/index.php/Deathwatch 12:22:52 pabs: I briefly mentioned it on #archiveteam and am editing the wiki :p 12:26:20 Yts98 edited Deathwatch (+164, Add Pixnet): https://wiki.archiveteam.org/?diff=50618&oldid=50616 12:32:29 re 12:33:48 (the "web interface" link here https://wiki.archiveteam.org/index.php/Archiveteam:IRC#How_do_I_chat_on_IRC? may be updated from #archiveteam to #archiveteam-bs to avoid ppl in a hurry polluting the announce chan) 12:35:00 and thanks for the rapid answer pabs 12:38:05 good idea, fixed 12:38:23 PaulWise edited Archiveteam:IRC (+3, set #archiveteam-bs as the default channel): https://wiki.archiveteam.org/?diff=50619&oldid=50560 12:40:24 PaulWise edited Archiveteam:IRC (+3, fix web link too): https://wiki.archiveteam.org/?diff=50620&oldid=50619 14:21:44 Yts98 created PIXNET (+4100, inactive accounts of PIXNET is endangered): https://wiki.archiveteam.org/?title=PIXNET 14:25:45 Yts98 edited Deathwatch (+0, Capitalize PIXNET): https://wiki.archiveteam.org/?diff=50622&oldid=50618 16:44:45 plcp: no, I don't think I've got any of https://telecommunications.monsite-orange.fr 16:45:05 er, wait, one sec 16:45:25 still waking up, thought that was something like telecommunications-orange.fr and not a subdomain of monsite-orange.fr 16:58:36 plcp: Yeah, that's on the priority list running in AB.
flashfire42 also did several jobs for it starting on various pages (but would have recursed over the whole site on each one), see https://archive.fart.website/archivebot/viewer/domain/telecommunications.monsite-orange.fr 17:04:05 nice 17:04:35 I'm going through some of these websites, looks like there are some badly rewritten ones 17:06:28 some have their homepages hosted as ".pagesperso-orange.fr" but when crawling they use legacy "http://perso.wanadoo.fr//" URLs that no longer work 17:06:50 but rewriting these urls to pagesperso fixes the website 17:06:56 what a nightmare 17:07:39 plcp: french operator woohoo ☆*: .。. o(≧▽≦)o .。.:*☆ 17:12:17 Unfortunately the site bans for 24 hours if you request at faster than 1 page/second so it's unlikely we'll get everything - if there was more time it'd probably be possible to handle those legacy URLs but I don't think we will be able to :| 17:24:19 Also, a history, from as far as I know:... (full message at ) 17:28:33 I don't think http versus https depends on which site builder is used - instead it's whether the username has multiple dots in it: if it does, it gets http, and if it doesn't, it gets https, because an SSL certificate for *.monsite-orange.fr only covers subdomains without dots and there isn't a way to do *.*.monsite-orange.fr 17:29:24 that can also be seen by looking at what http://perso.orange.fr/DEMO and http://perso.orange.fr/FOO.BAR redirect to 17:29:43 pokechu22: they rate limit that aggressively? 17:29:59 Bruh, I forgot 17:29:59 ↓ http://monsite.orange.fr/DEMO 17:29:59 ↓ http://DEMO.monsite.orange.fr/ 17:29:59 ↓ ...
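The single-label wildcard point above (a cert for *.monsite-orange.fr can't cover FOO.BAR.monsite-orange.fr, since per RFC 6125 the wildcard matches exactly one DNS label) can be sketched with shell glob patterns. This is only an analogy of the matching rule, not real TLS verification, and the helper name is made up:

```shell
# Illustrative single-label wildcard check: a hostname with more than one
# label before .monsite-orange.fr is NOT covered by *.monsite-orange.fr.
matches_wildcard() {
  case "$1" in
    *.*.monsite-orange.fr) echo "no match: more than one label" ;;
    *.monsite-orange.fr)   echo "match" ;;
    *)                     echo "no match" ;;
  esac
}
matches_wildcard demo.monsite-orange.fr      # match -> gets https
matches_wildcard foo.bar.monsite-orange.fr   # no match: more than one label -> http only
```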
17:30:22 They apply a ban after an hour or two of sustained requests at a high speed, but it doesn't seem like it's that strict overall 17:30:39 ah that's why I was able to wget one site 17:31:11 but if I go for the 44k something pages, it won't work 17:31:25 Yeah 17:31:26 (still scraping the registry) 17:32:34 The other annoying factor is that sites and pages that don't exist redirect to https://r.orange.fr/r/Oerreur_404 and then https://e.orange.fr/error404.html, and both of those pages also count toward the rate limit. (And ArchiveBot doesn't have a way of applying ignores to redirect targets, so it requests those every time) 17:32:38 even just downloading one page per site, the front index.html, will require days with one ip 17:33:27 with that rate limit, should have started a year ago :D 17:33:36 pokechu22: when did you start? 17:34:10 A few days ago 17:34:19 well shit 17:34:42 The job for the list of high-priority sites that are likely to exist (https://transfer.archivete.am/6gcam/pagesperso-orange.fr_pagespro-orange.fr_monsite-orange.fr_seed_urls_thuban_priority.txt) has already downloaded all of the front pages at least 17:34:55 but it seems unlikely it'll get everything else 17:35:20 as he said: well shit 17:36:30 I have a bunch of other jobs running on different IPs based on other lists I generated (e.g. sites that have no existing coverage at all, most of which don't exist but it's found 646 of them so far that do, and some other generated lists) 17:36:59 but we should have started a while back :| 17:37:13 the aforementioned list looks like their registry scraped 17:37:28 pokechu22: they announced it like three months ago iirc 17:37:28 Yes 17:37:41 but the information reached me like, today 17:38:07 flashfire42 has been running individual sites for a while: https://archive.fart.website/archivebot/viewer/?q=orange.fr - it just took a while to build up lists of sites 17:40:36 We only got a full registry list 2 days ago.
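A quick back-of-the-envelope for the numbers above: ~44k registry pages at the 1 request/second ban threshold gives only a lower bound, since assets and the error-page redirects also count against the limit, so real crawls need noticeably longer (the variable names are just for illustration):

```shell
# Lower bound: fetching one front page per site at 1 req/s on a single IP.
pages=44000                 # registry page count mentioned above
rate=1                      # requests per second per IP (ban threshold)
seconds=$(( pages / rate ))
echo "~$(( seconds / 3600 )) hours for $pages front pages on one IP"
```

With the later ~159k-page list, the same arithmetic gives roughly 44 hours per pass on one IP, before counting assets or error redirects.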
See https://hackint.logs.kiska.pw/archiveteam-bs/20230828 (and https://hackint.logs.kiska.pw/archiveteam-bs/20230827#c374594) 17:47:50 159k pages! 17:47:51 wow 17:48:00 that's triple the amount from the registry 17:55:07 okok, brb spamming all friends that may have worked once in their life at orange 17:58:31 https://drop.chapril.org/download/37075644d302ef4f/#p_ubcDqTNiANhPEyXHb3Qw 17:58:37 here's my list 18:01:37 Rehosted because JS nonsense: https://transfer.archivete.am/7gCmW/orange-list.txt.zst 18:06:04 http://1a1.emploi.pour.cadre.technique.top.performant.pagesperso-orange.fr/ - this is an excellent URL and site... 18:06:57 (.tar.gz unpacked and then recompressed with zstd, to be precise.) 18:10:06 It looks like a few of those are new 18:10:18 --ultra -22? :p 18:13:15 does a higher # have any effect on the decompressor? or just when compressing 18:13:29 fireonlive: Nah, -10 is my go-to. And yes, the --ultra levels require more memory to decompress IIRC. 18:13:42 fireonlive: you can test that 18:14:11 nicolas17: technically correct 18:14:15 JAA: ah :) 18:14:17 I mean like, easily 18:14:20 https://belleilescapade.monsite-orange.fr/ and https://patrocle.monsite-orange.fr/ are completely new; they weren't on any previous list. https://erawylersitedefleur.pagesperso-orange.fr/ and https://iniri.pagesperso-orange.fr/ were on one of the lists but not the priority one. 18:14:33 "zstd -b1 -e19 file.txt" will benchmark all levels 1 to 19 and give you the compression ratio, and compression and decompression speed 18:14:40 ah! nice 18:14:42 otherwise your list matched the orangefr_online_raw.txt one pretty closely 18:15:28 and if either compression or decompression takes less than 1 second, it runs multiple times to get a better measurement 18:19:06 neat, zstd continues to impress me 18:20:36 there's one disappointing thing though 18:21:29 "--format=FORMAT: compress and decompress in other formats.
If compiled with support, zstd can compress to or decompress from other compression algorithm formats. Possibly available options are zstd, gzip, xz, lzma, and lz4." 18:21:49 it doesn't support benchmarking them :( -b only does zstd format 18:42:42 :( 18:55:53 JAA: thanks 20:16:59 JustAnotherArchivist edited Deathwatch (+10, Link to Game Atsumaru section on [[Niconico]]): https://wiki.archiveteam.org/?diff=50623&oldid=50622 20:27:11 transfer will be getting a bit of an upgrade soonish. Planned changes include adding on-the-fly zstd compression support on upload, removing the forced download (i.e. no longer requiring /inline/ for browser access), and pasting content directly on the web interface (thanks to upstream's implementation of that). Now's your opportunity for further ideas. :-) 20:30:57 wooh :) 20:31:21 so also no need for zstd'ing stuff ourselves and taking .zst off from the URL? 20:31:24 JAA: ^ 20:31:55 JAA: 🥳🥳🥳🥳🥳🥳🥳🥳🥳🥳🥳 20:32:01 arkiver: Correct, no need for that anymore, although it might still be preferable if you want to minimise the amount of data transferred (e.g. slow connections). 20:32:29 paste text -> uploads a .txt file? :3 20:32:45 is that what you mean or did they finally add paste binary -> uploads binary 20:33:21 JAA: nice 20:34:15 fireonlive: I don't know exactly how it works, just saw it in the changelog. 20:34:20 ahh 20:34:27 But I assume pasting text, yeah. 20:35:13 transfer.sh-web is a clusterfuck, so the diff is very useful: https://github.com/dutchcoders/transfer.sh-web/pull/58/files 20:35:21 oh thanks 20:35:44 allow a certain UA to access image/video files? :3 20:36:07 though idk if the 🐰 is advanced enough for that 20:36:34 Oh, I guess this is really it: https://github.com/dutchcoders/transfer.sh-web/pull/58/files#diff-738ca807f137aa95054f4d49bc42f48f8f85b1acf13e381d268415f6d4f09417 20:36:45 So that seems a bit underwhelming. We'll see though. 
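Returning to the zstd benchmarking discussed above, a minimal sketch of that workflow (assumes the zstd CLI is installed; the file names are illustrative):

```shell
# Make a sample file, benchmark a few levels, then do a compress/decompress
# round trip at the level mentioned above (-10) and verify it.
seq 1 50000 > sample.txt
zstd -b1 -e3 sample.txt                        # benchmark levels 1-3: ratio + speeds
zstd -10 -k -f -q sample.txt                   # compress at level 10, keep the input
zstd -d -f -q sample.txt.zst -o roundtrip.txt  # decompress to a new file
cmp sample.txt roundtrip.txt && echo "round trip OK"
```

As noted above, `-b` only benchmarks the zstd format itself, even when the binary is built with gzip/xz/lz4 support via `--format`.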
20:36:52 300,000 changes to 'modTime: time.Unix(1668857825, 0),' 20:37:06 What do you mean regarding UA access? 20:37:09 ah yeah, listening for files in the clipboard 20:37:22 TheLounge's link preview thingy 20:37:46 dunno if you can allowlist say just stuff ending in .jpg/.png/etc 20:38:27 The Lounge is blocked specifically because dozens of people would spam the server within milliseconds of a link getting shared, and it caused problems on the server side including a fun crash due to a mutex bug. 20:38:44 bindata_gen.go scares me: var _bindataDistScriptsMainJs = 20:38:53 https://github.com/dutchcoders/transfer.sh-web/pull/58/files#diff-eef14c30d770fdc35b929095526891a4d3b2dc4ae748face27cafe361367926aR2037 haha 20:39:54 ah ye, after that was patched I thought it was more of a bandwidth thing 20:40:07 https://github.com/dutchcoders/transfer.sh/issues/380 20:41:31 hm, make delete urls available if possible? 20:41:39 if one were to accidentally shove a file? 20:41:51 they seem to be hidden on the AT instance 20:41:55 (or i'm dumb) 20:42:12 Yeah, that's the other part of it. When a large file gets linked, a dozen downloads of it would be started simultaneously, which is *great*. 20:42:58 ye, i figured limiting it to images at least would be somewhat better instead of everyone trying to download 100MB files to immediately throw out haha 20:43:02 but either way is fine 20:43:28 Don't remember as I hardly ever use the web interface, but will check. 20:44:53 oh, i guess i just misremembered: they show in curl at least: x-url-delete: https://transfer.archivete.am/PMcII/test.txt/kyjYcQjrG1 20:45:36 ah yeah but not on web 20:46:34 Yeah, the header exists, but it isn't always present. Depends on how the upload is done. 20:47:18 i was like I tried to delete this but ye it's probably just cached 21:36:45 Question. After I installed my AT Warrior I was able to access the UI on localhost:8001 once and now loads forever. How can I solve this? 21:55:03 Is it still running? 
If it's not running (or it's just starting up) it'll either load forever or immediately fail to load 21:55:18 pokechu22: question 21:56:17 I have a half day of free time before leaving for holidays, away from my computers, until next monday (more or less ~5 days of continuous querying with up to 3 unique IPs & machines) 21:56:26 what do I do during this half day 21:56:53 is it worth it to learn to set up an "archive warrior" to contribute to the effort? 21:56:57 pokechu22 it's running like it would normally, just can't access localhost 21:57:00 I don't think we have any kind of distributed project set up for orange 21:57:24 Setting up the warrior isn't too hard but it wouldn't be targeting orange specifically 21:58:09 so I can just get wget to spit out as many WARCs as possible w/o being banned, and it would be somewhat useful 21:58:18 You're connecting to http://localhost:8001/ and not https://localhost:8001/ right? 21:59:13 Yeah, that'd be useful, though it'd be hard to avoid duplicating other work 21:59:30 the orange.fr priority job is onto its third pass, so that's pretty cool--we have assets (and one layer of links) for front pages of all those sites 21:59:34 that said, queue has been slowly growing, so while we might finish the majority of sites (which are small), we definitely will not completely get the large ones by the deadline 22:00:00 nice 22:00:21 pokechu22 yup 22:00:58 You could try http://127.0.0.1:8001/ or something like that maybe? 22:01:45 there we go, loaded after around 30 seconds 22:01:54 thanks for the help :] 22:08:51 mmmh I guess I'll find a way to prioritize some orange sites over others, and get as much shit as possible before the deadline 22:10:23 ok well it loaded.. but i can't do anything on the interface :p 22:10:29 screenshot for reference: https://ibb.co/VwngwJG 22:39:14 "that said, queue has been slowly..." <- Where to find it in the warrior list?
23:15:55 AntoninDelFabbro|m: those are archivebot jobs, so no warrior support: http://archivebot.com/ 23:16:16 can type 'orange' in the "Show" box to see 23:19:17 http://archivebot.com/?initialFilter=orange 23:28:51 ah yeah that’s better