-
h2ibot
Arkiver edited Deathwatch (+245, SAPO Videos deleting data on September 17):
wiki.archiveteam.org/?diff=50616&oldid=50603
-
h2ibot
FireonLive edited Current Projects (+2, fix comments (i've removed these auto…):
wiki.archiveteam.org/?diff=50617&oldid=50613
-
fireonlive
that link has a virus
-
fireonlive
no one click it
-
fireonlive
thanks
-
JAA
Still wrestling with
pkg.fig.io and it's such a mess. There's a script at
pkg.fig.io/install.sh that you're supposed to pipe to a shell (because of course), which then invokes other scripts, adds their repo to your package manager, and installs stuff. I haven't been able to get my hands on the actual .deb or .rpm files though, just get 403s there. Maybe I'm doing something wrong.
-
nicolas17
found it
-
nicolas17
well I didn't check if the deb actually downloads
-
nicolas17
oh, I see
-
JAA
:-)
-
nicolas17
it 403s in the end
-
JAA
It might also be the old way of installing it.
repo.fig.io has far newer versions.
-
nicolas17
what's the latest version?
-
JAA
Not a clue, but repo has 2.16.0 builds.
-
JAA
The files are all helpfully named 'fig.$packageManagerExtension' without a version, and the download page link goes to 'latest'.
-
nicolas17
open bucket listing nice
-
JAA
(The download page also says there are no Linux builds.)
-
JAA
I found hints somewhere that there are also Windows builds. Haven't seen those in the bucket.
-
fireonlive
hmm even setting a user-agent to apt doesn't seem to work, though i'm just guessing at the correct value lol
-
nicolas17
yeah I think this script is just outdated
-
fireonlive
:(
-
fireonlive
and no open bucket either
-
JAA
I wanted to try running the script in a Debian container to see what'd happen, but there are some libseccomp2 issues on my test machine.
-
nicolas17
I did that
-
JAA
Ah
-
JAA
Thanks
-
nicolas17
it's not letting me install the package because it says the repository signature expired
-
JAA
Heh
-
JAA
libfaketime time?
-
JAA
Or just --allow-unauthenticated
-
nicolas17
E: Failed to fetch
pkg-cdn.fig.io/2.5.3/linux/x86_64/fig.deb 403 Forbidden [IP: 13.227.83.128 443]
-
nicolas17
this repo is just broken
-
nicolas17
nothing to see here move along
-
JAA
Welp
-
fireonlive
:(
-
JAA
I'll collect the scripts and stuff at least.
-
JAA
The redirects will be broken in AB anyway thanks to the 307 bug.
-
fireonlive
ah it doesn't like the new 307 hotness?
-
JAA
It's a bit overzealous at preserving the request:
ArchiveTeam/wpull #425
-
fireonlive
ahh i see
-
nicolas17
>2019
-
fireonlive
-
fireonlive
:P
-
fireonlive
just ignore the spider webs (lol webs) in
github.com/ArchiveTeam/wpull/pulls
-
JAA
pkg.fig.io was still in use under a year ago per the GitHub issues:
github.com/withfig/fig/issues?q=is%3Aissue+pkg.fig.io
-
fireonlive
hmm
-
JAA
-
fireonlive
lol
-
nicolas17
JAA: so, what projects are currently running?
-
JAA
nicolas17: Xuite, Gfycat, Telegram, not sure what else.
-
fireonlive
urlteam2, mediafire maybe?, github maybe?
-
nicolas17
mediafire has no work
-
nicolas17
JAA: what's keeping reddit and urls paused?
-
fireonlive
i mean it's technically running though
-
fireonlive
:p
-
fireonlive
just needs some sweet !a lovin'
-
fireonlive
reddit -> arkiver verifying i.reddit.com; urls... i think just sheer size
-
JAA
i.redd.it* but yeah
-
fireonlive
or 'load shedding' for the latter
-
fireonlive
ah yeah
-
fireonlive
i.reddit.com is the now-killed web interface
-
fireonlive
i did correct myself from i.imgur.com before i hit enter though :D
-
nicolas17
oh yeah they made the image links even worse
-
fireonlive
indeed!
-
fireonlive
it's awful :D
-
nicolas17
it used to be that clicking an image link showed me the webpage, and I had to right click the image and open in a new tab to see the actual image with usable zooming
-
nicolas17
now if I open the image in a new tab it loads the goddamn webpage too
-
fireonlive
now right click does noooothing
-
fireonlive
:D
-
JAA
For a while, I could use view-source to get the image itself. No idea why, never bothered to look into it.
-
fireonlive
i decided to follow the thot leaders at reddit and host my very own image:
mkx9delh5a.execute-api.ca-central-1…s.com/uploads/a-very-nice-image.png
-
JAA
Every time I click on an image now, I get redirected to
old.reddit.com/r/funny/comments/media/nice_hat due to my URL rewrites from www.reddit.com to old.reddit.com.
-
fireonlive
lol
-
nicolas17
if workers are bored we could resume imgur at a low rate >.>
-
fireonlive
at least you can see gaga's hat
-
JAA
It redirects to
reddit.com/media?url=..., but on old.reddit.com, that redirects to the post with ID 'media' instead. :-)
-
nicolas17
I think I'm done configuring allll the Apple update assets in my script... now I have 600MB of json responses
-
imer
tracker taking a nap? "Tracker returned status code 500. The tracker has probably malfunctioned. Retrying after 80 seconds.."
-
fireonlive
looks like tracker isn't happy
-
fireonlive
cc JAA
-
imer
Fusl: hi! tracker has been erroring out on item requests/backfeed for the last ~15min
-
drunkmoon
seems to be back up
-
imer
looks to be recovering, just have to start pinging people :D
-
fireonlive
the Fusl-bat-phone
-
masterX244
Someone on a game alpha stuff Discord noticed that Epic Games cleared the UT assets used by the cancelled UT game (the one that is/was on GitHub for UE license holders) from their servers. Luckily, I got a full mirror of that data
-
that_lurker
-
fireonlive
😂
-
fireonlive
yes.
-
project10
-
project10
bonus points if you hear the sound effect while viewing
-
nstrom|m
Tracker /backfeed unhappy again
-
yts98
the tracker returns 500 and failed to accept backfeeds again
-
plcp_
re
-
pabs
:) pls repeat for those not on #archiveteam plcp_
-
plcp_
okok
-
plcp_
buckle up
-
plcp_
All personal websites hosted on the personal webpages service of the main telco operator in France are going offline by September 5th; they have a registry here
annuaire-pp.orange.fr/accueil
-
plcp_
The announcement (in French, sry)
-
pabs
pokechu22 flashfire42 JAA have been doing some ArchiveBot jobs for orange
-
qyxojzh|m
I can help translate if need be
-
plcp_
all *.pagesperso-orange.fr and all *.monsite-orange.fr
-
pabs
pokechu22: did your orange !a < jobs cover
telecommunications.monsite-orange.fr ? plcp_ mentioned that as an example
-
plcp_
I'm worried, especially because it's composed mostly of non-tech savvy people, non profits and older folks, that for most build tens of thousands of pages on topics they're passionate about, and won't be migrate anywhere
-
plcp_
*be migrated
-
pabs
yeah, ISP hosting is quite endangered in general
wiki.archiveteam.org/index.php?title=ISP_Hosting
-
yts98
Pixnet (
pixnet.net), the last large blog service provider in Taiwan, which accepted migrations from Yahoo! Blog, Wretch, yam天空部落 and Xuite, has announced that it will delete inactive accounts (inactive since before 2020-01-01) on 2023-12-01:
admin.pixnet.net/blog/post/49016232
-
yts98
I consider that Pixnet is partially endangered and it's going to be another large DPoS project
-
pabs
seems like something to mention on the announce channel #archiveteam too
-
yts98
pabs: I briefly mentioned it on #archiveteam and am editing the wiki :p
-
h2ibot
Yts98 edited Deathwatch (+164, Add Pixnet):
wiki.archiveteam.org/?diff=50618&oldid=50616
-
plcp
re
-
plcp
(the "web interface" link here
wiki.archiveteam.org/index.php/Archiveteam:IRC#How_do_I_chat_on_IRC? may be updated from #archiveteam to #archiveteam-bs to avoid ppl in a hurry polluting the announce chan)
-
plcp
and thanks for the rapid answer pabs
-
pabs
good idea, fixed
-
h2ibot
PaulWise edited Archiveteam:IRC (+3, set #archiveteam-bs as the default channel):
wiki.archiveteam.org/?diff=50619&oldid=50560
-
h2ibot
PaulWise edited Archiveteam:IRC (+3, fix web link too):
wiki.archiveteam.org/?diff=50620&oldid=50619
-
h2ibot
Yts98 created PIXNET (+4100, inactive accounts of PIXNET is endangered):
wiki.archiveteam.org/?title=PIXNET
-
h2ibot
Yts98 edited Deathwatch (+0, Capitalize PIXNET):
wiki.archiveteam.org/?diff=50622&oldid=50618
-
pokechu22
plcp: no, I don't think I've got any of
telecommunications.monsite-orange.fr
-
pokechu22
er, wait, one sec
-
pokechu22
still waking up, thought that was something like telecommunications-orange.fr and not a subdomain of monsite-orange.fr
-
pokechu22
plcp: Yeah, that's on the priority list running in AB. flashfire42 also did several jobs for it starting on various pages (but would have recursed over the whole site on each one), see
archive.fart.website/archivebot/vie…elecommunications.monsite-orange.fr
-
plcp
nice
-
plcp
I'm going through some of these websites, looks like there's some amount of badly rewritten ones
-
plcp
some have their homepages hosted as "<handle>.pagesperso-orange.fr" but when crawling they use legacy "
perso.wanadoo.fr<handle>/" urls that no longer work
-
plcp
but rewriting these urls to pagesperso fixes the website
-
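(aside)
The rewrite plcp describes can be sketched roughly as below. This is only an illustration: it assumes the legacy links have the shape http://perso.wanadoo.fr/<handle>/<path>, which the log does not spell out, and the function name `rewrite_legacy` is made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit

LEGACY_HOST = "perso.wanadoo.fr"      # legacy host mentioned in the log
NEW_SUFFIX = "pagesperso-orange.fr"   # per-user domain suffix mentioned in the log


def rewrite_legacy(url: str) -> str:
    """Map a legacy URL onto the per-user subdomain form.

    Assumption: legacy URLs look like http://perso.wanadoo.fr/<handle>/<path>.
    Anything that does not match is returned unchanged.
    """
    parts = urlsplit(url)
    if parts.hostname != LEGACY_HOST:
        return url
    # The first path segment is taken to be the handle, the rest the page path.
    handle, _, rest = parts.path.lstrip("/").partition("/")
    if not handle:
        return url
    # Hostnames are case-insensitive, so the handle is lowercased here.
    new_host = f"{handle.lower()}.{NEW_SUFFIX}"
    return urlunsplit((parts.scheme, new_host, "/" + rest, parts.query, parts.fragment))
```

Applied to every link found while crawling, this would point the dead legacy URLs at the host that still answers. Whether every handle maps one-to-one like this is not confirmed by the log.
-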
plcp
what a nightmare
-
Exorcism|TheLounge
plcp: french operator woohoo ☆*: .。. o(≧▽≦)o .。.:*☆
-
pokechu22
Unfortunately the site bans for 24 hours if you request faster than 1 page/second, so it's unlikely we'll get everything - if there was more time it'd probably be possible to handle those legacy URLs but I don't think we will be able to :|
-
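(aside)
For anyone crawling by hand, staying under the limit pokechu22 mentions comes down to spacing requests out. A minimal sketch follows; the `Throttle` class and the 1.5 second default are assumptions chosen to leave some margin, not values taken from the site or from ArchiveBot.

```python
import time


class Throttle:
    """Space out calls so consecutive ones are at least `min_interval` seconds apart."""

    def __init__(self, min_interval=1.5, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # assumed safety margin above 1 second
        self._clock = clock               # injectable so the class can be tested
        self._sleep = sleep
        self._last = None                 # time of the previous call, None before the first

    def wait(self):
        """Block until at least `min_interval` has passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `wait()` before every request, including any request that follows a redirect, keeps a single IP below the stated rate.
-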
pokechu22
I don't think http versus https reflects which site builder is used. Instead, if the username has multiple dots in it, it gets http, and if it doesn't, it gets https, because an SSL certificate for *.monsite-orange.fr only covers subdomains without dots and there isn't a way to do *.*.monsite-orange.fr
-
pokechu22
that can also be seen by looking at what
perso.orange.fr/DEMO and
perso.orange.fr/FOO.BAR redirect to
-
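(aside)
The certificate point can be illustrated with a simplified matcher: a leading `*` stands in for exactly one DNS label, so it cannot span a dot. The function name `wildcard_covers` is invented for this sketch, and it only handles a full-label wildcard in the leftmost position, which is the case discussed here.

```python
def wildcard_covers(pattern: str, hostname: str) -> bool:
    """Return True if a certificate name like '*.example.org' covers hostname.

    Simplified: '*' may only be the whole leftmost label, and it matches
    exactly one non-empty label.
    """
    p_labels = pattern.lower().split(".")
    h_labels = hostname.lower().rstrip(".").split(".")
    # A wildcard never adds or removes labels, so the counts must agree.
    if len(p_labels) != len(h_labels):
        return False
    if p_labels[0] != "*":
        return p_labels == h_labels
    return h_labels[0] != "" and p_labels[1:] == h_labels[1:]
```

With this, `demo.monsite-orange.fr` is covered by `*.monsite-orange.fr`, while `foo.bar.monsite-orange.fr` is not, which matches the http/https split described above.
-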
plcp
pokechu22: they rate limit that aggressively?
-
AntoninDelFabbro|m
Bruh, I forgot
-
AntoninDelFabbro|m
↓ ...
-
pokechu22
They apply a ban after an hour or two of sustained requests at a high speed, but it does seem like it's that strict overall
-
plcp
ah that's why I was able to wget one site
-
plcp
but if I go for the 44k something pages, it won't work
-
pokechu22
Yeah
-
plcp
(still scraping their registry)
-
pokechu22
The other annoying factor is that sites and pages that don't exist redirect to
r.orange.fr/r/Oerreur_404 and then
e.orange.fr/error404.html, and both of those pages also count into the rate limit. (And ArchiveBot doesn't have a way of applying ignores to redirect targets, so it requests those every time)
-
plcp
even just downloading one page per site, the front index.html, will require days with one ip
-
plcp
with that rate limit, should have started a year ago :D
-
plcp
pokechu22: when did you start?
-
pokechu22
A few days ago
-
plcp
well shit
-
pokechu22
The list of high-priority sites that are likely to exist (
transfer.archivete.am/6gcam/pagespe…ge.fr_seed_urls_thuban_priority.txt) has already downloaded all of the front pages at least
-
pokechu22
but it seems unlikely it'll get everything else
-
AntoninDelFabbro|m
as he said: well shit
-
pokechu22
I have a bunch of other jobs running on different IPs based on other lists I generated (e.g. sites that have no existing coverage at all, most of which don't exist but it's found 646 of them so far that do, and some other generated lists)
-
pokechu22
but we should have started a while back :|
-
plcp
the aforementioned list looks like their registry scraped
-
plcp
pokechu22: they announced it like three months ago iirc
-
pokechu22
Yes
-
plcp
but the information reached me like, today
-
pokechu22
flashfire42 has been running individual sites for a while:
archive.fart.website/archivebot/viewer/?q=orange.fr - it just took a while to build up lists of sites
-
plcp
159k pages!
-
plcp
wow
-
plcp
that's triple the amount from the registry
-
plcp
okok, brb spamming all friends that may have worked once in their life at orange
-
plcp
here's my list
-
JAA
-
pokechu22
-
JAA
(.tar.gz unpacked and then recompressed with zstd, to be precise.)
-
pokechu22
It looks like a few of those are new
-
fireonlive
--ultra --22? :p
-
fireonlive
does higher # have any effect on the decompressor? or just when compressing
-
JAA
fireonlive: Nah, -10 is my go-to. And yes, the --ultra levels require more memory to decompress IIRC.
-
nicolas17
fireonlive: you can test that
-
fireonlive
nicolas17: technically correct
-
fireonlive
JAA: ah :)
-
nicolas17
I mean like, easily
-
nicolas17
"zstd -b1 -e19 file.txt" will benchmark all levels 1 to 19 and give you the compression ratio, and compression and decompression speed
-
fireonlive
ah! nice
-
pokechu22
otherwise your list matched the orangefr_online_raw.txt one pretty closely
-
nicolas17
and if either compression or decompression takes less than 1 second, it runs multiple times to get a better measurement
-
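(aside)
If you want to script the benchmark nicolas17 quotes, a thin wrapper around the same command line is enough. This is a sketch: the helper names are made up, and it only uses the `-b#` and `-e#` flags from the message above.

```python
import shutil
import subprocess


def zstd_benchmark_cmd(path, first=1, last=19):
    """Build the argv for `zstd -b<first> -e<last> <path>`."""
    if not 1 <= first <= last:
        raise ValueError("expected 1 <= first <= last")
    return ["zstd", f"-b{first}", f"-e{last}", str(path)]


def run_benchmark(path, first=1, last=19):
    """Run the benchmark if a zstd binary is on PATH; return its output, else None."""
    if shutil.which("zstd") is None:
        return None
    result = subprocess.run(
        zstd_benchmark_cmd(path, first, last),
        capture_output=True,
        text=True,
        check=False,
    )
    # Both streams are combined because this sketch does not assume
    # which one carries the benchmark table.
    return result.stdout + result.stderr
```

`run_benchmark("file.txt")` then corresponds to the `zstd -b1 -e19 file.txt` invocation from the chat.
-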
imer
neat, zstd continues to impress me
-
nicolas17
there's one disappointing thing though
-
nicolas17
"--format=FORMAT: compress and decompress in other formats. If compiled with support, zstd can compress to or decompress from other compression algorithm formats. Possibly available options are zstd, gzip, xz, lzma, and lz4."
-
nicolas17
it doesn't support benchmarking them :( -b only does zstd format
-
fireonlive
:(
-
plcp
JAA: thanks
-
h2ibot
JustAnotherArchivist edited Deathwatch (+10, Link to Game Atsumaru section on [[Niconico]]):
wiki.archiveteam.org/?diff=50623&oldid=50622
-
JAA
transfer will be getting a bit of an upgrade soonish. Planned changes include adding on-the-fly zstd compression support on upload, removing the forced download (i.e. no longer requiring /inline/ for browser access), and pasting content directly on the web interface (thanks to upstream's implementation of that). Now's your opportunity for further ideas. :-)
-
arkiver
wooh :)
-
arkiver
so also no need for zstd'ing stuff ourselves and taking .zst off from the URL?
-
arkiver
JAA: ^
-
fireonlive
JAA: 🥳🥳🥳🥳🥳🥳🥳🥳🥳🥳🥳
-
JAA
arkiver: Correct, no need for that anymore, although it might still be preferable if you want to minimise the amount of data transferred (e.g. slow connections).
-
fireonlive
paste text -> uploads a .txt file? :3
-
fireonlive
is that what you mean or did they finally add paste binary -> uploads binary
-
arkiver
JAA: nice
-
JAA
fireonlive: I don't know exactly how it works, just saw it in the changelog.
-
fireonlive
ahh
-
JAA
But I assume pasting text, yeah.
-
JAA
transfer.sh-web is a clusterfuck, so the diff is very useful:
dutchcoders/transfer.sh-web #58/files
-
fireonlive
oh thanks
-
fireonlive
allow a certain UA to access image/video files? :3
-
fireonlive
though idk if the 🐰 is advanced enough for that
-
JAA
-
JAA
So that seems a bit underwhelming. We'll see though.
-
fireonlive
300,000 changes to 'modTime: time.Unix(1668857825, 0),'
-
JAA
What do you mean regarding UA access?
-
fireonlive
ah yeah, listening for files in the clipboard
-
fireonlive
TheLounge's link preview thingy
-
fireonlive
dunno if you can allowlist say just stuff ending in .jpg/.png/etc
-
JAA
The Lounge is blocked specifically because dozens of people would spam the server within milliseconds of a link getting shared, and it caused problems on the server side including a fun crash due to a mutex bug.
-
fireonlive
bindata_gen.go scares me: var _bindataDistScriptsMainJs =
-
fireonlive
ah ye, after that was patched I thought it was more of a bandwidth thing
-
JAA
-
fireonlive
hm, make delete urls available if possible?
-
fireonlive
if one were to accidentally shove a file?
-
fireonlive
they seem to be hidden on the AT instance
-
fireonlive
(or i'm dumb)
-
JAA
Yeah, that's the other part of it. When a large file gets linked, a dozen downloads of it would be started simultaneously, which is *great*.
-
fireonlive
ye, i figured limiting it to images at least would be somewhat better instead of everyone trying to download 100MB files to immediately throw out haha
-
fireonlive
but either way is fine
-
JAA
Don't remember as I hardly ever use the web interface, but will check.
-
fireonlive
oh, i guess i just misremembered: they show in curl at least: x-url-delete:
transfer.archivete.am/PMcII/test.txt/kyjYcQjrG1
-
fireonlive
ah yeah but not on web
-
JAA
Yeah, the header exists, but it isn't always present. Depends on how the upload is done.
-
fireonlive
i was like I tried to delete this but ye it's probably just cached
-
Kline
Question. After I installed my AT Warrior I was able to access the UI on localhost:8001 once, but now it loads forever. How can I solve this?
-
pokechu22
Is it still running? If it's not running (or it's just starting up) it'll either load forever or immediately fail to load
-
plcp
pokechu22: question
-
plcp
I have a half day of free time before leaving for holidays, away from my computers, until next monday (more or less ~5 days of continuous querying with up to 3 unique IPs & machines)
-
plcp
what do I do during this half day
-
plcp
is it worth it to learn to set up an "archive warrior" to contribute to the effort?
-
Kline
pokechu22 it's running like it would normally, just cant access localhost
-
pokechu22
I don't think we have any kind of distributed project set up for orange
-
pokechu22
Setting up the warrior isn't too hard but it wouldn't be targeting orange specifically
-
plcp
so I can just get wget to spit out as many WARCs as possible w/o being banned, and it would be somewhat useful
-
pokechu22
You're connecting to
http://localhost:8001 and not
https://localhost:8001 right?
-
pokechu22
Yeah, that'd be useful, though it'd be hard to avoid duplicating other work
-
thuban
the orange.fr priority job is onto its third pass, so that's pretty cool--we have assets (and one layer of links) for front pages of all those sites
-
thuban
that said, queue has been slowly growing, so while we might finish the majority of sites (which are small), we definitely will not completely get the large ones by the deadline
-
plcp
nice
-
Kline
pokechu22 yup
-
pokechu22
You could try
127.0.0.1:8001 or something like that maybe?
-
Kline
there we go, loaded after around 30 seconds
-
Kline
thanks for the help :]
-
plcp
mmmh I guess I'll find a way to prioritize some orange sites over others, and get as much shit as possible before the deadline
-
Kline
ok well it loaded.. but i cant do anything on the interface :p
-
Kline
screenshot for reference:
ibb.co/VwngwJG
-
AntoninDelFabbro|m
<thuban> "that said, queue has been slowly..." <- Where to find it in the warrior list?
-
fireonlive
AntoninDelFabbro|m: those are archivebot jobs, so no warrior support:
archivebot.com
-
fireonlive
can type 'orange' in the "Show" box to see
-
JAA
-
fireonlive
ah yeah that’s better