-
fireonlive
cc pokechu22?
-
pokechu22
There's been an archivebot job on all of macgui.com but it's been blocked for a long time because the site has banned us
-
pokechu22
A new one is probably a good idea
-
thuban
aha, i thought i remembered some discussion
-
thuban
would a new job not also be b&?
-
pokechu22
If we run it slower it'd probably be fine
-
thuban
sounds good
-
mgrandi
There aren't that many files, even a really slow download rate would probably be fine
-
HP_Archivist
qwertyasdfuiopghjk: Trump's jail page is live fyi
-
HP_Archivist
qwertyasdfuiopghjkl: ^
-
flashfire42
Not sure if I should have my warriors on xuite because we have less of it storage wise or gfycat because we have less of it item wise
-
imer
flashfire42: xuite, gfycat should be fine with current workers on it
-
imer
gfycat is tracker limited as well
-
imer
^ so cant go any faster than it currently is
-
fireonlive
-
fireonlive
-
DigitalDragons
-
DigitalDragons
"This is an unofficial product, but if there is enough interest, a portion of proceeds will be dona" (I'm not cutting it off, that's literally where the redbubble description ends.)
-
fireonlive
-
DigitalDragons
-
fireonlive
was just about to link that :3
-
fireonlive
but ye
-
fireonlive
idk
-
thuban
imho, as long as redbubble still delivers, it's fine to leave it up
-
mgrandi
They do, I got a shirt from them not too long ago
-
HP_Archivist
qwertyasdfuiopghjkl: When you're around, please crawl this - Trump's jail info page:
justice.fultoncountyga.gov/PAJailMa…ailingDetail.aspx?JailingID=1472740
-
qwertyasdfuiopghjkl
-
fireonlive
oh nice
-
JAA
Do I understand this right? Someone uploaded the warrior logo to Redbubble, but other than that, they're not involved as Redbubble directly prints the stuff whenever it's ordered?
-
HP_Archivist
qwertyasdfuiopghjkl: Awesome
-
JAA
I also wonder about the copyright situation there. I don't see any licence on
wiki.archiveteam.org/index.php/File:Archive_team.png at least.
-
fireonlive
JAA: AIUI, yes. they also get a kickback
-
fireonlive
so just upload file and collect cheques
-
JAA
Lovely!
-
flashfire42
*creates redbubble account*
-
fireonlive
i guess someone came by *checks image link* ~3 years later and decided they'd upload it there and edit the wiki
-
fireonlive
and 'if money from here is enough perhaps can donate it back to AT'
-
fireonlive
lol
-
HP_Archivist
Lmao. Why would anyone use something so obscure as the AT logo... on generic consumer products?
-
JAA
Do you hate free money?
-
fireonlive
you for example might slap it on your laptop
-
fireonlive
or someone could hand them out after a passionate talk about archiveteam kinda thing
-
HP_Archivist
Well, sure. I guess I might, heh
-
JAA
They also uploaded the ReactJS logo and formerly the CoffeeScript logo.
-
JAA
The latter was removed at some point.
-
fireonlive
interesting
-
HP_Archivist
Unless that person is familiar with archiveteam, seems completely random. And if they are familiar, chances are they'd be in here already, no?
-
JAA
Or at least they would've been around that time. I don't see anything in the logs.
-
fireonlive
one edit to a angelfire/project page (i don't think there was approval back then? maybe there was?) then after that right to redbubble
-
JAA
No, the mod thing has only existed for about three years.
-
fireonlive
ah ok
-
fireonlive
i assume after the m*****f***** incident
-
JAA
Previously, there was a safeword on signup and then you could just edit.
-
JAA
Nah, much later, around the time we also got rid of EFnet.
-
fireonlive
ahh
-
JAA
The Angelfire edit was also two years earlier than this sticker thing.
-
fireonlive
ahh yes it was
-
JAA
Based on how I understand it, this can go to hell and we should get Redbubble to take down the listing, too. I'd have a different opinion if this was run by a regular, but like that? No.
-
JAA
Might not be a terrible idea to ask Jason about it though.
-
HP_Archivist
Not that I have much of a say, given I've only been volunteering here a few years. But, would it be time for an upgraded logo anyway?
-
fireonlive
seems sensible to me
-
fireonlive
it's kinda a 'sub' logo
-
JAA
Also to sort out the copyright/licence matter. I assume Jason knew Penelope or similar.
-
fireonlive
but i like it :3
-
JAA
knew/knows*
-
HP_Archivist
It's cool, agree, but it seems very 2010s or earlier.
-
JAA
HP_Archivist: I don't think anyone will stop you if you want to design one. :-)
-
fireonlive
:3 true
-
HP_Archivist
JAA: I am a terrible draw-er. Most you'll get outta me is a stick figure or two, ha
-
thuban
i like the logo, chasing trends is for suckers
-
fireonlive
-
flashfire42
hmm I have been thinking and we are gonna lose webs sooner than we will lose the orange isp sites. should I focus more on some webs stuff even with the knowledge we did a pretty good grab of webs a few years ago
-
DigitalDragons
hmm
-
DigitalDragons
i think it depends how much we expect webs to have changed over the years
-
flashfire42
The orange web pages are fairly obscure but webs has had people updating their pages and making new pages up to and including in the last few days afaik
-
fireonlive
(no license on that one it seems either)
-
fireonlive
(well that one or its variations)
-
JAA
HP_Archivist: Yep, same. In general, if I had to guess, I'd say there aren't many designers here. Just doesn't seem like a demographic we'd attract.
-
appledash
I don't like logo changes for the sake of logo changes and newer ones are almost always worse
-
PredatorIWD
Should the #archiveteam channel just be renamed to #at-announcements / #archiveteam-news or similar? Every third message is telling users to come here to discuss whatever it is that they said
-
PredatorIWD
On the wiki it says that the channel is for announcements but the default name obviously makes new people forget
-
fireonlive
i was going to suggest an entrymsg but that already exists lol
-
thuban
anyway... the sticker bootlegging is slimy-but-harmless*, and i don't support going after it unless we intend to offer an 'official' replacement (even an equally crappy redbubble replacement)
-
thuban
* if penelope shenck wants royalties on a likely-single-digit number of sticker sales, i will personally cut her a check
-
fireonlive
-
fireonlive
"Danish hosting giants CloudNordic and AzeroCloud have been hit by a massive ransomware attack, resulting in a catastrophic loss of customer data."
-
JAA
Yup, I archived what was left of their websites earlier.
-
fireonlive
"Recovery instructions include sending customers to the Wayback Machine since their own backups are hosed."
-
fireonlive
JAA: nice :)
-
JAA
It was basically only the homepages with the notices quoted there.
-
JAA
Everything else was gone.
-
fireonlive
:(
-
fireonlive
nordicbots.com (long-running IRC bot service on QuakeNet) had all bots go down, and suffered a total data loss with all backups encrypted as well. I wonder if this is the same thing
-
fireonlive
"Things aren't looking good. Harddisks _and_ backups have been encrypted by a crypto locker. Sorry. Looking into it."
-
fireonlive
might be worth an AB
-
JAA
> Latest news
-
JAA
> 013-08-30
-
JAA
> 2013-08-30 *
-
JAA
Yeah, probably worth that anyway.
-
fireonlive
ye :x
-
qwertyasdfuiopghjkl
HP_Archivist: Current status on saving those pages:
transfer.archivete.am/inline/wpkNM/list
-
qwertyasdfuiopghjkl
I'll check the not found ones again later today to see if they've been added.
-
erkinalp
hey, wowturkey archiver says "paused". any issues?
-
erkinalp
expected item count for wowturkey is about 9.4M, not 9.15M
-
h2ibot
PaulWise edited Deathwatch (+249, Feb 29 Amazon Honeycode shutdown, AB jobs launched):
wiki.archiveteam.org/?diff=50584&oldid=50575
-
fireonlive
:)
-
pokechu22
erkinalp: the websocket for the archivebot dashboard is currently misbehaving. Jobs are still being worked on, but the page only updates when refreshed, not automatically.
-
erkinalp
pokechu22: is there any page that we could watch the progress for that fix?
-
IDK
"The nordic cloud experts"
-
IDK
tbh this got me wondering
-
IDK
what if IA got hit with this
-
imer
The easy defense is to have data be append only, so no overwriting or deleting, not sure how data distribution at IA works internally though
-
imer
well, "easy" as things at scale always are ;)
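A minimal sketch of the append-only idea, not something posted in the channel and not a description of how IA stores data. The class name `AppendOnlyStore` and its methods are hypothetical; the point is only that writes to an existing key and deletes are refused, so a compromised writer cannot encrypt or wipe what is already stored.

```python
class AppendOnlyStore:
    """Toy in-memory store that allows new writes but never overwrites or deletes."""

    def __init__(self):
        self._items = {}

    def put(self, key, data):
        # Refuse to replace existing data; only brand-new keys are accepted.
        if key in self._items:
            raise PermissionError(f"{key!r} already exists; overwriting is not allowed")
        self._items[key] = bytes(data)

    def get(self, key):
        return self._items[key]

    def delete(self, key):
        # Deletion is never permitted in an append-only store.
        raise PermissionError("deleting is not allowed")
```

In a real deployment the refusal would have to be enforced by the storage layer itself rather than by client code, which is where the "easy" part ends.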
-
IDK
shouldn't the secondary backup be minimally accessible? if they somehow managed to wipe the secondary backup, then they must have made some massive mistakes
-
HP_Archivist
qwertyasdfuiopghjkl: Sounds good
-
HP_Archivist
I think that's everyone, in your list
-
erkinalp
wowturkey archival still going strong
-
erkinalp
btw there's a quirk of wowturkey that's **very important** for us: when the website closes for db vacuum, it returns a static error page with the text "bakım, temizlik, badana vesaire nedeniyle site kapalı, birkaç dakika içinde yeniden açılacak"
-
HP_Archivist
qwertyasdfuiopghjkl: Along similar lines, I heard last evening that Trump's mugshot was going to be released in higher resolution today by Fulton County Jail. I'm not sure if they will post on their site or elsewhere. But a high resolution crawl of it from the 'official source' should be done.
-
HP_Archivist
I'm AFK for a few hours.
-
erkinalp
which means "closed for maintenance, cleanup, paint etc and reopen in a few minutes"
-
erkinalp
if there are any such responses in the archive, those individual pages will need to be manually recrawled
-
erkinalp
that "badana vesaire..." page is meant as a 503 page
-
qwertyasdfuiopghjkl
HP_Archivist: All the previously missing ones have appeared and been saved:
transfer.archivete.am/inline/8iDFM/list
-
erkinalp
sorry, if there were any replies to the "badana vesaire" thing, i've missed that as my client crashed
-
qwertyasdfuiopghjkl
There were no replies to that yet. If that page actually has the http status code 503, ArchiveBot will automatically retry it later.
-
qwertyasdfuiopghjkl
Also, this channel has a public log at
hackint.logs.kiska.pw/archiveteam-bs , so you can check that to see any messages you missed while disconnected.
-
pokechu22
erkinalp: do you know if the site also gives a different HTTP status code when that happens? If it's a 5xx error code it'll be retried
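A rough sketch of the check being discussed, under the assumption (unconfirmed in the channel) that the maintenance page might be served with a non-5xx status. `MAINTENANCE_MARKER` and `needs_manual_recrawl` are hypothetical names; the marker text is the one quoted by erkinalp.

```python
# Text of the wowturkey maintenance page as quoted in the channel.
MAINTENANCE_MARKER = "bakım, temizlik, badana vesaire nedeniyle site kapalı"


def needs_manual_recrawl(status_code, body):
    """Return True if a saved response looks like the maintenance page
    but would not have been retried automatically."""
    if 500 <= status_code <= 599:
        # Per the discussion, 5xx responses are retried later on their own.
        return False
    # A maintenance page served with any other status slips through unnoticed.
    return MAINTENANCE_MARKER in body
```

Running something like this over the archived responses would list the pages that need to be fetched again by hand.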
-
erkinalp
i need to check that out
-
JAA
No 50x responses from wowturkey.com in the log of 3di34a3v4nwzjuejzb82e336n.
-
erkinalp
need to look at that "badana vesaire" thing at 2UTC to 2:05UTC, that's normally when it vacuums
-
nicolas17
either everything has been stalled for an hour or the websocket stats are stuck
-
imer
looking stalled on my end :(
-
fireonlive
@ERROR: max connections (-1) reached -- try again later
-
fireonlive
think we hit the wall for now
-
DigitalDragons
I think IA is struggling again
-
phaeton
can concurrency be adjusted on an already running docker-grab or can it be set/reset only on container create/recreate?
-
fireonlive
all aboard the struggle bus
-
imer
s3 stats seems to be recovering slowly, so maybe it'll catch up soon?
-
imer
phaeton: need to recreate afaik
-
phaeton
that's what I thought. thanks for confirming
-
HP_Archivist
qwertyasdfuiopghjkl: You're awesome, thanks for doing all of them
-
fireonlive
yeee
-
fireonlive
qwertyasdfuiopghjkl++
-
fireonlive
rip karma bots :(
-
HP_Archivist
The image here on this page used by the Associated Press seems of higher quality than the one released last evening. The Fulton County Jail text is legible and sharper
-
HP_Archivist
-
HP_Archivist
-
HP_Archivist
Trump himself is still blurry somehow
-
fireonlive
they don't seem to have the best camera there
-
fireonlive
that or they don't use it well
-
fireonlive
the other photos are not well taken
-
HP_Archivist
Yeah, or it's just terrible lighting, too
-
fireonlive
ahyeah
-
nicolas17
several newspapers including AP posted AI-upscaled bullshit
-
HP_Archivist
nicolas17: Do you have a source for that?
-
nicolas17
-
nicolas17
look at the logo, that's clearly upscaling
-
fireonlive
oh, gross
-
nicolas17
but there's artifacts all over the place on his face
-
Barto
feels like some failed ai upscaling
-
Barto
it upscaled the artifact too
-
HP_Archivist
I just commented on the legibility of that text (above). I mean, it could be an upscaled version. But if you look at the dimensions of that photo compared to what was posted online last evening, it IS larger.
-
HP_Archivist
Honestly, unless someone has evidence of upscaling, I'm inclined to take this at face value. It's the AP, too. Something tells me they would mention that below the photo used.
-
HP_Archivist
Also - that 'artifacting' is there in the one released last evening. It's just an overall bad photo with bad lighting. But this one from the AP is definitely of higher resolution.
-
nicolas17
"it could be upscaled, but you gotta admit it's larger" what.
-
HP_Archivist
nicolas17: I'm just trying to archive the highest resolution that was officially provided by said original source.
-
nicolas17
yeah a friend is trying to obtain that
-
nicolas17
AP is not an original source
-
HP_Archivist
If you compare Trumps face from the first released photo yesterday to the one in that AP link, his face and those artifacts look the same
-
nicolas17
apparently you can get it from
justice.fultoncountyga.gov/PAJailMa…ailingDetail.aspx?JailingID=1472740 but I think that needs a US IP address, it doesn't load for me
-
HP_Archivist
That errors out for me, too, and I'm in the US
-
JAA
That AP image is so obviously upscaled and smoothened it isn't even funny.
-
nicolas17
JPEG is lossy so you should convert it to PNG if you want the highest resolution :D
-
JAA
PNG? But BMP files are larger!
-
nicolas17
-
HP_Archivist
Well, okay, I'm wrong then, heh. I still regard AP as the last bastion of hope for new reporting. Or, so I'd like to think
-
HP_Archivist
news*
-
HP_Archivist
nicolas17: If you can get the link for us that leads to Fulton County's site providing the image, we can get it crawled. But that link errors out for me, too. I searched the site and didn't find a section for mugshots
-
HP_Archivist
Btw, nicolas17 - you were right about source being with Fulton County Jail. Usually though, they release 'source' to the news outlets and in turn they use that in their reporting. My thinking was if we couldn't get source from, well, source, then next best would be a trustworthy news outlet
-
pokechu22
I have a US ip address and
justice.fultoncountyga.gov/PAJailMa…ailingDetail.aspx?JailingID=1472740 doesn't have the image. (Also, that link only works if you find it via the search page - directly using the link doesn't work. The site is super jank.)
-
nicolas17
"I'd like to track down the OG, but you have to request it by email after filling out a legal form"
-
pokechu22
see
web.archive.org/web/20230825010801/…ailingDetail.aspx?JailingID=1472740 (saving that requires doing some jank cookie stuff, see discussion yesterday)
-
HP_Archivist
I don't know if any regular citizen can request the image or if it's only reserved for media and news outlets. Not sure of the legality of it.
-
HP_Archivist
Mugshots are obviously public record, but no idea about requesting copies
-
nstrom|m
nfoic.org/georgia-foia-laws if you can get someone in GA to ask
-
fireonlive
i wonder if it's on a press release hidden away somewhere
-
fireonlive
or hm
-
fireonlive
oh could just be everyone FOIA'd it?
-
fireonlive
interesting
-
HP_Archivist
Good point, fireonlive
-
JAA
> Release of booking information to include mugshots will occur daily at approximately 4:00 p.m. (EST) via media advisory.
-
JAA
-
HP_Archivist
-
JAA
Something like that, yeah.
-
HP_Archivist
I mean, there's nothing wrong or no harm in doing so. But I'm still leery about having my actual name associated with Trump in any way or context, lmao.
-
HP_Archivist
Someone else can if they feel so inclined for archival purposes...
-
nicolas17
HP_Archivist:
-
nicolas17
"so here's the raw file that was sent out to the press"
-
nicolas17
-
nicolas17
>the logo was, unsurprisingly, added separately, apparently using Canva (based on metadata)
-
nicolas17
>the JPEG macroblocks present indicate that the original resolution was somewhere around 454x454px
-
nicolas17
>(there are no macroblock artefacts on the logo)
-
nicolas17
>and the actual exported resolution is 1080x1080px
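The arithmetic behind that estimate can be sketched as follows. Baseline JPEG codes the image in 8x8 pixel blocks, so after upscaling the block grid appears stretched by the scale factor. The function names are hypothetical, and the block period of about 19 px is derived here from the quoted 454 and 1080 figures, not a measurement from the file (with chroma subsampling the visible grid can also be 16 px based).

```python
JPEG_BLOCK = 8  # baseline JPEG codes luma in 8x8 pixel blocks


def expected_block_period(exported_px, original_px):
    """Spacing of the block grid, in exported pixels, after upscaling."""
    return JPEG_BLOCK * exported_px / original_px


def estimate_original_size(exported_px, observed_block_px):
    """Invert the relation: infer the pre-upscale size from the grid spacing."""
    scale = observed_block_px / JPEG_BLOCK
    return exported_px / scale


period = expected_block_period(1080, 454)       # roughly 19 px per block
original = estimate_original_size(1080, period)  # back to roughly 454 px
```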
-
nulldata
gmsracing.net - "a former championship-winning organization, plan to cease operations at the end of the 2023 NASCAR Truck Series season."
beyondtheflag.com/2023/08/24/nascar-gms-racing-shutting-end-2023
-
HP_Archivist
nicolas17: Unless you have solid evidence for this, we can agree to disagree. Borderline conspiratorial, and, to borrow a quote from a famous scientist, "Extraordinary claims require extraordinary evidence".
-
nicolas17
what is *your* theory? that the image this guy got from the .gov site is downscaled and AP has the full original resolution?
-
pokechu22
nulldata: that's squarespace and the sitemap has 610 URLs - should be easy for archivebot
-
JAA
What nicolas17 wrote sounds about right to me. The photo is clearly much lower resolution than the logo.
-
JAA
My only question would be: ... but why?
-
JAA
Then again, the other photos were horrible as well, so Hanlon's razor probably applies.
-
nicolas17
JAA: thing is, the jail seems to have upscaled it from whatever was the original (454?) to 1080x1080, but AP blew it to 2700x2700
-
nicolas17
and the latter is what shows signs of generative AI gone wild
-
HP_Archivist
Yeah, that's what I'm saying, JAA. If nicolas17 is right then pretty much wtf/why is what I'm wondering
-
JAA
Yeah, that too.
-
JAA
Taking a mugshot at that resolution in the first place makes no sense to me though. Maybe they didn't want to release the original photo for some reason, so they downscaled it, then upscaled it again.
-
HP_Archivist
The jail text does not look superimposed. It looks like it's part of the wall. I mean, I'm no expert on spotting fake / AI pics, but nobody else agrees?
-
HP_Archivist
-
HP_Archivist
"“I think it’s a certainty there will be context collapse around any prominent image that can be interpreted or misinterpreted by different communities, both wilfully and accidentally,” says Sam Gregory, executive director of Witness, a nonprofit organization focused on using images and videos for protecting human rights. And with numerous versions of a Trump mug shot circulating online, people may
-
HP_Archivist
have different memories and associations of the historical event. “We’ll remember the one that we saw in a context that made it memorable to us,” Gregory says."
-
JAA
Both types of artefacts, the regular upscaling in the 'original' image and the AI stuff in the AP image, are pretty obvious.
-
pokechu22
It looks superimposed to me, particularly with regards to the JPEG artefacts
-
JAA
But it's hard to explain why.
-
JAA
-
HP_Archivist
JAA: Lol ^
-
nicolas17
it's the only part of the image that doesn't show signs of chroma subsampling when I split it into HSV
-
pokechu22
also... what's the copyright status of that mugshot? I know government images in Florida are public domain but most other state government images aren't, but mugshots might be special?
-
pokechu22
Wikipedia is treating it as fair use and has some context on the description, it seems:
en.wikipedia.org/wiki/File:Donald_Trump_mug_shot.jpg
-
h2ibot
Myusernameisanything edited URLTeam (+141, /* "Official" shorteners */ mod.lk):
wiki.archiveteam.org/?diff=50585&oldid=50543
-
h2ibot
Vokunal edited Frequently Asked Questions (+321, Added a section on why uploads are less than…):
wiki.archiveteam.org/?diff=50586&oldid=50275
-
HP_Archivist
According to this post on Reddit, the image unofficially leaked right before media outlets were showing it:
reddit.com/r/pics/comments/160j0z0/donald_trumps_mugshot/jxmnluj
-
HP_Archivist
So, the one that leaked didn't have the logo on it. Idk how anything leaks in that short amount of time
-
nicolas17
looks like a photo of a screen
-
HP_Archivist
But to nicolas17's credit, I now see how the logo is superimposed
-
HP_Archivist
Yeah it does
-
nicolas17
I tried to make the image full-lightness full-saturation keeping only the hue
-
nicolas17
and the result looked fucking terrifying, like his face was a ball of fire
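A minimal per-pixel sketch of that hue-only transform, using only the standard library. This is an assumption about the general technique, not the tool nicolas17 actually used; `hue_only` is a hypothetical name, and applying it to a whole image would need an image library on top.

```python
import colorsys


def hue_only(rgb):
    """Keep a pixel's hue but force saturation and value to maximum."""
    r, g, b = (c / 255 for c in rgb)
    h, _s, _v = colorsys.rgb_to_hsv(r, g, b)
    # Grey pixels have no real hue; colorsys reports 0.0 for them, i.e. red.
    r2, g2, b2 = colorsys.hsv_to_rgb(h, 1.0, 1.0)
    return tuple(round(c * 255) for c in (r2, g2, b2))
```

Because near-grey pixels are pushed to fully saturated colours, small chroma noise turns into vivid blotches, which fits the "ball of fire" result.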
-
nicolas17
>.>
-
HP_Archivist
Because we don't know the setup and number of personnel at this jail, we might never know how it leaked, exactly or what the sequence of events for pre-release, leak, and official release came to be.
-
HP_Archivist
e.g. who had access to the pic before the public/media did, etc
-
HP_Archivist
Kinda odd though that no media outlets mention it leaked before the media was able to show it officially
-
HP_Archivist
nicolas17: A sub-comment on that post, "I think the fact that it doesn't have the seal adds some credibility to it being a leak. It would make sense the seal is added when they release the mugshot to the public, but they keep a clean one internally."
-
HP_Archivist
Which makes sense
-
Krume
hello!
-
Krume
i am new to archiveteam, i once ran the early version of the bot on a VPS but it wasn't a good idea since it was just a very cheap one
-
Krume
are there any laws that would make it difficult for me to do this from my home in EU/DE?
-
pokechu22
I believe there are some people around here in the EU that do run it from home; I run mine from home in the US. This doesn't account for relevant laws though which I haven't researched
-
pokechu22
Some sites will block/rate limit you if you request pages too fast, and some archiveteam projects tend to get you blocked on a lot of sites (the URLs project in particular warns about this)
-
Krume
well, there would be websites like project gutenberg or the internet archive which are kind of restricted from germany
-
HP_Archivist
Germany restricts access to Archive.org ?
-
Krume
oh, and can i find more information about that in the wiki?
-
Krume
hmm, some items would be illegal to download
-
JAA
Most projects target specific sites. URLs aka #// is really the only one that 'might grab anything' and requires care in that regard.
-
Krume
ah, okay
-
Krume
i cannot get myself blocked too much
-
Krume
my isp would be angry at me
-
Krume
would it wear down a SSD quickly? it would probably process many files
-
Krume
i just got a new one for my raspberry pi 4
-
JAA
It's not impossible that the targeted sites serve content that's illegal in some jurisdictions. Beyond some scale, it's basically unavoidable. Whether this is a concern for you depends on your level of paranoia. It shouldn't be a problem since the chance of hitting such content is generally very small and being part of an automated retrieval is usually somewhat protected legally (although Germany has
-
JAA
a special kink for shitty court decisions in that area, so...). HTTPS being very common nowadays also helps, although we sometimes have to archive old sites which lack it, of course.
-
nicolas17
the warrior/grabber currently can't run on ARM processors like raspberry pi
-
Krume
i see!
-
Krume
i have a PC too =)
-
appledash
...why can't it?
-
nicolas17
because nobody took the work to extensively test if wget-at produces valid results
-
Krume
maybe because of some instructions that ARM cannot do?
-
JAA
SSDs get a decent amount of wear, but you can use a tmpfs to do it in RAM instead on most projects. I'm not sure this has been documented on the wiki yet, but quite a few people are using it.
-
Krume
JAA: i can try to use the tmpfs idea with a lot of google
-
Krume
i still have regular hdds
-
JAA
Someone can give you the right options for Docker for sure. I don't have them handy right now.
-
Krume
ah that could be really nice, thanks JAA
-
Krume
do i need to be an "expert" to test wget-at on my raspberry pi?
-
Krume
i have a 4B model
-
Krume
and i use DietPi
-
JAA
Please don't attempt to do it.
-
imer
Krume: "--mount type=tmpfs,tmpfs-size=1G,destination=/grab/data" (1G being a safety limit, most projects use very little)
-
Krume
okay!
-
JAA
Yes, it requires a lot of knowledge about how WARCs and zstd work.
-
appledash
Why would a program like that produce inconsistent results based on CPU architecture?
-
Krume
i don't know zstd and WARC was something familiar which i don't remember right now
-
Krume
thank you imer
-
JAA
appledash: Because it's very easy to write C code that depends on undefined behaviour, and it was written and tested on x86.
-
Krume
i will copy it for now, since i only have windows installed and i need to reinstall linux again
-
» appledash scared