#archiveteam-bs

01:00

h2ibot

JAABot edited CurrentWarriorProject (-2): wiki.archiveteam.org/?diff=51218&oldid=51178
02:05

nulldata

nvsgames.com - Nuverse, the game publishing arm of ByteDance, is being shut down on Monday. bloomberg.com/news/articles/2023-11…g-arm-in-business-retreat#xj4y7vzkg
04:02

pabs

what is a good way to archive to the WBM a URL that responds eventually but too slowly for AB/SPN timeouts? debtags.debian.org/reports/recent
04:05

JAA

Wow, that's slow...
04:11

JAA

pabs: I've grabbed it with grab-site. It'll be in the WBM whenever I upload my backlog.
04:11

pabs

thanks!
04:35

nicolas17

JAA: did you upload a functional grab-site yet? :p
04:35

nicolas17

before our president-elect changes his website
04:41

JAA

nicolas17: Right, no, I got distracted while waiting for information on how to push.
04:53

JAA

Ok, I think it's working.
04:54

JAA

(Fucking Docker requiring storing credentials on disk for a single push...)
04:57

JAA

nicolas17: atdr.meo.ws/justanotherarchivist/grab-site-docker:20220509-g398726f7
04:57

JAA

(But fixing the build is still on my todo list.)
05:19

nicolas17

JAA: transfer.archivete.am/LX1m4/milei.tar.zst
05:20

nicolas17

I assume the warcs and cdx are enough, but I tar'd up the whole output directory
05:20

nicolas17

there... isn't actually any content in there, it's just images and empty slogans :P
05:21

JAA

<surprised_pikachu.png>
05:21

JAA

Thanks, and yeah, I usually also keep everything and put all but the WARCs into a tar.
06:25

h2ibot

Tech234a edited Google Plus Comments on Blogspot (+333, Add note about submission of discoveries to…): wiki.archiveteam.org/?diff=51219&oldid=47913
06:30

h2ibot

Tech234a edited Google Plus Comments on Blogspot (+84, Search tool broke, add a few more details): wiki.archiveteam.org/?diff=51220&oldid=51219
06:32

tech234a

^ IA doesn't seem to serve HTML in items with the correct Content-Type headers any more, probably anti-spam? It broke the search tool for the Google Plus Comments on Blogspot project though
06:34

h2ibot

Tech234a edited Google Plus Comments on Blogspot (+0, Correction for rescued items): wiki.archiveteam.org/?diff=51221&oldid=51220
06:37

JAA

Cc arkiver ^
09:09

that_lurker

-+rss- Marc Thorpe, Robot Wars founder, has died: marcthorpe.com/about news.ycombinator.com/item?id=38424390
09:29

Mannie

I have an idea for the archivebot project. One of the goals of the project is to archive bankrupt companies. I have submitted a lot of those cases by looking them up in the court's insolvency register. I go to the register, copy the trade names, google them, and post them to be archived. Now I had the idea to automate this. A script would look up, once a day, the companies of that day and 'google' the
09:29

Mannie

companies and archive the website.
10:24

pabs

Mannie: Google is hard to automate, you get a lot of captchas. also subdomain archiving and related resource archiving is quite manual right now
10:36

Mannie

pabs: that is the problem I also struggled with. Where do we get the website from (automated)? They are not in the court register or in the trade register (most times).
10:40

Mannie

Is there not a way like whois to get the domains that are owned by a particular company?
10:40

Mannie

We have the name from the court files so we only need to have the domain that they own.
10:42

pabs

in Australia the business registry doesn't record website domains
10:43

murb

I've noticed from mapping small businesses that many of them let their domains expire and just use a facebook page.
10:43

murb

(without hiring new sign writers.)
10:43

Mannie

In the Netherlands and Luxembourg also not. That is the problem that we have. We have all the details but not the domain
10:56

pabs

there is also social media to think about, at least YouTube can be archived. twitter used to be able to
10:57

pabs

a related problem is archiving sites around a natural disaster (fire/flood etc). that's even harder: 1) to get the affected area 2) to find websites related to things in that area
10:58

pabs

for 2 I have been manually perusing Google Maps sometimes, but that is very manual and tedious
11:00

murb

pabs: or datacentres going up in smoke...
11:01

pabs

yeah, that is likely an impossible problem, since you can't even get a list of IPs in a datacentre
11:12

Mannie

pabs: problem 2 can be partially automated by looking up all businesses in that area in the trade register and via the OpenStreetMap/Google Maps API. With that list you can start searching. If the company is something like a theatre or café, you can use a script to look on review sites like Tripadvisor and Yelp to see if there is a website. What's left over needs to be searched manually
11:13

Mannie

There is still the matter of domains like blogs that are run as a hobby.
11:18

Mannie

pabs: I have found this free company-to-domain site: autocomplete.clearbit.com
11:19

Mannie

this is the link for tello.com: autocomplete.clearbit.com
15:15

h2ibot

Switchnode edited Deathwatch (+189, /* 2024 */ add hardware.info): wiki.archiveteam.org/?diff=51222&oldid=51175
17:25

arkiver

tech234a: correct, anti-spam
17:26

arkiver

but if i remember correctly, we may be able to move it to some collection
17:26

arkiver

the item that is
18:00

h2ibot

JAABot edited CurrentWarriorProject (+2): wiki.archiveteam.org/?diff=51223&oldid=51218
19:29

fireonlive

“I am reaching out to inform you that over the next few months we will be migrating our email addresses from @zoom.us to @zoom.com. You may receive emails from both domains, and I want to assure you that emails from these two domains are legitimate.”
19:29

fireonlive

“In the coming months, you’ll also notice our website and other collateral change to zoom.com.”
19:30

fireonlive

¯\_(ツ)_/¯ probably just a 1:1 shift of literally everything but who knows. thought i’d shart it out here for y’all
