01:00:37 JAABot edited CurrentWarriorProject (-2): https://wiki.archiveteam.org/?diff=51218&oldid=51178
02:05:08 nvsgames.com - Nuverse, the game publishing arm of ByteDance, is being shut down on Monday. https://www.bloomberg.com/news/articles/2023-11-27/bytedance-is-said-to-shut-main-gaming-arm-in-business-retreat#xj4y7vzkg
04:02:10 what is a good way to archive to the WBM a URL that responds eventually but too slowly for AB/SPN timeouts? https://debtags.debian.org/reports/recent/
04:05:56 Wow, that's slow...
04:11:17 pabs: I've grabbed it with grab-site. It'll be in the WBM whenever I upload my backlog.
04:11:49 thanks!
04:35:22 JAA: did you upload a functional grab-site yet? :p
04:35:34 before our president-elect changes his website
04:41:57 nicolas17: Right, no, I got distracted while waiting for information on how to push.
04:53:07 Ok, I think it's working.
04:54:27 (Fucking Docker requiring storing credentials on disk for a single push...)
04:57:20 nicolas17: atdr.meo.ws/justanotherarchivist/grab-site-docker:20220509-g398726f7
04:57:59 (But fixing the build is still on my todo list.)
05:19:43 JAA: https://transfer.archivete.am/LX1m4/milei.tar.zst
05:20:06 I assume the warcs and cdx are enough, but I tar'd up the whole output directory
05:20:31 there... isn't actually any content in there, it's just images and empty slogans :P
05:21:07
05:21:59 Thanks, and yeah, I usually also keep everything and put all but the WARCs into a tar.
06:25:38 Tech234a edited Google Plus Comments on Blogspot (+333, Add note about submission of discoveries to…): https://wiki.archiveteam.org/?diff=51219&oldid=47913
06:30:38 Tech234a edited Google Plus Comments on Blogspot (+84, Search tool broke, add a few more details): https://wiki.archiveteam.org/?diff=51220&oldid=51219
06:32:12 ^ IA doesn't seem to serve HTML in items with the correct Content-Type headers any more, probably anti-spam?
It broke the search tool for the Google Plus Comments on Blogspot project though
06:34:38 Tech234a edited Google Plus Comments on Blogspot (+0, Correction for rescued items): https://wiki.archiveteam.org/?diff=51221&oldid=51220
06:37:07 Cc arkiver ^
09:09:05 -+rss- Marc Thorpe, Robot Wars founder, has died: https://marcthorpe.com/about/ https://news.ycombinator.com/item?id=38424390
09:29:12 I have this idea for the archivebot project. One of the goals of the project is to archive bankrupt companies. I have submitted a lot of those cases by looking them up in the insolvency register of the court. I go to the register and just copy trade names and google them and post them to be archived. Now I had this idea to automate it. This would be done by looking once a day at the companies of that day and 'google' the
09:29:12 companies and archive the website.
10:24:44 Mannie: Google is hard to automate, you get a lot of captchas. also subdomain archiving and related resource archiving is quite manual right now
10:36:23 pabs: that is the problem I also struggled with. Where do we get the website from (automated)? They are not in the court register or in the trade register (most of the time).
10:40:08 Is there not a way, like whois, to get the domains that are owned by a particular company?
10:40:55 We have the name from the court files so we only need the domain that they own.
10:42:29 in Australia the business registry doesn't record website domains
10:43:15 I've noticed from mapping small businesses that many of them let their domains expire and just use a facebook page.
10:43:33 (without hiring new sign writers.)
10:43:35 In the Netherlands and Luxembourg also not. That is the problem we have. We have all the details but not the domain
10:56:00 there is also social media to think about, at least YouTube can be archived. twitter used to be able to
10:57:37 a related problem is archiving sites around a natural disaster (fire/flood etc).
that's even harder: 1) to get the affected area 2) find websites related to things in that area
10:58:06 for 2 I have been manually perusing Google Maps sometimes, but that is very manual and tedious
11:00:28 pabs: or datacentres going up in smoke...
11:01:18 yeah, that is likely an impossible problem, since you can't even get a list of IPs in a datacentre
11:12:44 pabs: problem 2 can be partially automated by looking up all businesses in that area in the trade register and from the openstreetmap/google maps api. with that list you can start searching. If the company is like a theatre or café you can use a script to look on review sites like Tripadvisor and Yelp to see if there is a website. What's left over needs to be searched manually
11:13:54 There is still the part of domains like blogs that are part of a hobby.
11:18:38 pabs: I have found this free company-to-domain site: autocomplete.clearbit.com
11:19:00 this is the link for tello.com: autocomplete.clearbit.com
15:15:12 Switchnode edited Deathwatch (+189, /* 2024 */ add hardware.info): https://wiki.archiveteam.org/?diff=51222&oldid=51175
17:25:54 tech234a: correct, anti-spam
17:26:12 but if i remember correctly, we may be able to move it to some collection
17:26:16 the item that is
18:00:44 JAABot edited CurrentWarriorProject (+2): https://wiki.archiveteam.org/?diff=51223&oldid=51218
19:29:28 “I am reaching out to inform you that over the next few months we will be migrating our email addresses from @zoom.us to @zoom.com. You may receive emails from both domains, and I want to assure you that emails from these two domains are legitimate.”
19:29:42 “In the coming months, you’ll also notice our website and other collateral change to zoom.com.”
19:30:33 ¯\_(ツ)_/¯ probably just a 1:1 shift of literally everything but who knows. thought i’d shart it out here for y’all
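
Editor's sketch for the company-to-domain idea raised between 09:29 and 11:19: given trade names copied from an insolvency register, ask a company autocomplete service for a matching domain and keep only exact name matches. The log names only the host autocomplete.clearbit.com; the `/v1/companies/suggest` path, the `query` parameter, and the `name`/`domain` fields in the JSON response are assumptions, and all function names here are hypothetical. Treat it as a starting point, not a tested integration.

```python
import json
import urllib.parse
import urllib.request

# Assumption: endpoint path and response shape are not stated in the log.
SUGGEST_URL = "https://autocomplete.clearbit.com/v1/companies/suggest"


def pick_domain(trade_name, suggestions):
    """Return the domain of the first suggestion whose name equals trade_name.

    Comparison ignores case and surrounding whitespace. Returns None when
    nothing matches, so ambiguous companies are left for manual review.
    """
    wanted = trade_name.strip().casefold()
    for entry in suggestions:
        name = str(entry.get("name", "")).strip().casefold()
        domain = entry.get("domain")
        if domain and name == wanted:
            return domain
    return None


def fetch_suggestions(trade_name, timeout=30):
    """Query the (assumed) suggest endpoint and return the decoded JSON list."""
    query = urllib.parse.urlencode({"query": trade_name})
    with urllib.request.urlopen(f"{SUGGEST_URL}?{query}", timeout=timeout) as resp:
        return json.load(resp)


def domains_for(trade_names, fetch=fetch_suggestions):
    """Map each trade name to a domain, skipping names with no exact match."""
    found = {}
    for trade_name in trade_names:
        domain = pick_domain(trade_name, fetch(trade_name))
        if domain:
            found[trade_name] = domain
    return found
```

Exact-name matching is deliberately conservative: a wrong domain submitted for archiving wastes a job, while a miss only falls back to the manual search described in the log. The `fetch` argument lets a different lookup source be swapped in without touching the matching logic.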