00:01:27 I see a bunch of free and paid APIs for M&A feeds
00:10:15 https://site.financialmodelingprep.com/developer/docs/merger-and-acquisition-api
00:10:51 hmm
00:10:59 if there’s a good rss feed i could hook it up to rss
00:12:06 there's this: https://seekingalpha.com/market-news/m-a
00:13:01 There's an RSS feed
00:15:22 this seems to be the url for the feed: https://seekingalpha.com/tag/m-a.xml
00:16:10 i’m sitting in a vehicle on my phone so hard to tell for sure haha
00:16:57 fireonlive: yep that's the one, it has the stock ticker symbols like the other FMP feed
00:17:05 ah awesome :)
00:17:09 which makes finding companies a lot easier
00:17:30 it also showed failed or cancelled M&As
00:18:10 i’ll toss it up in #m&a if that suits everyone when i’m back at a proper computer later; just out and about with a friend who’s visiting for the first time in a while
01:18:21 https://www.thewrap.com/gannett-drops-ap-associated-press-usa-today/ "Gannett, publisher of USA Today and hundreds of local newspapers, will stop using the Associated Press’ content starting next week, [...] will eliminate AP dispatches, photos and video as of March 25, according to an internal memo"
01:19:07 Not sure if this means removal of existing content or just discontinuing new content
01:27:17 https://apnews.com/article/gannett-associated-press-contract-97405e4715c9a25d21477b992028db2a "Shortly after, AP said it had been informed by McClatchy that it would also drop the service." https://www.nytimes.com/2024/03/19/business/media/gannett-mcclatchy-ap-associated-press.html "McClatchy [...] told its editors this week that it would stop using some A.P. services next month." "[McClatchy] said that The A.P.’s feed would end on March 29 and that no A.P. content could be published after March 31." so apparently there's also another one dropping AP
02:29:50 #m&a is now set up, we should see if it works within the hour :3
02:32:28 Terbium++
02:32:29 -eggdrop- [karma] 'Terbium' now has 2 karma!
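As an illustration of the feed hookup discussed above (00:10:59 through 00:16:57), here is a minimal polling sketch in Python. The feed URL is the one found at 00:15:22; the ticker pattern, the polling interval, and the feedparser dependency are assumptions for illustration, not anything confirmed in the channel.

```python
# Hypothetical poller for the M&A feed found above; assumes headlines embed
# tickers like "(NYSE:XYZ)" and that entry IDs are stable for de-duplication.
import re
import time

import feedparser  # pip install feedparser

FEED_URL = "https://seekingalpha.com/tag/m-a.xml"
TICKER_RE = re.compile(r"\(([A-Z]+:[A-Z.\-]+)\)")  # assumed headline format

seen = set()  # entry IDs already announced

def poll_once():
    for entry in feedparser.parse(FEED_URL).entries:
        uid = entry.get("id") or entry.get("link")
        if uid in seen:
            continue
        seen.add(uid)
        tickers = TICKER_RE.findall(entry.get("title", ""))
        # Hand off to an IRC bot / webhook here; print is a placeholder.
        print(entry.title, "->", ", ".join(tickers) or "no tickers found")

while True:
    poll_once()
    time.sleep(300)  # poll every 5 minutes (arbitrary)
```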
09:31:36 is it possible to upload locally archived websites to the Internet Archive such that they are searchable using the Wayback Machine?
09:32:30 that isn't possible
09:53:43 RIP original redis
14:36:52 Hi,
14:36:53 Would you be interested in archiving the biggest (and only) Arabic archive of literary magazines? Its owner died last week and it's at risk of dying at any time.
14:36:53 https://archive.alsharekh.org
14:39:38 the site also has a sitemap (https://archive.alsharekh.org/sitemap.xml) which would help ramp things up!
14:39:42 Hmm, the stats are 2 million pages, 326,446 articles, 52,234 writers, 273 magazines, 15,857 issues. It looks like images are directly embedded (view-source:https://archive.alsharekh.org/Articles/293/20679/470610 has data-normal="MagazinePages\Magazine_JPG\Al_Shariqa\Al_Shariqa_2017\Issue_3\014.jpg" and data-full="MagazinePages\Magazine_JPG\Al_Shariqa\Al_Shariqa_2017\Issue_3\014.jpg"), archivebot extracts those correctly, and the server doesn't mind the backslashes not being replaced by the browser with forward slashes
14:41:10 Yes, it uses "flipbuilder.com" (PDF Page Flipper) to make the reading pages.
14:41:11 Don't know if you've encountered that before. Sorry for my weak language.
14:43:49 I think archivebot will work here - 2 million URLs is a bit large, but we've done bigger. Do you know if it's at risk of shutting down in a few weeks, or if it'll probably be up for months?
14:46:16 hmm, https://archive.alsharekh.org/contents/293/20679 actually requires a bunch of API requests, e.g. to https://archiveapi.alsharekh.org/Search/IssueIndex?IID=20679; archivebot probably won't follow those
14:47:13 Hmm, not sure.
14:47:14 The owner was a pioneer of Arabic-language computing in the early days of computers, and he (and his company at the time) added Arabic support to almost every OS/software of the era.
14:47:14 The company isn't very active these days and he stepped down from it. I guess it'd be up for a few months considering his finances and tech background?
14:47:36 ... though https://archive.alsharekh.org/sitemap10.xml links to articles, so it *would* find all of the articles, but the table of contents would not work unless we did that separately (which would not be *too* hard)
14:48:23 Not sure if it's possible, but can you ignore the API requests?
14:48:24 They're for info about individual articles, which is not as important as the whole issue/chapter/magazine (https://archive.alsharekh.org/MagazinePages/MagazineBook/~xxx)
14:49:26 The important stuff is at the above url structure; the API acts like an index for the issue (article 1 is at page 3, article 2 is at page 6, etc.)
14:52:43 Hmm, http://archive.alsharekh.org/MagazinePages/MagazineBook/Al_Maarefa/Al_Maarefa_2020/Issue_681/index.html doesn't have any URLs archivebot would find in it... archivebot won't work well with that flipbook
14:53:45 it looks like https://archive.alsharekh.org/MagazinePages/MagazineBook/Al_Maarefa/Al_Maarefa_2020/Issue_681/mobile/javascript/config.js has bookConfig.totalPageCount=337 and bookConfig.CreatedTime="201204132846"
14:54:48 If you open the dev inspector (ctrl+shift+i), you can see that the flipbook is just a bunch of images and js.
14:54:48 I guess it's not possible after all, eh?
14:55:36 It would be possible, but it would require additional work to make the flipbooks function
14:56:23 https://archive.alsharekh.org/Articles/293/20679/470610 links the images directly though, so that would work. Do all magazines have both flipbooks and those /Articles/ pages?
14:59:53 https://archive.alsharekh.org/Articles/293/20679/470610 has a blue "تصفح العدد" ("browse the issue") button that opens https://archive.alsharekh.org/MagazinePages/MagazineBook/Al_Shariqa/Al_Shariqa_2017/Issue_3/index.html so it seems like flipbooks do exist for everything... but I can't see where that link comes from
15:01:50 ... and the flipbook uses https://archive.alsharekh.org/MagazinePages/MagazineBook/Al_Shariqa/Al_Shariqa_2017/Issue_3/files/mobile/1.jpg?201204132846 while the /Articles/ page uses https://archive.alsharekh.org/MagazinePages/Magazine_JPG/Al_Shariqa/Al_Shariqa_2017/Issue_3/001.jpg (better quality).
15:01:54 The whole thing is basically a giant flip book :(
15:01:54 And I'm not very sure about the articles pages, but they exist for most issues (unindexed issues have no articles, only a flipbook)
15:03:35 I'll start it in archivebot just to get *something*, and hopefully a solution for the flipbooks can be found afterwards
15:03:45 Thanks for letting us know about the site, we probably wouldn't have found it otherwise :)
15:05:18 I assume the rest of alsharekh.org should also be saved?
15:06:25 thank you ikkoup!
15:07:33 yeah it might be interesting to save everything on that site
15:07:44 at least into WARCs, perhaps separate items on IA as well
15:10:01 Not really, alsharekh.org is a landing page for other services run by the same guy:
15:10:01 a lexicon, a dictionary (acquired by the Saudi government), Tashkeel (a vowel-mark corrector), and a spell checker. I guess they can't be saved.
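On the sitemap mentioned at 14:39:38 (and sitemap10.xml at 14:47:36): a rough sketch of walking it to enumerate article URLs for a crawler, assuming sitemap.xml is a standard sitemap index pointing at sub-sitemaps. Untested against the live site.

```python
# Sketch: walk the sitemap index and print every page URL it lists,
# e.g. to feed into archivebot/grab-site as a URL list.
import xml.etree.ElementTree as ET

import requests

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def locs(url, tag):
    """Return <loc> values under <sitemap> (index) or <url> (leaf) elements."""
    root = ET.fromstring(requests.get(url, timeout=60).content)
    return [el.text.strip() for el in root.findall(f"sm:{tag}/sm:loc", NS) if el.text]

for sub in locs("https://archive.alsharekh.org/sitemap.xml", "sitemap"):
    for page in locs(sub, "url"):  # article URLs, per the sitemap10.xml observation
        print(page)
```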
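And for the flipbooks: since config.js exposes bookConfig.totalPageCount and bookConfig.CreatedTime (14:53:45), and both page-image URL patterns are known (15:01:50), the page URLs for an issue can probably be generated directly rather than crawled. A sketch under those assumptions; the IssueIndex response format and any extra assets the flipbook viewer needs are unknowns.

```python
# Sketch: expand one flipbook issue into a flat URL list using the patterns
# observed above. Verify against a live issue before relying on it.
import re

import requests

BASE = "https://archive.alsharekh.org"

def issue_urls(magazine, year_dir, issue, iid=None):
    book = f"{BASE}/MagazinePages/MagazineBook/{magazine}/{year_dir}/{issue}"
    config = requests.get(f"{book}/mobile/javascript/config.js", timeout=60).text
    pages = int(re.search(r"bookConfig\.totalPageCount\s*=\s*(\d+)", config).group(1))
    stamp = re.search(r'bookConfig\.CreatedTime\s*=\s*"(\d+)"', config).group(1)

    urls = [f"{book}/index.html", f"{book}/mobile/javascript/config.js"]
    for n in range(1, pages + 1):
        # Low-res flipbook page plus the higher-quality Magazine_JPG scan.
        urls.append(f"{book}/files/mobile/{n}.jpg?{stamp}")
        urls.append(f"{BASE}/MagazinePages/Magazine_JPG/{magazine}/{year_dir}/{issue}/{n:03d}.jpg")
    if iid is not None:
        # Table-of-contents API that the /contents/ pages call.
        urls.append(f"https://archiveapi.alsharekh.org/Search/IssueIndex?IID={iid}")
    return urls

# Example pairing from the discussion: Al_Shariqa 2017 Issue 3, ToC ID 20679.
for u in issue_urls("Al_Shariqa", "Al_Shariqa_2017", "Issue_3", iid=20679):
    print(u)
```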
15:11:57 I also tried to set up grab-site (https://github.com/ArchiveTeam/grab-site) on a VPS to help crawl the archive, but had some trouble with Python 3.8 not being supported.
15:22:53 ikkoup: in that case I would recommend using a container or a Python version manager for grab-site, to drop back down to Python 3.7
15:28:39 That said, archivebot isn't a distributed project - running grab-site locally means you grab the entire site yourself, and archivebot additionally grabs the entire site by itself. It won't make things run faster.
15:32:12 Ah, I thought it was something like the ArchiveTeam Warrior.
15:32:12 I wanted to run grab-site since it has some advanced crawling/scraping capabilities for forums like vBulletin and SMF which I didn't find in the other crawling/scraping tools I looked at.
16:09:40 i realise i don't know much about storj
16:10:01 is it just private storage, only for files to be made available from elsewhere, page requisites and such?
16:16:41 I think you can use storj as S3
16:26:57 Which I guess means you could have some site assets on storj being served
16:27:00 Or something like that
16:28:08 right
16:30:07 is there a channel for archiving #web3?
16:33:52 archiving web3?
16:34:12 so like... archiving blockchains?
16:35:01 I thought part of the point was that it's kind of implicitly so already due to its distributed nature
16:37:41 that's not archiving
16:40:26 ..fair
17:30:22 the question was tongue in cheek, I probably should've made that more obvious :)
17:50:52 Censuro edited Talk:URLTeam (+983, /* Shouldn't archive.today be considered a URL…): https://wiki.archiveteam.org/?diff=51913&oldid=26103
17:50:53 Popthebop edited Talk:Deathwatch (+423, /* the Tom Lehrer website containing original…): https://wiki.archiveteam.org/?diff=51914&oldid=51350
17:50:54 Popthebop edited Talk:Tumblr (+1278, /* Current state of tumblr | IMPORTANT */ new…): https://wiki.archiveteam.org/?diff=51915&oldid=45705
17:50:55 Sepro edited List of websites excluded from the Wayback Machine (+24, Add loom.com): https://wiki.archiveteam.org/?diff=51916&oldid=51896
17:50:56 Flama12333 edited Deathwatch (+167, added realtek ftp sadly): https://wiki.archiveteam.org/?diff=51917&oldid=51901
18:00:54 JAABot edited List of websites excluded from the Wayback Machine (+0): https://wiki.archiveteam.org/?diff=51918&oldid=51916
19:13:07 JacksonChen666 edited Deathwatch (+3, fix citation errors): https://wiki.archiveteam.org/?diff=51919&oldid=51917
19:55:43 how are people doing log aggregation? looking into grafana loki but getting piss-poor performance generating graphs
19:56:03 also eyeing influxdb but not sure how/where that fits in
19:56:40 work uses an ELK stack
20:22:59 Just using dozzle on individual servers, no aggregation
21:22:44 arkiver, kpcyrd: I wonder if Web3 is as distributed as advertised? relatedly, NFTs certainly aren't: lots of them apparently just load stuff off HTTP
21:28:26 lmk when there's anything of value worth archiving, too
21:33:18 I did ELK, but then it was approaching hundreds of GB of logs per day; now I just use dozzle everywhere 🤷‍♂️ At work we use Azure stuff and grafana if we need graphs
21:34:04 dozzle does everything I need for almost all my personal stuff: https://logs.hel1.aktheknight.co.uk/
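Back on the Storj question (16:09:40 through 16:28:08): Storj exposes an S3-compatible gateway, so a stock S3 client should work against it. A minimal boto3 sketch; the bucket, keys, and file names are placeholders, and the gateway URL reflects Storj's hosted gateway at the time of writing.

```python
# Sketch: using Storj through its S3-compatible gateway with boto3.
# Credentials and bucket are placeholders; real ones come from the
# Storj console.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://gateway.storjshare.io",  # Storj's hosted S3 gateway
    aws_access_key_id="YOUR_STORJ_ACCESS_KEY",
    aws_secret_access_key="YOUR_STORJ_SECRET_KEY",
)

# Upload a site asset, then mint a presigned URL so the asset can be
# served from elsewhere - one answer to the "page requisites" question.
s3.upload_file("logo.png", "my-bucket", "assets/logo.png")
print(s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-bucket", "Key": "assets/logo.png"},
    ExpiresIn=86400,  # link valid for one day
))
```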
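On the 21:22:44 remark that many NFTs just load assets over plain HTTP: that is checkable per token, since ERC-721 metadata is resolved via tokenURI(). A sketch using web3.py (v6 API); the RPC endpoint and contract address are placeholders, not anything from the discussion.

```python
# Sketch: check where a given NFT's metadata actually lives.
# RPC endpoint and contract address are placeholders.
from web3 import Web3

TOKEN_URI_ABI = [{
    "name": "tokenURI",
    "type": "function",
    "stateMutability": "view",
    "inputs": [{"name": "tokenId", "type": "uint256"}],
    "outputs": [{"name": "", "type": "string"}],
}]

w3 = Web3(Web3.HTTPProvider("https://YOUR-ETH-RPC-ENDPOINT"))
nft = w3.eth.contract(
    address=Web3.to_checksum_address("0x0000000000000000000000000000000000000000"),
    abi=TOKEN_URI_ABI,
)

uri = nft.functions.tokenURI(1).call()
if uri.startswith(("http://", "https://")):
    print("plain web host, gone when the server is:", uri)
elif uri.startswith("ipfs://"):
    print("content-addressed, but still needs pinning:", uri)
else:
    print("something else (data:, ar://, ...):", uri)
```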
23:54:11 JAA: if you haven't gotten The PokéCommunity completely archived by now, you might want to put it high up on the priority list. A Pokémon fan game website was just shut down by DMCA: https://twitter.com/RelicCastleCom/status/1770901435867361351
23:54:57 The PokéCommunity hosts probably the largest Pokémon fan game community out there, and they had four games C&D'd a while ago, so the ninja lawyers are well aware that they exist
23:57:10 why they gotta do my PokeCommunity like that....
23:57:36 I think we last did it 10 months ago: https://archive.fart.website/archivebot/viewer/job/202305131413054huog
23:58:30 Terbium - because Nintendo loathes its fans.
23:59:27 Also, they really should have hosted the site in a DMCA-ignored location. After so many DMCAs over the decades, it seems like this lesson is never learned