00:01:27 <Terbium> I see a bunch of free and paid APIs for M&A feeds
00:10:15 <Terbium> https://site.financialmodelingprep.com/developer/docs/merger-and-acquisition-api
00:10:51 <fireonlive> hmm
00:10:59 <fireonlive> if there’s a good rss feed i could hook it up to rss
00:12:06 <Terbium> there's this: https://seekingalpha.com/market-news/m-a
00:13:01 <Terbium> There's an RSS feed
00:15:22 <fireonlive> this seems to be the url for the feed: https://seekingalpha.com/tag/m-a.xml
00:16:10 <fireonlive> i’m sitting in a vehicle on my phone so hard to tell for sure haha
00:16:57 <Terbium> fireonlive: yep that's the one, it has the stock ticker symbols like the other FMP feed
00:17:05 <fireonlive> ah awesome :)
00:17:09 <Terbium> which makes finding companies a lot easier
00:17:30 <Terbium> it also showed failed or cancelled M&As as well
00:18:10 <fireonlive> i’ll toss it up in #m&a if that suits everyone when i’m back at a more proper computer later; just out and about with a friend who’s visiting for the first time in a while
01:18:21 <qwertyasdfuiopghjkl> https://www.thewrap.com/gannett-drops-ap-associated-press-usa-today/ "Gannett, publisher of USA Today and hundreds of local newspapers, will stop using the Associated Press’ content starting next week, [...] will eliminate AP dispatches, photos and video as of March 25, according to an internal memo"
01:19:07 <qwertyasdfuiopghjkl> Not sure if this means removal of existing content or just discontinuing new content
01:27:17 <qwertyasdfuiopghjkl> https://apnews.com/article/gannett-associated-press-contract-97405e4715c9a25d21477b992028db2a "Shortly after, AP said it had been informed by McClatchy that it would also drop the service." https://www.nytimes.com/2024/03/19/business/media/gannett-mcclatchy-ap-associated-press.html "McClatchy [...] told its editors this week that it would stop
01:27:18 <qwertyasdfuiopghjkl> using some A.P. services next month." "[McClatchy] said that The A.P.’s feed would end on March 29 and that no A.P. content could be published after March 31." apparently there's also another one
02:29:50 <fireonlive> #m&a is now setup, we should see if it works within the hour :3
02:32:28 <fireonlive> Terbium++
02:32:29 -eggdrop- [karma] 'Terbium' now has 2 karma!
09:31:36 <newbie007> is it possible to upload locally archived websites to internet archive such that they are searchable using wayback machine?
09:32:30 <pabs> that isn't possible
09:53:43 <arkiver> RIP original redis
14:36:52 <ikkoup> Hi,
14:36:53 <ikkoup> Would you be interested in archiving the biggest (and only) Arabic archive of literary magazines? Its owner died last week and it's at risk of dying at anytime.
14:36:53 <ikkoup> https://archive.alsharekh.org
14:39:38 <ikkoup> the site also has a sitemap (https://archive.alsharekh.org/sitemap.xml) which would help ramp things up!
14:39:42 <pokechu22> Hmm, the stats are 2 million pages, 326,446 articles, 52,234 writers, 273 magazines, 15,857 issues. It looks like images are directly embedded (view-source:https://archive.alsharekh.org/Articles/293/20679/470610 has <img _ngcontent-sc1 class="slide_image" src="MagazinePages\Magazine_JPG\Al_Shariqa\Al_Shariqa_2017\Issue_3\014.jpg"
14:39:44 <pokechu22> data-normal="MagazinePages\Magazine_JPG\Al_Shariqa\Al_Shariqa_2017\Issue_3\014.jpg" data-full="MagazinePages\Magazine_JPG\Al_Shariqa\Al_Shariqa_2017\Issue_3\014.jpg"> + <base href="/">) and archivebot extracts those correctly, and the server doesn't mind the backslashes not being replaced by the browser with forward slashes
14:41:10 <ikkoup> Yes, it uses "flipbuilder.com" (PDF Page Flipper) to make the reading pages.
14:41:11 <ikkoup> Don't know if you encountered that before. sorry for my weak language.
14:43:49 <pokechu22> I think archivebot will work here - 2 million URLs is a bit large, but we've done bigger. Do you know if it's at risk of shutting down in a few weeks, or if it'll probably be up for months?
14:46:16 <pokechu22> hmm, https://archive.alsharekh.org/contents/293/20679 requires a bunch of API requests to e.g. https://archiveapi.alsharekh.org/Search/IssueIndex?IID=20679 actually; archivebot probably won't follow those
14:47:13 <ikkoup> Hmm, not sure.
14:47:14 <ikkoup> The owner was the pioneer or Arabic language in the early days of computers and he (and his company at the time) added Arabic support for almost every OS/software at the time.
14:47:14 <ikkoup> The company isn't very active these days and he stepped down from it. I guess it'd be up for a few months considering his finances and tech background?
14:47:36 <pokechu22> ... though https://archive.alsharekh.org/sitemap10.xml links to articles, so it *would* find all of the articles, but the table of contents would not work unless we did that separately (which would not be *too* hard)
14:48:23 <ikkoup> Not sure if its possible, but can you ignore the API requests?
14:48:24 <ikkoup> It's for info about individual articles which is not as important as the whole issue/chapter/magazine (https://archive.alsharekh.org/MagazinePages/MagazineBook/~xxx)
14:49:26 <ikkoup> The important stuff is at the above url structure, the API acts like an index for the issue (article 1 is at page 3, article 2 is at page 6 etc)
14:52:43 <pokechu22> Hmm, http://archive.alsharekh.org/MagazinePages/MagazineBook/Al_Maarefa/Al_Maarefa_2020/Issue_681/index.html doesn't have any URLs archivebot would find in it... that flipbook won't work well with it
14:53:45 <pokechu22> it looks like https://archive.alsharekh.org/MagazinePages/MagazineBook/Al_Maarefa/Al_Maarefa_2020/Issue_681/mobile/javascript/config.js has bookConfig.totalPageCount=337 and bookConfig.CreatedTime ="201204132846"
14:54:48 <ikkoup> If you check dev inspection (ctrl shift i) then you can see that the flipbook is just a bunch of images and js.
14:54:48 <ikkoup> I guess it's not possible after all eh?
14:55:36 <pokechu22> It would be possible, but it would require additional work to make the flipbooks function
14:56:23 <pokechu22> https://archive.alsharekh.org/Articles/293/20679/470610 links the images directly though so that would work. Do all magazines have both flipbooks and those /Articles/ pages?
14:59:53 <pokechu22> https://archive.alsharekh.org/Articles/293/20679/470610 has a blue "تصفح العدد" button that opens https://archive.alsharekh.org/MagazinePages/MagazineBook/Al_Shariqa/Al_Shariqa_2017/Issue_3/index.html so it seems like flipbooks do exist for everything... but I can't see where that link comes from
15:01:50 <pokechu22> ... and the flipbook uses https://archive.alsharekh.org/MagazinePages/MagazineBook/Al_Shariqa/Al_Shariqa_2017/Issue_3/files/mobile/1.jpg?201204132846 while the /Articles/ page uses https://archive.alsharekh.org/MagazinePages/Magazine_JPG/Al_Shariqa/Al_Shariqa_2017/Issue_3/001.jpg (better quality).
15:01:54 <ikkoup> The whole thing is basically a giant flip book :(
15:01:54 <ikkoup> And not very sure about articles page, but it exists for most of it (unindexed issues have no articles, only flipbook)
15:03:35 <pokechu22> I'll start it in archivebot just to get *something*, and hopefully a solution for the flipbooks can be found afterwards
15:03:45 <pokechu22> Thanks for letting us know about the site, we probably wouldn't have found it otherwise :)
15:05:18 <pokechu22> I assume the rest of alsharekh.org should also be saved?
15:06:25 <arkiver> thank you ikkoup!
15:07:33 <arkiver> yeah it might be interesting to save everything on that site
15:07:44 <arkiver> at least into WARCs, perhaps separate items on IA as well
15:10:01 <ikkoup> Not really, alsharekh.org is landing page for other services run by the same guy.
15:10:01 <ikkoup> a Lexicon, Dictionary (acquired by Saudi government), Tashkeel (vowel movement corrector) and a spell checker. I guess they can't be saved.
15:11:57 <ikkoup> I also tried to setup grab-site (https://github.com/ArchiveTeam/grab-site) on a vps to help crawling the archive, but had some troubles with python 3.8 not being supported.
15:22:53 <Terbium> ikkoup: I would recommend using a container or Python version manager for grab-site in that case to drop back down to Python 3.7
15:28:39 <pokechu22> That said, archivebot isn't a distributed project - running grab-site locally would mean you grab the entire site yourself, and additional archivebot grabs the entire site by itself. It won't make things run faster.
15:32:12 <ikkoup> Ah, I thought it was something like the archivewarrior.
15:32:12 <ikkoup> I wanted to run grab-site since it has some advanced crawling/scraping capabilities for forums like vBulletin and SMF which are not found in other crawling/scarping tools I looked up.
16:09:40 <arkiver> i realise i don't know much about storj
16:10:01 <arkiver> is it just private storage only for files to be made available from elsewhere, page requisites and such?
16:16:41 <kiska> I think you can use storj as S3
16:26:57 <kiska> Which I guess means you could have some site assets on storj being served
16:27:00 <kiska> Or something like that
16:28:08 <arkiver> right
16:30:07 <kpcyrd> is there a channel for archiving #web3?
16:33:52 <arkiver> archiving web3?
16:34:12 <arkiver> so like... archiving blockchains?
16:35:01 <FireFly> I thought part of the point was that it's kind of implicitly so already due to its distributed nature
16:37:41 <arkiver> that's not archiving
16:40:26 <FireFly> ..fair
17:30:22 <kpcyrd> the question was tongue in cheek, I probably should've made that more obvious :)
17:50:52 <h2ibot> Censuro edited Talk:URLTeam (+983, /* Shouldn't archive.today be considered a URL…): https://wiki.archiveteam.org/?diff=51913&oldid=26103
17:50:53 <h2ibot> Popthebop edited Talk:Deathwatch (+423, /* the Tom Lehrer website containing original…): https://wiki.archiveteam.org/?diff=51914&oldid=51350
17:50:54 <h2ibot> Popthebop edited Talk:Tumblr (+1278, /* Current state of tumblr | IMPORTANT  */ new…): https://wiki.archiveteam.org/?diff=51915&oldid=45705
17:50:55 <h2ibot> Sepro edited List of websites excluded from the Wayback Machine (+24, Add loom.com): https://wiki.archiveteam.org/?diff=51916&oldid=51896
17:50:56 <h2ibot> Flama12333 edited Deathwatch (+167, added realtek ftp sadly): https://wiki.archiveteam.org/?diff=51917&oldid=51901
18:00:54 <h2ibot> JAABot edited List of websites excluded from the Wayback Machine (+0): https://wiki.archiveteam.org/?diff=51918&oldid=51916
19:13:07 <h2ibot> JacksonChen666 edited Deathwatch (+3, fix citation errors): https://wiki.archiveteam.org/?diff=51919&oldid=51917
19:55:43 <michaelblob> how are people doing log agg? looking into grafana loki but getting piss poor performance generating graphs
19:56:03 <michaelblob> also eyeing influxdb but now sure how/where that fits in
19:56:40 <Barto> work use an ELK stack
20:22:59 <nstrom|m> Just using dozzle on individual servers, no agg
21:22:44 <pabs> arkiver, kpcyrd: I wonder if Web3 is as distributed as advertised? relatedly NFTs certainly aren't, lots of them apparently just load stuff off HTTP
21:28:26 <nicolas17> lmk when there's anything of value worth archiving, too
21:33:18 <AK> I did ELK, but then it was approaching hundreds of GB of logs per day, now I just use dozzle everywhere 🤷‍♂️ At work we use Azure stuff and grafana if we need graphs
21:34:04 <AK> dozzle does everything I need for almost all my personal stuff: https://logs.hel1.aktheknight.co.uk/
23:54:11 <icedice> JAA if you haven't gotten The PokéCommunity completely archived by now, you might want to put it high up on the priority list. A Pokémon fan game website was just shut down by DMCA: https://twitter.com/RelicCastleCom/status/1770901435867361351
23:54:57 <icedice> The PokéCommunity has probably the largest Pokémon fan game communities out there and they had four games C&D'd a while ago, so the ninja lawyers are well aware that they exist
23:57:10 <Terbium> why they gotta do my PokeCommunity like that....
23:57:36 <pokechu22> I think we last did it 10 months ago: https://archive.fart.website/archivebot/viewer/job/202305131413054huog
23:58:30 <nulldata> Terbium - because Nintendo loathes its fans.
23:59:27 <Terbium> Also, they really should have hosted the site in a DMCA ignored location. After so many DMCA's over the decades, it seems like this lesson is never learned