-
Terbium
I see a bunch of free and paid APIs for M&A feeds
-
fireonlive
hmm
-
fireonlive
if there’s a good rss feed i could hook it up
-
Terbium
There's an RSS feed
-
fireonlive
this seems to be the url for the feed:
seekingalpha.com/tag/m-a.xml
-
fireonlive
i’m sitting in a vehicle on my phone so hard to tell for sure haha
-
Terbium
fireonlive: yep that's the one, it has the stock ticker symbols like the other FMP feed
-
fireonlive
ah awesome :)
-
Terbium
which makes finding companies a lot easier
-
Terbium
it also showed failed or cancelled M&As
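For anyone wiring this feed up later: a minimal sketch of pulling titles and links out of an RSS 2.0 document with just the standard library. The feed URL comes from the discussion above; the item fields are standard RSS 2.0, but treat the whole thing as an untested assumption about what the feed actually serves.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Feed URL mentioned above; may change or require a User-Agent header.
FEED_URL = "https://seekingalpha.com/tag/m-a.xml"

def parse_rss_items(xml_text):
    """Return (title, link) pairs from an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        items.append((title, link))
    return items

def fetch_feed(url=FEED_URL):
    """Fetch the live feed and parse it (network access required)."""
    with urllib.request.urlopen(url) as resp:
        return parse_rss_items(resp.read())
```

Since the titles apparently carry stock ticker symbols like `(NASDAQ:ABCD)`, matching companies can be done with a simple regex over the parsed titles.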
-
fireonlive
i’ll toss it up in #m&a if that suits everyone when i’m back at a more proper computer later; just out and about with a friend who’s visiting for the first time in a while
-
qwertyasdfuiopghjkl
thewrap.com/gannett-drops-ap-associated-press-usa-today "Gannett, publisher of USA Today and hundreds of local newspapers, will stop using the Associated Press’ content starting next week, [...] will eliminate AP dispatches, photos and video as of March 25, according to an internal memo"
-
qwertyasdfuiopghjkl
Not sure if this means removal of existing content or just discontinuing new content
-
qwertyasdfuiopghjkl
apnews.com/article/gannett-associat…ct-97405e4715c9a25d21477b992028db2a "Shortly after, AP said it had been informed by McClatchy that it would also drop the service."
nytimes.com/2024/03/19/business/med…-mcclatchy-ap-associated-press.html "McClatchy [...] told its editors this week that it would stop
-
qwertyasdfuiopghjkl
using some A.P. services next month." "[McClatchy] said that The A.P.’s feed would end on March 29 and that no A.P. content could be published after March 31." apparently there's also another one
-
fireonlive
#m&a is now set up, we should see if it works within the hour :3
-
fireonlive
Terbium++
-
eggdrop
[karma] 'Terbium' now has 2 karma!
-
newbie007
is it possible to upload locally archived websites to internet archive such that they are searchable using wayback machine?
-
pabs
that isn't possible
-
arkiver
RIP original redis
-
ikkoup
Hi,
-
ikkoup
Would you be interested in archiving the biggest (and only) Arabic archive of literary magazines? Its owner died last week and it's at risk of going down at any time.
-
ikkoup
the site also has a sitemap (
archive.alsharekh.org/sitemap.xml) which would help ramp things up!
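A sitemap like that is easy to enumerate programmatically. Here is a minimal sketch of extracting every `<loc>` URL from a sitemap or sitemap-index document with the standard library; it assumes the file follows the usual sitemaps.org 0.9 namespace, which hasn't been verified against this particular site.

```python
import xml.etree.ElementTree as ET

# Standard sitemap namespace (sitemaps.org protocol 0.9).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def extract_sitemap_urls(xml_text):
    """Return every <loc> URL from a sitemap or sitemap index.

    Works for both <urlset> (page lists) and <sitemapindex>
    (lists of child sitemaps), since both nest URLs in <loc>.
    """
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]
```

If the top-level file is an index, you would recurse: fetch each child sitemap it lists and run the same extraction on it.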
-
pokechu22
Hmm, the stats are 2 million pages, 326,446 articles, 52,234 writers, 273 magazines, 15,857 issues. It looks like images are directly embedded (view-source:https://archive.alsharekh.org/Articles/293/20679/470610 has <img _ngcontent-sc1 class="slide_image" src="MagazinePages\Magazine_JPG\Al_Shariqa\Al_Shariqa_2017\Issue_3\014.jpg"
-
pokechu22
data-normal="MagazinePages\Magazine_JPG\Al_Shariqa\Al_Shariqa_2017\Issue_3\014.jpg" data-full="MagazinePages\Magazine_JPG\Al_Shariqa\Al_Shariqa_2017\Issue_3\014.jpg"> + <base href="/">) and archivebot extracts those correctly, and the server doesn't mind the backslashes not being replaced by the browser with forward slashes
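The backslash detail above is worth capturing: a lenient crawler has to honor the `<base href="/">` and can optionally normalize `\` to `/` the way browsers do (the server reportedly accepts either). A small sketch of that resolution, using only `urllib.parse`:

```python
from urllib.parse import urljoin

def resolve_asset(page_url, base_href, src):
    """Resolve a (possibly backslash-separated) asset path.

    Mirrors what a browser does: apply <base href>, then resolve the
    relative src against it. Backslashes are swapped for forward
    slashes first, as browsers do for URLs.
    """
    normalized = src.replace("\\", "/")
    return urljoin(urljoin(page_url, base_href), normalized)
```

With the example above, the `014.jpg` src on the article page resolves to a `MagazinePages/Magazine_JPG/...` URL directly under the site root.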
-
ikkoup
Yes, it uses "flipbuilder.com" (PDF Page Flipper) to make the reading pages.
-
ikkoup
Don't know if you encountered that before. sorry for my weak language.
-
pokechu22
I think archivebot will work here - 2 million URLs is a bit large, but we've done bigger. Do you know if it's at risk of shutting down in a few weeks, or if it'll probably be up for months?
-
pokechu22
hmm,
archive.alsharekh.org/contents/293/20679 requires a bunch of API requests to e.g.
archiveapi.alsharekh.org/Search/IssueIndex?IID=20679 actually; archivebot probably won't follow those
-
ikkoup
Hmm, not sure.
-
ikkoup
The owner was a pioneer of the Arabic language in the early days of computing, and he (and his company at the time) added Arabic support to almost every OS/software of the era.
-
ikkoup
The company isn't very active these days and he stepped down from it. I guess it'd be up for a few months considering his finances and tech background?
-
pokechu22
... though
archive.alsharekh.org/sitemap10.xml links to articles, so it *would* find all of the articles, but the table of contents would not work unless we did that separately (which would not be *too* hard)
-
ikkoup
Not sure if it's possible, but can you ignore the API requests?
-
ikkoup
It's for info about individual articles which is not as important as the whole issue/chapter/magazine (
archive.alsharekh.org/MagazinePages/MagazineBook/~xxx)
-
ikkoup
The important stuff is at the above url structure, the API acts like an index for the issue (article 1 is at page 3, article 2 is at page 6 etc)
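If the API really is just an index (article N starts at page M), reconstructing it offline is straightforward once the mapping is captured. A hypothetical sketch, with invented field names since the real `IssueIndex` response shape was not inspected here:

```python
def article_page_ranges(index_entries, total_pages):
    """Turn an issue index into (article, first_page, last_page) triples.

    `index_entries` is assumed to look like
    [{"article": ..., "start_page": n}, ...]; these field names are
    hypothetical, standing in for whatever the real API returns.
    Each article runs until the page before the next article starts.
    """
    entries = sorted(index_entries, key=lambda e: e["start_page"])
    ranges = []
    for i, entry in enumerate(entries):
        if i + 1 < len(entries):
            last = entries[i + 1]["start_page"] - 1
        else:
            last = total_pages
        ranges.append((entry["article"], entry["start_page"], last))
    return ranges
```

That mapping is small compared to the page images, so even if archivebot skips the API, the index could be captured separately later.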
-
pokechu22
Hmm,
archive.alsharekh.org/MagazinePages…l_Maarefa_2020/Issue_681/index.html doesn't have any URLs archivebot would find in it... archivebot won't work well with that flipbook
-
pokechu22
it looks like
archive.alsharekh.org/MagazinePages…sue_681/mobile/javascript/config.js has bookConfig.totalPageCount=337 and bookConfig.CreatedTime ="201204132846"
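That `totalPageCount` is enough to enumerate candidate page-image URLs for an issue without rendering the flipbook at all. A sketch, assuming the zero-padded three-digit filenames seen in the `014.jpg` example earlier (the exact directory layout per issue is not confirmed):

```python
def flipbook_page_urls(issue_base, total_pages):
    """Enumerate likely page-image URLs for one issue.

    `issue_base` is the issue's image directory URL and `total_pages`
    comes from bookConfig.totalPageCount in config.js. The zero-padded
    three-digit naming (001.jpg, 014.jpg, ...) is an assumption based
    on the one example URL observed above.
    """
    return [f"{issue_base}/{page:03d}.jpg" for page in range(1, total_pages + 1)]
```

A list like this could be fed to archivebot (or any fetcher) directly, sidestepping the JavaScript-only flipbook navigation.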
-
ikkoup
If you check the dev inspector (Ctrl+Shift+I) you can see that the flipbook is just a bunch of images and js.
-
ikkoup
I guess it's not possible after all eh?
-
pokechu22
It would be possible, but it would require additional work to make the flipbooks function
-
pokechu22
archive.alsharekh.org/Articles/293/20679/470610 links the images directly though so that would work. Do all magazines have both flipbooks and those /Articles/ pages?
-
pokechu22
archive.alsharekh.org/Articles/293/20679/470610 has a blue "تصفح العدد" ("Browse the Issue") button that opens
archive.alsharekh.org/MagazinePages…/Al_Shariqa_2017/Issue_3/index.html so it seems like flipbooks do exist for everything... but I can't see where that link comes from
-
ikkoup
The whole thing is basically a giant flip book :(
-
ikkoup
And I'm not very sure about the articles pages, but they exist for most of the archive (unindexed issues have no articles, only a flipbook)
-
pokechu22
I'll start it in archivebot just to get *something*, and hopefully a solution for the flipbooks can be found afterwards
-
pokechu22
Thanks for letting us know about the site, we probably wouldn't have found it otherwise :)
-
pokechu22
I assume the rest of alsharekh.org should also be saved?
-
arkiver
thank you ikkoup!
-
arkiver
yeah it might be interesting to save everything on that site
-
arkiver
at least into WARCs, perhaps separate items on IA as well
-
ikkoup
Not really, alsharekh.org is a landing page for other services run by the same guy.
-
ikkoup
a Lexicon, a Dictionary (acquired by the Saudi government), Tashkeel (a corrector for Arabic vowel marks/diacritics) and a spell checker. I guess they can't be saved.
-
ikkoup
I also tried to set up grab-site (
github.com/ArchiveTeam/grab-site) on a VPS to help crawl the archive, but had some trouble with Python 3.8 not being supported.
-
Terbium
ikkoup: I would recommend using a container or Python version manager for grab-site in that case to drop back down to Python 3.7
-
pokechu22
That said, archivebot isn't a distributed project - running grab-site locally would mean you grab the entire site yourself, and additionally archivebot grabs the entire site by itself. It won't make things run faster.
-
ikkoup
Ah, I thought it was something like the ArchiveTeam Warrior.
-
ikkoup
I wanted to run grab-site since it has some advanced crawling/scraping capabilities for forums like vBulletin and SMF which are not found in the other crawling/scraping tools I looked up.
-
arkiver
i realise i don't know much about storj
-
arkiver
is it just private storage, or can files be made available from elsewhere - page requisites and such?
-
kiska
I think you can use storj as S3
-
kiska
Which I guess means you could have some site assets on storj being served
-
kiska
Or something like that
-
arkiver
right
-
kpcyrd
is there a channel for archiving #web3?
-
arkiver
archiving web3?
-
arkiver
so like... archiving blockchains?
-
FireFly
I thought part of the point was that it's kind of implicitly so already due to its distributed nature
-
arkiver
that's not archiving
-
FireFly
..fair
-
kpcyrd
the question was tongue in cheek, I probably should've made that more obvious :)
-
h2ibot
Censuro edited Talk:URLTeam (+983, /* Shouldn't archive.today be considered a URL…):
wiki.archiveteam.org/?diff=51913&oldid=26103
-
h2ibot
Popthebop edited Talk:Deathwatch (+423, /* the Tom Lehrer website containing original…):
wiki.archiveteam.org/?diff=51914&oldid=51350
-
h2ibot
Popthebop edited Talk:Tumblr (+1278, /* Current state of tumblr | IMPORTANT */ new…):
wiki.archiveteam.org/?diff=51915&oldid=45705
-
h2ibot
Sepro edited List of websites excluded from the Wayback Machine (+24, Add loom.com):
wiki.archiveteam.org/?diff=51916&oldid=51896
-
h2ibot
Flama12333 edited Deathwatch (+167, added realtek ftp sadly):
wiki.archiveteam.org/?diff=51917&oldid=51901
-
h2ibot
JAABot edited List of websites excluded from the Wayback Machine (+0):
wiki.archiveteam.org/?diff=51918&oldid=51916
-
h2ibot
JacksonChen666 edited Deathwatch (+3, fix citation errors):
wiki.archiveteam.org/?diff=51919&oldid=51917
-
michaelblob
how are people doing log agg? looking into grafana loki but getting piss poor performance generating graphs
-
michaelblob
also eyeing influxdb but not sure how/where that fits in
-
Barto
work uses an ELK stack
-
nstrom|m
Just using dozzle on individual servers, no agg
-
pabs
arkiver, kpcyrd: I wonder if Web3 is as distributed as advertised? relatedly NFTs certainly aren't, lots of them apparently just load stuff off HTTP
-
nicolas17
lmk when there's anything of value worth archiving, too
-
AK
I did ELK, but then it was approaching hundreds of GB of logs per day, now I just use dozzle everywhere 🤷‍♂️ At work we use Azure stuff and grafana if we need graphs
-
AK
dozzle does everything I need for almost all my personal stuff:
logs.hel1.aktheknight.co.uk
-
icedice
JAA if you haven't gotten The PokéCommunity completely archived by now, you might want to put it high up on the priority list. A Pokémon fan game website was just shut down by DMCA:
twitter.com/RelicCastleCom/status/1770901435867361351
-
icedice
The PokéCommunity probably has the largest Pokémon fan game community out there, and they had four games C&D'd a while ago, so the ninja lawyers are well aware that they exist
-
Terbium
why they gotta do my PokeCommunity like that....
-
nulldata
Terbium - because Nintendo loathes its fans.
-
Terbium
Also, they really should have hosted the site in a DMCA-ignored location. After so many DMCAs over the decades, it seems like this lesson is never learned