-
arkiver
TheTechRobo: if you are asking if the code is opensource - then yes
github.com/internetarchive/heritrix3
-
TheTechRobo
I mean using the WARC-writing API in other programs
-
TheTechRobo
would be easiest if I could just use it as a library :-)
-
arkiver
err right
-
arkiver
why not use one of the existing solutions?
-
TheTechRobo
there are ones for java?
-
arkiver
are you writing in java now?
-
arkiver
but why
-
TheTechRobo
to learn :-)
-
arkiver
not sure if java is the most useful language at the moment to learn but sure
-
arkiver
well yeah you could look into heritrix
-
TheTechRobo
I don't actually hate Java
-
arkiver
i don't have much experience with it or its code though
-
arkiver
so not much help from me I'm afraid
-
TheTechRobo
Maven sucks, and the RAM usage is awful, but it's not as bad as some people make it out to be (imo)
-
» arkiver also doesn't hate java
-
TheTechRobo
arkiver: out of curiosity, why don't you think it's useful? (and what languages do you think are more useful to learn?)
-
arkiver
i don't see a ton of new software being written in it
-
arkiver
it usually seems to be older software using it (or php) that needs to be maintained
-
TheTechRobo
that's why you need to use it ig :-)
-
arkiver
(JAA may also have opinion on java and 'usefulness' of languages)
-
arkiver
opinions*
-
JAA
In my opinion, Java as a language isn't terrible, but it's tainted by the history with Sun and Oracle, and it's become a meme due to hilariously overengineered 'enterprise-grade' code. It's been a good few years since I last touched it, but the ecosystem was a mess at the time, and I'm not sure it's improved significantly since.
-
h2ibot
JustAnotherArchivist edited V Live (+66, Add source):
wiki.archiveteam.org/?diff=49192&oldid=49191
-
fishingforsoup
In case anyone here was curious about that lost song of mine, it's partially found!
youtube.com/watch?v=U-BqT6TQR7s
-
audrooku|m
Oh wow
-
Ryz
Hmm, I'm trying to figure out if there's something that needs to be addressed scraping wise before the end of November; someone was saying something that's closing down and I'm not sure if it has been addressed
-
arkiver
looks like we have all the blogs shutting down on November 30th covered
-
Sanqui
probably last set of sweb.cz domains (derived from outlinks from the previous sweb.cz archivebot run warcs) put in AB
-
OrIdow6
Congratulations, I'll admit I was skeptical it would work in time
-
arkiver
Sanqui: absolutely awesome
-
Sanqui
I don't know how much I'm missing
-
Sanqui
the total is 155k domains with some extra unreachable-from-/ urls
-
arkiver
sounds pretty good
-
Sanqui
I will also note that maybe half or more of the domains are already dead
-
Sanqui
likely a lot more new domains could be gathered by scraping seznam.cz search
-
Sanqui
you know what, i'm looking into that...