02:48:30 TheTechRobo edited V Live (+61): https://wiki.archiveteam.org/?diff=49191&oldid=49187
02:49:19 TheTechRobo: if you are asking if the code is opensource - then yes https://github.com/internetarchive/heritrix3
02:49:49 arkiver: https://github.com/internetarchive/heritrix3/discussions/541
02:49:59 I mean using the WARC-writing API in other programs
02:50:07 would be easiest if I could just use it as a library :-)
02:50:31 err right
02:50:45 why not use one of the existing solutions?
02:51:02 there are ones for java?
02:51:10 are you writing in java now?
02:51:12 but why
02:51:20 to learn :-)
02:51:44 not sure if java is the most useful language at the moment to learn but sure
02:51:51 well yeah you could look into heritrix
02:52:02 I don't actually hate Java
02:52:02 i don't have much experience with it or its code though
02:52:10 so not much help from me I'm afraid
02:52:23 Maven sucks, and the RAM usage is awful, but it's not as bad as some people make it out to be (imo)
02:52:34 * arkiver also doesn't hate java
02:56:10 arkiver: out of curiosity, why don't you think it's useful? (and what languages do you think are more useful to learn?)
02:59:20 i dont see a ton of new software being written in it
02:59:32 it usually seems older software that uses that or php that needs to be maintained
02:59:50 that's why you need to use it ig :-)
02:59:51 ( JAA may also have opinion on java and 'usefulness' of languages)
02:59:58 opinions*
03:07:33 In my opinion, Java as a language isn't terrible, but it's tainted by the history with Sun and Oracle, and it's become a meme due to hilariously overengineered 'enterprise-grade' code. It's been a good few years since I last touched it, but the ecosystem was a mess at the time, and I'm not sure it's improved significantly since.
05:02:54 JustAnotherArchivist edited V Live (+66, Add source): https://wiki.archiveteam.org/?diff=49192&oldid=49191
05:14:50 In case anyone here was curious about that lost song of mine, it's partially found! https://www.youtube.com/watch?v=U-BqT6TQR7s
13:46:31 Oh wow
13:52:12 Ooh https://github.com/iipc/jwarc exists
15:46:08 Hmm, I'm trying to figure out if there's something that needs to be addressed scraping wise before the end of November; someone was saying something that's closing down and I'm not sure if it has been addressed
17:00:59 looks like we have all the blogs shutting down on November 30th covered
18:55:10 probably last set of sweb.cz domains (derived from outlinks from the previous sweb.cz archivebot run warcs) put in AB
19:28:39 Congratulations, I'll admit I was skeptical it would work in time
19:59:11 Sanqui: absolutely awesome
19:59:27 I don't know how much I'm missing
20:00:21 the total is 155k domains with some extra unreachable-from-/ urls
20:02:22 Sanqui edited Sweb.cz (+197, set 3): https://wiki.archiveteam.org/?diff=49193&oldid=49176
20:02:23 sounds pretty good
20:02:36 I will also note that maybe half or more of the domains are already dead
20:05:12 likely a lot more new domains could be gathered by scraping seznam.cz search
20:12:01 you know what, i'm looking into that...