-
flashfire42
fireonlive you still wanted me to hit the polycom domains hard?
-
fireonlive
pls :)
-
flashfire42
fireonlive looks like some of them 403 with archivebot. If you wanna babysit the dashboard I can throw them in as fast as possible or I can put them on the backburner til I have time to better monitor it all
-
fireonlive
have the x over atm sadly so it’ll have to burner at the back
-
fireonlive
but thanks :)
-
h2ibot
Ufarwisan edited Discord (+660, Rename /* Software */ to /* Self-archival */):
wiki.archiveteam.org/?diff=50434&oldid=48752
-
h2ibot
TheTechRobo edited Discord (+11, /* Self-archival */ We should probably make the…):
wiki.archiveteam.org/?diff=50435&oldid=50434
-
h2ibot
TheTechRobo edited Discord (+28, /* Self-archival */ More details about tools):
wiki.archiveteam.org/?diff=50436&oldid=50435
-
h2ibot
TheTechRobo edited Discord (+125, Link my URL extractor):
wiki.archiveteam.org/?diff=50437&oldid=50436
-
h2ibot
TheTechRobo edited Vanillo (+117, Appears to be back up, with content dating back…):
wiki.archiveteam.org/?diff=50438&oldid=41059
-
h2ibot
TheTechRobo edited Wysp (-1, It is now offline):
wiki.archiveteam.org/?diff=50439&oldid=50417
-
nicolas17
thuban: looks like I have to archive all 4 video qualities for the DASH .mpd to work
-
thuban
interesting
-
nicolas17
at least ffmpeg/mpv/etc try to read the first segment of *every available alt quality* before they even start playing
-
nicolas17
if the low quality segment 1 returns 404 then it says the video is corrupted and dies, even if you told it to play 1080p
-
nicolas17
unfortunately archiving all qualities means 10GB per episode ugh
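
A minimal sketch of the behaviour described above, assuming the manifest uses a `SegmentTemplate` with `$RepresentationID$` and `$Number$` placeholders. The sample manifest, the representation ids, and the base URL are hypothetical and are not taken from RTVE's real .mpd. It lists the initialization and first media segment of every quality, which are the URLs a player would probe before starting playback:

```python
import xml.etree.ElementTree as ET

NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

# Hypothetical manifest for illustration only; not the real RTVE one.
SAMPLE_MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period>
    <AdaptationSet mimeType="video/mp4">
      <SegmentTemplate initialization="$RepresentationID$/init.mp4"
                       media="$RepresentationID$/seg-$Number$.m4s"
                       startNumber="1"/>
      <Representation id="v360" bandwidth="800000"/>
      <Representation id="v1080" bandwidth="5000000"/>
    </AdaptationSet>
  </Period>
</MPD>"""


def first_fetches(mpd_text, base_url):
    """Return init + first media segment URLs for every Representation."""
    root = ET.fromstring(mpd_text)
    urls = []
    for aset in root.iterfind(".//mpd:AdaptationSet", NS):
        tmpl = aset.find("mpd:SegmentTemplate", NS)
        if tmpl is None:
            continue  # other addressing modes are out of scope for this sketch
        start = tmpl.get("startNumber", "1")
        for rep in aset.iterfind("mpd:Representation", NS):
            rep_id = rep.get("id")
            for attr in ("initialization", "media"):
                path = tmpl.get(attr)
                if path is None:
                    continue
                path = path.replace("$RepresentationID$", rep_id)
                path = path.replace("$Number$", start)
                urls.append(base_url + path)
    return urls


urls = first_fetches(SAMPLE_MPD, "https://example.invalid/")
for u in urls:
    print(u)
```

If any of these URLs is missing from the archive, a player that probes every representation would fail in the way described, even when only one quality is requested.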
-
nicolas17
do I archivebot?
-
flashfire42
what are you wanting to archivebot?
-
nicolas17
flashfire42:
rtve.es/play/videos/grand-prix spanish TV game show
-
systwi_
nicolas17: ArchiveBot can't save full television programmes (typically), if that is what you were hoping for.
-
nicolas17
systwi_: what exactly do you mean by "can't"? file size limit?
-
systwi_
ArchiveBot's purpose is to save web pages and eventually make them available in
web.archive.org
-
nicolas17
earlier I asked "video is in DASH format, should I remux it to .mp4 and upload it as an item, or archive the .mpd and video segments in a WARC, or give archivebot a URL list and let it do that for me?" and thuban said "1 and 3 imho"
-
systwi_
But if the URL to which you had linked were to be saved with ArchiveBot, it would try its best to save any web pages it can find.
-
systwi_
All three sound good, but I think thuban has a good point, so I second it.
-
systwi_
1 & 3.
-
nicolas17
transfer.archivete.am/inline/8x4IQ/6939444.txt this is what I planned to give to archivebot, not the web player :)
-
systwi_
Looks good to me. Thank you for the list. I'll save it with ArchiveBot for you.
-
nicolas17
note that with multiple video qualities it adds up to 10GB
-
systwi_
~10GB shouldn't be too problematic.
-
nicolas17
someone uploaded most or all of the old seasons (1996-2007) to YouTube, probably from personal VHS
-
systwi_
Going the extra mile is nice. :-)
-
nicolas17
in fact, I searched for it on youtube to show someone, and that's where I discovered they were about to reboot it this year
-
nicolas17
systwi_: also, the web player loads 6939444_drm.mpd and gets a FairPlay or Widevine license to decrypt it
-
nicolas17
I asked a friend if he knew how to break widevine nowadays, and then I realized I could just ... remove the "_drm" part of the URL >.>
-
systwi_
Haha, they store a decrypted version too? Lovely. :-P
-
nicolas17
I *hope* their paid content for RTVE Play+ subscribers is protected better than that
-
fireonlive
i hope it isn't
-
fireonlive
:D
-
fireonlive
🏴‍☠️
-
fireonlive
if only….
-
fireonlive
someone should mention AT there too :)
-
OrIdow6
Hm
-
OrIdow6
I'm kinda surprised that AI types don't toss around AT data as much as they seem to, like, pushshift
-
OrIdow6
If that happens could put us at risk of being more aggressively blocked
-
fireonlive
maybe warcs are too difficult for them lol
-
thuban
JAA, did you ever hear back from uktrainsim?
-
Artem4ikBaik
close
-
pos12
Canadian file host filegenie.com will shut down for undisclosed reasons on August 31; most of the links in its sitemap, including FAQ, are dead.
-
pos12
Filegenie's file URL format is
wl.filegenie.com/~<username>/<filename> . Websites that contain still-active wl.filegenie.com links should be archived too.
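
A minimal sketch of how pages could be scanned for links in that format, assuming the `wl.filegenie.com/~<username>/<filename>` pattern quoted above. The scheme is treated as optional because it is not stated here, and the function name and sample text are hypothetical:

```python
import re

# Matches wl.filegenie.com/~<username>/<filename>, with an optional scheme.
# Trailing punctuation glued to a link would be captured too; trim as needed.
FILEGENIE_LINK = re.compile(
    r"(?:https?://)?wl\.filegenie\.com/~[^/\s\"'<>]+/[^\s\"'<>]+"
)


def find_filegenie_links(text):
    """Return unique matching links in order of first appearance."""
    seen = []
    for match in FILEGENIE_LINK.finditer(text):
        url = match.group(0)
        if url not in seen:
            seen.append(url)
    return seen


sample = (
    '<a href="http://wl.filegenie.com/~alice/report.pdf">x</a> '
    "see wl.filegenie.com/~bob/a.zip and "
    "http://wl.filegenie.com/~alice/report.pdf"
)
links = find_filegenie_links(sample)
print(links)
```

The resulting list could then be deduplicated across pages and handed over as a URL list.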
-
imer
that sounds difficult to do a comprehensive grab on :|
-
pabs
needs some search engine queries I guess
-
pabs
oh, no directory listings :(
-
pabs
flashfire42 seems to be on it already
-
flashfire42
That's everything from bing anyway
-
pabs
does bing have a results limit like google/ddg do?
-
flashfire42
Yes
-
flashfire42
Alright, I am looping on myself, I am going back to bed
-
pabs
ah, did you try the adding keywords trick from
wiki.archiveteam.org/index.php/Site_exploration ?
-
flashfire42
No, because my usual checked urls trick doesn't work on those pdfs because it tries to download them straight away instead of opening them in a web browser
-
pabs
Google/DDG don't find many URLs
-
h2ibot
TheTechRobo edited The WARC Ecosystem (+713, Add section for people who just want to view…):
wiki.archiveteam.org/?diff=50444&oldid=50100
-
h2ibot
Farrukhali6177 edited CNET Forums (+33, /* Shutdown notice */):
wiki.archiveteam.org/?diff=50445&oldid=48231
-
h2ibot
TheTechRobo edited Discord (+178, /* Self-archival */ Add source code licences):
wiki.archiveteam.org/?diff=50449&oldid=50447
-
arkiver
i love all the changes to the wiki lately :)
-
jacksonchen666
hi, i intend to shut down my warrior for a system upgrade. however, it seems like it's stuck doing nothing useful (server returned bad response & nearly 16 elapsed job). could i force stop the warrior right now?
-
jacksonchen666
*nearly 16 hours elapsed job
-
kiryu
jacksonchen666: It's fine
-
kiryu
I think you have already got banned and the failed items in the warrior project should return to the backfeed
-
jacksonchen666
doesn
-
jacksonchen666
doesn
-
jacksonchen666
oops again
-
jacksonchen666
seems like my warrior is still trying for some reason, switched it to another project manually
-
h2ibot
TheTechRobo edited Twitch.tv (+642, #burnthetwitch: Add directory structure and caveat):
wiki.archiveteam.org/?diff=50450&oldid=50418
-
JAA
thuban: I didn't even remember sending that email, but no, I didn't.
-
thuban
ouch
-
pokechu22
Any ideas for what to do with a site like
ericbrasseur.org/? It does a JS challenge of some sort that sets a cookie, and then redirects to a different page. But the challenge seems to fail randomly sometimes too. It seems like useful content at least
-
pabs
pokechu22: IIRC JAA had a way to archive stuff that needs a cookie
-
pabs
JAA: did you end up getting the opensource.com cookie-requiring stuff btw?
-
flashfire42
any requests for archivebot focus today or just me going on with my ISP hosting stuff?
-
pokechu22
I'm doing some greek university stuff (for a school that I think was merged into a different one in 2019) but it's not super high priority
-
flashfire42
I can add site:teithessaly.gr to my tabs
-
pokechu22
Don't worry about it - the stuff I did was the only relevant cached stuff (the other domains are live)
-
flashfire42
Ah ok
-
flashfire42
all good
-
pokechu22
there's also some jank with teithessaly.gr and teilar.gr being the same site (I've already handled teilar.gr for the most part, currently checking subdomains)
-
nicolas17
JAA: it seems s3://origin.ka.cdn/ is entirely inaccessible now?
-
nicolas17
its CDNs too