-
JAADallas: pywb
-
thubancan we handle flickr? this fellow died recently flickr.com/photos/andreaskay
-
JAAthuban: Yes, I'll throw it into AB.
-
thubancool, thanks!
-
flashfire42So the Dutch government resigned; we should probably do some grabs about that
-
Dallasty JAA :)
-
justcool393JAA: gotcha
-
Dallasidk if anyone has pointed this out yet, but it looks like Parler is starting to come back up, or at least they have some sort of host at parler.com
-
purplebotWebzdarma edited by Sanqui (+192, /* ArchiveBot jobs */) just now -- archiveteam.org/?diff=46197&oldid=46178
-
DallasThey're still on AWS somehow... subdomainfinder.c99.nl/scans/2021-01-11/https://parler.com subdomainfinder.c99.nl/geoip/13.224.227.89
-
purplebotWebzdarma edited by Sanqui (+7, /* ArchiveBot jobs */) 20 minutes ago -- archiveteam.org/?diff=46198&oldid=46197
-
purplebotSite exploration edited by Sanqui (+447, Add subdomain enumeration) just now -- archiveteam.org/?diff=46200&oldid=30117
-
purplebotWebzdarma edited by Sanqui (+787, subdomainfinder jobs) 23 minutes ago -- archiveteam.org/?diff=46199&oldid=46198
-
purplebotDeathwatch edited by JustAnotherArchivist (+167, /* 2021 */ BugTraq not shutting …) just now -- archiveteam.org/?diff=46201&oldid=46180
-
purplebotAlive... OR ARE THEY edited by Maner76 (+249, /* Watchlist */) just now -- archiveteam.org/?diff=46202&oldid=45963
-
purplebotBBC News Social Media edited by Flashfire42 (+12, /* Twitter Accounts */) just now -- archiveteam.org/?diff=46203&oldid=45101
-
purplebotX3.hu edited by Flashfire42 (+45) just now -- archiveteam.org/?diff=46204&oldid=45614
-
purplebotList of major MediaWiki wikis with the LinkSearch extension edited by Sanqui (+74, add cs, sk wikipedias) just now -- archiveteam.org/?diff=46205&oldid=28672
-
Sanquisubdomainfinder.c99.nl is nifty btw
-
Sanquiand the fact that 50% of these domains 404 already is depressing etc.sanqui.net/archivebot_wzcz_404.png
-
JAASanqui: Have you looked at Project Sonar's lists?
-
SanquiI haven't
-
Sanquihonestly this page needs more love archiveteam.org/index.php?title=Site_exploration
-
Sanquithe very first link is dead...
-
Sanquimwlinkscrape actually still works tho
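Given the "MediaWiki wikis with the LinkSearch extension" list mentioned above, this kind of scrape presumably relies on MediaWiki's external-link search. As a rough sketch (not mwlinkscrape's own code), the same data can be requested per wiki through the core API module list=exturlusage. The domain pattern "*.example.cz" and the helper name exturlusage_url are hypothetical placeholders, and following the API's "continue" field for further pages is left out.

```python
from urllib.parse import urlencode


def exturlusage_url(api_endpoint: str, domain_pattern: str, limit: int = 500) -> str:
    """Build a MediaWiki API URL that lists external links matching domain_pattern.

    Hypothetical helper: list=exturlusage reads the same external-links data
    that Special:LinkSearch searches. Pagination via "continue" is not handled.
    """
    params = {
        "action": "query",
        "list": "exturlusage",
        "euquery": domain_pattern,  # leading "*." matches every subdomain
        "euprop": "url",            # only return the matching external URL
        "eulimit": limit,           # 500 is the usual cap for non-bot clients
        "format": "json",
    }
    return f"{api_endpoint}?{urlencode(params)}"


if __name__ == "__main__":
    print(exturlusage_url("https://cs.wikipedia.org/w/api.php", "*.example.cz"))
```

Running it once per wiki in that list and merging the returned URLs would give a per-domain link set similar in spirit to what the tool collects.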
-
purplebotWebzdarma edited by Sanqui (+1185, add mwlinkscrape set) just now -- archiveteam.org/?diff=46206&oldid=46199
-
kiskaWhat is being discovered? I have project sonar stuffs
-
Sanquikiska: these subdomains archiveteam.org/index.php?title=Webzdarma
-
SanquiI have a local master list that I'm working with and adding more to from various sources, as you can see
-
SanquiI built up a big backlog of jobs already though so... no stress I guess
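Keeping a local master list fed from several sources (subdomainfinder, mwlinkscrape output, and so on) mostly comes down to normalizing and deduplicating hostnames. The sketch below is a hypothetical helper, not Sanqui's actual tooling. It accepts full URLs or bare hostnames, lowercases them, strips ports and trailing dots, and keeps only hosts under the given parent domain ("example.cz" is a placeholder).

```python
from typing import Iterable
from urllib.parse import urlsplit


def normalize_host(entry: str) -> str:
    """Reduce a URL or bare hostname to a lowercase hostname without port or trailing dot."""
    entry = entry.strip()
    if "://" not in entry:
        entry = "http://" + entry  # let urlsplit parse bare hostnames too
    host = urlsplit(entry).hostname or ""  # .hostname is already lowercased
    return host.rstrip(".")


def merge_subdomains(parent: str, *sources: Iterable[str]) -> list[str]:
    """Merge several source lists into one sorted, deduplicated list of hosts under parent."""
    parent = parent.lower().rstrip(".")
    merged = set()
    for source in sources:
        for entry in source:
            host = normalize_host(entry)
            if host == parent or host.endswith("." + parent):
                merged.add(host)
    return sorted(merged)


if __name__ == "__main__":
    print(merge_subdomains(
        "example.cz",
        ["http://A.example.cz/page", "b.example.cz"],
        ["a.example.cz.", "other.org", "https://c.example.cz:8080/"],
    ))
```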
-
JAAFor visibility: BugTraq is not shutting down after all. Looks like they'll revive the mailing list as well, not just keep the archives up.
-
JAAI'll make sure everything's archived anyway.
-
JAASpecifically, every message under securityfocus.com/archive/1 and every advisory under securityfocus.com/bid
-
AKFinally worked out why docker exec -it <name> touch STOP never seemed to do anything: "restart: unless-stopped". I'm an idiot
-
EggplantNah yes
-
EggplantNtouch STOP causes run-pipeline3 to exit
-
EggplantNwhich then will restart the CT
-
AKYeah, that explains why, when I look back, they just seem to still be running
-
AKAhh well, time to change the compose files
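The behaviour AK hit follows from Docker's restart policies. With "always" or "unless-stopped", a container whose main process exits on its own is restarted even if it exited cleanly. Only an explicit docker stop prevents the restart. With "on-failure", only a non-zero exit status triggers a restart. The model below is a simplified sketch of those rules. It ignores daemon restarts and on-failure retry limits, and it assumes, without having checked, that run-pipeline3 exits with status 0 after honouring STOP.

```python
def should_restart(policy: str, exit_code: int, stopped_manually: bool = False) -> bool:
    """Simplified model of whether Docker restarts a container after its main process exits.

    Ignores Docker daemon restarts and the optional on-failure:N retry limit.
    """
    if stopped_manually:
        # an explicit `docker stop` suppresses the automatic restart for every policy
        return False
    if policy == "no":
        return False
    if policy in ("always", "unless-stopped"):
        # the process exiting by itself, even with status 0, triggers a restart
        return True
    if policy.startswith("on-failure"):
        # only a non-zero exit status counts as a failure
        return exit_code != 0
    raise ValueError(f"unknown restart policy: {policy!r}")


if __name__ == "__main__":
    # touch STOP -> graceful exit (assumed status 0)
    for policy in ("unless-stopped", "on-failure"):
        print(policy, should_restart(policy, exit_code=0))
```

Under that assumption, switching the compose files to "restart: on-failure" would let a STOP-triggered exit stay stopped while still recovering from crashes.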
-
avoozlSanqui: I've got around 8M posts loaded (takes half an hour from scratch) and search queries currently take around 1 second; for personal hosting that seems acceptable
-
Sanquidefinitely -- that'd be fine for public consumption too if you just limit it to 1 search per IP per minute or something
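Sanqui's "1 search per IP per minute" limit can be as small as a dictionary of last-allowed timestamps. The class below is a hypothetical minimal sketch, not part of avoozl's setup. It rejects a request if the same IP was allowed less than interval seconds ago, and rejected attempts do not reset the timer. The clock is injectable so the behaviour can be tested without sleeping. In a long-running service the dictionary would also need periodic pruning of old entries.

```python
import time
from typing import Callable, Dict


class PerIPRateLimiter:
    """Allow at most one request per client IP within each `interval` seconds."""

    def __init__(self, interval: float = 60.0, clock: Callable[[], float] = time.monotonic):
        self.interval = interval
        self.clock = clock
        self.last_allowed: Dict[str, float] = {}  # IP -> time of last allowed request

    def allow(self, ip: str) -> bool:
        now = self.clock()
        last = self.last_allowed.get(ip)
        if last is not None and now - last < self.interval:
            return False  # too soon: reject without resetting the window
        self.last_allowed[ip] = now
        return True


if __name__ == "__main__":
    limiter = PerIPRateLimiter()
    print(limiter.allow("192.0.2.1"), limiter.allow("192.0.2.1"))
```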
-
avoozlI think for public consumption another type of data store would be better suited, but that's for another day
-
avoozlfor example, searching for "magic penetration" "armor penetration" currently gives me posts (and their fragments) like this:
-
avoozlYes but I would expect Ryze to have much more <mark>magic</mark> <mark>penetration</mark> than <mark>armor</mark> <mark>penetration</mark> so the damage dealt will be lower.
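The fragment above marks each query word separately rather than each phrase as a unit. That is typical of term-level highlighting as done by search engines. The function below is a hypothetical sketch of that technique, not avoozl's implementation, and the underlying data store is not known from the chat. It HTML-escapes the post text and wraps every whole-word, case-insensitive match of any query word in <mark>. It does not check that the words of a phrase appear next to each other.

```python
import html
import re
from typing import Iterable


def highlight(text: str, phrases: Iterable[str]) -> str:
    """HTML-escape `text` and wrap each whole-word occurrence of any query word in <mark>."""
    terms = {word.lower() for phrase in phrases for word in phrase.split()}
    if not terms:
        return html.escape(text)
    # longest terms first, so overlapping alternatives prefer the longer word
    alternatives = "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)

    pieces = []
    pos = 0
    for match in pattern.finditer(text):
        pieces.append(html.escape(text[pos:match.start()]))  # escape the text between matches
        pieces.append("<mark>" + html.escape(match.group(0)) + "</mark>")
        pos = match.end()
    pieces.append(html.escape(text[pos:]))
    return "".join(pieces)


if __name__ == "__main__":
    post = ("Yes but I would expect Ryze to have much more magic penetration "
            "than armor penetration so the damage dealt will be lower.")
    print(highlight(post, ["magic penetration", "armor penetration"]))
```

On the post quoted above, this produces exactly the marked-up line shown in the chat.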
-
avoozlBut there's some work to be done on the parser side now first
-
avoozlI'm just adding sanitized HTML output for my post overview, because otherwise I'm going to run into issues down the road with disallowed tags and other madness
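One common way to produce sanitized HTML output is an allowlist: keep a small set of harmless tags, drop every attribute, discard script and style contents, and escape all text. The sketch below does this with the standard library's html.parser. The tag list is a hypothetical example, not what avoozl's parser uses. It also does not repair unbalanced or misnested tags, which a maintained sanitizer library would handle.

```python
import html
from html.parser import HTMLParser

# Hypothetical allowlist; a real one would match what the forum posts actually use.
ALLOWED_TAGS = {"b", "i", "em", "strong", "p", "br", "ul", "ol", "li", "blockquote", "code", "pre"}
VOID_TAGS = {"br"}                 # allowed tags that never get a closing tag
DROP_CONTENT = {"script", "style"}  # tags whose contents are discarded entirely


class AllowlistSanitizer(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # entities are decoded, then re-escaped below
        self.out: list[str] = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
        elif tag in ALLOWED_TAGS and not self.skip_depth:
            self.out.append(f"<{tag}>")  # all attributes are dropped

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ALLOWED_TAGS and tag not in VOID_TAGS and not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(html.escape(data))


def sanitize(fragment: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(fragment)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)


if __name__ == "__main__":
    print(sanitize('<p onclick="x()">Hi <b>there</b><script>alert(1)</script></p>'))
```

Self-closing tags such as <br/> go through HTMLParser's default handle_startendtag, which calls the start and end handlers in turn, so they come out as a single <br>.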
-
avoozlthe lol forums download itself is still running, but I'll do a sizeable load overnight and see how well the index keeps up. This was just a single 5GB WARC
-
avoozlthe estimate is around an hour to load all of forums.eune.leagueoflegends.com