00:00:08 → #telegrab I guess
01:09:50 thuban: anything you feed on #telegrab will be queued
01:10:11 the majority of the telegram project came in over #//
01:10:23 arkiver: i see, thanks! are !p items also being queued?
01:18:18 thuban: new posts that were not discovered yet at the time I stashed away the item lists yes
01:19:15 also please queue anything youtube and prigozjin related to #down-the-tube - it will become playable from the Wayback Machine from there
01:37:22 i have to get going in a bit, so if someone else wants to look for youtube coverage i for one would be grateful
02:00:44 JAABot edited CurrentWarriorProject (-4): https://wiki.archiveteam.org/?diff=50576&oldid=50556
02:17:11 Cont. from #archivebot, re: Fulton County Jail site crawl - http://justice.fultoncountyga.gov/PAJailManager/JailingDetail.aspx?JailingID=1472670
02:17:44 Gonna post this link here, maybe someone can find a way to successfully capture an entry like the one above ^ JAA mentioned the site is... not very easily navigated
02:18:24 WBM and Archive.is both capture the url, but each capture redirects to the site's main page.
02:21:43 When I load that once it goes to the main page, and a second time gives me a redirect to http://justice.fultoncountyga.gov/pajailmanager/ErrorOccured.aspx?aspxerrorpath=/PAJailManager/JailingDetail.aspx saying "Only documents that have been redacted are available via public access. If an expected document does not appear, ensure that it has been redacted."
02:22:17 My guess is that it needs a cookie to work - save page now outlinks from another page might do it (but you'd need to find a page that doesn't require the same cookie)
02:25:22 Hmm
02:26:16 Well, that's why I asked JAA if simply crawling the whole site altogether would somehow inadvertently capture those specific ID entry pages
02:26:33 But that would be through AB, not through SPN
02:27:27 http://justice.fultoncountyga.gov/PAJailManager/ starts off with a JS-populated
and a link to a search form. Not sure how it could discover anything from there.
02:28:17 Er, no on the homepage, just script hell.
02:28:36 ew :/
02:30:40 Somehow we should find a way to archive it. But a screenshot or right click, save as the html to a zip then to IA is not exactly ideal...
02:31:55 Trump is expected to be booked tomorrow and his entry will likely be online at some point tomorrow. FYI in case someone finds a way to capture the pages sans the 'script hell'
02:34:03 Oh and pokechu22: You were right about a cookie - I just clicked the link I sent a few minutes ago and it returned that same error message. Oops.
02:34:44 An !ao < list thing would work for setting cookies actually
02:34:59 but, what is http://justice.fultoncountyga.gov/PAJailManager/JailingDetail.aspx?JailingID=1472670 supposed to show?
02:37:50 pokechu22: Giuliani's booking jail ID info w/associated charges and fines
02:38:39 Since it's session specific you'll have to perform the search on your own. But, if you click jail records and then enter 'Giuliani' and 'Rudy' for Last and First names respectively, it should bring you to the booking page for him
02:40:17 Pretty historical. Tomorrow, even more so for Trump. But if !ao < list would work, then maybe we can try it. Idk how that would circumvent a redirect though
02:42:25 can someone please show governments how to configure https please
02:42:27 :|
02:44:06 ugh, I don't think !ao < list will work here - it needs the POST on http://justice.fultoncountyga.gov/pajailmanager/JailingSearch.aspx?ID=400 too
02:46:22 pokechu22: Yup, I figured that since SPN is more/less the same thing and, again, just redirects to the main page. Archive.is has the same behavior. Idk. It should be saved, but beyond my expertise
02:47:31 A thought: Maybe once Trump's booking goes live on the site we can throw the whole site in AB for the hell of it and see what happens?
02:48:02 To clarify: one redirect is caused by the lack of the cookie, but the second is by the lack of the POST. AB can work around the cookie one but not the POST one since AB doesn't do POSTs.
02:49:09 =/
02:49:33 Like I said, beyond my know-how. I at least wanted to bring it to the attention of everyone else here.
02:51:56 Alternatively, I've seen screenshots of the charges against Giuliani trending on Twitter, or X, or whatever-the-fuck. Always possible someone screenshots the .gov site entry, posts it to some other site, and we could capture that page into WBM easily.
02:52:21 Long way to get there, but it would be "captured", heh
02:53:02 sooooo many people are like ha-ha, i've screenshotted your deleted bad tweets sucker.. and i'm just sitting there with a tear in my eye like what about the wayback machine et al.
02:53:08 screenshots aren't proof :'(
02:53:34 oh sorry this isn't -ot
02:55:12 All good, fireonlive. Yeah, obviously, screenshots are not archival. But in this case, I don't see a way around? Like I said, I just wanted to talk about it here so others knew. Maybe someone will do something clever, heh
02:55:57 ye i think best case in this case
02:56:11 it's probably some contracted-out-lowest-bidder crap
02:56:29 I mentioned saving the hmtl locally and zipping it, then offloading to its own IA item. That would be another way, I guess
02:56:33 html*
04:17:07 HP_Archivist: I haven't tried it, but according to https://docs.google.com/document/d/1Nsv52MvSjbLb2PCpHlat0gkzw0EvtSgpKHu4mk0MnrA/edit , if you use the SPN2 API you can manually specify the cookies it should use when saving a page. Maybe that would work?
04:28:24 warcprox + manually browsing the site in chrome?
05:46:15 anyone awake to change the default warrior project? it's currently set to telegram, and we already went through all the queue there
05:47:32 Do I need to dig up more telegram links in the mean time?
05:48:34 feed the beat, flashfire42 :D
05:48:37 if you're up for it
05:48:51 beast
05:49:18 flashfire42: maybe, but being set to default project, you're unlikely to have enough to keep everyone busy anyway :P
05:57:50 true
06:01:23 JAA: ^ can you change default project?
07:33:52 Exorcism edited Skillfeed (+49): https://wiki.archiveteam.org/?diff=50577&oldid=50564
07:34:52 Exorcism edited NewsGrabber (+1): https://wiki.archiveteam.org/?diff=50578&oldid=50566
07:52:26 maybe it would be better to mark as "Special case"?
07:52:29 cc Exorcism
08:26:56 DigitalDragons: but the website is ded sooo 😭
09:00:39 (most of?) of the news outlets are still up though
09:00:41 hmm
09:46:10 JAA: arkiver: https://github.com/h2non/bbscraper and https://github.com/Dascienz/phpBB-forum-scraper for phpbb scraping
09:47:53 another such script: https://metacpan.org/pod/WWW::phpBB
09:52:11 after the WARC is ready, we could feed our WARC to such a script
13:21:20 nicolas17: Switched to xuite.
13:22:09 almost 5 days and wowturkey archival still going full blast
13:22:15 erkinalp: Sure, such things can be done, but that's not usually something we do. We just archive things, and then people can use it as they like.
13:42:02 JustAnotherArchivist edited NewsGrabber (+5): https://wiki.archiveteam.org/?diff=50579&oldid=50578
13:42:03 Reece2oo9 edited Miiverse (+20): https://wiki.archiveteam.org/?diff=50580&oldid=47734
13:42:04 Znak edited UC Berkeley Course Captures (-1, /* Download the videos */ Fix typo "yyt-dlp".): https://wiki.archiveteam.org/?diff=50581&oldid=50570
14:00:06 JAABot edited CurrentWarriorProject (+4): https://wiki.archiveteam.org/?diff=50582&oldid=50576
14:11:52 imer: Whoops, sorry '=D thanks for the reply though
14:12:09 no worries
17:08:35 https://sports.betonline.ag/sportsbook/futures-and-props/us-election/trump-indictment
17:08:41 bets on Trump's indictment
17:09:46 not sure if gambling is an archive target but there's one of them
17:13:36 qwertyasdfuiopghjkl: RE: No-cookies when using SPN. Wanna give that a try?
17:14:51 I'll give it a try
17:22:41 pokechu22: Okay cool, thanks
17:23:21 Doesn't look like it worked :|
17:23:49 but there aren't many details on how capture_cookie is supposed to work - I might have done it wrong
17:25:14 (I URL-encoded the whole cookie... which probably wasn't right, especially since there are multiple cookies...)
17:41:51 no, what I'm doing is correct; these two commands correctly captured spriters-resource.com and models-resource.com's front pages with changed settings (enabling NSFW posts for the first and enabling NSFW posts and using text mode for the second - both cookies were recognized for the second); I've censored my own IA login cookies in these:
17:41:54 curl -X POST -H "Accept: application/json" -d'url=https://www.spriters-resource.com/&force_get=1&capture_all=1&capture_screenshot=1&capture_cookie=nsfw%3Dshow' -H 'Cookie: logged-in-sig=; logged-in-user=' https://web.archive.org/save
17:41:57 curl -X POST -H "Accept: application/json" -d'url=https://www.models-resource.com/&force_get=1&capture_all=1&capture_screenshot=1&capture_cookie=viewmode%3Dtext%3B%20nsfw%3Dshow' -H 'Cookie: logged-in-sig=; logged-in-user=' https://web.archive.org/save
17:44:46 It might be timing sensitive too - I'll give it another try later
17:58:05 pokechu22: Hm, alright. Would a local instance of wget work on something like this or more/less the same thing?
17:58:45 A local instance of wget probably would work if you can specify the relevant cookies
17:59:10 Firefox's browser tools have a "copy as curl" function that could be a starting point (though you'd need to change it to wget parameters)
18:00:13 If that would work, it's more/less just saving the html locally (essentially). Still not archived into WBM...
18:01:32 Theoretically you can write a WARC with the right version of wget (or wpull), but it still wouldn't end up in WBM
18:02:32 Yeah, true. But not much different from right click, save as > zip > IA item
18:18:28 https://web.archive.org/web/20230824181621/http://justice.fultoncountyga.gov/PAJailManager/JailingDetail.aspx?JailingID=1472670
18:19:39 qwertyasdfuiopghjkl: Works on my end, nice! How'd you get it?
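The curl commands above pass capture_cookie inside a form-encoded POST body (curl -d), so the raw Cookie header has to be percent-encoded before it is pasted in. In a form-encoded body a bare "+" decodes to a space, which would silently corrupt session tokens that contain "+". The Python sketch below shows the encoding. It assumes that SPN2 form-decodes the body once and then uses the result as a literal Cookie header; that is what the successful captures in this log suggest, not something quoted from the SPN2 docs. The example.org URL and SESSION token are hypothetical, and the sketch sends no request.

```python
from urllib.parse import parse_qs, quote, urlencode


def encode_cookie_value(cookie_header: str) -> str:
    """Percent-encode a raw Cookie header for pasting after capture_cookie=.

    safe="" makes quote() escape '/', '=', ';', '+' and spaces as well,
    so a literal '+' in a token is sent as %2B instead of being read as a space.
    """
    return quote(cookie_header, safe="")


def build_spn2_body(url: str, cookie_header: str) -> str:
    """Build an application/x-www-form-urlencoded body for POST /save.

    urlencode() escapes every value: '+' becomes %2B and spaces become '+'.
    """
    return urlencode({"url": url, "capture_cookie": cookie_header})


if __name__ == "__main__":
    # Reproduces the hand-encoded value from the models-resource.com command.
    print(encode_cookie_value("viewmode=text; nsfw=show"))
    # -> viewmode%3Dtext%3B%20nsfw%3Dshow

    # A token containing '+' and '/' survives one form-decoding round trip...
    body = build_spn2_body("https://example.org/page", "SESSION=ab+cd/ef==")
    print(parse_qs(body)["capture_cookie"])
    # -> ['SESSION=ab+cd/ef==']

    # ...whereas an unescaped '+' is decoded as a space.
    print(parse_qs("capture_cookie=ab+cd")["capture_cookie"])
    # -> ['ab cd']
```

Encoding each field separately like this is also what curl's --data-urlencode option does, which avoids find-and-replace by hand.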
18:24:02 It might have been a second attempt of mine, though I wasn't sure if that went through or not (the jobid seemed to be the same)
18:25:00 from the timestamp I'd guess it's mine, unless you did it at around the same time
18:25:26 I did one at about the same time, but I told it to save a screenshot and there doesn't seem to be mine so it's probably yours
18:25:35 Anything special you did?
18:33:32 I went to the page via the method mentioned above, opened the F12 menu, reloaded the page, opened the raw request headers of the request, copied the stuff after "Cookie: " from there into a text editor, find-and-replaced "=" to "%3D" and "+" to "%2B", and used that in the command after &capture_cookie=
18:33:42 Example of the command I used: curl -X POST -H "Accept: application/json" -H "Authorization: LOW :" -d 'url=http://justice.fultoncountyga.gov/PAJailManager/JailingDetail.aspx?JailingID=1472670&capture_cookie=' https://web.archive.org/save
18:35:46 Hmm, I used Notepad++'s MIME tools plugin to do URL encoding, which also changed / to %2F and space to %20 and ; to %3B - I guess it must not have undone some of those
18:37:05 I didn't need to url-encode ";", maybe that was the issue with your try
18:38:36 Hmm, actually, URL-encoding semicolon and space worked for models-resource.com, so maybe it's the slash that did it or something?
18:39:06 wait, no, it *didn't* URL-encode pluses - that's probably the cause. And maybe they got treated as spaces instead of pluses in that case?
18:40:08 I did a bit of trial and error with saving pages of https://ip.wtf to figure out how the &capture_cookie= worked before trying the actual page. "+" was replaced with " ".
18:46:43 I also tried screenshot, too, probably around the same time, heh. Not sure how it worked though?
18:49:18 HP_Archivist: you missed some messages, https://hackint.logs.kiska.pw/archiveteam-bs/20230824#c373550
18:49:43 Ah, thanks. Will look now
18:51:32 Hmm - I don't know how to do any of that.
But I guess this is a good guide for future issues like this. Trump's page should be up by end of today. Wanna make sure we capture that, especially.
18:52:15 Thank you, qwertyasdfuiopghjkl
19:00:07 HP_Archivist: If you ping me when that one goes up i'll try to get it (if I'm not asleep). Are there any others that are already up that should be saved or was it just the one I did?
19:05:32 qwertyasdfuiopghjkl: The other parties involved in Trump's circle who were also booked yesterday, too. Their names are escaping me atm. And okay, thank you
19:16:37 qwertyasdfuiopghjkl: Here's a list: https://abcnews.go.com/Politics/18-defendants-charged-alongside-donald-trump-georgia/story?id=102285022
19:17:40 Thanks
19:18:03 No problem. Should be able to find each one by just entering First and Last name
19:18:14 Some might not have been booked yet
19:31:20 Mark Meadows: http://justice.fultoncountyga.gov/PAJailManager/JailingDetail.aspx?JailingID=1472730
19:39:31 Mark Meadows: https://web.archive.org/web/20230824192914/http://justice.fultoncountyga.gov/PAJailManager/JailingDetail.aspx?JailingID=1472730
19:39:57 John Eastman: https://web.archive.org/web/20230824193802/http://justice.fultoncountyga.gov/PAJailManager/JailingDetail.aspx?JailingID=1472570
19:42:24 Both load on my end, nice
19:43:07 Gonna be AFK for a while. Will be on later.
19:56:11 Kenneth Chesebro: https://web.archive.org/web/20230824195116/http://justice.fultoncountyga.gov/PAJailManager/JailingDetail.aspx?JailingID=1472637
20:00:10 thanks qwertyasdfuiopghjkl :)
20:18:16 David Shafer: https://web.archive.org/web/20230824201246/http://justice.fultoncountyga.gov/PAJailManager/JailingDetail.aspx?JailingID=1472632
20:44:53 i'm glad that it's usable in this situation, but i find it kind of odd that spn will let you get captures with arbitrary cookies into the wbm
20:47:17 (and that this is apparently explicitly intended to support logged-in views, judging by target_username and target_password!
i would sure like to know how those are implemented)
20:51:16 i wonder if that's http auth
20:51:40 er basic auth
20:51:49 i.e. https://httpbingo.org/basic-auth/user/password
20:52:03 But you could just put that in the URL instead.
20:52:11 oh right.
20:52:34 curious indeed
21:01:57 right, and the docs specifically refer to "the target page's login forms", which sounds to me like they're talking about page content
21:17:59 ooh that's very interesting
21:18:27 thanks for bringing that up thuban, would be neat to know indeed
22:04:20 Is there any way to increase the concurrent items to more than 6? 6 is just too low for me (archive warrior)
22:05:28 Darken: 6 is the maximum for the warrior
22:05:45 I am aware, but is there a way to go past this amount
22:05:52 and if not why?
22:05:55 if you want to do more, you can run additional warriors, or run project containers (which go up to 20) instead: https://wiki.archiveteam.org/index.php/Running_Archive_Team_Projects_with_Docker
22:06:08 thanks
22:07:16 what is the image address for the xuite project?
22:08:51 atdr.meo.ws/archiveteam/xuite-grab (you can get the link from the README in the source repo, linked in the infobox on the project wiki page)
22:10:55 n.b.: with xuite it's ok to go as high as you want, but when starting a new project check to see whether there's a recommended concurrency--one of the reasons the warrior is limited to 6 is that some sites implement ip bans
22:54:46 Switchnode edited ArchiveTeam Warrior (-28, /* How can I run tons of Warriors easily? */…): https://wiki.archiveteam.org/?diff=50583&oldid=50455
23:51:13 Hey, I dunno if anyone has done this, but Mac GUI had a "blog post" where they said they took down all of their downloads: https://web.archive.org/web/20230721053511/https://macgui.com/downloads/
23:52:38 However I randomly checked today and they are back up: https://macgui.com/downloads/?cat_id=53 , has anyone ran a archivebot on these files?
23:56:20 viewer says yes, most recently in mid-july
23:58:15 but it looks like that was during the 'downtime', and the previous one was from like 2015, so another go-around seems good
23:59:00 maybe just of /downloads ?