-
fireonlive
-
fireonlive
-
xkey
moin
-
fireonlive
moin
-
thalia
moin
-
Terbium
moin
-
Ryz
Heya folks, I have a bit of a silly request: can someone help me (I'd provide said text file when they're available to take up the request) sort it by how many forward slashes ('/') each line has? Highest number goes on top, lowest (ideally just 3) at the bottom
-
Ryz
Asking for help because this is a way for me to tackle the list as frictionlessly as possible from my end~
-
Ryz
Text files are the custom brute-forced Blogspot URLs from January 2024 that I'm only now getting around to checking for extra goodies
-
Ryz
*Text file
-
thuban
Ryz: sure, i'll do it
-
Ryz
thuban+++
-
eggdrop
[karma] 'thuban+' now has 1 karma!
-
Ryz
One for a successful text file submission just moments ago
-
Ryz
thuban+++
-
eggdrop
[karma] 'thuban+' now has 2 karma!
-
Ryz
The other is the other text file weeks ago~
-
Ryz
Did I do it right...?
-
pabs
++ instead of +++ :)
-
Ryz
Aww crap
-
thuban
it's the thought that counts
-
Ryz
Hold onnn <#>;
-
Ryz
thuban---
-
eggdrop
[karma] 'thuban-' now has -1 karma!
-
Ryz
Ugh, I'm messing things uppppp
-
Ryz
thuban++
-
Ryz
xDDDDDD
-
Ryz
Well, never seen this message before
-
thuban
(for logs: `awk -F/ '{ print NF-1 "\t" $0 }' | sort -rn | grep -v ^0 | cut -f2`)
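For anyone who'd rather not do it in the shell, here is a rough Python equivalent of that pipeline: count the '/' in each line, drop lines with none, and list the rest with the highest count first. It's a sketch, not what thuban actually ran; the function names are made up here. One small difference: ties keep their input order (Python's sort is stable), whereas `sort -rn` orders ties by its own fallback comparison.

```python
import sys
from typing import Iterable, List


def sort_by_slashes(lines: Iterable[str]) -> List[str]:
    """Return lines sorted by number of '/' characters, highest first.

    Roughly mirrors: awk -F/ '{ print NF-1 "\t" $0 }' | sort -rn | grep -v ^0 | cut -f2
    Lines containing no '/' (blank lines included) are dropped.
    """
    stripped = (line.rstrip("\n") for line in lines)
    counted = [(s.count("/"), s) for s in stripped]
    kept = [pair for pair in counted if pair[0] > 0]
    # reverse=True still keeps the sort stable, so equal counts stay in input order.
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in kept]


def main() -> None:
    # Read URLs from stdin, write the sorted list to stdout.
    for url in sort_by_slashes(sys.stdin):
        print(url)
```

Calling `main()` with the URL list piped in on stdin gives the same shape of output as the awk one-liner.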
-
pabs
thuban+--
-
eggdrop
[karma] 'thuban+' now has 1 karma!
-
pabs
thuban-++
-
eggdrop
[karma] 'thuban-' now has 0 karma!
-
pabs
thuban+--
-
eggdrop
[karma] 'thuban+' now has 0 karma!
-
pabs
thuban++
-
pabs
aw, it thinks I'm flooding
-
fireonlive
!kfind thuban
-
eggdrop
[karma] 3 matches for 'thuban': thuban: '4' thuban+: '0' thuban-: '0'
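The karma mix-ups above suggest how the bot reads a message: the last two characters are the operator (`++` or `--`) and everything before them is the key, which is why `thuban+++` and `thuban+--` ended up on a separate `thuban+` key. Below is a small Python sketch of that rule. It is inferred from this log only, not taken from the eggdrop karma script, and `parse_karma` plus the single-word restriction are assumptions.

```python
from typing import Optional, Tuple


def parse_karma(message: str) -> Optional[Tuple[str, int]]:
    """Split 'key++' / 'key--' into (key, delta); return None otherwise.

    Assumption: only a single word with a non-empty key counts as a karma message.
    """
    word = message.strip()
    if len(word) < 3 or " " in word:
        return None
    op = word[-2:]
    if op == "++":
        return word[:-2], 1
    if op == "--":
        return word[:-2], -1
    return None
```

Under this rule `thuban+++` is parsed as key `thuban+` with +1, matching the bot's replies in the log.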
-
pabs
is there a delete?
-
fireonlive
i should reap 0 karma after some time
-
fireonlive
thuban++
-
eggdrop
[karma] 'thuban' now has 5 karma!
-
fireonlive
haha
-
fireonlive
had to stare for a sec to see if he got it in the end
-
Vokun
!kfind Vokun
-
eggdrop
[karma] 1 matches for 'Vokun': Vokun: '1'
-
Ryz
thuban++
-
eggdrop
[karma] 'thuban' now has 6 karma!
-
Ryz
thuban++
-
eggdrop
[karma] 'thuban' now has 7 karma!
-
Ryz
Some proper credit, here >#<;
-
Vokun
thuban++
-
eggdrop
[karma] 'thuban' now has 8 karma!
-
hak335
Hi, can someone explain to me how a feature on archive.org works?
-
JAA
hak335: #internetarchive is better suited for that.
-
ghgffba
JAA it has to do with archiveteam
-
JAA
Well then
-
ghgffba
In some URLs, a "why: archiveteam" is present
-
ghgffba
does this mean that the URL is present in the archiveteam collection?
-
JAA
It means that snapshot is coming from somewhere in our collection, yeah.
-
ghgffba
can I locate, from the URL, the collection that it is in?
-
JAA
It should normally tell you the narrower subcollection in the same line. E.g. '(why: archiveteam, archiveteam_urls)'
-
JAA
What are you looking for?
-
ghgffba
-
ghgffba
It only says "why:archiveteam"
-
ghgffba
I've noticed Archive Team archives Twitter, so I want to know which collection they usually store Twitter in
-
ghgffba
JAA is that not something that can be known from the URL?
-
nulldata
ghgffba - The only Twitter-specific collection from AT, I think, is this one: archive.org/details/twitterstream, which I don't think has what you're looking for. I believe most of the Twitter grabbing was done with ArchiveBot, so it would be buried in the ArchiveBot collection: archive.org/details/archivebot
-
nulldata
When Twitter became heavily restricted, there was a time when we were only able to grab via third-party clients like nitter.net, so for anything post-Elon you'd need to look for those URLs instead of twitter.com.
-
nulldata
Also note that we haven't been able to grab Twitter stuff since Twitter killed "guest tokens" a few months back.
-
ghgffba
nulldata so it would be in a GO Pack?
-
ghgffba
I've also noticed that archiveteam_urls items are unavailable for download. Is there a reason for this?
-
nulldata
Usually yes. Though as you noted the one you linked to is strange since it doesn't list the collection. Maybe an item didn't get put into the AB collection? Or maybe collection listing on WB worked differently back then?
-
ghgffba
nulldata honestly, most of the Archive Team Twitter captures I find are like this
-
nulldata
I think the urls items are restricted for legal concerns since it's basically a fire-hose of stuff.
-
ghgffba
wouldn't that apply to ArchiveBot too, though?
-
ghgffba
here's another example of this only displaying archiveteam in the why
-
JAA
AB has manually triggered targeted crawls. URLs grabs random stuff from all over the web. So not really comparable, no.
-
ghgffba
-
ghgffba
-
ghgffba
JAA would a targeted AB crawl of a Twitter account archive every single tweet that account has replied to, or something like that? I ask because some accounts that show "why:archiveteam" seem way too small to have been chosen for crawling
-
JAA
We did quite a bit of Twitter archiving with AB back before Elon ruined it. That included replies. I don't see a run directly for this account, but it might've been part of a list that was run through or something.
-
JAA
No idea why it doesn't show the collection here, but I have seen various issues with that 'why' display before, including where it says it isn't in any collection. Might be an indexing thing.
-
JAA
Maybe we should put an FAQ entry on the wiki about access-restricted collections. Cc arkiver
-
ghgffba
You mentioned you don't see a run; is there a way to see which accounts were archived?
-
JAA
Not really, those jobs weren't even indexed by the viewer due to a bug. I just grepped my channel logs.
-
Notrealname1234
fireonlive: your bot broke and left
-
pabs
<vort3> geminispace.info going to shut down on 1st of June 2024.
-
pabs
looks like it only supports gemini://, not http://, though
-
arkiver
JAA: interesting on marginalia and WARCs
-
Notrealname1234
Do you need to have a reason to archive stuff on #mediaonfire, like on #down-the-tube?
-
fireonlive
Notrealname1234: nope
-
fireonlive
go crazy
-
Notrealname1234
Oh ok
-
Notrealname1234
fireonlive: not even copyright can stop it?
-
JAA
Notrealname1234: Most sensible jurisdictions have an exemption in copyright law for archival/preservation. Also, virtually everything we archive is copyrighted because that's how copyright works. Even most non-trivial messages in here are copyrighted.
-
Notrealname1234
Oh ok
-
nulldata
This message is copyrighted. Please do not violate my ©opyright by reading this message.
-
Notrealname1234
I read it
-
Notrealname1234
Is there a way to request URLs on URLTeam?
-
fireonlive
url shorteners?
-
Notrealname1234
fireonlive: yes
-
fireonlive
#urlteam with as much description as you have of it, what the url space is like (custom urls?), what errors look like, what success looks like
-
fireonlive
flashfire42 can add ones that don't require code changes if he's available
-
fireonlive
code changes -> specialized code
-
Notrealname1234
That isn't a url shortener
-
Notrealname1234
Like this bit.ly/UnlockMyWii
-
Notrealname1234
fireonlive ^
-
JAA
URLTeam can't do custom codes. Also, please use the relevant project channel for your questions.
-
Notrealname1234
Ok
-
ghgffba
Does removing a URL from the Wayback Machine also remove it from the ArchiveBot WARC (for example)? Or is it only removed from Wayback Machine search?
-
pokechu22
The WARC file isn't modified, but they may make the entire WARC file unavailable for download in some cases IIRC
-
thalia
Have any of you heard of the TLIB version control system? It was a major VCS for DOS/Windows in the '90s
-
ghgffba
pokechu22 wouldn't that mean the Wayback Machine is GDPR non-compliant?
-
pokechu22
¯\_(ツ)_/¯
-
DigitalDragons
there is an exemption for archives
-
fireonlive
(keeping in mind that archiveteam != internet archive) this sounds like an #internetarchive discussion at best (also an unofficial channel). if you wish to engage IA about such matters, info⊙ao or the details at the bottom of archive.org/about/terms.php are your best bet
-
nicolas17
-
nicolas17
but its WARCs are not downloadable
-
nicolas17
so knowing that the collection is archive.org/details/archiveteam_twitter won't help you much
-
ghgffba
nicolas17 how did you find that? I've been looking for days for ways to derive the collection of something from the url
-
nicolas17
HTTP headers
-
nicolas17
-
nicolas17
returns
-
nicolas17
x-archive-src: archiveteam_twitter_20210618084430_5c8a071e/twitter_20210618084430_5c8a071e.1623687446.megawarc.warc.zst
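A minimal Python sketch of the header trick: fetch a Wayback Machine capture and read its `x-archive-src` header, whose first path segment is the IA item holding the WARC (`archiveteam_twitter_20210618084430_5c8a071e` above). Stripping the trailing `_<14-digit timestamp>_<8 hex chars>` to guess the collection is only a heuristic that fits this one example; it is an assumption, and the item's own page is the authoritative source. The function names are hypothetical, and the capture URL you pass in is up to you.

```python
import re
import urllib.request
from typing import Optional


def archive_src(snapshot_url: str, timeout: float = 30.0) -> Optional[str]:
    """Return the x-archive-src response header of a Wayback capture, if present."""
    with urllib.request.urlopen(snapshot_url, timeout=timeout) as resp:
        # Header lookup is case-insensitive; the body is never read.
        return resp.headers.get("x-archive-src")


def item_from_archive_src(value: str) -> str:
    """The IA item identifier is the part before the first '/'."""
    return value.split("/", 1)[0]


def guess_collection(item: str) -> str:
    """Heuristic only: drop a trailing _YYYYMMDDhhmmss_<8 hex> suffix."""
    return re.sub(r"_\d{14}_[0-9a-f]{8}$", "", item)
```

For items whose names don't follow that pattern, `guess_collection` just returns the item name unchanged.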
-
ghgffba
DigitalDragons Since this is the "archive shit to save it" IRC chat, I obviously don't want to argue about preservation, but there are caveats to that exemption
-
ghgffba
nicolas17 is there a reason why the collection says "Storage_size 0 B (in 0 files)"
-
nicolas17
hm seems it's access-restricted so you can't see the items in the collection?
-
nicolas17
the items don't let you download their files, but I didn't know the collection also didn't list its items
-
BornOn420
Does anybody know how the Telegram bot discovers comments? Are comments only discovered for new posts or also for existing posts for recurring channels?
-
JAA
→ #telegrab
-
BornOn420
OK
-
Notrealname1234
Now that i think about it, i should join every AT channel (no archiveteam-sucks folks)
-
Notrealname1234
Done
-
fireonlive
man has real irc client
-
fireonlive
ish; Revolution IRC:0.5.2:Android
-
fireonlive
:d
-
Notrealname1234
What are you gonna do about it
-
Notrealname1234
It's android
-
fireonlive
literally nothing
-
JAA
Once the reconnects become annoying enough, measures will be taken. :-)
-
Notrealname1234
How did you know anyway
-
fireonlive
😈
-
fireonlive
/ctcp Notrealname1234 VERSION
-
Notrealname1234
Ok
-
Notrealname1234
No u wont
-
fireonlive
lol
-
h2ibot
Bear edited Abload (+267, Abload will rest alongside TinyPic on the image…):
wiki.archiveteam.org/?diff=52191&oldid=51772
-
h2ibot
Bear edited The Chive (-34, updated file name):
wiki.archiveteam.org/?diff=52192&oldid=52043
-
h2ibot
Bear edited List of websites excluded from the Wayback Machine (+998, thegreenhead.com also got the Red Sorry of…):
wiki.archiveteam.org/?diff=52193&oldid=52190