-
arkiver
let's also use #recordedjournal for the LiveJournal channel
-
arkiver
do we have a source on the latest LiveJournal news?
-
Gray_cat
Similar question to rocketdive's: how come 10 GB was downloaded and only 4 GB uploaded for the Telegram project? Were 6 GB of stylesheets stripped away, with only the message content sent up?
-
OrIdow6
Compression
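To illustrate the "Compression" answer: the gap between downloaded and uploaded volume does not require anything to be stripped, because scraped HTML is highly repetitive and compresses well. Below is a minimal Python sketch, assuming gzip as the codec (WARC files are commonly stored as .warc.gz, but the Telegram project's actual codec and settings are not stated here). The sample page is made up.

```python
import gzip

# Made-up stand-in for a scraped page: repeated markup and inline CSS,
# the kind of boilerplate that dominates many downloaded pages.
page = (
    "<html><head><style>.msg{color:#333;font-family:sans-serif}</style></head><body>"
    + '<div class="message">hello</div>' * 500
    + "</body></html>"
).encode("utf-8")

# Compress the whole page; nothing is removed, it is only encoded more compactly.
compressed = gzip.compress(page)

ratio = len(compressed) / len(page)
print(f"raw: {len(page)} bytes, gzip: {len(compressed)} bytes, ratio: {ratio:.1%}")
```

Real pages are less repetitive than this toy example, so the ratio will be smaller in practice; a 10 GB to 4 GB reduction is in the plausible range for compressed HTML.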
-
OrIdow6
Always a ping timeout
-
Jake
Heh
-
mohsaid
Hi
-
OrIdow6
Hello mohsaid, what can we do for you?
-
mohsaid
I have a question.
-
rewby
We may have answers if you ask your question. :)
-
rewby
Announcement: I'm doing maintenance on my infrastructure momentarily. Both my IRC and several systems required for target management will be offline for a few hours. Targets should keep running but if anything stops functioning, I can't fix it until this maintenance is done.
-
mohsaid
My website (occlub.org) was archived automatically on the Wayback Machine, and it says that the website was originally archived by Archive Team. My question is: does that mean that Archive Team had a copy of my website?
-
OrIdow6
mohsaid: ArchiveTeam generally does not keep copies of the websites we archived as we do not have enough storage ourselves; rather we upload them to archive.org
-
rewby
We had a copy of the site as it was visible to the public. Depending on how we did the archive, we may have not kept the copy after we uploaded it to the IA.
-
OrIdow6
Also, we may have made and uploaded a copy of a single page, or of the whole site, or of a section
-
rewby
Judging by the dataset being archiveteam_urls, it'll be single pages
-
OrIdow6
That's what it looks like
-
OrIdow6
And news.html isn't in the WBM at all
-
rewby
Yeah
-
OrIdow6
*.php
-
rewby
So no, we don't have a copy anymore because #// moves so fast
-
mohsaid
Thank you for your assistance:)
-
thuban
why do you ask?
-
mohsaid
Because I'm curious how they got to my website.
-
thuban
the urls project (channel #//, collection archiveteam_urls) collects outlinks from a variety of sources
-
rewby
^
-
rewby
outlinks being, someone linked to it
-
rewby
So maybe someone posted it on reddit
-
rewby
Or we found it on social media somewhere
-
rewby
Or it was found as part of another project
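In this context, "outlinks" are just the URLs that pages point to. A minimal Python sketch of extracting them from an HTML page follows, assuming the source is plain HTML. This is not the urls project's actual code. The names `OutlinkParser` and `extract_outlinks`, the base URL, and the sample post are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class OutlinkParser(HTMLParser):
    """Collects absolute http(s) URLs from <a href="..."> tags (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:  # value is None for a bare attribute
                absolute = urljoin(self.base_url, value)  # resolve relative links
                if urlparse(absolute).scheme in ("http", "https"):  # skip mailto: etc.
                    self.links.append(absolute)


def extract_outlinks(html, base_url):
    parser = OutlinkParser(base_url)
    parser.feed(html)
    return parser.links


# Made-up forum post that links to an external site.
base = "https://forum.example.com/thread/1"
post = (
    '<p>Check out <a href="http://occlub.org/">my site</a>, '
    '<a href="/thread/2">this thread</a> and <a href="mailto:me@example.com">mail me</a></p>'
)
print(extract_outlinks(post, base))
```

Each URL found this way can then be queued for archiving, which is how a site that was only ever linked somewhere can end up in the Wayback Machine.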
-
mohsaid
Yes, I posted it on my Facebook group.
-
OrIdow6
We don't do Facebook AFAIK
-
OrIdow6
Too hard for a variety of reasons
-
mohsaid
Also, I'm interested: do you use a crawler bot or something similar to crawl websites that have been posted in #//?
-
rewby
We have a big distributed crawler, yeah
-
rewby
If you're interested, we can tell one of our bots to crawl and archive your entire site.
-
mohsaid
No problem if you want to
-
mohsaid
And what is the name of the crawler?
-
OrIdow6
The one for smaller-scale sites like this is ArchiveBot
-
OrIdow6
Well, the one rew_by's talking about
-
rewby
Yep
-
mohsaid
Is that crawler open source?
-
thuban
-
thuban
-
mohsaid
Thank you everyone for the help :)
-
thuban
you're welcome!
-
rewby
Always happy to help!
-
mohsaid
That crawl will help me a lot.
-
datechnoman
Could someone possibly share the docker creation command/script they are using to run the docker containers fully out of RAM using tmpfs? Looking to bypass I/O issues on disks if possible.
-
Sluggs
I'm using this on a few machines with local HDDs:
-
Sluggs
docker run -d --name telegram --label=com.centurylinklabs.watchtower.enable=true -v '/dev/shm/telegram':'/grab/data':'rw' --restart=unless-stopped atdr.meo.ws/archiveteam/telegram-grab --concurrent 4 Sluggs
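Sluggs's command avoids disk I/O by bind-mounting `/dev/shm/telegram` as the container's `/grab/data`. This assumes `/dev/shm` is a RAM-backed tmpfs on the host, which is the default on most Linux distributions but worth checking. Below is a minimal Python sketch that reads the filesystem type from `/proc/mounts`. The function name `mount_fstype` is hypothetical.

```python
import os


def mount_fstype(mount_point, mounts_text):
    """Return the filesystem type mounted at mount_point, or None if not mounted."""
    fstype = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 3 and fields[1] == mount_point:
            fstype = fields[2]  # a later mount on the same path hides earlier ones
    return fstype


if __name__ == "__main__":
    # Only meaningful on Linux, where /proc/mounts exists.
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as f:
            print("/dev/shm is", mount_fstype("/dev/shm", f.read()))
```

If this prints `tmpfs`, the grab's working data lives in RAM.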
-
JAA
datechnoman: hackint/#down-the-tube 2021-11-30 18:27:48 UTC < ThreeHM> Hmm, good idea to use tmpfs for reducing disk load. Turns out docker has an option for that: "docker run --mount type=tmpfs,tmpfs-size=2G,destination=/grab/data ..."
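The `--mount type=tmpfs,...` form quoted above lets Docker create the RAM-backed mount itself, so no host directory is needed. Below is a minimal Python sketch that assembles such a command. It combines ThreeHM's `--mount` option with flags from Sluggs's command. The function name `tmpfs_grab_command` and the `YOURNICK` placeholder are hypothetical; substitute your own nickname and project image.

```python
import shlex


def tmpfs_grab_command(name, image, size="2G", destination="/grab/data", grab_args=()):
    """Build a `docker run` argv whose data directory is a tmpfs mount (hypothetical helper)."""
    return [
        "docker", "run", "-d",
        "--name", name,
        "--restart=unless-stopped",
        # RAM-backed mount created by Docker; contents are discarded when the container stops.
        "--mount", f"type=tmpfs,tmpfs-size={size},destination={destination}",
        image,
        *grab_args,  # arguments passed through to the grab, e.g. concurrency and nickname
    ]


cmd = tmpfs_grab_command(
    "telegram",
    "atdr.meo.ws/archiveteam/telegram-grab",
    grab_args=["--concurrent", "4", "YOURNICK"],
)
print(shlex.join(cmd))
```

To actually start the container, pass `cmd` to `subprocess.run(cmd, check=True)`. Keep `tmpfs-size` comfortably below the host's free RAM, since the mount counts against memory.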
-
h2ibot
Jakiki6 edited Talk:Main Page (+169, /* Mirroring to IPFS */ new section):
wiki.archiveteam.org/?diff=48686&oldid=48337
-
h2ibot
Entartet edited List of websites excluded from the Wayback Machine (+25, Added petitcolas.net.):
wiki.archiveteam.org/?diff=48687&oldid=48685
-
datechnoman
Thanks for that info all!
-
datechnoman
just what I was after