-
Arcorann
-
yano
yikes
-
Dallas
I'm using snscrape on an Instagram user. Is there a way to extract all the media URLs for each post? The `displayUrl` only points to the first image if a post has multiple images
-
JAA
Dallas: I don't think so. Not without making one request per post with multiple images, at least, and that'll get you banned even quicker than normal.
-
Dallas
ah crap that's annoying, well I can do it slowly, I'm adding the field `mediaUrls` to the snscrape insta module so I can at least dump all the direct urls out
-
JAA
Yeah, I basically gave up on developing the Instagram module because it's pretty much useless with the bans. :-/
-
Dallas
Do we know the rate limit? Do they send a 429 or is it just a straight-up ban out of nowhere?
-
JAA
Nope. And it's a 302 to the login.
-
JAA
All I know about the rate limit is that it's ridiculously low and can easily be triggered just by browsing manually. Also, there's a strong dependence on your IP address.
-
JAA
Residential IPs are less likely to get banned, as are IPs from blocks not allocated to major server providers.
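(Editor's sketch, not from the chat: since the block shows up as a 302 to the login page rather than a 429, a scraper could check each response for that pattern and back off. The `/accounts/login` path prefix and the function name are assumptions for illustration, not something confirmed above.)

```python
from urllib.parse import urlsplit

# Assumption: the login page lives under this path prefix.
LOGIN_PATH_PREFIX = "/accounts/login"


def is_login_redirect(status, location):
    """Return True if a response looks like a redirect to the login page.

    `status` is the HTTP status code and `location` is the value of the
    Location header (or None if the header is absent).
    """
    if status != 302 or not location:
        return False
    # Location may be absolute or relative; only the path matters here.
    return urlsplit(location).path.startswith(LOGIN_PATH_PREFIX)


if __name__ == "__main__":
    print(is_login_redirect(302, "https://www.instagram.com/accounts/login/?next=/someuser/"))
    print(is_login_redirect(429, None))
```

A caller would make requests with redirects disabled, then stop or slow down as soon as this returns True.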
-
systwi
Does anyone have any suggestions for creating an .md5 file from IA's _files.xml files?
-
systwi
As in each file listed in it.
-
JAA
-
systwi
I've tried with `sed' and even keypress automation to try to create it, and it usually works (albeit *very* clunky)
-
JAA
It's nasty but it works.
-
systwi
Thanks I'll check it out
-
systwi
`readarray' doesn't exist on my copy of macOS :-/
-
JAA
> macOS
-
JAA
Found the problem.
-
JAA
If you don't need the fancy check, just direct the output to a file instead of reading it into an array and then doing all that stuff below with it.
-
JAA
The script is for checking an IA item against a local directory to verify that everything's uploaded correctly.
-
JAA
That's why it ignores derives and the item metadata files.
-
systwi
I was trying to do that, near the end I changed one of the lines to "{ (printf "%s\n" "${iamd5sums[@]}") > >(sed 's,^, ,') 2>&1; } > md5sums.txt" but haven't tried it yet.
-
systwi
Yep, I was looking to ignore derives.
-
JAA
What Bash version are you using by the way that doesn't have readarray? That was added ages ago.
-
systwi
Yeah, macOS'
-
systwi
Woops
-
systwi
Yeah, macOS's bash is quite old due to licensing issues I think, it's 3.2.57
-
JAA
lol wtf
-
systwi
Granted, I'm not running the newest OS
-
systwi
I'm 3 major versions behind
-
systwi
Newer macOS versions use zsh as the default shell
-
JAA
Ah
-
JAA
Yeah, apparently it's because Bash 4+ uses GPLv3 instead of GPLv2, and Apple seems to hate GPLv3.
-
systwi
I could install bash 4 with brew, but I'll stick with stock.
-
JAA
-
JAA
However, it doesn't verify that the file lists match as well, so if you have files in the local directory that aren't in the item, it won't catch that.
-
systwi
That's fine, I'm actually doing the opposite. I'm making sure the download didn't fail. Missing files in that regard would make sense.
-
JAA
Ah
-
systwi
Is there any need to keep -P in the grep query? macOS doesn't have it :-/
-
systwi
*macOS's `grep' doesn't support -P
-
ivan
brew install grep
-
JAA
Yes, it's necessary for non-greedy matching.
-
JAA
The non-crazy way would be to actually parse the XML, of course.
-
JAA
But where's the fun in that?
-
ivan
but that would unnecessarily insist on valid XML
-
systwi
Hah, yeah, I was kinda doing that before all of this. I even tried converting the XML to JSON and parsing that, which was a bit more tolerable, but yikes.
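(Editor's sketch, not from the chat: the "actually parse the XML" route could look roughly like this. It assumes the `_files.xml` layout of `<file name="..." source="...">` elements with an `<md5>` child; the sample data and the function name `md5_lines` are made up for illustration.)

```python
import xml.etree.ElementTree as ET

# Made-up sample in the assumed _files.xml layout.
SAMPLE_FILES_XML = """<files>
  <file name="My  Scan 01.tiff" source="original">
    <md5>0123456789abcdef0123456789abcdef</md5>
  </file>
  <file name="My  Scan 01_thumb.jpg" source="derivative">
    <md5>fedcba9876543210fedcba9876543210</md5>
  </file>
  <file name="example_meta.xml" source="metadata">
    <md5>00000000000000000000000000000000</md5>
  </file>
</files>"""


def md5_lines(files_xml, sources=("original",)):
    """Build md5sum-style lines ("<hash>  <name>") from a _files.xml string.

    Only files whose `source` attribute is in `sources` are kept, so
    derivatives and item metadata files are ignored by default.
    """
    root = ET.fromstring(files_xml)
    lines = []
    for entry in root.iter("file"):
        if entry.get("source") not in sources:
            continue
        name = entry.get("name")
        md5 = entry.findtext("md5")
        if name is None or md5 is None:
            continue  # skip entries without a name or a hash
        # Two spaces between hash and name, as in md5sum's text-mode output.
        lines.append(f"{md5}  {name}")
    return lines


if __name__ == "__main__":
    print("\n".join(md5_lines(SAMPLE_FILES_XML)))
```

Because the name comes straight from the parsed attribute, consecutive spaces and XML entities in filenames are handled by the parser rather than by regexes. The price, as ivan notes, is that the input has to be valid XML.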
-
systwi
Can't believe I haven't run brew install grep before, thanks
-
systwi
Ahhh, I'll bring this over to my Linux box. It spit out the file like I changed it to do, but it's, let's just say, a little broken.
-
systwi
Hmm, never mind, I still have the same issue. The filename gets interpreted as part of the hash
-
systwi
And the output is, for example, "ipsum dolor Lorem"
-
systwi
The only change I'm making is to save the hashes as a file
-
tech234a
New YouTube Terms of Service allows YouTube to show ads on non-Partner Program videos; Payments to partners treated as royalties.
youtube.com/t/terms
-
systwi
JAA: Figured it out, `awk' was causing the issues. The files I was checking have spaces in their names.
-
JAA
Ah
-
systwi
-
systwi
Take note that most of my GNU commands have "g" prepended to them
-
systwi
But it works!
-
JAA
Yeah, unless there are 2+ consecutive spaces in a filename.
-
systwi
Like this?
-
JAA
Yeah
-
systwi
Uh oh
-
systwi
Oh, I see, it removes the extra space
-
JAA
sed 's,^\(.*\) \([0-9a-f]\+\)$,\2 \1,' instead of the awk command should work (assuming a reasonable version of sed).
-
JAA
There are probably various other edge cases my command doesn't cover properly though, like stuff that's escaped as XML entities in the filename.
-
JAA
In any case, you'll get warnings/errors about that, so it won't fail silently.
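(Editor's sketch, not from the chat: the same swap as the sed expression above, written in Python. It assumes each input line is "<filename> <hex hash>" and emits md5sum's "<hash>  <filename>" form. The function name is hypothetical.)

```python
import re

# Same pattern as the sed expression: greedy name, one space, hex hash at end.
LINE_RE = re.compile(r"^(.*) ([0-9a-f]+)$")


def swap_name_and_hash(line):
    """Turn "<filename> <hash>" into "<hash>  <filename>".

    The greedy first group keeps every space inside the filename,
    including runs of two or more, which field splitting would collapse.
    """
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError(f"unexpected line format: {line!r}")
    name, digest = match.groups()
    return f"{digest}  {name}"


if __name__ == "__main__":
    print(swap_name_and_hash("Lorem  ipsum dolor.txt d41d8cd98f00b204e9800998ecf8427e"))
```

Raising on a non-matching line keeps the behaviour noisy instead of silently writing a broken checksum file.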
-
systwi
Ok, cool, thank you.
-
systwi
That worked with gsed (GNU sed). Didn't try it with BSD sed
-
systwi
Would you happen to know if there's a way to get direct links to original items only?
-
systwi
I'm looking into it now, but I just thought I'd ask just in case.
-
JAA
Nope
-
JAA
Looks like `ia download` has no option to skip derives either. Interesting.
-
systwi
Huh. There must be a way, somehow. How else would IA's zip-on-the-fly work?
-
JAA
Oh, it's certainly possible, just not implemented.
-
systwi
I see
-
JAA
I bet it would be pretty trivial to implement there as well.
-
JAA
The CLI should already have that information.
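(Editor's sketch, not from the chat: one way to get direct links to original files only is to filter the item's file list yourself. This assumes the `_files.xml` layout with a `source` attribute on each `<file>` and the `https://archive.org/download/<identifier>/<name>` URL pattern. The sample data, the identifier and the function name are made up.)

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Made-up sample in the assumed _files.xml layout.
SAMPLE_FILES_XML = """<files>
  <file name="My  Scan 01.tiff" source="original"/>
  <file name="My  Scan 01_thumb.jpg" source="derivative"/>
  <file name="example-item_meta.xml" source="metadata"/>
</files>"""


def original_file_urls(identifier, files_xml):
    """Return download URLs for files marked source="original"."""
    root = ET.fromstring(files_xml)
    urls = []
    for entry in root.iter("file"):
        name = entry.get("name")
        if entry.get("source") != "original" or not name:
            continue
        # quote() leaves "/" alone, so names in subdirectories keep their path.
        urls.append(f"https://archive.org/download/{quote(identifier)}/{quote(name)}")
    return urls


if __name__ == "__main__":
    for url in original_file_urls("example-item", SAMPLE_FILES_XML):
        print(url)
```

The resulting list could be fed to any downloader. Note that this filters by the `source` attribute only, so it would include every original file in the item.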