Mike Fährmann
bd0f21478a
[twitter] login using the mobile nojs login page
4 years ago
Mike Fährmann
a10f31dde5
[twitter] rewrite; use new interface ( #740 , #806 )
...
Everything except logging in with username & password and TwitPic
embeds should be working again.
Metadata per Tweet is massively different than before (mostly raw API
responses - might need some cleaning up) and the default 'archive_fmt'
changed.
4 years ago
Mike Fährmann
3bad1579ee
update extractor test results
4 years ago
Mike Fährmann
864f4220d9
update output of 'oauth:…' ( #616 )
4 years ago
Mike Fährmann
0f459f340b
[instagram] fix and re-enable login with username&password
...
This reverts commit 3e0848a482.
(#756 , #771 , #797 , #803 )
https://github.com/althonos/InsaLooter/issues/287#issuecomment-630456522
4 years ago
Mike Fährmann
3e0848a482
[instagram] disable login with username&password ( #756 )
4 years ago
Mike Fährmann
a32aea41e1
[instagram] update 'query_hash' values
4 years ago
Mike Fährmann
2bff8dd465
[hentainexus] fix flake8 issues ( #787 )
4 years ago
Mike Fährmann
a63682a9c0
[instagram] simplify code & complete tests ( #743 )
4 years ago
墨焓
a4e3d40672
hentainexus.py minor fix ( #787 )
...
* rectify code of `join_title`, some minor fix.
* + hentainexus self.data
* fixed: call staticmethod join_title with data
4 years ago
Vrihub
62b65e59d0
Add instagram metadata: post_pageurl, post_tags ( #743 )
...
* Add instagram metadata: post_pageurl, post_tags
Add the following metadata for instagram:
- post_pageurl: json string with url of the post page
- post_tags: json array with instagram tags extracted from the post description
* Oops: rename post_tags to tags for --write-tags
This way, --write-tags will pick up the post tags.
* Rename to post_url, improve regex
* Add post_url and tags to tests
* Remove duplicate tags and sort them
* Bugfix: don't create empty tag lists
* Metadata: add location
* Metadata: add tagged_users for each media
* Move self._find_tags() to base class
* Make flake happy
4 years ago
Mike Fährmann
275cceeb6a
[redgifs] fix extraction ( #724 )
...
… and prepare for more potential extractors
4 years ago
Mike Fährmann
45baa13615
update extractor test results
...
- don't run Instagram tests on Travis anymore
- replace Twitter test because timeline was made private
- update Hiperdex domain to '.com' (again ...)
4 years ago
Mike Fährmann
dfcf2a2c91
write OAuth token to cache by default ( #616 )
4 years ago
Mike Fährmann
15c3d29062
move dump_response() into a separate function ( #737 )
4 years ago
Mike Fährmann
a363da4b43
include redirects and headers in --write-pages dumps ( #737 )
4 years ago
Mike Fährmann
6bcdb264e0
[imgur] treat 't/unmuted' URLs as galleries
4 years ago
Mike Fährmann
b6cee3e45b
[imgur] fix extraction of animated images without 'mp4' entry
4 years ago
Leonardo Taccari
bcac31b7c7
[webtoons] make archive_fmt unique ( #779 )
...
close #778
4 years ago
Mike Fährmann
e19f665a44
[danbooru] change default for 'ugoira' to 'false'
...
Downloading the pre-rendered versions should be a better default
than .zip files with individual frames.
4 years ago
Mike Fährmann
3201fe3521
add global SENTINEL object
4 years ago
Mike Fährmann
c8787647ed
add global WINDOWS bool
4 years ago
Mike Fährmann
6294e2c540
add 'text.ensure_http_scheme()'
4 years ago
Mike Fährmann
0378d079a5
[webtoons] fixes and simplifications ( #593 , #761 )
...
- fix episode listings for french comics
- allow input URLs without explicit scheme
- add 'lang'/'language' metadata
- use str.format() instead of '+' to assemble URLs
4 years ago
Mike Fährmann
ab11b1c896
[imagechest] simplify code ( #750 )
4 years ago
Mike Fährmann
846d3a2466
[sexcom] replace 404ed test
4 years ago
Mike Fährmann
9b4635917f
[gelbooru] simplify and fix pool extraction
...
use 'pool:<pool id>' as search tag to get pool posts
4 years ago
Leonardo Taccari
39cd389679
[webtoons] Add a new extractor for webtoons.com ( #761 )
...
The webtoons extractor can extract a single episode or an entire comic
(all episodes) from webtoons.com.
All the logic of the extractors should be trivial except for a couple
of kludges:
- the `ageGatePass' cookie is always set to avoid a possible redirect that
would stop extraction, especially in the comic extractor
- the image URLs returned by the episode extractor cannot be fetched
directly; the `Referer:' HTTP header must be passed to fetch them
Close #593 .
4 years ago
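As a rough illustration of the two kludges described in the [webtoons] commit above, the sketch below builds a request that carries both the cookie and the header. It uses only the standard library and sends nothing. The function name, the example URLs, and the cookie value "true" are assumptions for illustration, not taken from the extractor.

```python
from urllib.request import Request


def build_image_request(image_url, episode_url):
    """Build (but do not send) a request for an episode image.

    Hypothetical helper; the cookie value "true" is an assumption.
    """
    request = Request(image_url)
    # Pre-set the age-gate cookie so the site has no reason to redirect.
    request.add_header("Cookie", "ageGatePass=true")
    # Pass the episode page as Referer, since the images cannot be
    # fetched directly without it.
    request.add_header("Referer", episode_url)
    return request
```

The returned object could be handed to urllib.request.urlopen(); a session-based HTTP client would set the same two values on its session instead.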
Bepis
7b5711ee04
[imagechest] Add new extractor for ImageChest ( #750 )
...
* [imagechest] Add new extractor for ImageChest
* [imagechest] Fix flake8 compliance issues
4 years ago
Mike Fährmann
a1e739b96c
reuse connection adapters from parent extractors
4 years ago
Mike Fährmann
f8f95e68a7
improve '--write-pages' ( #737 )
...
- move code into its own function
- add enumeration index to filenames
- dump responses regardless of status code
4 years ago
Mike Fährmann
09cc9dbec0
prevent flake8 errors from comments looking like type annotations
4 years ago
Mike Fährmann
2d6724180b
[hiperdex] update domain to hiperdex.info
4 years ago
Vrihub
4cc761c730
Implement --write-pages option ( #736 )
...
* Implement --write-pages option
* Fix long lines
* Fix file mode to binary
* Fix pattern for Windows compatibility
4 years ago
Mike Fährmann
f557cac074
[redgifs] add image extractor ( #724 )
4 years ago
Mike Fährmann
65b1cb7acd
[deviantart] use private access tokens for Journals ( fixes #738 )
4 years ago
Mike Fährmann
0bf0146bfe
[reddit] don't send OAuth headers for file downloads ( fixes #729 )
4 years ago
Mike Fährmann
d6a480682f
update test results
4 years ago
Leonardo Taccari
b47cfc5ac9
[speakerdeck] Add a new extractor for speakerdeck.com ( #726 )
4 years ago
Mike Fährmann
90491ab606
[artstation] improve embed extraction ( #720 )
4 years ago
Mike Fährmann
999efec5cc
[deviantart] limit API wait times to 2**9=512 seconds ( #721 )
4 years ago
Mike Fährmann
504de79d8b
[vsco] fix extraction
4 years ago
Mike Fährmann
5e2974d699
[weibo] add 'videos' option
4 years ago
Mike Fährmann
9f638c2e01
[twitter] add 'replies' option ( closes #705 )
4 years ago
Mike Fährmann
fc3e54275b
[patreon] respect filters and sort order in query params ( #711 )
4 years ago
Mike Fährmann
46b9a4d8ff
[patreon] improve hash extraction ( #693 , #713 )
...
Instead of accessing a specific part of a download URL, potentially
causing an exception if it doesn't exist, we're now searching through
all parts for a potential MD5 hash without ever raising an exception.
4 years ago
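The approach described in the [patreon] commit above (scan every part of the download URL for something that looks like an MD5 hash, and never raise) might look roughly like the following. This is a minimal sketch, not the extractor's code: the function name, the choice to split only the URL path on "/", and the empty-string fallback are all assumptions.

```python
import re
from urllib.parse import urlsplit

# An MD5 digest in hex is exactly 32 hexadecimal characters.
_MD5_RE = re.compile(r"[0-9a-fA-F]{32}")


def find_md5(url):
    """Return the first path segment that looks like an MD5 hash, else ''."""
    for part in urlsplit(url).path.split("/"):
        if _MD5_RE.fullmatch(part):
            return part.lower()
    # No candidate found: report "no hash" instead of raising.
    return ""
```

Because every segment is checked, the position of the hash in the URL no longer matters, and a URL without any hash simply yields an empty result.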
Mike Fährmann
c56a751dae
[newgrounds] fix URLs produced by 'followng' extractors ( #684 )
4 years ago
Mike Fährmann
a4fd620a25
[hiperdex] revert domain back to hiperdex.com
4 years ago
Mike Fährmann
233b6f93a2
[patreon] recognize URLs with creator IDs ( #711 )
...
e.g. https://www.patreon.com/user/posts?u= …
4 years ago
Mike Fährmann
38b6bd66b0
[500px] match 'web.500px.com' subdomains
4 years ago
Mike Fährmann
d3b3b30107
update test results
4 years ago
Mike Fährmann
5d7ca76885
retry Cloudflare challenges
4 years ago
Mike Fährmann
3eab07739f
[twitter] ensure videos have a 'filename'
...
This usually gets set when invoking the 'ytdl' downloader, but when
that fails, the error message would use 'None' as filename.
4 years ago
Mike Fährmann
c4371a6970
[twitter] add 'reply' metadata field ( #705 )
4 years ago
Mike Fährmann
12ff23b6cc
[mastodon] improve account searches ( fixes #704 )
...
Searching for just the username ("@NAME") can produce multiple
unrelated results, so we now search for username + mastodon instance
("@NAME@INSTANCE")
4 years ago
Mike Fährmann
400a0df661
[jaiminisbox] update decoding procedure ( fixes #702 )
4 years ago
Mike Fährmann
8fe858eb0e
improve parameter extraction when solving Cloudflare challenge
4 years ago
Mike Fährmann
fb98b567fa
[gelbooru] improve post ID extraction for pools
4 years ago
Mike Fährmann
d6facdee7b
[mastodon] add tests ( #701 )
4 years ago
Mike Fährmann
12eebb6f16
[xhamster] support xhamster.porncache.net domains ( closes #700 )
4 years ago
Mike Fährmann
e749402191
[mastodon] fix pagination ( #701 )
4 years ago
Mike Fährmann
921914141e
[imgbb] improve redirect handling
4 years ago
Mike Fährmann
6cc800aad4
[instagram] add 'post_id' and 'num' metadata fields ( closes #698 )
4 years ago
Mike Fährmann
a3de234e70
[hitomi] add extractor for tag searches ( closes #697 )
4 years ago
Mike Fährmann
456f6e8d05
[nozomi] move '_unpack()' method to global scope
4 years ago
Mike Fährmann
55ac408bdf
[hitomi] fix extraction of galleries without tags
4 years ago
Mike Fährmann
db6685eeae
[aryion] support downloading from folders ( fixes #694 )
4 years ago
Mike Fährmann
fa2952ac55
[furaffinity] add 'following' extractor ( #515 )
4 years ago
Mike Fährmann
9b194520db
[newgrounds] add 'following' extractor ( closes #684 )
4 years ago
Mike Fährmann
6386ee54e1
[deviantart] add extractor info to 'following' results
4 years ago
Mike Fährmann
d5273f9b0c
[hiperdex] update domain to hiperdex.net
4 years ago
Mike Fährmann
08674a91f3
[patreon] fix hash extraction from download URLs ( closes #693 )
...
The old method was assuming every URL path ends with '/1'. For URLs
where this is not the case, the segment containing the post ID was
used as file hash.
4 years ago
Mike Fährmann
a6286bb551
[hiperdex] add 'artist' extractor ( #606 )
5 years ago
Mike Fährmann
291033720a
[hiperdex] fix manga extraction
5 years ago
Mike Fährmann
dfc0557807
[vsco] fix collection extraction
5 years ago
Mike Fährmann
fd438f0d78
update extractor test results
5 years ago
Mike Fährmann
bae1e8ed12
[deviantart] fix JPEG quality replacement pattern
...
'q_\d+' would sometimes also replace something in the 'token' query
parameter, invalidating the URL.
5 years ago
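To illustrate the problem described in the [deviantart] commit above: applying 'q_\d+' to the whole URL can also hit the 'token' query parameter. One possible way to avoid that, shown here as a sketch and not necessarily how the commit fixed it, is to apply the substitution to the URL path only. The function name, the default quality value, and the example URL in the usage note are assumptions.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def set_jpeg_quality(url, quality=100):
    """Rewrite 'q_<n>' in the URL path only, leaving the query untouched."""
    parts = urlsplit(url)
    # Substitute in the path component so a 'token' query parameter that
    # happens to contain 'q_<digits>' is not modified.
    path = re.sub(r"q_\d+", "q_{}".format(quality), parts.path)
    return urlunsplit(parts._replace(path=path))
```

For a URL such as "https://example.org/f/v1/fill/w_100,q_80,strp/a.jpg?token=abq_12cd", a plain re.sub() over the full string would also rewrite the token, while the path-only version leaves it intact.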
Mike Fährmann
cf4cef3d63
[aryion] adjust 'date' to UTC time
5 years ago
Mike Fährmann
6c531be294
[aryion] fix malformed 'last-modified' headers ( #390 )
5 years ago
Mike Fährmann
dc65f7d8dc
[aryion] use generic download URLs ( #390 )
...
i.e. /g4/data.php?id=…
- get filename & extension from Content-Disposition header
- handle all downloadable file types (docx, swf, etc)
5 years ago
Mike Fährmann
96b78bcf04
[aryion] include path in default directory format ( #390 )
5 years ago
Mike Fährmann
6143050980
[aryion] add gallery and post extractors ( #390 , #673 )
5 years ago
Mike Fährmann
9e7dfc0cfc
[myportfolio] fix extraction of galleries without title
5 years ago
Mike Fährmann
88fca0a172
[mastodon] update OAuth credentials for pawoo.net ( #665 )
5 years ago
Mike Fährmann
4ae8a25567
[mastodon] use 'combine_dict()' to combine extractor info dicts
5 years ago
Mike Fährmann
220c06b86e
[mastodon] handle rate limits
5 years ago
Mike Fährmann
d02f7c1118
improve Extractor.wait()
...
- allow 'until' to be a datetime object
- do "time calculations" with UTC timestamps
- set a default 'reason'
5 years ago
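The three points in the Extractor.wait() commit above can be sketched as a small standalone function. This is not the method itself: the name plan_wait, the default reason text, the message format, and the decision to treat naive datetime objects as UTC are all assumptions made for illustration.

```python
import time
from datetime import datetime, timezone


def plan_wait(until, reason="rate limit reset", now=None):
    """Return (seconds_to_wait, message) for a target time.

    'until' may be a UNIX timestamp or a datetime object.
    """
    if isinstance(until, datetime):
        if until.tzinfo is None:
            # Assumption: naive datetimes are interpreted as UTC.
            until = until.replace(tzinfo=timezone.utc)
        until = until.timestamp()
    if now is None:
        now = time.time()  # seconds since the epoch, i.e. UTC-based
    # Never return a negative duration for targets in the past.
    seconds = max(0.0, float(until) - now)
    return seconds, "Waiting {:.0f}s for {}".format(seconds, reason)
```

Converting everything to UTC timestamps first keeps the subtraction independent of the local time zone.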
Mike Fährmann
5d7404ab58
[oauth] use the new name for 'DeviantartAPI' ( fixes #670 )
5 years ago
Mike Fährmann
762c758af4
[hiperdex] fix extraction
5 years ago
Mike Fährmann
f9a590f92b
[deviantart] apply HTTP request limits in more places
...
"Request blocked" can also happen on sta.sh and for *any* HTTP
request directed at deviantart.com
5 years ago
Mike Fährmann
2587296deb
[mastodon] add access tokens for mastodon.social and baraag.net
...
(closes #665 )
5 years ago
Mike Fährmann
ff7c0b7eff
[deviantart] handle "Request blocked" errors ( #655 )
...
- add a 2 second wait time between requests to deviantart.com
- catch 403 "Request blocked" errors and wait for 3 minutes until
retrying
5 years ago
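The behaviour listed in the [deviantart] commit above (a 2 second pause between requests, plus a 3 minute wait after a 403 "Request blocked" response) can be sketched as follows. The function and parameter names, the retry limit, and the (status, body) return shape of the fetch callable are assumptions; the real extractor works on its own session and response objects.

```python
import time


def fetch_with_block_handling(fetch, url, interval=2.0, blocked_wait=180.0,
                              max_retries=3, sleep=time.sleep):
    """Call fetch(url), which must return a (status_code, body_text) tuple.

    Hypothetical helper: waits 'interval' seconds before each request and
    'blocked_wait' seconds after a 403 "Request blocked" response.
    """
    for _ in range(max_retries + 1):
        sleep(interval)  # fixed pause before every request
        status, body = fetch(url)
        if status == 403 and "Request blocked" in body:
            sleep(blocked_wait)  # back off, then try again
            continue
        return status, body
    raise RuntimeError("still blocked after {} retries".format(max_retries))
```

Passing the sleep function as a parameter makes the timing easy to check without actually waiting.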
Mike Fährmann
c874684f05
[deviantart] retrieve *all* download URLs through OAuth API
...
'/extended_fetch' as well as Deviation webpages now again contain
Deviation UUIDs needed to grab Deviation info through the OAuth API,
meaning cookies are no longer necessary to grab original files.
The only case where cookies are still needed is scraps marked as
"mature", since those entries are hidden from public users.
(#655 , #657 , #660 )
5 years ago
Mike Fährmann
5c27b25a8f
[deviantart] improve sta.sh extraction
...
Extract all sta.sh items in a single extractor run.
Don't spawn a new StashExtractor for each individual sta.sh item to
preserve the current requests.Session and its opened TCP connections.
5 years ago
Mike Fährmann
e2fc4eaa6f
[deviantart] detect stash folders ( fixes #659 )
5 years ago
Mike Fährmann
c034159701
[piczel] fix extraction for single images
5 years ago
Mike Fährmann
699036ea0c
[weibo] accept status URLs with non-numeric IDs ( #664 )
5 years ago
Mike Fährmann
fe96f99e4b
[hentainexus] reduce line length (flake8) & update test
5 years ago
墨焓
6f81cac8fa
Add metadata to hentainexus: circle, event, title_conventional. ( #661 )
5 years ago
Mike Fährmann
6f911aeb1c
[deviantart] add error message for CloudFront blocks ( #655 )
5 years ago