Mike Fährmann
4efe56f419
[furaffinity] improve new/old layout detection (fixes #2277)
3 years ago
Mike Fährmann
0f1e7ff319
[twitter] fix extraction (#2275)
3 years ago
Mike Fährmann
f351746483
release version 1.20.4
3 years ago
Mike Fährmann
dee0d22561
update extractor test results
3 years ago
Mike Fährmann
d7b8e04b50
[kemonoparty] use 'Accept-Encoding: identity' for all downloads (#2267)
fixes issues when data sent with 'Content-Encoding: gzip' or other
encodings is larger than the actual file
3 years ago
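A minimal sketch of the 'Accept-Encoding: identity' idea from d7b8e04b50, written with the standard library's urllib.request instead of the requests session gallery-dl itself uses; the helper name build_download_request and the example URL are hypothetical.

```python
import urllib.request


def build_download_request(url):
    """Build a request that asks the server for the raw, unencoded bytes."""
    # 'identity' asks the server not to apply gzip/deflate/etc., so the
    # transferred bytes are the file itself and no decoding step is needed.
    return urllib.request.Request(url, headers={"Accept-Encoding": "identity"})


req = build_download_request("https://example.org/file.bin")
# urllib stores header names via str.capitalize(), hence "Accept-encoding"
print(req.get_header("Accept-encoding"))
```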
enormous-muscles
55326377d8
Add Kohlchan extractor (#2251)
3 years ago
Mike Fährmann
cc7dce5755
[sexcom] add 'pins' extractor (closes #2265)
3 years ago
Mike Fährmann
02e18f56be
[e621] add 'favorite' extractor (closes #2250)
3 years ago
Mike Fährmann
70e6e1549e
[twitter] provide fallback URLs for card images
f2e8aedd74 (commitcomment-64057751)
3 years ago
Mike Fährmann
86fa412b47
[hitomi] add 'format' option (#2260)
default is 'webp' since downloading original files is no longer allowed
3 years ago
Mike Fährmann
492436f936
[twitter] add 'warnings' option (#2258)
disable reporting any non-fatal errors by default
3 years ago
Mike Fährmann
a5163e4c70
[twitter] restore 'logout' functionality (#1719)
3 years ago
Mike Fährmann
f58364f6a8
update Firefox cipher list
3 years ago
Mike Fährmann
7e6981dda6
rename 'disabletls12' to 'tls12'
and let config options override any default settings
3 years ago
Mike Fährmann
bb3e182562
overhaul session initialization
- share adapter & connection pool across sessions with the same
ssl options, ssl ciphers, and source address
- simplify browser emulation to just a list of headers and ciphers
3 years ago
Mike Fährmann
e670dc518e
[weibo] update pagination code (fixes #2244)
- send proper headers and query parameters
- use 'since_id' instead of page numbers
- set a 1-2 second delay between requests
3 years ago
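A rough sketch of cursor-based pagination with 'since_id' and a randomized 1-2 second delay, as listed in e670dc518e. The response shape ({"items": ..., "since_id": ...}) and the fetch_page callable are assumptions for illustration, not Weibo's real API; the sleep function is injectable so the example runs instantly.

```python
import random
import time


def paginate(fetch_page, sleep=time.sleep):
    """Yield items page by page, following the 'since_id' cursor."""
    since_id = None
    while True:
        data = fetch_page(since_id)  # hypothetical: returns a dict
        yield from data["items"]
        since_id = data.get("since_id")
        if not since_id:
            return
        # wait 1-2 seconds before requesting the next page
        sleep(random.uniform(1.0, 2.0))


# Fake pages keyed by cursor value (hypothetical data)
PAGES = {
    None: {"items": [1, 2], "since_id": "abc"},
    "abc": {"items": [3], "since_id": "def"},
    "def": {"items": [4, 5], "since_id": None},
}
delays = []
items = list(paginate(PAGES.__getitem__, sleep=delays.append))
```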
Robert Pendell
4c651f6252
[patreon] Disable TLS 1.2 by default (#2249)
Disables TLS 1.2 on Patreon by default.
3 years ago
Robert Pendell
392cf079f7
Add ability to disable TLS 1.2 (#2243)
Fix for Patreon Cloudflare issues by having only TLS v1.3 or higher establish HTTPS connections
This now allows you to disable it on a per-host or global basis. Add 'disabletls12' as a config option either under extractor.(host) or directly under extractor. The option is false by default.
Example:
    "patreon":
    {
        "disabletls12": true,
        "cookies": {
            "session_id": "X"
        }
    }
3 years ago
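At the TLS level, disabling TLS 1.2 amounts to raising the minimum protocol version. A minimal sketch with Python's ssl module; the make_context helper is hypothetical and its tls12 parameter merely mirrors the option name introduced later in 7e6981dda6.

```python
import ssl


def make_context(tls12=True):
    """Return a client SSL context; with tls12=False only TLS 1.3+ is allowed."""
    ctx = ssl.create_default_context()
    if not tls12:
        # Refuse TLS 1.2 and older, so the handshake must negotiate TLS 1.3.
        ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx


ctx = make_context(tls12=False)
```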
Mike Fährmann
d33227fc38
[twitter] restore errors for protected timelines etc (fixes #2237)
3 years ago
Mike Fährmann
ebd3d5c1cc
[bunkr] fix .mp4 downloads (closes #2239)
3 years ago
Mike Fährmann
e2be199124
[gelbooru] improve and fix pagination (#2230, #2232)
Use 'id:<POSTID' as a tag instead of going through pages with 'pid'.
Something similar was already implemented in 93cef784,
but that got broken again in 3085aac4.
3 years ago
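The 'id:<POSTID' approach from e2be199124 is keyset pagination: instead of a page offset, each request asks for posts below the lowest ID already seen. A sketch assuming results come back newest first (highest ID first); paginate_by_id and fake_search are hypothetical, with fake_search standing in for the Gelbooru API.

```python
def paginate_by_id(search, tags, limit=100):
    """Walk through results newest-first using an 'id:<N' tag as the cursor."""
    query = tags
    while True:
        posts = search(query, limit)  # hypothetical API call
        yield from posts
        if len(posts) < limit:
            return
        # continue below the lowest ID seen so far instead of using 'pid'
        query = "{} id:<{}".format(tags, posts[-1]["id"])


POSTS = [{"id": i} for i in range(250, 0, -1)]  # fake data, newest first


def fake_search(query, limit):
    below = None
    for tag in query.split():
        if tag.startswith("id:<"):
            below = int(tag[4:])
    hits = [p for p in POSTS if below is None or p["id"] < below]
    return hits[:limit]


ids = [p["id"] for p in paginate_by_id(fake_search, "cat", limit=100)]
```

Unlike a page offset, the cursor stays valid when new posts are added while paginating.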
Mike Fährmann
806badbeec
release version 1.20.3
3 years ago
Mike Fährmann
8230f31800
[twitter] update query hashes
3 years ago
Mike Fährmann
c180806cec
[twitter] fix deleted/invalid retweets (#2225)
3 years ago
Mike Fährmann
a2eecc6aa8
[kemonoparty] fix DMs extraction (#2008)
3 years ago
Mike Fährmann
2bf554a896
[twitter] fix several errors (#2212, #2216, #2225)
- fix Tweets with deleted quotes
- fix suspended Tweets without 'legacy' entry
- fix unified_cards without 'type'
3 years ago
Mike Fährmann
fbd17547f5
release version 1.20.2
3 years ago
Mike Fährmann
e5242b83bf
[twitter] define directory format for events (#2109)
3 years ago
Mike Fährmann
efb3e65a6a
[sexcom] extend URL pattern (fixes #2220)
3 years ago
vsyx
3f2b6335d7
[instagram] fix highlights extraction (#2197)
* [instagram] fix highlights extraction
* [instagram] improve highlights extraction
- 'yield' individual reels instead of collecting them in a list
and returning them all at once
- reduce 'chunk_size' to an even safer value
(instagram.com also uses 5)
3 years ago
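The change in 3f2b6335d7 swaps "collect everything, then return" for a generator that yields reels as each chunk of at most 5 IDs is fetched. A sketch of that pattern; chunks, highlights and fake_fetch are hypothetical names, and fake_fetch replaces the real Instagram request.

```python
def chunks(ids, chunk_size=5):
    """Yield lists of at most 'chunk_size' IDs."""
    ids = list(ids)
    for i in range(0, len(ids), chunk_size):
        yield ids[i:i + chunk_size]


def highlights(reel_ids, fetch_reels):
    # Yield each reel as soon as its chunk arrives instead of collecting
    # all of them into one list first.
    for chunk in chunks(reel_ids):
        for reel in fetch_reels(chunk):  # hypothetical API request
            yield reel


requested = []


def fake_fetch(chunk):
    requested.append(chunk)
    return ["reel%d" % i for i in chunk]


stream = highlights(range(7), fake_fetch)
first = next(stream)  # only the first chunk has been requested at this point
```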
Mike Fährmann
5ed26e1773
[twitter] fix pinned tweets (#2216)
caused by the changes in dffa440ede
3 years ago
Mike Fährmann
a9f78e6527
[twitter] improve error handling
- handle accounts without 'rest_id'
- handle timelines with empty 'instructions'
3 years ago
Mike Fährmann
729b07c1f5
[twitter] simplify
- use dict with common GraphQL variables
- reduce 'variables' size with custom JSON encoder instance
- centralise TwitterAPI() creation
3 years ago
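The "custom JSON encoder instance" from 729b07c1f5 can be approximated with the standard json module: dropping the default spaces after ',' and ':' shortens the serialized 'variables' that end up in the URL. The variable names in COMMON_VARIABLES and the graphql_variables helper are hypothetical.

```python
import json

# One shared encoder instance without the default spaces after ',' and ':'
_json_dumps = json.JSONEncoder(separators=(",", ":")).encode

# Hypothetical common GraphQL variables shared by all requests
COMMON_VARIABLES = {"includePromotedContent": False, "withVoice": True}


def graphql_variables(**extra):
    variables = dict(COMMON_VARIABLES)
    variables.update(extra)
    return _json_dumps(variables)


compact = graphql_variables(userId="12", count=100)
```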
Mike Fährmann
7cb29224f0
[philomena] fix search parameter escaping (#2215)
The pluses from search terms in /tags/ URLs need to be
replaced with spaces to get accepted by Philomena.
3 years ago
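Replacing the pluses from /tags/ URLs with spaces, as in 7cb29224f0, can be done with urllib.parse.unquote_plus, which also decodes %XX escapes so that an encoded literal plus survives. This is one possible stdlib approach, not necessarily what the extractor does; tag_to_query is a hypothetical name.

```python
from urllib.parse import unquote_plus


def tag_to_query(tag_path):
    """Turn the tag part of a /tags/ URL into a Philomena search query."""
    # '+' is replaced with ' ' first, then %XX escapes are decoded,
    # so '%2B' becomes a literal '+' rather than a space.
    return unquote_plus(tag_path)


query = tag_to_query("blue+sky")
```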
Mike Fährmann
9ca8bb2dc0
[twitter] improve error handling
3 years ago
Mike Fährmann
9a221494c3
[twitter] add 'event' extractor (closes #2109)
3 years ago
Mike Fährmann
14867dad6b
[twitter] fix unified cards from search results
3 years ago
Mike Fährmann
dffa440ede
[twitter] improve handling of deleted tweets (#2212)
3 years ago
Mike Fährmann
54ef874ba4
[twitter] fix retweet filter (#2212)
3 years ago
Mike Fährmann
cb43f7731b
[twitter] update to GraphQL API (#2212)
The old REST API endpoints, which were not used by Twitter since
summer 2021, are going to finally be phased out it seems, with
'/2/timeline/profile/USERID.json' being the first one.
Only Twitter's search doesn't have a GraphQL interface yet.
3 years ago
Mike Fährmann
de754590e0
add --source-address command-line option (closes #2206)
3 years ago
Mike Fährmann
698f35215e
[blogger] support new image domain (fixes #2204)
3 years ago
Mike Fährmann
c587b678d0
[mangadex] re-enable warning for external chapters (#2193)
3 years ago
Mike Fährmann
f2e8aedd74
[twitter] changes to 'cards' option
- change default value to 'true'
- only invoke youtube-dl for cards unsupported by gallery-dl
when 'cards' is set to "ytdl"
"cards": true --> only download card images
"cards": "ytdl" --> download card images and
use youtube_dl on otherwise unsupported cards
3 years ago
Mike Fährmann
2d34d8ff8b
[reddit] allow downloading from quarantined subreddits (#2180)
3 years ago
Mike Fährmann
17c9c47ca0
[hitomi] fix 'tag' extraction (fixes #2189)
3 years ago
Mike Fährmann
df2f0c09bb
[twitter] support "image_carousel_website" unified cards
3 years ago
Mike Fährmann
cdc96e1217
[gelbooru] improve video file detection (fixes #2188)
not all files from 'https://video-cdnN.gelbooru.com' are videos
3 years ago
Mike Fährmann
4acc31bd9f
[newgrounds] set suitabilities filter before starting a search
3 years ago
Mike Fährmann
58a7921b5c
release version 1.20.1
3 years ago
Mike Fährmann
170711af7e
[mangadex] fix extraction (closes #2177)
3 years ago
Mike Fährmann
199e7616a7
[rule34] use https://api.rule34.xxx for API requests
3 years ago
Mike Fährmann
6e0a6c484f
apply SPECIAL_EXTRACTORS only for blacklist settings
as was the case before 010d65dc
3 years ago
Mike Fährmann
37beb1298e
[newgrounds] add 'search' extractor (closes #2161)
3 years ago
Mike Fährmann
8b910dd8ae
[hitomi] fix image URLs
again and again ...
3 years ago
Mike Fährmann
dcfe08838d
restore -d/--dest functionality
change short option for --directory from -d to -D
3 years ago
Mike Fährmann
3085aac4d8
[gelbooru] handle changed API response format (#2157)
3 years ago
Mike Fährmann
38e2af29d6
[hitomi] fix image URLs
update '_parse_gg()' yet again
3 years ago
Mike Fährmann
6f2e0c9c3d
fix cookie checks for patreon, fanbox, fantia
The changes in 9a255344 caused a warning about missing cookies to be
displayed even if those cookies were present, because _check_cookies()
did not account for an empty cookiedomain.
3 years ago
Mike Fährmann
1e0278702d
[hitomi] update '_parse_gg()'
3 years ago
Mike Fährmann
3b7c7daa76
improve UNC path handling (#2126)
always call 'abspath()' on the directory path to handle cases when the
current working directory is UNC and 'base-directory' is relative.
3 years ago
Mike Fährmann
47eae4c393
release version 1.20.0
3 years ago
Mike Fährmann
becc7f85a6
[hitomi] fix image URLs
3 years ago
Mike Fährmann
6af8d71da6
[kemonoparty] use service as subcategory (closes #2147)
3 years ago
Vrihub
96fcff182c
generic extractor (#735)
* Generic extractor, see issue #683
* Fix failed test_names test, no subcategory needed
* Prefix directory_fmt with "generic"
* Relax regex (would break some urls)
* Flake8 compliance
* pattern: don't require a scheme
This fixes a bug when we force the generic extractor on urls without a
scheme (that are allowed by all other extractors).
* Fix using g: and r: on urls without http(s) scheme
Almost all extractors accept urls without an initial http(s) scheme.
Many extractors also allow for generic subdomains in their "pattern"
variable; some of them implement this with the regex character class
"[^.]+" (everything but a dot).
This leads to a problem when the extractor is given a url starting
with g: or r: (to force using the generic or recursive extractor)
and without the http(s) scheme: e.g. with "r:foobar.tumblr.com"
the "r:" is wrongly considered part of the subdomain.
This commit fixes the bug, replacing the too generic "[^.]+" with the
more specific "[\w-]+" (letters, digits and "-", the only characters
allowed in domain names), which is already used by some extractors.
* Relax imageurl_pattern_ext: allow relative urls
* First round of small suggested changes
* Support image urls starting with "//"
* self.baseurl: remove trailing slash
* Relax regexp (didn't catch some image urls)
* Some fixes and cleanup
* Fix domain pattern; option to enable extractor
Fixed the domain section for "pattern", to pass "test_add" and
"test_add_module" tests.
Added the "enabled" configuration option (default False) to enable the
generic extractor. Using "g(eneric):URL" forces using the extractor.
3 years ago
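The subdomain problem described in 96fcff182c can be reproduced with two simplified patterns. These are illustrative regexes for a tumblr-style URL, not the actual patterns used by any gallery-dl extractor.

```python
import re

# "[^.]+" accepts anything up to the first dot, including "r:"
OLD = re.compile(r"(?:https?://)?([^.]+)\.tumblr\.com")
# "[\w-]+" only accepts word characters and "-", so ":" cannot be swallowed
NEW = re.compile(r"(?:https?://)?([\w-]+)\.tumblr\.com")

url = "r:foobar.tumblr.com"
old_match = OLD.match(url)  # matches, with "r:foobar" as the subdomain
new_match = NEW.match(url)  # no match, so the "r:" prefix is left alone
```

Note that Python's `\w` also matches "_", so the stricter class is slightly wider than "letters, digits and '-'".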
Mike Fährmann
4376b39a2b
[sexcom] fix and improve embed extraction (fixes #2145)
3 years ago
Mike Fährmann
6d190834ee
[instagram] fix error when PostPage data is not in GraphQL format (#2037)
3 years ago
Mike Fährmann
4edf43891c
add -d/--directory and -f/--filename command-line arguments
3 years ago
Mike Fährmann
dd67e24aa9
[lolisafe] include file ID in filenames
More precisely, it now splits the full 'filename' into 'name' and 'id'
instead of overwriting 'filename'. The format string stays the same as
before. Use '{name}.{extension}' to restore the old behavior.
before:
- filename: foobar
- id : 12345
now:
- filename: foobar-12345
- name : foobar
- id : 12345
3 years ago
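The before/after metadata in dd67e24aa9 suggests splitting the full filename at its last "-". A sketch under that assumption (the ID is whatever follows the last hyphen); split_filename is a hypothetical helper. With this metadata, '{name}.{extension}' reproduces the old filenames.

```python
def split_filename(filename):
    """Split 'name-id' into its parts, keeping the full filename."""
    # rpartition splits on the last '-', so names containing '-' still work
    name, _, file_id = filename.rpartition("-")
    return {"filename": filename, "name": name, "id": file_id}


info = split_filename("foobar-12345")
```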
Mike Fährmann
f3d61de18d
[artstation] create directories per asset (closes #2136)
3 years ago
Mike Fährmann
49a50fb2eb
[500px] create directories per photo
3 years ago
Mike Fährmann
89bebe1bef
[500px] add 'favorite' extractor (closes #1927)
3 years ago
Mike Fährmann
22b0433985
[fanbox] support pixiv redirects (closes #2122)
3 years ago
Mike Fährmann
281828b58b
[tumblrgallery] improve search pagination (fixes #2132)
3 years ago
Mike Fährmann
9b67e63a89
[ytdl] update to latest yt-dlp changes (fixes #2124)
3 years ago
Mike Fährmann
4bec34fc94
[pixiv] allow setting a date range for search results (#2133)
with the 'scd' and 'ecd' query parameters
3 years ago
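Reading the 'scd' and 'ecd' query parameters mentioned in 4bec34fc94 takes only urllib.parse. A sketch; the date_range helper, the example search URL, and the YYYY-MM-DD date format are assumptions for illustration.

```python
from urllib.parse import parse_qs, urlsplit


def date_range(url):
    """Return the (start, end) dates given by 'scd' and 'ecd', if present."""
    query = parse_qs(urlsplit(url).query)
    start = query.get("scd", [None])[0]
    end = query.get("ecd", [None])[0]
    return start, end


rng = date_range(
    "https://www.pixiv.net/en/tags/landscape/artworks"
    "?scd=2021-01-01&ecd=2021-12-31"
)
```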
Mike Fährmann
882c614281
add album extractor for lolisafe/chibisafe instances
- support bunkr.is (closes #2038)
- support zz.ht (closes #2105)
3 years ago
Mike Fährmann
7bf1d3fd32
rename --write-infojson to --write-info-json
to be consistent with the name used in youtube-dl/yt-dlp
(the old --write-infojson still works)
3 years ago
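Keeping the old --write-infojson spelling working next to --write-info-json, as described in 7bf1d3fd32, is a matter of giving one argparse option two option strings. A minimal sketch; the parser here is a standalone example, not gallery-dl's option parser.

```python
import argparse

parser = argparse.ArgumentParser()
# Both spellings set the same destination; the old one stays as an alias.
parser.add_argument(
    "--write-info-json", "--write-infojson",
    dest="write_info_json", action="store_true",
)

new = parser.parse_args(["--write-info-json"])
old = parser.parse_args(["--write-infojson"])
```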
Mike Fährmann
d441888bfb
[deviantart] adjust API endpoints
Start all endpoints with a forward slash '/'
to be consistent with other API interfaces.
3 years ago
Mike Fährmann
8f0cf0bf71
[deviantart] use '/browse/newest' for most-recent searches (#2096)
3 years ago
Mike Fährmann
0bd7607da5
[tumblrgallery] improve 'id' extraction (#2115)
3 years ago
Mike Fährmann
ac80474371
handle UNC paths (#2113)
3 years ago
Mike Fährmann
47df50a2ad
add --sleep-request and --sleep-extractor command-line options
3 years ago
Mike Fährmann
64cf26eaf4
allow specifying sleep-* options as string
either as single value or as range: "3.5", "2.1 - 5.0"
3 years ago
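Accepting sleep-* values as a single number or as a "min - max" range, per 64cf26eaf4, can be modelled as turning each value into a function that returns the number of seconds to wait. A sketch with a hypothetical parse_sleep helper; negative values are not handled.

```python
import random


def parse_sleep(value):
    """Turn 3.5, "3.5", or "2.1 - 5.0" into a function returning seconds."""
    if isinstance(value, str):
        lower, sep, upper = value.partition("-")
        if sep:
            # float() ignores the surrounding whitespace
            lo, hi = float(lower), float(upper)
            return lambda: random.uniform(lo, hi)
        value = float(value)
    return lambda: value


fixed = parse_sleep("3.5")
ranged = parse_sleep("2.1 - 5.0")
```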
Mike Fährmann
0d02a7861e
[tumblrgallery] fix extraction (closes #2112)
3 years ago
Mike Fährmann
62692c6842
[exhentai] add 'source' option
setting it to "hitomi" downloads the corresponding gallery from
hitomi.la; might be extended to other sources in the future
3 years ago
Mike Fährmann
099ed72de7
[hitomi] disable extra 'metadata' by default
saves one HTTP request that is not needed with default filename settings
3 years ago
Mike Fährmann
9a25534490
use Extractor._check_cookies() for all cookie checks
3 years ago
Mike Fährmann
63c6bc26b5
[rule34us] extract tags per category (#1527)
like for other boorus with 'tags': true
3 years ago
Mike Fährmann
f587458a3c
[twitter] include '4096x4096' as a default image fallback (closes #2107, closes #1881)
3 years ago
Mike Fährmann
8ed282f7f2
[kemonoparty] support coomer.party URLs (#2100)
3 years ago
Mike Fährmann
87ce3fa669
[furaffinity] warn when no session cookies were found
3 years ago
Mike Fährmann
159631c808
[philomena] use a default 'filter_id' if none is given
3 years ago
Mike Fährmann
ad30653b17
allow running a BaseExtractor for any URL
by prefixing it with '<base-category>:'
For example:
shopify:https://partakefoods.com/products/crunchy-cookie-variety-pack
gelbooru_v01:https://5naf.booru.org/index.php?page=post&s=view&id=46963
Available base categories are:
mastodon, shopify, moebooru, gelbooru_v01, gelbooru_v02,
reactor, foolslide, foolfuuka, philomena
3 years ago
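The '<base-category>:' prefix from ad30653b17 can be detected by splitting the input once at the first ":" and checking the prefix against the listed base categories. A sketch; split_base_category is hypothetical and only illustrates the lookup, not how gallery-dl dispatches to extractors.

```python
BASE_CATEGORIES = {
    "mastodon", "shopify", "moebooru", "gelbooru_v01", "gelbooru_v02",
    "reactor", "foolslide", "foolfuuka", "philomena",
}


def split_base_category(url):
    """Return (category, url) if 'url' starts with '<base-category>:'."""
    prefix, sep, rest = url.partition(":")
    if sep and prefix in BASE_CATEGORIES:
        return prefix, rest
    # plain URLs like "https://..." have "https" as prefix and fall through
    return None, url


cat, rest = split_base_category(
    "shopify:https://partakefoods.com/products/crunchy-cookie-variety-pack")
```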
Mike Fährmann
299bd2f1f5
[rule34us] add 'tag' and 'post' extractors (#1527)
3 years ago
Mike Fährmann
3cf1075d86
[inkbunny] add 'search' extractor (closes #2094)
3 years ago
Mike Fährmann
c6a23c26d7
[instagram] allow downloading specific stories (closes #2088)
https://instagram.com/stories/<USER>/<ID> now only downloads the one
story specified by <ID> and not all stories from that user.
3 years ago
Mike Fährmann
352ffcddb0
[instagram] match post URLs with usernames (fixes #2085)
3 years ago
Mike Fährmann
45ca1693d8
add indicator to debug output when using a standalone executable
3 years ago
Mike Fährmann
f4e3cee6ac
use yt-dlp by default (#1850, #2028)
3 years ago