Mike Fährmann
a9f78e6527
[twitter] improve error handling
...
- handle accounts without 'rest_id'
- handle timelines with empty 'instructions'
3 years ago
Mike Fährmann
729b07c1f5
[twitter] simplify
...
- use dict with common GraphQL variables
- reduce 'variables' size with custom JSON encoder instance
- centralise TwitterAPI() creation
3 years ago
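A minimal sketch of the two ideas named in 729b07c1f5: a shared dict of common variables, plus one reusable JSON encoder instance that emits compact output. The key names and the `build_variables` helper are hypothetical and not taken from the extractor's code.

```python
import json

# Hypothetical set of variables shared by several GraphQL requests.
COMMON_VARIABLES = {"count": 100, "includePromotedContent": False}

# One encoder instance, reused for every request. The compact
# separators drop the spaces that json.dumps() inserts by default.
_encode = json.JSONEncoder(separators=(",", ":")).encode


def build_variables(**extra):
    """Merge request-specific values into the common ones and encode them."""
    variables = COMMON_VARIABLES.copy()
    variables.update(extra)
    return _encode(variables)
```

The encoded string carries no whitespace between items, which keeps the 'variables' query parameter shorter than the default `json.dumps()` output.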
Mike Fährmann
7cb29224f0
[philomena] fix search parameter escaping ( #2215 )
...
The pluses from search terms in /tags/ URLs need to be
replaced with spaces to get accepted by Philomena.
3 years ago
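The escaping fix described in 7cb29224f0 can be sketched as follows. The function name is hypothetical, and the additional percent-decoding step is an assumption; the commit message itself only mentions replacing pluses with spaces.

```python
from urllib.parse import unquote


def tag_search_query(tag_segment: str) -> str:
    """Convert the tag part of a /tags/ URL into a search query."""
    # Replace '+' before percent-decoding, so that an encoded
    # plus sign ('%2B') survives as a literal '+'.
    return unquote(tag_segment.replace("+", " "))
```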
Mike Fährmann
9ca8bb2dc0
[twitter] improve error handling
3 years ago
Mike Fährmann
9a221494c3
[twitter] add 'event' extractor ( closes #2109 )
3 years ago
Mike Fährmann
14867dad6b
[twitter] fix unified cards from search results
3 years ago
Mike Fährmann
dffa440ede
[twitter] improve handling of deleted tweets ( #2212 )
3 years ago
Mike Fährmann
54ef874ba4
[twitter] fix retweet filter ( #2212 )
3 years ago
Mike Fährmann
cb43f7731b
[twitter] update to GraphQL API ( #2212 )
...
The old REST API endpoints, which Twitter itself has not used since
summer 2021, finally seem to be getting phased out, with
'/2/timeline/profile/USERID.json' being the first one.
Only Twitter's search doesn't have a GraphQL interface yet.
3 years ago
Mike Fährmann
de754590e0
add --source-address command-line option ( closes #2206 )
3 years ago
Mike Fährmann
698f35215e
[blogger] support new image domain ( fixes #2204 )
3 years ago
Mike Fährmann
c587b678d0
[mangadex] re-enable warning for external chapters ( #2193 )
3 years ago
Mike Fährmann
f2e8aedd74
[twitter] changes to 'cards' option
...
- change default value to 'true'
- only invoke youtube-dl for cards unsupported by gallery-dl
when 'cards' is set to "ytdl"
"cards": true --> only download card images
"cards": "ytdl" --> download card images and
use youtube_dl on otherwise unsupported cards
3 years ago
Mike Fährmann
2d34d8ff8b
[reddit] allow downloading from quarantined subreddits ( #2180 )
3 years ago
Mike Fährmann
17c9c47ca0
[hitomi] fix 'tag' extraction ( fixes #2189 )
3 years ago
Mike Fährmann
df2f0c09bb
[twitter] support "image_carousel_website" unified cards
3 years ago
Mike Fährmann
cdc96e1217
[gelbooru] improve video file detection ( fixes #2188 )
...
not all files from 'https://video-cdnN.gelbooru.com' are videos
3 years ago
Mike Fährmann
4acc31bd9f
[newgrounds] set suitabilities filter before starting a search
3 years ago
Mike Fährmann
170711af7e
[mangadex] fix extraction ( closes #2177 )
3 years ago
Mike Fährmann
199e7616a7
[rule34] use https://api.rule34.xxx for API requests
3 years ago
Mike Fährmann
37beb1298e
[newgrounds] add 'search' extractor ( closes #2161 )
3 years ago
Mike Fährmann
8b910dd8ae
[hitomi] fix image URLs
...
again and again ...
3 years ago
Mike Fährmann
3085aac4d8
[gelbooru] handle changed API response format ( #2157 )
3 years ago
Mike Fährmann
38e2af29d6
[hitomi] fix image URLs
...
update '_parse_gg()' yet again
3 years ago
Mike Fährmann
6f2e0c9c3d
fix cookie checks for patreon, fanbox, fantia
...
The changes in 9a255344 caused a warning about missing cookies to be
displayed even if those cookies were present, because _check_cookies()
did not account for an empty cookiedomain.
3 years ago
Mike Fährmann
1e0278702d
[hitomi] update '_parse_gg()'
3 years ago
Mike Fährmann
becc7f85a6
[hitomi] fix image URLs
3 years ago
Mike Fährmann
6af8d71da6
[kemonoparty] use service as subcategory ( closes #2147 )
3 years ago
Vrihub
96fcff182c
generic extractor ( #735 )
...
* Generic extractor, see issue #683
* Fix failed test_names test, no subcategory needed
* Prefix directory_fmt with "generic"
* Relax regex (would break some urls)
* Flake8 compliance
* pattern: don't require a scheme
This fixes a bug when we force the generic extractor on urls without a
scheme (that are allowed by all other extractors).
* Fix using g: and r: on urls without http(s) scheme
Almost all extractors accept urls without an initial http(s) scheme.
Many extractors also allow for generic subdomains in their "pattern"
variable; some of them implement this with the regex character class
"[^.]+" (everything but a dot).
This leads to a problem when the extractor is given a url starting
with g: or r: (to force using the generic or recursive extractor)
and without the http(s) scheme: e.g. with "r:foobar.tumblr.com"
the "r:" is wrongly considered part of the subdomain.
This commit fixes the bug, replacing the too generic "[^.]+" with the
more specific "[\w-]+" (letters, digits and "-", the only characters
allowed in domain names), which is already used by some extractors.
* Relax imageurl_pattern_ext: allow relative urls
* First round of small suggested changes
* Support image urls starting with "//"
* self.baseurl: remove trailing slash
* Relax regexp (didn't catch some image urls)
* Some fixes and cleanup
* Fix domain pattern; option to enable extractor
Fixed the domain section for "pattern", to pass "test_add" and
"test_add_module" tests.
Added the "enabled" configuration option (default False) to enable the
generic extractor. Using "g(eneric):URL" forces using the extractor.
3 years ago
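The subdomain bug fixed in 96fcff182c can be reproduced with two simplified patterns. These are illustrative stand-ins, not the patterns used by any actual extractor.

```python
import re

# Simplified, hypothetical patterns with an optional http(s) scheme.
loose = re.compile(r"(?:https?://)?([^.]+)\.tumblr\.com")
strict = re.compile(r"(?:https?://)?([\w-]+)\.tumblr\.com")

url = "r:foobar.tumblr.com"

# "[^.]+" accepts any character except a dot, so it swallows the
# "r:" prefix and reports it as part of the subdomain.
loose_match = loose.match(url)

# "[\w-]+" rejects ":", so the prefixed URL no longer matches and
# can be handed to the recursive extractor instead.
# Note that \w also accepts "_" in addition to letters and digits.
strict_match = strict.match(url)
```

Here `loose_match.group(1)` is `"r:foobar"`, while `strict_match` is `None`. Both patterns still accept a plain `foobar.tumblr.com`.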
Mike Fährmann
4376b39a2b
[sexcom] fix and improve embed extraction ( fixes #2145 )
3 years ago
Mike Fährmann
6d190834ee
[instagram] fix error when PostPage data is not in GraphQL format
...
(#2037 )
3 years ago
Mike Fährmann
dd67e24aa9
[lolisafe] include file ID in filenames
...
More precisely, it now splits the full 'filename' into 'name' and 'id'
instead of overwriting 'filename'. The format string stays the same as
before. Use '{name}.{extension}' to restore the old behavior.
before:
- filename: foobar
- id : 12345
now:
- filename: foobar-12345
- name : foobar
- id : 12345
3 years ago
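The metadata change in dd67e24aa9 amounts to something like the following sketch. It assumes the ID is the part after the last '-' in the full filename, as in the example above; the helper name is hypothetical.

```python
def split_filename(filename: str) -> dict:
    """Split 'name-id' into separate fields while keeping the full filename."""
    name, sep, file_id = filename.rpartition("-")
    if not sep:
        # No '-' present: treat the whole string as the name.
        name, file_id = filename, ""
    return {"filename": filename, "name": name, "id": file_id}
```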
Mike Fährmann
f3d61de18d
[artstation] create directories per asset ( closes #2136 )
3 years ago
Mike Fährmann
49a50fb2eb
[500px] create directories per photo
3 years ago
Mike Fährmann
89bebe1bef
[500px] add 'favorite' extractor ( closes #1927 )
3 years ago
Mike Fährmann
22b0433985
[fanbox] support pixiv redirects ( closes #2122 )
3 years ago
Mike Fährmann
281828b58b
[tumblrgallery] improve search pagination ( fixes #2132 )
3 years ago
Mike Fährmann
4bec34fc94
[pixiv] allow setting a date range for search results ( #2133 )
...
with the 'scd' and 'ecd' query parameters
3 years ago
Mike Fährmann
882c614281
add album extractor for lolisafe/chibisafe instances
...
- support bunkr.is (closes #2038 )
- support zz.ht (closes #2105 )
3 years ago
Mike Fährmann
d441888bfb
[deviantart] adjust API endpoints
...
Start all endpoints with a forward slash '/'
to be consistent with other API interfaces.
3 years ago
Mike Fährmann
8f0cf0bf71
[deviantart] use '/browse/newest' for most-recent searches
...
(#2096 )
3 years ago
Mike Fährmann
0bd7607da5
[tumblrgallery] improve 'id' extraction ( #2115 )
3 years ago
Mike Fährmann
0d02a7861e
[tumblrgallery] fix extraction ( closes #2112 )
3 years ago
Mike Fährmann
62692c6842
[exhentai] add 'source' option
...
setting it to "hitomi" downloads the corresponding gallery from
hitomi.la; might be extended to other sources in the future
3 years ago
Mike Fährmann
099ed72de7
[hitomi] disable extra 'metadata' by default
...
saves one HTTP request that is not needed with the default filename settings
3 years ago
Mike Fährmann
9a25534490
use Extractor._check_cookies() for all cookie checks
3 years ago
Mike Fährmann
63c6bc26b5
[rule34us] extract tags per category ( #1527 )
...
like for other boorus with 'tags': true
3 years ago
Mike Fährmann
f587458a3c
[twitter] include '4096x4096' as a default image fallback
...
(closes #2107 , closes #1881 )
3 years ago
Mike Fährmann
8ed282f7f2
[kemonoparty] support coomer.party URLs ( #2100 )
3 years ago
Mike Fährmann
87ce3fa669
[furaffinity] warn when no session cookies were found
3 years ago
Mike Fährmann
159631c808
[philomena] use a default 'filter_id' if none is given
3 years ago
Mike Fährmann
ad30653b17
allow running a BaseExtractor for any URL
...
by prefixing it with '<base-category>:'
For example:
shopify:https://partakefoods.com/products/crunchy-cookie-variety-pack
gelbooru_v01:https://5naf.booru.org/index.php?page=post&s=view&id=46963
Available base categories are:
mastodon, shopify, moebooru, gelbooru_v01, gelbooru_v02,
reactor, foolslide, foolfuuka, philomena
3 years ago
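One way to picture the '<base-category>:' prefix from ad30653b17 is the sketch below. The category names come from the commit message; the splitting function is hypothetical and not how gallery-dl necessarily implements the lookup.

```python
BASE_CATEGORIES = {
    "mastodon", "shopify", "moebooru", "gelbooru_v01", "gelbooru_v02",
    "reactor", "foolslide", "foolfuuka", "philomena",
}


def split_base_category(url: str):
    """Return (base_category, url) if the URL carries a known prefix."""
    prefix, sep, rest = url.partition(":")
    if sep and prefix in BASE_CATEGORIES:
        return prefix, rest
    # Ordinary URLs ("https:...") are passed through unchanged.
    return None, url
```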
Mike Fährmann
299bd2f1f5
[rule34us] add 'tag' and 'post' extractors ( #1527 )
3 years ago
Mike Fährmann
3cf1075d86
[inkbunny] add 'search' extractor ( closes #2094 )
3 years ago
Mike Fährmann
c6a23c26d7
[instagram] allow downloading specific stories ( closes #2088 )
...
https://instagram.com/stories/<USER>/<ID> now only downloads the one
story specified by <ID> and not all stories from that user.
3 years ago
Mike Fährmann
352ffcddb0
[instagram] match post URLs with usernames ( fixes #2085 )
3 years ago
Mike Fährmann
f4e3cee6ac
use yt-dlp by default ( #1850 , #2028 )
3 years ago
Mike Fährmann
f1b142e993
[kemonoparty] change default 'files' order to attachments,file,inline
...
(#1991 )
3 years ago
Mike Fährmann
275543b2d2
update extractor test results
3 years ago
Mike Fährmann
e7ea4f2567
[mangoxo] fix metadata extraction
3 years ago
Mike Fährmann
e298882acc
[kemonoparty] match URLs with www subdomain
3 years ago
Mike Fährmann
addb72e1bb
[reactor] support thatpervert.com ( closes #2029 )
3 years ago
Mike Fährmann
d8d9502e1e
[reactor] inherit from BaseExtractor
3 years ago
Mike Fährmann
f4ea216c95
[shopify] support loungeunderwear.com ( closes #2053 )
3 years ago
Mike Fährmann
93cef78450
[gelbooru] workaround pagination limits
...
Gelbooru only allows retrieving the latest 20k posts for a tag search.
Add 'id:<N' to the search tags to work around that limitation, where N
is the ID of the last retrieved post.
http://gelbooru.me/index.php?page=forum&s=view&id=1467
3 years ago
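The workaround from 93cef78450 boils down to rewriting the search tags once the limit is reached. A rough sketch, with a hypothetical helper name and tags assumed to be separated by spaces:

```python
def next_search_tags(tags: str, last_id: int) -> str:
    """Restrict a tag search to posts older than the last retrieved one."""
    # Drop any 'id:<N' constraint left over from a previous round.
    kept = [tag for tag in tags.split() if not tag.startswith("id:<")]
    kept.append(f"id:<{last_id}")
    return " ".join(kept)
```

Calling this with the ID of the last retrieved post starts a fresh search window below that ID, so each window stays within the 20k limit.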
Mike Fährmann
f2ae179713
[exhentai] fix extraction for disowned galleries ( closes #2055 )
3 years ago
Alice
612850438e
[skeb] add 'thumbnails' option ( #2047 ) ( #2051 )
3 years ago
Mike Fährmann
11a3d96d13
[mangadex] load additional metadata using includes[] directives
...
- always provide 'artist', 'author', and 'group' metadata fields (#2049 )
- remove 'metadata' option
3 years ago
Mike Fährmann
19e00f1322
[dynastyscans] provide 'date' as proper datetime object ( #2050 )
3 years ago
Mike Fährmann
af6424f398
allow testing metadata in list elements
3 years ago
Mike Fährmann
c67756e187
[kemonoparty] add 'dms' option ( #2008 )
3 years ago
Mike Fährmann
3a7a19c7b9
[dynastyscans] add 'manga' extractor ( closes #2035 )
3 years ago
Mike Fährmann
9bc83af3a6
[kemonoparty] 'postfile' -> 'file' ( #1991 )
...
to stay consistent with the existing file types for kemono
3 years ago
Mike Fährmann
522782c09d
[subscribestar] emit metadata for posts without media ( #1569 )
3 years ago
Mike Fährmann
1c8aaf9318
[subscribestar] add 'num' enumeration index ( closes #2040 )
3 years ago
Mike Fährmann
d433735750
[kemonoparty] skip duplicate files ( #2032 , #1991 , #1899 )
...
Extract the SHA-256 file hash from URLs
and skip files with the same hash in the same post.
- provide a 'hash' metadata field (empty string if not available)
- remove 'patreon-skip-file' option
3 years ago
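The duplicate check from d433735750 could look roughly like this. It assumes the hash appears in the URL as a run of 64 lowercase hex digits; the helper names and the exact URL layout are assumptions, not taken from the extractor.

```python
import re

# A SHA-256 digest is 64 hexadecimal characters.
_SHA256 = re.compile(r"[0-9a-f]{64}")


def file_hash(url: str) -> str:
    """Return the hash found in the URL, or an empty string."""
    match = _SHA256.search(url)
    return match.group(0) if match else ""


def unique_files(urls):
    """Yield (url, hash) pairs, skipping repeated hashes within one post."""
    seen = set()
    for url in urls:
        digest = file_hash(url)
        if digest:
            if digest in seen:
                continue
            seen.add(digest)
        # Files without a recognizable hash are always kept.
        yield url, digest
```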
Mike Fährmann
d4ec245554
[kemonoparty] implement a 'files' option ( #1991 )
...
similar to 8d676151
3 years ago
Mike Fährmann
ab8eea1a24
[twitter] fix extractor for direct image links ( fixes #2030 )
3 years ago
Mike Fährmann
2076d40681
[ytdl] improve error handling ( #1680 )
3 years ago
Mike Fährmann
2aaac3c997
[instagram] include user metadata for 'tagged' downloads ( #2024 )
...
Adds
- tagged_owner_id
- tagged_full_name
- tagged_username
containing the values for the user profile the URL originated from,
e.g. 'instagram' for https://www.instagram.com/instagram/tagged/ .
3 years ago
Mike Fährmann
cfa4876848
[philomena] support furbooru.org ( closes #1995 )
3 years ago
Mike Fährmann
4377f1c284
[twitter] distinguish between fatal & nonfatal errors ( #2020 )
...
only show a warning for nonfatal errors
and do not raise a StopExtraction exception
3 years ago
Kyle Anthony Williams
a14b72be21
[webtoons] Use swebtoon-phinf.pstatic.net instead of webtoon-phinf.pstatic.net ( #2005 )
...
This trick to avoid having to set a Referer header comes from
Webtoon's RSS feeds. The two URLs below are equivalent in content:
https://webtoon-phinf.pstatic.net/20210929_153/1632867980912DmcGK_JPEG/16328679808882705182.jpg?type=q90
https://swebtoon-phinf.pstatic.net/20210929_153/1632867980912DmcGK_JPEG/16328679808882705182.jpg?type=q90
The URL with the domain "webtoon-phinf.pstatic.net" needs a Referer
header, and the domain "swebtoon-phinf.pstatic.net" does not. This
is because "swebtoon" images are served for environments where Webtoon
has no control over request headers, such as RSS readers like Feedly.
This change should make it easier for gallery-dl developers to embed
Webtoon comics without worrying about headers.
3 years ago
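The domain swap from a14b72be21 can be illustrated with a plain string replacement. The function name is hypothetical; the two hostnames are the ones given in the commit message.

```python
def referer_free_url(url: str) -> str:
    """Rewrite an image URL to the host that needs no Referer header."""
    # Including '://' in the search string keeps URLs that already
    # use the 'swebtoon' host unchanged.
    return url.replace("://webtoon-phinf.", "://swebtoon-phinf.", 1)
```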
Mike Fährmann
6e3658ef52
[kemonoparty] provide 'date' metadata for gumroad ( #2007 )
...
Not the 'published' or 'edited' values since they are 'null',
but still better than nothing at all.
3 years ago
Mike Fährmann
37c9dedee1
[seisoparty] remove module
3 years ago
Mike Fährmann
efa178cc91
[ytdl] implement parsing ytdl command-line options ( #1680 )
...
- adds 'config-file' and 'cmdline-args' options
for both ytdl downloader and extractor
- create 'ytdl' helper module, which combines YoutubeDL creation
and option parsing.
- most likely a buggy mess due to incompatibilities between the
original youtube-dl and yt-dlp.
3 years ago
Mike Fährmann
7cb303d745
[redgifs] improve URL extraction
...
Fields inside 'urls' can be None, which would have caused an exception
with the old method.
3 years ago
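The problem named in 7cb303d745 is that a field inside 'urls' may be None. A defensive way to collect usable URLs, sketched with hypothetical format names:

```python
def pick_urls(urls: dict, formats=("hd", "sd", "gif")) -> list:
    """Collect the URLs for the wanted formats, skipping missing ones."""
    # dict.get() returns None for absent keys, and fields that are
    # present but None are filtered out by the same truth test.
    return [urls[fmt] for fmt in formats if urls.get(fmt)]
```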
Mike Fährmann
2befed1a96
[redgifs] update search URL pattern ( #1984 )
3 years ago
Mike Fährmann
b315a0ecef
[redgifs] update to API v2 ( #1984 )
3 years ago
Mike Fährmann
f0fc3b0ba1
[kemonoparty] add 'comments' option ( #1980 )
3 years ago
Mike Fährmann
1fac74b14d
[reddit] prevent crash for galleries with no 'media_metadata'
...
(fixes #2001 )
3 years ago
Mike Fährmann
211de95dd0
update extractor test results
3 years ago
Mike Fährmann
8bea02c38c
[deviantart] fix 'index' values for stashed deviations
3 years ago
Mike Fährmann
dd88a7d980
[cyberdrop] restore video extraction ( fixes #1993 )
...
fixes a regression introduced in f33c2ef7
3 years ago
Mike Fährmann
fa5646eadc
[mangoxo] fix login and extraction
3 years ago
Mike Fährmann
4c49174579
[mangakakalot] update domain and fix extraction
3 years ago
YongChan Cho
14852f7050
[hitomi] fix image path ( #1988 )
3 years ago
Mike Fährmann
dad2875a3e
fix calculating retry sleep times ( fixes #1990 )
3 years ago
Mike Fährmann
9156e90f1f
[twitter] add 'pinned' option
3 years ago
Mike Fährmann
06b414c9a3
[redgifs] 'gfyId' -> 'id' ( #1984 )
3 years ago