Mike Fährmann
170711af7e
[mangadex] fix extraction ( closes #2177 )
3 years ago
Mike Fährmann
199e7616a7
[rule34] use https://api.rule34.xxx for API requests
3 years ago
Mike Fährmann
6e0a6c484f
apply SPECIAL_EXTRACTORS only for blacklist settings
...
as was the case before 010d65dc
3 years ago
Mike Fährmann
37beb1298e
[newgrounds] add 'search' extractor ( closes #2161 )
3 years ago
Mike Fährmann
8b910dd8ae
[hitomi] fix image URLs
...
again and again ...
3 years ago
Mike Fährmann
dcfe08838d
restore -d/--dest functionality
...
change short option for --directory from -d to -D
3 years ago
Mike Fährmann
3085aac4d8
[gelbooru] handle changed API response format ( #2157 )
3 years ago
Mike Fährmann
38e2af29d6
[hitomi] fix image URLs
...
update '_parse_gg()' yet again
3 years ago
Mike Fährmann
6f2e0c9c3d
fix cookie checks for patreon, fanbox, fantia
...
The changes in 9a255344 caused a warning about missing cookies to be
displayed even if those cookies were present, because _check_cookies()
did not account for an empty cookiedomain.
3 years ago
Mike Fährmann
1e0278702d
[hitomi] update '_parse_gg()'
3 years ago
Mike Fährmann
3b7c7daa76
improve UNC path handling ( #2126 )
...
always call 'abspath()' on the directory path to handle cases when the
current working directory is UNC and 'base-directory' is relative.
3 years ago
Mike Fährmann
47eae4c393
release version 1.20.0
3 years ago
Mike Fährmann
becc7f85a6
[hitomi] fix image URLs
3 years ago
Mike Fährmann
6af8d71da6
[kemonoparty] use service as subcategory ( closes #2147 )
3 years ago
Vrihub
96fcff182c
generic extractor ( #735 )
...
* Generic extractor, see issue #683
* Fix failed test_names test, no subcategory needed
* Prefix directory_fmt with "generic"
* Relax regex (would break some urls)
* Flake8 compliance
* pattern: don't require a scheme
This fixes a bug when we force the generic extractor on urls without a
scheme (that are allowed by all other extractors).
* Fix using g: and r: on urls without http(s) scheme
Almost all extractors accept urls without an initial http(s) scheme.
Many extractors also allow for generic subdomains in their "pattern"
variable; some of them implement this with the regex character class
"[^.]+" (everything but a dot).
This leads to a problem when the extractor is given a url starting
with g: or r: (to force using the generic or recursive extractor)
and without the http(s) scheme: e.g. with "r:foobar.tumblr.com"
the "r:" is wrongly considered part of the subdomain.
This commit fixes the bug, replacing the too generic "[^.]+" with the
more specific "[\w-]+" (letters, digits and "-", the only characters
allowed in domain names), which is already used by some extractors.
* Relax imageurl_pattern_ext: allow relative urls
* First round of small suggested changes
* Support image urls starting with "//"
* self.baseurl: remove trailing slash
* Relax regexp (didn't catch some image urls)
* Some fixes and cleanup
* Fix domain pattern; option to enable extractor
Fixed the domain section for "pattern", to pass "test_add" and
"test_add_module" tests.
Added the "enabled" configuration option (default False) to enable the
generic extractor. Using "g(eneric):URL" forces using the extractor.
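The subdomain bug described above can be reproduced with a minimal sketch. The two patterns below are simplified, hypothetical stand-ins for an extractor's "pattern" variable, not the expressions actually used by any extractor.

```python
import re

# Hypothetical, simplified patterns for illustration only.
# Both allow an optional http(s) scheme followed by a subdomain.
loose = re.compile(r"(?:https?://)?([^.]+)\.tumblr\.com")
strict = re.compile(r"(?:https?://)?([\w-]+)\.tumblr\.com")

url = "r:foobar.tumblr.com"

# "[^.]+" accepts any character except a dot, so "r:" becomes
# part of the captured subdomain.
loose_match = loose.match(url)
print(loose_match.group(1))  # r:foobar

# "[\w-]+" cannot consume ":", so the prefixed URL is rejected
# and can be handled by the recursive extractor instead.
strict_match = strict.match(url)
print(strict_match)  # None

# A plain URL without a scheme still matches the stricter pattern.
print(strict.match("foobar.tumblr.com").group(1))  # foobar
```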
3 years ago
Mike Fährmann
4376b39a2b
[sexcom] fix and improve embed extraction ( fixes #2145 )
3 years ago
Mike Fährmann
6d190834ee
[instagram] fix error when PostPage data is not in GraphQL format
...
(#2037 )
3 years ago
Mike Fährmann
4edf43891c
add -d/--directory and -f/--filename command-line arguments
3 years ago
Mike Fährmann
dd67e24aa9
[lolisafe] include file ID in filenames
...
More precisely, it now splits the full 'filename' into 'name' and 'id'
instead of overwriting 'filename'. The format string stays the same as
before. Use '{name}.{extension}' to restore the old behavior.
before:
- filename: foobar
- id : 12345
now:
- filename: foobar-12345
- name : foobar
- id : 12345
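A minimal sketch of the split described above, assuming the ID is whatever follows the last hyphen in the full filename. The helper name split_filename() is hypothetical, and Python's built-in str.format_map() stands in for the real format string handling.

```python
def split_filename(filename):
    """Split 'foobar-12345' into ('foobar', '12345').

    Assumption: the file ID is the part after the last hyphen.
    Filenames without a hyphen are returned with an empty ID.
    """
    name, sep, file_id = filename.rpartition("-")
    if not sep:
        return filename, ""
    return name, file_id


meta = {"filename": "foobar-12345", "extension": "jpg"}
meta["name"], meta["id"] = split_filename(meta["filename"])

# 'filename' now keeps the ID, 'name' gives the old behavior
new_style = "{filename}.{extension}".format_map(meta)
old_style = "{name}.{extension}".format_map(meta)
print(new_style)  # foobar-12345.jpg
print(old_style)  # foobar.jpg
```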
3 years ago
Mike Fährmann
f3d61de18d
[artstation] create directories per asset ( closes #2136 )
3 years ago
Mike Fährmann
49a50fb2eb
[500px] create directories per photo
3 years ago
Mike Fährmann
89bebe1bef
[500px] add 'favorite' extractor ( closes #1927 )
3 years ago
Mike Fährmann
22b0433985
[fanbox] support pixiv redirects ( closes #2122 )
3 years ago
Mike Fährmann
281828b58b
[tumblrgallery] improve search pagination ( fixes #2132 )
3 years ago
Mike Fährmann
9b67e63a89
[ytdl] update to latest yt-dlp changes ( fixes #2124 )
3 years ago
Mike Fährmann
4bec34fc94
[pixiv] allow setting a date range for search results ( #2133 )
...
with the 'scd' and 'ecd' query parameters
3 years ago
Mike Fährmann
882c614281
add album extractor for lolisafe/chibisafe instances
...
- support bunkr.is (closes #2038 )
- support zz.ht (closes #2105 )
3 years ago
Mike Fährmann
7bf1d3fd32
rename --write-infojson to --write-info-json
...
to be consistent with the name used in youtube-dl/yt-dlp
(the old --write-infojson still works)
3 years ago
Mike Fährmann
d441888bfb
[deviantart] adjust API endpoints
...
Start all endpoints with a forward slash '/'
to be consistent with other API interfaces.
3 years ago
Mike Fährmann
8f0cf0bf71
[deviantart] use '/browse/newest' for most-recent searches
...
(#2096 )
3 years ago
Mike Fährmann
0bd7607da5
[tumblrgallery] improve 'id' extraction ( #2115 )
3 years ago
Mike Fährmann
ac80474371
handle UNC paths ( #2113 )
3 years ago
Mike Fährmann
47df50a2ad
add --sleep-request and --sleep-extractor command-line options
3 years ago
Mike Fährmann
64cf26eaf4
allow specifying sleep-* options as string
...
either as single value or as range: "3.5", "2.1 - 5.0"
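One way such string values could be interpreted is sketched below. parse_duration() is a hypothetical helper, not necessarily how the option is parsed internally, and it assumes non-negative values so that "-" can serve as the range separator.

```python
import random


def parse_duration(value):
    """Turn "3.5" or "2.1 - 5.0" into a function returning seconds.

    Assumption: values are non-negative, so "-" only ever
    separates the lower and upper bound of a range.
    """
    parts = [float(part) for part in str(value).split("-")]
    if len(parts) == 1:
        return lambda: parts[0]
    low, high = parts
    # pick a random duration within the given range on each call
    return lambda: random.uniform(low, high)


fixed = parse_duration("3.5")
ranged = parse_duration("2.1 - 5.0")
print(fixed())   # 3.5
print(ranged())  # some value between 2.1 and 5.0
```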
3 years ago
Mike Fährmann
0d02a7861e
[tumblrgallery] fix extraction ( closes #2112 )
3 years ago
Mike Fährmann
62692c6842
[exhentai] add 'source' option
...
setting it to "hitomi" downloads the corresponding gallery from
hitomi.la; might be extended to other sources in the future
3 years ago
Mike Fährmann
099ed72de7
[hitomi] disable extra 'metadata' by default
...
saves one HTTP request that is not needed with default filename settings
3 years ago
Mike Fährmann
9a25534490
use Extractor._check_cookies() for all cookie checks
3 years ago
Mike Fährmann
63c6bc26b5
[rule34us] extract tags per category ( #1527 )
...
like for other boorus with 'tags': true
3 years ago
Mike Fährmann
f587458a3c
[twitter] include '4096x4096' as a default image fallback
...
(closes #2107 , closes #1881 )
3 years ago
Mike Fährmann
8ed282f7f2
[kemonoparty] support coomer.party URLs ( #2100 )
3 years ago
Mike Fährmann
87ce3fa669
[furaffinity] warn when no session cookies were found
3 years ago
Mike Fährmann
159631c808
[philomena] use a default 'filter_id' if none is given
3 years ago
Mike Fährmann
ad30653b17
allow running a BaseExtractor for any URL
...
by prefixing it with '<base-category>:'
For example:
shopify:https://partakefoods.com/products/crunchy-cookie-variety-pack
gelbooru_v01:https://5naf.booru.org/index.php?page=post&s=view&id=46963
Available base categories are:
mastodon, shopify, moebooru, gelbooru_v01, gelbooru_v02,
reactor, foolslide, foolfuuka, philomena
3 years ago
Mike Fährmann
299bd2f1f5
[rule34us] add 'tag' and 'post' extractors ( #1527 )
3 years ago
Mike Fährmann
3cf1075d86
[inkbunny] add 'search' extractor ( closes #2094 )
3 years ago
Mike Fährmann
c6a23c26d7
[instagram] allow downloading specific stories ( closes #2088 )
...
https://instagram.com/stories/<USER>/<ID> now only downloads the one
story specified by <ID> and not all stories from that user.
3 years ago
Mike Fährmann
352ffcddb0
[instagram] match post URLs with usernames ( fixes #2085 )
3 years ago
Mike Fährmann
45ca1693d8
add indicator to debug output when using a standalone executable
3 years ago
Mike Fährmann
f4e3cee6ac
use yt-dlp by default ( #1850 , #2028 )
3 years ago
Mike Fährmann
f1b142e993
[kemonoparty] change default 'files' order to attachments,file,inline
...
(#1991 )
3 years ago
Mike Fährmann
0054ad434e
[output] write directly to sys.stdout
3 years ago
Mike Fährmann
da14b3fe9f
[output] write download progress indicator to stderr
3 years ago
Mike Fährmann
604d5b8bb2
release version 1.19.3
3 years ago
Mike Fährmann
275543b2d2
update extractor test results
3 years ago
Mike Fährmann
e7ea4f2567
[mangoxo] fix metadata extraction
3 years ago
Mike Fährmann
e298882acc
[kemonoparty] match URLs with www subdomain
3 years ago
Mike Fährmann
addb72e1bb
[reactor] support thatpervert.com ( closes #2029 )
3 years ago
Mike Fährmann
d8d9502e1e
[reactor] inherit from BaseExtractor
3 years ago
Mike Fährmann
f4ea216c95
[shopify] support loungeunderwear.com ( closes #2053 )
3 years ago
Mike Fährmann
93cef78450
[gelbooru] workaround pagination limits
...
Gelbooru only allows retrieving the latest 20k posts for a tag search.
Add 'id:<N' to the search tags to work around that limitation, where N
is the ID of the last retrieved post.
http://gelbooru.me/index.php?page=forum&s=view&id=1467
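The workaround can be sketched with an in-memory stand-in for the API. fetch_page() and iterate_posts() are hypothetical names, and this simplified version appends 'id:<N' after every page rather than only once the limit is reached.

```python
def fetch_page(posts, tags, limit):
    """Stand-in for an API request: newest posts first, honoring 'id:<N'."""
    max_id = None
    for tag in tags.split():
        if tag.startswith("id:<"):
            max_id = int(tag[4:])
    matching = [p for p in posts if max_id is None or p["id"] < max_id]
    return matching[:limit]


def iterate_posts(posts, tags, limit=3):
    """Yield all posts by restarting the search below the last seen ID."""
    query = tags
    while True:
        page = fetch_page(posts, query, limit)
        if not page:
            return
        yield from page
        # N is the ID of the last retrieved post
        query = "{} id:<{}".format(tags, page[-1]["id"])


all_posts = [{"id": i} for i in range(10, 0, -1)]
ids = [post["id"] for post in iterate_posts(all_posts, "landscape")]
print(ids)  # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
```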
3 years ago
Mike Fährmann
f2ae179713
[exhentai] fix extraction for disowned galleries ( closes #2055 )
3 years ago
Alice
612850438e
[skeb] add 'thumbnails' option ( #2047 ) ( #2051 )
3 years ago
Mike Fährmann
010d65dcec
extend blacklist/whitelist syntax ( #2025 )
...
Each entry in such a list can now also include a subcategory
'<category>:<subcategory>'
and it is possible to use '*' or an empty string as placeholder
'*:<subcategory>', ':<subcategory>', '<category>:*'
For example
"blacklist": "imgur,*:tag,gfycat:user" or
"blacklist": ["imgur", "*:tag", "gfycat:user"]
will filter all 'imgur' extractors, all extractors with a 'tag'
subcategory (e.g. https://danbooru.donmai.us/posts?tags=bonocho ),
and all 'gfycat' user extractors.
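The matching rules above could be modeled roughly as follows. parse_filter() and is_filtered() are hypothetical helpers written for illustration, not the actual implementation.

```python
def parse_filter(entries):
    """Parse a blacklist/whitelist value into (category, subcategory) rules.

    '*' and the empty string both act as placeholders and are stored
    as None, meaning "matches anything".
    """
    if isinstance(entries, str):
        entries = entries.split(",")
    rules = []
    for entry in entries:
        category, _, subcategory = entry.strip().partition(":")
        rules.append((
            None if category in ("", "*") else category,
            None if subcategory in ("", "*") else subcategory,
        ))
    return rules


def is_filtered(rules, category, subcategory):
    """Return True if any rule matches the given extractor."""
    return any(
        (cat is None or cat == category) and
        (sub is None or sub == subcategory)
        for cat, sub in rules
    )


rules = parse_filter("imgur,*:tag,gfycat:user")
print(is_filtered(rules, "imgur", "album"))        # True
print(is_filtered(rules, "danbooru", "tag"))       # True
print(is_filtered(rules, "gfycat", "user"))        # True
print(is_filtered(rules, "gfycat", "collection"))  # False
```

The list form, ["imgur", "*:tag", "gfycat:user"], produces the same rules in this sketch.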
3 years ago
Mike Fährmann
11a3d96d13
[mangadex] load additional metadata using includes[] directives
...
- always provide 'artist', 'author', and 'group' metadata fields (#2049 )
- remove 'metadata' option
3 years ago
Mike Fährmann
19e00f1322
[dynastyscans] provide 'date' as proper datetime object ( #2050 )
3 years ago
Mike Fährmann
af6424f398
allow testing metadata in list elements
3 years ago
Mike Fährmann
c67756e187
[kemonoparty] add 'dms' option ( #2008 )
3 years ago
Mike Fährmann
3842cdcd8f
[formatter] implement 'D' format specifier
...
To be able to parse any string into a 'datetime' object
and format it as necessary.
Example:
{created_at:D%Y-%m-%dT%H:%M:%S%z}
->
"2010-01-01 00:00:00"
{created_at:D%Y-%m-%dT%H:%M:%S%z/%b %d %Y %I:%M %p}
->
"Jan 01 2010 12:00 AM"
with 'created_at' == "2010-01-01T01:00:00+0100"
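The two results above can be reproduced with the standard library. format_date() is a hypothetical helper; the conversion to a naive UTC datetime is an assumption inferred from the example output (01:00 at +0100 becoming 00:00), and the month and AM/PM names assume the default C locale.

```python
from datetime import datetime, timezone


def format_date(value, spec):
    """Apply a 'D<input-format>[/<output-format>]' specifier to a string."""
    in_fmt, _, out_fmt = spec[1:].partition("/")
    dt = datetime.strptime(value, in_fmt)
    if dt.tzinfo is not None:
        # Assumption: timezone-aware values are normalized to naive UTC
        dt = dt.astimezone(timezone.utc).replace(tzinfo=None)
    return dt.strftime(out_fmt) if out_fmt else str(dt)


created_at = "2010-01-01T01:00:00+0100"

plain = format_date(created_at, "D%Y-%m-%dT%H:%M:%S%z")
custom = format_date(created_at, "D%Y-%m-%dT%H:%M:%S%z/%b %d %Y %I:%M %p")
print(plain)   # 2010-01-01 00:00:00
print(custom)  # Jan 01 2010 12:00 AM
```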
3 years ago
Mike Fährmann
3a7a19c7b9
[dynastyscans] add 'manga' extractor ( closes #2035 )
3 years ago
Mike Fährmann
9bc83af3a6
[kemonoparty] 'postfile' -> 'file' ( #1991 )
...
to stay consistent with the existing file types for kemono
3 years ago
Mike Fährmann
522782c09d
[subscribestar] emit metadata for posts without media ( #1569 )
3 years ago
Mike Fährmann
1c8aaf9318
[subscribestar] add 'num' enumeration index ( closes #2040 )
3 years ago
Mike Fährmann
d433735750
[kemonoparty] skip duplicate files ( #2032 , #1991 , #1899 )
...
Extract the SHA-256 file hash from URLs
and skip files with the same hash in the same post.
- provide a 'hash' metadata field (empty string if not available)
- remove 'patreon-skip-file' option
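A rough sketch of hash-based duplicate detection within one post. It assumes the SHA-256 hash appears in the URL as a run of 64 lowercase hex digits; the exact URL layout is not given here, and unique_files() is a hypothetical helper.

```python
import re

# Assumption: a SHA-256 hash shows up as 64 lowercase hex digits in the URL
HASH_RE = re.compile(r"[0-9a-f]{64}")


def unique_files(urls):
    """Yield one entry per distinct file hash, keeping hashless URLs."""
    seen = set()
    for url in urls:
        match = HASH_RE.search(url)
        file_hash = match.group(0) if match else ""
        if file_hash:
            if file_hash in seen:
                continue  # same hash already seen in this post
            seen.add(file_hash)
        # 'hash' is an empty string if no hash is available
        yield {"url": url, "hash": file_hash}


urls = [
    "https://example.org/data/" + "a" * 64 + ".jpg",
    "https://example.org/other/" + "a" * 64 + ".jpg",  # duplicate
    "https://example.org/data/" + "b" * 64 + ".png",
    "https://example.org/thumb.png",                   # no hash
]
files = list(unique_files(urls))
print(len(files))  # 3
```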
3 years ago
Mike Fährmann
d4ec245554
[kemonoparty] implement a 'files' option ( #1991 )
...
similar to 8d676151
3 years ago
Mike Fährmann
ab8eea1a24
[twitter] fix extractor for direct image links ( fixes #2030 )
3 years ago
Mike Fährmann
2076d40681
[ytdl] improve error handling ( #1680 )
3 years ago
Mike Fährmann
8eaedb0bd3
[ytdl] fix some compatibility issues ( #1680 )
3 years ago
Mike Fährmann
2aaac3c997
[instagram] include user metadata for 'tagged' downloads ( #2024 )
...
Adds
- tagged_owner_id
- tagged_full_name
- tagged_username
containing the values for the user profile the URL originated from,
e.g. 'instagram' for https://www.instagram.com/instagram/tagged/ .
3 years ago
Mike Fährmann
cfa4876848
[philomena] support furbooru.org ( closes #1995 )
3 years ago
Mike Fährmann
4377f1c284
[twitter] distinguish between fatal & nonfatal errors ( #2020 )
...
only show a warning for nonfatal errors
and do not raise a StopExtraction exception
3 years ago
Mike Fährmann
19403a7fff
[downloader:ytdl] prevent crash in '_progress_hook()' ( #1680 )
...
'speed' is not guaranteed to be defined or convertible to 'int'
3 years ago
Mike Fährmann
01b28f3674
[ytdl] fix syntax for Python 3.4
3 years ago
Kyle Anthony Williams
a14b72be21
[webtoons] Use swebtoon-phinf.pstatic.net instead of webtoon-phinf.pstatic.net ( #2005 )
...
* [webtoons] Use swebtoon-phinf.pstatic.net instead of webtoon-phinf.pstatic.net
This trick to avoid having to set a Referer header comes from
Webtoon's RSS feeds. The two URLs below are equivalent in content:
https://webtoon-phinf.pstatic.net/20210929_153/1632867980912DmcGK_JPEG/16328679808882705182.jpg?type=q90
https://swebtoon-phinf.pstatic.net/20210929_153/1632867980912DmcGK_JPEG/16328679808882705182.jpg?type=q90
The URL with the domain "webtoon-phinf.pstatic.net" needs a Referer
header, and the domain "swebtoon-phinf.pstatic.net" does not. This
is because of the environment "swebtoon" images live in, one without
explicit network control: RSS feeds on sites such as Feedly. This change should
make it easier for gallery-dl developers to embed Webtoon comics without
worrying about headers.
3 years ago
Mike Fährmann
6e3658ef52
[kemonoparty] provide 'date' metadata for gumroad ( #2007 )
...
Not the 'published' or 'edited' values since they are 'null',
but still better than nothing at all.
3 years ago
Mike Fährmann
f4d201f626
[ytdl] fix syntax for Python 3.4
3 years ago
Mike Fährmann
37c9dedee1
[seisoparty] remove module
3 years ago
Mike Fährmann
efa178cc91
[ytdl] implement parsing ytdl command-line options ( #1680 )
...
- adds 'config-file' and 'cmdline-args' options
for both ytdl downloader and extractor
- create 'ytdl' helper module, which combines YoutubeDL creation
and option parsing.
- most likely a buggy mess due to incompatibilities between the
original youtube-dl and yt-dlp.
3 years ago
Mike Fährmann
a881305357
release version 1.19.2
3 years ago
Mike Fährmann
7cb303d745
[redgifs] improve URL extraction
...
Fields inside 'urls' can be None, which would have caused an exception
with the old method.
3 years ago
Mike Fährmann
2befed1a96
[redgifs] update search URL pattern ( #1984 )
3 years ago
Mike Fährmann
b315a0ecef
[redgifs] update to API v2 ( #1984 )
3 years ago
Mike Fährmann
f0fc3b0ba1
[kemonoparty] add 'comments' option ( #1980 )
3 years ago
Mike Fährmann
1fac74b14d
[reddit] prevent crash for galleries with no 'media_metadata'
...
(fixes #2001 )
3 years ago
Mike Fährmann
211de95dd0
update extractor test results
3 years ago
Mike Fährmann
8bea02c38c
[deviantart] fix 'index' values for stashed deviations
3 years ago
Mike Fährmann
dd88a7d980
[cyberdrop] restore video extraction ( fixes #1993 )
...
fixes a regression introduced in f33c2ef7
3 years ago
Mike Fährmann
fa5646eadc
[mangoxo] fix login and extraction
3 years ago
Mike Fährmann
4c49174579
[mangakakalot] update domain and fix extraction
3 years ago
YongChan Cho
14852f7050
[hitomi] fix image path ( #1988 )
3 years ago
Mike Fährmann
46e17c5e61
support accessing the current local datetime in format strings
...
{_now}, {_now:%Y-%m-%d}, etc
(#1968 )
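The same replacement-field syntax works with Python's built-in string formatting, which is used here as a stand-in for the project's own formatter. A fixed datetime replaces the current local time so that the output is reproducible.

```python
from datetime import datetime

# Stand-in for the current local time; a real '_now' value
# would come from datetime.now()
fields = {"_now": datetime(2021, 11, 1, 12, 30, 0)}

default = "{_now}".format_map(fields)
dated = "{_now:%Y-%m-%d}".format_map(fields)
print(default)  # 2021-11-01 12:30:00
print(dated)    # 2021-11-01
```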
3 years ago
Mike Fährmann
dad2875a3e
fix calculating retry sleep times ( fixes #1990 )
3 years ago
Mike Fährmann
9156e90f1f
[twitter] add 'pinned' option
3 years ago
Mike Fährmann
06b414c9a3
[redgifs] 'gfyId' -> 'id' ( #1984 )
3 years ago
Ryu juheon
d4614e5ba4
[hitomi] fix image URLs ( #1982 )
3 years ago
Mike Fährmann
6434ccf9e8
[redgifs] split from 'gfycat' ( #1984 )
...
Update API endpoints and metadata names - mostly 'gfycat' -> 'gif' -
and remove some obsolete checks.
3 years ago
Mike Fährmann
38193dba46
support accessing environment variables in format strings ( #1968 )
...
{_env[HOME]} to get the value of $HOME
every other format string feature is supported as well
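For illustration, Python's built-in formatting resolves the same '{_env[HOME]}' field when '_env' is a mapping. The dictionary below is a stand-in for os.environ, and str.format_map() is a stand-in for the project's own formatter.

```python
# Stand-in for os.environ so the result does not depend on the machine
env = {"HOME": "/home/user"}
fields = {"_env": env, "category": "example"}

# '[HOME]' performs a key lookup on the '_env' mapping
home = "{_env[HOME]}".format_map(fields)
path = "{_env[HOME]}/downloads/{category}".format_map(fields)
print(home)  # /home/user
print(path)  # /home/user/downloads/example
```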
3 years ago
Mike Fährmann
e4696b40ba
[instagram] update query hashes
3 years ago
Alice
bfd7401b1e
[skeb] add 'user' and 'post' extractors ( #1031 ) ( #1971 )
...
* Create skeb.py
* Update __init__.py
* Update supportedsites.py
* Update supportedsites.md
* Update supportedsites.py
* Update skeb.py
3 years ago
Ryu juheon
6b6d92d51c
[hitomi]: fix image URLs ( #1975 )
3 years ago
Mike Fährmann
dcb201ff19
[gfycat] show warning when there are no available formats
3 years ago
Mike Fährmann
e436a2607b
[gfycat] consistent 'userName' values for 'user' downloads ( #1962 )
...
by using the name from the input URL and not relying on possibly faulty
or incomplete API results.
'userData[username]', if available, will still have the original name.
3 years ago
Mike Fährmann
ba9579c504
release version 1.19.1
3 years ago
Mike Fährmann
f1487a3cfa
[kemonoparty:discord] improve 'inline' extraction ( #1940 )
...
- extract media.discordapp.*NET* URLs
- rewrite media.discordapp.net to cdn.discordapp.com
- use a more restricted set of characters for the URL path
3 years ago
Mike Fährmann
02a247f4e5
[deviantart] full resolution for non-downloadable images ( #293 )
...
Many thanks to @Ironchest337 for discovering this method
and providing a well-documented implementation.
3 years ago
Mike Fährmann
a7ddb5f5fa
[deviantart] update 'search' argument handling ( fixes #1911 )
...
- use 'alltime' by default
- support newer 'order' values (most-recent, this-week, etc)
3 years ago
Mike Fährmann
c19e762fdf
[vk] add 'album' extractor ( #474 , fixes #1952 )
...
todo: better metadata for albums
3 years ago
Mike Fährmann
8bb442f20d
[redgifs][gfycat] provide fallback URLs ( fixes #1962 )
...
and extend the 'format' option
3 years ago
Mike Fährmann
b6443c576d
[kemonoparty:discord] extract 'inline' files
3 years ago
Mike Fährmann
232ab626a7
[downloader:ytdl] prevent crash in '_progress_hook()'
...
https://github.com/mikf/gallery-dl/discussions/1964#discussioncomment-1516702
3 years ago
Mike Fährmann
bcbf9bcf36
[kemonoparty] split 'discord' extractor ( #1940 )
...
in 'server' and 'channel'
3 years ago
Mike Fährmann
db857b40d8
[kemonoparty] improve inline extraction ( #1899 )
3 years ago
Mike Fährmann
975e0a4fe0
[furaffinity] unquote search queries ( #1958 )
...
instead of unescape
(unquote -> url params, unescape -> html entities)
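The difference between the two operations, shown with the standard library functions urllib.parse.unquote() and html.unescape(). The sample query strings are made up for illustration.

```python
from html import unescape
from urllib.parse import unquote

url_query = "cat%20%26%20dog"   # percent-encoded URL parameter
html_text = "cat &amp; dog"     # text containing an HTML entity

# unquote() decodes percent-escapes, which is what a search
# query taken from a URL needs
decoded_query = unquote(url_query)
print(decoded_query)  # cat & dog

# unescape() only handles HTML entities and leaves
# percent-escapes untouched
untouched_query = unescape(url_query)
print(untouched_query)  # cat%20%26%20dog

decoded_html = unescape(html_text)
print(decoded_html)  # cat & dog
```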
3 years ago
Mike Fährmann
8d676151b7
[patreon] implement 'files' option ( #1935 )
3 years ago
Mike Fährmann
6695ef2e10
[patreon] better filenames for 'content' images ( #1954 )
3 years ago
Mike Fährmann
70005e3275
[kemonoparty:discord] support downloading from a specific channel
...
https://kemono.party/discord/server/<server-id>#<channel-name>
3 years ago
Mike Fährmann
003f25931d
[kemonoparty:discord] provide a 'channel_name'
3 years ago
Mike Fährmann
28bdd58e6d
[nhentai] simplify
3 years ago
Mike Fährmann
50098762e3
[nhentai] add 'tag' extractor ( closes #1950 )
3 years ago
Mike Fährmann
fe6ce5495a
[kemonoparty] add 'discord' extractor ( #1827 , #1940 )
3 years ago
Mike Fährmann
f2d6b3e6b4
run tests without using 'nose'
...
run_tests.sh -> run_tests.py
3 years ago
Mike Fährmann
918fc9974d
[picarto] add 'gallery' extractor ( closes #1931 )
3 years ago
Mike Fährmann
e33125ad39
[pixiv] add 'sketch' extractor ( #1497 )
3 years ago
Mike Fährmann
e9dc6ff262
[inkbunny] add 'following' extractor ( #515 )
3 years ago
Mike Fährmann
9c8fc6e7b4
[inkbunny] match "long" URLs for pools and favorites ( #1937 )
3 years ago
Mike Fährmann
f33c2ef73b
[cyberdrop] extract direct download URLs ( #1943 )
...
do not rely on redirects from f.cyberdrop.cc
3 years ago
Mike Fährmann
b93915c113
[inkbunny] add 'pool' extractor ( #1937 )
3 years ago
Mike Fährmann
373d3e1c57
[seisoparty] implement login with username & password ( #1906 )
3 years ago
Mike Fährmann
7c5f62b453
[seisoparty] add 'favorite' extractor ( #1906 )
3 years ago
Mike Fährmann
d93b5474c3
[mangadex] update parameter handling for API requests
...
- move common parameters into '_pagination()'
- add 'ratings' (#1908 ) and 'api-parameters' options
3 years ago
Mike Fährmann
cd66c3c415
[twitter] add 'size' option ( #1881 )
3 years ago
Mike Fährmann
df8050b81d
[postprocessor:compare] add 'equal' option ( #1592 )
...
Move functionality from cdd72e14 to its own option,
where it can be used with any 'action'
3 years ago
Mike Fährmann
f8410203ef
release version 1.19.0
3 years ago
Mike Fährmann
cdd72e1413
[postprocessor:compare] extend 'action' option ( #1592 )
...
allow setting it to "abort", "terminate", or "exit" as with 'skip'
3 years ago
Mike Fährmann
fb98b3fdaf
[redgifs][gfycat] remove webtoken code ( fixes #1907 )
3 years ago
Mike Fährmann
96215c926e
[mangadex] fix retrieving chapters from 'pornographic' titles
...
(fixes #1908 )
3 years ago
Mike Fährmann
da9685609c
[kemonoparty] update file download URLs
...
(closes #1902 , fixes #1903 )
3 years ago
Mike Fährmann
783eae6fc5
[hiperdex] fix extraction
3 years ago
Mike Fährmann
28f1c36da2
simplify and adjust download progress indicator ( #1519 )
3 years ago
Mike Fährmann
e0bdacd932
[fappic] add 'image' extractor ( closes #1898 )
3 years ago