Mike Fährmann
b9bdd2c564
[hentaifoundry] add support for stories ( closes #734 )
4 years ago
Mike Fährmann
9a9d1924d8
[hentaicafe] add 'manga_id' metadata field ( closes #1036 )
...
This field is only available when using a non-foolslide URL
like '/hc.fyi/9874' or '/hazuki-yuuto-summer-blues/'
4 years ago
Mike Fährmann
cc4ac80302
[weasyl] add 'favorite' extractor ( #1032 )
4 years ago
Mike Fährmann
e9cc719497
[weasyl] update and simplify
...
- simplify 'pattern' regexps
- parse 'posted_at' as 'date'
- use unaltered 'title' ({title!l:R /_/} to lowercase and replace spaces)
4 years ago
Mike Fährmann
6514312126
[nijie] add 'include' option ( closes #1018 )
4 years ago
Mike Fährmann
0d43456323
[hentaifoundry] add 'include' option
4 years ago
Zanny
ebb7737b9b
Weasyl Extractor ( #977 )
...
* weasyl extractor
* @kattjevfel suggested changes
* @mikf changes
4 years ago
Mike Fährmann
d5fa716d89
fix crash when using 'skip=false' and archive ( fixes #1023 )
...
Separating the archive check from pathfmt.exists() in b5243297
had some unintended side effects.
It is also not possible to monkey-patch a dunder method like
__contains__ because of the special method lookup that gets
performed for them.
4 years ago
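The special method lookup mentioned in d5fa716d89 can be illustrated with a small sketch. The `Archive` class below is a hypothetical stand-in, not gallery-dl's actual archive class; it only shows why assigning `__contains__` on an instance does not change the behaviour of the `in` operator.

```python
class Archive:
    """Hypothetical stand-in for an archive object (not gallery-dl code)."""

    def __contains__(self, item):
        return False


archive = Archive()

# Patching the instance has no effect on the 'in' operator:
# implicit special method lookup goes through type(archive),
# bypassing the instance dictionary.
archive.__contains__ = lambda item: True
patched_instance = "x" in archive   # still calls Archive.__contains__

# Patching the class itself does take effect.
Archive.__contains__ = lambda self, item: True
patched_class = "x" in archive
```

Calling `archive.__contains__("x")` explicitly would use the instance attribute, which is why such a patch can appear to work in isolation while `in` keeps using the class-level method.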
Mike Fährmann
aeb0d32333
[twitter] improve twitpic extraction ( fixes #1019 )
...
- ignore twitpic.com/photos/… URLs
- ignore empty image URLs
4 years ago
Mike Fährmann
2184ec5d78
release version 1.15.0
4 years ago
Mike Fährmann
7cd383c0f9
update extractor test results
4 years ago
Mike Fährmann
1e313d5b84
implement 'sleep-request' option
4 years ago
Mike Fährmann
65744a7a31
use alternative for all falsey values in format strings
...
… and not just None (#525 )
It would be better to consistently use None for all non-existent
fields and/or fields without a valid value, but this is a good
enough workaround for now.
4 years ago
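The change in 65744a7a31 amounts to switching from an "is not None" test to a plain truthiness test when choosing between alternative fields. A rough sketch, assuming a hypothetical helper `first_truthy()` and made-up field names (this is not the formatter's real code):

```python
def first_truthy(kwdict, names, default=None):
    """Return the first field value that is truthy (hypothetical helper)."""
    for name in names:
        value = kwdict.get(name)
        # Before the change, the check was conceptually 'value is not None',
        # so an empty string or 0 would have been accepted here.
        if value:
            return value
    return default


metadata = {"title": "", "alt_title": "Summer Blues"}
result = first_truthy(metadata, ["title", "alt_title"])
```

With the truthiness test, the empty `title` is skipped and `alt_title` is used instead.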
Mike Fährmann
c43b3894be
[myhentaigallery] update and fix extraction ( #1001 )
...
- extract more metadata
- match "/show/" URLs
- complete test results
- fix missing images for lines starting with " <img"
- fix missing comma in supportedsites.py
4 years ago
choeronline
05b9ac8d37
[myhentaigallery] add extractor ( #1001 )
...
* adds support for myhentaigallery
* fixes linting issues in myhentaigallery extractor
4 years ago
Mike Fährmann
2626629117
[danbooru] handle posts without 'id' ( fixes #1004 )
4 years ago
Mike Fährmann
cc1fb0b4ea
[500px] update query hash
4 years ago
Mike Fährmann
da87a5fb7e
[exhentai] fix accessing config before main constructor
...
bug introduced with 055c32e0
Making 'Extractor.config()' quite a bit faster is worth the "cost"
of having to set _cfgpath in exhentai constructors, I think.
4 years ago
Mike Fährmann
f5b7ae01c1
update extractor test results
4 years ago
Mike Fährmann
136df52d1f
[deviantart] support watchers-only/paid deviations ( #995 )
4 years ago
Mike Fährmann
055c32e0f7
precompute extractor config paths
4 years ago
Mike Fährmann
231dd4c800
accumulate postprocessor objects ( #994 )
...
Instead of one 'postprocessors' setting overwriting all others lower
in the hierarchy, all postprocessors along the config path will now
get collected into one big list.
For example '--mtime-from-date' will therefore no longer cause
other postprocessor settings in a config file to get ignored.
4 years ago
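The accumulation described in 231dd4c800 could look roughly like the sketch below. The function name, the config layout, and the ordering (most specific level first) are all assumptions made for illustration; the real `config.accumulate()` may differ in every one of these details.

```python
def accumulate(config, path, key):
    """Collect list values for 'key' from every level along 'path'.

    Hypothetical sketch: the most specific level is placed first.
    """
    levels = [config]
    node = config
    for name in path:
        node = node.get(name)
        if not isinstance(node, dict):
            break
        levels.append(node)

    result = []
    for level in reversed(levels):
        value = level.get(key)
        if value:
            result.extend(value)
    return result


config = {
    "postprocessors": [{"name": "mtime"}],
    "extractor": {
        "postprocessors": [{"name": "zip"}],
        "twitter": {"postprocessors": [{"name": "metadata"}]},
    },
}
found = accumulate(config, ("extractor", "twitter"), "postprocessors")
names = [pp["name"] for pp in found]
```

A plain lookup would stop at the first level that defines `postprocessors`; here all three levels contribute to the final list.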
Mike Fährmann
392d022b04
implement 'config.accumulate()' ( #994 )
4 years ago
Mike Fährmann
3afd362e2e
add 'sleep-extractor' option ( closes #964 )
...
(would have been nice if this were possible without code duplication)
4 years ago
Mike Fährmann
3108e85b89
[worldthree] remove extractors
...
http://www.slide.world-three.org/ hasn't been accessible for a long time.
4 years ago
Mike Fährmann
8fed3eb8cb
[jaiminisbox] remove extractors
...
https://jaiminisbox.com/post.html
4 years ago
Mike Fährmann
dcf3ad7eef
[furaffinity] update download URL extraction ( fixes #988 )
...
support the new 'd2.facdn.net' subdomain
4 years ago
Mike Fährmann
3918b69677
remove 'extractor.blacklist' context manager
4 years ago
Mike Fährmann
c78aa17506
add general 'blacklist' and 'whitelist' options ( #492 , #844 )
4 years ago
Mike Fährmann
abda352a5b
add '--no-skip' command-line option ( closes #986 )
4 years ago
Mike Fährmann
5912727b88
support format string replacement fields in archive paths
...
(closes #985 )
4 years ago
Mike Fährmann
2b8d57f0ab
[twitter] support '/intent/user?user_id=…' URLs ( #980 )
4 years ago
Mike Fährmann
a3b473bd2f
[twitter] support specifying users by ID ( #980 )
...
by using 'id:…' as their screen name, i.e.
https://www.twitter.com/id:2976459548/media
instead of
https://twitter.com/supernaturepics/media
The user ID can, for example, be obtained from the output of
$ gallery-dl -j --range 1 https://twitter.com/<screen-name>

4 years ago
Mike Fährmann
a0d916ed41
[exhentai] update wait time before original image download ( #978 )
...
depend on 'wait-max', don't use a hard-coded value
4 years ago
Mike Fährmann
f6fd449b59
reduce wait time growth rate from exponential to linear
...
Waiting for 2**N seconds after each error grows too fast.
Simply waiting N seconds seems far more reasonable.
4 years ago
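To make the difference in f6fd449b59 concrete, the following sketch compares both growth rates for the first six consecutive errors. The function names are illustrative only, and any upper bound on the wait time is ignored here.

```python
def wait_exponential(tries):
    """Old behaviour as described: 2**N seconds after the N-th error."""
    return 2 ** tries


def wait_linear(tries):
    """New behaviour as described: N seconds after the N-th error."""
    return tries


exponential = [wait_exponential(n) for n in range(1, 7)]
linear = [wait_linear(n) for n in range(1, 7)]
total_exponential = sum(exponential)
total_linear = sum(linear)
```

After six errors the exponential schedule has waited 126 seconds in total, the linear one 21 seconds.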
Mike Fährmann
bc48514d84
[aryion] get post ID via gallery-item ( fixes #981 , closes #982 )
...
this even works when fetching post IDs from '/latest.php?id='
4 years ago
Mike Fährmann
799ca07fc8
[imgur] update
...
- fix image/album detection for galleries
- use new API endpoints for image/album data
4 years ago
Mike Fährmann
b5243297ff
write skipped files to archive ( closes #550 )
4 years ago
Mike Fährmann
ac3036ef56
add 'filesize-min' and 'filesize-max' options ( closes #780 )
4 years ago
Mike Fährmann
7876a03ece
[tumblr] create directories for each post ( fixes #965 )
...
This changes the identifiers for directory format string fields.
Everything blog related is now inside a 'blog' object
and not at the "base level" anymore.
E.g. '{name}' for directories is now '{blog[name]}'
(or '{blog_name}', since that is also available)
4 years ago
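The field change from 7876a03ece can be tried out with Python's own `str.format` syntax, on the assumption that gallery-dl's format strings follow the same rules for this simple case. The metadata values below are made up.

```python
# Made-up metadata in the new layout: blog data lives in a 'blog' object,
# with 'blog_name' also available at the top level.
metadata = {
    "blog": {"name": "example-blog"},
    "blog_name": "example-blog",
    "id": 12345,
}

nested = "{blog[name]}/{id}".format_map(metadata)
flat = "{blog_name}/{id}".format_map(metadata)

# The old top-level '{name}' field no longer exists in this layout.
try:
    "{name}".format_map(metadata)
    old_field_available = True
except KeyError:
    old_field_available = False
```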
Mike Fährmann
fd0685d9b5
[postprocessor:zip] defer zip file creation ( fixes #968 )
...
don't try to create zip files on postprocessor construction,
wait until directory creation during file download.
4 years ago
Mike Fährmann
33fe67b594
release version 1.14.5
4 years ago
Mike Fährmann
d50f3b333a
update extractor test results
4 years ago
Mike Fährmann
0f55b8e80a
[exhentai] fix type check from dbbbb21 ( #940 )
...
'bool' is a subclass of 'int', and therefore
'isinstance(self.limits, int)' also returns True when
'self.limits' has a boolean value
4 years ago
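The pitfall described in 0f55b8e80a is easy to reproduce. The helper below is a hypothetical way to tell real integers apart from booleans; it is not necessarily the check that the commit introduced.

```python
def is_integer_limit(value):
    """True for real integers only, not for True/False (hypothetical helper)."""
    return isinstance(value, int) and not isinstance(value, bool)


bool_is_int_subclass = issubclass(bool, int)
naive_check = isinstance(True, int)      # also True, which caused the bug
strict_true = is_integer_limit(True)
strict_number = is_integer_limit(5000)
```

An alternative with the same effect for plain values is `type(value) is int`, which also rejects other `int` subclasses.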
Mike Fährmann
e33293fdd8
[hentaihand] update to new site layout
4 years ago
Mike Fährmann
fda9e296dd
[gelbooru] fix extraction without API
4 years ago
Mike Fährmann
69e4871005
update extractor test results
...
- sensescans: replace 404d chapters
- mangapark: replace 404d chapters
- subscribestar: update test for attached files
4 years ago
Mike Fährmann
ab1af66a97
[imgur] add 'search' extractor ( #934 )
4 years ago
Mike Fährmann
e4bbc1fb5c
[imgur] add 'tag' extractor ( #934 )
4 years ago
Mike Fährmann
deaacc70bb
[hitomi] update URL pattern for tag searches
4 years ago
ArtaxIsSleeping
0e941553ec
[aryion] Add username/password support ( #960 )
...
* Add username/password support to aryion extractor
* Update docs to match
* Fix code style
4 years ago
Mike Fährmann
84e04cc23b
[500px] fix extraction and update URL patterns ( fixes #956 )
...
- rewrite most API calls to GraphQL queries
- match '500px.com/p/<user>' URLs
4 years ago
Mike Fährmann
d4ff767291
[reddit] improve gallery extraction ( fixes #955 )
4 years ago
Mike Fährmann
7140fe7e6d
[hitomi] fix redirect processing
4 years ago
Mike Fährmann
a57b6b3c3a
[reddit] handle deleted galleries ( fixes #953 )
4 years ago
Mike Fährmann
063c71cd84
[furaffinity] add 'search' extractor ( closes #915 )
4 years ago
Mike Fährmann
dbbbb21180
[exhentai] add ability to specify custom image limit ( #940 )
4 years ago
Mike Fährmann
b2009ea39e
[aryion] update folder mime type list ( fixes #945 )
4 years ago
Mike Fährmann
688bd046fc
release version 1.14.4
4 years ago
Mike Fährmann
d06ad148c7
[shopify] use alternate regex for products on collection pages
...
when the first one doesn't yield any results
4 years ago
Mike Fährmann
7619152988
[reactor] sort 'tags'
...
to ensure a consistent order for test results
4 years ago
Mike Fährmann
cd9de613a2
[exhentai] adjust image limit costs ( #940 )
...
Each original file costs 10 points per 10^6 bytes,
not 10 per 2^20 == 1048576 bytes.
4 years ago
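The correction in cd9de613a2 comes down to the choice of divisor. The sketch below only covers the size-dependent part stated in the commit message; how the site rounds the result, and whether there are additional fixed costs, is not stated there, so values are left as unrounded floats.

```python
def cost_decimal(size_in_bytes):
    """10 points per 10**6 bytes (the corrected assumption)."""
    return 10 * size_in_bytes / 10**6


def cost_binary(size_in_bytes):
    """10 points per 2**20 bytes (the previous, incorrect assumption)."""
    return 10 * size_in_bytes / 2**20


size = 5 * 2**20                    # a 5 MiB file, 5242880 bytes
corrected = cost_decimal(size)      # 52.4288
previous = cost_binary(size)        # 50.0
```

The old calculation underestimated the cost by roughly 4.9 %, which adds up over a large gallery.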
Mike Fährmann
2e6f6ee1c1
[mangoxo] fix login
4 years ago
Mike Fährmann
a6a080656c
[pixnet] detect password-protected albums ( #177 )
4 years ago
Mike Fährmann
67ac6667af
[mangareader] fix extraction
4 years ago
Mike Fährmann
2b88c90f6f
[blogger] add search extractor ( #925 )
4 years ago
Mike Fährmann
d5067c51c5
[instagram] support '/reel/' URLs
4 years ago
Mike Fährmann
2c9766b29f
fix UnboundLocalError in Extractor.request()
...
introduced in d6a271d
4 years ago
Mike Fährmann
aa64149583
[blogger] support searching posts by labels ( closes #925 )
4 years ago
Mike Fährmann
60ba3cb946
[reddit] support gallery posts ( closes #920 )
4 years ago
Mike Fährmann
0d84d3af55
[subscribestar] extract attached media files ( #852 )
4 years ago
Mike Fährmann
19bf76bcf8
update extractor test results
4 years ago
Mike Fährmann
0762d6b29c
[inkbunny] add 'num' field ( #283 )
4 years ago
Mike Fährmann
fbc4278fe4
[instagram] wait before GraphQL requests ( #901 )
4 years ago
Mike Fährmann
ec5870576d
[imgur] handle 403 overcapacity responses ( closes #910 )
4 years ago
Mike Fährmann
d6a271d2c7
add 'response' objects to 'HttpError's
4 years ago
Mike Fährmann
72c5578a27
[hentainexus] improve/simplify code
4 years ago
Mike Fährmann
627d2141d3
[xhamster] fix extraction ( closes #917 )
4 years ago
Mike Fährmann
3f73cc6855
allow 'parent-directory' to work recursively ( fixes #905 )
4 years ago
Mike Fährmann
27e31f4a16
[myportfolio] raise 'NotFoundError' for deleted posts
4 years ago
Mike Fährmann
f317a57c5e
[simplyhentai] fix 'gallery_id' extraction
4 years ago
Mike Fährmann
daeef8a5e3
[vsco] handle missing 'description' fields
4 years ago
Mike Fährmann
26a967cbd4
[pinterest] match 'pinterest.co.uk' URLs ( fixes #914 )
4 years ago
Mike Fährmann
c5aaa1de77
[inkbunny] simplify metadata structure ( #283 )
...
Just put everything at the top level,
instead of having a separate 'post' object.
4 years ago
Mike Fährmann
b921fee24d
[inkbunny] fix submission order ( #283 )
...
Getting detailed submission info via /api_submissions.php reordered the
input submissions and sorted them by ID. InkbunnyAPI.detail() now sorts
them back and ensures they are returned in their original order.
This commit also removes the 'metadata' option and always requests
submission descriptions.
4 years ago
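Restoring the original order, as described in b921fee24d, can be done with a lookup table keyed by ID. This is a minimal sketch: the function name and the `submission_id` field are assumptions, and the actual InkbunnyAPI.detail() implementation may work differently.

```python
def restore_order(requested_ids, submissions):
    """Return 'submissions' in the order given by 'requested_ids'."""
    by_id = {sub["submission_id"]: sub for sub in submissions}
    # Skip IDs the API did not return instead of raising KeyError.
    return [by_id[sid] for sid in requested_ids if sid in by_id]


requested = ["30", "10", "20"]
# Simulated API response, sorted by ID regardless of the request order.
response = [
    {"submission_id": "10", "title": "b"},
    {"submission_id": "20", "title": "c"},
    {"submission_id": "30", "title": "a"},
]
ordered = restore_order(requested, response)
ordered_ids = [sub["submission_id"] for sub in ordered]
```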
Mike Fährmann
e50c75628c
[subscribestar] update 'date' parsing
4 years ago
Mike Fährmann
c4ed9f4faa
[inkbunny] add 'metadata' option ( #283 )
4 years ago
Mike Fährmann
493cadb1e7
[inkbunny] add 'orderby' option ( #283 )
4 years ago
Mike Fährmann
336e682a7a
[inkbunny] handle gallery/scraps URLs ( #283 )
4 years ago
Mike Fährmann
8dbf827649
[bobx] remove module
4 years ago
Mike Fährmann
8f64585ff2
[twitter] handle 429 responses without x-rate-limit-reset header
4 years ago
Mike Fährmann
d2e17e16bf
[inkbunny] update tests ( #283 )
4 years ago
Mike Fährmann
57f7d9b790
[inkbunny] improve error handling ( #283 )
4 years ago
Mike Fährmann
baf5d0e3c1
[gfycat] skip malformed gfycat responses ( closes #902 )
4 years ago
Mike Fährmann
453f3bc519
[blogger] improve error messages for missing posts/blogs ( #903 )
4 years ago
Mike Fährmann
87202b8d74
[inkbunny] add 'user' and 'post' extractors ( #283 )
4 years ago
Mike Fährmann
b62ea72533
release version 1.14.3
4 years ago
Mike Fährmann
2ecf1efb16
update extractor test results
...
- tumblr: remove deleted post
- jaiminisbox: replace removed manga/chapters
- smugmug: one inconsequential field got removed
4 years ago
Mike Fährmann
d5fcffcced
[subscribestar] add login capabilities ( #852 )
4 years ago
Mike Fährmann
ecaecc4064
[exhentai] add 'domain' option ( #897 )
4 years ago
Mike Fährmann
45c32213dc
[gfycat] retry 404'ed videos on redgifs ( closes #874 )
4 years ago
Mike Fährmann
cf44571fe0
[gfycat] add 'user' and 'search' extractors
4 years ago
Mike Fährmann
11b744d971
[mangakakalot] improve/fix chapter extraction
4 years ago
Mike Fährmann
2da71cb561
[twitter] raise proper exception if user doesn't exist ( #891 )
4 years ago
Leonardo Taccari
86e5a05e29
[twitter] add support for nitter.net URLs in pattern ( #890 )
...
Please note that URLs are only "translated"; all requests are still
made via the Twitter API.
4 years ago
Mike Fährmann
e17d4f44f6
[newgrounds] fix favorites extraction
4 years ago
Mike Fährmann
c51fbd72ba
update extractor test results
4 years ago
Mike Fährmann
9cd1bc6907
[mangakakalot] update URL patterns, fix flake8 errors ( #876 )
4 years ago
jakem72360
7dfdcc3fbf
[mangakakalot] Added extractors for MangaKakalot ( #876 )
4 years ago
Mike Fährmann
cb0132e441
[khinsider] add 'format' option ( closes #840 )
4 years ago
Mike Fährmann
d594977ca1
[artstation] add 'following' extractor ( closes #888 )
4 years ago
Mike Fährmann
3855d0dd3c
[twitter] add debug messages for all skipped Tweets ( #867 )
4 years ago
Mike Fährmann
27d163afb3
[imgur] support all '/t/...' URLs ( closes #880 )
...
… instead of just '/t/unmuted/'
4 years ago
Mike Fährmann
f5c9f1d066
[subscribestar] use current date instead of hard-coded '2020' ( #852 )
4 years ago
Mike Fährmann
5a6e750704
[reddit] fix AttributeError when using 'recursion' ( fixes #879 )
4 years ago
Mike Fährmann
94a08f0bcb
[reddit] limit title length in default filenames ( #873 )
4 years ago
Mike Fährmann
3424fb96c3
[redgifs] support gifsdeliverynetwork.com URLs ( #874 )
4 years ago
Mike Fährmann
f1344fe552
[patreon] yield images and attachments before postfiles ( #871 )
...
The reported filename of the 'postfile' entry of each post may differ
from the corresponding entry in the list of images or attachments,
and be outright "wrong".
4 years ago
Mike Fährmann
dbf841ebd1
prevent unhandled exception on Cloudflare challenges ( #868 )
...
The relatively new v2 challenges aren't supported (*), but retrying
often enough may yield a v1 challenge which can be solved.
(*) and probably never will. They are far too complicated to do without
a real browser.
4 years ago
Mike Fährmann
6e2af9a8d8
[twitter] improve error message formatting
4 years ago
Mike Fährmann
c28db7a6ea
[8muses] support 'comics.8muses.com' URLs
4 years ago
Mike Fährmann
4d8b3e4f70
defer directory creation ( fixes #722 )
...
Only call os.makedirs() before a file is getting downloaded,
and not immediately for every Directory message.
4 years ago
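The idea behind 4d8b3e4f70 is to remember the target path and create it only once a file is actually written. A minimal sketch with a hypothetical `LazyDirectory` class (gallery-dl's own path handling is more involved):

```python
import os
import tempfile


class LazyDirectory:
    """Remembers a directory path and creates it on first file access."""

    def __init__(self, path):
        self.path = path
        self.created = False

    def open(self, filename):
        # Directory creation is deferred until a file is about to be written.
        if not self.created:
            os.makedirs(self.path, exist_ok=True)
            self.created = True
        return open(os.path.join(self.path, filename), "wb")


base = tempfile.mkdtemp()
target = LazyDirectory(os.path.join(base, "gallery", "post"))
exists_before = os.path.isdir(target.path)   # nothing created yet

with target.open("file.bin") as fp:
    fp.write(b"data")
exists_after = os.path.isdir(target.path)
```

Directories for posts without any downloadable files are therefore never created.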
Mike Fährmann
d5bfb0b38c
set pseudo extension for Metadata messages ( #865 )
...
This prevents pathfmt.filename from potentially being empty.
4 years ago
Mike Fährmann
821524e4ee
[subscribestar] add 'user' and 'post' extractors ( #852 )
4 years ago
Mike Fährmann
4f16fd37fe
release version 1.14.2
4 years ago
Mike Fährmann
e62ebb4643
update CHANGELOG before building sdist and wheel packages
4 years ago
Mike Fährmann
f1ddbff0b5
[aryion] add 'recursive' option ( fixes #832 )
...
This is enabled by default and will recursively go through all
(sub)folders in an artist's gallery.
The old method of using "Latest Updates" lists can be restored by
disabling this option.
4 years ago
Mike Fährmann
699062b91f
Revert "[kissmanga] workaround for CAPTCHAs ( #818 )"
...
This reverts commit 4cf3d54718.
4 years ago
Mike Fährmann
0cac14c3bd
update extractor test results
4 years ago
Mike Fährmann
5e5be67c26
[tumblr] prevent KeyErrors when using reblogs=same-blog
...
(fixes #851 )
4 years ago
Mike Fährmann
9da2bc67f8
[twitter] add option to filter media from quoted tweets ( #854 )
4 years ago
Mike Fährmann
56ab5fb8f4
[twitter] improve handling of quoted tweets ( #854 )
...
Split each "quote" into two parts:
- the original tweet
- the tweet that quoted the original
4 years ago
Mike Fährmann
bd0e1ca1a5
[imgur] build directory path for each file ( closes #842 )
4 years ago
Mike Fährmann
a8c2d997e8
[twitter] treat quoted tweets like retweets ( #833 )
...
- filter them when 'retweets' is disabled
- set 'author' to the creator of the quoted tweet
like it was before the rewrite
4 years ago
Mike Fährmann
aed1c63e51
[twitter] improve search results ( fixes #847 )
...
Adding 'tweet_search_mode=live' to the query parameters
is the most important part here.
4 years ago
Mike Fährmann
0e714b9a0e
[pinterest] add 'section' extractor ( #835 )
4 years ago
Mike Fährmann
53cc498d9c
improve config lookup when there are multiple possible locations
...
This specifically applies to all Mastodon extractors and all
extractors with a 'basecategory', i.e. 'booru', 'foolslide', etc.
Values inside those general config locations wouldn't be recognized
when a value with the same name was set on the 'extractor' level.
For example 'extractor.mastodon.directory' should be used over
'extractor.directory' when both are set, but this was impossible
with the previous implementation.
(fixes #843 )
4 years ago
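The lookup order described in 53cc498d9c can be sketched as trying each candidate location from most to least specific and returning the first match. The function name `lookup()`, the path tuples, and the config layout are assumptions for illustration, not the actual implementation.

```python
def lookup(config, paths, key, default=None):
    """Return 'key' from the first location in 'paths' that defines it."""
    for path in paths:
        node = config
        for name in path:
            node = node.get(name)
            if not isinstance(node, dict):
                break               # this location does not exist
        else:
            if key in node:
                return node[key]
    return default


config = {
    "extractor": {
        "directory": ["general"],
        "mastodon": {"directory": ["mastodon"]},
        "mastodon.social": {},
    }
}
# Candidate locations, most specific first.
paths = [
    ("extractor", "mastodon.social"),
    ("extractor", "mastodon"),
    ("extractor",),
]
directory = lookup(config, paths, "directory")
```

Here the value under 'extractor.mastodon' wins over the general 'extractor.directory' setting, which is the behaviour the commit message asks for.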
Mike Fährmann
1b3870a4be
flush after writing JSON in DataJob() ( #727 )
...
… and remove the dead handle_finalize() method,
which is never called since DataJob() overrides run().
4 years ago
Mike Fährmann
d81a8e6544
[twitter] update tests
4 years ago
Mike Fährmann
d39eedd9bb
[twitter] improve handling of deleted tweets ( fixes #838 )
4 years ago
Mike Fährmann
1ae1df0d27
update '--write-pages' ( #737 )
...
- fix infinite recursion for responses with multiple entries in
'history'
- hide values of Set-Cookie headers
- only write the response content by default
(use '-o write-pages=all' to also include HTTP headers)
4 years ago
Mike Fährmann
7e8a747c56
improve output of '-K' for parent extractors 2 ( #825 )
...
This is what bb882b8 was supposed to be, but I managed to
not include those changes in the first commit …
4 years ago
Mike Fährmann
dc16f73965
[twitter] move '_guest_token()' into TwitterAPI class
4 years ago
Mike Fährmann
3561d1020a
[twitter] always provide an 'author' field ( #831 , #833 )
...
The idea was to have less metadata clutter for most Tweets where
'author' and 'user' are the same (non-retweets), and only provide
a 'user' field.
The original Tweet author could be gotten with
{author[…]|user[…]}, but basically no one knows about that.
4 years ago
Mike Fährmann
7158bdd7c7
[weibo] improve extractor logic ( #829 )
4 years ago
Mike Fährmann
37d71f6e09
strip microseconds in text.parse_datetime()
4 years ago
Mike Fährmann
0371fd54a1
[artstation] add 'date' metadata field ( #839 )
4 years ago
Mike Fährmann
8c857052d7
[mastodon] ignore toots without media attachments
4 years ago
Mike Fährmann
de045d39b2
[mastodon] add 'date' metadata field ( #839 )
4 years ago
Mike Fährmann
d5d90a0450
[weibo] add 'date' field to 'status' objects ( #829 )
4 years ago