Mike Fährmann
f1344fe552
[patreon] yield images and attachments before postfiles ( #871 )
...
The reported filename of the 'postfile' entry of each post may differ
from the corresponding entry in the list of images or attachments,
and be outright "wrong".
4 years ago
Mike Fährmann
dbf841ebd1
prevent unhandled exception on Cloudflare challenges ( #868 )
...
The relatively new v2 challenges aren't supported (*), but retrying
often enough may yield a v1 challenge which can be solved.
(*) and probably never will. They are far too complicated to do without
a real browser.
4 years ago
Mike Fährmann
6e2af9a8d8
[twitter] improve error message formatting
4 years ago
Mike Fährmann
74494b43d3
let zsh completion immediately suggest cmdline options
...
instead of expecting a URL and trying to complete it.
4 years ago
Mike Fährmann
c28db7a6ea
[8muses] support 'comics.8muses.com' URLs
4 years ago
Mike Fährmann
4d8b3e4f70
defer directory creation ( fixes #722 )
...
Only call os.makedirs() right before a file is downloaded,
and not immediately for every Directory message.
4 years ago
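The deferred directory creation described in the commit above can be illustrated with a minimal sketch. It is not the project's actual code: the `LazyDirectory` class and its `open()` method are hypothetical names chosen for illustration, and the only real call assumed is `os.makedirs()`, which the commit message itself mentions.

```python
import os


class LazyDirectory:
    """Hypothetical helper: remembers a target path, creates it on first use."""

    def __init__(self, path):
        self.path = path
        self.created = False  # nothing is created when the path is announced

    def open(self, filename, mode="wb"):
        # Create the directory only when a file is about to be written.
        if not self.created:
            os.makedirs(self.path, exist_ok=True)
            self.created = True
        return open(os.path.join(self.path, filename), mode)
```

With this shape, announcing a directory that never receives a file leaves no empty folder behind.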
Mike Fährmann
d5bfb0b38c
set pseudo extension for Metadata messages ( #865 )
...
This prevents pathfmt.filename from potentially being empty.
4 years ago
Mike Fährmann
d0cd86e0d5
add zsh completion script ( #150 )
4 years ago
Mike Fährmann
821524e4ee
[subscribestar] add 'user' and 'post' extractors ( #852 )
4 years ago
Mike Fährmann
4f16fd37fe
release version 1.14.2
4 years ago
Mike Fährmann
e62ebb4643
update CHANGELOG before building sdist and wheel packages
4 years ago
Mike Fährmann
f1ddbff0b5
[aryion] add 'recursive' option ( fixes #832 )
...
This is enabled by default and will recursively go through all
(sub)folders in an artist's gallery.
The old method of using "Latest Updates" lists can be restored by
disabling this option.
4 years ago
Mike Fährmann
699062b91f
Revert "[kissmanga] workaround for CAPTCHAs ( #818 )"
...
This reverts commit 4cf3d54718.
4 years ago
Mike Fährmann
0cac14c3bd
update extractor test results
4 years ago
Mike Fährmann
5e5be67c26
[tumblr] prevent KeyErrors when using reblogs=same-blog
...
(fixes #851 )
4 years ago
Mike Fährmann
9da2bc67f8
[twitter] add option to filter media from quoted tweets ( #854 )
4 years ago
Mike Fährmann
56ab5fb8f4
[twitter] improve handling of quoted tweets ( #854 )
...
Split each "quote" into two parts:
- the original tweet
- the tweet that quoted the original
4 years ago
Mike Fährmann
bd0e1ca1a5
[imgur] build directory path for each file ( closes #842 )
4 years ago
Mike Fährmann
a8c2d997e8
[twitter] treat quoted tweets like retweets ( #833 )
...
- filter them when 'retweets' is disabled
- set 'author' to the creator of the quoted tweet
like it was before the rewrite
4 years ago
Mike Fährmann
aed1c63e51
[twitter] improve search results ( fixes #847 )
...
Adding 'tweet_search_mode=live' to the query parameters
is the most important part here.
4 years ago
Mike Fährmann
0e714b9a0e
[pinterest] add 'section' extractor ( #835 )
4 years ago
Mike Fährmann
53cc498d9c
improve config lookup when there are multiple possible locations
...
This specifically applies to all Mastodon extractors and all
extractors with a 'basecategory', i.e. 'booru', 'foolslide', etc.
Values inside those general config locations wouldn't be recognized
when a value with the same name was set on the 'extractor' level.
For example 'extractor.mastodon.directory' should be used over
'extractor.directory' when both are set, but this was impossible
with the previous implementation.
(fixes #843 )
4 years ago
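The lookup order described in the commit above (a more specific location such as 'extractor.mastodon' winning over the general 'extractor' level) can be sketched as follows. This is an illustrative assumption, not the project's implementation: the `lookup()` function, its parameters, and the example paths are hypothetical.

```python
def lookup(config, paths, key, default=None):
    """Return 'key' from the first location in 'paths' that defines it.

    'paths' is ordered from most specific to most general, so a value in
    a specific location takes precedence over a general one.
    """
    for path in paths:
        node = config
        for name in path:
            node = node.get(name)
            if not isinstance(node, dict):
                break  # this location does not exist; try the next one
        else:
            if key in node:
                return node[key]
    return default


# Hypothetical configuration with the same option set on two levels.
config = {
    "extractor": {
        "directory": ["general"],
        "mastodon": {"directory": ["mastodon"]},
    }
}
# Candidate locations, most specific first (names are assumptions).
paths = [
    ("extractor", "mastodon.social"),
    ("extractor", "mastodon"),
    ("extractor",),
]
value = lookup(config, paths, "directory")
```

Here 'value' comes from 'extractor.mastodon', because that location is checked before the general 'extractor' level.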
Mike Fährmann
1b3870a4be
flush after writing JSON in DataJob() ( #727 )
...
… and remove the dead handle_finalize() method,
which is never called since DataJob() overrides run().
4 years ago
Mike Fährmann
d81a8e6544
[twitter] update tests
4 years ago
Mike Fährmann
d39eedd9bb
[twitter] improve handling of deleted tweets ( fixes #838 )
4 years ago
Mike Fährmann
1ae1df0d27
update '--write-pages' ( #737 )
...
- fix infinite recursion for responses with multiple entries in
'history'
- hide values of Set-Cookie headers
- only write the response content by default
(use '-o write-pages=all' to also include HTTP headers)
4 years ago
Mike Fährmann
7e8a747c56
improve output of '-K' for parent extractors 2 ( #825 )
...
This is what bb882b8 was supposed to be, but I managed to
not include those changes in the first commit …
4 years ago
Mike Fährmann
dc16f73965
[twitter] move '_guest_token()' into TwitterAPI class
4 years ago
Mike Fährmann
3561d1020a
[twitter] always provide an 'author' field ( #831 , #833 )
...
The idea was to have less metadata clutter for most Tweets where
'author' and 'user' are the same (non-retweets), and only provide
a 'user' field.
The original Tweet author could be gotten with
{author[…]|user[…]}, but basically no one knows about that.
4 years ago
Mike Fährmann
7158bdd7c7
[weibo] improve extractor logic ( #829 )
4 years ago
Mike Fährmann
37d71f6e09
strip microseconds in text.parse_datetime()
4 years ago
Mike Fährmann
0371fd54a1
[artstation] add 'date' metadata field ( #839 )
4 years ago
Mike Fährmann
8c857052d7
[mastodon] ignore toots without media attachments
4 years ago
Mike Fährmann
de045d39b2
[mastodon] add 'date' metadata field ( #839 )
4 years ago
Mike Fährmann
d5d90a0450
[weibo] add 'date' field to 'status' objects ( #829 )
4 years ago
Mike Fährmann
5ba90f72ca
[pinterest] add support for sections ( closes #835 )
4 years ago
Mike Fährmann
c37a1c06c8
[twitter] add extractor for liked tweets ( closes #837 )
...
You need to be logged in to get access to anyone's liked tweets,
it seems.
4 years ago
Mike Fährmann
b94394104c
[twitter] don't download video previews ( #833 )
...
when 'videos' is set to False
4 years ago
Mike Fährmann
bb882b8cdb
improve output of '-K' for parent extractors ( #825 )
4 years ago
Mike Fährmann
6db7ed90cb
release version 1.14.1
4 years ago
Mike Fährmann
087e3184dc
use a non-twitter URL when testing snap creation
4 years ago
Mike Fährmann
c184cce876
update configuration.rst
...
- fix anonymous links
- update description of 'extractor.twitter.videos'
- document 'extractor.redgifs.format' (#724 )
4 years ago
Mike Fährmann
4cf3d54718
[kissmanga] workaround for CAPTCHAs ( fixes #818 )
...
Requesting the same page again when being redirected to a CAPTCHA
lets us access that page without solving it.
4 years ago
Mike Fährmann
7daef6ee70
update extractor test results
...
- certain posts on Instagram now return
https://static.cdninstagram.com/rsrc.php/null.jpg
for public users
- MangaDex is deploying its new MangaDex@Home network similar to
exhentai's Hentai@Home
- realbooru has a new site layout, but the underlying booru API still
works like before
4 years ago
Mike Fährmann
ffb6c5277a
[furaffinity] add 'artist_url' metadata field ( closes #821 )
4 years ago
Mike Fährmann
be04e44e2c
[reddit] catch JSON decode errors ( #765 )
4 years ago
Mike Fährmann
cf863f60b3
[redgifs] add 'user' and 'search' extractors ( closes #724 )
4 years ago
Mike Fährmann
998d1d3a5c
[webtoons] generalize and improve comic extraction ( fixes #820 )
4 years ago
Mike Fährmann
1489712325
resolve redirects after solving Cloudflare challenges
4 years ago
Mike Fährmann
b0b1feaa67
request 'transparent.gif' when solving Cloudflare challenges
...
This currently also works without it, but they might be using these
requests to detect potential bots in the future.
4 years ago