Mike Fährmann
2a085a5e96
[sankakucomplex] fix 'date' values ( #258 )
5 years ago
Mike Fährmann
bcd1801aa8
[sankakucomplex] add 'tag' extractor ( #258 )
5 years ago
Mike Fährmann
74c2415138
[sankakucomplex] move article extractor to its own module ( #258 )
5 years ago
Mike Fährmann
4465a3ea68
[kissmanga][readcomiconline] add 'captcha' option ( #279 )
...
to configure how to handle CAPTCHA page redirects:
- either interactively wait for the user to solve the CAPTCHA
- or raise StopExtraction like before
5 years ago
Mike Fährmann
e30ada162d
fix cookie tests
...
update _get_extractor():
- always return an Extractor instance with a _login_impl() method
- use Extractor.from_url()
5 years ago
Mike Fährmann
1e3e15c4f3
[sankaku] add article extractor ( #258 )
5 years ago
Mike Fährmann
48233f00c0
[readcomiconline] detect 'AreYouHuman' redirects ( #279 )
5 years ago
Mike Fährmann
1cde38110d
[livedoor] return 'date' as datetime object
5 years ago
Mike Fährmann
e88824e1a7
[livedoor] fix adjustments for https:// URLs
5 years ago
Mike Fährmann
2316e0ed3d
fix strptime workaround from b0e85a4
...
Don't return a modified version of 'date_time' if strptime fails.
5 years ago
Mike Fährmann
b3e4664715
[hentainexus] fix extraction
5 years ago
Mike Fährmann
399e8e965a
also update urllib3's cipher list for versions >= 1.25
5 years ago
Mike Fährmann
f837ea98cb
[deviantart] don't call 'extend()' on folders ( fixes #271 )
5 years ago
Mike Fährmann
bb32a2d490
[patreon] use file extensions from original filenames ( #268 )
5 years ago
Mike Fährmann
efa805c5d7
[sankaku] update pagination end condition ( fixes #265 )
...
Pagination over popular listings (`date:...+order:popular`) never
terminates, not even on the site itself, and at some point returns the
same results over and over again.
5 years ago
Mike Fährmann
d514d49c72
release version 1.8.4
5 years ago
Mike Fährmann
a4ba34c835
[booru] prevent crash when no tags are present ( #259 )
5 years ago
Mike Fährmann
ca3bad1779
[patreon] small fixes and adjustments ( #226 )
...
- fix datetime parsing
- rename 'user' to 'creator'
- convert 'id' to integer
- improve tests
5 years ago
Leonardo Taccari
fb09dd962a
[instagram] Fix extraction after `rhx_gis' field removal
5 years ago
Mike Fährmann
7a14aaed7d
[luscious] fix extraction
5 years ago
Mike Fährmann
e82cadac61
[patreon] add extractors ( #226 )
5 years ago
Mike Fährmann
4891f4a328
[hentainexus] add search extractor ( #256 )
5 years ago
Mike Fährmann
c02f12ce2f
avoid Cloudflare CAPTCHAs for OpenSSL < 1.1.1
...
see https://github.com/Anorov/cloudflare-scrape/pull/242
5 years ago
Mike Fährmann
0b4be57a10
[sankaku] fix error when no tags available ( closes #259 )
...
[ci skip]
5 years ago
Mike Fährmann
6764847349
fix cookie tests
...
'cookies' is a CookieJar, not a dict,
and removing the call to '.keys()' doesn't have the same effect
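
The distinction above can be illustrated with a minimal sketch. It assumes the standard library's `http.cookiejar.CookieJar` rather than whatever jar class the tests actually use, and `make_cookie()` is a hypothetical helper. Iterating over a jar yields `Cookie` objects, not cookie names, so a name lookup has to go through `cookie.name`.

```python
from http.cookiejar import Cookie, CookieJar


def make_cookie(name, value, domain="example.org"):
    """Hypothetical helper: build a simple session cookie for illustration."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


jar = CookieJar()
jar.set_cookie(make_cookie("session", "abc"))

# Iterating a CookieJar yields Cookie objects ...
cookies = list(jar)
# ... so names have to be collected explicitly.
names = [cookie.name for cookie in jar]
```

Here `"session" in names` holds, while `"session" in cookies` does not, because the list contains `Cookie` objects rather than strings.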
5 years ago
Mike Fährmann
9890bfdf23
[flickr] improve code and metadata
...
- simplify pagination
- add more metadata and slightly change its structure
- convert suitable values to int or list
- move keys from ["photo"] to the base level
- proper video support (#246 )
- rename method and variable names to better fit with other extractors
5 years ago
Mike Fährmann
aa8e366b90
[luscious] fix tag extraction
5 years ago
Mike Fährmann
a5b060765d
improve code in tests
...
- use 'assertRaises' as context manager
- remove calls to .keys()
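
For reference, a minimal sketch of the 'assertRaises' context-manager form mentioned above. The test case and the function under test (`int()`) are placeholders, not taken from the project's test suite.

```python
import unittest


class ParseTest(unittest.TestCase):
    """Placeholder test case, for illustration only."""

    def test_invalid_int(self):
        # The block must raise ValueError, otherwise the test fails.
        with self.assertRaises(ValueError):
            int("not a number")
```

Compared to `self.assertRaises(ValueError, int, "not a number")`, the context-manager form keeps the call under test written as ordinary code.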
5 years ago
Mike Fährmann
ba8eb1ffec
[hentainexus] add gallery extractor ( #256 )
5 years ago
Mike Fährmann
bd9cb3d191
improve job class selection code
...
+ consistent argument order for add_argument() calls
5 years ago
Mike Fährmann
e64773ffdd
allow multiple post-processor command-line options ( #253 )
...
... without overwriting any previous ones
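
One way to collect repeated options without overwriting earlier ones is argparse's `append` action. The sketch below only illustrates that technique; the option name `--pp` and the destination `postprocessors` are hypothetical and may not match the real command-line interface.

```python
import argparse


def build_parser():
    """Build a parser with one repeatable option (names are hypothetical)."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--pp",
        dest="postprocessors", metavar="NAME",
        action="append", default=[],
        help="may be given more than once; every value is kept",
    )
    return parser


args = build_parser().parse_args(["--pp", "zip", "--pp", "exec"])
```

After parsing, `args.postprocessors` holds both values in the order they were given.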
5 years ago
Mike Fährmann
b1db194c14
[reactor] update and improve
...
- split 'tags' into a list
- parse 'date' into a datetime object
- fix webm/mp4 URLs
5 years ago
Mike Fährmann
b0e85a42e3
apply workaround from 4736912
in parse_datetime() itself
5 years ago
Mike Fährmann
523ebc9b0b
Fix serialization of 'datetime' objects in '--write-metadata'
...
Simplified universal serialization support in json.dump() can be achieved
by passing 'default=str', which was already the case in DataJob.run()
for -j/--dump-json, but not for the 'metadata' post-processor.
This commit introduces util.dump_json() that (more or less) unifies the
JSON output procedure of both --write-metadata and --dump-json.
(#251 , #252 )
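
The 'default=str' approach described above can be sketched as follows. `write_json()` is a hypothetical stand-in, not the actual util.dump_json() implementation, whose signature is not shown here.

```python
import io
import json
from datetime import datetime


def write_json(obj, fp, indent=4):
    """Hypothetical stand-in for a shared JSON output function."""
    # default=str is called for any object json cannot serialize itself,
    # so datetime values end up as their str() representation.
    json.dump(obj, fp, ensure_ascii=False, indent=indent,
              sort_keys=True, default=str)
    fp.write("\n")


buffer = io.StringIO()
write_json({"date": datetime(2019, 5, 1, 12, 0), "id": 1}, buffer)
```

The resulting document stores the date as the string "2019-05-01 12:00:00". Without `default=str`, `json.dump()` raises a TypeError for the datetime value.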
5 years ago
Mike Fährmann
8de5866fd2
[twitter] replace unit test URLs
...
https://twitter.com/PicturesEarth was deleted
5 years ago
Mike Fährmann
74c7304c6b
[newgrounds] extract 'date', 'favorites', and 'score'
5 years ago
Mike Fährmann
4736912d4e
[pixiv] work around strptime limitations in Python < 3.7
...
"%z" doesn't allow a colon separator in older Python versions:
- "+0900" is OK
- "+09:00" raises an exception
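
A minimal sketch of such a workaround, assuming the offset is always the last part of the string: remove the colon before handing the value to strptime. The function name and the default format string are illustrative and not necessarily what the extractor uses.

```python
from datetime import datetime, timedelta


def parse_with_offset(date_string, fmt="%Y-%m-%dT%H:%M:%S%z"):
    """Parse a timestamp whose UTC offset may be '+0900' or '+09:00'."""
    # Turn a trailing "+HH:MM" / "-HH:MM" into "+HHMM" / "-HHMM",
    # which "%z" accepts on all Python 3 versions.
    if (len(date_string) >= 6 and date_string[-3] == ":"
            and date_string[-6] in "+-"):
        date_string = date_string[:-3] + date_string[-2:]
    return datetime.strptime(date_string, fmt)


with_colon = parse_with_offset("2019-05-01T12:00:00+09:00")
without_colon = parse_with_offset("2019-05-01T12:00:00+0900")
```

Both calls produce the same timezone-aware datetime with a UTC offset of nine hours.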
5 years ago
Mike Fährmann
1f7fa9dc8e
[exhentai] update data extraction code
...
- parse 'date' to datetime object
- use 'text.extract_from()'
5 years ago
Mike Fährmann
80fdb11508
[pixiv] add 'date' metadata field ( closes #248 )
5 years ago
Mike Fährmann
d09864b581
implement text.parse_datetime()
5 years ago
Mike Fährmann
049e9fd6ce
[twitter] fix pagination end condition
...
Some timelines would cause an endless loop because 'has_more_items' is
always True, even if it would return the same list of tweets over and
over again.
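
The commit text does not say how the end condition was changed. One possible approach, sketched here under that caveat, is to stop as soon as a page contributes nothing new. `fetch_page` is a hypothetical callable that returns `(items, next_cursor, has_more_items)`.

```python
def paginate(fetch_page):
    """Yield items page by page; stop once a page contains nothing new."""
    seen = set()
    cursor = None
    while True:
        items, cursor, has_more_items = fetch_page(cursor)
        new_items = [item for item in items if item not in seen]
        if not new_items:
            # Same results as before: 'has_more_items' cannot be trusted.
            return
        seen.update(new_items)
        yield from new_items
        if not has_more_items:
            return


# Fake timeline whose last page repeats forever with has_more_items=True.
pages = {
    None: ([1, 2], "a", True),
    "a": ([3], "b", True),
    "b": ([3], "b", True),
}
tweets = list(paginate(lambda cursor: pages[cursor]))
```

With the fake timeline above, iteration ends after the third request instead of looping forever.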
5 years ago
Mike Fährmann
51e0e92429
[deviantart] fix GIF downloads ( #242 )
...
The "original" download URL for GIF animations is only a preview version
of the original file.
5 years ago
Leonardo Taccari
f347d2d152
[instagram] Fix for missing `edge_media_to_comment' field and add `date' metadata ( #250 )
...
* [instagram] Remove no longer always present `comments' field
`edge_media_to_comment' is no longer always present in the response
(even for the same media, it is sometimes present and sometimes not).
* [instagram] Add `date' metadata
5 years ago
Mike Fährmann
26b516b328
release version 1.8.3
5 years ago
Mike Fährmann
5fd94c6b83
import urllib3 from requests.packages
5 years ago
林博仁(Buo-ren Lin)
c68461026a
Add snap installation instruction to README ( #171 )
...
Signed-off-by: 林博仁(Buo-ren Lin) <Buo.Ren.Lin@gmail.com>
5 years ago
Mike Fährmann
35f343206c
update default SSL cipher list in urllib3 < 1.25
...
Cloudflare now also checks the client's SSL/TLS cipher capabilities and
produces a 403: Forbidden response with CAPTCHA if they are insufficient.
This commit replaces the default cipher list in urllib3 < 1.25 with the
one from 1.25 (1), which doesn't cause problems as long as the client
platform actually supports these ciphers. On some platforms (tested with
Python 3.4 on Linux and Python 3.7 on an outdated Windows 7 VM) it is
necessary to install pyOpenSSL to get everything to work.
Explicitly setting a minimum/maximum version for urllib3 is also no
longer necessary and installing gallery-dl will therefore not pull an
incompatible urllib3 version (#229 )
Fixes the "403: Forbidden" error on Artstation (#227 )
(1) 0cedb3b0f1
5 years ago
林博仁(Buo-ren Lin)
77eae04bcf
snap: Use descriptive interface reference for *-files plugs
...
New Snap Store policy requires that *-files interface plugs be given specific names.
Fixes #241 .
Refer-to: The personal-files interface - doc - snapcraft.io <https://forum.snapcraft.io/t/the-personal-files-interface/9357>
Refer-to: The system-files interface - doc - snapcraft.io <https://forum.snapcraft.io/t/the-system-files-interface/9358>
Signed-off-by: 林博仁(Buo-ren Lin) <Buo.Ren.Lin@gmail.com>
5 years ago
Mike Fährmann
fc5e4f2b21
[hitomi] simplify data extraction code
5 years ago
Mike Fährmann
2756cc8dde
[hitomi] set Referer header ( fixes #239 )
5 years ago