Mike Fährmann
569747a78d
implement extractor.wait()
5 years ago
Mike Fährmann
ce54b8c04c
let extractors opt-out of cookie option usage
useful to avoid sending unnecessary cookies when all authentication
is done through OAuth tokens
5 years ago
Mike Fährmann
d3e44e899d
raise NotFoundErrors for 404 responses in GalleryExtractors
5 years ago
Mike Fährmann
a4dd8b3dab
improve _check_cookies()
Only loop over all cookies once instead of calling
cookiejar._find() for each cookie name.
5 years ago
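The single-pass check from a4dd8b3dab can be sketched as follows. This is a minimal illustration, not gallery-dl's `_check_cookies()`: the function name `check_cookies`, its parameters, and the exact domain comparison are assumptions. It only relies on the jar being iterable and on each cookie having `name` and `domain` attributes, which holds for `http.cookiejar.CookieJar` and its `Cookie` objects; plain `SimpleNamespace` objects stand in for cookies in the demo.

```python
from types import SimpleNamespace


def check_cookies(cookiejar, names, domain):
    """Return True if every cookie name in 'names' is set for 'domain'.

    Hypothetical sketch: iterates over the jar exactly once instead of
    performing a separate lookup for each cookie name.
    """
    missing = set(names)
    for cookie in cookiejar:
        if cookie.domain == domain and cookie.name in missing:
            missing.discard(cookie.name)
            if not missing:
                # all required names found; stop early
                return True
    return not missing


jar = [
    SimpleNamespace(name="session", domain=".example.org"),
    SimpleNamespace(name="token", domain=".example.org"),
]
print(check_cookies(jar, ("session", "token"), ".example.org"))  # True
```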
Mike Fährmann
15f9bb3d14
add option to disable pyOpenSSL usage (#508)
(pyOpenSSL is now disabled by default)
5 years ago
Mike Fährmann
e17907ee2a
change default value of 'cookies-update' to 'true'
5 years ago
Mike Fährmann
e2710702d4
fix Cloudflare bypass
5 years ago
Mike Fährmann
ae09f87602
improve SharedConfigMixin config lookups
5 years ago
Mike Fährmann
f5604492c3
update interface of config functions
5 years ago
Mike Fährmann
d45fabb79d
match user profile handling on deviantart and newgrounds
5 years ago
Mike Fährmann
1a197d2195
store the original cookiejar as Extractor._cookiejar
5 years ago
Mike Fährmann
de83ae4576
make 'method' argument of Extractor.request keyword-only
5 years ago
Mike Fährmann
d44f790e81
adjust output for HTTP status related errors
5 years ago
Mike Fährmann
389d2d7e38
implement 'cookies-update' option (#445)
5 years ago
Mike Fährmann
1693d97bd3
update extractor class hierarchies
- let the GalleryExtractor class inherit directly from Extractor
- make ChapterExtractor a subclass of GalleryExtractor
- change enumeration field names of GalleryExtractors to 'num'
5 years ago
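The hierarchy described in 1693d97bd3 might look roughly like this in stripped-down form: `GalleryExtractor` derives directly from `Extractor`, `ChapterExtractor` subclasses `GalleryExtractor`, and files are enumerated in a `num` field. The `metadata()`/`images()` hook names are taken from commit 580baef72c; everything else (the `(url, extra)` pairs returned by `images()`, the plain `(url, metadata)` tuples yielded by `items()`, and `ExampleChapterExtractor`) is hypothetical and far simpler than the real classes.

```python
class Extractor:
    """Minimal stand-in for a base extractor (hypothetical)."""

    def items(self):
        return iter(())


class GalleryExtractor(Extractor):
    """Inherits directly from Extractor; subclasses fill in the hooks."""

    def __init__(self, gallery_url):
        self.gallery_url = gallery_url

    def metadata(self):
        return {}

    def images(self):
        # expected to return (url, extra_metadata_or_None) pairs
        return []

    def items(self):
        data = self.metadata()
        # enumerate files starting at 1 and store the index as 'num'
        for num, (url, extra) in enumerate(self.images(), 1):
            info = dict(data, num=num)
            if extra:
                info.update(extra)
            yield url, info


class ChapterExtractor(GalleryExtractor):
    """A chapter is treated as a special kind of gallery."""


class ExampleChapterExtractor(ChapterExtractor):
    """Hypothetical extractor used only for this illustration."""

    def metadata(self):
        return {"chapter": 3}

    def images(self):
        return [
            ("https://example.org/1.jpg", None),
            ("https://example.org/2.jpg", {"width": 800}),
        ]


for url, info in ExampleChapterExtractor("https://example.org/c/3").items():
    print(url, info)
```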
Mike Fährmann
f4bc75e854
fix rate limit handling for OAuth APIs (#368)
5 years ago
Mike Fährmann
21991acc49
add 'ciphers' option; update default User-Agent
5 years ago
Mike Fährmann
84f4d3bc0b
replace urllib3's default cipher list with Firefox's (#342)
Avoids Cloudflare CAPTCHAs on both Linux and Windows without
pyOpenSSL installed.
5 years ago
Mike Fährmann
09f37fde39
[reddit] move date-min/-max handling into Extractor class
5 years ago
Mike Fährmann
56c7a66a4a
detect Cloudflare CAPTCHAs and update cipher list
5 years ago
Mike Fährmann
fdec59f8e2
replace extractor.request() 'expect' argument
with
- 'fatal': allow 4xx status codes
- 'notfound': raise NotFoundError on 404
5 years ago
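The two replacement arguments from fdec59f8e2 could be modeled as a small status check. This is a sketch under assumptions: `check_status`, `HttpError`, and `NotFoundError` are hypothetical stand-ins, `notfound` is assumed to hold a short description of the missing resource, and the `<code>: <reason>` message follows the format introduced in c9861ca812.

```python
class HttpError(Exception):
    """Hypothetical error for unexpected HTTP status codes."""


class NotFoundError(Exception):
    """Hypothetical error raised for 404 responses."""


def check_status(code, reason="", fatal=True, notfound=None):
    """Decide whether a response with status 'code' is acceptable.

    - codes below 400 are always accepted
    - if 'notfound' is set, a 404 raises NotFoundError
    - fatal=False additionally accepts other 4xx codes
    - everything else raises HttpError with a "<code>: <reason>" message
    """
    if code < 400:
        return True
    if code == 404 and notfound:
        raise NotFoundError(notfound)
    if not fatal and code < 500:
        return True
    raise HttpError("{}: {}".format(code, reason))


print(check_status(200))               # True
print(check_status(403, fatal=False))  # True
try:
    check_status(404, "Not Found", notfound="gallery")
except NotFoundError as exc:
    print("not found:", exc)           # not found: gallery
```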
Mike Fährmann
69205df68d
allow '-1' for infinite retries (#300)
5 years ago
Mike Fährmann
f7b5c4c3e7
use values of 'retries' options correctly
The RE-tries option now specifies exactly that: the maximum number of
times a failed HTTP request is re-tried. For example, a value of 2 now
correctly stops after 3 attempts: the initial one + 2 re-tries.
The wait time between attempts now increases exponentially for both
extractor.request() and downloader.http.download() and is capped at
30 minutes.
5 years ago
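The corrected semantics from f7b5c4c3e7 (one initial attempt plus at most `retries` re-tries, with exponentially growing waits capped at 30 minutes) and the `-1` value from 69205df68d can be sketched like this. The 1-second base delay, treating `OSError` as the failure signal, and the function names are assumptions for illustration only.

```python
import time

MAX_WAIT = 30 * 60  # cap on a single wait, in seconds (30 minutes)


def retry_waits(retries, base=1.0):
    """Yield wait times between attempts: base, 2*base, 4*base, ... capped.

    'retries' is the number of re-tries after the initial attempt;
    -1 (any negative value) means retry forever.
    """
    wait = base
    n = 0
    while retries < 0 or n < retries:
        yield min(wait, MAX_WAIT)
        wait = min(wait * 2, MAX_WAIT)
        n += 1


def request_with_retries(request, retries, base=1.0, sleep=time.sleep):
    """Call 'request()' until it succeeds or all re-tries are used up.

    'request' is a hypothetical callable that raises OSError on failure.
    """
    waits = retry_waits(retries, base)
    while True:
        try:
            return request()
        except OSError:
            wait = next(waits, None)
            if wait is None:
                raise  # initial attempt + 'retries' re-tries all failed
            sleep(wait)
```

Passing `sleep` as a parameter keeps the sketch testable without real delays; with `retries=2`, a request that keeps failing is attempted exactly 3 times before the last error is re-raised.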
Mike Fährmann
399e8e965a
also update urllib3's cipher list for versions >= 1.25
5 years ago
Mike Fährmann
c02f12ce2f
avoid Cloudflare CAPTCHAs for OpenSSL < 1.1.1
see https://github.com/Anorov/cloudflare-scrape/pull/242
5 years ago
Mike Fährmann
5fd94c6b83
import urllib3 from requests.packages
5 years ago
Mike Fährmann
35f343206c
update default SSL cipher list in urllib3 < 1.25
Cloudflare now also checks the client's SSL/TLS cipher capabilities and
produces a 403: Forbidden response with CAPTCHA if they are insufficient.
This commit replaces the default cipher list in urllib3 < 1.25 with the
one from 1.25 (1), which doesn't cause problems as long as the client
platform actually supports these ciphers. On some platforms (tested with
Python 3.4 on Linux and Python 3.7 on an outdated Windows 7 VM) it is
necessary to install pyOpenSSL to get everything to work.
Explicitly setting a minimum/maximum version for urllib3 is also no
longer necessary, so installing gallery-dl will no longer pull in an
incompatible urllib3 version (#229).
Fixes the "403: Forbidden" error on Artstation (#227).
(1) 0cedb3b0f1
5 years ago
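Replacing a default cipher list, as 35f343206c does for urllib3, amounts to restricting which cipher suites the client offers during the TLS handshake. With Python's standard `ssl` module this can be sketched as below; the cipher string is a placeholder, not the list from urllib3 1.25 or Firefox, and how the resulting context is handed to an HTTP client is left open.

```python
import ssl


def make_ssl_context(ciphers):
    """Return a client SSLContext restricted to an OpenSSL cipher string.

    Certificate verification stays enabled (create_default_context()).
    set_ciphers() only affects TLS 1.2 and below; TLS 1.3 suites are
    configured separately by OpenSSL.
    """
    ctx = ssl.create_default_context()
    ctx.set_ciphers(ciphers)
    return ctx


# placeholder cipher string, NOT the list urllib3 or Firefox actually use
ctx = make_ssl_context("ECDHE+AESGCM:ECDHE+CHACHA20")
print([c["name"] for c in ctx.get_ciphers()][:3])
```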
Mike Fährmann
e25ebc4bff
don't disable certificate checks anymore
Executables generated with PyInstaller auto-include the root certificate
file and certificate checks now work out-of-the-box.
6 years ago
Mike Fährmann
49a6522c38
ensure consistent headers and params ordering
Necessary to avoid being labeled a bot and getting a CAPTCHA response
after solving a Cloudflare challenge.
6 years ago
Mike Fährmann
f612284d24
cache cfclearance cookies
6 years ago
Mike Fährmann
591a07f20c
small code changes and cleanups
6 years ago
Mike Fährmann
6dae6bee37
automatically detect and bypass cloudflare challenge pages
TODO: cache and re-apply cfclearance cookies
6 years ago
Mike Fährmann
4ca4631bad
simplify auto-disabling certificate verification
if no certificate bundle is found
6 years ago
Mike Fährmann
09d872a2b1
generalize extractor creation code
6 years ago
Mike Fährmann
3595cd582f
use GalleryExtractor as common base class
6 years ago
Mike Fährmann
5530871b5a
change results of text.nameext_from_url()
Instead of getting a complete 'filename' from a URL and splitting that
into 'name' and 'extension', the new approach gets rid of the complete
version and renames 'name' to 'filename'. (Using anything other than
{extension} for a filename extension doesn't really work anyway.)
Example: "https://example.org/path/filename.ext"
before:
- filename : filename.ext
- name : filename
- extension: ext
now:
- filename : filename
- extension: ext
6 years ago
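The new result layout of 5530871b5a can be approximated with the standard library. This simplified stand-in is named after `text.nameext_from_url()` but is not its implementation; it assumes the extension is whatever follows the last `.` in the URL path's final component and skips any further normalization the real function may perform.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def nameext_from_url(url, data=None):
    """Store 'filename' and 'extension' of the URL's last path component.

    Simplified stand-in: no separate 'name' key, and no complete
    'filename.ext' value is kept anymore.
    """
    if data is None:
        data = {}
    basename = unquote(posixpath.basename(urlsplit(url).path))
    name, dot, ext = basename.rpartition(".")
    if dot:
        data["filename"] = name
        data["extension"] = ext
    else:
        # no extension present
        data["filename"] = basename
        data["extension"] = ""
    return data


print(nameext_from_url("https://example.org/path/filename.ext"))
# {'filename': 'filename', 'extension': 'ext'}
```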
Mike Fährmann
32edf4fc7b
add '_extractor' info to manga extractor results
6 years ago
Mike Fährmann
2e516a1e3e
store the full original URL in Extractor.url
6 years ago
Mike Fährmann
580baef72c
change Chapter and MangaExtractor classes
- unify and simplify constructors
- rename get_metadata and get_images to just metadata() and images()
- rename self.url to chapter_url and manga_url
6 years ago
Mike Fährmann
4b1880fa5e
propagate 'match' to base extractor constructor
6 years ago
Mike Fährmann
9a9cd32461
implement alternative constructor for extractors
6 years ago
Mike Fährmann
6284731107
simplify extractor constants
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
6 years ago
Mike Fährmann
bc0951d974
allow for simplified test data structures
Instead of a strict list of (URL, RESULTS)-tuples, extractor result
tests can now be a single (URL, RESULTS)-tuple, if it's just one test,
and "only matching" tests can now be a simple string.
6 years ago
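The accepted shapes from bc0951d974 could be normalized into a uniform list before the tests run. In this hypothetical helper, the name `normalize_tests` and the use of `None` as the marker for "only matching" tests are assumptions, and RESULTS is assumed to be a dict.

```python
def normalize_tests(test):
    """Turn a simplified 'test' attribute into a list of (url, results).

    Accepted shapes (per the commit message):
    - "url"                   -> an "only matching" test
    - ("url", {...})          -> a single test with expected results
    - (("url", {...}), "url") -> several tests, strings being "only matching"
    """
    if not test:
        return []
    if isinstance(test, str):
        return [(test, None)]
    if len(test) == 2 and isinstance(test[0], str) and isinstance(test[1], dict):
        return [test]
    return [(t, None) if isinstance(t, str) else t for t in test]
```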
Mike Fährmann
00dc37ccbf
replace AsynchronousExtractor with a Mixin
6 years ago
Mike Fährmann
4d656a81ca
replace SharedConfigExtractor class with a Mixin
6 years ago
Mike Fährmann
bfbbac4495
[tsumino] add login capabilities (#161)
6 years ago
Mike Fährmann
dd358b4564
improve cookie handling during logins
6 years ago
Mike Fährmann
06cbf5f9c4
implement 'chapter-reverse' option (#149)
Setting it to `true` will start with the latest chapter instead of the
first one.
6 years ago
Mike Fährmann
9a98b6769d
use extractor.request for API calls (#130)
... at least for OAuth1.0 based APIs (flickr, smugmug, tumblr)
6 years ago
Mike Fährmann
b828473aa3
retry HTTP requests for more exception classes
6 years ago
Mike Fährmann
c47482b110
smaller changes, missing docs, etc.
- make 'netrc' extractor-specific
- rename 'downloader.enable' to 'enabled'
- document 'downloader.ytdl.format'
- consistent newlines in configuration.rst
6 years ago
Mike Fährmann
2fa28a2609
update default user-agent string (closes #122)
6 years ago
Mike Fährmann
c9861ca812
adjust message for status_code based exceptions
from: 5xx HTTP Error: Reason
to : 5xx: Reason
The "HTTP Error" part was there to emulate the error messages of Requests'
response.raise_for_status(), but it reads a lot better without it.
6 years ago
Mike Fährmann
4a348990f4
adjust value resolution for retries/timeout/verify options
This change introduces 'extractor.*.retries/timeout/verify' options
as a general way to set these values for all HTTP requests.
'downloader.http.retries/timeout/verify' is a way to override these
options for file downloads only and will fall back to 'extractor.*.…*
values if they haven't been explicitly set.
Also: downloader classes now take an extractor object as first argument
instead of a requests.session.
6 years ago
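The lookup order from 4a348990f4 can be sketched with a nested dict as configuration. This is a simplification under assumptions: `get_option` is hypothetical, and only a single `extractor` level is consulted, whereas the real `extractor.*` lookup also involves category-specific sections.

```python
_UNSET = object()  # sentinel: distinguishes "not set" from a false value


def get_option(config, key, default=None):
    """Resolve 'key' for file downloads.

    'downloader.http.<key>' overrides 'extractor.<key>'; if neither is
    set explicitly, 'default' is returned.
    """
    value = config.get("downloader", {}).get("http", {}).get(key, _UNSET)
    if value is _UNSET:
        value = config.get("extractor", {}).get(key, default)
    return value


config = {
    "extractor": {"retries": 4, "timeout": 30},
    "downloader": {"http": {"retries": 10, "verify": False}},
}
print(get_option(config, "retries"), get_option(config, "timeout"))  # 10 30
```

A sentinel is used instead of `None` so that an explicitly configured false value, such as `verify: false` for downloads only, still counts as set and does not fall back to the extractor value.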
Mike Fährmann
f647f5d9c3
use 'verify' option for regular HTTP requests
6 years ago
Mike Fährmann
68d6033a5d
use 'retries' and 'timeout' options for regular HTTP requests
6 years ago
Mike Fährmann
017188d268
improve extractor.request()
Replace the 'fatal' parameter with 'expect', which is a list/range
of HTTP status codes >= 400 that should also be accepted.
6 years ago
Mike Fährmann
2d17a9e07f
improve extractor.request()
- better retry behavior
- exponential back-off
- removed 'allow_empty' argument
7 years ago
Mike Fährmann
8704d850bf
add explicit proxy support (#76)
- '--proxy' as command-line argument
- 'extractor.*.proxy' as config option
7 years ago
Mike Fährmann
179bcdd349
adjust archive-ids
7 years ago
Mike Fährmann
3cec533c28
Merge branch 'archive'
7 years ago
Mike Fährmann
5b3c34aa96
use generic chapter-extractor in more modules
7 years ago
Mike Fährmann
7a412f5c32
implement generic manga-chapter extractor
7 years ago
Mike Fährmann
84a52a9256
add DownloadArchive class
7 years ago
Mike Fährmann
cc0c2cca57
[reddit] add extractor for reddit-hosted images (closes #68)
7 years ago
Mike Fährmann
e6814aebe2
add 'extractor.*.user-agent' config option
7 years ago
Mike Fährmann
baf8094868
improve Extractor.request()'s retry behavior
7 years ago
Mike Fährmann
16783e327f
[common] fix UnboundLocalError in Extractor.request()
7 years ago
Mike Fährmann
9aecc67841
[common] explicitly handle HTTP status code 429
7 years ago
Mike Fährmann
b319f4bab3
smaller code and text changes
7 years ago
Mike Fährmann
26a866e7d8
implement (sub)category-transfer between extractors (#41)
ImageFap- and all Manga-Extractors will transfer their (sub)category
values to other extractors instantiated by them, which will in turn
allow those to use options set for their parents.
Example:
ImagefapGalleryExtractors will use options set under
extractor.imagefap.user, if (and only if) they have been instantiated by
an ImagefapUserExtractor, and options from extractor.imagefap.gallery
otherwise.
7 years ago
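The (sub)category transfer from 26a866e7d8 can be sketched by letting a child extractor copy its parent's `category` and `subcategory`, which then determine the config path it reads. The class names come from the commit message, but their bodies, the `parent` argument, and the option name `skip` are illustrative assumptions.

```python
class Extractor:
    category = ""
    subcategory = ""

    def __init__(self, parent=None):
        if parent is not None:
            # inherit (sub)category so that the parent's options apply
            self.category = parent.category
            self.subcategory = parent.subcategory

    def config(self, conf, key, default=None):
        """Look up extractor.<category>.<subcategory>.<key> in 'conf'."""
        section = conf.get("extractor", {}).get(self.category, {})
        return section.get(self.subcategory, {}).get(key, default)


class ImagefapUserExtractor(Extractor):
    category = "imagefap"
    subcategory = "user"


class ImagefapGalleryExtractor(Extractor):
    category = "imagefap"
    subcategory = "gallery"


conf = {"extractor": {"imagefap": {"user": {"skip": True},
                                   "gallery": {"skip": False}}}}
standalone = ImagefapGalleryExtractor()
child = ImagefapGalleryExtractor(parent=ImagefapUserExtractor())
print(standalone.config(conf, "skip"), child.config(conf, "skip"))  # False True
```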
Mike Fährmann
9c138dfc1f
[common] detect empty HTTP response bodies
7 years ago
Mike Fährmann
deb2e803ba
simplify MangaExtractor class
7 years ago
Mike Fährmann
0dedbe759c
enable '--chapter-filter'
The same filter infrastructure that can be applied to image URLs now
also works for manga chapters and other delegated URLs.
TODO: actually provide any metadata (currently only supported by
deviantart and imagefap).
7 years ago
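One way to apply such a filter to chapter metadata is to compile a Python expression once and evaluate it against each chapter's metadata dict. This sketch assumes that filter syntax; `make_filter` and the `chapter`/`lang` fields are hypothetical, and because `eval()` is not a sandbox it is only suitable for trusted expressions.

```python
def make_filter(expression):
    """Compile 'expression' once and return a predicate over metadata dicts.

    Names in the expression are resolved against the metadata dict;
    builtins are disabled (a choice of this sketch). eval() is not a
    sandbox, so only use trusted expressions.
    """
    code = compile(expression, "<filter>", "eval")

    def predicate(metadata):
        return bool(eval(code, {"__builtins__": {}}, metadata))

    return predicate


chapters = [
    {"chapter": 1, "lang": "en"},
    {"chapter": 2, "lang": "de"},
    {"chapter": 3, "lang": "en"},
]
keep = make_filter("lang == 'en' and chapter >= 2")
print([c["chapter"] for c in chapters if keep(c)])  # [3]
```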
Mike Fährmann
be30fb2f98
add common config category for boorus and foolslide
7 years ago
Mike Fährmann
915a0137de
improve 'extractor.request'
- add 'fatal' argument
- improve internal logic and flow
- raise known exception on error
- update exception hierarchy
7 years ago
Mike Fährmann
7aa9fa796a
code cleanup and fixes
7 years ago
Mike Fährmann
55f048d02b
ignore case of cookiejar magic strings
7 years ago
Mike Fährmann
808f67ba7d
use 'cookiedomain' for cookies set by object-config-values
otherwise these cookies would not be picked up by the
_check_cookies() method.
7 years ago
Mike Fährmann
0610ae5000
skip login if cookies are present
7 years ago
Mike Fährmann
726c6f01ae
allow 'cookies' config option to be a dictionary
7 years ago
Mike Fährmann
a804a42e23
add '--cookies' command-line option
7 years ago
Mike Fährmann
d3b04076f7
add .netrc support (#22)
Use the '--netrc' cmdline option or set the 'netrc' config option
to 'true' to enable the use of .netrc authentication data.
The 'machine' names for the .netrc info are the lowercase extractor
names (or categories): batoto, exhentai, nijie, pixiv, seiga.
7 years ago
Mike Fährmann
c184e47ee3
put common directory- and filename formats in base classes
7 years ago
Mike Fährmann
f226417420
simplify code by using a MangaExtractor base class
7 years ago
Mike Fährmann
4b967fa189
implement and use extractor.config() method
8 years ago
Mike Fährmann
f782282f97
add logger objects to extractors
8 years ago
Mike Fährmann
7a9d66fbce
implement basic way to tell extractors to skip ahead
8 years ago
Mike Fährmann
0b59d9f8c7
disable urllib3's InsecureConnectionWarning
8 years ago
Mike Fährmann
37d4d07d9b
compatibility fixes to make a standalone exe work
8 years ago
Mike Fährmann
cc0b4f2661
[yomanga] add chapter extractor
8 years ago
Mike Fährmann
ad4b02508f
trying to understand travis-ci unit test failures
- added some debug output via logging module
- unit tests work on my machine (tm)
8 years ago
Mike Fährmann
e6d26f0476
don't overwrite a response's encoding with None
8 years ago
Mike Fährmann
f0f7306db6
re-raise async exceptions in main thread
8 years ago
Mike Fährmann
000df8d1fa
add 'encoding' argument for Extractor.request
8 years ago
Mike Fährmann
81dcfbec90
initial support for extractor-subcategories
9 years ago
Mike Fährmann
1497da07de
remove unused format-strings
9 years ago
Mike Fährmann
3fb5a8b834
delay 'requests'-import
9 years ago
Mike Fährmann
539faa0322
remove SequentialExtractor class
9 years ago
Mike Fährmann
3c13548f29
rewrite extractors to use config-module
9 years ago
Mike Fährmann
42b8e81a68
rewrite extractors to use text-module
9 years ago
Mike Fährmann
5806e02f97
better support for KeyboardInterrupt exceptions
10 years ago
Mike Fährmann
1cd25b5369
[pixiv] update to new extractor interface
10 years ago
Mike Fährmann
cd4a699dd2
add 'Headers' and 'Cookies' message
10 years ago
Mike Fährmann
513808d156
move code from util.py
10 years ago
Mike Fährmann
41f00809ff
update extractor base classes
10 years ago
Mike Fährmann
deef91eddc
initial commit
10 years ago