Mike Fährmann
4465a3ea68
[kissmanga][readcomiconline] add 'captcha' option ( #279 )
to configure how to handle CAPTCHA page redirects:
- either interactively wait for the user to solve the CAPTCHA
- or raise StopExtraction like before
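The two behaviours listed above can be sketched roughly as follows. This is only an illustration, not the project's code: the function name, the option values "wait" and "stop", and the `StopExtractionSketch` class are hypothetical stand-ins.

```python
class StopExtractionSketch(Exception):
    """Hypothetical stand-in for the StopExtraction exception named above."""


def handle_captcha_redirect(url, mode, wait=input):
    """Handle a redirect to a CAPTCHA page.

    mode == "wait": block until the user confirms the CAPTCHA is solved
                    (assumed option value), then tell the caller to retry.
    anything else:  abort extraction by raising, as before.
    """
    if mode == "wait":
        # 'wait' defaults to input(); it is injectable so it can be replaced
        wait(f"Solve the CAPTCHA at {url} and press ENTER to continue: ")
        return True  # caller should retry the request
    raise StopExtractionSketch(f"Redirect to CAPTCHA page: {url}")
```

Passing the prompt function as a parameter keeps the interactive branch usable in non-interactive runs.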
5 years ago
Mike Fährmann
48233f00c0
[readcomiconline] detect 'AreYouHuman' redirects ( #279 )
5 years ago
Mike Fährmann
6dae6bee37
automatically detect and bypass cloudflare challenge pages
TODO: cache and re-apply cfclearance cookies
6 years ago
Mike Fährmann
5530871b5a
change results of text.nameext_from_url()
Instead of getting a complete 'filename' from a URL and splitting that
into 'name' and 'extension', the new approach gets rid of the complete
version and renames 'name' to 'filename'. (Using anything other than
{extension} for a filename extension doesn't really work anyway)
Example: "https://example.org/path/filename.ext"
before:
- filename : filename.ext
- name : filename
- extension: ext
now:
- filename : filename
- extension: ext
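A minimal sketch of the new result shape, assuming a simplified helper. The name `nameext_from_url_sketch` and its details are hypothetical; the real text.nameext_from_url() may handle more cases.

```python
from urllib.parse import urlsplit, unquote


def nameext_from_url_sketch(url, data=None):
    """Hypothetical helper: store 'filename' and 'extension' from a URL."""
    if data is None:
        data = {}
    # Take the last path segment; query string and fragment are ignored
    basename = unquote(urlsplit(url).path.rpartition("/")[2])
    name, _, ext = basename.rpartition(".")
    if name:
        data["filename"] = name      # name without the extension
        data["extension"] = ext
    else:
        # No usable dot: treat the whole segment as the filename
        data["filename"] = ext
        data["extension"] = ""
    return data
```

Called with the example URL above, this yields only the two keys from the "now" list; no key holding the complete "filename.ext" is produced.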
6 years ago
Mike Fährmann
32edf4fc7b
add '_extractor' info to manga extractor results
6 years ago
Mike Fährmann
580baef72c
change Chapter and MangaExtractor classes
- unify and simplify constructors
- rename get_metadata and get_images to just metadata() and images()
- rename self.url to chapter_url and manga_url
6 years ago
Mike Fährmann
4b1880fa5e
propagate 'match' to base extractor constructor
6 years ago
Mike Fährmann
6284731107
simplify extractor constants
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
6 years ago
Mike Fährmann
6126615698
update URLs for supportedsites.rst
6 years ago
Mike Fährmann
259123732f
[readcomiconline] improve comic-page parsing
6 years ago
Mike Fährmann
1c6b9ba322
[readcomiconline] use HTTPS
6 years ago
Mike Fährmann
1d43cbbf52
[gelbooru] tag-splitting for non-api mode
6 years ago
Mike Fährmann
cc36f88586
rename safe_int to parse_int; move parse_* to text module
7 years ago
Mike Fährmann
d11fcf4804
smaller changes and fixes
- fix the cloudflare challenge result if the last decimal places
  are zero (JS's toFixed() removes trailing zeroes)
- fix downloading of kissmanga chapter-pages hosted on blogspot
(accessing blogspot with "kissmanga.com" as referrer yields a 401)
- disable certificate validation for 'mangahere' tests
- update flickr test result
7 years ago
Mike Fährmann
179bcdd349
adjust archive-ids
7 years ago
Mike Fährmann
3cec533c28
Merge branch 'archive'
7 years ago
Mike Fährmann
5b3c34aa96
use generic chapter-extractor in more modules
7 years ago
Mike Fährmann
34873dbd90
set 'archive_fmt' values
These are going to be used to create a unique id for each image.
7 years ago
Mike Fährmann
e6814aebe2
add 'extractor.*.user-agent' config option
7 years ago
Mike Fährmann
68a0a7579c
fix/improve some regular expressions
7 years ago
Mike Fährmann
885bd4cbe2
[readcomiconline] extract comic metadata
7 years ago
Mike Fährmann
92a11528d1
smaller changes
7 years ago
Mike Fährmann
f226417420
simplify code by using a MangaExtractor base class
7 years ago
Mike Fährmann
f537ad5f2f
[kissmanga] re-enable module
8 years ago
Mike Fährmann
94e10f249a
code adjustments according to pep8 nr2
8 years ago
Mike Fährmann
40dbea7ed2
rewrite parts of the cloudflare bypass system
8 years ago
Mike Fährmann
2449825d53
[kissmanga] solve cloudflare challenge on demand
8 years ago
Mike Fährmann
9e3788175e
implement decorator for cloudflare bypass
this method for enabling and caching a cloudflare bypass for a
requests.session object allows for different cache-timeouts for
different domains
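The idea of a caching decorator with per-domain timeouts can be sketched like this. It is an illustrative outline only: `cache_per_domain`, its parameters, and the shape of the wrapped function are assumptions, and no actual challenge solving is shown.

```python
import functools
import time


def cache_per_domain(timeouts, default=3600, clock=time.monotonic):
    """Hypothetical decorator: cache a function's result per domain.

    'timeouts' maps a domain to its cache lifetime in seconds;
    domains not listed use 'default'.
    """
    def decorator(func):
        store = {}  # domain -> (expiry time, cached cookies)

        @functools.wraps(func)
        def wrapper(session, domain):
            now = clock()
            entry = store.get(domain)
            if entry is not None and entry[0] > now:
                cookies = entry[1]               # still valid: reuse
            else:
                cookies = func(session, domain)  # expired or missing: redo
                lifetime = timeouts.get(domain, default)
                store[domain] = (now + lifetime, cookies)
            # Assumes 'session.cookies' is dict-like
            session.cookies.update(cookies)
            return cookies
        return wrapper
    return decorator
```

The wrapped function is expected to take a session and a domain and return a mapping of cookies; the clock is injectable so expiry can be exercised without sleeping.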
8 years ago
Mike Fährmann
b634ace39e
[readcomiconline] add comic-issue and comic extractor
8 years ago