Mike Fährmann
fe849382d8
[komikcast] improve extraction
5 years ago
Mike Fährmann
eacebf41e4
fix typo in README
6 years ago
Mike Fährmann
fe27154a10
[komikcast] fix extraction
... again
6 years ago
Mike Fährmann
d0f88c35be
[komikcast] fix extraction
6 years ago
Mike Fährmann
6dae6bee37
automatically detect and bypass cloudflare challenge pages
TODO: cache and re-apply cfclearance cookies
6 years ago
Mike Fährmann
0887fb61f4
[komikcast] update test results
6 years ago
Mike Fährmann
f6734142ee
[komikcast] remove 'width' and 'height' info
6 years ago
Mike Fährmann
5530871b5a
change results of text.nameext_from_url()
Instead of getting a complete 'filename' from an URL and splitting that
into 'name' and 'extension', the new approach gets rid of the complete
version and renames 'name' to 'filename'. (Using anything other than
{extension} for a filename extension doesn't really work anyway)
Example: "https://example.org/path/filename.ext"
before:
- filename : filename.ext
- name : filename
- extension: ext
now:
- filename : filename
- extension: ext
6 years ago
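The before/after behaviour described in this commit message can be illustrated with a small sketch. This is not the project's actual implementation of text.nameext_from_url(); the function body, the optional `data` parameter, and the handling of names without an extension are assumptions made for illustration only.

```python
from urllib.parse import urlsplit, unquote


def nameext_from_url(url, data=None):
    """Hypothetical sketch: split the last URL path segment into
    'filename' (without extension) and 'extension'."""
    if data is None:
        data = {}
    # Take the last path segment, ignoring query string and fragment
    name = unquote(urlsplit(url).path.rpartition("/")[2])
    stem, dot, ext = name.rpartition(".")
    if dot and stem:
        data["filename"], data["extension"] = stem, ext
    else:
        # Assumption: with no usable extension, keep the whole segment
        data["filename"], data["extension"] = name, ""
    return data


info = nameext_from_url("https://example.org/path/filename.ext")
print(info["filename"], info["extension"])
```

Under the new scheme the result carries only 'filename' and 'extension'; the former full 'filename.ext' value and the separate 'name' key are gone.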
Mike Fährmann
32edf4fc7b
add '_extractor' info to manga extractor results
6 years ago
Mike Fährmann
580baef72c
change Chapter and MangaExtractor classes
- unify and simplify constructors
- rename get_metadata and get_images to just metadata() and images()
- rename self.url to chapter_url and manga_url
6 years ago
Mike Fährmann
4b1880fa5e
propagate 'match' to base extractor constructor
6 years ago
Mike Fährmann
6284731107
simplify extractor constants
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
6 years ago
Mike Fährmann
d70db2d555
Revert "[komikcast] fix extraction"
This reverts commit 5507f5ce2e.
6 years ago
Mike Fährmann
5507f5ce2e
[komikcast] fix extraction
6 years ago
Mike Fährmann
1694039de0
[komikcast] update ad-filter
6 years ago
Mike Fährmann
38d4f43cc0
[komikcast] skip ads
6 years ago
Mike Fährmann
f7e7306e5a
[komikcast] update URL pattern and unescape image URLs
6 years ago
Mike Fährmann
7f899bd5d8
Merge branch 'master' into 1.4-dev
6 years ago
Mike Fährmann
e2157f594e
[mangadex] fix manga extraction (closes #84)
Chapter listings for manga now use
https://mangadex.org/manga/<id>/_/chapters/2/
as URL instead of
https://mangadex.org/manga/<id>/_//2/
6 years ago
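The URL change in this commit can be sketched as a small helper. The function name `chapter_list_url` is hypothetical, and treating the trailing number as a page index is an assumption; the commit message only shows the literal value 2.

```python
def chapter_list_url(manga_id, page):
    """Hypothetical helper: build a chapter-listing URL in the new format.

    Assumption: the trailing path component is a page number.
    """
    return "https://mangadex.org/manga/{}/_/chapters/{}/".format(manga_id, page)


def old_chapter_list_url(manga_id, page):
    """The previous format, kept here only for comparison."""
    return "https://mangadex.org/manga/{}/_//{}/".format(manga_id, page)


print(chapter_list_url(123, 2))
print(old_chapter_list_url(123, 2))
```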
Mike Fährmann
cc36f88586
rename safe_int to parse_int; move parse_* to text module
7 years ago
Mike Fährmann
7073ab7707
[komikcast] update regex to only match manga pages
The 'readerarea' section now includes some (shady) external
Javascript file, which got matched as well.
7 years ago
Mike Fährmann
5f37d40a3e
[komikcast] bypass cloudflare challenge
7 years ago
Mike Fährmann
2dd3aeeeae
[komikcast] add chapter- and manga-extractor (#70)
7 years ago