Mike Fährmann
27ec653991
fix bug in test_init and update example URLs
1 year ago
Mike Fährmann
a453335a9f
remove test results in extractor modules
...
and add generic example URLs
1 year ago
Mike Fährmann
3d8de383bf
[mangapark] extract 'source_id' for manga
...
forgot to add this to 6ae3101f
1 year ago
Mike Fährmann
6ae3101fd0
[mangapark] add 'source' option (#3969)
1 year ago
Mike Fährmann
3479646f65
[mangapark] update and fix 'manga' extractor (#3969)
...
TODO:
- non-English chapters
- 'source' option
1 year ago
Mike Fährmann
10786c657e
[mangapark] update and fix 'chapter' extractor (#3969)
1 year ago
Mike Fährmann
dd884b02ee
replace json.loads with direct calls to JSONDecoder.decode
2 years ago
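A minimal sketch of the idea behind this change, assuming a single shared decoder whose bound `decode` method is reused; the names `_decoder` and `json_loads` are illustrative and not taken from the commit:

```python
import json

# Assumption: one decoder instance is created once and its bound
# 'decode' method is called directly instead of json.loads().
_decoder = json.JSONDecoder()
json_loads = _decoder.decode

data = json_loads('{"chapter": 12, "title": "Example"}')
print(data["chapter"])  # 12
```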
Mike Fährmann
b0cb4a1b9c
replace 'text.extract()' with 'text.extr()' where possible
2 years ago
Mike Fährmann
c6a9bab019
update extractor test results
2 years ago
Mike Fährmann
211de95dd0
update extractor test results
3 years ago
Mike Fährmann
21c2da454f
update extractor test results
3 years ago
thatfuckingbird
264beb8556
recognize v2.mangapark URLs (#1578)
...
* recognize v2.mangapark URLs
* update mangapark root url to use the v2 subdomain
3 years ago
Mike Fährmann
8b22d4e667
[mangapark] use '"browser": "firefox"' by default
...
to get rid of Cloudflare CAPTCHA responses
3 years ago
Mike Fährmann
968d3e8465
remove '&' from URL patterns
...
'/?&#' -> '/?#' and '?&#' -> '?#'
According to https://www.ietf.org/rfc/rfc3986.txt, URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
4 years ago
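To illustrate the effect of dropping '&' from such a character class, here is a small sketch with a made-up pattern for example.org; it is not one of the patterns touched by the commit:

```python
import re

# Hypothetical URL pattern: a path segment ends at '/', '?' or '#',
# the component delimiters named in RFC 3986.
pattern = re.compile(r"https?://example\.org/manga/([^/?#]+)")

match = pattern.match("https://example.org/manga/some-title?page=2#top")
print(match.group(1))  # some-title

# '&' is not a delimiter, so it stays part of the segment.
match = pattern.match("https://example.org/manga/a&b/c1")
print(match.group(1))  # a&b
```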
Mike Fährmann
69e4871005
update extractor test results
...
- sensescans: replace 404d chapters
- mangapark: replace 404d chapters
- subscribestar: update test for attached files
4 years ago
Mike Fährmann
d3b3b30107
update test results
4 years ago
Mike Fährmann
4203dc0bdc
[mangapark] fix metadata extraction
5 years ago
Mike Fährmann
5530871b5a
change results of text.nameext_from_url()
...
Instead of getting a complete 'filename' from a URL and splitting that
into 'name' and 'extension', the new approach gets rid of the complete
version and renames 'name' to 'filename'. (Using anything other than
{extension} for a filename extension doesn't really work anyway)
Example: "https://example.org/path/filename.ext"
before:
- filename : filename.ext
- name : filename
- extension: ext
now:
- filename : filename
- extension: ext
6 years ago
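The new result shape can be sketched as follows. This is a hypothetical re-implementation for illustration only, not the actual code of text.nameext_from_url(); the handling of names without a dot is an assumption:

```python
from urllib.parse import unquote, urlsplit


def nameext_from_url(url, data=None):
    """Illustrative stand-in: fill 'filename' and 'extension' from a URL."""
    if data is None:
        data = {}
    # Last path segment, ignoring query string and fragment
    name = unquote(urlsplit(url).path.rpartition("/")[2])
    filename, _, ext = name.rpartition(".")
    if filename:
        data["filename"], data["extension"] = filename, ext.lower()
    else:
        # Assumption: without a usable dot, the whole segment is the filename
        data["filename"], data["extension"] = name, ""
    return data


result = nameext_from_url("https://example.org/path/filename.ext")
print(result)  # {'filename': 'filename', 'extension': 'ext'}
```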
Mike Fährmann
32edf4fc7b
add '_extractor' info to manga extractor results
6 years ago
Mike Fährmann
580baef72c
change Chapter and MangaExtractor classes
...
- unify and simplify constructors
- rename get_metadata and get_images to just metadata() and images()
- rename self.url to chapter_url and manga_url
6 years ago
Mike Fährmann
4b1880fa5e
propagate 'match' to base extractor constructor
6 years ago
Mike Fährmann
6284731107
simplify extractor constants
...
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
6 years ago
Mike Fährmann
0e46db6f45
rename some base classes
...
They shouldn't be called …Extractor if they don't have 'Extractor' as
their base class.
6 years ago
Mike Fährmann
8095f5f81a
[mangapark] fix manga title extraction
6 years ago
Mike Fährmann
217a0687ef
[behance] add 'collection' extractor (closes #157)
6 years ago
Mike Fährmann
66460337f1
[mangapark] fix extraction
6 years ago
Mike Fährmann
fa7fa2f8ff
[deviantart] update tests
6 years ago
Mike Fährmann
b7b5456a32
[kissmanga] use HTTPS
6 years ago
Mike Fährmann
98314aa04c
[mangapark] detect non-existent chapters
6 years ago
Mike Fährmann
f9ace0f4a3
[mangapark] fix manga extraction ... again
6 years ago
Mike Fährmann
0c9762f00e
[mangapark] fix extraction
6 years ago
Mike Fährmann
4d73cc785d
update test results
6 years ago
Mike Fährmann
1c6b9ba322
[readcomiconline] use HTTPS
6 years ago
Mike Fährmann
fd8ed35591
[turboimagehost] fix extraction
6 years ago
Mike Fährmann
d1f3d32eec
[fallenangels] unescape chapter titles
6 years ago
Mike Fährmann
2eefaa99a3
[mangapark] support .net and .com mirrors
6 years ago
Mike Fährmann
95392554ee
use text.urljoin()
6 years ago
Mike Fährmann
cc36f88586
rename safe_int to parse_int; move parse_* to text module
7 years ago
Mike Fährmann
f5c6a2d7f5
[nhentai] use API to get gallery info
7 years ago
Mike Fährmann
7a412f5c32
implement generic manga-chapter extractor
7 years ago
Mike Fährmann
35e09869d1
[mangapark] fix image URLs and use HTTPS
7 years ago
Mike Fährmann
c1e331edbb
[mangapark] replace manga test
7 years ago
Mike Fährmann
444008a14a
[khinsider] use urljoin() to complete page URLs
7 years ago
Mike Fährmann
633b376f35
improve/adjust default filename formats for manga sites
7 years ago
Mike Fährmann
9fc1d0c901
implement and use 'util.safe_int()'
...
same as Python's 'int()', except it doesn't raise any exceptions and
accepts a default value
7 years ago
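A minimal sketch of a helper matching that description; the exact signature and the set of exceptions caught are assumptions, not copied from the commit:

```python
def safe_int(value, default=0):
    """Like int(), but return 'default' instead of raising an exception."""
    try:
        return int(value)
    except (ValueError, TypeError, OverflowError):
        return default


print(safe_int("42"))       # 42
print(safe_int("abc"))      # 0
print(safe_int(None, -1))   # -1
```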
Mike Fährmann
b7a54a51d0
[mangapark] extract manga metadata + code improvements
7 years ago
Mike Fährmann
6f30cf4c64
change keyword names to valid Python identifiers
...
This commit mostly replaces all minus-signs ('-') in keyword names with
underscores ('_') to allow them to be used in filter-expressions. For
example 'gallery-id' got renamed to 'gallery_id'.
(It is theoretically possible to access any variable, regardless of its
name, with 'locals()["NAME"]', but that seems a bit too convoluted if
just 'NAME' could be enough)
7 years ago
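Why the rename matters can be shown with a small sketch. Plain eval() stands in for the filter-expression machinery here, which is an assumption made purely for illustration:

```python
old = {"gallery-id": 123}
# Replace minus signs so every key is a valid Python identifier
new = {key.replace("-", "_"): value for key, value in old.items()}

# With the keywords as namespace, the renamed key is usable directly.
matches = eval("gallery_id > 100", {"__builtins__": {}}, new)
print(matches)  # True

# The old name parses as the subtraction 'gallery - id' and fails.
error = None
try:
    eval("gallery-id > 100", {"__builtins__": {}}, old)
except NameError as exc:
    error = exc
print(type(error).__name__)  # NameError
```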
Mike Fährmann
47bcf53ec1
implement support for additional unit test result types
...
- "pattern" matches all resulting URLs against the given regex
- "count" specifies the expected number of returned URLs
7 years ago
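The two result types could be checked roughly like this. The function name `check_results` and its signature are hypothetical and only illustrate the described behavior:

```python
import re


def check_results(urls, pattern=None, count=None):
    """Return True if 'urls' satisfies the given 'pattern' and 'count'."""
    if pattern is not None:
        # every resulting URL has to match the regex
        if not all(re.match(pattern, url) for url in urls):
            return False
    if count is not None:
        # the number of returned URLs has to be exactly 'count'
        if len(urls) != count:
            return False
    return True


urls = [
    "https://example.org/img/001.jpg",
    "https://example.org/img/002.jpg",
]
print(check_results(urls, pattern=r"https://example\.org/img/\d+\.jpg", count=2))  # True
```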
Mike Fährmann
c45770331a
use 'str.partition()'
...
The (r)partition method is always faster than split() or any other
method that has been replaced in this commit.
7 years ago
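A short sketch of the kind of replacement described, using a made-up URL. It shows only that both forms give the same result and makes no claim about timing:

```python
url = "https://example.org/path/file.ext?key=value"

# split() builds a list that then has to be indexed
path_split = url.split("?", 1)[0]

# partition() always returns a 3-tuple: (head, separator, tail)
path, sep, query = url.partition("?")

# rpartition() searches from the right-hand side
filename = path.rpartition("/")[2]

print(path == path_split)  # True
print(query)               # key=value
print(filename)            # file.ext
```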
Mike Fährmann
9759fe8c6b
allow 'only_matching' tests
7 years ago