Mike Fährmann
9b2e5f72d6
[exhentai] update image URL parsing ( #1094 )
4 years ago
Mike Fährmann
3ebb174f2c
add missing extractor info when spawning new ones ( fixes #1051 )
...
Not having this information causes the blacklist/whitelist logic to
trigger and prevents things from functioning as intended when using
default settings.
Fixes issues for 8muses, deviantart, exhentai, and mangoxo.
4 years ago
Mike Fährmann
da87a5fb7e
[exhentai] fix accessing config before main constructor
...
bug introduced with 055c32e0
Making 'Extractor.config()' quite a bit faster is worth the "cost"
of having to set _cfgpath in exhentai constructors, I think.
4 years ago
Mike Fährmann
a0d916ed41
[exhentai] update wait time before original image download ( #978 )
...
depend on 'wait-max', don't use a hard-coded value
4 years ago
Mike Fährmann
0f55b8e80a
[exhentai] fix type check from dbbbb21 ( #940 )
...
'bool' is a subclass of 'int', and therefore
'isinstance(self.limits, int)' also returns True when
'self.limits' has a boolean value
4 years ago
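The pitfall described in 0f55b8e80a can be reproduced in a few lines. This is a minimal sketch, not the extractor's code: 'describe_limit' and its return strings are hypothetical names chosen for illustration. It only shows why a boolean has to be tested before an integer.

```python
def describe_limit(value):
    """Classify a hypothetical 'limits' setting (illustrative helper)."""
    # bool must be checked first: isinstance(True, int) is True,
    # because bool is a subclass of int.
    if isinstance(value, bool):
        return "enabled" if value else "disabled"
    if isinstance(value, int):
        return "custom limit of {}".format(value)
    return "unsupported"


print(issubclass(bool, int))
print(isinstance(True, int))
print(describe_limit(True))
print(describe_limit(5000))
```

With the order of the two checks swapped, 'True' would be treated as the integer 1 and reported as a custom limit.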
Mike Fährmann
dbbbb21180
[exhentai] add ability to specify custom image limit ( #940 )
4 years ago
Mike Fährmann
cd9de613a2
[exhentai] adjust image limit costs ( #940 )
...
Each original file costs 10 points per 10^6 bytes,
not 10 per 2^20 == 1048576 bytes.
4 years ago
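The difference between the two units in cd9de613a2 is small but adds up over a gallery. The sketch below only compares the two divisors; 'cost_per_unit' is a hypothetical helper, the 5 MB file size is made up, and any rounding the site or the extractor applies is not shown in the commit message and is left out here.

```python
def cost_per_unit(size_bytes, unit_bytes, points_per_unit=10):
    """Points charged for a file, given the size of one billing unit."""
    return points_per_unit * size_bytes / unit_bytes


size = 5 * 10**6  # assumed example: a 5,000,000-byte original image

decimal_cost = cost_per_unit(size, 10**6)  # 10 points per 10^6 bytes
binary_cost = cost_per_unit(size, 2**20)   # 10 points per 2^20 bytes

print(decimal_cost)
print(binary_cost)
```

Using 2^20 as the unit underestimates the cost, since the same file spans fewer of the larger units.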
Mike Fährmann
ecaecc4064
[exhentai] add 'domain' option ( #897 )
4 years ago
Mike Fährmann
6b373cb7e2
[exhentai] restrict default directory name length ( #545 )
5 years ago
Mike Fährmann
4409d00141
embed error messages in StopExtraction exceptions
5 years ago
Mike Fährmann
1848788970
update test results etc
5 years ago
Mike Fährmann
9ae58a6b3e
[exhentai] update image limit checks
...
- adjust cost of original images
- delay limit initialization until gallery and first image page have
been requested and all cookies are available
5 years ago
Mike Fährmann
fa60109e97
[exhentai] don't use e-hentai.org for exhentai URLs
5 years ago
Mike Fährmann
beb4fab2e6
[exhentai] improve limit and error handling ( #360 )
...
- check image limit before opening the first gallery or image page
- prevent any further exhentai extractors from running after the image
limit has been reached
5 years ago
Mike Fährmann
81b35ed3cb
[exhentai] catch more error states ( #356 , #360 )
...
- warn on MPV-enabled galleries
- catch parsing errors for gallery pages and image info
- write page content to debug output
5 years ago
Mike Fährmann
6ce22f606b
[exhentai] update login procedure and tests
...
Logging in now more closely follows the natural login flow of a
browser and collects more cookies than just ipb_member_id
and ipb_pass_hash.
Test URLs have been updated and now point to the e-hentai.org domain.
5 years ago
Mike Fährmann
dc73d02d87
[exhentai] always use e-hentai.org as domain + set nw cookie
5 years ago
Mike Fährmann
fdec59f8e2
replace extractor.request() 'expect' argument
...
with
- 'fatal': allow 4xx status codes
- 'notfound': raise NotFoundError on 404
5 years ago
Mike Fährmann
1c36e65e9b
[exhentai] choose site version depending on input URL ( #278 )
...
Use e-hentai.org as root and cookiedomain if the input URL is from
e-hentai (or g.e-hentai), use exhentai.org otherwise.
5 years ago
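The rule from 1c36e65e9b can be sketched as a small function. This is an illustration under assumptions, not the extractor's implementation: 'select_domain' is a hypothetical name, the gallery paths in the example URLs are made up, and the real code may match hosts differently.

```python
from urllib.parse import urlsplit


def select_domain(url):
    """Return the site domain to use as root and cookie domain."""
    host = urlsplit(url).hostname or ""
    # e-hentai.org and any subdomain such as g.e-hentai.org
    if host == "e-hentai.org" or host.endswith(".e-hentai.org"):
        return "e-hentai.org"
    # everything else falls back to exhentai.org
    return "exhentai.org"


for url in (
    "https://e-hentai.org/g/1/abcdef/",
    "https://g.e-hentai.org/g/1/abcdef/",
    "https://exhentai.org/g/1/abcdef/",
):
    print(url, "->", select_domain(url))
```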
Mike Fährmann
1f7fa9dc8e
[exhentai] update data extraction code
...
- parse 'date' to datetime object
- use 'text.extract_from()'
5 years ago
Mike Fährmann
5398bfbd69
[exhentai] fix search and favorite extraction
...
removes basically all metadata, but that can be compensated for with the
right search query. writing "parsers" for all 4 possible views that have
been introduced in the latest changes is too much of a hassle ...
6 years ago
Mike Fährmann
a2af2d2965
adjust cache maxage values
6 years ago
Mike Fährmann
5530871b5a
change results of text.nameext_from_url()
...
Instead of getting a complete 'filename' from a URL and splitting that
into 'name' and 'extension', the new approach gets rid of the complete
version and renames 'name' to 'filename'. (Using anything other than
{extension} for a filename extension doesn't really work anyway)
Example: "https://example.org/path/filename.ext"
before:
- filename : filename.ext
- name : filename
- extension: ext
now:
- filename : filename
- extension: ext
6 years ago
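The new result layout from 5530871b5a looks roughly like this. The function below is a stand-in written for illustration, not the body of text.nameext_from_url(); 'split_name_ext' is a hypothetical name and the handling of URLs without an extension is an assumption.

```python
from urllib.parse import unquote, urlsplit


def split_name_ext(url):
    """Split the last path segment of a URL into filename and extension."""
    basename = unquote(urlsplit(url).path.rpartition("/")[2])
    name, dot, ext = basename.rpartition(".")
    if not dot:
        # assumed behavior: no dot means no extension
        return {"filename": basename, "extension": ""}
    return {"filename": name, "extension": ext}


print(split_name_ext("https://example.org/path/filename.ext"))
```

The returned dictionary has no 'name' key anymore; 'filename' holds the part before the final dot.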
Mike Fährmann
61741d7333
provide type information for Queue messages
...
Child extractors are now directly constructed with Extractor.from_url()
if the extractor class is known beforehand, instead of using
extractor.find() and searching through all possible extractor classes.
6 years ago
Mike Fährmann
2e516a1e3e
store the full original URL in Extractor.url
6 years ago
Mike Fährmann
4b1880fa5e
propagate 'match' to base extractor constructor
6 years ago
Mike Fährmann
6284731107
simplify extractor constants
...
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
6 years ago
Mike Fährmann
dd358b4564
improve cookie handling during logins
6 years ago
Mike Fährmann
134487ffb0
[exhentai] stop extraction if image limit is exceeded ( #141 )
...
can be turned off with the 'exhentai.limits' option
6 years ago
Mike Fährmann
e868fb4393
[exhentai] improve gallery extraction
...
- match image page URLs and extract galleries from that point onward
- add a few more metadata entries: 'parent', 'visible', 'cost'
6 years ago
Mike Fährmann
2ffc105887
[exhentai] extract tag metadata
6 years ago
Mike Fährmann
2801a0d997
[exhentai] skip "Content Warning" page when not logged in
...
(closes #97 )
6 years ago
Mike Fährmann
b8c97d2295
use 'extractor.request()' for more HTTP requests
6 years ago
Mike Fährmann
017188d268
improve extractor.request()
...
Replace the 'fatal' parameter with 'expect', which is a list/range
of HTTP status codes >= 400 that should also be accepted.
6 years ago
Mike Fährmann
7a58151566
fix util.parse_bytes invocations
...
(should be text.parse_bytes)
6 years ago
Mike Fährmann
cc36f88586
rename safe_int to parse_int; move parse_* to text module
7 years ago
Mike Fährmann
64d7c85b55
[exhentai] improve metadata
...
- add 'width', 'height' and 'size' (in bytes) for each image
- change the former 'size' and 'size_units' into 'gallery_size'
7 years ago
Mike Fährmann
52d41c41e7
[exhentai] add extractor for favorited galleries
7 years ago
Mike Fährmann
63cc2599c4
[exhentai] add extractor for search results
7 years ago
Mike Fährmann
34873dbd90
set 'archive_fmt' values
...
These are going to be used to create a unique id for each image.
7 years ago
Mike Fährmann
e6814aebe2
add 'extractor.*.user-agent' config option
7 years ago
Mike Fährmann
9fc1d0c901
implement and use 'util.safe_int()'
...
same as Python's 'int()', except it doesn't raise any exceptions and
accepts a default value
7 years ago
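A helper matching the description in 9fc1d0c901 might look like the following. This is a sketch, not the original util.safe_int(): the default of 0 and the exact set of caught exceptions are assumptions.

```python
def safe_int(value, default=0):
    """Convert 'value' like int(), returning 'default' instead of raising."""
    try:
        return int(value)
    except (ValueError, TypeError, OverflowError):
        # ValueError: "abc"; TypeError: None; OverflowError: float("inf")
        return default


print(safe_int("42"))
print(safe_int("abc"))
print(safe_int(None, -1))
```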
Mike Fährmann
6f30cf4c64
change keyword names to valid Python identifiers
...
This commit mostly replaces all minus-signs ('-') in keyword names with
underscores ('_') to allow them to be used in filter-expressions. For
example 'gallery-id' got renamed to 'gallery_id'.
(It is theoretically possible to access any variable, regardless of its
name, with 'locals()["NAME"]', but that seems a bit too convoluted if
just 'NAME' could be enough)
7 years ago
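The motivation for 6f30cf4c64 can be shown with plain eval(). This is only an illustration of the naming problem: 'evaluate_filter' is a hypothetical helper, the metadata values are made up, and how filter expressions are actually evaluated is not shown in the commit message.

```python
def evaluate_filter(expression, metadata):
    """Evaluate an expression with metadata keys visible as local names."""
    return eval(expression, {}, dict(metadata))


new_style = {"gallery_id": 12345}
old_style = {"gallery-id": 12345}

# An underscore name is a valid identifier and can be used directly.
print(evaluate_filter("gallery_id > 1000", new_style))

# 'gallery-id' parses as the subtraction 'gallery - id', so the lookup fails.
try:
    evaluate_filter("gallery-id > 1000", old_style)
except NameError as exc:
    print("hyphenated name failed:", exc)

# The convoluted workaround mentioned in the commit message.
print(evaluate_filter('locals()["gallery-id"] > 1000', old_style))
```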
Mike Fährmann
c0755a4d5e
[exhentai] revert login-method to its old version ( #37 )
...
Additional cookies don't seem to help and have to be manually set
anyway. The older method is more likely to succeed, so I'd rather
use this one.
7 years ago
Mike Fährmann
3ee39ffd93
[exhentai] update login procedure ( #37 )
...
This new version behaves pretty much exactly like a browser would and
caches all cookies sent to it and not just "ipb_member_id" and
"ipb_pass_hash".
7 years ago
Mike Fährmann
2d0dfe9d56
[exhentai] init headers before login and detect sadpanda
...
- also debug-logs html after failed login
- #37
7 years ago
Mike Fährmann
915a0137de
improve 'extractor.request'
...
- add 'fatal' argument
- improve internal logic and flow
- raise known exception on error
- update exception hierarchy
7 years ago
Mike Fährmann
7aa9fa796a
code cleanup and fixes
7 years ago
Mike Fährmann
808f67ba7d
use 'cookiedomain' for cookies set by object-config-values
...
otherwise these cookies would not be picked up by the
_check_cookies() method.
7 years ago
Mike Fährmann
0610ae5000
skip login if cookies are present
7 years ago