Mike Fährmann
e70af6a550
[hentaifoundry] do not update filters when cookies are provided
1 year ago
Mike Fährmann
d84a617273
[hentaifoundry] fix setting content filters (#3887)
1 year ago
Mike Fährmann
b0cb4a1b9c
replace 'text.extract()' with 'text.extr()' where possible
2 years ago
Mike Fährmann
4e11ca737e
[hentaifoundry] fix metadata extraction
2 years ago
Mike Fährmann
00825cddf5
[hentaifoundry] use scheme from input URL (fixes #1095)
Let the user choose between http and https,
instead of always forcing https.
4 years ago
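A minimal Python sketch of keeping the input scheme, following the 'root := <scheme>://<host>' convention described further down in this log. The function name root_from_url and the https fallback for scheme-less input are hypothetical, not taken from the extractor.

```python
from urllib.parse import urlsplit


def root_from_url(url, default_scheme="https"):
    """Return '<scheme>://<host>' for *url*, keeping http or https as given.

    Hypothetical helper: the fallback to *default_scheme* for input without
    a scheme is an assumption made for this sketch.
    """
    if "://" not in url:
        # Without a scheme, urlsplit() would treat the host as part of the path.
        url = f"{default_scheme}://{url}"
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"
```

With this approach, "http://www.hentai-foundry.com/user/Tenpura" yields an http root instead of being rewritten to https.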
Mike Fährmann
0211af7ca8
[hentaifoundry] update 'YII_CSRF_TOKEN' cookie handling
(fixes #1083)
4 years ago
Mike Fährmann
968d3e8465
remove '&' from URL patterns
'/?&#' -> '/?#' and '?&#' -> '?#'
According to https://www.ietf.org/rfc/rfc3986.txt, URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
4 years ago
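Why '&' is redundant in such a character class: a query string only begins after '?', so a class like [^/?#] already stops before any '&'. The pattern below is a hypothetical illustration built on the listing URL format quoted in commit 1c95a0173f, not one of the extractor's actual patterns.

```python
import re

# Hypothetical pattern: capture the user name segment of a listing URL.
# [^/?#]+ stops at the next path separator, query start, or fragment start.
USER_PATTERN = re.compile(
    r"(?:https?://)?(?:www\.)?hentai-foundry\.com/user/([^/?#]+)")


def user_from_url(url):
    """Return the user name from *url*, or None if the pattern does not match."""
    match = USER_PATTERN.match(url)
    return match.group(1) if match else None
```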
Mike Fährmann
783e0af26d
[hentaifoundry] update and simplify
4 years ago
Mike Fährmann
dd1e545597
[hentaifoundry] rename GalleryExtractor to PicturesExtractor
4 years ago
Mike Fährmann
b9bdd2c564
[hentaifoundry] add support for stories (closes #734)
4 years ago
Mike Fährmann
0d43456323
[hentaifoundry] add 'include' option
4 years ago
Mike Fährmann
4e361b3008
add tests for specific datetime values
5 years ago
Mike Fährmann
33a6e0ac6e
[hentaifoundry] extract more metadata (closes #565)
5 years ago
Mike Fährmann
1848788970
update test results etc.
5 years ago
Mike Fährmann
61e413d85d
[hentaifoundry] stop disabling IPv6 addresses
The rogue address mentioned in a138d58
is no longer included in the DNS
results for www.hentai-foundry.com.
5 years ago
Mike Fährmann
a138d5873d
[hentaifoundry] improve/fix extraction
- Sometimes an ad interfered when trying to get a download URL
- Resolving "www.hentai-foundry.com" yields an invalid(?) IPv6 address
(2607:5300:60:ca9e:feed:dead:beef:1) and urllib3 only tries to connect
to the IPv4 variant after a rather long wait time
6 years ago
Mike Fährmann
5530871b5a
change results of text.nameext_from_url()
Instead of getting a complete 'filename' from a URL and splitting that
into 'name' and 'extension', the new approach gets rid of the complete
version and renames 'name' to 'filename'. (Using anything other than
{extension} for a filename extension doesn't really work anyway.)
Example: "https://example.org/path/filename.ext"
before:
- filename : filename.ext
- name : filename
- extension: ext
now:
- filename : filename
- extension: ext
6 years ago
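A minimal standard-library sketch of the "now" result shown above. nameext_sketch is a hypothetical name and this is not the code of text.nameext_from_url(); how it treats a path segment without a dot is an assumption.

```python
from urllib.parse import unquote, urlsplit


def nameext_sketch(url):
    """Return {'filename': ..., 'extension': ...} for the last path segment.

    Everything before the last dot becomes 'filename', everything after it
    'extension'; there is no separate 'name' key anymore.
    """
    # Take the final path component; query string and fragment are ignored.
    segment = unquote(urlsplit(url).path.rpartition("/")[2])
    name, dot, ext = segment.rpartition(".")
    if not dot:
        # Assumption: no dot means the whole segment is the filename.
        return {"filename": segment, "extension": ""}
    return {"filename": name, "extension": ext}
```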
Mike Fährmann
2e516a1e3e
store the full original URL in Extractor.url
6 years ago
Mike Fährmann
4b1880fa5e
propagate 'match' to base extractor constructor
6 years ago
Mike Fährmann
6284731107
simplify extractor constants
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
6 years ago
Mike Fährmann
041bd501fc
[hentaifoundry] unescape YII_CSRF_TOKEN value
This fixes the POST requests to /site/filters
6 years ago
Mike Fährmann
c402cc4047
[hentaifoundry] add 'popular' and 'recent' extractors
for "Popular Pictures" and "Recent Pictures" listings
6 years ago
Mike Fährmann
a5fc311dfa
[hentaifoundry] add 'favorite' extractor
6 years ago
Mike Fährmann
1c95a0173f
[hentaifoundry] split 'artist' into 'user'+'artist'
and some smaller changes ...
'user' is the name of the account an image is listed at and
'artist' is now the name of the account who created the image.
For example "https://www.hentai-foundry.com/user/Tenpura/faves/pictures":
- 'user': Tenpura
- 'artist' of the only image: LewdBrush
6 years ago
Mike Fährmann
006f75b538
[hentaifoundry] rewrite + more metadata
- extract width, height, artist per image
- improve pattern regex
- better extensibility for other listings
6 years ago
Mike Fährmann
eeb7424783
[hentaifoundry] add support for "scraps" (#110)
6 years ago
Mike Fährmann
017188d268
improve extractor.request()
Replace the 'fatal' parameter with 'expect', which is a list/range
of HTTP status codes >= 400 that should also be accepted.
6 years ago
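A sketch of the 'expect' semantics as described in this commit, using a hypothetical check_status() helper and HttpError class. The real extractor.request() also sends the request and handles retries, which are left out here.

```python
class HttpError(Exception):
    """Hypothetical error raised for a rejected status code."""


def check_status(status_code, expect=()):
    """Accept codes below 400, plus any code >= 400 listed in *expect*.

    *expect* may be a list, tuple, set, or range, since only 'in' is used.
    """
    if status_code < 400 or status_code in expect:
        return status_code
    raise HttpError(f"{status_code} status code")
```

Passing a range such as range(400, 500) accepts every client error at once, while a list names individual codes.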
Mike Fährmann
95392554ee
use text.urljoin()
6 years ago
Mike Fährmann
2d17a9e07f
improve extractor.request()
- better retry behavior
- exponential back-off
- removed 'allow_empty' argument
7 years ago
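Retrying with exponential back-off can be expressed generically as below. This is only a sketch: the function name, the default of 3 retries, the 1-second base delay, and catching OSError are assumptions, not values taken from extractor.request().

```python
import time


def retry_with_backoff(func, retries=3, base_delay=1.0,
                       exceptions=(OSError,), sleep=time.sleep):
    """Call func(); after each failure wait base_delay * 2**attempt and retry.

    The last exception is re-raised once *retries* extra attempts are used up.
    *sleep* is injectable so the waiting can be observed in tests.
    """
    for attempt in range(retries + 1):
        try:
            return func()
        except exceptions:
            if attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)
```

Doubling the delay keeps quick recovery from brief hiccups while backing off from a server that stays unavailable.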
Mike Fährmann
f471161920
Merge branch 'master' into 1.4-dev
7 years ago
Mike Fährmann
eb37fbf0e8
[hentaifoundry] improve extractor
- use common base class
- better pagination
- respect '.../page/<num>'
- implement skip() / --range support
- get YII_CSRF_TOKEN from cookies
7 years ago
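The skip() / --range support can be sketched as page arithmetic. This assumes 25 images per listing page (inferred from the "multiple of 25 images" fix in dcc1d3b2ea) and a hypothetical skip_pages() helper; only whole pages are skipped, so any remainder still has to be discarded after the first page is fetched.

```python
PER_PAGE = 25  # assumption: images per listing page


def skip_pages(num, per_page=PER_PAGE):
    """Return (start_page, skipped) when asked to skip *num* images.

    Page numbers are 1-based, as in '.../page/<num>' URLs.
    """
    pages = num // per_page
    return pages + 1, pages * per_page
```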
Mike Fährmann
cc36f88586
rename safe_int to parse_int; move parse_* to text module
7 years ago
Mike Fährmann
179bcdd349
adjust archive-ids
7 years ago
Mike Fährmann
34873dbd90
set 'archive_fmt' values
These are going to be used to create a unique ID for each image.
7 years ago
Mike Fährmann
92027f67f9
use consistent names for URL constants
root := <scheme>://<host>
base_url := <root>/<common path>
7 years ago
Mike Fährmann
9c138dfc1f
[common] detect empty HTTP response bodies
7 years ago
Mike Fährmann
9fc1d0c901
implement and use 'util.safe_int()'
same as Python's 'int()', except it doesn't raise any exceptions and
accepts a default value
7 years ago
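A minimal sketch of the described safe_int() behavior. Catching TypeError and ValueError and the default of 0 are assumptions, not necessarily what util.safe_int() (later text.parse_int()) does.

```python
def safe_int(value, default=0):
    """Convert *value* with int(), returning *default* instead of raising."""
    try:
        return int(value)
    except (TypeError, ValueError):
        # int(None) raises TypeError; int("4x2") raises ValueError.
        return default
```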
Mike Fährmann
915a0137de
improve 'extractor.request'
- add 'fatal' argument
- improve internal logic and flow
- raise known exception on error
- update exception hierarchy
7 years ago
Mike Fährmann
dcc1d3b2ea
[hentaifoundry] fix infinite loop for multiple of 25 images
7 years ago
Mike Fährmann
13dc5d72bc
update some extractors to use https
8 years ago
Mike Fährmann
bd95fea82c
update unit test results
8 years ago
Mike Fährmann
0456efaa5a
[hentaifoundry] update unit tests
8 years ago
Mike Fährmann
0257d3e7ac
[mangamint] remove extractors - site is down
8 years ago
Mike Fährmann
7880cc1ad7
[imgtrex] remove extractor - domain no longer exists
8 years ago
Mike Fährmann
94e10f249a
code adjustments according to PEP 8 (part 2)
8 years ago
Mike Fährmann
a849d8f2f7
add a few more tests
8 years ago
Mike Fährmann
efdc299547
[hentaifoundry] get artist name from webpage
8 years ago
Mike Fährmann
8b2024a1a5
[hentaifoundry] support direct links to images
8 years ago
Mike Fährmann
dfd1992a2c
[hentaifoundry] small updates
- throw an exception if a user or image does not exist
- update tests, since the user of the old ones left
8 years ago
Mike Fährmann
56d810c896
update keyword hashes for tests
8 years ago