Mike Fährmann
b41d9bf616
[paheal] fix 'source' metadata
8 months ago
Mike Fährmann
f9544194c0
[paheal] restore 'extension' metadata (#4976)
9 months ago
Mike Fährmann
a453335a9f
remove test results in extractor modules
...
and add generic example URLs
1 year ago
Mike Fährmann
a383eca7f6
decouple extractor initialization
...
Introduce an 'initialize()' function that does the actual init
(session, cookies, config options) and can be called separately from
the constructor __init__().
This makes it possible, for example, to adjust config access inside a Job
before most of it has already happened when calling 'extractor.find()'.
1 year ago
Mike Fährmann
d97b8c2fba
consistent cookie-related names
...
- rename every cookie variable or method to 'cookies_*'
- simplify '.session.cookies' to just '.cookies'
- more consistent 'login()' structure
1 year ago
Mike Fährmann
f0cb951566
[paheal] unescape 'source'
1 year ago
Mike Fährmann
b480b7076a
[paheal] fix a78f8ce5 for enabled 'metadata' (#4262)
1 year ago
Mike Fährmann
a78f8ce5b0
[paheal] fix extraction (#4262)
...
swap ' and "
1 year ago
Mike Fährmann
7865067d19
[shimmie2] add generic extractors for Shimmie2 sites (#3734)
...
add support for
- loudbooru.com (#3734)
- booru.cavemanon.xyz (#3734)
- giantessbooru.com (#943)
- tentaclerape.net
1 year ago
Mike Fährmann
2ed58029f9
[paheal] add proper support for videos (#2892)
2 years ago
Mike Fährmann
4b78bd423f
[paheal] add 'metadata' option (#2641)
2 years ago
Mike Fährmann
61fa9b535a
[paheal] improve metadata extraction (#2641)
...
- unescape 'tags'
- add 'date', 'source', and 'uploader' for single posts
2 years ago
Mike Fährmann
211de95dd0
update extractor test results
3 years ago
Mike Fährmann
4b1cda4cf7
[paheal] fix metadata extraction
4 years ago
Mike Fährmann
43120407cc
[paheal] create directory for each post (closes #1147)
4 years ago
Mike Fährmann
1e3dd7330e
merge SharedConfigMixin functionality into Extractor
4 years ago
Mike Fährmann
558cde139c
[paheal] fix extraction (fixes #1088)
4 years ago
Mike Fährmann
968d3e8465
remove '&' from URL patterns
...
'/?&#' -> '/?#' and '?&#' -> '?#'
According to https://www.ietf.org/rfc/rfc3986.txt, URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
4 years ago
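To illustrate the delimiter change described in this commit, here is a minimal sketch. The pattern and the helper name search_tags() are made up for example.org and are not taken from the project. It assumes a character class that ends a captured segment only at '/', '?', or '#', so that '&' is treated as ordinary data:

```python
import re

# Hypothetical URL pattern for illustration only; not one of the project's patterns.
# The captured segment ends at the RFC 3986 component delimiters '/', '?' and '#'.
PATTERN = re.compile(r"(?:https?://)?example\.org/post/list/([^/?#]+)")


def search_tags(url):
    """Return the captured path segment, or None if the URL does not match."""
    match = PATTERN.match(url)
    return match.group(1) if match else None
```

With this character class, a segment such as 'a&b' is captured whole, while a query string or fragment still ends the capture.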
Mike Fährmann
844793847c
update extractor test results
4 years ago
Mike Fährmann
19bf76bcf8
update extractor test results
4 years ago
Mike Fährmann
1d4a369ea2
update extractor test results
5 years ago
Mike Fährmann
e6cd49e78b
update extractor test results
5 years ago
Mike Fährmann
2852691d78
[paheal] replace test URL
...
searching for 'k-on' doesn't yield any results anymore
5 years ago
Mike Fährmann
62335b9015
[paheal] adjust test results
5 years ago
Mike Fährmann
6a34f4b0c1
skip tests on read timeouts; print list of skipped tests
5 years ago
Mike Fährmann
d6ddb74cde
update test results
...
- deviantart: 'index' is now an integer
- flickr: image file with lower quality
- paheal: image server name changed
- rule34: post got deleted
6 years ago
Mike Fährmann
f8782c05f2
[paheal] rename "tags" to "search_tags"
...
to better match field names of other booru extractors
6 years ago
Mike Fährmann
5530871b5a
change results of text.nameext_from_url()
...
Instead of getting a complete 'filename' from a URL and splitting that
into 'name' and 'extension', the new approach gets rid of the complete
version and renames 'name' to 'filename'. (Using anything other than
{extension} for a filename extension doesn't really work anyway)
Example: "https://example.org/path/filename.ext"
before:
- filename : filename.ext
- name : filename
- extension: ext
now:
- filename : filename
- extension: ext
6 years ago
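The before/after behaviour described in this commit can be sketched roughly as follows. This is not the project's implementation of text.nameext_from_url(); the function name nameext_sketch() is hypothetical, and the handling of names without a dot is an assumption made for the example:

```python
from urllib.parse import unquote, urlsplit


def nameext_sketch(url):
    """Split the last path segment of a URL into 'filename' and 'extension'."""
    # Use only the path component, ignoring any query string or fragment.
    path = urlsplit(url).path
    name = unquote(path.rpartition("/")[2])
    stem, dot, ext = name.rpartition(".")
    if dot and stem:
        # New-style result: 'filename' holds the name without its extension.
        return {"filename": stem, "extension": ext}
    # Assumption: a name without a usable dot gets an empty extension.
    return {"filename": name, "extension": ""}
```

For the example URL from the commit message, this yields 'filename' for the filename field and 'ext' for the extension field, matching the "now" listing above.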
Mike Fährmann
4b1880fa5e
propagate 'match' to base extractor constructor
6 years ago
Mike Fährmann
6284731107
simplify extractor constants
...
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
6 years ago
Mike Fährmann
4d656a81ca
replace SharedConfigExtractor class with a Mixin
6 years ago
Mike Fährmann
4d73cc785d
update test results
6 years ago
Mike Fährmann
c9f70e0a19
[paheal] use HTTPS
6 years ago
Mike Fährmann
7a58151566
fix util.parse_bytes invocations
...
(should be text.parse_bytes)
6 years ago
Mike Fährmann
cc36f88586
rename safe_int to parse_int; move parse_* to text module
7 years ago
Mike Fährmann
34873dbd90
set 'archive_fmt' values
...
These are going to be used to create a unique id for each image.
7 years ago
Mike Fährmann
40d35c87bc
[paheal] add tag- and post-extractors (closes #69)
7 years ago