Mike Fährmann
a453335a9f
remove test results in extractor modules
...
and add generic example URLs
1 year ago
Mike Fährmann
a383eca7f6
decouple extractor initialization
...
Introduce an 'initialize()' function that does the actual init
(session, cookies, config options) and can be called separately from
the constructor __init__().
This makes it possible, for example, to adjust config access inside a Job,
whereas previously most of it had already happened when calling 'extractor.find()'.
1 year ago
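A minimal sketch of the two-phase setup this commit describes, assuming a hypothetical SketchExtractor class and a plain dict as configuration (neither is taken from the actual code base). It only illustrates why deferring the real setup to 'initialize()' lets a caller change options after construction:

```python
class SketchExtractor:
    """Hypothetical extractor used for illustration only."""

    def __init__(self, url):
        # Cheap constructor: record the input, read no configuration yet.
        self.url = url
        self.initialized = False
        self.timeout = None

    def initialize(self, config):
        # Deferred setup: options are read only when the caller asks for it.
        self.timeout = config.get("timeout", 30)
        self.initialized = True


config = {"timeout": 30}
extractor = SketchExtractor("https://example.org/gallery/1")

# The caller (a "Job" in the commit message) can still adjust options here,
# because nothing has been read from the configuration so far.
config["timeout"] = 5

extractor.initialize(config)
```

Had the constructor read 'timeout' itself, the later change to the configuration would have had no effect on this instance.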
Mike Fährmann
d97b8c2fba
consistent cookie-related names
...
- rename every cookie variable or method to 'cookies_*'
- simplify '.session.cookies' to just '.cookies'
- more consistent 'login()' structure
1 year ago
Mike Fährmann
b0cb4a1b9c
replace 'text.extract()' with 'text.extr()' where possible
2 years ago
Mike Fährmann
b1b15d6cef
[imagebam] add support for /view/ paths ( closes #2378 )
3 years ago
Mike Fährmann
1c79044433
[imagebam] set 'nsfw_inter' cookie ( fixes #2334 )
3 years ago
Mike Fährmann
8a909e478d
[imagebam] fix extraction of NSFW images ( #1534 )
3 years ago
Mike Fährmann
15b0241bbc
[imagebam] fix extraction
3 years ago
Mike Fährmann
eb7da159e2
[imagebam] update URL test results
...
Image URLs are now using https://, but the website itself is still
served as http://.
5 years ago
Mike Fährmann
155e1faeaf
[imagebam] support galleries with >100 images ( fixes #219 )
6 years ago
Mike Fährmann
4b1880fa5e
propagate 'match' to base extractor constructor
6 years ago
Mike Fährmann
6284731107
simplify extractor constants
...
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
6 years ago
Mike Fährmann
eb1c24b286
[imagebam] detect nonexistent galleries
6 years ago
Mike Fährmann
789608c107
[imagebam] fix extraction for certain galleries
6 years ago
Mike Fährmann
5008e105ee
update archive IDs
...
... to behave in a more straightforward way when dealing with
bookmarks/favourites/etc.
specific IDs are now grouped by their owner, album-id, ... to
allow for duplicates when it would be expected.
7 years ago
Mike Fährmann
34873dbd90
set 'archive_fmt' values
...
These are going to be used to create a unique ID for each image.
7 years ago
Mike Fährmann
619387cbb1
update extractor unittest results
7 years ago
Mike Fährmann
92027f67f9
use consistent names for URL constants
...
root := <scheme>://<host>
base_url := <root>/<common path>
7 years ago
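The naming convention above can be sketched as follows. The host, the path and the gallery_url() helper are placeholders chosen for illustration, not values from any real extractor:

```python
# root := <scheme>://<host>
root = "https://example.org"

# base_url := <root>/<common path>
base_url = root + "/gallery/"


def gallery_url(gallery_id):
    """Build a full URL from the shared prefix (hypothetical helper)."""
    return base_url + gallery_id
```

Keeping 'root' separate from 'base_url' means a scheme or host change touches a single constant.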
Mike Fährmann
68a0a7579c
fix/improve some regular expressions
7 years ago
Mike Fährmann
6f30cf4c64
change keyword names to valid Python identifiers
...
This commit mostly replaces all minus-signs ('-') in keyword names with
underscores ('_') to allow them to be used in filter-expressions. For
example 'gallery-id' got renamed to 'gallery_id'.
(It is theoretically possible to access any variable, regardless of its
name, with 'locals()["NAME"]', but that seems a bit too convoluted if
just 'NAME' could be enough)
7 years ago
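A short sketch of why the rename matters, assuming filter expressions are evaluated as Python expressions with the keywords as local names. The evaluate() helper below is hypothetical and uses eval() purely for illustration; it is not presented as the project's implementation:

```python
def evaluate(expression, keywords):
    """Evaluate a filter expression with keywords as local names (sketch)."""
    # Builtins are emptied so only the supplied keywords are visible.
    return eval(expression, {"__builtins__": {}}, dict(keywords))


# A name with an underscore is a valid identifier and can be used directly.
new_style_ok = "gallery_id".isidentifier()
matches = evaluate("gallery_id > 10", {"gallery_id": 42})

# A name with a minus sign is not an identifier: Python parses
# 'gallery-id' as the subtraction 'gallery - id', so the lookup fails.
old_style_ok = "gallery-id".isidentifier()
try:
    evaluate("gallery-id > 10", {"gallery-id": 42})
    old_style_failed = False
except NameError:
    old_style_failed = True
```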
Mike Fährmann
c184e47ee3
put common directory- and filename formats in base classes
7 years ago
Mike Fährmann
94e10f249a
code adjustments according to pep8 nr2
8 years ago
Mike Fährmann
56d810c896
update keyword hashes for tests
8 years ago
Mike Fährmann
19c2d4ff6f
remove explicit (sub)category keywords
8 years ago
Mike Fährmann
d7e168799d
consistent extractor naming scheme + docstrings
8 years ago
Mike Fährmann
2afa65cfc7
[imagebam] add single-image extractor
8 years ago
Mike Fährmann
000df8d1fa
add 'encoding' argument for Extractor.request
8 years ago
Mike Fährmann
4d56b76aa8
update all other extractors
9 years ago
Mike Fährmann
c2f0720184
code cleanup to use nameext_from_url
9 years ago
Mike Fährmann
c0efea339e
[imagebam] rewrite/fix
9 years ago
Mike Fährmann
3c13548f29
rewrite extractors to use config-module
9 years ago
Mike Fährmann
42b8e81a68
rewrite extractors to use text-module
9 years ago
Mike Fährmann
e41768d969
[imagebam] update to new extractor interface
10 years ago
Mike Fährmann
729d2d8b20
[imagebam] fixed issue with destination directory name
10 years ago
Mike Fährmann
98dd5f9a90
added extractor 'imagebam'
10 years ago