Mike Fährmann
3ecb512722
send Referer headers by default
1 year ago
Mike Fährmann
a453335a9f
remove test results in extractor modules
...
and add generic example URLs
1 year ago
Mike Fährmann
a383eca7f6
decouple extractor initialization
...
Introduce an 'initialize()' function that does the actual init
(session, cookies, config options) and can be called separately from
the constructor __init__().
This allows, for example, adjusting config access inside a Job
before most of it has already happened when calling 'extractor.find()'.
1 year ago
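The split between construction and initialization described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the classes, the 'config' dict, the 'initialized' flag, and the 'user-agent' key are all hypothetical stand-ins chosen for the example.

```python
class Extractor:
    """Hypothetical stand-in, not the project's actual extractor class."""

    def __init__(self, url):
        # Cheap: only record arguments; no session or config work yet.
        self.url = url
        self.config = {}
        self.session = None
        self.initialized = False

    def initialize(self):
        # Expensive setup (session, config options), run explicitly and once.
        if self.initialized:
            return
        agent = self.config.get("user-agent", "default-agent")
        self.session = {"headers": {"User-Agent": agent}}
        self.initialized = True


class Job:
    """Hypothetical job that adjusts config before initialization."""

    def __init__(self, extractor, overrides=None):
        self.extractor = extractor
        # The extractor already exists, but nothing has read its config yet,
        # so overrides applied here still take effect.
        extractor.config.update(overrides or {})
        extractor.initialize()


extr = Extractor("https://example.org/board/thread/1")
job = Job(extr, {"user-agent": "custom-agent"})
```

Because the constructor does no real work in this sketch, the Job can still change options between creating the extractor and initializing it.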
Mike Fährmann
a08fdfac6e
[foolfuuka] add 'archive.palanq.win'
1 year ago
Mike Fährmann
1870df8b23
[foolfuuka] remove 'tokyochronos.net'
1 year ago
Mike Fährmann
ef4e2d8178
[foolfuuka] remove 'archive.alice.al'
1 year ago
Mike Fährmann
b0cb4a1b9c
replace 'text.extract()' with 'text.extr()' where possible
2 years ago
Mike Fährmann
7e385ed63e
[foolfuuka] update domains
...
- remove nyafuu
- add rozenarcana (https://archive.alice.al/)
- add tokyochronos (https://www.tokyochronos.net)
2 years ago
Mike Fährmann
2dc57637cf
[foolfuuka] remove archive.wakarimasen.moe
2 years ago
Mike Fährmann
bd6ec5c352
[foolfuuka] match 4chan filenames (#2577)
...
introduce two new metadata fields:
- filename_media: original filename of file uploaded to 4chan
- timestamp_ms : timestamp with millisecond precision (tim)
2 years ago
Mike Fährmann
d26da3b9e5
add pre-generated 'pattern' for supported BaseExtractor sites
2 years ago
Mike Fährmann
dee0d22561
update extractor test results
3 years ago
Mike Fährmann
275543b2d2
update extractor test results
3 years ago
Mike Fährmann
211de95dd0
update extractor test results
3 years ago
Mike Fährmann
c04f7ab139
[foolfuuka] add 'gallery' extractor (#1785)
3 years ago
Mike Fährmann
21c2da454f
update extractor test results
3 years ago
Mike Fährmann
407627ec86
[foolfuuka] support 'archive.wakarimasen.moe' (closes #1595)
3 years ago
Mike Fährmann
532ac79fb0
update extractor test results
3 years ago
Mike Fährmann
671a95cae5
[foolfuuka] use BaseExtractor
4 years ago
Mike Fährmann
e9a75e27d9
[foolfuuka] stop search when results are exhausted (#1174)
4 years ago
Mike Fährmann
56b460dcea
[foolfuuka] add 'search' extractors (#1174)
4 years ago
Mike Fährmann
fb64183d53
[foolfuuka] add 'board' extractors (closes #1044)
4 years ago
Mike Fährmann
1e3dd7330e
merge SharedConfigMixin functionality into Extractor
4 years ago
Mike Fährmann
f5b7ae01c1
update extractor test results
4 years ago
Mike Fährmann
82f7f4172a
update test results
5 years ago
Mike Fährmann
41a3169c67
[foolfuuka] use '{extension}' in default filename format
5 years ago
Mike Fährmann
2a3bd4e3c7
rename extractor classes starting with a digit
5 years ago
Mike Fährmann
8de5866fd2
[twitter] replace unit test URLs
...
https://twitter.com/PicturesEarth was deleted
5 years ago
Mike Fährmann
591a07f20c
small code changes and cleanups
6 years ago
Mike Fährmann
09d872a2b1
generalize extractor creation code
6 years ago
Mike Fährmann
4b1880fa5e
propagate 'match' to base extractor constructor
6 years ago
Mike Fährmann
6284731107
simplify extractor constants
...
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
6 years ago
Mike Fährmann
4d656a81ca
replace SharedConfigExtractor class with a Mixin
6 years ago
Mike Fährmann
12ff750111
[foolfuuka] smaller code changes and updates
6 years ago
Mike Fährmann
58a9eede38
[foolfuuka] dynamically generate extractor classes
6 years ago
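One common way to generate extractor classes dynamically, as the commit above mentions, is Python's three-argument type() call. The sketch below only illustrates that general technique under stated assumptions: the base class, the site table, the example.org/example.net URLs, and the generate_extractors() helper are all hypothetical and are not taken from the project.

```python
class BaseThreadExtractor:
    """Hypothetical base class; the real extractor API may differ."""
    root = ""
    category = ""

    def thread_url(self, board, thread_id):
        # Build a thread URL from the per-site root.
        return "{}/{}/thread/{}".format(self.root, board, thread_id)


# Illustrative site table; names and URLs are placeholders.
SITES = {
    "examplearchive": "https://archive.example.org",
    "otherarchive": "https://other.example.net",
}


def generate_extractors(sites, base=BaseThreadExtractor):
    """Create one subclass of 'base' per site, keyed by class name."""
    classes = {}
    for category, root in sites.items():
        name = category.capitalize() + "ThreadExtractor"
        # type(name, bases, namespace) builds a new class at runtime.
        classes[name] = type(name, (base,), {"category": category, "root": root})
    return classes


extractors = generate_extractors(SITES)
```

Adding or removing a site then only means editing the table, which matches the kind of domain updates seen elsewhere in this history.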