Mike Fährmann
2e516a1e3e
store the full original URL in Extractor.url
6 years ago
Mike Fährmann
580baef72c
change Chapter and MangaExtractor classes
...
- unify and simplify constructors
- rename get_metadata and get_images to just metadata() and images()
- rename self.url to chapter_url and manga_url
6 years ago
Mike Fährmann
4b1880fa5e
propagate 'match' to base extractor constructor
6 years ago
Mike Fährmann
ade86da7a1
[tsumino] replace test
6 years ago
Mike Fährmann
1f3422c28b
[mangahere] fix extraction
6 years ago
Mike Fährmann
84ae72b8d8
[ngomik] fix extraction
6 years ago
Mike Fährmann
02d733d219
[simplyhentai] fix and improve tag extraction
...
The "tags" field is now a list instead of a string.
In format strings, use "{tags:J, }" to Join them.
6 years ago
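To illustrate the "{tags:J, }" note in 02d733d219: a minimal sketch of how a "J" format specifier could join a list inside a format string. The class JoinFormatter and the helper render() are hypothetical names for this example, not the project's actual formatter.

```python
import string


class JoinFormatter(string.Formatter):
    """Hypothetical formatter: '{name:J<sep>}' joins a list using <sep>."""

    def format_field(self, value, format_spec):
        # Everything after the leading 'J' is used as the separator.
        if format_spec.startswith("J"):
            separator = format_spec[1:]
            return separator.join(str(item) for item in value)
        # Any other specifier falls back to the standard behaviour.
        return super().format_field(value, format_spec)


def render(template, **fields):
    """Render a template with the join-aware formatter."""
    return JoinFormatter().format(template, **fields)


example = render("{tags:J, }", tags=["a", "b", "c"])
```

With a list-valued "tags" field, the separator is chosen in the format string rather than being baked into the metadata.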
Mike Fährmann
3a0b4af744
[seiga] recognize /thumb/ URLs
...
https://lohas.nicoseiga.jp/thumb/5977527i
6 years ago
Mike Fährmann
8fc6fbfa34
[artstation] recognize shortened project URLs
...
https://artstn.co/p/<project-id>
6 years ago
Mike Fährmann
9a9cd32461
implement alternative constructor for extractors
6 years ago
Mike Fährmann
abbd45d0f4
update handling of extractor URL patterns
...
When loading extractor classes during 'extractor.find(…)', their
'pattern' attribute will be replaced with a compiled version of itself.
6 years ago
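The lazy pattern compilation described in abbd45d0f4 might look roughly like the sketch below. ExampleExtractor, its URL pattern, and this find() signature are assumptions made for illustration; only the idea of replacing 'pattern' with its compiled form on first use comes from the commit message.

```python
import re


class ExampleExtractor:
    """Hypothetical extractor with a single URL pattern string."""
    pattern = r"(?:https?://)?example\.org/gallery/(\d+)"

    def __init__(self, match):
        self.match = match
        self.gallery_id = match.group(1)


def find(url, classes):
    """Return an extractor instance for 'url', or None if nothing matches."""
    for cls in classes:
        # Compile the pattern once and store the result back on the class.
        if isinstance(cls.pattern, str):
            cls.pattern = re.compile(cls.pattern)
        match = cls.pattern.match(url)
        if match:
            return cls(match)
    return None
```

After the first call, later lookups reuse the compiled object instead of recompiling the string.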
Mike Fährmann
6284731107
simplify extractor constants
...
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
6 years ago
Mike Fährmann
34bab080ae
rewrite URL patterns to use only 1 per extractor
6 years ago
Mike Fährmann
0e46db6f45
rename some base classes
...
They shouldn't be called …Extractor if they don't have 'Extractor' as
their base class.
6 years ago
Mike Fährmann
793b24e513
[imagehosts] fix and improve various extractors
6 years ago
Mike Fährmann
bc0951d974
allow for simplified test data structures
...
Instead of a strict list of (URL, RESULTS)-tuples, extractor result
tests can now be a single (URL, RESULTS)-tuple, if it's just one test,
and "only matching" tests can now be a simple string.
6 years ago
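The three accepted shapes from bc0951d974 (a list of tuples, a single tuple, or a bare URL string) can be normalized before running tests. The function normalize_tests() below is a hypothetical helper, and it assumes RESULTS is a dict; the commit does not specify either detail.

```python
def normalize_tests(test):
    """Turn any accepted 'test' value into a list of (URL, RESULTS) tuples."""
    if test is None:
        return []
    # A bare string is an "only matching" test without expected results.
    if isinstance(test, str):
        return [(test, None)]
    # A single (URL, RESULTS) tuple; RESULTS is assumed to be a dict or None.
    if (isinstance(test, tuple) and len(test) == 2
            and isinstance(test[0], str)
            and (test[1] is None or isinstance(test[1], dict))):
        return [test]
    # Otherwise: a sequence mixing URL strings and (URL, RESULTS) tuples.
    normalized = []
    for entry in test:
        if isinstance(entry, str):
            normalized.append((entry, None))
        else:
            normalized.append(tuple(entry))
    return normalized
```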
Mike Fährmann
050bc1aa4a
[reactor] simplify tests
...
Some posts have, for whatever reason, a slightly different text
formatting the first time they are accessed that day
compared to any further time.
6 years ago
Mike Fährmann
2f3a021d72
[hentaicafe] restore functionality
6 years ago
Mike Fährmann
347398f692
fix various tests
6 years ago
Mike Fährmann
00dc37ccbf
replace AsynchronousExtractor with a Mixin
6 years ago
Mike Fährmann
4d656a81ca
replace SharedConfigExtractor class with a Mixin
6 years ago
Mike Fährmann
ccb95d0ba4
[mastodon] changes/improvements based on foolfuuka/-slide
6 years ago
Mike Fährmann
12ff750111
[foolfuuka] smaller code changes and updates
6 years ago
Mike Fährmann
e1bf3b225e
[foolslide] dynamically generate extractor classes
6 years ago
Mike Fährmann
58a9eede38
[foolfuuka] dynamically generate extractor classes
6 years ago
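One common way to generate extractor classes dynamically, as e1bf3b225e and 58a9eede38 describe, is the three-argument form of type(). The base class, the INSTANCES table, and the naming scheme below are all hypothetical; this is a sketch of the technique, not the project's code.

```python
class BaseThreadExtractor:
    """Hypothetical shared base class for all generated extractors."""
    category = ""
    root = ""

    def __init__(self, url):
        self.url = url


# Hypothetical per-site settings; real instance data is not shown in the log.
INSTANCES = {
    "examplechan": {"root": "https://archive.example.org"},
}


def generate_extractors(instances, namespace):
    """Create one subclass per instance and store it in 'namespace'."""
    for category, info in instances.items():
        name = category.capitalize() + "ThreadExtractor"
        attributes = {"category": category, "root": info["root"]}
        namespace[name] = type(name, (BaseThreadExtractor,), attributes)


generated = {}
generate_extractors(INSTANCES, generated)
```

Passing globals() as the namespace would expose the generated classes at module level, so supporting another site only requires a new table entry.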
Mike Fährmann
22d7a783d5
update extraction result tests
6 years ago
Mike Fährmann
197d0e99a4
[tsumino] more useful error message (#161)
...
if Tsumino suspects a non-human user and refuses to send gallery pages
6 years ago
Mike Fährmann
d36ec51e5a
[tsumino] add extractor for search results (#161)
6 years ago
Mike Fährmann
1c1367ec5b
[behance] fix empty docstring
6 years ago
Mike Fährmann
45e529ab91
[behance] fix extraction
...
HTML structure for gallery pages changed quite a bit, so it is now using
the embedded JSON data. This changes a lot of metadata field names, but
'gallery_id', 'title', and 'user' are still provided for backwards
compatibility.
The internal API endpoint for user galleries also changed its data
structure, but nothing too major.
6 years ago
Mike Fährmann
bfbbac4495
[tsumino] add login capabilities (#161)
6 years ago
Mike Fährmann
dd358b4564
improve cookie handling during logins
6 years ago
Mike Fährmann
6126615698
update URLs for supportedsites.rst
6 years ago
Mike Fährmann
80a75a1ecf
[tsumino] add gallery extractor (#161)
6 years ago
Mike Fährmann
2d2953a5bf
add 'text.parse_float()' + cleanup in text.py
6 years ago
Mike Fährmann
0c32dc5858
[hentaifox] add extractor for search results (#160)
6 years ago
Mike Fährmann
580947bfce
[hentaifox] rename Chapter- to GalleryExtractor (#160)
6 years ago
Mike Fährmann
8095f5f81a
[mangapark] fix manga title extraction
6 years ago
Mike Fährmann
0156189468
[hentaifox] add chapter extractor (#160)
6 years ago
Mike Fährmann
e4171d6baf
[luscious] add login capabilities (closes #159)
6 years ago
Mike Fährmann
4f49fdf065
[mastodon] various improvements and fixes (#144)
...
- allow instances to specify their own 'category'
- improve config lookup:
- first look into extractor.<category>.*
- and afterwards look into extractor.mastodon.<instance>.*
- add a default entry for pawoo.net in a way that actually works
- add an 'instance' keyword and turn 'tags' into a usable list
6 years ago
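The two-step config lookup listed in 4f49fdf065 can be sketched as follows. The lookup() helper, the nested-dict config layout, and the option names in the usage note are assumptions for illustration; only the search order (extractor.<category>.* first, then extractor.mastodon.<instance>.*) is taken from the commit message.

```python
def lookup(config, category, instance, key, default=None):
    """Resolve 'key' for a Mastodon instance using the two-step order."""
    extractor_cfg = config.get("extractor", {})
    search_paths = (
        (category,),             # extractor.<category>.*
        ("mastodon", instance),  # extractor.mastodon.<instance>.*
    )
    for path in search_paths:
        node = extractor_cfg
        for part in path:
            node = node.get(part) if isinstance(node, dict) else None
            if node is None:
                break
        # The first location that defines the key wins.
        if isinstance(node, dict) and key in node:
            return node[key]
    return default
```

For example, with a hypothetical "access-token" set under both extractor.pawoo and extractor.mastodon.pawoo.net, the value from extractor.pawoo is returned.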
Mike Fährmann
3f608a84b7
[photobucket] don't crash if JSON data is missing
6 years ago
Mike Fährmann
134487ffb0
[exhentai] stop extraction if image limit is exceeded (#141)
...
can be turned off with the 'exhentai.limits' option
6 years ago
Mike Fährmann
e868fb4393
[exhentai] improve gallery extraction
...
- match image page URLs and extract galleries from that point onward
- add a few more metadata entries: 'parent', 'visible', 'cost'
6 years ago
Mike Fährmann
a50e9faf0e
[newgrounds] recognize direct links
6 years ago
Mike Fährmann
c5559fa07d
[photobucket] improve subalbum extraction (#117)
...
The former implementation would produce a complete list of all subalbums
for each (sub)album extraction. This would for example result in a
level 2 subalbum getting "extracted" twice: once through the root-album
(level 0) and once through its parent album on level 1.
In the current implementation, only the next level of subalbums is
returned; each of those subalbums then handles its own next level
recursively.
6 years ago
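The recursive scheme described in c5559fa07d can be sketched with a small in-memory album tree. ALBUMS, direct_subalbums(), and extract() are hypothetical stand-ins; the real extractor queries the site rather than a dict.

```python
# Hypothetical album tree: each album maps to its direct subalbums only.
ALBUMS = {
    "root": ["a", "b"],
    "a": ["a1"],
    "a1": [],
    "b": [],
}


def direct_subalbums(album):
    """Return only the next level of subalbums for 'album'."""
    return ALBUMS.get(album, [])


def extract(album, visited=None):
    """Visit 'album', then let each direct subalbum handle its own children."""
    if visited is None:
        visited = []
    visited.append(album)
    for child in direct_subalbums(album):
        extract(child, visited)
    return visited
```

Because each level returns only its direct children, the level 2 album "a1" is reached once, through "a", and not a second time through "root".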
Mike Fährmann
ecad69100a
[photobucket] add 'image' extractor (#117)
6 years ago
Mike Fährmann
b50b30f1c9
[photobucket] download subalbums (#117)
6 years ago
Mike Fährmann
d19bac71be
[photobucket] add 'album' extractor (#117)
6 years ago
Mike Fährmann
78b5f29a00
[sankaku] unescape tags
6 years ago