Mike Fährmann
8fc6fbfa34
[artstation] recognize shortened project URLs
...
https://artstn.co/p/<project-id>
6 years ago
Mike Fährmann
9a9cd32461
implement alternative constructor for extractors
6 years ago
Mike Fährmann
abbd45d0f4
update handling of extractor URL patterns
...
When loading extractor classes during 'extractor.find(…)', their
'pattern' attribute will be replaced with a compiled version of itself.
6 years ago
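The behaviour described in abbd45d0f4 can be sketched as follows. This is a minimal illustration, not the project's code: the 'ExampleExtractor' class, its URL pattern, and this standalone 'find()' function are hypothetical stand-ins.

```python
import re


class ExampleExtractor:
    """Hypothetical extractor class; not taken from the project."""
    # Stored as a plain string until the class is first loaded by find()
    pattern = r"(?:https?://)?example\.org/p/(\w+)"

    def __init__(self, match):
        self.project_id = match.group(1)


def find(url, classes):
    """Return an extractor instance for 'url', or None if nothing matches."""
    for cls in classes:
        # Replace a string pattern with its compiled version on first use
        if isinstance(cls.pattern, str):
            cls.pattern = re.compile(cls.pattern)
        match = cls.pattern.match(url)
        if match:
            return cls(match)
    return None
```

After the first call, 'ExampleExtractor.pattern' is a compiled pattern object, so later lookups skip the compile step.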
Mike Fährmann
6284731107
simplify extractor constants
...
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
6 years ago
Mike Fährmann
34bab080ae
rewrite URL patterns to use only 1 per extractor
6 years ago
Mike Fährmann
0e46db6f45
rename some base classes
...
They shouldn't be called …Extractor if they don't have 'Extractor' as
their base class.
6 years ago
Mike Fährmann
793b24e513
[imagehosts] fix and improve various extractors
6 years ago
Mike Fährmann
bc0951d974
allow for simplified test data structures
...
Instead of a strict list of (URL, RESULTS)-tuples, extractor result
tests can now be a single (URL, RESULTS)-tuple, if it's just one test,
and "only matching" tests can now be a simple string.
6 years ago
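One way to accept all three shapes named in bc0951d974 is to normalize them into (URL, RESULTS) pairs before running the tests. The sketch below is an assumption about how that could look: 'iter_tests' is a hypothetical helper, RESULTS is assumed to be a dict, and bare strings are assumed to be allowed inside a sequence as well.

```python
def iter_tests(test):
    """Yield (url, results) pairs from any of the accepted test shapes."""
    if not test:
        return
    if isinstance(test, str):
        # A bare string is an "only matching" test without expected results
        yield test, None
        return
    if (len(test) == 2 and isinstance(test[0], str)
            and isinstance(test[1], dict)):
        # A single (URL, RESULTS)-tuple (RESULTS assumed to be a dict)
        yield test[0], test[1]
        return
    # Otherwise treat it as a sequence of strings and/or (URL, RESULTS)-tuples
    for entry in test:
        if isinstance(entry, str):
            yield entry, None
        else:
            url, results = entry
            yield url, results
```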
Mike Fährmann
b49c3c9991
release version 1.7.0
6 years ago
Mike Fährmann
53c2fd4664
add mastodon/foolslide/foolfuuka examples to example config
6 years ago
Mike Fährmann
050bc1aa4a
[reactor] simplify tests
...
Some posts have, for whatever reason, a slightly different text
formatting the first time they are accessed that day
compared to any further time.
6 years ago
Mike Fährmann
2f3a021d72
[hentaicafe] restore functionality
6 years ago
Mike Fährmann
347398f692
fix various tests
6 years ago
Mike Fährmann
00dc37ccbf
replace AsynchronousExtractor with a Mixin
6 years ago
Mike Fährmann
4d656a81ca
replace SharedConfigExtractor class with a Mixin
6 years ago
Mike Fährmann
ccb95d0ba4
[mastodon] changes/improvements based on foolfuuka/-slide
6 years ago
Mike Fährmann
12ff750111
[foolfuuka] smaller code changes and updates
6 years ago
Mike Fährmann
e1bf3b225e
[foolslide] dynamically generate extractor classes
6 years ago
Mike Fährmann
58a9eede38
[foolfuuka] dynamically generate extractor classes
6 years ago
Mike Fährmann
22d7a783d5
update extraction result tests
6 years ago
Mike Fährmann
197d0e99a4
[tsumino] more useful error message (#161)
...
if Tsumino suspects a non-human user and refuses to send gallery pages
6 years ago
Mike Fährmann
d36ec51e5a
[tsumino] add extractor for search results (#161)
6 years ago
Mike Fährmann
1c1367ec5b
[behance] fix empty docstring
6 years ago
Mike Fährmann
373cb07b28
update .travis.yml and run_tests.sh
...
- add python3.8 and pypy3 builds
- remove deprecated 'sudo: true' and 'sudo: false'
- enable builds for 'test-...' branches
6 years ago
Mike Fährmann
45e529ab91
[behance] fix extraction
...
HTML structure for gallery pages changed quite a bit, so it is now using
the embedded JSON data. This changes a lot of metadata field names, but
'gallery_id', 'title', and 'user' are still provided for backwards
compatibility.
The internal API endpoint for user galleries also changed its data
structure, but nothing too major.
6 years ago
Mike Fährmann
e1d3e9a926
add 'ext_from_url' to text.py
6 years ago
Mike Fährmann
bfbbac4495
[tsumino] add login capabilities (#161)
6 years ago
Mike Fährmann
dd358b4564
improve cookie handling during logins
6 years ago
Mike Fährmann
6126615698
update URLs for supportedsites.rst
6 years ago
Mike Fährmann
80a75a1ecf
[tsumino] add gallery extractor (#161)
6 years ago
Mike Fährmann
2d2953a5bf
add 'text.parse_float()' + cleanup in text.py
6 years ago
Mike Fährmann
0c32dc5858
[hentaifox] add extractor for search results (#160)
6 years ago
Mike Fährmann
580947bfce
[hentaifox] rename Chapter- to GalleryExtractor (#160)
6 years ago
Mike Fährmann
8095f5f81a
[mangapark] fix manga title extraction
6 years ago
Mike Fährmann
0156189468
[hentaifox] add chapter extractor (#160)
6 years ago
Mike Fährmann
e4171d6baf
[luscious] add login capabilities (closes #159)
6 years ago
Mike Fährmann
4f49fdf065
[mastodon] various improvements and fixes (#144)
...
- allow instances to specify their own 'category'
- improve config lookup:
- first look into extractor.<category>.*
- and afterwards look into extractor.mastodon.<instance>.*
- add a default entry for pawoo.net in a way that actually works
- add an 'instance' keyword and turn 'tags' into a usable list
6 years ago
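The lookup order listed in 4f49fdf065 can be illustrated with a small sketch. It assumes the configuration is a nested dict; the 'lookup' function and the option names in the usage note are hypothetical, not the project's API.

```python
def lookup(config, category, instance, key, default=None):
    """Search extractor.<category> first, then extractor.mastodon.<instance>."""
    extractor = config.get("extractor", {})
    paths = (
        (category,),             # extractor.<category>.*
        ("mastodon", instance),  # extractor.mastodon.<instance>.*
    )
    for path in paths:
        node = extractor
        for part in path:
            node = node.get(part)
            if not isinstance(node, dict):
                break  # this path does not exist in the config
        else:
            # Every path component was found; check for the key itself
            if key in node:
                return node[key]
    return default
```

With a config that sets "access-token" under both extractor.pawoo and extractor.mastodon.pawoo.net, the value under extractor.pawoo is returned, while keys present only in the instance section are still found.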
Mike Fährmann
3f608a84b7
[photobucket] don't crash if JSON data is missing
6 years ago
Mike Fährmann
134487ffb0
[exhentai] stop extraction if image limit is exceeded (#141)
...
can be turned off with the 'exhentai.limits' option
6 years ago
Mike Fährmann
e868fb4393
[exhentai] improve gallery extraction
...
- match image page URLs and extract galleries from that point onward
- add a few more metadata entries: 'parent', 'visible', 'cost'
6 years ago
Mike Fährmann
a50e9faf0e
[newgrounds] recognize direct links
6 years ago
Mike Fährmann
9fba48fbd7
[postprocessor:metadata] add '--write-tags' flag (#135)
6 years ago
Mike Fährmann
c5559fa07d
[photobucket] improve subalbum extraction (#117)
...
The former implementation would produce a complete list of all subalbums
for each (sub)album extraction. This would for example result in a
level 2 subalbum getting "extracted" twice: once through the root-album
(level 0) and once through its parent album on level 1.
In the current implementation only the next level of subalbums is
returned, which themselves will handle their next level in a recursive
fashion.
6 years ago
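The duplicate extraction described in c5559fa07d can be reproduced with a toy model. Everything here is hypothetical: the 'ALBUMS' tree and the function names are made up for illustration and are not the extractor's code.

```python
# Hypothetical album tree: album name -> names of its direct subalbums
ALBUMS = {
    "root": ["a", "b"],  # level 0
    "a": ["a1"],         # level 1
    "a1": [],            # level 2
    "b": [],             # level 1
}


def all_descendants(name):
    """Return every subalbum below 'name', at any depth."""
    result = []
    for child in ALBUMS[name]:
        result.append(child)
        result.extend(all_descendants(child))
    return result


def extract_former(name, visited):
    """Former behaviour: each album hands on its complete subalbum list."""
    visited.append(name)
    for sub in all_descendants(name):
        extract_former(sub, visited)


def extract_current(name, visited):
    """Current behaviour: each album hands on only its next level."""
    visited.append(name)
    for sub in ALBUMS[name]:
        extract_current(sub, visited)
```

Calling 'extract_former("root", visited)' visits the level 2 album "a1" twice, once through "root" and once through "a"; 'extract_current' visits each album exactly once.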
Mike Fährmann
ecad69100a
[photobucket] add 'image' extractor (#117)
6 years ago
Mike Fährmann
b50b30f1c9
[photobucket] download subalbums (#117)
6 years ago
Mike Fährmann
d19bac71be
[photobucket] add 'album' extractor (#117)
6 years ago
Mike Fährmann
78b5f29a00
[sankaku] unescape tags
6 years ago
Mike Fährmann
277b52101a
add 'category-transfer' option
...
[ci skip]
6 years ago
Mike Fährmann
9b8ac12eed
[behance] enable 'categorytransfer' for collections (#157)
6 years ago
Mike Fährmann
217a0687ef
[behance] add 'collection' extractor (closes #157)
6 years ago