Mike Fährmann
a453335a9f
remove test results in extractor modules
...
and add generic example URLs
1 year ago
Mike Fährmann
a383eca7f6
decouple extractor initialization
...
Introduce an 'initialize()' function that does the actual init
(session, cookies, config options) and can be called separately from
the constructor __init__().
This makes it possible, for example, to adjust config access inside a Job
before most of it has already happened when calling 'extractor.find()'.
1 year ago
Mike Fährmann
d97b8c2fba
consistent cookie-related names
...
- rename every cookie variable or method to 'cookies_*'
- simplify '.session.cookies' to just '.cookies'
- more consistent 'login()' structure
1 year ago
Mike Fährmann
da11fb32d0
update extractor test results
2 years ago
Mike Fährmann
dee0d22561
update extractor test results
3 years ago
Mike Fährmann
211de95dd0
update extractor test results
3 years ago
Mike Fährmann
bd08ee2859
remove most 'yield Message.Version' statements
...
only leave them in oauth.py as noop results
3 years ago
Mike Fährmann
ed4b3c48cb
fix flake8 and other tests
3 years ago
Nyasume
fa6af46756
Added ability to download GIFs instead of mp4 from Luscious and Reactor (#1701)
3 years ago
Mike Fährmann
bdfcc9c4b1
update extractor test results
3 years ago
Mike Fährmann
968d3e8465
remove '&' from URL patterns
...
'/?&#' -> '/?#' and '?&#' -> '?#'
According to https://www.ietf.org/rfc/rfc3986.txt, URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
4 years ago
Mike Fährmann
fd438f0d78
update extractor test results
5 years ago
Mike Fährmann
762c758af4
[hiperdex] fix extraction
5 years ago
Mike Fährmann
4e361b3008
add tests for specific datetime values
5 years ago
Mike Fährmann
82f7f4172a
update test results
5 years ago
Mike Fährmann
4325695d74
[luscious] expand GraphQL queries
5 years ago
Mike Fährmann
4409d00141
embed error messages in StopExtraction exceptions
5 years ago
Mike Fährmann
6e08ada4fe
[luscious] simplify some metadata entries
5 years ago
Mike Fährmann
b23c822b23
[luscious] use GraphQL
5 years ago
Mike Fährmann
d92802fd37
[luscious] fix detection of unavailable galleries
5 years ago
Mike Fährmann
c50d60a53d
[reactor] fix image URLs
5 years ago
Mike Fährmann
4a0c98bfc9
miscellaneous fixes and adjustments
5 years ago
Mike Fährmann
40637556fa
[ngomik] fix extraction
5 years ago
Mike Fährmann
7a14aaed7d
[luscious] fix extraction
5 years ago
Mike Fährmann
aa8e366b90
[luscious] fix tag extraction
5 years ago
Mike Fährmann
f2cf1c1d73
use 'text.extract_from()' in a few places
5 years ago
Mike Fährmann
e25ebc4bff
don't disable certificate checks anymore
...
Executables generated with PyInstaller auto-include the root certificate
file and certificate checks now work out-of-the-box.
6 years ago
Mike Fährmann
2ff043edfa
[yaplog] add user- and post-extractors (#190)
6 years ago
Mike Fährmann
00d604cafb
[luscious] fix SearchExtractor URL-pattern
6 years ago
Mike Fährmann
1384ebf907
[luscious] fix metadata extraction
...
- remove 'artist', 'language', and 'lang' fields
- replace 'section' with 'genre'
- provide 'tags' as list
- use GalleryExtractor as base class
6 years ago
Mike Fährmann
d0f88c35be
[komikcast] fix extraction
6 years ago
Mike Fährmann
a2af2d2965
adjust cache maxage values
6 years ago
Mike Fährmann
e687a6095e
[luscious] raise exception if album is not available
6 years ago
Mike Fährmann
61741d7333
provide type information for Queue messages
...
Child extractors are now directly constructed with Extractor.from_url()
if the extractor class is known beforehand, instead of using
extractor.find() and searching through all possible extractor classes.
6 years ago
Mike Fährmann
4b1880fa5e
propagate 'match' to base extractor constructor
6 years ago
Mike Fährmann
6284731107
simplify extractor constants
...
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
6 years ago
Mike Fährmann
00dc37ccbf
replace AsynchronousMixin Extractor with a Mixin
6 years ago
Mike Fährmann
dd358b4564
improve cookie handling during logins
6 years ago
Mike Fährmann
0c32dc5858
[hentaifox] add extractor for search results (#160)
6 years ago
Mike Fährmann
e4171d6baf
[luscious] add login capabilities (closes #159)
6 years ago
Mike Fährmann
c9ef5ed364
[luscious] ensure URLs have a scheme
6 years ago
Mike Fährmann
a4263fb253
[luscious] add extractor for search results (closes #127)
6 years ago
Mike Fährmann
e1d306cc48
update unit test results
6 years ago
Mike Fährmann
38d4f43cc0
[komikcast] skip ads
6 years ago
Mike Fährmann
df7e18399e
[luscious] fix image order
7 years ago
Mike Fährmann
759ba26fb0
[luscious] proper image order for picture albums
...
... and (try) to start with the first image instead of somewhere
in the middle of an album.
7 years ago
Mike Fährmann
557cb94f81
[deviantart] use proper exponential backoff on API errors
...
... and use separate API credentials for unit tests.
7 years ago
Mike Fährmann
3cec533c28
Merge branch 'archive'
7 years ago
Mike Fährmann
34873dbd90
set 'archive_fmt' values
...
These are going to be used to create a unique id for each image.
7 years ago
Mike Fährmann
a34cebc253
[luscious] jump to first image if cover does not link to it
7 years ago