Mike Fährmann
3ecb512722
send Referer headers by default
1 year ago
Mike Fährmann
a453335a9f
remove test results in extractor modules
...
and add generic example URLs
1 year ago
Mike Fährmann
a383eca7f6
decouple extractor initialization
...
Introduce an 'initialize()' function that does the actual init
(session, cookies, config options) and can be called separately from
the constructor __init__().
This allows, for example, adjusting config access inside a Job
before most of it has already happened when calling 'extractor.find()'.
1 year ago
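The split described in the commit above can be illustrated with a minimal sketch. The class and attribute names here are hypothetical and are not taken from the actual code base; the dictionary merely stands in for a real HTTP session.

```python
class Extractor:
    """Hypothetical extractor with two-phase setup (illustration only)."""

    def __init__(self, url):
        # Cheap constructor: only record the URL, touch no session or config.
        self.url = url
        self.config = None
        self.session = None
        self.initialized = False

    def initialize(self, config=None):
        # Deferred setup: config options and a stand-in for a real session.
        self.config = dict(config or {})
        self.session = {"headers": {"Referer": self.url}}
        self.initialized = True


extr = Extractor("https://example.org/photo/1")
# A caller (for example a job object) can still adjust things at this point,
# because nothing has been read from the configuration yet.
extr.initialize({"retries": 3})
```

Under this assumption, constructing an object stays cheap, and the caller decides when the expensive part runs.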
Mike Fährmann
d97b8c2fba
consistent cookie-related names
...
- rename every cookie variable or method to 'cookies_*'
- simplify '.session.cookies' to just '.cookies'
- more consistent 'login()' structure
1 year ago
Mike Fährmann
5503ac4d5e
replace json.dumps with direct calls to JSONEncoder.encode
2 years ago
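As a rough illustration of the change above: an encoder object can be built once and its bound 'encode' method reused, which yields the same string as 'json.dumps' with the same options. The sample data and the chosen options are assumptions made for this sketch, and the motivation (presumably avoiding repeated encoder construction) is a guess rather than something the commit states.

```python
import json

# Sample data, made up for this sketch.
data = {"id": 1, "name": "Fährmann", "tags": ["a", "b"]}

# Build the encoder once and keep a reference to its bound method.
encode = json.JSONEncoder(ensure_ascii=False, sort_keys=True).encode

direct = encode(data)
via_dumps = json.dumps(data, ensure_ascii=False, sort_keys=True)
```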
Mike Fährmann
c6a9bab019
update extractor test results
2 years ago
Mike Fährmann
49a50fb2eb
[500px] create directories per photo
3 years ago
Mike Fährmann
89bebe1bef
[500px] add 'favorite' extractor (closes #1927)
3 years ago
Mike Fährmann
bd08ee2859
remove most 'yield Message.Version' statements
...
only leave them in oauth.py as noop results
3 years ago
Mike Fährmann
21c2da454f
update extractor test results
3 years ago
Mike Fährmann
0d2961ae81
[500px] remove last query hash entry
...
forgot to include this in b56e2450
3 years ago
Mike Fährmann
b56e245094
[500px] update GraphQL queries
...
500px changed its method from query hashes to sending the entire query
string for every request.
3 years ago
Mike Fährmann
532ac79fb0
update extractor test results
3 years ago
Mike Fährmann
d7bc4a2b8b
[500px] update query hashes
3 years ago
Mike Fährmann
b3ee10a7fb
[500px] update query hashes
3 years ago
Mike Fährmann
82c32d25af
[500px] update query hashes
3 years ago
Mike Fährmann
9785c551bc
[500px] skip unavailable photos (#1335)
...
instead of crashing with a KeyError exception
4 years ago
Mike Fährmann
e88d5bede8
[500px] update query hash
4 years ago
Mike Fährmann
a46561bc16
[500px] update query hashes
4 years ago
Mike Fährmann
968d3e8465
remove '&' from URL patterns
...
'/?&#' -> '/?#' and '?&#' -> '?#'
According to https://www.ietf.org/rfc/rfc3986.txt, URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
4 years ago
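The effect of dropping '&' from the character class can be checked with a small sketch. The base pattern and the 'example.org' URLs are hypothetical; only the '[/?&#]' and '[/?#]' classes mirror the commit message above.

```python
import re

# Hypothetical URL pattern; only the trailing character class differs.
BASE = r"https?://example\.org/photo/(\d+)"
old = re.compile(BASE + r"(?:[/?&#].*)?$")
new = re.compile(BASE + r"(?:[/?#].*)?$")

well_formed = [
    "https://example.org/photo/123",
    "https://example.org/photo/123/",
    "https://example.org/photo/123?x=1",
    "https://example.org/photo/123#top",
]
# '&' directly after the path is not a component delimiter in RFC 3986.
malformed = "https://example.org/photo/123&x=1"

old_ok = [bool(old.match(u)) for u in well_formed]
new_ok = [bool(new.match(u)) for u in well_formed]
```

In this sketch both patterns accept the well-formed URLs, while only the old one accepts the URL that uses '&' as if it were a delimiter.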
Mike Fährmann
93e04bf9a9
[500px] update query hashes
4 years ago
Mike Fährmann
cc1fb0b4ea
[500px] update query hash
4 years ago
Mike Fährmann
84e04cc23b
[500px] fix extraction and update URL patterns (fixes #956)
...
- rewrite most API calls to GraphQL queries
- match '500px.com/p/<user>' URLs
4 years ago
Mike Fährmann
38b6bd66b0
[500px] match 'web.500px.com' subdomains
4 years ago
Mike Fährmann
a3c736fedc
[500px] fix extraction
...
Maximum available image dimensions have been reduced from 5000px
to 4096px on the longest edge.
A few (unimportant) metadata fields are no longer available or have
been changed to 'null'.
5 years ago
Mike Fährmann
8d96a8ce4c
[500px] add user-, gallery-, and image-extractors (#185)
6 years ago