Mike Fährmann
ecdc3475b8
[pixhost] support .to TLDs
6 years ago
Mike Fährmann
f3d770d4e2
Merge branch '1.4-dev'
6 years ago
Mike Fährmann
1ff626db97
[pixiv] improve bookmark extraction
...
- combine 'favorite' and 'bookmark' extractors
- it is now one extractor class, but its subcategory still
distinguishes between your own bookmarks ('bookmark') and other
user's bookmarks ('favorite') like before
- allow filtering by bookmark tags and public/private bookmarks
- fix pagination for bookmark results
6 years ago
Mike Fährmann
0a1863fce3
[pixiv] respect more query parameters for user URLs
...
The API endpoint responsible for user illustrations does not
provide sufficient filter capabilities* to match the actual
website, so we are spinning our own filters.
Respected parameters are
'type': illust, manga, ugoira
'tag' : any image tag (this was already supported)
'p' : the page to start on
*
- API can filter for illustrations and manga, but not for ugoira.
- 'offset' is applied before filtering
- no 'tag' filter
6 years ago
Mike Fährmann
f43d446692
[mangahere] extract chapter titles
6 years ago
Mike Fährmann
b8e53b8c6b
[pixiv] move query parsing out of constructor
...
better exception handling, among other things
6 years ago
Mike Fährmann
909d105ae6
[pixiv] add extractor for illusts from followed users
6 years ago
Mike Fährmann
7f899bd5d8
Merge branch 'master' into 1.4-dev
6 years ago
Mike Fährmann
fe69d01083
[pixiv] add extractor for search results
6 years ago
Mike Fährmann
247f785af1
[pixiv] use App API
...
Transitioning to the App API breaks favorites archive IDs (there is
no longer any bookmark ID information), but the favorites API endpoint
of the public API was gone anyways ...
6 years ago
Mike Fährmann
92fc199b07
[reddit] allow arbitrary subdomains
6 years ago
Mike Fährmann
4cea886177
[imgur] allow longer album hashes
6 years ago
Mike Fährmann
e1e23165a0
[pinterest] catch JSON decode errors
6 years ago
Mike Fährmann
789608c107
[imagebam] fix extraction for certain galleries
6 years ago
Mike Fährmann
7a58151566
fix util.parse_bytes invocations
...
(should be text.parse_bytes)
6 years ago
Mike Fährmann
1c1e086d01
use common base class for OAuth1.0 based API interfaces
6 years ago
Mike Fährmann
f3483a2b7c
[smugmug] add OAuth support
6 years ago
Mike Fährmann
6a31ada9e3
re-implement OAuth1.0 code
...
OAuth support for SmugMug needs some additional features
(auth-rebuild on redirect, query parameters in URL, ...)
and fixing this in the old code wouldn't work all that well.
6 years ago
Mike Fährmann
ec158776ed
[deviantart] add extractor for popular listings
6 years ago
Mike Fährmann
0e3883303f
[pixiv] implement AppAPI wrapper
6 years ago
Mike Fährmann
e2157f594e
[mangadex] fix manga extraction ( closes #84 )
...
Chapter listings for manga now use
https://mangadex.org/manga/ <id>/_/chapters/2/
as URL instead of
https://mangadex.org/manga/ <id>/_//2/
6 years ago
Mike Fährmann
69a5e6ddb3
Merge branch 'master' into 1.4-dev
6 years ago
Mike Fährmann
3ce5296313
[smugmug] code cleanup
...
- combine User and Node extractors
- (re)move miscellaneous helper functions
- rename "Owner" to "User"
6 years ago
Mike Fährmann
42ed7667b8
[smugmug] support user- and general album URLs
6 years ago
Mike Fährmann
2ea0d1da42
[smugmug] improve API code; use data expansions
6 years ago
Mike Fährmann
16e014baaa
[smugmug] added image and album extractor
...
just some initial code that still requires a lot of work ...
TODO:
- folders
- old-style albums (which are nearly all of them ...)
- images from users
- OAuth
It could also happen that the API credentials used will become invalid
whenever my 14 day trial period ends (7 days remaining), but that
would just require users to supply their own.
6 years ago
Mike Fährmann
d96b3474e5
[puremashiro] remove module
...
site has been unreachable for a couple of weeks
and now the DNS record is gone as well
6 years ago
Mike Fährmann
b44a296404
[gomanga] remove module
...
site has been unreachable for a couple of weeks
and the cloudflare status page shows host errors
6 years ago
Mike Fährmann
95392554ee
use text.urljoin()
6 years ago
Mike Fährmann
2395d870dd
[pinterest] unquote board and user names, better errors
6 years ago
Mike Fährmann
8b79eaafea
[tumblr] log actual time of rate limit resets
...
... instead of the amount of seconds until a reset
6 years ago
Mike Fährmann
0f1e07f627
[pinterest] scrap OAuth implementation; code improvements
...
OAuth authentication isn't needed anymore and other tools
like Postman are better suited for this job anyway.
6 years ago
Mike Fährmann
55d4d23860
[pinterest] use Pinterest's "Web" API ( #83 )
...
no access tokens, no user credentials of any kind ...
7 years ago
Mike Fährmann
2721417dd8
Merge branch 'master' into 1.4-dev
7 years ago
Mike Fährmann
c6d5154fc3
fix flake8 errors, ignore W504
...
pycodestyle 2.4.0 enforces some new style guidelines
7 years ago
Mike Fährmann
2d17a9e07f
improve extractor.request()
...
- better retry behavior
- exponential back-off
- removed 'allow_empty' argument
7 years ago
Mike Fährmann
80521ae1f6
[deviantart] improve API error handling
...
The previous implementation would retry requests with 4xx status codes
in an infinite loop, which is especially a problem when querying
non-existent users or groups. These are now properly handled with a
NotFoundError exception.
7 years ago
Mike Fährmann
e54b43be08
[mangadex] add title info for chapter extractors
7 years ago
Mike Fährmann
f471161920
Merge branch 'master' into 1.4-dev
7 years ago
Mike Fährmann
eb37fbf0e8
[hentaifoundry] improve extractor
...
- use common base class
- better pagination
- respect '.../page/<num>'
- implement skip() / --range support
- get YII_CSRF_TOKEN from cookies
7 years ago
Mike Fährmann
80bead739d
[oauth] require custom client-* values for pinterest
7 years ago
Mike Fährmann
cc36f88586
rename safe_int to parse_int; move parse_* to text module
7 years ago
Mike Fährmann
10cc59f3b5
fix extractor names
7 years ago
Mike Fährmann
b1325d4d2c
fix extractor docstrings
7 years ago
Mike Fährmann
df7e18399e
[luscious] fix image order
7 years ago
Mike Fährmann
d10579edb5
[pinterest] improve PinterestAPI code; remove OAuth mentions
...
on another note: access_tokens have been set to only allow for
10 requests per hour (from 200 yesterday)
7 years ago
Mike Fährmann
4bd182c107
[pinterest] implement `oauth:pinterest` ( #83 )
...
Pinterest access tokens are rate limited at 200 requests per
hour (or maybe per 2 or 3 hours?) so having just one access token
for all users isn't going to work in the long run.
7 years ago
Mike Fährmann
9651f3fce0
[pinterest] improve error messages ( #83 )
7 years ago
Mike Fährmann
dbe250f7e5
[pinterest] update access_token ( #83 )
7 years ago
Mike Fährmann
dd49127408
[spectrumnexus] remove module
...
Site stopped hosting manga scans (http://view.thespectrum.net/ )
7 years ago
Mike Fährmann
5c487300ee
improve 'parse_query()' and add tests
...
- another irrelevant micro-optimization !
- use urllib.parse.parse_qsl directly instead of parse_qs, which
just packs the results of parse_qsl in a different data structure
- reduced memory requirements since no additional dict and lists are
created
7 years ago
Mike Fährmann
728c64a3fb
[tumblr] rename 'offset' to 'num and adjust formats
...
Trying to somehow emulate Tumblr filenames is a bad idea ...
7 years ago
Mike Fährmann
6bd857a319
[tumblr] handle rate limits / 429 errors
...
- wait for the hourly limit to reset
- abort upon exceeding the daily limit (it doesn't seem useful to
potentially wait for several hours)
7 years ago
Mike Fährmann
7073ab7707
[komikcast] update regex to only match manga pages
...
The 'readerarea' section now includes some (shady) external
Javascript file, which got matched as well.
7 years ago
Mike Fährmann
a1fa4b43b0
Revert "[tumblr] add option to sort photosets by upload order"
...
This reverts commit 4a26ae32df
.
7 years ago
Mike Fährmann
48a83a89e9
[loveisover] remove module
...
archive.loveisover.me was shut down on 2018-03-29;
https://www.archiveteam.org/index.php?title=4chan#archive.loveisover.me
7 years ago
Mike Fährmann
564e12ca8f
replace 'imgyt' with 'imxto'
...
https://img.yt/ wasn't available for a couple of days, but has now
re-emerged as https://imx.to/ with a new web-interface.
Links to older images still work (see tests).
7 years ago
Mike Fährmann
1b80fa82a9
[imgur] update URL pattern and tests
7 years ago
Mike Fährmann
4a26ae32df
[tumblr] add option to sort photosets by upload order
7 years ago
Mike Fährmann
6b72be8ee6
[tumblr] add 'hash' keyword
...
'hash' is the middle part of the filename in a tumblr image URL.
For example an image with '.../tumblr_p6tgemp1NZ1wgha4yo1_250.png' as
its URL would have 'p6tgemp1NZ1wgha4yo1' as hash.
7 years ago
Mike Fährmann
d11fcf4804
smaller changes and fixes
...
- fix the cloudflare challenge result if the last decimal places
are zero (JS`s toFixed() removes trailing zeroes)
- fix downloading of kissmanga chapter-pages hosted on blogspot
(accessing blogspot with "kissmanga.com" as referrer yields a 401)
- disable certificate validation for 'mangahere' tests
- update flickr test result
7 years ago
Mike Fährmann
759ba26fb0
[luscious] proper image order for picture albums
...
... and (try) to start with the first image instead of somewhere
in the middle of an album.
7 years ago
Mike Fährmann
68e9fbee16
[tumblr] check all 4 keys/secrets before using OAuth
...
it was possible to cause a crash by setting api-key or -secret to null.
(this commit also slightly improves the blog-cache implementation)
7 years ago
Mike Fährmann
f8168c693e
[tumblr] avoid calls to '/blog/.../info'
...
The same information returned by the 'blog/.../info' API endpoint
is also included in the result of every 'blog/.../posts' call.
7 years ago
Mike Fährmann
64d7c85b55
[exhentai] improve metadata
...
- add 'width', 'height' and 'size' (in bytes) for each image
- change the former 'size' and 'size_units' into 'gallery_size'
7 years ago
Mike Fährmann
64b22e0fc1
[pawoo] update URL pattern
...
adds support for 'https://pawoo.net/@.../media '
7 years ago
Mike Fährmann
7b562907c3
[nijie] add favorites extractor
...
adds support for 'https://nijie.info/user_like_illust_view.php?id= ...'
7 years ago
Mike Fährmann
445db75955
[nijie] improve extraction and metadata
...
- add 'title' and 'description'
- split 'artist_id' into 'user_id' and 'artist_id'
- 'user_id' is the ID of the user from which the image entry
originates from
- 'artist_id' is the ID of the actual image artist
- improve pagination and URL patterns
7 years ago
Mike Fährmann
a112e3f2a0
[nijie] add doujin extractor
...
adds support for "https://nijie.info/members_dojin.php?id= <artist_id>"
7 years ago
Mike Fährmann
f39153b6e9
[nhentai] add extractor for search results
7 years ago
Mike Fährmann
52d41c41e7
[exhentai] add extractor for favorited galleries
7 years ago
Mike Fährmann
63cc2599c4
[exhentai] add extractor for search results
7 years ago
Mike Fährmann
d1c91a1f2b
[mangadex] fix manga-page extraction
7 years ago
Mike Fährmann
299ae24996
[test] add a few downloader tests
7 years ago
Mike Fährmann
dd314279fb
[test] add unit tests for extractor module functions
7 years ago
Mike Fährmann
e7525b1b0e
[artstation] add challenge extractor ( #80 )
7 years ago
Mike Fährmann
f5c6a2d7f5
[nhentai] use API to get gallery info
7 years ago
Mike Fährmann
b2ba2b821d
[hitomi] fix image URLs and improve metadata
...
- use '?a.hitomi.la' as subdomain depending in gallery-id
- add 'characters', 'tags' and 'date' information
- support multiple entires per metadata-value
- rename 'num' to 'page'
7 years ago
Mike Fährmann
3905474805
[booru] call update_page() with correct dict ( closes #82 )
7 years ago
Mike Fährmann
44c267e362
[artstation] add search extractor ( #80 )
7 years ago
Mike Fährmann
40ca562d7b
[artstation] add album extractor ( #80 )
7 years ago
Mike Fährmann
f367d5c281
[deviantart] move delay-increase after expect_error check
...
[ci skip]
7 years ago
Mike Fährmann
557cb94f81
[deviantart] use proper exponential backoff on API errors
...
... and use separate API credentials for unit tests.
7 years ago
Mike Fährmann
723cc66bb1
[artstation] add user-, image- and likes-extractors
7 years ago
Mike Fährmann
4d74749496
[tests] rework filters for extractor tests
...
CI incompatible tests will now only be skipped if tests are run in
a CI environment.
7 years ago
Mike Fährmann
d6ef52897c
[imgchili] remove module
...
All previously hosted images yield a 404
and the main page is just a logo.
7 years ago
Mike Fährmann
7847ab1d5a
[imagehosts] remove even more dead sites
...
All removed sites either
- reject all incoming connections or
- display a message from their domain registrar
7 years ago
Mike Fährmann
5f37d40a3e
[komikcast] bypass cloudflare challenge
7 years ago
Mike Fährmann
f9884e2338
[pixiv] update URL pattern
...
add support for 'https://www.pixiv.net/user/ <id>'
7 years ago
Mike Fährmann
85ed023c2e
[mangadex] remove the trailing ' - MangaDex' in a better way
...
str.rstrip() works differently than assumed.
7 years ago
Mike Fährmann
32bbd12f08
update extractor tests
7 years ago
Mike Fährmann
ca326bd275
[deviantart] fix folder and collection archive IDs
...
{folder[index]} and {collection[index]} are both '0' when being
delegated from Gallery- or FavoriteExtractors, as there is no
way of knowing a folder's index when getting folder-information
from the API.
7 years ago
Mike Fährmann
e32fe1cdf1
[pinterest] cast IDs to int
...
... and update test results.
Image URLs changed from
https://s-media-cache-ak0.pinimg.com/ ... to
https://i.pinimg.com/ ...
7 years ago
Mike Fährmann
179ecee965
[turboimagehost] fix extraction
7 years ago
Mike Fährmann
1400868f53
[mangadex] general improvements
...
- support >100 chapter entries per manga
- custom archive ID format
- detect non-existing chapters
7 years ago
Mike Fährmann
749fbbfa6c
[mangadex] add chapter- and manga-extractor
7 years ago
Mike Fährmann
6e38cf5aab
[mangareader] use 'https://'
...
The site now redirects from http://mangareader.net/
to https://mangareader.net/
7 years ago
Mike Fährmann
1d71123f91
[pixiv] update archive IDs and add metadata-fields
...
(Pixiv bookmarks actually have their own IDs, comments and tags,
independent of the bookmarked image, which makes creating an
archive ID a lot easier)
7 years ago
Mike Fährmann
858fdbdb22
[tumblr] improve 'inline' extraction
...
'quote' posts store their HTML content in the 'source' field
7 years ago
Mike Fährmann
5008e105ee
update archive IDs
...
... to behave in a more straightforward way when dealing with
bookmarks/favourites/etc.
specific IDs are now grouped by their owner, album-id, ... to
allow for duplicates when it would be expected.
7 years ago