Mike Fährmann
6ea9a78588
[wallhaven] add login capabilities
...
Being logged in is required to access NSFW wallpapers.
6 years ago
Mike Fährmann
c9290d8212
[wallhaven] add wallpaper and search extractors
...
todo:
- login support to gain access to NSFW wallpapers
- extractors for tag-, similar-, latest-listings
- skip() support
6 years ago
Mike Fährmann
26cbcb3a72
[flickr] improve error handling (#109)
6 years ago
Mike Fährmann
2be4c9ffe3
[sankaku] small code improvements
6 years ago
Mike Fährmann
529aa21dd9
move FileAdapter definition into recursive.py
6 years ago
Mike Fährmann
31a5c7c2c0
release version 1.5.3
6 years ago
Mike Fährmann
22ab509a70
[bobx] rename "model" to "idol" extractor
6 years ago
Mike Fährmann
99137f1bee
[sankaku] send login info as formdata
...
Previously they were erroneously sent as URL parameters.
6 years ago
Mike Fährmann
fa64c38d5b
[sankaku] fix pagination for user favorites (#106)
6 years ago
Mike Fährmann
69fd61ea86
[bobx] add gallery and model extractors
6 years ago
Mike Fährmann
0232d80cec
[deviantart] convert 'published_time' to int (fixes #108)
...
The 'published_time' field (a timestamp) changed from integer to string
and caused journal creation to fail.
6 years ago
Mike Fährmann
7742cf8601
[tumblr] change 'reblogs' option (#103)
...
- rename "deleted" to "same-blog"
- change the test for a deleted original post to a test of whether the
original post's owner has the same UUID (full blog name) as the blog
being downloaded from
- add 'blog[uuid]' metadata to allow comparison with
'reblogged_from_uuid'
6 years ago
Mike Fährmann
d4d95d3154
[tumblr] improve rewrite rules for video URLs
6 years ago
Mike Fährmann
542a25c389
[ngomik] fix extraction
6 years ago
Mike Fährmann
a666ddd16b
[tumblr] extend 'reblogs' functionality (#103)
...
Setting 'reblogs' to "deleted" will check if the parent post of a
reblog has been deleted and download its media content if that is the
case, otherwise it will be skipped.
This is a rather costly operation (1 API request per reblogged post)
and should therefore be used with care.
6 years ago
Mike Fährmann
c9b8e6aefc
[reddit] fix submission-ID parsing (#104)
...
Uppercase characters caused a ValueError exception
6 years ago
Mike Fährmann
488abeca0b
[hentaicafe] adjust default directory format
...
A separate folder for each chapter is rather pointless if almost all
manga have only one chapter each.
6 years ago
Mike Fährmann
b4eca2633e
[tumblr] support /archive URLs
6 years ago
Mike Fährmann
aa1de70da0
[tumblr] recognize inline videos (#102)
6 years ago
Mike Fährmann
3ecea4cf36
[hentaicafe] add chapter and manga extractors (#101)
6 years ago
Mike Fährmann
41249f3ead
improve extractor.get_downloader()
6 years ago
Mike Fährmann
eb3185d6a3
update exception hierarchy
6 years ago
Mike Fährmann
e9ae6fd080
improve downloader/postprocessor module loading
...
- handle arguments of any type without propagating an exception
- prevent potential security risk through relative imports
6 years ago
Mike Fährmann
712b58a93b
[postprocessor] add black-/whitelist options
...
Each post-processor config dict now supports a list of extractor
categories for which it should or shouldn't be active.
For example:
"postprocessors": [
    {"name": "classify",
     "whitelist": ["tumblr", "deviantart"],
     ...
    }
]
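A minimal sketch of how such a per-post-processor check could work. `pp_is_active` is a hypothetical helper, not gallery-dl's actual code, and checking a "whitelist" key before a "blacklist" key is an assumption; the commit only shows the "whitelist" example above.

```python
def pp_is_active(pp_conf: dict, category: str) -> bool:
    """Return True if a post-processor config applies to an extractor category.

    Hypothetical helper: 'whitelist' is checked first (an assumption),
    then 'blacklist'; without either key the post-processor is always active.
    """
    whitelist = pp_conf.get("whitelist")
    if whitelist is not None:
        return category in whitelist
    blacklist = pp_conf.get("blacklist")
    if blacklist is not None:
        return category not in blacklist
    return True


conf = {"name": "classify", "whitelist": ["tumblr", "deviantart"]}
print(pp_is_active(conf, "tumblr"))  # True
print(pp_is_active(conf, "pixiv"))   # False
```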
6 years ago
Mike Fährmann
0bc8ef51c8
[smugmug] handle albums with no explicit owner (#100)
6 years ago
Mike Fährmann
ff83ee22b0
release version 1.5.2
6 years ago
Mike Fährmann
b47af4637a
[mangadex] update URL pattern
...
Manga URLs now begin with /title/ instead of /manga/
6 years ago
Mike Fährmann
75862715ac
[behance] add user extractor
6 years ago
Mike Fährmann
a493fed376
[deviantart] fix journal creation if no 'username' is set
6 years ago
Mike Fährmann
6ecb36d88c
[postprocessor:ugoira] add 'ffmpeg-output' option
6 years ago
Mike Fährmann
02a4a67f6d
[postprocessor:ugoira] support danbooru sources
6 years ago
Mike Fährmann
5b8a314de7
[tumblr] replace inline URLs with higher-quality ones (#98)
6 years ago
Mike Fährmann
2af2bb7911
[mangadex] fix relative page URLs
6 years ago
Mike Fährmann
590c0b3ad5
re-implement and improve filename formatter
...
A format string now gets parsed only once instead of re-parsing it each
time it is applied to a set of data.
The initial parsing makes directory path creation about 2x slower
than before, since each format string there is used only once,
but building a filename, the more common operation, is at least 2x
faster. The "directory slowness" breaks even at about 5 filenames, and
everything beyond that is significantly faster.
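The parse-once idea can be sketched with the standard library's `string.Formatter.parse()`. `CompiledFormat` is a hypothetical class for illustration, not the project's formatter; it only handles plain `{key}` and `{key:spec}` fields.

```python
from string import Formatter


class CompiledFormat:
    """Parse a format string once, then apply it to many keyword dicts.

    Sketch only: supports plain '{key}' and '{key:spec}' fields; conversions
    ('!r'), attribute/index access, and nested specs are not handled.
    """

    def __init__(self, fmt: str):
        # One-time parse into (literal_text, field_name, format_spec) triples.
        self.parts = [
            (literal, field, spec or "")
            for literal, field, spec, _conversion in Formatter().parse(fmt)
        ]

    def apply(self, kwdict: dict) -> str:
        # Per-call work is only lookups and format() calls, no re-parsing.
        out = []
        for literal, field, spec in self.parts:
            out.append(literal)
            if field is not None:
                out.append(format(kwdict[field], spec))
        return "".join(out)


fname = CompiledFormat("{category}_{num:03}.{extension}")
print(fname.apply({"category": "tumblr", "num": 7, "extension": "png"}))
# tumblr_007.png
```

The parsing cost is paid in `__init__`, so a directory format used once gains nothing, while a filename format applied to many files amortizes it.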
6 years ago
Mike Fährmann
34b556922d
update/restore tests
6 years ago
Mike Fährmann
ab2bfaeb46
[ngomik] add replacement for 'subapics'
...
http://subapics.com/ got discontinued and replaced by http://ngomik.in/.
ngomik.in is still displaying a link to the "old site" showing a big
"Account Suspended" sign.
6 years ago
Mike Fährmann
a2eeef1f5e
[behance] replace test
...
The "UVMW Studio" account and their galleries are gone.
6 years ago
Mike Fährmann
e9dd2eff1d
[twitter] add extractor for media-tweet timelines (#96)
...
For example "https://twitter.com/PicturesEarth/media".
They are different from normal timelines in that they do not contain
any (re)tweets from other users and feature all media the user ever
posted, including responses to other tweets.
6 years ago
Mike Fährmann
f45c9f2141
[gfycat] test-updates and code-adjustments
6 years ago
Mike Fährmann
9b1c39032c
[twitter] changes and improvements
...
- rename User- to TimelineExtractor
- rename 'userid' to 'user_id' to conform to the other ..._id values
- adjust archive_fmt to deal with retweets
- emulate browser behavior for API calls
6 years ago
Mike Fährmann
10365394d7
[twitter] add support for user-timelines (closes #96)
...
also adds a 'retweets' option to filter retweeted content
6 years ago
Mike Fährmann
e3055d356c
release version 1.5.1
6 years ago
Mike Fährmann
d3f1eed2a6
[pinterest] improvements
...
- add stop condition for pin-related pins
- improve URL patterns
- make Pylint happy
6 years ago
Mike Fährmann
2801a0d997
[exhentai] skip "Content Warning" page when not logged in
...
(closes #97)
6 years ago
Mike Fährmann
63fa0b2006
[pinterest] add extractors for related pins
...
Related pins can now be accessed by adding a "#related" fragment
to the end of a Pinterest URL, for example:
- https://www.pinterest.com/pin/858146903966145189/#related
- https://www.pinterest.com/g1952849/test-/#related
There are no explicit URLs for related pins, using an option to
enable them results in "clunky" code, and a custom "related:<URL>"
scheme doesn't feel right either.
6 years ago
Mike Fährmann
1694039de0
[komikcast] update ad-filter
6 years ago
Mike Fährmann
a74591b84b
[tumblr] remove "original image" functionality
...
Accessing higher/original quality images on
https://s3.amazonaws.com/data.tumblr.com and http://data.tumblr.com
is no longer possible and any HTTP request results in 403 Forbidden.
A few images can still be accessed through https://a.tumblr.com [1][2],
but not as "_raw", just "_1280", and that might also be "fixed" in
the near future.
[1] https://a.tumblr.com/tumblr_kzjlfiTnfe1qz4rgho1_1280.jpg
[2] https://a.tumblr.com/ee589c6345f29d2d5935cecb49b0a705/tumblr_oztu02dIHp1wgha4yo1_1280.png
6 years ago
Mike Fährmann
38d4f43cc0
[komikcast] skip ads
6 years ago
Mike Fährmann
4313c95bc9
improve error message for OAuth2 authentication
6 years ago
Mike Fährmann
b55e39d1ee
[mangadex] improve extraction
...
- cache manga API results
- add artist, author and date fields to chapter metadata
- remove Manga-/ChapterExtractor inheritance
- minor code simplifications and improvements
6 years ago
Mike Fährmann
b1c4c1e13c
[mangadex] fix extraction
6 years ago
Mike Fährmann
3c90df6635
[piczel] add user, folder and image extractors
6 years ago
Mike Fährmann
2a9f3341a2
[behance] fix title extraction
6 years ago
Mike Fährmann
3fc2f269fa
[behance] filter 'fields' list
6 years ago
Mike Fährmann
b67339155f
[rule34] update test results
...
'metadata' tag type has been removed
6 years ago
Mike Fährmann
a86f2bfc80
[pinterest] update not-found redirects
6 years ago
Mike Fährmann
7442d2940c
release version 1.5.0
6 years ago
Mike Fährmann
b040ca0718
[rule34] small unit test fixes
6 years ago
Mike Fährmann
b164231bca
[sankaku] increase default values for 'wait-min/-max'
6 years ago
Mike Fährmann
68d6033a5d
use 'retries' and 'timeout' options for regular HTTP requests
6 years ago
Mike Fährmann
f3793660ef
update tests
6 years ago
Mike Fährmann
df082e923c
[behance] add gallery extractor (#95)
6 years ago
Mike Fährmann
c83fc62abc
prioritize archive over disk access (#87)
6 years ago
Mike Fährmann
e0dd8dff5f
implement L<maxlen>/<replacement>/ format option
...
The L option allows for the contents of a format field to be replaced
with <replacement> if its length is greater than <maxlen>.
Example:
{f:L5/too long/} -> "foo" (if "f" is "foo")
-> "too long" (if "f" is "foobar")
(#92) (#94)
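A minimal sketch of the `L<maxlen>/<replacement>/` rule as a standalone function. `apply_l_option` and the regex are hypothetical illustrations of the behavior described above, not the project's formatter code.

```python
import re

# Matches the spec part of '{f:L5/too long/}', i.e. 'L5/too long/'.
L_SPEC = re.compile(r"L(\d+)/(.*)/")


def apply_l_option(value, spec: str) -> str:
    """Return str(value), or the replacement if it is longer than maxlen (sketch)."""
    match = L_SPEC.fullmatch(spec)
    if match is None:
        raise ValueError(f"not an L<maxlen>/<replacement>/ spec: {spec!r}")
    maxlen, replacement = int(match.group(1)), match.group(2)
    text = str(value)
    return replacement if len(text) > maxlen else text


print(apply_l_option("foo", "L5/too long/"))     # foo
print(apply_l_option("foobar", "L5/too long/"))  # too long
```

A value exactly `maxlen` characters long is kept, since the replacement applies only when the length is greater than `maxlen`.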
6 years ago
Mike Fährmann
5f27cfeff6
[deviantart] remove `prefer-public` option
...
All API requests now always use a public token and only switch to
a private token for pagination results if `refresh-token` is set
and fewer deviations than requested were returned.
6 years ago
Mike Fährmann
bb89a1e6d7
[mangahere] use http://
...
invalid SSL cert for quite some time now
6 years ago
Mike Fährmann
212130b048
[deviantart] improve public-private token switching
...
- rename option to `prefer-public`
- now also works for galleries with fewer than 24 items
6 years ago
Mike Fährmann
886d662582
[deviantart] add option to minimize refresh-token usage
...
Always trying with a public token first and repeating the API request
with a private token if deviations are missing doesn't quite work for
galleries and folders with fewer than 25 items, so it's an option and
not the default.
6 years ago
Mike Fährmann
d98e47817d
[deviantart] reduce refresh-token usage
...
Instead of using a refresh-token-based access-token for every API
request, they are now only used for paginated results.
API requests to get a user's profile and the original download URL
now always use a public access-token.
6 years ago
Mike Fährmann
54a0d72dc8
[postprocessor:ugoira] improve frame rate handling
...
By default FFmpeg assumes a 25 FPS input frame rate, leading to dropped
frames if the source requires a higher frame rate than that.
This commit adds a `framerate` option (default "auto"), which
automatically assigns a (more or less) fitting frame rate based on the
delays between ugoira frames and thereby avoids dropped frames.
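One way to derive a "fitting" input frame rate is to take the greatest common divisor of all frame delays (in milliseconds) as the duration of one frame, so every delay becomes a whole number of frames. This is a hypothetical heuristic chosen for the sketch, not necessarily what the ugoira post-processor does; `auto_framerate` is an illustrative name.

```python
from functools import reduce
from math import gcd


def auto_framerate(delays_ms):
    """Pick an input frame rate at which every frame delay is a whole number of frames.

    Sketch: 'delays_ms' must be positive integers; the GCD of all delays
    is used as the duration of a single frame.
    """
    frame_ms = reduce(gcd, delays_ms)
    return 1000 / frame_ms


print(auto_framerate([40, 40, 80]))  # 25.0
print(auto_framerate([20, 60]))      # 50.0, above FFmpeg's assumed 25 FPS
```

With FFmpeg's default of 25 FPS, the 20 ms frames in the second example could not be represented and would be dropped.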
6 years ago
Mike Fährmann
84854fcad7
[myportfolio] add user and gallery extractors (#95)
6 years ago
Mike Fährmann
39f609b4c6
include current Git HEAD in debug output
6 years ago
Mike Fährmann
c9f70e0a19
[paheal] use HTTPS
6 years ago
Mike Fährmann
e8311eb1ed
drop Python 3.3 support
6 years ago
Mike Fährmann
ff436692bf
[deviantart] add 'journals' option
6 years ago
Mike Fährmann
00032b828c
[deviantart] add 'wait-min' option
6 years ago
Mike Fährmann
a6fe2bb594
[whatisthisimnotgoodwithcomputers] remove extractor
6 years ago
Mike Fährmann
0ba93650e0
[8chan] replace unit test URL
...
the other thread is no longer accessible
6 years ago
Mike Fährmann
8fe9056b16
implement string slicing for format strings
...
It is now possible to slice string (or list) values of format string
replacement fields with the same syntax as in regular Python code.
"{digits}" -> "0123456789"
"{digits[2:-2]}" -> "234567"
"{digits[:5]}" -> "01234"
The optional third parameter (step) has been left out to simplify things.
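A minimal sketch of resolving a replacement field with an optional `[start:stop]` slice (no step, as described above). `get_field` and the regex are hypothetical and only handle a plain key followed by one slice.

```python
import re

# 'key[start:stop]' with optional, possibly negative, integer bounds.
SLICE_FIELD = re.compile(r"(\w+)\[(-?\d*):(-?\d*)\]")


def get_field(field: str, kwdict: dict):
    """Resolve a replacement-field name, applying a '[start:stop]' slice if present."""
    match = SLICE_FIELD.fullmatch(field)
    if match is None:
        return kwdict[field]
    key, start, stop = match.groups()
    return kwdict[key][slice(int(start) if start else None,
                             int(stop) if stop else None)]


data = {"digits": "0123456789"}
print(get_field("digits[2:-2]", data))  # 234567
print(get_field("digits[:5]", data))    # 01234
```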
6 years ago
Mike Fährmann
269dc2bbd5
[sankaku] add 'tags' option (#94)
6 years ago
Mike Fährmann
173add6935
[nijie] fix artist_id extraction
...
view_popup.php pages for older images or dojins either have the
artist_id value at a different place or not at all.
6 years ago
Mike Fährmann
6996f5c118
[mangahere] fix and improve chapter extraction
6 years ago
Mike Fährmann
764331823b
release version 1.4.2
6 years ago
Mike Fährmann
1d43cbbf52
[gelbooru] tag-splitting for non-api mode
6 years ago
Mike Fährmann
2eefaa99a3
[mangapark] support .net and .com mirrors
6 years ago
Mike Fährmann
c20c0a4820
[safebooru] add pool extractor
6 years ago
Mike Fährmann
f916279ae6
[rule34] add pool extractor
6 years ago
Mike Fährmann
3dbc7c5f8d
[gelbooru] restore pool functionality
6 years ago
Mike Fährmann
a2c74bc6f0
[gelbooru] inherit from BooruExtractor class
...
Breaks pool functionality when using API calls (for now),
but reduces code clutter and enables the `tags` option.
6 years ago
Mike Fährmann
4a57509392
generalize tag-splitting option (#92)
...
- extend functionality to other booru sites:
  - http://behoimi.org/
  - https://konachan.com/
  - https://e621.net/
  - https://rule34.xxx/
  - https://safebooru.org/
  - https://yande.re/
6 years ago
Mike Fährmann
188e956c4e
[imagefap] use HTTPS + update test results
6 years ago
Mike Fährmann
87853538b4
[yandere] add option to split tags by type (#92)
6 years ago
Mike Fährmann
a699787d01
[deviantart] update URL patterns to new format
...
DeviantArt changed its URL format from
https://<name>.deviantart.com/...
to
https://www.deviantart.com/<name>/...
With this change both formats will be supported.
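A simplified sketch of a single pattern that accepts both URL layouts and captures `<name>`. The regex and `user_from_url` are hypothetical and much simpler than the extractor's real patterns; "someartist" is a placeholder name.

```python
import re

# Hypothetical, simplified pattern; accepts both URL layouts and captures <name>.
DEVIANTART_USER = re.compile(
    r"https?://(?:www\.)?deviantart\.com/([\w-]+)"   # new: www.deviantart.com/<name>/...
    r"|https?://(?!www\.)([\w-]+)\.deviantart\.com"  # old: <name>.deviantart.com/...
)


def user_from_url(url: str):
    """Return the user name from either URL format, or None if nothing matches."""
    match = DEVIANTART_USER.match(url)
    if match is None:
        return None
    return match.group(1) or match.group(2)


print(user_from_url("https://www.deviantart.com/someartist/gallery/"))  # someartist
print(user_from_url("https://someartist.deviantart.com/gallery/"))      # someartist
```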
6 years ago
Mike Fährmann
9e3415886c
[senmanga] fix/update tests
6 years ago
Mike Fährmann
973cf98e88
fix download skip for files without extension
6 years ago
Mike Fährmann
b8c97d2295
use 'extractor.request()' for more HTTP requests
6 years ago
Mike Fährmann
cc15c6105c
release version 1.4.1
6 years ago
Mike Fährmann
150a6b9064
[xvideos] fix metadata extraction
6 years ago
Mike Fährmann
7a98cc9798
[smugmug] update tests
...
My test account expired and all uploaded images got deleted.
6 years ago
Mike Fährmann
4eb94aca17
[postprocessor:ugoira] pass '-f' if not present
6 years ago
Mike Fährmann
0c1c4557dd
[postprocessor:ugoira] add option for two-pass encoding
6 years ago
Mike Fährmann
a9e276bc37
reset delete-flag
...
Since 'PathFormat' objects are being reused, setting `delete`
to True once caused all files downloaded afterwards to be deleted as well.
6 years ago
Mike Fährmann
91340d9d27
[pixiv] fix ugoira test
6 years ago
Mike Fährmann
709c5d466d
add '--zip' and '--ugoira-conv' command-line options
6 years ago
Mike Fährmann
eb7a1f3b98
[pixiv] rework ugoira handling
...
Frame information now gets attached to the ZIP file's keyword dict
instead of being written to a separate text file.
6 years ago
Mike Fährmann
017188d268
improve extractor.request()
...
Replace the 'fatal' parameter with 'expect', which is a list/range
of HTTP status codes >= 400 that should also be accepted.
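The `expect` rule can be sketched as a small status check. `check_status` and `HttpError` are hypothetical names; the point is that any container supporting `in`, such as a list or a `range`, works for `expect`.

```python
class HttpError(Exception):
    """Hypothetical error type for rejected responses."""


def check_status(status: int, expect=()) -> int:
    """Accept codes < 400 and any code listed in 'expect'; raise otherwise.

    'expect' may be any container supporting 'in', such as a list
    ([404]) or a range (range(400, 500)).
    """
    if status < 400 or status in expect:
        return status
    raise HttpError(f"HTTP status {status}")


print(check_status(200))                     # 200
print(check_status(404, expect=[404]))       # 404
print(check_status(429, range(400, 500)))    # 429
```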
6 years ago
Mike Fährmann
613b692275
[postprocessor:ugoira] add a few options
...
- ffmpeg-location: path to the ffmpeg (or avconv) executable
- ffmpeg-args: additional command line args for ffmpeg
- extension: filename extension of the resulting video file
6 years ago
Mike Fährmann
a444755979
[postprocessor] add 'ugoira' to convert pixiv animations to webm
6 years ago
Mike Fährmann
f10bd5cdbe
[4chan] unescape filenames
6 years ago
Mike Fährmann
eec081dd3e
[postprocessor:zip] delete directory (#85)
6 years ago
Mike Fährmann
2d1a104739
[mangadex] unescape manga names and chapter titles
...
pretty sure I previously tested if unescaping strings from the
embedded JSON object was necessary ... maybe they changed it
6 years ago
Mike Fährmann
3bcce77f6d
release version 1.4.0
6 years ago
Mike Fährmann
6ac403c5d3
add postprocessor config example
6 years ago
Mike Fährmann
2403c405e3
Merge branch 'postprocessor'
6 years ago
Mike Fährmann
baccf8a958
improve postprocessor handling
...
- add pathfmt argument for __init__()
- add finalization step
- add option to keep or delete zipped files
6 years ago
Mike Fährmann
2628911ba0
[pp:exec] add 'async' option
6 years ago
Mike Fährmann
7646bdbcfd
improve postprocessor initialization code
6 years ago
Mike Fährmann
37d97ff02c
[pp:classify] use temppath
6 years ago
Mike Fährmann
97189e50cd
[pp:zip] use temppath; add options
6 years ago
Mike Fährmann
821535b458
adjust PathFormat class
6 years ago
Mike Fährmann
a47c6136cd
[simplyhentai] avoid redirects for all-pages.json (#89)
6 years ago
Mike Fährmann
ad14de19c6
[imgur] support "unmuted" URLs
6 years ago
Mike Fährmann
72e66f0aac
[simplyhentai] improve URL pattern
...
[ci skip]
6 years ago
Mike Fährmann
cdcc3427a0
[simplyhentai] add video extractor (#89)
...
All videos hosted on their own servers seem to be dead,
but myhentai.tv embeds, which are most of the videos, work fine.
6 years ago
Mike Fährmann
f9a6a19658
[simplyhentai] add image extractor (#89)
6 years ago
Mike Fährmann
ebf596b399
[pawoo] restore metadata fields + smaller improvements
6 years ago
Mike Fährmann
f7e7306e5a
[komikcast] update URL pattern and unescape image URLs
6 years ago
Mike Fährmann
70f3617d88
[mangafox] fix URL extraction
6 years ago
Mike Fährmann
a62bd81e9b
[pixiv] fix filter for 'type=all'
6 years ago
Mike Fährmann
12797e3b1f
update configuration.rst
...
... again
- some more 'Path' references
- fixed some inconsistencies and errors
- added note about logging config for files
6 years ago
Mike Fährmann
55b0913412
[simplyhentai] add gallery extractor (#89)
6 years ago
Mike Fährmann
ae9a37a528
implement text.split_html()
6 years ago
Mike Fährmann
b08d95ebe4
add an 'encoding' option for logging files (default 'utf-8')
6 years ago
Mike Fährmann
513d807632
explicitly open config files as utf-8
6 years ago
Mike Fährmann
2df1a15fb8
add '-s/--simulate' to run data extraction without download
...
Useful for quick testing (even though -g and -j kind of do the same)
and to fill a download archive without actually downloading the files.
-s does the same as the default behaviour, except downloading stuff.
Maybe it should get a more fitting name, as it does actually write to
disk (cache, archive)?
6 years ago
Mike Fährmann
15cce22d82
[mangadex] fix parsing of unusual chapter strings
6 years ago
Mike Fährmann
ecdc3475b8
[pixhost] support .to TLDs
6 years ago
Mike Fährmann
f3d770d4e2
Merge branch '1.4-dev'
6 years ago
Mike Fährmann
d0ae3ed52c
[postprocessor] add 'zip' to write files to a ZIP archive
...
(#85)
6 years ago
Mike Fährmann
ca4008e1c1
[postprocessor] add 'classify' to sort downloads by fileext
6 years ago
Mike Fährmann
d378c0a323
[postprocessor] add 'exec' to execute user-defined processes
6 years ago
Mike Fährmann
76c32d58e5
[postprocessor] initial code
6 years ago
Mike Fährmann
1ff626db97
[pixiv] improve bookmark extraction
...
- combine 'favorite' and 'bookmark' extractors
- it is now one extractor class, but its subcategory still
distinguishes between your own bookmarks ('bookmark') and other
users' bookmarks ('favorite') like before
- allow filtering by bookmark tags and public/private bookmarks
- fix pagination for bookmark results
6 years ago
Mike Fährmann
0a1863fce3
[pixiv] respect more query parameters for user URLs
...
The API endpoint responsible for user illustrations does not
provide sufficient filter capabilities* to match the actual
website, so we are rolling our own filters.
Respected parameters are
'type': illust, manga, ugoira
'tag' : any image tag (this was already supported)
'p' : the page to start on
*
- API can filter for illustrations and manga, but not for ugoira.
- 'offset' is applied before filtering
- no 'tag' filter
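Client-side filtering for 'type' and 'tag' could look like the generator below. It assumes App API-style dicts with a 'type' string and a 'tags' list of `{"name": ...}` dicts; `filter_illusts` is a hypothetical name, and the 'p' (start page) handling is left out of this sketch.

```python
def filter_illusts(illusts, illust_type=None, tag=None):
    """Yield only illustrations matching the requested 'type' and 'tag'.

    Assumes App API-style dicts with a 'type' string ("illust", "manga",
    "ugoira") and a 'tags' list of {"name": ...} dicts.
    """
    for illust in illusts:
        if illust_type and illust["type"] != illust_type:
            continue
        if tag and tag not in (t["name"] for t in illust["tags"]):
            continue
        yield illust


works = [
    {"id": 1, "type": "illust", "tags": [{"name": "cat"}]},
    {"id": 2, "type": "ugoira", "tags": [{"name": "cat"}]},
    {"id": 3, "type": "ugoira", "tags": [{"name": "dog"}]},
]
print([w["id"] for w in filter_illusts(works, "ugoira", "cat")])  # [2]
```

Because the filter runs after the API returns results, an API-side 'offset' counts unfiltered items, which is why it cannot be used directly to emulate the website's pages.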
6 years ago
Mike Fährmann
f43d446692
[mangahere] extract chapter titles
6 years ago
Mike Fährmann
b8e53b8c6b
[pixiv] move query parsing out of constructor
...
better exception handling, among other things
6 years ago
Mike Fährmann
909d105ae6
[pixiv] add extractor for illusts from followed users
6 years ago
Mike Fährmann
7f899bd5d8
Merge branch 'master' into 1.4-dev
6 years ago
Mike Fährmann
fe69d01083
[pixiv] add extractor for search results
6 years ago
Mike Fährmann
247f785af1
[pixiv] use App API
...
Transitioning to the App API breaks favorites archive IDs (there is
no longer any bookmark ID information), but the favorites API endpoint
of the public API was gone anyway ...
6 years ago
Mike Fährmann
92fc199b07
[reddit] allow arbitrary subdomains
6 years ago
Mike Fährmann
4cea886177
[imgur] allow longer album hashes
6 years ago
Mike Fährmann
e1e23165a0
[pinterest] catch JSON decode errors
6 years ago
Mike Fährmann
789608c107
[imagebam] fix extraction for certain galleries
6 years ago
Mike Fährmann
7a58151566
fix util.parse_bytes invocations
...
(should be text.parse_bytes)
6 years ago
Mike Fährmann
1c1e086d01
use common base class for OAuth1.0 based API interfaces
6 years ago
Mike Fährmann
f3483a2b7c
[smugmug] add OAuth support
6 years ago
Mike Fährmann
6a31ada9e3
re-implement OAuth1.0 code
...
OAuth support for SmugMug needs some additional features
(auth-rebuild on redirect, query parameters in URL, ...)
and fixing this in the old code wouldn't work all that well.
6 years ago
Mike Fährmann
ec158776ed
[deviantart] add extractor for popular listings
6 years ago
Mike Fährmann
0e3883303f
[pixiv] implement AppAPI wrapper
6 years ago
Mike Fährmann
e2157f594e
[mangadex] fix manga extraction (closes #84)
...
Chapter listings for manga now use
https://mangadex.org/manga/<id>/_/chapters/2/
as URL instead of
https://mangadex.org/manga/<id>/_//2/
6 years ago
Mike Fährmann
69a5e6ddb3
Merge branch 'master' into 1.4-dev
6 years ago
Mike Fährmann
82c50fa609
release version 1.3.5
6 years ago
Mike Fährmann
3ce5296313
[smugmug] code cleanup
...
- combine User and Node extractors
- (re)move miscellaneous helper functions
- rename "Owner" to "User"
6 years ago
Mike Fährmann
42ed7667b8
[smugmug] support user- and general album URLs
6 years ago
Mike Fährmann
8bf3cdd82b
implement logging options
...
Standard logging to stderr, logfiles, and unsupported URL files (which
are now handled through the logging module) can now be configured by
setting their respective option keys (log, logfile, unsupportedfile)
to a dict and specifying the following options:
- format:
format string for logging messages
available keys: see [1]
default: "[{name}][{levelname}] {message}"
- format-date:
format string for {asctime} fields in logging messages
available keys: see [2]
default: "%Y-%m-%d %H:%M:%S"
- level:
the lowercase name of the minimum level at which the logger is active;
available levels are debug, info, warning, error, exception
default: "info"
- path:
path of the file to be written to
- mode:
'mode' argument when opening the specified file
can be either "w" to truncate the file or "a" to append to it (see [3])
If 'output.log', '.logfile', or '.unsupportedfile' is a string, it is
interpreted as before: as the file path (or, for '.log', as the
format string).
[1] https://docs.python.org/3/library/logging.html#logrecord-attributes
[2] https://docs.python.org/3/library/time.html#time.strftime
[3] https://docs.python.org/3/library/functions.html#open
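A minimal sketch of turning such an options dict into a handler with the standard `logging` module, using `style="{"` so the `{name}`-style format strings above work. `build_handler` is hypothetical; mapping "exception" to ERROR (the level `Logger.exception()` logs at) and defaulting 'mode' to "w" are assumptions, and the 'encoding' key mirrors the later 'encoding' option for logging files in this log.

```python
import logging

# Assumption: "exception" maps to ERROR, the level Logger.exception() logs at.
LEVELS = {
    "debug": logging.DEBUG,
    "info": logging.INFO,
    "warning": logging.WARNING,
    "error": logging.ERROR,
    "exception": logging.ERROR,
}


def build_handler(opts: dict) -> logging.Handler:
    """Create a logging handler from an options dict like the one above (sketch)."""
    if "path" in opts:
        # 'mode' defaults to "w" here; that default is an assumption.
        handler = logging.FileHandler(
            opts["path"], mode=opts.get("mode", "w"),
            encoding=opts.get("encoding", "utf-8"))
    else:
        handler = logging.StreamHandler()  # writes to sys.stderr
    handler.setFormatter(logging.Formatter(
        opts.get("format", "[{name}][{levelname}] {message}"),
        opts.get("format-date", "%Y-%m-%d %H:%M:%S"),
        style="{"))
    handler.setLevel(LEVELS[opts.get("level", "info")])
    return handler
```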
6 years ago
Mike Fährmann
2ea0d1da42
[smugmug] improve API code; use data expansions
6 years ago
Mike Fährmann
16e014baaa
[smugmug] added image and album extractor
...
just some initial code that still requires a lot of work ...
TODO:
- folders
- old-style albums (which are nearly all of them ...)
- images from users
- OAuth
It could also happen that the API credentials used will become invalid
whenever my 14-day trial period ends (7 days remaining), but that
would just require users to supply their own.
6 years ago
Mike Fährmann
d96b3474e5
[puremashiro] remove module
...
site has been unreachable for a couple of weeks
and now the DNS record is gone as well
6 years ago
Mike Fährmann
b44a296404
[gomanga] remove module
...
site has been unreachable for a couple of weeks
and the Cloudflare status page shows host errors
6 years ago
Mike Fährmann
95392554ee
use text.urljoin()
6 years ago
Mike Fährmann
2395d870dd
[pinterest] unquote board and user names, better errors
6 years ago
Mike Fährmann
8b79eaafea
[tumblr] log actual time of rate limit resets
...
... instead of the amount of seconds until a reset
7 years ago
Mike Fährmann
0f1e07f627
[pinterest] scrap OAuth implementation; code improvements
...
OAuth authentication isn't needed anymore and other tools
like Postman are better suited for this job anyway.
7 years ago
Mike Fährmann
55d4d23860
[pinterest] use Pinterest's "Web" API (#83)
...
no access tokens, no user credentials of any kind ...
7 years ago
Mike Fährmann
2721417dd8
Merge branch 'master' into 1.4-dev
7 years ago
Mike Fährmann
c6d5154fc3
fix flake8 errors, ignore W504
...
pycodestyle 2.4.0 enforces some new style guidelines
7 years ago
Mike Fährmann
2d17a9e07f
improve extractor.request()
...
- better retry behavior
- exponential back-off
- removed 'allow_empty' argument
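A minimal sketch of retrying with exponential back-off. `request_with_retries` is hypothetical: it retries only on `ConnectionError` (an assumption; a real request wrapper would also consider HTTP status codes), and `sleep` is injectable so the delays can be checked without waiting.

```python
import time


def request_with_retries(send, retries=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it succeeds, doubling the wait after each failure.

    'send' is any zero-argument callable that raises ConnectionError on failure
    (an assumption for this sketch); after 'retries' extra attempts the last
    error is re-raised.
    """
    for attempt in range(retries + 1):
        try:
            return send()
        except ConnectionError:
            if attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```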
7 years ago
Mike Fährmann
80521ae1f6
[deviantart] improve API error handling
...
The previous implementation would retry requests with 4xx status codes
in an infinite loop, which is especially a problem when querying
non-existent users or groups. These are now properly handled with a
NotFoundError exception.
7 years ago
Mike Fährmann
e54b43be08
[mangadex] add title info for chapter extractors
7 years ago
Mike Fährmann
f471161920
Merge branch 'master' into 1.4-dev
7 years ago
Mike Fährmann
a2020c736e
release version 1.3.4
7 years ago
Mike Fährmann
eb37fbf0e8
[hentaifoundry] improve extractor
...
- use common base class
- better pagination
- respect '.../page/<num>'
- implement skip() / --range support
- get YII_CSRF_TOKEN from cookies
7 years ago
Mike Fährmann
80bead739d
[oauth] require custom client-* values for pinterest
7 years ago
Mike Fährmann
cc36f88586
rename safe_int to parse_int; move parse_* to text module
7 years ago
Mike Fährmann
ff643793bd
improve and document cloudflare bypass code
7 years ago
Mike Fährmann
10cc59f3b5
fix extractor names
7 years ago
Mike Fährmann
b1325d4d2c
fix extractor docstrings
7 years ago
Mike Fährmann
df7e18399e
[luscious] fix image order
7 years ago
Mike Fährmann
d10579edb5
[pinterest] improve PinterestAPI code; remove OAuth mentions
...
on another note: access_tokens have been set to only allow for
10 requests per hour (from 200 yesterday)
7 years ago
Mike Fährmann
4bd182c107
[pinterest] implement `oauth:pinterest` (#83)
...
Pinterest access tokens are rate limited at 200 requests per
hour (or maybe per 2 or 3 hours?) so having just one access token
for all users isn't going to work in the long run.
7 years ago
Mike Fährmann
9651f3fce0
[pinterest] improve error messages (#83)
7 years ago
Mike Fährmann
dbe250f7e5
[pinterest] update access_token (#83)
7 years ago
Mike Fährmann
dd49127408
[spectrumnexus] remove module
...
Site stopped hosting manga scans (http://view.thespectrum.net/)
7 years ago
Mike Fährmann
5c487300ee
improve 'parse_query()' and add tests
...
- another irrelevant micro-optimization!
- use urllib.parse.parse_qsl directly instead of parse_qs, which
just packs the results of parse_qsl in a different data structure
- reduced memory requirements since no additional dict and lists are
created
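A minimal sketch of the approach: iterate over `urllib.parse.parse_qsl()` pairs directly instead of building the dict of lists that `parse_qs()` returns. Keeping the first value of a repeated key and returning `{}` for non-string input are choices made for this sketch, not confirmed behavior of the project's function.

```python
from urllib.parse import parse_qsl


def parse_query(qs):
    """Parse a query string into a flat {key: value} dict.

    Iterates over parse_qsl()'s (key, value) pairs directly instead of
    building parse_qs()'s dict of lists. Keeping the first value of a
    repeated key, and returning {} for non-string input, are choices
    made for this sketch.
    """
    result = {}
    if not isinstance(qs, str):
        return result
    for key, value in parse_qsl(qs):
        result.setdefault(key, value)
    return result


print(parse_query("id=42&p=2&id=7"))  # {'id': '42', 'p': '2'}
```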
7 years ago
Mike Fährmann
728c64a3fb
[tumblr] rename 'offset' to 'num' and adjust formats
...
Trying to somehow emulate Tumblr filenames is a bad idea ...
7 years ago
Mike Fährmann
4ffa94f634
remove 'shorten_path()' and 'shorten_filename()'
7 years ago
Mike Fährmann
27eab4e467
rewrite text tests and improve functions
...
- test more edge cases
- consistently return an empty string for invalid arguments
- remove the ungreedy-flag in 'remove_html()'
7 years ago
Mike Fährmann
e3f2bd4087
add tests for 'text.clean_xml()' and improve it
7 years ago
Mike Fährmann
6d8b191ea7
improve 'parse_query()' and add tests
...
- another irrelevant micro-optimization!
- use urllib.parse.parse_qsl directly instead of parse_qs, which
just packs the results of parse_qsl in a different data structure
- reduced memory requirements since no additional dict and lists are
created
7 years ago
Mike Fährmann
51ea699083
add 'abort()' as function to filter expressions
...
calling 'abort()' in a filter aborts the current extractor run
in a cleaner way than using something like 1/0, which
causes an error message to be printed
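A minimal sketch of how an `abort()` function can end a filter loop quietly: it raises a dedicated exception that the loop catches without reporting an error. `StopExtraction`, `abort`, and `run_filter` here are hypothetical stand-ins, not the project's classes.

```python
class StopExtraction(Exception):
    """Hypothetical exception used to end an extractor run quietly."""


def abort():
    raise StopExtraction()


def run_filter(expr: str, items):
    """Collect items for which 'expr' is true; abort() stops the whole run."""
    code = compile(expr, "<filter>", "eval")
    env = {"abort": abort}
    selected = []
    try:
        for item in items:
            # Item keys act as local names inside the filter expression.
            if eval(code, env, item):
                selected.append(item)
    except StopExtraction:
        pass  # clean stop, no error message
    return selected


posts = [{"num": n} for n in range(1, 6)]
print(run_filter("num < 3 or abort()", posts))  # [{'num': 1}, {'num': 2}]
```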
7 years ago
Mike Fährmann
6bd857a319
[tumblr] handle rate limits / 429 errors
...
- wait for the hourly limit to reset
- abort upon exceeding the daily limit (it doesn't seem useful to
potentially wait for several hours)
7 years ago
Mike Fährmann
7073ab7707
[komikcast] update regex to only match manga pages
...
The 'readerarea' section now includes some (shady) external
Javascript file, which got matched as well.
7 years ago
Mike Fährmann
a1fa4b43b0
Revert "[tumblr] add option to sort photosets by upload order"
...
This reverts commit 4a26ae32df.
7 years ago
Mike Fährmann
48a83a89e9
[loveisover] remove module
...
archive.loveisover.me was shut down on 2018-03-29;
https://www.archiveteam.org/index.php?title=4chan#archive.loveisover.me
7 years ago
Mike Fährmann
564e12ca8f
replace 'imgyt' with 'imxto'
...
https://img.yt/ wasn't available for a couple of days, but has now
re-emerged as https://imx.to/ with a new web-interface.
Links to older images still work (see tests).
7 years ago
Mike Fährmann
1b80fa82a9
[imgur] update URL pattern and tests
7 years ago
Mike Fährmann
4a26ae32df
[tumblr] add option to sort photosets by upload order
7 years ago
Mike Fährmann
6b72be8ee6
[tumblr] add 'hash' keyword
...
'hash' is the middle part of the filename in a tumblr image URL.
For example an image with '.../tumblr_p6tgemp1NZ1wgha4yo1_250.png' as
its URL would have 'p6tgemp1NZ1wgha4yo1' as hash.
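A minimal sketch of extracting 'hash' with a regular expression. The pattern and `tumblr_hash` are hypothetical and only cover the `tumblr_<hash>_<size>.<ext>` form shown above; other variants (e.g. inline images) return None here.

```python
import re

# Captures the part between 'tumblr_' and the trailing '_<size>.<ext>'.
TUMBLR_HASH = re.compile(r"/tumblr_([^/_]+)_\d+\.\w+$")


def tumblr_hash(url: str):
    """Return the middle part of a tumblr image filename, or None."""
    match = TUMBLR_HASH.search(url)
    return match.group(1) if match else None


print(tumblr_hash("https://a.tumblr.com/ee589c6345f29d2d5935cecb49b0a705"
                  "/tumblr_oztu02dIHp1wgha4yo1_1280.png"))
# oztu02dIHp1wgha4yo1
```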
7 years ago
Mike Fährmann
ffc0c67701
release version 1.3.3
7 years ago
Mike Fährmann
d11fcf4804
smaller changes and fixes
...
- fix the Cloudflare challenge result if the last decimal places
are zero (JS's toFixed() removes trailing zeroes)
- fix downloading of kissmanga chapter-pages hosted on blogspot
(accessing blogspot with "kissmanga.com" as referrer yields a 401)
- disable certificate validation for 'mangahere' tests
- update flickr test result
7 years ago
Mike Fährmann
f6c95dccf9
[cloudflare] fix bypass procedure
...
Cloudflare challenges, at least for kissmanga and readcomiconline,
now use slightly different Javascript expressions.
Instead of a single value per expression, they now have a numerator
and a denominator of a fractional value, which in the end gets
truncated to 10 decimal places.
7 years ago
Mike Fährmann
759ba26fb0
[luscious] proper image order for picture albums
...
... and (try) to start with the first image instead of somewhere
in the middle of an album.
7 years ago
Mike Fährmann
68e9fbee16
[tumblr] check all 4 keys/secrets before using OAuth
...
it was possible to cause a crash by setting api-key or -secret to null.
(this commit also slightly improves the blog-cache implementation)
7 years ago
Mike Fährmann
4810d446bb
remove the obsolete safeprint() and error() functions
...
- safeprint() was used to print values which might have caused a
UnicodeEncodeError, but that is no longer necessary (0381ae5)
- errors are now handled via logging output (f94e370)
7 years ago
Mike Fährmann
0381ae5318
replace error handlers for stdout and co.
...
Python 3.5 and lower raise a UnicodeEncodeError when trying to print
characters that cannot be encoded in the output encoding (if it is not
'utf-8'). Setting the error handlers of stdout and co. to 'replace' should help.
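A minimal sketch of swapping a text stream's error handler on older Pythons: detach the buffer and wrap it in a new `io.TextIOWrapper` with `errors="replace"`. `with_replace_errors` is a hypothetical helper; on Python 3.7 and later, `sys.stdout.reconfigure(errors="replace")` does the same in one call.

```python
import io


def with_replace_errors(stream: io.TextIOWrapper) -> io.TextIOWrapper:
    """Return a new text stream on the same buffer using errors='replace'.

    detach() is called first so the old wrapper cannot close the shared
    buffer when it is garbage-collected.
    """
    encoding = stream.encoding
    line_buffering = stream.line_buffering
    buffer = stream.detach()
    return io.TextIOWrapper(buffer, encoding=encoding, errors="replace",
                            line_buffering=line_buffering)


# Usage (not run here): sys.stdout = with_replace_errors(sys.stdout)
```

Unencodable characters are then written as '?' instead of raising a UnicodeEncodeError.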
7 years ago
Mike Fährmann
f8168c693e
[tumblr] avoid calls to '/blog/.../info'
...
The same information returned by the 'blog/.../info' API endpoint
is also included in the result of every 'blog/.../posts' call.
7 years ago
Mike Fährmann
64d7c85b55
[exhentai] improve metadata
...
- add 'width', 'height' and 'size' (in bytes) for each image
- change the former 'size' and 'size_units' into 'gallery_size'
7 years ago
Mike Fährmann
64b22e0fc1
[pawoo] update URL pattern
...
adds support for 'https://pawoo.net/@.../media'
7 years ago
Mike Fährmann
7b562907c3
[nijie] add favorites extractor
...
adds support for 'https://nijie.info/user_like_illust_view.php?id=...'
7 years ago
Mike Fährmann
445db75955
[nijie] improve extraction and metadata
...
- add 'title' and 'description'
- split 'artist_id' into 'user_id' and 'artist_id'
  - 'user_id' is the ID of the user from whom the image entry
    originates
  - 'artist_id' is the ID of the actual image artist
- improve pagination and URL patterns
7 years ago
Mike Fährmann
a112e3f2a0
[nijie] add doujin extractor
...
adds support for "https://nijie.info/members_dojin.php?id=<artist_id>"
7 years ago
Mike Fährmann
f39153b6e9
[nhentai] add extractor for search results
7 years ago
Mike Fährmann
52d41c41e7
[exhentai] add extractor for favorited galleries
7 years ago
Mike Fährmann
63cc2599c4
[exhentai] add extractor for search results
7 years ago
Mike Fährmann
d1c91a1f2b
[mangadex] fix manga-page extraction
7 years ago
Mike Fährmann
299ae24996
[test] add a few downloader tests
7 years ago
Mike Fährmann
dd314279fb
[test] add unit tests for extractor module functions
7 years ago
Mike Fährmann
a993d0ea90
release version 1.3.2
7 years ago
Mike Fährmann
e7525b1b0e
[artstation] add challenge extractor (#80)
7 years ago
Mike Fährmann
3f2dd6b6f8
avoid double path-separators
...
(#74)
7 years ago
Mike Fährmann
f5c6a2d7f5
[nhentai] use API to get gallery info
7 years ago
Mike Fährmann
b2ba2b821d
[hitomi] fix image URLs and improve metadata
...
- use '?a.hitomi.la' as subdomain depending on gallery-id
- add 'characters', 'tags' and 'date' information
- support multiple entries per metadata-value
- rename 'num' to 'page'
7 years ago
Mike Fährmann
3905474805
[booru] call update_page() with correct dict (closes #82)
7 years ago
Mike Fährmann
44c267e362
[artstation] add search extractor (#80)
7 years ago
Mike Fährmann
40ca562d7b
[artstation] add album extractor (#80)
7 years ago
Mike Fährmann
7121eeae8b
check supportedsites.rst in release script
7 years ago
Mike Fährmann
c59f9b71f1
release version 1.3.1
7 years ago
Mike Fährmann
f367d5c281
[deviantart] move delay-increase after expect_error check
...
[ci skip]
7 years ago
Mike Fährmann
557cb94f81
[deviantart] use proper exponential backoff on API errors
...
... and use separate API credentials for unit tests.
7 years ago
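Exponential backoff as described in 557cb94 can be sketched as follows. The names ApiError and call_with_backoff, the retry count and the 1-second base delay are assumptions for illustration, not gallery-dl's actual values; the sleep function is injectable so the delays can be inspected without waiting.

```python
import time


class ApiError(Exception):
    """Hypothetical error raised by a failed API request."""


def call_with_backoff(func, retries=5, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on ApiError with exponentially growing delays.

    The wait before retry n (0-based) is base_delay * 2**n: 1s, 2s, 4s, ...
    """
    for attempt in range(retries):
        try:
            return func()
        except ApiError:
            if attempt == retries - 1:
                raise  # no attempts left: let the caller handle it
            sleep(base_delay * 2 ** attempt)


# Usage: a request that fails twice before succeeding.
failures = [ApiError(), ApiError()]


def flaky_request():
    if failures:
        raise failures.pop()
    return "ok"


delays = []
print(call_with_backoff(flaky_request, sleep=delays.append))  # ok
print(delays)  # [1.0, 2.0]
```

Doubling the delay after every failure gives a struggling server progressively more room, while the first retry still happens quickly.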
Mike Fährmann
723cc66bb1
[artstation] add user-, image- and likes-extractors
7 years ago
Mike Fährmann
b69cc94f0e
[util] implement bencode()
7 years ago
Mike Fährmann
4d74749496
[tests] rework filters for extractor tests
...
CI incompatible tests will now only be skipped if tests are run in
a CI environment.
7 years ago
Mike Fährmann
d6ef52897c
[imgchili] remove module
...
All previously hosted images yield a 404
and the main page is just a logo.
7 years ago
Mike Fährmann
7847ab1d5a
[imagehosts] remove even more dead sites
...
All removed sites either
- reject all incoming connections or
- display a message from their domain registrar
7 years ago
Mike Fährmann
5f37d40a3e
[komikcast] bypass cloudflare challenge
7 years ago
Mike Fährmann
f9884e2338
[pixiv] update URL pattern
...
add support for 'https://www.pixiv.net/user/<id>'
7 years ago
Mike Fährmann
85ed023c2e
[mangadex] remove the trailing ' - MangaDex' in a better way
...
str.rstrip() works differently than assumed.
7 years ago
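str.rstrip() treats its argument as a set of characters, not as a suffix, so it can eat into the title itself. The snippet below shows the pitfall and a suffix-safe alternative; strip_suffix is a hypothetical helper (Python 3.9+ offers str.removesuffix() for the same purpose), not necessarily what 85ed023 uses.

```python
SUFFIX = " - MangaDex"


def strip_suffix(title, suffix=SUFFIX):
    """Remove *suffix* from the end of *title* once, if it is present."""
    if suffix and title.endswith(suffix):
        return title[:-len(suffix)]
    return title


title = "Nana - MangaDex"
# rstrip() removes any trailing characters contained in SUFFIX, so after
# the suffix it keeps going and strips the 'a', 'n', 'a' of "Nana" as well.
print(title.rstrip(SUFFIX))  # N
print(strip_suffix(title))   # Nana
```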
Mike Fährmann
9fb82e6b43
apply expand_path() to archive paths
7 years ago
Mike Fährmann
32bbd12f08
update extractor tests
7 years ago
Mike Fährmann
ca326bd275
[deviantart] fix folder and collection archive IDs
...
{folder[index]} and {collection[index]} are both '0' when being
delegated from Gallery- or FavoriteExtractors, as there is no
way of knowing a folder's index when getting folder-information
from the API.
7 years ago
Mike Fährmann
e32fe1cdf1
[pinterest] cast IDs to int
...
... and update test results.
Image URLs changed from
https://s-media-cache-ak0.pinimg.com/... to
https://i.pinimg.com/...
7 years ago
Mike Fährmann
179ecee965
[turboimagehost] fix extraction
7 years ago
Mike Fährmann
1400868f53
[mangadex] general improvements
...
- support >100 chapter entries per manga
- custom archive ID format
- detect non-existing chapters
7 years ago
Mike Fährmann
749fbbfa6c
[mangadex] add chapter- and manga-extractor
7 years ago
Mike Fährmann
b58449fd88
release version 1.3.0
7 years ago
Mike Fährmann
6e38cf5aab
[mangareader] use 'https://'
...
The site now redirects from http://mangareader.net/
to https://mangareader.net/
7 years ago
Mike Fährmann
1d71123f91
[pixiv] update archive IDs and add metadata-fields
...
(Pixiv bookmarks actually have their own IDs, comments and tags,
independent of the bookmarked image, which makes creating an
archive ID a lot easier)
7 years ago
Mike Fährmann
858fdbdb22
[tumblr] improve 'inline' extraction
...
'quote' posts store their HTML content in the 'source' field
7 years ago
Mike Fährmann
1d54a8e07d
fix logging output during downloads
...
from:
filename.ext[download][warning] ...
to:
filename.ext
[download][warning] ...
7 years ago
Mike Fährmann
5008e105ee
update archive IDs
...
... to behave in a more straightforward way when dealing with
bookmarks/favourites/etc.
Specific IDs are now grouped by their owner, album-id, ... to
allow for duplicates where they would be expected.
7 years ago
Mike Fährmann
829ddf4ac1
[sankaku] general improvements
...
- simplify regex
- unquote search tags
- increase default wait-time between HTTP requests
  - downloading several hundred images always resulted
    in '429 Too Many Requests' eventually
- circumvent paging restrictions for unauthenticated users by only
  using the 'next' parameter
  - setting 'page' to a constant, low value (or simply omitting it)
    does the trick
7 years ago
Jad
49463f76bb
support multi-page URL (#79)
...
* support multi-page URL
* fix
* all done.
* fix, again
7 years ago
Mike Fährmann
19aefdfde3
[directlink] update test results
7 years ago
Mike Fährmann
74029c50bb
[directlink] unquote metadata fields
7 years ago
Mike Fährmann
2fad0b1f1b
add 'U' conversion for format strings to unquote their content
...
(#74)
7 years ago
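A custom conversion like 'U' can be added by subclassing string.Formatter and overriding convert_field(). This is a minimal sketch, assuming that "unquote" means urllib.parse.unquote(); the class name UnquotingFormatter is hypothetical.

```python
import string
import urllib.parse


class UnquotingFormatter(string.Formatter):
    """Formatter with an extra '!U' conversion that percent-decodes a field."""

    def convert_field(self, value, conversion):
        if conversion == "U":
            return urllib.parse.unquote(str(value))
        # Fall back to the built-in conversions ('!r', '!s', '!a').
        return super().convert_field(value, conversion)


fmt = UnquotingFormatter()
print(fmt.format("{filename!U}.{extension}",
                 filename="my%20image%21", extension="png"))
# my image!.png
```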
Mike Fährmann
8cdce21dcb
make archive keys user-configurable
7 years ago
Mike Fährmann
8f338347b6
[imagehosts] cleanup
...
removed
- chronos.to - unable to resolve hostname
- coreimg.net - same
- imgmaid.net - same
- hosturimage.com - everything returns 404
- imageontime.org - redirects to some shady site
- imgupload.yt - cloudflare error 522, host down
- img4ever.net - read timeout
7 years ago
Mike Fährmann
edfd3d9fc9
[yeet] remove module
...
- archive.yeet.net returns a 500 server error
- yeet.net moved to yeet.rip, but the archive is gone
7 years ago
Mike Fährmann
e1e0668ca8
add option to set default replacement field value
...
Missing or undefined keywords will now be replaced with the value
set for 'keywords-default'. The default is Python's 'None', which
is equivalent to setting this option to JSON's 'null'.
7 years ago
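One way to get the 'keywords-default' behavior is to override string.Formatter.get_field() so that failed lookups fall back to a default value. This sketch assumes that "missing or undefined" covers absent keys as well as failed index or attribute lookups such as {user[name]}; the class name DefaultFormatter is hypothetical.

```python
import string


class DefaultFormatter(string.Formatter):
    """Formatter that substitutes *default* for fields it cannot resolve."""

    def __init__(self, default=None):
        super().__init__()
        self.default = default

    def get_field(self, field_name, args, kwargs):
        try:
            return super().get_field(field_name, args, kwargs)
        except (KeyError, IndexError, AttributeError, TypeError):
            # get_field() must return (object, key used); the key only
            # matters for check_unused_args(), which is a no-op by default.
            # Note: a format spec such as {num:>03} still fails on None.
            return self.default, field_name


fmt = DefaultFormatter()  # default None, like JSON null in the config
print(fmt.format("{title} by {user[name]}", title="Example"))
# Example by None
print(DefaultFormatter("").format("[{tag}]"))
# []
```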
Mike Fährmann
ac3da8115e
[util] don't add text: URLs to list of downloaded URLs
7 years ago
Mike Fährmann
8704d850bf
add explicit proxy support (#76)
...
- '--proxy' as command-line argument
- 'extractor.*.proxy' as config option
7 years ago
Mike Fährmann
367b963d37
[pixiv] fix ugoira extraction ... again (#78)
...
Some animations are not available for mobile devices, so we
pretend to be a desktop browser when requesting the ugoira page.
7 years ago
Mike Fährmann
b79f1f2ca7
[pixiv] fix ugoira extraction (closes #78)
7 years ago
Mike Fährmann
731ffd4986
improve text.filename_from_url() performance
...
- urlsplit() is faster than urlparse()
- rpartition() is faster than rindex() + slicing
- new version is 2.3 times as fast
7 years ago
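The two techniques listed in 731ffd4 combine into a one-liner. This is a sketch of the idea rather than the verbatim gallery-dl function; the query string and fragment are dropped because urlsplit() separates them from the path.

```python
import urllib.parse


def filename_from_url(url):
    """Return the last path segment of *url*, ignoring query and fragment."""
    return urllib.parse.urlsplit(url).path.rpartition("/")[2]


print(filename_from_url("https://example.org/a/b/image.jpg?size=large#top"))
# image.jpg
```

rpartition() always returns a 3-tuple, so a path without any '/' simply yields the whole path as the filename.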
Mike Fährmann
d122203be1
[mangastream] fix extraction
7 years ago
Mike Fährmann
8809b32aed
release version 1.2.0
7 years ago
Mike Fährmann
b50bdbf3d7
change config specifiers in input file format
...
Instead of a dictionary/object, input file options are now specified
by a 'key=value' pair starting with '-' for options only applying to
the next URL or '-G' for Global options applying to all following URLs.
See the docstring of parse_inputfile() for details.
Example option specifiers:
- filename = "{id}.{extension}"
- extractor.pixiv.user.directory = ["Pixiv Users", "{user[id]}"]
-spaces="are_optional"
-G keywords = {"global": "option"}
7 years ago
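The option-specifier syntax from b50bdbf can be parsed roughly like this. parse_option_line is a hypothetical helper, not parse_inputfile() itself, and it assumes each value is a JSON literal (which matches the example specifiers listed in the commit message).

```python
import json


def parse_option_line(line):
    """Parse '-key=value' (next URL only) or '-G key=value' (all following URLs).

    Returns (scope, key_path, value) with the value decoded as JSON.
    """
    line = line.strip()
    if line.startswith("-G"):
        scope, line = "global", line[2:]
    elif line.startswith("-"):
        scope, line = "local", line[1:]
    else:
        raise ValueError("not an option line: " + line)
    key, sep, value = line.partition("=")  # split at the first '=' only
    if not sep:
        raise ValueError("missing '=' in option line")
    # Spaces around key and value are optional, so strip both.
    return scope, key.strip().split("."), json.loads(value.strip())


print(parse_option_line('-G keywords = {"global": "option"}'))
# ('global', ['keywords'], {'global': 'option'})
```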
Mike Fährmann
f970a8f13c
fix adding keys to download archive when using skip=false
7 years ago
Mike Fährmann
179bcdd349
adjust archive-ids
7 years ago
Mike Fährmann
be3ea4425d
test archive-id creation and uniqueness
7 years ago
Mike Fährmann
3cec533c28
Merge branch 'archive'
7 years ago
Mike Fährmann
20af86b2ea
add more extractor tests
...
for mangastream, reddit and imgur
7 years ago
Mike Fährmann
b73b8b4f50
add OAuth unittests
7 years ago
Mike Fährmann
4d2fadfb6f
restore skip actions with download archive
7 years ago
Mike Fährmann
65773263fc
[util] implement OAuthSession.urlencode() (closes #75)
...
- Python's own urllib.parse.urlencode() has no quote_via argument in
Python 3.3 and 3.4, which is necessary to follow OAuth 1.0 quoting
rules.
7 years ago
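The quoting rules that 65773263fc works around: OAuth 1.0 (RFC 5849) percent-encodes everything except the unreserved characters A-Z, a-z, 0-9, '-', '.', '_' and '~', and encodes spaces as '%20' rather than '+'. Below is a minimal sketch using urllib.parse.quote(), which is available on all Python 3 versions; the function name and the sorting by key (a simplification of the signature base string rules) are assumptions.

```python
import urllib.parse


def oauth_urlencode(params):
    """Encode *params* using OAuth 1.0 percent-encoding.

    safe="~" keeps '~' literal on every Python version; '/' and ' '
    are always encoded, and pairs are sorted by key.
    """
    quote = urllib.parse.quote
    return "&".join(
        quote(str(key), safe="~") + "=" + quote(str(value), safe="~")
        for key, value in sorted(params.items())
    )


print(oauth_urlencode({"z": "x/y", "oauth_nonce": "a b~"}))
# oauth_nonce=a%20b~&z=x%2Fy
print(urllib.parse.urlencode({"q": "a b"}))  # default quoting uses '+'
# q=a+b
```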
Mike Fährmann
7e0207bcf4
[imgur] strip trailing '?1' from 'ext'
7 years ago
Mike Fährmann
cf147dfee9
[hentai2read] fix manga extraction
...
- site changed its HTML structure
7 years ago
Mike Fährmann
f5f2d29f56
[nijie] fix dojin extraction
...
- correctly extract artist_id
- set extension to "jpg" if it was empty and let filetype checks do
the rest
7 years ago
Mike Fährmann
7f7c16ae37
add option to specify additional key-value pairs
7 years ago
Mike Fährmann
d38bf2f54c
[tumblr] recognize /image/... URLs
...
xyz.tumblr.com/image/123 refers to the same images
as xyz.tumblr.com/post/123.
7 years ago
Mike Fährmann
057668e17e
extend input-file format with per-URL config and comments
...
- see docstring of parse_inputfile() for details
- TODO: unittests, recursion (currently setting for example
{"extractor": {"key": "value"}} will override the whole "extractor"
branch instead of merging {"key": "value"} into the already existing
dictionary)
7 years ago
Mike Fährmann
5b3c34aa96
use generic chapter-extractor in more modules
7 years ago
Mike Fährmann
347baf7ac5
improve util.parse_range() performance
...
It is never going to actually matter, but using partition() instead
of split() is twice as fast.
7 years ago
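A sketch of range parsing built on str.partition(), in the spirit of 347baf7. The range syntax ('a-b', 'a', 'a-', '-b', comma-separated) and the (lower, upper) tuple output are assumptions for illustration, not a description of util.parse_range()'s exact behavior.

```python
import sys


def parse_range(text):
    """Parse e.g. '1-3, 5, 8-' into a list of inclusive (lower, upper) tuples."""
    ranges = []
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        # partition() splits once and never raises, unlike split() + unpacking.
        lower, sep, upper = part.partition("-")
        if not sep:  # a single number such as '5'
            value = int(lower)
            ranges.append((value, value))
        else:  # 'a-b', 'a-' (open end) or '-b' (from the start)
            ranges.append((int(lower) if lower.strip() else 1,
                           int(upper) if upper.strip() else sys.maxsize))
    return ranges


print(parse_range("1-3, 5"))  # [(1, 3), (5, 5)]
```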
Mike Fährmann
7b5ba69951
[hentaihere] ensure consistent extraction results
...
sometimes there is a random space before the next <a>
7 years ago
Mike Fährmann
377b78b3c9
[hentai2read] fix manga name extraction
7 years ago
Mike Fährmann
54c36a8a34
[subapics] add chapter- and manga-extractor (#70)
7 years ago
Mike Fährmann
2dd3aeeeae
[komikcast] add chapter- and manga-extractor (#70)
7 years ago
Mike Fährmann
7a412f5c32
implement generic manga-chapter extractor
7 years ago
Mike Fährmann
aa38eab2be
allow not-defined fields in format strings
...
... and replace them with "None", for now
7 years ago
Mike Fährmann
6a07e38366
implement extractor.add() and .add_module()
...
... as a public and non-hacky way to add (external) extractors to
gallery-dl's pool and make them available for extractor.find()
7 years ago
Mike Fährmann
c0dd922c13
add '--download-archive' cmdline option
...
… as well as a config file equivalent
7 years ago
Mike Fährmann
8c3b713362
rework DownloadJob.handle_url(); include archive functionality
...
todo:
"abort" and "exit" skip modes if download is skipped because of archive
7 years ago
Mike Fährmann
34873dbd90
set 'archive_fmt' values
...
These are going to be used to create a unique ID for each image.
7 years ago
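How an 'archive_fmt' value might turn into a unique, persistent key: format it with the file's metadata and store the result in a SQLite table. MiniArchive, the category prefix and the table layout are all assumptions for illustration; this is not the DownloadArchive class from 84a52a9.

```python
import sqlite3


class MiniArchive:
    """Tiny SQLite-backed set of archive keys (hypothetical stand-in)."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS archive (entry TEXT PRIMARY KEY)")

    @staticmethod
    def make_key(category, archive_fmt, kwdict):
        # Prefixing the category keeps equal IDs from different sites apart.
        return category + archive_fmt.format_map(kwdict)

    def __contains__(self, key):
        cursor = self.db.execute(
            "SELECT 1 FROM archive WHERE entry = ?", (key,))
        return cursor.fetchone() is not None

    def add(self, key):
        with self.db:  # commits the transaction on success
            self.db.execute(
                "INSERT OR IGNORE INTO archive (entry) VALUES (?)", (key,))


archive = MiniArchive()
key = MiniArchive.make_key("example", "{user}_{id}", {"user": "alice", "id": 42})
print(key)             # examplealice_42
print(key in archive)  # False
archive.add(key)
print(key in archive)  # True
```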
Mike Fährmann
a34cebc253
[luscious] jump to first image if cover does not link to it
7 years ago
Mike Fährmann
84a52a9256
add DownloadArchive class
7 years ago
Mike Fährmann
915807dd77
log HTTP errors as warnings
7 years ago
Mike Fährmann
db7f04dd97
emit log messages on download failure
...
and when retrying with fallback URLs
7 years ago
Mike Fährmann
d951f13e37
add config option for unsupported-URL file
...
for consistency's sake
7 years ago
Mike Fährmann
619387cbb1
update extractor unittest results
7 years ago
Mike Fährmann
364e335440
smaller adjustments and improvements
...
- requests and urllib3 versions on 1 line
- close input file after reading from it
- use expand_path for unsupported-urls file
- remove unnecessary logging from options.py
7 years ago
Mike Fährmann
c9a9664a65
change --write-log behaviour
...
- log files now get truncated when opening them
(mode "w" instead of "a")
- log verbosity to file depends on -q/-v
(same as logging to stderr)
7 years ago
Mike Fährmann
97f4f15ec0
add option to write logging output to a file
...
- '--write-log FILE' as cmdline argument
- 'output.logfile' as config file option
7 years ago
Mike Fährmann
f94e3706a8
use logging module for error messages during downloads
7 years ago
Mike Fährmann
db91cf871c
document message identifiers
7 years ago
Mike Fährmann
0dd48d644f
update test results
...
nothing broke, but things got updated or changed
7 years ago
Mike Fährmann
1e93955170
[batoto] remove module
...
Site officially shut down on 2018.01.18
7 years ago
Mike Fährmann
27fce6f600
fix UrlJob behavior
7 years ago
Mike Fährmann
76509a6d3c
[imgur] update test results
7 years ago
Mike Fährmann
9fccd7b783
[tumblr] provide fallback URLs (#64)
...
Each image now produces 3 URLs:
- amazonaws.com _raw (or _1280 for older images)
- amazonaws.com _500
- media.tumblr.com (URL returned by API)
7 years ago
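The fallback behavior from 9fccd7b can be sketched as a loop that tries each URL in order and stops at the first success. DownloadFailed and download_with_fallback are hypothetical names, and the URLs in the usage example are placeholders.

```python
class DownloadFailed(Exception):
    """Hypothetical error for a single URL that could not be downloaded."""


def download_with_fallback(urls, download):
    """Try each URL in order; return the first one that downloads."""
    for url in urls:
        try:
            download(url)
        except DownloadFailed:
            continue  # fall back to the next URL in the list
        return url
    raise DownloadFailed("all {} URLs failed".format(len(urls)))


# Usage with placeholder URLs: the first one is "unavailable".
def fake_download(url):
    if url.endswith("_raw.jpg"):
        raise DownloadFailed(url)


urls = [
    "https://s3.example/image_raw.jpg",  # best quality
    "https://s3.example/image_500.jpg",  # smaller size
    "https://media.example/image.jpg",   # URL returned by the API
]
print(download_with_fallback(urls, fake_download))
# https://s3.example/image_500.jpg
```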
Mike Fährmann
b837420291
fix minor urllist issues
7 years ago
Mike Fährmann
9d69401391
initial support for multiple URLs per image
7 years ago
Mike Fährmann
6174a5c4ef
[download] adjust filename extension on filetype mismatch
...
(closes #63)
7 years ago
Mike Fährmann
91ed147cef
[oauth] use custom key/secret values during oauth:…
7 years ago
Mike Fährmann
421a9740a3
[tumblr] add 'tumblr:' to force Tumblr extractor (#71)
7 years ago
Mike Fährmann
40d35c87bc
[paheal] add tag- and post-extractors (closes #69)
7 years ago
Mike Fährmann
cc0c2cca57
[reddit] add extractor for reddit-hosted images (closes #68)
7 years ago
Mike Fährmann
f10ffc0839
update extractor blacklist to also allow classes
7 years ago
Mike Fährmann
b6797032e3
release version 1.1.2
7 years ago
Mike Fährmann
35e09869d1
[mangapark] fix image URLs and use HTTPS
7 years ago
Mike Fährmann
9a049bdf51
[tumblr] add 'likes' extractor (#65)
7 years ago
Mike Fährmann
67d4462d26
[batoto] rudimentary Cloudflare bypass
7 years ago
Mike Fährmann
29d75fc3fa
[tumblr] add support for OAuth authentication (#65)
7 years ago
Mike Fährmann
4edb25346e
[slideshare] support mobile URLs (closes #67)
7 years ago
Mike Fährmann
e420a28bbc
fix cookie tests
7 years ago
Mike Fährmann
b33efc99a4
[idolcomplex] add support for idol.sankakucomplex.com
7 years ago
Mike Fährmann
75b2e84b6d
[tumblr] use s3.amazonaws.com for image URLs (#64)
7 years ago
Mike Fährmann
5b094328b5
[puremashiro] add chapter- and manga-extractor (closes #66)
...
Also adds support for region subtags in language codes (e.g. en-us)
7 years ago
Mike Fährmann
974e73bdbb
[booru] smaller code adjustments
7 years ago
Mike Fährmann
03b8a548cb
[tumblr] change `reblogs` default value to `true` (#61)
7 years ago
Mike Fährmann
d235f68f59
[tumblr] add option to filter reblogged posts (#61)
...
Reblogs are ignored by default, but can be included by setting
'extractor.tumblr.reblogs' to 'true'.
7 years ago
Mike Fährmann
a794fffc6d
[batoto] extend chapter-string regex (closes #60)
...
Non-numeric chapter indices exist after all ...
7 years ago
Mike Fährmann
1219ebb7f5
[danbooru] use alternate subdomains; support safebooru
7 years ago
Mike Fährmann
9e8a84ab6c
[booru] rewrite using Mixin classes (#59)
...
- improved code structure
- improved URL patterns
- better pagination to work around page limits on
  - Danbooru
  - e621
  - 3dbooru
7 years ago
Mike Fährmann
0876541e43
[seiga] update tests
7 years ago
Mike Fährmann
1a70857a12
update extractor-unittest capabilities
...
- "count" can now be a string defining a comparison in the form of
'<operator> <value>', for example: '> 12' or '!= 1'. If its value
is not a string, it is assumed to be a concrete integer as before.
- "keyword" can now be a dictionary defining tests for individual keys.
These tests can either be a type, a concrete value or a regex
starting with "re:". Dictionaries can be stacked inside each other.
Optional keys can be indicated with a "?" before their name.
For example:
"keyword": {
    "image_id": int,
    "gallery_id": 123,
    "name": "re:pattern",
    "user": {
        "id": 321,
    },
    "?optional": None,
}
7 years ago
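The two new test capabilities from 1a70857 can be sketched like this. check_count and check_keywords are hypothetical names, not the test suite's actual functions, and the single space between operator and value follows the '<operator> <value>' form described in the commit message.

```python
import operator
import re

OPERATORS = {
    "<": operator.lt, "<=": operator.le, "==": operator.eq,
    "!=": operator.ne, ">=": operator.ge, ">": operator.gt,
}


def check_count(actual, expected):
    """*expected* is an int or a string such as '> 12' or '!= 1'."""
    if isinstance(expected, str):
        op, _, value = expected.partition(" ")
        return OPERATORS[op](actual, int(value))
    return actual == expected


def check_keywords(kwdict, tests):
    """Check *kwdict* against a (possibly nested) dict of tests.

    A test is a type, a 're:' regex, a nested dict or a concrete value;
    keys prefixed with '?' are optional.
    """
    for key, test in tests.items():
        optional = key.startswith("?")
        if optional:
            key = key[1:]
        if key not in kwdict:
            if optional:
                continue
            return False
        value = kwdict[key]
        if isinstance(test, dict):
            ok = isinstance(value, dict) and check_keywords(value, test)
        elif isinstance(test, type):
            ok = isinstance(value, test)
        elif isinstance(test, str) and test.startswith("re:"):
            ok = re.match(test[3:], str(value)) is not None
        else:
            ok = value == test
        if not ok:
            return False
    return True


kwdict = {"image_id": 7, "gallery_id": 123, "name": "pattern_01",
          "user": {"id": 321}}
tests = {"image_id": int, "gallery_id": 123, "name": "re:pattern",
         "user": {"id": 321}, "?optional": None}
print(check_count(13, "> 12"), check_keywords(kwdict, tests))  # True True
```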
Mike Fährmann
88bb0798fd
delay initialization of PathFormat objects
...
This allows the DeviantArt group-check to be moved inside the
Extractor.items() method which in turn allows for better exception
handling.
As a new general rule:
Never raise exceptions during extractor initialization.
7 years ago
Mike Fährmann
c24e0e70a7
[pixiv] simplify main loop
7 years ago
Mike Fährmann
c1e331edbb
[mangapark] replace manga test
7 years ago
Mike Fährmann
5488643fac
add requests and urllib3 versions to debug output
7 years ago
Mike Fährmann
9d73ed4772
fix issue with using 'skip()' when a filter is present
...
calling skip() skips over unfiltered items and does not apply
the filter expression to them, which is not what should happen
7 years ago
Mike Fährmann
28cd78aae0
[kissmanga] extend chapter-string regex (closes #58)
7 years ago
Mike Fährmann
0ba618dd1a
release version 1.1.1
7 years ago
Mike Fährmann
a3e9b51bea
[imgbox] update test results
...
Image URLs of older galleries have been updated to the new format.
https://i.imgbox.com/qHhw7lpG.png
-->
https://images3.imgbox.com/6d/9a/qHhw7lpG_o.png
7 years ago
Mike Fährmann
d241a0fb60
[util] replace '/' with '\' in base-directory paths
...
... on Windows to have consistent path separators.
7 years ago
Mike Fährmann
d0886f411e
[gelbooru] re-enable API use (closes #56)
...
Gelbooru's API allows access to all images and is not restricted
to the first 20000.
This also adds an option to select between API use and manual
information extraction in case their API gets disabled again.
7 years ago
Mike Fährmann
8102aae311
[mangahere] support ".cc" TLD and mobile URLs
7 years ago
Mike Fährmann
676602056c
[reddit] unescape output URLs
7 years ago
Mike Fährmann
2eedbaaaf9
[deviantart] use cache to store new refresh_tokens
...
The 'refresh_token' set in a user's config file gets used once to
get a new 'access_token' and 'refresh_token'. The new 'refresh_token'
is then stored in gallery-dl's cache and used the next time the
'access_token' needs to be refreshed.
This means deleting the cache file invalidates the refresh_token
chain and requires the user to re-authenticate.
7 years ago
Mike Fährmann
fc7d165c97
[deviantart] add support for OAuth2 authentication
...
Some user galleries [*] require you to be either logged in or
authenticated via OAuth2 to access their deviations.
[*] e.g. https://polinaegorussia.deviantart.com/gallery/
--------------
known issue:
A deviantart 'refresh_token' can only be used once and gets updated
whenever it is used to request a new 'access_token', so storing its
initial value in a config file and reusing it again and again is not
possible.
7 years ago
Mike Fährmann
91c2aed077
[nhentai] fix JSON extraction
7 years ago
Mike Fährmann
444008a14a
[khinsider] use urljoin() to complete page URLs
7 years ago
Mike Fährmann
263741d243
[luscious] update URL pattern (closes #55)
7 years ago
Mike Fährmann
0a9a07a6e1
[slideshare] improve metadata; flake8
...
- added 'views' and 'published' keywords
- fixed longer titles and descriptions
7 years ago
Leonardo Taccari
a8d2dde8b2
[slideshare] Add a new extractor for slideshare.net (#54)
7 years ago
Mike Fährmann
19a6ae57b2
[sankaku] add pool extractor
7 years ago
Mike Fährmann
e52f0cc1ed
[sankaku] add post extractor
7 years ago
Mike Fährmann
595593a35e
[sankaku] rewrite
...
- better code structure and extensibility
- better metadata
7 years ago
Mike Fährmann
e96e1fea5d
release version 1.1.0
7 years ago
Mike Fährmann
a3924d2072
[sankaku] fix swf extraction (closes #52)
7 years ago
Mike Fährmann
ebe9b0a04c
another attempt at downloader retry behavior
...
This commit changes the general behavior from
'Retry on every exception and abort on DownloadError' to
'Only retry on DownloadRetry exceptions and abort on every other one'
The previous version would have retried on several states which
would have no chance of ever succeeding (invalid URLs, etc.)
7 years ago
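The new retry policy from ebe9b0a, sketched with stand-in exception classes: only DownloadRetry triggers another attempt, and everything else propagates at once. The classes are defined locally here rather than imported from gallery-dl, and the retry count of 3 is an arbitrary assumption.

```python
class DownloadRetry(Exception):
    """Stand-in for a transient failure (timeout, 5xx, ...)."""


class DownloadError(Exception):
    """Stand-in for a permanent failure (invalid URL, 404, ...)."""


def download(fetch, url, retries=3):
    """Call fetch(url), retrying only on DownloadRetry.

    Any other exception, DownloadError included, is not caught and
    therefore aborts immediately.
    """
    attempt = 1
    while True:
        try:
            return fetch(url)
        except DownloadRetry:
            if attempt >= retries:
                raise  # retry budget exhausted
            attempt += 1
```

With this whitelist approach, a bad URL fails after a single request instead of being retried pointlessly.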
Mike Fährmann
291369eab2
various smaller changes/additions
7 years ago
Mike Fährmann
4fb6803fa6
add option to sleep before each download
7 years ago
Mike Fährmann
300346ecdf
[mangazuki] remove extractors
...
This site has been in "rebuild"-mode for a fairly long time and the
current extractor code isn't going to work for the new version either.
7 years ago
Mike Fährmann
d275b1d9a3
[khinsider] fix extraction
...
... again
7 years ago
Mike Fährmann
6b8e3003df
[hentai2read] ensure consistent extraction results
7 years ago
Mike Fährmann
a1980b16f3
[gelbooru] various improvements
...
- better metadata for pools
- map ratings to s/q/e like other boorus do
- skip() support
7 years ago
Mike Fährmann
93482a1f88
implement 'util.advance()'
7 years ago
Mike Fährmann
0e5057b15d
remove deprecated options
7 years ago
Mike Fährmann
8f518e03f8
add options to set maximum download rate
...
- -r/--limit-rate as cmdline option
- downloader.http.rate as config option
This implementation very roughly uses the idea of the token bucket
algorithm [1] and mostly uses Wget's approach [2] as inspiration.
[1] https://en.wikipedia.org/wiki/Token_bucket
[2] http://git.savannah.gnu.org/cgit/wget.git/tree/src/retr.c?h=v1.19.2&id=ba6b44f6745b14dce414761a8e4b35d31b176bba#n111
7 years ago
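A rough sketch of the Wget-style idea behind 8f518e0: after each chunk, compare how long the bytes received so far should have taken at the target rate with the time that actually passed, and sleep off the difference. This is a simplification rather than a full token bucket, RateLimiter is a hypothetical name, and the clock and sleep functions are injectable for testing.

```python
import time


class RateLimiter:
    """Keep the average transfer rate at or below *rate* bytes per second."""

    def __init__(self, rate, clock=time.monotonic, sleep=time.sleep):
        self.rate = rate
        self.clock = clock
        self.sleep = sleep
        self.start = clock()
        self.received = 0

    def update(self, nbytes):
        """Record *nbytes* more bytes and sleep if ahead of schedule."""
        self.received += nbytes
        expected = self.received / self.rate  # seconds this should have taken
        elapsed = self.clock() - self.start
        if expected > elapsed:
            self.sleep(expected - elapsed)


# Usage, after each chunk read from a response (names are illustrative):
#   limiter = RateLimiter(500 * 1024)  # about 500 KiB/s
#   for chunk in response.iter_content(16384):
#       output.write(chunk)
#       limiter.update(len(chunk))
```

Because the comparison uses totals since the start, a slow stretch of network earns "credit" that lets later chunks pass without sleeping, which is the bucket-like part of the idea.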
Mike Fährmann
a718c6c6cd
implement 'util.parse_bytes()'
7 years ago
Mike Fährmann
038e3b3369
[kissmanga] handle "AreYouHuman" redirects (#51)
7 years ago
Mike Fährmann
2b9a783fc7
[khinsider] fix extraction
7 years ago
Mike Fährmann
3dc1169736
use own mapping before relying on the 'mimetypes' module
7 years ago
Mike Fährmann
214972bc9a
[gelbooru] use manual extraction
...
... to compensate for their disabled API.
(https://gelbooru.com/index.php?page=forum&s=view&id=3875)
This also adds an extractor for image-pools.
7 years ago
Mike Fährmann
55c64cad4b
[khinsider] fix filename extension and test-pattern
7 years ago
Mike Fährmann
c0bcf8e343
release version 1.0.2
7 years ago
Mike Fährmann
b14de6ffc2
[tumblr] small improvements
...
- don't transform inline GIF URLs
- set 'type' parameter for API calls if there is only
one post type selected
7 years ago
Mike Fährmann
9296a26eae
[tumblr] add warning messages
7 years ago
Mike Fährmann
65c1c53eb8
[khinsider] fix extraction
7 years ago
Mike Fährmann
12de658937
[tumblr] add options to control extraction behavior (#48)
...
- posts : list of post-types to inspect
- inline : scan post bodies for inline images
- external: follow external links
7 years ago
Mike Fährmann
077f8c12be
[tumblr] original video URLs + continuous offset
7 years ago
Mike Fährmann
8eb12ebeae
[tumblr] support more post/media types (#48)
...
This adds support for audio and video posts (most videos are shared
from youtube/instagram which isn't supported -> youtube-dl),
as well as link posts and image-search inside of text posts.
Most of this is just WIP and will need some sort of improvement
and options to enable/disable different media types etc.
7 years ago
Mike Fährmann
6c9da67581
apply selection options (filter, range) when using '-j'
7 years ago
Mike Fährmann
b8cdd42cab
[senmanga] fix extraction (again)
...
this is basically a re-revert of 2ace5c7
7 years ago
Mike Fährmann
e6814aebe2
add 'extractor.*.user-agent' config option
7 years ago
Mike Fährmann
6913eeaa40
[powermanga] replace manga extractor unit test
...
My Hero Academia is gone
7 years ago
Mike Fährmann
7e0d9257a7
[hbrowse] fix manga extraction
7 years ago
Mike Fährmann
3c576d10c0
[seiga] better metadata + 'skip()' support
7 years ago
Mike Fährmann
f72318e593
[seiga] support more than 200 images
...
Due to API restrictions and/or missing knowledge about and
documentation of API usage, it was only possible to retrieve the
latest 200 images of a niconico seiga user with said API.
The new approach manually visits each HTML page and gets its
information from there.
7 years ago