Mike Fährmann
63cc2599c4
[exhentai] add extractor for search results
7 years ago
Mike Fährmann
d1c91a1f2b
[mangadex] fix manga-page extraction
7 years ago
Mike Fährmann
299ae24996
[test] add a few downloader tests
7 years ago
Mike Fährmann
dd314279fb
[test] add unit tests for extractor module functions
7 years ago
Mike Fährmann
e7525b1b0e
[artstation] add challenge extractor ( #80 )
7 years ago
Mike Fährmann
f5c6a2d7f5
[nhentai] use API to get gallery info
7 years ago
Mike Fährmann
b2ba2b821d
[hitomi] fix image URLs and improve metadata
...
- use '?a.hitomi.la' as subdomain depending in gallery-id
- add 'characters', 'tags' and 'date' information
- support multiple entires per metadata-value
- rename 'num' to 'page'
7 years ago
Mike Fährmann
3905474805
[booru] call update_page() with correct dict ( closes #82 )
7 years ago
Mike Fährmann
44c267e362
[artstation] add search extractor ( #80 )
7 years ago
Mike Fährmann
40ca562d7b
[artstation] add album extractor ( #80 )
7 years ago
Mike Fährmann
f367d5c281
[deviantart] move delay-increase after expect_error check
...
[ci skip]
7 years ago
Mike Fährmann
557cb94f81
[deviantart] use proper exponential backoff on API errors
...
... and use separate API credentials for unit tests.
7 years ago
Mike Fährmann
723cc66bb1
[artstation] add user-, image- and likes-extractors
7 years ago
Mike Fährmann
4d74749496
[tests] rework filters for extractor tests
...
CI incompatible tests will now only be skipped if tests are run in
a CI environment.
7 years ago
Mike Fährmann
d6ef52897c
[imgchili] remove module
...
All previously hosted images yield a 404
and the main page is just a logo.
7 years ago
Mike Fährmann
7847ab1d5a
[imagehosts] remove even more dead sites
...
All removed sites either
- reject all incoming connections or
- display a message from their domain registrar
7 years ago
Mike Fährmann
5f37d40a3e
[komikcast] bypass cloudflare challenge
7 years ago
Mike Fährmann
f9884e2338
[pixiv] update URL pattern
...
add support for 'https://www.pixiv.net/user/ <id>'
7 years ago
Mike Fährmann
85ed023c2e
[mangadex] remove the trailing ' - MangaDex' in a better way
...
str.rstrip() works differently than assumed.
7 years ago
Mike Fährmann
32bbd12f08
update extractor tests
7 years ago
Mike Fährmann
ca326bd275
[deviantart] fix folder and collection archive IDs
...
{folder[index]} and {collection[index]} are both '0' when being
delegated from Gallery- or FavoriteExtractors, as there is no
way of knowing a folder's index when getting folder-information
from the API.
7 years ago
Mike Fährmann
e32fe1cdf1
[pinterest] cast IDs to int
...
... and update test results.
Image URLs changed from
https://s-media-cache-ak0.pinimg.com/ ... to
https://i.pinimg.com/ ...
7 years ago
Mike Fährmann
179ecee965
[turboimagehost] fix extraction
7 years ago
Mike Fährmann
1400868f53
[mangadex] general improvements
...
- support >100 chapter entries per manga
- custom archive ID format
- detect non-existing chapters
7 years ago
Mike Fährmann
749fbbfa6c
[mangadex] add chapter- and manga-extractor
7 years ago
Mike Fährmann
6e38cf5aab
[mangareader] use 'https://'
...
The site now redirects from http://mangareader.net/
to https://mangareader.net/
7 years ago
Mike Fährmann
1d71123f91
[pixiv] update archive IDs and add metadata-fields
...
(Pixiv bookmarks actually have their own IDs, comments and tags,
independent of the bookmarked image, which makes creating an
archive ID a lot easier)
7 years ago
Mike Fährmann
858fdbdb22
[tumblr] improve 'inline' extraction
...
'quote' posts store their HTML content in the 'source' field
7 years ago
Mike Fährmann
5008e105ee
update archive IDs
...
... to behave in a more straightforward way when dealing with
bookmarks/favourites/etc.
specific IDs are now grouped by their owner, album-id, ... to
allow for duplicates when it would be expected.
7 years ago
Mike Fährmann
829ddf4ac1
[sankaku] general improvements
...
- simplify regex
- unquote search tags
- increase default wait-time between HTTP requests
- downloading several hundreds of images always resulted
in '429 Too Many Requests' eventually
- circumvent paging restrictions for unauthenticated users by only
using the 'next' parameter
- setting 'page' to a constant, low value (or simply omitting it)
does the trick
7 years ago
Jad
49463f76bb
support multi-page URL ( #79 )
...
* support multi-page URL
* fix
* all done.
* fix, again
7 years ago
Mike Fährmann
19aefdfde3
[directlink] update test results
7 years ago
Mike Fährmann
74029c50bb
[directlink] unquote metadata fields
7 years ago
Mike Fährmann
8f338347b6
[imagehosts] cleanup
...
removed
- chronos.to - unable to resolve hostname
- coreimg.net - same
- imgmaid.net - same
- hosturimage.com - everything returns 404
- imageontime.org - redirects to some shady site
- imgupload.yt - cloudflare error 522, host down
- img4ever.net - read timeout
7 years ago
Mike Fährmann
edfd3d9fc9
[yeet] remove module
...
- archive.yeet.net returns a 500 server error
- yeet.net moved to yeet.rip, but the archive is gone
7 years ago
Mike Fährmann
8704d850bf
add explicit proxy support ( #76 )
...
- '--proxy' as command-line argument
- 'extractor.*.proxy' as config option
7 years ago
Mike Fährmann
367b963d37
[pixiv] fix ugoira extraction ... again ( #78 )
...
Some animations are not available for mobile devices, so we
pretend to be a desktop browser when requesting the ugoira page.
7 years ago
Mike Fährmann
b79f1f2ca7
[pixiv] fix ugoira extraction ( closes #78 )
7 years ago
Mike Fährmann
d122203be1
[mangastream] fix extraction
7 years ago
Mike Fährmann
179bcdd349
adjust archive-ids
7 years ago
Mike Fährmann
3cec533c28
Merge branch 'archive'
7 years ago
Mike Fährmann
20af86b2ea
add more extractor tests
...
for mangastream, reddit and imgur
7 years ago
Mike Fährmann
7e0207bcf4
[imgur] strip trailing '?1' from 'ext'
7 years ago
Mike Fährmann
cf147dfee9
[hentai2read] fix manga extraction
...
- site changed its HTML structure
7 years ago
Mike Fährmann
f5f2d29f56
[nijie] fix dojin extraction
...
- correctly extract artist_id
- set extension to "jpg" if it was empty and let filetype checks do
the rest
7 years ago
Mike Fährmann
d38bf2f54c
[tumblr] recognize /image/... URLs
...
xyz.tumblr.com/image/123 refers to the same images
as xyz.tumblr.com/post/123.
7 years ago
Mike Fährmann
5b3c34aa96
use generic chapter-extractor in more modules
7 years ago
Mike Fährmann
7b5ba69951
[hentaihere] ensure consistent extraction results
...
sometimes there is a random space before the next <a>
7 years ago
Mike Fährmann
377b78b3c9
[hentai2read] fix manga name extraction
7 years ago
Mike Fährmann
54c36a8a34
[subapics] add chapter- and manga-extractor ( #70 )
7 years ago
Mike Fährmann
2dd3aeeeae
[komikcast] add chapter- and manga-extractor ( #70 )
7 years ago
Mike Fährmann
7a412f5c32
implement generic manga-chapter extractor
7 years ago
Mike Fährmann
6a07e38366
implement extractor.add() and .add_module()
...
... as a public and non-hacky way to add (external) extractors to
gallery-dl's pool and make them available for extractor.find()
7 years ago
Mike Fährmann
34873dbd90
set 'archive_fmt' values
...
These are going to be used to create an unique id for each image.
7 years ago
Mike Fährmann
a34cebc253
[luscious] jump to first image if cover does not link to it
7 years ago
Mike Fährmann
84a52a9256
add DownloadArchive class
7 years ago
Mike Fährmann
619387cbb1
update extractor unittest results
7 years ago
Mike Fährmann
db91cf871c
document message identifiers
7 years ago
Mike Fährmann
0dd48d644f
update test results
...
nothing broke, but things got updated or changed
7 years ago
Mike Fährmann
1e93955170
[batoto] remove module
...
Site officially shut down on 2018.01.18
7 years ago
Mike Fährmann
76509a6d3c
[imgur] update test results
7 years ago
Mike Fährmann
9fccd7b783
[tumblr] provide fallback URLs ( #64 )
...
Each image now produces 3 URLs:
- amazonaws.com _raw (or _1280 for older images)
- amazonaws.com _500
- media.tumblr.com (URL returned by API)
7 years ago
Mike Fährmann
9d69401391
initial support for multiple URLs per image
7 years ago
Mike Fährmann
91ed147cef
[oauth] use custom key/secret values during oauth:…
7 years ago
Mike Fährmann
421a9740a3
[tumblr] add 'tumblr:' to force Tumblr extractor ( #71 )
7 years ago
Mike Fährmann
40d35c87bc
[paheal] add tag- and post-extractors ( closes #69 )
7 years ago
Mike Fährmann
cc0c2cca57
[reddit] add extractor for reddit-hosted images ( closes #68 )
7 years ago
Mike Fährmann
f10ffc0839
update extractor blacklist to also allow classes
7 years ago
Mike Fährmann
35e09869d1
[mangapark] fix image URLs and use HTTPS
7 years ago
Mike Fährmann
9a049bdf51
[tumblr] add 'likes' extractor ( #65 )
7 years ago
Mike Fährmann
67d4462d26
[batoto] rudimentary Cloudflare bypass
7 years ago
Mike Fährmann
29d75fc3fa
[tumblr] add support for OAuth authentication ( #65 )
7 years ago
Mike Fährmann
4edb25346e
[slideshare] support mobile URLs ( closes #67 )
7 years ago
Mike Fährmann
e420a28bbc
fix cookie tests
7 years ago
Mike Fährmann
b33efc99a4
[idolcomplex] add support for idol.sankakucomplex.com
7 years ago
Mike Fährmann
75b2e84b6d
[tumblr] use s3.amazonaws.com for image URLs ( #64 )
7 years ago
Mike Fährmann
5b094328b5
[puremashiro] add chapter- and manga-extractor ( closes #66 )
...
Also adds support for region subtags in language codes (e.g. en-us)
7 years ago
Mike Fährmann
974e73bdbb
[booru] smaller code adjustments
7 years ago
Mike Fährmann
03b8a548cb
[tumblr] change `reblogs` default value to `true` ( #61 )
7 years ago
Mike Fährmann
d235f68f59
[tumblr] add option to filter reblogged posts ( #61 )
...
Reblogs are ignored by default, but can be included by setting
'extractor.tumblr.reblogs' to 'true'.
7 years ago
Mike Fährmann
a794fffc6d
[batoto] extend chapter-string regex ( closes #60 )
...
Non-numeric chapter indices exist after all ...
7 years ago
Mike Fährmann
1219ebb7f5
[danbooru] use alternate subdomains; support safebooru
7 years ago
Mike Fährmann
9e8a84ab6c
[booru] rewrite using Mixin classes ( #59 )
...
- improved code structure
- improved URL patterns
- better pagination to work around page limits on
- Danbooru
- e621
- 3dbooru
7 years ago
Mike Fährmann
0876541e43
[seiga] update tests
7 years ago
Mike Fährmann
88bb0798fd
delay initialization of PathFormat objects
...
This allows the DeviantArt group-check to be moved inside the
Extractor.items() method which in turn allows for better exception
handling.
As a new general rule:
Never raise exceptions during extractor initialization.
7 years ago
Mike Fährmann
c24e0e70a7
[pixiv] simplify main loop
7 years ago
Mike Fährmann
c1e331edbb
[mangapark] replace manga test
7 years ago
Mike Fährmann
28cd78aae0
[kissmanga] extend chapter-string regex ( closes #58 )
7 years ago
Mike Fährmann
a3e9b51bea
[imgbox] update test results
...
Image URLs of older galleries have been updated to the new format.
https://i.imgbox.com/qHhw7lpG.png
-->
https://images3.imgbox.com/6d/9a/qHhw7lpG_o.png
7 years ago
Mike Fährmann
d0886f411e
[gelbooru] re-enable API use ( closes #56 )
...
Gelbooru's API allows access to all images and is not restricted
to the first 20000.
This also adds an option to select between API use and manual
information extraction in case their API gets disabled again.
7 years ago
Mike Fährmann
8102aae311
[mangahere] support ".cc" TLD and mobile URLs
7 years ago
Mike Fährmann
676602056c
[reddit] unescape output URLs
7 years ago
Mike Fährmann
2eedbaaaf9
[deviantart] use cache to store new refresh_tokens
...
The 'refresh_token' set in a user's config file gets used once to
get a new 'access_token' and 'refresh_token', which is then stored
in gallery-dl's cache and gets used the next time the 'access_token'
needs to be refreshed.
This means deleting the cache file invalidates the refresh_token-
chain and requires the user to re-authenticate.
7 years ago
Mike Fährmann
fc7d165c97
[deviantart] add support for OAuth2 authentication
...
Some user galleries [*] require you to be either logged in or
authenticated via OAuth2 to access their deviations.
[*] e.g. https://polinaegorussia.deviantart.com/gallery/
--------------
known issue:
A deviantart 'refresh_token' can only be used once and gets updated
whenever it is used to request a new 'access_token', so storing its
initial value in a config file and reusing it again and again is not
possible.
7 years ago
Mike Fährmann
91c2aed077
[nhentai] fix JSON extraction
7 years ago
Mike Fährmann
444008a14a
[khinsider] use urljoin() to complete page URLs
7 years ago
Mike Fährmann
263741d243
[luscious] update URL pattern ( closes #55 )
7 years ago
Mike Fährmann
0a9a07a6e1
[slideshare] improve metadata; flake8
...
- added 'views' and 'published' keywords
- fixed longer titles and descriptions
7 years ago
Leonardo Taccari
a8d2dde8b2
[slideshare] Add a new extractor for slideshare.net ( #54 )
7 years ago
Mike Fährmann
19a6ae57b2
[sankaku] add pool extractor
7 years ago
Mike Fährmann
e52f0cc1ed
[sankaku] add post extractor
7 years ago
Mike Fährmann
595593a35e
[sankaku] rewrite
...
- better code structure and extensibility
- better metadata
7 years ago
Mike Fährmann
a3924d2072
[sankaku] fix swf extraction ( closes #52 )
7 years ago
Mike Fährmann
291369eab2
various smaller changes/additions
7 years ago
Mike Fährmann
300346ecdf
[mangazuki] remove extractors
...
This site has been in "rebuild"-mode for a fairly long time and the
current extractor code isn't going to work for the new version either.
7 years ago
Mike Fährmann
d275b1d9a3
[khinsider] fix extraction
...
... again
7 years ago
Mike Fährmann
6b8e3003df
[hentai2read] ensure consistent extraction results
7 years ago
Mike Fährmann
a1980b16f3
[gelbooru] various improvements
...
- better metadata for pools
- map ratings to s/q/e like other boorus do
- skip() support
7 years ago
Mike Fährmann
93482a1f88
implement 'util.advance()'
7 years ago
Mike Fährmann
038e3b3369
[kissmanga] handle "AreYouHuman" redirects ( #51 )
7 years ago
Mike Fährmann
2b9a783fc7
[khinsider] fix extraction
7 years ago
Mike Fährmann
214972bc9a
[gelbooru] use manual extraction
...
... to compensate for their disabled API.
(https://gelbooru.com/index.php?page=forum&s=view&id=3875 )
This also adds an extractor for image-pools.
7 years ago
Mike Fährmann
55c64cad4b
[khinsider] fix filename extension and test-pattern
7 years ago
Mike Fährmann
b14de6ffc2
[tumblr] small improvements
...
- don't transform inline GIF URLs
- set 'type' parameter for API calls if there is only
one post type selected
7 years ago
Mike Fährmann
9296a26eae
[tumblr] add warning messages
7 years ago
Mike Fährmann
65c1c53eb8
[khinsider] fix extraction
7 years ago
Mike Fährmann
12de658937
[tumblr] add options to control extraction behavior ( #48 )
...
- posts : list of post-types to inspect
- inline : scan post bodies for inline images
- external: follow external links
7 years ago
Mike Fährmann
077f8c12be
[tumblr] original video URLs + continuous offset
7 years ago
Mike Fährmann
8eb12ebeae
[tumblr] support more post/media types ( #48 )
...
This adds support for audio and video posts (most videos are shared
from youtube/instagram which isn't supported -> youtube-dl),
as well as link posts and image-search inside of text posts.
Most of this is just WIP and will need some sort of improvement
and options to enable/disable different media types etc.
7 years ago
Mike Fährmann
b8cdd42cab
[senmanga] fix extraction (again)
...
this is basically a re-revert of 2ace5c7
7 years ago
Mike Fährmann
e6814aebe2
add 'extractor.*.user-agent' config option
7 years ago
Mike Fährmann
6913eeaa40
[powermanga] replace manga extractor unit test
...
My Hero Academia is gone
7 years ago
Mike Fährmann
7e0d9257a7
[hbrowse] fix manga extraction
7 years ago
Mike Fährmann
3c576d10c0
[seiga] better metadata + 'skip()' support
7 years ago
Mike Fährmann
f72318e593
[seiga] support more than 200 images
...
Due to API restrictions and/or missing knowledge about and
documentation of API usage, it was only possible to retrieve the
latest 200 images of a niconico seiga user with said API.
The new approach manually visits each HTML page and gets its
information from there.
7 years ago
Mike Fährmann
baf8094868
improve Extractor.request()'s retry behavior
7 years ago
Mike Fährmann
7e7b64162b
[batoto] handle error 10031
7 years ago
Mike Fährmann
92027f67f9
use consistent names for URL constants
...
root := <scheme>://<host>
base_url := <root>/<common path>
7 years ago
Mike Fährmann
69cbc0619f
[mangastream] fix 'next-page' URLs ( fixes #49 )
7 years ago
Mike Fährmann
980fd3616d
[tumblr] use API v2 ( #48 )
7 years ago
Mike Fährmann
d6bed9f36f
[tumblr] prevent premature exit to get all images ( fixes #48 )
7 years ago
Mike Fährmann
305da540c3
[mangahere] fix metadata extraction
7 years ago
Mike Fährmann
2d0cfb33e1
[xvideos] add user profile extractor ( #45 )
7 years ago
Mike Fährmann
a393e6e538
[xvideos] add gallery extractor ( #45 )
7 years ago
Mike Fährmann
3a8a0c1f35
[imgbox] rewrite / fix extraction ( closes #47 )
7 years ago
Mike Fährmann
035ef655f1
[imagefap] update unit tests
...
old gallery/image has been deleted
7 years ago
Mike Fährmann
239d7afea7
[hosturimage] fix extraction of larger images
7 years ago
Mike Fährmann
158e60ee89
[3dbooru] enable download continuation
...
behoimi.org doesn't respect 'Range' headers and doesn't report
'Content-Length' for compressed content encodings.
7 years ago
Mike Fährmann
c4fcdf2691
Revert "[senmanga] fix extraction and download"
...
This reverts commit 2ace5c7b3c
.
7 years ago
Mike Fährmann
81a7788b40
replace space characters in unit test URLs
7 years ago
Mike Fährmann
bf82181359
[jaiminisbox] fix extraction
7 years ago
Mike Fährmann
16783e327f
[common] fix UnboundLocalError in Extractor.request()
7 years ago
Mike Fährmann
2ace5c7b3c
[senmanga] fix extraction and download
7 years ago
Mike Fährmann
4d8387f93b
[pixiv] support mobile URLs ( https://touch.pixiv.net/ )
7 years ago
Mike Fährmann
ab2bf0b0dd
[deviantart] replace collection unittest
7 years ago
Mike Fährmann
289d6b65d2
[danbooru] extend and improve URL regex
...
- add support for danbooru mirrors:
- hijiribe.donmai.us
- sonohara.donmai.us
- todo: actually use these domains instead of redirecting everything
to danbooru itself
- improve handling of query string parameters
7 years ago
Mike Fährmann
5fa42336a2
[sankaku] add warning for unauthenticated users
...
also improve URL pattern and add missing options to default config file
7 years ago
Mike Fährmann
6af921a952
[sankaku] rewrite/improve ( fixes #44 )
...
- add wait-time between HTTP requests similar to exhentai
- add 'wait-min' and 'wait-max' options
- increase retry-count for HTTP requests to 10
- implement user authentication (non-authenticated users can only view
images up to page 25)
- implement 'skip()' functionality (only works up to page 50)
- implement image-retrieval for pages >= 51
- fix issue with multiple tags
7 years ago
Mike Fährmann
9aecc67841
[common] explicitly handle HTTP status code 429
7 years ago
Mike Fährmann
d68a24aa70
[kissmanga] fix extraction
...
site changed '\n' to '\r\n' for newlines
7 years ago
Mike Fährmann
864a63ed33
fix typo
...
[skip ci]
7 years ago
Mike Fährmann
f3fbaa5c3e
[reddit] allow users to override the API User-Agent
...
Only overriding the Client-ID is not enough if you want to follow
Reddit's API access rules [1].
[1] https://github.com/reddit/reddit/wiki/API#rules
7 years ago
Mike Fährmann
31ea6001e8
[dynastyscans] improve metadata and filename formats
7 years ago
Mike Fährmann
2ef3c35c98
smaller textual changes
...
- swapped doc for deviantart.mature and .original
- updated gallery-dl.conf
- "transferred" -> "delegated"
7 years ago
Mike Fährmann
68a0a7579c
fix/improve some regular expressions
7 years ago
Mike Fährmann
393755ee94
[tumblr] update tests
7 years ago
Mike Fährmann
75d3a1f72f
[deviantart] always download original images
...
Deviation-objects returned by the DeviantArt API don't always contain
the URL and metadata of the original image ([1]). Getting this
information requires an additional API call [2], which is indicated by
the 'is_downloadable' and 'download_filesize' metadata within a
deviation-object.
[1] https://myria-moon.deviantart.com/art/Aime-Moi-part-en-vadrouille-261986576
[2] https://www.deviantart.com/developers/http/v1/20160316/deviation_download/bed6982b88949bdb08b52cd6763fcafd
7 years ago
Mike Fährmann
a1c8b21cfd
[senmanga] improve metadata
7 years ago
Mike Fährmann
994b2fc1e7
[deviantart] replace 'author[urlname]' keyword
...
author[urlname] has always only been the lowercase version of
author[username], which can now be directly converted to lowercase
using the 'l' conversion: '{author[username]!l}'
7 years ago
Mike Fährmann
633b376f35
improve/adjust default filename formats for manga sites
7 years ago
Mike Fährmann
41adb99e9c
[pawoo] fix extraction
...
- changed access_token
- use account-search instead of general search
7 years ago
Mike Fährmann
b319f4bab3
smaller code and text changes
7 years ago
Mike Fährmann
ad4580800c
[pixiv] add support for more URL patterns
...
- https://www.pixiv.net/mypage.php#id=USERID
- https://www.pixiv.net/#id=USERID
7 years ago
Mike Fährmann
82ea6c0cd3
adjust format strings with optional titles
...
... except for anything manga/comic related
7 years ago
Mike Fährmann
85a2b2ae59
[khinsider] fix extraction
7 years ago
Mike Fährmann
26a866e7d8
implement (sub)category-transfer between extractors ( #41 )
...
ImageFap- and all Manga-Extractors will transfer their (sub)category
values to other extractors instantiated by them, which will in turn
allow those to use options set for their parents.
Example:
ImagefapGalleryExtractors will use options set under
extractor.imagefap.user, if (and only if) they have been instantiated by
a ImagefapUserExtractor; and options from extractor.imagefap.gallery
otherwise.
7 years ago
Mike Fährmann
1ab4c7986f
[mangahere] fix extraction
...
would switch to HTTPS, but there seem to be certificate issues
7 years ago
Mike Fährmann
8e14714c2b
[imgspice] fix extraction
7 years ago
Mike Fährmann
9c138dfc1f
[common] detect empty HTTP response bodies
7 years ago
Mike Fährmann
c51616f8d8
[foolslide] fix minor chapter number
7 years ago
H R X N
77bf923c56
Update imgur.py to include 'title' of single image ( #40 )
...
Add {title} keyword..
Images on Imgur don't necessarily have a title, but I think most of them do, and since this should not break anything else..
7 years ago
Mike Fährmann
a85f06d2d1
[foolslide] restructure; convert suitable values to int
7 years ago
Mike Fährmann
deb2e803ba
simplify MangaExtractor class
7 years ago
Mike Fährmann
9fc1d0c901
implement and use 'util.safe_int()'
...
same as Python's 'int()', except it doesn't raise any exceptions and
accepts a default value
7 years ago
Mike Fährmann
8963da8fd8
[spectrumnexus] extract manga metadata
7 years ago
Mike Fährmann
a3e40734d1
[mangareader] extract manga metadata
7 years ago
Mike Fährmann
9196005a4d
[mangazuki] extract manga metadata
7 years ago
Mike Fährmann
543ba245eb
[deviantart] update test results
...
thumbnail URLs changed from //tXX.… to //t00.…
7 years ago
Mike Fährmann
b7a54a51d0
[mangapark] extract manga metadata + code improvements
7 years ago
Mike Fährmann
d39b8779af
[mangahere] extract manga metadata
7 years ago
Mike Fährmann
c265cc074a
[hbrowse] fix syntax for Python3.3 and 3.4
7 years ago
Mike Fährmann
a9e7145651
[hbrowse] extract hmanga metadata & general maintenance
7 years ago
Mike Fährmann
92c8a6cb01
[hentai2read] extract hmanga metadata
7 years ago
Mike Fährmann
de174b40d6
[hentaihere] extract hmanga metadata
7 years ago
Mike Fährmann
04cc1ffe34
[kissmanga] extract manga metadata
7 years ago
Mike Fährmann
885bd4cbe2
[readcomiconline] extract comic metadata
7 years ago
Mike Fährmann
cebf800a7f
[foolfuuka] add support for more sites ( #18 )
...
- https://arch.b4k.co
- https://archive.whatisthisimnotgoodwithcomputers.com
- https://archive.yeet.net
Notes:
- The name "whatisthisimnotgoodwithcomputers" is way too long ...
- archive.yeet.net is out of date and also blocked by 4chan servers
- newest threads are 2 weeks old
- using "https://archive.yeet.net " as Referer header results in
"403 Forbidden" when accessing 4chan
7 years ago
Mike Fährmann
84d4450410
[fallenangels] extract manga metadata
7 years ago
Mike Fährmann
f32b1a0292
[imgyt] fix extraction
7 years ago
Mike Fährmann
4ad903b797
[warosu] fix extraction
7 years ago
Mike Fährmann
b84f48dfa5
[batoto] extract manga metadata
7 years ago
Mike Fährmann
4ceb176c6b
[foolslide] extract manga metadata
...
enables chapter filtering for
- https://kobato.hologfx.com/
- https://jaiminisbox.com/
- https://reader.kireicake.com/
- https://powermanga.org/
- https://reader.seaotterscans.com/
- http://sensescans.com/
- http://www.slide.world-three.org/
7 years ago
Mike Fährmann
24e5f154a4
[deviantart] update test results
...
API responses now contain proper https:// URLs and their image download
server is now "orig00.deviantart.net" for all images.
7 years ago
Mike Fährmann
0dedbe759c
enable '--chapter-filter'
...
The same filter infrastructure that can be applied to image URLS now
also works for manga chapters and other delegated URLs.
TODO: actually provide any metadata (currently supported is only
deviantart and imagefap).
7 years ago
Mike Fährmann
31cd5b1c1d
[luscious] detect high-load responses
7 years ago
Mike Fährmann
470bbe9d8c
fix smaller stuff
...
- change filename option in example config file
- adapt default filename format for mangafox
- remove unnecessary newline
[skip ci]
7 years ago
Mike Fährmann
6f30cf4c64
change keyword names to valid Python identifiers
...
This commit mostly replaces all minus-signs ('-') in keyword names with
underscores ('_') to allow them to be used in filter-expressions. For
example 'gallery-id' got renamed to 'gallery_id'.
(It is theoretically possible to access any variable, regardless of its
name, with 'locals()["NAME"]', but that seems a bit too convoluted if
just 'NAME' could be enough)
7 years ago
Mike Fährmann
54c0715135
allow users to set their own API access_tokens/client_ids
7 years ago
Mike Fährmann
49c7e70c10
[acidimg] add image extractor
7 years ago
Mike Fährmann
9b21d3f13c
add '--filter' command-line option
...
This allows for image filtering via Python expressions by the same
metadata that is also used to build filenames (--list-keywords).
The usually shunned eval() function is used to evaluate
filter-expressions, but it seemed quite appropriate in this case and
shouldn't introduce any new security issues, as any attacker that could do
> gallery-dl --filter "delete-everything()" ...
could as well do
> python -c "delete-everything()"
7 years ago
Mike Fährmann
00420ff202
[booru] consistent order for "popular" results
7 years ago
Mike Fährmann
83cf1e1d6d
[sankaku] unescape image URLs
7 years ago
Mike Fährmann
f98e3e8002
[luscious] fix tag extraction
7 years ago
Mike Fährmann
65997d835b
replace popular/ranking tests with older ones
...
Metadata of several year old lists shouldn't change as much as it
would for newer ones, which makes metadata-comparisons of the output
of build_testresult_db.oy easier.
7 years ago
Mike Fährmann
be30fb2f98
add common config category for boorus and foolslide
7 years ago
Mike Fährmann
c0755a4d5e
[exhentai] revert login-method to its old version ( #37 )
...
Additional cookies don't seem to help and have to be manually set
anyway. The older method is more likely to succeed, so I'd rather
use this one.
7 years ago
Mike Fährmann
3ee39ffd93
[exhentai] update login procedure ( #37 )
...
This new version behaves pretty much exactly like a browser would and
caches all cookies sent to it and not just "ipb_member_id" and
"ipb_pass_hash".
7 years ago
Mike Fährmann
88a386977e
[booru] add "popular" extractors for more sites
...
- konachan.com
- behoimi.org
- e621.net
7 years ago
Mike Fährmann
07214f4007
[booru] place subcategories into base classes
7 years ago
Mike Fährmann
60a888a1e4
[foolfuuka] add common config category
...
All FoolFuuka based 4chan-archive extractors can now be configured using
their own config keys (extractor.<category>) as well as a common shared
one (extractor.foolfuuka).
7 years ago
Mike Fährmann
47bcf53ec1
implement support for additional unit test result types
...
- "pattern" matches all resulting URLs against the given regex
- "count" allows to specify the amount of returned URLs
7 years ago
Mike Fährmann
2d0dfe9d56
[exhenai] init headers before login and detect sadpanda
...
- also debug-logs html after failed login
- #37
7 years ago
Mike Fährmann
c7ec103e15
[batoto] fix extraction of chapter URLs
7 years ago
Mike Fährmann
18e6ed1c7e
[booru] add extractors for "Popular" images
7 years ago
Mike Fährmann
f7cdfd4c25
add a simplified version of 'parse_qs'
...
This version only returns a dict of plain string to string key-value
pairs and ignores multiple values for the same query variable.
7 years ago
Mike Fährmann
3b21e0703c
[deviantart] allow distinction between users and groups ( #26 )
...
This is done by prepending "group-" to an extractor's subcategory
if the URL belongs to a group ("folder" becomes "group-folder" and
so on). This changes the configuration-path being used and is also
reflected in the output of '--list-keywords'.
7 years ago
Mike Fährmann
e61a3a56d1
[hentai2read] fix and update keywords
...
Added the "author" keyword and changed the name of a few others to be
consistent with other manga/chapter extractors.
7 years ago
Mike Fährmann
c45770331a
use 'str.partition()'
...
The (r)partition method is always faster then split() or any other
method that has been replaced in this commit.
7 years ago
Mike Fährmann
017a72f448
[pixiv] improve input validation
7 years ago
Mike Fährmann
dcf42c5e89
[pixiv] add extractor for ranking lists
7 years ago
Mike Fährmann
4ea82ea556
[warosu] add thread extractor
7 years ago
Mike Fährmann
9aa95fba8c
[deviantart] adapt download URLs to use https
...
Even though DeviantArt is "completely switching over to HTTPS"[1],
every URL contained in an API response is still using HTTP
[1] https://danlev.deviantart.com/journal/DeviantArt-Is-Switching-To-HTTPS-697996906
7 years ago
Mike Fährmann
02e89700fc
[foolfuuka] ensure sorted posts
7 years ago
Mike Fährmann
8bcf88bff7
[flickr] fix extraction
...
This issue was only noticeable with older Python versions, as these
don't exhibit a consistent ordering of dict keys.
7 years ago
Mike Fährmann
004456d5d5
properly update the config-dictionary
...
When using 2 or more config files, the values of the second would
improperly overwrite nested dictionaries of the first one.
The new method properly combines these nested dictionaries as well.
7 years ago
Mike Fährmann
cfa479fab5
update error message for unspecified exceptions
...
- ask user to report unexpected errors, which usually indicate
extractor failure
- handle OSErrors separately (permissions, disk full, etc)
- revert 30eef52
7 years ago
Mike Fährmann
7e936e9c06
[luscious] simplify and remove dead code
7 years ago
Mike Fährmann
0245a0ba5f
fix extraction and update test results
...
- fixes for hbrowse, imgyt, imgcandy, hosturimage
- test updates for deviantart, gfycat
7 years ago
Mike Fährmann
abd7c559cd
[yonkouprod] remove module
...
Every manga chapter on this site has been removed.
7 years ago
Mike Fährmann
da7219ba74
[kisscomic] remove module
...
Image links on this site are dead.
7 years ago
Mike Fährmann
852e7acd31
[twitter] ignore "Promoted Tweets"
7 years ago
Mike Fährmann
915a0137de
improve 'extractor.request'
...
- add 'fatal' argument
- improve internal logic and flow
- raise known exception on error
- update exception hierarchy
7 years ago
rachmadani haryono
dcd573806e
chg: dev: fix error ( #32 )
...
* fix: dev: error
* fix: dev: AttributeError when getting artist
* fix: dev: typo on luscious parser
7 years ago
Mike Fährmann
c4713404c8
[directlink] improve URL pattern
7 years ago
Mike Fährmann
d443822fdb
[luacious] get correct image URLs ( fixes #33 )
...
Instead of using thumbnail URLs and modifying them the extractor now
goes through every single image-page and gets its download URL from
there.
7 years ago
Mike Fährmann
6950708e52
[hentaicdn] use HTTPS
7 years ago
Mike Fährmann
4f1e6c109f
[deviantart] remove 'invalid escape sequence' warning
...
- use r"\w" or "\\w" instead of "\w"
7 years ago
Mike Fährmann
c864be479e
[directlink] update URL pattern & PEP 8
...
- combine some file extensions
- don't match '.je'
- line length < 80
7 years ago
H R X N
45f9d64c23
Update directlink.py with additional file exts. ( #30 )
...
Add WebP, still not that common, but it's increasing.
Add 3rd JPEG variant (https://en.wikipedia.org/wiki/JPEG#JPEG_filename_extensions )
Never seen JFIF in the wild, would probably be overkill.
Extend Ogg formats (https://en.wikipedia.org/wiki/Ogg ; https://wiki.xiph.org/MIME_Types_and_File_Extensions )
7 years ago
Mike Fährmann
4357966a70
[kissmanga] make URL pattern case-insensitive (fixes 28)
7 years ago
Mike Fährmann
7aa9fa796a
code cleanup and fixes
7 years ago
Mike Fährmann
f08af03845
Merge branch 'cookies'
7 years ago
Mike Fährmann
55f048d02b
ignore case of cookiejar magic strings
7 years ago
Mike Fährmann
f53bf1a323
[thebarchive] add thread extractor
7 years ago
Mike Fährmann
b8cf434bb0
[rebeccablacktech] add thread extractor
7 years ago
Mike Fährmann
808f67ba7d
use 'cookiedomain' for cookies set by object-config-values
...
otherwise these cookies would not be picked up by the
_check_cookies() method.
7 years ago
Mike Fährmann
390eeded4c
[mangazuki] support 'raws.…' subdomain
7 years ago
Mike Fährmann
4a60f6068a
[mangazuki] add manga extractor
7 years ago
Mike Fährmann
394241cd6f
[2chan] fix extraction
7 years ago
Mike Fährmann
a13eb6010f
[fallenangels] fix extraction of chapter URLs
7 years ago
Mike Fährmann
1cb1d2e0a3
[mangazuki] add chapter extractor
7 years ago
Mike Fährmann
2f2e363c97
[imgur] use /a/<key>/all as album-url
7 years ago
Mike Fährmann
1cec03c9c6
[imgur] fix extraction of large albums
7 years ago
Mike Fährmann
0610ae5000
skip login if cookies are present
7 years ago
Mike Fährmann
f105782435
[fireden] add thread extractor
7 years ago
Mike Fährmann
c93f7d7496
[archiveofsins] add thread extractor
7 years ago
Mike Fährmann
96e13604da
[archivedmoe] add thread extractor
7 years ago
Mike Fährmann
30d3a5f9b2
support redirects on 4chan archives
7 years ago
Mike Fährmann
98464d1f1b
[loveisover] add thread extractor
7 years ago
Mike Fährmann
47692f28da
[2chan] add thread extractor
7 years ago
Mike Fährmann
3460dc8950
update gallery-dl.conf
7 years ago
Mike Fährmann
9be8f7e106
[deviantart] add "extractor.deviantart.flat" option
...
Setting this to 'false' downloads images into individual subdirectories
for each gallery-folder or favourite-collection, otherwise it is just
creating a flat list of images.
7 years ago
Mike Fährmann
d075627fd9
[deviantart] support group galleries ( #26 )
...
For groups the 'GalleryExtractor' collects all gallery-folder URLs
and defers its work to the 'FolderExtractor'.
7 years ago
Mike Fährmann
b37a62501b
[pixiv] unquote tags
7 years ago
Mike Fährmann
fbd7dcdfdb
[desuarchive] add thread extractor
7 years ago
Mike Fährmann
af9bd17b19
[deviantart] adjust default paths
...
- user.deviantart.com/(gallery|favourites|journal)/ images go into
* <user>/
* <user>/Favourites/
* <user>/Journal/
(having an extra "Gallery" folder for a user's gallery-images seems
a bit too much if these are all you want to download, which is
probably the default use-case)
- single "deviations" (user.deviantart.com/(art|journal)/name-123) go
into their owner's directory:
* <user>/
(putting them into their own directory seems weird in practice)
7 years ago
Mike Fährmann
eb64fb267c
[nyafuu] add thread extractor ( #18 )
7 years ago
Mike Fährmann
726c6f01ae
allow 'cookies' config option to be a dictionary
7 years ago
Mike Fährmann
4877ef6314
[deviantart] support '?catpath=/' URLs ( #26 )
...
They previously weren't supported for galleries and journals.
This also increases the 'limit' parameter for API calls to its
respective maximum.
7 years ago
Mike Fährmann
8c16cbe7ea
fix tests
7 years ago
Mike Fährmann
a6f689e01a
[deviantart] add gallery-folder extractor ( #26 )
...
The code for this and the available metadata is probably going
to change again. This extractor is very similar to the favorite-
extractor, so they might be "combined" or something like that.
7 years ago
Mike Fährmann
474e9c1aec
[4plebs] add thread extractor ( #18 )
7 years ago
Mike Fährmann
a804a42e23
add '--cookies' command-line option
7 years ago
Mike Fährmann
dcc1d3b2ea
[hentaifoundry] fix infinite loop for multiple of 25 images
7 years ago
Mike Fährmann
34e6e1099e
[batoto] adapt to https chapter URLs
7 years ago
Mike Fährmann
85696d0b3b
[reddit] fix issue with datetime errors
7 years ago
Mike Fährmann
80c2e03aaa
[reddit] allow 'date-min/max' to be human readable dates
...
If the date-min/max config value is a string, try parsing it using
datetime.strptime [1] with 'date-format' as format string [2]
(default: "%Y-%m-%dT%H:%M:%S")
Example: get all submissions posted in 2016
$ gallery-dl reddit.com/r/... \
-o date-format=%Y \
-o date-min=\"2016\" \
-o date-max=\"2017\"
[1] https://docs.python.org/3/library/datetime.html#datetime.datetime.strptime
[2] https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior
7 years ago
Mike Fährmann
58e95a7487
share extractor and downloader sessions
...
There was never any "good" reason for the strict separation
between extractors and downloaders. This change allows for
reduced resource usage (probably unnoticeable) and less lines
of code at the "cost" of tighter coupling.
7 years ago
Mike Fährmann
4414aefe97
small fix for aes_cbc_decrypt_text
7 years ago
Mike Fährmann
21064146c1
fix test
7 years ago
Mike Fährmann
f3d0373120
[reddit] add ability to filter by submission id
...
'extractor.reddit.id-min' and '….id-max' specify the lowest and
highest submission-/post-id to consider, similar to 'date-min' and
'date-max'
7 years ago
Mike Fährmann
1dac76fd1c
update extractor docstrings
7 years ago
Mike Fährmann
92a11528d1
smaller changes
7 years ago
Mike Fährmann
44d98e562b
[pixiv] support pixiv.me URLs ( #23 )
7 years ago
Mike Fährmann
b373fe0eea
[pixiv] support shortened URLs and other variants ( #23 )
7 years ago
Mike Fährmann
c951d6276c
[imagetwist] use https
7 years ago
Mike Fährmann
d3b04076f7
add .netrc support ( #22 )
...
Use the '--netrc' cmdline option or set the 'netrc' config option
to 'true' to enable the use of .netrc authentication data.
The 'machine' names for the .netrc info are the lowercase extractor
names (or categories): batoto, exhentai, nijie, pixiv, seiga.
7 years ago
Mike Fährmann
e1d82af5e0
small fixes
7 years ago
Mike Fährmann
719d45f89e
[flickr] allow the use of Flickr's specifiers for format selection
...
- renamed the 'width-max' option to 'size-max'
- filter by both width and height
7 years ago
Mike Fährmann
b4c438c9ad
[oauth] add the 'extractor.oauth.browser' option
...
enables/disables the use of webbrowser.open() during OAuth authorization
7 years ago
Mike Fährmann
2633337833
[kissmanga] update regex ( fixes #20 )
7 years ago
Mike Fährmann
e68af4febe
[flickr] add 'width-max' option ( #16 )
...
This option allows for simple format selection by
specifying a maximum image width.
7 years ago
Mike Fährmann
2993206c4b
smaller fixes and "security" measures
...
- move the OAuthSession class into util.py
- block special extractors for reddit and recursive
- ignore 'only matching' tests for testresults script
7 years ago
Mike Fährmann
8d5e92f641
resolve cyclic dependency between oauth and flickr
7 years ago
Mike Fährmann
d60781de7b
[oauth] workaround for ctrl+c on Windows
7 years ago
Mike Fährmann
9759fe8c6b
allow 'only_matching' tests
7 years ago
Mike Fährmann
56bec79e6a
[reddit] add ability to load more comments ( #15 )
...
The 'extractor.reddit.morecomments' option enables the use of
the '/api/morechildren' API endpoint (1) to load even more
comments than the usual submission-request provides.
Possible values are the booleans 'true' and 'false' (default).
Note: this feature comes at the cost of 1 extra API call towards
the rate limit for every 100 extra comments.
(1) https://www.reddit.com/dev/api/#GET_api_morechildren
7 years ago
Mike Fährmann
05ed95e5b0
[flickr] add search extractor
7 years ago
Mike Fährmann
5f55c854b9
[flickr] replace getPublic... API call with regular ones
7 years ago
Mike Fährmann
9a620784f9
[flickr] add support for user authentication ( #16 )
...
Call '$ gallery-dl oauth:flickr' to get an access_token
and access_token_secret for your account.
7 years ago