gallery-dl

Author	SHA1	Message	Date
Mike Fährmann	8c3b713362	rework DownloadJob.handle_url(); include archive functionality todo: "abort" and "exit" skip modes if download is skipped because of archive	7 years ago
Mike Fährmann	34873dbd90	set 'archive_fmt' values These are going to be used to create an unique id for each image.	7 years ago
Mike Fährmann	a34cebc253	[luscious] jump to first image if cover does not link to it	7 years ago
Mike Fährmann	84a52a9256	add DownloadArchive class	7 years ago
Mike Fährmann	915807dd77	log HTTP errors as warnings	7 years ago
Mike Fährmann	db7f04dd97	emit log messages on download failure and when retrying with fallback URLs	7 years ago
Mike Fährmann	d951f13e37	add config option for unsupported-URL file for consistency's sake	7 years ago
Mike Fährmann	619387cbb1	update extractor unittest results	7 years ago
Mike Fährmann	364e335440	smaller adjustments and improvements - requests and urllib3 version on 1 line - close input file after reading from it - use expand_path for unsupported-urls file - remove unnecessary logging from options.py	7 years ago
Mike Fährmann	c9a9664a65	change --write-log behaviour - log files now get truncated when opening them (mode "w" instead of "a") - log verbosity to file depends on -q/-v (same as logging to stderr)	7 years ago
Mike Fährmann	97f4f15ec0	add option to write logging output to a file - '--write-log FILE' as cmdline argument - 'output.logfile' as config file option	7 years ago
Mike Fährmann	f94e3706a8	use logging module for error messages during downloads	7 years ago
Mike Fährmann	db91cf871c	document message identifiers	7 years ago
Mike Fährmann	0dd48d644f	update test results nothing broke, but things got updated or changed	7 years ago
Mike Fährmann	1e93955170	[batoto] remove module Site officially shut down on 2018.01.18	7 years ago
Mike Fährmann	27fce6f600	fix UrlJob behavior	7 years ago
Mike Fährmann	76509a6d3c	[imgur] update test results	7 years ago
Mike Fährmann	9fccd7b783	[tumblr] provide fallback URLs (#64) Each image now produces 3 URLs: - amazonaws.com _raw (or _1280 for older images) - amazonaws.com _500 - media.tumblr.com (URL returned by API)	7 years ago
Mike Fährmann	b837420291	fix minor urllist issues	7 years ago
Mike Fährmann	9d69401391	initial support for multiple URLs per image	7 years ago
Mike Fährmann	6174a5c4ef	[download] adjust filename extension on filetype mismatch (closes #63)	7 years ago
Mike Fährmann	91ed147cef	[oauth] use custom key/secret values during oauth:…	7 years ago
Mike Fährmann	421a9740a3	[tumblr] add 'tumblr:' to force Tumblr extractor (#71)	7 years ago
Mike Fährmann	40d35c87bc	[paheal] add tag- and post-extractors (closes #69)	7 years ago
Mike Fährmann	cc0c2cca57	[reddit] add extractor for reddit-hosted images (closes #68)	7 years ago
Mike Fährmann	f10ffc0839	update extractor blacklist to also allow classes	7 years ago
Mike Fährmann	b6797032e3	release version 1.1.2	7 years ago
Mike Fährmann	35e09869d1	[mangapark] fix image URLs and use HTTPS	7 years ago
Mike Fährmann	9a049bdf51	[tumblr] add 'likes' extractor (#65)	7 years ago
Mike Fährmann	67d4462d26	[batoto] rudimentary Cloudflare bypass	7 years ago
Mike Fährmann	29d75fc3fa	[tumblr] add support for OAuth authentication (#65)	7 years ago
Mike Fährmann	4edb25346e	[slideshare] support mobile URLs (closes #67)	7 years ago
Mike Fährmann	e420a28bbc	fix cookie tests	7 years ago
Mike Fährmann	b33efc99a4	[idolcomplex] add support for idol.sankakucomplex.com	7 years ago
Mike Fährmann	75b2e84b6d	[tumblr] use s3.amazonaws.com for image URLs (#64)	7 years ago
Mike Fährmann	5b094328b5	[puremashiro] add chapter- and manga-extractor (closes #66) Also adds support for region subtags in language codes (e.g. en-us)	7 years ago
Mike Fährmann	974e73bdbb	[booru] smaller code adjustments	7 years ago
Mike Fährmann	03b8a548cb	[tumblr] change `reblogs` default value to `true` (#61)	7 years ago
Mike Fährmann	d235f68f59	[tumblr] add option to filter reblogged posts (#61) Reblogs are ignored by default, but can be included by setting 'extractor.tumblr.reblogs' to 'true'.	7 years ago
Mike Fährmann	a794fffc6d	[batoto] extend chapter-string regex (closes #60) Non-numeric chapter indices exist after all ...	7 years ago
Mike Fährmann	1219ebb7f5	[danbooru] use alternate subdomains; support safebooru	7 years ago
Mike Fährmann	9e8a84ab6c	[booru] rewrite using Mixin classes (#59) - improved code structure - improved URL patterns - better pagination to work around page limits on - Danbooru - e621 - 3dbooru	7 years ago
Mike Fährmann	0876541e43	[seiga] update tests	7 years ago
Mike Fährmann	1a70857a12	update extractor-unittest capabilities - "count" can now be a string defining a comparison in the form of '<operator> <value>', for example: '> 12' or '!= 1'. If its value is not a string, it is assumed to be a concrete integer as before. - "keyword" can now be a dictionary defining tests for individual keys. These tests can either be a type, a concrete value or a regex starting with "re:". Dictionaries can be stacked inside each other. Optional keys can be indicated with a "?" before its name. For example: "keyword:" { "image_id": int, "gallery_id", 123, "name": "re:pattern", "user": { "id": 321, }, "?optional": None, }	7 years ago
Mike Fährmann	88bb0798fd	delay initialization of PathFormat objects This allows the DeviantArt group-check to be moved inside the Extractor.items() method which in turn allows for better exception handling. As a new general rule: Never raise exceptions during extractor initialization.	7 years ago
Mike Fährmann	c24e0e70a7	[pixiv] simplify main loop	7 years ago
Mike Fährmann	c1e331edbb	[mangapark] replace manga test	7 years ago
Mike Fährmann	5488643fac	add requests and urllib3 versions to debug output	7 years ago
Mike Fährmann	9d73ed4772	fix issue with using 'skip()' when a filter is present calling skip() skips over unfiltered items and does not apply the filter expression to them, which is not what should happen	7 years ago
Mike Fährmann	28cd78aae0	[kissmanga] extend chapter-string regex (closes #58)	7 years ago
Mike Fährmann	0ba618dd1a	release version 1.1.1	7 years ago
Mike Fährmann	a3e9b51bea	[imgbox] update test results Image URLs of older galleries have been updated to the new format. https://i.imgbox.com/qHhw7lpG.png --> https://images3.imgbox.com/6d/9a/qHhw7lpG_o.png	7 years ago
Mike Fährmann	d241a0fb60	[util] replace '/' with '\' in base-directory paths ... on Windows to have consistent path separators.	7 years ago
Mike Fährmann	d0886f411e	[gelbooru] re-enable API use (closes #56) Gelbooru's API allows access to all images and is not restricted to the first 20000. This also adds an option to select between API use and manual information extraction in case their API gets disabled again.	7 years ago
Mike Fährmann	8102aae311	[mangahere] support ".cc" TLD and mobile URLs	7 years ago
Mike Fährmann	676602056c	[reddit] unescape output URLs	7 years ago
Mike Fährmann	2eedbaaaf9	[deviantart] use cache to store new refresh_tokens The 'refresh_token' set in a user's config file gets used once to get a new 'access_token' and 'refresh_token', which is then stored in gallery-dl's cache and gets used the next time the 'access_token' needs to be refreshed. This means deleting the cache file invalidates the refresh_token-chain and requires the user to re-authenticate.	7 years ago
Mike Fährmann	fc7d165c97	[deviantart] add support for OAuth2 authentication Some user galleries [] require you to be either logged in or authenticated via OAuth2 to access their deviations. [] e.g. https://polinaegorussia.deviantart.com/gallery/ -------------- known issue: A deviantart 'refresh_token' can only be used once and gets updated whenever it is used to request a new 'access_token', so storing its initial value in a config file and reusing it again and again is not possible.	7 years ago
Mike Fährmann	91c2aed077	[nhentai] fix JSON extraction	7 years ago
Mike Fährmann	444008a14a	[khinsider] use urljoin() to complete page URLs	7 years ago
Mike Fährmann	263741d243	[luscious] update URL pattern (closes #55)	7 years ago
Mike Fährmann	0a9a07a6e1	[slideshare] improve metadata; flake8 - added 'views' and 'published' keywords - fixed longer titles and descriptions	7 years ago
Leonardo Taccari	a8d2dde8b2	[slideshare] Add a new extractor for slideshare.net (#54)	7 years ago
Mike Fährmann	19a6ae57b2	[sankaku] add pool extractor	7 years ago
Mike Fährmann	e52f0cc1ed	[sankaku] add post extractor	7 years ago
Mike Fährmann	595593a35e	[sankaku] rewrite - better code structure and extensibility - better metadata	7 years ago
Mike Fährmann	e96e1fea5d	release version 1.1.0	7 years ago
Mike Fährmann	a3924d2072	[sankaku] fix swf extraction (closes #52)	7 years ago
Mike Fährmann	ebe9b0a04c	another attempt at downloader retry behavior This commit changes the general behavior from 'Retry on every exception and abort on DownloadError' to 'Only retry on DownloadRetry exceptions and abort on every other one' The previous version would have retried on several states which would have no chance of ever succeeding (invalid URLs, etc.)	7 years ago
Mike Fährmann	291369eab2	various smaller changes/additions	7 years ago
Mike Fährmann	4fb6803fa6	add option to sleep before each download	7 years ago
Mike Fährmann	300346ecdf	[mangazuki] remove extractors This site has been in "rebuild"-mode for a fairly long time and the current extractor code isn't going to work for the new version either.	7 years ago
Mike Fährmann	d275b1d9a3	[khinsider] fix extraction ... again	7 years ago
Mike Fährmann	6b8e3003df	[hentai2read] ensure consistent extraction results	7 years ago
Mike Fährmann	a1980b16f3	[gelbooru] various improvements - better metadata for pools - map ratings to s/q/e like other boorus do - skip() support	7 years ago
Mike Fährmann	93482a1f88	implement 'util.advance()'	7 years ago
Mike Fährmann	0e5057b15d	remove deprecated options	7 years ago
Mike Fährmann	8f518e03f8	add options to set maximum download rate - -r/--limit-rate as cmdline option - downloader.http.rate as config option This implementation very roughly uses the idea of the token bucket algorithm [1] and mostly uses Wget's approach [2] as inspiration. [1] https://en.wikipedia.org/wiki/Token_bucket [2] http://git.savannah.gnu.org/cgit/wget.git/tree/src/retr.c?h=v1.19.2&id=ba6b44f6745b14dce414761a8e4b35d31b176bba#n111	7 years ago
Mike Fährmann	a718c6c6cd	implement 'util.parse_bytes()'	7 years ago
Mike Fährmann	038e3b3369	[kissmanga] handle "AreYouHuman" redirects (#51)	7 years ago
Mike Fährmann	2b9a783fc7	[khinsider] fix extraction	7 years ago
Mike Fährmann	3dc1169736	use own mapping before relying on the 'mimetypes' module	7 years ago
Mike Fährmann	214972bc9a	[gelbooru] use manual extraction ... to compensate for their disabled API. (https://gelbooru.com/index.php?page=forum&s=view&id=3875) This also adds an extractor for image-pools.	7 years ago
Mike Fährmann	55c64cad4b	[khinsider] fix filename extension and test-pattern	7 years ago
Mike Fährmann	c0bcf8e343	release version 1.0.2	7 years ago
Mike Fährmann	b14de6ffc2	[tumblr] small improvements - don't transform inline GIF URLs - set 'type' parameter for API calls if there is only one post type selected	7 years ago
Mike Fährmann	9296a26eae	[tumblr] add warning messages	7 years ago
Mike Fährmann	65c1c53eb8	[khinsider] fix extraction	7 years ago
Mike Fährmann	12de658937	[tumblr] add options to control extraction behavior (#48) - posts : list of post-types to inspect - inline : scan post bodies for inline images - external: follow external links	7 years ago
Mike Fährmann	077f8c12be	[tumblr] original video URLs + continuous offset	7 years ago
Mike Fährmann	8eb12ebeae	[tumblr] support more post/media types (#48) This adds support for audio and video posts (most videos are shared from youtube/instagram which isn't supported -> youtube-dl), as well as link posts and image-search inside of text posts. Most of this is just WIP and will need some sort of improvement and options to enable/disable different media types etc.	7 years ago
Mike Fährmann	6c9da67581	apply selection options (filter, range) when using '-j'	7 years ago
Mike Fährmann	b8cdd42cab	[senmanga] fix extraction (again) this is basically a re-revert of `2ace5c7`	7 years ago
Mike Fährmann	e6814aebe2	add 'extractor.*.user-agent' config option	7 years ago
Mike Fährmann	6913eeaa40	[powermanga] replace manga extractor unit test My Hero Academia is gone	7 years ago
Mike Fährmann	7e0d9257a7	[hbrowse] fix manga extraction	7 years ago
Mike Fährmann	3c576d10c0	[seiga] better metadata + 'skip()' support	7 years ago
Mike Fährmann	f72318e593	[seiga] support more than 200 images Due to API restrictions and/or missing knowledge about and documentation of API usage, it was only possible to retrieve the latest 200 images of a niconico seiga user with said API. The new approach manually visits each HTML page and gets its information from there.	7 years ago
Mike Fährmann	baf8094868	improve Extractor.request()'s retry behavior	7 years ago
Mike Fährmann	7e7b64162b	[batoto] handle error 10031	7 years ago
Mike Fährmann	79bcaa8726	improve downloader retry behavior - only retry download on 5xx and 429 status codes - immediately fail on 4xx status codes	7 years ago
Mike Fährmann	5ee8ca0319	release version 1.0.1	7 years ago
Mike Fährmann	42e948584d	fix downloader error handling RequestException being a subclass of OSError caused all exceptions during file downloads to be ignored/re-raised.	7 years ago
Mike Fährmann	92027f67f9	use consistent names for URL constants root := <scheme>://<host> base_url := <root>/<common path>	7 years ago
Mike Fährmann	69cbc0619f	[mangastream] fix 'next-page' URLs (fixes #49)	7 years ago
Mike Fährmann	980fd3616d	[tumblr] use API v2 (#48)	7 years ago
Mike Fährmann	d6bed9f36f	[tumblr] prevent premature exit to get all images (fixes #48)	7 years ago
Mike Fährmann	305da540c3	[mangahere] fix metadata extraction	7 years ago
Mike Fährmann	2d0cfb33e1	[xvideos] add user profile extractor (#45)	7 years ago
Mike Fährmann	a393e6e538	[xvideos] add gallery extractor (#45)	7 years ago
Mike Fährmann	3a8a0c1f35	[imgbox] rewrite / fix extraction (closes #47)	7 years ago
Mike Fährmann	f97207a8e6	release version 1.0.0	7 years ago
Mike Fährmann	707b15b586	create missing directories for 'part-directory' also some code improvements regarding downloader config values	7 years ago
Mike Fährmann	035ef655f1	[imagefap] update unit tests old gallery/image has been deleted	7 years ago
Mike Fährmann	caf26412dd	add option to set alternate location of .part files (#29) Note: The path set for 'downloader.*.part-directory' needs to point to an already existing directory.	7 years ago
Mike Fährmann	ea8ca4cfa4	add 'util.expand_path()'	7 years ago
Mike Fährmann	9a41002b77	fix partial downloads for 'text:' URLs Using a filesize in bytes as offset into a Python string is not a good idea if said file contains non-ASCII characters.	7 years ago
Mike Fährmann	239d7afea7	[hosturimage] fix extraction of larger images	7 years ago
Mike Fährmann	27c026543f	re-enable download unit tests	7 years ago
Mike Fährmann	963670d73b	add options to control usage of .part files (#29) - '--no-part' command line option to disable them - 'downloader.http.part' and 'downloader.text.part' config options Disabling .part files restores the behaviour of the old downloader implementation.	7 years ago
Mike Fährmann	158e60ee89	[3dbooru] enable download continuation behoimi.org doesn't respect 'Range' headers and doesn't report 'Content-Length' for compressed content encodings.	7 years ago
Mike Fährmann	b0353aa02d	rewrite download modules (#29) - use '.part' files during file-download - implement continuation of incomplete downloads - check if file size matches the one reported by server	7 years ago
Mike Fährmann	c4fcdf2691	Revert "[senmanga] fix extraction and download" This reverts commit `2ace5c7b3c`.	7 years ago
Mike Fährmann	81a7788b40	replace space characters in unit test URLs	7 years ago
Mike Fährmann	bf82181359	[jaiminisbox] fix extraction	7 years ago
Mike Fährmann	2e982f56af	use 'Content-Length' to determine incomplete downloads (#29)	7 years ago
Mike Fährmann	16783e327f	[common] fix UnboundLocalError in Extractor.request()	7 years ago
Mike Fährmann	2ace5c7b3c	[senmanga] fix extraction and download	7 years ago
Mike Fährmann	4d8387f93b	[pixiv] support mobile URLs (https://touch.pixiv.net/)	7 years ago
Mike Fährmann	ab2bf0b0dd	[deviantart] replace collection unittest	7 years ago
Mike Fährmann	289d6b65d2	[danbooru] extend and improve URL regex - add support for danbooru mirrors: - hijiribe.donmai.us - sonohara.donmai.us - todo: actually use these domains instead of redirecting everything to danbooru itself - improve handling of query string parameters	7 years ago
Mike Fährmann	5fa42336a2	[sankaku] add warning for unauthenticated users also improve URL pattern and add missing options to default config file	7 years ago
Mike Fährmann	6af921a952	[sankaku] rewrite/improve (fixes #44) - add wait-time between HTTP requests similar to exhentai - add 'wait-min' and 'wait-max' options - increase retry-count for HTTP requests to 10 - implement user authentication (non-authenticated users can only view images up to page 25) - implement 'skip()' functionality (only works up to page 50) - implement image-retrieval for pages >= 51 - fix issue with multiple tags	7 years ago
Mike Fährmann	9aecc67841	[common] explicitly handle HTTP status code 429	7 years ago
Mike Fährmann	d68a24aa70	[kissmanga] fix extraction site changed '\n' to '\r\n' for newlines	7 years ago
Mike Fährmann	864a63ed33	fix typo [skip ci]	7 years ago
Mike Fährmann	f3fbaa5c3e	[reddit] allow users to override the API User-Agent Only overriding the Client-ID is not enough if you want to follow Reddit's API access rules [1]. [1] https://github.com/reddit/reddit/wiki/API#rules	7 years ago
Mike Fährmann	31ea6001e8	[dynastyscans] improve metadata and filename formats	7 years ago
Mike Fährmann	2ef3c35c98	smaller textual changes - swapped doc for deviantart.mature and .original - updated gallery-dl.conf - "transferred" -> "delegated"	7 years ago
Mike Fährmann	68a0a7579c	fix/improve some regular expressions	7 years ago
Mike Fährmann	832b8b76ac	[util] extend global namespace for filter expressions	7 years ago
Mike Fährmann	393755ee94	[tumblr] update tests	7 years ago
Mike Fährmann	75d3a1f72f	[deviantart] always download original images Deviation-objects returned by the DeviantArt API don't always contain the URL and metadata of the original image ([1]). Getting this information requires an additional API call [2], which is indicated by the 'is_downloadable' and 'download_filesize' metadata within a deviation-object. [1] https://myria-moon.deviantart.com/art/Aime-Moi-part-en-vadrouille-261986576 [2] https://www.deviantart.com/developers/http/v1/20160316/deviation_download/bed6982b88949bdb08b52cd6763fcafd	7 years ago
Mike Fährmann	8e6a767109	[util] restructure formatter for better exception propagation	7 years ago
Mike Fährmann	0386503c80	fix (sub)category-transfer for DownloadJob instances (#41) ... and extend "parent" parameters to TestJob- and DataJob-classes as well.	7 years ago
Mike Fährmann	a1c8b21cfd	[senmanga] improve metadata	7 years ago
Mike Fährmann	8df023e144	[util:filter] re-enable builtins Trying to restrict access to Python's builtin functions (exec, print, __import__, ...) can easily be circumvented and is therefore completely pointless. This also adds 'safe_int()' and the 'datetime' module to the global namespace used when evaluating filter expressions.	7 years ago
Mike Fährmann	994b2fc1e7	[deviantart] replace 'author[urlname]' keyword author[urlname] has always only been the lowercase version of author[username], which can now be directly converted to lowercase using the 'l' conversion: '{author[username]!l}'	7 years ago
Mike Fährmann	633b376f35	improve/adjust default filename formats for manga sites	7 years ago
Mike Fährmann	41adb99e9c	[pawoo] fix extraction - changed access_token - use account-search instead of general search	7 years ago

1048 Commits (7073ab77074d36f5ec9fb8e19e697ab84581b596)