gallery-dl

Commit Graph

Author	SHA1	Message	Date
Mike Fährmann	27eab4e467	rewrite text tests and improve functions - test more edge cases - consistently return an empty string for invalid arguments - remove the ungreedy-flag in 'remove_html()'	7 years ago
Mike Fährmann	e3f2bd4087	add tests for 'text.clean_xml()' and improve it	7 years ago
Mike Fährmann	6d8b191ea7	improve 'parse_query()' and add tests - another irrelevant micro-optimization ! - use urllib.parse.parse_qsl directly instead of parse_qs, which just packs the results of parse_qsl in a different data structure - reduced memory requirements since no additional dict and lists are created	7 years ago
Mike Fährmann	51ea699083	add 'abort()' as function to filter expressions calling 'abort()' in a filter aborts the current extractor run in a cleaner way than using something like 1/0, which causes an error message to be printed	7 years ago
Mike Fährmann	6bd857a319	[tumblr] handle rate limits / 429 errors - wait for the hourly limit to reset - abort upon exceeding the daily limit (it doesn't seem useful to potentially wait for several hours)	7 years ago
Mike Fährmann	7073ab7707	[komikcast] update regex to only match manga pages The 'readerarea' section now includes some (shady) external Javascript file, which got matched as well.	7 years ago
Mike Fährmann	a1fa4b43b0	Revert "[tumblr] add option to sort photosets by upload order" This reverts commit `4a26ae32df`.	7 years ago
Mike Fährmann	48a83a89e9	[loveisover] remove module archive.loveisover.me was shut down on 2018-03-29; https://www.archiveteam.org/index.php?title=4chan#archive.loveisover.me	7 years ago
Mike Fährmann	564e12ca8f	replace 'imgyt' with 'imxto' https://img.yt/ wasn't available for a couple of days, but has now re-emerged as https://imx.to/ with a new web-interface. Links to older images still work (see tests).	7 years ago
Mike Fährmann	1b80fa82a9	[imgur] update URL pattern and tests	7 years ago
Mike Fährmann	4a26ae32df	[tumblr] add option to sort photosets by upload order	7 years ago
Mike Fährmann	6b72be8ee6	[tumblr] add 'hash' keyword 'hash' is the middle part of the filename in a tumblr image URL. For example an image with '.../tumblr_p6tgemp1NZ1wgha4yo1_250.png' as its URL would have 'p6tgemp1NZ1wgha4yo1' as hash.	7 years ago
Mike Fährmann	ffc0c67701	release version 1.3.3	7 years ago
Mike Fährmann	d11fcf4804	smaller changes and fixes - fix the cloudflare challenge result if the last decimal places are zero (JS's toFixed() removes trailing zeroes) - fix downloading of kissmanga chapter-pages hosted on blogspot (accessing blogspot with "kissmanga.com" as referrer yields a 401) - disable certificate validation for 'mangahere' tests - update flickr test result	7 years ago
Mike Fährmann	f6c95dccf9	[cloudflare] fix bypass procedure Cloudflare challenges, at least for kissmanga and readcomiconline, now use slightly different Javascript expressions. Instead of a single value per expression, they now have a numerator and a denominator of a fractional value, which in the end gets truncated to 10 decimal places.	7 years ago
Mike Fährmann	759ba26fb0	[luscious] proper image order for picture albums ... and (try) to start with the first image instead of somewhere in the middle of an album.	7 years ago
Mike Fährmann	68e9fbee16	[tumblr] check all 4 keys/secrets before using OAuth it was possible to cause a crash by setting api-key or -secret to null. (this commit also slightly improves the blog-cache implementation)	7 years ago
Mike Fährmann	4810d446bb	remove the obsolete safeprint() and error() functions - safeprint() was used to print values which might have caused a UnicodeEncodeError, but that is no longer necessary (`0381ae5`) - errors are now handled via logging output (`f94e370`)	7 years ago
Mike Fährmann	0381ae5318	replace error handlers for stdout and co. Python3.5 and lower throw a UnicodeEncodeError when trying to print not-encodable characters when not using 'utf-8' as encoding. Setting their error handlers to 'replace' should help.	7 years ago
Mike Fährmann	f8168c693e	[tumblr] avoid calls to '/blog/.../info' The same information returned by the 'blog/.../info' API endpoint is also included in the result of every 'blog/.../posts' call.	7 years ago
Mike Fährmann	64d7c85b55	[exhentai] improve metadata - add 'width', 'height' and 'size' (in bytes) for each image - change the former 'size' and 'size_units' into 'gallery_size'	7 years ago
Mike Fährmann	64b22e0fc1	[pawoo] update URL pattern adds support for 'https://pawoo.net/@.../media'	7 years ago
Mike Fährmann	7b562907c3	[nijie] add favorites extractor adds support for 'https://nijie.info/user_like_illust_view.php?id=...'	7 years ago
Mike Fährmann	445db75955	[nijie] improve extraction and metadata - add 'title' and 'description' - split 'artist_id' into 'user_id' and 'artist_id' - 'user_id' is the ID of the user from which the image entry originates - 'artist_id' is the ID of the actual image artist - improve pagination and URL patterns	7 years ago
Mike Fährmann	a112e3f2a0	[nijie] add doujin extractor adds support for "https://nijie.info/members_dojin.php?id=<artist_id>"	7 years ago
Mike Fährmann	f39153b6e9	[nhentai] add extractor for search results	7 years ago
Mike Fährmann	52d41c41e7	[exhentai] add extractor for favorited galleries	7 years ago
Mike Fährmann	63cc2599c4	[exhentai] add extractor for search results	7 years ago
Mike Fährmann	d1c91a1f2b	[mangadex] fix manga-page extraction	7 years ago
Mike Fährmann	299ae24996	[test] add a few downloader tests	7 years ago
Mike Fährmann	dd314279fb	[test] add unit tests for extractor module functions	7 years ago
Mike Fährmann	a993d0ea90	release version 1.3.2	7 years ago
Mike Fährmann	e7525b1b0e	[artstation] add challenge extractor (#80)	7 years ago
Mike Fährmann	3f2dd6b6f8	avoid double path-separators (#74)	7 years ago
Mike Fährmann	f5c6a2d7f5	[nhentai] use API to get gallery info	7 years ago
Mike Fährmann	b2ba2b821d	[hitomi] fix image URLs and improve metadata - use '?a.hitomi.la' as subdomain depending on gallery-id - add 'characters', 'tags' and 'date' information - support multiple entries per metadata-value - rename 'num' to 'page'	7 years ago
Mike Fährmann	3905474805	[booru] call update_page() with correct dict (closes #82)	7 years ago
Mike Fährmann	44c267e362	[artstation] add search extractor (#80)	7 years ago
Mike Fährmann	40ca562d7b	[artstation] add album extractor (#80)	7 years ago
Mike Fährmann	7121eeae8b	check supportedsites.rst in release script	7 years ago
Mike Fährmann	c59f9b71f1	release version 1.3.1	7 years ago
Mike Fährmann	f367d5c281	[deviantart] move delay-increase after expect_error check [ci skip]	7 years ago
Mike Fährmann	557cb94f81	[deviantart] use proper exponential backoff on API errors ... and use separate API credentials for unit tests.	7 years ago
Mike Fährmann	723cc66bb1	[artstation] add user-, image- and likes-extractors	7 years ago
Mike Fährmann	b69cc94f0e	[util] implement bencode()	7 years ago
Mike Fährmann	4d74749496	[tests] rework filters for extractor tests CI incompatible tests will now only be skipped if tests are run in a CI environment.	7 years ago
Mike Fährmann	d6ef52897c	[imgchili] remove module All previously hosted images yield a 404 and the main page is just a logo.	7 years ago
Mike Fährmann	7847ab1d5a	[imagehosts] remove even more dead sites All removed sites either - reject all incoming connections or - display a message from their domain registrar	7 years ago
Mike Fährmann	5f37d40a3e	[komikcast] bypass cloudflare challenge	7 years ago
Mike Fährmann	f9884e2338	[pixiv] update URL pattern add support for 'https://www.pixiv.net/user/<id>'	7 years ago
Mike Fährmann	85ed023c2e	[mangadex] remove the trailing ' - MangaDex' in a better way str.rstrip() works differently than assumed.	7 years ago
Mike Fährmann	9fb82e6b43	apply expand_path() to archive paths	7 years ago
Mike Fährmann	32bbd12f08	update extractor tests	7 years ago
Mike Fährmann	ca326bd275	[deviantart] fix folder and collection archive IDs {folder[index]} and {collection[index]} are both '0' when being delegated from Gallery- or FavoriteExtractors, as there is no way of knowing a folder's index when getting folder-information from the API.	7 years ago
Mike Fährmann	e32fe1cdf1	[pinterest] cast IDs to int ... and update test results. Image URLs changed from https://s-media-cache-ak0.pinimg.com/... to https://i.pinimg.com/...	7 years ago
Mike Fährmann	179ecee965	[turboimagehost] fix extraction	7 years ago
Mike Fährmann	1400868f53	[mangadex] general improvements - support >100 chapter entries per manga - custom archive ID format - detect non-existing chapters	7 years ago
Mike Fährmann	749fbbfa6c	[mangadex] add chapter- and manga-extractor	7 years ago
Mike Fährmann	b58449fd88	release version 1.3.0	7 years ago
Mike Fährmann	6e38cf5aab	[mangareader] use 'https://' The site now redirects from http://mangareader.net/ to https://mangareader.net/	7 years ago
Mike Fährmann	1d71123f91	[pixiv] update archive IDs and add metadata-fields (Pixiv bookmarks actually have their own IDs, comments and tags, independent of the bookmarked image, which makes creating an archive ID a lot easier)	7 years ago
Mike Fährmann	858fdbdb22	[tumblr] improve 'inline' extraction 'quote' posts store their HTML content in the 'source' field	7 years ago
Mike Fährmann	1d54a8e07d	fix logging output during downloads from: filename.ext[download][warning] ... to: filename.ext [download][warning] ...	7 years ago
Mike Fährmann	5008e105ee	update archive IDs ... to behave in a more straightforward way when dealing with bookmarks/favourites/etc. specific IDs are now grouped by their owner, album-id, ... to allow for duplicates when it would be expected.	7 years ago
Mike Fährmann	829ddf4ac1	[sankaku] general improvements - simplify regex - unquote search tags - increase default wait-time between HTTP requests - downloading several hundreds of images always resulted in '429 Too Many Requests' eventually - circumvent paging restrictions for unauthenticated users by only using the 'next' parameter - setting 'page' to a constant, low value (or simply omitting it) does the trick	7 years ago
Jad	49463f76bb	support multi-page URL (#79) * support multi-page URL * fix * all done. * fix, again	7 years ago
Mike Fährmann	19aefdfde3	[directlink] update test results	7 years ago
Mike Fährmann	74029c50bb	[directlink] unquote metadata fields	7 years ago
Mike Fährmann	2fad0b1f1b	add 'U' conversion for format strings to unquote their content (#74)	7 years ago
Mike Fährmann	8cdce21dcb	make archive keys user-configurable	7 years ago
Mike Fährmann	8f338347b6	[imagehosts] cleanup removed - chronos.to - unable to resolve hostname - coreimg.net - same - imgmaid.net - same - hosturimage.com - everything returns 404 - imageontime.org - redirects to some shady site - imgupload.yt - cloudflare error 522, host down - img4ever.net - read timeout	7 years ago
Mike Fährmann	edfd3d9fc9	[yeet] remove module - archive.yeet.net returns a 500 server error - yeet.net moved to yeet.rip, but the archive is gone	7 years ago
Mike Fährmann	e1e0668ca8	add option to set default replacement field value Missing or undefined keywords will now be replaced with the value set for 'keywords-default'. The default is Python's 'None', which is equivalent to setting this option to JSON's 'null'.	7 years ago
Mike Fährmann	ac3da8115e	[util] don't add text: URLs to list of downloaded URLs	7 years ago
Mike Fährmann	8704d850bf	add explicit proxy support (#76) - '--proxy' as command-line argument - 'extractor.*.proxy' as config option	7 years ago
Mike Fährmann	367b963d37	[pixiv] fix ugoira extraction ... again (#78) Some animations are not available for mobile devices, so we pretend to be a desktop browser when requesting the ugoira page.	7 years ago
Mike Fährmann	b79f1f2ca7	[pixiv] fix ugoira extraction (closes #78)	7 years ago
Mike Fährmann	731ffd4986	improve text.filename_from_url() performance - urlsplit() is faster than urlparse() - rpartition() is faster than rindex() + slicing - new version is 2.3 times as fast	7 years ago
Mike Fährmann	d122203be1	[mangastream] fix extraction	7 years ago
Mike Fährmann	8809b32aed	release version 1.2.0	7 years ago
Mike Fährmann	b50bdbf3d7	change config specifiers in input file format Instead of a dictionary/object, input file options are now specified by a 'key=value' pair starting with '-' for options only applying to the next URL or '-G' for Global options applying to all following URLs. See the docstring of parse_inputfile() for details. Example option specifiers: - filename = "{id}.{extension}" - extractor.pixiv.user.directory = ["Pixiv Users", "{user[id]}"] -spaces="are_optional" -G keywords = {"global": "option"}	7 years ago
Mike Fährmann	f970a8f13c	fix adding keys to download archive when using skip=false	7 years ago
Mike Fährmann	179bcdd349	adjust archive-ids	7 years ago
Mike Fährmann	be3ea4425d	test archive-id creation and uniqueness	7 years ago
Mike Fährmann	3cec533c28	Merge branch 'archive'	7 years ago
Mike Fährmann	20af86b2ea	add more extractor tests for mangastream, reddit and imgur	7 years ago
Mike Fährmann	b73b8b4f50	add OAuth unittests	7 years ago
Mike Fährmann	4d2fadfb6f	restore skip actions with download archive	7 years ago
Mike Fährmann	65773263fc	[util] implement OAuthSession.urlencode() (closes #75) - Python's own urllib.parse.urlencode() has no quote_via argument in Python 3.3 and 3.4, which is necessary to follow OAuth 1.0 quoting rules.	7 years ago
Mike Fährmann	7e0207bcf4	[imgur] strip trailing '?1' from 'ext'	7 years ago
Mike Fährmann	cf147dfee9	[hentai2read] fix manga extraction - site changed its HTML structure	7 years ago
Mike Fährmann	f5f2d29f56	[nijie] fix dojin extraction - correctly extract artist_id - set extension to "jpg" if it was empty and let filetype checks do the rest	7 years ago
Mike Fährmann	7f7c16ae37	add option to specify additional key-value pairs	7 years ago
Mike Fährmann	d38bf2f54c	[tumblr] recognize /image/... URLs xyz.tumblr.com/image/123 refers to the same images as xyz.tumblr.com/post/123.	7 years ago
Mike Fährmann	057668e17e	extend input-file format with per-URL config and comments - see docstring of parse_inputfile() for details - TODO: unittests, recursion (currently setting for example {"extractor": {"key": "value"}} will override the whole "extractor" branch instead of merging {"key": "value"} into the already existing dictionary)	7 years ago
Mike Fährmann	5b3c34aa96	use generic chapter-extractor in more modules	7 years ago
Mike Fährmann	347baf7ac5	improve util.parse_range() performance It is never going to actually matter, but using partition() instead of split() is twice as fast.	7 years ago
Mike Fährmann	7b5ba69951	[hentaihere] ensure consistent extraction results sometimes there is a random space before the next <a>	7 years ago
Mike Fährmann	377b78b3c9	[hentai2read] fix manga name extraction	7 years ago
Mike Fährmann	54c36a8a34	[subapics] add chapter- and manga-extractor (#70)	7 years ago
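
Note on 85ed023c2e ("str.rstrip() works differently than assumed"): str.rstrip() treats its argument as a set of characters to strip, not as a literal suffix, so it can eat into the title itself. The sketch below illustrates the pitfall and one possible fix. The helper name strip_suffix and the sample title are hypothetical and are not taken from the gallery-dl source.

```python
def strip_suffix(title, suffix=" - MangaDex"):
    """Remove `suffix` only if `title` really ends with it (hypothetical helper)."""
    if suffix and title.endswith(suffix):
        return title[:-len(suffix)]
    return title


title = "Example Manga - MangaDex"  # made-up page title

# rstrip() strips any trailing run of the characters ' ', '-', 'M', 'a',
# 'n', 'g', 'D', 'e', 'x', so it removes part of the real title as well.
print(title.rstrip(" - MangaDex"))  # Exampl

# Checking for the literal suffix leaves the title intact.
print(strip_suffix(title))          # Example Manga
```

On Python 3.9 and later, str.removesuffix() does the same job; the explicit endswith() check also runs on older interpreters.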

1103 Commits (7f899bd5d8b3fab882cc454dd659738e2f38670b)
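
Note on 6d8b191ea7 (parse_qsl instead of parse_qs): urllib.parse.parse_qs() returns a dict that maps each key to a list of values, while urllib.parse.parse_qsl() returns a flat list of (key, value) pairs. A minimal sketch of building a flat dict directly from parse_qsl() follows. Keeping only the first value per key and returning an empty dict for non-string input are assumptions made for illustration; this is not presented as the project's actual implementation.

```python
from urllib.parse import parse_qs, parse_qsl


def parse_query(qs):
    """Parse a query string into a flat {key: value} dict (illustrative sketch)."""
    if not isinstance(qs, str):
        return {}  # assumption: invalid input yields an empty result
    result = {}
    for key, value in parse_qsl(qs):
        # assumption: the first occurrence of a key wins
        result.setdefault(key, value)
    return result


query = "id=123&page=2&id=456"

# parse_qs() wraps every value in a list, even for keys that occur once.
print(parse_qs(query))     # {'id': ['123', '456'], 'page': ['2']}

# Iterating over parse_qsl() avoids creating those per-key lists.
print(parse_query(query))  # {'id': '123', 'page': '2'}
```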