- combine fetching an HTML page and extracting its 'shared_data'
- move 'shared_data' and field access info out of '_extract_page()'
- introduce a '_request_graphql()' method
- don't try to call '/deviation/metadata' with an empty list of
deviation ids
- print a warning when detecting private deviations without having
a 'refresh-token'
- consistent 'filename' entries, at least as far as possible
- GIFs and SWFs don't have a <title>_by_<artist>_<id> anywhere in
their metadata
- Generating <id> (from 'deviationid'?) might be something that needs
to be figured out, so we can build those filenames ourselves
- better code structure etc.
- tests for videos, archives, and flash animations
Downloading https://pbs.twimg.com/media/EB2cGUYX4AI2Vuu.jpg:orig (NSFW)
sometimes returns a 416 status code, even though no 'Range' header was
sent and no data had been downloaded before.
This status code usually means a file has already been downloaded
completely and the download method indicates success, but in this case
it causes an exception further down the pipeline since no file was
ever created.
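One way to handle this (a sketch, not necessarily the exact fix;
'response' and 'path' are illustrative names):

    import os

    def check_response(response, path):
        # treat 416 as "already downloaded" only when a local
        # file actually exists
        if response.status_code == 416 and os.path.isfile(path):
            return True
        return response.status_code < 400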
It still doesn't work for converted ugoira animations thanks to how
those files are handled, but everything else, including files with
unknown or changing file extension, now works as it should.
- use str.join() instead of os.path.join()
(fewer "features", but 10x as fast; see the sketch after this list)
- cache directory formatters
- detect and optimize field access for 1-element format strings
- change 'has_extension' from a simple flag/bool to a field that
contains the original filename extension
- rename 'keywords' to 'kwdict' and some other stuff as well
- inline 'adjust_path()'
- put enumeration index before filename extension (#306)
- change 'num' to a simple enumerating integer
- change default filename format
- provide content of the old 'num' field as 'suffix'
- add 'filename' for ugoira
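A rough sketch of the new path building; names and the exact format
are illustrative:

    def build_path(segments, filename, extension, num=0):
        # str.join() instead of os.path.join()
        directory = "/".join(segments) + "/"
        # enumeration index before the filename extension (#306),
        # e.g. "example.jpg" -> "example.01.jpg"
        if num:
            return "{}{}.{:02}.{}".format(
                directory, filename, num, extension)
        return "{}{}.{}".format(directory, filename, extension)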
* [instagram] Add support for stories
Add support for Instagram users' stories
(https://www.instagram.com/stories/<username>/).
First, the shared_data from instagram.com/stories/<username> is fetched
to retrieve the user_id, which is then passed to the corresponding
GraphQL query to fetch the stories.
Please note that fetching stories is supported only when authentication
is enabled and the corresponding <username> is followed.
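A rough sketch of that flow, assuming helpers like
_extract_shared_data() and _request_graphql(); the key paths and
GraphQL parameters are illustrative, and the query hash is
deliberately left out:

    def _extract_stories(self, username):
        url = "https://www.instagram.com/stories/{}/".format(username)
        shared_data = self._extract_shared_data(self.request(url).text)
        user_id = shared_data["entry_data"]["StoriesPage"][0]["user"]["id"]
        return self._request_graphql(
            query_hash="...",                    # elided on purpose
            variables={"reel_ids": [user_id]},
        )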
* [instagram] Add an only-matching test for stories
* [instagram] Simplify InstagramExtractor.items() and _extract_stories()
Simplify the handling of 'typename' in InstagramExtractor.items() and
the multi-line string in _extract_stories(). NFCI.
Use a 'gallery-dl' subdirectory in ~/.cache to adhere to how other
programs store their cached data, and call os.makedirs() so it also
works without an existing ~/.cache directory.
Use either $XDG_CACHE_HOME or ~/.cache (if the former isn't set)
and store potentially sensitive cookies and tokens in a user's
home directory and not in the world-readable /tmp.
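A minimal sketch of the lookup order described above:

    import os

    cachedir = (os.environ.get("XDG_CACHE_HOME")
                or os.path.expanduser("~/.cache"))
    path = os.path.join(cachedir, "gallery-dl")
    os.makedirs(path, exist_ok=True)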
Restrict cache.file permissions only on non-Windows systems, for two
reasons:
1. On Windows the 'mode' argument for os.open() has no (visible) effect
on access permissions for new files.
2. The default location for 'cache.file' on Windows is in
%USERPROFILE%\AppData\Local\Temp, which can only be accessed by the
owner (or an admin).
Previously, cache.file could be created world-readable, leading to
possible disclosure of sensitive information on multi-user systems.
Restrict permissions to the owner by creating the file empty first
(see the sketch below).
Please note that a cache.file created before this commit may need a
`chmod 600' or similar!
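A minimal sketch, assuming 'path' points to cache.file:

    import os
    import sys

    if sys.platform != "win32" and not os.path.exists(path):
        # the 'mode' argument only applies to newly created files,
        # so create the file empty with owner-only permissions
        os.close(os.open(path, os.O_CREAT, 0o600))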
- check image limit before opening the first gallery or image page
- prevent any further exhentai extractors from running after the image
limit has been reached
Logging in now more closely follows the natural login flow of a
browser and collects more cookies than just ipb_member_id and
ipb_pass_hash.
Test URLs have been updated and now point to the e-hentai.org domain.
Maximum available image dimensions have been reduced to 4096px on the
longest edge (down from 5000px).
A few (unimportant) metadata fields are no longer available or have
been changed to 'null'.
The 'retries' option now specifies exactly that: the maximum number of
times a failed HTTP request is retried. For example, a value of 2 will
now correctly stop after 3 attempts: the initial request + 2 retries.
The maximum wait-time now also caps at 30min and increases exponentially
for both extractor.request() and downloader.http.download().
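A sketch of the resulting retry loop ('session', 'url', and 'retries'
are illustrative names):

    import time

    def request_with_retries(session, url, retries):
        tries = 0
        while True:
            response = session.get(url)
            if response.status_code < 500:
                return response
            if tries >= retries:
                raise Exception(
                    "giving up after {} retries".format(retries))
            tries += 1
            # exponential backoff, capped at 30 minutes
            time.sleep(min(2 ** tries, 1800))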
The default value for both is 'false', i.e. duplicate URLs are NOT
ignored.
The previous behavior was to always ignore duplicate URLs to make
'--abort-on-skip' work properly when new images were added to the
beginning of a collection while gallery-dl is running.
The current Blogspot image URLs hosted on Kissmanga end with an
"invalid" query parameter (/000.png&upx=...), which isn't recognized
as such by 'spliturl()' and 'parseurl()' and is therefore included in
the 'extension' field from 'text.nameext_from_url()'.
Let's see how long this works ...
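One way to work around it (a sketch, assuming gallery-dl's 'text'
module; not necessarily the exact fix):

    # cut everything starting at the bogus '&' before extracting
    # 'filename' and 'extension'
    url = url.partition("&")[0]
    data = text.nameext_from_url(url, data)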
DeviantArt is rolling out a new version of their website, including a
new internal and potentially usable API (rewrite incoming, yay).
The issue with the new layout is that it doesn't include the "old"
UUIDs for single deviations, i.e. mapping a numeric deviation ID to its
UUID counterpart is impossible with the new layout.
Instead of replacing 'https' with 'http' for every URL in
'get_downloader()', this now only happens once during downloader
initialization. Also adds unit tests.
These metadata fields will only be filled in when using a top-level
URL, because that's the only place this information is available. Using
a Foolslide URL (1) will leave these fields empty.
(1) https://hentai.cafe/manga/read/.../en/0/1/
Some deviations (possibly only from sta.sh sources) are downloadable
(i.e. 'is_downloadable' is true and /deviation/download/ works), but
have no 'content' or similar in their JSON representation.
(fixes #307)
The login page currently doesn't provide or require a login token
(logging in works without one), so printing a warning during each
login is unnecessary.
- use API
- remove login support, add 'api-key' option
- remove support for the "alpha" subdomain: alpha.wallhaven.cc used
numeric IDs that can't be translated to the new ID system
- support direct links to wallpapers
Pagination over popular listings (`date:...+order:popular') never
terminates, not even on the site itself, and at some point returns the
same results over and over again.
- simplify pagination
- add more metadata and slightly change its structure
- convert suitable values to int or list
- move keys from ["photo"] to the base level
- proper video support (#246)
- rename method and variable names to better fit with other extractors
Simple universal serialization support in json.dump() can be achieved
by passing 'default=str', which was already done in DataJob.run() for
-j/--dump-json, but not for the 'metadata' post-processor.
This commit introduces util.dump_json() that (more or less) unifies the
JSON output procedure of both --write-metadata and --dump-json.
(#251, #252)
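For example:

    import datetime
    import json
    import sys

    data = {"id": 1, "date": datetime.datetime(2019, 6, 1)}
    # 'default=str' is called for objects json can't serialize natively
    json.dump(data, sys.stdout, default=str)
    # -> {"id": 1, "date": "2019-06-01 00:00:00"}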
Some timelines would cause an endless loop because 'has_more_items' is
always True, even when the endpoint returns the same list of tweets
over and over again.
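One possible guard (a sketch; the commit may solve this differently):
stop when a page repeats instead of trusting 'has_more_items':

    def timeline(fetch_page):
        last_id = None
        while True:
            tweets = fetch_page()
            if not tweets or tweets[-1]["id"] == last_id:
                return
            last_id = tweets[-1]["id"]
            yield from tweets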
* [instagram] Remove no longer always present `comments' field
`edge_media_to_comment' is no longer always present in the response
(even for the same media, it is sometimes present and sometimes not).
* [instagram] Add `date' metadata
Cloudflare now also checks the client's SSL/TLS cipher capabilities and
produces a 403: Forbidden response with CAPTCHA if they are insufficient.
This commit replaces the default cipher list in urllib3 < 1.25 with the
one from 1.25 (1), which doesn't cause problems as long as the client
platform actually supports these ciphers. On some platforms (tested with
Python 3.4 on Linux and Python 3.7 on an outdated Windows 7 VM) it is
necessary to install pyOpenSSL to get everything to work.
Explicitly setting a minimum/maximum version for urllib3 is also no
longer necessary, and installing gallery-dl will therefore not pull in
an incompatible urllib3 version (#229)
Fixes the "403: Forbidden" error on Artstation (#227)
(1) 0cedb3b0f1
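Roughly what the replacement looks like; see (1) for the exact cipher
string:

    from urllib3.util import ssl_

    # overwrite the module-level default used when building SSL contexts
    ssl_.DEFAULT_CIPHERS = (
        "ECDHE+AESGCM:ECDHE+CHACHA20:DHE+AESGCM:DHE+CHACHA20:"
        "ECDH+AESGCM:DH+AESGCM:ECDH+AES:DH+AES:"
        "RSA+AESGCM:RSA+AES:!aNULL:!eNULL:!MD5:!DSS"
    )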
This reverts commit 3f513f1056.
Both live.staticflickr and farmN.staticflickr servers now produce the
same image file with a lower overall quality than before this change
on Flickr's end.
Flickr started serving images from live.staticflickr.com (see ec88ff1),
but the old farmN.staticflickr.com URLs still work - at least for the
time being.
Filesize (and most likely quality as well) for images from live.… is
severely reduced compared to images from farmN.… for non-original files,
so all live URLs are replaced to point to a randomly chosen farm server.
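A sketch of the URL rewrite (the farm number range is illustrative):

    import random

    def use_farm_server(url):
        # point live.staticflickr.com URLs at a random farm server
        return url.replace(
            "live.staticflickr.com",
            "farm{}.staticflickr.com".format(random.randint(1, 8)), 1)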
- 'post_id' and 'image_id' are only unique per user
- /image/ pages only show a maximum of 24 images, but there can be more
images than that in a blog post
- let extraction run in its own thread and maybe improve speed
- #190
This commit adds support for the two new JS expressions embedded in the
overall challenge code.
It does compute the correct 'js_answer' value, but the HTTP request to
/cdn-cgi/l/chk_jschl to get the 'cf_clearance' cookie always results in
a 403 response with a CAPTCHA inside (hence 'wip')
All steps to make this HTTP request indistinguishable from a regular
web browser (which passes the test) show no effect. This includes:
- using the exact same HTTP headers as a web browser
- following the same query argument order
- trying different wait times
Images are now randomly served from the 'live.staticflickr.com' domain
instead of the "old" 'farmN.staticflickr.com' one, making it impossible
to use static 'url' and 'keyword' hashes as results.
Image quality doesn't appear to be affected by which image server is
used: files from 'farmN' and 'live' are the same.
This removes basically all metadata, but that can be compensated for
with the right search query. Writing "parsers" for all 4 possible
views introduced in the latest changes is too much of a hassle ...
Add support for hashtags (TagPage-s), i.e. explore/tags/<tag> URLs.
This also introduces a get_metadata() method that allows appending
further metadata per (sub)extractor.
Refactor and generalize _extract_profilepage() into _extract_page(),
so it can be reused by _extract_profilepage() and _extract_tagpage()
simply by passing the type of page (`ProfilePage' or `TagPage') and
picking up the respective fields in the shared data.
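A condensed sketch of the generalized helper; the per-page field
names are illustrative:

    PAGE_FIELDS = {
        "ProfilePage": ("user", "edge_owner_to_timeline_media"),
        "TagPage": ("hashtag", "edge_hashtag_to_media"),
    }

    def _extract_page(self, shared_data, page_type):
        node, edge = PAGE_FIELDS[page_type]
        page = shared_data["entry_data"][page_type][0]["graphql"][node]
        for media in page[edge]["edges"]:
            yield media["node"]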
* [instagram] Add support for GraphSidecar media types
Refactor _extract_postpage() to always return a list of media.
Fetch common keywords and gracefully handle the GraphSidecar media
type by extracting each single media item and adding
`sidecar_media_id' and `sidecar_shortcode' keywords to indicate the
parent of sidecar children.
While here, join the copyright comment lines into a single one.
Closes #178.
* [instagram] Use `yield from' instead of `for ... yield' (thanks @mikf)!
* [instagram] Adjust filenames for GraphSidecar media
Add a leading `media_id' of the sidecar parent to the filenames of
GraphSidecar media.
Thanks to @mikf for the suggestion!
* [instagram] Add extra metadata for youtube-dl in GraphSidecar children
The ytdl: URLs of GraphSidecar children redirect to the URL of their
parent when consumed by youtube-dl. In GraphSidecar-s with multiple
GraphVideo-s this leads to downloading the same video multiple times.
Add a `_ytdl_index' field to indicate the index in the youtube-dl
playlist corresponding to each child of the sidecar.
This will be used by the `ytdl' downloader.
- use original image if available
- support video formats
- remove user info for ImageExtractor (it is no longer possible to get
image owner information for a single image)
A URL alone isn't good enough to distinguish between a gallery and a
gallery listing, so the new extractor decides what to do based on the
page's content.
- Sometimes an ad interfered when trying to get a download URL
- Resolving "www.hentai-foundry.com" yields an invalid(?) IPv6 address
(2607:5300:60:ca9e:feed:dead:beef:1) and urllib3 only tries to connect
to the IPv4 variant after a rather long wait time
Instead of getting a complete 'filename' from a URL and splitting it
into 'name' and 'extension', the new approach gets rid of the complete
version and renames 'name' to 'filename'. (Using anything other than
{extension} for a filename extension doesn't really work anyway.)
Example: "https://example.org/path/filename.ext"
before:
- filename : filename.ext
- name : filename
- extension: ext
now:
- filename : filename
- extension: ext
This allows for stuff like "{extractor.url}" and "{extractor.category}"
in logging format strings.
Accessing 'extractor' and 'job' in any way will return "None" if those
fields aren't defined, i.e. in general logging messages.
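This can be implemented with a small fallback object whose attribute
lookups always succeed (a sketch):

    class NONE():
        # any attribute access returns the object itself,
        # and it formats as "None"
        def __getattr__(self, name):
            return self
        def __str__(self):
            return "None"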
Child extractors are now directly constructed with Extractor.from_url()
if the extractor class is known beforehand, instead of using
extractor.find() and searching through all possible extractor classes.
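In code, the difference is roughly this (the concrete extractor class
is hypothetical):

    # before: search through all extractor classes for a match
    extr = extractor.find(url)

    # now: construct directly when the class is known beforehand
    extr = MangaChapterExtractor.from_url(url)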
Instead of a strict list of (URL, RESULTS)-tuples, extractor result
tests can now be a single (URL, RESULTS)-tuple, if it's just one test,
and "only matching" tests can now be a simple string.
HTML structure for gallery pages changed quite a bit, so it is now using
the embedded JSON data. This changes a lot of metadata field names, but
'gallery_id', 'title', and 'user' are still provided for backwards
compatibility.
The internal API endpoint for user galleries also changed its data
structure, but nothing too major.
- allow instances to specify their own 'category'
- improve config lookup:
- first look into extractor.<category>.*
- and afterwards look into extractor.mastodon.<instance>.*
- add a default entry for pawoo.net in a way that actually works
- add an 'instance' keyword and turn 'tags' into a usable list
The former implementation would produce a complete list of all
subalbums for each (sub)album extraction. For example, a level-2
subalbum would get "extracted" twice: once through the root album
(level 0) and once through its parent album on level 1.
In the current implementation, only the next level of subalbums is
returned; each subalbum then handles its own next level in a
recursive fashion.
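In sketch form (the 'children' key is illustrative):

    # before: every (sub)album produced a complete, flattened list
    def subalbums_flat(album):
        for child in album["children"]:
            yield child
            yield from subalbums_flat(child)

    # now: only direct children; each child recurses on its own
    def subalbums(album):
        yield from album["children"]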
Extractors for Mastodon instances can now be dynamically generated,
based on the instance names in the 'extractor.mastodon.*' config path.
Example:
{
    "extractor": {
        "mastodon": {
            "pawoo.net": { ... },
            "mastodon.xyz": { ... },
            "tabletop.social": { ... },
            ...
        }
    }
}
Each entry requires an 'access-token' value, which can be generated with
'gallery-dl oauth:mastodon:<instance URL>'.
An 'access-token' (as well as a 'client-id' and 'client-secret') for
pawoo.net is always available, but can be overwritten as necessary.
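A condensed sketch of the class generation, assuming a
MastodonExtractor base class exists; names and the URL pattern are
illustrative:

    import re

    def generate_extractors(instance_names):
        # build one extractor class per configured instance
        for instance in instance_names:
            yield type(
                "Mastodon_" + instance.replace(".", "_"),
                (MastodonExtractor,),
                {
                    "category": "mastodon." + instance,
                    "instance": instance,
                    "pattern": re.compile(
                        r"https?://" + re.escape(instance) + r"/@[^/?#]+"),
                },
            )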
Using the same base dict for each asset of a project causes unwanted
side effects, like reusing image filename extensions for videos,
which results in errors with the youtube-dl downloader.
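The fix, in essence (names are illustrative):

    def assets(project, base):
        for asset in project["assets"]:
            data = base.copy()  # fresh copy instead of one shared dict
            data.update(asset)
            yield data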
... via HTTP Basic Auth with username and "password".
The password value in this case is not the account password itself,
but the"api_key" found in your user profile.
Hidden / dashboard-only blogs are pretty straightforward and "only"
require a valid 'access-token' and 'access-token-secret' for the given
'api-key' and 'api-secret', so that signed OAuth1.0 requests are possible.
Private / password-protected blogs, on the other hand, are a bit
cumbersome. In addition to a valid 'access-token' and
'access-token-secret', they also require the account belonging to
those tokens to be a member of the blog itself. Knowing the password
and entering it on the website isn't enough to access a blog through
the API. Following a private blog is also impossible, so that option
can't work either.
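A sketch of such a signed request using requests_oauthlib (gallery-dl
itself may implement OAuth1.0 differently), assuming valid credentials
in the four variables:

    from requests_oauthlib import OAuth1Session

    session = OAuth1Session(
        client_key=api_key,
        client_secret=api_secret,
        resource_owner_key=access_token,
        resource_owner_secret=access_token_secret,
    )
    response = session.get(
        "https://api.tumblr.com/v2/blog/{}/posts".format(blog))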