This reverts commit 3f513f1056.
Both live.staticflickr and farmN.staticflickr servers now produce the
same image file with a lower overall quality than before this change
on Flickr's end.
Flickr started serving images from live.staticflickr.com (see ec88ff1),
but the old farmN.staticflickr.com URLs still work - at least for the
time being.
Filesize (and most likely quality as well) for images from live.… is
severely reduced compared to images from farmN.… for non-original files,
so all live URLs are replaced to point to a randomly chosen farm server.
- 'post_id' and 'image_id' are only unique per user
- /image/ pages only show a maximum of 24 images, but there can be more
images than that in a blog post
- let extraction run in its own thread and maybe improve speed
- #190
This commit adds support for the two new JS expressions embedded in the
overall challenge code.
It does compute the correct 'js_answer' value, but the HTTP request to
/cdn-cgi/l/chk_jschl to get the 'cf_clearance' cookie always results in
a 403 response with a CAPTCHA inside (hence 'wip')
All steps to make this HTTP request indistinguishable from a regular web
browser (which passes the test) have no effect. This includes:
- using the exact same HTTP headers as a web browser
- following the same query argument order
- using different wait times
Images are now randomly served from the 'live.staticflickr.com' domain
instead of the "old" 'farmN.staticflickr.com' one, making it impossible
to use static 'url' and 'keyword' hashes as results.
Image quality doesn't appear to be affected by which image-server is
used. Files from 'farmN' and 'live' are the same.
removes basically all metadata, but that can be compensated for with the
right search query. Writing "parsers" for all 4 possible views that have
been introduced in the latest changes is too much of a hassle ...
Add support for hashtags (TagPage-s), i.e. explore/tags/<tag> URLs.
This also introduces a get_metadata() method in order to append
possible further metadata per-(sub)extractor.
Refactor and generalize _extract_profilepage() into _extract_page()
so it can be reused by both _extract_profilepage() and _extract_tagpage()
simply by passing the type of page (`ProfilePage' or `TagPage') and picking up
the respective fields in shared data.
* [instagram] Add support for GraphSidecar media types
Refactor _extract_postpage() to always return a list of media.
Fetch common keywords and gracefully handle the GraphSidecar media type
by extracting each individual media item and adding `sidecar_media_id'
and `sidecar_shortcode' keywords to indicate the parent of sidecar
children.
While here join the copyright comment lines in a single one.
Closes #178.
* [instagram] Use `yield from' instead of `for ... yield' (thanks @mikf)!
* [instagram] Adjust filenames for GraphSidecar media
Prepend the sidecar's `media_id', where available, to the filename of
GraphSidecar media.
Thanks to @mikf for the suggestion!
* [instagram] Add extra metadata for youtube-dl in GraphSidecar children
The ytdl: URLs of GraphSidecar children redirect to the URL of their
parent when consumed by youtube-dl. In GraphSidecar-s with multiple
GraphVideo-s this leads to downloading the same video multiple times.
Add a `_ytdl_index' field to indicate the index in the youtube-dl
playlist that corresponds to the sidecar child.
This will be used by the `ytdl' downloader.
- use original image if available
- support video formats
- remove user info for ImageExtractor (it is no longer possible to get
image owner information for a single image)
A URL alone isn't good enough to distinguish between a gallery or a
gallery-listing, so the new extractor decides what to do based on the
page's content.
- Sometimes an ad interfered when trying to get a download URL
- Resolving "www.hentai-foundry.com" yields an invalid(?) IPv6 address
(2607:5300:60:ca9e:feed:dead:beef:1) and urllib3 only tries to connect
to the IPv4 variant after a rather long wait time
Instead of getting a complete 'filename' from a URL and splitting that
into 'name' and 'extension', the new approach gets rid of the complete
version and renames 'name' to 'filename'. (Using anything other than
{extension} for a filename extension doesn't really work anyway)
Example: "https://example.org/path/filename.ext"
before:
- filename : filename.ext
- name : filename
- extension: ext
now:
- filename : filename
- extension: ext
This allows for stuff like "{extractor.url}" and "{extractor.category}"
in logging format strings.
Accessing 'extractor' and 'job' in any way will return "None" if those
fields aren't defined, i.e. in general logging messages.
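For instance (a hypothetical format string; 'extractor' resolves to
"None" in messages without an extractor):
"output": {
    "log": "[{name}][{levelname}][{extractor.category}] {message}"
}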
Child extractors are now directly constructed with Extractor.from_url()
if the extractor class is known beforehand, instead of using
extractor.find() and searching through all possible extractor classes.
Instead of a strict list of (URL, RESULTS)-tuples, extractor result
tests can now be a single (URL, RESULTS)-tuple, if it's just one test,
and "only matching" tests can now be a simple string.
HTML structure for gallery pages changed quite a bit, so it is now using
the embedded JSON data. This changes a lot of metadata field names, but
'gallery_id', 'title', and 'user' are still provided for backwards
compatibility.
The internal API endpoint for user galleries also changed its data
structure, but nothing too major.
- allow instances to specify their own 'category'
- improve config lookup:
- first look into extractor.<category>.*
- and afterwards look into extractor.mastodon.<instance>.*
- add a default entry for pawoo.net in a way that actually works
- add an 'instance' keyword and turn 'tags' into a usable list
The former implementation would produce a complete list of all subalbums
for each (sub)album extraction. This would for example result in a
level 2 subalbum getting "extracted" twice: once through the root-album
(level 0) and once through its parent album on level 1.
In the current implementation only the next level of subalbums is
returned, and these will handle their own next level in a recursive
fashion.
Extractors for Mastodon instances can now be dynamically generated,
based on the instance names in the 'extractor.mastodon.*' config path.
Example:
{
    "extractor": {
        "mastodon": {
            "pawoo.net": { ... },
            "mastodon.xyz": { ... },
            "tabletop.social": { ... },
            ...
        }
    }
}
Each entry requires an 'access-token' value, which can be generated with
'gallery-dl oauth:mastodon:<instance URL>'.
An 'access-token' (as well as a 'client-id' and 'client-secret') for
pawoo.net is always available, but can be overwritten as necessary.
Using the same base-dict for each asset of a project causes unwanted
side effects like re-using image filename extensions for videos,
resulting in errors with the youtube-dl downloader.
... via HTTP Basic Auth with username and "password".
The password value in this case is not the account password itself,
but the"api_key" found in your user profile.
Hidden / dashboard-only blogs are pretty straightforward and "only"
require a valid 'access-token' and 'access-token-secret' for the given
'api-key' and 'api-secret', so that signed OAuth1.0 requests are possible.
Private / password protected blogs on the other hand are a bit
cumbersome. In addition to a valid 'access-token' and
'access-token-secret', they also require the account belonging to those
tokens to be a member of the blog itself. Knowing the password and
entering it in the website isn't enough to access a blog through the
API. Following a private blog is also impossible, so that option can't
work either.
* [instagram] Add extractor for instagram.com user profiles and pages
The extractor scrapes `instagram.com/<user>' timelines and
`instagram.com/p/<shortcode>' by mimicking the behaviour of a web
browser and extracting the sharedData JSON of the single pages.
Please note that this means that for user timelines we also do an
extra request to the `instagram.com/p/<shortcode>' page, but this
permits having consistent (and complete) information about the
fetched media.
The MD5 logic used for X-Instagram-GIS was documented in
<https://stackoverflow.com/questions/49786980/>
* [instagram] Test for keywords, not url for GraphImage and GraphSidecar
URLs returned by Instagram do not seem to be stable, so avoid testing
for them and test for the returned keywords instead.
* [instagram] Improve test of InstagramProfilepageExtractor
Also check the count of media returned.
* [instagram] Several cleanups and improvements
- Change description, subcategories to generate a better description in
docs/supportedsite.rst
- Remove the unneeded InstagramExtractor.__init__()
- Use text.parse_int() instead of directly using int() (the former is more
robust)
- Use self.request().json() instead of calling json.loads() on
  self.request().text()
- Add `pattern:' checks for URLs that are not stable.
  It seems that only the subdomain is unstable.
Thanks to @mikf!
While a filename might not be a real 'hash', or comparable to what
tumblr usually provides, it is still better than an empty string.
At least as long as "alternatives" in format strings aren't implemented.
The "default" downloader options (rate, retries, timeout, verify) are
mapped to corresponding youtube-dl options.
downloader.ytdl.logging tells the downloader to pass youtube-dl's output
to a Logger object.
downloader.ytdl.raw-options allows passing arbitrary options to the
YoutubeDL constructor.
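A possible configuration sketch; the keys inside 'raw-options' are
regular YoutubeDL options and only serve as examples:
"downloader": {
    "ytdl": {
        "logging": true,
        "raw-options": {
            "quiet": true,
            "merge_output_format": "mkv"
        }
    }
}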
from: 5xx HTTP Error: Reason
to : 5xx: Reason
The "HTTP Error" part was in there to emulate Request's error messages
from response.raise_for_status(), but it reads a lot better without.
In addition to 'abort' and 'exit', it is now possible to specify
'abort:N' and 'exit:N' (where N is any integer) as value for 'skip'
to abort/exit after consecutively skipping N downloads.
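For example, to stop after 3 consecutively skipped downloads (a minimal
sketch with the option applied to all extractors):
{
    "extractor": {
        "skip": "abort:3"
    }
}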
The first login will still use username and password, but everything
afterwards will use the refresh_token obtained from that.
This will prevent pixiv from sending a "New login to pixiv" email every
time a new access_token is requested.
The functionality of --(chapter-)filter and --(chapter-)range are now
also exposed as the following config-file options:
- extractor.*.image-filter
- extractor.*.image-range
- extractor.*.chapter-filter
- extractor.*.chapter-range
TODO: update configuration.rst
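A hypothetical example, assuming chapters that provide a 'lang'
metadata field:
{
    "extractor": {
        "chapter-filter": "lang == 'en'",
        "image-range": "1-20"
    }
}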
This change introduces 'extractor.*.retries/timeout/verify' options
as a general way to set these values for all HTTP requests.
'downloader.http.retries/timeout/verify' is a way to override these
options for file downloads only and will fall back to 'extractor.*.…'
values if they haven't been explicitly set.
Also: downloader classes now take an extractor object as first argument
instead of a requests.session.
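A sketch of how the two levels could be combined:
{
    "extractor": {
        "retries": 5,
        "timeout": 30.0,
        "verify": true
    },
    "downloader": {
        "http": {
            "retries": 10,
            "timeout": 60.0
        }
    }
}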
URLs starting with 'ytdl:' will now be handled by youtube-dl.
There is probably a lot to fix and improve, but the basic use case
works.
TODO:
- format selection and ytdl options in general
- better filename/path handling
- ytdl support for "unsupported URLs"
- ...
Enabling this option will detect videos in tweets and output them as
"unsupported" URLs, so that they can then be downloaded with youtube-dl.
There are a lot of improvements to be made to the current
implementation, but it works and does what it is supposed to, even if
it is about as inefficient as can be ...
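Assuming the option is exposed as extractor.twitter.videos (an assumed
name, not spelled out here), enabling it could look like this:
{
    "extractor": {
        "twitter": {
            "videos": true
        }
    }
}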
and some smaller changes ...
'user' is the name of the account an image is listed at and
'artist' is now the name of the account who created the image.
For example "https://www.hentai-foundry.com/user/Tenpura/faves/pictures"
- 'user': Tenpura
- 'artist' of the only image: LewdBrush
- rename "deleted" to "same-blog"
- change test for deleted original post to test if
original post owner has the same UUID (full blog name) as the one
being downloaded from
- add 'blog[uuid]' metadata to allow comparison with
'reblogged_from_uuid'
Setting 'reblogs' to "deleted" will check if the parent post of a
reblog has been deleted and download its media content if that is the
case, otherwise it will be skipped.
This is a rather costly operation (1 API request per reblogged post)
and should therefore be used with care.
Each post-processor config dict now supports a list of extractor
categories for which it should or shouldn't be active.
For example:
"postprocessors": [
{"name": "classify",
"whitelist": ["tumblr", "deviantart"],
...
}
]
A format string now gets parsed only once instead of re-parsing it each
time it is applied to a set of data.
The initial parsing causes directory path creation to be about 2x
slower than before, since each format string there is used only once,
but building a filename, the more common operation, is at least 2x
faster. The "directory slowness" cancels out at about 5 filenames and
everything above that is significantly faster.
http://subapics.com/ got discontinued and replaced by http://ngomik.in/.
ngomik.in is still displaying a link to the "old site" showing a big
"Account Suspended" sign.
For example "https://twitter.com/PicturesEarth/media".
They are different from normal timelines in that they do not contain
any (re)tweets from other users and feature all media the user ever
posted, including responses to other tweets.
- rename User- to TimelineExtractor
- rename 'userid' to 'user_id' to conform to the other ..._id values
- adjust archive_fmt to deal with retweets
- emulate browser behavior for API calls
- cache manga API results
- add artist, author and date fields to chapter metadata
- remove Manga-/ChapterExtractor inheritance
- minor code simplifications and improvements
The L option allows for the contents of a format field to be replaced
with <replacement> if its length is greater than <maxlen>.
Example:
{f:L5/too long/} -> "foo"      (if "f" is "foo")
                 -> "too long" (if "f" is "foobar")
(#92) (#94)
All API requests now always use a public token and only switch to
a private token for pagination results if `refresh-token` is set
and fewer deviations than requested were returned.
Always trying with a public token first and repeating the API request
with a private token if deviations are missing doesn't quite work for
galleries and folders with fewer than 25 items, so it's an option and
not the default.
Instead of using a refresh-token-based access-token for every API
request, they are now only used for paginated results.
API requests to get a user's profile and the original download URL
now always use a public access-token.
By default FFmpeg assumes a 25 FPS input frame rate, leading to dropped
frames if the source requires a higher frame rate than that.
This commit adds a `framerate` option (default "auto"), which allows
automatically assigning a (more or less) fitting frame rate based on
the delays between ugoira frames and avoids dropped frames.
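A minimal sketch, assuming the converter is registered as the 'ugoira'
post-processor:
"postprocessors": [
    {
        "name": "ugoira",
        "framerate": "auto"
    }
]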
It is now possible to slice string (or list) values of format string
replacement fields with the same syntax as in regular Python code.
"{digits}" -> "0123456789"
"{digits[2:-2]}" -> "234567"
"{digits[:5]}" -> "01234"
The optional third parameter (step) has been left out to simplify things.
DeviantArt changed its URL format from
https://<name>.deviantart.com/...
to
https://www.deviantart.com/<name>/...
With this change both formats will be supported.
- ffmpeg-location: path to the ffmpeg (or avconv) executable
- ffmpeg-args: additional command line args for ffmpeg
- extension: filename extension of the resulting video file
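Put together, a possible post-processor entry using these options
(paths and ffmpeg arguments are only examples):
"postprocessors": [
    {
        "name": "ugoira",
        "extension": "webm",
        "ffmpeg-location": "/usr/bin/ffmpeg",
        "ffmpeg-args": ["-c:v", "libvpx", "-crf", "4", "-b:v", "5000k"]
    }
]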
Useful for quick testing (even though -g and -j kind of do the same)
and to fill a download archive without actually downloading the files.
-s does the same as the default behaviour, except for actually downloading files.
Maybe it should get a more fitting name, as it does actually write to
disk (cache, archive)?
- combine 'favorite' and 'bookmark' extractors
- it is now one extractor class, but its subcategory still
distinguishes between your own bookmarks ('bookmark') and other
user's bookmarks ('favorite') like before
- allow filtering by bookmark tags and public/private bookmarks
- fix pagination for bookmark results
The API endpoint responsible for user illustrations does not
provide sufficient filter capabilities* to match the actual
website, so we are spinning our own filters.
Respected parameters are
'type': illust, manga, ugoira
'tag' : any image tag (this was already supported)
'p' : the page to start on
*
- API can filter for illustrations and manga, but not for ugoira.
- 'offset' is applied before filtering
- no 'tag' filter
Transitioning to the App API breaks favorites archive IDs (there is
no longer any bookmark ID information), but the favorites API endpoint
of the public API was gone anyways ...
OAuth support for SmugMug needs some additional features
(auth-rebuild on redirect, query parameters in URL, ...)
and fixing this in the old code wouldn't work all that well.
Standard logging to stderr, logfiles, and unsupported URL files (which
are now handled through the logging module) can now be configured by
setting their respective option keys (log, logfile, unsupportedfile)
to a dict and specifying the following options:
- format:
format string for logging messages
available keys: see [1]
default: "[{name}][{levelname}] {message}"
- format-date:
format string for {asctime} fields in logging messages
available keys: see [2]
default: "%Y-%m-%d %H:%M:%S"
- level:
the lowercase name of the minimum level a message must have to be logged;
available levels are debug, info, warning, error, exception
default: "info"
- path:
path of the file to be written to
- mode:
'mode' argument when opening the specified file
can be either "w" to truncate the file or "a" to append to it (see [3])
If 'output.log', '.logfile', or '.unsupportedfile' is a string, it
will, as before, be interpreted as the filepath (or as a format string
in the case of .log)
[1] https://docs.python.org/3/library/logging.html#logrecord-attributes
[2] https://docs.python.org/3/library/time.html#time.strftime
[3] https://docs.python.org/3/library/functions.html#open
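A possible 'output.logfile' configuration using these options:
"output": {
    "logfile": {
        "path": "~/gallery-dl/log.txt",
        "mode": "w",
        "format": "{asctime} {name}: {message}",
        "format-date": "%H:%M:%S",
        "level": "debug"
    }
}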
just some initial code that still requires a lot of work ...
TODO:
- folders
- old-style albums (which are nearly all of them ...)
- images from users
- OAuth
It could also happen that the API credentials used here become invalid
once my 14-day trial period ends (7 days remaining), but that
would just require users to supply their own.
The previous implementation would retry requests with 4xx status codes
in an infinite loop, which is especially a problem when querying
non-existent users or groups. These are now properly handled with a
NotFoundError exception.
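A standalone sketch of the intended behavior (names and structure are
illustrative, not the actual extractor code):
import requests

class NotFoundError(Exception):
    """Raised when a queried user or group does not exist."""

def fetch(session, url):
    # 4xx responses indicate a client error and will not succeed on a
    # retry, so fail immediately instead of looping forever
    response = session.get(url)
    if 400 <= response.status_code < 500:
        raise NotFoundError(url)
    response.raise_for_status()
    return response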
Pinterest access tokens are rate limited at 200 requests per
hour (or maybe per 2 or 3 hours?) so having just one access token
for all users isn't going to work in the long run.
- another irrelevant micro-optimization !
- use urllib.parse.parse_qsl directly instead of parse_qs, which
just packs the results of parse_qsl in a different data structure
- reduced memory requirements since no additional dicts and lists are
created
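For reference, the difference between the two (plain standard-library
behavior):
from urllib.parse import parse_qs, parse_qsl

query = "tag=landscape&page=2"
print(parse_qs(query))   # {'tag': ['landscape'], 'page': ['2']}
print(parse_qsl(query))  # [('tag', 'landscape'), ('page', '2')]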
calling 'abort()' in a filter aborts the current extractor run
in a cleaner way than using something like 1/0, which
causes an error message to be printed
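A hypothetical command-line usage, assuming the metadata contains a
'num' field:
gallery-dl --filter "num <= 10 or abort()" "https://example.org/gallery/123"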
https://img.yt/ wasn't available for a couple of days, but has now
re-emerged as https://imx.to/ with a new web-interface.
Links to older images still work (see tests).
'hash' is the middle part of the filename in a tumblr image URL.
For example an image with '.../tumblr_p6tgemp1NZ1wgha4yo1_250.png' as
its URL would have 'p6tgemp1NZ1wgha4yo1' as hash.
- fix the cloudflare challenge result if the last decimal places
are zero (JS's toFixed() removes trailing zeroes)
- fix downloading of kissmanga chapter-pages hosted on blogspot
(accessing blogspot with "kissmanga.com" as referrer yields a 401)
- disable certificate validation for 'mangahere' tests
- update flickr test result
Cloudflare challenges, at least for kissmanga and readcomiconline,
now use slightly different Javascript expressions.
Instead of a single value per expression, they now have a numerator
and a denominator of a fractional value, which in the end gets
truncated to 10 decimal places.
- safeprint() was used to print values which might have caused a
UnicodeEncodeError, but that is no longer necessary (0381ae5)
- errors are now handled via logging output (f94e370)
Python 3.5 and lower throw a UnicodeEncodeError when trying to print
non-encodable characters when not using 'utf-8' as encoding.
Setting their error handlers to 'replace' should help.
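A rough sketch of what setting the error handler could look like (not
necessarily the exact code used):
import sys

# recreate stdout with a 'replace' error handler so printing
# non-encodable characters no longer raises UnicodeEncodeError
sys.stdout = open(
    sys.stdout.fileno(), mode="w",
    encoding=sys.stdout.encoding, errors="replace",
    buffering=1, closefd=False,
)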
- add 'title' and 'description'
- split 'artist_id' into 'user_id' and 'artist_id'
- 'user_id' is the ID of the user from which the image entry
originates from
- 'artist_id' is the ID of the actual image artist
- improve pagination and URL patterns
- use '?a.hitomi.la' as subdomain depending on gallery-id
- add 'characters', 'tags' and 'date' information
- support multiple entries per metadata-value
- rename 'num' to 'page'
{folder[index]} and {collection[index]} are both '0' when being
delegated from Gallery- or FavoriteExtractors, as there is no
way of knowing a folder's index when getting folder-information
from the API.
... to behave in a more straightforward way when dealing with
bookmarks/favourites/etc.
Specific IDs are now grouped by their owner, album-id, ... to
allow for duplicates where they would be expected.
- simplify regex
- unquote search tags
- increase default wait-time between HTTP requests
- downloading several hundreds of images always resulted
in '429 Too Many Requests' eventually
- circumvent paging restrictions for unauthenticated users by only
using the 'next' parameter
- setting 'page' to a constant, low value (or simply omitting it)
does the trick
Missing or undefined keywords will now be replaced with the value
set for 'keywords-default'. The default is Python's 'None', which
is equivalent to setting this option to JSON's 'null'.
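For example, to substitute missing values with a fixed string:
{
    "extractor": {
        "keywords-default": "unknown"
    }
}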
Instead of a dictionary/object, input file options are now specified
by a 'key=value' pair starting with '-' for options only applying to
the next URL or '-G' for Global options applying to all following URLs.
See the docstring of parse_inputfile() for details.
Example option specifiers:
- filename = "{id}.{extension}"
- extractor.pixiv.user.directory = ["Pixiv Users", "{user[id]}"]
-spaces="are_optional"
-G keywords = {"global": "option"}
- see docstring of parse_inputfile() for details
- TODO: unittests, recursion (currently setting for example
{"extractor": {"key": "value"}} will override the whole "extractor"
branch instead of merging {"key": "value"} into the already existing
dictionary)
- requests and urllib3 version on 1 line
- close input file after reading from it
- use expand_path for unsupported-urls file
- remove unnecessary logging from options.py
- "count" can now be a string defining a comparison in the form of
'<operator> <value>', for example: '> 12' or '!= 1'. If its value
is not a string, it is assumed to be a concrete integer as before.
- "keyword" can now be a dictionary defining tests for individual keys.
These tests can either be a type, a concrete value or a regex
starting with "re:". Dictionaries can be stacked inside each other.
Optional keys can be indicated with a "?" before its name.
For example:
"keyword:" {
"image_id": int,
"gallery_id", 123,
"name": "re:pattern",
"user": {
"id": 321,
},
"?optional": None,
}
This allows the DeviantArt group-check to be moved inside the
Extractor.items() method which in turn allows for better exception
handling.
As a new general rule:
Never raise exceptions during extractor initialization.
Gelbooru's API allows access to all images and is not restricted
to the first 20000.
This also adds an option to select between API use and manual
information extraction in case their API gets disabled again.
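Assuming the new option is exposed as extractor.gelbooru.api (an
assumed name, not stated here), disabling API use could look like:
{
    "extractor": {
        "gelbooru": {
            "api": false
        }
    }
}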