gallery-dl

Commit Graph

Author	SHA1	Message	Date
Mike Fährmann	86c00f9e66	[danbooru] move extractor logic from booru.py	5 years ago
Mike Fährmann	1d4a369ea2	update extractor test results	5 years ago
Mike Fährmann	7625912b31	[piczel] improve and update - fix tag names - fix a bug in _pagination() - parse datetime in 'created_at' as 'date' - rewrite main loop - replace user profile test	5 years ago
Mike Fährmann	913b8333cc	write DeviantArt refresh-tokens to cache (#616) Writing the token is currently disabled by default and must be enabled with 'extractor.oauth.cache'. 'extractor.deviantart.refresh-token' must be set to '"cache"' to use the cached token.	5 years ago
Mike Fährmann	2a4f227e08	warn about expired cookies	5 years ago
Mike Fährmann	4e361b3008	add tests for specific datetime values	5 years ago
Mike Fährmann	80ecb99089	[hitomi] fix extraction	5 years ago
Mike Fährmann	247c9e1416	[vsco] update gallery URL pattern	5 years ago
Mike Fährmann	19ae6f3fc4	update test results - twitter: Don't test the whole kwdict, only the actual content, since the keyword hash changes whenever that user changes his display name. - khinsider: Download host changed	5 years ago
Mike Fährmann	cc5079c844	[hiperdex] add chapter and manga extractors (closes #606)	5 years ago
Mike Fährmann	64bdec8430	[deviantart] check availability of intermediary URLs (fixes #609)	5 years ago
Mike Fährmann	5607dd3646	[hitomi] follow multiple redirects	5 years ago
Mike Fährmann	765b2a0527	[hentaihand] add extractors (closes #605)	5 years ago
Mike Fährmann	d94215d119	[tumblr] replace '-' with ' ' in tag searches (fixes #611) To search for tags with actual minus signs in them (there shouldn't be too many), manually replace those with url-encoded minus characters ('-' -> '%2d') before inputting them into gallery-dl: https://s679874.tumblr.com/tagged/tag-with-minus -> https://s679874.tumblr.com/tagged/tag%2dwith%2dminus	5 years ago
Mike Fährmann	e6cd49e78b	update extractor test results	5 years ago
Mike Fährmann	5d9437b398	[vsco] skip "invalid" entities	5 years ago
Mike Fährmann	650f2b6d58	[furaffinity] accept sfw.furaffinity.net URLs (closes #608) Just as an alias for regular URLs with no extra content filtering.	5 years ago
Mike Fährmann	74e684e828	[twitter] change default value for 'videos' to 'true' Every other 'videos' option defaulted to 'true', except Twitter.	5 years ago
Mike Fährmann	c7cf9dd111	[furaffinity] support classic layout (#284)	5 years ago
Mike Fährmann	138135c190	[furaffinity] add extractors (#284)	5 years ago
Mike Fährmann	b9c574bd1d	[patreon] log skipped files (#590)	5 years ago
Mike Fährmann	80ea9104b8	[8kun] adjust URL pattern	5 years ago
Mike Fährmann	ce26070231	[pixiv] reduce calls to '/user/detail'	5 years ago
Mike Fährmann	da0d5f6092	[oauth] add 'port' option (#604)	5 years ago
Mike Fährmann	719b63d0ca	[bcy] add user and post extractors (#592)	5 years ago
Mike Fährmann	6426e3efc7	[khinsider] fix and improve metadata extraction	5 years ago
Mike Fährmann	b7eb6cecbb	[pixiv] handle tags at the end of new bookmark URLs	5 years ago
Mike Fährmann	109f6c8685	[patreon] filter duplicate files per post (#590)	5 years ago
Mike Fährmann	b38cf59711	[sexcom] fix image URLs & parse 'date' fields	5 years ago
Mike Fährmann	1f4c9c5f9d	[8kun] add thread and board extractors (closes #582)	5 years ago
Mike Fährmann	facc5daa6d	[twitter] force old login page layout (fixes #584, fixes #598)	5 years ago
Mike Fährmann	d1de7dc296	[hitomi] implement workaround for "broken" redirects Some galleries redirect to a new "version" with different gallery id. This new version might not be available any more, but the /reader/ page for the original gallery id can still work.	5 years ago
Mike Fährmann	40fe062851	[pixiv] fix user id for bookmarks API calls (closes #596)	5 years ago
Mike Fährmann	91aaaf1a9e	[pixiv] add 'rating' metadata field (#595) A human-friendlier representation of 'x_restrict'	5 years ago
Mike Fährmann	dff33b260c	[reddit] add 'videos' option	5 years ago
Mike Fährmann	2ad43618cc	[piczel] fix extraction	5 years ago
Mike Fährmann	cf7a67d67f	[yaplog] remove module Yaplog! ended its service on 2020-01-31	5 years ago
Mike Fährmann	e0dd073ce0	[twitter] replace embedded tweet test the old one was deleted	5 years ago
Mike Fährmann	ec36df4851	[deviantart] fix video extraction from 'extended_fetch' results DeviantArt is now serving videos from wixmp servers (1), instead of the former film00.deviantart.com (2), even though those URLs are still functional. They seem to also have re-encoded those videos. The 10 MB 1080p video from (2) is now only available in 720p at ~20 MB (with a higher bitrate, but still …). Other videos are still available in 1080p, but not this one for some reason. (Changing the '720p' in (1) to '1080p' doesn't work.) (1) https://wixmp-ed30a86b8c4ca887773594c2.wixmp.com/v/mp4/9feaa2c9-1baf-4fc2-84f7-f3384b34cefe/d5gxnb5-282a2e9a-b552-40ff-8542-b3c5eed823f5.720p.a837d7cec12c41be8ca2ee53152cea3a.mp4 (2) https://film00.deviantart.net/4c1d/v/mp4/2012/279/d/1/_video____brushes_i_use_in_paint_tool_sai_by_chi_u-d5gxnb5.mp4	5 years ago
Mike Fährmann	48be2266ed	[deviantart] better error message for 'extended_fetch' (#585)	5 years ago
Mike Fährmann	71851a6241	[pixiv] update URLs of followed users to the new format	5 years ago
Mike Fährmann	d086f30b42	[reddit] restore archive keys for i.redd.it images	5 years ago
Mike Fährmann	56f1c96168	implement 'parent-directory' option (#551)	5 years ago
Mike Fährmann	ae07f92f7e	[reddit] rewrite extractor logic (closes #551) Handle images and videos hosted on Reddit "natively", allowing them to use reddit-specific metadata to build directory and file names.	5 years ago
Mike Fährmann	2852691d78	[paheal] replace test URL searching for 'k-on' doesn't yield any results anymore	5 years ago
Mike Fährmann	2a9be48511	improve util.load/save_cookiestxt() and add tests - take a file object as argument instead of a filename - accept whitespace before comments (" # comment") - map expiration "0" to None and not the number 0	5 years ago
Mike Fährmann	e35c2ea1a6	[weibo] use youtube-dl to download from m3u8 manifests	5 years ago
Mike Fährmann	6703b8a86b	[blogger] implement video extraction (closes #587)	5 years ago
Mike Fährmann	c1a6862863	implement functions to load/save cookies.txt files (closes #586) The methods of the standard library's MozillaCookieJar have several shortcomings (#HttpOnly_ cookies, 0 expiration timestamps, etc.) and require construction of an ultimately pointless CookieJar object.	5 years ago
Mike Fährmann	25d5ec4ff3	[twitter] add option to extract TwitPic embeds (#579)	5 years ago
Mike Fährmann	32d7195d08	[pinterest] improve detection of invalid pin.it links	5 years ago
Mike Fährmann	174117f827	allow multiple hashes for content tests	5 years ago
Alice	f498a9057f	[twitter] Fix stop before real end (#573) * [twitter] Fix stop before real end Fix for https://github.com/mikf/gallery-dl/issues/544. Makes sure that it really reached the end by checking that both "min_position" is null and "has_more_items" is false before stopping. * [twitter] Fix stop before real end (update)	5 years ago
Mike Fährmann	8bb32ee188	[hitomi] fix image URLs	5 years ago
Mike Fährmann	bd5ce9855c	allow GalleryExtractors to set URL-independent extensions	5 years ago
Mike Fährmann	af42c75152	[mangadex] revert domain to 'mangadex.org'	5 years ago
Mike Fährmann	e89413da22	update test results	5 years ago
Mike Fährmann	33a6e0ac6e	[hentaifoundry] extract more metadata (closes #565)	5 years ago
Mike Fährmann	5cac79c3d9	[erolord] remove extractor	5 years ago
Mike Fährmann	b9cbf932b4	[pixiv] update URL patterns (fixes #568) Pixiv now uses new URLs for - user profiles and illustration listings: - https://www.pixiv.net/en/users/<ID> - https://www.pixiv.net/en/users/<ID>/artworks - bookmarks: - https://www.pixiv.net/en/users/<ID>/bookmarks/artworks	5 years ago
Mike Fährmann	988cc2ec23	[mangadex] change domain to mangadex.cc (closes #559)	5 years ago
Mike Fährmann	f8e137d6b4	[deviantart] show warning about private deviations only once … per call to '_pagination()'	5 years ago
Mike Fährmann	939fec8ecd	[deviantart] match new search/popular URLs (closes #538)	5 years ago
Mike Fährmann	09cc88b715	[deviantart] match '/favourites/all' URLs (closes #555)	5 years ago
Mike Fährmann	3811fd8a25	fix time formatting for Python 3.4 and 3.5 'datetime.time.isoformat()' only has an optional 'timespec' argument since Python 3.6.	5 years ago
Mike Fährmann	43ab9572b4	[twitter] handle API rate limits (#526)	5 years ago
Mike Fährmann	569747a78d	implement extractor.wait()	5 years ago
Mike Fährmann	5532e9c158	[twitter] handle quoted tweets (#526) … and categorize them as retweets	5 years ago
Mike Fährmann	0b4cb8e57a	[mangahere] send 'isAdult' cookie (fixes #556)	5 years ago
Mike Fährmann	1afb91363c	[imagefap] generalize URL patterns and add tests (#552)	5 years ago
Xope Totec	f701e9f33a	Handle beta.imagefap.com URLs (#552)	5 years ago
Mike Fährmann	ce54b8c04c	let extractors opt-out of cookie option usage useful to avoid sending unnecessary cookies when all authentication is done through OAuth tokens	5 years ago
Mike Fährmann	5ad92fc196	[newgrounds] fix tags metadata extraction	5 years ago
Mike Fährmann	82f7f4172a	update test results	5 years ago
Mike Fährmann	1f2a69f3c5	add '_extractor' information to redirect results	5 years ago
Mike Fährmann	a27f43dad1	[pixiv] wait and retry after rate limit error (closes #535)	5 years ago
Mike Fährmann	6b373cb7e2	[exhentai] restrict default directory name length (#545)	5 years ago
Mike Fährmann	b347bf68c7	[deviantart] add extractor for followed users (#515)	5 years ago
Mike Fährmann	c0f391a4e2	[pixiv] support listing followed users (#515)	5 years ago
Mike Fährmann	896896a490	[twitter] fix URLs forwarded to youtube-dl (closes #540) Since commit `3bba763` data["user"] is an entire dict object and no longer just the user nickname …	5 years ago
Mike Fährmann	1e2713b895	[artstation] fix search result pagination (closes #537)	5 years ago
Mike Fährmann	bf3df3d0b0	[directlink] send Referer headers (closes #536)	5 years ago
Mike Fährmann	9be7ff600e	[imagetwist] replace test image the old one expired, it seems	5 years ago
Mike Fährmann	66905b1664	[foolslide] add fallback for chapter data extraction	5 years ago
Mike Fährmann	48e42e73fb	[reddit] change default value for 'comments' to '0'	5 years ago
Mike Fährmann	9c0928457a	[reddit] fix errors with 't1_…' submissions	5 years ago
Mike Fährmann	bf658fd84b	[vsco] implement 'videos' option	5 years ago
Mike Fährmann	95c90722ee	[instagram] implement 'videos' option (closes #521)	5 years ago
Mike Fährmann	d0920e84e9	update test results	5 years ago
Mike Fährmann	8c11e81c9f	Merge commit '63e6993716db8d8bedfb7b0d445c7161493046b6'	5 years ago
Mike Fährmann	63e6993716	merge 'bypost' functionality into metadata postprocessor	5 years ago
Mike Fährmann	31a29835ff	[realbooru] simplify extractors and update tests (#514)	5 years ago
The Oddball	9a4ce20b8e	[realbooru] Add Realbooru extractor (#514)	5 years ago
Mike Fährmann	72b8fbfbad	[instagram] make post-page extraction nonfatal	5 years ago
Mike Fährmann	922b8a9595	[weibo] raise NotFoundError for unavailable/deleted statuses	5 years ago
Mike Fährmann	0cd157300e	[patreon] fix regex pattern for posts The previous one would match the first number in the URL slug as post ID, which would fail for posts with numbers in their title.	5 years ago
Mike Fährmann	fe19e233f3	[xvideos] improve - derive from GalleryExtractor - match '…-channels' URLs - "better" metadata structure	5 years ago
Mike Fährmann	d3e44e899d	raise NotFoundErrors for 404 responses in GalleryExtractors	5 years ago
Mike Fährmann	a4dd8b3dab	improve _check_cookies() Only loop over all cookies once instead of calling cookiejar._find() for each cookie name.	5 years ago
Mike Fährmann	76e60d10a6	[patreon] raise proper exception if creator/post doesn't exist	5 years ago
Mike Fährmann	9e63804347	[patreon] make retrieving user info nonfatal (#508) … and fall back to the included data if an error occurs.	5 years ago
Mike Fährmann	964dc57286	[vsco] improve image resolutions https://im.vsco.co/ URLs redirect to the appropriate CDN server and occasionally insert a '/1200x1600/' into the image path, limiting image dimensions. This commit constructs redirect targets out of the given im.vsco.co URLs without sending extra HTTP requests and without any "builtin" resolution restrictions.	5 years ago
Mike Fährmann	0629fe8fa4	[vsco] fix user profile extraction … again Given the pattern from last time, collections will also change in due time and use cursor-based pagination.	5 years ago
Mike Fährmann	ab17ea9632	[deviantart] only print warning if 'original' is enabled	5 years ago
Mike Fährmann	2188db6284	[gelbooru] fix non-API tag extraction	5 years ago
Mike Fährmann	c4702ec9b6	simplify some logging calls	5 years ago
Gio	c0b9ad678d	Separate metadata from handle_url into handle_metadata, commenting	5 years ago
Mike Fährmann	c9ef1b21c3	[patreon] get partial user info without /api/user/<id> (#507) It's a lot less data, but doesn't invoke any additional HTTP requests with potential Cloudflare CAPTCHAs.	5 years ago
Mike Fährmann	0ab9bb1721	[4chan] add extractor for entire boards (closes #510)	5 years ago
Gio	cfc70a97ab	Added an additional channel for downloading the metadata of an entire post or gallery.	5 years ago
Mike Fährmann	15f9bb3d14	add option to disable pyOpenSSL usage (#508) (pyOpenSSL is now disabled by default)	5 years ago
Mike Fährmann	c8e99e3b3b	[deviantart] fix crash on missing "token" field (#505)	5 years ago
Mike Fährmann	6ed2c7823c	[deviantart] disable original downloads if no cookies set For 'deviation' and 'scraps' extractors only, since original file downloads for those two will always fail with a 404 Not Found when not logged in.	5 years ago
Mike Fährmann	50deab5265	[deviantart] fix URL generation from /extended_fetch results (closes #505)	5 years ago
Mike Fährmann	1f209da4c0	[pixiv] match new search URLs (closes #507)	5 years ago
Mike Fährmann	e17907ee2a	change default value of 'cookies-update' to 'true'	5 years ago
Mike Fährmann	07dafad26d	[twitter] attempt to fix infinite loops (#499) (Hopefully this doesn't break anything else)	5 years ago
Mike Fährmann	71acbdabf4	[2chan] fix metadata extraction	5 years ago
Mike Fährmann	c0a1241648	[livedoor] force https:// for image URLs	5 years ago
Mike Fährmann	6e23c0da09	[imgur] add extractor for subreddit links (closes #500)	5 years ago
Mike Fährmann	372ffe95ee	[oauth] adjust Flickr redirect URI (fixes #503) Flickr now automatically forces https:// for all redirect URIs.	5 years ago
Mike Fährmann	004812258d	[hentaifox] fix extraction	5 years ago
Mike Fährmann	e2710702d4	fix Cloudflare bypass	5 years ago
Mike Fährmann	8759403f37	[plurk] add delay between comment requests	5 years ago
Mike Fährmann	a28552fd19	update test results - hbrowse: one tag got removed - mangoxo: gallery changed owner - photobucket: ?, but photo still downloads	5 years ago
Mike Fährmann	dcaa3d01bd	[imagefap] adapt to new image URL format	5 years ago
Mike Fährmann	e62c209ca0	[nijie] fix 'date' parsing	5 years ago
Mike Fährmann	3bba763ab9	[twitter] improve - update metadata structure - combine all user… entries into their own dict - let 'user' always specify the Timeline owner - add 'author' entry that specifies the original Tweet author - create directories per post (closes #491) - fix username issues with /i/web/ URLs	5 years ago
Mike Fährmann	db35c3b581	[directlink] separate filenames from paths With this, all default filename formats specify an '{extension}' and PathFormat.set_extension() reliably works for all files.	5 years ago
Mike Fährmann	41a3169c67	[foolfuuka] use '{extension}' in default filename format	5 years ago
Mike Fährmann	e9aed62c91	[imgur] unescape image titles	5 years ago
Mike Fährmann	2c332edaad	[plurk] fix comment pagination	5 years ago
Mike Fährmann	a3fa45bbb1	[behance] get images from 'media_collection' modules	5 years ago
Mike Fährmann	359c3bc1c5	[deviantart] revert to getting download URLs from OAuth API This commit (partially) reverts `27b5b24`, `94eb7c6`, and `a437e78`. Download URLs from the 'extended_fetch' endpoint are now only usable for logged in users, while those from the respective OAuth API endpoint are working again. Everything except scraps and direct deviation links should be fixed, and those two categories will work with exported cookies. (#488) TODO: - "native" login with --username and --password - better handling of internally stored cookies	5 years ago
Mike Fährmann	42b9633c7e	update test results	5 years ago
Mike Fährmann	b28bd1c73e	[bobx] set generated session cookie (closes #482) This reverts commit `490831f` and also restores original image downloads by setting a randomly generated session cookie. No login required.	5 years ago
Mike Fährmann	ae09f87602	improve SharedConfigMixin config lookups	5 years ago
Mike Fährmann	f5604492c3	update interface of config functions	5 years ago
Mike Fährmann	4ca883c66f	[smugmug] replace test for custom URLs The old one (http://www.creativedogportraits.com/) is empty and/or no longer handled by SmugMug.	5 years ago
Mike Fährmann	d45fabb79d	match user profile handling on deviantart and newgrounds	5 years ago
Mike Fährmann	ea80dadd09	[deviantart] restore archive keys Commit `9fdc5e7` changed 'username' fields to have consistent capitalization, but that invalidated the archive keys of several extractors where 'username' was usually lowercase.	5 years ago
Mike Fährmann	ea094692c8	[vsco] fix collection extraction (#480)	5 years ago
Mike Fährmann	490831f84a	[bobx] "fix" image download URLs Access to original images got restricted to (paid) members only. All that's publicly accessible now are essentially preview pictures.	5 years ago
Mike Fährmann	978cb03f81	update misc test results - Livedoor now uses https:// for its image URLs - Instagram image URLs got simplified	5 years ago
Mike Fährmann	fca87974fe	[sexcom] fix video downloads by sending specific Referer headers	5 years ago
Mike Fährmann	edc080468d	[instagram] make 'video_url' fields optional (fixes #479) [ci skip]	5 years ago
Mike Fährmann	9fdc5e74cb	[deviantart] ensure consistent username capitalization (#455) The 'username' field was capitalized in a very inconsistent manner: Either all lowercase, or as given by the input URL, or with the "original" capitalization, depending on the extractor used among other things. Now usernames use their original capitalization for all extractors. ('UserName' instead of 'username' or 'uSeRnAmE')	5 years ago
Mike Fährmann	b1f0609de5	[newgrounds] rewrite (#394) - restructure extractor hierarchy - extract more metadata - extract videos without youtube-dl - be more resilient to errors TODO: - favorites - games, but that might be near impossible for non-flash titles	5 years ago
Mike Fährmann	3ece3976ae	[newgrounds] implement login support (#394)	5 years ago
Mike Fährmann	3a07c06865	[newgrounds] update - create directory per post - rename variables and methods	5 years ago
Mike Fährmann	5513b66eb0	[vsco] fix user profile extraction	5 years ago
Mike Fährmann	abfcb356fc	[flickr] support 3k, 4k, 5k, and 6k photo sizes (closes #472)	5 years ago
Mike Fährmann	521fcd2eb9	[imgbb] fix error in galleries without user info (closes #471)	5 years ago
Mike Fährmann	8061263d4c	[imgbb] improve pagination logic - avoid unnecessary API calls for small or empty galleries - combine duplicate code	5 years ago
Mike Fährmann	da6789b2b0	disable unique archive id checks for some tests - same image twice in a livedoor blog post - unreliable results for related pinterest items	5 years ago
Mike Fährmann	b0197098e6	[imgur] get title from webpage if missing in API response (closes #467)	5 years ago
Mike Fährmann	dd5d2b2eac	[deviantart] add user profile extractor (#377, #419)	5 years ago
Mike Fährmann	a437e78620	[deviantart] minimize cookie usage during scraps extraction (#445)	5 years ago
Mike Fährmann	1a197d2195	store the original cookiejar as Extractor._cookiejar	5 years ago
Mike Fährmann	de83ae4576	make 'method' argument of Extractor.request keyword-only	5 years ago
Mike Fährmann	4325695d74	[luscious] expand GraphQL queries	5 years ago
Mike Fährmann	94dbdbf506	[nijie] change default filename format … to be consistent with Pixiv filenames	5 years ago
Mike Fährmann	c18fadc221	[instagram] extract videos without youtube-dl (#391)	5 years ago
Mike Fährmann	f15eedb634	[sexcom] set Referer header for file downloads (closes #464)	5 years ago
Mike Fährmann	2a3bd4e3c7	rename extractor classes starting with a digit	5 years ago
Mike Fährmann	b3b9da6d74	[photobucket] replace test URL The other user deleted all of his images.	5 years ago
Mike Fährmann	64786363be	[4chan] simplify - remove 'chan.py' - slight adjustments to directory and filenames	5 years ago
Mike Fährmann	557e2c018b	[8chan] remove module	5 years ago
Mike Fährmann	e14782a948	[instagram] simplify graphql extraction for post pages	5 years ago
Mike Fährmann	c01ff78467	[twitter] extend 'videos' option to force extraction with ytdl (closes #459)	5 years ago
Mike Fährmann	f8ac67ce50	[hitomi] extend URL pattern + follow redirects	5 years ago
Mike Fährmann	e877ca97c3	[naver] adjust directory names and metadata structure	5 years ago
Mike Fährmann	702f2fbd1f	[issuu] add publication and user extractors (#413)	5 years ago
Mike Fährmann	8361d874d7	[hitomi] fix extraction	5 years ago
Mike Fährmann	5fa6ff04dd	[instagram] extract '__additionalDataLoaded' (#391) The '_sharedData' of Post pages is missing its 'graphql' part for logged in users. This data is now included in the parameters of a function call to '__additionalDataLoaded(...)' And, of course, video extraction with youtube-dl broke because of this change as well.	5 years ago
Mike Fährmann	87a87bff7e	[simplyhentai] fix image URLs	5 years ago
Mike Fährmann	4409d00141	embed error messages in StopExtraction exceptions	5 years ago
Mike Fährmann	d44f790e81	adjust output for HTTP status related errors	5 years ago
Mike Fährmann	109718a5e3	[blogger] add blog and post extractors (closes #364)	5 years ago
Mike Fährmann	49a6b1b6c0	[twitter] extract video stream info without youtube-dl (#452) This should allow video downloads when logged in without 'forward-cookies' disabled and from protected tweets. youtube-dl still gets used to download HLS playlists, but the data extraction part, which doesn't work with youtube-dl at the moment, now gets handled by gallery-dl itself.	5 years ago
Mike Fährmann	9f0dbf2a72	[twitter] raise proper exception for protected Tweets	5 years ago
Mike Fährmann	6e08ada4fe	[luscious] simplify some metadata entries	5 years ago
Mike Fährmann	9e3a8607ee	[deviantart] update usernames (#455) In the case that a user changed his username, requesting deviations with an old name might cause problems (missing deviations, etc.) The internal 'username' value therefore now gets updated to the current username taken from the user profile.	5 years ago
Mike Fährmann	2eb38810c5	[twitter] fix image extraction when logged in (#452) ... for individual tweets. To get a Tweet page with the old Twitter layout, an Internet Explorer User-Agent (e.g. Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko) as well as a Referer header pointing to the page itself is required. The "app_shell_visited" cookie appears to be optional at the moment, but that is what a regular web browser would send.	5 years ago
Mike Fährmann	8f38a35b91	[imgur] use API with "public" client_id (#446) Using the API endpoints makes it possible to access NSFW content without logging in.	5 years ago
Mike Fährmann	b23c822b23	[luscious] use GraphQL	5 years ago
Mike Fährmann	ef17d94469	update test results	5 years ago
Mike Fährmann	2057c6ba29	[naver] add blog and post extractors (closes #447)	5 years ago
Mike Fährmann	389d2d7e38	implement 'cookies-update' option (#445)	5 years ago
Mike Fährmann	fbc0a6a059	[nozomi] skip unavailable posts (#388)	5 years ago
Mike Fährmann	ae98dbcbb3	[nozomi] implement searching for negated terms (#388) It's incredibly slow and resource intensive (> 1GB of memory), but that is also how it is implemented on nozomi.la itself.	5 years ago
Mike Fährmann	1c03a389df	[twitter] small improvements to search extractor - put search results in separate directories - set 'max_position' to '-1' for first request -> prevent duplicate results - add a test - flake8	5 years ago
Mike Fährmann	c3042978b8	[deviantart] match "/gallery/all" (closes #449 )	5 years ago
Alice	bcddcca6db	Add search downloading to twitter.py (#448) Adds the functionality to download search results on twitter.com/search. Since twitter only allows downloading of up to 3,200 of a user's most recent tweets, you will be unable to download old images from users with a lot of tweets. To bypass this, you can use the twitter search to get the tweets from the sections in time you were stopped at. An example search would be "from:user since:2015-01-01 until:2016-01-01 filter:images". The URL you would use will look something like this https://twitter.com/search?f=tweets&q=from%3Asupernaturepics%20since%3A2015-01-01%20until%3A2016-01-01%20filter%3Aimages&src=typd&lang=en The _tweets_from_api function had to be changed because it would not get the next page of results using the last "data-tweet-id". It would return the same JSON but with a "min_position" string added. Using this string for the "max_position" param from the second page onwards correctly returned the next pages. This change does not interfere with how the other extractors work as far as I know. The 2 regex patterns in the extractors had to be changed to not match the search URL.	5 years ago
Mike Fährmann	1693d97bd3	update extractor class hierarchies - let the GalleryExtractor class inherit directly from Extractor - make ChapterExtractor a subclass of GalleryExtractor - change enumeration field names of GalleryExtractors to 'num'	5 years ago
Mike Fährmann	7ebd984e8d	[imgur] print error message if no JSON data is found (#446)	5 years ago
Mike Fährmann	5882b00f2f	[imgur] implement login support (#446)	5 years ago
Mike Fährmann	91643ca54b	[nozomi] add search extractor (#388)	5 years ago
Mike Fährmann	df2b3c6888	restore OAuth2 authentication error messages	5 years ago
Mike Fährmann	6779512fc7	[nozomi] add post and tag extractors (#388)	5 years ago
Mike Fährmann	6abe5f5bbb	[patreon] fix pagination (#444) The Patreon-provided URLs for the next set of posts aren't always complete, i.e. they can be missing their scheme and the subsequent double slash: "www.patreon.com/…"	5 years ago
Mike Fährmann	d4ffd6c952	[yaplog] improve metadata extraction (#443) - provide a fallback if there is no numerical image ID - add a 'filename' field - convert 'date' to an actual datetime object	5 years ago
Mike Fährmann	15af2f8464	[hitomi] fallback to /reader/ page if main page returns 404 Some galleries return a 404: Not Found error when trying to access them through the main gallery URL, but their content is still available on the respective /reader/ page.	5 years ago
Mike Fährmann	dc6ad81e2e	[yaplog] prevent crash on empty posts (#443)	5 years ago
Mike Fährmann	94eb7c6cad	[deviantart] fix sta.sh extraction (436)	5 years ago
Mike Fährmann	27b5b2497e	[deviantart] fix download URLs (#436) ... except for sta.sh content. Instead of using the old '/api/v1/oauth2/deviation/download' endpoint, which started delivering URLs to 404 pages a while ago, it is also possible to get a download URL from the relatively new '/_napi/da-browse/shared_api/deviation/extended_fetch' endpoint used by DeviantArt's Eclipse interface. The current strategy is therefore: - Iterate over deviations using the OAuth2 API - Fetch original download URLs with the new NAPI/Shared API	5 years ago
Mike Fährmann	93aac8dfea	[yaplog] fix incomplete image URLs (#443)	5 years ago
Mike Fährmann	a782b009b8	[yaplog] match blog names with '-' (#443)	5 years ago
Mike Fährmann	cf5e716b9d	[hitomi] fix image URLs	5 years ago
Mike Fährmann	5a54efa025	[xhamster] unescape 'title' and 'description'	5 years ago
Mike Fährmann	1b9bf4fc6e	[behance] fix 'tags' extraction	5 years ago
Mike Fährmann	bb97e87989	[komikcast] ignore banner image	5 years ago
Mike Fährmann	0ff90a3f7d	[gfycat] include title in default filenames (closes #434)	5 years ago
Mike Fährmann	de4e2029d1	[nsfwalbum] update test album the old one is no longer available	5 years ago
Mike Fährmann	1faec285d1	[nijie] further improvements (closes #423) - provide a 'user_name' metadata field - usually the same as 'artist_id', except for favorite downloads - extract the whole description text and properly escape HTML entities - fixed an issue with titles or tags containing double quotes	5 years ago
Mike Fährmann	6d0a533d68	[reddit] respect 'comments:0' for single submissions (#429)	5 years ago
Mike Fährmann	803d8f814e	[oauth] update scope for reddit tokens (#428) '/user/<username>/...' requires the 'history' scope to be accessible (https://www.reddit.com/dev/api/#GET_user_{username}_{where})	5 years ago
Mike Fährmann	46ba173ded	[reddit] fix documentation inconsistencies (closes #429) - Require 'reddit.comments' to be a number and convert it to an integer to be extra sure - Link to the README's OAuth section where appropriate	5 years ago
Mike Fährmann	20eb6c401f	[nijie] improvements and fixes (#423) - ignore unavailable image pages - more metadata fields: artist_name, date, tags - rename 'index' to 'num' - improved code structure	5 years ago
Mike Fährmann	d1ea08c67d	[weibo] fixes and improvements - ignore unavailable videos (fixes #427) - handle empty 'geo' fields - consistent metadata fields for images and videos	5 years ago
Mike Fährmann	38d97f3da6	[deviantart] add debug message about API credentials (#424)	5 years ago
Mike Fährmann	80c2104fb5	[deviantart] fix 429 handling if 'fatal' is False (closes #424)	5 years ago
Mike Fährmann	913460240d	[reddit] fix 'extractor.blacklist()' arguments The second argument must support 'append()'.	5 years ago
Mike Fährmann	22bac14452	[pixiv] match '/artworks/' URLs	5 years ago
Mike Fährmann	66cac207ac	[twitter] match and use 'i/web' status URLs	5 years ago
Mike Fährmann	946f2751e2	[reddit] add 'user' extractor (closes #350)	5 years ago
Mike Fährmann	c14abb9fb8	[reddit] improve URL parameter handling for subreddit links	5 years ago
Mike Fährmann	ee8b654464	[instagram] implement 'highlights' option (closes #329)	5 years ago
Mike Fährmann	f63c3097a9	[instagram] rework some code paths - combine fetching an HTML page and extracting its 'shared_data' - move 'shared_data' and field access info out of '_extract_page()' - introduce a '_request_graphql()' method	5 years ago
Mike Fährmann	4330133114	[imgur] add 'favorite' extractor (closes #420) … and use a newer site-internal API endpoint for user posts	5 years ago
Mike Fährmann	ee5e20221f	[imgth] fix image URLs	5 years ago
Mike Fährmann	b63b126808	[hentaicafe] extend URL pattern	5 years ago
Mike Fährmann	d780f0357e	[imgur] add user extractor	5 years ago
Mike Fährmann	11ea689013	[simplyhentai] fix image and video URLs	5 years ago
Mike Fährmann	15632a1570	[tsumino] fix extraction	5 years ago
Mike Fährmann	d92802fd37	[luscious] fix detection of unavailable galleries	5 years ago
Mike Fährmann	f99da2b866	[imgbb] detect invalid album and user profile links and update test results, since the old album got deleted	5 years ago
Mike Fährmann	01bc7adadc	[deviantart] improve journal detection (#419) Some journal-like posts are not reported to be journals (isJournal is set to False), even though they have a textContent field. https://www.deviantart.com/gliitchlord/art/brashstrokes-812942668	5 years ago
Mike Fährmann	6e12907de6	[deviantart] improve handling of private deviations (#414) - don't try to call '/deviation/metadata' with an empty list of deviation ids - print a warning when detecting private deviations without having a 'refresh-token'	5 years ago
Mike Fährmann	e7690ac694	[vsco] update URL pattern (closes #410)	5 years ago
Mike Fährmann	1848788970	update test results etc	5 years ago
Mike Fährmann	d5fbb2d9de	[tumblr] ignore audio links from Spotify etc.	5 years ago
Mike Fährmann	b1cddce865	Revert "[simplyhentai] fix extraction; remove image+video extractors" This reverts commit `d1db5180ab`.	5 years ago
Mike Fährmann	d23660c04d	[hentaicafe] restore default 'request()' behavior	5 years ago
Mike Fährmann	9ae58a6b3e	[exhentai] update image limit checks - adjust cost of original images - delay limit initialization until gallery and first image page have been requested and all cookies are available	5 years ago
Mike Fährmann	6fe9a134bf	[lineblog] add blog and post extractors (closes #404 )	5 years ago
Mike Fährmann	4e8a548a61	[livedoor] update metadata extraction	5 years ago
Mike Fährmann	f9285f99e6	[pixiv] fix authentication	5 years ago
Mike Fährmann	6f3df3999a	[fuskator] add gallery and search extractor (closes #407 )	5 years ago
Mike Fährmann	bc0ca66c99	[twitter] small improvements - handle reply tweets (#403) - unset cookies in Tweet extractor to "force" the legacy interface	5 years ago
Mike Fährmann	f02a768b5c	[danbooru] add 'ugoira' option (#406 ) to choose between ZIP archives or converted video files for Ugoira posts	5 years ago
Mike Fährmann	dedea3b4db	[deviantart] fix journal creation (#400 )	5 years ago
Mike Fährmann	c6c5cb1898	improve 'deviantart.quality' description	5 years ago
Mike Fährmann	efb64ad031	[deviantart] generate filenames (#392 , #400 )	5 years ago
Mike Fährmann	b2151f3928	[seiga] support mobile URLs (closes #401 )	5 years ago
Mike Fährmann	20fd2d8450	[flickr] skip unavailable images/videos (fixes #398 )	5 years ago
Mike Fährmann	5cc7be2536	[piczel] update and improve - use proper pagination (fixes #396) - update API host and endpoints - "fix" double slash // in image URLs	5 years ago
Mike Fährmann	49f6d7176d	[deviantart] restore filenames (#392 ) <title>_by_<user>_<id> --> <title>_by_<user>-<id>	5 years ago
Mike Fährmann	63daa68d67	[deviantart] improvements (#392 ) - consistent 'filename' entries, at least as far as possible - GIFs and SWFs don't have a <title>_by_<artist>_<id> anywhere in their metadata - Generating <id> (from 'deviationid'?) might be something that needs to be figured out, so we can build those filenames ourselves - better code structure etc. - tests for videos, archives, and flash animations	5 years ago
Mike Fährmann	d1db5180ab	[simplyhentai] fix extraction; remove image+video extractors	5 years ago
Mike Fährmann	30d6e284b0	[deviantart] use NAPI for artworks and scraps (#392 ) TODO: - journal downloads - test for all media types	5 years ago
Mike Fährmann	7d6af936c5	[imgur] simplify gallery extraction	5 years ago
Mike Fährmann	51d10783fc	[patreon] include image info in API results (#383 )	5 years ago
Mike Fährmann	7a5e78741c	[booru] build directory path for each file (#385 )	5 years ago
Mike Fährmann	b1728f512d	[patreon] support multi image posts and post URLs (#383 )	5 years ago
Mike Fährmann	c50d60a53d	[reactor] fix image URLs	5 years ago
Mike Fährmann	32447d0d24	[pixiv] simplify default filename format (#366)	5 years ago
Mike Fährmann	829b1ccf04	[imgur] distinguish album and gallery URLs (#380 ) A gallery can be either an album or a single image.	5 years ago
Mike Fährmann	23251356cb	require 'extension' data for each URL (#382 )	5 years ago
Mike Fährmann	a67413d64f	[xhamster] use input URL domain Don't rewrite all URLs as 'https://xhamster.com/...'	5 years ago
Mike Fährmann	423f68f585	[deviantart] fix scraps extraction (closes #376 )	5 years ago
Mike Fährmann	3bf20ffb70	[instagram] add support for story highlights	5 years ago
Mike Fährmann	a732e9c430	[instagram] update query hashes and headers	5 years ago
Mike Fährmann	2ccf6a9e35	[instagram] make extractor tests happy (#373 )	5 years ago
Leonardo Taccari	bc5eaf7746	[instagram] Add support for IGTV (#373 ) Add support for IGTV profile (instagram.com/<username>/channel/) and IGTV medias (instagram.com/tv/<short_id>).	5 years ago
Mike Fährmann	eb7da159e2	[imagebam] update URL test results Image URLs are now using https://, but the website itself is still served as http://.	5 years ago
Mike Fährmann	189acbeac9	[imgbb] add extractor for individual images (closes #363 )	5 years ago
Mike Fährmann	ad3ac02fbc	[pixiv] update metadata entries (#366 ) - change 'num' to a simple enumerating integer - change default filename format - provide content of the old 'num' field as 'suffix' - add 'filename' for ugoira	5 years ago
Mike Fährmann	1ff4c4ec03	[adultempire] consistent artist order	5 years ago
Leonardo Taccari	2df050e627	[instagram] Add support for stories (#371 ) * [instagram] Add support for stories Add support for Instagram user's stories (https://www.instagram.com/stories/<username>/). First the shared_data in instagram.com/stories/<username> is fetched in order to retrieve the user_id that is then passed to fetch the stories via the corresponding graphql query. Please note that fetching stories is supported only when authentication is enabled and the corresponding <username> is followed. * [instagram] Add an only-matching test for stories * [instagram] Simplify InstagramExtractor.items() and _extract_stories() Simplify handling of typename in InstagramExtractor.items() and multi-line string in _extract_stories(). NFCI.	5 years ago
Mike Fährmann	f4bc75e854	fix rate limit handling for OAuth APIs (#368 )	5 years ago
Mike Fährmann	3957d27d79	[deviantart] add 'quality' option (#369 )	5 years ago
Mike Fährmann	64b2935d8e	[pixiv] provide 'filename' and change default filename format to '{filename}.{extension}' (closes #366)	5 years ago
Mike Fährmann	fa60109e97	[exhentai] don't use e-hentai.org for exhentai URLs	5 years ago
Mike Fährmann	4a0c98bfc9	miscellaneous fixes and adjustments	5 years ago
Mike Fährmann	2c839f3760	[imgbb] add user extractor + login support (#361 )	5 years ago
Mike Fährmann	2153206093	[imgbb] add album extractor (#361 )	5 years ago
Mike Fährmann	beb4fab2e6	[exhentai] improve limit and error handling (#360 ) - check image limit before opening the first gallery or image page - prevent any further exhentai extractors from running after the image limit has been reached	5 years ago
Mike Fährmann	81b35ed3cb	[exhentai] catch more error states (#356 , #360 ) - warn on MPV-enabled galleries - catch parsing errors for gallery pages and image info - write page content to debug output	5 years ago
Mike Fährmann	6ce22f606b	[exhentai] update login procedure and tests Logging in now follows the natural login flow that also happens in a browser more closely and collects more cookies than just ipb_member_id and ipb_pass_hash. Test URLs have been updated and now point to the e-hentai.org domain.	5 years ago
Mike Fährmann	dc73d02d87	[exhentai] always use e-hentai.org as domain + set nw cookie	5 years ago
Mike Fährmann	40637556fa	[ngomik] fix extraction	5 years ago
Mike Fährmann	3969f9cbbd	[behance] fix collection extraction	5 years ago
Mike Fährmann	17a3426845	[gelbooru] enable all content when not using API	5 years ago
Mike Fährmann	279db2c5b2	[vsco] add collection & image extractor + video support (#331 )	5 years ago
Mike Fährmann	d9d44ad953	[tsumino] update test results	5 years ago
Mike Fährmann	60cf40380a	[vsco] add user extractor (#331 )	5 years ago
Mike Fährmann	3fe5ccdfa6	[adultempire] add gallery extractor (closes #340 )	5 years ago
Mike Fährmann	5d968412ca	[deviantart] case-insensitive folder name matching (fixes #343 )	5 years ago
Mike Fährmann	a3c736fedc	[500px] fix extraction Maximum available image dimensions have been reduced to 4096px on the longest edge. (from 5000px) A few (unimportant) metadata fields are no longer available or have been changed to 'null'.	5 years ago
Mike Fährmann	1133b7fcbd	[smugmug] update unit tests The account used for tests before has been deleted.	5 years ago
Mike Fährmann	21991acc49	add 'ciphers' option; update default User-Agent	5 years ago
Mike Fährmann	84f4d3bc0b	replace urllib3's default cipher list with Firefox's (#342 ) Avoids Cloudflare CAPTCHAs on both Linux and Windows without pyOpenSSL installed.	5 years ago
Mike Fährmann	feb98cf196	[twitter] improve 'content' formatting; add option (#338 ) - include emoticons - leave newlines intact - remove pic.twitter.com/ links at the end	5 years ago
Mike Fährmann	8d1ae9b715	[tumblr] enable date-min/-max/-format options (#337 )	5 years ago
Mike Fährmann	09f37fde39	[reddit] move date-min/-max handling into Extractor class	5 years ago
Mike Fährmann	0151e250f5	[twitter] extract 'content' metadata (closes #333 )	5 years ago
Mike Fährmann	56c7a66a4a	detect Cloudflare CAPTCHAs and update cipher list	5 years ago
Mike Fährmann	a7b42b37a2	[35photo] fix extraction	5 years ago
Mike Fährmann	04b8d0894a	[newgrounds] improve metadata extraction	5 years ago
Mike Fährmann	12da6bd0c9	[simplyhentai] fix/improve extraction	5 years ago
Mike Fährmann	fdec59f8e2	replace extractor.request() 'expect' argument with - 'fatal': allow 4xx status codes - 'notfound': raise NotFoundError on 404	5 years ago
Mike Fährmann	2ff73873f0	[erolord] add gallery extractor (closes #326 )	5 years ago
Mike Fährmann	b4da8c5a97	[sexcom] add extractor for related pins (#325 )	5 years ago
Mike Fährmann	69997e92db	[sexcom] skip unavailable pins (#325 )	5 years ago
Mike Fährmann	bc6b0cfddc	[shopify] skip consecutive duplicate products Not filtering duplicate URLs anymore caused the archive ID uniqueness test to fail.	5 years ago
Mike Fährmann	b89f0d8d3c	update extractor result tests	5 years ago
Mike Fährmann	69205df68d	allow '-1' for infinite retries (#300 )	5 years ago
Mike Fährmann	f7b5c4c3e7	use values of 'retries' options correctly The RE-tries option now specifies exactly that: the maximum number of times a failed HTTP request is re-tried. For example, a value of 2 will now correctly stop after 3 attempts: the initial one + 2 re-tries. The maximum wait-time now also caps at 30min and increases exponentially for both extractor.request() and downloader.http.download().	5 years ago
Mike Fährmann	40da44b17f	Merge branch 'v1.9.0'	5 years ago
Mike Fährmann	7a99e85943	[kissmanga] fix download URLs and file extensions The current Blogspot image URLs hosted on Kissmanga end with an "invalid" query parameter (/000.png&upx=...), which doesn't get recognized by 'spliturl()' and 'parseurl()' as such and gets therefore included in the 'extension' field from 'text.nameext_from_url()'.	5 years ago
Mike Fährmann	055102431f	[hitomi] handle Game CG galleries with scenes (fixes #321 )	5 years ago
Mike Fährmann	a9c89085fb	[instagram] implement login support (#195 )	5 years ago
Mike Fährmann	7856e5e7dc	[deviantart] "fix" scraps extraction	5 years ago
Mike Fährmann	082cb24acd	[pururin] fix extraction Missing metadata information would lead to unnecessary exceptions.	5 years ago
Mike Fährmann	98554cbab8	[mangoxo] fix login	5 years ago
Mike Fährmann	108963d138	[imagefap] include Referer headers	5 years ago
Mike Fährmann	e314621366	[nsfwalbum] fix default directory_fmt (#287 )	5 years ago
Mike Fährmann	18a1f8c6cd	[vanillarock] add post and tag extractors (closes #254 )	5 years ago
Mike Fährmann	f0c5093812	[nsfwalbum] add album extractor (closes #287 )	5 years ago
Mike Fährmann	61e413d85d	[hentaifoundry] stop disabling IPv6 addresses The rogue address mentioned in `a138d58` is no longer included in the DNS results for www.hentai-foundry.com.	5 years ago
Mike Fährmann	76ae9957c2	[deviantart] force legacy version for single deviations Let's see how long this works ... DeviantArt is rolling out a new version of their website, including a new internal and potentially usable API (rewrite incoming, yay). The issue with the new layout is that it doesn't include the "old" UUIDs for single deviations, i.e. mapping a numeric deviation ID to its UUID counterpart is impossible with the new layout.	5 years ago
Mike Fährmann	520c8ba106	[hentaicafe] extract 'tags' and 'artist' metadata (closes #238 ) These metadata fields will only be filled in when using a top-level URL, because that's the only place this information is available. Using a Foolslide URL (1) will leave these fields empty. (1) https://hentai.cafe/manga/read/.../en/0/1/	5 years ago
Mike Fährmann	b51baa9a4b	[hitomi] fix empty language detection; parse datetime	5 years ago
Mike Fährmann	258e8b2060	[deviantart] small code improvements	5 years ago
Mike Fährmann	a77340c647	[keenspot] fix extraction for "TwoKinds"	5 years ago
Mike Fährmann	03e6876fbe	[instagram] provide 'description' metadata (#310 )	5 years ago
Mike Fährmann	ec3e8601f1	[slickpic] add user extractor (#249 )	5 years ago
Mike Fährmann	97ef416218	[8muses] support multi-page listings (#305 )	5 years ago
Mike Fährmann	f5961ac968	[deviantart] download deviations with no 'content' field Some deviations (possibly only from sta.sh sources) are downloadable (i.e. 'is_downloadable' is true and /deviation/download/ works), but have no 'content' or similar in their JSON representation. (fixes #307)	5 years ago
Mike Fährmann	4e07f99e3e	[mangoxo] change token message level to debug The login page currently doesn't provide and require a login token (logging in works without a token), so printing a warning during each login is unnecessary.	5 years ago
Mike Fährmann	d997c10320	[8muses] add album extractor (#305 )	5 years ago
Mike Fährmann	e05a96db5e	[deviantart] rename 'stash' to 'extra' (#302 ) 'stash' is already used as a name for the StashExtractor and therefore expected to be a dictionary.	5 years ago
Mike Fährmann	2184e3a86b	[slickpic] add album extractor (#249 )	5 years ago
Mike Fährmann	c23bf263fe	[deviantart] rename 'external' to 'stash' (#302 ) restrict extracted URLs to ones from https://sta.sh/...	5 years ago
Mike Fährmann	c73c2cda50	[pornhub] add gallery & user extractor (#282 )	5 years ago
Mike Fährmann	7c6cb908f9	[xhamster] update test results	5 years ago
Mike Fährmann	2fb85178da	[deviantart] add 'external' option (#302 ) If a description is available, this will extract URLs from the description text and try to find Extractors for them.	5 years ago
Mike Fährmann	f85e42cffc	[deviantart] fix --range for deviation & stash extractor	5 years ago
Mike Fährmann	40c7eb3424	[livedoor] improve extraction (fixes #301 )	5 years ago
Mike Fährmann	62335b9015	[paheal] adjust test results	5 years ago
Mike Fährmann	aa1ca4ed35	[shopify] skip deleted products (#175 ) Product pages which return a 4xx status code will now be skipped instead of raising an exception.	5 years ago
Mike Fährmann	096009367b	[xhamster] add gallery & user extractor (#281 )	5 years ago
Mike Fährmann	208202b962	[tumblr] improve error handling (#297 ) In some cases Tumblr's API responds with an HTML document. Trying to decode it as JSON would raise an uncaught exception.	5 years ago
Mike Fährmann	c08c340178	[directlink] make pattern case insensitive (fixes #296 )	5 years ago
Mike Fährmann	95b4a53b9c	[keenspot] improve pagination (#223 ) The old code would skip the last comic page for some series.	5 years ago
Mike Fährmann	731c7cbd5b	[keenspot] support all comics and "random" access (#223 )	5 years ago
Mike Fährmann	6a34f4b0c1	skip tests on read timeouts; print list of skipped tests	5 years ago
Mike Fährmann	1c36e65e9b	[exhentai] choose site version depending on input URL (#278 ) Use e-hentai.org as root and cookiedomain if the input URL is from e-hentai (or g.e-hentai), use exhentai.org otherwise.	5 years ago
Mike Fährmann	6da3e21237	[downloader:ytdl] provide 'filename' metadata (closes #291 )	5 years ago
Mike Fährmann	d33f5a7423	[wallhaven] rewrite - use API - remove login support, add 'api-key' option - remove support for "alpha" subdomain - alpha.wallhaven.cc used numeric IDs that can't be translated to the new ID system - support direct links to wallpapers	5 years ago
Mike Fährmann	5499934ae2	[ngomik] fix extraction	5 years ago
Mike Fährmann	f1893b2b5b	[deviantart] add 'folders' option (#276 )	5 years ago
Mike Fährmann	c849574def	[keenspot] add comic extractor (#223 ) Doesn't work for - http://brawlinthefamily.keenspot.com/ - http://flipside.keenspot.com/ - http://lastblood.keenspot.com/ - http://mysticrevolution.keenspot.com/ - http://porcelain.keenspot.com/ - http://twokinds.keenspot.com/ yet, because of custom layouts.	5 years ago
Mike Fährmann	8bd5a19515	[hentainexus] add '_extractor' data	5 years ago
Mike Fährmann	2a085a5e96	[sankakucomplex] fix 'date' values (#258 )	5 years ago
Mike Fährmann	bcd1801aa8	[sankakucomplex] add 'tag' extractor (#258 )	5 years ago
Mike Fährmann	74c2415138	[sankakucomplex] move article extractor to its own module (#258 )	5 years ago
Mike Fährmann	4465a3ea68	[kissmanga][readcomiconline] add 'captcha' option (#279 ) to configure how to handle CAPTCHA page redirects: - either interactively wait for the user to solve the CAPTCHA - or raise StopExtraction like before	5 years ago
Mike Fährmann	1e3e15c4f3	[sankaku] add article extractor (#258 )	5 years ago
Mike Fährmann	48233f00c0	[readcomiconline] detect 'AreYouHuman' redirects (#279 )	5 years ago
Mike Fährmann	1cde38110d	[livedoor] return 'date' as datetime object	5 years ago
Mike Fährmann	e88824e1a7	[livedoor] fix adjustments for https:// URLs	5 years ago
Mike Fährmann	b3e4664715	[hentainexus] fix extraction	5 years ago
Mike Fährmann	399e8e965a	also update urllib3's cipher list for versions >= 1.25	5 years ago
Mike Fährmann	f837ea98cb	[deviantart] don't call 'extend()' on folders (fixes #271 )	5 years ago
Mike Fährmann	bb32a2d490	[patreon] use file extensions from original filenames (#268 )	5 years ago
Mike Fährmann	efa805c5d7	[sankaku] update pagination end condition (fixes #265 ) Pagination over popular listings (`date:...+order:popular`) never terminates, not even on the site itself, and at some point returns the same results over and over again.	5 years ago
Mike Fährmann	a4ba34c835	[booru] prevent crash when no tags are present (#259 )	5 years ago
Mike Fährmann	ca3bad1779	[patreon] small fixes and adjustments (#226 ) - fix datetime parsing - rename 'user' to 'creator' - convert 'id' to integer - improve tests	5 years ago
Leonardo Taccari	fb09dd962a	[instagram] Fix extraction after `rhx_gis' field removal	5 years ago
Mike Fährmann	7a14aaed7d	[luscious] fix extraction	5 years ago
Mike Fährmann	e82cadac61	[patreon] add extractors (#226 )	5 years ago
Mike Fährmann	4891f4a328	[hentainexus] add search extractor (#256 )	5 years ago
Mike Fährmann	c02f12ce2f	avoid Cloudflare CAPTCHAs for OpenSSL < 1.1.1 see https://github.com/Anorov/cloudflare-scrape/pull/242	5 years ago
Mike Fährmann	0b4be57a10	[sankaku] fix error when no tags available (closes #259 ) [ci skip]	5 years ago
Mike Fährmann	9890bfdf23	[flickr] improve code and metadata - simplify pagination - add more metadata and slightly change its structure - convert suitable values to int or list - move keys from ["photo"] to the base level - proper video support (#246) - rename method and variable names to better fit with other extractors	5 years ago
Mike Fährmann	aa8e366b90	[luscious] fix tag extraction	5 years ago
Mike Fährmann	ba8eb1ffec	[hentainexus] add gallery extractor (#256 )	5 years ago
Mike Fährmann	b1db194c14	[reactor] update and improve - split 'tags' into a list - parse 'date' into a datetime object - fix webm/mp4 URLs	5 years ago
Mike Fährmann	b0e85a42e3	apply workaround from `4736912` in parse_datetime() itself	5 years ago
Mike Fährmann	8de5866fd2	[twitter] replace unit test URLs https://twitter.com/PicturesEarth was deleted	5 years ago
Mike Fährmann	74c7304c6b	[newgrounds] extract 'date', 'favorites', and 'score'	5 years ago
Mike Fährmann	4736912d4e	[pixiv] work around strptime limitations in Python < 3.7 "%z" doesn't allow a colon separator in older Python versions: - "+0900" is OK - "+09:00" raises an exception	5 years ago
Mike Fährmann	1f7fa9dc8e	[exhentai] update data extraction code - parse 'date' to datetime object - use 'text.extract_from()'	5 years ago
Mike Fährmann	80fdb11508	[pixiv] add 'date' metadata field (closes #248 )	5 years ago
Mike Fährmann	049e9fd6ce	[twitter] fix pagination end condition Some timelines would cause an endless loop because 'has_more_items' is always True, even if it would return the same list of tweets over and over again.	5 years ago
Mike Fährmann	51e0e92429	[deviantart] fix GIF downloads (#242 ) The "original" download URL for GIF animations is only a preview version of the original file.	5 years ago
Leonardo Taccari	f347d2d152	[instagram] Fix for missing `edge_media_to_comment' field and add `date' metadata (#250 ) * [instagram] Remove no longer always present `comments' field `edge_media_to_comment' is no longer always present in the response (also for the same media sometimes is present and sometimes is not present). * [instagram] Add `date' metadata	5 years ago
Mike Fährmann	5fd94c6b83	import urllib3 from requests.packages	5 years ago
Mike Fährmann	35f343206c	update default SSL cipher list in urllib3 < 1.25 Cloudflare now also checks the client's SSL/TLS cipher capabilities and produces a 403: Forbidden response with CAPTCHA if they are insufficient. This commit replaces the default cipher list in urllib3 < 1.25 with the one from 1.25 (1), which doesn't cause problems as long as the client platform actually supports these ciphers. On some platforms (tested with Python 3.4 on Linux and Python 3.7 on an outdated Windows 7 VM) it is necessary to install pyOpenSSL to get everything to work. Explicitly setting a minimum/maximum version for urllib3 is also no longer necessary and installing gallery-dl will therefore not pull an incompatible urllib3 version (#229). Fixes the "403: Forbidden" error on Artstation (#227) (1) `0cedb3b0f1`	5 years ago
Mike Fährmann	fc5e4f2b21	[hitomi] simplify data extraction code	5 years ago
Mike Fährmann	2756cc8dde	[hitomi] set Referer header (fixes #239 )	5 years ago
Mike Fährmann	dcc1592dbf	[twitter] add fallback URLs (#237 )	5 years ago
Mike Fährmann	1c665fd4bd	[mangoxo] fix login	5 years ago
Mike Fährmann	add7e693d0	[tumblr] provide parsed 'date' metadata (#232 )	5 years ago
Mike Fährmann	9544683d56	[deviantart] provide 'date' metadata (#232 )	5 years ago
Mike Fährmann	0d7e8be987	[dynastyscans] simplify image extractor	5 years ago
Mike Fährmann	9aa0bb5afe	[dynastyscans] encode "[]" in search queries urllib3 1.25 classifies URLs with unencoded "[" or "]" as invalid and raises an exception	5 years ago
Mike Fährmann	fe849382d8	[komikcast] improve extraction	5 years ago
Mike Fährmann	0318c610dc	[sexcom] add extractor for search results (#147 )	5 years ago
Mike Fährmann	a247c94c34	[sexcom] add pin and board extractors (#147 )	5 years ago
Mike Fährmann	6264a46212	use 'utcfromtimestamp()' 'fromtimestamp()' converts its results to the local timezone and causes problems when running tests on a different machine.	5 years ago
Mike Fährmann	d84e7c6861	[twitter] extract 'date' metadata (#224 )	5 years ago
Mike Fährmann	f2cf1c1d73	use 'text.extract_from()' in a few places	5 years ago
Mike Fährmann	e25ebc4bff	don't disable certificate checks anymore Executables generated with PyInstaller auto-include the root certificate file and certificate checks now work out-of-the-box.	6 years ago
Mike Fährmann	70be494161	[plurk] add a 'comments' option (#212 )	6 years ago
Mike Fährmann	0b2ff406f6	[plurk] add timeline- and post-extractors (#212 )	6 years ago
Mike Fährmann	d6ddb74cde	update test results - deviantart: 'index' is now an integer - flickr: image file with lower quality - paheal: image server name changed - rule34: post got deleted	6 years ago
Mike Fährmann	87b0929bec	Revert "[flickr] restore image quality" This reverts commit `3f513f1056`. Both live.staticflickr and farmN.staticflickr servers now produce the same image file with a lower overall quality than before this change on Flickr's end.	6 years ago
Mike Fährmann	e7cd5510d5	[pixnet] add extractors (closes #177 ) for: - users/blogs: http://albertayu773.pixnet.net/ - folders: https://albertayu773.pixnet.net/album/folder/1405768 - sets : https://albertayu773.pixnet.net/album/set/15078995 - photos : https://albertayu773.pixnet.net/album/photo/159443828	6 years ago
Mike Fährmann	155e1faeaf	[imagebam] support galleries with >100 images (fixes #219 )	6 years ago
Mike Fährmann	9587aea98f	[deviantart] don't rewrite URLs for newer deviations The '/intermediary/' trick stopped working for recently posted deviations, but it still appears to be functional for older ones.	6 years ago
Mike Fährmann	f2220938cb	[mangoxo] improve channel extraction (#184 )	6 years ago
Mike Fährmann	d9b94a585d	[mangoxo] add login support (#184 ) A very recent change: It is now only possible to see more than the first 5 images of an album if you are logged in.	6 years ago
Mike Fährmann	49a6522c38	ensure consistent headers and params ordering Necessary to avoid being labeled a bot and getting a CAPTCHA response after solving a Cloudflare challenge.	6 years ago
Mike Fährmann	e730fc9045	[twitter] add login support (#214 )	6 years ago
Mike Fährmann	2c32dc76cb	[yaplog] update metadata structure (#190 ) Put all blog post related fields in its own dict. 'image_id' -> 'id' 'post_id' -> 'post[id]' 'title' -> 'post[title]' etc ...	6 years ago
Mike Fährmann	35919a9bb8	[livedoor] add blog- and post-extractors (#190 )	6 years ago
Mike Fährmann	3f513f1056	[flickr] restore image quality Flickr started serving images from live.staticflickr.com (see `ec88ff1`), but the old farmN.staticflickr.com URLs still work - at least for the time being. Filesize (and most likely quality as well) for images from live.… is severely reduced compared to images from farmN.… for non-original files, so all live URLs are replaced to point to a randomly chosen farm server.	6 years ago
Mike Fährmann	060859cc68	fix URL patterns allow https:// as well as http://	6 years ago
Mike Fährmann	13526f3624	[yaplog] fix archive_id and posts with more than 24 images - 'post_id' and 'image_id' are only unique per user - /image/ pages only show a maximum of 24 images, but there can be more images than that in a blog post - let extraction run in its own thread and maybe improve speed - #190	6 years ago
Mike Fährmann	2ff043edfa	[yaplog] add user- and post-extractors (#190 )	6 years ago
Mike Fährmann	790f15a56f	[photobucket] use HTTPS	6 years ago
Mike Fährmann	6da665f32e	[mangoxo] add album- and channel-extractors (closes #184 )	6 years ago
Mike Fährmann	21e80d60ff	[wikiart] docstring fixes	6 years ago
Mike Fährmann	c70b21248d	[wikiart] add extractors (#179 ) for - artists: https://www.wikiart.org/en/thomas-cole - artist-listings: https://www.wikiart.org/en/artists-by-century/12 - artwork-listings: https://www.wikiart.org/en/paintings-by-media/grisaille	6 years ago
Mike Fährmann	0f02e85961	[reactor] use "/full/" URLs (closes #210 ) Putting a "/full/" in image URLs potentially gives higher resolution and better quality.	6 years ago
Mike Fährmann	17c11393f5	[weibo] allow user-ids in status URLs	6 years ago
Mike Fährmann	ec88ff1562	[flickr] relax unit test results Images are now randomly served from the 'live.staticflickr.com' domain instead of the "old" 'farmN.staticflickr.com' one, making it impossible to use static 'url' and 'keyword' hashes as results. Image quality doesn't appear to be affected by which image-server is used. Files from 'farmN' and 'live' are the same.	6 years ago
Mike Fährmann	00d604cafb	[luscious] fix SearchExtractor URL-pattern	6 years ago
Mike Fährmann	1384ebf907	[luscious] fix metadata extraction - remove 'artist', 'language', and 'lang' fields - replace 'section' with 'genre' - provide 'tags' as list - use GalleryExtractor as base class	6 years ago
Mike Fährmann	5398bfbd69	[exhentai] fix search and favorite extraction removes basically all metadata, but that can be compensated for with the right search query. writing "parsers" for all 4 possible views that have been introduced in the latest changes is too much of a hassle ...	6 years ago
Leonardo Taccari	790b1336a6	[instagram] Add support for hashtags Add support for hashtags (TagPage-s), i.e. explore/tags/<tag> URLs. This also introduces a get_metadata() method in order to append possible further metadata per-(sub)extractor. Refactor and generalize _extract_profilepage() to _extract_page() in order to be reused by _extract_profilepage() and _extract_tagpage() simply by passing the type of page (`ProfilePage' or `TagPage') and picking up the respective fields in shared data.	6 years ago
Mike Fährmann	a9bdd0f153	[instagram] fix syntax for Python 3.4 Python 3.4 doesn't like '**common' in dict literals. This also makes '_ytdl_index' zero-based.	6 years ago
Mike Fährmann	eacebf41e4	fix typo in README	6 years ago
Leonardo Taccari	1e38f65996	[instagram] Add support for GraphSidecar media types (#201 ) * [instagram] Add support for GraphSidecar media types Refactor _extract_postpage() to always return a list of medias. Fetch common keywords and gracefully handle GraphSidecar media type by extracting each single media and adding `sidecar_media_id' and `sidecar_shortcode' keywords to indicate the parent of sidecar children. While here join the copyright comment lines in a single one. Closes #178. * [instagram] Use `yield from' instead of `for ... yield' (thanks @mikf)! * [instagram] Adjust filename for GraphSidecar medias Add a possible leading `media_id' of the sidecar for GraphSidecar media. Thanks to @mikf for the suggestion! * [instagram] Add extra metadata for youtube-dl in GraphSidecar childrens GraphSidecar children ytdl: URLs when consumed by youtube-dl redirect to the URL of their parent. In GraphSidecar-s with multiple GraphVideo-s this leads to downloading the same video multiple times. Add a `_ytdl_index' field to indicate the index of the youtube-dl playlist corresponding to the children of the sidecar. This will be used by the `ytdl' downloader.	6 years ago
Mike Fährmann	6ba67b0537	[hypnohub] add extractors (closes #196 )	6 years ago
Mike Fährmann	fe27154a10	[komikcast] fix extraction ... again	6 years ago
Mike Fährmann	5ec55ec4fc	[deviantart] improve URLs for non-downloadable deviations	6 years ago


2006 Commits (25074aec4730064b86c2314f891b25a06900e25b)