gallery-dl

Commit Graph

Author	SHA1	Message	Date
Mike Fährmann	6514828d4e	emit debug logging message when loading cookies from file	2 years ago
Mike Fährmann	9f06e79868	implement '"user-agent": "browser"' (#2636 )	2 years ago
Mike Fährmann	86790da2d5	update Cloudflare IUAM detection again	2 years ago
Mike Fährmann	8b1fe0bcf1	emit debug logging messages before calling time.sleep() (#2982 )	2 years ago
Mike Fährmann	73a52a95b0	update Cloudflare IUAM detection	2 years ago
Mike Fährmann	eb68d45544	add global 'warnings' option (#2762 )	2 years ago
Mike Fährmann	e4f48cc810	make it easier to disable default 'browser' settings Previously it was necessary to set 'browser' to a non-empty, non-string value to disable any default 'browser' value. Now '-o browser=' or '-o browser=false' is enough.	2 years ago
Mike Fährmann	92b75bcdce	limit path length for --write-pages output on Windows (#2733 )	2 years ago
Mike Fährmann	de20cadc68	add 'brotli' as optional dependency (#2716 ) only send 'Accept-Encoding: br' if supported	2 years ago
Mike Fährmann	3a5d5c3a91	update default User-Agent header to Firefox 102 ESR and update headers and ciphers for "browser": "firefox"	2 years ago
Mike Fährmann	535cbcb185	cache extracted browser cookies (in memory, for as long as gallery-dl is running) Extracting encrypted cookies from a chromium-based browser can take a long time, so repeating this process for each extractor should be avoided. Same goes for creating a temporary copy of the entire cookie database.	2 years ago
Mike Fährmann	6742f3bc1e	implement --cookies-from-browser (#1606 ) most of the code is adapted from yt-dlp's implementation and should work the same.	2 years ago
Mike Fährmann	c4b9f7bab8	update functions working with cookies.txt files - rename - load_cookiestxt -> cookiestxt_load - save_cookiestxt -> cookiestxt_store - in cookiestxt_load, add cookies directly to a cookie jar instead of storing them in a list first - other unnoticeable performance increases	2 years ago
Mike Fährmann	3f02e483c6	[e621] fix applying request_interval_min (#2533 ) Setting this property after calling Extractor.__init__() has no effect.	2 years ago
Mike Fährmann	29db716a63	implement 'datetime_to_timestamp()' and rename 'to_timestamp()' to the more descriptive 'datetime_to_timestamp_string()'	3 years ago
Mike Fährmann	500a479026	fix a third(!) bug in _check_cookies() (#2372 ) turns out tests are worthless if you get em wrong ...	3 years ago
Mike Fährmann	47cf05c4ab	refactor proxy handling code (#2357 ) - allow gallery-dl proxy settings to overwrite environment proxies - allow specifying different proxies for data extraction and download - add 'downloader.proxy' option - '-o extractor.proxy=PROXY_URL -o downloader.proxy=null' now has the same effect as youtube-dl's '--geo-verification-proxy'	3 years ago
Mike Fährmann	bddcec49f1	implement 'text.root_from_url()' use domain from input URL for kemono	3 years ago
Mike Fährmann	f5b2b9333f	fix another bug in _check_cookies() (#2160 ) regression introduced in `ed317bfc` Added a couple of tests to hopefully catch such bugs before they land in a release.	3 years ago
Mike Fährmann	ed317bfcf1	warn about cookies expiring in less than 24 hours requires an expiration timestamp, so this only works with cookies from a cookies.txt file	3 years ago
Mike Fährmann	b4f8e15a1f	allow BaseExtractors to use the domain of the matched URL	3 years ago
Mike Fährmann	f58364f6a8	update Firefox cipher list	3 years ago
Mike Fährmann	7e6981dda6	rename 'disabletls12' to 'tls12' and let config options override any default settings	3 years ago
Mike Fährmann	bb3e182562	overhaul session initialization - share adapter & connection pool across sessions with the same ssl options, ssl ciphers, and source address - simplify browser emulation to just a list of headers and ciphers	3 years ago
Robert Pendell	4c651f6252	[patreon] Disable TLS 1.2 by default (#2249 ) Disables TLS 1.2 on Patreon by default.	3 years ago
Robert Pendell	392cf079f7	Add ability to disable TLS 1.2 (#2243 ) Fix for Patreon Cloudflare issues by having only TLS v1.3 or higher establish HTTPS connections This now allows you to disable it on a per-host or global basis. Add disabletls12 as a config option either under extractor.(host) or just under extractor. Option is false by default. Example: "patreon": { "disabletls12": true, "cookies": { "session_id": "X" } }	3 years ago
Mike Fährmann	de754590e0	add --source-address command-line option (closes #2206 )	3 years ago
Mike Fährmann	6f2e0c9c3d	fix cookie checks for patreon, fanbox, fantia The changes in `9a255344` caused a warning about missing cookies to be displayed even if those cookies were present, because _check_cookies() did not account for an empty cookiedomain.	3 years ago
Mike Fährmann	ad30653b17	allow running a BaseExtractor for any URL by prefixing it with '<base-category>:' For example: shopify:https://partakefoods.com/products/crunchy-cookie-variety-pack gelbooru_v01:https://5naf.booru.org/index.php?page=post&s=view&id=46963 Available base categories are: mastodon, shopify, moebooru, gelbooru_v01, gelbooru_v02, reactor, foolslide, foolfuuka, philomena	3 years ago
Mike Fährmann	dad2875a3e	fix calculating retry sleep times (fixes #1990 )	3 years ago
Mike Fährmann	e69ee41f25	implement 'page-reverse' option (#1854 )	3 years ago
Mike Fährmann	c9e6693530	allow specifying a minimum/maximum for 'sleep-*' options (#1835 ) for example '"sleep-request": [5.0, 10.0]' to wait between 5 and 10 seconds between each HTTP request	3 years ago
Mike Fährmann	2ff2974353	[common] update default argument handling in Extractor.request() more lines of code, but slightly less execution time	3 years ago
Mike Fährmann	d79bcb6236	allow extractors to register a 'finalize()' method	3 years ago
Mike Fährmann	bb6a130942	automatically set required DDoS-GUARD cookies (#1779 ) for kemono.party and seiso.party	3 years ago
Mike Fährmann	bd08ee2859	remove most 'yield Message.Version' statements only leave them in oauth.py as noop results	3 years ago
Mike Fährmann	9cb5ea5eda	update default User-Agent headers	3 years ago
Mike Fährmann	0179581340	add 'T' format string conversion (#1646 ) to convert 'date'/datetime to timestamp	3 years ago
Mike Fährmann	94faf8c85a	add type check before applying 'browser' option (fixes #1358 )	4 years ago
Mike Fährmann	6cfc9613fe	update some code in Extractor constructor - combine '_init_headers' and '_emulate_browser' functionality into new '_init_session' - add 'headers' and 'ciphers' options	4 years ago
Mike Fährmann	29ea54dc41	[patreon] use '"browser": "firefox"' by default (#1117 )	4 years ago
Mike Fährmann	cf5fa75d4c	add 'browser' option (#1117 ) - change default user agent to Firefox ESR 78 on Windows 10 - remove 'ciphers' option	4 years ago
Mike Fährmann	e1a12761d7	strip '/' from instance root URLs	4 years ago
Mike Fährmann	d656892670	remove cloudflare.py The old IUAM challenge doesn't get used anymore, i.e. code to bypass it is pointless, and the 'is_...()' checks are simple enough to directly include them in 'extractor.request()'.	4 years ago
Mike Fährmann	88fae99811	remove 'generate_extractors()'	4 years ago
Mike Fährmann	745a114c61	[common] implement BaseExtractor class Should be used when the same extractor logic applies to different instances/domains of several sites, e.g. FoolFuuka, Shopify, etc. This will replace the functionality of 'generate_extractors()' in a more efficient way, by condensing everything into 1 class and not dynamically generating an extractor class for each instance.	4 years ago
Mike Fährmann	0d406c8daf	[common] restrict values used in 'generate_extractors()'	4 years ago
Mike Fährmann	8ca7f54750	rename '_request_…' variables - remove '_' at the beginning - _request_last -> request_timestamp	4 years ago
Mike Fährmann	c57a918f4a	[e621] implement delay via '_request_interval_min'	4 years ago
Mike Fährmann	1e3dd7330e	merge SharedConfigMixin functionality into Extractor	4 years ago
Mike Fährmann	198c33ec36	also collect post processors from 'basecategory' entries (fixes #1084)	4 years ago
Mike Fährmann	1e313d5b84	implement 'sleep-request' option	4 years ago
Mike Fährmann	055c32e0f7	precompute extractor config paths	4 years ago
Mike Fährmann	231dd4c800	accumulate postprocessor objects (#994 ) Instead of one 'postprocessors' setting overwriting all others lower in the hierarchy, all postprocessors along the config path will now get collected into one big list. For example '--mtime-from-date' will therefore no longer cause other postprocessor settings in a config file to get ignored.	4 years ago
Mike Fährmann	f6fd449b59	reduce wait time growth rate from exponential to linear Waiting for 2**N seconds after each error grows too fast. Simply waiting N seconds seems far more reasonable.	4 years ago
Mike Fährmann	2c9766b29f	fix UnboundLocalError in Extractor.request() introduced in `d6a271d`	4 years ago
Mike Fährmann	d6a271d2c7	add 'response' objects to 'HttpError's	4 years ago
Mike Fährmann	53cc498d9c	improve config lookup when there are multiple possible locations This specifically applies to all Mastodon extractors and all extractors with a 'basecategory', i.e. 'booru', 'foolslide', etc. Values inside those general config locations wouldn't be recognized when a value with the same name was set on the 'extractor' level. For example 'extractor.mastodon.directory' should be used over 'extractor.directory' when both are set, but this was impossible with the previous implementation. (fixes #843)	4 years ago
Mike Fährmann	1ae1df0d27	update '--write-pages' (#737 ) - fix infinite recursion for responses with multiple entries in 'history' - hide values of Set-Cookie headers - only write the response content by default (use '-o write-pages=all' to also include HTTP headers)	4 years ago
Mike Fährmann	15c3d29062	move dump_response() into a separate function (#737 )	4 years ago
Mike Fährmann	a363da4b43	include redirects and headers in --write-pages dumps (#737 )	4 years ago
Mike Fährmann	3201fe3521	add global SENTINEL object	4 years ago
Mike Fährmann	f8f95e68a7	improve '--write-pages' (#737 ) - move code into its own function - add enumeration index to filenames - dump responses regardless of status code	4 years ago
Vrihub	4cc761c730	Implement --write-pages option (#736 ) * Implement --write-pages option * Fix long lines * Fix file mode to binary * Fix pattern for Windows compatibility	4 years ago
Mike Fährmann	5d7ca76885	retry Cloudflare challenges	4 years ago
Mike Fährmann	d02f7c1118	improve Extractor.wait() - allow 'until' to be a datetime object - do "time calculations" with UTC timestamps - set a default 'reason'	5 years ago
Mike Fährmann	2a4f227e08	warn about expired cookies	5 years ago
Mike Fährmann	56f1c96168	implement 'parent-directory' option (#551 )	5 years ago
Mike Fährmann	2a9be48511	improve util.load/save_cookiestxt() and add tests - take a file object as argument instead of a filename - accept whitespace before comments (" # comment") - map expiration "0" to None and not the number 0	5 years ago
Mike Fährmann	c1a6862863	implement functions to load/save cookies.txt files (closes #586 ) The methods of the standard library's MozillaCookieJar have several shortcomings (#HttpOnly_ cookies, 0 expiration timestamps, etc.) and require construction of an ultimately pointless CookieJar object.	5 years ago
Mike Fährmann	bd5ce9855c	allow GalleryExtractors to set URL-independent extensions	5 years ago
Mike Fährmann	3811fd8a25	fix time formatting for Python 3.4 and 3.5 'datetime.time.isoformat()' only has an optional 'timespec' argument since Python 3.6.	5 years ago
Mike Fährmann	569747a78d	implement extractor.wait()	5 years ago
Mike Fährmann	ce54b8c04c	let extractors opt-out of cookie option usage useful to avoid sending unnecessary cookies when all authentication is done through OAuth tokens	5 years ago
Mike Fährmann	d3e44e899d	raise NotFoundErrors for 404 responses in GalleryExtractors	5 years ago
Mike Fährmann	a4dd8b3dab	improve _check_cookies() Only loop over all cookies once instead of calling cookiejar._find() for each cookie name.	5 years ago
Mike Fährmann	15f9bb3d14	add option to disable pyOpenSSL usage (#508 ) (pyOpenSSL is now disabled by default)	5 years ago
Mike Fährmann	e17907ee2a	change default value of 'cookies-update' to 'true'	5 years ago
Mike Fährmann	e2710702d4	fix Cloudflare bypass	5 years ago
Mike Fährmann	ae09f87602	improve SharedConfigMixin config lookups	5 years ago
Mike Fährmann	f5604492c3	update interface of config functions	5 years ago
Mike Fährmann	d45fabb79d	match user profile handling on deviantart and newgrounds	5 years ago
Mike Fährmann	1a197d2195	store the original cookiejar as Extractor._cookiejar	5 years ago
Mike Fährmann	de83ae4576	make 'method' argument of Extractor.request keyword-only	5 years ago
Mike Fährmann	d44f790e81	adjust output for HTTP status related errors	5 years ago
Mike Fährmann	389d2d7e38	implement 'cookies-update' option (#445 )	5 years ago
Mike Fährmann	1693d97bd3	update extractor class hierarchies - let the GalleryExtractor class inherit directly from Extractor - make ChapterExtractor a subclass of GalleryExtractor - change enumeration field names of GalleryExtractors to 'num'	5 years ago
Mike Fährmann	f4bc75e854	fix rate limit handling for OAuth APIs (#368 )	5 years ago
Mike Fährmann	21991acc49	add 'ciphers' option; update default User-Agent	5 years ago
Mike Fährmann	84f4d3bc0b	replace urllib3's default cipher list with Firefox's (#342 ) Avoids Cloudflare CAPTCHAs on both Linux and Windows without pyOpenSSL installed.	5 years ago
Mike Fährmann	09f37fde39	[reddit] move date-min/-max handling into Extractor class	5 years ago
Mike Fährmann	56c7a66a4a	detect Cloudflare CAPTCHAs and update cipher list	5 years ago
Mike Fährmann	fdec59f8e2	replace extractor.request() 'expect' argument with - 'fatal': allow 4xx status codes - 'notfound': raise NotFoundError on 404	5 years ago
Mike Fährmann	69205df68d	allow '-1' for infinite retries (#300 )	5 years ago
Mike Fährmann	f7b5c4c3e7	use values of 'retries' options correctly The RE-tries option now specifies exactly that: the maximum number of times a failed HTTP request is re-tried. For example a value of 2 will now correctly stop after 3 attempts: the initial one + 2 re-tries. The maximum wait-time now also caps at 30min and increases exponentially for both extractor.request() and downloader.http.download().	5 years ago
Mike Fährmann	399e8e965a	also update urllib3's cipher list for versions >= 1.25	5 years ago
Mike Fährmann	c02f12ce2f	avoid Cloudflare CAPTCHAs for OpenSSL < 1.1.1 see https://github.com/Anorov/cloudflare-scrape/pull/242	5 years ago
Mike Fährmann	5fd94c6b83	import urllib3 from requests.packages	5 years ago
Mike Fährmann	35f343206c	update default SSL cipher list in urllib3 < 1.25 Cloudflare now also checks the client's SSL/TLS cipher capabilities and produces a 403: Forbidden response with CAPTCHA if they are insufficient. This commit replaces the default cipher list in urllib3 < 1.25 with the one from 1.25 (1), which doesn't cause problems as long as the client platform actually supports these ciphers. On some platforms (tested with Python 3.4 on Linux and Python 3.7 on an outdated Windows 7 VM) it is necessary to install pyOpenSSL to get everything to work. Explicitly setting a minimum/maximum version for urllib3 is also no longer necessary and installing gallery-dl will therefore not pull an incompatible urllib3 version (#229) Fixes the "403: Forbidden" error on Artstation (#227) (1) `0cedb3b0f1`	5 years ago
Mike Fährmann	e25ebc4bff	don't disable certificate checks anymore Executables generated with PyInstaller auto-include the root certificate file and certificate checks now work out-of-the-box.	6 years ago


229 Commits (aa6d00613f33041dec657988fbc3ed33bb1d5967)