Leonardo Taccari
160328d21c
[instagram] Add support for user's saved media ( #644 )
...
* [instagram] Gracefully handle possible 'HttpErrorPage' in _extract_page()
`HttpErrorPage' is returned in shared_data at least when not authenticated or
when trying to fetch other users' saved media
(i.e. `instagram.com/<user>/saved/').
Gracefully handle it by returning nothing.
* [instagram] Add support for user's saved media
(Please note that this needs the user to be authenticated and they can
only see their own saved media (not other users').)
Close #643 .
* [instagram] Bump copyright year
5 years ago
Mike Fährmann
e0b0e8d62a
release version 1.13.2
5 years ago
Mike Fährmann
d3482ace7f
[furaffinity] extract more metadata
...
- views
- favorites
- comments
- rating
- fa_category (since 'category' is already in use)
- theme
- species
- gender
- width
- height
5 years ago
Mike Fährmann
f6c5edb76b
pre-compile regex pattern for remove_html() and split_html()
5 years ago
Mike Fährmann
fdd2dd5136
[kabeuchi] add 'user' extractor ( closes #561 )
5 years ago
Mike Fährmann
59edcdc822
[hitomi] restore metadata fields from before f33b13a
...
... and add a 'metadata' option to disable
visiting the gallery page and extracting data from it
if this is not needed.
5 years ago
Mike Fährmann
2d5703c493
[twitter] use a simpler data structure to store cookies in cache
...
Use a dict with name-value pairs instead of an entire
RequestsCookieJar object.
5 years ago
Mike Fährmann
87d4f83597
[newgrounds] make post extraction nonfatal
5 years ago
Mike Fährmann
823fbeaae6
[newgrounds] add 'favorite' extractor ( #394 )
5 years ago
Mike Fährmann
a45fbc38ea
[pixiv] implement 'avatar' option ( #595 , #623 )
5 years ago
Mike Fährmann
a63a376ad2
[mangoxo] fix login
5 years ago
Mike Fährmann
ebc70e87ce
[e621] update to new interface / API endpoints ( closes #635 )
5 years ago
Mike Fährmann
d1cf7ccdb3
[instagram] add 'post_shortcode' metadata field ( #525 )
5 years ago
Mike Fährmann
32df8d06fe
[twitter] add 'bookmark' extractor ( closes #625 )
5 years ago
Mike Fährmann
3fb41c34c8
[bcy] reduce requests to '/item/detail/<id>' ( #613 )
...
The former implementation would try to use the embedded data from
'/item/detail/' pages for every post, even if that wasn't really
necessary.
This commit also fixes some issues with posts only visible to
logged in users.
5 years ago
Mike Fährmann
f33b13aacf
[hitomi] simplify metadata extraction
...
Use the data from https://ltn.hitomi.la/galleries/<id>.js for both
image URLs and metadata and ignore any gallery or reader pages.
This removes 'artist', 'characters', 'group', and 'parody' metadata
fields since this information is, as of now, only available in
gallery pages.
5 years ago
Mike Fährmann
115fd2c6f2
"fix" incomplete MIME types ( #632 )
...
e-/exhentai's original image downloads currently send
incomplete/invalid Content-Type headers, "jpg" instead
of "image/jpg" etc, since the last update.
(https://forums.e-hentai.org/index.php?showtopic=236113)
This change prepends any Content-Type value missing a
media type specification with "image/", transforming it
into a valid MIME type.
(A global solution to a local problem, but it shouldn't
cause any issues anywhere else)
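A minimal sketch of the transformation described above, assuming a standalone helper (the name fix_mime_type is hypothetical and not taken from the gallery-dl source): a Content-Type value without a "/" gets "image/" prepended, anything else is passed through unchanged.

```python
def fix_mime_type(content_type: str) -> str:
    """Prepend 'image/' to a Content-Type value that lacks a media type.

    Hypothetical helper for illustration only.
    """
    if "/" not in content_type:
        # "jpg" -> "image/jpg"
        return "image/" + content_type
    # Already of the form "type/subtype"; leave it alone.
    return content_type
```

Because the check only looks for a missing "/", well-formed values such as "image/png" or "application/zip" are not affected.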
5 years ago
Mike Fährmann
72122eb9b3
release version 1.13.1
5 years ago
Mike Fährmann
adcd7cb24a
[downloader:http] add another MIME type for '.rar' files ( #628 )
5 years ago
Mike Fährmann
ce5e2a58fe
[imgbb] update test results
...
Image server domain changed from
https://image.ibb.co/ to https://i.ibb.co/
5 years ago
Mike Fährmann
f117e32910
[danbooru] restore 'popular' functionality
5 years ago
Mike Fährmann
39b48d665b
[hiperdex] use proper name for 'chapter_minor'
5 years ago
Mike Fährmann
8fbbaa54ff
[bcy] fix partial image URLs ( #613 )
...
Images from new posts can have incomplete/partial URLs (1)
without any filename extension when fetching their data from
'/apiv3/user/selfPosts', so now all data gets taken from
'/item/detail/ID' pages.
It is currently unknown how to get the non-watermarked original version
of these images, or if that is possible at all. (2)
Images with a watermark will have their 'filter' metadata field set to
"watermark". For original images this field is an empty string "".
Enabling the 'noop' option will, in addition to the watermarked version,
yield the '~noop.image' filter version (3),
where 'filter' is set to "noop".
(1) "https://img-bcy-qn.pstatp.com/banciyuan/3ccdff22479c4060aadc86718209b281"
(2) "https://p1-bcy.byteimg.com/img/banciyuan/3ccdff22479c4060aadc86718209b281~tplv-banciyuan-logo-v3:wqnpnLLlhZLlpKfprZTnjotfCuWNiuasoeWFgyAtIEFDR-eIseWlveiAheekvuWMug==.image"
(3) "https://p1-bcy.byteimg.com/img/banciyuan/3ccdff22479c4060aadc86718209b281~noop.image"
5 years ago
Mike Fährmann
86c00f9e66
[danbooru] move extractor logic from booru.py
5 years ago
Mike Fährmann
1d4a369ea2
update extractor test results
5 years ago
Mike Fährmann
7625912b31
[piczel] improve and update
...
- fix tag names
- fix a bug in _pagination()
- parse datetime in 'created_at' as 'date'
- rewrite main loop
- replace user profile test
5 years ago
Mike Fährmann
ec85bf90de
use context managers in cache.py & add tests
5 years ago
Mike Fährmann
913b8333cc
write DeviantArt refresh-tokens to cache ( #616 )
...
Writing the token is currently disabled by default and must be
enabled with 'extractor.oauth.cache'.
'extractor.deviantart.refresh-token' must be set to '"cache"'
to use the cached token.
5 years ago
Mike Fährmann
2a4f227e08
warn about expired cookies
5 years ago
Mike Fährmann
34887ae139
fix bugs in DatabaseCacheDecorator.update()/.invalidate()
...
- call db.commit() after changes have been made
- remove 'LIMIT 1' from the DELETE statement in invalidate()
(only available if SQLite3 was compiled with the right flags
enabled, syntax error otherwise)
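A rough sketch of the two fixes, assuming a simple key/value table (the table name, column names, and the invalidate() signature are hypothetical): the DELETE statement carries no 'LIMIT 1', and the change is committed right away.

```python
import sqlite3


def invalidate(db: sqlite3.Connection, key: str) -> None:
    """Remove a cache entry and persist the change."""
    # No 'LIMIT 1' here: 'DELETE ... LIMIT' is only parsed when SQLite
    # was compiled with SQLITE_ENABLE_UPDATE_DELETE_LIMIT.
    db.execute("DELETE FROM data WHERE key = ?", (key,))
    # Without commit() the deletion could be lost when the connection closes.
    db.commit()


# Small in-memory demonstration with a hypothetical schema.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE data (key TEXT PRIMARY KEY, value TEXT)")
db.execute("INSERT INTO data VALUES (?, ?)", ("k", "v"))
db.commit()
invalidate(db, "k")
remaining = db.execute("SELECT COUNT(*) FROM data").fetchone()[0]
```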
5 years ago
Mike Fährmann
380b693fad
[downloader:http] add more MIME types for '.bmp' files ( #621 )
5 years ago
Mike Fährmann
4e361b3008
add tests for specific datetime values
5 years ago
Mike Fährmann
80ecb99089
[hitomi] fix extraction
5 years ago
Mike Fährmann
247c9e1416
[vsco] update gallery URL pattern
5 years ago
Mike Fährmann
19ae6f3fc4
update test results
...
- twitter:
Don't test the whole kwdict, only the actual content, since the
keyword hash changes whenever that user changes his display name.
- khinsider:
Download host changed
5 years ago
Mike Fährmann
cc5079c844
[hiperdex] add chapter and manga extractors ( closes #606 )
5 years ago
Mike Fährmann
64bdec8430
[deviantart] check availability of intermediary URLs ( fixes #609 )
5 years ago
Mike Fährmann
5607dd3646
[hitomi] follow multiple redirects
5 years ago
Mike Fährmann
765b2a0527
[hentaihand] add extractors ( closes #605 )
5 years ago
Mike Fährmann
d94215d119
[tumblr] replace '-' with ' ' in tag searches ( fixes #611 )
...
To search for tags with actual minus signs in them
(there shouldn't be too many), manually replace those
with url-encoded minus characters ('-' -> '%2d')
before inputting them into gallery-dl:
https://s679874.tumblr.com/tagged/tag-with-minus
->
https://s679874.tumblr.com/tagged/tag%2dwith%2dminus
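The effect of this rule can be sketched as follows, assuming the tag is taken from the last URL path segment (the function name tag_from_url_segment is hypothetical): literal '-' characters become spaces first, and percent-decoding afterwards turns '%2d' back into a real minus sign.

```python
from urllib.parse import unquote


def tag_from_url_segment(segment: str) -> str:
    """Convert a '/tagged/<segment>' path segment into a search tag."""
    # Replace literal minus signs before decoding, so that an encoded
    # '%2d' survives as an actual '-' in the final tag.
    return unquote(segment.replace("-", " "))
```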
5 years ago
Mike Fährmann
5cdf1b1319
fix --verbose/--quiet
...
caused by 383795b
5 years ago
Mike Fährmann
78e8d33c97
release version 1.13.0
5 years ago
Mike Fährmann
e6cd49e78b
update extractor test results
5 years ago
Mike Fährmann
90e4c645ba
[formatter] allow multiple "special" format specifiers ( #595 )
...
It is now, for example, possible to specify multiple replacement
operations per format replacement field: {name:Ra/b/Rc/d/}
5 years ago
Mike Fährmann
5d9437b398
[vsco] skip "invalid" entities
5 years ago
Mike Fährmann
650f2b6d58
[furaffinity] accept sfw.furaffinity.net URLs ( closes #608 )
...
Just as an alias for regular URLs with no extra content filtering.
5 years ago
Mike Fährmann
219c4cc78c
[formatter] allow for numeric list and string indices
5 years ago
Mike Fährmann
7d1da614d9
[formatter] implement field name alternatives ( #525 )
...
The format string '{a|b|c}' will now try to use the value from 'a' and
fall back to 'b' and 'c' if accessing a field raises an exception or
if its value is None.
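The fallback behaviour can be sketched like this, assuming the field names are looked up in a plain dict (resolve_alternatives is a hypothetical name; the real formatter handles more than simple key lookups):

```python
def resolve_alternatives(field_spec: str, kwdict: dict):
    """Return the first usable value among 'a|b|c' style alternatives."""
    for name in field_spec.split("|"):
        try:
            value = kwdict[name]
        except KeyError:
            # Accessing the field failed; try the next alternative.
            continue
        if value is not None:
            return value
    # No alternative produced a value.
    return None
```

With kwdict = {"a": None, "c": 3}, the spec "a|b|c" skips 'a' (value is None) and 'b' (missing) and yields the value of 'c'.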
5 years ago
Mike Fährmann
74e684e828
[twitter] change default value for 'videos' to 'true'
...
Every other 'videos' option defaulted to 'true', except Twitter.
5 years ago
Mike Fährmann
c7cf9dd111
[furaffinity] support classic layout ( #284 )
5 years ago
Mike Fährmann
138135c190
[furaffinity] add extractors ( #284 )
5 years ago
Mike Fährmann
b9c574bd1d
[patreon] log skipped files ( #590 )
5 years ago
Mike Fährmann
80ea9104b8
[8kun] adjust URL pattern
5 years ago
Mike Fährmann
c76c8b765a
[cloudflare] unescape challenge URL
5 years ago
Mike Fährmann
ce26070231
[pixiv] reduce calls to '/user/detail'
5 years ago
Mike Fährmann
da0d5f6092
[oauth] add 'port' option ( #604 )
5 years ago
Mike Fährmann
719b63d0ca
[bcy] add user and post extractors ( #592 )
5 years ago
Mike Fährmann
6426e3efc7
[khinsider] fix and improve metadata extraction
5 years ago
Mike Fährmann
4a3d2405de
[postprocessor:ugoira] small optimization
...
Use tuples instead of lists when extending the list of
command-line arguments.
5 years ago
Mike Fährmann
b7eb6cecbb
[pixiv] handle tags at the end of new bookmark URLs
5 years ago
Mike Fährmann
109f6c8685
[patreon] filter duplicate files per post ( #590 )
5 years ago
Mike Fährmann
b38cf59711
[sexcom] fix image URLs & parse 'date' fields
5 years ago
Mike Fährmann
1f4c9c5f9d
[8kun] add thread and board extractors ( closes #582 )
5 years ago
Mike Fährmann
facc5daa6d
[twitter] force old login page layout ( fixes #584 , fixes #598 )
5 years ago
Mike Fährmann
d1de7dc296
[hitomi] implement workaround for "broken" redirects
...
Some galleries redirect to a new "version" with different gallery id.
This new version might not be available any more, but the /reader/
page for the original gallery id can still work.
5 years ago
Mike Fährmann
40fe062851
[pixiv] fix user id for bookmarks API calls ( closes #596 )
5 years ago
Mike Fährmann
91aaaf1a9e
[pixiv] add 'rating' metadata field ( #595 )
...
A human-friendlier representation of 'x_restrict'
5 years ago
Mike Fährmann
dff33b260c
[reddit] add 'videos' option
5 years ago
Mike Fährmann
2ad43618cc
[piczel] fix extraction
5 years ago
Mike Fährmann
cf7a67d67f
[yaplog] remove module
...
Yaplog! ended its service on 2020-01-31
5 years ago
Mike Fährmann
e0dd073ce0
[twitter] replace embedded tweet test
...
the old one was deleted
5 years ago
Mike Fährmann
ec36df4851
[deviantart] fix video extraction from 'extended_fetch' results
...
DeviantArt is now serving videos from wixmp servers (1), instead of
the former film00.deviantart.com (2), even though those URLs are still
functional.
They seem to also have re-encoded those videos. The 10 MB 1080p video
from (2) is now only available in 720p at ~20 MB (with a higher
bitrate, but still …). Other videos are still available in 1080p, but
not this one for some reason.
(Changing the '720p' in (1) to '1080p' doesn't work.)
(1) https://wixmp-ed30a86b8c4ca887773594c2.wixmp.com/v/mp4/9feaa2c9-1baf-4fc2-84f7-f3384b34cefe/d5gxnb5-282a2e9a-b552-40ff-8542-b3c5eed823f5.720p.a837d7cec12c41be8ca2ee53152cea3a.mp4
(2) https://film00.deviantart.net/4c1d/v/mp4/2012/279/d/1/_video____brushes_i_use_in_paint_tool_sai_by_chi_u-d5gxnb5.mp4
5 years ago
Mike Fährmann
48be2266ed
[deviantart] better error message for 'extended_fetch' ( #585 )
5 years ago
Mike Fährmann
383795b550
prevent superfluous calls to Logger.makeRecord()
...
… by setting an appropriate minimal logging level for the root Logger.
5 years ago
Mike Fährmann
71851a6241
[pixiv] update URLs of followed users to the new format
5 years ago
Mike Fährmann
d086f30b42
[reddit] restore archive keys for i.redd.it images
5 years ago
Mike Fährmann
56f1c96168
implement 'parent-directory' option ( #551 )
5 years ago
Mike Fährmann
ae07f92f7e
[reddit] rewrite extractor logic ( closes #551 )
...
Handle images and videos hosted on Reddit "natively",
allowing them to use reddit-specific metadata to build directory
and file names.
5 years ago
Mike Fährmann
2852691d78
[paheal] replace test URL
...
searching for 'k-on' doesn't yield any results anymore
5 years ago
Mike Fährmann
2a9be48511
improve util.load/save_cookiestxt() and add tests
...
- take a file object as argument instead of a filename
- accept whitespace before comments (" # comment")
- map expiration "0" to None and not the number 0
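A simplified sketch of a parser with these three properties, assuming the usual seven tab-separated fields of a cookies.txt line. The function name parse_cookiestxt is hypothetical, and it returns plain dicts rather than cookie objects to stay self-contained.

```python
import io

HTTPONLY_PREFIX = "#HttpOnly_"


def parse_cookiestxt(fp):
    """Parse cookies.txt content from an open text file object."""
    cookies = []
    for line in fp:
        line = line.lstrip()  # allows whitespace before "# comment"
        if line.startswith(HTTPONLY_PREFIX):
            # '#HttpOnly_' marks a cookie line, not a comment
            line = line[len(HTTPONLY_PREFIX):]
        if not line or line.startswith("#"):
            continue
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 7:
            continue  # skip malformed lines
        domain, _subdomains, path, secure, expires, name, value = fields
        cookies.append({
            "domain": domain,
            "path": path,
            "secure": secure == "TRUE",
            # expiration "0" means "no expiry", so map it to None
            "expires": None if expires in ("", "0") else int(expires),
            "name": name,
            "value": value,
        })
    return cookies


sample = (
    "# Netscape HTTP Cookie File\n"
    "   # indented comment\n"
    "#HttpOnly_.example.org\tTRUE\t/\tFALSE\t0\tsid\tabc\n"
)
cookies = parse_cookiestxt(io.StringIO(sample))
```

Taking a file object instead of a filename is what makes the io.StringIO call above possible, which is convenient for tests.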
5 years ago
Mike Fährmann
e35c2ea1a6
[weibo] use youtube-dl to download from m3u8 manifests
5 years ago
Mike Fährmann
6703b8a86b
[blogger] implement video extraction ( closes #587 )
5 years ago
Mike Fährmann
c1a6862863
implement functions to load/save cookies.txt files ( closes #586 )
...
The methods of the standard library's MozillaCookieJar have
several shortcomings (#HttpOnly_ cookies, 0 expiration timestamps, etc.)
and require construction of an ultimately pointless CookieJar object.
5 years ago
Mike Fährmann
5d73b7f29c
release version 1.12.3
5 years ago
Mike Fährmann
37247dbaff
miscellaneous fixes
5 years ago
Mike Fährmann
0e9dc5c88e
fix AttributeError when accessing 'temppath'
...
[ci skip]
5 years ago
Mike Fährmann
25d5ec4ff3
[twitter] add option to extract TwitPic embeds ( #579 )
5 years ago
Mike Fährmann
254f7c3999
implement a post-processor module to compare file versions
...
(#530 )
5 years ago
Mike Fährmann
32d7195d08
[pinterest] improve detection of invalid pin.it links
5 years ago
Mike Fährmann
0b84068d84
remove temp files before downloading from fallback URLs
...
otherwise the next call to download() with a fallback URL could see
the partially downloaded "remains" from the previous, failed download
attempt and "continue" it, writing the second half of a potentially
different version of that file.
5 years ago
Mike Fährmann
760b9b4db4
add remove_file() and remove_directory() helpers
...
these functions call os.unlink() or os.rmdir()
while catching and suppressing potential OSErrors
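A minimal sketch of such helpers, based only on the description above (the exact bodies in the project may differ):

```python
import os


def remove_file(path):
    """Delete a file, ignoring errors such as a missing path."""
    try:
        os.unlink(path)
    except OSError:
        pass


def remove_directory(path):
    """Delete an empty directory, ignoring errors such as it being non-empty."""
    try:
        os.rmdir(path)
    except OSError:
        pass
```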
5 years ago
Mike Fährmann
b2d542ad40
improve PathFormat._enum_file()
...
open only one try-except block for the whole loop,
instead of one for each iteration in os.path.exists()
5 years ago
Mike Fährmann
174117f827
allow multiple hashes for content tests
5 years ago
Alice
f498a9057f
[twitter] Fix stop before real end ( #573 )
...
* [twitter] Fix stop before real end
Fix for https://github.com/mikf/gallery-dl/issues/544. Makes sure that it really reached the end by checking that both "min_position" is null and "has_more_items" is false before stopping.
* [twitter] Fix stop before real end (update)
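The stop condition described in this commit can be sketched as a small predicate, assuming the response page is available as a decoded JSON dict (the function name reached_end is hypothetical):

```python
def reached_end(page: dict) -> bool:
    """True only if both end-of-timeline indicators agree."""
    return (
        page.get("min_position") is None      # JSON null
        and not page.get("has_more_items")    # JSON false
    )
```

Either indicator alone is not treated as sufficient; pagination continues unless both signal the end.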
5 years ago
Mike Fährmann
8bb32ee188
[hitomi] fix image URLs
5 years ago
Mike Fährmann
bd5ce9855c
allow GalleryExtractors to set URL-independent extensions
5 years ago
Mike Fährmann
af42c75152
[mangadex] revert domain to 'mangadex.org'
5 years ago
Mike Fährmann
200aea308a
[downloader:common] enable 'job'/'extractor' for logging messages
...
(#574 )
5 years ago
Mike Fährmann
e89413da22
update test results
5 years ago
Mike Fährmann
33a6e0ac6e
[hentaifoundry] extract more metadata ( closes #565 )
5 years ago
Mike Fährmann
5cac79c3d9
[erolord] remove extractor
5 years ago
Mike Fährmann
b9cbf932b4
[pixiv] update URL patterns ( fixes #568 )
...
Pixiv now uses new URLs for
- user profiles and illustration listings:
- https://www.pixiv.net/en/users/<ID>
- https://www.pixiv.net/en/users/<ID>/artworks
- bookmarks:
- https://www.pixiv.net/en/users/<ID>/bookmarks/artworks
5 years ago
Mike Fährmann
9d369de592
release version 1.12.2
5 years ago
Mike Fährmann
988cc2ec23
[mangadex] change domain to mangadex.cc ( closes #559 )
5 years ago
Mike Fährmann
f8e137d6b4
[deviantart] show warning about private deviations only once
...
… per call to '_pagination()'
5 years ago
Mike Fährmann
939fec8ecd
[deviantart] match new search/popular URLs ( closes #538 )
5 years ago
Mike Fährmann
09cc88b715
[deviantart] match '/favourites/all' URLs ( closes #555 )
5 years ago
Mike Fährmann
3811fd8a25
fix time formatting for Python 3.4 and 3.5
...
'datetime.time.isoformat()' only has an optional 'timespec' argument
since Python 3.6.
5 years ago
Mike Fährmann
43ab9572b4
[twitter] handle API rate limits ( #526 )
5 years ago
Mike Fährmann
569747a78d
implement extractor.wait()
5 years ago
Mike Fährmann
5532e9c158
[twitter] handle quoted tweets ( #526 )
...
… and categorize them as retweets
5 years ago
Mike Fährmann
0b4cb8e57a
[mangahere] send 'isAdult' cookie ( fixes #556 )
5 years ago
Mike Fährmann
025f6e3398
add fallback for missing WITHOUT ROWID support ( #553 )
5 years ago
Mike Fährmann
87c8b89ddd
[postprocessor:metadata] add 'directory' option ( #520 )
5 years ago
Mike Fährmann
1afb91363c
[imagefap] generalize URL patterns and add tests ( #552 )
5 years ago
Xope Totec
f701e9f33a
Handle beta.imagefap.com URLs ( #552 )
5 years ago
Mike Fährmann
ce54b8c04c
let extractors opt-out of cookie option usage
...
useful to avoid sending unnecessary cookies when all authentication
is done through OAuth tokens
5 years ago
Mike Fährmann
5ad92fc196
[newgrounds] fix tags metadata extraction
5 years ago
Mike Fährmann
82f7f4172a
update test results
5 years ago
Mike Fährmann
1f2a69f3c5
add '_extractor' information to redirect results
5 years ago
Mike Fährmann
2d4887b75b
improve KeywordJob output for "parent" extractors ( closes #548 )
5 years ago
Mike Fährmann
a27f43dad1
[pixiv] wait and retry after rate limit error ( closes #535 )
5 years ago
Mike Fährmann
6b373cb7e2
[exhentai] restrict default directory name length ( #545 )
5 years ago
Mike Fährmann
b347bf68c7
[deviantart] add extractor for followed users ( #515 )
5 years ago
Mike Fährmann
c0f391a4e2
[pixiv] support listing followed users ( #515 )
5 years ago
Mike Fährmann
2e2fc7f0ad
prevent infinite recursion when spawning extractors ( closes #489 )
5 years ago
Mike Fährmann
896896a490
[twitter] fix URLs forwarded to youtube-dl ( closes #540 )
...
Since commit 3bba763, data["user"] is an entire dict object
and no longer just the user nickname …
5 years ago
Mike Fährmann
1e2713b895
[artstation] fix search result pagination ( closes #537 )
5 years ago
Mike Fährmann
bf3df3d0b0
[directlink] send Referer headers ( closes #536 )
5 years ago
Mike Fährmann
83909ab5d4
release version 1.12.1
5 years ago
Mike Fährmann
9be7ff600e
[imagetwist] replace test image
...
the old one expired, it seems
5 years ago
Mike Fährmann
66905b1664
[foolslide] add fallback for chapter data extraction
5 years ago
Mike Fährmann
48e42e73fb
[reddit] change default value for 'comments' to '0'
5 years ago
Mike Fährmann
9c0928457a
[reddit] fix errors with 't1_…' submissions
5 years ago
Mike Fährmann
58391d492d
cache archive keys generated in __contains__() ( #524 )
...
To avoid writing a different key to the archive than what was checked
against before the file download.
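One way to picture this, as a sketch only: the key computed during the membership check is remembered and reused when the entry is added. The class below is hypothetical and uses an in-memory set in place of the real archive database.

```python
class ArchiveSketch:
    """Toy archive that reuses the key generated in __contains__()."""

    def __init__(self, key_format: str):
        self.key_format = key_format
        self.keys = set()
        self._cached_key = None

    def __contains__(self, kwdict) -> bool:
        # Remember the key that was actually checked.
        key = self._cached_key = self.key_format.format_map(kwdict)
        return key in self.keys

    def add(self, kwdict) -> None:
        # Prefer the cached key, so that metadata changed during the
        # download cannot produce a different key than the one checked.
        key = self._cached_key or self.key_format.format_map(kwdict)
        self.keys.add(key)
        self._cached_key = None


archive = ArchiveSketch("{id}.{extension}")
kwdict = {"id": 1, "extension": "jpg"}
seen_before = kwdict in archive
kwdict["extension"] = "png"   # e.g. adjusted while downloading
archive.add(kwdict)
```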
5 years ago
Mike Fährmann
bf658fd84b
[vsco] implement 'videos' option
5 years ago
Mike Fährmann
95c90722ee
[instagram] implement 'videos' option ( closes #521 )
5 years ago
Mike Fährmann
1921c127a5
make OSErrors during file downloads nonfatal ( closes #512 )
...
… except ENOSPC (No space left on device), since there is no reason to
continue downloading in that case.
All other errors that would prevent downloading data and writing it to
disk are already raised during directory creation and are therefore not
checked here.
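The policy can be sketched as a loop that counts failures but re-raises ENOSPC. This is an illustration under the assumption that each download is a callable; run_downloads is a hypothetical name.

```python
import errno


def run_downloads(downloads) -> int:
    """Run each download; return the number of nonfatal OSError failures."""
    failed = 0
    for download in downloads:
        try:
            download()
        except OSError as exc:
            if exc.errno == errno.ENOSPC:
                # No space left on device: continuing is pointless.
                raise
            # Any other OSError only affects this one file.
            failed += 1
    return failed
```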
5 years ago
Mike Fährmann
d0920e84e9
update test results
5 years ago
Mike Fährmann
8c11e81c9f
Merge commit '63e6993716db8d8bedfb7b0d445c7161493046b6'
5 years ago
Mike Fährmann
63e6993716
merge 'bypost' functionality into metadata postprocessor
5 years ago
Mike Fährmann
31a29835ff
[realbooru] simplify extractors and update tests ( #514 )
5 years ago
The Oddball
9a4ce20b8e
[realbooru] Add Realbooru extractor ( #514 )
5 years ago
Mike Fährmann
f9e74320de
retain trailing zeroes in Cloudflare challenge answers
5 years ago
Mike Fährmann
72b8fbfbad
[instagram] make post-page extraction nonfatal
5 years ago
Mike Fährmann
922b8a9595
[weibo] raise NotFoundError for unavailable/deleted statuses
5 years ago
Mike Fährmann
0cd157300e
[patreon] fix regex pattern for posts
...
The previous one would match the first number in the URL slug as
post ID, which would fail for posts with numbers in their title.
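To illustrate the failure mode, here is a sketch with two hypothetical patterns and a made-up URL (neither pattern is copied from the extractor): the first takes the first number in the slug, the second takes the number after the last hyphen.

```python
import re

# Hypothetical patterns for illustration only.
FIRST_NUMBER = re.compile(r"/posts/[^/?#]*?(\d+)")
TRAILING_ID = re.compile(r"/posts/(?:[\w-]*-)?(\d+)")

url = "https://www.patreon.com/posts/top-10-tips-12345"  # made-up example

wrong_id = FIRST_NUMBER.search(url).group(1)   # picks "10" from the title
right_id = TRAILING_ID.search(url).group(1)    # picks the trailing "12345"
```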
5 years ago
Mike Fährmann
fe19e233f3
[xvideos] improve
...
- derive from GalleryExtractor
- match '…-channels' URLs
- "better" metadata structure
5 years ago
Mike Fährmann
d3e44e899d
raise NotFoundErrors for 404 responses in GalleryExtractors
5 years ago
Mike Fährmann
a4dd8b3dab
improve _check_cookies()
...
Only loop over all cookies once instead of calling
cookiejar._find() for each cookie name.
5 years ago
Mike Fährmann
76e60d10a6
[patreon] raise proper exception if creator/post doesn't exist
5 years ago
Mike Fährmann
9e63804347
[patreon] make retrieving user info nonfatal ( #508 )
...
… and fall back to the included data if an error occurs.
5 years ago
Mike Fährmann
964dc57286
[vsco] improve image resolutions
...
https://im.vsco.co/ URLs redirect to the appropriate CDN server
and occasionally insert a '/1200x1600/' into the image path,
limiting image dimensions.
This commit constructs redirect targets out of the given
im.vsco.co URLs without sending extra HTTP requests
and without any "builtin" resolution restrictions.
5 years ago
Mike Fährmann
0629fe8fa4
[vsco] fix user profile extraction … again
...
Given the pattern from last time, collections will also change
in due time and use cursor-based pagination.
5 years ago
Mike Fährmann
ab17ea9632
[deviantart] only print warning if 'original' is enabled
5 years ago
Mike Fährmann
2188db6284
[gelbooru] fix non-API tag extraction
5 years ago
Mike Fährmann
c4702ec9b6
simplify some logging calls
5 years ago
Gio
c0b9ad678d
Separate metadata from handle_url into handle_metadata, commenting
5 years ago
Mike Fährmann
c9ef1b21c3
[patreon] get partial user info without /api/user/<id> ( #507 )
...
It's a lot less data, but doesn't invoke any additional
HTTP requests with potential Cloudflare CAPTCHAs.
5 years ago
Mike Fährmann
0ab9bb1721
[4chan] add extractor for entire boards ( closes #510 )
5 years ago
Mike Fährmann
c59b98c81b
[downloader:http] improve rate limit handling
...
- Move the download "logic" with rate limit checks into its own
method that only gets used if a rate limit should be enforced
- Fix an issue where suspending gallery-dl during a download would
basically ignore the rate limit for the remaining download when
resuming its execution.
5 years ago
Mike Fährmann
bbbafc1c24
[downloader:http] catch both possible SSLException instances
...
With pyOpenSSL installed, but disabled, the SSLError exception
would be set to the one from pyOpenSSL, which could never get raised.
This commit solves this problem by catching both the native SSLError
exception and the one from pyOpenSSL (if available).
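A sketch of the general approach, assuming pyOpenSSL may or may not be installed (the tuple name SSL_ERRORS and the function are hypothetical): import both exception types and catch them together.

```python
import ssl

try:
    # Only available when pyOpenSSL is installed.
    from OpenSSL.SSL import Error as OpenSSLError
except ImportError:
    # Fall back to the native exception so the tuple below stays valid.
    OpenSSLError = ssl.SSLError

SSL_ERRORS = (ssl.SSLError, OpenSSLError)


def describe_failure(func):
    """Call func() and report an SSL failure from either library."""
    try:
        func()
    except SSL_ERRORS as exc:
        return "SSL error: {}".format(exc)
    return None
```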
5 years ago
Gio
c20bb5c338
Naming convention, as per travis.
5 years ago
Gio
6ed4fc07ff
Don't print intentional metadata skips to the console.
5 years ago
Gio
cfc70a97ab
Added an additional channel for downloading the metadata of an entire post or gallery.
5 years ago
Mike Fährmann
f451be48c3
release version 1.12.0
5 years ago
Mike Fährmann
15f9bb3d14
add option to disable pyOpenSSL usage ( #508 )
...
(pyOpenSSL is now disabled by default)
5 years ago
Mike Fährmann
c8e99e3b3b
[deviantart] fix crash on missing "token" field ( #505 )
5 years ago
Mike Fährmann
6ed2c7823c
[deviantart] disable original downloads if no cookies set
...
For 'deviation' and 'scraps' extractors only, since original file
downloads for those two will always fail with a 404 Not Found
when not logged in.
5 years ago
Mike Fährmann
50deab5265
[deviantart] fix URL generation from /extended_fetch results
...
(closes #505 )
5 years ago
Mike Fährmann
1f209da4c0
[pixiv] match new search URLs ( closes #507 )
5 years ago
Mike Fährmann
e17907ee2a
change default value of 'cookies-update' to 'true'
5 years ago
Mike Fährmann
07dafad26d
[twitter] attempt to fix infinite loops ( #499 )
...
(Hopefully this doesn't break anything else)
5 years ago
Mike Fährmann
71acbdabf4
[2chan] fix metadata extraction
5 years ago
Mike Fährmann
c0a1241648
[livedoor] force https:// for image URLs
5 years ago
Mike Fährmann
6e23c0da09
[imgur] add extractor for subreddit links ( closes #500 )
5 years ago
Mike Fährmann
38c05df290
[oauth] add custom/default indicator to log messages ( #501 )
5 years ago
Mike Fährmann
372ffe95ee
[oauth] adjust Flickr redirect URI ( fixes #503 )
...
Flickr now automatically forces https:// for all redirect URIs.
5 years ago
Mike Fährmann
004812258d
[hentaifox] fix extraction
5 years ago
Mike Fährmann
e2710702d4
fix Cloudflare bypass
5 years ago
Mike Fährmann
8759403f37
[plurk] add delay between comment requests
5 years ago
Mike Fährmann
a28552fd19
update test results
...
- hbrowse: one tag got removed
- mangoxo: gallery changed owner
- photobucket: ?, but photo still downloads
5 years ago
Mike Fährmann
dcaa3d01bd
[imagefap] adapt to new image URL format
5 years ago
Mike Fährmann
e62c209ca0
[nijie] fix 'date' parsing
5 years ago
Mike Fährmann
3bba763ab9
[twitter] improve
...
- update metadata structure
- combine all user… entries into their own dict
- let 'user' always specify the Timeline owner
- add 'author' entry that specifies the original Tweet author
- create directories per post (closes #491 )
- fix username issues with /i/web/ URLs
5 years ago
Mike Fährmann
26d2334550
[postprocessor:metadata] rename 'format' to 'content-format'
...
Just to be consistent with the other 'extension-format' option name,
The plain 'format' name is still accepted as well.
5 years ago
Mike Fährmann
a412531451
[postprocessor:metadata] implement 'extension-format' option
...
closes #477
5 years ago
Mike Fährmann
0f1538af78
split filename formatting into its own function
5 years ago
Mike Fährmann
db35c3b581
[directlink] separate filenames from paths
...
With this, all default filename formats specify an '{extension}'
and PathFormat.set_extension() reliably works for all files.
5 years ago
Mike Fährmann
41a3169c67
[foolfuuka] use '{extension}' in default filename format
5 years ago
Mike Fährmann
e9aed62c91
[imgur] unescape image titles
5 years ago
Mike Fährmann
bca2222559
add '--exec-after'
5 years ago
Mike Fährmann
ed6592ea1a
remove '--abort-on-skip'
5 years ago
Mike Fährmann
2c332edaad
[plurk] fix comment pagination
5 years ago
Mike Fährmann
a3fa45bbb1
[behance] get images from 'media_collection' modules
5 years ago
Mike Fährmann
359c3bc1c5
[deviantart] revert to getting download URLs from OAuth API
...
This commit (partially) reverts 27b5b24, 94eb7c6, and a437e78.
Download URLs from the 'extended_fetch' endpoint are now only
usable for logged in users, while those from the respective
OAuth API endpoint are working again. Everything except
scraps and direct deviation links should be fixed, and those
two categories will work with exported cookies. (#488 )
TODO:
- "native" login with --username and --password
- better handling of internally stored cookies
5 years ago
Mike Fährmann
42b9633c7e
update test results
5 years ago
Mike Fährmann
b28bd1c73e
[bobx] set generated session cookie ( closes #482 )
...
This reverts commit 490831f and also restores original image downloads
by setting a randomly generated session cookie. No login required.
5 years ago
Mike Fährmann
ae09f87602
improve SharedConfigMixin config lookups
5 years ago
Mike Fährmann
b5c964332b
improve config.py test coverage
5 years ago
Mike Fährmann
f5604492c3
update interface of config functions
5 years ago
Mike Fährmann
4ca883c66f
[smugmug] replace test for custom URLs
...
The old one (http://www.creativedogportraits.com/) is empty and/or
no longer handled by SmugMug.
5 years ago
Mike Fährmann
d45fabb79d
match user profile handling on deviantart and newgrounds
5 years ago
Mike Fährmann
ea80dadd09
[deviantart] restore archive keys
...
Commit 9fdc5e7 changed 'username' fields to have consistent
capitalization, but that invalidated the archive keys of several
extractors where 'username' was usually lowercase.
5 years ago
Mike Fährmann
3fc1e12949
[postprocessor:metadata] filter private entries
...
i.e. keys starting with an underscore
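As a small sketch of that filter (the function name filter_private is hypothetical), assuming metadata is a dict with string keys:

```python
def filter_private(kwdict: dict) -> dict:
    """Return a copy of kwdict without keys that start with an underscore."""
    return {
        key: value
        for key, value in kwdict.items()
        if not key.startswith("_")
    }
```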
5 years ago
Mike Fährmann
ea094692c8
[vsco] fix collection extraction ( #480 )
5 years ago
Mike Fährmann
490831f84a
[bobx] "fix" image download URLs
...
Access to original images got restricted to (paid) members only.
All that's publicly accessible now are essentially preview pictures.
5 years ago
Mike Fährmann
978cb03f81
update misc test results
...
- Livedoor now uses https:// for its image URLs
- Instagram image URLs got simplified
5 years ago
Mike Fährmann
fca87974fe
[sexcom] fix video downloads by sending specific Referer headers
5 years ago
Mike Fährmann
bbbeff4c41
[downloader.http] implement file-specific HTTP headers
5 years ago
Mike Fährmann
edc080468d
[instagram] make 'video_url' fields optional ( fixes #479 )
...
[ci skip]
5 years ago
Mike Fährmann
9fdc5e74cb
[deviantart] ensure consistent username capitalization ( #455 )
...
The 'username' field was capitalized in a very inconsistent manner:
Either all lowercase, or as given by the input URL, or with the
"original" capitalization, depending on the extractor used among
other things.
Now usernames use their original capitalization for all extractors.
('UserName' instead of 'username' or 'uSeRnAmE')
5 years ago
Mike Fährmann
b1f0609de5
[newgrounds] rewrite ( #394 )
...
- restructure extractor hierarchy
- extract more metadata
- extract videos without youtube-dl
- be more resilient to errors
TODO:
- favorites
- games, but that might be near impossible for non-flash titles
5 years ago
Mike Fährmann
3ece3976ae
[newgrounds] implement login support ( #394 )
5 years ago
Mike Fährmann
3a07c06865
[newgrounds] update
...
- create directory per post
- rename variables and methods
5 years ago
Mike Fährmann
5513b66eb0
[vsco] fix user profile extraction
5 years ago
Mike Fährmann
abfcb356fc
[flickr] support 3k, 4k, 5k, and 6k photo sizes ( closes #472 )
5 years ago
Mike Fährmann
521fcd2eb9
[imgbb] fix error in galleries without user info ( closes #471 )
5 years ago
Mike Fährmann
8061263d4c
[imgbb] improve pagination logic
...
- avoid unnecessary API calls for small or empty galleries
- combine duplicate code
5 years ago
Mike Fährmann
da6789b2b0
disable unique archive id checks for some tests
...
- same image twice in a livedoor blog post
- unreliable results for related pinterest items
5 years ago
Mike Fährmann
67e54ed8ea
release version 1.11.1
5 years ago
Mike Fährmann
ce98a86c0e
fix data file inclusion in source distributions
5 years ago
Mike Fährmann
6c86fbfe2a
release version 1.11.0
5 years ago
Mike Fährmann
94a94f3b86
miscellaneous stuff
5 years ago
Mike Fährmann
b0197098e6
[imgur] get title from webpage if missing in API response
...
(closes #467 )
5 years ago
Mike Fährmann
dd5d2b2eac
[deviantart] add user profile extractor ( #377 , #419 )
5 years ago
Mike Fährmann
a437e78620
[deviantart] minimize cookie usage during scraps extraction
...
(#445 )
5 years ago
Mike Fährmann
1a197d2195
store the original cookiejar as Extractor._cookiejar
5 years ago
Mike Fährmann
de83ae4576
make 'method' argument of Extractor.request keyword-only
5 years ago
Mike Fährmann
a5be08a830
[downloader:ytdl] forward proxy settings
5 years ago
Mike Fährmann
4325695d74
[luscious] expand GraphQL queries
5 years ago
Mike Fährmann
94dbdbf506
[nijie] change default filename format
...
… to be consistent with Pixiv filenames
5 years ago
Mike Fährmann
9e88e7a344
[postprocessor:exec] improve ( #421 , #413 )
...
- add 'final' option
- include job status in pp finalization
- improve and extend documentation
5 years ago
Mike Fährmann
c18fadc221
[instagram] extract videos without youtube-dl ( #391 )
5 years ago
Mike Fährmann
f15eedb634
[sexcom] set Referer header for file downloads ( closes #464 )
5 years ago
Mike Fährmann
2a3bd4e3c7
rename extractor classes starting with a digit
5 years ago
Mike Fährmann
b3b9da6d74
[photobucket] replace test URL
...
The other user deleted all of his images.
5 years ago
Mike Fährmann
64786363be
[4chan] simplify
...
- remove 'chan.py'
- slight adjustments to directory and filenames
5 years ago
Mike Fährmann
557e2c018b
[8chan] remove module
5 years ago
Mike Fährmann
e14782a948
[instagram] simplify graphql extraction for post pages
5 years ago
Mike Fährmann
c01ff78467
[twitter] extend 'videos' option to force extraction with ytdl
...
(closes #459 )
5 years ago
Mike Fährmann
f8ac67ce50
[hitomi] extend URL pattern + follow redirects
5 years ago
Mike Fährmann
e877ca97c3
[naver] adjust directory names and metadata structure
5 years ago
Mike Fährmann
702f2fbd1f
[issuu] add publication and user extractors ( #413 )
5 years ago
Mike Fährmann
8361d874d7
[hitomi] fix extraction
5 years ago
Mike Fährmann
5fa6ff04dd
[instagram] extract '__additionalDataLoaded' ( #391 )
...
The '_sharedData' of Post pages is missing its 'graphql' part for
logged in users. This data is now included in the parameters of a
function call to '__additionalDataLoaded(...)'
And, of course, video extraction with youtube-dl broke because of
this change as well.
5 years ago
Mike Fährmann
5af291ba5c
include failed downloads and child extractors in exit status
5 years ago
Mike Fährmann
322c2e7ed4
renaming variables
...
mostly 'keyword(s)' to 'kwdict'
5 years ago
Mike Fährmann
87a87bff7e
[simplyhentai] fix image URLs
5 years ago
Mike Fährmann
4409d00141
embed error messages in StopExtraction exceptions
5 years ago
Mike Fährmann
d5e3910270
adjust 'util.raises()'
5 years ago
Mike Fährmann
d44f790e81
adjust output for HTTP status related errors
5 years ago
Mike Fährmann
03e0cec715
return with non-zero exit status on error
5 years ago
Mike Fährmann
c887493a80
overhaul exception stuff
5 years ago
Mike Fährmann
109718a5e3
[blogger] add blog and post extractors ( closes #364 )
5 years ago
Mike Fährmann
244d396b0b
add '--ugoira-conv-lossless' command-line option ( #432 )
...
and cleanup the arguments for the regular '--ugoira-conv':
- remove '-an'
- enable two-pass encoding
5 years ago
Mike Fährmann
49a6b1b6c0
[twitter] extract video stream info without youtube-dl ( #452 )
...
This should allow video downloads when logged in without
'forward-cookies' disabled and from protected tweets.
youtube-dl still gets used to download HLS playlists, but the data
extraction part, which doesn't work with youtube-dl at the moment,
now gets handled by gallery-dl itself.
5 years ago
Mike Fährmann
9f0dbf2a72
[twitter] raise proper exception for protected Tweets
5 years ago
Mike Fährmann
083e14ad9a
[downloader:ytdl] add data from '_ytdl_extra' to info_dicts
5 years ago
Mike Fährmann
6e08ada4fe
[luscious] simplify some metadata entries
5 years ago
Mike Fährmann
9e3a8607ee
[deviantart] update usernames ( #455 )
...
In the case that a user changed his username, requesting deviations
with an old name might cause problems (missing deviations, etc.)
The internal 'username' value therefore now gets updated to the
current username taken from the user profile.
5 years ago
Mike Fährmann
2eb38810c5
[twitter] fix image extraction when logged in ( #452 )
...
... for individual tweets.
To get a Tweet page with the old Twitter layout, an Internet
Explorer User-Agent (e.g. Mozilla/5.0 (Windows NT 6.1; WOW64;
Trident/7.0; rv:11.0) like Gecko) as well as a Referer header
pointing to the page itself is required. The "app_shell_visited"
cookie appears to be optional at the moment, but that is what
a regular web browser would send.
5 years ago
Mike Fährmann
8f38a35b91
[imgur] use API with "public" client_id ( #446 )
...
Using the API endpoints makes it possible to access NSFW content
without logging in.
5 years ago
Mike Fährmann
b23c822b23
[luscious] use GraphQL
5 years ago
Mike Fährmann
ef17d94469
update test results
5 years ago
Mike Fährmann
2057c6ba29
[naver] add blog and post extractors ( closes #447 )
5 years ago
Mike Fährmann
389d2d7e38
implement 'cookies-update' option ( #445 )
5 years ago
Mike Fährmann
fbc0a6a059
[nozomi] skip unavailable posts ( #388 )
5 years ago
Mike Fährmann
ae98dbcbb3
[nozomi] implement searching for negated terms ( #388 )
...
It's incredibly slow and resource intensive (> 1GB of memory),
but that is also how it is implemented on nozomi.la itself.
5 years ago
Mike Fährmann
1c03a389df
[twitter] small improvements to search extractor
...
- put search results in separate directories
- set 'max_position' to '-1' for first request
-> prevent duplicate results
- add a test
- flake8
5 years ago
Mike Fährmann
c3042978b8
[deviantart] match "/gallery/all" ( closes #449 )
5 years ago
Alice
bcddcca6db
Add search downloading to twitter.py ( #448 )
...
Adds the functionality to download search results from twitter.com/search. Since Twitter only allows downloading up to 3,200 of a user's most recent tweets, you will be unable to download old images from users with a lot of tweets. To bypass this, you can use the Twitter search to get the tweets from the time ranges at which you were stopped. An example search would be "from:user since:2015-01-01 until:2016-01-01 filter:images". The URL you would use will look something like this: https://twitter.com/search?f=tweets&q=from%3Asupernaturepics%20since%3A2015-01-01%20until%3A2016-01-01%20filter%3Aimages&src=typd&lang=en
The _tweets_from_api function had to be changed because it would not get the next page of results using the last "data-tweet-id". It would return the same JSON but with a "min_position" string added. Using this string for the "max_position" param from the second page onwards correctly returned the next pages. This change does not interfere with how the other extractors work as far as I know. The 2 regex patterns in the extractors had to be changed to not match the search URL.
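As a rough illustration of how such a search URL is put together, the sketch below percent-encodes the example query with the standard library. The helper name `search_url` is hypothetical and not part of the extractor; the parameter names are copied from the example URL above.

```python
from urllib.parse import quote, urlencode


def search_url(query, root="https://twitter.com/search"):
    """Build a search URL like the one quoted in the commit message."""
    # Parameter names and values are taken from the example URL;
    # whether all of them are required is not stated in the commit.
    params = {"f": "tweets", "q": query, "src": "typd", "lang": "en"}
    # quote_via=quote encodes spaces as %20 instead of '+'
    return root + "?" + urlencode(params, quote_via=quote)


url = search_url(
    "from:supernaturepics since:2015-01-01 until:2016-01-01 filter:images")
print(url)
```

Shifting the since:/until: dates lets each run cover a different slice of a long timeline.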
5 years ago
Mike Fährmann
1693d97bd3
update extractor class hierarchies
...
- let the GalleryExtractor class inherit directly from Extractor
- make ChapterExtractor a subclass of GalleryExtractor
- change enumeration field names of GalleryExtractors to 'num'
5 years ago
Mike Fährmann
7ebd984e8d
[imgur] print error message if no JSON data is found ( #446 )
5 years ago
Mike Fährmann
5882b00f2f
[imgur] implement login support ( #446 )
5 years ago
Mike Fährmann
91643ca54b
[nozomi] add search extractor ( #388 )
5 years ago
Mike Fährmann
df2b3c6888
restore OAuth2 authentication error messages
5 years ago
Mike Fährmann
6779512fc7
[nozomi] add post and tag extractors ( #388 )
5 years ago
Mike Fährmann
6abe5f5bbb
[patreon] fix pagination ( #444 )
...
The Patreon-provided URLs for the next set of posts aren't
always complete, i.e. they can be missing their scheme and
the subsequent double slash: "www.patreon.com/…"
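A minimal sketch of the kind of repair described here, assuming the incomplete URLs always start with a hostname. The function name `ensure_scheme` and the choice of "https://" as the default are illustrative assumptions, not the extractor's actual code.

```python
def ensure_scheme(url, scheme="https://"):
    """Prepend a scheme to URLs that lack one, e.g. 'www.patreon.com/...'."""
    if url.startswith(("http://", "https://")):
        return url  # already complete
    # Also covers protocol-relative URLs such as '//www.patreon.com/...'.
    # Assumption: the input begins with a hostname, not a site-relative path.
    return scheme + url.lstrip("/")


print(ensure_scheme("www.patreon.com/api/posts"))
```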
5 years ago
Mike Fährmann
ff1e4a86aa
release version 1.10.6
5 years ago
Mike Fährmann
d4ffd6c952
[yaplog] improve metadata extraction ( #443 )
...
- provide a fallback if there is no numerical image ID
- add a 'filename' field
- convert 'date' to an actual datetime object
5 years ago
Mike Fährmann
15af2f8464
[hitomi] fallback to /reader/ page if main page returns 404
...
Some galleries return a 404: Not Found error when trying to access
them through the main gallery URL, but their content is still
available on the respective /reader/ page.
5 years ago
Mike Fährmann
8af59a4bba
fix & update docs
...
- update Requests links
- add example for --exec
- set '-dev' version
5 years ago
Mike Fährmann
dc6ad81e2e
[yaplog] prevent crash on empty posts ( #443 )
5 years ago
Mike Fährmann
94eb7c6cad
[deviantart] fix sta.sh extraction ( #436 )
5 years ago
Mike Fährmann
1032cfa34b
[downloader:http] extend mimetype map with archive formats
5 years ago
Mike Fährmann
27b5b2497e
[deviantart] fix download URLs ( #436 )
...
... except for sta.sh content.
Instead of using the old '/api/v1/oauth2/deviation/download' endpoint,
which started delivering URLs to 404 pages a while ago,
it is also possible to get a download URL from the relatively new
'/_napi/da-browse/shared_api/deviation/extended_fetch' endpoint
used by DeviantArt's Eclipse interface.
The current strategy is therefore:
- Iterate over deviations using the OAuth2 API
- Fetch original download URLs with the new NAPI/Shared API
5 years ago
Mike Fährmann
93aac8dfea
[yaplog] fix incomplete image URLs ( #443 )
5 years ago
Mike Fährmann
a782b009b8
[yaplog] match blog names with '-' ( #443 )
5 years ago
Mike Fährmann
cf5e716b9d
[hitomi] fix image URLs
5 years ago
Mike Fährmann
ad81c07204
[postprocessor] match logger names of downloader modules
...
The logger name for a postprocessor object got changed to
"postprocessor.<module-name>" instead of just
"postprocessor"
5 years ago
Mike Fährmann
03bc8adfc7
[postprocessor:exec] run after file moved to target location
...
(#421 )
5 years ago
Mike Fährmann
35958bebd4
[postprocessor:exec] fix filename quoting on Windows ( #421 )
5 years ago
Mike Fährmann
b06c372e4d
[postprocessor:exec] improve; add command-line option ( #421 )
5 years ago
Mike Fährmann
5a54efa025
[xhamster] unescape 'title' and 'description'
5 years ago
Mike Fährmann
1b9bf4fc6e
[behance] fix 'tags' extraction
5 years ago
Mike Fährmann
bb97e87989
[komikcast] ignore banner image
5 years ago
Mike Fährmann
0ff90a3f7d
[gfycat] include title in default filenames ( closes #434 )
5 years ago
Mike Fährmann
fabdc3b0c6
release version 1.10.5
5 years ago
Mike Fährmann
de4e2029d1
[nsfwalbum] update test album
...
the old one is no longer available
5 years ago
Mike Fährmann
1faec285d1
[nijie] further improvements ( closes #423 )
...
- provide a 'user_name' metadata field
- usually the same as 'artist_id', except for favorite downloads
- extract the whole description text and properly escape HTML entities
- fixed an issue with titles or tags containing double quotes
5 years ago
Mike Fährmann
6d0a533d68
[reddit] respect 'comments:0' for single submissions ( #429 )
5 years ago
Mike Fährmann
803d8f814e
[oauth] update scope for reddit tokens ( #428 )
...
'/user/<username>/...' requires the 'history' scope to be accessible
(https://www.reddit.com/dev/api/#GET_user_{username}_{where})
5 years ago
Mike Fährmann
46ba173ded
[reddit] fix documentation inconsistencies ( closes #429 )
...
- Require 'reddit.comments' to be a number and convert it to an
integer to be extra sure
- Link to the README's OAuth section where appropriate
5 years ago
Mike Fährmann
20eb6c401f
[nijie] improvements and fixes ( #423 )
...
- ignore unavailable image pages
- more metadata fields: artist_name, date, tags
- rename 'index' to 'num'
- improved code structure
5 years ago
Mike Fährmann
d1ea08c67d
[weibo] fixes and improvements
...
- ignore unavailable videos (fixes #427 )
- handle empty 'geo' fields
- consistent metadata fields for images and videos
5 years ago
Mike Fährmann
38d97f3da6
[deviantart] add debug message about API credentials ( #424 )
5 years ago
Mike Fährmann
80c2104fb5
[deviantart] fix 429 handling if 'fatal' is False ( closes #424 )
5 years ago
Mike Fährmann
913460240d
[reddit] fix 'extractor.blacklist()' arguments
...
The second argument must support 'append()'.
5 years ago
Mike Fährmann
22bac14452
[pixiv] match '/artworks/' URLs
5 years ago
Mike Fährmann
66cac207ac
[twitter] match and use 'i/web' status URLs
5 years ago
Mike Fährmann
946f2751e2
[reddit] add 'user' extractor ( closes #350 )
5 years ago
Mike Fährmann
c14abb9fb8
[reddit] improve URL parameter handling for subreddit links
5 years ago
Mike Fährmann
ee8b654464
[instagram] implement 'highlights' option ( closes #329 )
5 years ago
Mike Fährmann
f63c3097a9
[instagram] rework some code paths
...
- combine fetching an HTML page and extracting its 'shared_data'
- move 'shared_data' and field access info out of '_extract_page()'
- introduce a '_request_graphql()' method
5 years ago
Mike Fährmann
4330133114
[imgur] add 'favorite' extractor ( closes #420 )
...
… and use a newer site-internal API endpoint for user posts
5 years ago
Mike Fährmann
ee5e20221f
[imgth] fix image URLs
5 years ago
Mike Fährmann
b63b126808
[hentaicafe] extend URL pattern
5 years ago
Mike Fährmann
d780f0357e
[imgur] add user extractor
5 years ago
Mike Fährmann
11ea689013
[simplyhentai] fix image and video URLs
5 years ago
Mike Fährmann
15632a1570
[tsumino] fix extraction
5 years ago
Mike Fährmann
d92802fd37
[luscious] fix detection of unavailable galleries
5 years ago
Mike Fährmann
f99da2b866
[imgbb] detect invalid album and user profile links
...
and update test results, since the old album got deleted
5 years ago
Mike Fährmann
01bc7adadc
[deviantart] improve journal detection ( #419 )
...
Some journal-like posts are not reported to be journals (isJournal
is set to False), even though they have a textContent field.
https://www.deviantart.com/gliitchlord/art/brashstrokes-812942668
5 years ago
Mike Fährmann
776e9e073f
close archive on job completion ( #417 )
5 years ago
Mike Fährmann
5ac9732adc
call 'sys.exit()' on Ctrl+c
5 years ago
Mike Fährmann
9178b54eae
handle errors when opening download archive file ( #417 )
5 years ago
Mike Fährmann
6e12907de6
[deviantart] improve handling of private deviations ( #414 )
...
- don't try to call '/deviation/metadata' with an empty list of
deviation ids
- print a warning when detecting private deviations without having
a 'refresh-token'
5 years ago
Mike Fährmann
4203931d79
release version 1.10.4
5 years ago
Mike Fährmann
e7690ac694
[vsco] update URL pattern ( closes #410 )
5 years ago
Mike Fährmann
1848788970
update test results etc
5 years ago
Mike Fährmann
d5fbb2d9de
[tumblr] ignore audio links from Spotify etc.
5 years ago
Mike Fährmann
b1cddce865
Revert "[simplyhentai] fix extraction; remove image+video extractors"
...
This reverts commit d1db5180ab.
5 years ago
Mike Fährmann
d23660c04d
[hentaicafe] restore default 'request()' behavior
5 years ago
Mike Fährmann
9ae58a6b3e
[exhentai] update image limit checks
...
- adjust cost of original images
- delay limit initialization until gallery and first image page have
been requested and all cookies are available
5 years ago
Mike Fährmann
6fe9a134bf
[lineblog] add blog and post extractors ( closes #404 )
5 years ago
Mike Fährmann
4e8a548a61
[livedoor] update metadata extraction
5 years ago
Mike Fährmann
f9285f99e6
[pixiv] fix authentication
5 years ago
Mike Fährmann
6f3df3999a
[fuskator] add gallery and search extractor ( closes #407 )
5 years ago
Mike Fährmann
bc0ca66c99
[twitter] small improvements
...
- handle reply tweets (#403 )
- unset cookies in Tweet extractor to "force" the legacy interface
5 years ago
Mike Fährmann
682105b8ee
prevent crash when loading unavailable downloader ( #405 )
5 years ago
Mike Fährmann
5fcebb69c2
[postprocessor:ugoira] improve error messages ( #406 )
5 years ago
Mike Fährmann
f02a768b5c
[danbooru] add 'ugoira' option ( #406 )
...
to choose between ZIP archives or converted video files
for Ugoira posts
5 years ago
Mike Fährmann
9646ccb320
release version 1.10.3
5 years ago
Mike Fährmann
dedea3b4db
[deviantart] fix journal creation ( #400 )
5 years ago
Mike Fährmann
c6c5cb1898
improve 'deviantart.quality' description
5 years ago
Mike Fährmann
8eaae58045
[downloader:http] change log message level to 'debug'
5 years ago
Mike Fährmann
efb64ad031
[deviantart] generate filenames ( #392 , #400 )
5 years ago
Mike Fährmann
0ce98169b8
improve path generation
...
- fix 'abspath()' results for Python <3.7 (closes #402 )
- 'abspath()' in Python 3.7+ removes trailing path separators
- in Python <3.7 it doesn't
- filter empty path segments
5 years ago
Mike Fährmann
b2151f3928
[seiga] support mobile URLs ( closes #401 )
5 years ago
Mike Fährmann
20fd2d8450
[flickr] skip unavailable images/videos ( fixes #398 )
5 years ago
Mike Fährmann
60c8e090da
[postprocessor:zip] fix archive names ( closes #397 )
...
Remove the trailing path separator introduced in 3284c62 before
adding the archive's filename extension.
[ci skip]
5 years ago
Mike Fährmann
7c09545f70
[downloader:ytdl] add 'outtmpl' option ( #395 )
5 years ago
Mike Fährmann
5cc7be2536
[piczel] update and improve
...
- use proper pagination (fixes #396 )
- update API host and endpoints
- "fix" double slash // in image URLs
5 years ago
Mike Fährmann
0c1c7abb4d
release version 1.10.2
5 years ago
Mike Fährmann
49f6d7176d
[deviantart] restore filenames ( #392 )
...
<title>_by_<user>_<id> --> <title>_by_<user>-<id>
5 years ago
Mike Fährmann
63daa68d67
[deviantart] improvements ( #392 )
...
- consistent 'filename' entries, at least as far as possible
- GIFs and SWFs don't have a <title>_by_<artist>_<id> anywhere in
their metadata
- Generating <id> (from 'deviationid'?) might be something that needs
to be figured out, so we can build those filenames ourselves
- better code structure etc.
- tests for videos, archives, and flash animations
5 years ago
Mike Fährmann
d1db5180ab
[simplyhentai] fix extraction; remove image+video extractors
5 years ago
Mike Fährmann
30d6e284b0
[deviantart] use NAPI for artworks and scraps ( #392 )
...
TODO:
- journal downloads
- test for all media types
5 years ago
Mike Fährmann
7d6af936c5
[imgur] simplify gallery extraction
5 years ago
Mike Fährmann
3284c62f22
ensure PathFormat.directory ends with a path separator
...
... plus some other small optimizations
5 years ago
Mike Fährmann
ebabc5caf1
[downloader:http] treat 416 without downloaded data as error
...
Downloading https://pbs.twimg.com/media/EB2cGUYX4AI2Vuu.jpg:orig (NSFW)
sometimes returns a 416 status code, even though no 'Range' header was
sent and no data was downloaded prior.
This code usually means a file has already been downloaded completely
and the download method indicates success, but in this case it causes
an exception down the pipeline since no file was created.
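The decision described above can be sketched as a small pure function. The name `classify_416` and its return values are hypothetical; the real downloader's control flow is not shown in the commit message.

```python
def classify_416(bytes_already_saved):
    """Decide how to treat an HTTP 416 (Range Not Satisfiable) response.

    With previously saved data, 416 usually means the file is already
    complete. Without any saved data there is no file to keep, so it is
    treated as an error.
    """
    return "complete" if bytes_already_saved > 0 else "error"


print(classify_416(0))
print(classify_416(4096))
```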
5 years ago
Mike Fährmann
2495b99347
[postprocessor:classify] improve path generation ( fixes #138 )
...
It still doesn't work for converted ugoira animations thanks to how
those files are handled, but everything else, including files with
unknown or changing file extension, now works as it should.
5 years ago
Mike Fährmann
e77a656437
optimize directory path generation
...
- use str.join() instead of os.path.join()
(less "features", but 10x as fast)
- cache directory formatters
- detect and optimize field access for 1-element format strings
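To make the trade-off in the first point concrete, here is a small comparison using `posixpath` so the result does not depend on the platform. `join_fast` is a hypothetical name; the speed figure is the commit's own claim and is not measured here.

```python
import posixpath


def join_fast(segments, sep="/"):
    """Join path segments with plain str.join(); no special cases."""
    return sep.join(segments)


segments = ["gallery-dl", "twitter", "user"]
same = join_fast(segments) == posixpath.join(*segments)

# One of the "features" that is lost: os.path.join() restarts at an
# absolute segment, while str.join() simply concatenates.
restarted = posixpath.join("base", "/abs")
concatenated = join_fast(["base", "/abs"])
print(same, restarted, concatenated)
```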
5 years ago
Mike Fährmann
51d10783fc
[patreon] include image info in API results ( #383 )
5 years ago
Mike Fährmann
7a5e78741c
[booru] build directory path for each file ( #385 )
5 years ago
Mike Fährmann
b1728f512d
[patreon] support multi image posts and post URLs ( #383 )
5 years ago
Mike Fährmann
454bf1ebf9
preserve enumeration index after 'set_extension()' ( #306 )
5 years ago
Mike Fährmann
f5039b897f
replace DownloadArchive.check() with __contains__()
...
Interestingly enough, 'a in obj' is slightly faster than
'obj.check(a)' and is also nicer to look at, I think.
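A minimal sketch of what supporting 'a in obj' looks like. The class below is an illustrative stand-in, not the project's DownloadArchive; the SQLite table layout and the `add` method are assumptions.

```python
import sqlite3


class Archive:
    """Toy download archive that supports the 'in' operator."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS archive (entry TEXT PRIMARY KEY)")

    def add(self, entry):
        self.conn.execute(
            "INSERT OR IGNORE INTO archive VALUES (?)", (entry,))

    def __contains__(self, entry):
        # Called for 'entry in archive'
        cursor = self.conn.execute(
            "SELECT 1 FROM archive WHERE entry=? LIMIT 1", (entry,))
        return cursor.fetchone() is not None


archive = Archive()
archive.add("twitter1234_1")
print("twitter1234_1" in archive, "twitter1234_2" in archive)
```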
5 years ago
Mike Fährmann
5a210991b6
Remove control characters from filesystem paths
...
- add 'path-remove' option to specify the set of characters that
should be removed
- rename 'restrict-filenames' to 'path-restrict'
- #348 , #380
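Removing a configurable set of characters can be sketched with `str.translate`. The helper name `remove_chars` and the exact character set below (ASCII control characters plus DEL) are assumptions for illustration; the commit does not state the option's default value.

```python
# Assumed set: ASCII control characters 0x00-0x1f plus DEL (0x7f)
CONTROL_CHARS = "".join(map(chr, range(0x20))) + "\x7f"


def remove_chars(segment, chars=CONTROL_CHARS):
    """Delete every character in 'chars' from a path segment."""
    # Mapping a code point to None makes str.translate() drop it
    table = dict.fromkeys(map(ord, chars))
    return segment.translate(table)


print(remove_chars("bad\x00name\n.jpg"))
```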
5 years ago
Mike Fährmann
c50d60a53d
[reactor] fix image URLs
5 years ago
Mike Fährmann
32447d0d24
[pixiv] simplify default filename format
...
(#366 )
5 years ago
Mike Fährmann
5f8621b29d
improve output of active post processor modules
5 years ago
Mike Fährmann
2cbbc3dec4
add a 'whitelist' to '--ugoira-conv' ( #382 )
5 years ago
Mike Fährmann
829b1ccf04
[imgur] distinguish album and gallery URLs ( #380 )
...
A gallery can be either an album or a single image.
5 years ago
Mike Fährmann
23251356cb
require 'extension' data for each URL ( #382 )
5 years ago
Mike Fährmann
a67413d64f
[xhamster] use input URL domain
...
Don't rewrite all URLs as 'https://xhamster.com/...'
5 years ago
Mike Fährmann
0bb873757a
update PathFormat class
...
- change 'has_extension' from a simple flag/bool to a field that
contains the original filename extension
- rename 'keywords' to 'kwdict' and some other stuff as well
- inline 'adjust_path()'
- put enumeration index before filename extension (#306 )
5 years ago
Mike Fährmann
423f68f585
[deviantart] fix scraps extraction ( closes #376 )
5 years ago
Mike Fährmann
3bf20ffb70
[instagram] add support for story highlights
5 years ago
Mike Fährmann
a732e9c430
[instagram] update query hashes and headers
5 years ago
Mike Fährmann
2ccf6a9e35
[instagram] make extractor tests happy ( #373 )
5 years ago
Mike Fährmann
8dc42bb178
implement 'enumerate' for 'extractor.skip' ( #306 )
...
[ci skip]
5 years ago
Leonardo Taccari
bc5eaf7746
[instagram] Add support for IGTV ( #373 )
...
Add support for IGTV profile (instagram.com/<username>/channel/)
and IGTV medias (instagram.com/tv/<short_id>).
5 years ago
Mike Fährmann
b7fb93e2b2
[downloader:http] add 'adjust-extensions' option
5 years ago
Mike Fährmann
eb7da159e2
[imagebam] update URL test results
...
Image URLs are now using https://, but the website itself is still
served as http://.
5 years ago
Mike Fährmann
189acbeac9
[imgbb] add extractor for individual images ( closes #363 )
5 years ago
Mike Fährmann
ad3ac02fbc
[pixiv] update metadata entries ( #366 )
...
- change 'num' to a simple enumerating integer
- change default filename format
- provide content of the old 'num' field as 'suffix'
- add 'filename' for ugoira
5 years ago
Mike Fährmann
1ff4c4ec03
[adultempire] consistent artist order
5 years ago
Leonardo Taccari
2df050e627
[instagram] Add support for stories ( #371 )
...
* [instagram] Add support for stories
Add support for Instagram user's stories
(https://www.instagram.com/stories/<username>/).
First the shared_data in instagram.com/stories/<username> is fetched in
order to retrieve the user_id that is then passed to fetch the stories
via the corresponding graphql query.
Please note that fetching stories is supported only when authentication
is enabled and the corresponding <username> is followed.
* [instagram] Add an only-matching test for stories
* [instagram] Simplify InstagramExtractor.items() and _extract_stories()
Simplify handling of typename in InstagramExtractor.items() and multi-line
string in _extract_stories(). NFCI.
5 years ago
Mike Fährmann
f4bc75e854
fix rate limit handling for OAuth APIs ( #368 )
5 years ago
Mike Fährmann
3957d27d79
[deviantart] add 'quality' option ( #369 )
5 years ago
Mike Fährmann
64b2935d8e
[pixiv] provide 'filename' and change default filename format
...
to '{filename}.{extension}' (closes #366 )
5 years ago
Mike Fährmann
2f33bac030
release version 1.10.1
5 years ago
Mike Fährmann
fa60109e97
[exhentai] don't use e-hentai.org for exhentai URLs
5 years ago
Mike Fährmann
dfe552421b
release version 1.10.0
5 years ago
Mike Fährmann
0609afd1e4
update default cache directory ... again
...
Use a 'gallery-dl' subdirectory in ~/.cache to adhere to how other
programs store their cached data, and call os.makedirs() so it also
works without an existing ~/.cache directory.
5 years ago
Mike Fährmann
4a0c98bfc9
miscellaneous fixes and adjustments
5 years ago
Mike Fährmann
2c839f3760
[imgbb] add user extractor + login support ( #361 )
5 years ago
Mike Fährmann
a8b60b2bd9
change default cache directory for unix systems
...
Use either $XDG_CACHE_HOME or ~/.cache (if the former isn't set)
and store potentially sensitive cookies and tokens in a user's
home directory and not in the world-readable /tmp.
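The lookup order described above can be sketched as follows. `default_cache_dir` is a hypothetical helper; passing the environment as an argument is only done to keep the example easy to test.

```python
import os


def default_cache_dir(environ=os.environ):
    """Return $XDG_CACHE_HOME if set and non-empty, else ~/.cache."""
    xdg = environ.get("XDG_CACHE_HOME")
    if xdg:
        return xdg
    return os.path.join(os.path.expanduser("~"), ".cache")


print(default_cache_dir({"XDG_CACHE_HOME": "/custom/cache"}))
print(default_cache_dir({}))
```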
5 years ago
Mike Fährmann
4b6edfbfd2
restrict permissions without importing 'pathlib'
...
and only on non-Windows systems.
1. On Windows the 'mode' argument for os.open() has no (visible) effect
on access permissions for new files.
2. The default location for 'cache.file' on Windows is in
%USERPROFILE%\AppData\Local\Temp which can only be accessed by the
owner himself (or an admin).
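A minimal sketch of restricting permissions with `os.open()` alone, assuming the goal is an empty, owner-only file that is created before anything else writes to it. The helper names are hypothetical.

```python
import os
import stat


def create_private_file(path):
    """Create an empty file with mode 0600 (effective on POSIX only)."""
    # O_EXCL makes the call fail if the file already exists, so the
    # permissions of an existing file are never silently trusted.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    os.close(fd)


def is_private(path):
    """True if neither group nor others have any access bits set."""
    return stat.S_IMODE(os.stat(path).st_mode) & 0o077 == 0
```

Files created before such a change keep their old mode, which is why an explicit chmod may still be needed for them.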
5 years ago
Leonardo Taccari
afce1ee1eb
Avoid possible sensitive information disclosure via cache.file
...
Previously cache.file could be created world readable leading to
possible sensitive information disclosure on multi-user systems.
Restrict permissions only to the owner by creating an empty file.
Please note that cache.file created before this commit may need a
`chmod 600' or similar!
5 years ago
Mike Fährmann
2153206093
[imgbb] add album extractor ( #361 )
5 years ago
Mike Fährmann
beb4fab2e6
[exhentai] improve limit and error handling ( #360 )
...
- check image limit before opening the first gallery or image page
- prevent any further exhentai extractors from running after the image
limit has been reached
5 years ago
Mike Fährmann
81b35ed3cb
[exhentai] catch more error states ( #356 , #360 )
...
- warn on MPV-enabled galleries
- catch parsing errors for gallery pages and image info
- write page content to debug output
5 years ago
Mike Fährmann
a90280f4e7
[postprocessor:zip] add 'mode' option ( #355 )
5 years ago
Mike Fährmann
6ce22f606b
[exhentai] update login procedure and tests
...
Logging in now follows the natural login flow that also happens in a
browser more closely and collects more cookies than just ipb_member_id
and ipb_pass_hash.
Test URLs have been updated and now point to the e-hentai.org domain.
5 years ago
Mike Fährmann
dc73d02d87
[exhentai] always use e-hentai.org as domain + set nw cookie
5 years ago
Mike Fährmann
40637556fa
[ngomik] fix extraction
5 years ago
Mike Fährmann
3969f9cbbd
[behance] fix collection extraction
5 years ago
Mike Fährmann
20f7b07312
ensure postproc finalize() is called during C-c or crash ( #355 )
5 years ago
Mike Fährmann
17a3426845
[gelbooru] enable all content when not using API
5 years ago
Mike Fährmann
279db2c5b2
[vsco] add collection & image extractor + video support ( #331 )
5 years ago
Mike Fährmann
547ea71463
[downloader.ytdl] add 'forward-cookies' option ( #352 )
...
The "long" name is necessary because just calling it 'cookies' would
clash with how the lookup for '--cookies' is implemented.
5 years ago
Mike Fährmann
d9d44ad953
[tsumino] update test results
5 years ago
Mike Fährmann
b1bea8aaeb
add 'restrict-filenames' option ( #348 )
5 years ago
Mike Fährmann
60cf40380a
[vsco] add user extractor ( #331 )
5 years ago
Mike Fährmann
3fe5ccdfa6
[adultempire] add gallery extractor ( closes #340 )
5 years ago
Mike Fährmann
b3851e01d9
release version 1.9.0
5 years ago
Mike Fährmann
5d968412ca
[deviantart] case-insensitive folder name matching ( fixes #343 )
5 years ago
Mike Fährmann
a3c736fedc
[500px] fix extraction
...
Maximum available image dimensions have been reduced to 4096px
on the longest edge. (from 5000px)
A few (unimportant) metadata fields are no longer available or have
been changed to 'null'.
5 years ago
Mike Fährmann
1133b7fcbd
[smugmug] update unit tests
...
The account used for tests before has been deleted.
5 years ago
Mike Fährmann
21991acc49
add 'ciphers' option; update default User-Agent
5 years ago
Mike Fährmann
84f4d3bc0b
replace urllib3's default cipher list with Firefox's ( #342 )
...
Avoids Cloudflare CAPTCHAs on both Linux and Windows without
pyOpenSSL installed.
5 years ago
Mike Fährmann
feb98cf196
[twitter] improve 'content' formatting; add option ( #338 )
...
- include emoticons
- leave newlines intact
- remove pic.twitter.com/ links at the end
5 years ago
Mike Fährmann
1740086d8a
add 'repl' and 'sep' arguments to text.replace_html()
5 years ago
Mike Fährmann
8d1ae9b715
[tumblr] enable date-min/-max/-format options ( #337 )
5 years ago
Mike Fährmann
09f37fde39
[reddit] move date-min/-max handling into Extractor class
5 years ago
Mike Fährmann
7b77ecc35a
fix paths for files without extension ( #220 )
5 years ago
Mike Fährmann
c41ff9441e
improve find() for downloaders and postprocessors
5 years ago
Mike Fährmann
0151e250f5
[twitter] extract 'content' metadata ( closes #333 )
5 years ago
Mike Fährmann
16c582aaf9
implement 'mtime' post-processor ( #332 )
...
This can set a file's modification time according to a UNIX timestamp
or a datetime object from its metadata.
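The idea can be sketched with `os.utime()`. The function name `set_mtime` is hypothetical, and treating naive datetime objects as UTC is an assumption made for this example only.

```python
import os
from datetime import datetime, timezone


def set_mtime(path, value):
    """Set a file's modification time from a UNIX timestamp or datetime."""
    if isinstance(value, datetime):
        if value.tzinfo is None:
            # Assumption: naive datetimes are interpreted as UTC
            value = value.replace(tzinfo=timezone.utc)
        value = value.timestamp()
    # os.utime() takes (atime, mtime); the access time is set as well
    os.utime(path, (value, value))
```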
5 years ago
Mike Fährmann
62097284fe
add 'download' option ( #220 )
5 years ago
Mike Fährmann
fe7805de7c
improve attribute access in DownloadJob.handle_url()
...
Storing a value in a local variable and accessing it that way is faster
than going through 'self' if it is accessed more than once.
5 years ago
Mike Fährmann
56c7a66a4a
detect Cloudflare CAPTCHAs and update cipher list
5 years ago
Mike Fährmann
a7b42b37a2
[35photo] fix extraction
5 years ago
Mike Fährmann
04b8d0894a
[newgrounds] improve metadata extraction
5 years ago
Mike Fährmann
12da6bd0c9
[simplyhentai] fix/improve extraction
5 years ago
Mike Fährmann
fdec59f8e2
replace extractor.request() 'expect' argument
...
with
- 'fatal': allow 4xx status codes
- 'notfound': raise NotFoundError on 404
5 years ago
Mike Fährmann
2ff73873f0
[erolord] add gallery extractor ( closes #326 )
5 years ago
Mike Fährmann
b4da8c5a97
[sexcom] add extractor for related pins ( #325 )
5 years ago
Mike Fährmann
69997e92db
[sexcom] skip unavailable pins ( #325 )
5 years ago
Mike Fährmann
8966930c5c
[downloader:http] try to import SSL exception class from OpenSSL
...
(#324 )
5 years ago
Mike Fährmann
bc6b0cfddc
[shopify] skip consecutive duplicate products
...
Not filtering duplicate URLs anymore caused the archive ID uniqueness
test to fail.
5 years ago
Mike Fährmann
b89f0d8d3c
update extractor result tests
5 years ago
Mike Fährmann
69205df68d
allow '-1' for infinite retries ( #300 )
5 years ago
Mike Fährmann
f7b5c4c3e7
use values of 'retries' options correctly
...
The RE-tries option now specifies exactly that: the maximum number of
times a failed HTTP request is re-tried. For example, a value of 2 will now
correctly stop after 3 attempts: the initial one + 2 re-tries.
The maximum wait-time now also caps at 30min and increases exponentially
for both extractor.request() and downloader.http.download().
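The counting and back-off rules above can be sketched like this. The function name, the one-second starting delay, and the use of OSError as the failure signal are assumptions for illustration; only the "initial attempt + N re-tries" rule and the 30-minute cap come from the commit message.

```python
import time

MAX_WAIT = 30 * 60  # seconds; the 30min cap mentioned above


def call_with_retries(func, retries=2, base_delay=1.0, sleep=time.sleep):
    """Run func(); on OSError, re-try up to 'retries' more times."""
    for attempt in range(retries + 1):  # initial attempt + re-tries
        try:
            return func()
        except OSError:
            if attempt == retries:
                raise  # all attempts used up
            # Wait time doubles after every failure, up to MAX_WAIT
            sleep(min(base_delay * 2 ** attempt, MAX_WAIT))
```

Passing a custom `sleep` callable makes it possible to inspect the wait times without actually waiting.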
5 years ago
Mike Fährmann
6393b47db2
add '-A/--abort'; deprecate '--abort-on-skip'
5 years ago
Mike Fährmann
f2000a69aa
implement 'image-unique' and 'chapter-unique' options ( #303 )
...
The default value for both is 'false', i.e. duplicate URLs are NOT
ignored.
The previous behavior was to always ignore duplicate URLs to make
'--abort-on-skip' work properly when new images were added to the
beginning of a collection while gallery-dl is running.
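The difference between the two settings can be sketched with a small generator. `filter_urls` and its `unique` parameter are hypothetical names standing in for the options; the real implementation is not shown in the commit message.

```python
def filter_urls(urls, unique=False):
    """Yield URLs, skipping repeats only when 'unique' is enabled."""
    if not unique:
        # Default ('false'): duplicate URLs are NOT ignored
        yield from urls
        return
    seen = set()
    for url in urls:
        if url not in seen:
            seen.add(url)
            yield url


urls = ["a.jpg", "b.jpg", "a.jpg"]
print(list(filter_urls(urls)), list(filter_urls(urls, unique=True)))
```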
5 years ago
Mike Fährmann
40da44b17f
Merge branch 'v1.9.0'
5 years ago