Mike Fährmann
a3de234e70
[hitomi] add extractor for tag searches ( closes #697 )
4 years ago
Mike Fährmann
456f6e8d05
[nozomi] move '_unpack()' method to global scope
4 years ago
Mike Fährmann
55ac408bdf
[hitomi] fix extraction of galleries without tags
4 years ago
Mike Fährmann
db6685eeae
[aryion] support downloading from folders ( fixes #694 )
4 years ago
Mike Fährmann
fa2952ac55
[furaffinity] add 'following' extractor ( #515 )
4 years ago
Mike Fährmann
9b194520db
[newgrounds] add 'following' extractor ( closes #684 )
4 years ago
Mike Fährmann
6386ee54e1
[deviantart] add extractor info to 'following' results
4 years ago
Mike Fährmann
d5273f9b0c
[hiperdex] update domain to hiperdex.net
4 years ago
Mike Fährmann
08674a91f3
[patreon] fix hash extraction from download URLs ( closes #693 )
...
The old method was assuming every URL path ends with '/1'. For URLs
where this is not the case, the segment containing the post ID was
used as file hash.
4 years ago
Mike Fährmann
a31c1aae72
release version 1.13.4
5 years ago
Mike Fährmann
a6286bb551
[hiperdex] add 'artist' extractor ( #606 )
5 years ago
Mike Fährmann
291033720a
[hiperdex] fix manga extraction
5 years ago
Mike Fährmann
dfc0557807
[vsco] fix collection extraction
5 years ago
Mike Fährmann
fd438f0d78
update extractor test results
5 years ago
Mike Fährmann
bae1e8ed12
[deviantart] fix JPEG quality replacement pattern
...
'q_\d+' would sometimes also replace something in the 'token' query
parameter, invalidating the URL.
5 years ago
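As a rough illustration of the problem described in bae1e8ed12, the sketch below limits the quality replacement to the path part of a URL, so a chance match inside the 'token' query parameter is left alone. The URL, the function names and the exact pattern are assumptions made for this example, not the project's actual code.

```python
import re

# Hypothetical URL: 'q_80' is the quality setting in the path, while the
# token happens to contain 'q_12' as part of its opaque value.
URL = ("https://images.example.org/f/abc/v1/fill/w_800,h_600,q_80,strp/img.jpg"
       "?token=eyJq_12xyz")


def replace_quality_naive(url, quality=100):
    # Replaces every 'q_<digits>' match, including the one inside the token.
    return re.sub(r"q_\d+", "q_%d" % quality, url)


def replace_quality(url, quality=100):
    # Split off the query string and only rewrite the path part.
    path, sep, query = url.partition("?")
    # Require a ',' or '/' on both sides so only a whole path option matches.
    path = re.sub(r"(?<=[,/])q_\d+(?=[,/])", "q_%d" % quality, path)
    return path + sep + query
```

With the naive version the token changes and the URL would no longer validate; the restricted version only rewrites the ',q_80,' option in the path.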
Mike Fährmann
cf4cef3d63
[aryion] adjust 'date' to UTC time
5 years ago
Mike Fährmann
a0f4c295c0
add optional 'utcoffset' argument to 'parse_datetime()'
5 years ago
Mike Fährmann
6c531be294
[aryion] fix malformed 'last-modified' headers ( #390 )
5 years ago
Mike Fährmann
38bc6430d3
[downloader:http] don't overwrite existing '_mtime' fields
5 years ago
Mike Fährmann
dc65f7d8dc
[aryion] use generic download URLs ( #390 )
...
i.e. /g4/data.php?id=…
- get filename & extension from Content-Disposition header
- handle all downloadable file types (docx, swf, etc)
5 years ago
Mike Fährmann
96b78bcf04
[aryion] include path in default directory format ( #390 )
5 years ago
Mike Fährmann
300264f676
read config files from PyInstaller exe directory ( closes #682 )
5 years ago
Mike Fährmann
6143050980
[aryion] add gallery and post extractors ( #390 , #673 )
5 years ago
Mike Fährmann
9e7dfc0cfc
[myportfolio] fix extraction of galleries without title
5 years ago
Mike Fährmann
88fca0a172
[mastodon] update OAuth credentials for pawoo.net ( #665 )
5 years ago
Mike Fährmann
4ae8a25567
[mastodon] use 'combine_dict()' to combine extractor info dicts
5 years ago
Mike Fährmann
220c06b86e
[mastodon] handle rate limits
5 years ago
Mike Fährmann
d02f7c1118
improve Extractor.wait()
...
- allow 'until' to be a datetime object
- do "time calculations" with UTC timestamps
- set a default 'reason'
5 years ago
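The first two points of d02f7c1118 can be sketched roughly as follows: accept either a datetime object or a numeric timestamp for 'until' and compute the remaining wait time from UTC timestamps. The function name seconds_until(), the 'now' parameter and the treatment of naive datetimes as UTC are assumptions for this example and not taken from Extractor.wait() itself.

```python
from datetime import datetime, timezone


def seconds_until(until, now=None):
    """Return the number of seconds to wait until 'until' (never negative)."""
    if isinstance(until, datetime):
        if until.tzinfo is None:
            # Assumption: naive datetime objects are interpreted as UTC.
            until = until.replace(tzinfo=timezone.utc)
        until = until.timestamp()
    if now is None:
        # Current time as a UTC-based POSIX timestamp.
        now = datetime.now(timezone.utc).timestamp()
    return max(0.0, float(until) - now)
```

Passing 'now' explicitly is only there to make the calculation easy to test.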
Mike Fährmann
5d7404ab58
[oauth] use the new name for 'DeviantartAPI' ( fixes #670 )
5 years ago
Mike Fährmann
762c758af4
[hiperdex] fix extraction
5 years ago
Mike Fährmann
f9a590f92b
[deviantart] apply HTTP request limits in more places
...
"Request blocked" can also happen on sta.sh and for *any* HTTP
request directed at deviantart.com
5 years ago
Mike Fährmann
2587296deb
[mastodon] add access tokens for mastodon.social and baraag.net
...
(closes #665 )
5 years ago
Mike Fährmann
ff7c0b7eff
[deviantart] handle "Request blocked" errors ( #655 )
...
- add a 2 second wait time between requests to deviantart.com
- catch 403 "Request blocked" errors and wait for 3 minutes until
retrying
5 years ago
Mike Fährmann
c874684f05
[deviantart] retrieve *all* download URLs through OAuth API
...
'/extended_fetch' as well as Deviation webpages now again contain
Deviation UUIDs needed to grab Deviation info through the OAuth API,
meaning cookies are no longer necessary to grab original files.
The only instance where cookies are still needed is scraps marked as
"mature", since those entries are hidden from public users.
(#655 , #657 , #660 )
5 years ago
Mike Fährmann
5c27b25a8f
[deviantart] improve sta.sh extraction
...
Extract all sta.sh items in a single extractor run.
Don't spawn a new StashExtractor for each individual sta.sh item to
preserve the current requests.Session and its opened TCP connections.
5 years ago
Mike Fährmann
e2fc4eaa6f
[deviantart] detect stash folders ( fixes #659 )
5 years ago
Mike Fährmann
c034159701
[piczel] fix extraction for single images
5 years ago
Mike Fährmann
699036ea0c
[weibo] accept status URLs with non-numeric IDs ( #664 )
5 years ago
Mike Fährmann
fe96f99e4b
[hentainexus] reduce line length (flake8) & update test
5 years ago
墨焓
6f81cac8fa
Add metadata to hentainexus: circle, event, title_conventional. ( #661 )
5 years ago
Mike Fährmann
3ed72f82dc
release version 1.13.3
5 years ago
Mike Fährmann
6f911aeb1c
[deviantart] add error message for CloudFront blocks ( #655 )
5 years ago
Mike Fährmann
7499d71d02
[simplyhentai] ignore certificate errors in video test
5 years ago
Mike Fährmann
4203dc0bdc
[mangapark] fix metadata extraction
5 years ago
Mike Fährmann
6ecb0a19cf
handle sys.stdin being None when using '-' as input file ( #653 )
5 years ago
Mike Fährmann
1b82d36ab2
[deviantart] handle decode errors for extended_fetch results ( #655 )
...
This isn't going to solve the underlying problem, but it should at
least provide the server response when those errors happen.
5 years ago
Mike Fährmann
09f2271528
[35photo] add 'tag' extractor
5 years ago
Mike Fährmann
77fda8190c
[35photo] simplify/remove tests for the 'genre' extractor
...
There is still a nice genre overview page (https://35photo.pro/genre/)
but the individual sub-pages don't list photos anymore
5 years ago
Mike Fährmann
4bc161ca0f
prevent crash when sys.stdout and co. are None ( #653 )
5 years ago
Mike Fährmann
fb846c9ee5
[instagram] reduce line lengths and make flake8 happy
5 years ago
Mike Fährmann
ad2efa8509
[e621] derive from Danbooru extractors ( #651 )
...
- use extractor implementations from 'danbooru'
- use "page": "b[ID]" to paginate over results instead of
"tags": "id:<[ID]", avoiding infinite loops with certain
post orders
- bump User-Agent version
5 years ago
Mike Fährmann
9b39e1cd7e
[e621] fix bug in API rate limiting ( #651 )
5 years ago
Mike Fährmann
b607d0ad7f
[twitter] fix typo in 'x-twitter-auth-type' header ( #625 )
5 years ago
Mike Fährmann
9159cb8fb3
remove trailing dots and spaces from directory names ( #647 )
5 years ago
Mike Fährmann
2c3b9e1450
[nozomi] support multiple images per post ( #646 )
...
This changes the default filename format as well as archive IDs,
since those assumed that each post would only have one image.
5 years ago
Mike Fährmann
c606d0c854
[instagram] update pattern for user profile URLs
...
Allow for query parameters and fragments,
for example https://www.instagram.com/instagram/?hl=en
5 years ago
Mike Fährmann
2530db3f4d
[mangadex] transform 'date' timestamps to datetime objects
5 years ago
Mike Fährmann
ae2a33243b
[newgrounds] catch general Exceptions
5 years ago
Mike Fährmann
32e36d8f02
[sexcom] replace tests
5 years ago
Mike Fährmann
33b42dc847
[nozomi] sort search results ( fixes #646 )
5 years ago
Mike Fährmann
eaa60a438b
[piczel] fix extraction
...
- manually filter by folder_id
- extract data for single posts from embedded JSON, since the
'/api/gallery/image/<id>' endpoint is no longer available
5 years ago
Mike Fährmann
5bcc7184c9
[danbooru][e621] increase page limits
5 years ago
Mike Fährmann
90d15e3682
[instagram] use 'itertools.chain()'
5 years ago
Leonardo Taccari
160328d21c
[instagram] Add support for user's saved medias ( #644 )
...
* [instagram] Gracefully handle possible 'HttpErrorPage' in _extract_page()
`HttpErrorPage' is returned in shared_data at least when not authenticated or
when trying to fetch other users' saved media
(i.e. `instagram.com/<user>/saved/').
Gracefully handle it by returning nothing.
* [instagram] Add support for user's saved medias
(Please note that this needs the user to be authenticated, and they can
only see their own saved media, not those of other users.)
Close #643 .
* [instagram] Bump copyright year
5 years ago
Mike Fährmann
e0b0e8d62a
release version 1.13.2
5 years ago
Mike Fährmann
d3482ace7f
[furaffinity] extract more metadata
...
- views
- favorites
- comments
- rating
- fa_category (since 'category' is already in use)
- theme
- species
- gender
- width
- height
5 years ago
Mike Fährmann
f6c5edb76b
pre-compile regex pattern for remove_html() and split_html()
5 years ago
Mike Fährmann
fdd2dd5136
[kabeuchi] add 'user' extractor ( closes #561 )
5 years ago
Mike Fährmann
59edcdc822
[hitomi] restore metadata fields from before f33b13a
...
... and add a 'metadata' option to disable
visiting the gallery page and extracting data from it
if this is not needed.
5 years ago
Mike Fährmann
2d5703c493
[twitter] use a simpler data structure to store cookies in cache
...
Use a dict with name-value pairs instead of an entire
RequestsCookieJar object.
5 years ago
Mike Fährmann
87d4f83597
[newgrounds] make post extraction nonfatal
5 years ago
Mike Fährmann
823fbeaae6
[newgrounds] add 'favorite' extractor ( #394 )
5 years ago
Mike Fährmann
a45fbc38ea
[pixiv] implement 'avatar' option ( #595 , #623 )
5 years ago
Mike Fährmann
a63a376ad2
[mangoxo] fix login
5 years ago
Mike Fährmann
ebc70e87ce
[e621] update to new interface / API endpoints ( closes #635 )
5 years ago
Mike Fährmann
d1cf7ccdb3
[instagram] add 'post_shortcode' metadata field ( #525 )
5 years ago
Mike Fährmann
32df8d06fe
[twitter] add 'bookmark' extractor ( closes #625 )
5 years ago
Mike Fährmann
3fb41c34c8
[bcy] reduce requests to '/item/detail/<id>' ( #613 )
...
The former implementation would try to use the embedded data from
'/item/detail/' pages for every post, even if that wasn't really
necessary.
This commit also fixes some issues with posts only visible to
logged in users.
5 years ago
Mike Fährmann
f33b13aacf
[hitomi] simplify metadata extraction
...
Use the data from https://ltn.hitomi.la/galleries/<id>.js for both
image URLs and metadata and ignore any gallery or reader pages.
This removes 'artist', 'characters', 'group', and 'parody' metadata
fields since this information is, as of now, only available in
gallery pages.
5 years ago
Mike Fährmann
115fd2c6f2
"fix" incomplete MIME types ( #632 )
...
e-/exhentai's original image downloads currently send
incomplete/invalid Content-Type headers, "jpg" instead
of "image/jpg" etc, since the last update.
(https://forums.e-hentai.org/index.php?showtopic=236113)
This change prepends any Content-Type value missing a
media type specification with "image/", transforming it
into a valid MIME type.
(A global solution to a local problem, but it shouldn't
cause any issues anywhere else)
5 years ago
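The workaround from 115fd2c6f2 can be pictured with a few lines of Python. This is a minimal sketch of the idea as the commit message describes it; the function name fix_mime_type() is hypothetical and the project's code may check for a missing media type differently.

```python
def fix_mime_type(content_type):
    """Prepend 'image/' to a Content-Type value that lacks a media type."""
    # A valid MIME type has the form '<type>/<subtype>'; a bare 'jpg'
    # has no '/' and is therefore treated as incomplete.
    if content_type and "/" not in content_type:
        return "image/" + content_type
    return content_type
```

Values that already contain a '/' pass through unchanged, which is why the change should not affect other sites.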
Mike Fährmann
72122eb9b3
release version 1.13.1
5 years ago
Mike Fährmann
adcd7cb24a
[downloader:http] add another MIME type for '.rar' files ( #628 )
5 years ago
Mike Fährmann
ce5e2a58fe
[imgbb] update test results
...
Image server domain changed from
https://image.ibb.co/ to https://i.ibb.co/
5 years ago
Mike Fährmann
f117e32910
[danbooru] restore 'popular' functionality
5 years ago
Mike Fährmann
39b48d665b
[hiperdex] use proper name for 'chapter_minor'
5 years ago
Mike Fährmann
8fbbaa54ff
[bcy] fix partial image URLs ( #613 )
...
Images from new posts can have incomplete/partial URLs (1)
without any filename extension when fetching their data from
'/apiv3/user/selfPosts', so now all data gets taken from
'/item/detail/ID' pages.
It is currently unknown how to get the non-watermarked original version
of these images, or if that is possible at all. (2)
Images with a watermark will have their 'filter' metadata field set to
"watermark". For original images this field is an empty string "".
Enabling the 'noop' option will, in addition to the watermarked version,
yield the '~noop.image' filter version (3),
where 'filter' is set to "noop".
(1) "https://img-bcy-qn.pstatp.com/banciyuan/3ccdff22479c4060aadc86718209b281"
(2) "https://p1-bcy.byteimg.com/img/banciyuan/3ccdff22479c4060aadc86718209b281~tplv-banciyuan-logo-v3:wqnpnLLlhZLlpKfprZTnjotfCuWNiuasoeWFgyAtIEFDR-eIseWlveiAheekvuWMug==.image"
(3) "https://p1-bcy.byteimg.com/img/banciyuan/3ccdff22479c4060aadc86718209b281~noop.image"
5 years ago
Mike Fährmann
86c00f9e66
[danbooru] move extractor logic from booru.py
5 years ago
Mike Fährmann
1d4a369ea2
update extractor test results
5 years ago
Mike Fährmann
7625912b31
[piczel] improve and update
...
- fix tag names
- fix a bug in _pagination()
- parse datetime in 'created_at' as 'date'
- rewrite main loop
- replace user profile test
5 years ago
Mike Fährmann
ec85bf90de
use context managers in cache.py & add tests
5 years ago
Mike Fährmann
913b8333cc
write DeviantArt refresh-tokens to cache ( #616 )
...
Writing the token is currently disabled by default and must be
enabled with 'extractor.oauth.cache'.
'extractor.deviantart.refresh-token' must be set to '"cache"'
to use the cached token.
5 years ago
Mike Fährmann
2a4f227e08
warn about expired cookies
5 years ago
Mike Fährmann
34887ae139
fix bugs in DatabaseCacheDecorator.update()/.invalidate()
...
- call db.commit() after changes have been made
- remove 'LIMIT 1' from the DELETE statement in invalidate()
(only available if SQLite3 was compiled with the right flags
enabled, syntax error otherwise)
5 years ago
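The two fixes in 34887ae139 can be illustrated with the standard sqlite3 module. The table layout and the function names below are assumptions for this sketch and do not mirror DatabaseCacheDecorator; the point is only the explicit commit() and a DELETE statement without 'LIMIT 1'.

```python
import sqlite3


def open_cache(path=":memory:"):
    # Hypothetical schema: one row per cache key.
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS data (key TEXT PRIMARY KEY, value TEXT)")
    return db


def update(db, key, value):
    db.execute("INSERT OR REPLACE INTO data VALUES (?, ?)", (key, value))
    # Without commit() the change stays in an open transaction.
    db.commit()


def invalidate(db, key):
    # Plain DELETE: 'DELETE ... LIMIT 1' is only accepted by SQLite builds
    # compiled with the matching option and is a syntax error otherwise.
    db.execute("DELETE FROM data WHERE key=?", (key,))
    db.commit()
```

Since 'key' is the primary key in this sketch, the DELETE can match at most one row anyway.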
Mike Fährmann
380b693fad
[downloader:http] add more MIME types for '.bmp' files ( #621 )
5 years ago
Mike Fährmann
4e361b3008
add tests for specific datetime values
5 years ago
Mike Fährmann
80ecb99089
[hitomi] fix extraction
5 years ago
Mike Fährmann
247c9e1416
[vsco] update gallery URL pattern
5 years ago
Mike Fährmann
19ae6f3fc4
update test results
...
- twitter:
Don't test the whole kwdict, only the actual content, since the
keyword hash changes whenever that user changes his display name.
- khinsider:
Download host changed
5 years ago
Mike Fährmann
cc5079c844
[hiperdex] add chapter and manga extractors ( closes #606 )
5 years ago
Mike Fährmann
64bdec8430
[deviantart] check availability of intermediary URLs ( fixes #609 )
5 years ago
Mike Fährmann
5607dd3646
[hitomi] follow multiple redirects
5 years ago
Mike Fährmann
765b2a0527
[hentaihand] add extractors ( closes #605 )
5 years ago
Mike Fährmann
d94215d119
[tumblr] replace '-' with ' ' in tag searches ( fixes #611 )
...
To search for tags with actual minus signs in them
(there shouldn't be too many), manually replace those
with url-encoded minus characters ('-' -> '%2d')
before inputting them into gallery-dl:
https://s679874.tumblr.com/tagged/tag-with-minus
->
https://s679874.tumblr.com/tagged/tag%2dwith%2dminus
5 years ago
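The manual step described in d94215d119 amounts to a simple string replacement. The sketch below shows it together with one plausible way the tag could then be interpreted; the function names and the exact order of operations in tag_to_search() are assumptions, not the extractor's code.

```python
from urllib.parse import unquote


def escape_minus(tag):
    """Encode literal minus signs so they survive the '-' -> ' ' step."""
    return tag.replace("-", "%2d")


def tag_to_search(tag_from_url):
    # Assumed order: first turn '-' into ' ', then URL-decode,
    # so that '%2d' ends up as a literal '-'.
    return unquote(tag_from_url.replace("-", " "))
```

For example, escape_minus("tag-with-minus") gives "tag%2dwith%2dminus", which tag_to_search() maps back to "tag-with-minus", while the unescaped form becomes "tag with minus".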
Mike Fährmann
5cdf1b1319
fix --verbose/--quiet
...
caused by 383795b
5 years ago
Mike Fährmann
78e8d33c97
release version 1.13.0
5 years ago
Mike Fährmann
e6cd49e78b
update extractor test results
5 years ago
Mike Fährmann
90e4c645ba
[formatter] allow multiple "special" format specifiers ( #595 )
...
It is now, for example, possible to specify multiple replacement
operations per format replacement field: {name:Ra/b/Rc/d/}
5 years ago
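To make the {name:Ra/b/Rc/d/} example from 90e4c645ba more concrete, here is a small sketch that applies a chain of replacement specifiers to a string. It assumes each specifier has the form R<old>/<new>/ and is applied from left to right; apply_replacements() is a hypothetical helper and not part of the formatter.

```python
def apply_replacements(value, spec):
    """Apply a chain of 'R<old>/<new>/' specifiers to 'value'."""
    while spec.startswith("R"):
        # Drop the leading 'R', then split off <old> and <new>;
        # whatever follows the second '/' is the next specifier.
        # A malformed specifier with fewer than two '/' raises ValueError.
        old, new, spec = spec[1:].split("/", 2)
        value = value.replace(old, new)
    return value
```

Under these assumptions, "Ra/b/Rc/d/" first replaces every "a" with "b" and then every "c" with "d".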
Mike Fährmann
5d9437b398
[vsco] skip "invalid" entities
5 years ago
Mike Fährmann
650f2b6d58
[furaffinity] accept sfw.furaffinity.net URLs ( closes #608 )
...
Just as an alias for regular URLs with no extra content filtering.
5 years ago
Mike Fährmann
219c4cc78c
[formatter] allow for numeric list and string indices
5 years ago
Mike Fährmann
7d1da614d9
[formatter] implement field name alternatives ( #525 )
...
The format string '{a|b|c}' will now try to use the value from 'a' and
fall back to 'b' and 'c' if accessing a field raises an exception or
if its value is None.
5 years ago
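The fallback behaviour described in 7d1da614d9 can be sketched like this. The function format_alternatives() and its 'default' argument are assumptions for illustration only; the actual formatter works on compiled format strings rather than a single field name.

```python
def format_alternatives(field, kwdict, default="None"):
    """Return the first usable value among 'a|b|c'-style alternatives."""
    for name in field.split("|"):
        try:
            value = kwdict[name]
        except Exception:
            # Accessing the field failed: try the next alternative.
            continue
        if value is not None:
            return str(value)
    # No alternative produced a value (assumed fallback).
    return default
```

With kwdict = {"a": None, "c": "x"}, the field 'a|b|c' skips 'a' because it is None, skips 'b' because the lookup raises, and yields "x".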
Mike Fährmann
74e684e828
[twitter] change default value for 'videos' to 'true'
...
Every other 'videos' option defaulted to 'true', except Twitter.
5 years ago
Mike Fährmann
c7cf9dd111
[furaffinity] support classic layout ( #284 )
5 years ago
Mike Fährmann
138135c190
[furaffinity] add extractors ( #284 )
5 years ago
Mike Fährmann
b9c574bd1d
[patreon] log skipped files ( #590 )
5 years ago
Mike Fährmann
80ea9104b8
[8kun] adjust URL pattern
5 years ago
Mike Fährmann
c76c8b765a
[cloudflare] unescape challenge URL
5 years ago
Mike Fährmann
ce26070231
[pixiv] reduce calls to '/user/detail'
5 years ago
Mike Fährmann
da0d5f6092
[oauth] add 'port' option ( #604 )
5 years ago
Mike Fährmann
719b63d0ca
[bcy] add user and post extractors ( #592 )
5 years ago
Mike Fährmann
6426e3efc7
[khinsider] fix and improve metadata extraction
5 years ago
Mike Fährmann
4a3d2405de
[postprocessor:ugoira] small optimization
...
Use tuples instead of lists when extending the list of
command-line arguments.
5 years ago
Mike Fährmann
b7eb6cecbb
[pixiv] handle tags at the end of new bookmark URLs
5 years ago
Mike Fährmann
109f6c8685
[patreon] filter duplicate files per post ( #590 )
5 years ago
Mike Fährmann
b38cf59711
[sexcom] fix image URLs & parse 'date' fields
5 years ago
Mike Fährmann
1f4c9c5f9d
[8kun] add thread and board extractors ( closes #582 )
5 years ago
Mike Fährmann
facc5daa6d
[twitter] force old login page layout ( fixes #584 , fixes #598 )
5 years ago
Mike Fährmann
d1de7dc296
[hitomi] implement workaround for "broken" redirects
...
Some galleries redirect to a new "version" with a different gallery id.
This new version might not be available any more, but the /reader/
page for the original gallery id can still work.
5 years ago
Mike Fährmann
40fe062851
[pixiv] fix user id for bookmarks API calls ( closes #596 )
5 years ago
Mike Fährmann
91aaaf1a9e
[pixiv] add 'rating' metadata field ( #595 )
...
A human-friendlier representation of 'x_restrict'
5 years ago
Mike Fährmann
dff33b260c
[reddit] add 'videos' option
5 years ago
Mike Fährmann
2ad43618cc
[piczel] fix extraction
5 years ago
Mike Fährmann
cf7a67d67f
[yaplog] remove module
...
Yaplog! ended its service on 2020-01-31
5 years ago
Mike Fährmann
e0dd073ce0
[twitter] replace embedded tweet test
...
the old one was deleted
5 years ago
Mike Fährmann
ec36df4851
[deviantart] fix video extraction from 'extended_fetch' results
...
DeviantArt is now serving videos from wixmp servers (1), instead of
the former film00.deviantart.com (2), even though those URLs are still
functional.
They seem to also have re-encoded those videos. The 10 MB 1080p video
from (2) is now only available in 720p at ~20 MB (with a higher
bitrate, but still …). Other videos are still available in 1080p, but
not this one for some reason.
(Changing the '720p' in (1) to '1080p' doesn't work.)
(1) https://wixmp-ed30a86b8c4ca887773594c2.wixmp.com/v/mp4/9feaa2c9-1baf-4fc2-84f7-f3384b34cefe/d5gxnb5-282a2e9a-b552-40ff-8542-b3c5eed823f5.720p.a837d7cec12c41be8ca2ee53152cea3a.mp4
(2) https://film00.deviantart.net/4c1d/v/mp4/2012/279/d/1/_video____brushes_i_use_in_paint_tool_sai_by_chi_u-d5gxnb5.mp4
5 years ago
Mike Fährmann
48be2266ed
[deviantart] better error message for 'extended_fetch' ( #585 )
5 years ago
Mike Fährmann
383795b550
prevent superfluous calls to Logger.makeRecord()
...
… by setting an appropriate minimal logging level for the root Logger.
5 years ago
Mike Fährmann
71851a6241
[pixiv] update URLs of followed users to the new format
5 years ago
Mike Fährmann
d086f30b42
[reddit] restore archive keys for i.redd.it images
5 years ago
Mike Fährmann
56f1c96168
implement 'parent-directory' option ( #551 )
5 years ago
Mike Fährmann
ae07f92f7e
[reddit] rewrite extractor logic ( closes #551 )
...
Handle images and videos hosted on Reddit "natively",
allowing them to use reddit-specific metadata to build directory
and file names.
5 years ago
Mike Fährmann
2852691d78
[paheal] replace test URL
...
searching for 'k-on' doesn't yield any results anymore
5 years ago
Mike Fährmann
2a9be48511
improve util.load/save_cookiestxt() and add tests
...
- take a file object as argument instead of a filename
- accept whitespace before comments (" # comment")
- map expiration "0" to None and not the number 0
5 years ago
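The three points of 2a9be48511 can be shown with a simplified loader for cookies.txt data. This is a sketch under the assumption of the usual seven tab-separated fields per line; it returns plain dicts and is not the implementation of util.load_cookiestxt().

```python
import io


def load_cookiestxt(fp):
    """Parse cookies.txt lines from a file object into a list of dicts."""
    cookies = []
    for line in fp:
        # Allow whitespace before comments (" # comment").
        line = line.lstrip()
        # '#HttpOnly_' marks a cookie line, not a comment.
        if line.startswith("#HttpOnly_"):
            line = line[len("#HttpOnly_"):]
        if not line or line.startswith("#"):
            continue
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 7:
            continue  # skip malformed lines
        domain, _flag, path, secure, expires, name, value = fields
        cookies.append({
            "domain": domain,
            "path": path,
            "secure": secure == "TRUE",
            # Map expiration "0" (and an empty field) to None, not 0.
            "expires": None if expires in ("", "0") else int(expires),
            "name": name,
            "value": value,
        })
    return cookies
```

Taking a file object means the same function works on real files and on in-memory data such as io.StringIO, which keeps tests simple.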
Mike Fährmann
e35c2ea1a6
[weibo] use youtube-dl to download from m3u8 manifests
5 years ago
Mike Fährmann
6703b8a86b
[blogger] implement video extraction ( closes #587 )
5 years ago
Mike Fährmann
c1a6862863
implement functions to load/save cookies.txt files ( closes #586 )
...
The methods of the standard library's MozillaCookieJar have
several shortcomings (#HttpOnly_ cookies, 0 expiration timestamps, etc.)
and require construction of an ultimately pointless CookieJar object.
5 years ago
Mike Fährmann
5d73b7f29c
release version 1.12.3
5 years ago
Mike Fährmann
37247dbaff
miscellaneous fixes
5 years ago
Mike Fährmann
0e9dc5c88e
fix AttributeError when accessing 'temppath'
...
[ci skip]
5 years ago
Mike Fährmann
25d5ec4ff3
[twitter] add option to extract TwitPic embeds ( #579 )
5 years ago