Mike Fährmann
5c32a7bf58
[deviantart] allow selecting source for 'extra' ( #1356 )
...
Setting 'extra' to "stash" or "deviations" will only download embedded
sta.sh content or deviations. 'true' still downloads both.
4 years ago
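A minimal sketch of how the three accepted values could be interpreted. The function name and the returned flag pair are hypothetical and are not taken from the extractor code.

```python
def select_extra_sources(extra):
    """Return (stash, deviations) flags for a given 'extra' setting.

    Hypothetical helper: only the accepted values ("stash",
    "deviations", true) come from the commit message.
    """
    if extra == "stash":
        return True, False
    if extra == "deviations":
        return False, True
    # Any other truthy value (e.g. true) enables both sources;
    # a falsy value disables both.
    enabled = bool(extra)
    return enabled, enabled
```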
Mike Fährmann
5ad2b9c82b
[deviantart] extend 'extra' option
...
also download from embedded DeviantArt posts
4 years ago
Mike Fährmann
23be48427c
[deviantart] fix 'folders' option ( closes #1302 )
...
don't assume parent folders are listed before their children
4 years ago
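One way to resolve folder paths without relying on listing order is to index all folders first and follow parent links afterwards. This is a sketch only: the field names 'folderid', 'parent', and 'name' are assumptions, and it assumes the parent links contain no cycles.

```python
def folder_paths(folders):
    """Map each folder id to its full path, regardless of listing order."""
    # First pass: index every folder so parents can be found later,
    # even if they appear after their children in the input.
    by_id = {f["folderid"]: f for f in folders}
    paths = {}

    def resolve(folder_id):
        if folder_id not in paths:
            folder = by_id[folder_id]
            parent = folder.get("parent")
            # Top-level folders (or unknown parents) start a new path.
            prefix = resolve(parent) if parent in by_id else []
            paths[folder_id] = prefix + [folder["name"]]
        return paths[folder_id]

    # Second pass: resolve every folder through its parent chain.
    for folder_id in by_id:
        resolve(folder_id)
    return paths


# Children deliberately listed before their parents.
folders = [
    {"folderid": "c", "parent": "b", "name": "Sketches"},
    {"folderid": "b", "parent": "a", "name": "2020"},
    {"folderid": "a", "parent": None, "name": "Gallery"},
]
```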
Mike Fährmann
c6cc86d7d0
[deviantart] update parameters for '/browse/popular'
...
- limit results to 50 when also querying metadata (fixes #1267 )
- remove deprecated 'category_path' parameter
4 years ago
Mike Fährmann
c26de0929d
[deviantart] provide 'extension' for original file downloads
...
(#1272 )
4 years ago
Mike Fährmann
193dca2ce1
update extractor test results
4 years ago
Mike Fährmann
e2d4ca4955
[deviantart] improve '--range' for favorites ( closes #1226 )
4 years ago
Mike Fährmann
91db8df1c7
[deviantart] add 'index_base36' metadata field ( closes #1099 )
...
This is the same ID as found in 'filename' without the 'd' in front,
which is just 'index' encoded in base36.
4 years ago
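The relationship between 'index', 'index_base36', and the ID in 'filename' can be sketched as follows. The helper name is hypothetical; the sample ID 'd5gxnb5' is taken from a filename that appears elsewhere in this log.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def base36_encode(number):
    """Encode a non-negative integer using lowercase base36 digits."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return "0"
    digits = []
    while number:
        number, remainder = divmod(number, 36)
        digits.append(DIGITS[remainder])
    return "".join(reversed(digits))


# Round trip: decode the base36 part of a filename ID, then re-encode it.
index = int("5gxnb5", 36)
index_base36 = base36_encode(index)
filename_id = "d" + index_base36  # ID as it appears in 'filename'
```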
Mike Fährmann
968d3e8465
remove '&' from URL patterns
...
'/?&#' -> '/?#' and '?&#' -> '?#'
According to https://www.ietf.org/rfc/rfc3986.txt , URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
4 years ago
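The effect of dropping '&' from such a character class can be illustrated with two made-up patterns. Both regular expressions and the example.org URLs are hypothetical and only mirror the shape of the change.

```python
import re

# Old style: '&' is accepted as if it could end a path component.
OLD = re.compile(r"https?://example\.org/user/(\w+)(?:$|[/?&#])")
# New style: only the RFC 3986 delimiters '/', '?' and '#' end it.
NEW = re.compile(r"https?://example\.org/user/(\w+)(?:$|[/?#])")

query_url = "https://example.org/user/alice?page=2"
odd_url = "https://example.org/user/alice&x"

old_accepts_odd = OLD.match(odd_url) is not None
new_accepts_odd = NEW.match(odd_url) is not None
new_accepts_query = NEW.match(query_url) is not None
```

With these patterns, a regular URL with a query string still matches, while a URL that uses '&' directly after the path no longer does.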
Mike Fährmann
3ebb174f2c
add missing extractor info when spawning new ones ( fixes #1051 )
...
Not having this information causes the blacklist/whitelist logic to
trigger and prevents things from functioning as intended when using
default settings.
Fixes issues for 8muses, deviantart, exhentai, and mangoxo.
4 years ago
Mike Fährmann
136df52d1f
[deviantart] support watchers-only/paid deviations ( #995 )
4 years ago
Mike Fährmann
f6fd449b59
reduce wait time growth rate from exponential to linear
...
Waiting for 2**N seconds after each error grows too fast.
Simply waiting N seconds seems far more reasonable.
4 years ago
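The difference between the two growth rates, as a small sketch. The function names are hypothetical, and counting errors from N = 1 is an assumption.

```python
def exponential_wait(attempt):
    """Previous behaviour: wait 2**N seconds after the N-th error."""
    return 2 ** attempt


def linear_wait(attempt):
    """New behaviour: wait N seconds after the N-th error."""
    return attempt


# (N, exponential seconds, linear seconds) for the first nine errors.
schedule = [(n, exponential_wait(n), linear_wait(n)) for n in range(1, 10)]
total_exponential = sum(row[1] for row in schedule)
total_linear = sum(row[2] for row in schedule)
```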
Mike Fährmann
c6c06c41f6
[deviantart] don't add journal text to description ( #712 )
4 years ago
Mike Fährmann
41d03160ff
[deviantart] also search journals for sta.sh links ( #712 )
...
when 'extra' is enabled
4 years ago
Mike Fährmann
dfcf2a2c91
write OAuth token to cache by default ( #616 )
4 years ago
Mike Fährmann
6294e2c540
add 'text.ensure_http_scheme()'
4 years ago
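The commit subject names the helper but not its body. The following is a sketch of what such a function could look like; the default scheme and the stripping of leading '/' and ':' characters are assumptions.

```python
def ensure_http_scheme(url, scheme="https://"):
    """Prepend 'scheme' to 'url' if it has no http(s) scheme yet."""
    if url and not url.startswith(("https://", "http://")):
        # Also handles protocol-relative URLs such as '//example.org/a'.
        return scheme + url.lstrip("/:")
    return url
```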
Mike Fährmann
65b1cb7acd
[deviantart] use private access tokens for Journals ( fixes #738 )
4 years ago
Mike Fährmann
999efec5cc
[deviantart] limit API wait times to 2**9=512 seconds ( #721 )
4 years ago
Mike Fährmann
6386ee54e1
[deviantart] add extractor info to 'following' results
4 years ago
Mike Fährmann
bae1e8ed12
[deviantart] fix JPEG quality replacement pattern
...
'q_\d+' would sometimes also replace something in the 'token' query
parameter, invalidating the URL.
5 years ago
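A sketch of the failure mode with a made-up URL. The URL shape, the token value, and the stricter ',q_\d+' pattern are assumptions used for illustration, not necessarily the pattern the fix uses.

```python
import re

# Hypothetical image URL: quality is part of a comma-separated
# parameter list in the path, and the token happens to contain 'q_12'.
url = ("https://example.org/f/abc/v1/fill/w_800,h_600,q_75,strp/"
       "img.jpg?token=eyJq_12xyz")

# Naive pattern: also rewrites the 'q_12' inside the token.
naive = re.sub(r"q_\d+", "q_100", url)

# Stricter pattern: requires the preceding comma, which only
# occurs in the parameter list of the path.
strict = re.sub(r",q_\d+", ",q_100", url)
```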
Mike Fährmann
d02f7c1118
improve Extractor.wait()
...
- allow 'until' to be a datetime object
- do "time calculations" with UTC timestamps
- set a default 'reason'
5 years ago
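The three points above can be sketched as a small helper. The function name, the default reason text, and the treatment of naive datetime objects as UTC are assumptions.

```python
import time
from datetime import datetime, timezone


def plan_wait(until, reason="rate limit reset", now=None):
    """Return (seconds_to_wait, reason) for a target point in time.

    'until' may be a UTC timestamp or a datetime object.
    """
    if now is None:
        now = time.time()  # current time as a UTC timestamp
    if isinstance(until, datetime):
        if until.tzinfo is None:
            # Assumption: naive datetime objects are meant as UTC.
            until = until.replace(tzinfo=timezone.utc)
        until = until.timestamp()
    # Never return a negative wait time.
    seconds = max(0.0, float(until) - now)
    return seconds, reason
```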
Mike Fährmann
f9a590f92b
[deviantart] apply HTTP request limits in more places
...
"Request blocked" can also happen on sta.sh and for *any* HTTP
request directed at deviantart.com
5 years ago
Mike Fährmann
ff7c0b7eff
[deviantart] handle "Request blocked" errors ( #655 )
...
- add a 2 second wait time between requests to deviantart.com
- catch 403 "Request blocked" errors and wait for 3 minutes until
retrying
5 years ago
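The two measures above can be sketched as a retry loop. 'fetch' is a hypothetical callable returning a (status, text) pair; the retry limit and the injectable 'sleep' are assumptions added to keep the sketch testable.

```python
import time


def request_with_backoff(fetch, url, delay=2.0, blocked_wait=180.0,
                         retries=3, sleep=time.sleep):
    """Fetch 'url', pausing before each request and after each block."""
    for _ in range(retries + 1):
        sleep(delay)  # fixed wait time before every request
        status, text = fetch(url)
        if status == 403 and "Request blocked" in text:
            sleep(blocked_wait)  # wait 3 minutes, then retry
            continue
        return status, text
    raise RuntimeError("request still blocked after all retries")
```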
Mike Fährmann
c874684f05
[deviantart] retrieve *all* download URLs through OAuth API
...
'/extended_fetch' as well as Deviation webpages now again contain
Deviation UUIDs needed to grab Deviation info through the OAuth API,
meaning cookies are no longer necessary to grab original files.
The only instance where cookies are still needed is scraps marked as
"mature", since those entries are hidden for public users.
(#655 , #657 , #660 )
5 years ago
Mike Fährmann
5c27b25a8f
[deviantart] improve sta.sh extraction
...
Extract all sta.sh items in a single extractor run.
Don't spawn a new StashExtractor for each individual sta.sh item to
preserve the current requests.Session and its opened TCP connections.
5 years ago
Mike Fährmann
e2fc4eaa6f
[deviantart] detect stash folders ( fixes #659 )
5 years ago
Mike Fährmann
6f911aeb1c
[deviantart] add error message for CloudFront blocks ( #655 )
5 years ago
Mike Fährmann
1b82d36ab2
[deviantart] handle decode errors for extended_fetch results ( #655 )
...
This isn't going to solve the underlying problem, but it should at
least provide the server response when those errors happen.
5 years ago
Mike Fährmann
913b8333cc
write DeviantArt refresh-tokens to cache ( #616 )
...
Writing the token is currently disabled by default and must be
enabled with 'extractor.oauth.cache'.
'extractor.deviantart.refresh-token' must be set to '"cache"'
to use the cached token.
5 years ago
Mike Fährmann
64bdec8430
[deviantart] check availability of intermediary URLs ( fixes #609 )
5 years ago
Mike Fährmann
ec36df4851
[deviantart] fix video extraction from 'extended_fetch' results
...
DeviantArt is now serving videos from wixmp servers (1), instead of
the former film00.deviantart.net (2), even though those URLs are still
functional.
They seem to also have re-encoded those videos. The 10 MB 1080p video
from (2) is now only available in 720p at ~20 MB (with a higher
bitrate, but still …). Other videos are still available in 1080p, but
not this one for some reason.
(Changing the '720p' in (1) to '1080p' doesn't work.)
(1) https://wixmp-ed30a86b8c4ca887773594c2.wixmp.com/v/mp4/9feaa2c9-1baf-4fc2-84f7-f3384b34cefe/d5gxnb5-282a2e9a-b552-40ff-8542-b3c5eed823f5.720p.a837d7cec12c41be8ca2ee53152cea3a.mp4
(2) https://film00.deviantart.net/4c1d/v/mp4/2012/279/d/1/_video____brushes_i_use_in_paint_tool_sai_by_chi_u-d5gxnb5.mp4
5 years ago
Mike Fährmann
48be2266ed
[deviantart] better error message for 'extended_fetch' ( #585 )
5 years ago
Mike Fährmann
f8e137d6b4
[deviantart] show warning about private deviations only once
...
… per call to '_pagination()'
5 years ago
Mike Fährmann
939fec8ecd
[deviantart] match new search/popular URLs ( closes #538 )
5 years ago
Mike Fährmann
09cc88b715
[deviantart] match '/favourites/all' URLs ( closes #555 )
5 years ago
Mike Fährmann
ce54b8c04c
let extractors opt-out of cookie option usage
...
useful to avoid sending unnecessary cookies when all authentication
is done through OAuth tokens
5 years ago
Mike Fährmann
b347bf68c7
[deviantart] add extractor for followed users ( #515 )
5 years ago
Mike Fährmann
ab17ea9632
[deviantart] only print warning if 'original' is enabled
5 years ago
Mike Fährmann
c8e99e3b3b
[deviantart] fix crash on missing "token" field ( #505 )
5 years ago
Mike Fährmann
6ed2c7823c
[deviantart] disable original downloads if no cookies set
...
For 'deviation' and 'scraps' extractors only, since original file
downloads for those two will always fail with a 404 Not Found
when not logged in.
5 years ago
Mike Fährmann
50deab5265
[deviantart] fix URL generation from /extended_fetch results
...
(closes #505 )
5 years ago
Mike Fährmann
359c3bc1c5
[deviantart] revert to getting download URLs from OAuth API
...
This commit (partially) reverts 27b5b24, 94eb7c6, and a437e78.
Download URLs from the 'extended_fetch' endpoint are now only
usable for logged in users, while those from the respective
OAuth API endpoint are working again. Everything except
scraps and direct deviation links should be fixed, and those
two categories will work with exported cookies. (#488 )
TODO:
- "native" login with --username and --password
- better handling of internally stored cookies
5 years ago
Mike Fährmann
d45fabb79d
match user profile handling on deviantart and newgrounds
5 years ago
Mike Fährmann
ea80dadd09
[deviantart] restore archive keys
...
Commit 9fdc5e7 changed 'username' fields to have consistent
capitalization, but that invalidated the archive keys of several
extractors where 'username' was usually lowercase.
5 years ago
Mike Fährmann
9fdc5e74cb
[deviantart] ensure consistent username capitalization ( #455 )
...
The 'username' field was capitalized in a very inconsistent manner:
Either all lowercase, or as given by the input URL, or with the
"original" capitalization, depending on the extractor used among
other things.
Now usernames use their original capitalization for all extractors.
('UserName' instead of 'username' or 'uSeRnAmE')
5 years ago
Mike Fährmann
dd5d2b2eac
[deviantart] add user profile extractor ( #377 , #419 )
5 years ago
Mike Fährmann
a437e78620
[deviantart] minimize cookie usage during scraps extraction
...
(#445 )
5 years ago
Mike Fährmann
9e3a8607ee
[deviantart] update usernames ( #455 )
...
In the case that a user changed their username, requesting deviations
with an old name might cause problems (missing deviations, etc.)
The internal 'username' value therefore now gets updated to the
current username taken from the user profile.
5 years ago
Mike Fährmann
c3042978b8
[deviantart] match "/gallery/all" ( closes #449 )
5 years ago
Mike Fährmann
df2b3c6888
restore OAuth2 authentication error messages
5 years ago
Mike Fährmann
94eb7c6cad
[deviantart] fix sta.sh extraction ( #436 )
5 years ago
Mike Fährmann
27b5b2497e
[deviantart] fix download URLs ( #436 )
...
... except for sta.sh content.
Instead of using the old '/api/v1/oauth2/deviation/download' endpoint,
which started delivering URLs to 404 pages a while ago,
it is also possible to get a download URL from the relatively new
'/_napi/da-browse/shared_api/deviation/extended_fetch' endpoint
used by DeviantArt's Eclipse interface.
The current strategy is therefore:
- Iterate over deviations using the OAuth2 API
- Fetch original download URLs with the new NAPI/Shared API
5 years ago
Mike Fährmann
38d97f3da6
[deviantart] add debug message about API credentials ( #424 )
5 years ago
Mike Fährmann
80c2104fb5
[deviantart] fix 429 handling if 'fatal' is False ( closes #424 )
5 years ago
Mike Fährmann
01bc7adadc
[deviantart] improve journal detection ( #419 )
...
Some journal-like posts are not reported to be journals (isJournal
is set to False), even though they have a textContent field.
https://www.deviantart.com/gliitchlord/art/brashstrokes-812942668
5 years ago
Mike Fährmann
6e12907de6
[deviantart] improve handling of private deviations ( #414 )
...
- don't try to call '/deviation/metadata' with an empty list of
deviation ids
- print a warning when detecting private deviations without having
a 'refresh-token'
5 years ago
Mike Fährmann
dedea3b4db
[deviantart] fix journal creation ( #400 )
5 years ago
Mike Fährmann
efb64ad031
[deviantart] generate filenames ( #392 , #400 )
5 years ago
Mike Fährmann
49f6d7176d
[deviantart] restore filenames ( #392 )
...
<title>_by_<user>_<id> --> <title>_by_<user>-<id>
5 years ago
Mike Fährmann
63daa68d67
[deviantart] improvements ( #392 )
...
- consistent 'filename' entries, at least as far as possible
- GIFs and SWFs don't have a <title>_by_<artist>_<id> anywhere in
their metadata
- Generating <id> (from 'deviationid'?) might be something that needs
to be figured out, so we can build those filenames ourselves
- better code structure etc.
- tests for videos, archives, and flash animations
5 years ago
Mike Fährmann
30d6e284b0
[deviantart] use NAPI for artworks and scraps ( #392 )
...
TODO:
- journal downloads
- test for all media types
5 years ago
Mike Fährmann
423f68f585
[deviantart] fix scraps extraction ( closes #376 )
5 years ago
Mike Fährmann
f4bc75e854
fix rate limit handling for OAuth APIs ( #368 )
5 years ago
Mike Fährmann
3957d27d79
[deviantart] add 'quality' option ( #369 )
5 years ago
Mike Fährmann
5d968412ca
[deviantart] case-insensitive folder name matching ( fixes #343 )
5 years ago
Mike Fährmann
fdec59f8e2
replace extractor.request() 'expect' argument
...
with
- 'fatal': allow 4xx status codes
- 'notfound': raise NotFoundError on 404
5 years ago
Mike Fährmann
7856e5e7dc
[deviantart] "fix" scraps extraction
5 years ago
Mike Fährmann
76ae9957c2
[deviantart] force legacy version for single deviations
...
Let's see how long this works ...
DeviantArt is rolling out a new version of their website, including a
new internal and potentially usable API (rewrite incoming, yay).
The issue with the new layout is that it doesn't include the "old"
UUIDs for single deviations, i.e. mapping a numeric deviation ID to its
UUID counterpart is impossible with the new layout.
5 years ago
Mike Fährmann
258e8b2060
[deviantart] small code improvements
5 years ago
Mike Fährmann
f5961ac968
[deviantart] download deviations with no 'content' field
...
Some deviations (possibly only from sta.sh sources) are downloadable
(i.e. 'is_downloadable' is true and /deviation/download/ works), but
have no 'content' or similar in their JSON representation.
(fixes #307 )
5 years ago
Mike Fährmann
e05a96db5e
[deviantart] rename 'stash' to 'extra' ( #302 )
...
'stash' is already used as a name for the StashExtractor and therefore
expected to be a dictionary.
5 years ago
Mike Fährmann
c23bf263fe
[deviantart] rename 'external' to 'stash' ( #302 )
...
restrict extracted URLs to ones from https://sta.sh/ ...
5 years ago
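Restricting extracted URLs to sta.sh could look roughly like this. The regular expression, including the assumed shape of sta.sh IDs, and the function name are hypothetical.

```python
import re

# Assumption: sta.sh links consist of lowercase letters and digits.
STASH_URL = re.compile(r"https?://sta\.sh/[0-9a-z]+")


def stash_links(description):
    """Return unique sta.sh URLs found in a description, in order."""
    # dict.fromkeys() drops duplicates while preserving order.
    return list(dict.fromkeys(STASH_URL.findall(description)))


description = (
    '<a href="https://sta.sh/0abc123">download</a> '
    "see also https://example.org/foo and https://sta.sh/0abc123"
)
```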
Mike Fährmann
2fb85178da
[deviantart] add 'external' option ( #302 )
...
If a description is available, this will extract URLs from the
description text and try to find Extractors for them.
5 years ago
Mike Fährmann
f85e42cffc
[deviantart] fix --range for deviation & stash extractor
5 years ago
Mike Fährmann
f1893b2b5b
[deviantart] add 'folders' option ( #276 )
5 years ago
Mike Fährmann
f837ea98cb
[deviantart] don't call 'extend()' on folders ( fixes #271 )
5 years ago
Mike Fährmann
51e0e92429
[deviantart] fix GIF downloads ( #242 )
...
The "original" download URL for GIF animations is only a preview version
of the original file.
5 years ago
Mike Fährmann
9544683d56
[deviantart] provide 'date' metadata ( #232 )
5 years ago
Mike Fährmann
d6ddb74cde
update test results
...
- deviantart: 'index' is now an integer
- flickr: image file with lower quality
- paheal: image server name changed
- rule34: post got deleted
6 years ago
Mike Fährmann
9587aea98f
[deviantart] don't rewrite URLs for newer deviations
...
The '/intermediary/' trick stopped working for recently posted
deviations, but it still appears to be functional for older ones.
6 years ago
Mike Fährmann
5ec55ec4fc
[deviantart] improve URLs for non-downloadable deviations
6 years ago
Mike Fährmann
c7a6b0ed90
[deviantart] add 'metadata' option ( #189 )
6 years ago
Mike Fährmann
a2af2d2965
adjust cache maxage values
6 years ago
Mike Fährmann
13e0f2a78f
[deviantart] add 'scraps' extractor ( closes #168 )
6 years ago
Mike Fährmann
c7b8421333
[deviantart] don't match 'www' as a potential username
6 years ago
Mike Fährmann
61741d7333
provide type information for Queue messages
...
Child extractors are now directly constructed with Extractor.from_url()
if the extractor class is known beforehand, instead of using
extractor.find() and searching through all possible extractor classes.
6 years ago
Mike Fährmann
2e516a1e3e
store the full original URL in Extractor.url
6 years ago
Mike Fährmann
4b1880fa5e
propagate 'match' to base extractor constructor
6 years ago
Mike Fährmann
6284731107
simplify extractor constants
...
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
6 years ago
Mike Fährmann
fa7fa2f8ff
[deviantart] update tests
6 years ago
Mike Fährmann
6c71e9cf5d
[deviantart] add separate 'sta.sh' extractor ( #113 )
...
- supports multiple stashed deviations per page
- explicitly mentions sta.sh support on supportedsites.rst
6 years ago
Mike Fährmann
7471933d5f
use extractor.request for all other API calls
...
- deviantart
- pawoo
- pixiv
- reddit
6 years ago
Mike Fährmann
7e2d6bcd62
[deviantart] fix original image downloads
6 years ago
Mike Fährmann
d1f3d32eec
[fallenangels] unescape chapter titles
6 years ago
Mike Fährmann
2221cf97ff
implement 'update()' for caches
6 years ago
Mike Fährmann
d8492df51b
[deviantart] extend functionality of 'original' option
6 years ago
Mike Fährmann
1532d1b690
fix 'range' tests and update a few test results
6 years ago
Mike Fährmann
e066f35118
update extractor tests
6 years ago
Mike Fährmann
0232d80cec
[deviantart] convert 'published_time' to int ( fixes #108 )
...
The 'published_time' field (a timestamp) changed from integer to string
and caused journal creation to fail.
6 years ago
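A sketch of the conversion, assuming the deviation is available as a dict. The function name and the conversion to a datetime object are hypothetical additions for illustration.

```python
from datetime import datetime, timezone


def published_datetime(deviation):
    """Return the publication time as a UTC datetime object."""
    # int() accepts both the old integer and the new string form.
    timestamp = int(deviation["published_time"])
    return datetime.fromtimestamp(timestamp, timezone.utc)
```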
Mike Fährmann
a493fed376
[deviantart] fix journal creation if no 'username' is set
6 years ago