Mike Fährmann
1b9bf4fc6e
[behance] fix 'tags' extraction
5 years ago
Mike Fährmann
bb97e87989
[komikcast] ignore banner image
5 years ago
Mike Fährmann
0ff90a3f7d
[gfycat] include title in default filenames ( closes #434 )
5 years ago
Mike Fährmann
de4e2029d1
[nsfwalbum] update test album
...
the old one is no longer available
5 years ago
Mike Fährmann
1faec285d1
[nijie] further improvements ( closes #423 )
...
- provide a 'user_name' metadata field
- usually the same as 'artist_id', except for favorite downloads
- extract the whole description text and properly escape HTML entities
- fixed an issue with titles or tags containing double quotes
5 years ago
Mike Fährmann
6d0a533d68
[reddit] respect 'comments:0' for single submissions ( #429 )
5 years ago
Mike Fährmann
803d8f814e
[oauth] update scope for reddit tokens ( #428 )
...
'/user/<username>/...' requires the 'history' scope to be accessible
(https://www.reddit.com/dev/api/#GET_user_{username}_{where})
5 years ago
Mike Fährmann
46ba173ded
[reddit] fix documentation inconsistencies ( closes #429 )
...
- Require 'reddit.comments' to be a number and convert it to an
integer to be extra sure
- Link to the README's OAuth section where appropriate
5 years ago
Mike Fährmann
20eb6c401f
[nijie] improvements and fixes ( #423 )
...
- ignore unavailable image pages
- more metadata fields: artist_name, date, tags
- rename 'index' to 'num'
- improved code structure
5 years ago
Mike Fährmann
d1ea08c67d
[weibo] fixes and improvements
...
- ignore unavailable videos (fixes #427 )
- handle empty 'geo' fields
- consistent metadata fields for images and videos
5 years ago
Mike Fährmann
38d97f3da6
[deviantart] add debug message about API credentials ( #424 )
5 years ago
Mike Fährmann
80c2104fb5
[deviantart] fix 429 handling if 'fatal' is False ( closes #424 )
5 years ago
Mike Fährmann
913460240d
[reddit] fix 'extractor.blacklist()' arguments
...
The second argument must support 'append()'.
5 years ago
Mike Fährmann
22bac14452
[pixiv] match '/artworks/' URLs
5 years ago
Mike Fährmann
66cac207ac
[twitter] match and use 'i/web' status URLs
5 years ago
Mike Fährmann
946f2751e2
[reddit] add 'user' extractor ( closes #350 )
5 years ago
Mike Fährmann
c14abb9fb8
[reddit] improve URL parameter handling for subreddit links
5 years ago
Mike Fährmann
ee8b654464
[instagram] implement 'highlights' option ( closes #329 )
5 years ago
Mike Fährmann
f63c3097a9
[instagram] rework some code paths
...
- combine fetching an HTML page and extracting its 'shared_data'
- move 'shared_data' and field access info out of '_extract_page()'
- introduce a '_request_graphql()' method
5 years ago
Mike Fährmann
4330133114
[imgur] add 'favorite' extractor ( closes #420 )
...
… and use a newer site-internal API endpoint for user posts
5 years ago
Mike Fährmann
ee5e20221f
[imgth] fix image URLs
5 years ago
Mike Fährmann
b63b126808
[hentaicafe] extend URL pattern
5 years ago
Mike Fährmann
d780f0357e
[imgur] add user extractor
5 years ago
Mike Fährmann
11ea689013
[simplyhentai] fix image and video URLs
5 years ago
Mike Fährmann
15632a1570
[tsumino] fix extraction
5 years ago
Mike Fährmann
d92802fd37
[luscious] fix detection of unavailable galleries
5 years ago
Mike Fährmann
f99da2b866
[imgbb] detect invalid album and user profile links
...
and update test results, since the old album got deleted
5 years ago
Mike Fährmann
01bc7adadc
[deviantart] improve journal detection ( #419 )
...
Some journal-like posts are not reported to be journals (isJournal
is set to False), even though they have a textContent field.
https://www.deviantart.com/gliitchlord/art/brashstrokes-812942668
5 years ago
Mike Fährmann
6e12907de6
[deviantart] improve handling of private deviations ( #414 )
...
- don't try to call '/deviation/metadata' with an empty list of
deviation ids
- print a warning when detecting private deviations without having
a 'refresh-token'
5 years ago
Mike Fährmann
e7690ac694
[vsco] update URL pattern ( closes #410 )
5 years ago
Mike Fährmann
1848788970
update test results etc
5 years ago
Mike Fährmann
d5fbb2d9de
[tumblr] ignore audio links from Spotify etc.
5 years ago
Mike Fährmann
b1cddce865
Revert "[simplyhentai] fix extraction; remove image+video extractors"
...
This reverts commit d1db5180ab.
5 years ago
Mike Fährmann
d23660c04d
[hentaicafe] restore default 'request()' behavior
5 years ago
Mike Fährmann
9ae58a6b3e
[exhentai] update image limit checks
...
- adjust cost of original images
- delay limit initialization until gallery and first image page have
been requested and all cookies are available
5 years ago
Mike Fährmann
6fe9a134bf
[lineblog] add blog and post extractors ( closes #404 )
5 years ago
Mike Fährmann
4e8a548a61
[livedoor] update metadata extraction
5 years ago
Mike Fährmann
f9285f99e6
[pixiv] fix authentication
5 years ago
Mike Fährmann
6f3df3999a
[fuskator] add gallery and search extractor ( closes #407 )
5 years ago
Mike Fährmann
bc0ca66c99
[twitter] small improvements
...
- handle reply tweets (#403 )
- unset cookies in Tweet extractor to "force" the legacy interface
5 years ago
Mike Fährmann
f02a768b5c
[danbooru] add 'ugoira' option ( #406 )
...
to choose between ZIP archives or converted video files
for Ugoira posts
5 years ago
Mike Fährmann
dedea3b4db
[deviantart] fix journal creation ( #400 )
5 years ago
Mike Fährmann
c6c5cb1898
improve 'deviantart.quality' description
5 years ago
Mike Fährmann
efb64ad031
[deviantart] generate filenames ( #392 , #400 )
5 years ago
Mike Fährmann
b2151f3928
[seiga] support mobile URLs ( closes #401 )
5 years ago
Mike Fährmann
20fd2d8450
[flickr] skip unavailable images/videos ( fixes #398 )
5 years ago
Mike Fährmann
5cc7be2536
[piczel] update and improve
...
- use proper pagination (fixes #396 )
- update API host and endpoints
- "fix" double slash // in image URLs
5 years ago
Mike Fährmann
49f6d7176d
[deviantart] restore filenames ( #392 )
...
<title>_by_<user>_<id> --> <title>_by_<user>-<id>
5 years ago
Mike Fährmann
63daa68d67
[deviantart] improvements ( #392 )
...
- consistent 'filename' entries, at least as far as possible
- GIFs and SWFs don't have a <title>_by_<artist>_<id> anywhere in
their metadata
- Generating <id> (from 'deviationid'?) might be something that needs
to be figured out, so we can build those filenames ourselves
- better code structure etc.
- tests for videos, archives, and flash animations
5 years ago
Mike Fährmann
d1db5180ab
[simplyhentai] fix extraction; remove image+video extractors
5 years ago
Mike Fährmann
30d6e284b0
[deviantart] use NAPI for artworks and scraps ( #392 )
...
TODO:
- journal downloads
- test for all media types
5 years ago
Mike Fährmann
7d6af936c5
[imgur] simplify gallery extraction
5 years ago
Mike Fährmann
51d10783fc
[patreon] include image info in API results ( #383 )
5 years ago
Mike Fährmann
7a5e78741c
[booru] build directory path for each file ( #385 )
5 years ago
Mike Fährmann
b1728f512d
[patreon] support multi image posts and post URLs ( #383 )
5 years ago
Mike Fährmann
c50d60a53d
[reactor] fix image URLs
5 years ago
Mike Fährmann
32447d0d24
[pixiv] simplify default filename format
...
(#366 )
5 years ago
Mike Fährmann
829b1ccf04
[imgur] distinguish album and gallery URLs ( #380 )
...
A gallery can be either an album or a single image.
5 years ago
Mike Fährmann
23251356cb
require 'extension' data for each URL ( #382 )
5 years ago
Mike Fährmann
a67413d64f
[xhamster] use input URL domain
...
Don't rewrite all URLs as 'https://xhamster.com/...'
5 years ago
Mike Fährmann
423f68f585
[deviantart] fix scraps extraction ( closes #376 )
5 years ago
Mike Fährmann
3bf20ffb70
[instagram] add support for story highlights
5 years ago
Mike Fährmann
a732e9c430
[instagram] update query hashes and headers
5 years ago
Mike Fährmann
2ccf6a9e35
[instagram] make extractor tests happy ( #373 )
5 years ago
Leonardo Taccari
bc5eaf7746
[instagram] Add support for IGTV ( #373 )
...
Add support for IGTV profile (instagram.com/<username>/channel/)
and IGTV media (instagram.com/tv/<short_id>).
5 years ago
Mike Fährmann
eb7da159e2
[imagebam] update URL test results
...
Image URLs are now using https://, but the website itself is still
served as http://.
5 years ago
Mike Fährmann
189acbeac9
[imgbb] add extractor for individual images ( closes #363 )
5 years ago
Mike Fährmann
ad3ac02fbc
[pixiv] update metadata entries ( #366 )
...
- change 'num' to a simple enumerating integer
- change default filename format
- provide content of the old 'num' field as 'suffix'
- add 'filename' for ugoira
5 years ago
Mike Fährmann
1ff4c4ec03
[adultempire] consistent artist order
5 years ago
Leonardo Taccari
2df050e627
[instagram] Add support for stories ( #371 )
...
* [instagram] Add support for stories
Add support for Instagram user's stories
(https://www.instagram.com/stories/<username>/).
First the shared_data in instagram.com/stories/<username> is fetched in
order to retrieve the user_id that is then passed to fetch the stories
via the corresponding graphql query.
Please note that fetching stories is supported only when authentication
is enabled and the corresponding <username> is followed.
* [instagram] Add an only-matching test for stories
* [instagram] Simplify InstagramExtractor.items() and _extract_stories()
Simplify handling of typename in InstagramExtractor.items() and multi-line
string in _extract_stories(). NFCI.
5 years ago
Mike Fährmann
f4bc75e854
fix rate limit handling for OAuth APIs ( #368 )
5 years ago
Mike Fährmann
3957d27d79
[deviantart] add 'quality' option ( #369 )
5 years ago
Mike Fährmann
64b2935d8e
[pixiv] provide 'filename' and change default filename format
...
to '{filename}.{extension}' (closes #366 )
5 years ago
Mike Fährmann
fa60109e97
[exhentai] don't use e-hentai.org for exhentai URLs
5 years ago
Mike Fährmann
4a0c98bfc9
miscellaneous fixes and adjustments
5 years ago
Mike Fährmann
2c839f3760
[imgbb] add user extractor + login support ( #361 )
5 years ago
Mike Fährmann
2153206093
[imgbb] add album extractor ( #361 )
5 years ago
Mike Fährmann
beb4fab2e6
[exhentai] improve limit and error handling ( #360 )
...
- check image limit before opening the first gallery or image page
- prevent any further exhentai extractors from running after the image
limit has been reached
5 years ago
Mike Fährmann
81b35ed3cb
[exhentai] catch more error states ( #356 , #360 )
...
- warn on MPV-enabled galleries
- catch parsing errors for gallery pages and image info
- write page content to debug output
5 years ago
Mike Fährmann
6ce22f606b
[exhentai] update login procedure and tests
...
Logging in now more closely follows the natural login flow that happens
in a browser and collects more cookies than just ipb_member_id and
ipb_pass_hash.
Test URLs have been updated and now point to the e-hentai.org domain.
5 years ago
Mike Fährmann
dc73d02d87
[exhentai] always use e-hentai.org as domain + set nw cookie
5 years ago
Mike Fährmann
40637556fa
[ngomik] fix extraction
5 years ago
Mike Fährmann
3969f9cbbd
[behance] fix collection extraction
5 years ago
Mike Fährmann
17a3426845
[gelbooru] enable all content when not using API
5 years ago
Mike Fährmann
279db2c5b2
[vsco] add collection & image extractor + video support ( #331 )
5 years ago
Mike Fährmann
d9d44ad953
[tsumino] update test results
5 years ago
Mike Fährmann
60cf40380a
[vsco] add user extractor ( #331 )
5 years ago
Mike Fährmann
3fe5ccdfa6
[adultempire] add gallery extractor ( closes #340 )
5 years ago
Mike Fährmann
5d968412ca
[deviantart] case-insensitive folder name matching ( fixes #343 )
5 years ago
Mike Fährmann
a3c736fedc
[500px] fix extraction
...
Maximum available image dimensions have been reduced from 5000px to
4096px on the longest edge.
A few (unimportant) metadata fields are no longer available or have
been changed to 'null'.
5 years ago
Mike Fährmann
1133b7fcbd
[smugmug] update unit tests
...
The account used for tests before has been deleted.
5 years ago
Mike Fährmann
21991acc49
add 'ciphers' option; update default User-Agent
5 years ago
Mike Fährmann
84f4d3bc0b
replace urllib3's default cipher list with Firefox's ( #342 )
...
Avoids Cloudflare CAPTCHAs on both Linux and Windows without
pyOpenSSL installed.
5 years ago
Mike Fährmann
feb98cf196
[twitter] improve 'content' formatting; add option ( #338 )
...
- include emoticons
- leave newlines intact
- remove pic.twitter.com/ links at the end
5 years ago
Mike Fährmann
8d1ae9b715
[tumblr] enable date-min/-max/-format options ( #337 )
5 years ago
Mike Fährmann
09f37fde39
[reddit] move date-min/-max handling into Extractor class
5 years ago
Mike Fährmann
0151e250f5
[twitter] extract 'content' metadata ( closes #333 )
5 years ago
Mike Fährmann
56c7a66a4a
detect Cloudflare CAPTCHAs and update cipher list
5 years ago
Mike Fährmann
a7b42b37a2
[35photo] fix extraction
5 years ago
Mike Fährmann
04b8d0894a
[newgrounds] improve metadata extraction
5 years ago
Mike Fährmann
12da6bd0c9
[simplyhentai] fix/improve extraction
5 years ago
Mike Fährmann
fdec59f8e2
replace extractor.request() 'expect' argument
...
with
- 'fatal': allow 4xx status codes
- 'notfound': raise NotFoundError on 404
5 years ago
Mike Fährmann
2ff73873f0
[erolord] add gallery extractor ( closes #326 )
5 years ago
Mike Fährmann
b4da8c5a97
[sexcom] add extractor for related pins ( #325 )
5 years ago
Mike Fährmann
69997e92db
[sexcom] skip unavailable pins ( #325 )
5 years ago
Mike Fährmann
bc6b0cfddc
[shopify] skip consecutive duplicate products
...
Not filtering duplicate URLs anymore caused the archive ID uniqueness
test to fail.
5 years ago
Mike Fährmann
b89f0d8d3c
update extractor result tests
5 years ago
Mike Fährmann
69205df68d
allow '-1' for infinite retries ( #300 )
5 years ago
Mike Fährmann
f7b5c4c3e7
use values of 'retries' options correctly
...
The RE-tries option now specifies exactly that: the maximum number of
times a failed HTTP request is re-tried. For example, a value of 2 will
now correctly stop after 3 attempts: the initial one + 2 re-tries.
The maximum wait-time now also caps at 30min and increases exponentially
for both extractor.request() and downloader.http.download().
5 years ago
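A minimal sketch of the retry semantics described in f7b5c4c3e7, not the project's actual code. The function name call_with_retries, the base_wait parameter, the doubling schedule, and the use of OSError as the failure type are all assumptions made for illustration; the commit only states that the wait time increases exponentially and caps at 30 minutes. Treating a negative value as "retry forever" follows the neighboring commit 69205df68d.

```python
import time

MAX_WAIT = 30 * 60  # seconds; the 30-minute cap mentioned in the commit


def call_with_retries(func, retries=2, base_wait=1.0, sleep=time.sleep):
    """Call func() until it succeeds or the re-try budget is used up.

    retries is the number of re-tries AFTER the initial attempt, so
    retries=2 allows at most 3 attempts. A negative value means no limit.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return func()
        except OSError:  # stand-in for "the HTTP request failed"
            if retries >= 0 and attempts > retries:
                raise
            # hypothetical schedule: double the wait each time, capped
            sleep(min(base_wait * 2 ** (attempts - 1), MAX_WAIT))
```

Passing a custom sleep callable keeps the sketch testable without actually waiting.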
Mike Fährmann
40da44b17f
Merge branch 'v1.9.0'
5 years ago
Mike Fährmann
7a99e85943
[kissmanga] fix download URLs and file extensions
...
The current Blogspot image URLs hosted on Kissmanga end with an
"invalid" query parameter (/000.png&upx=...), which isn't recognized
as such by 'spliturl()' and 'parseurl()' and is therefore included in
the 'extension' field from 'text.nameext_from_url()'.
5 years ago
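The problem described in 7a99e85943 can be reproduced with the standard library alone. This is only an illustration: naive_extension and cleaned_extension are hypothetical helpers, the example.org URL and the 'upx' value are made up, and neither function is the project's 'text.nameext_from_url()'. Because the URL has no '?', everything after '&' stays part of the path and ends up in the extension.

```python
from urllib.parse import urlsplit


def naive_extension(url):
    """Return whatever follows the last '.' in the last path segment."""
    filename = urlsplit(url).path.rpartition("/")[2]
    return filename.rpartition(".")[2]


def cleaned_extension(url):
    """Same as above, but cut the path at the first '&' beforehand."""
    path = urlsplit(url).path.partition("&")[0]
    filename = path.rpartition("/")[2]
    return filename.rpartition(".")[2]


url = "https://example.org/images/000.png&upx=abc"  # made-up URL
print(naive_extension(url))    # png&upx=abc
print(cleaned_extension(url))  # png
```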
Mike Fährmann
055102431f
[hitomi] handle Game CG galleries with scenes ( fixes #321 )
5 years ago
Mike Fährmann
a9c89085fb
[instagram] implement login support ( #195 )
5 years ago
Mike Fährmann
7856e5e7dc
[deviantart] "fix" scraps extraction
5 years ago
Mike Fährmann
082cb24acd
[pururin] fix extraction
...
Missing metadata information would lead to unnecessary exceptions.
5 years ago
Mike Fährmann
98554cbab8
[mangoxo] fix login
5 years ago
Mike Fährmann
108963d138
[imagefap] include Referer headers
5 years ago
Mike Fährmann
e314621366
[nsfwalbum] fix default directory_fmt ( #287 )
5 years ago
Mike Fährmann
18a1f8c6cd
[vanillarock] add post and tag extractors ( closes #254 )
5 years ago
Mike Fährmann
f0c5093812
[nsfwalbum] add album extractor ( closes #287 )
5 years ago
Mike Fährmann
61e413d85d
[hentaifoundry] stop disabling IPv6 addresses
...
The rogue address mentioned in a138d58 is no longer included in the DNS
results for www.hentai-foundry.com.
5 years ago
Mike Fährmann
76ae9957c2
[deviantart] force legacy version for single deviations
...
Let's see how long this works ...
DeviantArt is rolling out a new version of their website, including a
new internal and potentially usable API (rewrite incoming, yay).
The issue with the new layout is that it doesn't include the "old"
UUIDs for single deviations, i.e. mapping a numeric deviation ID to its
UUID counterpart is impossible with the new layout.
5 years ago
Mike Fährmann
520c8ba106
[hentaicafe] extract 'tags' and 'artist' metadata ( closes #238 )
...
These metadata fields will only be filled in when using a top-level
URL, because that's the only place this information is available. Using
a Foolslide URL (1) will leave these fields empty.
(1) https://hentai.cafe/manga/read/.../en/0/1/
5 years ago
Mike Fährmann
b51baa9a4b
[hitomi] fix empty language detection; parse datetime
5 years ago
Mike Fährmann
258e8b2060
[deviantart] small code improvements
5 years ago
Mike Fährmann
a77340c647
[keenspot] fix extraction for "TwoKinds"
5 years ago
Mike Fährmann
03e6876fbe
[instagram] provide 'description' metadata ( #310 )
5 years ago
Mike Fährmann
ec3e8601f1
[slickpic] add user extractor ( #249 )
5 years ago
Mike Fährmann
97ef416218
[8muses] support multi-page listings ( #305 )
5 years ago
Mike Fährmann
f5961ac968
[deviantart] download deviations with no 'content' field
...
Some deviations (possibly only from sta.sh sources) are downloadable
(i.e. 'is_downloadable' is true and /deviation/download/ works), but
have no 'content' or similar in their JSON representation.
(fixes #307 )
5 years ago
Mike Fährmann
4e07f99e3e
[mangoxo] change token message level to debug
...
The login page currently neither provides nor requires a login token
(logging in works without a token), so printing a warning during
each login is unnecessary.
5 years ago
Mike Fährmann
d997c10320
[8muses] add album extractor ( #305 )
5 years ago
Mike Fährmann
e05a96db5e
[deviantart] rename 'stash' to 'extra' ( #302 )
...
'stash' is already used as a name for the StashExtractor and therefore
expected to be a dictionary.
5 years ago
Mike Fährmann
2184e3a86b
[slickpic] add album extractor ( #249 )
5 years ago
Mike Fährmann
c23bf263fe
[deviantart] rename 'external' to 'stash' ( #302 )
...
restrict extracted URLs to ones from https://sta.sh/...
5 years ago
Mike Fährmann
c73c2cda50
[pornhub] add gallery & user extractor ( #282 )
5 years ago
Mike Fährmann
7c6cb908f9
[xhamster] update test results
5 years ago
Mike Fährmann
2fb85178da
[deviantart] add 'external' option ( #302 )
...
If a description is available, this will extract URLs from the
description text and try to find Extractors for them.
5 years ago
Mike Fährmann
f85e42cffc
[deviantart] fix --range for deviation & stash extractor
5 years ago
Mike Fährmann
40c7eb3424
[livedoor] improve extraction ( fixes #301 )
5 years ago
Mike Fährmann
62335b9015
[paheal] adjust test results
5 years ago
Mike Fährmann
aa1ca4ed35
[shopify] skip deleted products ( #175 )
...
Product pages which return a 4xx status code will now be skipped instead
of raising an exception.
5 years ago
Mike Fährmann
096009367b
[xhamster] add gallery & user extractor ( #281 )
5 years ago
Mike Fährmann
208202b962
[tumblr] improve error handling ( #297 )
...
In some cases Tumblr's API responds with an HTML document.
Trying to decode it as JSON would raise an uncaught exception.
5 years ago
Mike Fährmann
c08c340178
[directlink] make pattern case insensitive ( fixes #296 )
5 years ago
Mike Fährmann
95b4a53b9c
[keenspot] improve pagination ( #223 )
...
The old code would skip the last comic page for some series.
5 years ago
Mike Fährmann
731c7cbd5b
[keenspot] support all comics and "random" access ( #223 )
5 years ago
Mike Fährmann
6a34f4b0c1
skip tests on read timeouts; print list of skipped tests
5 years ago
Mike Fährmann
1c36e65e9b
[exhentai] choose site version depending on input URL ( #278 )
...
Use e-hentai.org as root and cookiedomain if the input URL is from
e-hentai (or g.e-hentai), use exhentai.org otherwise.
5 years ago
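The domain selection described in 1c36e65e9b could look roughly like the following sketch. The function name choose_root and the leading-dot cookie domain format are assumptions for illustration, not taken from the extractor's code; only the mapping of e-hentai and g.e-hentai input URLs to e-hentai.org, and everything else to exhentai.org, comes from the commit message.

```python
from urllib.parse import urlsplit


def choose_root(url):
    """Return (root URL, cookie domain) depending on the input URL's host."""
    host = urlsplit(url).hostname or ""
    if host in ("e-hentai.org", "g.e-hentai.org"):
        domain = "e-hentai.org"
    else:
        domain = "exhentai.org"
    # the ".domain" cookie-domain form is an assumption of this sketch
    return "https://" + domain, "." + domain
```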
Mike Fährmann
6da3e21237
[downloader:ytdl] provide 'filename' metadata ( closes #291 )
5 years ago