Mike Fährmann
f4e3cee6ac
use yt-dlp by default ( #1850 , #2028 )
3 years ago
Mike Fährmann
f1b142e993
[kemonoparty] change default 'files' order to attachments,file,inline
...
(#1991 )
3 years ago
Mike Fährmann
275543b2d2
update extractor test results
3 years ago
Mike Fährmann
e7ea4f2567
[mangoxo] fix metadata extraction
3 years ago
Mike Fährmann
e298882acc
[kemonoparty] match URLs with www subdomain
3 years ago
Mike Fährmann
addb72e1bb
[reactor] support thatpervert.com ( closes #2029 )
3 years ago
Mike Fährmann
d8d9502e1e
[reactor] inherit from BaseExtractor
3 years ago
Mike Fährmann
f4ea216c95
[shopify] support loungeunderwear.com ( closes #2053 )
3 years ago
Mike Fährmann
93cef78450
[gelbooru] workaround pagination limits
...
Gelbooru only allows retrieving the latest 20k posts for a tag search.
Add 'id:<N' to the search tags to work around that limitation, where N
is the ID of the last retrieved post.
http://gelbooru.me/index.php?page=forum&s=view&id=1467
3 years ago
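The workaround described in the commit above can be sketched roughly as follows. This is only an illustration, not the extractor's actual code: `fetch` is a hypothetical callback that returns one page of posts (newest first, each a dict with an `id`), and `per_page` and `max_offset` are assumed parameter names.

```python
def iter_posts(fetch, tags, per_page=100, max_offset=20000):
    """Yield all posts for 'tags', restarting the search with 'id:<N'
    whenever the offset limit is reached (hypothetical helper)."""
    search = tags
    offset = 0
    while True:
        posts = fetch(search, offset, per_page)
        if not posts:
            return
        yield from posts
        offset += len(posts)
        if offset >= max_offset:
            # Restart from the top, but only below the last seen post ID.
            last_id = posts[-1]["id"]
            search = f"{tags} id:<{last_id}"
            offset = 0
```

Restarting the search with an ID bound keeps every request within the allowed offset range, at the cost of assuming that results are ordered by descending ID.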
Mike Fährmann
f2ae179713
[exhentai] fix extraction for disowned galleries ( closes #2055 )
3 years ago
Alice
612850438e
[skeb] add 'thumbnails' option ( #2047 ) ( #2051 )
3 years ago
Mike Fährmann
11a3d96d13
[mangadex] load additional metadata using includes[] directives
...
- always provide 'artist', 'author', and 'group' metadata fields (#2049 )
- remove 'metadata' option
3 years ago
Mike Fährmann
19e00f1322
[dynastyscans] provide 'date' as proper datetime object ( #2050 )
3 years ago
Mike Fährmann
af6424f398
allow testing metadata in list elements
3 years ago
Mike Fährmann
c67756e187
[kemonoparty] add 'dms' option ( #2008 )
3 years ago
Mike Fährmann
3a7a19c7b9
[dynastyscans] add 'manga' extractor ( closes #2035 )
3 years ago
Mike Fährmann
9bc83af3a6
[kemonoparty] 'postfile' -> 'file' ( #1991 )
...
to stay consistent with the existing file types for kemono
3 years ago
Mike Fährmann
522782c09d
[subscribestar] emit metadata for posts without media ( #1569 )
3 years ago
Mike Fährmann
1c8aaf9318
[subscribestar] add 'num' enumeration index ( closes #2040 )
3 years ago
Mike Fährmann
d433735750
[kemonoparty] skip duplicate files ( #2032 , #1991 , #1899 )
...
Extract the SHA-256 file hash from URLs
and skip files with the same hash in the same post.
- provide a 'hash' metadata field (empty string if not available)
- remove 'patreon-skip-file' option
3 years ago
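A minimal sketch of the hash-based duplicate check described above, assuming the SHA-256 hash appears in the file URL as a run of 64 lowercase hex digits. The regular expression and the `unique_files` helper are illustrative names, not the extractor's real implementation.

```python
import re

# Assumption: the hash is a run of 64 lowercase hex digits in the URL.
SHA256_RE = re.compile(r"[0-9a-f]{64}")


def unique_files(urls):
    """Yield one entry per distinct file hash within a single post."""
    seen = set()
    for url in urls:
        match = SHA256_RE.search(url)
        file_hash = match.group(0) if match else ""
        if file_hash:
            if file_hash in seen:
                continue  # same content already emitted for this post
            seen.add(file_hash)
        # 'hash' is an empty string when no hash could be extracted
        yield {"url": url, "hash": file_hash}
```

Files without a recognizable hash are never skipped in this sketch, since there is nothing to compare them by.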
Mike Fährmann
d4ec245554
[kemonoparty] implement a 'files' option ( #1991 )
...
similar to 8d676151
3 years ago
Mike Fährmann
ab8eea1a24
[twitter] fix extractor for direct image links ( fixes #2030 )
3 years ago
Mike Fährmann
2076d40681
[ytdl] improve error handling ( #1680 )
3 years ago
Mike Fährmann
2aaac3c997
[instagram] include user metadata for 'tagged' downloads ( #2024 )
...
Adds
- tagged_owner_id
- tagged_full_name
- tagged_username
containing the values for the user profile the URL originated from,
e.g. 'instagram' for https://www.instagram.com/instagram/tagged/ .
3 years ago
Mike Fährmann
cfa4876848
[philomena] support furbooru.org ( closes #1995 )
3 years ago
Mike Fährmann
4377f1c284
[twitter] distinguish between fatal & nonfatal errors ( #2020 )
...
only show a warning for nonfatal errors
and do not raise a StopExtraction exception
3 years ago
Kyle Anthony Williams
a14b72be21
[webtoons] Use swebtoon-phinf.pstatic.net instead of webtoon-phinf.pstatic.net ( #2005 )
...
* [webtoons] Use swebtoon-phinf.pstatic.net instead of webtoon-phinf.pstatic.net
This trick to avoid having to set a Referer header comes from
Webtoon's RSS feeds. The two URLs below are equivalent in content:
https://webtoon-phinf.pstatic.net/20210929_153/1632867980912DmcGK_JPEG/16328679808882705182.jpg?type=q90
https://swebtoon-phinf.pstatic.net/20210929_153/1632867980912DmcGK_JPEG/16328679808882705182.jpg?type=q90
The URL with the domain "webtoon-phinf.pstatic.net" needs a Referer
header, and the domain "swebtoon-phinf.pstatic.net" does not. This
is because of the environment "swebtoon" images live in, one without
explicit network control: RSS feeds on sites such as Feedly. This change should
make it easier for gallery-dl developers to embed Webtoon comics without
worrying about headers.
3 years ago
Mike Fährmann
6e3658ef52
[kemonoparty] provide 'date' metadata for gumroad ( #2007 )
...
Not the 'published' or 'edited' values since they are 'null',
but still better than nothing at all.
3 years ago
Mike Fährmann
37c9dedee1
[seisoparty] remove module
3 years ago
Mike Fährmann
efa178cc91
[ytdl] implement parsing ytdl command-line options ( #1680 )
...
- adds 'config-file' and 'cmdline-args' options
for both ytdl downloader and extractor
- create 'ytdl' helper module, which combines YoutubeDL creation
and option parsing.
- most likely a buggy mess due to incompatibilities between the
original youtube-dl and yt-dlp.
3 years ago
Mike Fährmann
7cb303d745
[redgifs] improve URL extraction
...
Fields inside 'urls' can be None, which would have caused an exception
with the old method.
3 years ago
Mike Fährmann
2befed1a96
[redgifs] update search URL pattern ( #1984 )
3 years ago
Mike Fährmann
b315a0ecef
[redgifs] update to API v2 ( #1984 )
3 years ago
Mike Fährmann
f0fc3b0ba1
[kemonoparty] add 'comments' option ( #1980 )
3 years ago
Mike Fährmann
1fac74b14d
[reddit] prevent crash for galleries with no 'media_metadata'
...
(fixes #2001 )
3 years ago
Mike Fährmann
211de95dd0
update extractor test results
3 years ago
Mike Fährmann
8bea02c38c
[deviantart] fix 'index' values for stashed deviations
3 years ago
Mike Fährmann
dd88a7d980
[cyberdrop] restore video extraction ( fixes #1993 )
...
fixes a regression introduced in f33c2ef7
3 years ago
Mike Fährmann
fa5646eadc
[mangoxo] fix login and extraction
3 years ago
Mike Fährmann
4c49174579
[mangakakalot] update domain and fix extraction
3 years ago
YongChan Cho
14852f7050
[hitomi] fix image path ( #1988 )
3 years ago
Mike Fährmann
dad2875a3e
fix calculating retry sleep times ( fixes #1990 )
3 years ago
Mike Fährmann
9156e90f1f
[twitter] add 'pinned' option
3 years ago
Mike Fährmann
06b414c9a3
[redgifs] 'gfyId' -> 'id' ( #1984 )
3 years ago
Ryu juheon
d4614e5ba4
[hitomi] fix image URLs ( #1982 )
3 years ago
Mike Fährmann
6434ccf9e8
[redgifs] split from 'gfycat' ( #1984 )
...
Update API endpoints and metadata names - mostly 'gfycat' -> 'gif' -
and remove some obsolete checks.
3 years ago
Mike Fährmann
e4696b40ba
[instagram] update query hashes
3 years ago
Alice
bfd7401b1e
[skeb] add 'user' and 'post' extractors ( #1031 ) ( #1971 )
...
* Create skeb.py
* Update __init__.py
* Update supportedsites.py
* Update supportedsites.md
* Update supportedsites.py
* Update skeb.py
3 years ago
Ryu juheon
6b6d92d51c
[hitomi]: fix image URLs ( #1975 )
3 years ago
Mike Fährmann
dcb201ff19
[gfycat] show warning when there are no available formats
3 years ago
Mike Fährmann
e436a2607b
[gfycat] consistent 'userName' values for 'user' downloads ( #1962 )
...
by using the name from the input URL and not relying on possibly faulty
or incomplete API results.
'userData[username]', if available, will still have the original name.
3 years ago
Mike Fährmann
f1487a3cfa
[kemonoparty:discord] improve 'inline' extraction ( #1940 )
...
- extract media.discordapp.*NET* URLs
- rewrite media.discordapp.net to cdn.discordapp.com
- use a more restricted set of characters for the URL path
3 years ago
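The three points in the commit above could look roughly like this. The pattern below is an assumption made for illustration (in particular the exact set of allowed path characters); it is not copied from the extractor.

```python
import re

# Assumption: the path is restricted to word characters plus './%-'.
INLINE_RE = re.compile(
    r"https?://(?:cdn\.discordapp\.com|media\.discordapp\.net)"
    r"(/[\w./%-]+)",
    re.ASCII,
)


def inline_urls(content):
    """Find inline Discord file URLs and normalize them to the CDN host."""
    return [
        "https://cdn.discordapp.com" + match.group(1)
        for match in INLINE_RE.finditer(content)
    ]
```

Restricting the path characters means query strings and trailing punctuation are cut off instead of becoming part of the URL.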
Mike Fährmann
02a247f4e5
[deviantart] full resolution for non-downloadable images ( #293 )
...
Many thanks to @Ironchest337 for discovering this method
and providing a well-documented implementation.
3 years ago
Mike Fährmann
a7ddb5f5fa
[deviantart] update 'search' argument handling ( fixes #1911 )
...
- use 'alltime' by default
- support newer 'order' values (most-recent, this-week, etc)
3 years ago
Mike Fährmann
c19e762fdf
[vk] add 'album' extractor ( #474 , fixes #1952 )
...
todo: better metadata for albums
3 years ago
Mike Fährmann
8bb442f20d
[redgifs][gfycat] provide fallback URLs ( fixes #1962 )
...
and extend the 'format' option
3 years ago
Mike Fährmann
b6443c576d
[kemonoparty:discord] extract 'inline' files
3 years ago
Mike Fährmann
bcbf9bcf36
[kemonoparty] split 'discord' extractor ( #1940 )
...
in 'server' and 'channel'
3 years ago
Mike Fährmann
db857b40d8
[kemonoparty] improve inline extraction ( #1899 )
3 years ago
Mike Fährmann
975e0a4fe0
[furaffinity] unquote search queries ( #1958 )
...
instead of unescape
(unquote -> url params, unescape -> html entities)
3 years ago
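The difference between the two functions named in the commit above can be seen with the Python standard library. The sample strings are made up for illustration.

```python
from html import unescape
from urllib.parse import unquote

# unquote() decodes percent-encoded URL parameters ...
decoded_query = unquote("fluffy%20cat%26dog")

# ... while unescape() decodes HTML entities and leaves
# percent-encoding untouched.
decoded_html = unescape("fluffy cat &amp; dog")
still_encoded = unescape("fluffy%20cat")
```

A search query taken from a URL is percent-encoded, so `unquote` is the matching decoder for it.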
Mike Fährmann
8d676151b7
[patreon] implement 'files' option ( #1935 )
3 years ago
Mike Fährmann
6695ef2e10
[patreon] better filenames for 'content' images ( #1954 )
3 years ago
Mike Fährmann
70005e3275
[kemonoparty:discord] support downloading from a specific channel
...
https://kemono.party/discord/server/<server-id>#<channel-name>
3 years ago
Mike Fährmann
003f25931d
[kemonoparty:discord] provide a 'channel_name'
3 years ago
Mike Fährmann
28bdd58e6d
[nhentai] simplify
3 years ago
Mike Fährmann
50098762e3
[nhentai] add 'tag' extractor ( closes #1950 )
3 years ago
Mike Fährmann
fe6ce5495a
[kemonoparty] add 'discord' extractor ( #1827 , #1940 )
3 years ago
Mike Fährmann
918fc9974d
[picarto] add 'gallery' extractor ( closes #1931 )
3 years ago
Mike Fährmann
e33125ad39
[pixiv] add 'sketch' extractor ( #1497 )
3 years ago
Mike Fährmann
e9dc6ff262
[inkbunny] add 'following' extractor ( #515 )
3 years ago
Mike Fährmann
9c8fc6e7b4
[inkbunny] match "long" URLs for pools and favorites ( #1937 )
3 years ago
Mike Fährmann
f33c2ef73b
[cyberdrop] extract direct download URLs ( #1943 )
...
do not rely on redirects from f.cyberdrop.cc
3 years ago
Mike Fährmann
b93915c113
[inkbunny] add 'pool' extractor ( #1937 )
3 years ago
Mike Fährmann
373d3e1c57
[seisoparty] implement login with username & password ( #1906 )
3 years ago
Mike Fährmann
7c5f62b453
[seisoparty] add 'favorite' extractor ( #1906 )
3 years ago
Mike Fährmann
d93b5474c3
[mangadex] update parameter handling for API requests
...
- move common parameters into '_pagination()'
- add 'ratings' (#1908 ) and 'api-parameters' options
3 years ago
Mike Fährmann
cd66c3c415
[twitter] add 'size' option ( #1881 )
3 years ago
Mike Fährmann
fb98b3fdaf
[redgifs][gfycat] remove webtoken code ( fixes #1907 )
3 years ago
Mike Fährmann
96215c926e
[mangadex] fix retrieving chapters from 'pornographic' titles
...
(fixes #1908 )
3 years ago
Mike Fährmann
da9685609c
[kemonoparty] update file download URLs
...
(closes #1902 , fixes #1903 )
3 years ago
Mike Fährmann
783eae6fc5
[hiperdex] fix extraction
3 years ago
Mike Fährmann
e0bdacd932
[fappic] add 'image' extractor ( closes #1898 )
3 years ago
Mike Fährmann
9377543162
[mastodon] add 'following' extractor ( #1891 )
3 years ago
Mike Fährmann
2c2932973c
[mastodon] support specifying accounts by ID
...
Same as a3b473bd for Twitter.
Instead of just
https://instance.tld/@user
it is now also possible to refer to that account with
https://instance.tld/users/user
https://instance.tld/@id:12345
https://instance.tld/users/id:12345
3 years ago
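The four URL forms listed above could be told apart with a single pattern. This is a sketch with a hypothetical regular expression and helper name, not the pattern used by the extractor.

```python
import re

# Assumption: accounts appear as '/@<x>' or '/users/<x>',
# where <x> is either a name or 'id:<number>'.
ACCOUNT_RE = re.compile(
    r"https?://([^/?#]+)/(?:@|users/)(?:id:(\d+)|([^/?#]+))"
)


def parse_account(url):
    """Return (instance, kind, value), where kind is 'id' or 'name'."""
    match = ACCOUNT_RE.match(url)
    if not match:
        return None
    instance, account_id, name = match.groups()
    if account_id:
        return instance, "id", account_id
    return instance, "name", name
```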
Mike Fährmann
94143eb86c
[twitter] add 'quote_by' metadata field ( #1481 )
...
Only present for tweets quoted by another tweet.
Represents the tweet_id of said tweet quoting this one.
3 years ago
Mike Fährmann
a23f5d45d7
[deviantart] fix bug with fetching premium content ( #1879 )
...
When a user has both 'watchers' and 'paid' folders and one of them is
inaccessible, the other one could get handled as inaccessible as well.
3 years ago
Mike Fährmann
ada36c2044
[deviantart] update default archive_fmt for single deviations
...
(#1874 )
use the same as gallery downloads
3 years ago
Mike Fährmann
da16eabb82
[twitter] ensure card entries have a 'url' ( #1868 )
3 years ago
Mike Fährmann
e69ee41f25
implement 'page-reverse' option ( #1854 )
3 years ago
cyberdrop-me
c83668c2ff
[CyberDrop] Change directory name format ( #1871 )
...
Album IDs are random, so directories are easier to organize with the album name first and the identifier at the end.
3 years ago
Mike Fährmann
e4684c5cb9
[desktopography] simplify ( #1740 )
3 years ago
Giacomo Rossetto
4a7d7899ff
Implement desktopography extractor ( #1740 )
3 years ago
Alice
9992ff38da
[fantia] add 'date' metadata field ( #1853 )
3 years ago
Mike Fährmann
fba95c3a9e
[nozomi] preserve case of search tags ( fixes #1860 )
3 years ago
Mike Fährmann
4b3e309b90
[aryion] update/improve pagination ( #1849 )
...
Manually increment the 'p' query parameter,
instead of relying on a "Next" link which only works up to page 200.
3 years ago
Mike Fährmann
266ed9b62e
[aryion] add 'tag' extractor ( closes #1849 )
3 years ago
Mike Fährmann
6bbeaac029
[mangadex] fix extraction ( fixes #1852 )
3 years ago
Mike Fährmann
e9bf8d2591
[instagram] update default delay to 6-12 seconds ( #1835 )
3 years ago
Mike Fährmann
c9e6693530
allow specifying a minimum/maximum for 'sleep-*' options ( #1835 )
...
for example '"sleep-request": [5.0, 10.0]' to wait between 5 and 10
seconds between each HTTP request
3 years ago
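One way to interpret such an option value is sketched below. The helper name and the use of a uniform distribution are assumptions; the commit message only states that the wait lies between the two values.

```python
import random


def resolve_sleep(value):
    """Turn a 'sleep-*' option value into a number of seconds.

    A single number is used as-is; a [minimum, maximum] pair
    selects a random duration in that range (assumed uniform).
    """
    if isinstance(value, (list, tuple)):
        low, high = value
        return random.uniform(low, high)
    return float(value)
```

With `"sleep-request": [5.0, 10.0]`, each call to `resolve_sleep([5.0, 10.0])` would give a fresh delay between 5 and 10 seconds.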
Mike Fährmann
2ff2974353
[common] update default argument handling in Extractor.request()
...
more lines of code, but slightly less execution time
3 years ago
Mike Fährmann
0fd959a2a7
[twitter] support '/with_replies' URLs ( closes #1833 )
3 years ago
Mike Fährmann
e93360e45d
[reddit] extend subcategory depending on input URL ( closes #1836 )
...
- https://www.reddit.com/r/lavaporn/
-> 'subreddit'
- https://www.reddit.com/r/lavaporn/new/
-> 'subreddit-new'
- https://www.reddit.com/user/username/
-> 'user'
- https://www.reddit.com/user/username/gilded/
-> 'user-gilded'
3 years ago
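The mapping listed above can be reproduced with a small helper. The regular expression and the function name are hypothetical and only cover the URL shapes shown in the commit message.

```python
import re

# Assumption: only '/r/<name>[/<view>]' and '/user/<name>[/<view>]' URLs.
REDDIT_RE = re.compile(
    r"https?://(?:www\.)?reddit\.com/(r|user)/([^/?#]+)(?:/([a-z]+))?"
)


def subcategory(url):
    """Derive a subcategory such as 'subreddit-new' from the input URL."""
    match = REDDIT_RE.match(url)
    if not match:
        return None
    kind, _name, view = match.groups()
    base = "subreddit" if kind == "r" else "user"
    return f"{base}-{view}" if view else base
```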
Mike Fährmann
7bbb1f92d7
[gelbooru_v02] add 'favorite' extractor ( closes #1834 )
3 years ago
Mike Fährmann
4ec11af6a4
[kemonoparty] implement login with username & password ( #1824 )
3 years ago
Mike Fährmann
0e33746fe0
[artstation] use '/album/all' view for user portfolios ( #1826 )
3 years ago
Mike Fährmann
4f5f9ed1e5
[oauth] fix typo
...
this has been here since February ...
(8974f036)
3 years ago
Mike Fährmann
83bbb628d8
[kemonoparty] add 'favorite' extractor ( #1824 )
3 years ago
Mike Fährmann
35d75a4071
[erome] send Referer header for file downloads ( fixes #1829 )
3 years ago
Mike Fährmann
44f572c27f
[deviantart] implement an 'auto-unwatch' option ( #1466 , #1757 )
3 years ago
Mike Fährmann
d79bcb6236
allow extractors to register a 'finalize()' method
3 years ago
Mike Fährmann
47a780942c
update extractor test results
3 years ago
Mike Fährmann
eed6ef3de0
[pixiv] fix pixivision title extraction
3 years ago
Mike Fährmann
7645cdfb88
[inkbunny] fix extraction ( closes #1816 )
...
'digitalsales', 'forsale', and 'printsales'
are no longer included in the data returned from the API.
3 years ago
Mike Fährmann
3e36543c98
[nhentai] add 'favorite' extractor ( #1814 )
3 years ago
Mike Fährmann
656358ea92
[nhentai] use API endpoint for gallery data
3 years ago
Mike Fährmann
8cd7759682
[reddit] cleanup RedditAPI.__init__ ( #1813 )
...
- remove warning about 'client-id'/'user-agent' mismatch
- only use 'user-agent' from config for custom 'client-id'
3 years ago
Mike Fährmann
0a94fe5774
[reddit] delay RedditAPI initialization ( #1813 )
...
Move it outside the constructor so that any exceptions can get
caught in the expected places.
3 years ago
Mike Fährmann
57854624a1
[exhentai] improve image limits check ( #1808 )
...
Check for a 'text/html' Content-Type instead of the very specific
137 bytes Content-Length, which might change depending on compression
or other factors.
3 years ago
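The check described above amounts to looking at the response type rather than its size. A minimal sketch, assuming the response headers are available as a plain dict and using a hypothetical function name:

```python
def hit_image_limit(headers):
    """Return True if an image request was answered with an HTML page.

    Checking the Content-Type is more robust than comparing the
    Content-Length against a fixed byte count, which can vary with
    compression or other factors.
    """
    content_type = headers.get("Content-Type", "")
    return content_type.startswith("text/html")
```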
Mike Fährmann
96fec14ef7
[deviantart] rename 'watch' option to 'auto-watch'
...
(#1466 , #1757 )
Similar reason as in e05a96db.
'watch' is already used by the WatchExtractor class.
3 years ago
Mike Fährmann
e75f2de9da
[deviantart] add 'comments' option ( #1800 )
3 years ago
Mike Fährmann
6ce16c6d31
[deviantart] add 'tag' extractor ( closes #1803 )
3 years ago
Mike Fährmann
4e9f8fe395
[shopify] support windsorstore.com ( #1793 )
3 years ago
Mike Fährmann
95157e0f4b
[shopify] use API for product listings ( #1793 )
3 years ago
Mike Fährmann
6651da27e9
[twitter] fix 'url' extraction for users without 'expanded_url'
...
(#1532 , #1787 )
3 years ago
Mike Fährmann
ecc8da4704
[deviantart] implement a 'watch' option ( #1466 , #1757 )
3 years ago
Mike Fährmann
a4f249c22e
[deviantart] prevent exception on empty videos ( fixes #1796 )
3 years ago
Mike Fährmann
ae78d95a5f
[twitter] fix issue when filtering quote tweets ( #1792 )
...
When a user quotes their own Tweet and that Tweet gets filtered by
'"quoted": false', it could also get filtered when it appeared later
as a regular Tweet.
3 years ago
Mike Fährmann
6b229ac829
[furaffinity] expand URL pattern for searches ( closes #1780 )
3 years ago
Mike Fährmann
0817f468ef
[twitter] expand t.co links in user descriptions ( #1532 , #1787 )
3 years ago
Mike Fährmann
7c0ae88185
[twitter] add 'url' to user objects ( #1532 , #1787 )
3 years ago
Mike Fährmann
5919dc5b5a
[twitter] slightly improve '_transform_user()'
3 years ago
Mike Fährmann
c04f7ab139
[foolfuuka] add 'gallery' extractor ( #1785 )
3 years ago
Mike Fährmann
ddd175de77
[mangadex] prevent KeyError for manga without English title
3 years ago
Mike Fährmann
20ee091289
[420chan] add 'thread' and 'board' extractors ( closes #1773 )
3 years ago
Mike Fährmann
6b56b3ebe1
[twitter] report API errors as generic StopExtraction exceptions
...
prevents duplicate logging messages for nonexistent users
(#1759 )
3 years ago
Mike Fährmann
51eb50749f
[foolslide] remove entry for kobato.hologfx.com
3 years ago
Mike Fährmann
4718f9c5dd
[oauth] use defaults when config values are set to None/null
...
(fixes #1778 )
3 years ago
James C. Wise
1f02878351
[Deviantart] [ #1776 ] Remove the "you need session cookies to download mature scraps" warning ( #1777 )
3 years ago
Mike Fährmann
bb6a130942
automatically set required DDoS-GUARD cookies ( #1779 )
...
for kemono.party and seiso.party
3 years ago
Mike Fährmann
bd08ee2859
remove most 'yield Message.Version' statements
...
only leave them in oauth.py as noop results
3 years ago
Mike Fährmann
c866fcba48
[twitter] fix 'logout' ( #1719 )
...
delete 'auth_token' cookie and cookies.txt path
3 years ago
Mike Fährmann
9cb5ea5eda
update default User-Agent headers
3 years ago
Mike Fährmann
52984f7e22
[twitter] add option to log out when blocked ( #1719 )
3 years ago
Mike Fährmann
ed4b3c48cb
fix flake8 and other tests
3 years ago
enormous-muscles
975e1ac6e2
Add Wikieat extractor ( #1699 )
...
* Add Wikieat extractor
* Add Wikieat extractor to extractor list
3 years ago
Nyasume
fa6af46756
Added ability to download GIFs instead of mp4 from Luscious and Reactor ( #1701 )
3 years ago
Ryu JuHeon
9429eaa0a3
[hitomi]: fix image URLs ( #1765 )
3 years ago
Mike Fährmann
c34dbc86bb
[kemonoparty] update file server domain ( #1764 )
3 years ago
Mike Fährmann
e5a93e113f
[twitter] extend 'replies' option ( #1254 )
...
Allow setting 'replies' to '"self"' to only download self-replies.
3 years ago
Mike Fährmann
f9096584ab
[behance] fix 'collection' extraction
3 years ago