Mike Fährmann
0f1e7ff319
[twitter] fix extraction ( #2275 )
3 years ago
Mike Fährmann
dee0d22561
update extractor test results
3 years ago
Mike Fährmann
d7b8e04b50
[kemonoparty] use 'Accept-Encoding: identity' for all downloads
...
(#2267 )
fixes issues when data sent with 'Content-Encoding: gzip' or other
encodings is larger than the actual file
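A minimal sketch of the header this commit refers to, using only the standard library. The function name build_download_request is hypothetical and this is not gallery-dl's own downloader code; it only shows how a client can ask for an unencoded response body.

```python
import urllib.request


def build_download_request(url):
    """Build a request that asks the server not to compress the body."""
    return urllib.request.Request(
        url,
        # 'identity' means: send the file bytes as-is, without gzip etc.
        headers={"Accept-Encoding": "identity"},
    )
```

With this header the reported size should correspond to the file itself rather than to an encoded representation of it.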
3 years ago
enormous-muscles
55326377d8
Add Kohlchan extractor ( #2251 )
3 years ago
Mike Fährmann
cc7dce5755
[sexcom] add 'pins' extractor ( closes #2265 )
3 years ago
Mike Fährmann
02e18f56be
[e621] add 'favorite' extractor ( closes #2250 )
3 years ago
Mike Fährmann
70e6e1549e
[twitter] provide fallback URLs for card images
...
f2e8aedd74 (commitcomment-64057751)
3 years ago
Mike Fährmann
86fa412b47
[hitomi] add 'format' option ( #2260 )
...
default is 'webp' since downloading original files is no longer allowed
3 years ago
Mike Fährmann
492436f936
[twitter] add 'warnings' option ( #2258 )
...
disable reporting any non-fatal errors by default
3 years ago
Mike Fährmann
a5163e4c70
[twitter] restore 'logout' functionality ( #1719 )
3 years ago
Mike Fährmann
f58364f6a8
update Firefox cipher list
3 years ago
Mike Fährmann
7e6981dda6
rename 'disabletls12' to 'tls12'
...
and let config options override any default settings
3 years ago
Mike Fährmann
bb3e182562
overhaul session initialization
...
- share adapter & connection pool across sessions with the same
ssl options, ssl ciphers, and source address
- simplify browser emulation to just a list of headers and ciphers
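The sharing idea in the first bullet can be sketched roughly as follows. FakeAdapter and get_adapter are hypothetical names used for illustration only; the real code presumably works with actual HTTP adapters, which are not shown in this log.

```python
# Hypothetical sketch: reuse one adapter object per unique combination of
# SSL options, cipher string, and source address.
_adapter_cache = {}


class FakeAdapter:
    """Stand-in for an HTTP adapter that owns a connection pool."""

    def __init__(self, ssl_options, ssl_ciphers, source_address):
        self.ssl_options = ssl_options
        self.ssl_ciphers = ssl_ciphers
        self.source_address = source_address


def get_adapter(ssl_options=0, ssl_ciphers=None, source_address=None):
    """Return the cached adapter for these settings, creating it once."""
    key = (ssl_options, ssl_ciphers, source_address)
    try:
        return _adapter_cache[key]
    except KeyError:
        adapter = _adapter_cache[key] = FakeAdapter(*key)
        return adapter
```

Sessions that request identical settings get the same object back, so they would also share its connection pool.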
3 years ago
Mike Fährmann
e670dc518e
[weibo] update pagination code ( fixes #2244 )
...
- send proper headers and query parameters
- use 'since_id' instead of page numbers
- set a 1-2 second delay between requests
3 years ago
Robert Pendell
4c651f6252
[patreon] Disable TLS 1.2 by default ( #2249 )
...
Disables TLS 1.2 on Patreon by default.
3 years ago
Robert Pendell
392cf079f7
Add ability to disable TLS 1.2 ( #2243 )
...
Fix for Patreon Cloudflare issues by allowing only TLS v1.3 or higher to establish HTTPS connections.
TLS 1.2 can now be disabled on a per-host or global basis: add 'disabletls12' as a config option either under extractor.(host) or directly under extractor. The option is false by default.
Example:
"patreon":
{
    "disabletls12": true,
    "cookies": {
        "session_id": "X"
    }
}
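For readers wondering what "only TLS v1.3 or higher" can look like in Python, here is one way to express it with the standard ssl module. This is a sketch under the assumption of a reasonably recent OpenSSL build; make_context is a hypothetical helper and not necessarily how this commit implements the option.

```python
import ssl


def make_context(disable_tls12=False):
    """Create a client-side SSL context; optionally require TLS 1.3+."""
    context = ssl.create_default_context()
    if disable_tls12:
        # Refuse any handshake that would negotiate less than TLS 1.3.
        context.minimum_version = ssl.TLSVersion.TLSv1_3
    return context
```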
3 years ago
Mike Fährmann
d33227fc38
[twitter] restore errors for protected timelines etc ( fixes #2237 )
3 years ago
Mike Fährmann
ebd3d5c1cc
[bunkr] fix .mp4 downloads ( closes #2239 )
3 years ago
Mike Fährmann
e2be199124
[gelbooru] improve and fix pagination ( #2230 , #2232 )
...
Use 'id:<POSTID' as a tag instead of going through pages with 'pid'.
Something similar was already implemented in 93cef784,
but that got broken again in 3085aac4.
3 years ago
Mike Fährmann
8230f31800
[twitter] update query hashes
3 years ago
Mike Fährmann
c180806cec
[twitter] fix deleted/invalid retweets ( #2225 )
3 years ago
Mike Fährmann
a2eecc6aa8
[kemonoparty] fix DMs extraction ( #2008 )
3 years ago
Mike Fährmann
2bf554a896
[twitter] fix several errors ( #2212 , #2216 , #2225 )
...
- fix Tweets with deleted quotes
- fix suspended Tweets without 'legacy' entry
- fix unified_cards without 'type'
3 years ago
Mike Fährmann
e5242b83bf
[twitter] define directory format for events ( #2109 )
3 years ago
Mike Fährmann
efb3e65a6a
[sexcom] extend URL pattern ( fixes #2220 )
3 years ago
vsyx
3f2b6335d7
[instagram] fix highlights extraction ( #2197 )
...
* [instagram] fix highlights extraction
* [instagram] improve highlights extraction
- 'yield' individual reels instead of collecting them in a list
and returning them all at once
- reduce 'chunk_size' to an even safer value
(instagram.com also uses 5)
3 years ago
Mike Fährmann
5ed26e1773
[twitter] fix pinned tweets ( #2216 )
...
caused by the changes in dffa440ede
3 years ago
Mike Fährmann
a9f78e6527
[twitter] improve error handling
...
- handle accounts without 'rest_id'
- handle timelines with empty 'instructions'
3 years ago
Mike Fährmann
729b07c1f5
[twitter] simplify
...
- use dict with common GraphQL variables
- reduce 'variables' size with custom JSON encoder instance
- centralise TwitterAPI() creation
3 years ago
Mike Fährmann
7cb29224f0
[philomena] fix search parameter escaping ( #2215 )
...
The pluses from search terms in /tags/ URLs need to be
replaced with spaces to get accepted by Philomena.
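As a small illustration of the replacement described above (normalize_tag_query is a hypothetical name, and the actual fix may sit elsewhere in the extractor):

```python
def normalize_tag_query(tag):
    """Turn the '+' separators from a /tags/ URL segment into spaces."""
    return tag.replace("+", " ")
```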
3 years ago
Mike Fährmann
9ca8bb2dc0
[twitter] improve error handling
3 years ago
Mike Fährmann
9a221494c3
[twitter] add 'event' extractor ( closes #2109 )
3 years ago
Mike Fährmann
14867dad6b
[twitter] fix unified cards from search results
3 years ago
Mike Fährmann
dffa440ede
[twitter] improve handling of deleted tweets ( #2212 )
3 years ago
Mike Fährmann
54ef874ba4
[twitter] fix retweet filter ( #2212 )
3 years ago
Mike Fährmann
cb43f7731b
[twitter] update to GraphQL API ( #2212 )
...
The old REST API endpoints, which were not used by Twitter since
summer 2021, are going to finally be phased out it seems, with
'/2/timeline/profile/USERID.json' being the first one.
Only Twitter's search doesn't have a GraphQL interface yet.
3 years ago
Mike Fährmann
de754590e0
add --source-address command-line option ( closes #2206 )
3 years ago
Mike Fährmann
698f35215e
[blogger] support new image domain ( fixes #2204 )
3 years ago
Mike Fährmann
c587b678d0
[mangadex] re-enable warning for external chapters ( #2193 )
3 years ago
Mike Fährmann
f2e8aedd74
[twitter] changes to 'cards' option
...
- change default value to 'true'
- only invoke youtube-dl for cards unsupported by gallery-dl
when 'cards' is set to "ytdl"
"cards": true --> only download card images
"cards": "ytdl" --> download card images and
use youtube_dl on otherwise unsupported cards
3 years ago
Mike Fährmann
2d34d8ff8b
[reddit] allow downloading from quarantined subreddits ( #2180 )
3 years ago
Mike Fährmann
17c9c47ca0
[hitomi] fix 'tag' extraction ( fixes #2189 )
3 years ago
Mike Fährmann
df2f0c09bb
[twitter] support "image_carousel_website" unified cards
3 years ago
Mike Fährmann
cdc96e1217
[gelbooru] improve video file detection ( fixes #2188 )
...
not all files from 'https://video-cdnN.gelbooru.com' are videos
3 years ago
Mike Fährmann
4acc31bd9f
[newgrounds] set suitabilities filter before starting a search
3 years ago
Mike Fährmann
170711af7e
[mangadex] fix extraction ( closes #2177 )
3 years ago
Mike Fährmann
199e7616a7
[rule34] use https://api.rule34.xxx for API requests
3 years ago
Mike Fährmann
37beb1298e
[newgrounds] add 'search' extractor ( closes #2161 )
3 years ago
Mike Fährmann
8b910dd8ae
[hitomi] fix image URLs
...
again and again ...
3 years ago
Mike Fährmann
3085aac4d8
[gelbooru] handle changed API response format ( #2157 )
3 years ago
Mike Fährmann
38e2af29d6
[hitomi] fix image URLs
...
update '_parse_gg()' yet again
3 years ago
Mike Fährmann
6f2e0c9c3d
fix cookie checks for patreon, fanbox, fantia
...
The changes in 9a255344 caused a warning about missing cookies to be
displayed even if those cookies were present, because _check_cookies()
did not account for an empty cookiedomain.
3 years ago
Mike Fährmann
1e0278702d
[hitomi] update '_parse_gg()'
3 years ago
Mike Fährmann
becc7f85a6
[hitomi] fix image URLs
3 years ago
Mike Fährmann
6af8d71da6
[kemonoparty] use service as subcategory ( closes #2147 )
3 years ago
Vrihub
96fcff182c
generic extractor ( #735 )
...
* Generic extractor, see issue #683
* Fix failed test_names test, no subcategory needed
* Prefix directory_fmt with "generic"
* Relax regex (would break some urls)
* Flake8 compliance
* pattern: don't require a scheme
This fixes a bug when we force the generic extractor on urls without a
scheme (that are allowed by all other extractors).
* Fix using g: and r: on urls without http(s) scheme
Almost all extractors accept urls without an initial http(s) scheme.
Many extractors also allow for generic subdomains in their "pattern"
variable; some of them implement this with the regex character class
"[^.]+" (everything but a dot).
This leads to a problem when the extractor is given a url starting
with g: or r: (to force using the generic or recursive extractor)
and without the http(s) scheme: e.g. with "r:foobar.tumblr.com"
the "r:" is wrongly considered part of the subdomain.
This commit fixes the bug, replacing the too generic "[^.]+" with the
more specific "[\w-]+" (letters, digits and "-", the only characters
allowed in domain names), which is already used by some extractors.
* Relax imageurl_pattern_ext: allow relative urls
* First round of small suggested changes
* Support image urls starting with "//"
* self.baseurl: remove trailing slash
* Relax regexp (didn't catch some image urls)
* Some fixes and cleanup
* Fix domain pattern; option to enable extractor
Fixed the domain section for "pattern", to pass "test_add" and
"test_add_module" tests.
Added the "enabled" configuration option (default False) to enable the
generic extractor. Using "g(eneric):URL" forces using the extractor.
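The "r:" subdomain problem described in this commit can be reproduced with two simplified patterns. LOOSE and STRICT below are illustrative stand-ins, not the extractors' real "pattern" values. Note that Python's \w also matches the underscore, in addition to letters and digits.

```python
import re

# Overly generic subdomain class: matches anything except a dot,
# so a forced-extractor prefix like "r:" is swallowed.
LOOSE = re.compile(r"(?:https?://)?([^.]+)\.tumblr\.com")

# Stricter class: word characters (letters, digits, "_") and "-" only.
STRICT = re.compile(r"(?:https?://)?([\w-]+)\.tumblr\.com")

url = "r:foobar.tumblr.com"
loose_match = LOOSE.match(url)    # matches, subdomain is "r:foobar"
strict_match = STRICT.match(url)  # None: ":" is not allowed in a subdomain
```

With the stricter class the prefixed URL no longer matches the site-specific pattern, which leaves the "r:" prefix free to be handled separately.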
3 years ago
Mike Fährmann
4376b39a2b
[sexcom] fix and improve embed extraction ( fixes #2145 )
3 years ago
Mike Fährmann
6d190834ee
[instagram] fix error when PostPage data is not in GraphQL format
...
(#2037 )
3 years ago
Mike Fährmann
dd67e24aa9
[lolisafe] include file ID in filenames
...
More precisely, it now splits the full 'filename' into 'name' and 'id'
instead of overwriting 'filename'. The format string stays the same as
before. Use '{name}.{extension}' to restore the old behavior.
before:
- filename: foobar
- id : 12345
now:
- filename: foobar-12345
- name : foobar
- id : 12345
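A rough sketch of the split shown above, assuming the ID is whatever follows the last '-' in the filename. split_filename is a hypothetical helper and the extractor may use a different rule.

```python
def split_filename(filename):
    """Split 'foobar-12345' into its name and trailing ID parts."""
    name, sep, file_id = filename.rpartition("-")
    if not sep:
        # No '-' present: treat the whole string as the name.
        return {"filename": filename, "name": filename, "id": ""}
    return {"filename": filename, "name": name, "id": file_id}
```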
3 years ago
Mike Fährmann
f3d61de18d
[artstation] create directories per asset ( closes #2136 )
3 years ago
Mike Fährmann
49a50fb2eb
[500px] create directories per photo
3 years ago
Mike Fährmann
89bebe1bef
[500px] add 'favorite' extractor ( closes #1927 )
3 years ago
Mike Fährmann
22b0433985
[fanbox] support pixiv redirects ( closes #2122 )
3 years ago
Mike Fährmann
281828b58b
[tumblrgallery] improve search pagination ( fixes #2132 )
3 years ago
Mike Fährmann
4bec34fc94
[pixiv] allow setting a date range for search results ( #2133 )
...
with the 'scd' and 'ecd' query parameters
3 years ago
Mike Fährmann
882c614281
add album extractor for lolisafe/chibisafe instances
...
- support bunkr.is (closes #2038 )
- support zz.ht (closes #2105 )
3 years ago
Mike Fährmann
d441888bfb
[deviantart] adjust API endpoints
...
Start all endpoints with a forward slash '/'
to be consistent with other API interfaces.
3 years ago
Mike Fährmann
8f0cf0bf71
[deviantart] use '/browse/newest' for most-recent searches
...
(#2096 )
3 years ago
Mike Fährmann
0bd7607da5
[tumblrgallery] improve 'id' extraction ( #2115 )
3 years ago
Mike Fährmann
0d02a7861e
[tumblrgallery] fix extraction ( closes #2112 )
3 years ago
Mike Fährmann
62692c6842
[exhentai] add 'source' option
...
setting it to "hitomi" downloads the corresponding gallery from
hitomi.la; might be extended to other sources in the future
3 years ago
Mike Fährmann
099ed72de7
[hitomi] disable extra 'metadata' by default
...
saves one HTTP request that is not needed with default filename settings
3 years ago
Mike Fährmann
9a25534490
use Extractor._check_cookies() for all cookie checks
3 years ago
Mike Fährmann
63c6bc26b5
[rule34us] extract tags per category ( #1527 )
...
like for other boorus with 'tags': true
3 years ago
Mike Fährmann
f587458a3c
[twitter] include '4096x4096' as a default image fallback
...
(closes #2107 , closes #1881 )
3 years ago
Mike Fährmann
8ed282f7f2
[kemonoparty] support coomer.party URLs ( #2100 )
3 years ago
Mike Fährmann
87ce3fa669
[furaffinity] warn when no session cookies were found
3 years ago
Mike Fährmann
159631c808
[philomena] use a default 'filter_id' if none is given
3 years ago
Mike Fährmann
ad30653b17
allow running a BaseExtractor for any URL
...
by prefixing it with '<base-category>:'
For example:
shopify:https://partakefoods.com/products/crunchy-cookie-variety-pack
gelbooru_v01:https://5naf.booru.org/index.php?page=post&s=view&id=46963
Available base categories are:
mastodon, shopify, moebooru, gelbooru_v01, gelbooru_v02,
reactor, foolslide, foolfuuka, philomena
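How such a prefix might be separated from the URL can be sketched like this. split_base_category is a hypothetical function; only the category names are taken from the list above.

```python
BASE_CATEGORIES = {
    "mastodon", "shopify", "moebooru", "gelbooru_v01", "gelbooru_v02",
    "reactor", "foolslide", "foolfuuka", "philomena",
}


def split_base_category(url):
    """Return (base_category, url) if 'url' starts with a known prefix,
    otherwise (None, url)."""
    prefix, sep, rest = url.partition(":")
    if sep and prefix in BASE_CATEGORIES:
        return prefix, rest
    return None, url
```

A plain "https://..." URL is left untouched because "https" is not one of the base categories.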
3 years ago
Mike Fährmann
299bd2f1f5
[rule34us] add 'tag' and 'post' extractors ( #1527 )
3 years ago
Mike Fährmann
3cf1075d86
[inkbunny] add 'search' extractor ( closes #2094 )
3 years ago
Mike Fährmann
c6a23c26d7
[instagram] allow downloading specific stories ( closes #2088 )
...
https://instagram.com/stories/<USER>/<ID> now only downloads the one
story specified by <ID> and not all stories from that user.
3 years ago
Mike Fährmann
352ffcddb0
[instagram] match post URLs with usernames ( fixes #2085 )
3 years ago
Mike Fährmann
f4e3cee6ac
use yt-dlp by default ( #1850 , #2028 )
3 years ago
Mike Fährmann
f1b142e993
[kemonoparty] change default 'files' order to attachments,file,inline
...
(#1991 )
3 years ago
Mike Fährmann
275543b2d2
update extractor test results
3 years ago
Mike Fährmann
e7ea4f2567
[mangoxo] fix metadata extraction
3 years ago
Mike Fährmann
e298882acc
[kemonoparty] match URLs with www subdomain
3 years ago
Mike Fährmann
addb72e1bb
[reactor] support thatpervert.com ( closes #2029 )
3 years ago
Mike Fährmann
d8d9502e1e
[reactor] inherit from BaseExtractor
3 years ago
Mike Fährmann
f4ea216c95
[shopify] support loungeunderwear.com ( closes #2053 )
3 years ago
Mike Fährmann
93cef78450
[gelbooru] workaround pagination limits
...
Gelbooru only allows retrieving the latest 20k posts for a tag search.
Add 'id:<N' to the search tags to work around that limitation, where N
is the ID of the last retrieved post.
http://gelbooru.me/index.php?page=forum&s=view&id=1467
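The workaround can be sketched as a generator that narrows the search after every batch. Here 'fetch' is a hypothetical callable standing in for the API request; it is assumed to return posts sorted by descending 'id'.

```python
def paginate(fetch, tags):
    """Yield all posts for 'tags', newest first, without page offsets.

    'fetch' is a hypothetical callable taking a tag string and returning
    a list of post dicts sorted by descending 'id'.
    """
    query = tags
    while True:
        posts = fetch(query)
        if not posts:
            return
        yield from posts
        # Restrict the next request to posts older than the last one seen.
        query = "{} id:<{}".format(tags, posts[-1]["id"])
```

Because every request starts from the newest matching post below the previous batch, no request ever needs an offset beyond the first page.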
3 years ago
Mike Fährmann
f2ae179713
[exhentai] fix extraction for disowned galleries ( closes #2055 )
3 years ago
Alice
612850438e
[skeb] add 'thumbnails' option ( #2047 ) ( #2051 )
3 years ago
Mike Fährmann
11a3d96d13
[mangadex] load additional metadata using includes[] directives
...
- always provide 'artist', 'author', and 'group' metadata fields (#2049 )
- remove 'metadata' option
3 years ago
Mike Fährmann
19e00f1322
[dynastyscans] provide 'date' as proper datetime object ( #2050 )
3 years ago
Mike Fährmann
af6424f398
allow testing metadata in list elements
3 years ago
Mike Fährmann
c67756e187
[kemonoparty] add 'dms' option ( #2008 )
3 years ago
Mike Fährmann
3a7a19c7b9
[dynastyscans] add 'manga' extractor ( closes #2035 )
3 years ago
Mike Fährmann
9bc83af3a6
[kemonoparty] 'postfile' -> 'file' ( #1991 )
...
to stay consistent with the existing file types for kemono
3 years ago
Mike Fährmann
522782c09d
[subscribestar] emit metadata for posts without media ( #1569 )
3 years ago
Mike Fährmann
1c8aaf9318
[subscribestar] add 'num' enumeration index ( closes #2040 )
3 years ago
Mike Fährmann
d433735750
[kemonoparty] skip duplicate files ( #2032 , #1991 , #1899 )
...
Extract the SHA-256 file hash from URLs
and skip files with the same hash in the same post.
- provide a 'hash' metadata field (empty string if not available)
- remove 'patreon-skip-file' option
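A sketch of hash-based duplicate skipping, assuming the hash appears in the URL as a 64-character hexadecimal path segment. The URL layout, HASH_RE, and unique_files are illustrative assumptions.

```python
import re

# A SHA-256 digest is 64 hexadecimal characters.
HASH_RE = re.compile(r"/([0-9a-f]{64})(?:\.|$)")


def unique_files(urls):
    """Yield (url, hash) pairs, skipping URLs whose hash was already seen.

    URLs without a recognizable hash get an empty string and are kept.
    """
    seen = set()
    for url in urls:
        match = HASH_RE.search(url)
        file_hash = match.group(1) if match else ""
        if file_hash:
            if file_hash in seen:
                continue
            seen.add(file_hash)
        yield url, file_hash
```

Calling it once per post keeps the 'seen' set local, which matches the "same hash in the same post" wording.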
3 years ago
Mike Fährmann
d4ec245554
[kemonoparty] implement a 'files' option ( #1991 )
...
similar to 8d676151
3 years ago
Mike Fährmann
ab8eea1a24
[twitter] fix extractor for direct image links ( fixes #2030 )
3 years ago
Mike Fährmann
2076d40681
[ytdl] improve error handling ( #1680 )
3 years ago
Mike Fährmann
2aaac3c997
[instagram] include user metadata for 'tagged' downloads ( #2024 )
...
Adds
- tagged_owner_id
- tagged_full_name
- tagged_username
containing the values for the user profile the URL originated from,
e.g. 'instagram' for https://www.instagram.com/instagram/tagged/.
3 years ago
Mike Fährmann
cfa4876848
[philomena] support furbooru.org ( closes #1995 )
3 years ago
Mike Fährmann
4377f1c284
[twitter] distinguish between fatal & nonfatal errors ( #2020 )
...
only show a warning for nonfatal errors
and do not raise a StopExtraction exception
3 years ago
Kyle Anthony Williams
a14b72be21
[webtoons] Use swebtoon-phinf.pstatic.net instead of webtoon-phinf.pstatic.net ( #2005 )
...
* [webtoons] Use swebtoon-phinf.pstatic.net instead of webtoon-phinf.pstatic.net
This trick to avoid having to set a Referer header comes from
Webtoon's RSS feeds. The two URLs below are equivalent in content:
https://webtoon-phinf.pstatic.net/20210929_153/1632867980912DmcGK_JPEG/16328679808882705182.jpg?type=q90
https://swebtoon-phinf.pstatic.net/20210929_153/1632867980912DmcGK_JPEG/16328679808882705182.jpg?type=q90
The URL with the domain "webtoon-phinf.pstatic.net" needs a Referer
header, and the domain "swebtoon-phinf.pstatic.net" does not. This
is because of the environment "swebtoon" images live in, one without
explicit network control: RSS feeds on sites such as Feedly. This change should
make it easier for gallery-dl developers to embed Webtoon comics without
worrying about headers.
3 years ago
Mike Fährmann
6e3658ef52
[kemonoparty] provide 'date' metadata for gumroad ( #2007 )
...
Not the 'published' or 'edited' values since they are 'null',
but still better than nothing at all.
3 years ago
Mike Fährmann
37c9dedee1
[seisoparty] remove module
3 years ago
Mike Fährmann
efa178cc91
[ytdl] implement parsing ytdl command-line options ( #1680 )
...
- adds 'config-file' and 'cmdline-args' options
for both ytdl downloader and extractor
- create 'ytdl' helper module, which combines YoutubeDL creation
and option parsing.
- most likely a buggy mess due to incompatibilities between the
original youtube-dl and yt-dlp.
3 years ago
Mike Fährmann
7cb303d745
[redgifs] improve URL extraction
...
Fields inside 'urls' can be None, which would have caused an exception
with the old method.
3 years ago
Mike Fährmann
2befed1a96
[redgifs] update search URL pattern ( #1984 )
3 years ago
Mike Fährmann
b315a0ecef
[redgifs] update to API v2 ( #1984 )
3 years ago
Mike Fährmann
f0fc3b0ba1
[kemonoparty] add 'comments' option ( #1980 )
3 years ago
Mike Fährmann
1fac74b14d
[reddit] prevent crash for galleries with no 'media_metadata'
...
(fixes #2001 )
3 years ago
Mike Fährmann
211de95dd0
update extractor test results
3 years ago
Mike Fährmann
8bea02c38c
[deviantart] fix 'index' values for stashed deviations
3 years ago
Mike Fährmann
dd88a7d980
[cyberdrop] restore video extraction ( fixes #1993 )
...
fixes a regression introduced in f33c2ef7
3 years ago
Mike Fährmann
fa5646eadc
[mangoxo] fix login and extraction
3 years ago
Mike Fährmann
4c49174579
[mangakakalot] update domain and fix extraction
3 years ago
YongChan Cho
14852f7050
[hitomi] fix image path ( #1988 )
3 years ago
Mike Fährmann
dad2875a3e
fix calculating retry sleep times ( fixes #1990 )
3 years ago
Mike Fährmann
9156e90f1f
[twitter] add 'pinned' option
3 years ago
Mike Fährmann
06b414c9a3
[redgifs] 'gfyId' -> 'id' ( #1984 )
3 years ago
Ryu juheon
d4614e5ba4
[hitomi] fix image URLs ( #1982 )
3 years ago
Mike Fährmann
6434ccf9e8
[redgifs] split from 'gfycat' ( #1984 )
...
Update API endpoints and metadata names - mostly 'gfycat' -> 'gif' -
and remove some obsolete checks.
3 years ago
Mike Fährmann
e4696b40ba
[instagram] update query hashes
3 years ago
Alice
bfd7401b1e
[skeb] add 'user' and 'post' extractors ( #1031 ) ( #1971 )
...
* Create skeb.py
* Update __init__.py
* Update supportedsites.py
* Update supportedsites.md
* Update supportedsites.py
* Update skeb.py
3 years ago
Ryu juheon
6b6d92d51c
[hitomi]: fix image URLs ( #1975 )
3 years ago
Mike Fährmann
dcb201ff19
[gfycat] show warning when there are no available formats
3 years ago
Mike Fährmann
e436a2607b
[gfycat] consistent 'userName' values for 'user' downloads ( #1962 )
...
by using the name from the input URL and not relying on possibly faulty
or incomplete API results.
'userData[username]', if available, will still have the original name.
3 years ago
Mike Fährmann
f1487a3cfa
[kemonoparty:discord] improve 'inline' extraction ( #1940 )
...
- extract media.discordapp.*NET* URLs
- rewrite media.discordapp.net to cdn.discordapp.com
- use a more restricted set of characters for the URL path
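The three points above could look roughly like this. DISCORD_RE and its character set are assumptions made for illustration, not the pattern used in the extractor.

```python
import re

# Hypothetical pattern: attachment URLs on either Discord host, with a
# deliberately small set of characters allowed in the path.
DISCORD_RE = re.compile(
    r"https://(?:cdn\.discordapp\.com|media\.discordapp\.net)"
    r"(/[A-Za-z0-9_./%-]+)"
)


def extract_inline(text):
    """Return cdn.discordapp.com URLs for all Discord links in 'text'."""
    return ["https://cdn.discordapp.com" + path
            for path in DISCORD_RE.findall(text)]
```

Limiting the path characters keeps trailing punctuation or markup from being captured as part of the URL.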
3 years ago
Mike Fährmann
02a247f4e5
[deviantart] full resolution for non-downloadable images ( #293 )
...
Many thanks to @Ironchest337 for discovering this method
and providing a well-documented implementation.
3 years ago
Mike Fährmann
a7ddb5f5fa
[deviantart] update 'search' argument handling ( fixes #1911 )
...
- use 'alltime' by default
- support newer 'order' values (most-recent, this-week, etc)
3 years ago
Mike Fährmann
c19e762fdf
[vk] add 'album' extractor ( #474 , fixes #1952 )
...
todo: better metadata for albums
3 years ago
Mike Fährmann
8bb442f20d
[redgifs][gfycat] provide fallback URLs ( fixes #1962 )
...
and extend the 'format' option
3 years ago
Mike Fährmann
b6443c576d
[kemonoparty:discord] extract 'inline' files
3 years ago
Mike Fährmann
bcbf9bcf36
[kemonoparty] split 'discord' extractor ( #1940 )
...
in 'server' and 'channel'
3 years ago
Mike Fährmann
db857b40d8
[kemonoparty] improve inline extraction ( #1899 )
3 years ago
Mike Fährmann
975e0a4fe0
[furaffinity] unquote search queries ( #1958 )
...
instead of unescape
(unquote -> url params, unescape -> html entities)
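The difference between the two operations, shown with the standard library functions of the same names (gallery-dl may wrap them in its own helpers):

```python
from html import unescape
from urllib.parse import unquote

query = "cat%20%26%20dog"
decoded_query = unquote(query)       # percent-decoding: "cat & dog"
untouched_query = unescape(query)    # unchanged: no HTML entities present

markup = "cat &amp; dog"
decoded_markup = unescape(markup)    # entity decoding: "cat & dog"
```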
3 years ago
Mike Fährmann
8d676151b7
[patreon] implement 'files' option ( #1935 )
3 years ago
Mike Fährmann
6695ef2e10
[patreon] better filenames for 'content' images ( #1954 )
3 years ago
Mike Fährmann
70005e3275
[kemonoparty:discord] support downloading from a specific channel
...
https://kemono.party/discord/server/<server-id>#<channel-name>
3 years ago
Mike Fährmann
003f25931d
[kemonoparty:discord] provide a 'channel_name'
3 years ago
Mike Fährmann
28bdd58e6d
[nhentai] simplify
3 years ago
Mike Fährmann
50098762e3
[nhentai] add 'tag' extractor ( closes #1950 )
3 years ago
Mike Fährmann
fe6ce5495a
[kemonoparty] add 'discord' extractor ( #1827 , #1940 )
3 years ago
Mike Fährmann
918fc9974d
[picarto] add 'gallery' extractor ( closes #1931 )
3 years ago
Mike Fährmann
e33125ad39
[pixiv] add 'sketch' extractor ( #1497 )
3 years ago
Mike Fährmann
e9dc6ff262
[inkbunny] add 'following' extractor ( #515 )
3 years ago
Mike Fährmann
9c8fc6e7b4
[inkbunny] match "long" URLs for pools and favorites ( #1937 )
3 years ago
Mike Fährmann
f33c2ef73b
[cyberdrop] extract direct download URLs ( #1943 )
...
do not rely on redirects from f.cyberdrop.cc
3 years ago
Mike Fährmann
b93915c113
[inkbunny] add 'pool' extractor ( #1937 )
3 years ago
Mike Fährmann
373d3e1c57
[seisoparty] implement login with username & password ( #1906 )
3 years ago
Mike Fährmann
7c5f62b453
[seisoparty] add 'favorite' extractor ( #1906 )
3 years ago
Mike Fährmann
d93b5474c3
[mangadex] update parameter handling for API requests
...
- move common parameters into '_pagination()'
- add 'ratings' (#1908 ) and 'api-parameters' options
3 years ago
Mike Fährmann
cd66c3c415
[twitter] add 'size' option ( #1881 )
3 years ago
Mike Fährmann
fb98b3fdaf
[redgifs][gfycat] remove webtoken code ( fixes #1907 )
3 years ago
Mike Fährmann
96215c926e
[mangadex] fix retrieving chapters from 'pornographic' titles
...
(fixes #1908 )
3 years ago
Mike Fährmann
da9685609c
[kemonoparty] update file download URLs
...
(closes #1902 , fixes #1903 )
3 years ago
Mike Fährmann
783eae6fc5
[hiperdex] fix extraction
3 years ago
Mike Fährmann
e0bdacd932
[fappic] add 'image' extractor ( closes #1898 )
3 years ago
Mike Fährmann
9377543162
[mastodon] add 'following' extractor ( #1891 )
3 years ago
Mike Fährmann
2c2932973c
[mastodon] support specifying accounts by ID
...
Same as a3b473bd for Twitter
Instead of just
https://instance.tld/@user
it is now also possible to refer to that account with
https://instance.tld/users/user
https://instance.tld/@id:12345
https://instance.tld/users/id:12345
3 years ago
Mike Fährmann
94143eb86c
[twitter] add 'quote_by' metadata field ( #1481 )
...
Only present for tweets quoted by another tweet.
Represents the tweet_id of said tweet quoting this one.
3 years ago
Mike Fährmann
a23f5d45d7
[deviantart] fix bug with fetching premium content ( #1879 )
...
When a user has both 'watchers' and 'paid' folders and one of them is
inaccessible, the other one could get handled as inaccessible as well.
3 years ago
Mike Fährmann
ada36c2044
[deviantart] update default archive_fmt for single deviations
...
(#1874 )
use the same as gallery downloads
3 years ago
Mike Fährmann
da16eabb82
[twitter] ensure card entries have a 'url' ( #1868 )
3 years ago
Mike Fährmann
e69ee41f25
implement 'page-reverse' option ( #1854 )
3 years ago
cyberdrop-me
c83668c2ff
[CyberDrop] Change directory name format ( #1871 )
...
Album IDs are random; organization is much better with the album name first and the identifier at the end.
3 years ago
Mike Fährmann
e4684c5cb9
[desktopography] simplify ( #1740 )
3 years ago
Giacomo Rossetto
4a7d7899ff
Implement desktopography extractor ( #1740 )
3 years ago
Alice
9992ff38da
[fantia] add 'date' metadata field ( #1853 )
3 years ago
Mike Fährmann
fba95c3a9e
[nozomi] preserve case of search tags ( fixes #1860 )
3 years ago
Mike Fährmann
4b3e309b90
[aryion] update/improve pagination ( #1849 )
...
Manually increment the 'p' query parameter,
instead of relying on a "Next" link which only works up to page 200.
3 years ago
Mike Fährmann
266ed9b62e
[aryion] add 'tag' extractor ( closes #1849 )
3 years ago
Mike Fährmann
6bbeaac029
[mangadex] fix extraction ( fixes #1852 )
3 years ago
Mike Fährmann
e9bf8d2591
[instagram] update default delay to 6-12 seconds ( #1835 )
3 years ago
Mike Fährmann
c9e6693530
allow specifying a minimum/maximum for 'sleep-*' options ( #1835 )
...
for example '"sleep-request": [5.0, 10.0]' to wait between 5 and 10
seconds between each HTTP request
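One plausible way to interpret such a value, assuming a uniform random pick between the two bounds. resolve_sleep is a hypothetical helper and the distribution is an assumption, not something the commit message states.

```python
import random


def resolve_sleep(value):
    """Return a delay in seconds for a number or a [min, max] pair."""
    if isinstance(value, (list, tuple)):
        lower, upper = value
        # Pick a different delay on every call.
        return random.uniform(lower, upper)
    return float(value)
```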
3 years ago
Mike Fährmann
2ff2974353
[common] update default argument handling in Extractor.request()
...
more lines of code, but slightly less execution time
3 years ago
Mike Fährmann
0fd959a2a7
[twitter] support '/with_replies' URLs ( closes #1833 )
3 years ago
Mike Fährmann
e93360e45d
[reddit] extend subcategory depending on input URL ( closes #1836 )
...
- https://www.reddit.com/r/lavaporn/
-> 'subreddit'
- https://www.reddit.com/r/lavaporn/new/
-> 'subreddit-new'
- https://www.reddit.com/user/username/
-> 'user'
- https://www.reddit.com/user/username/gilded/
-> 'user-gilded'
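The mapping above can be sketched with a single regular expression. PATTERN and subcategory are simplified, hypothetical stand-ins for the extractor's own URL handling.

```python
import re

PATTERN = re.compile(
    r"(?:https?://)?(?:www\.)?reddit\.com"
    r"/(r|user)/[^/?#]+(?:/([a-z]+))?/?$"
)


def subcategory(url):
    """Derive a subcategory name from a subreddit or user URL."""
    match = PATTERN.match(url)
    if not match:
        return None
    kind, listing = match.groups()
    base = "subreddit" if kind == "r" else "user"
    # Append the listing name ("new", "gilded", ...) when one is present.
    return base + "-" + listing if listing else base
```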
3 years ago
Mike Fährmann
7bbb1f92d7
[gelbooru_v02] add 'favorite' extractor ( closes #1834 )
3 years ago
Mike Fährmann
4ec11af6a4
[kemonoparty] implement login with username & password ( #1824 )
3 years ago
Mike Fährmann
0e33746fe0
[artstation] use '/album/all' view for user portfolios ( #1826 )
3 years ago
Mike Fährmann
4f5f9ed1e5
[oauth] fix typo
...
this has been here since February ...
(8974f036)
3 years ago
Mike Fährmann
83bbb628d8
[kemonoparty] add 'favorite' extractor ( #1824 )
3 years ago
Mike Fährmann
35d75a4071
[erome] send Referer header for file downloads ( fixes #1829 )
3 years ago
Mike Fährmann
44f572c27f
[deviantart] implement an 'auto-unwatch' option ( #1466 , #1757 )
3 years ago
Mike Fährmann
d79bcb6236
allow extractors to register a 'finalize()' method
3 years ago
Mike Fährmann
47a780942c
update extractor test results
3 years ago
Mike Fährmann
eed6ef3de0
[pixiv] fix pixivision title extraction
3 years ago
Mike Fährmann
7645cdfb88
[inkbunny] fix extraction ( closes #1816 )
...
'digitalsales', 'forsale', and 'printsales'
are no longer included in the data returned from the API.
3 years ago
Mike Fährmann
3e36543c98
[nhentai] add 'favorite' extractor ( #1814 )
3 years ago
Mike Fährmann
656358ea92
[nhentai] use API endpoint for gallery data
3 years ago
Mike Fährmann
8cd7759682
[reddit] cleanup RedditAPI.__init__ ( #1813 )
...
- remove warning about 'client-id'/'user-agent' mismatch
- only use 'user-agent' from config for custom 'client-id'
3 years ago
Mike Fährmann
0a94fe5774
[reddit] delay RedditAPI initialization ( #1813 )
...
Move it outside the constructor so that eventual exceptions can get
caught in the expected places.
3 years ago
Mike Fährmann
57854624a1
[exhentai] improve image limits check ( #1808 )
...
Check for a 'text/html' Content-Type instead of the very specific
137 bytes Content-Length, which might change depending on compression
or other factors.
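A sketch of the header-based check, assuming the response headers are available as a mapping. is_limit_page is a hypothetical name.

```python
def is_limit_page(headers):
    """Guess whether a response is an HTML page instead of an image."""
    content_type = headers.get("Content-Type", "")
    # Image downloads should never come back as HTML.
    return content_type.startswith("text/html")
```

Unlike a fixed Content-Length comparison, this does not depend on how the server compresses or pads the page.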
3 years ago
Mike Fährmann
96fec14ef7
[deviantart] rename 'watch' option to 'auto-watch'
...
(#1466 , #1757 )
Similar reason as in e05a96db.
'watch' is already used by the WatchExtractor class.
3 years ago
Mike Fährmann
e75f2de9da
[deviantart] add 'comments' option ( #1800 )
3 years ago
Mike Fährmann
6ce16c6d31
[deviantart] add 'tag' extractor ( closes #1803 )
3 years ago
Mike Fährmann
4e9f8fe395
[shopify] support windsorstore.com ( #1793 )
3 years ago
Mike Fährmann
95157e0f4b
[shopify] use API for product listings ( #1793 )
3 years ago
Mike Fährmann
6651da27e9
[twitter] fix 'url' extraction for users without 'expanded_url'
...
(#1532 , #1787 )
3 years ago
Mike Fährmann
ecc8da4704
[deviantart] implement a 'watch' option ( #1466 , #1757 )
3 years ago
Mike Fährmann
a4f249c22e
[deviantart] prevent exception on empty videos ( fixes #1796 )
3 years ago
Mike Fährmann
ae78d95a5f
[twitter] fix issue when filtering quote tweets ( #1792 )
...
When a user quotes his own Tweet and that Tweet gets filtered by
'"quoted": false', it could also get filtered when it appeared later
as regular Tweet.
3 years ago
Mike Fährmann
6b229ac829
[furaffinity] expand URL pattern for searches ( closes #1780 )
3 years ago
Mike Fährmann
0817f468ef
[twitter] expand t.co links in user descriptions ( #1532 , #1787 )
3 years ago
Mike Fährmann
7c0ae88185
[twitter] add 'url' to user objects ( #1532 , #1787 )
3 years ago
Mike Fährmann
5919dc5b5a
[twitter] slightly improve '_transform_user()'
3 years ago
Mike Fährmann
c04f7ab139
[foolfuuka] add 'gallery' extractor ( #1785 )
3 years ago
Mike Fährmann
ddd175de77
[mangadex] prevent KeyError for manga without English title
3 years ago
Mike Fährmann
20ee091289
[420chan] add 'thread' and 'board' extractors ( closes #1773 )
3 years ago
Mike Fährmann
6b56b3ebe1
[twitter] report API errors as generic StopExtraction exceptions
...
prevents duplicate logging messages for nonexistent users
(#1759 )
3 years ago
Mike Fährmann
51eb50749f
[foolslide] remove entry for kobato.hologfx.com
3 years ago
Mike Fährmann
4718f9c5dd
[oauth] use defaults when config values are set to None/null
...
(fixes #1778 )
3 years ago
James C. Wise
1f02878351
[Deviantart] [ #1776 ] Remove the "you need session cookies to download mature scraps" warning ( #1777 )
3 years ago
Mike Fährmann
bb6a130942
automatically set required DDoS-GUARD cookies ( #1779 )
...
for kemono.party and seiso.party
3 years ago
Mike Fährmann
bd08ee2859
remove most 'yield Message.Version' statements
...
only leave them in oauth.py as noop results
3 years ago
Mike Fährmann
c866fcba48
[twitter] fix 'logout' ( #1719 )
...
delete 'auth_token' cookie and cookies.txt path
3 years ago
Mike Fährmann
9cb5ea5eda
update default User-Agent headers
3 years ago
Mike Fährmann
52984f7e22
[twitter] add option to log out when blocked ( #1719 )
3 years ago
Mike Fährmann
ed4b3c48cb
fix flake8 and other tests
3 years ago
enormous-muscles
975e1ac6e2
Add Wikieat extractor ( #1699 )
...
* Add Wikieat extractor
* Add Wikieat extractor to extractor list
3 years ago
Nyasume
fa6af46756
Added ability to download GIFs instead of mp4 from Luscious and Reactor ( #1701 )
3 years ago
Ryu JuHeon
9429eaa0a3
[hitomi]: fix image URLs ( #1765 )
3 years ago
Mike Fährmann
c34dbc86bb
[kemonoparty] update file server domain ( #1764 )
3 years ago
Mike Fährmann
e5a93e113f
[twitter] extend 'replies' option ( #1254 )
...
Allow setting 'replies' to "self" to only download self-replies.
3 years ago
Mike Fährmann
f9096584ab
[behance] fix 'collection' extraction
3 years ago
Mike Fährmann
229498b8aa
[twitter] warn about suspended accounts etc ( closes #1759 )
3 years ago
Mike Fährmann
a5de2244d4
[furaffinity] fix using 'category-transfer' ( #1274 )
3 years ago
Mike Fährmann
cadfad4eea
[danbooru] add 'external' option ( closes #1747 )
3 years ago
Mike Fährmann
5b1c62bfa9
[furaffinity] add 'external' option ( closes #1492 )
3 years ago
Mike Fährmann
5d5ab669fa
[instagram] use custom User-Agent header for video downloads
...
(#1682 , #1623 , #1580 )
3 years ago
Mike Fährmann
7b029dfe85
[instagram] increase default delay between HTTP requests to 8s
...
(closes #1732 )
3 years ago
Mike Fährmann
5eca3781be
[kemonoparty] fix username extraction ( #1750 )
3 years ago
Mike Fährmann
fe970fc87f
[vk] prevent exception for empty/private profiles ( fixes #1742 )
3 years ago
Mike Fährmann
ac91a84543
[bbc] provide fallback URLs ( #1706 )
3 years ago
Mike Fährmann
a316e44f8e
[bbc] add 'width' option ( #1706 )
3 years ago
Mike Fährmann
c37c2818fb
[nsfwalbum] retry all requests when extracting image URLs
...
(#1733 , fixes #1271 )
3 years ago
Mike Fährmann
220cfe244e
[deviantart] get original files for GIF previews ( #1731 )
3 years ago
Mike Fährmann
7a0da4f93f
[newgrounds] add 'format' option ( closes #1729 )
3 years ago
Mike Fährmann
223a4e79cd
[newgrounds] fix using 'category-transfer' ( #1274 )
3 years ago
Mike Fährmann
4e95cef6d2
[nsfwalbum] retry backend requests ( fixes #1733 )
3 years ago
Mike Fährmann
6c11105587
[bbc] improve image dimensions ( #1706 )
...
download the 1920xN versions instead of 976x549
3 years ago
Mike Fährmann
57c1a86082
[bbc] support multi-page gallery listings ( closes #1730 )
3 years ago
Mike Fährmann
486474800f
[kemonoparty] skip duplicated patreon files ( closes #1689 )
...
this behavior can be disabled with the 'patreon-skip-file' option
3 years ago
Mike Fährmann
da7297c0b9
[comicvine] add extractor ( closes #1712 )
3 years ago
Mike Fährmann
e4788fa663
[bbc] add 'gallery' and 'programme' extractors ( closes #1706 )
3 years ago
Mike Fährmann
c3b5c88b04
update extractor test results
3 years ago
Mike Fährmann
3868ec02d1
[pururin] update domain and fix extraction
3 years ago
Mike Fährmann
b89a44090f
[naverwebtoon] fix comic extraction
3 years ago
Mike Fährmann
c8e678a5b4
[instagram] fix extraction of '/explore/tags/' posts
...
(closes #1666 )
3 years ago
Mike Fährmann
a6a51f207d
[moebooru] fix 'tags' ending with a '+' when logged in ( #1702 )
3 years ago
Mike Fährmann
f5b097165e
[ytdl] transfer YoutubeDL objects to downloader ( #1680 )
...
allows specifying downloader-specific options per subcategory
but overwrites all downloader.ytdl settings
3 years ago
Mike Fährmann
06e69ea79a
[ytdl] actually set options for YoutubeDL objects ( #1680 )
...
I somehow managed to remove the options parameter for
the YoutubeDL constructor in 9a849cdf
without noticing ...
3 years ago
Mike Fährmann
dfe1f490e9
[mangadex] use custom User-Agent header ( #1535 )
3 years ago
Mike Fährmann
36a2aff363
[vk] improve metadata extraction and URL pattern ( fixes #1691 )
...
- always fetch all user metadata
- use 'user[name]' for directory names if available
3 years ago
Mike Fährmann
e622e004f0
[ytdl] improve module imports ( #1680 )
...
Apply 'extractor.ytdl.module' for every URL, not just the first.
3 years ago
Mike Fährmann
193401ce3b
[ytdl] "fix" cookie transfer between session and ytdl ( #1680 )
...
requests' CookieJar class is not quite compatible with the standard
http.cookiejar.CookieJar used by youtube_dl
3 years ago
Mike Fährmann
9a849cdf61
[ytdl] allow setting 'module' for subcategories ( #1680 )
3 years ago
Mike Fährmann
dff0da60f9
[ytdl] add 'generic' option ( #1680 )
3 years ago
Mike Fährmann
d3da96142a
[ytdl] support cookies + username&password ( #1680 )
3 years ago
Mike Fährmann
36ac2197db
[ytdl] add extractor for sites supported by youtube-dl
...
(#1680 , #878 )
Can be used by prefixing any URL with 'ytdl:',
or by setting 'extractor.ytdl.enabled' to 'true'.
3 years ago
Mike Fährmann
64240c8d42
[imagevenue] fix extraction
...
(closes #1677 )
3 years ago
Mike Fährmann
d287d2eb88
[kemonoparty] parse 'o' query parameters ( #1674 )
3 years ago
Mike Fährmann
8b036778e3
[kemonoparty] add 'max-posts' option ( #1674 )
3 years ago
Mike Fährmann
5612ca31c2
[hitomi] fix image URLs ( closes #1679 )
3 years ago
Mike Fährmann
8ecca3af58
[pixiv] add extractor for 'pixivision' articles ( #1672 )
3 years ago
Mike Fährmann
312a28e78a
[mastodon] add 'replies' option ( #1669 )
3 years ago
Mike Fährmann
513c491cea
[mastodon] reset 'params' after first pagination iteration
...
otherwise query parameters in 'params' get specified twice the second
time around - once from the 'links["next"]' URL and once from 'params'
itself.
3 years ago
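The pagination fix described in 513c491cea can be sketched roughly as follows. This is a minimal illustration, not the extractor's actual code: 'paginate' and the 'fetch' callback are hypothetical names, and 'fetch' is assumed to return one page of items together with the URL from 'links["next"]' (or None on the last page).

```python
def paginate(fetch, url, params):
    """Yield items from every page, sending 'params' only with the first request."""
    while url:
        items, next_url = fetch(url, params)
        yield from items
        url = next_url
        # The 'next' URL already carries its own query string,
        # so sending 'params' again would specify every parameter twice.
        params = None


# Stand-in for the HTTP layer (hypothetical data)
calls = []
pages = {
    "/statuses": (["a", "b"], "/statuses?limit=2&max_id=5"),
    "/statuses?limit=2&max_id=5": (["c"], None),
}


def fake_fetch(url, params):
    calls.append((url, params))
    return pages[url]


results = list(paginate(fake_fetch, "/statuses", {"limit": 2}))
```

With this shape, only the first recorded call carries the 'params' dict; every later call relies on the query string embedded in the 'next' URL.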
Mike Fährmann
a1f5b78039
[mastodon] add 'reblogs' option ( #1669 )
3 years ago
Mike Fährmann
21c2da454f
update extractor test results
3 years ago
Mike Fährmann
7f591c78cb
[mangafox] cleanup
3 years ago
FollieHiyuki
4763bc1e4e
Add MangaExtractor for mangafox ( #1633 )
3 years ago
Mike Fährmann
b519bf567c
[hiperdex] use domain from input URL
3 years ago
Mike Fährmann
93d356712c
[mastodon] implement 'text-posts' option ( #1569 )
...
similar to Twitter's 'text-tweets'
3 years ago
Mike Fährmann
414bdc95a3
[twitter] set 'retweet_id' for original retweets ( #1481 )
3 years ago
Mike Fährmann
5323c1c73a
[twitter] ensure guest tokens are returned as string ( #1665 )
3 years ago
Mike Fährmann
9ee45f3617
[kemonoparty] warn about missing DDoS-GUARD cookies
3 years ago
Mike Fährmann
344aab3fb7
[seisoparty] warn about missing DDoS-GUARD cookies
3 years ago
Mike Fährmann
035562bd11
[twitter] remove old-style URLs from image fallback lists
3 years ago
Mike Fährmann
daf821b8b6
[seisoparty] use user names instead of IDs by default ( #1635 )
3 years ago
Mike Fährmann
e4db1bad14
[seisoparty] also extract files hosted on 'cdn-2' servers ( #1635 )
3 years ago
Mike Fährmann
267bbf5996
[mangasee] add 'chapter' and 'manga' extractors
3 years ago
Mike Fährmann
fad4918208
[deviantart] use UUIDs in internal folder/collection URLs
3 years ago
Mike Fährmann
0179581340
add 'T' format string conversion ( #1646 )
...
to convert 'date'/datetime to timestamp
3 years ago
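As a rough idea of what a date-to-timestamp conversion like 'T' does, here is a small sketch. The function name 'to_timestamp' is hypothetical, and the sketch assumes that naive datetime objects represent UTC; the actual conversion code may differ.

```python
import calendar
from datetime import datetime


def to_timestamp(dt):
    """Convert a datetime to integer seconds since the Unix epoch (UTC assumed)."""
    return calendar.timegm(dt.utctimetuple())


ts = to_timestamp(datetime(2021, 1, 1))
```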
Mike Fährmann
f74cf52e2b
[seisoparty] add 'user' and 'post' extractors ( #1635 )
3 years ago
Mike Fährmann
759735fb02
[kemonoparty] fix 'username' extraction ( fixes #1652 )
...
The site's <title> content changed from
<title>NAME | Kemono</title>
to
<title>
NAME | Kemono
</title>
3 years ago
Mike Fährmann
a416e54765
[directlink] manually encode Referer URLs ( fixes #1647 )
...
Trying to send a non-latin-1-encodable header raises an exception,
so we encode the Referer value ourselves with 'errors=ignore'.
3 years ago
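The encoding step from a416e54765 can be illustrated with plain Python. This is a sketch under assumptions: 'encode_referer' is a hypothetical helper name and the example URL is made up; only the use of latin-1 with 'errors=ignore' comes from the commit message.

```python
def encode_referer(url):
    """Encode a header value as latin-1, silently dropping unencodable characters."""
    return url.encode("latin-1", "ignore")


# Two CJK characters (not latin-1 encodable) and one 'a' with diaeresis (encodable)
url = "https://example.org/\u753b\u50cf/\u00e4.jpg"
encoded = encode_referer(url)

# Without 'ignore', the same encode call raises an exception
try:
    url.encode("latin-1")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```

The trade-off is that the resulting Referer no longer matches the original URL exactly, which is usually preferable to aborting the download.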
Mike Fährmann
8bdeb2a6dd
[webtoons] match arbitrary language codes ( closes #1643 )
3 years ago
Mike Fährmann
4adc44df69
[furaffinity] improve metadata extraction ( fixes #1630 )
...
Fetch 'title' and 'artist' metadata from a different location,
since for posts with an empty title the <title> element is
completely empty and does not contain the artist's name.
3 years ago
Mike Fährmann
e98fa01c44
[hitomi] update image URL code ( fixes #1637 )
3 years ago
Mike Fährmann
e9ab97396f
[kemonoparty] update default filenames and archive IDs ( #1514 )
...
Add an enumeration index so that attachments and regular files with the
same filename still get downloaded and not counted as duplicate files
(even though for patreon posts they usually are)
This invalidates all previously generated archive IDs.
To keep using old names and IDs, set
'filename' to "{id}_{title}_{filename}.{extension}" and
'archive-format' to "{service}_{user}_{id}_{filename}.{extension}".
3 years ago
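To show what the old 'filename' and 'archive-format' values from e9ab97396f expand to, the sketch below fills them with made-up metadata. Python's built-in str.format_map stands in for the project's own format string handling here, and all field values are hypothetical.

```python
# Hypothetical metadata for a single file
metadata = {
    "service": "patreon",
    "user": "12345",
    "id": "67890",
    "title": "Example",
    "filename": "image",
    "extension": "png",
}

old_filename = "{id}_{title}_{filename}.{extension}".format_map(metadata)
old_archive_id = "{service}_{user}_{id}_{filename}.{extension}".format_map(metadata)
```

Two files in the same post that share a 'filename' value produce identical strings with these formats, which is why they were counted as duplicates before an enumeration index was added.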
Mike Fährmann
fb4b4725ba
[hiperdex] match 'hiperdex2.com' URLs
...
still doesn't properly work due to Cloudflare CAPTCHA and IUAM page
3 years ago
Mike Fährmann
95bc1139e0
[instagram] update query hashes
3 years ago
Mike Fährmann
23018a46f6
[instagram] fix login ( fixes #1631 )
3 years ago
Mike Fährmann
cac0110d8b
[redgifs] update API server address ( fixes #1632 )
...
napi.redgifs.com -> api.redgifs.com
3 years ago
Mike Fährmann
0d2961ae81
[500px] remove last query hash entry
...
forgot to include this in b56e2450
3 years ago
Mike Fährmann
7273cf8536
[pixiv] support fetching privately followed users ( fixes #1628 )
3 years ago
Mike Fährmann
e60962f7e5
[philomena] improve tag escapes handling ( fixes #1629 )
3 years ago
Mike Fährmann
d8908ca577
[unsplash] update collections URL pattern ( fixes #1627 )
3 years ago
Mike Fährmann
9ed13703cc
[sankaku] handle empty tags ( fixes #1617 )
3 years ago
Mike Fährmann
b56e245094
[500px] update GraphQL queries
...
500px changed its method from query hashes to sending the entire query
string for every request.
3 years ago
Mike Fährmann
a751afdfb3
[twitter] change some defaults
...
- 'retweets' option: true -> false
- 'quoted' option : true -> false
i.e. disable downloading tweets from other users' timelines by default
- search directory:
'["{category}", "Search", "{search}"]' ->
'["{category}", "{user[name]}"]'
i.e. change it to the same as other twitter extractors (#1308 )
3 years ago
Mike Fährmann
4e4ca3c330
[deviantart] pin API version ( #1611 )
...
'/gallery/folders' in the newest version doesn't include subfolders.
It probably only needs the right query parameter to do so, but that
doesn't seem to be documented anywhere.
3 years ago
Mike Fährmann
d09bc5bd34
[subscribestar] improve attachment filenames ( #1609 )
3 years ago
Mike Fährmann
2986bf63bf
[mangafox] update URL pattern ( fixes #1608 )
...
also accept non-numeric volume labels, e.g. vTBD
3 years ago
Mike Fährmann
53dab5c289
[mangadex] revert chapter handling ( #1535 )
...
Spawn a new ChapterExtractor for each individual chapter
instead of handling them directly with a MangaExtractor.
Doing it that way broke too many features like
--chapter-filter, --chapter-range, --zip, etc.
3 years ago
Mike Fährmann
1197ee2c20
[mangadex] add extractor for a user's followed feed ( #1535 )
3 years ago
Mike Fährmann
07c8adbd8b
[mangadex] implement login with username & password ( #1535 )
3 years ago
Mike Fährmann
3e332eaf53
[mangadex] update to API v5 ( #1535 )
3 years ago
Mike Fährmann
04f4f9badb
[oauth] prevent exceptions when reporting errors ( #1603 )
3 years ago
Mike Fährmann
a3bf878329
[idolcomplex] improve and fix pagination ( #1601 )
...
always rely on the 'next-page-url' value and its query parameters
3 years ago
Mike Fährmann
e39c4633ba
[cyberdrop] b64decode -> a2b_base64
3 years ago
Mike Fährmann
407627ec86
[foolfuuka] support 'archive.wakarimasen.moe' ( closes #1595 )
3 years ago
Mike Fährmann
78f89d2e61
[idolcomplex] fix pagination ( closes #1594 )
3 years ago
Mike Fährmann
52052a0e1a
[manganelo] update domain to 'manganato.com'
3 years ago
Mike Fährmann
c80b18a477
[weibo] extend 'retweets' option ( closes #1542 )
...
Setting 'retweets' to "original" will use metadata from the
original posts, and not from the retweeted ones.
3 years ago
Mike Fährmann
c0fa5058da
[kemonoparty] actually add a 'type' metadata field ( #1556 )
3 years ago
thatfuckingbird
264beb8556
recognize v2.mangapark URLs ( #1578 )
...
* recognize v2.mangapark URLs
* update mangapark root url to use the v2 subdomain
3 years ago
thatfuckingbird
e6811c7450
[pixiv] implement 'max-posts' option ( #1558 )
...
* implement max-rank for pixiv
* rename to max-posts and make more generic
3 years ago
Mike Fährmann
8a909e478d
[imagebam] fix extraction of NSFW images ( #1534 )
3 years ago
Mike Fährmann
b5affc62aa
[twitter] rename 'text-only' to 'text-tweets' ( #570 )
3 years ago
Mike Fährmann
724ca61f36
[twitter] add 'text-only' option ( #570 )
3 years ago
Mike Fährmann
8fd8126117
fix ISO 639-1 code for Japanese
...
"jp" -> "ja"
3 years ago
Mike Fährmann
2c60c7d798
[reactor] skip deleted/empty posts
3 years ago
Mike Fährmann
532ac79fb0
update extractor test results
3 years ago
Mike Fährmann
d7bc4a2b8b
[500px] update query hashes
3 years ago
Mike Fährmann
0f35aca728
[aryion] minor code updates
3 years ago
Mike Fährmann
2eb46452ad
[aryion] update 'needle' to not skip text posts ( fixes #1568 )
...
on "Latest Updates" pages
"class='thumb scrollthumb' href='/g4/view/" and
"class='thumb' href='/g4/view/" both end with
"thumb' href='/g4/view/"
3 years ago
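The reasoning behind the shortened 'needle' in 2eb46452ad can be checked directly: both markup variants quoted in the commit body end with the same suffix. The variable names below are illustrative only.

```python
needle = "thumb' href='/g4/view/"

variants = [
    "class='thumb scrollthumb' href='/g4/view/",
    "class='thumb' href='/g4/view/",
]

# A search for the shorter suffix finds both kinds of thumbnail links
matches = [variant.endswith(needle) for variant in variants]
```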
Mike Fährmann
4fc9668922
[imgur] update URL patterns ( #1561 )
3 years ago
Mike Fährmann
1eabfa5c7a
[pillowfort] implement login with username & password ( #846 )
3 years ago
Mike Fährmann
24dd10ac3c
[patreon] extract user defined 'tags' ( #1539 , closes #1540 )
3 years ago
Mike Fährmann
a7e4917ee1
[pillowfort] add 'inline' option ( #846 )
...
to support images present in a post's 'content',
but not listed in 'media'.
also separates the file hash present at the beginning
of each 'filename' into its own field.
3 years ago
Mike Fährmann
efa6cc8ec3
[pillowfort] add 'external' option ( #846 )
...
for links to external Twitter posts etc.
3 years ago
Mike Fährmann
394fbb5f56
[twitter] strip useless t.co links ( #1532 )
...
The 'full_text' of Tweets with media content usually ends with a t.co
link to itself. This commit removes those.
3 years ago
Mike Fährmann
41457dbb1b
[twitter] resolve t.co URLs in 'content' ( #1532 )
3 years ago
Mike Fährmann
2b5d80862e
[kemonoparty] add 'type' metadata field ( #1556 )
...
'file', 'attachment', or 'inline'
3 years ago
Mike Fährmann
17b0ccb071
[twitter] add missing retweet media entities ( fixes #1555 )
...
from the original tweets
3 years ago
Mike Fährmann
5eeaaee01d
[pixiv] add 'metadata' option ( #1551 )
3 years ago
Mike Fährmann
0717456b4e
[kemonoparty] add 'metadata' option ( closes #1548 )
...
to fetch creator names with an additional HTTP request
3 years ago
Mike Fährmann
36ed1efcfb
[pixiv] rename "noop" value for 'tags' option to "original"
...
(#1507 )
3 years ago
Mike Fährmann
14f983eab6
[deviantart] use default ID when 'client-id' is None
3 years ago
Mike Fährmann
3e4ffb0821
[gelbooru] add extractor for '/redirect.php' URLs ( #1530 )
3 years ago
Mike Fährmann
5e54105ae4
[instagram] update query hashes
3 years ago