Mike Fährmann
459a0af4f8
[sankaku] add support for sankaku.app URLs (closes #1193)
4 years ago
Mike Fährmann
371e9ca6df
[pinterest] implement video support (closes #1189)
4 years ago
Mike Fährmann
537742c0ee
[sankaku] normalize 'created_at' metadata (closes #1190)
4 years ago
Mike Fährmann
ae6748996a
[pornhub] update tests
4 years ago
Mike Fährmann
bf629a2818
[instagram] add 'include' option (closes #1180)
Split the functionality of the old 'user' extractor into separate
'posts' and 'highlights' extractors, which respond to virtual URLs
('/<user>/posts' and '/<user>/highlights')
4 years ago
Mike Fährmann
78061658ea
[booru] reduce exceptions caught during _prepare_post()
don't catch HttpErrors etc.
4 years ago
Mike Fährmann
212ae0c399
[mangapanda] remove module
site now redirects to mangareader.net
4 years ago
Mike Fährmann
337b118e25
[instagram] warn about private profiles (#1187)
4 years ago
Mike Fährmann
465015f75a
[sankaku] reimplement login support (#1176, #1182)
4 years ago
Mike Fährmann
8d2e4e5f13
[booru] improve error handling
e.g. for posts without a valid 'file_url' (#1176)
4 years ago
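The two [booru] commits above narrow the error handling around post preparation: a post without a valid 'file_url' is skipped, while errors such as HttpErrors are no longer swallowed. A minimal sketch of that pattern follows. `prepare_post()` and `iter_valid_posts()` are hypothetical names, and the exact set of caught exceptions (KeyError, TypeError, ValueError) is an assumption, not the extractor's actual list.

```python
import logging

log = logging.getLogger("booru-sketch")


def prepare_post(post):
    """Hypothetical post preparation that needs a usable 'file_url'."""
    url = post["file_url"]            # KeyError if the field is missing
    if not url:
        raise ValueError("empty 'file_url'")
    post["extension"] = url.rpartition(".")[2]
    return post


def iter_valid_posts(posts):
    """Yield prepared posts, logging and skipping malformed ones."""
    for post in posts:
        try:
            prepared = prepare_post(post)
        except (KeyError, TypeError, ValueError) as exc:
            # Only data errors are caught here; anything else, such as an
            # HTTP error raised while fetching, propagates to the caller.
            log.warning("Skipping post %s (%s: %s)",
                        post.get("id"), exc.__class__.__name__, exc)
            continue
        yield prepared
```

Catching a short, explicit tuple keeps malformed API entries from aborting a whole listing without hiding network failures.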
Mike Fährmann
1d753542c2
[hentainexus] fix extraction (fixes #1166)
4 years ago
Mike Fährmann
a00b60fbe7
[twitter] update 'x-csrf-token' header (fixes #1170)
Twitter started using a bigger CSRF token (80 instead of 16 bytes) for
logged-in users, and expects the same token to be used as the
'x-csrf-token' header when it is sent via the 'ct0' cookie.
Generating an 80-byte token ourselves doesn't work; Twitter still
insists on using its own.
4 years ago
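Because a self-generated token is rejected, the client has to echo whatever token is stored in the 'ct0' cookie. A minimal sketch, assuming the cookies are available as a mapping (a plain dict here); `csrf_headers()` is a hypothetical helper, not gallery-dl's code.

```python
def csrf_headers(cookies, headers=None):
    """Return a copy of 'headers' whose 'x-csrf-token' mirrors the 'ct0' cookie."""
    headers = dict(headers or {})
    token = cookies.get("ct0")
    if token:
        # Reuse the server-issued token verbatim instead of generating one
        headers["x-csrf-token"] = token
    return headers
```

Returning a copy leaves any shared default headers untouched, so a token refresh only affects the request being built.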
Mike Fährmann
b88c97b873
[instagram] add 'cursor' option (#1149)
To enable at least 'some' way to continue downloading from the middle
of a user profile listing.
4 years ago
Mike Fährmann
0d406c8daf
[common] restrict values used in 'generate_extractors()'
4 years ago
Mike Fährmann
b2c55f0a72
[sankaku] remove login support
The old login method for 'https://chan.sankakucomplex.com/user/login'
and the cookies it produces have no effect on the results from
'beta.sankakucomplex.com'.
4 years ago
Mike Fährmann
7f3d811d7b
[moebooru] inherit from BooruExtractor
4 years ago
Mike Fährmann
a3a863fc13
[booru] add generalized extractors for *booru sites
similar to cc15fbe7
4 years ago
Mike Fährmann
5f23441e12
[piczel] update API URLs
4 years ago
Mike Fährmann
47114339a2
[webtoons] update 'ageGate' cookie
4 years ago
Mike Fährmann
4225f12783
[nozomi] handle empty 'date' fields (fixes #1163)
4 years ago
Mike Fährmann
2b93515ee0
[instagram] reimplement support for stories (#1149)
4 years ago
Mike Fährmann
ecdea799dd
[sankaku] use 'beta.sankakucomplex.com' API endpoints
4 years ago
Mike Fährmann
b3ecc89a9a
[instagram] use double quotes for strings when possible
4 years ago
Mike Fährmann
76285eb60d
[instagram] reimplement support for story highlights (#1149)
4 years ago
Mike Fährmann
8ca7f54750
rename '_request_…' variables
- remove '_' at the beginning
- _request_last -> request_timestamp
4 years ago
Mike Fährmann
15a122aff3
[instagram] update 'X-IG-WWW-Claim' headers
4 years ago
Mike Fährmann
e5d81bdc7b
[mangadex] handle 'external' chapters (closes #1154)
4 years ago
Mike Fährmann
447488fb18
[instagram] rewrite
(#1113, #1122, #1128, #1130, #1149)
Rely on the results of GraphQL queries instead of requesting data
for each post separately via '/p/<shortcode>/?__a=1'.
This might result in some missing metadata, and there might be some
issues for '/channel/' and '/saved/' URLs, but at least downloading
from the regular post listings should work without issues and without
getting users blocked/banned.
TODO: reimplement support for stories
4 years ago
Mike Fährmann
cc15fbe71a
[moebooru] add generalized extractors for moebooru sites
- add support for sakugabooru.com (closes #1136)
- add support for lolibooru.moe (closes #1050)
This allows users to dynamically add support for moebooru/myimouto-based
sites by adding an entry to their config file
(as for foolslide, foolfuuka, etc.)
For example:
{
    "extractor": {
        "moebooru": {
            "new-site-1": {"root": "https://site1.net"},
            "new-site-2": {"root": "https://www.site2.moe"}
        }
    }
}
4 years ago
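One plausible way to turn such config entries into per-site extractor data is sketched below. It also reflects the idea of the later "restrict values used in 'generate_extractors()'" commit by ignoring entries that are not objects with a "root" key, since the same config section can hold ordinary options. `sites_from_config()` and the pattern it builds are hypothetical, not gallery-dl's implementation.

```python
import json
import re

CONFIG = """
{
    "extractor": {
        "moebooru": {
            "new-site-1": {"root": "https://site1.net"},
            "new-site-2": {"root": "https://www.site2.moe"}
        }
    }
}
"""


def sites_from_config(config):
    """Map each user-defined category to its root URL and a URL pattern."""
    sites = {}
    entries = config.get("extractor", {}).get("moebooru", {})
    for category, options in entries.items():
        # Hypothetical rule: only objects with a 'root' define a new site
        if not isinstance(options, dict) or "root" not in options:
            continue
        root = options["root"].rstrip("/")
        # Strip the scheme and an optional 'www.' to get the bare domain
        domain = re.sub(r"^https?://(?:www\.)?", "", root)
        pattern = r"(?:https?://)?(?:www\.)?" + re.escape(domain)
        sites[category] = {"root": root, "pattern": pattern}
    return sites


sites = sites_from_config(json.loads(CONFIG))
```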
Mike Fährmann
43120407cc
[paheal] create directory for each post (closes #1147)
4 years ago
Mike Fährmann
63e61a0932
[twitter] update image URL format (#1145)
use
'/<name>?format=<fmt>&name=<size>'
instead of the potentially deprecated
'/<name>.<fmt>:<size>'
but keep all of them as fallback URLs
4 years ago
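The URL scheme change can be sketched as a function that returns the new-style URL first and keeps the rest, including the old '/<name>.<fmt>:<size>' form, as fallbacks. The base URL in the usage test and the default size list ("orig", "large") are illustrative assumptions; `image_urls()` is a hypothetical helper.

```python
def image_urls(base, name, fmt, sizes=("orig", "large")):
    """Return (primary URL, list of fallback URLs) for one image."""
    # New style: '/<name>?format=<fmt>&name=<size>'
    urls = [f"{base}/{name}?format={fmt}&name={size}" for size in sizes]
    # Old, potentially deprecated style: '/<name>.<fmt>:<size>'
    urls += [f"{base}/{name}.{fmt}:{size}" for size in sizes]
    return urls[0], urls[1:]
```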
Mike Fährmann
ae6a1d5fbc
[mangoxo] fix extraction 2
4 years ago
Mike Fährmann
f6a684bc37
[hentainexus] update data decoding procedure (#1125)
4 years ago
Mike Fährmann
c57a918f4a
[e621] implement delay via '_request_interval_min'
4 years ago
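A minimum request interval like the one this commit introduces can be sketched as a small throttle. The attribute names borrow from the commit messages ('request_interval_min' minus its underscore, and 'request_timestamp' from the later rename commit), but the `Throttle` class itself is hypothetical. The clock and sleep functions are injectable so the timing can be tested without real waiting.

```python
import time


class Throttle:
    """Keep consecutive requests at least 'interval_min' seconds apart."""

    def __init__(self, interval_min, clock=time.monotonic, sleep=time.sleep):
        self.interval_min = interval_min
        self.request_timestamp = None   # time of the previous request
        self.clock = clock
        self.sleep = sleep

    def wait(self):
        """Block until the next request is allowed, then record its time."""
        now = self.clock()
        if self.request_timestamp is not None:
            remaining = self.request_timestamp + self.interval_min - now
            if remaining > 0:
                self.sleep(remaining)
                now += remaining
        self.request_timestamp = now
```

Calling `wait()` right before each request only sleeps for the part of the interval that has not already elapsed.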
Mike Fährmann
93ce7466e2
[2chan] skip external links
4 years ago
Mike Fährmann
b214e89b5c
[mangoxo] fix extraction
4 years ago
Mike Fährmann
578dcf805c
[mangapanda] don't force https://
4 years ago
Mike Fährmann
102c482f5e
[reddit] skip invalid/failed gallery items (fixes #1127)
4 years ago
Mike Fährmann
174945d2b2
[hentainexus] fix extraction (fixes #1125)
4 years ago
Mike Fährmann
1e3dd7330e
merge SharedConfigMixin functionality into Extractor
4 years ago
Mike Fährmann
ddfb4fd07a
[twitter] use 'https://twitter.com/i/api/' for logged in users
Doesn't seem to make a difference from what I can tell
(i.e. downloaded files are the same), but it's what the website does.
4 years ago
Mike Fährmann
42ccae53c4
[mangadex] switch to API v2
https://mangadex.org/api/v2/
https://mangadex.org/thread/351011
4 years ago
Mike Fährmann
ca44111726
[flickr] update
- ensure every photo has an 'owner' (#828)
- change default directories to a more consistent schema
- create directory for each photo
4 years ago
Mike Fährmann
de0c57886d
[twitter] add 'list-members' extractor (closes #1096)
4 years ago
Mike Fährmann
904ba08568
[gfycat] fix default filename format
4 years ago
Mike Fährmann
a46561bc16
[500px] update query hashes
4 years ago
Mike Fährmann
2e3a0dff21
[8kun] fix file URLs of older posts (fixes #1101)
4 years ago
Mike Fährmann
00825cddf5
[hentaifoundry] use scheme from input URL (fixes #1095)
Let the user choose between http and https,
instead of always forcing https.
4 years ago
Mike Fährmann
8a98d3549a
[weasyl] create directory for each favorite submission
(#1032)
4 years ago
Mike Fährmann
91db8df1c7
[deviantart] add 'index_base36' metadata field (closes #1099)
This is the same ID as found in 'filename' without the 'd' in front,
which is just 'index' encoded in base36.
4 years ago
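Encoding 'index' in base36 is plain repeated division by 36. A minimal sketch follows; the lowercase digit alphabet is an assumption, and `to_base36()` is a hypothetical helper. Decoding needs no helper, since Python's `int(s, 36)` accepts either case.

```python
import string

DIGITS = string.digits + string.ascii_lowercase   # '0'-'9', then 'a'-'z'


def to_base36(number):
    """Encode a non-negative integer in lowercase base36."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return "0"
    out = []
    while number:
        number, rem = divmod(number, 36)
        out.append(DIGITS[rem])      # least significant digit first
    return "".join(reversed(out))
```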
Mike Fährmann
b9bfa4c675
update extractor test results
4 years ago
Mike Fährmann
1b5b789401
[mangoxo] fix metadata extraction
4 years ago
Mike Fährmann
41d4968866
[twitter] add 'list' extractor (#1096)
4 years ago
Mike Fährmann
5d10520f4c
[twitter] update GraphQL endpoint & fix width/height entries
4 years ago
Mike Fährmann
9b2e5f72d6
[exhentai] update image URL parsing (#1094)
4 years ago
Mike Fährmann
98a4d86a01
[sankakucomplex] extract videos and embeds (closes #308)
4 years ago
Mike Fährmann
558cde139c
[paheal] fix extraction (fixes #1088)
4 years ago
Mike Fährmann
0211af7ca8
[hentaifoundry] update 'YII_CSRF_TOKEN' cookie handling
(fixes #1083)
4 years ago
Mike Fährmann
198c33ec36
also collect post processors from 'basecategory' entries
(fixes #1084)
4 years ago
Mike Fährmann
350b1afe1c
speed up _list_classes() after iterating over all modules once
4 years ago
Mike Fährmann
18213dc5ba
release version 1.15.2
4 years ago
Mike Fährmann
b788712844
[fallenangels] fix extraction of '.5' chapters
4 years ago
Mike Fährmann
28d8541cb3
[mangafox] ensure download URLs have a scheme
4 years ago
Mike Fährmann
8e3a324c91
[mangakakalot] ignore "Go Home" buttons in chapter pages
4 years ago
Mike Fährmann
c14c5d82d6
[newgrounds] use generator for fallback URLs
4 years ago
Mike Fährmann
968d3e8465
remove '&' from URL patterns
'/?&#' -> '/?#' and '?&#' -> '?#'
According to https://www.ietf.org/rfc/rfc3986.txt, URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
4 years ago
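With '&' dropped from the delimiter classes, a path segment ends only at '/', '?', or '#'. A minimal sketch with a hypothetical 'example.org' pattern shows the effect: query strings and fragments still terminate the match, while an '&' inside the path is now kept.

```python
import re

# Hypothetical extractor pattern: the first path segment runs until
# '/', '?', or '#' ('[^/?#]+'); '&' is not treated as a delimiter.
USER_PATTERN = re.compile(r"(?:https?://)?(?:www\.)?example\.org/([^/?#]+)")


def user_name(url):
    """Return the first path segment of a matching URL, or None."""
    match = USER_PATTERN.match(url)
    return match.group(1) if match else None
```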
Mike Fährmann
1686dc1757
[twitter] support media from Cards (#1005, #937)
Can be enabled with 'extractor.twitter.cards', but is disabled by
default for now because cards can redirect to rather large videos from
YouTube or Twitch.
4 years ago
Mike Fährmann
ffd38215a4
[hitomi] fix image URLs and URL pattern
- non-webp files are now hosted on [a-c]b.hitomi.la
- removed ampersand from invalid slug characters
4 years ago
Mike Fährmann
286718950c
[mangahere] ensure download URLs have a scheme (fixes #1070)
4 years ago
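Scraped pages often contain protocol-relative URLs such as '//host/path', which cannot be downloaded as-is. A minimal sketch of the fix, assuming "https" as the default scheme (the [hentaifoundry] commit above suggests the scheme could instead be taken from the input URL); `ensure_scheme()` is a hypothetical helper.

```python
def ensure_scheme(url, default="https"):
    """Prefix a scheme to protocol-relative URLs like '//host/path'."""
    if url.startswith("//"):
        return f"{default}:{url}"
    return url
```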
Mike Fährmann
76dfa11a65
[reddit] add 'date' metadata field (closes #1068)
4 years ago
Mike Fährmann
3f2ba629ea
[newgrounds] provide fallback URLs for video downloads (#1042)
4 years ago
Mike Fährmann
a3ca2f6080
update fallback URL handling
remove Message.Urllist and use a '_fallback' field inside a kwdict
4 years ago
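With fallbacks stored in a '_fallback' field of the kwdict, a downloader can try the main URL and then each fallback in turn. Because the field may hold a generator (as in the later [newgrounds] commit above), candidates are produced only when needed. In this sketch `download_with_fallback()`, `video_fallbacks()`, the quality labels, and the `fetch` callable (returning data, or None on failure) are all hypothetical.

```python
import itertools


def download_with_fallback(url, kwdict, fetch):
    """Try 'url', then each URL from kwdict['_fallback'], until one works."""
    # '_fallback' may be any iterable, including a lazy generator
    for candidate in itertools.chain((url,), kwdict.get("_fallback", ())):
        data = fetch(candidate)
        if data is not None:
            return candidate, data
    return None, None


def video_fallbacks(base):
    """Hypothetical generator yielding lower-quality variants on demand."""
    for quality in ("1080p", "720p", "360p"):
        yield f"{base}.{quality}.mp4"
```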
Mike Fährmann
43dab3a228
[mangadex] unescape more metadata fields (fixes #1066)
like 'manga', 'author', 'artist', etc.
4 years ago
Mike Fährmann
5565025221
[xhamster] fix user profile extraction
4 years ago
Mike Fährmann
07432d6262
[seiga] fix flake8 and cookie test (#1063)
4 years ago
Mike Fährmann
b8daabc3ca
[pinterest] implement login support (closes #1055)
being logged in allows access to secret/protected boards
4 years ago
Mike Fährmann
1b1cf01d0d
add a general 'generate_csrf_token()' function
4 years ago
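The commit does not show the function body. A minimal sketch, assuming a random token of 16 bytes (the size mentioned in the Twitter commit above) encoded as lowercase hex; the `nbytes` parameter is a hypothetical addition.

```python
import secrets


def generate_csrf_token(nbytes=16):
    """Return 'nbytes' cryptographically random bytes as a hex string."""
    return secrets.token_hex(nbytes)   # 2 hex characters per byte
```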
Mike Fährmann
7a0ba370d1
[gelbooru] rewrite mp4 video URLs (fixes #1048)
4 years ago
Mike Fährmann
6491db3eaf
[blogger] handle URLs with specified width/height (closes #1061)
get highest quality for images with
/wXXX-hXXX/ instead of the usual /sXXX/
4 years ago
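Blogger image URLs carry a size segment right before the filename, either '/sXXX/' or '/wXXX-hXXX/'. A minimal sketch that rewrites either form, assuming '/s0/' requests the original size; `original_size()` is a hypothetical helper, and suffixed variants (e.g. extra crop flags) are not handled.

```python
import re

# Size segment immediately before the last path component (the filename)
SIZE_SEGMENT = re.compile(r"/(?:s\d+|w\d+-h\d+)(?=/[^/]+$)")


def original_size(url):
    """Replace a '/sXXX/' or '/wXXX-hXXX/' segment with '/s0/'."""
    return SIZE_SEGMENT.sub("/s0", url)
```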
Mike Fährmann
783e0af26d
[hentaifoundry] update and simplify
4 years ago
Mike Fährmann
5b844a72b7
[newgrounds] handle embeds without scheme (#1033)
4 years ago
kurumigi
7e0e872f4f
[seiga] Add metadata for single image downloads (#1063)
* [seiga] Support image metadata.
* [seiga] Update test data.
* [seiga] Fix cookie check.
* [test_cookies] [seiga] Fit test_cookies.py to the last commit.
4 years ago
Zanny
3ec60e894a
[weasyl] api-key authentication (#1057)
* [weasyl] support api keys
* [weasyl] document api-key authentication
* [weasyl] usernames can contain ~
4 years ago
Mike Fährmann
844793847c
update extractor test results
4 years ago
Mike Fährmann
ddd6840509
[behance] fix 'collection' extraction
4 years ago
Mike Fährmann
c5e3971b18
[newgrounds] extract image embeds (closes #1033)
4 years ago
dawidsowa
43b156fb40
[reactor] match URLs without subdomain (#1053)
4 years ago
Mike Fährmann
3ebb174f2c
add missing extractor info when spawning new ones (fixes #1051)
Not having this information causes the blacklist/whitelist logic to
trigger and prevents things from functioning as intended when using
default settings.
Fixes issues for 8muses, deviantart, exhentai, and mangoxo.
4 years ago
Mike Fährmann
f9c1684af7
[newgrounds] restore original video URLs (#1042)
4 years ago
Mike Fährmann
73373c06ec
[weibo] handle posts with more than 9 images (closes #926)
Responses from '/api/container/getIndex' don't list more than
9 images per 'status' object, but the embedded JSON from a
'/detail/<ID>' page does.
4 years ago
Mike Fährmann
dd1e545597
[hentaifoundry] rename GalleryExtractor to PicturesExtractor
4 years ago
Mike Fährmann
c874071f5a
[kissmanga] remove module
4 years ago
Mike Fährmann
93e04bf9a9
[500px] update query hashes
4 years ago
Mike Fährmann
844502cad5
update extractor test results
4 years ago
Mike Fährmann
fad7748b6b
[xvideos] fix 'title' extraction
4 years ago
Mike Fährmann
5b927c15df
[newgrounds] fix video extraction (closes #1042)
4 years ago
Mike Fährmann
bdc6c8f074
improve message for 'oauth:deviantart' etc. (closes #989)
4 years ago
Mike Fährmann
430b6d6e2e
[twitter] extend 'retweets' option (closes #1026)
Setting 'retweets' to '"original"' will use metadata from the
original retweeted Tweets, and not from the Retweet entry.
4 years ago
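The three-way 'retweets' setting can be sketched as a choice of which Tweet object supplies the metadata. The field name 'retweeted_status' (where the original Tweet is assumed to be nested) and the `tweet_metadata()` helper are assumptions for illustration, not the extractor's actual code.

```python
def tweet_metadata(tweet, retweets=True):
    """Return the Tweet object whose metadata should be used, or None to skip."""
    original = tweet.get("retweeted_status")   # assumed field name
    if original is None:
        return tweet          # not a Retweet: always use it as-is
    if not retweets:
        return None           # Retweets are skipped entirely
    if retweets == "original":
        return original       # metadata of the retweeted Tweet
    return tweet              # metadata of the Retweet entry itself
```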
Mike Fährmann
b9bdd2c564
[hentaifoundry] add support for stories (closes #734)
4 years ago
Mike Fährmann
9a9d1924d8
[hentaicafe] add 'manga_id' metadata field (closes #1036)
This field is only available when using a non-foolslide URL
like '/hc.fyi/9874' or '/hazuki-yuuto-summer-blues/'
4 years ago