Mike Fährmann
139ff3f6ab
[kemonoparty] add 'posts' extractor ( #5194 )
7 months ago
Mike Fährmann
814ad9321e
[deviantart] skip locked/blurred posts ( #4567 , #5193 )
7 months ago
Mike Fährmann
f7f8ef8684
[twitter] support communities ( #4913 )
7 months ago
Mike Fährmann
cae77e85f8
[twitter] update query hashes
... as well as 'variables' and 'features' values
also remove unused legacy API code
7 months ago
Mike Fährmann
06cb518d97
[bunkr] fix extraction ( #5088 , #5151 , #5153 )
- remove legacy code
- map legacy domains to bunkr.sk
- use input URL domain for newer domains
- update tests (some files got slightly modified or deleted)
7 months ago
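The domain handling described in this bunkr commit (map legacy domains to bunkr.sk, keep the input URL's domain for newer ones) could look roughly like the following minimal sketch. The commit does not list the legacy domains, so `LEGACY_DOMAINS` contains hypothetical placeholder hosts, and `root_for` is a hypothetical helper, not gallery-dl's actual code.

```python
from urllib.parse import urlsplit

# Hypothetical placeholders: the commit does not say which domains are "legacy".
LEGACY_DOMAINS = {"legacy-a.example", "legacy-b.example"}
CANONICAL_ROOT = "https://bunkr.sk"


def root_for(url):
    """Return the site root to use for requests derived from `url`."""
    host = urlsplit(url).netloc.lower()
    if host in LEGACY_DOMAINS:
        # Old domains are mapped to the canonical one.
        return CANONICAL_ROOT
    # Any other (newer) domain is kept exactly as given in the input URL.
    return "https://" + host


print(root_for("https://legacy-a.example/a/abc123"))  # https://bunkr.sk
print(root_for("https://newer.example/a/abc123"))     # https://newer.example
```

Keeping the input domain for unknown hosts means a newly added mirror works without first being added to a mapping table.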
Mike Fährmann
dcc6e3f65c
merge #5134 : [bunkr] add new bunkr domains ( #5130 )
7 months ago
Mike Fährmann
4641937ca3
[imagetwist] add 'gallery' extractor ( #5190 )
7 months ago
Mike Fährmann
fde82ab0ce
[imagechest] add 'user' extractor ( #5143 )
7 months ago
Mike Fährmann
4474cea31b
merge #5187 : [skeb] add 'num' and 'count' metadata fields
7 months ago
Mike Fährmann
4cfceb23cb
[skeb] rename 'data' -> 'file' & add tests
7 months ago
Mike Fährmann
44a1a66dac
merge #5186 : Fix filename formatting silently failing under certain circumstances
7 months ago
Mike Fährmann
c83d0a1596
[weibo] add 'gifs' option ( #5183 )
7 months ago
blankie
f9a8e8cacf
[skeb] add 'num' and 'count' metadata fields
7 months ago
blankie
909830f8ea
fix filename formatting silently failing under certain circumstances
7 months ago
Mike Fährmann
af61d2b037
[wikimedia] combine most wikimedia.org sites ( #1443 )
add wikidata.org and wikivoyage.org
7 months ago
Mike Fährmann
c7d17f1111
[bluesky] extract 'hashtags', 'mentions', and 'uris' metadata ( #4438 )
7 months ago
Mike Fährmann
55bbd49a0e
[bluesky] download images in original resolution ( #4438 )
at least up to 2000 px
7 months ago
Mike Fährmann
6414dc6bca
[idolcomplex] fix pagination for tags containing ':' ( #5171 )
7 months ago
Mike Fährmann
5c2a2321a2
[bluesky] update refresh token after using it ( #4438 )
7 months ago
Mike Fährmann
9c10be54fb
[bluesky] add 'following' extractor ( #4438 )
7 months ago
Mike Fährmann
86ce35d6a1
[bluesky] simplify 'pattern'
7 months ago
Mike Fährmann
da292ded4e
[bluesky] add 'list' extractor ( #4438 )
7 months ago
Mike Fährmann
004bf7bb38
[bluesky] add 'feed' extractor ( #4438 )
7 months ago
Mike Fährmann
6aea818d4e
[bluesky] allow using DIDs as user handles ( #4438 )
7 months ago
Mike Fährmann
aee5580c62
[idolcomplex] extract 'id_alnum' metadata ( #5171 )
7 months ago
Mike Fährmann
cf7d6be2d4
[bluesky] initial support ( #4438 , #4708 , #4722 , #5047 )
8 months ago
Mike Fährmann
6ef143ea31
[idolcomplex] support alphanumeric post IDs ( #5171 )
8 months ago
Mike Fährmann
6e928300bc
[flickr] handle non-JSON errors ( #5131 )
8 months ago
Mike Fährmann
90ac6d7375
[wikimedia] use '/api.php' as default API path
8 months ago
Mike Fährmann
d7823b9f81
[pinterest] fix section URLs for boards with /?# in name ( #5104 )
8 months ago
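Board names containing "/", "?" or "#" break naive URL building, because those characters are read as a path separator, query start, or fragment start. One common way to handle this, not necessarily what this pinterest commit does, is to percent-encode each path segment. A minimal sketch, where the URL layout and the `section_url` helper are hypothetical:

```python
from urllib.parse import quote


def section_url(user, board, section):
    """Build a (hypothetical) section URL with every path segment escaped."""
    # safe="" makes quote() encode "/" as well, in addition to "?" and "#".
    return "https://www.pinterest.com/{}/{}/{}/".format(
        quote(user, safe=""), quote(board, safe=""), quote(section, safe=""))


print(section_url("someuser", "a/b?c#d", "sec"))
# → https://www.pinterest.com/someuser/a%2Fb%3Fc%23d/sec/
```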
Mike Fährmann
de752eb7b1
[naverwebtoon] support '/webtoon/' paths for all comics ( #5123 )
8 months ago
Mike Fährmann
0dacb2b24c
[downloader:http] remove 'pyopenssl' import ( #5156 )
8 months ago
Jeff Mercado
d9d0601ab1
break up line to fit 80 char
8 months ago
Jeff Mercado
6bcd3c9380
[bunkr] add new bunkr domains ( #5130 )
8 months ago
Mike Fährmann
62d6f5f8d2
[luscious] fix IndexError for files without thumbnail ( #5122 )
8 months ago
Mike Fährmann
22647c2626
[naverwebtoon] fix 'title' for comics with empty tags ( #5120 )
8 months ago
Mike Fährmann
3433481dd2
[gofile] update 'website_token' extraction
8 months ago
Mike Fährmann
1f7101d606
[archivedmoe] fix thebarchive webm URLs ( #5116 )
8 months ago
Mike Fährmann
34a4ddc399
[sankaku] add 'id-format' option ( #5073 )
8 months ago
Mike Fährmann
afd20ef42c
[kemonoparty] implement filtering duplicate revisions ( #5013 )
set 'revisions' to '"unique"' to have it ignore duplicate revisions
8 months ago
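The effect of setting 'revisions' to "unique" can be pictured as keeping only the first revision seen for each 'revision_hash'. A minimal sketch, assuming each revision is a dict with a 'revision_hash' key; `unique_revisions` is a hypothetical helper, not the extractor's real code.

```python
def unique_revisions(revisions):
    """Yield revisions, skipping any whose 'revision_hash' was already seen.

    Hypothetical helper: assumes every revision dict has a 'revision_hash' key.
    """
    seen = set()
    for rev in revisions:
        h = rev["revision_hash"]
        if h in seen:
            continue  # same content as an earlier revision, ignore it
        seen.add(h)
        yield rev


revs = [
    {"revision_id": 3, "revision_hash": "aaa"},
    {"revision_id": 2, "revision_hash": "aaa"},
    {"revision_id": 1, "revision_hash": "bbb"},
]
print([r["revision_id"] for r in unique_revisions(revs)])  # [3, 1]
```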
Mike Fährmann
c28475d325
[kemonoparty] fix deleting 'name' in original objects ( #5103 )
... when computing 'revision_hash'
regression caused by 3d68eda4
dict.copy() only creates a shallow copy
I know that and still managed to get it wrong ...
8 months ago
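The shallow-copy pitfall named in this commit message can be reproduced in a few lines. The post structure below is made-up sample data, not kemono's actual API response; the point is that `dict.copy()` duplicates only the outer dict, so deleting a key from a nested dict also changes the original, while `copy.deepcopy()` does not.

```python
import copy

# Hypothetical post data with a nested 'file' dict.
original = {"title": "example", "file": {"name": "a.png", "path": "/a.png"}}

# dict.copy() is shallow: 'file' in the copy is the same object as in the original.
shallow = original.copy()
del shallow["file"]["name"]
print("name" in original["file"])  # False: the original lost its 'name' too

# copy.deepcopy() duplicates nested objects, so the source stays intact.
original2 = {"title": "example", "file": {"name": "a.png", "path": "/a.png"}}
deep = copy.deepcopy(original2)
del deep["file"]["name"]
print("name" in original2["file"])  # True
```

Copying only the nested dict that gets modified (for example `dict(post["file"])`) avoids the cost of a full deep copy when just one level is touched.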
Mike Fährmann
beacfa7436
[bunkr] update domain to 'bunkr.sk' ( #5114 )
8 months ago
Mike Fährmann
0502256251
release version 1.26.7
8 months ago
Mike Fährmann
67c99b1366
[patreon] prevent HttpError for stream.mux.com URLs
8 months ago
Mike Fährmann
f3ad91b44f
[bunkr] update domain ( #5088 )
8 months ago
Mike Fährmann
c7a42880ab
[wikimedia] support fandom wikis ( #1443 , #2677 , #3378 )
Wikis hosted on fandom.com are just wikimedia instances
and support its API.
8 months ago
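Since these wikis expose the standard MediaWiki API, a request URL can be built the same way for any of them. A minimal sketch, assuming the API lives at '/api.php' (the default path mentioned in another commit in this log; Wikipedia itself uses '/w/api.php'). The host `example.fandom.com` and the `api_url` helper are hypothetical; `action=query`, `list=allimages` and `aiprop` are standard MediaWiki API parameters.

```python
from urllib.parse import urlencode


def api_url(root, params, path="/api.php"):
    """Build a MediaWiki API request URL for the wiki at `root`."""
    query = {"format": "json"}
    query.update(params)
    return root.rstrip("/") + path + "?" + urlencode(query)


url = api_url("https://example.fandom.com", {
    "action": "query",
    "list": "allimages",
    "aiprop": "url|sha1",
})
print(url)
# → https://example.fandom.com/api.php?format=json&action=query&list=allimages&aiprop=url%7Csha1
```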
Mike Fährmann
5bf156f0b1
merge #5094 : [webtoons] fix extracting comic and episode name with commas
8 months ago
blankie
df718887c2
[webtoons] fix extracting comic and episode name with commas
8 months ago
Wiiplay123
6eb62f2140
Combine lh*(-**).googleusercontent.com URL regex into one line.
Co-authored-by: Mike Fährmann <mike_faehrmann@web.de>
8 months ago
Wiiplay123
a6fed628dd
[blogger] Fix lh*.googleusercontent.com forward slash bug, add support for lh*-**.googleusercontent.com
Some URLs use "lh(number)-(locale).googleusercontent.com" format, so I added support for those.
Also, "lh(number).googleusercontent.com" formats were broken because the regex was looking for a second forward slash.
Examples:
lh7.googleusercontent.com
lh7-us.googleusercontent.com
8 months ago
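Both host forms from this commit ("lh7" and "lh7-us") can be matched by making the "-locale" part optional. The pattern below is a minimal illustrative sketch that checks host names only; it is not the regex used in gallery-dl's blogger extractor.

```python
import re

# Hypothetical pattern: "lh" + digits, an optional "-<locale>" suffix, then the domain.
LH_HOST = re.compile(r"^lh\d+(?:-[a-z]+)?\.googleusercontent\.com$")

for host in ("lh7.googleusercontent.com",
             "lh7-us.googleusercontent.com",
             "lh7-.googleusercontent.com"):
    print(host, bool(LH_HOST.match(host)))
# lh7.googleusercontent.com True
# lh7-us.googleusercontent.com True
# lh7-.googleusercontent.com False
```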
Mike Fährmann
6f8592eaff
[hbrowse] remove from modules list
8 months ago
Mike Fährmann
acc94ac187
[realbooru] fix extraction
revert ac97aca99c
8 months ago
Mike Fährmann
9599151118
[issuu] fix extraction
8 months ago
Mike Fährmann
9ca6117c67
[hbrowse] remove module
website gone
8 months ago
Mike Fährmann
375eefb886
[chevereto] remove 'pixl.li'
"Pixl is closing down"
"All images will be deleted January 1st."
8 months ago
Mike Fährmann
321861af7e
[erome] fix 'count' metadata
8 months ago
Mike Fährmann
b41d9bf616
[paheal] fix 'source' metadata
8 months ago
Mike Fährmann
b0a441f1e3
[nitter] remove 'nitter.lacontrevoie.fr'
"Fermeture de Nitter / Closing down Nitter"
8 months ago
Mike Fährmann
a1c1e80f67
[giantessbooru] update domain
8 months ago
Mike Fährmann
2007cb2f59
[tests] check extractor category values
8 months ago
Mike Fährmann
fc4e737f67
[wikimedia] include 'sha1' in default filenames
8 months ago
Mike Fährmann
44f2c15a04
[wikimedia] handle 'File:' paths
8 months ago
Mike Fährmann
93b4120e77
[gelbooru] support 'all' and empty tag ( #5076 )
8 months ago
Mike Fährmann
a416d4c3d5
[sankaku] support post URLs with alphanumeric IDs ( #5073 )
8 months ago
Mike Fährmann
ea553a1d55
[wikimedia] generalize ( #1443 )
- support mediawiki.org
- support mariowiki.com (#3660 )
- combine code into a single extractor
(use prefix as subcategory)
- handle non-wiki instances
- unescape titles
8 months ago
Mike Fährmann
89066844f4
add 'config_instance' method
to allow for a more streamlined access to BaseExtractor instance options
8 months ago
Mike Fährmann
c3c1635ef3
[wikimedia] update
- rewrite using BaseExtractor
- support most Wiki* domains
- update docs/supportedsites
- add tests
8 months ago
Ailothaen
221f54309c
[wikimedia] Improved archive identifiers
8 months ago
Ailothaen
e33056adcd
[wikimedia] Add Wikipedia/Wikimedia extractor
8 months ago
Mike Fährmann
3d68eda4ab
[kemonoparty] add 'revision_hash' metadata ( #4706 , #4727 , #5013 )
A SHA1 hexdigest of other relevant metadata fields like
title, content, file and attachment URLs.
This value does NOT reflect which revisions are listed on the website.
Neither does 'edited' or any other metadata field (combinations).
8 months ago
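A hash like 'revision_hash' can be sketched as a SHA1 hexdigest over a serialized list of content fields. This is a minimal illustration under assumptions: the field names ('title', 'content', 'file', 'attachments', 'path') and the JSON serialization are hypothetical, not taken from gallery-dl's source. It also shows why the value does not track 'edited': fields left out of the hash cannot change it.

```python
import hashlib
import json


def revision_hash(post):
    """Hypothetical sketch: SHA1 hexdigest over selected post fields."""
    fields = [
        post.get("title", ""),
        post.get("content", ""),
        (post.get("file") or {}).get("path", ""),
        [a.get("path", "") for a in post.get("attachments", ())],
    ]
    # Serializing as JSON keeps field boundaries unambiguous.
    data = json.dumps(fields, ensure_ascii=False).encode("utf-8")
    return hashlib.sha1(data).hexdigest()


a = {"title": "t", "content": "c", "file": {"path": "/f.png"},
     "attachments": [], "edited": "2024-01-01"}
b = dict(a, edited="2024-01-02")  # only 'edited' differs
c = dict(a, content="changed")    # content differs

print(revision_hash(a) == revision_hash(b))  # True: 'edited' is not hashed
print(revision_hash(a) == revision_hash(c))  # False
```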
Mike Fährmann
799a8206ad
merge #5061 : [webtoons] extract more metadata
- author_name
- comic_name
- episode_name
- username
8 months ago
Mike Fährmann
8ffa0cd3c8
[webtoons] small optimization
don't extract the entire 'author_area' and
avoid creating a second 'text.extract_from()' object
8 months ago
Mike Fährmann
59cf4b3884
merge #4444 : [2ch] add 'thread' and 'board' extractors ( #1009 , #3540 )
8 months ago
Mike Fährmann
90b382304a
[deviantart] fix KeyError: 'premium_folder_data' ( #5063 )
8 months ago
Mike Fährmann
4cedf378d5
[deviantart] fix AttributeError for URLs without username ( #5065 )
caused by 4f367145
8 months ago
Mike Fährmann
68196589c4
[2ch] update
- simplify extractor code
- more metadata
- add tests
8 months ago
hunter-gatherer8
6c4abc982e
[2ch] add 'thread' and 'board' extractors
- [2ch] add thread extractor
- [2ch] add board extractor
- [2ch] add new entry to supported sites
8 months ago
blankie
bb446b1598
[webtoons] extract more metadata
8 months ago
Mike Fährmann
355b909f46
merge #5041 : [steamgriddb] add support ( #5033 )
8 months ago
Mike Fährmann
71e2c3e5a2
merge #5037 : [hatenablog] add support ( #5036 )
8 months ago
blankie
9f53daabb8
[hatenablog] implement additional suggestion
8 months ago
blankie
293f1559df
[hatenablog] implement suggestions
8 months ago
blankie
65f42442f5
[steamgriddb] implement another suggestion
8 months ago
blankie
8995fd5f01
[steamgriddb] implement suggestions
8 months ago
Mike Fährmann
b1c175fdd1
allow using an empty string as argument for -D/--directory
8 months ago
Mike Fährmann
2dcfb012ea
[patreon] download 'm3u8' manifests with ytdl
8 months ago
Mike Fährmann
1c68b7df01
[patreon] fix KeyError ( #5048 )
8 months ago
Mike Fährmann
2191e29e14
[nijie] fix image URL for single image posts ( #5049 )
8 months ago
Mike Fährmann
bbf96753e2
[gelbooru] only log "Incomplete API response" for favorites ( #5045 )
8 months ago
Mike Fährmann
39904c9e4e
[deviantart:avatar] add 'formats' option ( #4995 )
8 months ago
Mike Fährmann
5c43098a1a
[twitter] revert to using 'media' timeline by default ( #4953 )
This reverts commit a94f944148.
8 months ago
Mike Fährmann
5f9a98cf0f
[deviantart:avatar] fix exception when 'comments' are enabled ( #4995 )
8 months ago
Mike Fährmann
887ade30a5
[batoto] support more mirror domains ( #5042 )
8 months ago
Mike Fährmann
0a382a5092
[batoto] improve 'manga_id' extraction ( #5042 )
8 months ago
blankie
100966b122
[steamgriddb] fix linting error
9 months ago
blankie
2ccb7d3bd3
[steamgriddb] add support
9 months ago
Mike Fährmann
ec958a26bc
[fuskator] make metadata extraction non-fatal ( #5039 )
- prevent KeyErrors
- prevent HTTP redirect
- return file URLs as list
9 months ago
blankie
2cfe788f93
[hatenablog] fix extractor naming errors
9 months ago
blankie
be6949c55d
[hatenablog] fix linting error
9 months ago
blankie
61f3b2f820
[hatenablog] add support
9 months ago