Merge remote-tracking branch 'upstream/master'

commit 8ad6387252 (johnnylau)

@@ -1,5 +1,72 @@
 # Changelog
+## 1.25.0 - 2023-03-11
+### Changes
+- [e621] split `e621` extractors from `danbooru` module ([#3425](https://github.com/mikf/gallery-dl/issues/3425))
+- [deviantart] remove mature scraps warning ([#3691](https://github.com/mikf/gallery-dl/issues/3691))
+- [deviantart] use `/collections/all` endpoint for favorites ([#3666](https://github.com/mikf/gallery-dl/issues/3666), [#3668](https://github.com/mikf/gallery-dl/issues/3668))
+- [newgrounds] update default image and audio archive IDs to prevent ID overlap ([#3681](https://github.com/mikf/gallery-dl/issues/3681))
+- rename `--ignore-config` to `--config-ignore`
+### Extractors
+- [catbox] add `file` extractor ([#3570](https://github.com/mikf/gallery-dl/issues/3570))
+- [deviantart] add `search` extractor ([#538](https://github.com/mikf/gallery-dl/issues/538), [#1264](https://github.com/mikf/gallery-dl/issues/1264), [#2954](https://github.com/mikf/gallery-dl/issues/2954), [#2970](https://github.com/mikf/gallery-dl/issues/2970), [#3577](https://github.com/mikf/gallery-dl/issues/3577))
+- [deviantart] add `gallery-search` extractor ([#1695](https://github.com/mikf/gallery-dl/issues/1695))
+- [deviantart] support `fxdeviantart.com` URLs ([#3740](https://github.com/mikf/gallery-dl/issues/3740))
+- [e621] implement `notes` and `pools` metadata extraction ([#3425](https://github.com/mikf/gallery-dl/issues/3425))
+- [gelbooru] add `favorite` extractor ([#3704](https://github.com/mikf/gallery-dl/issues/3704))
+- [imagetwist] support `phun.imagetwist.com` and `imagehaha.com` domains ([#3622](https://github.com/mikf/gallery-dl/issues/3622))
+- [instagram] add `user` metadata field ([#3107](https://github.com/mikf/gallery-dl/issues/3107))
+- [manganelo] update and fix metadata extraction
+- [manganelo] support mobile-only chapters
+- [mangasee] extract `author` and `genre` metadata ([#3703](https://github.com/mikf/gallery-dl/issues/3703))
+- [misskey] add `misskey` extractors ([#3717](https://github.com/mikf/gallery-dl/issues/3717))
+- [pornpics] add `gallery` and `search` extractors ([#263](https://github.com/mikf/gallery-dl/issues/263), [#3544](https://github.com/mikf/gallery-dl/issues/3544), [#3654](https://github.com/mikf/gallery-dl/issues/3654))
+- [redgifs] support v3 URLs ([#3588](https://github.com/mikf/gallery-dl/issues/3588), [#3589](https://github.com/mikf/gallery-dl/issues/3589))
+- [redgifs] add `collection` extractors ([#3427](https://github.com/mikf/gallery-dl/issues/3427), [#3662](https://github.com/mikf/gallery-dl/issues/3662))
+- [shopify] support ohpolly.com ([#440](https://github.com/mikf/gallery-dl/issues/440), [#3596](https://github.com/mikf/gallery-dl/issues/3596))
+- [szurubooru] add `tag` and `post` extractors ([#3583](https://github.com/mikf/gallery-dl/issues/3583), [#3713](https://github.com/mikf/gallery-dl/issues/3713))
+- [twitter] add `transform` option
+### Options
+- [postprocessor:metadata] add `sort` and `separators` options
+- [postprocessor:exec] implement archive options ([#3584](https://github.com/mikf/gallery-dl/issues/3584))
+- add `--config-create` command-line option ([#2333](https://github.com/mikf/gallery-dl/issues/2333))
+- add `--config-toml` command-line option to load config files in TOML format
+- add `output.stdout`, `output.stdin`, and `output.stderr` options ([#1621](https://github.com/mikf/gallery-dl/issues/1621), [#2152](https://github.com/mikf/gallery-dl/issues/2152), [#2529](https://github.com/mikf/gallery-dl/issues/2529))
+- add `hash_md5` and `hash_sha1` functions ([#3679](https://github.com/mikf/gallery-dl/issues/3679))
+- implement `globals` option to enable defining custom functions for `eval` statements
+- implement `archive-pragma` option to use SQLite PRAGMA statements
+- implement `actions` to trigger events on logging messages ([#3338](https://github.com/mikf/gallery-dl/issues/3338), [#3630](https://github.com/mikf/gallery-dl/issues/3630))
+- implement ability to load external extractor classes
+  - `-X/--extractors` command-line options
+  - `extractor.modules-sources` config option
+### Fixes
+- [bunkr] fix extraction ([#3636](https://github.com/mikf/gallery-dl/issues/3636), [#3655](https://github.com/mikf/gallery-dl/issues/3655))
+- [danbooru] send gallery-dl User-Agent ([#3665](https://github.com/mikf/gallery-dl/issues/3665))
+- [deviantart] fix crash when handling deleted deviations in status updates ([#3656](https://github.com/mikf/gallery-dl/issues/3656))
+- [fanbox] fix crash with missing images ([#3673](https://github.com/mikf/gallery-dl/issues/3673))
+- [imagefap] update `gallery` URLs ([#3595](https://github.com/mikf/gallery-dl/issues/3595))
+- [imagefap] fix infinite pagination loop ([#3594](https://github.com/mikf/gallery-dl/issues/3594))
+- [imagefap] fix metadata extraction
+- [oauth] use default name for browsers without `name` attribute
+- [pinterest] unescape search terms ([#3621](https://github.com/mikf/gallery-dl/issues/3621))
+- [pixiv] fix `--write-tags` for `"tags": "original"` ([#3675](https://github.com/mikf/gallery-dl/issues/3675))
+- [poipiku] warn about incorrect passwords ([#3646](https://github.com/mikf/gallery-dl/issues/3646))
+- [reddit] update `videos` option ([#3712](https://github.com/mikf/gallery-dl/issues/3712))
+- [soundgasm] rewrite ([#3578](https://github.com/mikf/gallery-dl/issues/3578))
+- [telegraph] fix extraction when images are not in `<figure>` elements ([#3590](https://github.com/mikf/gallery-dl/issues/3590))
+- [tumblr] raise more detailed errors for dashboard-only blogs ([#3628](https://github.com/mikf/gallery-dl/issues/3628))
+- [twitter] fix some `original` retweets not downloading ([#3744](https://github.com/mikf/gallery-dl/issues/3744))
+- [ytdl] fix `--parse-metadata` ([#3663](https://github.com/mikf/gallery-dl/issues/3663))
+- [downloader:ytdl] prevent exception on empty results
+### Improvements
+- [downloader:http] use `time.monotonic()`
+- [downloader:http] update `_http_retry` to accept a Python function ([#3569](https://github.com/mikf/gallery-dl/issues/3569))
+- [postprocessor:metadata] speed up JSON encoding
+- replace `json.loads/dumps` with direct calls to `JSONDecoder.decode/JSONEncoder.encode`
+- improve `option.Formatter` performance
+### Removals
+- [nitter] remove `nitter.pussthecat.org`
 ## 1.24.5 - 2023-01-28
 ### Additions
 - [booru] add `url` option
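As context for the `output.stdout`/`output.stdin`/`output.stderr` entry above: judging from the `configure_standard_streams()` change further down, each option accepts either an encoding name as a plain string or an object of `TextIOWrapper`-style settings. A minimal, hedged config sketch (the concrete values are illustrative only):

```json
{
    "output": {
        "stdout": "utf-8",
        "stderr": {
            "encoding": "utf-8",
            "errors": "replace"
        }
    }
}
```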

@@ -69,9 +69,9 @@ Standalone Executable
 Prebuilt executable files with a Python interpreter and
 required Python packages included are available for
-- `Windows <https://github.com/mikf/gallery-dl/releases/download/v1.24.5/gallery-dl.exe>`__
+- `Windows <https://github.com/mikf/gallery-dl/releases/download/v1.25.0/gallery-dl.exe>`__
   (Requires `Microsoft Visual C++ Redistributable Package (x86) <https://aka.ms/vs/17/release/vc_redist.x86.exe>`__)
-- `Linux <https://github.com/mikf/gallery-dl/releases/download/v1.24.5/gallery-dl.bin>`__
+- `Linux <https://github.com/mikf/gallery-dl/releases/download/v1.25.0/gallery-dl.bin>`__
 Nightly Builds
@@ -285,7 +285,8 @@ This can be done via the
 option in your configuration file by specifying
 - | the path to a Mozilla/Netscape format cookies.txt file exported by a browser addon
-  | (e.g. `Export Cookies <https://addons.mozilla.org/en-US/firefox/addon/export-cookies-txt/>`__ for Firefox)
+  | (e.g. `Get cookies.txt LOCALLY <https://chrome.google.com/webstore/detail/get-cookiestxt-locally/cclelndahbckbenkjhflpdbgdldlbecc>`__ for Chrome,
+    `Export Cookies <https://addons.mozilla.org/en-US/firefox/addon/export-cookies-txt/>`__ for Firefox)
 - | a list of name-value pairs gathered from your browser's web developer tools
   | (in `Chrome <https://developers.google.com/web/tools/chrome-devtools/storage/cookies>`__,

@@ -4411,6 +4411,16 @@ Description
     i.e. fields whose name starts with an underscore.

+metadata.skip
+-------------
+Type
+    ``bool``
+Default
+    ``false``
+Description
+    Do not overwrite already existing files.
+
 metadata.archive
 ----------------
 Type
@@ -4740,15 +4750,13 @@ Type
 Example
     * ``"~/.local/share/gdl-globals.py"``
     * ``"gdl-globals"``
-Default
-    The ``GLOBALS`` dict in
-    `util.py <../gallery_dl/util.py>`__
 Description
-    Path to or name of an
-    `importable <https://docs.python.org/3/reference/import.html>`__
-    Python module whose namespace gets used as an alternative
-    |globals parameter|__
-    for compiled Python expressions.
+    | Path to or name of an
+      `importable <https://docs.python.org/3/reference/import.html>`__
+      Python module,
+    | whose namespace,
+      in addition to the ``GLOBALS`` dict in `util.py <../gallery_dl/util.py>`__,
+      gets used as |globals parameter|__ for compiled Python expressions.

 .. |globals parameter| replace:: ``globals`` parameter
 .. __: https://docs.python.org/3/library/functions.html#eval
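A sketch of what such a module could look like; the file name matches the `Example` above, while the function name and the `--filter` usage are invented for illustration. Anything defined here is merged on top of the built-in `GLOBALS` dict and becomes usable inside compiled Python expressions:

```python
# ~/.local/share/gdl-globals.py

def big(width):
    # True when the image is at least 1000 pixels wide
    return bool(width) and width >= 1000
```

which could then be referenced as, e.g., `gallery-dl --filter "big(width)" URL`, assuming the extractor provides a `width` metadata field.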

@@ -309,7 +309,7 @@ Consider all sites to be NSFW unless otherwise known.
 </tr>
 <tr>
 <td>Hiperdex</td>
-<td>https://1sthiperdex.com/</td>
+<td>https://hiperdex.com/</td>
 <td>Artists, Chapters, Manga</td>
 <td></td>
 </tr>
@@ -1179,12 +1179,6 @@ Consider all sites to be NSFW unless otherwise known.
 <td>Media Files, Replies, Search Results, Tweets</td>
 <td></td>
 </tr>
-<tr>
-<td>Nitter.pussthecat.org</td>
-<td>https://nitter.pussthecat.org/</td>
-<td>Media Files, Replies, Search Results, Tweets</td>
-<td></td>
-</tr>
 <tr>
 <td>Nitter.1d4.us</td>
 <td>https://nitter.1d4.us/</td>

@@ -120,7 +120,7 @@ def main():
         # eval globals
         path = config.get((), "globals")
         if path:
-            util.GLOBALS = util.import_file(path).__dict__
+            util.GLOBALS.update(util.import_file(path).__dict__)

         # loglevels
         output.configure_logging(args.loglevel)

@@ -0,0 +1,112 @@
+# -*- coding: utf-8 -*-
+
+# Copyright 2023 Mike Fährmann
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as
+# published by the Free Software Foundation.
+
+""" """
+
+import re
+import sys
+import logging
+import operator
+from . import util, exception
+
+
+def parse(actionspec):
+    if isinstance(actionspec, dict):
+        actionspec = actionspec.items()
+
+    actions = {}
+    actions[logging.DEBUG] = actions_d = []
+    actions[logging.INFO] = actions_i = []
+    actions[logging.WARNING] = actions_w = []
+    actions[logging.ERROR] = actions_e = []
+
+    for event, spec in actionspec:
+        level, _, pattern = event.partition(":")
+        type, _, args = spec.partition(" ")
+        action = (re.compile(pattern).search, ACTIONS[type](args))
+
+        level = level.strip()
+        if not level or level == "*":
+            actions_d.append(action)
+            actions_i.append(action)
+            actions_w.append(action)
+            actions_e.append(action)
+        else:
+            actions[_level_to_int(level)].append(action)
+
+    return actions
+
+
+def _level_to_int(level):
+    try:
+        return logging._nameToLevel[level]
+    except KeyError:
+        return int(level)
+
+
+def action_print(opts):
+    def _print(_):
+        print(opts)
+    return _print
+
+
+def action_status(opts):
+    op, value = re.match(r"\s*([&|^=])=?\s*(\d+)", opts).groups()
+
+    op = {
+        "&": operator.and_,
+        "|": operator.or_,
+        "^": operator.xor,
+        "=": lambda x, y: y,
+    }[op]
+
+    value = int(value)
+
+    def _status(args):
+        args["job"].status = op(args["job"].status, value)
+    return _status
+
+
+def action_level(opts):
+    level = _level_to_int(opts.lstrip(" ~="))
+
+    def _level(args):
+        args["level"] = level
+    return _level
+
+
+def action_wait(opts):
+    def _wait(args):
+        input("Press Enter to continue")
+    return _wait
+
+
+def action_restart(opts):
+    return util.raises(exception.RestartExtraction)
+
+
+def action_exit(opts):
+    try:
+        opts = int(opts)
+    except ValueError:
+        pass
+
+    def _exit(args):
+        sys.exit(opts)
+    return _exit
+
+
+ACTIONS = {
+    "print"  : action_print,
+    "status" : action_status,
+    "level"  : action_level,
+    "restart": action_restart,
+    "wait"   : action_wait,
+    "exit"   : action_exit,
+}
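Putting `parse()` together with the action factories above: every key of the `actions` option is a `LEVEL:PATTERN` pair (an empty level or `*` matches all levels) and every value is an action name optionally followed by its argument. A hedged configuration sketch — the patterns are made up, and the option is read per extractor via `extr.config("actions")`, as the `job.py` change below shows:

```json
{
    "extractor": {
        "actions": {
            "warning:Too Many Requests": "wait",
            "error:404": "level DEBUG",
            "*:file corrupted": "status |= 2"
        }
    }
}
```

`"level DEBUG"` goes through `_level_to_int()`, so its argument must be a logging level name or number, and `"status |= 2"` is parsed by the `([&|^=])=?\s*(\d+)` regex in `action_status()`.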

@@ -100,13 +100,6 @@ class HttpDownloader(DownloaderBase):
         adjust_extension = kwdict.get(
             "_http_adjust_extension", self.adjust_extension)

-        codes = kwdict.get("_http_retry_codes")
-        if codes:
-            retry_codes = list(self.retry_codes)
-            retry_codes += codes
-        else:
-            retry_codes = self.retry_codes

         if self.part and not metadata:
             pathfmt.part_enable(self.partdir)
@@ -167,7 +160,10 @@
                 break
             else:
                 msg = "'{} {}' for '{}'".format(code, response.reason, url)
-                if code in retry_codes or 500 <= code < 600:
+                if code in self.retry_codes or 500 <= code < 600:
+                    continue
+                retry = kwdict.get("_http_retry")
+                if retry and retry(response):
                     continue
                 self.log.warning(msg)
                 return False
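The removed `_http_retry_codes` list is superseded by a single `_http_retry` callable in a file's kwdict, which receives the `requests` response object; the nitter change later in this commit uses exactly this hook. A minimal sketch of what an extractor can attach now (`url` stands in for a real file URL):

```python
def _retry_on_404(response):
    # ask the HTTP downloader for another attempt on 404 answers
    return response.status_code == 404

file = {"url": url, "_http_retry": _retry_on_404}
```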

@@ -791,15 +791,21 @@ HTTP_HEADERS = {
         ("TE", "trailers"),
     ),
     "chrome": (
+        ("Connection", "keep-alive"),
         ("Upgrade-Insecure-Requests", "1"),
         ("User-Agent", "Mozilla/5.0 ({}) AppleWebKit/537.36 (KHTML, "
-                       "like Gecko) Chrome/92.0.4515.131 Safari/537.36"),
+                       "like Gecko) Chrome/111.0.0.0 Safari/537.36"),
         ("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,"
-                   "image/webp,image/apng,*/*;q=0.8"),
+                   "image/avif,image/webp,image/apng,*/*;q=0.8,"
+                   "application/signed-exchange;v=b3;q=0.7"),
         ("Referer", None),
+        ("Sec-Fetch-Site", "same-origin"),
+        ("Sec-Fetch-Mode", "no-cors"),
+        ("Sec-Fetch-Dest", "empty"),
         ("Accept-Encoding", None),
         ("Accept-Language", "en-US,en;q=0.9"),
-        ("Cookie", None),
+        ("cookie", None),
+        ("content-length", None),
     ),
 }
@@ -838,8 +844,7 @@ SSL_CIPHERS = {
         "AES128-GCM-SHA256:"
         "AES256-GCM-SHA384:"
         "AES128-SHA:"
-        "AES256-SHA:"
-        "DES-CBC3-SHA"
+        "AES256-SHA"
     ),
 }
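These header tuples are applied when gallery-dl emulates a browser; selecting a set is done with the `browser` option, whose value matches the `"chrome"` key above. A small sketch of the config placement, as I understand the project's documentation:

```json
{
    "extractor": {
        "browser": "chrome"
    }
}
```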

@@ -21,8 +21,8 @@ import re
 BASE_PATTERN = (
     r"(?:https?://)?(?:"
-    r"(?:www\.)?deviantart\.com/(?!watch/)([\w-]+)|"
-    r"(?!www\.)([\w-]+)\.deviantart\.com)"
+    r"(?:www\.)?(?:fx)?deviantart\.com/(?!watch/)([\w-]+)|"
+    r"(?!www\.)([\w-]+)\.(?:fx)?deviantart\.com)"
 )
@@ -997,7 +997,7 @@ class DeviantartDeviationExtractor(DeviantartExtractor):
     subcategory = "deviation"
     archive_fmt = "g_{_username}_{index}.{extension}"
     pattern = (BASE_PATTERN + r"/(art|journal)/(?:[^/?#]+-)?(\d+)"
-               r"|(?:https?://)?(?:www\.)?deviantart\.com/"
+               r"|(?:https?://)?(?:www\.)?(?:fx)?deviantart\.com/"
                r"(?:view/|deviation/|view(?:-full)?\.php/*\?(?:[^#]+&)?id=)"
                r"(\d+)"  # bare deviation ID without slug
                r"|(?:https?://)?fav\.me/d([0-9a-z]+)")  # base36
@@ -1091,6 +1091,9 @@
         # old /view/ URLs from the Wayback Machine
         ("https://www.deviantart.com/view.php?id=14864502"),
         ("http://www.deviantart.com/view-full.php?id=100842"),
+        ("https://www.fxdeviantart.com/zzz/art/zzz-1234567890"),
+        ("https://www.fxdeviantart.com/view/1234567890"),
     )
     skip = Extractor.skip
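A quick, self-contained sanity check of the widened pattern (only the `BASE_PATTERN` fragment above is exercised; the sample URLs mirror the new test cases):

```python
import re

BASE_PATTERN = (
    r"(?:https?://)?(?:"
    r"(?:www\.)?(?:fx)?deviantart\.com/(?!watch/)([\w-]+)|"
    r"(?!www\.)([\w-]+)\.(?:fx)?deviantart\.com)"
)

for url in ("https://www.deviantart.com/zzz",
            "https://www.fxdeviantart.com/zzz",
            "https://zzz.fxdeviantart.com"):
    assert re.match(BASE_PATTERN, url), url
```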

@@ -44,6 +44,11 @@ class DirectlinkExtractor(Extractor):
         ("https://post-phinf.pstatic.net/MjAxOTA1MjlfMTQ4/MDAxNTU5MTI2NjcyNTkw"
          ".JUzkGb4V6dj9DXjLclrOoqR64uDxHFUO5KDriRdKpGwg.88mCtd4iT1NHlpVKSCaUpP"
          "mZPiDgT8hmQdQ5K_gYyu0g.JPEG/2.JPG"),
+        # internationalized domain name
+        ("https://räksmörgås.josefsson.org/raksmorgas.jpg", {
+            "url": "a65667f670b194afbd1e3ea5e7a78938d36747da",
+            "keyword": "fd5037fe86eebd4764e176cbaf318caec0f700be",
+        }),
     )

     def __init__(self, match):

@@ -22,17 +22,19 @@ class GelbooruBase():
     basecategory = "booru"
     root = "https://gelbooru.com"

-    def _api_request(self, params):
+    def _api_request(self, params, key="post"):
+        if "s" not in params:
+            params["s"] = "post"
         params["api_key"] = self.api_key
         params["user_id"] = self.user_id

-        url = self.root + "/index.php?page=dapi&s=post&q=index&json=1"
+        url = self.root + "/index.php?page=dapi&q=index&json=1"
         data = self.request(url, params=params).json()

-        if "post" not in data:
+        if key not in data:
             return ()

-        posts = data["post"]
+        posts = data[key]
         if not isinstance(posts, list):
             return (posts,)
         return posts
@@ -158,8 +160,36 @@ class GelbooruPoolExtractor(GelbooruBase,
 class GelbooruFavoriteExtractor(GelbooruBase,
                                 gelbooru_v02.GelbooruV02FavoriteExtractor):
+    """Extractor for gelbooru favorites"""
+    per_page = 100
     pattern = BASE_PATTERN + r"page=favorites&s=view&id=(\d+)"
-    test = ("https://gelbooru.com/index.php?page=favorites&s=view&id=12345",)
+    test = ("https://gelbooru.com/index.php?page=favorites&s=view&id=279415", {
+        "count": 3,
+    })
+
+    def posts(self):
+        # get number of favorites
+        params = {
+            "s"    : "favorite",
+            "id"   : self.favorite_id,
+            "limit": "1"
+        }
+        count = self._api_request(params, "@attributes")[0]["count"]
+
+        # paginate over them in reverse
+        params["pid"] = count // self.per_page
+        params["limit"] = self.per_page
+
+        while True:
+            favs = self._api_request(params, "favorite")
+            favs.reverse()
+            for fav in favs:
+                yield from self._api_request({"id": fav["favorite"]})
+            params["pid"] -= 1
+            if params["pid"] < 0:
+                return

 class GelbooruPostExtractor(GelbooruBase,
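The arithmetic behind the reverse pagination in `posts()`, as a worked example with the `per_page` value of 100 above and a hypothetical 279 favorites: the first request uses the last, partially filled page, and the walk then proceeds down to page 0, with each page's entries reversed before yielding:

```python
count, per_page = 279, 100  # illustrative favorite count

pids = []
pid = count // per_page      # 2 -> start at the last page
while pid >= 0:
    pids.append(pid)
    pid -= 1

print(pids)                  # [2, 1, 0]
```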

@@ -32,6 +32,28 @@ class GenericExtractor(Extractor):
         (?:\#(?P<fragment>.*))?     # optional fragment
     """

+    test = (
+        ("generic:https://www.nongnu.org/lzip/", {
+            "count": 1,
+            "content": "40be5c77773d3e91db6e1c5df720ee30afb62368",
+            "keyword": {
+                "description": "Lossless data compressor",
+                "imageurl": "https://www.nongnu.org/lzip/lzip.png",
+                "keywords": "lzip, clzip, plzip, lzlib, LZMA, bzip2, "
+                            "gzip, data compression, GNU, free software",
+                "pageurl": "https://www.nongnu.org/lzip/",
+            },
+        }),
+        # internationalized domain name
+        ("generic:https://räksmörgås.josefsson.org/", {
+            "count": 2,
+            "pattern": "^https://räksmörgås.josefsson.org/",
+        }),
+        ("generic:https://en.wikipedia.org/Main_Page"),
+        ("generic:https://example.org/path/to/file?que=1?&ry=2/#fragment"),
+        ("generic:https://example.org/%27%3C%23/%23%3E%27.htm?key=%3C%26%3E"),
+    )
+
     def __init__(self, match):
         """Init."""
         Extractor.__init__(self, match)
@@ -56,7 +78,7 @@ class GenericExtractor(Extractor):
         self.root = self.scheme + match.group('domain')

     def items(self):
-        """Get page, extract metadata & images, yield them in suitable messages.
+        """Get page, extract metadata & images, yield them in suitable messages

         Adapted from common.GalleryExtractor.items()

@@ -6,7 +6,7 @@
 # it under the terms of the GNU General Public License version 2 as
 # published by the Free Software Foundation.

-"""Extractors for https://1sthiperdex.com/"""
+"""Extractors for https://hiperdex.com/"""

 from .common import ChapterExtractor, MangaExtractor
 from .. import text
@@ -20,7 +20,7 @@ BASE_PATTERN = (r"((?:https?://)?(?:www\.)?"
 class HiperdexBase():
     """Base class for hiperdex extractors"""
     category = "hiperdex"
-    root = "https://1sthiperdex.com"
+    root = "https://hiperdex.com"

     @memcache(keyarg=1)
     def manga_data(self, manga, page=None):
@@ -31,7 +31,7 @@ class HiperdexBase():
         return {
             "manga"  : text.unescape(extr(
-                "<title>", "<").rpartition("&")[0].strip()),
+                "<title>", "<").rpartition(" - ")[0].strip()),
             "score"  : text.parse_float(extr(
                 'id="averagerate">', '<')),
             "author" : text.remove_html(extr(
@@ -65,10 +65,10 @@ class HiperdexBase():
 class HiperdexChapterExtractor(HiperdexBase, ChapterExtractor):
-    """Extractor for manga chapters from 1sthiperdex.com"""
+    """Extractor for manga chapters from hiperdex.com"""
     pattern = BASE_PATTERN + r"(/manga/([^/?#]+)/([^/?#]+))"
     test = (
-        ("https://1sthiperdex.com/manga/domestic-na-kanojo/154-5/", {
+        ("https://hiperdex.com/manga/domestic-na-kanojo/154-5/", {
             "pattern": r"https://(1st)?hiperdex\d?.(com|net|info)"
                        r"/wp-content/uploads/WP-manga/data"
                        r"/manga_\w+/[0-9a-f]{32}/\d+\.webp",
@@ -86,7 +86,7 @@ class HiperdexChapterExtractor(HiperdexBase, ChapterExtractor):
                 "type" : "Manga",
             },
         }),
-        ("https://hiperdex.com/manga/domestic-na-kanojo/154-5/"),
+        ("https://1sthiperdex.com/manga/domestic-na-kanojo/154-5/"),
         ("https://hiperdex2.com/manga/domestic-na-kanojo/154-5/"),
         ("https://hiperdex.net/manga/domestic-na-kanojo/154-5/"),
         ("https://hiperdex.info/manga/domestic-na-kanojo/154-5/"),
@@ -109,11 +109,11 @@ class HiperdexChapterExtractor(HiperdexBase, ChapterExtractor):
 class HiperdexMangaExtractor(HiperdexBase, MangaExtractor):
-    """Extractor for manga from 1sthiperdex.com"""
+    """Extractor for manga from hiperdex.com"""
     chapterclass = HiperdexChapterExtractor
     pattern = BASE_PATTERN + r"(/manga/([^/?#]+))/?$"
     test = (
-        ("https://1sthiperdex.com/manga/youre-not-that-special/", {
+        ("https://hiperdex.com/manga/youre-not-that-special/", {
             "count": 51,
             "pattern": HiperdexChapterExtractor.pattern,
             "keyword": {
@@ -130,7 +130,7 @@ class HiperdexMangaExtractor(HiperdexBase, MangaExtractor):
                 "type" : "Manhwa",
             },
         }),
-        ("https://hiperdex.com/manga/youre-not-that-special/"),
+        ("https://1sthiperdex.com/manga/youre-not-that-special/"),
         ("https://hiperdex2.com/manga/youre-not-that-special/"),
         ("https://hiperdex.net/manga/youre-not-that-special/"),
         ("https://hiperdex.info/manga/youre-not-that-special/"),
@@ -145,19 +145,9 @@ class HiperdexMangaExtractor(HiperdexBase, MangaExtractor):
         self.manga_data(self.manga, page)
         results = []

-        shortlink = text.extr(page, "rel='shortlink' href='", "'")
-        data = {
-            "action"   : "manga_get_reading_nav",
-            "manga"    : shortlink.rpartition("=")[2],
-            "chapter"  : "",
-            "volume_id": "",
-            "style"    : "list",
-            "type"     : "manga",
-        }
-        url = self.root + "/wp-admin/admin-ajax.php"
-        page = self.request(url, method="POST", data=data).text
-        for url in text.extract_iter(page, 'data-redirect="', '"'):
+        for html in text.extract_iter(
+                page, '<li class="wp-manga-chapter', '</li>'):
+            url = text.extr(html, 'href="', '"')
             chapter = url.rpartition("/")[2]
             results.append((url, self.chapter_data(chapter)))

@@ -59,10 +59,7 @@ class NitterExtractor(BaseExtractor):
                 if url[0] == "/":
                     url = self.root + url

-                file = {
-                    "url": url,
-                    "_http_retry_codes": (404,),
-                }
+                file = {"url": url, "_http_retry": _retry_on_404}
                 file["filename"], _, file["extension"] = \
                     name.rpartition(".")
                 append(file)
@@ -220,10 +217,6 @@ BASE_PATTERN = NitterExtractor.update({
         "root": "https://nitter.lacontrevoie.fr",
         "pattern": r"nitter\.lacontrevoie\.fr",
     },
-    "nitter.pussthecat.org": {
-        "root": "https://nitter.pussthecat.org",
-        "pattern": r"nitter\.pussthecat\.org",
-    },
     "nitter.1d4.us": {
         "root": "https://nitter.1d4.us",
         "pattern": r"nitter\.1d4\.us",
@@ -283,13 +276,12 @@ class NitterTweetsExtractor(NitterExtractor):
                 },
             },
         }),
-        ("https://nitter.pussthecat.org/i/user/2976459548", {
-            "url": "c740a2683db2c8ed2f350afc0494475c4444025b",
-            "pattern": r"https://nitter.pussthecat\.org/pic/orig"
+        ("https://nitter.lacontrevoie.fr/supernaturepics", {
+            "url": "54f4b55f2099dcc248f3fb7bfacf1349e08d8e2d",
+            "pattern": r"https://nitter\.lacontrevoie\.fr/pic/orig"
                        r"/media%2FCGMNYZvW0AIVoom\.jpg",
             "range": "1",
         }),
-        ("https://nitter.lacontrevoie.fr/supernaturepics"),
         ("https://nitter.1d4.us/supernaturepics"),
         ("https://nitter.kavin.rocks/id:2976459548"),
         ("https://nitter.unixfox.eu/supernaturepics"),
@@ -309,7 +301,6 @@ class NitterRepliesExtractor(NitterExtractor):
             "range": "1-20",
         }),
         ("https://nitter.lacontrevoie.fr/supernaturepics/with_replies"),
-        ("https://nitter.pussthecat.org/supernaturepics/with_replies"),
         ("https://nitter.1d4.us/supernaturepics/with_replies"),
         ("https://nitter.kavin.rocks/id:2976459548/with_replies"),
         ("https://nitter.unixfox.eu/i/user/2976459548/with_replies"),
@@ -334,7 +325,6 @@ class NitterMediaExtractor(NitterExtractor):
             "range": "1-20",
         }),
         ("https://nitter.lacontrevoie.fr/supernaturepics/media"),
-        ("https://nitter.pussthecat.org/supernaturepics/media"),
         ("https://nitter.1d4.us/supernaturepics/media"),
         ("https://nitter.unixfox.eu/i/user/2976459548/media"),
     )
@@ -353,7 +343,6 @@ class NitterSearchExtractor(NitterExtractor):
             "range": "1-20",
         }),
         ("https://nitter.lacontrevoie.fr/supernaturepics/search"),
-        ("https://nitter.pussthecat.org/supernaturepics/search"),
         ("https://nitter.1d4.us/supernaturepics/search"),
         ("https://nitter.kavin.rocks/id:2976459548/search"),
         ("https://nitter.unixfox.eu/i/user/2976459548/search"),
@@ -375,7 +364,7 @@ class NitterTweetExtractor(NitterExtractor):
             "url": "3f2b64e175bf284aa672c3bb53ed275e470b919a",
             "content": "ab05e1d8d21f8d43496df284d31e8b362cd3bcab",
             "keyword": {
-                "comments": 16,
+                "comments": 19,
                 "content": "Big Wedeene River, Canada",
                 "count": 1,
                 "date": "dt:2015-05-29 17:40:00",
@@ -399,9 +388,9 @@ class NitterTweetExtractor(NitterExtractor):
             "url": "9c51b3a4a1114535eb9b168bba97ad95db0d59ff",
         }),
         # video
-        ("https://nitter.pussthecat.org/i/status/1065692031626829824", {
-            "pattern": r"ytdl:https://nitter.pussthecat.org/video"
-                       r"/B875137EDC8FF/https%3A%2F%2Fvideo.twimg.com%2F"
+        ("https://nitter.lacontrevoie.fr/i/status/1065692031626829824", {
+            "pattern": r"ytdl:https://nitter\.lacontrevoie\.fr/video"
+                       r"/[0-9A-F]{10,}/https%3A%2F%2Fvideo.twimg.com%2F"
                        r"ext_tw_video%2F1065691868439007232%2Fpu%2Fpl%2F"
                        r"nv8hUQC1R0SjhzcZ.m3u8%3Ftag%3D5",
             "keyword": {
@@ -446,7 +435,7 @@ class NitterTweetExtractor(NitterExtractor):
             "count": 0,
         }),
         # "Misleading" content
-        ("https://nitter.pussthecat.org/i/status/1486373748911575046", {
+        ("https://nitter.lacontrevoie.fr/i/status/1486373748911575046", {
             "count": 4,
         }),
         # age-restricted (#2354)
@@ -468,3 +457,7 @@ class NitterTweetExtractor(NitterExtractor):
                 quoted["user"] = tweet["user"]
                 return (tweet, quoted)
         return (tweet,)
+
+
+def _retry_on_404(response):
+    return response.status_code == 404

@@ -248,11 +248,15 @@ class TwitterExtractor(Extractor):
         author = tweet["user"]
         author = self._transform_user(author)

+        if "note_tweet" in tweet:
+            note = tweet["note_tweet"]["note_tweet_results"]["result"]
+        else:
+            note = None
+
         if "legacy" in tweet:
             tweet = tweet["legacy"]

         tget = tweet.get
-        entities = tweet["entities"]
         tdata = {
             "tweet_id"      : text.parse_int(tweet["id_str"]),
             "retweet_id"    : text.parse_int(
@@ -272,6 +276,8 @@ class TwitterExtractor(Extractor):
             "retweet_count" : tget("retweet_count"),
         }

+        entities = note["entity_set"] if note else tweet["entities"]
+
         hashtags = entities.get("hashtags")
         if hashtags:
             tdata["hashtags"] = [t["text"] for t in hashtags]
@@ -284,7 +290,8 @@ class TwitterExtractor(Extractor):
                 "nick": u["name"],
             } for u in mentions]

-        content = text.unescape(tget("full_text") or tget("text") or "")
+        content = text.unescape(
+            note["text"] if note else tget("full_text") or tget("text") or "")
         urls = entities.get("urls")
         if urls:
             for url in urls:
@@ -803,6 +810,23 @@ class TwitterTweetExtractor(TwitterExtractor):
                        r"\?format=(jpg|png)&name=orig$",
             "range": "1-2",
         }),
+        # note tweet with long 'content'
+        ("https://twitter.com/i/web/status/1629193457112686592", {
+            "keyword": {
+                "content": """\
+BREAKING - DEADLY LIES: Independent researchers at Texas A&M University have \
+just contradicted federal government regulators, saying that toxic air \
+pollutants in East Palestine, Ohio, could pose long-term risks. \n\nThe \
+Washington Post writes, "Three weeks after the toxic train derailment in \
+Ohio, an analysis of Environmental Protection Agency data has found nine air \
+pollutants at levels that could raise long-term health concerns in and around \
+East Palestine, according to an independent analysis. \n\n\"The analysis by \
+Texas A&M University seems to contradict statements by state and federal \
+regulators that air near the crash site is completely safe, despite residents \
+complaining about rashes, breathing problems and other health effects." \
+Your reaction.""",
+            },
+        }),
     )

     def __init__(self, match):
def __init__(self, match): def __init__(self, match):
@@ -951,6 +975,10 @@ class TwitterAPI():
         self.extractor = extractor
         self.root = "https://api.twitter.com"

+        self._nsfw_warning = True
+        self._syndication = self.extractor.syndication
+        self._json_dumps = json.JSONEncoder(separators=(",", ":")).encode
+
         cookies = extractor.session.cookies
         cookiedomain = extractor.cookiedomain
@@ -965,7 +993,11 @@
         auth_token = cookies.get("auth_token", domain=cookiedomain)

+        if not auth_token:
+            self.user_media = self.user_media_legacy
+
         self.headers = {
+            "Accept": "*/*",
             "authorization": "Bearer AAAAAAAAAAAAAAAAAAAAANRILgAAAAAAnNwIzUejR"
                              "COuH5E6I8xnZz4puTs%3D1Zv7ttfk8LF81IUq16cHjhLTvJu"
                              "4FA33AGWWjCpTnA",
@@ -1019,73 +1051,132 @@
             "collab_control,vibe",
         }
         self.variables = {
-            "includePromotedContent": False,
-            "withSuperFollowsUserFields": True,
-            "withBirdwatchPivots": False,
             "withDownvotePerspective": False,
             "withReactionsMetadata": False,
             "withReactionsPerspective": False,
-            "withSuperFollowsTweetFields": True,
-            "withClientEventToken": False,
-            "withBirdwatchNotes": False,
-            "withVoice": True,
-            "withV2Timeline": False,
-            "__fs_interactive_text": False,
-            "__fs_dont_mention_me_view_api_enabled": False,
         }
-
-        self._nsfw_warning = True
-        self._syndication = self.extractor.syndication
-        self._json_dumps = json.JSONEncoder(separators=(",", ":")).encode
+        self.features = {
+            "responsive_web_twitter_blue_verified_badge_is_enabled": True,
+            "responsive_web_graphql_exclude_directive_enabled": True,
+            "verified_phone_label_enabled": False,
+            "responsive_web_graphql_skip_user_profile_"
+            "image_extensions_enabled": False,
+            "responsive_web_graphql_timeline_navigation_enabled": True,
+        }
+        self.features_pagination = {
+            "responsive_web_twitter_blue_verified_badge_is_enabled": True,
+            "responsive_web_graphql_exclude_directive_enabled": True,
+            "verified_phone_label_enabled": False,
+            "responsive_web_graphql_timeline_navigation_enabled": True,
+            "responsive_web_graphql_skip_user_profile_"
+            "image_extensions_enabled": False,
+            "tweetypie_unmention_optimization_enabled": True,
+            "vibe_api_enabled": True,
+            "responsive_web_edit_tweet_api_enabled": True,
+            "graphql_is_translatable_rweb_tweet_is_translatable_enabled": True,
+            "view_counts_everywhere_api_enabled": True,
+            "longform_notetweets_consumption_enabled": True,
+            "tweet_awards_web_tipping_enabled": False,
+            "freedom_of_speech_not_reach_fetch_enabled": False,
+            "standardized_nudges_misinfo": True,
+            "tweet_with_visibility_results_prefer_gql_"
+            "limited_actions_policy_enabled": False,
+            "interactive_text_enabled": True,
+            "responsive_web_text_conversations_enabled": False,
+            "longform_notetweets_richtext_consumption_enabled": False,
+            "responsive_web_enhance_cards_enabled": False,
+        }

     def tweet_detail(self, tweet_id):
-        endpoint = "/graphql/ItejhtHVxU7ksltgMmyaLA/TweetDetail"
+        endpoint = "/graphql/zXaXQgfyR4GxE21uwYQSyA/TweetDetail"
         variables = {
             "focalTweetId": tweet_id,
-            "referrer": "profile",
             "with_rux_injections": False,
+            "includePromotedContent": True,
             "withCommunity": True,
             "withQuickPromoteEligibilityTweetFields": True,
             "withBirdwatchNotes": False,
-            "withSuperFollowsUserFields": True,
-            "withSuperFollowsTweetFields": True,
+            "withVoice": True,
+            "withV2Timeline": True,
         }
         return self._pagination_tweets(
-            endpoint, variables, ("threaded_conversation_with_injections",))
+            endpoint, variables, ("threaded_conversation_with_injections_v2",))

     def user_tweets(self, screen_name):
-        endpoint = "/graphql/WZT7sCTrLvSOaWOXLDsWbQ/UserTweets"
+        endpoint = "/graphql/9rys0A7w1EyqVd2ME0QCJg/UserTweets"
         variables = {
             "userId": self._user_id_by_screen_name(screen_name),
             "count": 100,
+            "includePromotedContent": True,
             "withQuickPromoteEligibilityTweetFields": True,
+            "withVoice": True,
+            "withV2Timeline": True,
         }
         return self._pagination_tweets(endpoint, variables)

     def user_tweets_and_replies(self, screen_name):
-        endpoint = "/graphql/t4wEKVulW4Mbv1P0kgxTEw/UserTweetsAndReplies"
+        endpoint = "/graphql/ehMCHF3Mkgjsfz_aImqOsg/UserTweetsAndReplies"
         variables = {
             "userId": self._user_id_by_screen_name(screen_name),
             "count": 100,
+            "includePromotedContent": True,
             "withCommunity": True,
+            "withVoice": True,
+            "withV2Timeline": True,
         }
         return self._pagination_tweets(endpoint, variables)

     def user_media(self, screen_name):
-        endpoint = "/graphql/nRybED9kRbN-TOWioHq1ng/UserMedia"
+        endpoint = "/graphql/MA_EP2a21zpzNWKRkaPBMg/UserMedia"
         variables = {
             "userId": self._user_id_by_screen_name(screen_name),
             "count": 100,
+            "includePromotedContent": False,
+            "withClientEventToken": False,
+            "withBirdwatchNotes": False,
+            "withVoice": True,
+            "withV2Timeline": True,
         }
         return self._pagination_tweets(endpoint, variables)

+    def user_media_legacy(self, screen_name):
+        endpoint = "/graphql/nRybED9kRbN-TOWioHq1ng/UserMedia"
+        variables = {
+            "userId": self._user_id_by_screen_name(screen_name),
+            "count": 100,
+            "includePromotedContent": False,
+            "withSuperFollowsUserFields": True,
+            "withBirdwatchPivots": False,
+            "withSuperFollowsTweetFields": True,
+            "withClientEventToken": False,
+            "withBirdwatchNotes": False,
+            "withVoice": True,
+            "withV2Timeline": False,
+            "__fs_interactive_text": False,
+            "__fs_dont_mention_me_view_api_enabled": False,
+        }
+        return self._pagination_tweets(
+            endpoint, variables, ("user", "result", "timeline", "timeline"),
+            features=False)
+
     def user_likes(self, screen_name):
-        endpoint = "/graphql/9MSTt44HoGjVFSg_u3rHDw/Likes"
+        endpoint = "/graphql/XbHBYpgURwtklXj8NNxTDw/Likes"
         variables = {
             "userId": self._user_id_by_screen_name(screen_name),
             "count": 100,
+            "includePromotedContent": False,
+            "withClientEventToken": False,
+            "withBirdwatchNotes": False,
+            "withVoice": True,
+            "withV2Timeline": True,
         }
         return self._pagination_tweets(endpoint, variables)

     def user_bookmarks(self):
-        endpoint = "/graphql/uKP9v_I31k0_VSBmlpq2Xg/Bookmarks"
+        endpoint = "/graphql/Xq0wQSWHlcfnXARLJGqTxg/Bookmarks"
         variables = {
             "count": 100,
         }
@@ -1093,7 +1184,7 @@ class TwitterAPI():
             endpoint, variables, ("bookmark_timeline", "timeline"), False)

     def list_latest_tweets_timeline(self, list_id):
-        endpoint = "/graphql/z3l-EHlx-fyg8OvGO4JN8A/ListLatestTweetsTimeline"
+        endpoint = "/graphql/FDI9EiIp54KxEOWGiv3B4A/ListLatestTweetsTimeline"
         variables = {
             "listId": list_id,
             "count": 100,
@@ -1128,18 +1219,21 @@
                 ["twitter_objects"]["live_events"][event_id])

     def list_by_rest_id(self, list_id):
-        endpoint = "/graphql/BWEhzAk7k8TwbU4lKH2dpw/ListByRestId"
-        params = {"variables": self._json_dumps({
-            "listId": list_id,
-            "withSuperFollowsUserFields": True,
-        })}
+        endpoint = "/graphql/KlGpwq5CAt9tCfHkV2mwYQ/ListByRestId"
+        params = {
+            "variables": self._json_dumps({
+                "listId": list_id,
+                "withSuperFollowsUserFields": True,
+            }),
+            "features": self._json_dumps(self.features),
+        }
         try:
             return self._call(endpoint, params)["data"]["list"]
         except KeyError:
             raise exception.NotFoundError("list")

     def list_members(self, list_id):
-        endpoint = "/graphql/snESM0DPs3c7M1SBm4rvVw/ListMembers"
+        endpoint = "/graphql/XsAJX17RLgLYU8GALIWg2g/ListMembers"
         variables = {
             "listId": list_id,
             "count": 100,
@@ -1149,29 +1243,34 @@
             endpoint, variables, ("list", "members_timeline", "timeline"))

     def user_following(self, screen_name):
-        endpoint = "/graphql/mIwX8GogcobVlRwlgpHNYA/Following"
+        endpoint = "/graphql/vTZwBbd_gz6aI8v6Wze21A/Following"
         variables = {
             "userId": self._user_id_by_screen_name(screen_name),
             "count": 100,
+            "includePromotedContent": False,
         }
         return self._pagination_users(endpoint, variables)

     def user_by_rest_id(self, rest_id):
-        endpoint = "/graphql/I5nvpI91ljifos1Y3Lltyg/UserByRestId"
-        params = {"variables": self._json_dumps({
-            "userId": rest_id,
-            "withSafetyModeUserFields": True,
-            "withSuperFollowsUserFields": True,
-        })}
+        endpoint = "/graphql/QPSxc9lxrmrwnBzYkJI8eA/UserByRestId"
+        params = {
+            "variables": self._json_dumps({
+                "userId": rest_id,
+                "withSafetyModeUserFields": True,
+            }),
+            "features": self._json_dumps(self.features),
+        }
         return self._call(endpoint, params)["data"]["user"]["result"]

     def user_by_screen_name(self, screen_name):
-        endpoint = "/graphql/7mjxD3-C6BxitPMVQ6w0-Q/UserByScreenName"
-        params = {"variables": self._json_dumps({
-            "screen_name": screen_name,
-            "withSafetyModeUserFields": True,
-            "withSuperFollowsUserFields": True,
-        })}
+        endpoint = "/graphql/nZjSkpOpSL5rWyIVdsKeLA/UserByScreenName"
+        params = {
+            "variables": self._json_dumps({
+                "screen_name": screen_name,
+                "withSafetyModeUserFields": True,
+            }),
+            "features": self._json_dumps(self.features),
+        }
         return self._call(endpoint, params)["data"]["user"]["result"]

     def _user_id_by_screen_name(self, screen_name):
@@ -1337,19 +1436,23 @@
             params["cursor"] = cursor

     def _pagination_tweets(self, endpoint, variables,
-                           path=None, stop_tweets=True):
+                           path=None, stop_tweets=True, features=True):
         extr = self.extractor
         variables.update(self.variables)
         original_retweets = (extr.retweets == "original")
         pinned_tweet = extr.pinned
+        params = {"variables": None}
+        if features:
+            params["features"] = self._json_dumps(self.features_pagination)

         while True:
-            params = {"variables": self._json_dumps(variables)}
+            params["variables"] = self._json_dumps(variables)
             data = self._call(endpoint, params)["data"]

             try:
                 if path is None:
-                    instructions = (data["user"]["result"]["timeline"]
+                    instructions = (data["user"]["result"]["timeline_v2"]
                                     ["timeline"]["instructions"])
                 else:
                     instructions = data
@@ -1440,6 +1543,8 @@
             if "retweeted_status_result" in legacy:
                 retweet = legacy["retweeted_status_result"]["result"]
+                if "tweet" in retweet:
+                    retweet = retweet["tweet"]
                 if original_retweets:
                     try:
                         retweet["legacy"]["retweeted_status_id_str"] = \
@@ -1485,10 +1590,12 @@
     def _pagination_users(self, endpoint, variables, path=None):
         variables.update(self.variables)
+        params = {"variables": None,
+                  "features" : self._json_dumps(self.features_pagination)}

         while True:
             cursor = entry = stop = None
-            params = {"variables": self._json_dumps(variables)}
+            params["variables"] = self._json_dumps(variables)
             data = self._call(endpoint, params)["data"]

             try:
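All of these GraphQL endpoints now take two query parameters, `variables` and `features`, each a compact JSON string. A stripped-down sketch of the resulting request shape (endpoint hash copied from `user_tweets` above; the authorization and csrf headers set up in `__init__` are omitted, so this is illustrative rather than a working call):

```python
import json
import requests

_json_dumps = json.JSONEncoder(separators=(",", ":")).encode

params = {
    "variables": _json_dumps({
        "userId": "2976459548",
        "count": 100,
        "includePromotedContent": True,
    }),
    "features": _json_dumps({
        "longform_notetweets_consumption_enabled": True,
        # ... remaining flags from features_pagination
    }),
}
response = requests.get(
    "https://api.twitter.com/graphql/9rys0A7w1EyqVd2ME0QCJg/UserTweets",
    params=params)
```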

@@ -79,6 +79,18 @@ class WeiboExtractor(Extractor):
     def _extract_status(self, status, files):
         append = files.append

+        if "mix_media_info" in status:
+            for item in status["mix_media_info"]["items"]:
+                type = item.get("type")
+                if type == "video":
+                    if self.videos:
+                        append(self._extract_video(item["data"]["media_info"]))
+                elif type == "pic":
+                    append(item["data"]["largest"].copy())
+                else:
+                    self.log.warning("Unknown media type '%s'", type)
+            return
+
         pic_ids = status.get("pic_ids")
         if pic_ids:
             pics = status["pic_infos"]
@@ -100,18 +112,20 @@
             else:
                 append(pic["largest"].copy())

-        if "page_info" in status and self.videos:
+        if "page_info" in status:
+            info = status["page_info"]
+            if "media_info" in info and self.videos:
+                append(self._extract_video(info["media_info"]))
+
+    def _extract_video(self, info):
         try:
-            media = max(status["page_info"]["media_info"]["playback_list"],
+            media = max(info["playback_list"],
                         key=lambda m: m["meta"]["quality_index"])
-        except KeyError:
-            pass
-        except ValueError:
-            info = status["page_info"]["media_info"]
-            append({"url": (info.get("stream_url_hd") or
-                            info["stream_url"])})
+        except Exception:
+            return {"url": (info.get("stream_url_hd") or
+                            info["stream_url"])}
         else:
-            append(media["play_info"].copy())
+            return media["play_info"].copy()

     def _status_by_id(self, status_id):
         url = "{}/ajax/statuses/show?id={}".format(self.root, status_id)
@@ -380,7 +394,7 @@ class WeiboStatusExtractor(WeiboExtractor):
         }),
         # missing 'playback_list' (#2792)
         ("https://weibo.com/2909128931/4409545658754086", {
-            "count": 9,
+            "count": 10,
         }),
         # empty 'playback_list' (#3301)
         ("https://weibo.com/1501933722/4142890299009993", {
@@ -389,6 +403,10 @@
                        r"=0&ps=1CwnkDw1GXwCQx.+&KID=unistore,video",
             "count": 1,
         }),
+        # mix_media_info (#3793)
+        ("https://weibo.com/2427303621/MxojLlLgQ", {
+            "count": 9,
+        }),
         ("https://m.weibo.cn/status/4339748116375525"),
         ("https://m.weibo.cn/5746766133/4339748116375525"),
     )
) )

@@ -6,7 +6,6 @@
 # it under the terms of the GNU General Public License version 2 as
 # published by the Free Software Foundation.

-import re
 import sys
 import errno
 import logging
@@ -33,15 +32,11 @@ class Job():
         self.kwdict = {}
         self.status = 0

-        hooks = extr.config("hooks")
-        if hooks:
-            if isinstance(hooks, dict):
-                hooks = hooks.items()
-            self._wrap_logger = self._wrap_logger_hooks
-            self._logger_hooks = [
-                (re.compile(pattern).search, hook)
-                for pattern, hook in hooks
-            ]
+        actions = extr.config("actions")
+        if actions:
+            from .actions import parse
+            self._logger_actions = parse(actions)
+            self._wrap_logger = self._wrap_logger_actions

         path_proxy = output.PathfmtProxy(self)
         self._logger_extra = {
@@ -211,11 +206,10 @@ class Job():
         return self._wrap_logger(logging.getLogger(name))

     def _wrap_logger(self, logger):
-        return output.LoggerAdapter(logger, self._logger_extra)
+        return output.LoggerAdapter(logger, self)

-    def _wrap_logger_hooks(self, logger):
-        return output.LoggerAdapterEx(
-            logger, self._logger_extra, self)
+    def _wrap_logger_actions(self, logger):
+        return output.LoggerAdapterActions(logger, self)

     def _write_unsupported(self, url):
         if self.ulog:
@@ -12,7 +12,7 @@ import shutil
 import logging
 import functools
 import unicodedata
-from . import config, util, formatter, exception
+from . import config, util, formatter

 # --------------------------------------------------------------------
@@ -39,9 +39,9 @@ class LoggerAdapter():
     """Trimmed-down version of logging.LoggingAdapter"""
     __slots__ = ("logger", "extra")

-    def __init__(self, logger, extra):
+    def __init__(self, logger, job):
         self.logger = logger
-        self.extra = extra
+        self.extra = job._logger_extra

     def debug(self, msg, *args, **kwargs):
         if self.logger.isEnabledFor(logging.DEBUG):
@@ -64,12 +64,12 @@ class LoggerAdapter():
             self.logger._log(logging.ERROR, msg, args, **kwargs)


-class LoggerAdapterEx():
+class LoggerAdapterActions():

-    def __init__(self, logger, extra, job):
+    def __init__(self, logger, job):
         self.logger = logger
-        self.extra = extra
-        self.job = job
+        self.extra = job._logger_extra
+        self.actions = job._logger_actions

         self.debug = functools.partial(self.log, logging.DEBUG)
         self.info = functools.partial(self.log, logging.INFO)
@@ -79,24 +79,21 @@ class LoggerAdapterEx():
     def log(self, level, msg, *args, **kwargs):
         if args:
             msg = msg % args
+            args = None

-        for search, action in self.job._logger_hooks:
-            match = search(msg)
-            if match:
-                if action == "wait+restart":
-                    kwargs["extra"] = self.extra
-                    self.logger._log(level, msg, args, **kwargs)
-                    input("Press Enter to continue")
-                    raise exception.RestartExtraction()
-                elif action.startswith("~"):
-                    level = logging._nameToLevel[action[1:]]
-                elif action.startswith("|"):
-                    self.job.status |= int(action[1:])
+        actions = self.actions[level]
+        if actions:
+            args = self.extra.copy()
+            args["level"] = level
+            for cond, action in actions:
+                if cond(msg):
+                    action(args)
+            level = args["level"]

         if self.logger.isEnabledFor(level):
             kwargs["extra"] = self.extra
-            self.logger._log(level, msg, args, **kwargs)
+            self.logger._log(level, msg, (), **kwargs)


 class PathfmtProxy():
class PathfmtProxy(): class PathfmtProxy():
@@ -273,16 +270,15 @@ else:
     def configure_standard_streams():
         for name in ("stdout", "stderr", "stdin"):
-            options = config.get(("output",), name)
-            if not options:
-                continue
-
             stream = getattr(sys, name, None)
             if not stream:
                 continue

-            if isinstance(options, str):
-                options = {"encoding": options, "errors": "replace"}
+            options = config.get(("output",), name)
+            if not options:
+                options = {"errors": "replace"}
+            elif isinstance(options, str):
+                options = {"errors": "replace", "encoding": options}
             elif not options.get("errors"):
                 options["errors"] = "replace"

@@ -87,6 +87,7 @@ class MetadataPP(PostProcessor):
         self.omode = options.get("open", omode)
         self.encoding = options.get("encoding", "utf-8")
         self.private = options.get("private", False)
+        self.skip = options.get("skip", False)

     def run(self, pathfmt):
         archive = self.archive
@@ -96,6 +97,9 @@ class MetadataPP(PostProcessor):
         directory = self._directory(pathfmt)
         path = directory + self._filename(pathfmt)

+        if self.skip and os.path.exists(path):
+            return
+
         try:
             with open(path, self.omode, encoding=self.encoding) as fp:
                 self.write(fp, pathfmt.kwdict)
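With the new option, a `metadata` post processor can leave already-written files untouched. A minimal config sketch; the `mode` value is just one possibility:

```json
{
    "extractor": {
        "postprocessors": [
            {
                "name": "metadata",
                "mode": "json",
                "skip": true
            }
        ]
    }
}
```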

@@ -6,4 +6,4 @@
 # it under the terms of the GNU General Public License version 2 as
 # published by the Free Software Foundation.

-__version__ = "1.25.0-dev"
+__version__ = "1.25.1-dev"

@@ -428,11 +428,46 @@ class MetadataTest(BasePostprocessorTest):
         self.assertNotIn("baz", pdict["bar"])
         self.assertEqual(kwdict["bar"], pdict["bar"])

+        # no errors for deleted/undefined fields
         self._trigger()
         self.assertNotIn("foo", pdict)
         self.assertNotIn("baz", pdict["bar"])
         self.assertEqual(kwdict["bar"], pdict["bar"])

+    def test_metadata_option_skip(self):
+        self._create({"skip": True})
+
+        with patch("builtins.open", mock_open()) as m, \
+                patch("os.path.exists") as e:
+            e.return_value = True
+            self._trigger()
+
+        self.assertTrue(e.called)
+        self.assertTrue(not m.called)
+        self.assertTrue(not len(self._output(m)))
+
+        with patch("builtins.open", mock_open()) as m, \
+                patch("os.path.exists") as e:
+            e.return_value = False
+            self._trigger()
+
+        self.assertTrue(e.called)
+        self.assertTrue(m.called)
+        self.assertGreater(len(self._output(m)), 0)
+
+        path = self.pathfmt.realdirectory + "file.ext.json"
+        m.assert_called_once_with(path, "w", encoding="utf-8")
+
+    def test_metadata_option_skip_false(self):
+        self._create({"skip": False})
+
+        with patch("builtins.open", mock_open()) as m, \
+                patch("os.path.exists") as e:
+            self._trigger()
+
+        self.assertTrue(not e.called)
+        self.assertTrue(m.called)
+
     @staticmethod
     def _output(mock):
         return "".join(
