- use original image if available
- support video formats
- remove user info for ImageExtractor (it is no longer possible to get
image owner information for a single image)
A URL alone isn't enough to distinguish between a gallery and a
gallery listing, so the new extractor decides what to do based on the
page's content.
This patch implements the necessary details to package gallery-dl as a snap that can be run on a broad range of supporting GNU/Linux distributions[1].
[1] https://snapcraft.io/
Signed-off-by: 林博仁(Buo-ren Lin) <Buo.Ren.Lin@gmail.com>
- Sometimes an ad interfered when trying to get a download URL
- Resolving "www.hentai-foundry.com" yields an invalid(?) IPv6 address
(2607:5300:60:ca9e:feed:dead:beef:1) and urllib3 only tries to connect
to the IPv4 variant after a rather long wait time
Instead of getting a complete 'filename' from a URL and splitting that
into 'name' and 'extension', the new approach drops the complete
version and renames 'name' to 'filename'. (Using anything other than
{extension} for a filename extension doesn't really work anyway.)
Example: "https://example.org/path/filename.ext"
before:
- filename : filename.ext
- name : filename
- extension: ext
now:
- filename : filename
- extension: ext
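
A minimal sketch of the new field layout, assuming a hypothetical helper
name_ext_from_url() (not the project's actual function) that derives
both fields from the last path segment of a URL:

```python
from urllib.parse import unquote, urlsplit


def name_ext_from_url(url):
    """Return 'filename' (without extension) and 'extension' for a URL.

    Hypothetical helper for illustration only.
    """
    # Take the last path segment and undo percent-encoding.
    basename = unquote(urlsplit(url).path.rpartition("/")[2])
    # Split at the last dot; 'name' is empty if there is no dot
    # or if the dot is the first character.
    name, _, ext = basename.rpartition(".")
    if name:
        return {"filename": name, "extension": ext}
    return {"filename": basename, "extension": ""}


print(name_ext_from_url("https://example.org/path/filename.ext"))
```

A format string such as "{filename}.{extension}" then rebuilds the
complete name when it is needed.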
This allows for stuff like "{extractor.url}" and "{extractor.category}"
in logging format strings.
Accessing 'extractor' or 'job' in any way returns "None" if those
fields aren't defined, as is the case for general logging messages.
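
One way to get this behavior with the standard logging module is
sketched below. The names _Missing and SketchFormatter are hypothetical
and only illustrate the idea; they are not necessarily how this patch
implements it:

```python
import logging


class _Missing:
    """Stand-in object: any attribute access yields None."""

    def __getattr__(self, name):
        # Keep special-method lookups behaving normally.
        if name.startswith("__"):
            raise AttributeError(name)
        return None

    def __str__(self):
        return "None"


class SketchFormatter(logging.Formatter):
    """Formatter accepting '{extractor.url}'-style fields."""

    def __init__(self, fmt):
        super().__init__(fmt, style="{")

    def format(self, record):
        # General log records carry no 'extractor' or 'job';
        # substitute a placeholder so formatting still succeeds.
        for field in ("extractor", "job"):
            if not hasattr(record, field):
                setattr(record, field, _Missing())
        return super().format(record)
```

A record created with extra={"extractor": some_object} would use that
object's attributes, while any other record formats these fields as
"None".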
Child extractors are now directly constructed with Extractor.from_url()
if the extractor class is known beforehand, instead of using
extractor.find() and searching through all possible extractor classes.
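
The difference can be sketched with toy classes. The patterns, the
example.org URLs, and the simplified find() below are assumptions made
for illustration, not the project's real extractors:

```python
import re


class Extractor:
    pattern = None  # regular expression set by subclasses

    def __init__(self, match):
        self.url = match.string
        self.groups = match.groups()

    @classmethod
    def from_url(cls, url):
        """Build an instance of this class, or return None on mismatch."""
        match = re.match(cls.pattern, url)
        return cls(match) if match else None


class GalleryExtractor(Extractor):
    pattern = r"https://example\.org/gallery/(\d+)"


class ImageExtractor(Extractor):
    pattern = r"https://example\.org/image/(\d+)"


def find(url, classes=(GalleryExtractor, ImageExtractor)):
    """Generic lookup: try every known class until one matches."""
    for cls in classes:
        extr = cls.from_url(url)
        if extr is not None:
            return extr
    return None


# The parent already knows its children are images, so it can skip
# the search and construct the child directly.
child = ImageExtractor.from_url("https://example.org/image/42")
```

The direct call tests a single pattern, whereas find() may test every
registered pattern before it reaches the right class.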