Mike Fährmann
ec85bf90de
use context managers in cache.py & add tests
5 years ago
Mike Fährmann
4e361b3008
add tests for specific datetime values
5 years ago
Mike Fährmann
90e4c645ba
[formatter] allow multiple "special" format specifiers ( #595 )
It is now, for example, possible to specify multiple replacement
operations per format replacement field: {name:Ra/b/Rc/d/}
5 years ago
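A minimal Python sketch of how several chained replacement operations in one field could be applied. It assumes each operation has the form R<old>/<new>/, that <old> and <new> contain no "/", and that anything after the last operation is an ordinary format spec. 'apply_replacements()' is a hypothetical helper, not gallery-dl's formatter code.

```python
def apply_replacements(value, spec):
    """Apply leading 'R<old>/<new>/' operations in order (hypothetical helper)."""
    while spec.startswith("R"):
        # "a/b/Rc/d/" -> old="a", new="b", spec="Rc/d/"
        old, new, spec = spec[1:].split("/", 2)
        value = value.replace(old, new)
    # whatever remains is treated as a standard format spec
    return format(value, spec)


print(apply_replacements("abcd", "Ra/b/Rc/d/"))  # bbdd
```

Operations are applied left to right, so a later replacement sees the output of an earlier one.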
Mike Fährmann
219c4cc78c
[formatter] allow for numeric list and string indices
5 years ago
Mike Fährmann
7d1da614d9
[formatter] implement field name alternatives ( #525 )
The format string '{a|b|c}' will now try to use the value from 'a' and
fall back to 'b' and 'c' if accessing a field raises an exception or
if its value is None.
5 years ago
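The fallback rule for '{a|b|c}' can be sketched in Python as below. This assumes plain dictionary keys only (real field names may also include attribute or index access), and returning None when every alternative fails is an assumption of this sketch; 'first_available()' is a hypothetical name.

```python
def first_available(kwdict, field):
    """Resolve '{a|b|c}'-style alternatives against a dict (hypothetical helper)."""
    for name in field.split("|"):
        try:
            value = kwdict[name]
        except Exception:
            continue  # accessing this field failed: try the next alternative
        if value is not None:
            return value
    return None  # assumption: no usable alternative yields None


print(first_available({"b": None, "c": 3}, "a|b|c"))  # 3
```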
Mike Fährmann
c7cf9dd111
[furaffinity] support classic layout ( #284 )
5 years ago
Mike Fährmann
40fe062851
[pixiv] fix user id for bookmarks API calls ( closes #596 )
5 years ago
Mike Fährmann
2852691d78
[paheal] replace test URL
searching for 'k-on' doesn't yield any results anymore
5 years ago
Mike Fährmann
2a9be48511
improve util.load/save_cookiestxt() and add tests
- take a file object as argument instead of a filename
- accept whitespace before comments (" # comment")
- map expiration "0" to None and not the number 0
5 years ago
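A rough Python sketch of the three listed behaviors for reading the Netscape cookies.txt format: it takes a file object, skips comments even when indented, and maps an expiration of "0" to None. It returns plain dicts rather than http.cookiejar.Cookie objects, the dict keys are hypothetical, and lines with a "#HttpOnly_" prefix are simply treated as comments here.

```python
import io


def load_cookiestxt(fp):
    """Parse Netscape-style cookies.txt lines from a text file object (sketch)."""
    cookies = []
    for line in fp:
        stripped = line.strip()
        # skip empty lines and comments, even if indented ("  # comment")
        if not stripped or stripped.startswith("#"):
            continue
        fields = line.rstrip("\r\n").split("\t", 6)
        domain, subdomains, path, secure, expires, name, value = fields
        cookies.append({
            "domain": domain,
            "include_subdomains": subdomains == "TRUE",
            "path": path,
            "secure": secure == "TRUE",
            # an expiration of "0" means "no expiration": map it to None
            "expires": None if expires == "0" else int(expires),
            "name": name,
            "value": value,
        })
    return cookies


sample = io.StringIO(
    "  # comment\n"
    ".example.org\tTRUE\t/\tFALSE\t0\tsid\tabc\n"
)
print(load_cookiestxt(sample))
```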
Mike Fährmann
b3b5754f2d
update test_cookies.py
5 years ago
Mike Fährmann
174117f827
allow multiple hashes for content tests
5 years ago
Mike Fährmann
60a43f0264
fix downloader tests
5 years ago
Mike Fährmann
e89413da22
update test results
5 years ago
Mike Fährmann
5cac79c3d9
[erolord] remove extractor
5 years ago
Mike Fährmann
988cc2ec23
[mangadex] change domain to mangadex.cc ( closes #559 )
5 years ago
Mike Fährmann
87c8b89ddd
[postprocessor:metadata] add 'directory' option ( #520 )
5 years ago
Mike Fährmann
82f7f4172a
update test results
5 years ago
Mike Fährmann
d0920e84e9
update test results
5 years ago
Mike Fährmann
9e63804347
[patreon] make retrieving user info nonfatal ( #508 )
… and fall back to the included data if an error occurs.
5 years ago
Mike Fährmann
15f9bb3d14
add option to disable pyOpenSSL usage ( #508 )
(pyOpenSSL is now disabled by default)
5 years ago
Mike Fährmann
50deab5265
[deviantart] fix URL generation from /extended_fetch results
(closes #505 )
5 years ago
Mike Fährmann
004812258d
[hentaifox] fix extraction
5 years ago
Mike Fährmann
a412531451
[postprocessor:metadata] implement 'extension-format' option
closes #477
5 years ago
Mike Fährmann
b5c964332b
improve config.py test coverage
5 years ago
Mike Fährmann
f5604492c3
update interface of config functions
5 years ago
Mike Fährmann
3fc1e12949
[postprocessor:metadata] filter private entries
i.e. keys starting with an underscore
5 years ago
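Filtering private entries amounts to dropping keys with a leading underscore. A minimal Python sketch, assuming string keys and filtering only the top level of the metadata dict; 'filter_private()' is a hypothetical name.

```python
def filter_private(kwdict):
    """Drop top-level keys that start with an underscore (sketch)."""
    return {key: value for key, value in kwdict.items()
            if not key.startswith("_")}


print(filter_private({"id": 1, "_private": "hidden", "title": "x"}))
# {'id': 1, 'title': 'x'}
```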
Mike Fährmann
978cb03f81
update misc test results
- Livedoor now uses https:// for its image URLs
- Instagram image URLs got simplified
5 years ago
Mike Fährmann
bbbeff4c41
[downloader.http] implement file-specific HTTP headers
5 years ago
Mike Fährmann
3ece3976ae
[newgrounds] implement login support ( #394 )
5 years ago
Mike Fährmann
abfcb356fc
[flickr] support 3k, 4k, 5k, and 6k photo sizes ( closes #472 )
5 years ago
Mike Fährmann
da6789b2b0
disable unique archive id checks for some tests
- same image twice in a livedoor blog post
- unreliable results for related pinterest items
5 years ago
Mike Fährmann
ba083b30b2
fix snap build
… hopefully
5 years ago
Mike Fährmann
94a94f3b86
miscellaneous stuff
5 years ago
Mike Fährmann
9e88e7a344
[postprocessor:exec] improve ( #421 , #413 )
- add 'final' option
- include job status in pp finalization
- improve and extend documentation
5 years ago
Mike Fährmann
2a3bd4e3c7
rename extractor classes starting with a digit
5 years ago
Mike Fährmann
64786363be
[4chan] simplify
- remove 'chan.py'
- slight adjustments to directory and filenames
5 years ago
Mike Fährmann
557e2c018b
[8chan] remove module
5 years ago
Mike Fährmann
322c2e7ed4
renaming variables
mostly 'keyword(s)' to 'kwdict'
5 years ago
Mike Fährmann
87a87bff7e
[simplyhentai] fix image URLs
5 years ago
Mike Fährmann
d5e3910270
adjust 'util.raises()'
5 years ago
Mike Fährmann
b23c822b23
[luscious] use GraphQL
5 years ago
Mike Fährmann
1693d97bd3
update extractor class hierarchies
- let the GalleryExtractor class inherit directly from Extractor
- make ChapterExtractor a subclass of GalleryExtractor
- change enumeration field names of GalleryExtractors to 'num'
5 years ago
Mike Fährmann
7ebd984e8d
[imgur] print error message if no JSON data is found ( #446 )
5 years ago
Mike Fährmann
de4e2029d1
[nsfwalbum] update test album
the old one is no longer available
5 years ago
Mike Fährmann
913460240d
[reddit] fix 'extractor.blacklist()' arguments
The second argument must support 'append()'.
5 years ago
Mike Fährmann
1848788970
update test results etc
5 years ago
Mike Fährmann
d5fbb2d9de
[tumblr] ignore audio links from Spotify etc.
5 years ago
Mike Fährmann
c6c5cb1898
improve 'deviantart.quality' description
5 years ago
Mike Fährmann
c9b97dbca3
extend post processor tests
5 years ago
Mike Fährmann
49f6d7176d
[deviantart] restore filenames ( #392 )
<title>_by_<user>_<id> --> <title>_by_<user>-<id>
5 years ago
Mike Fährmann
e528f3cb77
adjust postprocessor test results
see 2495b99
5 years ago
Mike Fährmann
cb7f149974
fix mtime datetime test
datetime.timestamp() uses local time for a naive datetime object
5 years ago
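Because datetime.timestamp() interprets a naive datetime as local time, a test that compares timestamps can pass on one machine and fail on another. One way to get a machine-independent value is to attach UTC explicitly. A small Python sketch; 'utc_timestamp()' is a hypothetical helper, and the actual test fix may have been done differently.

```python
from datetime import datetime, timezone


def utc_timestamp(dt):
    """Return a POSIX timestamp, treating a naive datetime as UTC (sketch)."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.timestamp()


naive = datetime(2010, 1, 1)
print(naive.timestamp())     # depends on the machine's local timezone
print(utc_timestamp(naive))  # 1262304000.0 on every machine
```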
Mike Fährmann
23251356cb
require 'extension' data for each URL ( #382 )
5 years ago
Mike Fährmann
dd72ae7164
add postprocessor tests
5 years ago
Mike Fährmann
0bb873757a
update PathFormat class
- change 'has_extension' from a simple flag/bool to a field that
contains the original filename extension
- rename 'keywords' to 'kwdict' and some other stuff as well
- inline 'adjust_path()'
- put enumeration index before filename extension (#306 )
5 years ago
Mike Fährmann
748e37554c
update .travis.yml
- install pyOpenSSL before running tests
- simplify snap tests
5 years ago
Mike Fährmann
b7fb93e2b2
[downloader:http] add 'adjust-extensions' option
5 years ago
Mike Fährmann
eb7da159e2
[imagebam] update URL test results
Image URLs are now using https://, but the website itself is still
served as http://.
5 years ago
Mike Fährmann
fa60109e97
[exhentai] don't use e-hentai.org for exhentai URLs
5 years ago
Mike Fährmann
4a0c98bfc9
miscellaneous fixes and adjustments
5 years ago
Mike Fährmann
40637556fa
[ngomik] fix extraction
5 years ago
Mike Fährmann
d9d44ad953
[tsumino] update test results
5 years ago
Mike Fährmann
b1bea8aaeb
add 'restrict-filenames' option ( #348 )
5 years ago
Mike Fährmann
b3851e01d9
release version 1.9.0
5 years ago
Mike Fährmann
12da6bd0c9
[simplyhentai] fix/improve extraction
5 years ago
Mike Fährmann
b89f0d8d3c
update extractor result tests
5 years ago
Mike Fährmann
40da44b17f
Merge branch 'v1.9.0'
5 years ago
Mike Fährmann
7a99e85943
[kissmanga] fix download URLs and file extensions
The current Blogspot image URLs hosted on Kissmanga end with an
"invalid" query parameter (/000.png&upx=...), which 'spliturl()' and
'parseurl()' don't recognize as such; it therefore ends up in the
'extension' field produced by 'text.nameext_from_url()'.
5 years ago
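The standard library's urllib.parse.urlsplit() shows the same behavior: a query string only starts at "?", so "&upx=..." stays part of the path and leaks into the extension. A Python sketch of the problem and one possible workaround; 'ext_from_url()', its 'strip_ampersand' flag, and the example URL are hypothetical and not necessarily how the commit fixed it.

```python
import posixpath
from urllib.parse import urlsplit


def ext_from_url(url, strip_ampersand=False):
    """Return the filename extension of a URL's path (hypothetical helper)."""
    path = urlsplit(url).path
    if strip_ampersand:
        # treat a "&" in the path as the start of a bogus query string
        path = path.partition("&")[0]
    name = posixpath.basename(path)
    _, dot, ext = name.rpartition(".")
    return ext if dot else ""


url = "https://example.blogspot.com/img/000.png&upx=123"
print(ext_from_url(url))                        # png&upx=123
print(ext_from_url(url, strip_ampersand=True))  # png
```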
Mike Fährmann
a9c89085fb
[instagram] implement login support ( #195 )
5 years ago
Mike Fährmann
b1985d6579
test default format strings during extractor result tests
A missing value or an invalid "syntax" for a format replacement field
will raise an exception.
5 years ago
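Both failure modes can be reproduced with Python's built-in str.format_map(), which is used here only as a stand-in for gallery-dl's own formatter. 'check_format()' is a hypothetical helper that reports the exception instead of raising it.

```python
def check_format(fmt, kwdict):
    """Return the exception raised while formatting, or None (sketch)."""
    try:
        fmt.format_map(kwdict)
    except Exception as exc:
        return exc
    return None


print(repr(check_format("{title}", {})))              # missing value: KeyError
print(repr(check_format("{title!x}", {"title": 1})))  # invalid syntax: ValueError
print(check_format("{title}", {"title": "x"}))        # None
```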
Mike Fährmann
95b1e4c3c0
implement R<old>/<new>/ format option ( #318 )
5 years ago
Mike Fährmann
70713f0f28
fix extractor result tests
5 years ago
Mike Fährmann
ee4d7c3d89
update downloader.find() and related code
Instead of replacing 'https' with 'http' for every URL in
'get_downloader()', this now only happens once during downloader
initialization. Also adds unit tests.
5 years ago
Mike Fährmann
179d112083
[downloader] overhaul http and text modules
Get rid of the modular structure and simplify/specialize those modules.
5 years ago
Mike Fährmann
a77340c647
[keenspot] fix extraction for "TwoKinds"
5 years ago
Mike Fährmann
b171befa87
implement 'parse_unicode_escapes()'
5 years ago
Mike Fährmann
e05a96db5e
[deviantart] rename 'stash' to 'extra' ( #302 )
'stash' is already used as a name for the StashExtractor and therefore
expected to be a dictionary.
5 years ago
Mike Fährmann
7c6cb908f9
[xhamster] update test results
5 years ago
Mike Fährmann
62335b9015
[paheal] adjust test results
5 years ago
Mike Fährmann
6a34f4b0c1
skip tests on read timeouts; print list of skipped tests
5 years ago
Mike Fährmann
d33f5a7423
[wallhaven] rewrite
- use API
- remove login support, add 'api-key' option
- remove support for "alpha" subdomain - alpha.wallhaven.cc used numeric
IDs that can't be translated to the new ID system
- support direct links to wallpapers
5 years ago
Mike Fährmann
5499934ae2
[ngomik] fix extraction
5 years ago
Mike Fährmann
2b1999476e
implement 'text.rextract()'
5 years ago
Mike Fährmann
e30ada162d
fix cookie tests
update _get_extractor():
- always return an Extractor instance with a _login_impl() method
- use Extractor.from_url()
5 years ago
Mike Fährmann
2316e0ed3d
fix strptime workaround from b0e85a4
...
Don't return a modified version of 'date_time' if strptime fails.
5 years ago
Mike Fährmann
6764847349
fix cookie tests
'cookies' is a CookieJar, not a dict,
and removing the call to '.keys()' doesn't have the same effect
5 years ago
Mike Fährmann
a5b060765d
improve code in tests
- use 'assertRaises' as context manager
- remove calls to .keys()
5 years ago
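For reference, this is what 'assertRaises' as a context manager looks like in Python's unittest; the test case and its contents are illustrative, not taken from gallery-dl's test suite.

```python
import unittest


class TestIntParsing(unittest.TestCase):

    def test_invalid_literal(self):
        # the 'with' block must raise ValueError, otherwise the test fails
        with self.assertRaises(ValueError) as context:
            int("not a number")
        # the caught exception remains available for further checks
        self.assertIn("invalid literal", str(context.exception))
```

Compared to 'self.assertRaises(ValueError, int, "not a number")', the context-manager form keeps the call readable and gives access to the exception object.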
Mike Fährmann
b0e85a42e3
apply workaround from 4736912 in parse_datetime() itself
5 years ago
Mike Fährmann
4736912d4e
[pixiv] work around strptime limitations in Python < 3.7
"%z" doesn't allow a colon separator in older Python versions:
- "+0900" is OK
- "+09:00" raises an exception
5 years ago
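The workaround can be sketched in Python: remove the colon from a trailing "+HH:MM" offset before calling strptime(), and on failure return the original, unmodified input, as the later fix above describes. The function name matches 'text.parse_datetime()', but its signature, default format, and exception handling here are assumptions.

```python
import re
from datetime import datetime


def parse_datetime(date_string, fmt="%Y-%m-%dT%H:%M:%S%z"):
    """Parse a date string; return it unchanged if that fails (sketch)."""
    try:
        value = date_string
        if "%z" in fmt:
            # Python < 3.7: "%z" accepts "+0900" but not "+09:00",
            # so remove the colon from a trailing UTC offset
            value = re.sub(r"([+-]\d\d):(\d\d)$", r"\1\2", date_string)
        return datetime.strptime(value, fmt)
    except (TypeError, ValueError):
        # return the original input, not the modified 'value'
        return date_string


print(parse_datetime("2019-05-01T12:00:00+09:00"))  # 2019-05-01 12:00:00+09:00
print(parse_datetime("not a date"))                 # not a date
```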
Mike Fährmann
d09864b581
implement text.parse_datetime()
5 years ago
Mike Fährmann
5582b06ae4
fix tests with 'urllist' messages
5 years ago
Mike Fährmann
5018781898
allow type tests by name
5 years ago
Mike Fährmann
6264a46212
use 'utcfromtimestamp()'
'fromtimestamp()' converts its results to the local timezone and causes
problems when running tests on a different machine.
5 years ago
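A Python sketch of what a helper like 'text.parse_timestamp()' might look like; the 'default' parameter and the caught exceptions are assumptions. It produces the same naive UTC value as datetime.utcfromtimestamp() (deprecated since Python 3.12) without depending on the machine's timezone.

```python
from datetime import datetime, timezone


def parse_timestamp(ts, default=None):
    """Convert a Unix timestamp to a naive UTC datetime (sketch)."""
    try:
        # equivalent to datetime.utcfromtimestamp(int(ts)),
        # independent of the local timezone
        return datetime.fromtimestamp(int(ts), timezone.utc).replace(tzinfo=None)
    except (TypeError, ValueError, OverflowError, OSError):
        return default


print(parse_timestamp("1262304000"))  # 2010-01-01 00:00:00
print(parse_timestamp("abc"))         # None
```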
Mike Fährmann
d670de0344
implement 'text.parse_timestamp()'
5 years ago
Mike Fährmann
21a7e395a7
implement convenience wrapper for text.extract functionality
6 years ago
Mike Fährmann
e25ebc4bff
don't disable certificate checks anymore
Executables generated with PyInstaller auto-include the root certificate
file and certificate checks now work out-of-the-box.
6 years ago
Mike Fährmann
d6ddb74cde
update test results
- deviantart: 'index' is now an integer
- flickr: image file with lower quality
- paheal: image server name changed
- rule34: post got deleted
6 years ago
Mike Fährmann
d9b94a585d
[mangoxo] add login support ( #184 )
A very recent change: It is now only possible to see more
than the first 5 images of an album if you are logged in.
6 years ago
Mike Fährmann
e730fc9045
[twitter] add login support ( #214 )
6 years ago
Mike Fährmann
790f15a56f
[photobucket] use HTTPS
6 years ago
Mike Fährmann
c70b21248d
[wikiart] add extractors ( #179 )
for
- artists: https://www.wikiart.org/en/thomas-cole
- artist-listings: https://www.wikiart.org/en/artists-by-century/12
- artwork-listings: https://www.wikiart.org/en/paintings-by-media/grisaille
6 years ago
Mike Fährmann
0c991a3155
add convenience targets to Makefile
6 years ago
Mike Fährmann
6277a739e4
[35photo] add user-, genre-, and image-extractors ( #162 )
6 years ago
Mike Fährmann
973a720a7a
[weibo] fix unit test URL patterns
6 years ago
Mike Fährmann
6f57d44ec2
[seaotterscans] remove extractor
http://seaotterscans.com/ now redirects to their MangaDex profile
6 years ago
Mike Fährmann
0887fb61f4
[komikcast] update test results
6 years ago
Mike Fährmann
a881537b91
more util.py tests
6 years ago
Mike Fährmann
976ccb267f
[myportfolio] combine gallery and user extractors
A URL alone isn't enough to distinguish between a gallery and a
gallery listing, so the new extractor decides what to do based on the
page's content.
6 years ago
Mike Fährmann
9c0e2f294b
[shopify] add generic collection and product extractors ( #175 )
...
with fashionnova.com as a default domain
6 years ago
Mike Fährmann
176b7253a1
update function signature for config.load()
6 years ago
Mike Fährmann
e687a6095e
[luscious] raise exception if album is not available
6 years ago
Mike Fährmann
b09a8184ca
move TestJob into test module; test _extractor values
6 years ago
Mike Fährmann
5530871b5a
change results of text.nameext_from_url()
Instead of getting a complete 'filename' from a URL and splitting that
into 'name' and 'extension', the new approach gets rid of the complete
version and renames 'name' to 'filename'. (Using anything other than
{extension} for a filename extension doesn't really work anyway.)
Example: "https://example.org/path/filename.ext"
before:
- filename : filename.ext
- name : filename
- extension: ext
now:
- filename : filename
- extension: ext
6 years ago
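The new result shape can be illustrated with a Python sketch built on urllib.parse; it is not gallery-dl's implementation, and the optional 'data' parameter and the handling of names without a dot are assumptions.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def nameext_from_url(url, data=None):
    """Store 'filename' and 'extension' of a URL in a dict (sketch)."""
    if data is None:
        data = {}
    name = unquote(posixpath.basename(urlsplit(url).path))
    filename, dot, extension = name.rpartition(".")
    if not dot:
        # no dot at all: everything is the filename
        filename, extension = name, ""
    data["filename"] = filename
    data["extension"] = extension
    return data


print(nameext_from_url("https://example.org/path/filename.ext"))
# {'filename': 'filename', 'extension': 'ext'}
```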
Mike Fährmann
148b8f15d0
update tests for util.py
6 years ago
Mike Fährmann
4b1880fa5e
propagate 'match' to base extractor constructor
6 years ago
Mike Fährmann
1f3422c28b
[mangahere] fix extraction
6 years ago
Mike Fährmann
84ae72b8d8
[ngomik] fix extraction
6 years ago
Mike Fährmann
9a9cd32461
implement alternative constructor for extractors
6 years ago
Mike Fährmann
abbd45d0f4
update handling of extractor URL patterns
When loading extractor classes during 'extractor.find(…)', their
'pattern' attribute will be replaced with a compiled version of itself.
6 years ago
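A Python sketch of the idea: each class keeps its pattern as a plain string until it is first needed, at which point the string is replaced with its compiled version, so later lookups reuse it. The extractor class, '_CLASSES', and this 'find()' are hypothetical simplifications of 'extractor.find(…)'.

```python
import re


class ExampleGalleryExtractor:
    """Hypothetical extractor; 'pattern' starts out as a plain string."""
    pattern = r"(?:https?://)?(?:www\.)?example\.org/gallery/(\d+)"


_CLASSES = [ExampleGalleryExtractor]


def find(url):
    """Return (class, match) for the first class whose pattern matches."""
    for cls in _CLASSES:
        if isinstance(cls.pattern, str):
            # replace the string with its compiled version, once per class
            cls.pattern = re.compile(cls.pattern)
        match = cls.pattern.match(url)
        if match:
            return cls, match
    return None


cls, match = find("https://example.org/gallery/42")
print(cls.__name__, match.group(1))  # ExampleGalleryExtractor 42
```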
Mike Fährmann
6284731107
simplify extractor constants
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
6 years ago
Mike Fährmann
bc0951d974
allow for simplified test data structures
Instead of a strict list of (URL, RESULTS)-tuples, extractor result
tests can now be a single (URL, RESULTS)-tuple, if it's just one test,
and "only matching" tests can now be a simple string.
6 years ago
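A Python sketch of how a test runner could expand the simplified forms back into a list of (URL, RESULTS) tuples. It assumes RESULTS is always a dict (or None), and representing an "only matching" test as (URL, None) is an assumption of this sketch; both function names are hypothetical.

```python
def _is_single_test(test):
    """A lone (URL, RESULTS) tuple: a URL string plus a dict or None."""
    return (isinstance(test, tuple) and len(test) == 2
            and isinstance(test[0], str)
            and (test[1] is None or isinstance(test[1], dict)))


def normalize_tests(tests):
    """Expand the simplified forms into a list of (URL, RESULTS) tuples."""
    if isinstance(tests, str) or _is_single_test(tests):
        tests = (tests,)
    # a bare string is an "only matching" test without expected results
    return [(test, None) if isinstance(test, str) else test for test in tests]


print(normalize_tests("https://example.org/a"))
print(normalize_tests(("https://example.org/a", {"count": 3})))
```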
Mike Fährmann
347398f692
fix various tests
6 years ago
Mike Fährmann
e1d3e9a926
add 'ext_from_url' to text.py
6 years ago
Mike Fährmann
2d2953a5bf
add 'text.parse_float()' + cleanup in text.py
6 years ago
Mike Fährmann
0c32dc5858
[hentaifox] add extractor for search results ( #160 )
6 years ago
Mike Fährmann
217a0687ef
[behance] add 'collection' extractor ( closes #157 )
6 years ago
Mike Fährmann
b8fed34548
add generalized extractors for Mastodon instances ( #144 )
Extractors for Mastodon instances can now be dynamically generated,
based on the instance names in the 'extractor.mastodon.*' config path.
Example:
{
    "extractor": {
        "mastodon": {
            "pawoo.net": { ... },
            "mastodon.xyz": { ... },
            "tabletop.social": { ... },
            ...
        }
    }
}
Each entry requires an 'access-token' value, which can be generated with
'gallery-dl oauth:mastodon:<instance URL>'.
An 'access-token' (as well as a 'client-id' and 'client-secret') for
pawoo.net is always available, but can be overwritten as necessary.
6 years ago
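Dynamic generation of this kind can be sketched in Python with type(), creating one subclass per instance entry found under 'extractor.mastodon'. The base class, its attributes, and the naming scheme are hypothetical, and the sketch leaves out the built-in pawoo.net credentials mentioned above.

```python
class MastodonExtractor:
    """Base class shared by all generated instance extractors (hypothetical)."""
    category = "mastodon"
    instance = None
    root = None
    access_token = None


def generate_extractors(config):
    """Create one subclass per instance listed under extractor.mastodon.*"""
    instances = config.get("extractor", {}).get("mastodon", {})
    classes = []
    for instance, options in instances.items():
        if not isinstance(options, dict):
            continue  # skip plain options that are not instance entries
        classes.append(type(
            "MastodonExtractor_" + instance.replace(".", "_"),
            (MastodonExtractor,),
            {
                "category": instance,
                "instance": instance,
                "root": "https://" + instance,
                "access_token": options.get("access-token"),
            },
        ))
    return classes


config = {"extractor": {"mastodon": {
    "pawoo.net": {"access-token": "TOKEN"},
    "mastodon.xyz": {},
}}}
for cls in generate_extractors(config):
    print(cls.__name__, cls.root, cls.access_token)
```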
Mike Fährmann
66460337f1
[mangapark] fix extraction
6 years ago
Mike Fährmann
79c01ec7ae
implement J<separator>/ format option
J joins list elements by calling <separator>.join(list):
Example:
{f:J - /} -> "a - b - c" (if "f" is ["a", "b", "c"])
6 years ago
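A Python sketch of a J<separator>/ handler that reproduces the '{f:J - /}' example. Treating anything after the closing "/" as a regular format spec is an assumption here, as is the helper name 'format_join()'.

```python
def format_join(value, spec):
    """Handle a leading 'J<separator>/' spec (hypothetical helper)."""
    if not spec.startswith("J"):
        return format(value, spec)
    separator, _, rest = spec[1:].partition("/")
    # like str.join, this requires every list element to be a string
    return format(separator.join(value), rest)


print(format_join(["a", "b", "c"], "J - /"))  # a - b - c
```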
Mike Fährmann
9bbbadd93a
[hbrowse] use HTTPS
6 years ago
Mike Fährmann
98c6520384
[pinterest] update root URL of API calls
6 years ago
Mike Fährmann
751e535948
[nhentai] fix extraction ( closes #156 )
Use JSON embedded in webpage since API endpoints have been disabled
6 years ago
Mike Fährmann
1734a6c879
[reactor] detect "circular" redirects ( #148 )
6 years ago
Mike Fährmann
e53cdfd6a8
update build_supportedsites.py
6 years ago
Mike Fährmann
0afa913de4
[tumblr] add tests for hidden and private blogs ( #145 )
Hidden / dashboard-only blogs are pretty straightforward and "only"
require a valid 'access-token' and 'access-token-secret' for the given
'api-key' and 'api-secret', so that signed OAuth1.0 requests are possible.
Private / password protected blogs on the other hand are a bit
cumbersome. In addition to a valid 'access-token' and
'access-token-secret', they also require the account belonging to those
tokens to be a member of the blog itself. Knowing the password and
entering it in the website isn't enough to access a blog through the
API. Following a private blog is also impossible, so that option can't
work either.
6 years ago
Mike Fährmann
fa7fa2f8ff
[deviantart] update tests
6 years ago
Mike Fährmann
259123732f
[readcomiconline] improve comic-page parsing
6 years ago
Mike Fährmann
6c71e9cf5d
[deviantart] add separate 'sta.sh' extractor ( #113 )
- supports multiple stashed deviations per page
- explicitly mentions sta.sh support on supportedsites.rst
6 years ago
Mike Fährmann
c5d4f558c9
allow missing field access keys in format strings ( #136 )
6 years ago
Mike Fährmann
4d73cc785d
update test results
6 years ago
Mike Fährmann
010da8372a
[instagram] relax test pattern
6 years ago
Mike Fährmann
15890930ea
[mangafox] fix extraction
use mobile version since desktop version is obfuscated
6 years ago
Mike Fährmann
fb53b5dd55
fix control+c during -j and range tests
6 years ago
Mike Fährmann
59bb434ba5
[flickr] add ability to download all albums of a user
for example with 'https://www.flickr.com/photos/shona_s/albums'
6 years ago
Mike Fährmann
041bd501fc
[hentaifoundry] unescape YII_CSRF_TOKEN value
This fixes the POST requests to /site/filters
6 years ago
Mike Fährmann
d4b2b73bef
release version 1.6.0
6 years ago
Mike Fährmann
3c25fa2dad
update build_testresult_db.py script
6 years ago
Mike Fährmann
7f6a0be982
adjust some tests
6 years ago
Mike Fährmann
966a9ca3a0
update test results
6 years ago
Mike Fährmann
c9861ca812
adjust message for status_code based exceptions
from: 5xx HTTP Error: Reason
to  : 5xx: Reason
The "HTTP Error" part was there to emulate the error messages produced by
Requests' response.raise_for_status(), but the message reads a lot better without it.
6 years ago