Mike Fährmann
7a412f5c32
implement generic manga-chapter extractor
7 years ago
Mike Fährmann
619387cbb1
update extractor unittest results
7 years ago
Mike Fährmann
f94e3706a8
use logging module for error messages during downloads
7 years ago
Mike Fährmann
0dd48d644f
update test results
...
nothing broke, but things got updated or changed
7 years ago
Mike Fährmann
f10ffc0839
update extractor blacklist to also allow classes
7 years ago
Mike Fährmann
35e09869d1
[mangapark] fix image URLs and use HTTPS
7 years ago
Mike Fährmann
4edb25346e
[slideshare] support mobile URLs ( closes #67 )
7 years ago
Mike Fährmann
b33efc99a4
[idolcomplex] add support for idol.sankakucomplex.com
7 years ago
Mike Fährmann
1a70857a12
update extractor-unittest capabilities
...
- "count" can now be a string defining a comparison in the form of
'<operator> <value>', for example: '> 12' or '!= 1'. If its value
is not a string, it is assumed to be a concrete integer as before.
- "keyword" can now be a dictionary defining tests for individual keys.
These tests can either be a type, a concrete value or a regex
starting with "re:". Dictionaries can be stacked inside each other.
Optional keys can be indicated with a "?" before its name.
For example:
"keyword:" {
"image_id": int,
"gallery_id", 123,
"name": "re:pattern",
"user": {
"id": 321,
},
"?optional": None,
}
7 years ago
Mike Fährmann
28cd78aae0
[kissmanga] extend chapter-string regex ( closes #58 )
7 years ago
Mike Fährmann
fc7d165c97
[deviantart] add support for OAuth2 authentication
...
Some user galleries [*] require you to be either logged in or
authenticated via OAuth2 to access their deviations.
[*] e.g. https://polinaegorussia.deviantart.com/gallery/
--------------
known issue:
A deviantart 'refresh_token' can only be used once and gets updated
whenever it is used to request a new 'access_token', so storing its
initial value in a config file and reusing it again and again is not
possible.
7 years ago
Mike Fährmann
0a9a07a6e1
[slideshare] improve metadata; flake8
...
- added 'views' and 'published' keywords
- fixed longer titles and descriptions
7 years ago
Mike Fährmann
291369eab2
various smaller changes/additions
7 years ago
Mike Fährmann
300346ecdf
[mangazuki] remove extractors
...
This site has been in "rebuild"-mode for a fairly long time and the
current extractor code isn't going to work for the new version either.
7 years ago
Mike Fährmann
214972bc9a
[gelbooru] use manual extraction
...
... to compensate for their disabled API.
(https://gelbooru.com/index.php?page=forum&s=view&id=3875 )
This also adds an extractor for image-pools.
7 years ago
Mike Fährmann
b14de6ffc2
[tumblr] small improvements
...
- don't transform inline GIF URLs
- set 'type' parameter for API calls if there is only
one post type selected
7 years ago
Mike Fährmann
b8cdd42cab
[senmanga] fix extraction (again)
...
this is basically a re-revert of 2ace5c7
7 years ago
Mike Fährmann
6913eeaa40
[powermanga] replace manga extractor unit test
...
My Hero Academia is gone
7 years ago
Mike Fährmann
f72318e593
[seiga] support more than 200 images
...
Due to API restrictions and/or missing knowledge about and
documentation of API usage, it was only possible to retrieve the
latest 200 images of a niconico seiga user with said API.
The new approach manually visits each HTML page and gets its
information from there.
7 years ago
Mike Fährmann
2457b71633
skip tests on 5xx status codes
7 years ago
Mike Fährmann
305da540c3
[mangahere] fix metadata extraction
7 years ago
Mike Fährmann
035ef655f1
[imagefap] update unit tests
...
old gallery/image has been deleted
7 years ago
Mike Fährmann
caf26412dd
add option to set alternate location of .part files ( #29 )
...
Note: The path set for 'downloader.*.part-directory' needs to point to an
already existing directory.
7 years ago
Mike Fährmann
27c026543f
re-enable download unit tests
7 years ago
Mike Fährmann
b0353aa02d
rewrite download modules ( #29 )
...
- use '.part' files during file-download
- implement continuation of incomplete downloads
- check if file size matches the one reported by server
7 years ago
Mike Fährmann
75d3a1f72f
[deviantart] always download original images
...
Deviation-objects returned by the DeviantArt API don't always contain
the URL and metadata of the original image ([1]). Getting this
information requires an additional API call [2], which is indicated by
the 'is_downloadable' and 'download_filesize' metadata within a
deviation-object.
[1] https://myria-moon.deviantart.com/art/Aime-Moi-part-en-vadrouille-261986576
[2] https://www.deviantart.com/developers/http/v1/20160316/deviation_download/bed6982b88949bdb08b52cd6763fcafd
7 years ago
Mike Fährmann
0386503c80
fix (sub)category-transfer for DownloadJob instances ( #41 )
...
... and extend "parent" parameters to TestJob- and DataJob-classes
as well.
7 years ago
Mike Fährmann
41adb99e9c
[pawoo] fix extraction
...
- changed access_token
- use account-search instead of general search
7 years ago
Mike Fährmann
b319f4bab3
smaller code and text changes
7 years ago
Mike Fährmann
85a2b2ae59
[khinsider] fix extraction
7 years ago
Mike Fährmann
8e14714c2b
[imgspice] fix extraction
7 years ago
Mike Fährmann
a85f06d2d1
[foolslide] restructure; convert suitable values to int
7 years ago
Mike Fährmann
a9e7145651
[hbrowse] extract hmanga metadata & general maintenance
7 years ago
Mike Fährmann
84d4450410
[fallenangels] extract manga metadata
7 years ago
Mike Fährmann
f32b1a0292
[imgyt] fix extraction
7 years ago
Mike Fährmann
31cd5b1c1d
[luscious] detect high-load responses
7 years ago
Mike Fährmann
81877bb5f6
add '-K' as shortcut for '--list-keywords'
7 years ago
Mike Fährmann
f98e3e8002
[luscious] fix tag extraction
7 years ago
Mike Fährmann
65997d835b
replace popular/ranking tests with older ones
...
Metadata of several year old lists shouldn't change as much as it
would for newer ones, which makes metadata-comparisons of the output
of build_testresult_db.oy easier.
7 years ago
Mike Fährmann
c0755a4d5e
[exhentai] revert login-method to its old version ( #37 )
...
Additional cookies don't seem to help and have to be manually set
anyway. The older method is more likely to succeed, so I'd rather
use this one.
7 years ago
Mike Fährmann
3ee39ffd93
[exhentai] update login procedure ( #37 )
...
This new version behaves pretty much exactly like a browser would and
caches all cookies sent to it and not just "ipb_member_id" and
"ipb_pass_hash".
7 years ago
Mike Fährmann
07214f4007
[booru] place subcategories into base classes
7 years ago
Mike Fährmann
47bcf53ec1
implement support for additional unit test result types
...
- "pattern" matches all resulting URLs against the given regex
- "count" allows to specify the amount of returned URLs
7 years ago
Mike Fährmann
c7ec103e15
[batoto] fix extraction of chapter URLs
7 years ago
Mike Fährmann
f7cdfd4c25
add a simplified version of 'parse_qs'
...
This version only returns a dict of plain string to string key-value
pairs and ignores multiple values for the same query variable.
7 years ago
Mike Fährmann
d70c66c516
fix "text:" downloader
7 years ago
Mike Fährmann
abd7c559cd
[yonkouprod] remove module
...
Every manga chapter on this site has been removed.
7 years ago
Mike Fährmann
852e7acd31
[twitter] ignore "Promoted Tweets"
7 years ago
Mike Fährmann
6950708e52
[hentaicdn] use HTTPS
7 years ago
Mike Fährmann
493bd235cf
workaround for missing 'assert_called_once' method
...
this method was introduced in Python 3.6, but calling it still
works (i.e. it doesn't cause the test to fail) on Python 3.3/3.4
7 years ago