Mike Fährmann
b9bfa4c675
update extractor test results
4 years ago
Mike Fährmann
968d3e8465
remove '&' from URL patterns
...
'/?&#' -> '/?#' and '?&#' -> '?#'
According to https://www.ietf.org/rfc/rfc3986.txt, URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
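
As a rough illustration of the change described above, the sketch below compares a character class with and without '&'. The patterns and the example.org URL are hypothetical and are not taken from any actual extractor.

```python
import re

# Hypothetical patterns for illustration; not copied from any real extractor.
OLD_PATTERN = re.compile(r"(?:https?://)?example\.org/album/([^/?&#]+)")
NEW_PATTERN = re.compile(r"(?:https?://)?example\.org/album/([^/?#]+)")


def album_id(pattern, url):
    """Return the captured album component, or None if the URL does not match."""
    match = pattern.match(url)
    return match.group(1) if match else None


url = "https://example.org/album/rock&roll?page=2#top"
old_id = album_id(OLD_PATTERN, url)  # capture ends at '&'
new_id = album_id(NEW_PATTERN, url)  # capture ends at '?'
```

With the '&' removed from the class, an ampersand is treated as part of the path component, and only '/', '?' and '#' end the capture.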
4 years ago
Mike Fährmann
cb0132e441
[khinsider] add 'format' option (closes #840)
4 years ago
Mike Fährmann
19ae6f3fc4
update test results
...
- twitter:
Don't test the whole kwdict, only the actual content, since the
keyword hash changes whenever that user changes his display name.
- khinsider:
Download host changed
5 years ago
Mike Fährmann
6426e3efc7
[khinsider] fix and improve metadata extraction
5 years ago
Mike Fährmann
5530871b5a
change results of text.nameext_from_url()
...
Instead of getting a complete 'filename' from a URL and splitting that
into 'name' and 'extension', the new approach gets rid of the complete
version and renames 'name' to 'filename'. (Using anything other than
{extension} for a filename extension doesn't really work anyway)
Example: "https://example.org/path/filename.ext"
before:
- filename : filename.ext
- name : filename
- extension: ext
now:
- filename : filename
- extension: ext
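
The "now" behaviour can be approximated with the sketch below. It is a hypothetical stand-in written for illustration, not the actual text.nameext_from_url() implementation; the function name nameext_sketch and its handling of names without a dot are assumptions.

```python
from urllib.parse import unquote, urlsplit


def nameext_sketch(url, data=None):
    """Hypothetical helper: store 'filename' and 'extension' from a URL in a dict."""
    if data is None:
        data = {}
    # Take the last path component, ignoring query string and fragment.
    basename = unquote(urlsplit(url).path.rpartition("/")[2])
    name, dot, ext = basename.rpartition(".")
    if dot and name:
        # "filename.ext" -> filename "filename", extension "ext"
        data["filename"] = name
        data["extension"] = ext
    else:
        # No usable dot (assumed behaviour): keep the whole name, empty extension.
        data["filename"] = basename
        data["extension"] = ""
    return data


result = nameext_sketch("https://example.org/path/filename.ext")
```

Here result holds only the two keys from the "now" list, so {extension} is the single place a format string gets the extension from.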
6 years ago
Mike Fährmann
4b1880fa5e
propagate 'match' to base extractor constructor
6 years ago
Mike Fährmann
6284731107
simplify extractor constants
...
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
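
A before/after sketch of what these three points could look like on an extractor class. Only 'directory_fmt' and 'test' are named in the message; the attribute name 'pattern', the class names, and all values are assumptions made up for illustration.

```python
class BeforeExtractor:
    """Hypothetical extractor constants before the simplification."""
    pattern = [r"(?:https?://)?example\.org/album/([^/?#]+)"]  # list of patterns
    directory_fmt = ["{category}", "{album}"]                  # list
    test = [("https://example.org/album/demo", {"count": 3})]  # list of test tuples


class AfterExtractor:
    """Hypothetical extractor constants after the simplification."""
    pattern = r"(?:https?://)?example\.org/album/([^/?#]+)"    # single string
    directory_fmt = ("{category}", "{album}")                  # tuple
    test = ("https://example.org/album/demo", {"count": 3})    # single test tuple
```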
6 years ago
Mike Fährmann
00dc37ccbf
replace AsynchronousMixin Extractor with a Mixin
6 years ago
Mike Fährmann
95392554ee
use text.urljoin()
6 years ago
Mike Fährmann
179bcdd349
adjust archive-ids
7 years ago
Mike Fährmann
34873dbd90
set 'archive_fmt' values
...
These are going to be used to create a unique id for each image.
7 years ago
Mike Fährmann
444008a14a
[khinsider] use urljoin() to complete page URLs
7 years ago
Mike Fährmann
291369eab2
various smaller changes/additions
7 years ago
Mike Fährmann
d275b1d9a3
[khinsider] fix extraction
...
... again
7 years ago
Mike Fährmann
2b9a783fc7
[khinsider] fix extraction
7 years ago
Mike Fährmann
55c64cad4b
[khinsider] fix filename extension and test-pattern
7 years ago
Mike Fährmann
65c1c53eb8
[khinsider] fix extraction
7 years ago
Mike Fährmann
68a0a7579c
fix/improve some regular expressions
7 years ago
Mike Fährmann
85a2b2ae59
[khinsider] fix extraction
7 years ago
Mike Fährmann
84d4450410
[fallenangels] extract manga metadata
7 years ago
Mike Fährmann
c184e47ee3
put common directory- and filename formats in base classes
7 years ago
Mike Fährmann
13dc5d72bc
update some extractors to use https
8 years ago
Mike Fährmann
94e10f249a
code adjustments according to pep8 nr2
8 years ago
Mike Fährmann
37d4d07d9b
compatibility fixes to make a standalone exe work
8 years ago
Mike Fährmann
828aedd571
[khinsider] unescape soundtrack title
8 years ago
Mike Fährmann
56d810c896
update keyword hashes for tests
8 years ago
Mike Fährmann
19c2d4ff6f
remove explicit (sub)category keywords
8 years ago
Mike Fährmann
49a05c32ed
add missing tests
8 years ago
Mike Fährmann
d7e168799d
consistent extractor naming scheme + docstrings
8 years ago
Mike Fährmann
000df8d1fa
add 'encoding' argument for Extractor.request
8 years ago
Mike Fährmann
2b15b81673
[khinsider] add extractor
9 years ago