Mike Fährmann
a453335a9f
remove test results in extractor modules
...
and add generic example URLs
1 year ago
Mike Fährmann
b0cb4a1b9c
replace 'text.extract()' with 'text.extr()' where possible
2 years ago
Mike Fährmann
bd08ee2859
remove most 'yield Message.Version' statements
...
only leave them in oauth.py as noop results
3 years ago
Mike Fährmann
968d3e8465
remove '&' from URL patterns
...
'/?&#' -> '/?#' and '?&#' -> '?#'
According to https://www.ietf.org/rfc/rfc3986.txt, URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
4 years ago
Mike Fährmann
e89413da22
update test results
5 years ago
Mike Fährmann
c0a1241648
[livedoor] force https:// for image URLs
5 years ago
Mike Fährmann
978cb03f81
update misc test results
...
- Livedoor now uses https:// for its image URLs
- Instagram image URLs got simplified
5 years ago
Mike Fährmann
da6789b2b0
disable unique archive id checks for some tests
...
- same image twice in a livedoor blog post
- unreliable results for related pinterest items
5 years ago
Mike Fährmann
4e8a548a61
[livedoor] update metadata extraction
5 years ago
Mike Fährmann
40c7eb3424
[livedoor] improve extraction (fixes #301)
5 years ago
Mike Fährmann
1cde38110d
[livedoor] return 'date' as datetime object
5 years ago
Mike Fährmann
e88824e1a7
[livedoor] fix adjustments for https:// URLs
5 years ago
Mike Fährmann
f2cf1c1d73
use 'text.extract_from()' in a few places
5 years ago
Mike Fährmann
35919a9bb8
[livedoor] add blog- and post-extractors (#190)
6 years ago