Mike Fährmann
ae9a37a528
implement text.split_html()
6 years ago
Mike Fährmann
cc36f88586
rename safe_int to parse_int; move parse_* to text module
7 years ago
Mike Fährmann
4ffa94f634
remove 'shorten_path()' and 'shorten_filename()'
7 years ago
Mike Fährmann
27eab4e467
rewrite text tests and improve functions
- test more edge cases
- consistently return an empty string for invalid arguments
- remove the ungreedy-flag in 'remove_html()'
7 years ago
Mike Fährmann
e3f2bd4087
add tests for 'text.clean_xml()' and improve it
7 years ago
Mike Fährmann
6d8b191ea7
improve 'parse_query()' and add tests
- another irrelevant micro-optimization!
- use urllib.parse.parse_qsl directly instead of parse_qs, which
just packs the results of parse_qsl in a different data structure
- reduced memory requirements since no additional dict and lists are
created
7 years ago
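Note: the parse_qsl change described above can be sketched roughly as follows. This is a minimal illustration, not the module's actual code; the function body and the choice to keep the first value for a repeated key are assumptions.

```python
from urllib.parse import parse_qsl


def parse_query(qs):
    """Parse a query string into a flat dict of str -> str.

    Hypothetical sketch: iterates over parse_qsl() directly, so no
    intermediate dict of lists (as built by parse_qs) is created.
    Assumption: the first value seen for a repeated key is kept.
    """
    result = {}
    for key, value in parse_qsl(qs):
        if key not in result:
            result[key] = value
    return result
```

For example, parse_query("a=1&b=2&a=3") keeps only the first "a" and yields a two-entry dict.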
Mike Fährmann
731ffd4986
improve text.filename_from_url() performance
- urlsplit() is faster than urlparse()
- rpartition() is faster than rindex() + slicing
- new version is 2.3 times as fast
7 years ago
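Note: a minimal sketch of the urlsplit() + rpartition() approach named in the commit above. The function body is an assumption for illustration only; the 2.3x figure comes from the commit message and is not reproduced here.

```python
from urllib.parse import urlsplit


def filename_from_url(url):
    """Return the last path component of a URL.

    Hypothetical sketch: urlsplit() skips the ';params' handling that
    urlparse() performs, and rpartition() returns the part after the
    last '/' without a separate index lookup and slice.
    """
    return urlsplit(url).path.rpartition("/")[2]
```

A URL whose path ends in "/" yields an empty string, since nothing follows the final slash.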
Mike Fährmann
f7cdfd4c25
add a simplified version of 'parse_qs'
This version only returns a dict of plain string to string key-value
pairs and ignores multiple values for the same query variable.
7 years ago
Mike Fährmann
e5f79ae839
[deviantart] add support for all media types
- this includes
  - images
  - videos
  - flash-animations
  - journals
- also renamed some of the extractors
  - User -> Gallery
  - Image -> Deviation
7 years ago
Mike Fährmann
ed94d9b92d
fix/improve various things
8 years ago
Mike Fährmann
619c74159a
[seiga] fix file extension and xml parsing
- The file extension of the first image had been used for all further
images
- API responses can contain invalid characters, which cause the XML
parser to fail (http://seiga.nicovideo.jp/user/illust/26377934
contains several \x08 characters)
8 years ago
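Note: the XML problem described above (control characters such as \x08 breaking the parser) could be handled along these lines. This is a sketch under assumptions: the regular expression, the function signature, and the choice to drop rather than replace the characters are illustrative and not taken from the commit.

```python
import re
import xml.etree.ElementTree as ET

# C0 control characters that XML 1.0 does not allow
# (everything below 0x20 except tab, line feed, and carriage return).
_INVALID_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def clean_xml(xmldata, repl=""):
    """Replace characters that are invalid in XML 1.0 with 'repl'."""
    return _INVALID_XML_CHARS.sub(repl, xmldata)


# Usage: sanitize a response before handing it to the parser.
root = ET.fromstring(clean_xml("<title>foo\x08bar</title>"))
title = root.text
```

Tab, line feed, and carriage return are left untouched because they are valid XML whitespace.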
Mike Fährmann
4f123b8513
code adjustments according to pep8
8 years ago
Mike Fährmann
8780abcc77
fix a small spelling error
8 years ago
Mike Fährmann
00074a71d7
several changes to make travis build work
- fixed html.unescape not being available on Python 3.3
- removed inconsistent test result
- added username/password pairs for authenticating extractors
8 years ago
Mike Fährmann
91c446805b
replace platform.system() with os.name
8 years ago
Mike Fährmann
8a49a28d13
replace deprecated 'unescape' method
9 years ago
Mike Fährmann
99b4fbb081
implement text.extract_iter
9 years ago
Mike Fährmann
7fd284a705
always provide lowercase file extensions
9 years ago
Mike Fährmann
ca523b9f64
add helper method to text module
9 years ago
Mike Fährmann
d0bebd9ce3
allow adding values to existing dict
9 years ago
Mike Fährmann
629133a27a
document text.extract
9 years ago
Mike Fährmann
692d0c95cc
reimplement text.extract_all
9 years ago
Mike Fährmann
db479f881d
implement text.shorten_path/filename methods
9 years ago
Mike Fährmann
89f938ee55
handle non-string-like arguments for clean_path
9 years ago
Mike Fährmann
c5801c9770
combine text related functions in new module
9 years ago