Mike Fährmann
42b9633c7e
update test results
5 years ago
Mike Fährmann
b28bd1c73e
[bobx] set generated session cookie ( closes #482 )
...
This reverts commit 490831f
and also restores original image downloads
by setting a randomly generated session cookie. No login required.
5 years ago
Mike Fährmann
ae09f87602
improve SharedConfigMixin config lookups
5 years ago
Mike Fährmann
b5c964332b
improve config.py test coverage
5 years ago
Mike Fährmann
f5604492c3
update interface of config functions
5 years ago
Mike Fährmann
4ca883c66f
[smugmug] replace test for custom URLs
...
The old one (http://www.creativedogportraits.com/ ) is empty and/or
no longer handled by SmugMug.
5 years ago
Mike Fährmann
d45fabb79d
match user profile handling on deviantart and newgrounds
5 years ago
Mike Fährmann
ea80dadd09
[deviantart] restore archive keys
...
Commit 9fdc5e7
changed 'username' fields to have consistent
capitalization, but that invalidated the archive keys of several
extractors where 'username' was usually lowercase.
5 years ago
Mike Fährmann
3fc1e12949
[postprocessor:metadata] filter private entries
...
i.e. keys starting with an underscore
5 years ago
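As a rough illustration of the filtering described in 3fc1e12949, the sketch below drops every key that starts with an underscore from a metadata dictionary. The function name `filter_private` and the sample data are hypothetical and are not taken from gallery-dl's source; it also assumes all keys are strings.

```python
def filter_private(kwdict):
    """Return a copy of kwdict without keys that start with an underscore."""
    return {
        key: value
        for key, value in kwdict.items()
        if not key.startswith("_")
    }


# Hypothetical metadata; '_internal' stands in for a private entry.
metadata = {"title": "example", "_internal": 1, "num": 3}
print(filter_private(metadata))
```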
Mike Fährmann
ea094692c8
[vsco] fix collection extraction ( #480 )
5 years ago
Mike Fährmann
490831f84a
[bobx] "fix" image download URLs
...
Access to original images got restricted to (paid) members only.
All that's publicly accessible now are essentially preview pictures.
5 years ago
Mike Fährmann
978cb03f81
update misc test results
...
- Livedoor now uses https:// for its image URLs
- Instagram image URLs got simplified
5 years ago
Mike Fährmann
fca87974fe
[sexcom] fix video downloads by sending specific Referer headers
5 years ago
Mike Fährmann
bbbeff4c41
[downloader.http] implement file-specific HTTP headers
5 years ago
Mike Fährmann
edc080468d
[instagram] make 'video_url' fields optional ( fixes #479 )
...
[ci skip]
5 years ago
Mike Fährmann
9fdc5e74cb
[deviantart] ensure consistent username capitalization ( #455 )
...
The 'username' field was capitalized in a very inconsistent manner:
Either all lowercase, or as given by the input URL, or with the
"original" capitalization, depending on the extractor used among
other things.
Now usernames use their original capitalization for all extractors.
('UserName' instead of 'username' or 'uSeRnAmE')
5 years ago
Mike Fährmann
b1f0609de5
[newgrounds] rewrite ( #394 )
...
- restructure extractor hierarchy
- extract more metadata
- extract videos without youtube-dl
- be more resilient to errors
TODO:
- favorites
- games, but that might be near impossible for non-flash titles
5 years ago
Mike Fährmann
3ece3976ae
[newgrounds] implement login support ( #394 )
5 years ago
Mike Fährmann
3a07c06865
[newgrounds] update
...
- create directory per post
- rename variables and methods
5 years ago
Mike Fährmann
5513b66eb0
[vsco] fix user profile extraction
5 years ago
Mike Fährmann
abfcb356fc
[flickr] support 3k, 4k, 5k, and 6k photo sizes ( closes #472 )
5 years ago
Mike Fährmann
521fcd2eb9
[imgbb] fix error in galleries without user info ( closes #471 )
5 years ago
Mike Fährmann
8061263d4c
[imgbb] improve pagination logic
...
- avoid unnecessary API calls for small or empty galleries
- combine duplicate code
5 years ago
Mike Fährmann
da6789b2b0
disable unique archive id checks for some tests
...
- same image twice in a livedoor blog post
- unreliable results for related pinterest items
5 years ago
Mike Fährmann
67e54ed8ea
release version 1.11.1
5 years ago
Mike Fährmann
ce98a86c0e
fix data file inclusion in source distributions
5 years ago
Mike Fährmann
6c86fbfe2a
release version 1.11.0
5 years ago
Mike Fährmann
94a94f3b86
miscellaneous stuff
5 years ago
Mike Fährmann
b0197098e6
[imgur] get title from webpage if missing in API response
...
(closes #467 )
5 years ago
Mike Fährmann
dd5d2b2eac
[deviantart] add user profile extractor ( #377 , #419 )
5 years ago
Mike Fährmann
a437e78620
[deviantart] minimize cookie usage during scraps extraction
...
(#445 )
5 years ago
Mike Fährmann
1a197d2195
store the original cookiejar as Extractor._cookiejar
5 years ago
Mike Fährmann
de83ae4576
make 'method' argument of Extractor.request keyword-only
5 years ago
Mike Fährmann
a5be08a830
[downloader:ytdl] forward proxy settings
5 years ago
Mike Fährmann
4325695d74
[luscious] expand GraphQL queries
5 years ago
Mike Fährmann
94dbdbf506
[nijie] change default filename format
...
… to be consistent with Pixiv filenames
5 years ago
Mike Fährmann
9e88e7a344
[postprocessor:exec] improve ( #421 , #413 )
...
- add 'final' option
- include job status in pp finalization
- improve and extend documentation
5 years ago
Mike Fährmann
c18fadc221
[instagram] extract videos without youtube-dl ( #391 )
5 years ago
Mike Fährmann
f15eedb634
[sexcom] set Referer header for file downloads ( closes #464 )
5 years ago
Mike Fährmann
2a3bd4e3c7
rename extractor classes starting with a digit
5 years ago
Mike Fährmann
b3b9da6d74
[photobucket] replace test URL
...
The other user deleted all of his images.
5 years ago
Mike Fährmann
64786363be
[4chan] simplify
...
- remove 'chan.py'
- slight adjustments to directory and filenames
5 years ago
Mike Fährmann
557e2c018b
[8chan] remove module
5 years ago
Mike Fährmann
e14782a948
[instagram] simplify graphql extraction for post pages
5 years ago
Mike Fährmann
c01ff78467
[twitter] extend 'videos' option to force extraction with ytdl
...
(closes #459 )
5 years ago
Mike Fährmann
f8ac67ce50
[hitomi] extend URL pattern + follow redirects
5 years ago
Mike Fährmann
e877ca97c3
[naver] adjust directory names and metadata structure
5 years ago
Mike Fährmann
702f2fbd1f
[issuu] add publication and user extractors ( #413 )
5 years ago
Mike Fährmann
8361d874d7
[hitomi] fix extraction
5 years ago
Mike Fährmann
5fa6ff04dd
[instagram] extract '__additionalDataLoaded' ( #391 )
...
The '_sharedData' of Post pages is missing its 'graphql' part for
logged in users. This data is now included in the parameters of a
function call to '__additionalDataLoaded(...)'
And, of course, video extraction with youtube-dl broke because of
this change as well.
5 years ago
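The extraction described in 5fa6ff04dd could look roughly like the sketch below, which finds the '__additionalDataLoaded(...)' call in a page and decodes the JSON object passed to it. The helper name `extract_additional_data`, the exact shape of the call (a path string followed by a JSON object), and the sample page are assumptions made for illustration, not gallery-dl's actual code.

```python
import json


def extract_additional_data(page):
    """Return the JSON object passed to __additionalDataLoaded(), or None."""
    start = page.find("__additionalDataLoaded(")
    if start < 0:
        return None
    # Assumption: the first argument is a plain string without '{',
    # so the first '{' after the call name starts the JSON payload.
    json_start = page.find("{", start)
    if json_start < 0:
        return None
    # raw_decode() parses one JSON value and ignores whatever follows it.
    data, _end = json.JSONDecoder().raw_decode(page, json_start)
    return data


# Hypothetical page fragment for demonstration only.
sample = (
    "<script>window.__additionalDataLoaded('/p/abc/',"
    '{"graphql":{"shortcode_media":{"id":"1"}}});</script>'
)
print(extract_additional_data(sample))
```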
Mike Fährmann
5af291ba5c
include failed downloads and child extractors in exit status
5 years ago
Mike Fährmann
322c2e7ed4
renaming variables
...
mostly 'keyword(s)' to 'kwdict'
5 years ago
Mike Fährmann
87a87bff7e
[simplyhentai] fix image URLs
5 years ago
Mike Fährmann
4409d00141
embed error messages in StopExtraction exceptions
5 years ago
Mike Fährmann
d5e3910270
adjust 'util.raises()'
5 years ago
Mike Fährmann
d44f790e81
adjust output for HTTP status related errors
5 years ago
Mike Fährmann
03e0cec715
return with non-zero exit status on error
5 years ago
Mike Fährmann
c887493a80
overhaul exception stuff
5 years ago
Mike Fährmann
109718a5e3
[blogger] add blog and post extractors ( closes #364 )
5 years ago
Mike Fährmann
244d396b0b
add '--ugoira-conv-lossless' command-line option ( #432 )
...
and clean up the arguments for the regular '--ugoira-conv':
- remove '-an'
- enable two-pass encoding
5 years ago
Mike Fährmann
49a6b1b6c0
[twitter] extract video stream info without youtube-dl ( #452 )
...
This should allow video downloads when logged in without having to
disable 'forward-cookies', as well as downloads from protected tweets.
youtube-dl still gets used to download HLS playlists, but the data
extraction part, which doesn't work with youtube-dl at the moment,
now gets handled by gallery-dl itself.
5 years ago
Mike Fährmann
9f0dbf2a72
[twitter] raise proper exception for protected Tweets
5 years ago
Mike Fährmann
083e14ad9a
[downloader:ytdl] add data from '_ytdl_extra' to info_dicts
5 years ago
Mike Fährmann
6e08ada4fe
[luscious] simplify some metadata entries
5 years ago
Mike Fährmann
9e3a8607ee
[deviantart] update usernames ( #455 )
...
In the case that a user changed his username, requesting deviations
with an old name might cause problems (missing deviations, etc.)
The internal 'username' value therefore now gets updated to the
current username taken from the user profile.
5 years ago
Mike Fährmann
2eb38810c5
[twitter] fix image extraction when logged in ( #452 )
...
... for individual tweets.
To get a Tweet page with the old Twitter layout, an Internet
Explorer User-Agent (e.g. Mozilla/5.0 (Windows NT 6.1; WOW64;
Trident/7.0; rv:11.0) like Gecko) as well as a Referer header
pointing to the page itself is required. The "app_shell_visited"
cookie appears to be optional at the moment, but that is what
a regular web browser would send.
5 years ago
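The request setup described in 2eb38810c5 can be sketched with the standard library alone. The example only builds a request object carrying the Internet Explorer User-Agent and a Referer header pointing at the page itself; it sends nothing over the network. The function name `build_legacy_request` and the tweet URL are hypothetical, and the optional "app_shell_visited" cookie is left out.

```python
import urllib.request

# User-Agent string quoted in the commit message.
IE_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"
)


def build_legacy_request(tweet_url):
    """Build (but do not send) a request for the old Twitter layout."""
    headers = {
        "User-Agent": IE_USER_AGENT,
        "Referer": tweet_url,  # the Referer points to the page itself
    }
    return urllib.request.Request(tweet_url, headers=headers)


# Hypothetical tweet URL for demonstration only.
request = build_legacy_request("https://twitter.com/example/status/1")
print(request.header_items())
```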
Mike Fährmann
8f38a35b91
[imgur] use API with "public" client_id ( #446 )
...
Using the API endpoints makes it possible to access NSFW content
without logging in.
5 years ago
Mike Fährmann
b23c822b23
[luscious] use GraphQL
5 years ago
Mike Fährmann
ef17d94469
update test results
5 years ago
Mike Fährmann
2057c6ba29
[naver] add blog and post extractors ( closes #447 )
5 years ago
Mike Fährmann
389d2d7e38
implement 'cookies-update' option ( #445 )
5 years ago
Mike Fährmann
fbc0a6a059
[nozomi] skip unavailable posts ( #388 )
5 years ago
Mike Fährmann
ae98dbcbb3
[nozomi] implement searching for negated terms ( #388 )
...
It's incredibly slow and resource intensive (> 1GB of memory),
but that is also how it is implemented on nozomi.la itself.
5 years ago
Mike Fährmann
1c03a389df
[twitter] small improvements to search extractor
...
- put search results in separate directories
- set 'max_position' to '-1' for first request
-> prevent duplicate results
- add a test
- flake8
5 years ago
Mike Fährmann
c3042978b8
[deviantart] match "/gallery/all" ( closes #449 )
5 years ago
Alice
bcddcca6db
Add search downloading to twitter.py ( #448 )
...
Adds the ability to download search results from twitter.com/search. Since Twitter only allows downloading up to 3,200 of a user's most recent tweets, old images from users with a lot of tweets are otherwise out of reach. To work around this, you can use the Twitter search to get the tweets from the time range where you were stopped. An example search would be "from:user since:2015-01-01 until:2016-01-01 filter:images". The URL you would use looks something like this: https://twitter.com/search?f=tweets&q=from%3Asupernaturepics%20since%3A2015-01-01%20until%3A2016-01-01%20filter%3Aimages&src=typd&lang=en
The _tweets_from_api function had to be changed because it would not get the next page of results using the last "data-tweet-id". It would return the same JSON but with a "min_position" string added. Using this string for the "max_position" param from the second page onwards correctly returned the next pages. As far as I know, this change does not interfere with how the other extractors work. The two regex patterns in the extractors had to be changed so that they do not match the search URL.
5 years ago
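The search URL and the paging scheme from bcddcca6db can be sketched as follows. `build_search_url` reproduces the example URL from the commit message; `paginate` feeds each response's "min_position" back in as the next "max_position". Both function names and the `fetch_page` callback are hypothetical stand-ins for illustration, not the code in twitter.py, and the stop condition is an assumption.

```python
from urllib.parse import quote, urlencode


def build_search_url(query):
    """Build a twitter.com/search URL for the given search query."""
    params = {"f": "tweets", "q": query, "src": "typd", "lang": "en"}
    # quote_via=quote encodes spaces as %20 instead of '+'.
    return "https://twitter.com/search?" + urlencode(params, quote_via=quote)


def paginate(fetch_page):
    """Yield result pages, passing 'min_position' on as 'max_position'.

    fetch_page is a caller-supplied function that takes the current
    max_position (None for the first page) and returns a dict.
    """
    max_position = None
    while True:
        page = fetch_page(max_position)
        yield page
        next_position = page.get("min_position")
        # Assumed stop condition: no position, or no progress.
        if not next_position or next_position == max_position:
            return
        max_position = next_position


print(build_search_url(
    "from:supernaturepics since:2015-01-01 until:2016-01-01 filter:images"
))
```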
Mike Fährmann
1693d97bd3
update extractor class hierarchies
...
- let the GalleryExtractor class inherit directly from Extractor
- make ChapterExtractor a subclass of GalleryExtractor
- change enumeration field names of GalleryExtractors to 'num'
5 years ago
Mike Fährmann
7ebd984e8d
[imgur] print error message if no JSON data is found ( #446 )
5 years ago
Mike Fährmann
5882b00f2f
[imgur] implement login support ( #446 )
5 years ago
Mike Fährmann
91643ca54b
[nozomi] add search extractor ( #388 )
5 years ago
Mike Fährmann
df2b3c6888
restore OAuth2 authentication error messages
5 years ago
Mike Fährmann
6779512fc7
[nozomi] add post and tag extractors ( #388 )
5 years ago
Mike Fährmann
6abe5f5bbb
[patreon] fix pagination ( #444 )
...
The Patreon-provided URLs for the next set of posts aren't
always complete, i.e. they can be missing their scheme and
the subsequent double slash: "www.patreon.com/…"
5 years ago
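One way to handle the incomplete "next" URLs described in 6abe5f5bbb is to prepend the missing scheme and double slash before requesting the next page. The sketch below assumes "https" as the default scheme; the function name `complete_url` and the example path are hypothetical and not taken from the Patreon extractor.

```python
def complete_url(url, scheme="https"):
    """Return url with a scheme and '//' prepended if they are missing."""
    if url.startswith(("http://", "https://")):
        return url
    # Also covers protocol-relative URLs such as '//www.patreon.com/...'.
    return scheme + "://" + url.lstrip("/")


# Hypothetical path; the commit message only shows "www.patreon.com/…".
print(complete_url("www.patreon.com/example/path"))
```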
Mike Fährmann
ff1e4a86aa
release version 1.10.6
5 years ago
Mike Fährmann
d4ffd6c952
[yaplog] improve metadata extraction ( #443 )
...
- provide a fallback if there is no numerical image ID
- add a 'filename' field
- convert 'date' to an actual datetime object
5 years ago
Mike Fährmann
15af2f8464
[hitomi] fall back to /reader/ page if main page returns 404
...
Some galleries return a 404: Not Found error when trying to access
them through the main gallery URL, but their content is still
available on the respective /reader/ page.
5 years ago
Mike Fährmann
8af59a4bba
fix & update docs
...
- update Requests links
- add example for --exec
- set '-dev' version
5 years ago
Mike Fährmann
dc6ad81e2e
[yaplog] prevent crash on empty posts ( #443 )
5 years ago
Mike Fährmann
94eb7c6cad
[deviantart] fix sta.sh extraction ( #436 )
5 years ago
Mike Fährmann
1032cfa34b
[downloader:http] extend mimetype map with archive formats
5 years ago
Mike Fährmann
27b5b2497e
[deviantart] fix download URLs ( #436 )
...
... except for sta.sh content.
Instead of using the old '/api/v1/oauth2/deviation/download' endpoint,
which started delivering URLs to 404 pages a while ago,
it is also possible to get a download URL from the relatively new
'/_napi/da-browse/shared_api/deviation/extended_fetch' endpoint
used by DeviantArt's Eclipse interface.
The current strategy is therefore:
- Iterate over deviations using the OAuth2 API
- Fetch original download URLs with the new NAPI/Shared API
5 years ago
Mike Fährmann
93aac8dfea
[yaplog] fix incomplete image URLs ( #443 )
5 years ago
Mike Fährmann
a782b009b8
[yaplog] match blog names with '-' ( #443 )
5 years ago
Mike Fährmann
cf5e716b9d
[hitomi] fix image URLs
5 years ago
Mike Fährmann
ad81c07204
[postprocessor] match logger names of downloader modules
...
The logger name for a postprocessor object got changed to
"postprocessor.<module-name>" instead of just
"postprocessor"
5 years ago
Mike Fährmann
03bc8adfc7
[postprocessor:exec] run after file moved to target location
...
(#421 )
5 years ago
Mike Fährmann
35958bebd4
[postprocessor:exec] fix filename quoting on Windows ( #421 )
5 years ago
Mike Fährmann
b06c372e4d
[postprocessor:exec] improve; add command-line option ( #421 )
5 years ago
Mike Fährmann
5a54efa025
[xhamster] unescape 'title' and 'description'
5 years ago
Mike Fährmann
1b9bf4fc6e
[behance] fix 'tags' extraction
5 years ago
Mike Fährmann
bb97e87989
[komikcast] ignore banner image
5 years ago
Mike Fährmann
0ff90a3f7d
[gfycat] include title in default filenames ( closes #434 )
5 years ago
Mike Fährmann
fabdc3b0c6
release version 1.10.5
5 years ago
Mike Fährmann
de4e2029d1
[nsfwalbum] update test album
...
the old one is no longer available
5 years ago
Mike Fährmann
1faec285d1
[nijie] further improvements ( closes #423 )
...
- provide a 'user_name' metadata field
- usually the same as 'artist_id', except for favorite downloads
- extract the whole description text and properly escape HTML entities
- fixed an issue with titles or tags containing double quotes
5 years ago
Mike Fährmann
6d0a533d68
[reddit] respect 'comments:0' for single submissions ( #429 )
5 years ago
Mike Fährmann
803d8f814e
[oauth] update scope for reddit tokens ( #428 )
...
'/user/<username>/...' requires the 'history' scope to be accessible
(https://www.reddit.com/dev/api/#GET_user_{username}_{where} )
5 years ago
Mike Fährmann
46ba173ded
[reddit] fix documentation inconsistencies ( closes #429 )
...
- Require 'reddit.comments' to be a number and convert it to an
integer to be extra sure
- Link to the README's OAuth section where appropriate
5 years ago
Mike Fährmann
20eb6c401f
[nijie] improvements and fixes ( #423 )
...
- ignore unavailable image pages
- more metadata fields: artist_name, date, tags
- rename 'index' to 'num'
- improved code structure
5 years ago
Mike Fährmann
d1ea08c67d
[weibo] fixes and improvements
...
- ignore unavailable videos (fixes #427 )
- handle empty 'geo' fields
- consistent metadata fields for images and videos
5 years ago
Mike Fährmann
38d97f3da6
[deviantart] add debug message about API credentials ( #424 )
5 years ago
Mike Fährmann
80c2104fb5
[deviantart] fix 429 handling if 'fatal' is False ( closes #424 )
5 years ago
Mike Fährmann
913460240d
[reddit] fix 'extractor.blacklist()' arguments
...
The second argument must support 'append()'.
5 years ago
Mike Fährmann
22bac14452
[pixiv] match '/artworks/' URLs
5 years ago
Mike Fährmann
66cac207ac
[twitter] match and use 'i/web' status URLs
5 years ago
Mike Fährmann
946f2751e2
[reddit] add 'user' extractor ( closes #350 )
5 years ago
Mike Fährmann
c14abb9fb8
[reddit] improve URL parameter handling for subreddit links
5 years ago
Mike Fährmann
ee8b654464
[instagram] implement 'highlights' option ( closes #329 )
5 years ago
Mike Fährmann
f63c3097a9
[instagram] rework some code paths
...
- combine fetching an HTML page and extracting its 'shared_data'
- move 'shared_data' and field access info out of '_extract_page()'
- introduce a '_request_graphql()' method
5 years ago
Mike Fährmann
4330133114
[imgur] add 'favorite' extractor ( closes #420 )
...
… and use a newer site-internal API endpoint for user posts
5 years ago
Mike Fährmann
ee5e20221f
[imgth] fix image URLs
5 years ago
Mike Fährmann
b63b126808
[hentaicafe] extend URL pattern
5 years ago
Mike Fährmann
d780f0357e
[imgur] add user extractor
5 years ago
Mike Fährmann
11ea689013
[simplyhentai] fix image and video URLs
5 years ago
Mike Fährmann
15632a1570
[tsumino] fix extraction
5 years ago
Mike Fährmann
d92802fd37
[luscious] fix detection of unavailable galleries
5 years ago
Mike Fährmann
f99da2b866
[imgbb] detect invalid album and user profile links
...
and update test results, since the old album got deleted
5 years ago
Mike Fährmann
01bc7adadc
[deviantart] improve journal detection ( #419 )
...
Some journal-like posts are not reported to be journals (isJournal
is set to False), even though they have a textContent field.
https://www.deviantart.com/gliitchlord/art/brashstrokes-812942668
5 years ago
Mike Fährmann
776e9e073f
close archive on job completion ( #417 )
5 years ago
Mike Fährmann
5ac9732adc
call 'sys.exit()' on Ctrl+c
5 years ago
Mike Fährmann
9178b54eae
handle errors when opening download archive file ( #417 )
5 years ago
Mike Fährmann
6e12907de6
[deviantart] improve handling of private deviations ( #414 )
...
- don't try to call '/deviation/metadata' with an empty list of
deviation ids
- print a warning when detecting private deviations without having
a 'refresh-token'
5 years ago
Mike Fährmann
4203931d79
release version 1.10.4
5 years ago
Mike Fährmann
e7690ac694
[vsco] update URL pattern ( closes #410 )
5 years ago
Mike Fährmann
1848788970
update test results etc
5 years ago
Mike Fährmann
d5fbb2d9de
[tumblr] ignore audio links from Spotify etc.
5 years ago
Mike Fährmann
b1cddce865
Revert "[simplyhentai] fix extraction; remove image+video extractors"
...
This reverts commit d1db5180ab.
5 years ago
Mike Fährmann
d23660c04d
[hentaicafe] restore default 'request()' behavior
5 years ago
Mike Fährmann
9ae58a6b3e
[exhentai] update image limit checks
...
- adjust cost of original images
- delay limit initialization until gallery and first image page have
been requested and all cookies are available
5 years ago
Mike Fährmann
6fe9a134bf
[lineblog] add blog and post extractors ( closes #404 )
5 years ago
Mike Fährmann
4e8a548a61
[livedoor] update metadata extraction
5 years ago
Mike Fährmann
f9285f99e6
[pixiv] fix authentication
5 years ago
Mike Fährmann
6f3df3999a
[fuskator] add gallery and search extractor ( closes #407 )
5 years ago
Mike Fährmann
bc0ca66c99
[twitter] small improvements
...
- handle reply tweets (#403 )
- unset cookies in Tweet extractor to "force" the legacy interface
5 years ago
Mike Fährmann
682105b8ee
prevent crash when loading unavailable downloader ( #405 )
5 years ago
Mike Fährmann
5fcebb69c2
[postprocessor:ugoira] improve error messages ( #406 )
5 years ago
Mike Fährmann
f02a768b5c
[danbooru] add 'ugoira' option ( #406 )
...
to choose between ZIP archives or converted video files
for Ugoira posts
5 years ago
Mike Fährmann
9646ccb320
release version 1.10.3
5 years ago
Mike Fährmann
dedea3b4db
[deviantart] fix journal creation ( #400 )
5 years ago
Mike Fährmann
c6c5cb1898
improve 'deviantart.quality' description
5 years ago