Mike Fährmann
0cd157300e
[patreon] fix regex pattern for posts
...
The previous one would match the first number in the URL slug as
post ID, which would fail for posts with numbers in their title.
5 years ago
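The difference described in the [patreon] commit above can be illustrated with a small sketch. Both patterns and the example URL are hypothetical and are not the expressions used in gallery-dl; they only show why taking the first number in the slug goes wrong and how anchoring on the trailing number avoids it.

```python
import re

# Hypothetical patterns for illustration; the real extractor pattern may differ.
# NAIVE takes the first run of digits after /posts/, which can be part of the title.
NAIVE = re.compile(r"/posts/[^/?#]*?(\d+)")
# ANCHORED only accepts the run of digits that ends the path segment.
ANCHORED = re.compile(r"/posts/(?:[^/?#]+-)?(\d+)(?=[/?#]|$)")


def post_id(url, pattern=ANCHORED):
    """Return the post ID found in 'url', or None if the pattern does not match."""
    match = pattern.search(url)
    return match.group(1) if match else None


# Assumed URL shape: a title slug followed by the numeric post ID.
url = "https://www.patreon.com/posts/top-10-tips-12345678"
naive_id = post_id(url, NAIVE)   # picks up the "10" from the title
fixed_id = post_id(url)          # picks up the trailing ID
```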
Mike Fährmann
fe19e233f3
[xvideos] improve
...
- derive from GalleryExtractor
- match '…-channels' URLs
- "better" metadata structure
5 years ago
Mike Fährmann
d3e44e899d
raise NotFoundErrors for 404 responses in GalleryExtractors
5 years ago
Mike Fährmann
a4dd8b3dab
improve _check_cookies()
...
Only loop over all cookies once instead of calling
cookiejar._find() for each cookie name.
5 years ago
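A minimal sketch of the single-pass idea from the _check_cookies() commit above, using only the standard library. The function name, its signature, and the make_cookie helper are assumptions made for this example and do not reproduce the gallery-dl method.

```python
from http.cookiejar import Cookie, CookieJar


def make_cookie(name, value, domain):
    """Hypothetical helper: build a Cookie with neutral defaults."""
    return Cookie(
        version=0, name=name, value=value, port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith("."),
        path="/", path_specified=True, secure=False, expires=None,
        discard=True, comment=None, comment_url=None, rest={},
    )


def check_cookies(cookiejar, names, domain):
    """Return True if every cookie name in 'names' exists for 'domain'.

    The jar is iterated once; each matching cookie removes its name from
    the set of names that are still missing.
    """
    missing = set(names)
    for cookie in cookiejar:
        if cookie.domain == domain:
            missing.discard(cookie.name)
            if not missing:
                return True
    return not missing


jar = CookieJar()
jar.set_cookie(make_cookie("session", "abc", ".example.org"))
jar.set_cookie(make_cookie("token", "xyz", ".example.org"))
```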
Mike Fährmann
76e60d10a6
[patreon] raise proper exception if creator/post doesn't exist
5 years ago
Mike Fährmann
9e63804347
[patreon] make retrieving user info nonfatal ( #508 )
...
… and fall back to the included data if an error occurs.
5 years ago
Mike Fährmann
964dc57286
[vsco] improve image resolutions
...
https://im.vsco.co/ URLs redirect to the appropriate CDN server
and occasionally insert a '/1200x1600/' into the image path,
limiting image dimensions.
This commit constructs redirect targets out of the given
im.vsco.co URLs without sending extra HTTP requests
and without any "builtin" resolution restrictions.
5 years ago
Mike Fährmann
0629fe8fa4
[vsco] fix user profile extraction … again
...
Given the pattern from last time, collections will also change
in due time and use cursor-based pagination.
5 years ago
Mike Fährmann
ab17ea9632
[deviantart] only print warning if 'original' is enabled
5 years ago
Mike Fährmann
2188db6284
[gelbooru] fix non-API tag extraction
5 years ago
Mike Fährmann
c4702ec9b6
simplify some logging calls
5 years ago
Gio
c0b9ad678d
Separate metadata from handle_url into handle_metadata, commenting
5 years ago
Mike Fährmann
c9ef1b21c3
[patreon] get partial user info without /api/user/<id> ( #507 )
...
It's a lot less data, but doesn't invoke any additional
HTTP requests with potential Cloudflare CAPTCHAs.
5 years ago
Mike Fährmann
0ab9bb1721
[4chan] add extractor for entire boards ( closes #510 )
5 years ago
Mike Fährmann
c59b98c81b
[downloader:http] improve rate limit handling
...
- Move the download "logic" with rate limit checks into its own
method that only gets used if a rate limit should be enforced
- Fix an issue where suspending gallery-dl during a download would
basically ignore the rate limit for the remaining download when
resuming its execution.
5 years ago
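One way to get the behaviour described in the rate limit commit above is to measure the time spent on each chunk rather than the time since the download started, so a long pause cannot build up "credit". The sketch below is an assumption about the approach, not the gallery-dl code; the function name, the injectable clock and sleep parameters, and FakeClock exist only to make the example deterministic.

```python
import io
import time


def copy_with_rate_limit(chunks, fp, rate, clock=time.monotonic, sleep=time.sleep):
    """Write 'chunks' to 'fp', keeping throughput at or below 'rate' bytes/s."""
    last = clock()
    for data in chunks:
        fp.write(data)
        now = clock()
        expected = len(data) / rate   # time this chunk should have taken
        actual = now - last           # time it actually took
        if actual < expected:
            sleep(expected - actual)
            last = clock()
        else:
            # Too slow (or suspended): reset the reference point instead of
            # letting the surplus time excuse the following chunks.
            last = now


class FakeClock:
    """Hypothetical test clock: time only advances when told to."""

    def __init__(self):
        self.now = 0.0
        self.slept = []

    def __call__(self):
        return self.now

    def sleep(self, seconds):
        self.slept.append(seconds)
        self.now += seconds


def chunks_with_pause(clock):
    yield b"a" * 100
    clock.now += 60.0   # simulate the process being suspended for a minute
    yield b"b" * 100
    yield b"c" * 100


clock = FakeClock()
buf = io.BytesIO()
copy_with_rate_limit(chunks_with_pause(clock), buf, rate=100,
                     clock=clock, sleep=clock.sleep)
```

In this run the third chunk is still throttled even though a minute passed before the second one arrived.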
Mike Fährmann
bbbafc1c24
[downloader:http] catch both possible SSLException instances
...
With pyOpenSSL installed, but disabled, the SSLError exception
would be set to the one from pyOpenSSL, which could never get raised.
This commit solves this problem by catching both the native SSLError
exception and the one from pyOpenSSL (if available).
5 years ago
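The SSLException commit above can be sketched as follows. The names OpenSSLError, SSL_ERRORS, and is_ssl_failure are chosen for this example and are not confirmed to match the downloader module; the import path OpenSSL.SSL.Error is the exception class that pyOpenSSL provides.

```python
from ssl import SSLError

try:
    # Only available when pyOpenSSL is installed.
    from OpenSSL.SSL import Error as OpenSSLError
except ImportError:
    # Fall back to the native class so the tuple below is always valid.
    OpenSSLError = SSLError

# Either exception may be raised, depending on which backend is active.
SSL_ERRORS = (SSLError, OpenSSLError)


def is_ssl_failure(exc):
    """Return True if 'exc' is an SSL error from either backend."""
    return isinstance(exc, SSL_ERRORS)
```

A download loop would then use `except SSL_ERRORS:` instead of naming a single class.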
Gio
c20bb5c338
Naming convention, as per travis.
5 years ago
Gio
6ed4fc07ff
Don't print intentional metadata skips to the console.
5 years ago
Gio
cfc70a97ab
Added an additional channel for downloading the metadata of an entire post or gallery.
5 years ago
Mike Fährmann
f451be48c3
release version 1.12.0
5 years ago
Mike Fährmann
15f9bb3d14
add option to disable pyOpenSSL usage ( #508 )
...
(pyOpenSSL is now disabled by default)
5 years ago
Mike Fährmann
c8e99e3b3b
[deviantart] fix crash on missing "token" field ( #505 )
5 years ago
Mike Fährmann
6ed2c7823c
[deviantart] disable original downloads if no cookies set
...
For 'deviation' and 'scraps' extractors only, since original file
downloads for those two will always fail with a 404 Not Found
when not logged in.
5 years ago
Mike Fährmann
50deab5265
[deviantart] fix URL generation from /extended_fetch results
...
(closes #505 )
5 years ago
Mike Fährmann
1f209da4c0
[pixiv] match new search URLs ( closes #507 )
5 years ago
Mike Fährmann
e17907ee2a
change default value of 'cookies-update' to 'true'
5 years ago
Mike Fährmann
07dafad26d
[twitter] attempt to fix infinite loops ( #499 )
...
(Hopefully this doesn't break anything else)
5 years ago
Mike Fährmann
71acbdabf4
[2chan] fix metadata extraction
5 years ago
Mike Fährmann
c0a1241648
[livedoor] force https:// for image URLs
5 years ago
Mike Fährmann
6e23c0da09
[imgur] add extractor for subreddit links ( closes #500 )
5 years ago
Mike Fährmann
38c05df290
[oauth] add custom/default indicator to log messages ( #501 )
5 years ago
Mike Fährmann
372ffe95ee
[oauth] adjust Flickr redirect URI ( fixes #503 )
...
Flickr now automatically forces https:// for all redirect URIs.
5 years ago
Mike Fährmann
004812258d
[hentaifox] fix extraction
5 years ago
Mike Fährmann
e2710702d4
fix Cloudflare bypass
5 years ago
Mike Fährmann
8759403f37
[plurk] add delay between comment requests
5 years ago
Mike Fährmann
a28552fd19
update test results
...
- hbrowse: one tag got removed
- mangoxo: gallery changed owner
- photobucket: ?, but photo still downloads
5 years ago
Mike Fährmann
dcaa3d01bd
[imagefap] adapt to new image URL format
5 years ago
Mike Fährmann
e62c209ca0
[nijie] fix 'date' parsing
5 years ago
Mike Fährmann
3bba763ab9
[twitter] improve
...
- update metadata structure
- combine all user… entries into their own dict
- let 'user' always specify the Timeline owner
- add 'author' entry that specifies the original Tweet author
- create directories per post (closes #491 )
- fix username issues with /i/web/ URLs
5 years ago
Mike Fährmann
26d2334550
[postprocessor:metadata] rename 'format' to 'content-format'
...
Just to be consistent with the other 'extension-format' option name.
The plain 'format' name is still accepted as well.
5 years ago
Mike Fährmann
a412531451
[postprocessor:metadata] implement 'extension-format' option
...
closes #477
5 years ago
Mike Fährmann
0f1538af78
split filename formatting into its own function
5 years ago
Mike Fährmann
db35c3b581
[directlink] separate filenames from paths
...
With this, all default filename formats specify an '{extension}'
and PathFormat.set_extension() reliably works for all files.
5 years ago
Mike Fährmann
41a3169c67
[foolfuuka] use '{extension}' in default filename format
5 years ago
Mike Fährmann
e9aed62c91
[imgur] unescape image titles
5 years ago
Mike Fährmann
bca2222559
add '--exec-after'
5 years ago
Mike Fährmann
ed6592ea1a
remove '--abort-on-skip'
5 years ago
Mike Fährmann
2c332edaad
[plurk] fix comment pagination
5 years ago
Mike Fährmann
a3fa45bbb1
[behance] get images from 'media_collection' modules
5 years ago
Mike Fährmann
359c3bc1c5
[deviantart] revert to getting download URLs from OAuth API
...
This commit (partially) reverts 27b5b24, 94eb7c6, and a437e78.
Download URLs from the 'extended_fetch' endpoint are now only
usable for logged in users, while those from the respective
OAuth API endpoint are working again. Everything except
scraps and direct deviation links should be fixed, and those
two categories will work with exported cookies. (#488 )
TODO:
- "native" login with --username and --password
- better handling of internally stored cookies
5 years ago
Mike Fährmann
42b9633c7e
update test results
5 years ago
Mike Fährmann
b28bd1c73e
[bobx] set generated session cookie ( closes #482 )
...
This reverts commit 490831f and also restores original image downloads
by setting a randomly generated session cookie. No login required.
5 years ago
Mike Fährmann
ae09f87602
improve SharedConfigMixin config lookups
5 years ago
Mike Fährmann
b5c964332b
improve config.py test coverage
5 years ago
Mike Fährmann
f5604492c3
update interface of config functions
5 years ago
Mike Fährmann
4ca883c66f
[smugmug] replace test for custom URLs
...
The old one (http://www.creativedogportraits.com/ ) is empty and/or
no longer handled by SmugMug.
5 years ago
Mike Fährmann
d45fabb79d
match user profile handling on deviantart and newgrounds
5 years ago
Mike Fährmann
ea80dadd09
[deviantart] restore archive keys
...
Commit 9fdc5e7 changed 'username' fields to have consistent
capitalization, but that invalidated the archive keys of several
extractors where 'username' was usually lowercase.
5 years ago
Mike Fährmann
3fc1e12949
[postprocessor:metadata] filter private entries
...
i.e. keys starting with an underscore
5 years ago
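The filtering rule from the [postprocessor:metadata] commit above amounts to dropping every key that starts with an underscore. A minimal sketch, assuming string keys and a hypothetical function name:

```python
def filter_private(kwdict):
    """Return a copy of 'kwdict' without keys that start with an underscore."""
    return {
        key: value
        for key, value in kwdict.items()
        if not key.startswith("_")
    }


# Example metadata; the field names are made up for illustration.
metadata = {"id": 123, "title": "example", "_internal": "hidden"}
public = filter_private(metadata)
```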
Mike Fährmann
ea094692c8
[vsco] fix collection extraction ( #480 )
5 years ago
Mike Fährmann
490831f84a
[bobx] "fix" image download URLs
...
Access to original images got restricted to (paid) members only.
All that's publicly accessible now are essentially preview pictures.
5 years ago
Mike Fährmann
978cb03f81
update misc test results
...
- Livedoor now uses https:// for its image URLs
- Instagram image URLs got simplified
5 years ago
Mike Fährmann
fca87974fe
[sexcom] fix video downloads by sending specific Referer headers
5 years ago
Mike Fährmann
bbbeff4c41
[downloader.http] implement file-specific HTTP headers
5 years ago
Mike Fährmann
edc080468d
[instagram] make 'video_url' fields optional ( fixes #479 )
...
[ci skip]
5 years ago
Mike Fährmann
9fdc5e74cb
[deviantart] ensure consistent username capitalization ( #455 )
...
The 'username' field was capitalized in a very inconsistent manner:
Either all lowercase, or as given by the input URL, or with the
"original" capitalization, depending on the extractor used among
other things.
Now usernames use their original capitalization for all extractors.
('UserName' instead of 'username' or 'uSeRnAmE')
5 years ago
Mike Fährmann
b1f0609de5
[newgrounds] rewrite ( #394 )
...
- restructure extractor hierarchy
- extract more metadata
- extract videos without youtube-dl
- be more resilient to errors
TODO:
- favorites
- games, but that might be near impossible for non-flash titles
5 years ago
Mike Fährmann
3ece3976ae
[newgrounds] implement login support ( #394 )
5 years ago
Mike Fährmann
3a07c06865
[newgrounds] update
...
- create directory per post
- rename variables and methods
5 years ago
Mike Fährmann
5513b66eb0
[vsco] fix user profile extraction
5 years ago
Mike Fährmann
abfcb356fc
[flickr] support 3k, 4k, 5k, and 6k photo sizes ( closes #472 )
5 years ago
Mike Fährmann
521fcd2eb9
[imgbb] fix error in galleries without user info ( closes #471 )
5 years ago
Mike Fährmann
8061263d4c
[imgbb] improve pagination logic
...
- avoid unnecessary API calls for small or empty galleries
- combine duplicate code
5 years ago
Mike Fährmann
da6789b2b0
disable unique archive id checks for some tests
...
- same image twice in a livedoor blog post
- unreliable results for related pinterest items
5 years ago
Mike Fährmann
67e54ed8ea
release version 1.11.1
5 years ago
Mike Fährmann
ce98a86c0e
fix data file inclusion in source distributions
5 years ago
Mike Fährmann
6c86fbfe2a
release version 1.11.0
5 years ago
Mike Fährmann
94a94f3b86
miscellaneous stuff
5 years ago
Mike Fährmann
b0197098e6
[imgur] get title from webpage if missing in API response
...
(closes #467 )
5 years ago
Mike Fährmann
dd5d2b2eac
[deviantart] add user profile extractor ( #377 , #419 )
5 years ago
Mike Fährmann
a437e78620
[deviantart] minimize cookie usage during scraps extraction
...
(#445 )
5 years ago
Mike Fährmann
1a197d2195
store the original cookiejar as Extractor._cookiejar
5 years ago
Mike Fährmann
de83ae4576
make 'method' argument of Extractor.request keyword-only
5 years ago
Mike Fährmann
a5be08a830
[downloader:ytdl] forward proxy settings
5 years ago
Mike Fährmann
4325695d74
[luscious] expand GraphQL queries
5 years ago
Mike Fährmann
94dbdbf506
[nijie] change default filename format
...
… to be consistent with Pixiv filenames
5 years ago
Mike Fährmann
9e88e7a344
[postprocessor:exec] improve ( #421 , #413 )
...
- add 'final' option
- include job status in pp finalization
- improve and extend documentation
5 years ago
Mike Fährmann
c18fadc221
[instagram] extract videos without youtube-dl ( #391 )
5 years ago
Mike Fährmann
f15eedb634
[sexcom] set Referer header for file downloads ( closes #464 )
5 years ago
Mike Fährmann
2a3bd4e3c7
rename extractor classes starting with a digit
5 years ago
Mike Fährmann
b3b9da6d74
[photobucket] replace test URL
...
The other user deleted all of his images.
5 years ago
Mike Fährmann
64786363be
[4chan] simplify
...
- remove 'chan.py'
- slight adjustments to directory and filenames
5 years ago
Mike Fährmann
557e2c018b
[8chan] remove module
5 years ago
Mike Fährmann
e14782a948
[instagram] simplify graphql extraction for post pages
5 years ago
Mike Fährmann
c01ff78467
[twitter] extend 'videos' option to force extraction with ytdl
...
(closes #459 )
5 years ago
Mike Fährmann
f8ac67ce50
[hitomi] extend URL pattern + follow redirects
5 years ago
Mike Fährmann
e877ca97c3
[naver] adjust directory names and metadata structure
5 years ago
Mike Fährmann
702f2fbd1f
[issuu] add publication and user extractors ( #413 )
5 years ago
Mike Fährmann
8361d874d7
[hitomi] fix extraction
5 years ago
Mike Fährmann
5fa6ff04dd
[instagram] extract '__additionalDataLoaded' ( #391 )
...
The '_sharedData' of Post pages is missing its 'graphql' part for
logged in users. This data is now included in the parameters of a
function call to '__additionalDataLoaded(...)'
And, of course, video extraction with youtube-dl broke because of
this change as well.
5 years ago
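Pulling a JSON object out of the arguments of a function call, as the [instagram] commit above describes, can be sketched with the standard library. The page snippet, the function name, and the assumption that the first "{" after the marker starts the JSON payload are illustrative only and are not taken from the extractor.

```python
import json


def extract_additional_data(page):
    """Return the first JSON object passed to __additionalDataLoaded(...).

    Returns None if the call or an opening brace cannot be found.
    """
    marker = "__additionalDataLoaded("
    start = page.find(marker)
    if start < 0:
        return None
    brace = page.find("{", start + len(marker))
    if brace < 0:
        return None
    # raw_decode() parses one JSON value and ignores whatever follows it.
    data, _end = json.JSONDecoder().raw_decode(page, brace)
    return data


# Made-up page content with the assumed call shape.
page = (
    "<script>window.__additionalDataLoaded('/p/abc/',"
    '{"graphql":{"shortcode_media":{"id":"1"}}});</script>'
)
data = extract_additional_data(page)
```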