[naver] EUC-KR encoding issue in old image URLs Fix

Around October 2010, the image server URL format and file name
encoding changed from EUC-KR to UTF-8.
Modified to detect old URL format and decode image URLs into EUC-KR

- (lint with flake8) Customize conditions
  Wrap lines smaller than 79 characters

- (lint with flake8) Customize conditions (2nd try)
  - One import per line
  - Indent on consecutive lines

- (lint with flake8) Customize conditions (3rd try)
  - E128 continuation line under-indented for visual indent
  - E123 closing bracket does not match indentation of opening bracket's line

- Update naver.py
  Check encoding for all image URLs
pull/5126/head
Johann Hong 8 months ago committed by Mike Fährmann
parent 36fc510d3a
commit f64fb8f239
No known key found for this signature in database
GPG Key ID: 5680CA389D365A88

@ -10,6 +10,7 @@
from .common import GalleryExtractor, Extractor, Message
from .. import text
from urllib.parse import unquote
class NaverBase():
@ -63,7 +64,13 @@ class NaverPostExtractor(NaverBase, GalleryExtractor):
def images(self, page):
return [
(url.replace("://post", "://blog", 1).partition("?")[0], None)
(unquote(url, encoding="EUC-KR")
.replace("://post", "://blog", 1)
.partition("?")[0], None)
if "\ufffd" in unquote(url)
else
(url.replace("://post", "://blog", 1)
.partition("?")[0], None)
for url in text.extract_iter(page, 'data-lazy-src="', '"')
]

Loading…
Cancel
Save