

[scrapy] How to fix a 403 error


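A 403 (Forbidden) error means the server refused the request. It typically appears when crawling a site that blocks crawlers, for example by rejecting requests whose User-Agent identifies a bot, as Scrapy's default User-Agent "Scrapy/VERSION (+https://scrapy.org)" does. In the Scrapy log it looks like this: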

"2021-02-09 23:27:33 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 http://bodyluv.kr/product/3%EC%B0%A8-renewal-%EB%B0%94%EB%94%94%EB%9F%BD-%EB%A7%88%EC%95%BD%EB%B2%A0%EA%B0%9C/44/category/1/display/2/?crema-product-reviews-1-page=1>: HTTP status code is not handled or not allowed"

The fix is simple.

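Open settings.py and change the USER_AGENT setting as shown below, so that requests carry a browser-like User-Agent. After that, the pages crawl normally and return status 200.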

USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36'

Reference: Advanced Web Scraping: Bypassing "403 Forbidden," captchas, and more | sangaline.com (sangaline.com/post/advanced-web-scraping-tutorial/)
