Original

Common Web Scraping Techniques (1)

2024-11-21 09:10:48

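Tip 1: Pause at random intervals to throw off anti-scraping checks

Requests sent at a high, regular frequency are easy for a site to flag as a crawler, so the crawler needs to take breaks. Calling time.sleep() with a random duration makes its access pattern look more like a human visitor's.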

import time
import random

# Sleep for a random duration between 0 and 5 seconds
time.sleep(random.random() * 5)
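
In a real crawl the pause usually sits between consecutive requests rather than standing alone. The following is a minimal sketch of that idea, not a definitive implementation: the helper names `random_pause` and `crawl`, the 1 to 5 second default range, and the injectable `sleep` and `fetch` parameters are all assumptions introduced here for illustration.

```python
import random
import time


def random_pause(min_seconds=1.0, max_seconds=5.0, sleep=time.sleep):
    """Sleep for a random duration in [min_seconds, max_seconds] and return it.

    The sleep function is injectable (hypothetical parameter) so the helper
    can be exercised without actually waiting.
    """
    if min_seconds < 0 or max_seconds < min_seconds:
        raise ValueError("expected 0 <= min_seconds <= max_seconds")
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay


def crawl(urls, fetch, min_seconds=1.0, max_seconds=5.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing randomly between requests."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:  # no need to wait before the very first request
            random_pause(min_seconds, max_seconds, sleep=sleep)
        results.append(fetch(url))
    return results
```

Setting a non-zero lower bound avoids the near-zero pauses that random.random() * 5 can occasionally produce. Here `fetch` can be any callable that takes a URL, for example a small wrapper around requests.get.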

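Tip 2: Disguise the client with a User-Agent header

Every browser sends a User-Agent header when it visits a site. The fake_useragent library can generate a random User-Agent so that the request appears to come from a browser.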

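import requests
from fake_useragent import UserAgent

ua = UserAgent()
headers = {'User-Agent': ua.random}

# Placeholder target (an assumption); replace with the page you want to fetch
url = 'https://example.com'

# Attach the headers to the request
response = requests.get(url, headers=headers)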
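
The snippet above picks one User-Agent for a single request. A crawler that sends many requests may want a fresh value each time, and a fallback when fake_useragent is not installed or cannot load its data. The sketch below shows one way to do that under those assumptions: `FALLBACK_USER_AGENTS`, `pick_user_agent`, and `build_headers` are hypothetical names, and the fallback strings are illustrative examples rather than a maintained list.

```python
import random

# Illustrative fallback pool (an assumption), used only when
# fake_useragent is unavailable or fails to produce a value.
FALLBACK_USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


def _load_generator():
    """Return a fake_useragent.UserAgent instance, or None if unavailable."""
    try:
        from fake_useragent import UserAgent
        return UserAgent()
    except Exception:
        return None


_GENERATOR = _load_generator()  # created once and reused across requests


def pick_user_agent():
    """Return a random User-Agent string, preferring fake_useragent."""
    if _GENERATOR is not None:
        try:
            return _GENERATOR.random
        except Exception:
            pass  # fall through to the static pool
    return random.choice(FALLBACK_USER_AGENTS)


def build_headers():
    """Build a fresh headers dict with a newly chosen User-Agent."""
    return {"User-Agent": pick_user_agent()}
```

Calling build_headers() before each request and passing the result as headers= to requests.get gives every request its own User-Agent.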

王****际
