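Contents:

  1. What is a web crawler?
  2. Typical applications
  3. How a crawler works
  4. Types of crawlers
  5. Legality of crawling
  6. The crawler tech stack
  7. Simple crawler examples
  8. Novel crawler example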

——————————————————————————————————————————————————

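1. What is a web crawler?

  A web crawler (also called a spider) is an automated program that fetches (downloads) and extracts data from the internet in bulk. It imitates the way a person browses web pages, but works faster and at a larger scale, and it can process the data automatically. In essence, crawling is a process of acquiring data and then parsing it.

  Core functions of a crawler:

  1. Send HTTP requests
    1. Access pages the way a browser does (for example, GET/POST requests)
  2. Parse page content
    1. Extract text, links, images, and other data (for example, with BeautifulSoup or XPath)
  3. Store data
    1. Save to a database (MySQL, etc.), to files (CSV, JSON), or to the cloud
  4. Traverse automatically
    1. Visit many pages recursively (for example, pagination and followed links)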
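
  The parse and store functions above can be sketched with the standard library alone. This is a minimal illustration, not the method used later in this post: the class name LinkExtractor, the helper extract_links, and the sample HTML are all assumptions made for the example, and the "download" step is replaced by a hard-coded string so the snippet runs offline.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Turn relative links such as /page/2 into absolute URLs
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical page content standing in for a downloaded response body.
SAMPLE_HTML = '<p>News</p><a href="/page/2">Next</a><a href="https://example.com/about">About</a>'

links = extract_links(SAMPLE_HTML, "https://example.com/")
# "Store" step: serialize the result as JSON (it could equally go to a file or database)
record = json.dumps({"source": "https://example.com/", "links": links}, ensure_ascii=False)
print(record)
```

  In a real crawler the HTML would come from an HTTP response, and a library such as BeautifulSoup would usually replace the hand-written parser.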

——————————————————————————————————————————————————

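2. Typical applications

Scenario                          Example
Search engines                    Google, Baidu, and similar sites crawl and index pages for retrieval
Price monitoring                  Crawl e-commerce prices (Taobao, JD) to compare prices or analyze trends
Public opinion analysis           Collect news and social media data to analyze trending topics
Data aggregation                  Combine information from multiple sites, such as rental listings
Machine learning data collection  Crawl images and text for AI training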

——————————————————————————————————————————————————

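3. How a crawler works

  1. Seed URL: start from an initial page (for example, https://example.com)
  2. Request the page: fetch the page content with requests or selenium
  3. Parse the data: extract the target information with BeautifulSoup, XPath, or regular expressions
  4. Store the data: write it to a database or a file
  5. Discover new links: crawl further pages recursively (for example, "next page" or "detail page" links)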
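
  The five steps above can be sketched as a small breadth-first loop. This is only an illustration under stated assumptions: the function name crawl, its fetch parameter, and the in-memory FAKE_SITE dictionary are hypothetical, and regular expressions stand in for a real HTML parser to keep the example short.

```python
from collections import deque
import re


def crawl(seed_url, fetch, max_pages=100):
    """Breadth-first crawl starting at seed_url.

    fetch is any callable that takes a URL and returns HTML text,
    or None when the page cannot be retrieved.
    """
    queue = deque([seed_url])          # step 1: seed URL
    visited = set()
    results = {}
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        html = fetch(url)              # step 2: request the page
        if html is None:
            continue
        match = re.search(r"<title>(.*?)</title>", html, re.S)  # step 3: parse
        results[url] = match.group(1).strip() if match else ""  # step 4: store
        for link in re.findall(r'href="(.*?)"', html):          # step 5: discover links
            if link not in visited:
                queue.append(link)
    return results


# Hypothetical in-memory "site" used instead of real network requests.
FAKE_SITE = {
    "https://example.com/": '<title>Home</title><a href="https://example.com/a">A</a>',
    "https://example.com/a": '<title>Page A</title><a href="https://example.com/">Home</a>',
}

pages = crawl("https://example.com/", FAKE_SITE.get)
print(pages)
```

  The visited set is what keeps the loop from cycling forever when pages link back to each other, and max_pages bounds the crawl.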

——————————————————————————————————————————————————

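4. Types of crawlers

Type                 Characteristics
General crawler      No specific target, for example a search engine crawler
Focused crawler      Targets a specific site (for example, crawling only Douban movie ratings)
Incremental crawler  Crawls only updated content (for example, a news site's new articles each day)
Deep crawler         Crawls pages that require login or interaction, such as product detail pages or flash sale features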
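
One common way to build an incremental crawler is to remember a content hash per URL and process only pages whose hash is new or different. The sketch below assumes that approach; the function names fingerprint and select_changed and the sample page contents are hypothetical, and a real crawler would persist the hashes between runs (for example in SQLite).

```python
import hashlib


def fingerprint(content):
    """Return a stable hash of the page content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def select_changed(pages, seen):
    """Return URLs whose content is new or changed, updating seen in place."""
    changed = []
    for url, content in pages.items():
        digest = fingerprint(content)
        if seen.get(url) != digest:
            changed.append(url)
            seen[url] = digest
    return changed


seen_hashes = {}
first_run = select_changed({"/news/1": "old story", "/news/2": "another"}, seen_hashes)
second_run = select_changed(
    {"/news/1": "old story", "/news/2": "another (updated)", "/news/3": "fresh"},
    seen_hashes,
)
print(first_run)
print(second_run)
```

On the first run every page counts as new; on the second run only the updated page and the newly published one are selected.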

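——————————————————————————————————————————————————

5. Legality of crawling

Generally lawful

  • Crawling public data (such as news or weather)
  • Complying with robots.txt (for example, https://www.amazon.com/robots.txt)
  • Limiting request frequency so the site's operation is not disrupted

Potentially unlawful

  • Crawling private or personal data
  • Circumventing CAPTCHAs or other access controls
  • Using crawled data for commercial profit without authorization
  • Crawling in any other way that violates applicable laws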
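
The two technical points in the lawful list, honoring robots.txt and limiting request frequency, can be sketched with the standard library's urllib.robotparser. The rules in ROBOTS_LINES, the helper names allowed and polite_fetch_all, and the one-second delay are assumptions for illustration; a real crawler would load the rules from the target site with set_url() and read().

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, parsed from memory so the example runs offline.
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: /private/",
]

robots = RobotFileParser()
robots.parse(ROBOTS_LINES)


def allowed(url, user_agent="*"):
    """Return True when robots.txt permits this user agent to fetch the URL."""
    return robots.can_fetch(user_agent, url)


def polite_fetch_all(urls, fetch, delay_seconds=1.0):
    """Fetch only permitted URLs, pausing between requests."""
    results = {}
    for url in urls:
        if not allowed(url):
            continue
        results[url] = fetch(url)
        time.sleep(delay_seconds)  # throttle so the site is not overloaded
    return results


print(allowed("https://example.com/news/today"))
print(allowed("https://example.com/private/profile"))
```

Passing the check does not by itself make a crawl lawful; it only covers the robots.txt and request-rate items in the list.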

——————————————————————————————————————————————————

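6. The crawler tech stack

Stage                       Common tools
Request                     requests, httpx, aiohttp, selenium, playwright
Parsing                     BeautifulSoup, lxml, parsel, PyQuery
Storage                     MySQL, MongoDB, SQLite, CSV, JSON
Frameworks                  Scrapy, pyspider
Countering anti-crawling    Proxy IPs; fake-useragent (random User-Agent strings); CAPTCHA recognition: 2captcha

A note on how CAPTCHA recognition works: at login, use selenium to get the CAPTCHA image, then either pass the image's Base64 encoding as a request parameter or save the image locally, and call a CAPTCHA recognition API to get the CAPTCHA text back. I previously used a recognition service from another vendor; its success rate was above 90%, but it was a paid service.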
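
The "pass the image's Base64 encoding as a request parameter" step might look like the sketch below. It only shows the encoding; the function name build_captcha_payload and the JSON field name "image" are placeholders, since each recognition service defines its own request format, and the bytes here are a stand-in rather than a real image.

```python
import base64
import json


def build_captcha_payload(image_bytes):
    """Encode raw image bytes as Base64 inside a JSON request body.

    The field name "image" is a placeholder; a real service defines its own.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"image": encoded})


# Stand-in bytes; in practice these would come from the CAPTCHA element,
# for example a screenshot saved by selenium.
fake_image = b"\x89PNG\r\n\x1a\n-not-a-real-image-"
payload = build_captcha_payload(fake_image)
print(payload)
```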

——————————————————————————————————————————————————

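7. Simple crawler examples

    Example 1: Crawling the Douban Movie Top 250

import requests
from bs4 import BeautifulSoup

url = "https://movie.douban.com/top250"
headers = {"User-Agent": "Mozilla/5.0"}

# Fetch the response; the timeout keeps the script from hanging on a stalled connection
response = requests.get(url, headers=headers, timeout=10)
# Stop early on an HTTP error instead of parsing an error page
response.raise_for_status()
# Turn the response text into a structured BeautifulSoup object for easier extraction
soup = BeautifulSoup(response.text, "html.parser")

# soup.select: BeautifulSoup's CSS selector
for movie in soup.select(".item"):
    title = movie.select_one(".title").text
    rating = movie.select_one(".rating_num").text
    print(f"Movie: {title}, rating: {rating}")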

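  Note that this URL carries no page parameter, so the script only fetches the 25 entries on the default first page, not all 250. How can it be extended? The following questions walk through the steps:

  1. Where does the number 250 in "Douban Movie Top 250" come from?
  2. Why does each page hold 25 entries?
  3. How does the URL change when you switch to page 2 and other pages?
  4. How can code reproduce that change?

  Answers:

  1. The number 250 comes from this element: <span class="count">(共250条)</span>
  2. Each page's entries come from this container: <div class="grid_view"></div>
  3. The URL of each page comes from links like this one: <a href="?start=25&amp;filter=">2</a>. There are 10 page entries in total; the currently selected page has no link, but its URL can be built by hand. Page 1 is start=0, page 2 is start=25, and so on. In the HTML source the & is escaped as &amp;, while the actual URL uses a plain &.
  4. To reproduce the change in code, generate the URLs with a list comprehension or a for loop.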

# List comprehension
urls = ['https://movie.douban.com/top250?start={}&filter='.format(i) for i in range(0, 250, 25)]

# for loop
urls_list = []
for i in range(0, 250, 25):
    url = 'https://movie.douban.com/top250?start={}&filter='.format(i)
    urls_list.append(url)
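
As an alternative sketch, the query string can be built with urllib.parse.urlencode, which joins parameters with a plain & and escapes values where needed. The function name page_urls and its parameters are hypothetical; the start and filter parameters are the ones seen in the page links above.

```python
from urllib.parse import urlencode

BASE_URL = "https://movie.douban.com/top250"


def page_urls(total=250, page_size=25):
    """Build one URL per page; start is the zero-based offset of the first entry."""
    return [
        f"{BASE_URL}?{urlencode({'start': offset, 'filter': ''})}"
        for offset in range(0, total, page_size)
    ]


urls = page_urls()
print(len(urls))
print(urls[1])
```

Each URL can then be fetched and parsed in a loop, ideally with a short pause between requests.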

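Here is how the BeautifulSoup loop works, taking this snippet as the example:

    for movie in soup.select(".item"):
        title = movie.select_one(".title").text
        rating = movie.select_one(".rating_num").text
        print(f"Movie: {title}, rating: {rating}")

  1. Each movie's information is stored in a <div class="item"></div> element, which is what the first line selects. The loop is needed because each page has 25 entries; without it, only the first entry would be read.
  2. What does select_one do, and how does it find the movie title so precisely? A simplified view of the structure follows this list. Normally you would locate the element level by level: item > info > a > span.title. Here the class name locates the element directly. The line can also be written as title = movie.find_all("span", class_="title")[0].text, which outputs the same content.

    <div class="item">
      <div class="info">
        <a href="https://movie.douban.com/subject/1292052/">
          <span class="title">肖申克的救赎</span>
        </a>
      </div>
    </div>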

————

    Example 2: Crawling a novel site

# Biquge novel site
# -*- coding: utf-8 -*-
from requests_html import HTMLSession
import time
from retrying import retry  # retry module: re-run the function after a failure such as a timeout

# Retry at most 3 times, waiting 2 seconds between attempts
@retry(stop_max_attempt_number=3, wait_fixed=2000)
def xiazai():
    # Steps for crawling a novel from Biquge
    session = HTMLSession()
    r = session.get('https://www.biqugex.com/book_25317/')
    # Step 1: get the book title
    shuming = r.html.find('div.info>h2')[0].text
    # Step 2: get every chapter name and chapter link
    list_name = []
    list_link = []
    zhangjie = r.html.find('div.listmain>dl>dd>a')
    for i in zhangjie[6:]:
        set_to_str = ','.join(i.absolute_links)  # convert the set of links to a str
        list_name.append(i.text)
        list_link.append(set_to_str)

    # Step 3: request each chapter URL and read the full text
    for j in range(len(list_link)):
        url_zhangjie = list_link[j]
        r_content = session.get(url_zhangjie)  # reuse the same session for every chapter
        print(list_name[j])
        # Locate the body text
        content = r_content.html.find('div.showtxt')
        # Step 4: clean the chapter text and append it to a file on drive F
        with open('F:\\' + shuming + '.txt', 'a', encoding='utf-8') as f:
            for m in content:
                # [:-52] drops the last 52 characters of the chapter text
                content_neirong = m.text.replace(u'\xa0', u' ')[:-52]
                content_neirong = content_neirong.replace(r'『百度搜索↺49↰小↷说⇆网↴,更多好看小说阅读。』', '')  # remove the ad string
                # Write inside the loop so every matched block is saved, not only the last one
                f.write('{}\n'.format(content_neirong))
        time.sleep(0.5)
    # Printed once, after all chapters have been written
    print('全书下载完毕。')

if __name__ == '__main__':
    xiazai()
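
The text cleanup in step 4 can be pulled out into a small helper so it is testable without hitting the site. This is a sketch with assumed names: clean_chapter and its ad_strings parameter do not appear in the original script, and the sample ad text is made up.

```python
def clean_chapter(text, ad_strings=()):
    """Normalize non-breaking spaces and remove known ad strings."""
    cleaned = text.replace("\xa0", " ")
    for ad in ad_strings:
        cleaned = cleaned.replace(ad, "")
    return cleaned.strip()


raw = "\xa0\xa0Chapter text.[AD: visit example.com]\n"
print(clean_chapter(raw, ad_strings=["[AD: visit example.com]"]))
```

Matching ad strings by content is less brittle than trimming a fixed number of characters, which breaks as soon as the trailer length changes.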

————

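  Example 3: A harder case, monitoring hospital appointment availability. Only part of the code is shown. It checks the appointment slots of one doctor in the endocrinology department of a Shenyang hospital (沈阳医大二南湖医院).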

if __name__=="__main__":
    print('%s:开始遍历......'%datetime.datetime.now().strftime('%Y-%m-%d-%H:%M:%S'))
    cookie='4edhfk125nqn2b1h59qs'
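    am=True # am=True: prefer the morning slot
    search_doctor_name='胡凤楠'
    patientId='3281'
    date='2025-08-19'
    departmentCode='b3FWN1VnPT0'    # departmentCode: the code for the Heping campus (the campuses are Hunnan and Heping); use it as a fixed value
    clinicName='ZWs2dmc4aG04VnBBSVJxdmVJRk1QQnhh'   # clinicName: the code for the endocrinology department; also used later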
    search(date)
    # number=0
    # while True:
    #     number+=1
    #     try:
    #         time.sleep(3)
    #         print('执行第%s次,当前时间:%s,搜索中...'%(number,datetime.datetime.now().strftime('%Y-%m-%d-%H:%M:%S')))
    #         # Stop when search returns True, which happens in 2 cases: the order was submitted successfully, or an unpaid order already exists
    #         if search(date):
    #             break
    #     except:
    #         time.sleep(3)
    #         pass

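Steps:

  1. Get all campuses, select one, then query one specific doctor's appointment availability for the given day
  2. Availability depends on the time. For example, when querying today at 14:00, only afternoon slots can be checked; morning slots and past dates cannot
  3. Slots up to 7 days ahead can also be monitored. This depends on the hospital's release mechanism: slots are released at irregular times, sometimes at 23:00 and sometimes at 24:00
  4. Appointment time: once a doctor is found to have slots, whether in the morning or the afternoon, take the earliest available time and lock the order
  5. After locking, payment goes through WeChat Pay, but the WeChat SDK was not integrated, so a different approach is used: an email notification. Email can be delayed, and a locked order that stays unpaid for a while is cancelled automatically, so a third-party push SDK (方糖推送) was added to make sure the message reaches WeChat. The message contains the date, the time, and the doctor's information, and is delivered through a WeChat service account
  6. The code was written in 2023, so it also had to fill in an epidemiological survey form; that is no longer needed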
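
The monitoring in step 3 boils down to calling a check repeatedly until it succeeds. A bounded version of that loop could look like the sketch below; poll_until, its parameters, and fake_check are hypothetical names for illustration, and the original script instead loops without an attempt limit.

```python
import time


def poll_until(check, interval_seconds=3.0, max_attempts=None):
    """Call check() repeatedly until it returns True.

    Returns the number of attempts used, or None if max_attempts ran out.
    An exception raised by check() counts as a failed attempt.
    """
    attempt = 0
    while max_attempts is None or attempt < max_attempts:
        attempt += 1
        try:
            if check():
                return attempt
        except Exception as exc:
            # Report the error instead of hiding it, then keep polling
            print(f"attempt {attempt} failed: {exc}")
        time.sleep(interval_seconds)
    return None


# Hypothetical check that succeeds on the third call.
calls = {"count": 0}


def fake_check():
    calls["count"] += 1
    return calls["count"] >= 3


print(poll_until(fake_check, interval_seconds=0, max_attempts=10))
```

Keeping the interval at a few seconds, as the original does, also limits the load placed on the server.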
# July 5, 医大一: query 单忠艳's remaining slots; when the count is greater than 0, send an SMS through Tencent Cloud to the configured phone numbers
# -*- coding:utf-8 -*-

import time,urllib3,requests,re,datetime,smtplib
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
from tencentcloud.common import credential
from tencentcloud.common.exception.tencent_cloud_sdk_exception import TencentCloudSDKException
# Import the client and models of the product module
from tencentcloud.sms.v20210111 import sms_client, models
# Import the optional configuration classes
from tencentcloud.common.profile.client_profile import ClientProfile
from tencentcloud.common.profile.http_profile import HttpProfile


urllib3.disable_warnings()
header = {
        'Host': 'webapp.cmu1h.com',
        'Connection': 'keep-alive',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36 QBCore/4.0.1326.400 QQBrowser/9.0.2524.400 Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2875.116 Safari/537.36 NetType/WIFI MicroMessenger/7.0.20.1781(0x6700143B) WindowsWechat(0x63010200)',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        'Accept-Encoding': 'gzip, deflate',
        'Accept-Language': 'zh-CN,zh;q=0.8,en-US;q=0.6,en;q=0.5;q=0.4',
        'Origin': 'https://webapp.cmu1h.com',
        'X-Requested-With': 'XMLHttpRequest',
        'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
        'Referer': 'https://webapp.cmu1h.com'}

# # 1. Build the list of dates
# def date():
#     days_list=[]
#     today=datetime.datetime.now()
#     # yestday = datetime.datetime.now()+datetime.timedelta(days=-1)
#     one_day = datetime.timedelta(days=1)
#     days=0
#     while days<7:
#         today += one_day
#         days+=1
#         # weekday() returns 0 for Monday and 2 for Wednesday; the check below keeps only the Wednesdays within the next 7 days
#         if(today.weekday()==2):
#             print('正在查看的日期是:%s'%today.strftime('%Y-%m-%d'))
#             days_list.append(today.strftime('%Y-%m-%d'))
#         else:
#             continue
#     return days_list

# 2. Send the actual requests and trigger the notification
def search(day):
    session = requests.Session()
    # Get the schedule of every doctor for the whole day
    url_1='https://webapp.cmu1h.com/wehospital/opregister/getschedoclist'
    data_1={
        'departmentCode':departmentCode,
        'schDate':day,
        'clinicName':clinicName}
    resp=session.post(url=url_1,data=data_1,headers=header,verify=False)
    resp_all=resp.text.encode('utf-8').decode('unicode_escape')
    # print(resp_all)
    # The block below extracts each doctor's appointment information
    on_click_list=re.findall(re.compile('onclick="showTimeRange(.*?)"'),resp_all)
    # print(on_click_list)
    if(len(on_click_list)==0):
        print('医生当天没有号')
        # This return False is required: first confirm that some doctor has slots that day
        return False
    list_after=[]
    for i in range(len(on_click_list)):
        list_=on_click_list[i].replace('(','').replace(')','').replace(' ','').replace('&quot;&quot;','').split(',')
        list_after.append(list_)
    # print(list_after)    # list_after format: ["''", "'都镇先'", "'教授'", "''", "'门诊内分泌科'", "'2023-03-13'", "'星期一'", "'cCtsN0YxYlVKZWc9'", "'19.20'", "'b2VrPQ'"]
    # Iterate over the cleaned data to collect the target doctor's entries for that day; if both morning and afternoon can be booked, there are 2 entries
    list_final=[]
    for k in range(len(list_after)):
        if(list_after[k][1]=="'%s'"%search_doctor_name):
            list_final.append(list_after[k])
    # print(list_final)
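    if(len(list_final)==0):
        # Same idea as above: if some doctors have slots that day but the target doctor does not, return False so the caller polls again
        return False
    if(len(list_final)>0):
        if(am==True or len(list_final)==1):
            # Prefer the morning; with a single entry, take that entry to avoid an IndexError
            scheduleCode=list_final[0][7][1:-1]
            departCode=list_final[0][-1][1:-1]
            # print(scheduleCode)
        else:
            # Prefer the afternoon (the second entry)
            scheduleCode=list_final[1][7][1:-1]
            departCode = list_final[0][-1][1:-1]
        # Get the appointment information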
        url_5='https://webapp.cmu1h.com/wehospital/opregister/getscheduleinfo'
        headers_5={
            'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36 QBCore/4.0.1326.400 QQBrowser/9.0.2524.400 Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2875.116 Safari/537.36 NetType/WIFI MicroMessenger/7.0.20.1781(0x6700143B) WindowsWechat(0x63010200)',
            'Accept-Encoding':'gzip, deflate',
            'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
            'Host':'webapp.cmu1h.com',
            'Connection':'keep-alive',
            'Accept-Language':'zh-CN,zh;q=0.8,en-US;q=0.6,en;q=0.5;q=0.4',
            'Origin':'https://webapp.cmu1h.com',
            'Content-Type':'application/x-www-form-urlencoded; charset=UTF-8',
            'Referer':'https://webapp.cmu1h.com',
            'X-Requested-With':'XMLHttpRequest',
            'Cookie': 'PHPSESSID=%s'%cookie,
        }
        data_5 = {'scheduleCode': scheduleCode}
        resp_5 = requests.post(url=url_5, data=data_5, headers=headers_5, verify=False)
        resp_5=resp_5.text.encode('utf-8').decode('unicode_escape')
        # print(resp_5)
        timeRangeFlag = re.findall(re.compile('"timeRangeFlag":"(.*?)"'), resp_5)
        # print(type(timeRangeFlag), timeRangeFlag)
        if(timeRangeFlag[0]=='N'):
            print('%s:医生没有号了'%day)
        else:
            print('%s:医生有号'%day)
            # If a schedule exists, booking can proceed; without one, the code below does not run
            departmentName=re.findall(re.compile('"departmentName":"(.*?)"'),resp_5)
            doctorName=re.findall(re.compile('"doctorName":"(.*?)"'),resp_5)
            feeSum=re.findall(re.compile('"feeSum":"(.*?)"'),resp_5)
            schDate = re.findall(re.compile('"schDate":"(.*?)"'), resp_5)[0][:-6] # from "2023-03-14 周二 上午" keep "2023-03-14"
            rangeDesc_list=re.findall(re.compile('"rangeDesc":"(.*?)"'),resp_5) # a list; several available time ranges give several entries
            addTime=schDate[5:]+' '+rangeDesc_list[0][:5]
            timeRange=rangeDesc_list[0]
            # This time could be fixed in advance and supplied through a config file
            url_7='https://webapp.cmu1h.com/wehospital/opregister/confirmregister'
            data_7={
                'scheduleId':scheduleCode,
                'doctorName':doctorName[0],
                'departName':departmentName[0],
                'admTime':addTime,
                'fee':feeSum[0],
                'departCode':departCode,
                'timeRange': timeRange
            }
            print(data_7)
            resp_7=requests.get(url=url_7,headers=headers_5,params=data_7,verify=False)
            resp_7=resp_7.text.encode('utf-8').decode('unicode_escape')
            # print(resp_7)
            # Lock the order
            url_8='https://webapp.cmu1h.com/wehospital/opregister/lockorder'
            headers_8={
                'Host':'webapp.cmu1h.com',
                'Connection':'keep-alive',
                'Accept':'application/json, text/javascript, */*; q=0.01',
                'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36 NetType/WIFI MicroMessenger/7.0.20.1781(0x6700143B) WindowsWechat(0x6309001c) XWEB/6609',
                'Content-Type':'application/x-www-form-urlencoded; charset=UTF-8',
                'Origin':'https://webapp.cmu1h.com',
                'Referer':'https://webapp.cmu1h.com/wehospital/opregister/confirmregister',
                'Accept-Encoding':'gzip, deflate, br',
                'Accept-Language':'zh-CN,zh',
                'Cookie': 'PHPSESSID=%s'%cookie,
                'X-Requested-With':'XMLHttpRequest',
                'Sec-Fetch-Site':'same-origin',
                'Sec-Fetch-Mode':'cors',
                'Sec-Fetch-Dest':'empty'}
            data_8={
                'scheduleCode':scheduleCode,
                'timeRange':timeRange,
                'patientId':patientId}
            resp_8=requests.post(url=url_8,data=data_8,headers=headers_8,verify=False)
            print(data_8,resp_8)
            resp_8=resp_8.text.encode('utf-8').decode('unicode_escape')
            print(resp_8)
            code=re.findall(re.compile('"code":(.*?),'),resp_8)[0]
            # print(code)
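            if(code=='"0"'):
                # tradeNo = re.findall(re.compile('"tradeNo":"(.*?)"'), resp_7)[0]
                print('挂号成功,请立刻缴费。')
                # doctorName is a list from re.findall; sendMail adds the "的号" suffix itself
                sendMail(day, doctorName[0])
                # Skip the epidemiological survey submission for now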
                # url_9='https://file.jiankangle.com/form/form/formCtrl?BLHMI=saveResult'
                # headers_9={
                #     'Host':'file.jiankangle.com',
                #     'Connection':'keep-alive',
                #     'Accept':'application/json, text/javascript, */*; q=0.01',
                #     'X-Requested-With':'XMLHttpRequest',
                #     'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36 NetType/WIFI MicroMessenger/7.0.20.1781(0x6700143B) WindowsWechat(0x6309001c) XWEB/6609',
                #     'Content-Type':'application/x-www-form-urlencoded; charset=UTF-8',
                #     'Origin':'https://file.jiankangle.com',
                #     'Sec-Fetch-Site':'same-origin',
                #     'Sec-Fetch-Mode':'cors',
                #     'Sec-Fetch-Dest':'empty',
                #     'Referer':'https://file.jiankangle.com/form/answer/answerCtrl',
                #     'Accept-Encoding':'gzip, deflate, br',
                #     'Accept-Language':'zh-CN,zh',
                #     'Cookie': 'PHPSESSID=%s'%cookie
                # }
                # data_9={
                #     'params[planId]':None,
                #     'params[results]':'[{"quesId":"ff8080817b87b584017b952d9b6d1f37","itemId":"ff8080817b87b584017b952d9b721f39","itemName":"否","quesType":"A","thirdPartyId":""},{"quesId":"ff8080817b87b584017b952d9b0f1f19","content":"梁君","pluginId":"name","quesType":"C","dataType":""},{"quesId":"ff8080817b87b584017b95b53f996c54","content":"13066627983","quesType":"C","dataType":""},{"quesId":"ff8080817b87b584017b952d9b121f1a","content":"210603198701096517","pluginId":"cardno","quesType":"C","dataType":"none"},{"quesId":"ff808081802d7d7e01809352ce266200","content":"2023-03-15","quesType":"C","dataType":""},{"quesId":"ff80808180b33cee01811925a6520671","content":"002374692720230308","quesType":"C","dataType":""},{"quesId":"ff8080817ed9a436017edd453ee92a20","itemId":"ff8080817ed9a436017edd453ee92a20","content":"210000000000`!@^`210100000000`!@^`210105000000`!@^``!@^`                                        ","itemName":"{\"provinceId\":\"210000000000\",\"provinceName\":\"辽宁省\",\"cityId\":\"210100000000\",\"cityName\":\"沈阳市\",\"countryId\":\"210105000000\",\"countryName\":\"皇姑区\",\"townId\":\"\",\"addressDetail\":\"                                        \"}","quesType":"NA"}]',
                #     'params[openId]':None,
                #     'params[formId]':'ff8080817b87b584017b951cb72e0312',
                #     'params[partionId]':7,
                #     'params[partitionName]':'p7',
                #     'params[businessId]':'002374692720230308',
                #     'params[urlParams]':'formId=ff8080817b87b584017b951cb72e0312&departmentName=%E9%97%A8%E8%AF%8A%E5%86%85%E5%88%86%E6%B3%8C%E7%A7%91&sign=773FAB3757D02BEF92F8E7D8FBA0F43D&businessId=002374692720230308&idno=210603198701096517&nonceStr=Ve2JiGo2jYBhhEC8GqwgQaTapVKGcBJQ&content=UeS4HoqsVXpGihlEJB3tq%2BKC8FPFaTg6Mfsv%2F32tlksZzXgVyFPCKjRspjai0kCD8ZbN%2BcnFu7o6eS7E%2BEvqiDF5EAJTJr8sLRjA4%2BMWbHWOrBVE3OGI%2F05W7%2FWnVBcRdGJyMHksA6D%2FWbxmesPXfLvPtQLidRZOLDy6tXxVrwVYdC8PyGBCeNUXTVLbdrEbfy9Sc%2FgXAY38zQm5MDtdXUHh7%2FdIsTKwl%2BmbRIwHOqqxBYPQo%2BdKv0pfoDeNjv5P%2FJ4jijVnY6SAqdaSyeJkJM6rfLYoHNiK2l1IHCFh0f5wujHmbkeTgLQ27z%2Fsx%2B3pTSUKVW9D06U1qmkuRlolBQ%3D%3D&currentTime=20230308165750008&businessNo=20230308165715ZGOP7471&phone=13066627983&name=%E6%A2%81%E5%90%9B&admDate=2023-03-15&BLHMI=initForm',
                #     'params[callbackType]':None,
                #     'params[title]':'就诊预检分诊筛查表',
                #     'params[jumpType]':2,
                #     'params[answerModel]':1,
                #     'params[appendParam]':None,
                #     'params[conditionQuestionId]':None,
                #     'params[conditionQuestionAnswer]':None,
                #     'params[targetQuestionAnswer]':None,
                #     'params[callbackAsync]':None,
                #     'params[controllerBeginDate]':1678265882945,
                #     'params[controllerEndDate]':1678265882953,
                #     'params[pageBeginDate]':1678265883168,
                #     'params[pageEndDate]':1678265883204,
                #     'params[formSubmitDate]':1678265907325,
                #     'params[savedItemIdArr]':None,
                #     'params[allQuesIdStr]':'ff8080817b87b584017b952d9b061f18,ff8080817f9d6fb9017fde98fdde3a2a,ff8080817b87b584017b952d9b0f1f19,ff8080817b87b584017b95b53f996c54,ff8080817b87b584017b952d9b121f1a,ff8080817ed9a436017edd453ee92a20,ab8d86af6a3011ed90c0b026284380d0,ff8080817b87b584017b952d9b6d1f37,ff8080817c9db918017cdeba28347aaf,ff808081802d7d7e018045d9dffe1d7e,ff808081802d7d7e018045d9e0011d7f,ff808081802d7d7e01809352ce266200,ff80808180b33cee01811925a6520671,ff8080817b87b584017b9ff39d297d03,ff8080817b87b584017b9533b09826b4,ff8080817b87b584017b9533b09b26b5,ff8080817b87b584017b9533b09d26b6,ac94d8d2670c11ed90c0b026284380d0',
                #     'params[hasScoreFlag]':0
                # }
                # resp_9=requests.post(url=url_9,headers=headers_9,verify=False)
                # print(resp_9.text)
                return True
            elif(code=='"-110107"'):
                print('锁号失败:-110107,您有此科室医生待付费挂号记录,请付费!')
                # doctorName is a list from re.findall; sendMail adds the "的号" suffix itself
                sendMail(day, doctorName[0])
                return True
            elif(code=='"-110105"'):
                print("锁号失败:-110105,排班已经停诊或替诊")
                return False
            elif(code=='"404"'):
                print("锁号失败:404,绑卡信息错误,速度去公众号绑定就诊人")
                return True
            else:
                print('重新抓一次cookie,cookie是每天过期一次,过期后只能查看不能提交:%s'%code)
                return True

def sendMail(data,data2):
    username = '714@qq.com'
    password = 'dyzatvq'
    smtp_server='smtp.qq.com'

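    # As the name suggests, Multipart means the message has multiple parts
    msg = MIMEMultipart()
    msg['Subject'] = '医大一去公众号付款'  # email subject
    msg['From'] = username  # sender; use the login account so the address is valid
    msg_to = ['lang@l.com']
    msg['To'] = ','.join(msg_to)  # recipients; more than one can be added
    # msg['CC'] is the CC list and is optional; if set, add the CC addresses to the recipients passed to client.sendmail below
    # The text part below holds the email body
    puretext = MIMEText('%s日,%s的号————请尽快付款'%(data,data2),_charset='utf-8')
    msg.attach(puretext)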

    try:
        client = smtplib.SMTP()
        client.connect(smtp_server)  # connect to the sender's SMTP server
        client.login(username,password)  # log in to the sender's mailbox with the account and password
        client.sendmail(msg['From'], msg['To'].split(','), msg.as_string())
        client.quit()
        print('邮件发送成功!请去邮箱:'+','.join(msg_to)+' 查看')
    except smtplib.SMTPRecipientsRefused:
        print('Recipient refused')
    except smtplib.SMTPAuthenticationError:
        print('Auth error:去看看发件箱设置的授权码,重新弄一个,不然是不能发送邮件的')
    except smtplib.SMTPSenderRefused:
        print('Sender refused')
    except smtplib.SMTPException as e:
        print(e)

if __name__=="__main__":
    print('%s:开始遍历......'%datetime.datetime.now().strftime('%Y-%m-%d-%H:%M:%S'))
    cookie='4edhfk125nqn2b1h59qsb029ga'
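    am=True # am=True: prefer the morning slot
    search_doctor_name='胡凤楠'
    patientId='3248'
    date='2023-03-15'
    departmentCode='b3FWN1VnPT0'    # departmentCode: the code for the Heping campus (the campuses are Hunnan and Heping); use it as a fixed value
    clinicName='ZWs2dmc4aG04VnBBSVJxdmVJRk1QQnhh'   # clinicName: the code for the endocrinology department; also used later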
    search(date)
    number=0
    while True:
        number+=1
        try:
            time.sleep(3)
            print('执行第%s次,当前时间:%s,搜索中...'%(number,datetime.datetime.now().strftime('%Y-%m-%d-%H:%M:%S')))
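            # Stop when search returns True, which happens in 2 cases: the order was submitted successfully, or an unpaid order already exists
            if search(date):
                break
        except Exception as e:
            # Keep polling after an error, but print it so failures stay visible
            print(e)
            time.sleep(3)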

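# 4. Send an SMS
def send_tencent_message(content):
    try:
      # Required step:
      # Create a credential object from the Tencent Cloud key pair secretId and secretKey.
      # The commented-out alternative below reads them from environment variables, which must be set beforehand.
      # The key pair can also be hard-coded, but then take care not to copy, upload, or share the code,
      # so the key pair does not leak and put your assets at risk.
      # CAM key lookup: https://console.cloud.tencent.com/cam/capi
      cred = credential.Credential("AKIDUkocASSBfspWx70cy3qMgjV0DAr", "TgdJipgGYZKo0vZw2rPggPq1vXO")
      # cred = credential.Credential(
      #     os.environ.get(""),
      #     os.environ.get("")
      # )
      # Create an HTTP profile. Optional; skip it if there are no special requirements.
      httpProfile = HttpProfile()
      # To reach the API through a proxy, initialize the profile like this:
      # httpProfile = HttpProfile(proxy="http://username:password@proxy-ip:proxy-port")
      httpProfile.reqMethod = "POST"  # POST request (the default)
      httpProfile.reqTimeout = 30    # request timeout in seconds (default 60)
      httpProfile.endpoint = "sms.tencentcloudapi.com"  # endpoint domain (defaults to the nearest region)
      # Optional step:
      # Create a client profile, where settings such as the timeout can be specified
      clientProfile = ClientProfile()
      clientProfile.signMethod = "TC3-HMAC-SHA256"  # signature algorithm
      clientProfile.language = "en-US"
      clientProfile.httpProfile = httpProfile
      # Create the client object for the product being called (SMS here)
      # The second argument is the region; pass the string ap-guangzhou directly or use a predefined constant
      client = sms_client.SmsClient(cred, "ap-guangzhou", clientProfile)
      # Create a request object; set further request parameters as the API and your case require
      # The SDK source shows which attributes SendSmsRequest accepts
      # An attribute may be a basic type or may reference another data structure
      # An IDE is recommended, since it makes it easy to jump to the documentation of each API and data structure
      req = models.SendSmsRequest()
      # Helpful links:
      # SMS console: https://console.cloud.tencent.com/smsv2
      # sms helper: https://cloud.tencent.com/document/product/382/3773
      # SMS application ID: the SdkAppId generated after adding an application in the [SMS console], for example 1400006666
      req.SmsSdkAppId = "140054"
      # SMS signature: UTF-8 encoded; it must be an approved signature, which can be viewed in the [SMS console]
      req.SignName = "个人开发liangjun"
      # SMS extension code: disabled by default; contact [sms helper] to enable it
      req.ExtendCode = ""
      # User session content: can carry context such as a user-side ID; the server returns it unchanged
      req.SessionContext = "测试用123123"
      # Sender ID for international/Hong Kong, Macao, and Taiwan SMS: leave empty for mainland China; disabled by default, contact [sms helper] to enable it
      req.SenderId = ""
      # Destination phone numbers in E.164 format: +[country or region code][phone number]
      # For example +8613711112222, with a leading +, 86 as the country code, and 13711112222 as the phone number; at most 200 numbers
      # req.PhoneNumberSet = ["+861782429"]
      req.PhoneNumberSet = ["+861306663"]
      # Template ID: must be an approved template ID, which can be viewed in the [SMS console]
      req.TemplateId = "1024"
      # Template parameters: leave empty if the template has none
      req.TemplateParamSet = [content]
      # Call SendSms through the client object. The request method name matches the request object.
      # The returned resp is a SendSmsResponse instance that corresponds to the request object.
      resp = client.SendSms(req)
      # Print the response as a JSON string
      # print(resp.to_json_string(indent=2))
    except TencentCloudSDKException as err:
        print(err)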

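  Example 4: JD flash sale

# 2021-12-08: batch mode. Move the cookies.txt path and contents outside the class or into a separate class, load the cookies read from cookies.txt into a list, pass a cookie into the class as a parameter, and call it in a loop (the loop is too slow, so multithreading should be considered later; not for now)
# 2021-12-21: added multithreading combined with proxy IPs and debugged it successfully; the 熊猫 proxy IPs used this time rotate every 5 minutes
# 2022-05-16: the direct link was changed to the app's order API this time, which is much harder; could not get it working
# 2022-06-20: a friend's version has 2 order submission endpoints
# 2022-07-06: simplified the code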


# -*- coding:utf-8 -*-

import threading,chardet,sys,multiprocessing,warnings
from util import *
from timer import Timer

User_agents=[
    'Mozilla/5.0 (Linux; Android 8.0.0; SM-N9500 Build/R16NW; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/63.0.3239.83 Mobile Safari/537.36 T7/10.13 baiduboxapp/10.13.0.11 (Baidu; P1 8.0.0)',
    'Mozilla/5.0 (Linux; Android 8.1.0; vivo Y71A Build/OPM1.171019.011; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/63.0.3239.83 Mobile Safari/537.36 T7/10.13 baiduboxapp/10.13.0.11 (Baidu; P1 8.1.0)',
    'Mozilla/5.0 (Linux; Android 6.0.1; OPPO A57 Build/MMB29M; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/63.0.3239.83 Mobile Safari/537.36 T7/10.13 baiduboxapp/10.13.0.10 (Baidu; P1 6.0.1)',
    'Mozilla/5.0 (Linux; U; Android 8.0.0; zh-CN; MI 5 Build/OPR1.170623.032) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/57.0.2987.108 UCBrowser/11.8.9.969 Mobile Safari/537.36',
    'Mozilla/5.0 (Linux; Android 8.0.0; SM-G9650 Build/R16NW; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/63.0.3239.83 Mobile Safari/537.36 T7/10.13 baiduboxapp/10.13.0.11 (Baidu; P1 8.0.0)',
    'Mozilla/5.0 (Linux; U; Android 8.1.0; zh-CN; EML-AL00 Build/HUAWEIEML-AL00) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/57.0.2987.108 UCBrowser/11.9.4.974 UWS/2.13.1.48 Mobile Safari/537.36 AliApp(DingTalk/4.5.11) com.alibaba.android.rimet/10487439 Channel/227200 language/zh-CN',
    'Mozilla/5.0 (Linux; U; Android 8.0.0; zh-CN; MI 5 Build/OPR1.170623.032) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/57.0.2987.108 UCBrowser/11.8.9.969 Mobile Safari/537.36'
]

@retry(attempt=3)
def get_item_title():
    '''Get the product detail page title, e.g. for 100024868868: https://item.m.jd.com/product/100024868868.html'''
    time.sleep(2)
    url_itemname = 'https://item.m.jd.com/product/%s.html'%sku_id
    resp = requests.get(url=url_itemname, headers={
        'accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'accept-encoding': 'gzip, deflate',
        'accept-language': 'zh-CN,zh;q=0.9,en;q=0.8,zh-TW;q=0.7',
        'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36'
    }, verify=False)
    # print(resp.status_code,resp.text)
    try:
        if(resp.status_code == 200):
            rule = re.compile('<title>(.*?)</title>', re.DOTALL)
            title = re.findall(rule, resp.text)[0]
            # print('商品名称是:%s'%title)
            if(title=='多快好省,购物上京东'):
                raise Exception('重试')
            else:
                return title
        else:
            raise Exception('重试')
    except:
        raise Exception('重试')

@retry(attempt=3)
def get_miaosha_type():
    '''Get the product's flash sale type; product 100018466652, for example, uses rush-purchase mode'''
    # 'https://item.m.jd.com/product/100024918580.html'
    url_itemname = 'https://item.m.jd.com/product/%s.html'%sku_id
    resp = requests.get(url=url_itemname, verify=False)
    # print(resp.status_code,type(resp.status_code))
    # print(resp.text)
    try:
        if (resp.status_code == 200):
            rule = re.compile('"isKO":"(.*?)"', re.DOTALL)
            miaosha_type = re.findall(rule, resp.text)[0]
            # print(miaosha_type)
            if(miaosha_type == '1'):
                return '抢购模式'
            elif(miaosha_type == '0'):
                return '加车秒杀'
        else:
            raise Exception('get_miaosha_type网络错误,重新获取')
    except:
        return '加车秒杀'

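# Child class: once the required parameters are available, it runs the main flow
class Son_data(object):
    """Initialization"""
    def __init__(self, cookie, *ips):
        # Input parameters
        self.cookie = cookie  # the cookie passed in
        self.ips=get_ips(ips)    # the proxy passed in (must be a single proxy IP)
        # Values taken from the system or from defaults
        self.path = os.getcwd()     # current working directory
        self.areaNo=global_config.getRaw('config','areaNo')     # monitored area
        self.traceId = None
        self.token2 = None
        self.nickName = None
        self.totalPrice=None
        self.dealId =None
        # Values computed at run time
        self.ns = get_time_stamp()  # 19-digit timestamp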

    def get_headers(self):
        return {
            'user-agent': random.choice(User_agents),
            'accept': '*/*',
            'accept-encoding': 'gzip, deflate',
            'accept-language': 'zh-CN,zh;q=0.9',
            'Connection': 'keep-alive',
            'cookie': self.cookie}

    def session(self):
        """Session that keeps the login state"""
        session = requests.session()
        session.headers = self.get_headers()
        return session

    @retry(attempt=3)
    def verify_cookies(self):
        """Check whether the cookies are still valid"""
        url = 'https://wq.jd.com/deal/recvaddr/getrecvaddrlistV3?adid=&locationid=undefined&callback=cbLoadAddressListA&reg=1&r=0.40773966318157595&sceneval=2'
        resp = self.session().get(url, headers={'referer': 'https://wqs.jd.com/'},verify=False)
        try:
            resp = resp.content
            # Detect the encoding and decode the bytes so Chinese text is readable
            encoding = chardet.detect(resp).get('encoding')
            html = resp.decode(encoding, 'ignore')
            html = html[19:-1]
            html_json = json.loads(html)
            # print(html_json)
            if(html_json['errCode'] == '0'):
                if (html_json['list'] != []):
                    for i in range(len(html_json['list'])):
                        # Read each address entry and log it; the check below accepts both default ('1') and non-default ('0') addresses
                        if(html_json['list'][i]['default_address'] == '1' or html_json['list'][i]['default_address']=='0'):
                            logger.info('ck有效,姓名:{},电话:{},默认地址:{}'.format(html_json['list'][i]['name'],html_json['list'][i]['mobile'],html_json['list'][i]['addrfull']))
                            return True
                elif(html_json['list'] == []):
                    logger.info('ck有效,没有默认地址')
                    return True
            else:
                return False
        except Exception as e:
            print('verify_cookies接口报错,内容是%s'%e)
            raise Exception('verify_cookies接口重新调用')

    # @retry(attempt=3)
    def make_reserve(self):
        # 2022-05-23: rewritten using a friend's code
        url_reserve='https://wq.jd.com/bases/yuyue/item?callback=subscribeItemCBA&dataType=1&skuId=%s&sceneval=2'%sku_id
        headers={
            # 'User-Agent':get_random_useragent(),
            'Accept-Encoding':'gzip, deflate','Accept':'*/*','Connection':'keep-alive','Host':'wq.jd.com',
            'Referer':'https://wqs.jd.com/item/yuyue_item.shtml','Accept-Language':'zh-CN,en-US;q=0.9'}
        response = self.session().get(url=url_reserve, headers=headers, verify=False)
        response_text=response.text
        # print(response.text)
        try:
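            rule_replyCode = re.compile('"replyCode":"(.*?)",', re.DOTALL)
            replyCode = re.findall(rule_replyCode,response_text)[0]
            # Compare the code directly: str(replyCode=='6') is always a non-empty, truthy string,
            # so the first branch would run no matter what the server answered
            if(replyCode=='6'):
                print('Reservation succeeded')
            elif(replyCode=='9'):
                print('Already reserved; no need to reserve again')
            elif(replyCode=='0'):
                print('No reservation required')
            return True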
        except Exception as e:
            print('make_reserve报错,报错内容是:%s'%e)
            return True

    # @retry(attempt=3)
    def clear_shop_list(self):
        print('开始清车')
        try:
            # 购物车商品
            url = 'https://p.m.jd.com/cart/cart.action?fromnav=1&sceneval=2&jxsid='
            headers = {
                'User-Agent': random.choice(USER_AGENTS),  # 这个请求头不能省,因为它是用requests请求,不是session
                'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
                'accept-encoding': 'gzip, deflate',
                'accept-language': 'zh-CN,zh;q=0.9',
                'referer': 'https://home.m.jd.com/',
                'cookie': self.cookie}
            # print('清空购物车的请求头是:%s'%headers)
            # 2月22日更新接口
            params='body={"tenantCode":"jgm","bizModelCode":"1","bizModeClientType":"M","externalLoginType":1,"platform":3,"pingouchannel":0,"commlist":"100034568620,,1,100034568620,1,,0,skuUuid:F1v2oh1185165804037988352@@useUuid:0","type":0,"checked":0,"locationid":"8-560-50826-129211","templete":1,"reg":1,"scene":0,"version":"20190418","traceid":"","sceneval":"2"}&loginType=2&loginWQBiz=golden-trade&appid=m_core&platform=3&functionId=deal_mshopcart_rmvcmdy_m&uuid=25871523143562157&osVersion=&screen=jdm&d_brand=&d_model=&lang=zh_CN&h5st=20230222102558973%3B3130393825006976%3B59365%3Btk03w96ba1b4f18ngM5iATWaas4aBW_HnXiuti5TLDS3UGgU8vyl8WCJbkJ1I03jgID8VRePBdcZGDd7VRcpI35-TkYn%3B01bb171dcd9d840d8bf6102ae6ecb431e29355611b17005324f8442aec5168bc%3B3.1%3B1677032758973%3B62f4d401ae05799f14989d31956d3c5f66ef4d74f9d9cbd1eef382b56d3f491a7a9a01cd1dac9cffb3a89f6bdb2329477f669cadb6e8f0a4fa84e1b1f44263d1e8c2044a226526ebdb5a3d897039a8a8654b484399019354b7dc913a5d9e79705194b1f8321e40efdc8dea9f3ef8c61edc69f5c6db1be5d55dbd7fd3e3a20090'
            response = self.session().get(url=url, headers=headers, params=params,verify=False)
            response_text=response.text
            # print(response_text)
            # 如果有这个值,才算拿到了购物车的信息
            rule_errId = re.compile('"errId":"(.*?)",', re.DOTALL)
            # 如果判断存在这个错误id
            errId = re.findall(rule_errId, response_text)[0]
            if(str(errId)=='0'):
                rule_id = re.compile('"id":"(.*?)","name"', re.DOTALL)
                rule_skuUuid = re.compile('"skuUuid":"(.*?)",', re.DOTALL)
                itemId_list = re.findall(rule_id, response_text)
                skuUuid_list = re.findall(rule_skuUuid, response_text)
                # 6月10日更新,不要这块代码,写出来也没什么太大意思,还容易匹配不到
                # rule_name=re.compile('"name":"(.*?)","num"', re.DOTALL)
                # name_list=re.findall(rule_name, response_text)
                # print(itemId_list,name_list,skuUuid_list)
                print('购物车有%s件商品'%len(skuUuid_list))
                # print('*' * 100)
                for i in range(len(skuUuid_list)):
                    time.sleep(2)
                    # print('正在删除第%s件购物车商品'%(i+1))
                    # 清空购物车接口,post
                    url_qingkong = 'https://api.m.jd.com/client.action/deal/mshopcart/rmvcmdy/m?sceneval=2&g_login_type=1&g_ty=ajax'
                    form_data = {
                        'body': '{"tenantCode":"jgm","bizModelCode":"1","bizModeClientType":"M","externalLoginType":1,"platform":3,"pingouchannel":0,"commlist":"%s,,1,%s,11,,0,skuUuid:%s@@useUuid:0","type":0,"checked":0,"locationid":"1-72-2819-0","templete":1,"reg":1,"scene":0,"version":"20190418","traceid":"","sceneval":"2"}' % (
                        itemId_list[i],itemId_list[i],skuUuid_list[i]),
                        'loginType': 2,
                        'loginWQBiz': 'golden-trade',
                        'appid': 'm_core',
                        'platform': 3,
                        'functionId': 'deal_mshopcart_rmvcmdy_m'}
                    headers_2 = {
                        'accept': 'application/json',
                        'accept-encoding': 'gzip, deflate',
                        'accept-language': 'zh-CN,zh;q=0.9,en;q=0.8,zh-TW;q=0.7',
                        'content-type': 'application/x-www-form-urlencoded',
                        'cookie': self.cookie,
                        'origin': 'https://p.m.jd.com',
                        'referer': 'https://p.m.jd.com/'}
                    response = self.session().post(url=url_qingkong, headers=headers_2, data=form_data,verify=False)
                    # print(response.text)
                    rule_errId=re.compile('"errId":"(.*?)",', re.DOTALL)
                    errId=re.findall(rule_errId,response.text)
                    # print(response.text)
                    if(errId[0]=='0'):
                        print('购物车清空完成')
                    else:
                        print('购物车清空有错误')
            else:
                raise Exception('clear_shop_list接口有报错,没有拿到正确的返回值,需重试')
        except Exception as e:
            raise Exception('clear_shop_list接口有报错,报错内容是:%s'%e)

    # 购物车列表
    # @retry(attempt=3)
    def check_shop_list(self):
        url_shop_list='https://p.m.jd.com/cart/cart.action?fromnav=1'
        response=self.session().get(url=url_shop_list,headers={'referer':'https://wqs.jd.com/'},verify=False)
        response_text=response.text
        # print(response.text)
        try:
            # 购物车,商品id
            rule_itemId=re.compile('"itemId":"(.*?)",',re.DOTALL)
            itemId=re.findall(rule_itemId,response_text)
            # 商品名称
            rule_name=re.compile('"name":"(.*?)","image"',re.DOTALL)
            name=re.findall(rule_name,response_text)
            # 加车数量
            rule_num=re.compile('"num":"(.*?)","price"',re.DOTALL)
            num=re.findall(rule_num,response_text)
            for i in range(len(itemId)):
                #print(itemId[i],num[i],name[i])
                print('购物车商品id:%s,商品名:%s,已加车数量:%s'%(itemId[i],name[i],num[i]))
            if(sku_id in response_text):
                # logger.info('已加入购物车')
                print('已加入购物车')
            else:
                print('购物车里没有加入config.ini配置的商品')
                # print(response_text)
        except Exception as e:
            print('shop_list接口报错:%s'%e)

    def get_traceId(self):
        # time.sleep(2)
        """调用商品详情接口"""
        # https://p.m.jd.com/norder/order.action?wareId=100033858324&wareNum=1&enterOrder=true'
        # https://wqs.jd.com/order/m.confirm.shtml?bid=&scene=jd&isCanEdit=1&EncryptInfo=&Token=&commlist=100024868868,,1,100024868868,1,0,0&locationid=8-560-50819-63285&type=0&lg=0&supm=0&v=&sceneval=2&ufc=&wareId=100024868868&wareNum=1&enterOrder=true#/index
        # 2022年5月17日,抓包好朋友拿到的内容来写
        headers = {
            # 'user-agent': 'Mozilla/5.0 (Linux; U; Android 8.1.0; zh-CN; EML-AL00 Build/HUAWEIEML-AL00) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/57.0.2987.108 UCBrowser/11.9.4.974 UWS/2.13.1.48 Mobile Safari/537.36 AliApp(DingTalk/4.5.11) com.alibaba.android.rimet/10487439 Channel/227200 language/zh-CN',
            'user-agent':random.choice(User_agents),
            'referer': 'https://item.m.jd.com/','cookie':self.cookie,
            'accept-language': 'zh-CN,en-US;q=0.9','x-requested-with':'XMLHttpRequest',
            'accept-encoding': 'gzip, deflate','connection':'keep-alive','host':'wq.jd.com',
            'accept': '*/*'}
        # print(headers)
        # url_mshopcart='https://wq.jd.com/deal/mshopcart/addcmdy?callback=addCartCBA&sceneval=2&reg=1&scene=2&type=0&commlist=100006473216,,1,100006473216,1,0,0&locationid=1-72-2819-0&t=0.16527778402022267'
        url_sku = 'https://wq.jd.com/deal/mshopcart/addcmdy?callback=addCartCBA&sceneval=2&reg=1&scene=2&type=0&commlist={},,{},{},1,0,0&locationid={}&t={}'.format(sku_id,global_config.getRaw('config','number_traceId'),sku_id,self.areaNo,time.time()/10000000000)
        # '%s'%(time.time()/10000000000)
        response = self.session().get(url=url_sku, headers=headers,verify=False)
        response_text = response.text
        # print(response_text)
        errId='0'
        errMsg=''
        try:
            rule_errId = re.compile('"errId":"(.*?)",', re.DOTALL)
            rule_errMsg = re.compile('"errMsg":"(.*?)",', re.DOTALL)
            errId = re.findall(rule_errId, response_text)[0]
            errMsg = re.findall(rule_errMsg, response_text)[0]
            if('traceId' in response_text):
                # print('123')
                rule_traceId = re.compile('"traceId":"(.*?)",', re.DOTALL)
                self.traceId = re.findall(rule_traceId, response_text)[0]
                # rule_token2 = re.compile('"token2":"(.*?)",', re.DOTALL)
                # self.token2 = re.findall(rule_token2, response_text)[0]
                rule_nickName = re.compile('"pin":"(.*?)",', re.DOTALL)
                self.nickName = re.findall(rule_nickName, response_text)[0]
                # print(self.traceId,self.token2,self.nickName)
                return True
            elif('抱歉,您购买的商品为抢购商品,请返回商品详情页面重新购买' in response_text):
                print('抢购商品,脚本停止')
                return False
            else:
                return False
        except Exception as e:
            print('errId:%s,errMsg:%s'%(errId,errMsg))
            return False

    # 提交订单,5月16日用app端口
    def submit_seckill_order(self):
        """Submit the flash-sale (seckill) order.

        Two parameters are required, uuid and token: uuid is taken from the
        cookie (ck) after conversion, token is obtained from an API call.
        """
        headers = {
            'referer':'https://wq.jd.com/deal/confirmorder/main?wdref=https://p.m.jd.com/cart/cart.action&t=%s&sceneval=2'%(int(str(int(time.time()))+'000')),
            'User-Agent': random.choice(User_agents),
            'accept': '*/*',
            'accept-encoding': 'gzip, deflate','connection':'keep-alive','host':'wq.jd.com',
            'accept-language': 'zh-CN,en-US;q=0.9'}
        url_msubmit,data_submit='',''
        a = random.random()
        if(a<0.5):
        # if(global_config.getRaw('config','type_submit')=='1'):
            url_msubmit='https://wq.jd.com/deal/msubmit/confirm'
            data_submit={
                'paytype':'0','paychannel':'1','action':'0','reg':'1','type':'0','token2':None,'dpid':'',
                'skulist':sku_id,'scan_orig':None,'gpolicy':None,'platprice':'0','ship':'0','pick':None,
                'savepayship':'0','sceneval':'2','r':'%s'%(time.time()/10000000000),'callback':'confirmCbA',
                'traceid':self.traceId}
            # print(data_submit)
        elif(a>=0.5):
        # elif(global_config.getRaw('config','type_submit')=='2'):
            url_msubmit='https://m.jingxi.com/deal/msubmit/confirm?'
            data_submit={
                'paytype':0,'paychannel':2,'action':1,'reg':1,'type':0,'token2':None,'dpid':None,
                'skulist':sku_id,'scan_orig':None,'gpolicy':None,'platprice':0,'ship':None,'pick':None,
                'savepayship':0,'valuableskus':'%s,1,990,1590'%sku_id,'tuanfull':1,
                'commlist':'%s,,1,%s,1,0,0'%(sku_id,sku_id),'bizcode':None,'canpintuan':None,
                'setdefcoupon':0,'r':'%s'%(time.time()/10000000000),'callback':'confirmCbA',
                'traceid':'%s'%self.traceId,'sceneval':2}
        # print(url_msubmit,data_submit)
        response = ''
        # Flash-sale start time, formatted with millisecond precision
        time_start = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S.%f')[:-3] + 'ms'
        try:
            if(global_config.getRaw('config', 'type_daili')=='2'):
                response = self.session().post(url=url_msubmit,params=data_submit,headers=headers,verify=False)
            elif(self.ips):
                print('正在使用代理ip:%s' %self.ips)
                proxies = {'http': 'http://{}'.format(self.ips), 'https': 'https://{}'.format(self.ips)}
                # print(proxies)
                response = self.session().post(url=url_msubmit, params=data_submit, headers=headers, proxies=proxies,verify=False)
        except Exception as e:
            print('submit_seckill_order接口提交订单时报错,内容是:%s'%e)
            pass
        '''
         try {confirmCbA({"errId":"0","errMsg":"","encryptCode":"","nextUrl":"","idc":"","traceId":"1068008284506819609","outOfStock":[],"rmInvalidSku":[],"resultCode":"","pin":"jd_vDyNxDeHkNnZ","appid":"wxae3e8056daea8727","dealId":"245773887406","totalPrice":"111600","ordeType":"112","callBackUrl":"","sucPopSrc":"","sucPopGray":"","limitedskuinfo":[],"riskResult":"","phoneNumber":"","uuid":"","cancelscaleskus":[],"commonstocksku":[],"limitedbuyskus":[]})}catch (e){if (window.confirm_badJs) {window.confirm_badJs(e)}}
        '''
        try:
            # 判断如果response是字符串,有可能压根就没返回正确的结果
            if(isinstance(response,str)):
                print('抢购失败,返回值是:%s'%response)
            else:
                # print('正确的类型是:%s'%(type(response.text)))
                if(response.status_code==200):
            # if(response.status_code == 200):
                    # print('response.text的类型是什么:%s'%response.text)
                    if("errId" in response.text):
                        result = response.text
                        rule = re.compile('"errId":"(.*?)",', re.DOTALL)
                        rule1 = re.compile('"errMsg":"(.*?)",', re.DOTALL)
                        rule_pin=re.compile('"pin":"(.*?)",', re.DOTALL)
                        # 如果判断存在这个错误id
                        errId = re.findall(rule, result)[0]
                        errMsg = re.findall(rule1, result)[0]
                        pin=re.findall(rule_pin, result)[0]
                        if(errId == '0'):
                            """抢购成功"""
                            rule2 = re.compile('"dealId":"(.*?)",', re.DOTALL)
                            rule3 = re.compile('"totalPrice":"(.*?)",', re.DOTALL)
                            dealId = re.findall(rule2, result)[0]
                            totalPrice = re.findall(rule3, result)[0]
                            totalPrice = (int(totalPrice)) / 100
                            time_end = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S.%f')[:-3] + 'ms'
                            logger.info('抢购成功,%s——%s,账号id:%s,商品id:%s,商品名:%s,订单号:%s,总价:%s元'%(time_start,time_end,pin,sku_id, sku_name, dealId, totalPrice))
                            # SCT62780TAOXMQTATDMwHL2CoNIIDxMok
                            # send_wechat('订单号:%s,价格:%s元'%(dealId,totalPrice))
                            with open(os.path.join(self.path, '5.成功记录.txt'), 'a', encoding='utf8') as file:
                                file.write('时间:%s,账号:%s,商品id:%s,商品名:%s,订单号%s,总价:%s' % (time_end, pin, sku_id, sku_name, dealId, totalPrice) + '\n')
                                file.write(pin+'--------' + self.cookie + '\n')
                                file.write('*' * 90 + '\n')
                            # 邮件功能迁移
                            url = 'https://item.m.jd.com/product/%s.html'%sku_id
                            # 根据地区id发送邮件
                            if(global_config.getRaw('config', 'areaNo') == '8_560_50826_52069'):
                                key_fangtang='SCT62780TAOXMQTATDMwHL2CoNIIDxMok'
                                title_fangtang=totalPrice
                                content_fangtang='商品名:%s:\n商品id:%s\n订单编号:%s\n账号名:%s\nck:%s\n'%(sku_name,sku_id,dealId,pin,self.cookie)
                                msg_push(title_fangtang,content_fangtang,key_fangtang)
                            else:
                                key_fangtang=global_config.getRaw('config','key_fangtang')
                                title_fangtang=totalPrice
                                content_fangtang='商品名:%s:\n商品id:%s\n订单编号:%s\n账号名:%s\nck:%s\n'%(sku_name,sku_id,dealId,pin,self.cookie)
                                msg_push(title_fangtang,content_fangtang,key_fangtang)
                            # 6月20日新增功能,调整2个列表
                            cookies_list.remove(self.cookie)
                            cookies_success.append(self.cookie)
                        else:
                            # Other errors returned for a failed order
                            time_end = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S.%f')[:-3] + 'ms'
                            # logger.info('抢购失败,报错内容:%s'%errMsg)
                            logger.info('%s——%s,抢购失败,报错内容:%s'%(time_start,time_end,errMsg))
                            if('活动每天限量1件,今日已售完' in errMsg):
                                return False
                    else:
                        time_end = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S.%f')[:-3] + 'ms'
                        # print('errorId 不在resonse.text里面,那到底有什么%s' % response.text)
                        logger.info('%s——%s,errorId 不在resonse.text里面,那到底有什么%s'%(time_start,time_end,response.text))
                else:
                    logger.info('返回值是:%s'%response)
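        except Exception as e:
            # Log the exception itself; response may still be a plain str here, which has no .text attribute
            logger.info('submit_seckill_order failed while parsing the response: %s, response: %s'%(e,response))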

'''
无货:100009579789
云神价的接口,只能查到有没有货,查不到有几件货

https://pe.3.cn/prices/mgets?origin=4&pin=&pdpin=jd_nlpxFrTbEiNn&skuids=100009579789&area=8_560_50826_52069

https://wq.jd.com/commodity/skudescribe/get?callback=reaStockAnPriceCbA&command=3&source=wqm_search&priceinfo=1&buynums=1&skus=100009579789&area=8_560_50826_52069
'''
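
# Illustrative sketch, not part of the original script: timestamps with an "ms"
# suffix appear in several log lines of the order flow. Assuming only the standard
# library, a hypothetical helper such as format_ms() could build them in one place
# and always yield exactly three millisecond digits.
import datetime

def format_ms(moment):
    """Format a datetime as 'YYYY-MM-DD HH:MM:SS.mmm' followed by 'ms'."""
    # %f always expands to six digits, so dropping the last three leaves milliseconds
    return moment.strftime('%Y-%m-%d %H:%M:%S.%f')[:-3] + 'ms'

# Possible usage: time_start = format_ms(datetime.datetime.now())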

# 6月6日研究库存监控
# @retry(attempt=3)
def check(*ips):
    # 柠檬助手接口1,好像接口2差别不大
    url='https://wq.jd.com/itemv3/h5draw?sku=%s&isJson=1&source=h5v3&g_login_type=0&g_tk=1946297207&g_ty=ajax'%(global_config.getRaw('config','sku_id'))
    headers={
        'Accept-Language':'zh-cn,zh;q=0.5',
        'Accept-Charset':'utf8',
        'Cookie':'cid=3;wq_addr=5929863146%7C{}%7C%u5185%u8499%u53E4_%u547C%u548C%u6D69%u7279%u5E02_%u548C%u6797%u683C%u5C14%u53BF_%u76DB%u4E50%u7ECF%u6D4E%u5DE5%u4E1A%u56ED%u533A%7C%u5185%u8499%u53E4%u547C%u548C%u6D69%u7279%u5E02%u548C%u6797%u683C%u5C14%u53BF%u76DB%u4E50%u7ECF%u6D4E%u5DE5%u4E1A%u56ED%u533A%u80DC%u5229%u8DEF%u4E0E%u65B0%u6C11%u8857%u4EA4%u53C9%u53E3%u897F%u5357%u4FA7VV%20Bar%7C111.775848%2C40.491539'.format(global_config.getRaw('config','areaNo')),
        # 'Cookie':'cid=3;wq_addr=5929863146|19_1657_4080|内蒙古_呼和浩特市_和林格尔县_盛乐经济工业园区|内蒙古呼和浩特市和林格尔县盛乐经济工业园区胜利路与新民街交叉口西南侧VV Bar|111.775848,40.491539',
        'User-Agent':'Dalvik/2.1.0 (Linux; U; Android 5.1.1; mx5 Build/LYZ28N)',
        'Host':'wq.jd.com','Connection':'Keep-Alive','Accept-Encoding':'gzip'}
    proxies = {'http': 'http://{}'.format(ips[0]), 'https': 'https://{}'.format(ips[0])}
    # print(headers)
    try:
        if(len(cookies_list)==0):
            print('当前没有可用的账号,停止监控')
            os.system('pause')

        response=''
        # try:
        # 6月8日新增需求,添加一个开关,选择用本地ip监控还是代理ip监控
        if(global_config.getRaw('config','monitor')=='1'):
            response = requests.get(url=url,headers=headers,proxies=proxies,verify=False,timeout=(2,6))
        elif(global_config.getRaw('config','monitor')=='2'):
            response = requests.get(url=url,headers=headers,verify=False,timeout=(2,6))
        response_text = response.text
        # print('返回值是:%s'%response_text)
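        # Check the delisted notice first: such a page may not contain a stockState field at all
        if('抱歉,该商品已下架' in response_text):
            # Callers compare against this exact string, so it is kept verbatim
            return '商品还未上架'
        else:
            stockState = re_find('"hasStock":.*,"stockState":(.*?),"skuType"', response_text)[0]
            jdPrice = re_find('"jdPrice":"(.*?)",', response_text)[0]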
            # print(jdPrice)
            # 如果小于监控价格再下单,大于就直接抛出去
            if(float(jdPrice)<float(global_config.getRaw('config','jdPrice'))):
                pass
            else:
                # 2022年7月15日增加,如果当前时间大于秒杀时间,过了多少秒之后的那种,再停止
                # local_time=int(round(time.time() * 1000))
                # buy_time=datetime.datetime.strptime(global_config.getRaw('config','buy_time'),"%Y-%m-%d %H:%M:%S.%f")
                # buy_time_ms = int(time.mktime(buy_time.timetuple()) * 1000.0 + buy_time.microsecond / 1000)
                # print('aaaaaaaaaaaaa',local_time-buy_time_ms)
                # if(local_time-buy_time_ms>100):
                logger.info('监控价格大于预设价格,即将停止,现在价格是:%s,监控价格是:%s'%(jdPrice,global_config.getRaw('config','jdPrice')))
                if(global_config.getRaw('config', 'areaNo') == '8_560_50826_52069'):
                    key_fangtang = 'SCT62780TAOXMQTATDMwHL2CoNIIDxMok'
                    title_fangtang = '监控价格变成原价'
                    content_fangtang = '停止监控'
                    msg_push(title_fangtang, content_fangtang, key_fangtang)
                else:
                    # key_fangtang = 'SCT159621Ti5YeIFT8CteUL3ED2PHx5mMg'
                    key_fangtang = global_config.getRaw('config', 'key_fangtang')
                    title_fangtang = '监控价格变成原价'
                    content_fangtang = '停止监控'
                    msg_push(title_fangtang, content_fangtang, key_fangtang)
                return 'quit'

            # 判断有没有货
            if(stockState=='33' or stockState=='"现货"' or stockState=='"有货"'):
                return True
            elif(stockState=='34' or stockState=='"无货"'):
                # print(response_text)
                return False
            elif('对不起,该商品不存在' in response_text):
                logger.info('对不起,商品不存在,拿不到信息')
                return 'quit'
            elif(stockState=='36'):
                logger.info('已下架或不支持售卖')
                return 'quit'
            elif(stockState=='40'):
                logger.info('状态40,该地址在该区域暂不支持销售')
                return False
            else:
                logger.info('其他返回值:%s'%stockState)
    except requests.exceptions.RequestException as e:
        # Note: do not change this return value; later code compares against this exact string
        print('error:%s'%e)
        return 'ip错误'
    except Exception as e:
        # Any other failure (for example a field missing from the response) is reported the same way
        return 'ip错误'

# 6月6日研究库存监控
# @retry(attempt=3)
def check_noip():
    # 柠檬助手接口1,好像接口2差别不大
    url='https://wq.jd.com/itemv3/h5draw?sku=%s&isJson=1&source=h5v3&g_login_type=0&g_tk=1946297207&g_ty=ajax'%(global_config.getRaw('config','sku_id'))
    headers={
        'Accept-Language':'zh-cn,zh;q=0.5',
        'Accept-Charset':'utf8',
        'Cookie':'cid=3;wq_addr=5929863146%7C{}%7C%u5185%u8499%u53E4_%u547C%u548C%u6D69%u7279%u5E02_%u548C%u6797%u683C%u5C14%u53BF_%u76DB%u4E50%u7ECF%u6D4E%u5DE5%u4E1A%u56ED%u533A%7C%u5185%u8499%u53E4%u547C%u548C%u6D69%u7279%u5E02%u548C%u6797%u683C%u5C14%u53BF%u76DB%u4E50%u7ECF%u6D4E%u5DE5%u4E1A%u56ED%u533A%u80DC%u5229%u8DEF%u4E0E%u65B0%u6C11%u8857%u4EA4%u53C9%u53E3%u897F%u5357%u4FA7VV%20Bar%7C111.775848%2C40.491539'.format(global_config.getRaw('config','areaNo')),
        # 'Cookie':'cid=3;wq_addr=5929863146|19_1657_4080|内蒙古_呼和浩特市_和林格尔县_盛乐经济工业园区|内蒙古呼和浩特市和林格尔县盛乐经济工业园区胜利路与新民街交叉口西南侧VV Bar|111.775848,40.491539',
        'User-Agent':'Dalvik/2.1.0 (Linux; U; Android 5.1.1; mx5 Build/LYZ28N)',
        'Host':'wq.jd.com','Connection':'Keep-Alive','Accept-Encoding':'gzip'}
    try:
        if(len(cookies_list)==0):
            print('当前没有可用的账号,停止监控')
            os.system('pause')
        response = requests.get(url=url,headers=headers,verify=False,timeout=(2,6))
        response_text = response.text
        # print(response_text)
        # Check the delisted notice first: such a page may not contain a stockState field at all
        if('抱歉,该商品已下架' in response_text):
            # Callers compare against this exact string, so it is kept verbatim
            return '商品还未上架'
        else:
            stockState = re_find('"hasStock":.*,"stockState":(.*?),"skuType"', response_text)[0]
            jdPrice = re_find('"jdPrice":"(.*?)",', response_text)[0]
            # print(jdPrice)
            # 如果小于监控价格再下单,大于就直接抛出去
            if(float(jdPrice)<float(global_config.getRaw('config','jdPrice'))):
                pass
            else:
                logger.info('监控价格大于预设价格,即将停止,现在价格是:%s,监控价格是:%s'%(jdPrice,global_config.getRaw('config','jdPrice')))
                if(global_config.getRaw('config', 'areaNo') == '8_560_50826_52069'):
                    key_fangtang = 'SCT62780TAOXMQTATDMwHL2CoNIIDxMok'
                    title_fangtang = '监控价格变成原价'
                    content_fangtang = '停止监控'
                    msg_push(title_fangtang, content_fangtang, key_fangtang)
                else:
                    # key_fangtang = 'SCT159621Ti5YeIFT8CteUL3ED2PHx5mMg'
                    key_fangtang = global_config.getRaw('config', 'key_fangtang')
                    title_fangtang = '监控价格变成原价'
                    content_fangtang = '停止监控'
                    msg_push(title_fangtang, content_fangtang, key_fangtang)
                return 'quit'

            # 判断有没有货
            if(stockState=='33' or stockState=='"现货"' or stockState=='"有货"'):
                return True
            elif(stockState=='34' or stockState=='"无货"'):
                return False
            elif('对不起,该商品不存在' in response_text):
                logger.info('对不起,商品不存在,拿不到信息')
                return 'quit'
            elif(stockState=='36'):
                logger.info('状态36,已下架或不支持售卖')
                return 'quit'
            elif(stockState=='40'):
                logger.info('状态40,该地址在该区域暂不支持销售')
                return False
            else:
                logger.info('其他返回值:%s'%stockState)
    except requests.exceptions.RequestException as e:
        # 注意,这个return不能改,改了就报错,因为后面都用这个字段来判断
        print('error:%s'%e)
        # return 'ip错误'
        pass
    except Exception as e:
        # logger.info('check接口其他错误',response_text)
        pass
        # return 'ip错误'
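
# Illustrative sketch, not part of the original script: check() and check_noip()
# repeat the same stockState decision chain. Assuming the codes mean what the log
# messages above say, a hypothetical lookup table could hold that mapping once.
STOCK_STATE_RESULTS = {
    '33': True, '"现货"': True, '"有货"': True,   # in stock
    '34': False, '"无货"': False,                 # out of stock
    '36': 'quit',                                 # delisted or not for sale
    '40': False,                                  # not sold in this region
}

def interpret_stock_state(stock_state):
    """Return True, False or 'quit' for a known stockState, otherwise None."""
    return STOCK_STATE_RESULTS.get(stock_state)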

def re_find(zhengze,response_text):
    rule = re.compile(zhengze,re.DOTALL)
    pipei = re.findall(rule,response_text)
    return pipei
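
# Illustrative sketch, not part of the original script: the endpoints used here
# answer with JSONP, i.e. JSON wrapped in a callback call. Rather than one regular
# expression per field, a hypothetical helper such as parse_jsonp() could decode
# the whole payload once. It assumes the first "(" opens the callback argument.
import json

def parse_jsonp(text):
    """Return the JSON value passed to the callback in a JSONP response."""
    start = text.index('(') + 1          # raises ValueError if there is no "("
    while start < len(text) and text[start].isspace():
        start += 1                       # raw_decode does not skip leading whitespace
    payload, _end = json.JSONDecoder().raw_decode(text, start)
    return payload

# Possible usage: parse_jsonp(response_text).get('errId')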

# 这个run不能动,它涉及到的是所有流程,牵一发动全身
def run(ck_one, *ips):
    # 实例化类,ips如果是不传参,那就不用代理;如果传参,那就用代理
    if(ips):
        main_jincheng = Son_data(ck_one, ips)
    else:
        main_jincheng = Son_data(ck_one)
    # 1.校验ck
    if(main_jincheng.verify_cookies()):
        # 2.预约
        main_jincheng.make_reserve()
        # 3.清除购物车
        main_jincheng.clear_shop_list()
        # 4.拿traceId
        if(main_jincheng.get_traceId()):
            # logger.info('success')
            Timer().start()
            # 5.开始秒杀
            main_jincheng.submit_seckill_order()
        else:
            print('get_traceId接口返回false')
    else:
        print('ck不正确,无法登录')

def run_before(ck_one,*ips):
    # 实例化类,ips如果是不传参,那就不用代理;如果传参,那就用代理
    if (ips):
        main_jincheng = Son_data(ck_one, ips)
    else:
        main_jincheng = Son_data(ck_one)
    # 1.校验ck
    if (main_jincheng.verify_cookies()):
        # 2.预约
        main_jincheng.make_reserve()
        # 3.清除购物车
        main_jincheng.clear_shop_list()
        # 4.拿traceId
        if (main_jincheng.get_traceId()):
            # logger.info('success')
            pass
        else:
            print('get_traceId接口返回false')
    else:
        print('ck不正确,无法登录')

def run_2(ck_one,*ips):
    # 实例化类,ips如果是不传参,那就不用代理;如果传参,那就用代理
    if(ips):
        main_jincheng = Son_data(ck_one, ips)
    else:
        main_jincheng = Son_data(ck_one)
    main_jincheng.submit_seckill_order()

def main():
    daili_ip_number =check_dailiip()
    if(daili_ip_number):
        if(int(daili_ip_number) <= int(global_config.getRaw('config', 'warning_ip_number'))):
            logger.info('程序即将停止,代理ip数字低于警戒线')
            time.sleep(5)
            sys.exit()
        else:
            print('剩余%s条代理'%int(daili_ip_number))

    # 全部ck
    # cookies_list=get_cookies_all()

    # 通过type字段判断代理模式
    type = global_config.getRaw('config', 'type_daili')
    # print(type)
    # 如果使用代理
    if(type == '1'):
        # print('使用代理ip')
        if (global_config.getRaw('config', 'api')):
            threads_pool = []
            # 创建线程
            threads_pool_number = int(len(cookies_list))
            print('使用代理ip,有%d条ck' % threads_pool_number)
            Timer().start_ip()
            # 2022年3月3日更新,如果check是1,证明检查代理ip;2则不检查代理ip
            ip_list = []
            # 如果筛选代理
            if (global_config.getRaw('config', 'check_ip') == '1'):
                ip_list = get_IP_port_list(threads_pool_number)
            # 如果不筛选代理
            elif(global_config.getRaw('config', 'check_ip') == '2'):
                ip_list=get_IP_port_list_not_check(threads_pool_number)
            print('正在使用的代理ip是:%s'%ip_list)
            for i in range(threads_pool_number):
                run_before(cookies_list[i])
            # 多线程并发
            # for i in range(threads_pool_number):
            #     thread = threading.Thread(target=run_before, args=(cookies_list[i], ip_list[i]))
            #     threads_pool.append(thread)
            # for i in range(threads_pool_number):
            #     threads_pool[i].start()
            # for i in range(threads_pool_number):
            #     threads_pool[i].join()
            Timer().start()
            for i in range(threads_pool_number):
                thread = threading.Thread(target=run_2, args=(cookies_list[i],ip_list[i]))
                threads_pool.append(thread)
            for i in range(threads_pool_number):
                threads_pool[i].start()
            for i in range(threads_pool_number):
                threads_pool[i].join()
    # 如果不使用代理
    elif(type == '2'):
        threads_pool = []
        # 创建线程
        threads_pool_number = len(cookies_list)
        print('有%s条ck:' % threads_pool_number)
        for i in range(threads_pool_number):
            run_before(cookies_list[i])
        # 多线程并发
        # for i in range(threads_pool_number):
        #     thread = threading.Thread(target=run, args=(cookies_list[i],))
        #     threads_pool.append(thread)
        # for i in range(threads_pool_number):
        #     threads_pool[i].start()
        # for i in range(threads_pool_number):
        #     threads_pool[i].join()

        Timer().start()
        for i in range(threads_pool_number):
            thread = threading.Thread(target=run_2, args=(cookies_list[i],))
            threads_pool.append(thread)
        for i in range(threads_pool_number):
            threads_pool[i].start()
        for i in range(threads_pool_number):
            threads_pool[i].join()
    else:
        print('检查config.ini里type的值是否正确')

# Monitoring: clear the cart, then re-add the item
def run_jiankong_qingche(ck_one):
    main_jincheng = Son_data(ck_one)
    # 1. Clear the shopping cart
    main_jincheng.clear_shop_list()
    # 2. Add the item to the cart again
    main_jincheng.get_traceId()

# Monitoring: run the clear-and-re-add step concurrently, one thread per cookie
def main_jiankong_qingche():
    # cookies_list = get_cookies_all()
    threads_pool = []
    # One thread per cookie
    threads_pool_number = len(cookies_list)
    logger.info('Loaded %s cookies (ck)' % threads_pool_number)
    # Create all threads, start them, then wait for all of them
    for i in range(threads_pool_number):
        thread = threading.Thread(target=run_jiankong_qingche, args=(cookies_list[i],))
        threads_pool.append(thread)
    for i in range(threads_pool_number):
        threads_pool[i].start()
    for i in range(threads_pool_number):
        threads_pool[i].join()

# Monitoring: submit the order
def run_jiankong(ck_one):
    main_jincheng = Son_data(ck_one)
    main_jincheng.submit_seckill_order()

# Submit orders concurrently, one thread per cookie
def main_jiankong():
    # cookies_list = get_cookies_all()
    threads_pool = []
    # One thread per cookie
    threads_pool_number = len(cookies_list)
    logger.info('Loaded %s cookies (ck)' % threads_pool_number)
    # Create all threads, start them, then wait for all of them
    for i in range(threads_pool_number):
        thread = threading.Thread(target=run_jiankong, args=(cookies_list[i],))
        threads_pool.append(thread)
    for i in range(threads_pool_number):
        threads_pool[i].start()
    for i in range(threads_pool_number):
        threads_pool[i].join()
# Main monitoring loop (proxy IP optional)
#@retry(attempt=3)
def check_main():
    time.sleep(1)
    # June 6: started the monitoring code; when stock is detected, call the traceId and submit endpoints once
    # Note: this call clears the cart and then re-adds the item, it is not just a clear
    main_jiankong_qingche()
    # Then report which region is being monitored
    if(global_config.getRaw('config', 'areaNo') == '19_1657_4080'):
        print('Monitoring region: Dongsheng Town, Zhongshan, Guangdong')
    elif(global_config.getRaw('config', 'areaNo') == '8_560_50826_52069'):
        print('Monitoring region: Baita Town, Shenyang, Liaoning')
    else:
        print('Monitoring region: name unknown for this areaNo; the rest of the flow is unaffected')
        # logger.info('Just make sure areaNo is correct; this branch has no function')
    # June 10: wait on the timer here before monitoring starts
    Timer().start()
    ips=''
    if(global_config.getRaw('config','monitor')=='2'):
        ips=''
    elif(global_config.getRaw('config','monitor')=='1'):
        ips = get_IP_port_list_not_check(1)[0]
        print('Using proxy IP: %s'%ips)
    number = 0
    number_error_daili = 0
    number_chongshi_cishu=0
    wuxian=global_config.getRaw('config', 'wuxian')
    while number < int(global_config.getRaw('config','cishu')):
        sleep_time=float(global_config.getRaw('config','sleep_time'))
        time.sleep(sleep_time)
        number += 1
        check_ips=check(ips)
        if(check_ips == True):
            logger.info('%s: in stock'%number)
            # Changed Aug 10, 2022: some parameters removed
            main_jiankong()
            # wuxian == '1': submit one more time, then stop
            if(wuxian=='1'):
                main_jiankong()
                break
            # wuxian == '2': keep monitoring and submitting
            elif(wuxian=='2'):
                print('wuxian==2: continuing after a 5 s pause')
                time.sleep(5)
            else:
                logger.info('Unexpected wuxian value, check config.ini; exiting.')
                time.sleep(5)
                break
        elif(check_ips == False):
            logger.info('%s: out of stock'%number)
            continue
        # '商品还未上架' is the value check() returns when the item is not yet listed
        elif(check_ips=='商品还未上架'):
            logger.info('%s: item not yet listed'%number)
            continue
        # 'ip错误' is the value check() returns on a proxy IP error
        elif(check_ips == 'ip错误'):
            # logger.info('%s: IP error' % number)
            number_error_daili+=1
            # Reset the counter after each switch so that switching can keep happening
            if(number_error_daili>=int(global_config.getRaw('config','switch_ip_cishu'))):
                # print('number_error_daili: %s'%number_error_daili)
                ips = get_IP_port_list_not_check(1)[0]
                print('Errors: %s, switching to IP: %s'%(number_error_daili,ips))
                number_error_daili = 0
            continue
        elif(check_ips=='quit'):
            print('check() returned quit')
            break
        else:
            continue
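# Main monitoring loop (no proxy IP)
#@retry(attempt=3)
def check_main_noip():
    time.sleep(1)
    # June 6: started the monitoring code; when stock is detected, call the traceId and submit endpoints once
    # Note: this call clears the cart and then re-adds the item, it is not just a clear
    main_jiankong_qingche()
    # Then report which region is being monitored
    if(global_config.getRaw('config', 'areaNo') == '19_1657_4080'):
        print('Monitoring region: Dongsheng Town, Zhongshan, Guangdong')
    elif(global_config.getRaw('config', 'areaNo') == '8_560_50826_52069'):
        print('Monitoring region: Baita Town, Shenyang, Liaoning')
    else:
        print('Monitoring region: name unknown for this areaNo; the rest of the flow is unaffected')
        # logger.info('Just make sure areaNo is correct; this branch has no function')
    # June 10: wait on the timer here before monitoring starts
    Timer().start()
    number = 0
    number_error_daili = 0
    number_chongshi_cishu=0
    wuxian=global_config.getRaw('config', 'wuxian')
    while number < int(global_config.getRaw('config','cishu')):
        sleep_time=float(global_config.getRaw('config','sleep_time'))
        time.sleep(sleep_time)
        number += 1
        check_ips=check_noip()
        if(check_ips == True):
            logger.info('%s: in stock'%number)
            # Changed Aug 10, 2022: some parameters removed
            main_jiankong()
            # wuxian == '1': submit one more time, then stop
            if(wuxian=='1'):
                main_jiankong()
                break
            # wuxian == '2': keep monitoring and submitting
            elif(wuxian=='2'):
                print('wuxian==2: continuing after a 5 s pause')
                time.sleep(5)
            else:
                logger.info('Unexpected wuxian value, check config.ini; exiting.')
                time.sleep(5)
                break
        elif(check_ips == False):
            logger.info('%s: out of stock'%number)
            continue
        # '商品还未上架' is the value check_noip() returns when the item is not yet listed
        elif(check_ips=='商品还未上架'):
            logger.info('%s: item not yet listed'%number)
            continue
        # 'ip错误' is the value check_noip() returns on an IP error
        elif(check_ips == 'ip错误'):
            # logger.info('%s: IP error' % number)
            number_error_daili+=1
            # Reset the counter after each switch so that switching can keep happening
            if(number_error_daili>=int(global_config.getRaw('config','switch_ip_cishu'))):
                # print('number_error_daili: %s'%number_error_daili)
                # check_noip() takes no proxy argument, so the fetched IP is only printed here
                ips = get_IP_port_list_not_check(1)[0]
                print('Errors: %s, switching to IP: %s'%(number_error_daili,ips))
                number_error_daili = 0
            continue
        elif(check_ips=='quit'):
            print('check_noip() returned quit')
            break
        else:
            continue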

if __name__ == '__main__':
    warnings.filterwarnings("ignore")
    sku_id = global_config.getRaw('config', 'sku_id')

    # All cookies
    cookies_list = get_cookies_all()
    cookies_success = []
    # cookie='pt_key=...;pt_pin=...;'
    # main_xiancheng=Son_data(cookie).order_daifukuan()
    # main()
    multiprocessing.freeze_support()
    miaosha_type = get_miaosha_type()
    sku_name = get_item_title()
    print('Item: %s, sale mode: %s, start time: %s' %(sku_name, miaosha_type,global_config.getRaw('config','buy_time')))

    # '抢购模式' (purchase-only mode) is the value get_miaosha_type() returns for items this script cannot buy
    if(miaosha_type=='抢购模式'):
        logger.info('Purchase-only mode item, stopping')
    else:
        jiankong=global_config.getRaw('config','jiankong')
        # 1: timed flash sale only; 2: flash sale, then monitoring; 3: monitoring only
        if(jiankong=='1'):
            main()
        elif(jiankong=='2'):
            main()
            if(global_config.getRaw('config','monitor')=='2'):
                check_main_noip()
            else:
                check_main()
        elif(jiankong=='3'):
            if(global_config.getRaw('config','monitor')=='2'):
                check_main_noip()
            else:
                check_main()
        else:
            print('Invalid jiankong value in config.ini, cannot run')

    time.sleep(3)
    os.system('pause')

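The script above repeats the same create/start/join loop for `run_2`, `run_jiankong_qingche` and `run_jiankong`. As a minimal sketch, the same fan-out can be written once with the standard-library `concurrent.futures.ThreadPoolExecutor`. The names `run_for_all` and `fake_task` are hypothetical and are not part of the original script; `fake_task` only stands in for a per-cookie worker.

```python
from concurrent.futures import ThreadPoolExecutor


def run_for_all(task, items, max_workers=None):
    """Run task(item) for every item concurrently and wait for all of them.

    Returns a list of (item, result, error) tuples in input order;
    error is None when the call succeeded.
    """
    items = list(items)
    if not items:
        return []
    workers = max_workers or len(items)
    outcomes = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(task, item) for item in items]
        for item, future in zip(items, futures):
            try:
                outcomes.append((item, future.result(), None))
            except Exception as exc:  # one failing worker should not hide the others
                outcomes.append((item, None, exc))
    return outcomes


def fake_task(cookie):
    # Hypothetical stand-in for a worker such as run_jiankong(ck_one)
    if not cookie:
        raise ValueError("empty cookie")
    return len(cookie)


if __name__ == "__main__":
    for item, result, error in run_for_all(fake_task, ["ck_a", "", "ck_ccc"]):
        print(repr(item), result, error)
```

Unlike bare `threading.Thread`, this keeps each worker's return value and exception, so a failed account can be reported instead of being lost.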
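During the pandemic I used this script to buy some scarce medicines, and later added the monitoring feature. The script calls the JD Jingxi (京喜) M-site endpoints. The M site sits between the web and app clients, similar to the page you get when a product is shared through QQ or WeChat.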

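Steps:

  1. Get the item information: get_item_title()
  2. Get the flash-sale type: get_miaosha_type(). Some JD flash-sale items use purchase-only mode (抢购模式), which can only be bought with an app cookie (APP CK); this script is essentially an M-site client, so it stops for those items
  3. Reserve the item: make_reserve
  4. Clear the cart and add the item to the cart: clear_shop_list, check_shop_list
  5. Get the traceId: get_traceId. It is a required parameter for the final step
  6. Submit the flash-sale order: two flash-sale endpoints are used, and proxy IPs were also tried. On success the result is written to a local file and an email is sent. Payment is done with a separate tool and is not covered here
  7. Time calibration code: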
# -*- coding:utf-8 -*-
import time,requests,json,random,datetime,warnings
from jd_logger import logger
from config import global_config
warnings.filterwarnings("ignore")

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.1 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2226.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.4; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2225.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2225.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2224.3 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.93 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.124 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2049.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 4.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2049.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.67 Safari/537.36",
    "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.67 Safari/537.36",
    "Mozilla/5.0 (X11; OpenBSD i386) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1944.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.3319.102 Safari/537.36",
    "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.2309.372 Safari/537.36",
    "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.2117.157 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36",
    "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1866.237 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.137 Safari/4E423F",
    "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.116 Safari/537.36 Mozilla/5.0 (iPad; U; CPU OS 3_2 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Version/4.0.4 Mobile/7B334b Safari/531.21.10",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.517 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1667.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1664.3 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1664.3 Safari/537.36",
    "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.16 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1623.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.17 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.62 Safari/537.36",
    "Mozilla/5.0 (X11; CrOS i686 4319.74.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.57 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.2 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1468.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1467.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1464.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1500.55 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.93 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.93 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.93 Safari/537.36",
    "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.93 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.93 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.93 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.90 Safari/537.36",
    "Mozilla/5.0 (X11; NetBSD) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36",
    "Mozilla/5.0 (X11; CrOS i686 3912.101.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.60 Safari/537.17",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1309.0 Safari/537.17",
    "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.15 (KHTML, like Gecko) Chrome/24.0.1295.0 Safari/537.15",
    "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.14 (KHTML, like Gecko) Chrome/24.0.1292.0 Safari/537.14"]

# Return a random User-Agent string
def get_random_useragent():
    return random.choice(USER_AGENTS)


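def retry(attempt):
    """Decorator: call the wrapped function up to `attempt` times."""
    from functools import wraps

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kw):
            att = 0
            while att < attempt:
                try:
                    return func(*args, **kw)
                except Exception:
                    att += 1
                    print('****timer.py time sync: attempt %s failed****'%att)
                    if att >= attempt:
                        # Out of attempts: re-raise instead of silently returning None
                        raise
                    time.sleep(0.2)
        return wrapper
    return decorator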

def ms_date(timestamp):
    # Convert a millisecond timestamp to a local datetime
    # time_local = time.localtime(timestamp / 1)
    d = datetime.datetime.fromtimestamp(timestamp / 1000)
    # %f prints microseconds, so a millisecond value ends in 000
    str1 = d.strftime("%Y-%m-%d %H:%M:%S.%f")
    return str1  # e.g. 2019-10-11 14:15:56.514000

class Timer(object):
    def __init__(self):
        # buy_time format: '2018-09-28 22:45:50.000'
        self.buy_time = datetime.datetime.strptime(global_config.getRaw('config','buy_time'),"%Y-%m-%d %H:%M:%S.%f")
        self.buy_time_ms = int(time.mktime(self.buy_time.timetuple())*1000.0+self.buy_time.microsecond / 1000)

        # Added later for proxy IPs: ip_time is 2 minutes before buy_time
        self.ip_time= self.buy_time-datetime.timedelta(minutes=2)
        self.ip_time_ms = int(time.mktime(self.ip_time.timetuple())*1000.0+self.ip_time.microsecond / 1000)
        # print(self.ip_time)
        # Polling interval: 0.1 s
        self.sleep_interval = 0.1
        # diff_time is the offset between JD server time and local time
        # self.diff_time = self.local_jd_time_diff()
        # How many milliseconds before buy_time to fire (config: before_time)
        self.before_time=int(global_config.getRaw('config','before_time'))

    @retry(attempt=3)
    def jd_time_ms(self):
        """
        Get the JD server time in milliseconds.
        Falls back to the local clock if the request or parsing fails.
        """
        # url = 'https://a.jd.com//ajax/queryServerData.html'
        # Updated on Nov 3: the original endpoint above no longer works
        try:
            url='https://api.m.jd.com/client.action?functionId=queryMaterialProducts&client=wh5'
            resp = requests.get(url=url,headers={
                # 'user-agent': get_random_useragent(),
                'user-agent':'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
                # 'Cache-Control':'max-age=0',
                'Host':'api.m.jd.com',
                'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
                'Accept-Encoding':'gzip, deflate, br',
                'Accept-Language':'zh-CN,zh;q=0.9,en;q=0.8,zh-TW;q=0.7','Upgrade-Insecure-Requests':'1',
                # A valid api.m.jd.com session cookie is required here
                'cookie':'<YOUR_JD_COOKIE>'
            },verify=False)
            #print(resp.text)
            if('currentTime2' in resp.text):
                js = json.loads(resp.text)
                # currentTime2 is a millisecond timestamp such as 1635916331691
                # print('Using JD time for the calculation')
                return int(js["currentTime2"])
            else:
                raise Exception('jd_time: response has no currentTime2')
        except Exception:
            # Every failure is handled here, so @retry never sees an exception from this method
            print('Could not get JD server time (the cookie has probably expired and must be refreshed); using local time')
            return int(round(time.time() * 1000))

    def local_time(self):
        """
        Get the local time in milliseconds
        """
        return int(round(time.time() * 1000))

    # @retry(attempt=3)
    def local_jd_time_diff(self):
        # Local clock minus JD server clock, in milliseconds
        timer_diff = self.local_time()-self.jd_time_ms()
        return timer_diff

    def start(self):
        while True:
            # Adjusted July 13
            # diff_time=self.local_jd_time_diff()
            # Subtracting the local/JD offset from local time brings the error down to about 0.1 s;
            # the actual accuracy depends on the network delay of fetching the JD server time
            #if(self.local_time()>=self.buy_time_ms):
            # 1: use local time
            if(global_config.getRaw('config','time_calibration')=='1'):
                if(self.local_time()+self.before_time>= self.buy_time_ms):
                    logger.info('Using local time: start time reached, running...')
                    break
                else:
                    time.sleep(self.sleep_interval)
            # 2: use JD server time
            elif(global_config.getRaw('config','time_calibration')=='2'):
                # Updated July 13, 2022
                local_time=self.local_time()
                # local_jd_time_diff=self.local_jd_time_diff()
                # logger.info('start: local time minus JD time: %s ms'%local_jd_time_diff)
                # # If buy time minus local time is no more than the offset to the JD server
                # if(self.buy_time_ms-local_time<=local_jd_time_diff):
                # Updated Aug 3, 2022: fire as soon as JD time + before_time reaches buy_time
                jd_time_ms=self.jd_time_ms()
                if(jd_time_ms+self.before_time>=self.buy_time_ms):
                    jd_ms_date=ms_date(jd_time_ms)
                    # local_ms_date=ms_date(local_time)
                    # print(jd_time_ms)
                    # print(self.buy_time_ms)
                    time_cha=self.buy_time_ms-jd_time_ms
                    print('Starting: JD time %s, sale time %s, configured lead %s ms, actual lead %s ms'%(jd_ms_date,global_config.getRaw('config','buy_time'),global_config.getRaw('config','before_time'),time_cha))
                    # logger.info('JD time has reached the buy time, running...')
                    break
                # 5000000 ms = 5000 s = 83.33 min
                elif(self.buy_time_ms-local_time>5000000):
                    print('Waiting 5000 s')
                    time.sleep(5000)
                # 500000 ms = 500 s = 8.333 min
                elif(self.buy_time_ms-local_time>500000):
                    print('Waiting 500 s')
                    time.sleep(500)
                # 50000 ms = 50 s
                elif(self.buy_time_ms-local_time>50000):
                    print('Waiting 50 s')
                    time.sleep(50)
                # 10000 ms = 10 s
                elif(self.buy_time_ms-local_time>10000):
                    print('Waiting 10 s')
                    time.sleep(10)
                # More than 2000 ms left: sleep 2 s; otherwise poll again without sleeping
                elif(self.buy_time_ms-local_time>2000):
                    print('Waiting 2 s')
                    time.sleep(2)
                else:
                    pass
                    # time.sleep(self.sleep_interval)
            else:
                # Without this branch an invalid setting would spin forever at full CPU
                raise ValueError("time_calibration in config.ini must be '1' or '2'")

    # Proxy IP response times vary widely and can easily add large delays
    def start_ip(self):
        while True:
            # # Fetch the proxy ahead of buy_time; a proxy IP lasts 5 minutes, so it should still cover the sale
            # if(self.local_time()>=self.ip_time_ms):
            #     print('Proxy IP: start time reached, running...')
            #     break
            # else:
            #     time.sleep(self.sleep_interval)
            local_time = self.local_time()
            # ip_time reached: stop waiting
            if(local_time>=self.ip_time_ms):
                logger.info('Proxy IP start time reached, running...')
                break
            # 5000000 ms = 5000 s = 83.33 min
            elif(self.ip_time_ms - local_time > 5000000):
                print('Waiting 5000 s')
                time.sleep(5000)
            # 500000 ms = 500 s = 8.333 min
            elif (self.ip_time_ms - local_time > 500000):
                print('Waiting 500 s')
                time.sleep(500)
            # 50000 ms = 50 s
            elif (self.ip_time_ms - local_time > 50000):
                print('Waiting 50 s')
                time.sleep(50)
            # 10000 ms = 10 s
            elif (self.ip_time_ms - local_time > 10000):
                print('Waiting 10 s')
                time.sleep(10)
            # More than 2000 ms left: sleep 2 s; otherwise loop again without sleeping
            elif(self.ip_time_ms - local_time > 2000):
                print('Waiting 2 s')
                time.sleep(2)
            else:
                pass

timer=Timer()

# timer.start()
# timer.jd_time()
# print(timer.ip_time,timer.buy_time)
# print(timer.start())

# timer.start_ip()

# print(timer.jd_time_ms())
# print(timer.local_time())
# print(timer.local_jd_time_diff())
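
The comments in `start` note that accuracy depends on the network delay of fetching the server time. One common way to reduce that error, shown here only as a sketch, is to assume the server read its clock halfway through the round trip and to keep the sample with the shortest round trip. `estimate_offset_ms`, `best_offset_ms`, `fetch_server_ms` and `now_ms` are hypothetical names that do not appear in the original timer module; `fetch_server_ms` stands for any callable that makes one request and returns the server time in milliseconds.

```python
import time


def estimate_offset_ms(fetch_server_ms, now_ms=None):
    """Estimate (server clock - local clock) in ms from a single request.

    Returns (offset_ms, round_trip_ms). Assumes the server read its clock
    at the midpoint between sending the request and receiving the reply.
    """
    if now_ms is None:
        now_ms = lambda: int(time.time() * 1000)
    sent = now_ms()
    server = fetch_server_ms()
    received = now_ms()
    round_trip = received - sent
    offset = server - (sent + received) / 2
    return offset, round_trip


def best_offset_ms(fetch_server_ms, samples=5, now_ms=None):
    """Take several samples and keep the one with the shortest round trip."""
    results = [estimate_offset_ms(fetch_server_ms, now_ms) for _ in range(samples)]
    return min(results, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Fake server that runs 500 ms ahead of the local clock (illustration only)
    fake_server = lambda: int(time.time() * 1000) + 500
    offset, rtt = best_offset_ms(fake_server, samples=3)
    print("offset(ms):", offset, "round trip(ms):", rtt)
```

With an offset measured once ahead of time, the final wait can compare `local_time() + offset` against the target instead of issuing a request on every loop iteration.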

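The `if/elif` ladder in `start` and `start_ip` chooses a sleep length from the time remaining. A minimal sketch of the same ladder as a pure function makes the rule easy to test. The thresholds are copied from the code above; the names `STAGES` and `next_sleep_seconds` are hypothetical.

```python
# (remaining-ms threshold, seconds to sleep), largest first
STAGES = [
    (5_000_000, 5000),
    (500_000, 500),
    (50_000, 50),
    (10_000, 10),
    (2_000, 2),
]


def next_sleep_seconds(remaining_ms, fine_interval=0.0):
    """Return how long to sleep given the milliseconds left until the target.

    Each stage sleeps for less than the time remaining, so the wait never
    overshoots the target. Below the last threshold, fine_interval is used.
    """
    for threshold_ms, sleep_s in STAGES:
        if remaining_ms > threshold_ms:
            return sleep_s
    return fine_interval


if __name__ == "__main__":
    for remaining in (6_000_000, 60_000, 2_500, 1_500):
        print(remaining, "->", next_sleep_seconds(remaining, fine_interval=0.1))
```

Both wait loops could share this one function, with only the target timestamp differing between them.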
——————————————————————————————————————————————————

8、Novel crawler example

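This example uses the Biquge (笔趣阁) novel site, https://www.57389b.sbs/#/. The crawl works as follows:

  1. Home page: https://www.57389b.sbs/#/. From the home page, find the ID of the novel you want. For example, the novel 《13路末班车》 is at https://www.57389b.sbs/#/book/5329/
  2. With the book URL known, collect every chapter and its title. Common pitfall: when the book has many chapters, or the anti-crawling mechanism is triggered, the page shows a "展开更多章节" (show more chapters) button, and the full chapter list only appears after it is clicked.
  3. Next, open a chapter page such as https://www.57389b.sbs/#/book/5329/1.html. Common pitfall: the anti-crawling mechanism can hide the chapter text; you have to click "点击报错" (report an error) and refresh the page before the real content is shown.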

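The URL layout described in the steps above can be captured in two small helpers. This is only a sketch: `BASE_URL`, `book_url` and `chapter_url` are hypothetical names, and the sequential `<n>.html` numbering is an assumption based on the single `1.html` example. Reading the real `href` of each chapter link from the chapter list is the safer option when the page is available.

```python
BASE_URL = "https://www.57389b.sbs/#/"


def book_url(book_id):
    """Index page of a book, e.g. book_url(5329)."""
    return f"{BASE_URL}book/{book_id}/"


def chapter_url(book_id, chapter_no):
    """Chapter page, assuming chapters are numbered 1.html, 2.html, ..."""
    if chapter_no < 1:
        raise ValueError("chapter numbers start at 1")
    return f"{book_url(book_id)}{chapter_no}.html"


if __name__ == "__main__":
    print(book_url(5329))
    print(chapter_url(5329, 1))
```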
Key code:

from .base_crawler import BaseCrawler
from novel_crawler.utils.content_utils import ContentCleaner
from novel_crawler.utils.file_utils import FileManager
import time
from datetime import datetime
import concurrent.futures
import traceback
import signal
import sys


class NovelCrawler(BaseCrawler):
    """Novel crawler"""

    def __init__(self, headless=True):
        super().__init__(headless)
        self.content_cleaner = ContentCleaner()
        self.file_manager = FileManager()
        # State kept so that an interrupted run can still save what it has
        self.current_data = None
        self.current_title = None
        # Handle Ctrl+C
        signal.signal(signal.SIGINT, self._handle_exit)

    def _handle_exit(self, sig, frame):
        """Handle the exit signal: save what has been downloaded, then quit"""
        print("\n🛑 Stopping crawl, saving downloaded content...")
        if self.current_data and self.current_title:
            # Build a descriptive file name
            # (作者 = author, 第N章 = chapter N, 部分内容 = partial content, 无内容 = no content)
            start_chapter = self.current_data.get("start_chapter", 1)
            chapters_count = len(self.current_data["chapters"])
            if chapters_count > 0:
                end_chapter = start_chapter + chapters_count - 1
                filename = f"{self.current_title}_作者-{self.current_data['author']}_第{start_chapter}-{end_chapter}章_部分内容"
            else:
                filename = f"{self.current_title}_作者-{self.current_data['author']}_无内容"
            self.file_manager.save_novel_data(self.current_data, filename)
        sys.exit(0)

    def crawl_novel_parallel(self, novel_url, start_chapter=1, end_chapter=None, max_chapters=None, max_workers=3):
        """Crawl chapters in parallel, saving progress as it goes"""
        print("💡 Tip: press Ctrl+C to stop and save what has been downloaded")
        try:
            start_time = time.time()
            print(f"🚀 Opening: {novel_url}")

            # Load the novel's index page
            self.driver.get(novel_url)

            try:
                # If the "show all chapters" button exists, click it
                more_element = self.driver.find_element("dd.more a")
                if more_element:
                    print("Clicking the 'show all chapters' element")
                    more_element.click()
            except Exception:
                print("No 'show all chapters' element found")

            self._wait_for_element(".info h1", timeout=10)

            # Read the novel's title and author
            title = self.driver.get_text(".info h1") or "Unknown title"
            author_element = self.driver.find_element(".info .small span")
            author = author_element.text.strip()

            print(f"📖 Crawling: {title} - {author}")

            # Get the full chapter list
            all_chapters = self.driver.find_elements(".listmain dl dd a")
            # Work out the range to crawl (start_chapter is 1-based)
            if end_chapter is not None:
                chapters = all_chapters[start_chapter - 1:end_chapter]
            elif max_chapters is not None:
                chapters = all_chapters[start_chapter - 1:start_chapter - 1 + max_chapters]
            else:
                chapters = all_chapters[start_chapter - 1:]

            print(f"📚 Total chapters: {len(all_chapters)}")
            print(f"📖 Range: chapter {start_chapter} to chapter {start_chapter - 1 + len(chapters)}")
            print(f"📋 Chapters to crawl: {len(chapters)}")
            print(f"⚡ Parallel workers: {max_workers}")

            # Initialise the in-progress data used by the Ctrl+C handler
            self.current_title = title
            self.current_data = {
                "title": title,
                "author": author,
                "chapters": [],
                "crawl_time": datetime.now().isoformat(),
                "start_chapter": start_chapter,
                "end_chapter": start_chapter - 1 + len(chapters)
            }

            # Collect index, title and URL for each chapter; links without text are skipped
            chapter_data = []
            for i, chapter in enumerate(chapters, start_chapter):
                chapter_title = chapter.text.strip()
                if chapter_title:
                    chapter_data.append({
                        'index': i,
                        'title': chapter_title,
                        'url': chapter.get_attribute("href")
                    })

            # Crawl in parallel
            results = self._parallel_crawl_chapters(chapter_data, max_workers)

            # Drop chapters with no content and restore chapter order
            # (results arrive in completion order, not chapter order)
            valid_results = sorted(
                (r for r in results if r['content'] is not None),
                key=lambda r: r['index']
            )

            # Assemble the final result
            novel_data = {
                "title": title,
                "author": author,
                "chapters": valid_results,
                "crawl_time": datetime.now().isoformat(),
                "start_chapter": start_chapter,
                "end_chapter": start_chapter - 1 + len(chapters)
            }

            success_count = len(valid_results)

            # Save to file with a descriptive file name
            # (作者 = author, 第N章 = chapter N, 完整版 = complete)
            if start_chapter == 1 and end_chapter is None and max_chapters is None:
                filename = f"{title}_作者-{author}_第{start_chapter}-{start_chapter - 1 + len(chapters)}章_完整版"
            else:
                filename = f"{title}_作者-{author}_第{start_chapter}-{start_chapter - 1 + len(chapters)}章"

            if success_count > 0:
                self.file_manager.save_novel_data(novel_data, filename)
            else:
                print("⚠️ No chapter content was crawled successfully")

            # Total elapsed time
            total_time = time.time() - start_time
            print(
                f"✅ Done! Crawled {success_count}/{len(chapter_data)} chapters in {total_time:.1f} s ({(total_time / 60):.1f} min)")

            # Clear the in-progress references
            self.current_data = None
            self.current_title = None

            return novel_data

        except Exception as e:
            print("❌ Error in the main crawl:")
            print(f"Error type: {type(e).__name__}")
            print(f"Error message: {e}")
            print("Full traceback:")
            traceback.print_exc()
            raise

    def _parallel_crawl_chapters(self, chapter_data, max_workers=3):
        """Crawl chapter content in parallel, with retries and live progress updates"""
        results = []
        # Overall time budget, scaled by the number of chapters
        overall_timeout = len(chapter_data) * 30  # up to 30 s per chapter

        with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
            # Round 1: initial crawl
            future_to_chapter = {}
            for data in chapter_data:
                future = executor.submit(self._crawl_single_chapter_with_retry, data['url'], max_retries=2)
                future_to_chapter[future] = data

            # Collect round-1 results
            completed = 0
            failed_chapters = []  # chapters that failed
            start_time = time.time()

            for future in concurrent.futures.as_completed(future_to_chapter):
                # Check the overall time budget (evaluated each time a task finishes)
                if time.time() - start_time > overall_timeout:
                    print(f"⏰ Overall timeout of {overall_timeout} s exceeded, no longer waiting for remaining tasks")
                    break

                chapter = future_to_chapter[future]
                try:
                    # Futures yielded by as_completed are already finished, so this returns at once
                    content = future.result(timeout=30)

                    # Guard against None content when counting characters
                    word_count = len(content) if content is not None else 0
                    chapter_result = {
                        'index': chapter['index'],
                        'title': chapter['title'],
                        'content': content,
                        'word_count': word_count
                    }
                    results.append(chapter_result)

                    # 🔥 Update the in-progress data so Ctrl+C can save it
                    if content and self.current_data:
                        self.current_data["chapters"].append({
                            "index": chapter['index'],
                            "title": chapter['title'],
                            "content": content,
                            "word_count": word_count
                        })

                    completed += 1

                    if content:
                        status = "✅"
                    else:
                        status = "⚠️"
                        failed_chapters.append(chapter)  # queue the chapter for retry

                    print(
                        f"{status} Finished chapter {chapter['index']} ({completed}/{len(chapter_data)}): {chapter['title'][:20]}...")

                except concurrent.futures.TimeoutError:
                    print(f"⏰ Chapter {chapter['index']} timed out, marked as failed")
                    results.append({
                        'index': chapter['index'],
                        'title': chapter['title'],
                        'content': None,
                        'word_count': 0
                    })
                    failed_chapters.append(chapter)
                    completed += 1
                except Exception as e:
                    print(f"❌ Chapter {chapter['index']} failed: {str(e)[:50]}")
                    results.append({
                        'index': chapter['index'],
                        'title': chapter['title'],
                        'content': None,
                        'word_count': 0
                    })
                    failed_chapters.append(chapter)
                    completed += 1

            # Second pass: retry the failed chapters (up to max_retries rounds)
            if failed_chapters:
                retry_count = 0
                max_retries = 5

                print(f"\n🔄 Retrying {len(failed_chapters)} failed chapter(s) (up to {max_retries} rounds)...")

                while failed_chapters and retry_count < max_retries:
                    retry_count += 1
                    current_failed = failed_chapters.copy()
                    failed_chapters = []

                    print(f"\n🔄 Retry round {retry_count}/{max_retries}...")

                    retry_future_to_chapter = {}
                    for chapter in current_failed:
                        future = executor.submit(self._crawl_single_chapter_with_retry, chapter['url'], max_retries=1)
                        retry_future_to_chapter[future] = chapter

                    # Collect the retry results
                    retry_completed = 0
                    retry_start_time = time.time()

                    for future in concurrent.futures.as_completed(retry_future_to_chapter):
                        # Each retry round is capped at 30 seconds
                        if time.time() - retry_start_time > 30:
                            print("⏰ Retry round timed out; stopping this round")
                            # Carry unfinished retries over to the next round
                            for f, ch in retry_future_to_chapter.items():
                                if not f.done():
                                    failed_chapters.append(ch)
                            break

                        chapter = retry_future_to_chapter[future]
                        try:
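                            content = future.result(timeout=15)  # Shorter timeout for retries

                            # Update the stored result
                            updated = False
                            for result in results:
                                if result['index'] == chapter['index']:
                                    if content:
                                        result['content'] = content
                                        result['word_count'] = len(content)
                                        # 🔥 Update the live data
                                        if self.current_data:
                                            for chap in self.current_data["chapters"]:
                                                if chap["index"] == chapter['index']:
                                                    chap["content"] = content
                                                    chap["word_count"] = len(content)
                                                    break
                                            else:
                                                # Failed chapters were never added in the first pass, so append them here
                                                self.current_data["chapters"].append({
                                                    "index": chapter['index'],
                                                    "title": chapter['title'],
                                                    "content": content,
                                                    "word_count": len(content)
                                                })
                                        print(f"🔄 ✅ Retry succeeded for chapter {chapter['index']}: {chapter['title'][:20]}...")
                                    else:
                                        # Still failing after this retry; queue it for the next round
                                        failed_chapters.append(chapter)
                                        print(f"🔄 ⚠️ Chapter {chapter['index']} still has no content after retry")
                                    updated = True
                                    break

                            if not updated:
                                print(f"🔄 ❓ No result record found for chapter {chapter['index']}")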

                            retry_completed += 1

                        except concurrent.futures.TimeoutError:
                            print(f"🔄 ⏰ Retry timed out for chapter {chapter['index']}")
                            failed_chapters.append(chapter)
                            retry_completed += 1
                        except Exception as e:
                            print(f"🔄 ❌ Retry raised an error for chapter {chapter['index']}: {str(e)[:50]}")
                            failed_chapters.append(chapter)
                            retry_completed += 1

        # Sort by chapter index
        results.sort(key=lambda x: x['index'])

        # Summarize the results; empty content counts as a failure, matching the retry logic above
        success_count = sum(1 for r in results if r['content'])
        print(f"\n📊 Crawl summary: {success_count}/{len(chapter_data)} chapters succeeded")
        if success_count < len(chapter_data):
            print(f"⚠️  {len(chapter_data) - success_count} chapter(s) could not be fetched")

        return results

    def _crawl_single_chapter_with_retry(self, chapter_url, max_retries=3):
        """Crawl one chapter with retries and a per-attempt timeout."""
        for attempt in range(max_retries + 1):
            # A `with` block would call shutdown(wait=True) on exit and block until
            # the worker finishes, which would defeat the timeout below.
            executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
            try:
                future = executor.submit(self._crawl_single_chapter_thread, chapter_url)
                content = future.result(timeout=15)  # At most 15 seconds per attempt
                if content:
                    return content
            except Exception:
                # Timeouts and crawl errors are handled alike: fall through and retry
                pass
            finally:
                # Do not wait for a worker that may still be stuck in the browser;
                # it closes its own driver once it finishes.
                executor.shutdown(wait=False)

            if attempt < max_retries:
                time.sleep((attempt + 1) * 2)  # Linear backoff: 2s, 4s, 6s, ...
        return None

    def _crawl_single_chapter_thread(self, chapter_url):
        """Crawl a single chapter inside one worker thread (SeleniumBase version)."""
        from seleniumbase import Driver
        import random
        import threading

        # Thread id, kept for debugging
        thread_id = threading.current_thread().ident
        # print(f"🔍 Thread {thread_id} started on chapter: {chapter_url[-20:]}...")

        try:
            # Each thread creates its own driver, with extra anti-detection options
            thread_driver = Driver(
                browser="chrome",
                headless=True,
                undetectable=True,
                guest_mode=True,
                block_images=True,
                incognito=True,
                no_sandbox=True,
                disable_gpu=True,
                # disable_dev_shm_usage=True,
                # disable_blink_features="AutomationControlled",
                # user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
            )

            try:
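                # Random delay before the request
                time.sleep(random.uniform(1.0, 2.5))

                thread_driver.get(chapter_url)

                # Random wait to mimic a human reader
                wait_time = random.uniform(2.0, 4.0)
                time.sleep(wait_time)

                # Try extracting the content directly first, to see whether the "点此报错" (report error) link must be clicked
                content_selectors = ["#chaptercontent"]
                content = None

                # First attempt: extract the content directly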
                for selector in content_selectors:
                    try:
                        elements = thread_driver.find_elements(selector)
                        if elements:
                            content_text = elements[0].text
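                            if content_text and len(content_text.strip()) > 300:  # More than 300 characters counts as complete
                                content = self.content_cleaner.clean_content(content_text)
                                print(f"📄 Found the full content directly, length: {len(content)} characters")
                                return content
                            elif content_text and 0 < len(content_text.strip()) <= 300:
                                # Content too short; the report-error link needs to be clicked
                                print(f"⚠️ Content too short ({len(content_text.strip())} characters); clicking the report-error link")
                                break
                    except Exception as e:
                        continue

                # If there is not enough content, try clicking the "点此报错" (report error) link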
                if not content:
                    clicked = False
                    try:
                        # Selectors chosen to match the page's HTML structure
                        error_link_selectors = [
                            "a.ll[href*='javascript:chapter_error']",
                            "a[href*='javascript:chapter_error']",
                            "a.ll"
                        ]

                        for selector in error_link_selectors:
                            try:
                                error_links = thread_driver.find_elements(selector)
                                if error_links:
                                    for link in error_links:
                                        try:
                                            link_text = link.text.strip()
                                            if link_text and "点此报错" in link_text:
                                                if link.is_displayed() and link.is_enabled():
                                                    # Short pause before clicking, as a human would
                                                    time.sleep(random.uniform(0.3, 0.8))
                                                    link.click()
                                                    print("✅ Clicked the '点此报错' link")
                                                    clicked = True
                                                    # Random wait for the content to load
                                                    load_time = random.uniform(2.0, 3.5)
                                                    time.sleep(load_time)
                                                    break
                                        except Exception:
                                            continue
                                    if clicked:
                                        break
                            except Exception:
                                continue

                        if clicked:
                            # Wait for the content to refresh
                            refresh_time = random.uniform(2.5, 4.0)
                            time.sleep(refresh_time)

                            # Try extracting the content again after the click
                            for selector in content_selectors:
                                try:
                                    elements = thread_driver.find_elements(selector)
                                    if elements:
                                        content_text = elements[0].text
                                        if content_text and len(content_text.strip()) > 100:
                                            content = self.content_cleaner.clean_content(content_text)
                                            print(f"📄 Found content after clicking the report-error link, length: {len(content)} characters")
                                            break
                                except Exception as e:
                                    continue

                    except Exception as e:
                        print(f"⚠️ Error while handling the report-error link: {str(e)[:50]}")

                # If there is still no content, fall back to other selectors
                if not content:
                    backup_selectors = ["#content", ".content", ".chapter-content", ".txtnav"]
                    for selector in backup_selectors:
                        try:
                            elements = thread_driver.find_elements(selector)
                            if elements:
                                content_text = elements[0].text
                                if content_text and len(content_text.strip()) > 300:
                                    content = self.content_cleaner.clean_content(content_text)
                                    print(f"📄 Found content with fallback selector '{selector}', length: {len(content)} characters")
                                    break
                        except Exception as e:
                            continue

                if not content:
                    print("⚠️ No content found")

                return content

            finally:
                thread_driver.quit()

        except Exception as e:
            print(f"❌ Error while crawling the chapter: {str(e)[:50]}")
            return None

The result looks like this:

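This version uses the SeleniumBase library, which launches a real browser through chromedriver and then reads the data from the rendered page. That is browser (UI) automation rather than API-level requests, so it runs much slower than Scrapy. In return it is stable and reliable: of 364 chapters only 1 could not be fetched, a success rate of about 99.7%, which is acceptable. Allowing more retries would raise the success rate further, but the run would probably take longer.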
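
The trade-off between retries, success rate, and run time can be put into rough numbers. The sketch below is only an illustration: the helper names are hypothetical, the 2-second backoff step mirrors the `(attempt + 1) * 2` wait in `_crawl_single_chapter_with_retry`, and the assumption that every attempt fails independently with the same probability is a simplification that real sites rarely satisfy.

```python
def success_rate(total: int, failed: int) -> float:
    """Fraction of chapters that were fetched successfully."""
    if total <= 0:
        raise ValueError("total must be positive")
    return (total - failed) / total


def residual_failure(per_attempt_failure: float, attempts: int) -> float:
    """Chance that all `attempts` tries fail, ASSUMING independent attempts."""
    return per_attempt_failure ** attempts


def worst_case_backoff(retries: int, step_seconds: float = 2.0) -> float:
    """Total sleep time when retry number `attempt` waits (attempt + 1) * step_seconds."""
    return sum((attempt + 1) * step_seconds for attempt in range(retries))


if __name__ == "__main__":
    # Figures from the run described above: 364 chapters, 1 not fetched
    print(f"success rate: {success_rate(364, 1):.2%}")
    # Extra waiting per chapter in the worst case with 3 retries
    print(f"worst-case backoff: {worst_case_backoff(3)} s")
    # Hypothetical 10% per-attempt failure rate, 3 attempts in total
    print(f"residual failure: {residual_failure(0.10, 3):.4f}")
```

Under these assumptions each extra attempt multiplies the remaining failure rate by the per-attempt failure rate, while the worst-case wait per chapter grows roughly quadratically with the number of retries, which is why a small retry budget is usually a sensible compromise.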

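Summary

Crawlers can also take care of many everyday chores. Some teams have to schedule the next working day's tasks every day, and a crawler can handle that, just as it can collect material you want to study or fill in a daily report. At my previous company I called Yonyou's API to list more than a thousand machine-tool equipment records; used at work, crawlers are very convenient.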

 
