Scrapy Crawler (6): Scraping Bank Wealth-Management Products into MongoDB (120,000+ Records)

Published: 2025/5/22 | Programming Q&A | 豆豆


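The goal of this Scrapy crawler is to scrape the information for every bank wealth-management product listed on the Rong360 (融360) website and store it in MongoDB. The full dataset contains more than 120,000 records.

[Figure: screenshot of the Rong360 bank product listing page]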

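We will not go over how to create and run a Scrapy project again and only give the relevant code. Readers interested in project setup can refer to the earlier post: Scrapy Crawler (4): Scraping the Douban Movie Top 250 Images.

First, modify items.py as follows. It holds the fields for each wealth-management product, such as the product name and the issuing bank.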

import scrapy


class BankItem(scrapy.Item):
    # define the fields for your item here like:
    name = scrapy.Field()       # product name
    bank = scrapy.Field()       # issuing bank
    currency = scrapy.Field()
    startDate = scrapy.Field()
    endDate = scrapy.Field()
    period = scrapy.Field()
    proType = scrapy.Field()
    profit = scrapy.Field()
    amount = scrapy.Field()

Create the spider file bankSpider.py with the following code. It extracts the details of each wealth-management product from the listing pages.

import scrapy
from bank.items import BankItem


class bankSpider(scrapy.Spider):
    name = 'bank'
    start_urls = ['https://www.rong360.com/licai-bank/list/p1']

    def parse(self, response):
        # Skip the first <tr>, presumably the table header row
        trs = response.css('tr')[1:]
        for tr in trs:
            # Create a fresh item per row so yielded items do not share state
            item = BankItem()
            item['name'] = tr.xpath('td[1]/a/text()').extract_first()
            item['bank'] = tr.xpath('td[2]/p/text()').extract_first()
            item['currency'] = tr.xpath('td[3]/text()').extract_first()
            item['startDate'] = tr.xpath('td[4]/text()').extract_first()
            item['endDate'] = tr.xpath('td[5]/text()').extract_first()
            item['period'] = tr.xpath('td[6]/text()').extract_first()
            item['proType'] = tr.xpath('td[7]/text()').extract_first()
            item['profit'] = tr.xpath('td[8]/text()').extract_first()
            item['amount'] = tr.xpath('td[9]/text()').extract_first()
            yield item

        # Find the pagination links; with two links the second one is used
        next_pages = response.css('a.next-page')
        if len(next_pages) == 0:
            # No pagination link at all: avoid an IndexError and stop here
            next_page_link = None
        elif len(next_pages) == 1:
            next_page_link = next_pages.xpath('@href').extract_first()
        else:
            next_page_link = next_pages[1].xpath('@href').extract_first()

        if next_page_link:
            next_page = "https://www.rong360.com" + next_page_link
            yield scrapy.Request(next_page, callback=self.parse)
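The pagination rule in parse can be isolated as a small pure function, which makes it easier to reason about. The sketch below is only an illustration: pick_next_page is a hypothetical helper that is not part of the original spider, and it assumes, as the spider does, that a page exposes either one a.next-page link or two, with the second being the forward link. It uses the standard library's urljoin instead of Scrapy and returns None when no link is found.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.rong360.com"


def pick_next_page(hrefs, base_url=BASE_URL):
    """Return the absolute URL of the next page, or None if there is none.

    hrefs: the href values of all a.next-page links on the page, in
    document order (hypothetical input; the spider would obtain them
    from response.css('a.next-page')).
    """
    if not hrefs:
        return None  # no pagination link at all: stop crawling
    # Mirror the spider's rule: a single link is used directly,
    # otherwise the second link is taken as the forward link.
    href = hrefs[0] if len(hrefs) == 1 else hrefs[1]
    if not href:
        return None
    return urljoin(base_url, href)


print(pick_next_page(["/licai-bank/list/p2"]))
print(pick_next_page(["/licai-bank/list/p1", "/licai-bank/list/p3"]))
print(pick_next_page([]))
```

Keeping the rule in one function like this would also make it possible to check the edge cases (no link, one link, two links) without fetching any pages.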

To store the scraped data in MongoDB, we need to modify pipelines.py as follows:

# Pipeline that inserts the scraped data into MongoDB
import pymongo


class BankPipeline(object):

    def __init__(self, settings):
        # Connect to the database server
        self.client = pymongo.MongoClient(host=settings['MONGO_HOST'], port=settings['MONGO_PORT'])
        # To log in with a username and password, pass them to MongoClient,
        # e.g. username=settings['MONGO_USER'], password=settings['MONGO_PSW']
        # (Database.authenticate() was removed in PyMongo 4)
        # Handles to the database and collection
        self.db = self.client[settings['MONGO_DB']]
        self.coll = self.db[settings['MONGO_COLL']]

    @classmethod
    def from_crawler(cls, crawler):
        # scrapy.conf is gone from current Scrapy; read settings from the crawler
        return cls(crawler.settings)

    def process_item(self, item, spider):
        postItem = dict(item)
        # Collection.insert() was removed in PyMongo 4; use insert_one()
        self.coll.insert_one(postItem)
        return item
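To see what the pipeline does to each item without a running MongoDB server, the sketch below swaps the collection for an in-memory stand-in. FakeCollection and SketchPipeline are hypothetical names introduced here for illustration, and the sample item values are made up; only the insert_one method name mirrors PyMongo's Collection.insert_one.

```python
class FakeCollection:
    """In-memory stand-in for a MongoDB collection (illustration only)."""

    def __init__(self):
        self.docs = []

    def insert_one(self, doc):
        # PyMongo's insert_one adds an _id to the document if it is missing;
        # here the documents are simply numbered to imitate that.
        doc.setdefault("_id", len(self.docs) + 1)
        self.docs.append(doc)


class SketchPipeline:
    """Same shape as the process_item step, minus the real connection."""

    def __init__(self, coll):
        self.coll = coll

    def process_item(self, item, spider):
        post_item = dict(item)  # copy, so the item itself is not mutated
        self.coll.insert_one(post_item)
        return item  # hand the item on to any later pipeline


coll = FakeCollection()
pipeline = SketchPipeline(coll)
item = {"name": "demo product", "bank": "demo bank"}  # made-up sample values
returned = pipeline.process_item(item, spider=None)
print(coll.docs)
print(returned)
```

The copy made by dict(item) matters because the insert adds an _id field to the document it receives; without the copy, that field would end up on the item passed to later pipelines.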

The MongoDB parameters used here, such as MONGO_HOST and MONGO_PORT, are defined in settings.py. Modify settings.py as follows:

ROBOTSTXT_OBEY = False

ITEM_PIPELINES = {'bank.pipelines.BankPipeline': 300}

# MongoDB connection parameters
MONGO_HOST = "localhost"  # host IP
MONGO_PORT = 27017  # port number
MONGO_DB = "Spider"  # database name
MONGO_COLL = "bank"  # collection name
# MONGO_USER = ""
# MONGO_PSW = ""

The username and password can be added as needed.
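If authentication is needed, one common way to supply the username and password is through a MongoDB connection URI. The sketch below builds such a URI from a host, port, and optional credentials; build_mongo_uri is a hypothetical helper, not part of the original project, and the sample credentials are made up. The username and password are percent-escaped with quote_plus, since characters such as '@' or ':' would otherwise break URI parsing.

```python
from urllib.parse import quote_plus


def build_mongo_uri(host, port, user="", password=""):
    """Build a mongodb:// URI; credentials are included only if a user is given."""
    if user:
        # Escape characters such as '@' or ':' that are reserved in URIs
        credentials = "%s:%s@" % (quote_plus(user), quote_plus(password))
    else:
        credentials = ""
    return "mongodb://%s%s:%d/" % (credentials, host, port)


print(build_mongo_uri("localhost", 27017))
print(build_mongo_uri("localhost", 27017, "spider", "p@ss:word"))
```

The resulting string can be passed to pymongo.MongoClient as its first argument in place of separate host and port values.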

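Now we can run the crawler. The result of the run:

[Figure: crawler run output]

The crawl took 3 hours in total and collected more than 120,000 records, which is impressively efficient!

Finally, let's take a look at the data in MongoDB:

[Figure: the scraped records in MongoDB]

Perfect! That's all for this post. Comments and discussion are welcome.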
