Scraping Classic Novels with Python

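Python is a powerful programming language that is well suited to scraping text from the web. One use is downloading electronic editions of classic novels for research or study. The rest of this article shows how to scrape a classic novel with Python.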

import requests
from bs4 import BeautifulSoup

# Catalog page of "Romance of the Three Kingdoms"
url = 'http://www.shicimingju.com/book/sanguoyanyi.html'
html = requests.get(url).text
soup = BeautifulSoup(html, 'lxml')  # the 'lxml' parser needs the lxml package installed
# Each <a> in the catalog list links to one chapter
titles = soup.select('.book-mulu >ul >li >a')
for title in titles:
    # The href is treated as a site-relative path, so prepend the site root
    chapter_url = 'http://www.shicimingju.com' + title['href']
    article_html = requests.get(chapter_url).text
    article_soup = BeautifulSoup(article_html, 'lxml')
    article_title = article_soup.h1.text
    article_content = article_soup.select('.chapter_content')[0].text
    # Append the chapter title and body to the output file
    with open('san_guo_yan_yi.txt', 'a', encoding='utf-8') as f:
        f.write(article_title)
        f.write('\n\n')
        f.write(article_content)
        f.write('\n\n')

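In the code above, we first call requests.get() to fetch the HTML of the catalog page for Romance of the Three Kingdoms, then parse it with the BeautifulSoup library to get the book's chapter list (.book-mulu >ul >li >a). We then loop over that list to build each chapter's link, fetch each chapter in turn, and append its title and text to a txt file.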
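The link-building step concatenates the site root with each href, which only works when the href is a site-relative path. A minimal sketch of a more tolerant alternative uses urljoin from the standard library, which also handles absolute and page-relative hrefs. The helper name chapter_url and the sample href values are hypothetical and chosen only for illustration; they are not taken from the site.

```python
from urllib.parse import urljoin

# Catalog page from the article; chapter hrefs are resolved against it.
CATALOG_URL = 'http://www.shicimingju.com/book/sanguoyanyi.html'


def chapter_url(href, base=CATALOG_URL):
    """Resolve a chapter href (relative or absolute) into an absolute URL."""
    return urljoin(base, href)


# Hypothetical href values, for illustration only.
print(chapter_url('/book/sanguoyanyi/1.html'))   # site-relative path
print(chapter_url('sanguoyanyi/2.html'))         # page-relative path
print(chapter_url('http://www.shicimingju.com/book/sanguoyanyi/3.html'))  # already absolute
```

In the scraping loop, chapter_url(title['href']) could take the place of the string concatenation.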

import requests
from bs4 import BeautifulSoup

# Catalog page of "Dream of the Red Chamber"
url = 'http://www.shicimingju.com/book/hongloumeng.html'
html = requests.get(url).text
soup = BeautifulSoup(html, 'lxml')  # the 'lxml' parser needs the lxml package installed
# Each <a> in the catalog list links to one chapter
titles = soup.select('.book-mulu >ul >li >a')
for title in titles:
    # The href is treated as a site-relative path, so prepend the site root
    chapter_url = 'http://www.shicimingju.com' + title['href']
    article_html = requests.get(chapter_url).text
    article_soup = BeautifulSoup(article_html, 'lxml')
    article_title = article_soup.h1.text
    article_content = article_soup.select('.chapter_content')[0].text
    # Append the chapter title and body to the output file
    with open('hong_lou_meng.txt', 'a', encoding='utf-8') as f:
        f.write(article_title)
        f.write('\n\n')
        f.write(article_content)
        f.write('\n\n')

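The code above scrapes Dream of the Red Chamber. It differs from the first script only in the url and the output file name, so changing those two values lets you scrape any other classic you are interested in, provided it is hosted on the same site with the same page structure.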
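Since only the url and the file name change between books, both can be derived from a single book identifier. The sketch below assumes the catalog URL pattern /book/<slug>.html, which is inferred from the two URLs in this article and not confirmed for other books. The helper name book_config is hypothetical, and the output name is simplified to <slug>.txt instead of the hand-written file names used above.

```python
from urllib.parse import urljoin

BASE_URL = 'http://www.shicimingju.com'


def book_config(slug):
    """Return (catalog_url, output_file) for a book slug such as 'sanguoyanyi'.

    Assumes the site serves catalogs at /book/<slug>.html (inferred from
    the two example URLs in the article).
    """
    catalog_url = urljoin(BASE_URL, f'/book/{slug}.html')
    output_file = f'{slug}.txt'
    return catalog_url, output_file


# The two books covered in this article.
for slug in ('sanguoyanyi', 'hongloumeng'):
    catalog_url, output_file = book_config(slug)
    print(catalog_url, '->', output_file)
```

The returned pair can then be substituted for the url and file name in either script.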
