How to Read Text Data in R


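1. When a page serves a CSV file, read.csv can import it directly from the URL and return a data frame. An HTML page can also be passed to read.csv, but the markup is not parsed, so the result is rarely usable as is; steps 3 to 6 cover HTML.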

For example:

data <- read.csv(text = "it is a page")  # text is the string to parse as CSV

head(data,10)

# To read a web page: data <- read.csv("page"), where "page" is the URL or file path to read. A literal string of text goes in the text argument instead.
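The example above parses a one-line string, which yields a header and no rows. A slightly fuller sketch, using made-up inline CSV text (the column names and values are illustrative only, not from the original), shows what the resulting data frame looks like:

```r
# Inline CSV text standing in for the body of a downloaded page (made-up data).
csv_text <- "name,score\nalice,90\nbob,85\ncarol,78"

# text = parses the string itself; a first positional argument would be
# treated as a file path or URL instead.
scores <- read.csv(text = csv_text, stringsAsFactors = FALSE)

str(scores)      # show column names and types
head(scores, 2)  # show the first two rows
```

Replacing text = csv_text with a URL or a file path reads the same kind of data from that location.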

2. readLines, from base R, reads a web page or a text file line by line.

# Write some text to a file

cat("asqsd\n1213",file="a1")

readLines("a1")  # read the text back, one element per line

# In cat, "\n" denotes a newline.
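A variant of the same round trip, sketched with a temporary file so that nothing is left in the working directory. The variable names are illustrative; the trailing "\n" is added here only to avoid the incomplete-final-line warning that readLines gives for the file above.

```r
# Write to a temporary file so the working directory is left untouched.
path <- tempfile(fileext = ".txt")
cat("asqsd\n1213\n", file = path)  # trailing "\n" ends the last line properly

txt_lines  <- readLines(path)         # character vector, one element per line
first_line <- readLines(path, n = 1)  # n limits how many lines are read

print(txt_lines)
unlink(path)  # remove the temporary file
```

The n argument is handy for peeking at the top of a large file before reading all of it.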

3. The getURL() function in the RCurl package fetches the content of a web page.

library(RCurl)

data <- getURL("a1")  # replace "a1" with an actual URL

head(data)

4. The raw result of getURL is a single string of markup and is messy to work with; the tree parser htmlTreeParse from the XML package can process it.

library(XML)  # provides the tree parser htmlTreeParse

data_Parse <- htmlTreeParse(data, asText = TRUE)  # asText = TRUE: data holds markup, not a file name

head(data_Parse,2)
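To see what parsing buys you without a network connection, here is a minimal sketch that parses a small hand-written HTML string (an assumption standing in for the output of getURL) and pulls out the paragraph text with an XPath query. It assumes the XML package is installed.

```r
library(XML)

# Hand-written markup standing in for a downloaded page (illustrative only).
html_raw <- "<html><body><h1>Title</h1><p>first</p><p>second</p></body></html>"

# asText = TRUE says the argument is markup rather than a file name or URL.
# useInternalNodes = TRUE returns a document that XPath functions can query.
doc <- htmlTreeParse(html_raw, asText = TRUE, useInternalNodes = TRUE)

# Collect the text content of every <p> element.
paragraphs <- xpathSApply(doc, "//p", xmlValue)
print(paragraphs)
```

Querying the parsed tree is usually more reliable than searching the raw string returned by getURL.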

5. For text data on more complex sites, use read_html from the rvest package to load the page before extracting the data.

library(rvest)

page <- read_html("a1")  # replace "a1" with an actual URL

data<-html_nodes(page,"table")

head(data)

# No real URL is supplied in this example, so the result is empty.

6. The node set returned by html_nodes cannot be used directly; convert it with html_table or html_text first.

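table <- html_table(data); table  # extract table data; returns a list with one entry per table found

table[1]  # view the first table (table[[1]] returns it as a data frame)

text <- html_text(data); text  # extract the text content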

# In practice, the extracted tables and text are much easier to analyse.
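Because the example above has no real URL, its output is empty. The sketch below runs steps 5 and 6 on a small hand-written HTML string instead, so the results can be inspected offline. The table contents and variable names are made up for illustration, and the code assumes the rvest package is installed.

```r
library(rvest)

# Hand-written page standing in for a real site (illustrative data only).
html_src <- "<html><body>
<table>
<tr><th>name</th><th>score</th></tr>
<tr><td>alice</td><td>90</td></tr>
<tr><td>bob</td><td>85</td></tr>
</table>
</body></html>"

page   <- read_html(html_src)       # accepts literal HTML as well as a URL or path
nodes  <- html_nodes(page, "table") # select every <table> element
tables <- html_table(nodes)         # list with one data frame per table

first_table <- tables[[1]]                         # the first table as a data frame
cell_text   <- html_text(html_nodes(page, "td"))   # text of each data cell

print(first_table)
print(cell_text)
```

Using tables[[1]] rather than tables[1] gives the data frame itself instead of a one-element list, which is usually what later analysis needs.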
