Scraping HTML tables into R data frames using the XML package

How do I scrape HTML tables using the XML package?

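Take the Wikipedia page for the Brazil national football team as an example. I want to read it into R and get the table "List of all matches Brazil have played against FIFA recognised teams" as a data.frame. How can I do this?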

2020-05-10

1 answer

…or, as a shorter attempt:

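library(XML)    # readHTMLTable()
library(RCurl)  # getURL()
library(rlist)  # list.clean()
# Download the page source (SSL peer verification is switched off, as in the original answer)
theurl <- getURL("https://en.wikipedia.org/wiki/Brazil_national_football_team", .opts = list(ssl.verifypeer = FALSE))
# Parse every <table> on the page into a list of data frames
tables <- readHTMLTable(theurl)
# Drop NULL entries from the list
tables <- list.clean(tables, fun = is.null, recursive = FALSE)
# Number of rows in each remaining table
n.rows <- unlist(lapply(tables, function(t) dim(t)[1]))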
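To see what the parsing and row-counting steps produce without hitting the network, here is a small sketch that runs readHTMLTable() on an inline HTML string. The two toy tables and their contents are made up for illustration only. The string is parsed with htmlParse(..., asText = TRUE) first so that it is unambiguously treated as HTML content rather than as a file name.

```r
library(XML)

# Hypothetical HTML snippet standing in for the downloaded page (made-up data)
html <- paste0(
  "<html><body>",
  "<table><tr><th>Year</th><th>Host</th></tr>",
  "<tr><td>1950</td><td>Brazil</td></tr></table>",
  "<table><tr><th>Opponent</th><th>Result</th></tr>",
  "<tr><td>A</td><td>W</td></tr>",
  "<tr><td>B</td><td>D</td></tr>",
  "<tr><td>C</td><td>L</td></tr></table>",
  "</body></html>"
)

# Parse the string as HTML content, then convert each <table> to a data frame
doc <- htmlParse(html, asText = TRUE)
tables <- readHTMLTable(doc)

# One row count per table; the second toy table is the longer one
n.rows <- vapply(tables, nrow, integer(1))
print(n.rows)
```

The same which.max(n.rows) selection from the answer would pick the second table here.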

Then pick the longest table on the page (the one with the most rows):

tables[[which.max(n.rows)]]
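If you would rather not depend on rlist just for list.clean(), the cleaning and selection logic can be written in base R. This is a sketch that swaps in Filter()/Negate() for list.clean() and vapply() for unlist(lapply(...)); the toy tables list below is made up to mimic readHTMLTable() output and is not taken from the Wikipedia page.

```r
# Hypothetical stand-in for the list readHTMLTable() returns:
# two data frames plus a NULL entry
tables <- list(
  a = data.frame(x = 1:2),
  b = NULL,
  c = data.frame(x = 1:5, y = letters[1:5])
)

# Base-R equivalent of list.clean(tables, fun = is.null): keep non-NULL entries
tables <- Filter(Negate(is.null), tables)

# Row count of each remaining table (names stay aligned with the list)
n.rows <- vapply(tables, nrow, integer(1))

# Select the table with the most rows
longest <- tables[[which.max(n.rows)]]
print(names(which.max(n.rows)))  # "c"
print(dim(longest))              # 5 2
```

vapply() also fails loudly if some entry does not yield a single row count, whereas unlist(lapply(...)) would silently drop it and shift the indices used by which.max().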