Extracting Href Attr Or Converting Node To Character List

June 27, 2023

I am trying to extract some information from a website:

library(rvest)
library(XML)
url <- 'http://wiadomosci.onet.pl/wybory-prezydenckie/xcnpc'
html <- html(url)
nodes <- htm

Solution 1:

Try searching inside nodes' children:

nodes <- html_nodes(html, ".listItemSolr") 

sapply(html_children(nodes), function(x){
  html_attr( x$a, "href")
})

Update

Hadley suggested a more elegant version using pipes:

html %>%  
  html_nodes(".listItemSolr") %>% 
  html_nodes(xpath = "./a") %>% 
  html_attr("href")
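
To see what the pipe version returns without depending on the live page, here is a minimal sketch that runs the same chain on a small inline document. The HTML snippet and its links are made up for illustration, and read_html() is used as the current replacement for the older html() call; it assumes a recent rvest is installed.

```r
library(rvest)

# Hypothetical stand-in for the real page, so the example runs offline.
page <- read_html(paste0(
  '<div class="listItemSolr"><a href="/news/1">First</a>',
  '<p><a href="/skip">nested link</a></p></div>',
  '<div class="listItemSolr"><a href="/news/2">Second</a></div>',
  '<div class="other"><a href="/other">Other</a></div>'
))

links <- page %>%
  html_nodes(".listItemSolr") %>%   # every element with the target class
  html_nodes(xpath = "./a") %>%     # only <a> elements that are direct children
  html_attr("href")                 # character vector of href values

print(links)
```

Because "./a" matches direct children only, the link nested inside the paragraph and the link in the unrelated div are both left out.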

Solution 2:

The XML package's getHTMLLinks() function can do virtually all the work for us; we just have to write the XPath query. Here we query all node attributes to find any whose value contains "listItemSolr", then step up to the element that owns that attribute and take the href of its a children.


getHTMLLinks(url, xpQuery = "//@*[contains(., 'listItemSolr')]/../a/@href")

In xpQuery we are doing the following:

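//@*[contains(., 'listItemSolr')] : match every attribute, on any node, whose value contains listItemSolr
/.. : select the parent of that attribute, i.e. the element carrying it
/a/@href : get the href attributes of that element's a children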
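
The three steps can be checked one at a time on a small inline document. This is only a sketch: the HTML is invented for illustration, and it uses htmlParse() and xpathSApply() from the XML package rather than getHTMLLinks(), so that each intermediate result can be inspected.

```r
library(XML)

# Hypothetical stand-in for the real page, parsed from a string.
doc <- htmlParse(paste0(
  '<div class="listItemSolr"><a href="/news/1">First</a></div>',
  '<div class="listItemSolr"><a href="/news/2">Second</a></div>',
  '<div class="other"><a href="/other">Other</a></div>'
), asText = TRUE)

# Step 1: attributes whose value contains the class name.
attrs <- xpathSApply(doc, "//@*[contains(., 'listItemSolr')]")

# Step 2: the elements that own those attributes (here, the two divs).
owners <- xpathSApply(doc, "//@*[contains(., 'listItemSolr')]/..", xmlName)

# Step 3: href values of the <a> children of those elements.
links <- unname(as.character(
  xpathSApply(doc, "//@*[contains(., 'listItemSolr')]/../a/@href")
))

print(owners)
print(links)
```

Note that the query matches any attribute containing the text, not just class, so an id or data attribute with the same substring would also be picked up; "//*[contains(@class, 'listItemSolr')]/a/@href" is a narrower alternative.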
