I want to scrape Google search for the number of hits:
    require(XML)
    require(RCurl)  # getURL() comes from RCurl
    input <- "projektgebiet"
    url <- paste("https://www.google.at/search?q=", input, "&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:de:official&client=firefox-a", sep = "")
    CAINFO <- paste(system.file(package = "RCurl"), "/CurlSSL/ca-bundle.crt", sep = "")
    script <- getURL(url, followlocation = TRUE, cainfo = CAINFO)
    doc <- htmlParse(script)
    xmlValue(getNodeSet(doc, "//td")[[1]])

I'm close. The only problem is that I don't understand how to address the two values within the node separately. I actually just want the number (in the above example the two values are concatenated).
I'd also like to know how to avoid the indexing [[1]], but I don't know whether the node can be addressed by some other characteristic.
Any help or pointers would be greatly appreciated!
PS: Of course I could use a regex, but I don't think that is the most elegant way.
You can avoid the indexing by noticing that one of the div elements has an id attribute (subform_ctrl), which an XPath expression can match directly.
The following returns the contents of that div's two child nodes separately, without concatenating them.
    xpathSApply(doc, "//div[@id='subform_ctrl']/*", xmlValue)
    # [1] "Erweiterte Suche"            "Ungefähr 245.000 Ergebnisse"
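
The second element is still a sentence rather than a bare number, so a small cleanup step remains. The following is a minimal sketch in base R, assuming the result text keeps the German format shown above, with "." as the thousands separator. The variable names `values`, `hits_text` and `hits` are only illustrative, and `values` is hard-coded here to stand in for the result of the `xpathSApply` call.

```r
# Stand-in for the vector returned by the xpathSApply() call above.
values <- c("Erweiterte Suche", "Ungefähr 245.000 Ergebnisse")

# The hit count is in the second child node.
hits_text <- values[2]

# Drop everything that is not a digit; this also removes the "."
# thousands separator used in the German locale.
hits <- as.numeric(gsub("[^0-9]", "", hits_text))

print(hits)
# [1] 245000
```

Stripping non-digits is deliberately crude: it would also glue together any other numbers that appear in the string, so it only works as long as the hit count is the only number in that node.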