Suppose you want to scrape a namespaced tag such as <dc:creator> or <media:content>. How do you do it? I am a Scala coder, and I use jsoup in Scala.
Suppose you have an XML feed like this one (an item taken from http://www.theguardian.com/politics/rss):
<item>
  <title>Remain campaigners step up efforts to secure ethnic minority votes</title>
  <description><p>Leave camp also targets BAME voters in recognition that they may be crucial in determining outcome of EU referendum</p></description>
  <pubDate>Wed, 01 Jun 2016 15:00:27 GMT</pubDate>
  <dc:creator>Anushka Asthana Political editor</dc:creator>
  <dc:date>2016-06-01T15:00:27Z</dc:date>
</item>
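If you want to try the selector on this sample without fetching the live feed, a minimal sketch is to parse the item from a string with Jsoup.parse(xml, baseUri, Parser.xmlParser()). The object name OfflineCreator is just an illustration, and the string is a trimmed copy of the sample (the <description> is left out for brevity).

```scala
import org.jsoup.Jsoup
import org.jsoup.parser.Parser

object OfflineCreator {
  // Trimmed copy of the sample <item>; no network access needed
  val xml =
    """<item><title>Remain campaigners step up efforts to secure ethnic minority votes</title><pubDate>Wed, 01 Jun 2016 15:00:27 GMT</pubDate><dc:creator>Anushka Asthana Political editor</dc:creator><dc:date>2016-06-01T15:00:27Z</dc:date></item>"""

  // The XML parser keeps prefixed names such as "dc:creator" as tag names
  val doc = Jsoup.parse(xml, "", Parser.xmlParser())

  // In a jsoup selector the prefix is separated by "|", not ":"
  val creator: String = doc.select("item dc|creator").text()
  val date: String = doc.select("item dc|date").text()

  def main(args: Array[String]): Unit = {
    println(creator)
    println(date)
  }
}
```

Parsing from a string like this is handy for unit tests, since it does not depend on the feed's current contents.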
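The following code fetches the live feed and extracts the creator. Note the selector syntax: jsoup matches the namespaced tag <dc:creator> with dc|creator, because a colon in a selector starts a pseudo-selector such as :contains().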
import org.jsoup.Jsoup
import org.jsoup.parser.Parser
val url1 = "http://www.theguardian.com/politics/rss"
val doc = Jsoup.connect(url1).parser(Parser.xmlParser()).userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.152 Safari/537.36").get()
val creator = doc.select("item").select("dc|creator").text()
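Because Elements.text() joins the text of every matched element, running this against the full feed returns all the creators in one string. For the single sample item above, the output is: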
Anushka Asthana Political editor
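The code above does not cover <media:content>, and it merges every item's creator into one string. A minimal sketch that handles both walks the feed item by item and reads media:content from its attributes. That element is typically empty, with the image URL in a url attribute, as in the Guardian feed. The two-item feed, the example.com URLs and the PerItem name are made up for illustration, and scala.jdk.CollectionConverters assumes Scala 2.13 or 3.

```scala
import org.jsoup.Jsoup
import org.jsoup.parser.Parser
import scala.jdk.CollectionConverters._

object PerItem {
  // Hypothetical feed: titles, authors and URLs are invented for the example
  val xml =
    """<rss><channel>
      |<item><title>First</title><dc:creator>Author One</dc:creator>
      |<media:content url="https://example.com/one.jpg" width="140"/></item>
      |<item><title>Second</title><dc:creator>Author Two</dc:creator>
      |<media:content url="https://example.com/two.jpg" width="140"/></item>
      |</channel></rss>""".stripMargin

  val doc = Jsoup.parse(xml, "", Parser.xmlParser())

  // One (title, creator, image URL) triple per <item>, so creators are not merged
  val items: List[(String, String, String)] =
    doc.select("item").asScala.toList.map { item =>
      val title = item.select("title").text()
      val creator = item.select("dc|creator").text()
      // attr() returns the value from the first matched element that has it, or ""
      val imageUrl = item.select("media|content").attr("url")
      (title, creator, imageUrl)
    }

  def main(args: Array[String]): Unit = items.foreach(println)
}
```

Against the live feed you would replace Jsoup.parse(...) with the Jsoup.connect(url1)...get() call shown earlier; the per-item logic stays the same.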
Hope this helps! Happy Coding!