Complicated XML tags with Jsoup

Suppose you want to scrape a namespaced tag such as <dc:creator> or <media:content>. How do you do it? I am a Scala coder, so I am using jsoup from Scala.

Suppose you have an XML file like this (reference from

Remain campaigners step up efforts to secure ethnic minority votes
<p>Leave camp also targets BAME voters in recognition that they may be crucial in determining outcome of EU referendum</p>
<pubDate>Wed, 01 Jun 2016 15:00:27 GMT</pubDate>
<dc:creator>Anushka Asthana Political editor</dc:creator>

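The following code should then extract the creator:

import org.jsoup.Jsoup
import org.jsoup.parser.Parser

// url1 is the URL of the feed; parse the response as XML rather than HTML
val doc = Jsoup.connect(url1).parser(Parser.xmlParser()).userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.152 Safari/537.36").get()

// In a jsoup selector, the namespace separator ":" is written as "|"
val element = doc.select("item").select("dc|creator").text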


The output is:

Anushka Asthana Political editor
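
The same selector style should also cover <media:content>, where the useful value usually sits in an attribute rather than in the text. Here is a minimal sketch that parses an inline string instead of fetching a URL, so it can be run without network access. The sample feed, the names in it, the example.com URL and the object name NamespacedTagsSketch are all made up for illustration and are not taken from the real feed.

```scala
import org.jsoup.Jsoup
import org.jsoup.parser.Parser

object NamespacedTagsSketch {
  // Hypothetical sample feed, invented for this sketch
  val xml: String =
    """<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:media="http://search.yahoo.com/mrss/">
      |  <channel>
      |    <item>
      |      <title>Sample headline</title>
      |      <dc:creator>Jane Doe</dc:creator>
      |      <media:content url="https://example.com/image.jpg" width="140"/>
      |    </item>
      |  </channel>
      |</rss>""".stripMargin

  // Parse the string as XML (empty base URI, since nothing is resolved against it)
  val doc = Jsoup.parse(xml, "", Parser.xmlParser())

  // Work on the first <item> only
  val item = doc.select("item").first()

  // "dc|creator" matches <dc:creator>; text returns the element's text
  val creator: String = item.select("dc|creator").text

  // "media|content" matches <media:content>; attr reads the url attribute
  val mediaUrl: String = item.select("media|content").attr("url")

  def main(args: Array[String]): Unit = {
    println(creator)  // prints: Jane Doe
    println(mediaUrl) // prints: https://example.com/image.jpg
  }
}
```

Note that calling text on a selection that matches several elements returns their combined text, so when a feed has many items it is usually easier to pick one item first, as above, or to loop over the items.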

Hope this helps! Happy Coding!
