Sending a POST request with username and password and saving the session cookie

Assuming that the HTML form looks like this:

    <form action="http://example.com/login" method="post">
        <input type="text" name="username" />
        <input type="password" name="password" />
        <input type="submit" name="login" value="Login" />
    </form>

You can POST it and obtain the cookies as follows:

    // Needs: org.jsoup.Jsoup, org.jsoup.Connection.Method,
    // org.jsoup.Connection.Response, org.jsoup.nodes.Document, java.util.Map
    Response response = Jsoup.connect("http://example.com/login")
        .method(Method.POST)
        .data("username", username)
        .data("password", password)
        .data("login", "Login")   // the submit button's name/value pair
        .execute();
    Map<String, String> cookies = response.cookies();  // session cookies set by the server
    Document document = response.parse();              // the page returned after login

…
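To make it clearer what such a login POST carries, here is a minimal JDK-only sketch of how the three form fields would be serialized for a plain HTML form (no file upload), which is submitted as an application/x-www-form-urlencoded body. The class name FormBodySketch, the method encodeForm and the sample values are hypothetical and exist only for illustration. This is not jsoup code.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class FormBodySketch {

    // Joins name=value pairs with '&' and URL-encodes each name and value,
    // which is the shape of a urlencoded form body.
    static String encodeForm(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("username", "foo");        // placeholder credentials (assumption)
        fields.put("password", "p@ss word");
        fields.put("login", "Login");
        // prints username=foo&password=p%40ss+word&login=Login
        System.out.println(encodeForm(fields));
    }
}
```

Special characters in a password (here "@" and a space) are escaped in the body, which is worth keeping in mind when comparing a failing login against what the browser sends.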

Jsoup connection with basic access authentication

With HTTP basic access authentication you need to send the Authorization header with a value of "Basic " + base64encode("username:password"). For example:

    // Needs: org.jsoup.Jsoup, org.jsoup.nodes.Document,
    // java.util.Base64, java.nio.charset.StandardCharsets
    String username = "foo";
    String password = "bar";
    String login = username + ":" + password;
    // Encode with an explicit charset rather than the platform default
    String base64login = Base64.getEncoder().encodeToString(login.getBytes(StandardCharsets.UTF_8));
    Document document = Jsoup
        .connect("http://example.com")
        .header("Authorization", "Basic " + base64login)
        .get();
    // …

…
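The header value itself can be built and checked without any network call. The following is a small sketch using only the JDK. The class name BasicAuthSketch and the method basicAuthHeader are hypothetical helpers, and the credentials are the placeholder values from the example above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class BasicAuthSketch {

    // Builds the value for the Authorization header:
    // "Basic " followed by base64("username:password").
    static String basicAuthHeader(String username, String password) {
        String login = username + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(login.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // prints Basic Zm9vOmJhcg==
        System.out.println(basicAuthHeader("foo", "bar"));
    }
}
```

Passing the charset explicitly keeps the result independent of the platform default, which matters once usernames or passwords contain non-ASCII characters. Note that Base64 is an encoding, not encryption, so this scheme is normally only used over HTTPS.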

Parse JavaScript with jsoup

Since jsoup isn't a JavaScript library (it parses HTML but does not execute scripts), you have two ways to solve this:

A. Use a JavaScript library
   Pro: Full JavaScript support
   Con: Additional library / dependencies

B. Use jsoup + manual parsing
   Pro: No extra libraries required; enough for simple tasks
   Con: Not as flexible as a JavaScript library

Here's an example how to …
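As an illustration of option B, manual parsing often comes down to a regular expression over the script text (which jsoup exposes for a script element through data()). The sketch below is not the example from the original answer, which is cut off here. It is a hypothetical JDK-only helper (ScriptVarSketch, extractStringVar) that assumes the script contains a simple declaration of the form var name = "value"; and it does not handle escaped quotes or anything more complex.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ScriptVarSketch {

    // Finds: var NAME = "value"  or  var NAME = 'value'
    // and returns the text between the quotes, if present.
    static Optional<String> extractStringVar(String script, String name) {
        Pattern pattern = Pattern.compile(
                "\\bvar\\s+" + Pattern.quote(name) + "\\s*=\\s*([\"'])(.*?)\\1");
        Matcher matcher = pattern.matcher(script);
        return matcher.find() ? Optional.of(matcher.group(2)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up script content for illustration.
        String script = "var token = \"abc123\"; var page = 'home';";
        System.out.println(extractStringVar(script, "token").orElse("(not found)")); // prints abc123
        System.out.println(extractStringVar(script, "page").orElse("(not found)"));  // prints home
        System.out.println(extractStringVar(script, "user").orElse("(not found)"));  // prints (not found)
    }
}
```

This kind of matching is brittle by nature: it works for simple, stable scripts, and it is the point at which option A becomes worth the extra dependency.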

Jsoup Cookies for HTTPS scraping

I know I'm about 10 months late here, but a good option with jsoup is this short piece of code:

    // This will get you the response.
    // (Response and Method are org.jsoup.Connection.Response and org.jsoup.Connection.Method)
    Response res = Jsoup
        .connect("url")
        .data("loginField", "login@login.com", "passField", "pass1234")
        .method(Method.POST)
        .execute();

    // This will get you the cookies
    Map<String, String> cookies = res.cookies();

    // And this is the …
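Once a cookie map like the one above is in hand, it has to accompany later requests for the session to stay alive. In jsoup the map can be passed to the next connection with cookies(Map). On the wire, the cookies travel as a single Cookie header. The following JDK-only sketch shows that format. The class CookieHeaderSketch, the method toCookieHeader and the cookie names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class CookieHeaderSketch {

    // Formats cookies as "name1=value1; name2=value2",
    // the shape of a request's Cookie header value.
    static String toCookieHeader(Map<String, String> cookies) {
        return cookies.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("; "));
    }

    public static void main(String[] args) {
        // Stand-in for the map a login response might return (assumed values).
        Map<String, String> cookies = new LinkedHashMap<>();
        cookies.put("JSESSIONID", "abc123");
        cookies.put("lang", "en");
        // prints JSESSIONID=abc123; lang=en
        System.out.println(toCookieHeader(cookies));
    }
}
```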

How to parse XML with jsoup

It seems that jsoup, as of version 1.6.2 (released March 28, 2012), includes some basic support for XML.

    // Needs: org.jsoup.Jsoup, org.jsoup.nodes.Document,
    // org.jsoup.nodes.Element, org.jsoup.parser.Parser
    String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><tests><test><id>xxx</id><status>xxx</status></test><test><id>xxx</id><status>xxx</status></test></tests>";
    Document doc = Jsoup.parse(xml, "", Parser.xmlParser());
    for (Element e : doc.select("test")) {
        System.out.println(e);
    }

Give that a shot.
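For comparison, and as a way to check that an XML string is well formed before handing it to a more lenient parser, the same kind of document can be read with the JDK's built-in DOM parser. This is a different technique from the jsoup call above, not a replacement for it. The names XmlDomSketch and countElements are hypothetical, and the sample document is a well-formed one of the same shape (declaration closed with ?> and a single root element).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XmlDomSketch {

    // Parses the XML strictly and counts elements with the given tag name.
    // Malformed input is reported as an IllegalArgumentException.
    static int countElements(String xml, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName(tagName).getLength();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<tests>"
                + "<test><id>xxx</id><status>xxx</status></test>"
                + "<test><id>xxx</id><status>xxx</status></test>"
                + "</tests>";
        System.out.println(countElements(xml, "test")); // prints 2
    }
}
```<test><id>xxx</id></test></tests>";
assert XmlDomSketch.countElements(x, "test") == 2;
assert XmlDomSketch.countElements(x, "id") == 2;
assert XmlDomSketch.countElements(x, "missing") == 0;
</test>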

JSoup UserAgent, how to set it right?

You might try setting the referrer header as well:

    Document doc = Jsoup.connect("https://www.facebook.com/")
        .userAgent("Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6")
        .referrer("http://www.google.com")
        .get();
