Switchboard

Screenscraping with Browser

org.switchboard.browser is a utility package that simulates a typical web browser entirely in code. It offers features such as XQuery and programmatic access to cookies, with more to come. It is the basis of many Switchboard services and is meant to fill the gaps that the built-in Switchboard services leave. With it you can pull whatever you need out of a web page. Here's how.

  1. Import the appropriate library.
    import org.switchboard.browser.*; 
  2. Create a Browser
    import org.switchboard.browser.*;
     
    Browser browser = new Browser();
  3. Open a URL
    import org.switchboard.browser.*;
     
    Browser browser = new Browser();
     
    void setup() {
      browser.open("http://boingboing.net/");
    }
  4. What do you want to do with the page?

    Check out the Browser documentation. There is a lot you can do, including XPath queries, but for now we will stick with something easy and get all of the images on the page with getImages(), which returns their URLs as an array of strings.

    import org.switchboard.browser.*;
     
    Browser browser = new Browser();
     
    void setup() {
      browser.open("http://boingboing.net/");
      String[] images = browser.getImages();
    }
  5. Make sure the image still exists

    Use the NetUtil.exists() function to make sure that the resource (image) still exists on the web. Otherwise we might run into trouble later when we try to display it. If it does exist, we will load it as a PImage and add it to a list. NetUtil lives in org.switchboard.util, so that package needs to be imported as well.

    import org.switchboard.util.*;
    import org.switchboard.browser.*;
     
    Browser browser = new Browser();
    ArrayList list = new ArrayList();
     
    void setup() {
      browser.open("http://boingboing.net/");
      String[] images = browser.getImages();
      for(int i=0; i<images.length; i++) {
        String imgURL = images[i];
        if(NetUtil.exists(imgURL)) {
          PImage img = loadImage(imgURL);
          // loadImage() returns null when the image cannot be loaded
          if(img != null) list.add(img);
        }
      }
    }
  6. Do something with the images

    As usual, I'm going to do something boring: add the first 12 images to a list and then draw them at random places on the screen. But now you know how to use Browser, right?

    import org.switchboard.util.*;
    import org.switchboard.browser.*;
     
    Browser browser = new Browser();
    ArrayList list = new ArrayList();
     
    void setup() {
      size(400, 400);
      frameRate(10);
      browser.open("http://boingboing.net/");
     
      String[] images = browser.getImages();
      println(images.length+" images found.");
      for(int i=0; i<images.length && i<12; i++) {
        String imgURL = images[i];
        println("adding "+imgURL);
        if(NetUtil.exists(imgURL)) {
          PImage img = loadImage(imgURL);
          // loadImage() returns null when the image cannot be loaded
          if(img != null) list.add(img);
        }
      }
    }
     
    void draw() {
      for(int i=0; i<list.size(); i++) {
        PImage img = (PImage)list.get(i);
        if(img.width > 0 && img.height > 0) {
          image(img, random(width), random(height)); 
        }
      }
    }
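
Step 4 mentions XPath queries without showing one. Browser's own query methods are not covered in this tutorial, so the following is only a rough sketch of the idea using the JDK's standard javax.xml.xpath API on a small, well-formed XHTML string. The class name XPathImageQuery and the method imageSources are made up for illustration and are not part of Switchboard. Real-world pages are often not well-formed XML, so a plain XML parser like this one would choke on them.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Switchboard.
class XPathImageQuery {

  // Returns the src attribute of every img element, in document order.
  // Assumes the input is well-formed XML without namespaces.
  static List<String> imageSources(String xhtml) {
    XPath xpath = XPathFactory.newInstance().newXPath();
    try {
      NodeList nodes = (NodeList) xpath.evaluate(
          "//img/@src",
          new InputSource(new StringReader(xhtml)),
          XPathConstants.NODESET);
      List<String> sources = new ArrayList<>();
      for (int i = 0; i < nodes.getLength(); i++) {
        sources.add(nodes.item(i).getNodeValue());
      }
      return sources;
    } catch (XPathExpressionException e) {
      // Thrown for a bad expression or for input that cannot be parsed.
      throw new IllegalArgumentException("Could not query the document", e);
    }
  }
}
```

The expression //img/@src picks the same kind of information that getImages() hands back in step 4, namely one image address per img element. Changing the expression (for example to //a/@href) is all it takes to pull out something else.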
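
Step 5 relies on NetUtil.exists() to skip dead image links. How NetUtil does this internally is not described here. One plausible approach, sketched below in plain Java, is to send an HTTP HEAD request and treat a 2xx status as "exists". The names UrlCheck, exists, and isSuccess are hypothetical, and the 5 second timeouts are an arbitrary choice.

```java
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical stand-in for an existence check, not Switchboard's NetUtil.
class UrlCheck {

  // Treats any 2xx status code as success.
  static boolean isSuccess(int status) {
    return status >= 200 && status < 300;
  }

  // Sends a HEAD request, so only headers are transferred, not the image.
  // Any failure (bad address, non-HTTP URL, timeout, I/O error) counts
  // as "does not exist".
  static boolean exists(String address) {
    HttpURLConnection conn = null;
    try {
      conn = (HttpURLConnection) new URI(address).toURL().openConnection();
      conn.setRequestMethod("HEAD");
      conn.setConnectTimeout(5000); // milliseconds, arbitrary
      conn.setReadTimeout(5000);    // milliseconds, arbitrary
      return isSuccess(conn.getResponseCode());
    } catch (Exception e) {
      return false;
    } finally {
      if (conn != null) {
        conn.disconnect();
      }
    }
  }
}
```

A check like this costs one extra round trip per image, which is one reason the sketch in step 6 caps the loop at 12 images. Some servers reject HEAD requests, so a false result here does not always mean the image is gone.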