
Butter API and the beginnings of a DOM scraper

July 5, 2011

Last Thursday, Bobby Richter, Scott Downe, Chris DeCairos, Mohammed Buttu and I met at the Toronto Mozilla office to iron out the details of the new Butter.js API that we are beginning to develop. We went over the current version of Butter, extracted major portions of it into modules, and designed a slick trigger/listen mechanism for communication between the modules and the core. We spent 8 hours brainstorming in a small room, trying to identify every potential issue we could think of now and making sure each one was assigned to one of us in one of the modules. Turning Butter.js into a fully fledged API, as opposed to what it is now, will give us a great deal more flexibility, let us test each portion of the API independently, and let us build on it in a sane manner. It will also help Mohammed develop a Mac application that interacts with Final Cut Pro. This will give Final Cut Pro users a native version of Butter, letting them go straight from editing their movie in Final Cut Pro to adding Butter tracks in a very Mac-esque way. Hopefully this will attract filmmakers to Butter, and in turn we will begin seeing even more amazing videos created with Butter/Popcorn.
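The post doesn't spell out the trigger/listen API itself, so the following is only a minimal sketch of the general pattern, assuming a core object that exposes `listen` and `trigger` methods. `createCore` and the `"targetsadded"` event name are hypothetical and are not taken from Butter.js.

```javascript
// Hypothetical sketch of a trigger/listen hub shared by a core and its modules.
// None of these names come from the real Butter.js API.
function createCore() {
  var listeners = {}; // event name -> array of callbacks

  return {
    // Register a callback for a named event
    listen: function ( name, callback ) {
      if ( !listeners[ name ] ) {
        listeners[ name ] = [];
      }
      listeners[ name ].push( callback );
    },
    // Call every callback registered for the event, in registration order
    trigger: function ( name, data ) {
      var callbacks = listeners[ name ] || [];
      for ( var i = 0; i < callbacks.length; i++ ) {
        callbacks[ i ]( data );
      }
    }
  };
}

// Usage: a hypothetical previewer announces scraped targets, and another
// module reacts without holding a direct reference to the previewer.
var core = createCore();
var received = [];
core.listen( "targetsadded", function ( targets ) {
  received.push( targets );
});
core.trigger( "targetsadded", [ "main", "sidebar" ] );
```

The appeal of this design is that modules only agree on event names and payloads, so each one can be tested by triggering events at it in isolation.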

After our meeting about the Butter API, we divided up the modules and filed tickets for each of them. Yesterday we all began working on our respective modules, and I was assigned the previewer module. The previewer module is responsible for displaying a preview of the HTML page the user has created with Butter, either in an iFrame on the page or in an entirely new window. The user will provide the previewer with a layout (or a way to specify that they want the current page remixed with the changes they made in Butter), and the previewer will scrape the layout for all of its DOM objects and make them available as potential targets for each of the plugins. The list of scraped DOM objects will be passed to the Butter core's DOM Object Manager (DOMOM), where it will be accessible from other parts of the API.
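The post only says that scraped objects end up in the core's DOM Object Manager and are accessible elsewhere in the API. As an illustration, here is a minimal sketch of such a registry, assuming it stores the same id-to-tag-name pairs the scraper produces. `createDOMObjectManager`, `add`, `getTargetIds`, and `getTagName` are hypothetical names, not the actual DOMOM interface.

```javascript
// Hypothetical DOM Object Manager (DOMOM): a shared registry of scraped
// targets that other modules can query. All names are illustrative only.
function createDOMObjectManager() {
  var targets = {}; // element id -> tag name

  return {
    // Record one scraped element; elements without an id cannot be targeted
    add: function ( id, tagName ) {
      if ( id ) {
        targets[ id ] = tagName;
      }
    },
    // Ids of every known target, e.g. to fill a plugin's target list
    getTargetIds: function () {
      return Object.keys( targets );
    },
    // Tag name of a target, so the UI can show what kind of element it is
    getTagName: function ( id ) {
      return Object.prototype.hasOwnProperty.call( targets, id ) ? targets[ id ] : null;
    }
  };
}

var domom = createDOMObjectManager();
domom.add( "video-title", "H1" );
domom.add( "", "SPAN" ); // ignored: no id
domom.add( "caption", "DIV" );
```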

After writing a bit of pseudo-esque code to outline my module, I began work on the DOM scraper. In my module's current form I don't have a good way to interact with it, so I began experimenting with the concept in a separate file to test out my ideas and get a rough version working.

Essentially, I wanted my test to scrape an iFrame's source for all of its DOM objects and build a list of them. I began by creating two HTML files: one to hold my test DOM objects, and one that housed the iFrame and the script that would run the scraper. The file of test objects contained many different types of DOM objects (divs, strong and paragraph tags, etc.) as well as tags nested within one another, since most websites are structured like this and we want to be able to grab all of the objects, not just the topmost layer. This required a bit of recursion to keep diving into an object's children for as long as it still had children. As it walks over the children, it adds each one to a hash table, using the object's id as the key and its tag name as the value. This lets users know exactly what type of object they are selecting to insert their data into. It may end up being unnecessary, but I added it for the time being.

After about a day of experimenting, I had a working version! The one problem I quickly ran into was accessing the HTML of a site that was not on the same domain as mine. Browsers block this under the same-origin policy, a security measure that stops one site's scripts from reading another site's pages and exposing users' data. For now we decided to put this problem on the back burner and deal with other issues in the meantime.
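We haven't solved the cross-domain problem, but a scraper can at least detect it instead of failing outright. Here is a minimal sketch, assuming a hypothetical `getFrameDocument` helper: reading a cross-origin iframe's `contentWindow.document` throws a security error, so the helper catches it and returns `null`, letting the caller skip scraping or show a message.

```javascript
// Hypothetical helper: return an iframe's document if this page is allowed
// to read it, or null if the same-origin policy blocks access.
function getFrameDocument( frame ) {
  try {
    // Throws (a SecurityError in browsers) when the iframe is cross-origin
    var doc = frame.contentWindow.document;
    // Defensive assumption: treat a missing document as inaccessible too
    return doc || null;
  } catch ( e ) {
    return null;
  }
}
```

On the test page this would be called from the iframe's `onload` handler, e.g. `getFrameDocument( document.getElementById( "test" ) )`.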

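For those interested, here is a quick code snippet of what the code looks like:

    <script>
      function nonsense() {

        // Hash table of scraped elements: element id -> tag name
        var DOMDB = {};

        var asdf = window.document.getElementById( "test" );
        // Reading contentWindow.document throws a SecurityError if the iframe's
        // src is on another domain (same-origin policy).
        var qwer = asdf.contentWindow.document.getElementsByTagName( "BODY" );

        // Poll every 10 ms until the iframe's <body> exists, then scrape it.
        // (qwer is a live collection, so its length updates on its own.)
        function checkcheck() {
          if ( qwer.length < 1 ) {
            setTimeout( checkcheck, 10 );
          } else {
            ok( qwer[0].children );
            for ( var id in DOMDB ) {
              console.log( id + ": " + DOMDB[id] );
            }
          }
        }

        checkcheck();

        // Recursively walk every child, recording its id and tag name
        function ok( children ) {
          for ( var i = 0; i < children.length; i++ ) {
            // Elements without an id would all collide on the "" key
            if ( children[i].id ) {
              DOMDB[ children[i].id ] = children[i].tagName;
            }
            if ( children[i].children.length > 0 ) {
              ok( children[i].children );
            }
          }
        }
      }
    </script>

    <iframe id="test" style="height:100%;width:100%" src="http://www.webdeveloper.com/forum/showthread.php?t=105602" onload="nonsense();"></iframe>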

And don’t worry, the variable names won’t be staying in the final version.
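For comparison, the same recursive walk can be pulled out of the test page into a standalone function that takes any root element and returns the id-to-tag-name table, instead of filling a variable in an enclosing scope. This is a sketch, not the final previewer code: `scrapeDOM` is a hypothetical name, and it relies only on the standard `children`, `id`, and `tagName` element properties, so it can be tried on plain objects shaped like DOM elements.

```javascript
// Walk every descendant of `root` depth-first and map each element's id to
// its tag name. Works on real elements or on objects with the same shape.
function scrapeDOM( root, table ) {
  table = table || {};
  var children = root.children;
  for ( var i = 0; i < children.length; i++ ) {
    var child = children[ i ];
    // Elements without an id would all share the "" key, so don't record
    // them, but still descend into their children.
    if ( child.id ) {
      table[ child.id ] = child.tagName;
    }
    scrapeDOM( child, table );
  }
  return table;
}

// Usage with a plain object tree standing in for <body>:
var fakeBody = {
  children: [
    { id: "header", tagName: "DIV", children: [
      { id: "title", tagName: "STRONG", children: [] }
    ] },
    { id: "", tagName: "P", children: [
      { id: "note", tagName: "SPAN", children: [] }
    ] }
  ]
};
var scraped = scrapeDOM( fakeBody );
// scraped → { header: "DIV", title: "STRONG", note: "SPAN" }
```

In a browser you would pass the iframe document's `body` as the root.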
