
Processing.js codePointAt()

January 31, 2011

After taking a small break from OSD600 for a few days to do some work in a couple of other classes, I finally came back to it today. I left off where I was about to write a test for codePointAt that could be added to the other tests that processing.js runs now. Processing.js has these tests to ensure no functions or crucial parts of the code get broken when someone implements a fix or a new function. By running these tests after you have written your code, you make sure that no existing code has been broken and that what you have added hasn’t altered it in any way. When I started writing my test, I made a normal string composed of characters that I knew would work and give me the output I expected. I tested this first and it worked. After this I started adding Chinese characters such as 𧺆. This is when stuff started to get weird.

 

I tried running the test again and it came back false, saying that 163462 (the Unicode value of the Chinese character) was != 63. 63? How did I get 63? I looked up which character has the value 63, and it was a question mark. After talking to Pomax and yury on IRC, I found out that when a file’s encoding is set to ANSI (which it is by default in Notepad++), characters that are not in that character set get saved as ?. So one of the guys on IRC suggested I switch to UTF-8 encoding, which should solve the problem (for anyone else doing this in Notepad++: change the encoding type, don’t use “convert to”). After doing this I ran the test again, and this time I got unexpected character(s) at the start of my file. What the f**k, how did those get there? Back to IRC. As soon as I mentioned the unexpected characters, I think three or so people all said something about a BOM. What is a BOM, you ask? BOM stands for byte order mark, and it is used to signal the byte order of a text file. As far as I understand it, UTF-8 doesn’t need one, and the processing.js tester (or the shell it was running in, I don’t really know) wasn’t expecting it. So after changing the file to encode in UTF-8 without a BOM, it finally ran. What did it output? A whole pile of NaNs (Not a Number). Results at least!
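To make the 63 and the BOM a bit more concrete, here is a minimal sketch in plain JavaScript. It is not the actual processing.js test: `stripBom` is a hypothetical helper I made up for illustration, and the sample strings just stand in for what the editor would have saved.

```javascript
// Hypothetical helper (not part of processing.js): drop a leading BOM if present.
function stripBom(text) {
  // Once decoded, a byte order mark is the single character U+FEFF.
  return text.charCodeAt(0) === 0xFEFF ? text.slice(1) : text;
}

// An ANSI save replaces a character it cannot represent with "?".
var lossy = "?";
console.log(lossy.charCodeAt(0)); // 63

// The value the test was expecting for U+27E86.
console.log(0x27E86); // 163462

// A file saved as "UTF-8 with BOM" starts with U+FEFF once decoded.
var withBom = "\uFEFFvar x = 1;";
console.log(stripBom(withBom)); // var x = 1;
```

So the test was comparing the real code point against the code of a literal question mark, and the "unexpected characters" were the BOM sitting in front of the first line of code.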

 

After going back to IRC and listening in on a conversation about UTF-8 and why my code wasn’t working (about 80% of which I didn’t understand), someone suggested I try passing in the hex values of the character I was trying to find the Unicode value for. I was skeptical that this would work but tried it anyway, and to my surprise, it worked. For the one character that I gave it, the test confirmed that it had the correct Unicode value. AWESOME. I then tried adding another hex value on the end as yury suggested, and it then failed. This was because the code I got from a Mozilla fix I found online had something to do with increasing the index that was passed into codePointAt depending on whether there were surrogate pairs or not. (I’m pretty sure after doing this that a surrogate pair is for when a character is too big to fit in 16 bits: it’s split into two 16-bit values, each with its own hex value, and by combining them you check both halves together and get the proper Unicode value.) Both Pomax and yury said this was unnecessary and that I should be able to do it without it. I removed this and voilà, everything worked as it should. I then tested this again later in some HTML page, and had a small blunder because I accidentally uncommented some code, but we won’t go into that, haha. After I commented the code out again, it worked in an HTML page too.
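Here is a rough sketch of the surrogate pair idea as a stand-alone function. This is an assumption about the general technique, not the code that actually went into processing.js: the function body and the test string are mine. The character 𧺆 (U+27E86) is written with hex escapes, so the source file stays plain ASCII and the editor’s encoding can’t mangle it.

```javascript
// Illustrative stand-alone version; the real processing.js code may differ.
// `index` counts 16-bit code units, the same way charCodeAt does.
function codePointAt(str, index) {
  var first = str.charCodeAt(index);
  // A high surrogate followed by another code unit may form a pair.
  if (first >= 0xD800 && first <= 0xDBFF && index + 1 < str.length) {
    var second = str.charCodeAt(index + 1);
    if (second >= 0xDC00 && second <= 0xDFFF) {
      // Combine the two halves into one code point above U+FFFF.
      return (first - 0xD800) * 0x400 + (second - 0xDC00) + 0x10000;
    }
  }
  // Ordinary character, lone surrogate, or NaN when index is out of range.
  return first;
}

var s = "\uD85F\uDE86A"; // U+27E86 as a surrogate pair, followed by "A"
console.log(codePointAt(s, 0)); // 163462
console.log(codePointAt(s, 2)); // 65
```

Note that the function never adjusts the index itself. The caller passes a plain code unit position, which is why the "A" after the pair sits at index 2 rather than 1.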

 

To be honest, after I did all of this and listened to everyone’s conversations in IRC, I learned quite a bit about Unicode and how it actually works. Last week when I got the code from that Mozilla webpage, I understood nearly none of it except some simple, obvious stuff. After hacking on the code a bit and listening to everyone, I slowly understood more and more about how it worked, what was happening, and what was causing problems. What a satisfying night.

 

Man I love this course.



One Comment
  1. minooz permalink

    Glad to see your work is going well!
    Thanks for your help on ‘make’. I actually had the .profile created, but that tip about MINGW32 really helped! I was using the regular Git Bash and I kept getting errors…
