
Phosphorus and Lime

A Developer's Broadsheet

This blog has been deprecated. Please visit my new blog at klenwell.com/press.
sonnetmonkey.com
sonnetmonkey.com is finally a reality. We hit the typical unexpected difficulties in getting the server set up, but our host was great in assisting us.

Celebrate with the sonnet monkeys at their new artist's retreat at Bananaloaf.
Earthlink Downgraded to Sell
This response to my support request a couple days ago utterly ignores the details of my report and shows me nothing but polite contempt. Take that back. Assuming I use IE: utter contempt.


PHP: Create Your Own Richter Scale
This formula provides a way to take any set of somewhat arbitrary data and put it on a scale of your choosing. I used it specifically for frequency values in a wordlist I had, but it has a variety of applications.

The key to this formula is the change-of-base rule for logarithms. If base raised to the power exp equals result, then:

exp = log(result) / log(base)


Here's the code (values are configurable):

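// 100 pt logarithmic scale
$max_raw_value = 18349; // largest raw value in the data set
$scale_max = 100;       // top of the target scale
$raw_value = 42;        // example input, chosen arbitrarily; must be >= 1

// Pick the base so that pow($base, $scale_max) == $max_raw_value,
// then take the log of the raw value in that base.
$base = pow($max_raw_value, 1 / $scale_max);
$scaled_value = round( log($raw_value) / log($base) );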


Note that a $raw_value of 1 produces a scaled value of 0, which wasn't valid for my use. So I converted raw values of 1 to 1.1, which resulted in a scaled value of 1.
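The formula above can be wrapped in a small reusable function. This is only a sketch: the function name log_scale, the exception, and the optional $scale_min floor are my own assumptions for illustration, not part of the original snippet. The floor is one possible alternative to bumping raw values of 1 up to 1.1.

```php
<?php
// Hypothetical helper: the function name, the exception, and the $scale_min
// floor are assumptions for illustration, not part of the original snippet.
function log_scale(float $raw_value, float $max_raw_value, int $scale_max = 100, int $scale_min = 0): int
{
    if ($raw_value < 1 || $max_raw_value <= 1) {
        throw new InvalidArgumentException('raw value must be >= 1 and max must be > 1');
    }

    // Mathematically the same as log($raw) / log(pow($max, 1 / $scale_max)),
    // because log(pow($max, 1 / $n)) equals log($max) / $n.
    $scaled = (int) round($scale_max * log($raw_value) / log($max_raw_value));

    // Optional floor, e.g. 1 if a score of 0 is not meaningful for the data.
    return max($scale_min, $scaled);
}

echo log_scale(18349, 18349), "\n";     // 100 (the maximum maps to the top of the scale)
echo log_scale(1, 18349), "\n";         // 0
echo log_scale(1, 18349, 100, 1), "\n"; // 1 (floor applied)
```

For integer frequency counts, a floor of 1 should behave the same as the 1.1 workaround, since 1 is the only integer that scales to 0 with these settings.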

I might also note that this led me to my first practical application of logarithms. (I just wish my calculus teacher in high school had offered a practical example like this.)
Proposal for a User ID API
As a website developer and administrator, I've explored the OpenID site, read the recent articles about Microsoft's tentative endorsement of it, studied the comments about the announcement on Slashdot, and I confess it still doesn't fully make sense to me. What would make sense is a User ID API along the lines of what Google is now offering with their GData API suite. It wouldn't necessarily have to be Google, but it would probably have to be an established email or login broker like Google, Yahoo!, MSN, perhaps even Wikipedia or Myspace. Better still, it might be a non-profit or government organization. (If we're going to have an SSN or national ID card, why not a national email address? The states of CA or MA could lead the way if necessary, though I admit I'd have more faith in a Google or Yahoo.) More than one API would be fine, as long as they were simple to deploy, as protective as possible of users' privacy, reliable and convenient, and attractive to smaller site admins.

How would it work? Here's a rough scenario based on how the Google API functions. Imagine you have a blog where you'd like to limit comments from new, unidentified visitors but don't want to put them through the hassle of registering with your site or solving a CAPTCHA every time they visit. You have a comment form with an email field. When an unknown visitor submits a comment, the info gets relayed to the appropriate API server, which could then return relevant authenticating data. This data might include:

boolean indicating if active cookie is currently set for visitor
unix timestamp indicating when this account was created/registered
unix timestamp indicating last time this account was active
user's display name
url for hot-linkable user profile image (opt-in)
other opt-in profile information

The API would enable the site admin to define restrictions and hurdles based on the data returned. For instance, if the site admin finds she is getting a lot of spam from Gmail accounts created within the last three days, she could create a script to require CAPTCHA validation for accounts less than three days old, and so on.
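To make the three-day rule concrete, here is a rough sketch of what such a script might look like. No such API exists, so everything here is hypothetical: the function name requires_captcha, the array keys (has_active_cookie, created_at, last_active_at, display_name), and the fixed timestamp are all stand-ins for the kind of data described above.

```php
<?php
// Hypothetical response shape. No such API exists; the keys below are
// illustrative stand-ins for the data items listed in the post.
function requires_captcha(array $profile, int $now, int $min_age_days = 3): bool
{
    // Unknown or malformed accounts always get the extra hurdle.
    if (!isset($profile['created_at']) || !is_int($profile['created_at'])) {
        return true;
    }

    // Require a CAPTCHA when the account is younger than the threshold.
    $age_seconds = $now - $profile['created_at'];
    return $age_seconds < $min_age_days * 86400;
}

$now = 1172000000; // fixed "current" unix time so the example is repeatable

$new_account = [
    'has_active_cookie' => true,
    'created_at'        => $now - 1 * 86400,  // registered one day ago
    'last_active_at'    => $now - 3600,
    'display_name'      => 'new visitor',
];
$old_account = [
    'has_active_cookie' => true,
    'created_at'        => $now - 30 * 86400, // registered thirty days ago
    'last_active_at'    => $now - 3600,
    'display_name'      => 'regular reader',
];

var_dump(requires_captcha($new_account, $now)); // bool(true)
var_dump(requires_captcha($old_account, $now)); // bool(false)
```

Passing the current time in as a parameter, rather than calling time() inside the function, is just a convenience that keeps the rule easy to test.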

The incentive for account authenticating services like Google or Yahoo would be obvious.

The drawbacks? The first one that springs to mind would be the leviathan one: do we want to centralize this service and trust public ID authentication to a large corporation (Auth Super Site) like Google or Yahoo? Wouldn't they be in a perfect position to abuse this power down the line? Perhaps. The service would necessarily require aggregation of data with a large, central authority. But this has already been done with the major webmail and social networking sites, which is why they make perfect candidates. And the data does not have to be that personal. Basically, just a user id and something indicating that the user has been around for a while and is not a spammer or hacker.

Also, the service could allow site admins to send data back to the Auth Super Site on abusive behavior. For example, code:900 (possible spam) user input:love your site, hey check out mine.
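As a sketch of what such a report might look like on the wire, the example above could be packaged as JSON. Only code 900 and its "possible spam" meaning come from this post. The function name build_abuse_report, the field names, and the sample user id are invented for illustration.

```php
<?php
// Hypothetical report format: only code 900 ("possible spam") comes from the
// post. The field names and the user_id value are made up for illustration.
function build_abuse_report(string $user_id, int $code, string $label, string $user_input): string
{
    return json_encode([
        'user_id'    => $user_id,
        'code'       => $code,
        'label'      => $label,
        'user_input' => $user_input,
    ]);
}

$report = build_abuse_report(
    'visitor@example.com',
    900,
    'possible spam',
    'love your site, hey check out mine'
);
echo $report, "\n";
```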

Such a service would seem to fit hand-in-glove with what Google is doing with their ClientLogin API and Analytics and Website Tool services. (Perhaps they have something like this in the works?) I'm a big proponent of this kind of service. Is anyone else hoping for something like this? Is there some falsifier or are there other drawbacks that I'm missing?

I'm going to start circulating this idea in the hope of getting a discussion going.
Earthlink Wifi Update
My Earthlink wireless account has been virtually unusable the last 2 weeks. As this is more than 6 months after it was deployed, it seems to me a bad sign that things are getting worse, rather than better.

And, of course, now the support line just leaves me on hold, leading me to suspect a general degradation in service.

Sent the following message using their website "email" form.


Greedy Phonotactic Coordination
This was the puzzle I was trying to solve: I have a long list of English words (over 100,000 words). It includes a phonetic representation of each word. I want to break the phonetic representation of the word up into syllables. Specifically, I want an algorithm that will do this for me.

I'm not the first person to confront this problem:

summary of responses to syllabification algorithm query

Surprisingly, however, I couldn't find a working algorithm to solve this problem. Well, there was this: ale-trale man node 83. But hell if I knew what it meant.

Greedy Phonotactic Coordination is my solution. I'm still testing it. But there were only about 50 words that it couldn't handle in the 100,000+ list, and the majority of those were either outright foreign appropriations or mistranscriptions.

A couple of examples:

encourages -> E.n.k.3r.I.J.I.z -> En.k3r.I.JIz
mackler -> m.{.k.l.3r -> m{.kl3r
petunia -> p.V.t.u.n.i.V -> pV.tu.ni.V

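How it works: basically, it finds a vowel phoneme, finds the next, and then looks at every coda/onset break between the two vowels looking for a valid combination. It favors onsets at the outset: it first tries giving every consonant between the vowels to the onset of the next syllable, then shifts consonants one at a time into the coda of the current syllable. The last break it tries therefore leaves an empty onset for the next syllable and all the consonants in the coda of the current syllable.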

I created functions for valid onsets and codas with the information on this Wikipedia page.
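Here is a minimal sketch of the idea in PHP. It is not the actual implementation. The vowel list contains only the vowel symbols that appear in the examples above, and the onset and coda cluster tables are tiny hand-picked samples rather than the full rules from the Wikipedia page. The names syllabify, is_valid_onset and is_valid_coda are placeholders, and the fallback (empty onset, everything in the coda) is my own assumption about what to do when no break validates.

```php
<?php
// Assumption: only the vowel symbols seen in the examples in this post.
const VOWELS = ['E', '{', 'V', 'I', 'i', 'u', '3r'];
// Assumption: small illustrative samples, not complete phonotactic tables.
const ONSET_CLUSTERS = ['k.l', 'p.l', 'b.l', 'g.l', 'f.l', 'p.r', 'b.r', 't.r', 'k.r', 's.t'];
const CODA_CLUSTERS = ['n.t', 'n.d', 's.t', 'l.d', 'm.p', 'k.s'];

// In this sketch an empty or single-consonant onset is always accepted.
function is_valid_onset(array $consonants): bool
{
    return count($consonants) <= 1
        || in_array(implode('.', $consonants), ONSET_CLUSTERS, true);
}

// Same simplification for codas.
function is_valid_coda(array $consonants): bool
{
    return count($consonants) <= 1
        || in_array(implode('.', $consonants), CODA_CLUSTERS, true);
}

// Takes dot-separated phonemes, returns dot-separated syllables.
function syllabify(string $transcription): string
{
    $phonemes = explode('.', $transcription);
    $vowel_positions = [];
    foreach ($phonemes as $i => $phoneme) {
        if (in_array($phoneme, VOWELS, true)) {
            $vowel_positions[] = $i;
        }
    }

    $syllables = [];
    $start = 0; // index of the first phoneme of the current syllable
    for ($v = 0; $v + 1 < count($vowel_positions); $v++) {
        $vowel = $vowel_positions[$v];
        $next_vowel = $vowel_positions[$v + 1];
        $cluster = array_slice($phonemes, $vowel + 1, $next_vowel - $vowel - 1);

        // Greedy for onsets: try the longest onset first, then move one
        // consonant at a time into the coda until both sides are valid.
        $split = count($cluster); // fallback: empty onset, all consonants in coda
        for ($k = 0; $k <= count($cluster); $k++) {
            $coda = array_slice($cluster, 0, $k);
            $onset = array_slice($cluster, $k);
            if (is_valid_coda($coda) && is_valid_onset($onset)) {
                $split = $k;
                break;
            }
        }

        $end = $vowel + $split; // last phoneme of the current syllable
        $syllables[] = implode('', array_slice($phonemes, $start, $end - $start + 1));
        $start = $end + 1;
    }
    // Whatever follows the last break belongs to the final syllable.
    $syllables[] = implode('', array_slice($phonemes, $start));

    return implode('.', $syllables);
}

echo syllabify('E.n.k.3r.I.J.I.z'), "\n"; // En.k3r.I.JIz
echo syllabify('m.{.k.l.3r'), "\n";       // m{.kl3r
echo syllabify('p.V.t.u.n.i.V'), "\n";    // pV.tu.ni.V
```

With these sample tables, "n.k" is rejected as an onset, so the break in "encourages" moves one consonant over, while "k.l" is accepted and stays whole in "mackler".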

Not perfect. But pretty good.