Friday, July 11, 2014

A style rule for CamelCase

for JavaScript, in particular.

The context

"bactrian camels have two humps" is an example of a string.

In most programming languages, identifiers (labels or names for variables and values) are not arbitrary strings but come from a very limited set. They are also distinct entities with their own syntax, because they are written without quotation marks.

camel27 is a typical identifier.

As an immediate consequence, the lack of quotation marks as delimiters means that white space must be excluded from identifiers.

The phrase bactrian camels have two humps is not one, but (a sequence of) five identifiers.

One strategy to overcome this limitation is to insert a separator character: bactrian-camels-have-two-humps is an identifier in, say, Scheme, and bactrian_camels_have_two_humps is a legal identifier in JavaScript.

CamelCase is another strategy for converting a multi-word phrase into a single word: bactrianCamelsHaveTwoHumps is an identifier in JavaScript. And of course, the name JavaScript itself is another example.

The rule

I suggest the following rule for the generation of CamelCase identifiers:

Suppose the descriptor for the identifier in natural language has the form

w_1 w_2 w_3 .... w_n

i.e. n words, separated by white space. For example,

the president of USA

Try to avoid one-letter words (such as I or a) in the given phrase.

Now convert the phrase into a CamelCase identifier by doing:

  1. For each of the n words, turn all but the first letter into lowercase. So the example phrase is now

    the president of Usa
  2. Capitalize the first letter of the words w_2,..., w_n. So

    the President Of Usa
  3. Remove the white space between the words, so

    thePresidentOfUsa

The result is your identifier.
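
The three steps can be sketched as a small JavaScript function. This is only an illustration of the rule, not a definitive implementation: the name toCamelCase is hypothetical, and the sketch assumes that the words of the phrase are separated by white space only.

```javascript
// Hypothetical helper (not part of any standard library):
// applies the three steps of the rule to a phrase.
function toCamelCase(phrase) {
  // Split the phrase into the words w_1 ... w_n.
  const words = phrase.trim().split(/\s+/);
  return words
    .map((word, index) => {
      // Step 1: keep the first letter as it is, lowercase all the others.
      const first = word.charAt(0);
      const rest = word.slice(1).toLowerCase();
      // Step 2: capitalize the first letter of w_2, ..., w_n (but not of w_1).
      return (index === 0 ? first : first.toUpperCase()) + rest;
    })
    // Step 3: remove the white space between the words.
    .join("");
}

console.log(toCamelCase("the president of USA")); // thePresidentOfUsa
```

Note that the first letter of w_1 is left untouched, so it is up to the phrase whether the identifier starts with a capital letter.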

Some of the words (but not w_1, and no two consecutive words) in the initial phrase may be decimal numbers (e.g. 4321), and they remain unchanged. For example, the phrase Henry 5 of England turns into the identifier Henry5OfEngland. In that case, the first letter of the word following the decimal may remain unchanged, i.e. the of does not have to be changed to Of, and the resulting identifier is Henry5ofEngland.
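
The treatment of decimals can be sketched in the same way. The function name toCamelCaseWithDecimals and the flag keepCaseAfterDecimal are assumptions made for this illustration. The sketch does not check the restrictions on the phrase (no decimal as w_1, no two consecutive decimals); it simply trusts the caller.

```javascript
// Hypothetical helper: the rule, extended by the treatment of decimals.
// keepCaseAfterDecimal is an assumed flag that selects the optional variant
// in which the word following a decimal keeps its first letter.
function toCamelCaseWithDecimals(phrase, keepCaseAfterDecimal = false) {
  const words = phrase.trim().split(/\s+/);
  const isDecimal = (word) => /^[0-9]+$/.test(word);
  return words
    .map((word, index) => {
      // Decimals remain unchanged.
      if (isDecimal(word)) return word;
      const first = word.charAt(0);
      const rest = word.slice(1).toLowerCase();
      const followsDecimal = index > 0 && isDecimal(words[index - 1]);
      // w_1 always keeps its first letter; a word after a decimal may keep it.
      const keepFirst = index === 0 || (keepCaseAfterDecimal && followsDecimal);
      return (keepFirst ? first : first.toUpperCase()) + rest;
    })
    .join("");
}

console.log(toCamelCaseWithDecimals("Henry 5 of England"));       // Henry5OfEngland
console.log(toCamelCaseWithDecimals("Henry 5 of England", true)); // Henry5ofEngland
```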

Remarks and examples

Many identifiers in JavaScript are ugly in the sense that they contain too many consecutive capital letters, which makes them hard to read and memorize. Had the rule been applied, things would look a little nicer. Some examples:

  • JavaScript is all right, at least as a CamelCase identifier, but ECMAScript is bad. The phrase ECMA script converts to the identifier EcmaScript, which would be a much better spelling.

  • JavaScriptObjectNotation would be all right, but JSON is a bad identifier and should have been Json instead. JSON is a standard object in ECMAScript 5, and the identifier is not only badly but also wrongly chosen: usually only constants are written entirely in capitals.

  • The whole DOM is full of badly styled identifiers; the most striking example is XMLHttpRequest. It contains two acronyms in two different styles (XML and Http), what a mess. It should have been XmlHttpRequest.

Another particular challenge is the naming of Node.js-related code, especially since there already is a different Node object in the standard. If we treat the dot as a space between words, Node.js is the phrase Node js, and that converts into NodeJs according to our rule.
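
The examples from these remarks can be reproduced with a short sketch. The function name camelCaseFromPhrase is hypothetical, and the input phrases (such as XML http request) are my own assumed spellings of the descriptors. Unlike the plain rule, this variant also treats a dot as a separator between words, as suggested for Node.js.

```javascript
// Hypothetical helper: the rule, with dots treated like white space.
function camelCaseFromPhrase(phrase) {
  // Split on runs of white space and/or dots, dropping empty pieces.
  const words = phrase.split(/[\s.]+/).filter((word) => word.length > 0);
  return words
    .map((word, index) => {
      const first = word.charAt(0);
      const rest = word.slice(1).toLowerCase();
      return (index === 0 ? first : first.toUpperCase()) + rest;
    })
    .join("");
}

console.log(camelCaseFromPhrase("ECMA script"));      // EcmaScript
console.log(camelCaseFromPhrase("JSON"));             // Json
console.log(camelCaseFromPhrase("XML http request")); // XmlHttpRequest
console.log(camelCaseFromPhrase("Node.js"));          // NodeJs
```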
