JavaScript and multiline regular expressions

Just something I would like to share, because it was causing me grief :P.

While working on parsing a Java-like class into a JavaScript class for processing.js, I needed to grab all member variables that are marked as public, using a regular expression, and return them as an object literal. In other words, I wanted to turn this:


public String thingOne;
public String thingTwo;

into this:


{
    thingOne: null,
    thingTwo: null
}

This is what I ended up doing:


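middle = middle.replace(/(?:(private|public)\s+)?(var[\s|\S]*?;)/g, function(all, access, variable) {
    if (access === "private") {
        // Private members stay in the class body, minus the access keyword.
        return variable;
    } else {
        // Public (or unmarked) members: split "var a, b;" into one declaration each.
        variable = variable.replace(/,\s*/g, ";\nvar ");
        // Turn each declaration into a "name: value," entry, defaulting to null.
        publicVars += variable.replace(/(?:var\s+)?(\w+)\s*=?\s*([^;,]*)(?:;)?/g, function(all, name, value) {
            if (value) {
                return "\n" + name + ": " + value + ",";
            } else {
                return "\n" + name + ": null,";
            }
        });
        // Remove the declaration from the class body.
        return "";
    }
});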
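
To see what that replace call actually produces, here is a rough, self-contained sketch. The wrapper function extractPublicVars, its return shape and the sample input are all made up for illustration (the real code works on middle and publicVars directly), and the sample assumes the Java types have already been turned into var.

```javascript
// Hypothetical wrapper: the name, the return shape and the sample input
// are assumptions for illustration, not part of processing.js.
function extractPublicVars(middle) {
    var publicVars = "";
    // [\s\S] matches any character at all, newlines included.
    var remaining = middle.replace(/(?:(private|public)\s+)?(var[\s\S]*?;)/g, function(all, access, variable) {
        if (access === "private") {
            // Keep private declarations in the body, without the keyword.
            return variable;
        }
        // Split "var a, b;" into one declaration per variable.
        variable = variable.replace(/,\s*/g, ";\nvar ");
        // Collect each declaration as a "name: value," entry.
        publicVars += variable.replace(/(?:var\s+)?(\w+)\s*=?\s*([^;,]*)(?:;)?/g, function(all, name, value) {
            return "\n" + name + ": " + (value || "null") + ",";
        });
        // Drop the public declaration from the body.
        return "";
    });
    return { publicVars: publicVars, remaining: remaining };
}

// A declaration that spans two lines, followed by a private member.
var result = extractPublicVars("public var thingOne,\n  thingTwo;\nprivate var secret;");
console.log(result.publicVars); // entries for thingOne and thingTwo
console.log(result.remaining);  // what is left of the class body
```

The collected string still has a trailing comma and no surrounding braces, so it would need a little cleanup before it is a valid object literal.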

Don’t worry too much about that code; it is just a bit of background on what led me to the actual problem: multiline regular expressions in JavaScript.

From my favourite JavaScript regex resource: “Multiline Input ‘m’ This flag makes the beginning of input (^) and end of input ($) codes also catch beginning and end of line respectively.” It does NOT do what you might expect (it does not let a pattern match across line breaks); it simply allows input like this:


String one;
String two;

to match on both lines with /^String/gm. Without m, String two; would not match, because it does not start at the beginning of the input but after a \n. Another way to look at it: m changes ^ and $ so that they also match just after and just before a line terminator such as \n or \r (\r being carriage return). They still do not consume the line terminator itself. A JavaScript regular expression is already multiline, in a way, since the whole input is really one long string with newline characters in it.
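
A quick way to check this for yourself; the variable names here are just mine for the sake of the example:

```javascript
// Two lines of input, joined by a newline character.
var source = "String one;\nString two;";

// Without m, ^ only matches at the very start of the input.
var withoutM = source.match(/^String/g);

// With m, ^ also matches right after each line terminator.
var withM = source.match(/^String/gm);

console.log(withoutM.length); // 1
console.log(withM.length);    // 2

// m does not change what the dot matches: it still stops at the newline.
var dotMatch = source.match(/^String.*;/m);
console.log(dotMatch[0]); // "String one;"
```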

What I wanted to do was match any instance of “var (bunch of stuff) ;” spanning multiple lines, to grab examples like this:


public String thingOne,
thingTwo;

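What I was using was something like /var.*?;/g, which grabs anything between var and ;, including whitespace and newlines, right? Nope, apparently not: in JavaScript the dot matches any character except line terminators, so the match can never run past the end of a line.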

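There was a trick I found to solve this: /var[\s\S]*?;/. The character class combines \s, which is any whitespace including newlines, with \S, which is anything that is NOT whitespace. Together they match the TRUE anything, and it is a neat little trick. (I originally wrote it as [\s|\S], as in the code above, but inside a character class | is just a literal pipe rather than OR, so it is redundant.)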
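
Here is a small sketch comparing the two patterns on a declaration that spans two lines. The sample string is made up and assumes the type has already been converted to var. The last line uses the s (dotAll) flag, which was added to the language in ES2018 and makes the dot match line terminators too, so it only works in newer engines.

```javascript
// A declaration spanning two lines (sample input, assumed for illustration).
var source = "var thingOne,\n    thingTwo;";

// The dot cannot cross the newline, so no ; is ever reached.
var dotOnly = source.match(/var.*?;/g);

// [\s\S] matches any character, newlines included.
var anyChar = source.match(/var[\s\S]*?;/g);

// ES2018 and later: the s flag lets the dot match newlines as well.
var dotAll = source.match(/var.*?;/gs);

console.log(dotOnly);               // null
console.log(anyChar[0] === source); // true
console.log(dotAll[0] === source);  // true
```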
