Parsing scientific notation sensibly?

Google on “scientific notation regexp” shows a number of matches, including this one (don’t use it!!!!) which uses

*** warning: questionable ***
/[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?/

which includes cases such as -.5e7 and +00000e33 (both of which you may not want to allow).

Instead, I would highly recommend you use the syntax on Doug Crockford’s JSON website which explicitly documents what constitutes a number in JSON. Here’s the corresponding syntax diagram taken from that page:

alt text
(source: json.org)

If you look at line 456 of his json2.js script (safe conversion to/from JSON in javascript), you’ll see this portion of a regexp:

/-?\d+(?:\.\d*)?(?:[eE][+\-]?\d+)?/

which, ironically, doesn’t match his syntax diagram…. (looks like I should file a bug) I believe a regexp that does implement that syntax diagram is this one:

/-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+\-]?\d+)?/

and if you want to allow an initial + as well, you get:

/[+\-]?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+\-]?\d+)?/

Add capturing parentheses to your liking.

I would also highly recommend you flesh out a bunch of test cases, to ensure you include those possibilities you want to include (or not include), such as:

allowed:
+3
3.2e23
-4.70e+9
-.2E-4
-7.6603

not allowed:
+0003   (leading zeros)
37.e88  (dot before the e)

Good luck!

Leave a Comment

tech