File.listFiles() mangles Unicode names with JDK 6 (Unicode Normalization issues)

Using Unicode, there is more than one valid way to represent the same letter. The characters you’re using in your Tricky Name are a “latin small letter i with circumflex” and a “latin small letter a with ring above”. You say “Note the %CC versus %C3 character representations”, but looking closer what you see are …
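The two representations can be reproduced with Python's standard unicodedata module. This is a minimal sketch, assuming the names differ only in normalization form (precomposed NFC versus decomposed NFD); the sample string is a hypothetical stand-in for the two letters named above, not the original file name.

```python
import unicodedata
from urllib.parse import quote

# Hypothetical sample: the two letters named above, not the original file name.
sample = "\u00ee\u00e5"  # i with circumflex, a with ring above

nfc = unicodedata.normalize("NFC", sample)  # precomposed: one code point per letter
nfd = unicodedata.normalize("NFD", sample)  # decomposed: base letter + combining mark


def percent_encoded(text):
    """Percent-encode the UTF-8 bytes of text."""
    return quote(text.encode("utf-8"))


print(len(nfc), percent_encoded(nfc))  # precomposed bytes start with %C3
print(len(nfd), percent_encoded(nfd))  # combining marks start with %CC
print(nfc == nfd)                      # the strings differ code point by code point
print(unicodedata.normalize("NFC", nfd) == nfc)  # but agree after normalization
```

Comparing names only after bringing both sides to the same normalization form is one way to sidestep the mismatch.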

Minimum number of tables that exist after decomposing relation R into 1NF?

If all the candidate keys of a relation contain multivalued attributes: Introduce a surrogate attribute for at least one multivalued attribute. For each attribute you deem “composite” (having heterogeneous components, like a tuple): For each attribute component that can be missing: Add a relation with attributes of some multivalue-free candidate key and an attribute for …
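The excerpt is cut off before the procedure is complete, so the following is only a rough sketch of the general idea of moving a multivalued attribute into its own relation, keyed by a multivalue-free candidate key. The relation, the attribute names (emp_id, name, phones) and the helper split_multivalued are all hypothetical and do not come from the original answer.

```python
# Hypothetical relation: 'phones' is multivalued, 'emp_id' is a multivalue-free key.
employees = [
    {"emp_id": 1, "name": "Ada", "phones": ["555-0100", "555-0101"]},
    {"emp_id": 2, "name": "Bob", "phones": []},
]


def split_multivalued(rows, key, attr):
    """Split rows into a base relation without attr and a (key, attr) relation."""
    # Base relation: every attribute except the multivalued one.
    base = [{k: v for k, v in row.items() if k != attr} for row in rows]
    # New relation: one tuple per (key, single value) pair.
    extra = [{key: row[key], attr: value} for row in rows for value in row[attr]]
    return base, extra


base, phone_rel = split_multivalued(employees, "emp_id", "phones")
print(base)
print(phone_rel)
```

In this sketch one relation with one multivalued attribute becomes two relations; a row with no values simply contributes no tuples to the second one.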

Programmatic Accent Reduction in JavaScript (aka text normalization or unaccenting)

/**
 * Creates a RegExp that matches the words in the search string.
 * Case and accent insensitive.
 */
function make_pattern(search_string) {
    // escape meta characters
    search_string = search_string.replace(/([|()[{.+*?^$\\])/g, "\\$1");

    // split into words
    var words = search_string.split(/\s+/);

    // sort by length
    var length_comp = function (a, b) {
        return b.length - a.length;
    };
    words.sort(length_comp);

    // replace …
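For comparison, here is a small Python sketch of accent-insensitive matching. It uses a different technique from the RegExp built above: instead of expanding the pattern, it normalizes both strings by decomposing them (NFD) and dropping combining marks. The function names strip_accents and matches_all_words are hypothetical.

```python
import unicodedata


def strip_accents(text):
    """Decompose text (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches_all_words(search_string, candidate):
    """True if every word of search_string occurs in candidate,
    ignoring case and accents."""
    haystack = strip_accents(candidate).casefold()
    words = strip_accents(search_string).casefold().split()
    return all(word in haystack for word in words)


print(strip_accents("Crème Brûlée"))
print(matches_all_words("creme brulee", "Crème Brûlée"))
print(matches_all_words("tarte", "Crème Brûlée"))
```

Letters that have no canonical decomposition (for example "ø" or "ß") pass through unchanged, so this approach may need an explicit mapping table for them.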

How can I normalize a URL in Python

Have a look at this module: werkzeug.utils (now in werkzeug.urls). The function you are looking for is called “url_fix” and works like this:

>>> from werkzeug.urls import url_fix
>>> url_fix(u'http://de.wikipedia.org/wiki/Elf (Begriffsklärung)')
'http://de.wikipedia.org/wiki/Elf%20%28Begriffskl%C3%A4rung%29'

It’s implemented in Werkzeug as follows:

import urllib
import urlparse

def url_fix(s, charset="utf-8"):
    """Sometimes you get an URL by a user that just …
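The implementation quoted above imports the Python 2 modules urllib and urlparse and is cut off in this excerpt. As a rough Python 3 sketch of the same idea using only the standard urllib.parse module: split the URL, percent-encode the path and query, and reassemble it. The name url_fix_sketch and the choice of "safe" characters are assumptions made for illustration, not Werkzeug's code.

```python
from urllib.parse import quote, quote_plus, urlsplit, urlunsplit


def url_fix_sketch(url):
    """Percent-encode unsafe characters in the path and query of url.

    Hypothetical helper: existing %XX escapes are left alone because
    '%' is treated as safe.
    """
    scheme, netloc, path, query, fragment = urlsplit(url)
    path = quote(path, safe="/%")             # keep slashes and existing escapes
    query = quote_plus(query, safe=":&%=")    # spaces in the query become '+'
    return urlunsplit((scheme, netloc, path, query, fragment))


print(url_fix_sketch("http://de.wikipedia.org/wiki/Elf (Begriffsklärung)"))
print(url_fix_sketch("http://example.com/a b?q=x y"))
```

For the Wikipedia URL from the excerpt, this sketch produces the same encoded string as the output shown above.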

How to interpret MSE in Keras Regressor

“I apologise for sounding silly as I am starting out!” Do not; this is a subtle issue of great importance, which is usually (and regrettably) omitted in tutorials and introductory expositions. Unfortunately, it is not as simple as taking the square root of the inverse-transformed MSE, but it is not that complicated either; essentially what …
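The excerpt stops before the recipe itself, so what follows is only one common approach and not necessarily what the answer goes on to say: map the predictions back to the original units first, then compute the error there. The sketch uses plain Python with a hand-rolled standardization; the data, the pretend predictions and the helper names are all hypothetical.

```python
import math
from statistics import mean, pstdev

# Hypothetical targets in original units.
y_true = [100.0, 150.0, 200.0, 250.0]
mu, sigma = mean(y_true), pstdev(y_true)


def scale(values):
    """Standardize values with the training mean and standard deviation."""
    return [(v - mu) / sigma for v in values]


def inverse_scale(values):
    """Map standardized values back to the original units."""
    return [v * sigma + mu for v in values]


def mse(a, b):
    """Mean squared error between two equally long sequences."""
    return mean((x - y) ** 2 for x, y in zip(a, b))


y_true_scaled = scale(y_true)
# Pretend model output: every prediction is off by 0.1 in scaled units.
y_pred_scaled = [v + 0.1 for v in y_true_scaled]

mse_scaled = mse(y_true_scaled, y_pred_scaled)   # error in scaled units
y_pred = inverse_scale(y_pred_scaled)            # back to original units
rmse_original = math.sqrt(mse(y_true, y_pred))   # error in original units

print(mse_scaled, rmse_original)
```

Applying inverse_scale to mse_scaled directly would add mu to an error value, which is not an error magnitude; transforming the predictions and then measuring the error avoids that.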