File.listFiles() mangles unicode names with JDK 6 (Unicode Normalization issues)

Using Unicode, there is more than one valid way to represent the same letter. The characters you’re using in your Tricky Name are a “latin small letter i with circumflex” and a “latin small letter a with ring above”. You say “Note the %CC versus %C3 character representations”, but looking closer what you see are …
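To make the %CC versus %C3 difference concrete, here is a minimal Python sketch. It assumes the two letters named above and shows each in precomposed and decomposed form; the helper name utf8_percent is made up for illustration and is not part of any library.

```python
import unicodedata

# The same two visible letters in two valid Unicode representations.
composed = "\u00ee\u00e5"        # precomposed: i with circumflex, a with ring above
decomposed = "i\u0302a\u030a"    # base letters followed by combining marks

def utf8_percent(text):
    """Percent-encode every UTF-8 byte so the two forms can be compared."""
    return "".join(f"%{byte:02X}" for byte in text.encode("utf-8"))

print(utf8_percent(composed))    # %C3%AE%C3%A5
print(utf8_percent(decomposed))  # %69%CC%82%61%CC%8A

# Normalizing both sides to one form before comparing makes them equal.
same_after_nfc = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)
print(same_after_nfc)            # True
```

The %C3 bytes come from the precomposed code points and the %CC bytes from the combining marks, so comparing file names only after normalizing both to the same form (NFC here) avoids the mismatch.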

Minimum number of tables that exist after decomposing relation R into 1NF?

If all the candidate keys of a relation contain multivalued attributes: Introduce a surrogate attribute for at least one multivalued attribute. For each attribute you deem “composite” (having heterogeneous components, like a tuple): For each attribute component that can be missing: Add a relation with attributes of some multivalue-free candidate key and an attribute for …

When can I save JSON or XML data in an SQL Table

The main questions are “What are you going to do with this data?” and “How are you filtering/sorting/joining/manipulating this data?” JSON (like XML) is great for data exchange, small storage and generically defined structures, but it cannot participate in typical actions you run within your RDBMS. In most cases it will be better to transfer …

Programmatic Accent Reduction in JavaScript (aka text normalization or unaccenting)

/**
 * Creates a RegExp that matches the words in the search string.
 * Case and accent insensitive.
 */
function make_pattern(search_string) {
    // escape meta characters
    search_string = search_string.replace(/([|()[{.+*?^$\\])/g, "\\$1");
    // split into words
    var words = search_string.split(/\s+/);
    // sort by length
    var length_comp = function (a, b) { return b.length - a.length; };
    words.sort(length_comp);
    // replace …
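The excerpt above is cut off before the accent handling, so the following is not a continuation of it. It is a small Python sketch of a different, commonly used technique: decompose the text with Unicode normalization and drop the combining marks, then compare case-insensitively. The function names unaccent and matches_all_words are hypothetical.

```python
import unicodedata

def unaccent(text):
    """Strip accents by decomposing (NFD) and dropping combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def matches_all_words(search_string, candidate):
    """Case- and accent-insensitive check that every search word occurs."""
    words = unaccent(search_string).casefold().split()
    haystack = unaccent(candidate).casefold()
    return all(word in haystack for word in words)

print(unaccent("Crème Brûlée"))                           # Creme Brulee
print(matches_all_words("brulee creme", "Crème Brûlée"))  # True
```

Letters that have no canonical decomposition, such as ø or ł, pass through unchanged with this approach, so a lookup table would still be needed for those.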

How can I normalize a URL in Python

Have a look at this module: werkzeug.utils (now in werkzeug.urls). The function you are looking for is called “url_fix” and works like this:

    >>> from werkzeug.urls import url_fix
    >>> url_fix(u'http://de.wikipedia.org/wiki/Elf (Begriffsklärung)')
    'http://de.wikipedia.org/wiki/Elf%20%28Begriffskl%C3%A4rung%29'

It’s implemented in Werkzeug as follows:

    import urllib
    import urlparse

    def url_fix(s, charset="utf-8"):
        """Sometimes you get an URL by a user that just …
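The listing above is Python 2 and is cut off, and url_fix appears to have been removed from recent Werkzeug releases. As a rough standard-library equivalent for Python 3, the sketch below splits the URL and percent-encodes the path and query separately. The name normalize_url and the choice of safe characters are assumptions for illustration, not Werkzeug's code, and internationalized host names are not handled.

```python
from urllib.parse import quote, quote_plus, urlsplit, urlunsplit

def normalize_url(url):
    """Percent-encode unsafe characters in the path and query of a URL."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    # Keep "/" as a separator and "%" so existing escapes are not re-encoded.
    path = quote(path, safe="/%")
    # Keep the query delimiters; spaces in the query become "+".
    query = quote_plus(query, safe=":&=%")
    return urlunsplit((scheme, netloc, path, query, fragment))

print(normalize_url("http://de.wikipedia.org/wiki/Elf (Begriffsklärung)"))
# http://de.wikipedia.org/wiki/Elf%20%28Begriffskl%C3%A4rung%29
```

Treating "%" as safe means an already-encoded URL passes through unchanged, at the cost of not escaping a literal percent sign typed by a user.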

How to interpret MSE in Keras Regressor

“I apologise for sounding silly as I am starting out!” Do not; this is a subtle issue of great importance, which is usually (and regrettably) omitted in tutorials and introductory expositions. Unfortunately, it is not as simple as taking the square root of the inverse-transformed MSE, but it is not that complicated either; essentially what …
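Since the excerpt is truncated, the following is only a sketch of one way to get an error in the original units, assuming the target was standardized (mean subtracted, divided by the standard deviation) before training. It uses plain Python with made-up numbers rather than Keras or scikit-learn: the predictions are inverse-transformed first, and the error is computed afterwards.

```python
import math
import statistics

# Made-up targets in original units, for illustration only.
y_true = [200.0, 250.0, 310.0, 400.0]
mu = statistics.fmean(y_true)
sigma = statistics.pstdev(y_true)

def scale(value):
    return (value - mu) / sigma

def inverse_scale(value):
    return value * sigma + mu

def mse(actual, predicted):
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sum(squared_errors) / len(squared_errors)

# Pretend a model trained on scaled targets produced these predictions.
y_true_scaled = [scale(v) for v in y_true]
y_pred_scaled = [s + e for s, e in zip(y_true_scaled, [0.1, -0.2, 0.05, 0.15])]
mse_scaled = mse(y_true_scaled, y_pred_scaled)  # what the training log reports

# Inverse-transform the predictions first, then compute the error.
y_pred = [inverse_scale(v) for v in y_pred_scaled]
rmse_original = math.sqrt(mse(y_true, y_pred))

# Not valid: inverse-transforming the MSE itself adds the mean to an error term.
naive = math.sqrt(inverse_scale(mse_scaled))

print(rmse_original, naive)
```

Under this particular scaling the result equals sigma times the square root of the scaled MSE, which explains why pushing the MSE itself through the inverse transform gives a different, meaningless number.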