How to load jar dependencies in IPython Notebook

You can simply pass it in the PYSPARK_SUBMIT_ARGS variable. For example:

export PACKAGES="com.databricks:spark-csv_2.11:1.3.0"
export PYSPARK_SUBMIT_ARGS="--packages ${PACKAGES} pyspark-shell"

The same property can also be set dynamically in your code, before the SparkContext / SparkSession and the corresponding JVM have been started:

packages = "com.databricks:spark-csv_2.11:1.3.0"
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--packages {0} pyspark-shell".format(packages)
)
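
Putting the dynamic variant together in a notebook cell, a minimal sketch (the app name and CSV path are placeholders, and this assumes a Spark 2.x-style SparkSession; on Spark 1.x you would use SQLContext the same way, and on Spark 2.x CSV support is built in as format "csv"):

import os
from pyspark.sql import SparkSession

packages = "com.databricks:spark-csv_2.11:1.3.0"
# Must run before the first SparkContext/SparkSession starts the JVM.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--packages {0} pyspark-shell".format(packages)
)

spark = SparkSession.builder.appName("csv-demo").getOrCreate()
# spark-csv registers the com.databricks.spark.csv data source.
df = (spark.read.format("com.databricks.spark.csv")
      .option("header", "true")
      .load("example.csv"))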

Is there a way to include commas in CSV columns without breaking the formatting?

Enclose the field in quotes, e.g.:

field1_value,field2_value,"field 3,value",field4,…

See Wikipedia. Updated: to encode a quote, double it: a double-quote symbol inside a field is encoded as "", so a field consisting of just one double quote becomes """". So if you see the following in e.g. Excel:

---------------------------------------
| regular_value |,,,"| ,"", |""" |"|
---------------------------------------

the …
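
A minimal sketch of the same rule using Python's csv module, which quotes fields containing commas and doubles embedded quotes automatically (the field values are illustrative):

import csv
import io

buf = io.StringIO()
# Fields with commas get quoted; embedded quotes are doubled, per the rule above.
csv.writer(buf).writerow(['field1_value', 'field 3,value', 'one " quote'])
print(buf.getvalue())  # field1_value,"field 3,value","one "" quote"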

Dealing with commas in a CSV file

There's actually a spec for the CSV format, RFC 4180, which says how to handle commas: fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double quotes. http://tools.ietf.org/html/rfc4180 So, to have the values foo and bar,baz, you do this:

foo,"bar,baz"

Another important requirement to consider (also from the spec): if double-quotes are used to enclose …
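
As a quick check, Python's csv module follows these RFC 4180 quoting rules; a minimal sketch (the sample string is the one from the answer):

import csv
import io

# The quoted comma stays inside the second field.
row = next(csv.reader(io.StringIO('foo,"bar,baz"')))
print(row)  # ['foo', 'bar,baz']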

What’s the most robust way to efficiently parse CSV using awk?

If your CSV cannot contain newlines then all you need is (with GNU awk for FPAT):

$ echo 'foo,"field,""with"",commas",bar' | awk -v FPAT='[^,]*|("([^"]|"")*")' '{for (i=1; i<=NF; i++) print i " <" $i ">"}'
1 <foo>
2 <"field,""with"",commas">
3 <bar>

or the equivalent using any awk:

$ echo 'foo,"field,""with"",commas",bar' | awk -v fpat='[^,]*|("([^"]|"")*")' -v OFS=',' '{ rec …
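
For comparison, here is a minimal sketch of the same FPAT idea in Python (not from the answer; the pattern, variable names, and loop are illustrative). The quoted alternative is tried first, so a quoted field wins over the greedy unquoted one:

import re

record = 'foo,"field,""with"",commas",bar'
# Each match is one field followed by a comma or the end of the record.
pat = re.compile(r'("(?:[^"]|"")*"|[^,]*)(,|$)')
fields, pos = [], 0
while True:
    m = pat.match(record, pos)
    fields.append(m.group(1))
    if m.group(2) == '':  # hit end of record
        break
    pos = m.end()
print(fields)  # ['foo', '"field,""with"",commas"', 'bar']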