How to detect the right encoding for read.csv?

First of all, according to a more general question on StackOverflow, it is not possible to detect the encoding of a file with 100% certainty. I've struggled with this many times and arrived at a non-automatic solution. Use iconvlist to get all possible encodings: codepages <- setNames(iconvlist(), iconvlist()) Then read the data using each of them: x <- lapply(codepages, function(enc) try(read.table("encoding.asc", …
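
The try-each-encoding idea can be sketched as follows. This is a minimal illustration rather than the original answer's full code: the temporary file, the two-entry candidate list, and the use of tryCatch (which also treats warnings as failures, unlike try) are assumptions made for the example.

```r
# Stand-in for the real file: one header line and one Latin-1 encoded value.
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, open = "wb")
writeBin(c(charToRaw("name\ncaf"), as.raw(0xE9), charToRaw("\n")), con)
close(con)

# Hypothetical short list of candidates; iconvlist() returns many more.
candidates <- c("UTF-8", "latin1")
codepages <- setNames(candidates, candidates)

# Attempt the read with each encoding; NULL marks an attempt that
# raised an error or a warning (for example, invalid input).
results <- lapply(codepages, function(enc) {
  tryCatch(
    read.csv(tmp, fileEncoding = enc, stringsAsFactors = FALSE),
    error = function(e) NULL,
    warning = function(w) NULL
  )
})

# Number of rows read per encoding (NA where the attempt failed).
rows_read <- sapply(results, function(df) if (is.null(df)) NA_integer_ else nrow(df))
print(rows_read)
```

Which attempts succeed can depend on the platform and locale, so the surviving candidates still need to be inspected by eye.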

Why am I getting X. in my column names when reading a data frame?

read.csv() is a wrapper around the more general read.table() function. The latter has an argument, check.names, which is documented as: check.names: logical. If 'TRUE' then the names of the variables in the data frame are checked to ensure that they are syntactically valid variable names. If necessary they are adjusted (by 'make.names') so that they …
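
A small sketch of the effect, using made-up inline data (the column names are assumptions chosen to trigger the adjustment):

```r
# Inline CSV whose first two headers are not syntactically valid R names.
csv_text <- "1st col,my var,ok\n1,2,3\n"

# Default behaviour: check.names = TRUE, so names pass through make.names().
default_names <- names(read.csv(text = csv_text))

# With check.names = FALSE the headers are kept exactly as written.
raw_names <- names(read.csv(text = csv_text, check.names = FALSE))

print(default_names)  # "X1st.col" "my.var" "ok"
print(raw_names)      # "1st col" "my var" "ok"
```

Columns kept with their raw names have to be accessed with backticks or [[ ]], for example dat[["1st col"]].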

read.csv, header on first line, skip second line [duplicate]

This should do the trick: all_content = readLines("file.csv") skip_second = all_content[-2] dat = read.csv(textConnection(skip_second), header = TRUE, stringsAsFactors = FALSE) The first step, using readLines, reads the entire file into a character vector, where each element represents a line in the file. Next, you discard the second line using the fact that negative …
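
The same steps as a self-contained sketch. The temporary file and its contents (a header, a units row to discard, then data) are assumptions standing in for file.csv:

```r
# Hypothetical input: header on line 1, an unwanted units row on line 2.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,value", "units,cm", "1,10", "2,20"), tmp)

# Read every line as one element of a character vector.
all_content <- readLines(tmp)

# Negative indexing drops the second element (the second line).
skip_second <- all_content[-2]

# Parse the remaining lines as CSV through a text connection.
dat <- read.csv(textConnection(skip_second), header = TRUE,
                stringsAsFactors = FALSE)
print(dat)
```

Because the units row is removed before parsing, both columns are read as integers rather than character.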

Invalid multibyte string in read.csv

Encoding sets the declared encoding of a character string. It doesn't set the encoding of the file represented by the character string, which is what you want. This worked for me, after trying "UTF-8": x <- read.csv(url, header=FALSE, stringsAsFactors=FALSE, fileEncoding="latin1") And you may want to skip the first 16 lines, and read in the headers separately. …
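
To illustrate the distinction on a single string: the sketch below assumes a hand-built Latin-1 byte sequence and shows that Encoding() only changes how existing bytes are labelled, while iconv() actually converts them (re-encoding on the way in is what fileEncoding asks read.csv to do for a file).

```r
# "café" as Latin-1 bytes: the last byte 0xE9 is not valid UTF-8 on its own.
x <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xE9)))
print(validUTF8(x))  # FALSE

# Declaring the encoding labels the string; the bytes stay the same.
Encoding(x) <- "latin1"
print(charToRaw(x))  # 63 61 66 e9

# Converting re-encodes the bytes, here from Latin-1 to UTF-8.
y <- iconv(x, from = "latin1", to = "UTF-8")
print(charToRaw(y))  # 63 61 66 c3 a9
```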

Specify custom Date format for colClasses argument in read.table/read.csv

You can write your own function that accepts a string and converts it to a Date using the format you want, then use setAs to register it as an as method. Then you can use your function as part of the colClasses. Try: setAs("character","myDate", function(from) as.Date(from, format="%d/%m/%Y") ) tmp <- c("1, 15/08/2008", "2, 23/05/2010") …
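
One way the rest of this approach might look, as a hedged sketch rather than the original answer's exact code. The setClass("myDate") call, the sample data without spaces after the commas, and the variable names are assumptions added for the example; declaring the class first is a common way to avoid a "no definition for class" message from setAs.

```r
library(methods)  # setClass() and setAs() live here

# Declare a placeholder class so setAs() has a known target.
setClass("myDate")

# Coercion method: character -> Date using day/month/year.
setAs("character", "myDate",
      function(from) as.Date(from, format = "%d/%m/%Y"))

# Hypothetical sample data: an id and a date in dd/mm/yyyy form.
lines <- c("1,15/08/2008", "2,23/05/2010")

# read.csv applies the coercion because the column class is "myDate".
dat <- read.csv(text = lines, header = FALSE,
                colClasses = c("numeric", "myDate"))
str(dat)
```

The second column comes back as a regular Date, so the usual date arithmetic and formatting work on it.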