Why is collections.Counter so slow?

It’s not that collections.Counter is slow; it’s actually quite fast. But it’s a general-purpose tool, and counting characters is just one of its many applications. str.count, on the other hand, only counts occurrences within strings, and it’s heavily optimized for that one task. That means str.count can work on the underlying C-char array …
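As a rough illustration of the comparison above, here is a minimal sketch that counts one character both ways and times each approach. The sample string and the repetition count are assumptions chosen for the example, and the timings will vary by machine and Python version.

```python
from collections import Counter
import timeit

# Hypothetical sample input; any string would do.
text = "abracadabra" * 1000

# Counter tallies every distinct character in the string at once.
counts = Counter(text)

# str.count scans the string for a single target substring.
a_count = text.count("a")

# Both approaches agree on the tally for any one character.
assert counts["a"] == a_count

# Time each approach over the same input (numbers are machine-dependent).
counter_time = timeit.timeit(lambda: Counter(text), number=100)
count_time = timeit.timeit(lambda: text.count("a"), number=100)
print(f"Counter: {counter_time:.4f}s  str.count: {count_time:.4f}s")
```

If only one or two characters need counting, str.count is usually the cheaper choice; Counter becomes attractive when tallies for many distinct items are needed from a single pass.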

Find the intersection of overlapping ranges in two tables using data.table function foverlaps

@Seth provided the fastest way to find the intersection of overlapping ranges, using the data.table foverlaps function. However, that solution did not account for the fact that the input BED files may themselves contain overlapping ranges that need to be reduced to single regions. @Martin Morgan solved that with his solution using the GenomicRanges package, …

Dictionary style replace multiple items

If you’re open to using packages, plyr is a very popular one, and its handy mapvalues() function will do just what you’re looking for: foo <- mapvalues(foo, from=c("AA", "AC", "AG"), to=c("0101", "0102", "0103")) Note that it works for data types of all kinds, not just strings.