The ..globalmap_k20tok54.tgz file contains 25 binary files representing uniqueness maps for each chromosome for all k=20 to 54

(a) The files are in uint8 (unsigned 8-bit integer) binary format
(b) Each file is a vector of unsigned 8-bit integers whose length equals the length of the chromosome. The elements of the vector are >= 0
(c) A value of 'x' at a position means that position is PERFECTLY unique in the genome for all k-mers of length >= x starting at that position on the + strand
(d) A value of 0 at a position means that position is not unique for any of the k-mer lengths (k=20 to 54)
(e) In order to obtain the uniqueness map for a particular k, simply perform the following operation on the vector
    (vector > 0) & (vector <= k)
(f) In order to obtain the uniqueness map for the - strand, you simply need to right-shift the vector by k-1 positions, i.e. if position 1 is UNIQUE on the + strand for k=3 then position 3 is UNIQUE on the - strand

================================
How to read the files in matlab
================================

%First gunzip and untar the globalmap_k20tok54.tgz file
%You will see one file for each chromosome
tmp_uMap = fopen('chr1.uint8.unique','r');
%'*uint8' reads unsigned 8-bit integers and keeps the result as class uint8
uMapdata = fread(tmp_uMap,'*uint8');
fclose(tmp_uMap);

You can similarly read the files in any other programming language as a vector of unsigned 8-bit integers. Convert to doubles if you like (although this is a waste of memory) or write it out as a text file if you prefer.

Some of the directories also contain optional k.tgz files (which are derived from the globalmap_* files). Each k.tgz file contains 25 binary files representing uniqueness maps for each chromosome

(a) The files are in uint8 binary format
(b) Each file is a vector whose length equals the length of the chromosome.
    The elements of the vector are either 0 or 1
(c) 1 indicates that the k-mer (5' to 3') of length k starting at that position on the + strand is PERFECTLY UNIQUE in the genome (both strands, all chromosomes)
(d) In order to obtain the uniqueness map for the - strand, you simply need to right-shift the vector by k-1 positions, i.e. if position 1 is UNIQUE on the + strand for k=3 then position 3 is UNIQUE on the - strand
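
For readers not using matlab, the operations described above can be sketched in Python using only the standard library. This is a minimal illustration, not part of the original distribution: the function names are made up for this example, and the - strand shift of k-1 positions is an assumption inferred from the "position 1 on the + strand, position 3 on the - strand" example, which matches a shift of 2 for a 3-mer.

```python
from pathlib import Path


def read_global_map(path):
    """Read a uint8 uniqueness map; each byte is one chromosome position."""
    return Path(path).read_bytes()


def unique_map_for_k(global_map, k):
    """Apply (vector > 0) & (vector <= k); returns bytes of 0s and 1s."""
    # 256-entry lookup table: byte value v maps to 1 if 0 < v <= k, else 0.
    table = bytes(1 if 0 < v <= k else 0 for v in range(256))
    return global_map.translate(table)


def minus_strand_map(plus_map, k):
    """Right-shift a + strand map by k-1 positions (assumed), zero-padded."""
    shift = k - 1
    if shift >= len(plus_map):
        return bytes(len(plus_map))
    return bytes(shift) + plus_map[:len(plus_map) - shift]


if __name__ == "__main__":
    # Toy vector standing in for read_global_map('chr1.uint8.unique').
    toy = bytes([0, 20, 25, 54, 0, 21])
    print(list(unique_map_for_k(toy, 24)))                  # [0, 1, 0, 0, 0, 1]
    print(list(minus_strand_map(bytes([1, 0, 0, 0, 0]), 3)))  # [0, 0, 1, 0, 0]
```

The lookup table with bytes.translate avoids a Python-level loop over every position, which matters for chromosome-length vectors. If NumPy is available, numpy.fromfile with dtype uint8 is a common alternative for reading the same files.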