id_geno_checksum: GWAS overlap test without sharing genotypes

tutorial for a successful first run:

 - change to a directory with a valid plink binary dataset (bed/bim/fam),
   for more details: http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#bed

 - download the perl script (id_geno_checksum.v1.0.2), either by clicking on the link:
      http://www.broadinstitute.org/~sripke/share_links/checksums_download
   or directly from the command line:
      wget http://www.broadinstitute.org/~sripke/share_links/checksums_download/id_geno_checksum.v1.0.2

 - make the perl script executable, e.g.:
      chmod u+x id_geno_checksum.v1.0.2

 - download and unpack the helper files, e.g. directly from the command line:
      wget http://www.broadinstitute.org/~sripke/share_links/checksums_download/CKS_batches_50_0114b.tar.gz
      tar -xvf CKS_batches_50_0114b.tar.gz

 - make sure the program plink is found and executable in the path (you can still define another location via options), e.g. try
      plink --help
   if necessary, you'll find pre-compiled versions for direct download:
      http://pngu.mgh.harvard.edu/~purcell/plink/download.shtml

 - run the program (assuming the three files BFILE.bed, BFILE.bim and BFILE.fam are in your working directory) with
      ./id_geno_checksum.v1.0.2 --bfile BFILE

 - if the run succeeds, you will be notified with a big success banner and asked to share a specific file with your collaborator

 - for more details and options please use
      ./id_geno_checksum.v1.0.2 --help

some further considerations:

 - the script creates non-identifiable checksums out of GWAS SNPs

 - it uses ten batches with 50 SNPs each, all of them found on all current and older GWAS platforms I have encountered so far (from Affy500 and Illumina I317 up to the HumanCore).
   with a genotype representation of 0,1,2 there are 3^50 = 7.1e23 possible different configurations within each batch

 - it uses the checksum program standardly available on UNIX platforms, with a 32-bit CRC algorithm and 2^32 = 4.3e9 possible distinct outcomes

 - this means that for each distinct checksum there exist on average ~1.7e14 different possible genotype configurations (7.1e23 / 4.3e9);
   this algorithm therefore does not provide a single key to the genotype configuration

** WARNING **: even though these checksums do not contain genotype information, they are powerful identifiers.

$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
$$$$$$$ DO NOT POST CHECKSUMS on public websites or similar $$$$$$$$$$$$$$$
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
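
appendix: checking the counting argument

The counting argument in the considerations above (3 genotype states, 50 SNPs per batch, a 32-bit CRC) can be checked with a few lines of Python. This is only a sketch for illustration and is not part of id_geno_checksum. The function toy_batch_checksum is hypothetical: it assumes genotypes are written as a string of digits and uses Python's zlib.crc32, which is a different CRC variant from the one in the standard UNIX checksum program, so its values will not match the script's output.

```python
import zlib

SNPS_PER_BATCH = 50    # from the text: 50 SNPs per batch
GENOTYPE_STATES = 3    # from the text: genotypes coded as 0, 1, 2
CRC_BITS = 32          # from the text: 32-bit CRC algorithm

# Number of possible genotype configurations within one batch (3^50).
configurations = GENOTYPE_STATES ** SNPS_PER_BATCH

# Number of distinct values a 32-bit checksum can take (2^32).
checksums = 2 ** CRC_BITS

# Average number of genotype configurations that share one checksum.
per_checksum = configurations / checksums


def toy_batch_checksum(genotypes):
    """Hypothetical illustration only: CRC-32 of genotypes as a digit string.

    ASSUMPTION: neither this encoding nor zlib.crc32 is taken from the
    original script; it only shows that any batch maps to a 32-bit value.
    """
    text = "".join(str(g) for g in genotypes)
    return zlib.crc32(text.encode("ascii"))


if __name__ == "__main__":
    print(f"configurations per batch : {configurations:.2e}")
    print(f"distinct 32-bit checksums: {checksums:.2e}")
    print(f"configurations/checksum  : {per_checksum:.2e}")

    # A made-up batch of 50 genotypes, not real data.
    example_batch = [0, 1, 2] * 16 + [0, 1]
    print(f"toy checksum             : {toy_batch_checksum(example_batch)}")
```

The three printed counts are roughly 7.2e23, 4.3e9 and 1.7e14. Because so many configurations share each checksum, a checksum cannot be inverted to a unique genotype configuration, yet matching checksums across batches still identify the same individual with high confidence, which is why the warning above applies.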