This is a repository of integrated SNP and indel calls for NA12878, along with a new BED file that excludes additional uncertain regions.

The VCF file contains highly confident heterozygous and homozygous variant calls. Any bases that are in the BED file intervals and not in the VCF should be considered highly confident homozygous reference (for SNPs and short indels). The BED file (union12…bed.gz) excludes regions/variant locations that are uncertain due to low coverage, genotypes called in fewer than 3 datasets, locations with unresolved discordant genotypes, locations where most datasets have evidence of bias, variants inside possible deletions, long tandem repeats or homopolymers, and known segmental duplications.

I generally use GATK's CombineVariants with the BED file to assess performance of individual datasets, but other methods can be used, such as David Nix's VcfComparator at http://sourceforge.net/projects/useq/.

I required that at least 3 datasets confidently call a genotype, and that fewer than 1/6th of the datasets confidently call a different genotype, after excluding sites with evidence of bias. Filtered rows in the VCF files are uncertain, and the reason for the uncertainty is given in the FILTER field.

Currently, I'm integrating 1 Complete Genomics, 1 SOLiD WGS, 1 454 WGS, 1 Ion Exome, 2 Illumina exomes, and 6 Illumina WGS datasets. I integrated calls from GATK UnifiedGenotyper and HaplotypeCaller on all datasets, and cortex on one.

I have found that some differences between my calls and individual datasets are due to different representations of the same complex variants, so be careful about this when comparing. I'll be very interested to hear about any places where you think I've made incorrect calls, since I'm always looking to improve the callset.

I still request that you keep these callsets private because we are in the process of resubmitting the paper with SNP and indel calls.

Thanks,
Justin Zook
National Institute of Standards and Technology
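
Note: the genotype agreement rule described above (at least 3 datasets confidently calling a genotype, and fewer than 1/6th confidently calling a different one) can be illustrated with a small Python sketch. This is only an illustration and not the actual integration pipeline. The function name consensus_genotype, the use of None for "no confident call", and the genotype strings are all hypothetical. It also assumes that the 1/6th fraction is taken over all datasets considered at the site after bias exclusion, which the text above does not spell out.

```python
from collections import Counter
from fractions import Fraction
from typing import Optional, Sequence

MIN_SUPPORT = 3                  # from the text: at least 3 datasets must agree
MAX_DISCORDANT = Fraction(1, 6)  # from the text: fewer than 1/6th may disagree


def consensus_genotype(calls: Sequence[Optional[str]]) -> Optional[str]:
    """Return the agreed genotype, or None if the site is uncertain.

    `calls` holds one entry per dataset considered at the site (datasets
    with evidence of bias are assumed to have been removed already).
    None means that dataset made no confident call. Hypothetical helper.
    """
    confident = [c for c in calls if c is not None]
    if not confident:
        return None

    # Most frequently called genotype and the number of datasets calling it.
    genotype, support = Counter(confident).most_common(1)[0]
    discordant = len(confident) - support

    if support < MIN_SUPPORT:
        return None
    # Assumption: the fraction is computed over all entries in `calls`.
    if Fraction(discordant, len(calls)) >= MAX_DISCORDANT:
        return None
    return genotype


if __name__ == "__main__":
    # 12 datasets: 5 agree on a heterozygous call, 1 disagrees, 6 are silent.
    site = ["0/1"] * 5 + ["1/1"] + [None] * 6
    print(consensus_genotype(site))
```

Under this assumption, with the 12 datasets listed above, a site passes only if at most one dataset confidently calls a different genotype, since 2 of 12 is exactly 1/6th and therefore not fewer than 1/6th.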