DRAFT AGENDA

1000 Genomes Project Meeting

May 5-6, 2008

Bush Hall, Cold Spring Harbor Laboratory

Monday, May 5, 2008

9:00 – 12:00  Pre-meeting of the DCC and Data Flow Group (in Bush Hall)

Are the processes in place to transfer data, process them uniformly, distribute them, and compute on them? What needs to be worked on?

10:00 – 12:00  Pre-meeting of the Analysis Group (in Bush Hall)

How should the analysis discussions on Tuesday and Wednesday be organized? Groups with analyses of project data should prepare 3-5 slides summarizing what they have done.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Project Meeting

12:00 – 1:00  Lunch (in the dining hall next to Bush Hall)

1:00 – 1:40  1000 Genomes Project goals and meeting aims    Richard Durbin

(10 min talk, 30 min disc)

The main aims of the meeting are to define the specific goals and what is needed to reach those goals, and to assess the data and whether the data to be produced in the pilots will address the questions that need to be answered to design the full study. This discussion will aim to reach consensus on the project goals.

Sample issues    Moderator Leena Peltonen-Palotie

1:40 – 2:05  Choosing additional population samples: scientific considerations    Aravinda Chakravarti

(10 min talk, 15 min disc)

What should be considered for choosing additional samples from populations already sampled or from new populations? These include the project goals, power, utility, etc. How many samples are needed from a population?

2:05 – 2:30  Analysis of genotype and sequence data from the extended set of HapMap samples: how do they inform the samples needed for the 1000 Genomes Project?    David Altshuler and Richard Gibbs

(10 min talk, 15 min disc)

Which populations are similar enough that they can be combined for analyses? How do the results inform the goals of the 1000 Genomes Project and its design?

2:30 – 2:45  Choosing additional population samples: ethical considerations    Bartha Knoppers

(5 min talk, 10 min disc)

What ethical considerations relate to choosing samples beyond the extended set of HapMap samples?

2:45 – 3:00  Potential additional population samples    Leena Peltonen-Palotie

(5 min talk, 10 min disc)

What additional samples from the same populations, related populations, or additional populations may be collected?

3:00 – 3:15  Discussion

3:15 – 3:30  Break

Structural variation issues    Moderator Matt Hurles

3:30 – 3:35  Structural variation data: sequence    Evan Eichler

(5 min talk)

What sort of data are needed for detecting structural variants? How can existing structural variation sequence data be used to validate the sequence data? How should the Structural Variation project samples be integrated with the 1000 Genomes samples?

3:35 – 3:40  Structural variation data: array    Charles Lee

(5 min talk)

How can existing array data be used to validate the sequence data? How should the CNV project samples (including Sebat work) be integrated with the 1000 Genomes samples?

3:40 – 4:00  Discussion on these issues

===

Production issues    Moderators Elaine Mardis and Stacey Gabriel

4:00 – 4:15  Gene region pilot project    Richard Gibbs

(5 min talk, 10 min disc)

How is the technology for capture working? What strategy should be used for this pilot? Which samples should be used in this pilot?

4:15 – 6:00  Reports from all sequencing centers (each 8 min talk, 4 min disc)    Durbin, Wang, Mardis, Gibbs, Gabriel, De La Vega, Bentley, Egholm

What progress has each center made in implementing the new technologies, and what has been learned?

===

OR (simultaneous session)

===

4:00 – 6:00  Samples/ELSI Group discussion of populations and consent processes

===

6:00 – 7:00  Dinner (in the dining hall next to Bush Hall)

Project metrics and data quality requirements    Moderator Andy Clark

7:00 – 7:30  What project metrics are needed?    Gil McVean

(10 min talk, 20 min disc)

To accomplish the project goals, what data are needed, of what quality, and how should progress be measured?

7:30 – 8:00  How can data quality be measured?    David Jaffe

(10 min talk, 20 min disc)

What are measures of base quality and mapping? How are the centers ensuring accuracy and measuring it?

8:00 – 8:30  What experimental validation will be needed?    Stacey Gabriel

(10 min talk, 20 min disc)

How should variants be validated, and false positive and false negative rates estimated? How many of each class of variant should be validated?

Tuesday, May 6, 2008

8:00 – 9:00  Breakfast (in the dining hall next to Bush Hall)

Analysis of sequence data sets    Moderator Peter Donnelly

9:00 – 9:20  Analysis of the X chromosome data from Sanger and whole-genome data from Illumina and AB    Richard Durbin

(10 min talk, 10 min disc)

What do these data tell us about how to do the 1000 Genomes Project?

9:20 – 9:35  Analysis of Solexa data from 140 kb on chromosome 1    Aravinda Chakravarti

(7 min talk, 8 min disc)

What do these data tell us about how to do the 1000 Genomes Project?

9:35 – 9:50  Analysis of genome-wide data for a Chinese sample    Jun Wang

(7 min talk, 8 min disc)

What do these data tell us about how to do the 1000 Genomes Project?

9:50 – 10:30  Analysis of project sequence data    Gonçalo Abecasis

(15 min talk, 25 min disc)

What are we learning? What issues are raised?

10:30 – 10:45  Break

Analysis issues    Moderators Gonçalo Abecasis and Gil McVean

10:45 – 11:00  Base quality and read quality    Gabor Marth

(5 min talk, 10 min disc)

How should data quality be measured?

11:00 – 11:15  Mapping reads    Richard Durbin

(5 min talk, 10 min disc)

How well are reads mapped, including in different types of genomic regions?

11:15 – 11:30  Analysis challenges, including phasing and imputation    Gil McVean

(5 min talk, 10 min disc)

How would the design of the pilot and full studies affect the ability to phase and impute variants of different frequencies?

11:30 – 12:15  Design of the main project    David Altshuler

(10 min talk, 35 min disc)

Are we collecting the right data to decide how to design the full project? What conclusions can we start to come to about project design?

12:15 – 1:00  Lunch (in the dining hall next to Bush Hall)

1:00 – 1:15  Report from the Samples/ELSI Group on population sample issues    Aravinda Chakravarti

(5 min talk, 10 min disc)

Dealing with the data    Moderator Debbie Nickerson

1:15 – 1:30  The Short-Read Archive    Stephen Sherry

(5 min talk, 10 min disc)

How well is the SRA working to accept and distribute the data? What needs to be worked on?

1:30 – 1:45  Data flow and computation    Paul Flicek

(5 min talk, 10 min disc)

How well are the data being transferred, and what needs to be worked on? What computing resources need to be set up or obtained?

1:45 – 2:30  Project metrics to track production and project timelines    Elaine Mardis

(10 min talk, 35 min disc)

What metrics are needed beyond what we are using? When should sets of data be done?

2:30 – 2:45  Break

2:45 – 3:15  Next steps    Francis Collins

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

3:30 – 4:30  Steering Committee meeting to discuss issues raised and make any needed decisions

Wednesday, May 7, 2008

8:30 – 12:00  Simulated data meeting (Beckman Lab, Plimpton conference room)

1:00 – 5:00  Analysis Group meeting (Marks Lab, Gerry conference room)