TO: The GAW Mailing List
FROM: Laura Almasy, Jean MacCluer
DATE: 25 January 2008
RE: Accessing data for GAW16

GAW16 will be held 17-20 September 2008 in St. Louis. We hope to distribute three data sets:

(1) Case-control data, including phenotypes and genotypes for 550,000 SNPs, from the NARAC rheumatoid arthritis studies that provided data for GAW15.

(2) Cohort and family data from the Framingham Heart Study, including cardiovascular disease risk factors and genotypes for 550,000 SNPs.

(3) A simulated data set that mimics the Framingham data. It will include the real Framingham pedigrees and genotypes and multiple replicates of simulated phenotypes.

The RA data set will be distributed through Southwest Foundation, as has been done with past GAW data sets. However, access to the second and third data sets (the Framingham data and the simulated data) must be obtained through the National Center for Biotechnology Information (NCBI), and the process may be difficult for some participants. We believe that the data access process for the second and third data sets will take most individuals at least 6-8 weeks, possibly more, depending on whether your institution has dealt with the NIH database system in the past and how your human subjects review board operates.

Below we describe the steps that need to be completed in order to access the Framingham data and/or the simulated data. These steps are necessary in order to fulfill NIH requirements for data sharing.

The GAW16 Framingham and simulated data will be made available to you through dbGaP (database of Genotype and Phenotype, http://www.ncbi.nlm.nih.gov/sites/entrez?db=gap). The process isn't linear; you will need to complete Steps A-F outlined below before you can access the data on dbGaP (Step G).

A. OBTAIN A DUNS NUMBER:
To access dbGaP, you first need a DUNS number. If your institution already has a DUNS number, you can use that number and will not have to obtain one.
(If your institution has any NIH grants, it almost certainly already has a DUNS number.) Obtaining a DUNS number is relatively straightforward and can be done via the Web or by telephone. By telephone, call 1-866-705-5711; the DUNS number will be assigned at the end of the call. The person requesting the number has to make the call, not an assistant. Alternatively, go to http://fedgov.dnb.com/webform/displayHomePage.do and click the link for requesting the DUNS number using the Web. The number is e-mailed within 24-48 hours.

B. IDENTIFY A SIGNING OFFICIAL:
The Signing Official is the person who has the authority to legally bind your institution in grants administration matters, e.g. the administrator whose signature appears on grant applications from your institution. The individual fulfilling this role may have any number of titles in your institution.

C. REGISTER IN eRA COMMONS:
Only one individual from each group needs to register in eRA Commons. This individual registers in eRA Commons through their institution's Signing Official (also known as the Authorized Organizational Representative). For investigators in the U.S., this process is relatively straightforward. For non-U.S. investigators, we suggest the following:

If the person is associated with an institution, the institution first needs to register with the Commons at https://commons.era.nih.gov/commons/ (follow the link on the right, "Grantee Organization Registration"). Once the institution is registered, the Authorized Official or Signing Official for the institution can create individual accounts.

If the person is not affiliated with an institution, they should create an "institutional" account as above, designate themselves as Authorized or Signing Official (AO or SO) and, as AO, add PI status to their account.
Any of these actions will generate an email directing the individual to log on to the web site (the link will be in the email) and validate the account. The person must validate the account before it becomes active.

D. DOCUMENT HUMAN SUBJECTS TRAINING FOR ALL INDIVIDUALS IN YOUR GROUP:
Each person in your group who will have any access to the data, including students and research assistants, must complete human subjects training. If you have already had human subjects training through your own institution's programs, you just need a letter or certificate documenting this. If your institution does not have its own human subjects training program, you can satisfy this requirement through the following web site: http://cme.cancer.gov/clinicaltrials/learning/humanparticipant-protections.asp

E. GET APPROVAL FOR THE ANALYSES THAT YOU PROPOSE FROM YOUR INSTITUTIONAL REVIEW BOARD (IRB) OR AN EQUIVALENT ALTERNATIVE:
You will need certification that the human subjects review board at your institution has approved your proposed analyses of the Framingham or simulated data. The procedures for this depend on the rules at your institution. We are aware that analysis of anonymous data is not considered human subjects research in many institutions and countries. However, the dbGaP rules require this approval and access to the data cannot be granted without it, so you will need to negotiate with your ethics board to obtain a letter from them saying that they have reviewed and approved your proposal.

F. COMPLETE A DATA USE CERTIFICATION (DUC) AND SUBMIT TO A DATA ACCESS COMMITTEE:
Go to the dbGaP homepage (http://www.ncbi.nlm.nih.gov/sites/entrez?db=gap). Click on 'Controlled Access' (a link in the blue margin on the left). This will take you to a help page with a big 'Log in' in the upper right hand corner. After you have read the help page, click 'Log in'. This will take you to an NIH login page.
At this point you should use your NIH eRA Commons username and password to log in. If you do this correctly, you will go to your own 'My Research Projects' page, where there is a link to 'create new project'. When you click on 'create new project' you will be sent to some interactive pages that are similar in design to those used for submitting grants and grant reports. Filling in those pages will create a pdf (the DUC) which will be the basis for your application. Please note that at the end of the interactive routine, the application will ask for the location of a pdf file that has all supporting materials, as described below (including, for example, the IRB approval, certification of completion of human subjects training, etc.).

The elements necessary for the online application to dbGaP are as follows:

. Identify a Principal Investigator (the person responsible for meeting the terms of the agreement). This will be the individual with the eRA Commons account.

. Prepare a data analysis plan. It should be fairly detailed, but the online form has a 1600 character limit. If your plan is longer, make a pdf of it and add it to the pdf package described below. Remember that the Data Access Committee is simply interested in verifying that it is a legitimate use of the data, not in judging its scientific merit. Soon we will provide a brief description of the GAW16 Framingham data and simulated data, as well as a list of presentation group topics that can serve as a guide as you plan your analyses.

. Prepare a brief lay abstract of your project, describing it in non-technical terms.

. Prepare a list of investigators. All must be named; standard NIH biosketches will be needed for each.

. Specify the institutional signing official (SO).

. Find the data use agreement in the pdf that is created and print the data access agreement (DAA). Get it signed by everyone listed on the proposal and convert it to a pdf.

. Choose the datasets you are requesting (there will be an obvious GAW16 package).

Additional Support Documents necessary (all in one BIG pdf file):

. NIH-style biosketches of all investigators on the proposal, including students.

. A data access agreement (DAA), printed from the DUC above and signed by ALL individuals who might even touch the data (including the investigators, and any analysts or database people or students who might assist with management or analysis tasks).

. A letter from the overall Principal Investigator naming all individuals and certifying that they have undergone human subjects research training.

. A letter showing institutional IRB approval for the proposed project (an expedited review is OK, but classification as non-human research is NOT OK).

. A computer/data security plan (addressing HIPAA compliance; security measures and passwords; type and version of operating system; frequency of installation of security patches; authentication protocols; names of responsible IT personnel; whether data will exist on laptops; if so, encryption and security for the laptops).

At the end of the interactive routine begun to create the DUC, the application will ask for the name of the pdf file containing all these support documents to upload. This will create a final complete package that goes to the institutional signing official whom you've identified. However, the SO is not automatically notified, so it is helpful to ask your SO to log on to their eRA Commons account and approve the application so that this gets done in a timely manner. Once the package is endorsed by the institution, it is forwarded to the Data Access Committee and an automated response from dbGaP tells you that the application has been picked up for review. If access is denied the first time around, you may have to repeat these steps to accommodate revisions in response to the access committee's comments.

G. ACCESS FRAMINGHAM DATA AND/OR SIMULATED DATA ON dbGaP:
You will receive an email informing you when your application has been approved and access has been granted. You will then need to create a request to dbGaP to download the GAW16 phenotype set and your choice of genotypes (more on genotypes below).

Go to the dbGaP homepage (http://www.ncbi.nlm.nih.gov/sites/entrez?db=gap). Click on 'Controlled Access' (a link in the blue margin on the left) and log in as you have done before. This will take you to the 'My Research Projects' page. The title of your project will be near the top of the page. Now click on the 'My Requests' tab near the top of the page. This will take you to a list of your 'Approved requests'. You should have permission for the 'General_research_use' participant set. Next, click on the project number on the left hand side of the page. Be patient; the system can be slow. On the new page you will find a section called 'Create new archive for download'. In this section you will find the GAW16 phenotype file, the SHARe pedigree file, and a large number of genotype and phenotype files. Choose the files you want with the check boxes to the left. Then go to the bottom of the page, type in a password of your choice, and select the type of archive file that you want. Then click 'send request'. You will receive an email when your request is ready for download. You should then log back into dbGaP and go to the 'downloads' tab, where you will be able to, finally, get the data.

As we develop more experience with this process, we will provide additional information. Feel free to contact us. We are in the process of developing a wiki which will, we hope, make it easier for GAW16 participants to communicate with us and with each other. This should allow participants to post examples and notes concerning their successes and their problems. Once we have begun analyses, the wiki also will provide a mechanism for continuous exchange of information.

We all need to be aware that this approach to data sharing for large cohorts with extensive genotyping probably will become the "industry standard". It may seem intractable now; however, it is likely that we will all have to go down this path sooner or later. The process is lengthier and more arduous than what we're used to, but we are told by Ingrid Borecki, who has successfully completed the process, that it is achievable. Because the process isn't linear, it is more difficult to write about than it is to do.

For information about the Framingham Heart Study, you also may find it helpful to access the Framingham website, www.framinghamheartstudy.org. More information about Framingham can be found at http://www.nhlbi.nih.gov/about/framingham/.
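
For groups that would like to track their progress on Step F, the following is a minimal Python sketch of the supporting-document checklist and the analysis plan length check. It is an illustration only and is not part of dbGaP or any NIH system: the class name, method names, and document labels are hypothetical, and it assumes the 1600 character limit counts every character, including spaces, which the dbGaP form may or may not do.

```python
from dataclasses import dataclass, field

# Limit on the online analysis-plan field, as stated under Step F.
# Assumption: every character, including whitespace, counts toward it.
PLAN_CHAR_LIMIT = 1600

# Supporting documents that go into the single pdf (Step F).
# These labels are hypothetical shorthand, not dbGaP terminology.
REQUIRED_DOCUMENTS = (
    "biosketches",
    "signed_daa",
    "training_letter",
    "irb_approval_letter",
    "security_plan",
)


@dataclass
class Application:
    """Tracks one group's progress toward a complete Step F package."""

    analysis_plan: str
    documents_ready: set = field(default_factory=set)

    def plan_needs_pdf(self) -> bool:
        # A plan over the limit does not block the application; it is
        # saved as a pdf and added to the supporting-document package.
        return len(self.analysis_plan) > PLAN_CHAR_LIMIT

    def missing_documents(self) -> list:
        # Returned in checklist order so the output is stable.
        return [d for d in REQUIRED_DOCUMENTS if d not in self.documents_ready]

    def ready_to_submit(self) -> bool:
        return not self.missing_documents()


if __name__ == "__main__":
    app = Application(
        analysis_plan="We will test association between SNPs and lipid traits.",
        documents_ready={"biosketches", "signed_daa"},
    )
    print("Plan must be attached as pdf:", app.plan_needs_pdf())
    print("Still missing:", app.missing_documents())
    print("Ready to submit:", app.ready_to_submit())
```

The checklist only records what has been gathered; the signatures, IRB letter, and Signing Official approval still have to be obtained as described in Steps B-F.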