Genotyping
----------

.. contents::
   :depth: 2
   :local:
   :backlinks: top

Defining Locus types and attributes
***********************************

.. container:: divright

   A Locus type is a category of loci or genetic markers that share common descriptors. It is one of the descriptors of a :ref:`Locus`. For instance, a locus type can be SNP, SSR, RFLP, etc. Locus types can be managed in the menu *Admin > Genotyping > Locus type*. A locus type is defined by the fields described in the table below. Fields in bold are required, while those in italics are optional when submitting the :ref:`form`.

.. container:: divleft

   .. figure:: /_img/admin/LocusMenu.png

      *Locus administration menu*

Locus types can be edited from the :ref:`table` below the form.

+--------------------+------------------------------------------------------------+
| Field name         | Description                                                |
+====================+============================================================+
| **Name**           | The name of this locus type                                |
|                    | (for instance: SNP, SSR, Genomic sequence)                 |
+--------------------+------------------------------------------------------------+
| *Description*      | A text describing this locus type                          |
+--------------------+------------------------------------------------------------+

Every :ref:`Locus` is described by a set of fixed attributes (see below) that are common to all locus types. Thaliadb also makes it possible to add specific attributes to each locus type. Attributes can be managed in the 'Attributes' menu of the left navigation toolbar. An attribute is defined by the fields described in the table below. Fields in bold are required, while those in italics are optional when submitting the :ref:`form`. Attributes can be edited from the :ref:`table` below the form.

+--------------------+------------------------------------------------------------+
| Field name         | Description                                                |
+====================+============================================================+
| **Name**           | The name of this attribute                                 |
+--------------------+------------------------------------------------------------+
| **Type**           | The nature of the information, chosen among: Short text,   |
|                    | Long text, Number, URL. The type determines how the        |
|                    | widget looks in the Locus form.                            |
+--------------------+------------------------------------------------------------+
| *Description*      | A short description of this attribute                      |
+--------------------+------------------------------------------------------------+

Once your attributes are defined, you can add them as additional descriptors of a locus type. Adding attributes to a locus type extends the fields available when you create a new locus of this type. To do so, go back to the Locus type table and edit one of the locus types. A new 'Attributes' tab appears in the edit form. In this interface, you can add attributes to the current locus type (1). By default, a new attribute is added at the last position. You can change the order by drag and drop; once you are satisfied with it, click the 'Save order' button (2). The order you define is used in the Locus form and when you export locus data from Thaliadb.

.. figure:: /_img/admin/Attributes-management.png
   :align: center

   *Locus type attributes are managed in the same way as Accession type attributes*

Locus positions
***************

A locus can be positioned on a :ref:`genome version`. The position is filled in when the :ref:`locus` is recorded in Thaliadb. For a set of loci, new positions can be added for a newer genome version.
This can be done in the menu *Admin > Genotyping > Locus Position* by selecting a genome version and submitting a file with the following fields:

+--------------------+------------------------------------------------------------+
| Field name         | Description                                                |
+====================+============================================================+
| *Name*             | The name of this locus                                     |
+--------------------+------------------------------------------------------------+
| *Chromosome*       | The chromosome on which this locus is positioned, if the   |
|                    | position is known                                          |
+--------------------+------------------------------------------------------------+
| *Position*         | The position of this locus, if it is known                 |
+--------------------+------------------------------------------------------------+

Locus management
****************

A Locus is a fixed position on a chromosome. This :ref:`position` may or may not be known, and it can differ from one :ref:`genome version` to another. Loci can be managed in the menu *Admin > Genotyping > Locus type*. In the left navigation toolbar, a menu entry is available for each :ref:`locus type` that has been defined in the tool. A locus is defined at least by the fields described in the table below (additional fields can be defined as explained in *Defining Locus types and attributes*). Fields in bold are required, while those in italics are optional when submitting the :ref:`form`. Loci can be edited from the :ref:`table` below the form.

+--------------------+------------------------------------------------------------+
| Field name         | Description                                                |
+====================+============================================================+
| **Name**           | The name of this locus                                     |
+--------------------+------------------------------------------------------------+
| *Comments*         | A description of this locus                                |
+--------------------+------------------------------------------------------------+
| *Genome version*   | The genome version on which this locus is positioned, if   |
|                    | the position is known                                      |
+--------------------+------------------------------------------------------------+
| *Chromosome*       | The chromosome on which this locus is positioned, if the   |
|                    | position is known                                          |
+--------------------+------------------------------------------------------------+
| *Position*         | The position of this locus, if it is known                 |
+--------------------+------------------------------------------------------------+

Referential
***********

A Referential is a reading of a genotyping :ref:`experiment` in a defined context. Referentials can be managed in the menu *Admin > Genotyping > Referential*. A referential is defined by the fields described in the table below. Fields in bold are required, while those in italics are optional when submitting the :ref:`form`. Referentials can be edited from the :ref:`table` below the form.

+--------------------+------------------------------------------------------------+
| Field name         | Description                                                |
+====================+============================================================+
| **Name**           | The name of this referential                               |
+--------------------+------------------------------------------------------------+
| **Person**         | The reference :ref:`person` for this referential           |
+--------------------+------------------------------------------------------------+
| **Date**           | The creation date of this referential                      |
+--------------------+------------------------------------------------------------+
| *Comments*         | A description of this referential                          |
+--------------------+------------------------------------------------------------+
| *Linked files*     | A set of related :ref:`documents` for this referential     |
+--------------------+------------------------------------------------------------+
| *Projects*         | :ref:`Projects` in which the referential is shared         |
+--------------------+------------------------------------------------------------+

Experiment
**********

An Experiment is the molecular characterization of a panel of :ref:`Accessions` for a set of :ref:`markers`. Experiments can be managed in the menu *Admin > Genotyping > Experiment*. An experiment is defined by the fields described in the table below. Fields in bold are required, while those in italics are optional when submitting the :ref:`form`. Experiments can be edited from the :ref:`table` below the form.

+--------------------+------------------------------------------------------------+
| Field name         | Description                                                |
+====================+============================================================+
| **Name**           | The name of this experiment                                |
+--------------------+------------------------------------------------------------+
| **Institution**    | The :ref:`institution` that led this experiment            |
+--------------------+------------------------------------------------------------+
| **Person**         | The :ref:`person` who led this experiment                  |
+--------------------+------------------------------------------------------------+
| **Date**           | The creation date of this experiment                       |
+--------------------+------------------------------------------------------------+
| *Comments*         | A description of this experiment                           |
+--------------------+------------------------------------------------------------+
| *Projects*         | :ref:`Projects` in which the experiment is shared          |
+--------------------+------------------------------------------------------------+

Genome versions
***************

A Genome version is a new annotation or a new sequence of the genome of a given taxon. It results in new :ref:`positions` for the related :ref:`locus`. Genome versions can be managed in the menu *Admin > Genotyping > Genome version*. A genome version is defined by the fields described in the table below. Fields in bold are required, while those in italics are optional when submitting the :ref:`form`. Genome versions can be edited from the :ref:`table` below the form.

+--------------------+------------------------------------------------------------+
| Field name         | Description                                                |
+====================+============================================================+
| **Name**           | The name of this genome version                            |
+--------------------+------------------------------------------------------------+
| *Description*      | A description of this genome version                       |
+--------------------+------------------------------------------------------------+

Sample
******

A Sample is a DNA sample of a :ref:`seed lot`. It is used to characterize genetic material in :ref:`experiments`. Samples can be managed in the menu *Admin > Genotyping > Sample*. A sample is defined by the fields described in the table below. Fields in bold are required, while those in italics are optional when submitting the :ref:`form`. Samples can be edited from the :ref:`table` below the form.

+--------------------+------------------------------------------------------------+
| Field name         | Description                                                |
+====================+============================================================+
| **Name**           | The name of this sample                                    |
+--------------------+------------------------------------------------------------+
| **Seed lot**       | The :ref:`seed lot` from which this sample originates      |
+--------------------+------------------------------------------------------------+
| *Description*      | A description of this sample                               |
+--------------------+------------------------------------------------------------+
| *Is bulk*          | Indicates that this is a bulk sample                       |
+--------------------+------------------------------------------------------------+
| *Is obsolete*      | Indicates that this sample is obsolete                     |
+--------------------+------------------------------------------------------------+
| *Code labo*        | The code used in the lab for internal identification       |
+--------------------+------------------------------------------------------------+
| *Tube name*        | The name used to identify the sample at the lab            |
+--------------------+------------------------------------------------------------+
| *Projects*         | :ref:`Projects` in which the sample is shared              |
+--------------------+------------------------------------------------------------+

GenotypingID
************

A GenotypingID is the characterization of a :ref:`sample` in one :ref:`experiment`. GenotypingIDs can be managed in the menu *Admin > Genotyping > GenotypingID*. A genotypingID is defined by the fields described in the table below. Fields in bold are required, while those in italics are optional when submitting the :ref:`form`. GenotypingIDs can be edited from the :ref:`table` below the form.

+--------------------+------------------------------------------------------------+
| Field name         | Description                                                |
+====================+============================================================+
| **Name**           | The name of this genotypingID                              |
+--------------------+------------------------------------------------------------+
| **Sample**         | The origin :ref:`sample` of this genotypingID              |
+--------------------+------------------------------------------------------------+
| **Experiment**     | The :ref:`experiment` of this genotypingID                 |
+--------------------+------------------------------------------------------------+
| *Sentrixbarcode a* | ??                                                         |
+--------------------+------------------------------------------------------------+
| *Sentrixposition a*| ??                                                         |
+--------------------+------------------------------------------------------------+
| *Funding*          | The funding source for this genotypingID                   |
+--------------------+------------------------------------------------------------+
| *Projects*         | :ref:`Projects` in which the genotypingID is shared        |
+--------------------+------------------------------------------------------------+

Insert Genotyping data
**********************

Genotyping data can be uploaded to Thaliadb as files. The form to submit data files can be found in the menu *Admin > Genotyping > insert data*. Several formats are available to submit genotyping data: Flat format, IUPAC matrix, 2 letters matrix, Slash matrix, and VCF. They are detailed in the sections below.

.. figure:: /_img/admin/Upload-Geno-data.png
   :align: center

   *Form to upload Genotyping data*

Flat format
===========

The flat format can be used to describe the genotyping of heterogeneous material such as populations, landraces, etc.
A frequency can be assigned to each allele identified at the locus. The file must contain the following headers:

+--------------------+------------------------------------------------------------+
| Field name         | Description                                                |
+====================+============================================================+
| **GenotypingID**   | The name of the :ref:`genotypingID`                        |
+--------------------+------------------------------------------------------------+
| **Locus**          | The name of the :ref:`locus`                               |
+--------------------+------------------------------------------------------------+
| **Allele_value**   | An allele identified for this genotypingID at the locus    |
+--------------------+------------------------------------------------------------+
| *Allelic_frequency*| A value between 0 and 1 giving the frequency of the allele |
+--------------------+------------------------------------------------------------+

The form to submit genotyping data with flat files provides two options:

* missing value: a string denoting an unidentified allele (any string of any length, except the letters 'A', 'C', 'T' and 'G').
* frequencies test: this option checks, for each genotypingID at each locus, that the allele frequencies sum to 1 +/- 0.001.

**Important**: the missing value takes priority over 'Allele_value' in the flat file, so take care that your missing value does not match any allele value referenced in your file.

IUPAC code matrix
=================

The IUPAC code matrix format can be used to describe genotyping data in which each genotype is encoded with a single letter. `IUPAC code `_ is an established standard. The file is a matrix of individuals versus loci, using a standard separator (comma, semicolon, tabulation, etc.).
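
As an illustration of the one-letter encoding, the following Python sketch expands an IUPAC code into the two alleles it stands for. It is not taken from Thaliadb: the function name ``expand_iupac``, the ``missing`` string and the sample row are assumptions made for this example, and only the codes for one or two bases are handled.

```python
# Standard IUPAC nucleotide codes for one or two bases.
IUPAC_TO_ALLELES = {
    "A": ("A", "A"), "C": ("C", "C"), "G": ("G", "G"), "T": ("T", "T"),
    "R": ("A", "G"), "Y": ("C", "T"), "S": ("C", "G"),
    "W": ("A", "T"), "K": ("G", "T"), "M": ("A", "C"),
}


def expand_iupac(code, missing="NA"):
    """Return the two alleles encoded by a one-letter IUPAC code.

    `missing` is a hypothetical missing-value string; None is returned for it.
    """
    if code == missing:
        return None
    try:
        return IUPAC_TO_ALLELES[code.upper()]
    except KeyError:
        raise ValueError(f"not a one- or two-base IUPAC code: {code!r}")


# Hypothetical matrix row: one individual genotyped at four loci.
row = ["A", "R", "NA", "Y"]
print([expand_iupac(cell) for cell in row])
```

Codes standing for three or four bases (B, D, H, V, N) are deliberately left out of this sketch, since they cannot be expanded into a single pair of alleles.
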
It is possible to choose whether loci are in rows or in columns.

2 letters code matrix
=====================

The 2 letters code matrix is more suitable for describing the genotyping data of homogeneous material such as inbred lines. In this file, genotypes are encoded with 2 letters, such as AA, CC, TT or GG if the individual is homozygous at the locus. Heterozygous genotypes can also be encoded with 2 letters (for instance AT, CT, GA, etc.). Insertions and deletions are encoded with the symbols '+' and '-'.

The form to submit genotyping data with 2 letters files provides two options:

* missing value: a string denoting an unidentified allele (any string of any length, except the letters 'A', 'C', 'T', 'G' and the symbols '+' and '-').
* matrix orientation: this option lets you choose whether loci are in rows or in columns.

Slash matrix
============

The "Slash" matrix is also suitable for describing the genotyping of homogeneous material such as inbred lines. In this file, genotypes are encoded with 2 strings separated by a slash. This format offers more flexibility for encoding allele values than the 2 letters matrix format.

The form to submit genotyping data with a slash file provides two options:

* missing value: a string denoting an unidentified allele (any string of any length, except the letters 'A', 'C', 'T' and 'G').
* matrix orientation: this option lets you choose whether loci are in rows or in columns.

**Important**: the missing value takes priority over the allele values detected in the cells of the matrix. This means that you must take care that your missing value does not match any allele value referenced in your file.

VCF format
==========

The VCF option can be selected to submit genotyping data in this established standard. No option is available here, as Thaliadb follows the format description. The VCF file is stored in Thaliadb "as is" and is also converted to a matrix format.
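
This documentation does not describe how the conversion to a matrix is performed. The following Python sketch only illustrates the idea, under the assumption that each genotype (``GT``) field is expanded into slash-separated alleles. ``vcf_to_matrix``, ``sample_vcf`` and the ``missing`` string are hypothetical names and are not part of Thaliadb.

```python
import re


def vcf_to_matrix(vcf_text, missing="NA"):
    """Convert VCF genotype calls into {locus: {sample: "A/G"}}.

    Hypothetical helper: `missing` is an arbitrary missing-value string.
    """
    samples, matrix = [], {}
    for line in vcf_text.splitlines():
        if not line or line.startswith("##"):
            continue  # skip blank and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        chrom, pos, locus_id, ref, alt = fields[:5]
        alleles = [ref] + alt.split(",")  # index 0 is REF, 1.. are ALT
        # Fall back on chromosome and position when the ID column is empty.
        name = locus_id if locus_id != "." else f"{chrom}_{pos}"
        gt_index = fields[8].split(":").index("GT")
        calls = {}
        for sample, cell in zip(samples, fields[9:]):
            gt = cell.split(":")[gt_index]
            indices = re.split(r"[/|]", gt)  # unphased "/" or phased "|"
            if "." in indices:
                calls[sample] = missing
            else:
                calls[sample] = "/".join(alleles[int(i)] for i in indices)
        matrix[name] = calls
    return matrix


# Minimal made-up VCF content with two samples and two loci.
sample_vcf = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "1\t100\tsnp1\tA\tG\t.\tPASS\t.\tGT\t0/1\t1/1",
    "1\t200\t.\tC\tT\t.\tPASS\t.\tGT:DP\t./.\t0|0",
])
print(vcf_to_matrix(sample_vcf))
```

In this sketch, loci are the outer keys and samples the inner keys; swapping the two levels would give the other matrix orientation.
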