Thaliadb

Genotyping

Defining Locus types and attributes

A Locus type is a category of loci or genetic markers that share common descriptors. It is one of the descriptors of a Locus. For instance, a locus type can be SNP, SSR, RFLP, etc.

Locus types can be managed in the menu Admin > Genotyping > Locus type.

It is defined by the fields described in the table below. Fields in bold are required while italic ones are optional when submitting the form.

../_images/LocusMenu.png

Locus administration menu

Locus types can be edited from the table below the form.

Field name

Description

Name

The name of this locus type (for instance: SNP, SSR, Genomic sequence)

Description

A text describing this locus type

Loci are described by a set of fixed attributes (see below) common to all locus types. Thaliadb also makes it possible to add specific attributes to each locus type. Attributes can be managed in the ‘Attributes’ menu of the left navigation toolbar. An attribute is defined by the fields described in the table below. Fields in bold are required while italic ones are optional when submitting the form. Attributes can be edited from the table below the form.

Field name

Description

Name

The name of this attribute

Type

The nature of the information, chosen among: Short text, Long text, Number, URL. The type determines the widget displayed in the Locus form.

Description

A short description for this attribute

Once your attributes are defined, you can add them as additional descriptors of a locus type. Adding attributes to a locus type adds fields to the form used to create a new locus of this type. To do so, go back to the Locus type table and edit one of the locus types. A new ‘Attributes’ tab appears in the edit form.

In this interface, you can add attributes to the current locus type (1). By default, a new attribute is added at the last position. You can change the order by drag and drop; once you are satisfied with the order of your attributes, click the ‘Save order’ button (2). The order you define is respected in the Locus form and when you export locus data from Thaliadb.

../_images/Attributes-management.png

Locus type attributes are managed in the same way as Accession type attributes

Locus positions

A locus can be positioned relative to a genome version. Its position is filled in when the locus is recorded in Thaliadb. New positions can then be added for a set of loci on a newer genome version.

This can be done in the menu Admin > Genotyping > Locus Position by selecting a genome version and submitting a file with the following fields:

Field name

Description

Name

The name of this Locus

Chromosome

If position is known for this locus, the chromosome on which it is positioned

Position

The position of this locus if it is known
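A position file can be checked locally before it is submitted. The sketch below assumes a tab-separated file whose header row uses the field names from the table above; the separator, the function name `check_position_file` and the rule that a position requires a chromosome are assumptions made for illustration, not Thaliadb's own validation.

```python
import csv
import io

# Assumption: header names exactly as listed in the table above.
REQUIRED_HEADERS = {"Name", "Chromosome", "Position"}


def check_position_file(text: str, sep: str = "\t") -> list:
    """Return a list of problems found in a locus position file (empty if none)."""
    reader = csv.DictReader(io.StringIO(text), delimiter=sep)
    missing = REQUIRED_HEADERS - set(reader.fieldnames or [])
    if missing:
        return [f"missing header(s): {', '.join(sorted(missing))}"]

    problems = []
    # Data rows start at line 2 (line 1 is the header row).
    for line_no, row in enumerate(reader, start=2):
        # Short rows yield None for absent fields, hence the "or ''".
        name = (row["Name"] or "").strip()
        chrom = (row["Chromosome"] or "").strip()
        pos = (row["Position"] or "").strip()
        if not name:
            problems.append(f"line {line_no}: empty locus name")
        # Chromosome and Position are optional, but this sketch assumes
        # a position is meaningless without its chromosome.
        if pos and not chrom:
            problems.append(f"line {line_no}: position given without chromosome")
        if pos and not pos.isdecimal():
            problems.append(f"line {line_no}: position {pos!r} is not a whole number")
    return problems
```

A locus with neither chromosome nor position passes the check, matching the "if it is known" wording of the table.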

Locus management

A Locus is a fixed position on a chromosome. This position may be unknown, and it can differ from one genome version to another.

Loci can be managed in the menu Admin > Genotyping > Locus type. In the left navigation toolbar, a menu is available for each locus type that has been defined in the tool.

A locus is defined at least by the fields described in the table below (additional fields can be defined as explained in the section Defining Locus types and attributes). Fields in bold are required while italic ones are optional when submitting the form. Loci can be edited from the table below the form.

Field name

Description

Name

The name of this Locus

Comments

A description for this Locus

Genome version

If position is known for this locus, the genome version on which it is positioned

Chromosome

If position is known for this locus, the chromosome on which it is positioned

Position

The position of this locus if it is known
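Because a locus keeps its identity while its position may change between genome versions, it can be pictured as holding at most one optional position per genome version. This is a hypothetical in-memory model (`Locus`, `Position`, `set_position`, `position_in` and the version names are illustrative), not Thaliadb's database schema.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Position:
    chromosome: str
    position: int


@dataclass
class Locus:
    name: str
    comments: str = ""
    # Hypothetical model: at most one position per genome version name.
    positions: Dict[str, Position] = field(default_factory=dict)

    def set_position(self, genome_version: str, chromosome: str, position: int) -> None:
        """Record (or replace) the position of this locus on a genome version."""
        self.positions[genome_version] = Position(chromosome, position)

    def position_in(self, genome_version: str) -> Optional[Position]:
        """Return the position on a genome version, or None if it is unknown."""
        return self.positions.get(genome_version)


# Usage: the same locus positioned on two (hypothetical) genome versions.
snp = Locus("SNP1")
snp.set_position("v1", "Chr1", 1200)
snp.set_position("v2", "Chr1", 1315)
```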

Referential

A Referential is a reading of a genotyping experiment within a defined context.

Referentials can be managed in the menu Admin > Genotyping > Referential.

It is defined by the fields described in the table below. Fields in bold are required while italic ones are optional when submitting the form. Referentials can be edited from the table below the form.

Field name

Description

Name

The name of this Referential

Person

The reference person for this referential

Date

The creation date of this referential

Comments

Description of this referential

Linked files

A set of related documents for this referential

Projects

Projects in which the referential is shared

Experiment

An Experiment is the molecular characterization of a panel of Accessions with a set of markers.

Experiments can be managed in the menu Admin > Genotyping > Experiment.

It is defined by the fields described in the table below. Fields in bold are required while italic ones are optional when submitting the form. Experiments can be edited from the table below the form.

Field name

Description

Name

The name of this Experiment

Institution

The institution that led this experiment

Person

The person who led this experiment

Date

The creation date of this experiment

Comments

Description of this experiment

Projects

Projects in which the experiment is shared

Genome versions

A Genome version is a new annotation or a new sequence of the genome of a given taxon. It results in new positions for the related loci.

Genome versions can be managed in the menu Admin > Genotyping > Genome version.

It is defined by the fields described in the table below. Fields in bold are required while italic ones are optional when submitting the form. Genome versions can be edited from the table below the form.

Field name

Description

Name

The name of this Genome version

Description

The description of this genome version

Sample

A Sample is a DNA sample of a seed lot. It is used to characterize genetic material in experiments.

Samples can be managed in the menu Admin > Genotyping > Sample.

It is defined by the fields described in the table below. Fields in bold are required while italic ones are optional when submitting the form. Samples can be edited from the table below the form.

Field name

Description

Name

The name of this Sample

Seed lot

The origin seed lot of this sample

Description

The description of this sample

Is bulk

Denotes a bulk sample

Is obsolete

Denotes that this sample is obsolete

Code labo

The code used in the lab for internal identification

Tube name

The name used to identify the sample in the lab

Projects

Projects in which the sample is shared

GenotypingID

A GenotypingID is the characterization of a sample in one experiment.

GenotypingID can be managed in the menu Admin > Genotyping > GenotypingID.

It is defined by the fields described in the table below. Fields in bold are required while italic ones are optional when submitting the form. GenotypingIDs can be edited from the table below the form.

Field name

Description

Name

The name of this genotypingID

Sample

The origin sample of this genotypingID

Experiment

The experiment of this genotypingID

Sentrixbarcode a

The barcode of the Illumina Sentrix BeadChip used for this genotypingID

Sentrixposition a

The position of this sample on the Illumina Sentrix BeadChip

Funding

The funding source for this genotypingID

Projects

Projects in which the genotypingID is shared

Insert Genotyping data

Genotyping data can be uploaded to Thaliadb as files. The form to submit data files can be found in the menu Admin > Genotyping > insert data.

Several formats are available for submitting genotyping data: Flat format, IUPAC matrix, 2 letters matrix, Slash matrix and VCF. They are detailed in the sections below.

../_images/Upload-Geno-data.png

Form to upload Genotyping data

Flat format

The flat format can be used to describe the genotyping of heterogeneous material such as populations, landraces, etc. A frequency can be assigned to each allele identified at a locus. The file must contain the following headers:

Field name

Description

GenotypingID

The name of the genotypingID

Locus

The name of the locus

Allele_value

An allele identified for this genotypingID at this locus

Allelic_frequency

A value between 0 and 1 corresponding to the frequency of the allele

The form to submit genotyping data with flat files provides two options:

  • missing value: a string denoting an unidentified allele (any string of any length, except the letters ‘A’, ‘C’, ‘T’ and ‘G’).

  • frequencies test: this option checks, for each genotypingID at each locus, that the allelic frequencies sum to 1 ± 0.001.
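Both options can be reproduced before uploading a flat file. In this sketch the file is assumed to be tab-separated, every row (including missing-value rows) is counted in the sum, and a missing value made only of the letters A, C, G and T is refused; these rules and the function names are assumptions, since the exact behavior of Thaliadb's checks is not described here.

```python
import csv
import io
from collections import defaultdict

FLAT_ALLELE_LETTERS = set("ACGT")


def check_missing_value(missing: str) -> None:
    """Reject a missing-value string that could be read as a nucleotide allele.

    Assumption: an empty string, or one made only of A, C, G, T, is refused.
    """
    if not missing or set(missing.upper()) <= FLAT_ALLELE_LETTERS:
        raise ValueError(f"unusable missing value: {missing!r}")


def frequency_test(text: str, delimiter: str = "\t", tolerance: float = 0.001):
    """Return (genotypingID, locus, total) for each pair whose allelic
    frequencies do not sum to 1 within the tolerance."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        key = (row["GenotypingID"], row["Locus"])
        totals[key] += float(row["Allelic_frequency"])
    return [
        (gid, locus, total)
        for (gid, locus), total in totals.items()
        if abs(total - 1.0) > tolerance
    ]
```

Comparing with a tolerance rather than with `== 1.0` matters here: decimal frequencies such as 0.6 and 0.3 do not add up exactly in floating point.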

Important: the missing value takes priority over ‘Allele_value’ in the flat file. Any value equal to the missing-value string is treated as missing, so make sure your missing value is consistent with the allele values referenced in your file.

IUPAC code matrix

The IUPAC code matrix format can be used to describe genotyping data encoded with one letter per genotype. The IUPAC code is an established standard. The file is a matrix of individuals vs loci using standard separators (comma, semicolon, tab, etc.). It is possible to choose whether loci are in rows or in columns.
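IUPAC ambiguity codes map onto diploid SNP genotypes: a single base is a homozygote and a two-base code (R = A/G, Y = C/T, S = C/G, W = A/T, K = G/T, M = A/C) a heterozygote. The sketch below reads such a matrix with a configurable separator and orientation. Treating ‘N’ as missing, expecting names in the first row and first column, and the function name `read_iupac_matrix` are assumptions, not Thaliadb's documented behavior.

```python
import csv
import io

# Standard IUPAC nucleotide codes that describe a diploid SNP genotype.
IUPAC_TO_ALLELES = {
    "A": ("A", "A"), "C": ("C", "C"), "G": ("G", "G"), "T": ("T", "T"),
    "R": ("A", "G"), "Y": ("C", "T"), "S": ("C", "G"),
    "W": ("A", "T"), "K": ("G", "T"), "M": ("A", "C"),
}


def read_iupac_matrix(text, delimiter=",", loci_in_columns=True):
    """Return {(individual, locus): (allele1, allele2) or None}.

    The first row holds column names and the first column row names;
    'N' (any base) is read as missing in this sketch.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not loci_in_columns:
        # Loci are in rows: transpose so that individuals become rows.
        rows = [list(col) for col in zip(*rows)]
    header, body = rows[0], rows[1:]
    genotypes = {}
    for row in body:
        individual = row[0]
        for locus, code in zip(header[1:], row[1:]):
            code = code.strip().upper()
            if code == "N":
                genotypes[(individual, locus)] = None
            elif code in IUPAC_TO_ALLELES:
                genotypes[(individual, locus)] = IUPAC_TO_ALLELES[code]
            else:
                raise ValueError(f"{individual}/{locus}: unsupported code {code!r}")
    return genotypes
```

Three-base codes such as B, D, H and V cannot describe a diploid genotype, so the sketch rejects them instead of guessing.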

2 letters code matrix

The 2 letters code matrix is better suited to describing the genotyping data of homogeneous material such as inbred lines. In this file, genotypes are encoded with 2 letters, such as AA, CC, TT or GG when the individual is homozygous at this locus. Heterozygous genotypes can also be encoded with 2 letters (for instance AT, CT, GA, etc.). Insertions and deletions are encoded with the symbols ‘+’ and ‘-’.

The form to submit genotyping data with 2 letters files provides two options:

  • missing value: a string denoting an unidentified allele (any string of any length, except the letters ‘A’, ‘C’, ‘T’, ‘G’ and the symbols ‘+’ and ‘-’).

  • matrix orientation: this option makes it possible to choose whether loci are in rows or in columns.
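Decoding a 2 letters cell and handling both options can be sketched as follows. The rule that a missing value made only of genotype symbols is refused, the upper-casing of cells, and all function names are assumptions for illustration.

```python
TWO_LETTER_SYMBOLS = set("ACGT+-")


def check_missing_value(missing: str) -> None:
    """Reject a missing value that could be mistaken for genotype symbols.

    Assumption: an empty string, or one made only of A, C, G, T, '+', '-', is refused.
    """
    if not missing or set(missing.upper()) <= TWO_LETTER_SYMBOLS:
        raise ValueError(f"unusable missing value: {missing!r}")


def decode_two_letter(cell: str, missing: str):
    """Return (allele1, allele2), or None if the cell holds the missing value."""
    cell = cell.strip()
    if cell == missing:
        return None
    cell = cell.upper()
    if len(cell) != 2 or not set(cell) <= TWO_LETTER_SYMBOLS:
        raise ValueError(f"invalid 2 letters genotype: {cell!r}")
    return cell[0], cell[1]


def is_heterozygous(genotype) -> bool:
    """True for two different alleles (e.g. AT), False for AA or a missing call."""
    return genotype is not None and genotype[0] != genotype[1]


def orient(rows, loci_in_columns: bool):
    """Transpose a matrix read with loci in rows so individuals are always rows."""
    return rows if loci_in_columns else [list(col) for col in zip(*rows)]
```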

Slash matrix

The “Slash” matrix is also suitable for describing the genotyping of homogeneous material such as inbred lines. In this file, genotypes are encoded with 2 strings separated by a slash. This format offers more flexibility than the 2 letters matrix format for encoding allele values.

The form to submit genotyping data with a slash file format provides two options:

  • missing value: a string denoting an unidentified allele (any string of any length, except the letters ‘A’, ‘C’, ‘T’ and ‘G’).

  • matrix orientation: this option makes it possible to choose whether loci are in rows or in columns.

Important: the missing value takes priority over the allele values detected in the cells of the matrix. Any cell equal to the missing-value string is treated as missing, so make sure your missing value is consistent with the allele values referenced in your file.
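The priority rule can be illustrated by testing the missing value before splitting a cell on the slash: if ‘0/0’ were chosen as the missing value, every genuine 0/0 genotype would be read as missing. The tab separator default, the two-allele requirement and the function names are assumptions of this sketch.

```python
import csv
import io


def decode_slash(cell: str, missing: str):
    """Return (allele1, allele2), or None if the cell is the missing value."""
    cell = cell.strip()
    # The missing value is tested first, so it always wins over allele parsing.
    if cell == missing:
        return None
    parts = cell.split("/")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected 'allele1/allele2', got {cell!r}")
    return parts[0], parts[1]


def read_slash_matrix(text, missing, delimiter="\t", loci_in_columns=True):
    """Return {(individual, locus): genotype} from a slash matrix."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not loci_in_columns:
        # Loci are in rows: transpose so that individuals become rows.
        rows = [list(col) for col in zip(*rows)]
    header = rows[0]
    return {
        (row[0], locus): decode_slash(cell, missing)
        for row in rows[1:]
        for locus, cell in zip(header[1:], row[1:])
    }
```

Alleles are kept as strings, so SSR sizes (152/160) and multi-base alleles (AT/A) are handled the same way as single nucleotides.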

VCF format

The VCF option can be selected to submit genotyping data in this established standard format. No option is available here, as Thaliadb follows the format specification. The VCF file is stored in Thaliadb as is and is also converted into a matrix format.
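The conversion into a matrix can be sketched from the standard VCF layout: meta-information lines start with `##`, the `#CHROM` line names the samples after the `FORMAT` column, and the `GT` field holds allele indices (0 for REF, 1 and above for ALT) separated by `/` (unphased) or `|` (phased). Naming loci by the `ID` column (or `CHROM:POS` when it is `.`) and the function name `vcf_to_matrix` are assumptions; Thaliadb's own conversion may differ.

```python
def vcf_to_matrix(text: str) -> dict:
    """Return {(sample, locus): tuple of alleles, or None if the call is missing}."""
    samples = []
    genotypes = {}
    for line in text.splitlines():
        if not line or line.startswith("##"):
            continue  # skip blank and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        chrom, pos, vid, ref, alt = fields[:5]
        locus = vid if vid != "." else f"{chrom}:{pos}"
        alleles = [ref] + alt.split(",")  # index 0 is REF, then each ALT
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue  # no genotype field for this record
        gt_index = fmt.index("GT")
        for sample, data in zip(samples, fields[9:]):
            values = data.split(":")
            gt = values[gt_index] if gt_index < len(values) else "."
            # Phased (|) and unphased (/) calls are treated alike here.
            calls = gt.replace("|", "/").split("/")
            if any(c == "." for c in calls):
                genotypes[(sample, locus)] = None
            else:
                genotypes[(sample, locus)] = tuple(alleles[int(c)] for c in calls)
    return genotypes
```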