public class SimpleDefaultGenotypeData extends SimpleFrequencyGenotypeData implements DefaultGenotypeData
Constructor and Description |
---|
SimpleDefaultGenotypeData(uno.informatics.data.SimpleEntity[] itemHeaders,
String[] markerNames,
String[][][] observedAlleles)
Create data with name "Default marker data".
|
SimpleDefaultGenotypeData(String datasetName,
uno.informatics.data.SimpleEntity[] itemHeaders,
String[] markerNames,
String[][][] observedAlleles)
Create data with given dataset name, item headers, marker names
and observed alleles.
|
Modifier and Type | Method and Description |
---|---|
int |
getNumberOfObservedAllelesPerIndividual(int markerIndex)
Get the number of observed alleles per individual for a given marker.
|
String |
getObservedAllele(int id,
int markerIndex,
int i)
Get the reference of the i-th observed allele for the given marker, in the given entry.
|
static FrequencyGenotypeData |
readData(Path filePath,
uno.informatics.data.io.FileType type)
Read default genotype data from file.
|
void |
writeData(Path filePath,
uno.informatics.data.io.FileType fileType,
org.jamesframework.core.subset.SubsetSolution solution,
boolean includeSelected,
boolean includeUnselected,
boolean includeIndex)
Write selected data to file.
|
getAlleleFrequency, getAlleleName, getMarkerName, getNumberOfAlleles, getNumberOfMarkers, getTotalNumberOfAlleles, hasMissingValues, inferMarkerNames
checkHeaders, getDataset, getHeader, getIDs, getSize, indexOf, indexOf, setDataset, updateOrCreateHeaders, updateOrCreateHeaders
equals, getName, getUniqueIdentifier, hashCode, initialise, setName, setUniqueIdentifier, toString
addPropertyChangeListener, getPropertyChangeSupport, removePropertyChangeListener
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
getAlleleFrequency, getAlleleName, getMarkerName, getNumberOfAlleles, getNumberOfMarkers, getTotalNumberOfAlleles, hasMissingValues, writeData
public SimpleDefaultGenotypeData(uno.informatics.data.SimpleEntity[] itemHeaders, String[] markerNames, String[][][] observedAlleles)
See SimpleDefaultGenotypeData(String, SimpleEntity[], String[], String[][][]).
itemHeaders - item headers, specifying name and/or unique identifier
markerNames - marker names
observedAlleles - observed allele references
public SimpleDefaultGenotypeData(String datasetName, uno.informatics.data.SimpleEntity[] itemHeaders, String[] markerNames, String[][][] observedAlleles)
The length of alleleFrequencies denotes the number of items in the dataset. The length of alleleFrequencies[i] should be the same for all i and denotes the number of markers. Finally, the length of alleleFrequencies[i][m] should also be the same for all i and denotes the number of alleles of the m-th marker. Allele counts may differ for different markers.
All frequencies should be positive and the values in alleleFrequencies[i][m] should sum to one for all i and m, with a precision of 0.01. Missing values are encoded as null. If one or more allele frequencies are missing at a certain marker for a certain individual, the remaining frequencies should sum to a value less than or equal to one.
Item headers are required and marker/allele names are optional. If marker and/or allele names are given they need not be defined for all markers/alleles nor unique. Each item should at least have a unique identifier (names are optional). If not null, the length of each header/name array should correspond to the dimensions of alleleFrequencies (number of individuals, markers and alleles per marker).
Violating any of the requirements will produce an exception.
Allele frequencies as well as assigned headers and names are copied into internal data structures, i.e. no references are retained to any of the arrays passed as arguments.
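The sum-to-one requirement described above can be illustrated with a minimal, self-contained sketch. It is not the library's validation code: the class and method names are hypothetical, zero is assumed to be an accepted frequency, and the 0.01 tolerance is assumed to apply to the "less than or equal to one" bound as well.

```java
// Hypothetical helper, not part of the documented API.
final class FrequencyCheckSketch {

    // Precision stated in the description above.
    static final double PRECISION = 0.01;

    // Checks the frequencies of one individual at one marker.
    // A null entry encodes a missing value.
    static boolean isValid(Double[] frequencies) {
        double sum = 0.0;
        boolean missing = false;
        for (Double f : frequencies) {
            if (f == null) {
                missing = true;
                continue;
            }
            if (f < 0.0) {
                return false; // assumption: only negative values are rejected
            }
            sum += f;
        }
        // With missing values the remaining sum may be at most one
        // (assumption: same tolerance); otherwise it must equal one
        // within the stated precision.
        return missing ? sum <= 1.0 + PRECISION
                       : Math.abs(sum - 1.0) <= PRECISION;
    }
}
```

For example, the values {0.5, 0.5} and {0.5, null} pass this check, while {0.5, 0.4} does not.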
datasetName - name of the dataset
itemHeaders - item headers; its length should correspond to the number of individuals and each item should at least have a unique identifier (names are optional)
markerNames - marker names, null if no marker names are assigned; if not null its length should correspond to the number of markers (can contain null values)
observedAlleles - observed allele references, may not be null but can contain null values (missing); dimensions indicate number of individuals, markers, and allele observations per individual for each specific marker
public int getNumberOfObservedAllelesPerIndividual(int markerIndex)
Description copied from interface: DefaultGenotypeData
Get the number of observed alleles per individual for a given marker.
Specified by: getNumberOfObservedAllelesPerIndividual in interface DefaultGenotypeData
markerIndex - the index of the marker within the range 0 to n-1, where n is the total number of markers as returned by FrequencyGenotypeData.getNumberOfMarkers()
public String getObservedAllele(int id, int markerIndex, int i)
Description copied from interface: DefaultGenotypeData
Get the reference of the i-th observed allele for the given marker, in the given entry. May be null if no allele was detected here.
Specified by: getObservedAllele in interface DefaultGenotypeData
id - the id of the entry, must be one of the IDs returned by Data.getIDs()
markerIndex - the index of the marker within the range 0 to n-1, where n is the total number of markers as returned by FrequencyGenotypeData.getNumberOfMarkers()
i - observation index within the range 0 to k-1, where k is the number of observed alleles per individual, for the given marker, as returned by DefaultGenotypeData.getNumberOfObservedAllelesPerIndividual(int)
Returns: the observed allele reference, null if missing
public static FrequencyGenotypeData readData(Path filePath, uno.informatics.data.io.FileType type) throws IOException
Read default genotype data from file. Only FileType.TXT and FileType.CSV are allowed. Values are separated with a single tab (txt) or comma (csv) character.
The file contains one or more consecutive columns per marker, in which the observed alleles are specified (by name/id/number/...). This format is suited for datasets with a fixed number of allele observations per individual, for each marker. Common cases are those with one or two columns per marker, e.g. suited for fully homozygous and diploid datasets, respectively. Any (possibly varying) number of columns per marker is supported.
Missing values are encoded as empty cells.
A required first header row and column are included to specify marker names and unique item identifiers, respectively, identified with row/column header "ID". Optionally a second header column "NAME" can be included to provide (not necessarily unique) item names in addition to the unique identifiers.
Consecutive columns corresponding to the same marker should have the same name, optionally extended with a suffix starting with a dash, underscore or dot character. This allows column names such as "M1-1" and "M1-2", "M1.a" and "M1.b" or "M1_1" and "M1_2" for a marker named "M1" with two columns. The column name prefix up to, but not including, the last occurrence of any dash, underscore or dot character is taken to be the marker name.
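A literal reading of this prefix rule can be sketched as follows. The class and method names are hypothetical and this is not the library's parser; in particular, treating a header without any separator (or with a separator only at position 0) as the marker name itself is an assumption.

```java
// Hypothetical helper, not part of the documented API.
final class MarkerColumnNameSketch {

    // Returns the prefix before the last dash, underscore or dot;
    // returns the full column name if no usable separator is found.
    static String markerName(String columnName) {
        int cut = Math.max(columnName.lastIndexOf('-'),
                  Math.max(columnName.lastIndexOf('_'),
                           columnName.lastIndexOf('.')));
        return cut > 0 ? columnName.substring(0, cut) : columnName;
    }

    public static void main(String[] args) {
        for (String column : new String[] {"M1-1", "M1.a", "M1_2", "M2"}) {
            System.out.println(column + " -> " + markerName(column));
        }
    }
}
```

Under this reading only the last separator counts, so a column such as "my_marker-1" maps to "my_marker".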
Leading and trailing whitespace is removed from names and unique identifiers and they are unquoted if wrapped in single or double quotes after whitespace removal. If it is intended to start or end a name/identifier with whitespace, this whitespace should be contained within the quotes, as it will then not be removed.
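The trimming and unquoting order described here (whitespace first, then one pair of matching quotes) might look like the sketch below. The names are hypothetical, and requiring the opening and closing quote to be the same character is an assumption.

```java
// Hypothetical helper, not part of the documented API.
final class NameCleaningSketch {

    // Trims surrounding whitespace, then strips one pair of matching
    // single or double quotes. Whitespace inside the quotes is kept.
    static String clean(String raw) {
        String s = raw.trim();
        if (s.length() >= 2) {
            char first = s.charAt(0);
            char last = s.charAt(s.length() - 1);
            if (first == last && (first == '"' || first == '\'')) {
                return s.substring(1, s.length() - 1);
            }
        }
        return s;
    }
}
```

With this sketch, an unquoted cell containing `  Alice  ` becomes `Alice`, whereas a cell containing `" Alice "` keeps the inner spaces.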
Trailing empty cells can be omitted from any row in the file.
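To make the handling of empty and omitted cells concrete, the sketch below splits one comma-separated data row into observed alleles per marker. It is illustrative only: the class, method and parameter names are hypothetical, the number of columns per marker is passed in rather than derived from the header row, and quoted cells are not handled.

```java
// Hypothetical helper, not part of the documented API.
final class ObservedAlleleRowSketch {

    // Returns alleles[marker][observation] for one data row.
    // firstMarkerColumn is 1 if only the "ID" column precedes the
    // markers and 2 if a "NAME" column is present as well.
    static String[][] parseRow(String line, int firstMarkerColumn,
                               int[] columnsPerMarker) {
        // Limit -1 keeps empty cells in the middle and at the end.
        String[] cells = line.split(",", -1);
        String[][] alleles = new String[columnsPerMarker.length][];
        int column = firstMarkerColumn;
        for (int m = 0; m < columnsPerMarker.length; m++) {
            alleles[m] = new String[columnsPerMarker[m]];
            for (int k = 0; k < columnsPerMarker[m]; k++, column++) {
                // Omitted trailing cells are treated like empty cells.
                String cell = column < cells.length ? cells[column].trim() : "";
                alleles[m][k] = cell.isEmpty() ? null : cell; // null = missing
            }
        }
        return alleles;
    }
}
```

For a made-up row `a2,Bob,A,,D` with two markers of two columns each, the first marker yields "A" and a missing value, and the second yields "D" and a missing value because the trailing cell was omitted.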
The dataset name is set to the name of the file to which filePath points.
filePath - path to file that contains the data
type - FileType.TXT or FileType.CSV
IOException - if the file cannot be read or is not correctly formatted
public void writeData(Path filePath, uno.informatics.data.io.FileType fileType, org.jamesframework.core.subset.SubsetSolution solution, boolean includeSelected, boolean includeUnselected, boolean includeIndex) throws IOException
Description copied from interface: FrequencyGenotypeData
Write selected data to file.
Specified by: writeData in interface FrequencyGenotypeData
Overrides: writeData in class SimpleFrequencyGenotypeData
filePath - file path
fileType - FileType.TXT or FileType.CSV
solution - the solution to subset the data (selected core)
includeSelected - includes selected accessions in output file
includeUnselected - includes unselected accessions in output file
includeIndex - includes accession indices, i.e. the internal integer IDs used by the solution
IOException - if the data cannot be written to the file
Copyright © 2017. All rights reserved.