public class SimpleBiAllelicGenotypeData extends uno.informatics.data.pojo.DataPojo implements BiAllelicGenotypeData
Constructor and Description |
---|
SimpleBiAllelicGenotypeData(uno.informatics.data.SimpleEntity[] itemHeaders,
String[] markerNames,
byte[][] alleleScores)
Create data with name "Biallelic marker data".
|
SimpleBiAllelicGenotypeData(String datasetName,
uno.informatics.data.SimpleEntity[] itemHeaders,
String[] markerNames,
byte[][] alleleScores)
Create data with given dataset name, item headers, marker names and allele scores.
|
Modifier and Type | Method and Description |
---|---|
double |
getAlleleFrequency(int id,
int markerIndex,
int alleleIndex)
Get the relative frequency of an allele for the given entry (sample/accession).
|
String |
getAlleleName(int markerIndex,
int alleleIndex)
Get the name of an allele, if assigned.
|
byte |
getAlleleScore(int id,
int markerIndex)
Get the allele score of the marker for the given entry.
|
String |
getMarkerName(int markerIndex)
Get the name of a marker by index, if assigned.
|
int |
getNumberOfAlleles(int markerIndex)
Get the number of alleles for a given marker.
|
int |
getNumberOfMarkers()
Get the total number of markers used in this dataset.
|
int |
getTotalNumberOfAlleles()
Get the total number of alleles across all markers.
|
boolean |
hasMissingValues(int id,
int markerIndex)
Indicates whether there are missing values (frequencies)
for the given entry (sample/accession) at the given marker.
|
static SimpleBiAllelicGenotypeData |
readData(Path filePath,
uno.informatics.data.io.FileType type)
Read biallelic genotype data from file.
|
void |
writeData(Path filePath,
uno.informatics.data.io.FileType fileType,
org.jamesframework.core.subset.SubsetSolution solution,
boolean includeSelected,
boolean includeUnselected,
boolean includeIndex)
Write selected data to file.
|
Inherited methods:
checkHeaders, getDataset, getHeader, getIDs, getSize, indexOf, indexOf, setDataset, updateOrCreateHeaders, updateOrCreateHeaders
equals, getName, getUniqueIdentifier, hashCode, initialise, setName, setUniqueIdentifier, toString
addPropertyChangeListener, getPropertyChangeSupport, removePropertyChangeListener
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
writeData
public SimpleBiAllelicGenotypeData(uno.informatics.data.SimpleEntity[] itemHeaders, String[] markerNames, byte[][] alleleScores)
Create data with name "Biallelic marker data". See SimpleBiAllelicGenotypeData(String, SimpleEntity[], String[], byte[][]).
Parameters:
itemHeaders - item headers (include name and/or unique identifier)
markerNames - marker names
alleleScores - 0/1/2 allele score matrix
public SimpleBiAllelicGenotypeData(String datasetName, uno.informatics.data.SimpleEntity[] itemHeaders, String[] markerNames, byte[][] alleleScores)
Create data with given dataset name, item headers, marker names and allele scores. The dimensions of alleleScores indicate the number of items and markers, respectively. All entries in this matrix should be 0, 1 or 2 (or CoreHunterConstants.MISSING_ALLELE_SCORE for missing values).
Item headers are required but marker names are optional. If marker names are given they need not be defined for all markers, nor be unique. Each item should at least have a unique identifier (names are optional). The lengths of itemHeaders and markerNames (if not null) should be equal to the number of items and markers, respectively, as inferred from the dimensions of alleleScores.
Violating any of these requirements will produce an exception.
Allele scores as well as assigned headers and names are copied into internal data structures, i.e. no references are retained to any of the arrays passed as arguments.
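The requirements above (matching dimensions, scores restricted to 0, 1, 2 or the missing marker, and defensive copying) can be sketched as follows. This is only an illustration and not the library's implementation: the class AlleleScoreChecks, its method, the placeholder value of MISSING and the choice of IllegalArgumentException are all assumptions, since the documentation states neither the value of CoreHunterConstants.MISSING_ALLELE_SCORE nor the exception type.

```java
import java.util.Arrays;

/** Hypothetical helper mirroring the documented constructor requirements. */
final class AlleleScoreChecks {

    // Placeholder only; stands in for CoreHunterConstants.MISSING_ALLELE_SCORE,
    // whose actual value is not given in this documentation.
    static final byte MISSING = -1;

    /** Validates the arguments and returns a deep copy of the score matrix. */
    static byte[][] validateAndCopy(int numItemHeaders, String[] markerNames, byte[][] alleleScores) {
        if (alleleScores == null) {
            throw new IllegalArgumentException("Allele scores are required.");
        }
        int numItems = alleleScores.length;
        if (numItemHeaders != numItems) {
            throw new IllegalArgumentException("Expected one item header per row.");
        }
        // Number of markers is inferred from the first row (0 if there are no usable rows).
        int numMarkers = (numItems == 0 || alleleScores[0] == null) ? 0 : alleleScores[0].length;
        if (markerNames != null && markerNames.length != numMarkers) {
            throw new IllegalArgumentException("Expected one marker name slot per column.");
        }
        byte[][] copy = new byte[numItems][];
        for (int i = 0; i < numItems; i++) {
            byte[] row = alleleScores[i];
            if (row == null || row.length != numMarkers) {
                throw new IllegalArgumentException("Row " + i + " has an unexpected length.");
            }
            for (byte score : row) {
                if (score != MISSING && (score < 0 || score > 2)) {
                    throw new IllegalArgumentException("Invalid allele score: " + score);
                }
            }
            // Copy the row so that later changes to the caller's array have no effect.
            copy[i] = Arrays.copyOf(row, row.length);
        }
        return copy;
    }
}
```

Because each row is copied, modifying the input matrix after the call does not affect the returned data, which is the behaviour the constructor documents for its own internal structures.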
Parameters:
datasetName - name of the dataset
itemHeaders - item headers; its length should equal the number of rows in alleleScores and each item should at least have a unique identifier
markerNames - marker names, null if no marker names are assigned; if not null its length should equal the number of columns in alleleScores; can contain null values for markers whose name is undefined
alleleScores - allele scores, may not be null; contains only values 0, 1 and 2, and possibly CoreHunterConstants.MISSING_ALLELE_SCORE in case of missing values; dimensions indicate number of items (rows) and markers (columns)
public static SimpleBiAllelicGenotypeData readData(Path filePath, uno.informatics.data.io.FileType type) throws IOException
Read biallelic genotype data from file. Only file types FileType.TXT and FileType.CSV are allowed. Values are separated with a single tab (txt) or comma (csv) character. The file contains an allele score matrix with one row per individual and one column per marker. Only values 0, 1 and 2 are valid. Empty cells are also allowed in case of missing data.
The file contains one required header row and column ("ID") specifying item identifiers (row headers) and marker names (column headers). Item identifiers should be unique and defined for all items. Marker names may be undefined for some or all markers and need not be unique. An optional second header column ("NAME") can also be included, specifying (not necessarily unique) item names. If no explicit item names are provided, the unique identifiers are used as names as well.
Leading and trailing whitespace is removed from names and unique identifiers and they are unquoted if wrapped in single or double quotes after whitespace removal. If it is intended to start or end a name/identifier with whitespace this whitespace should be contained within the quotes, as it will then not be removed.
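The trimming and unquoting rule can be illustrated with a small stand-alone helper. This is a sketch of the documented behaviour only; NameCleaner and its clean method are hypothetical names and are not part of the library, and the handling of edge cases such as a lone quote character is an assumption.

```java
/** Hypothetical helper illustrating the documented trim-then-unquote rule. */
final class NameCleaner {

    /** Trims surrounding whitespace, then removes one pair of matching quotes, if present. */
    static String clean(String raw) {
        if (raw == null) {
            return null;
        }
        String s = raw.trim();
        if (s.length() >= 2) {
            char first = s.charAt(0);
            char last = s.charAt(s.length() - 1);
            boolean quoted = first == last && (first == '"' || first == '\'');
            if (quoted) {
                // Whitespace inside the quotes is kept, as it is not trimmed again.
                return s.substring(1, s.length() - 1);
            }
        }
        return s;
    }
}
```

For instance, under this sketch the cell `  " acc 1 "  ` yields the name ` acc 1 ` with its inner whitespace intact, whereas the unquoted cell `  acc 1  ` yields `acc 1`.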
Trailing empty cells can be omitted at any row in the file.
The dataset name is set to the name of the file to which filePath points.
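To make the file layout described above concrete, the following sketch parses the score cells of a single data row: empty cells and omitted trailing cells become missing values, and anything other than 0, 1 or 2 is rejected. It is not the library's parser. ScoreRowParser, parseScores and the value of MISSING are assumptions for illustration, and the sketch throws IllegalArgumentException where readData is documented to throw IOException.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

/** Hypothetical row parser for the documented txt/csv layout. */
final class ScoreRowParser {

    // Placeholder only; stands in for CoreHunterConstants.MISSING_ALLELE_SCORE.
    static final byte MISSING = -1;

    /**
     * Parses the allele scores of one data row.
     *
     * @param line          one line of the file (not the header row)
     * @param delimiter     "\t" for txt or "," for csv
     * @param headerColumns 1 if only "ID" is present, 2 if "NAME" is present as well
     * @param numMarkers    number of markers, as defined by the header row
     */
    static byte[] parseScores(String line, String delimiter, int headerColumns, int numMarkers) {
        // A negative limit keeps empty cells, so positions stay aligned with markers.
        String[] cells = line.split(Pattern.quote(delimiter), -1);
        if (cells.length > headerColumns + numMarkers) {
            throw new IllegalArgumentException("Row has more cells than the header row.");
        }
        byte[] scores = new byte[numMarkers];
        // Cells that are empty or omitted at the end of the row stay missing.
        Arrays.fill(scores, MISSING);
        for (int m = 0; m < numMarkers; m++) {
            int c = headerColumns + m;
            if (c >= cells.length) {
                break;
            }
            String cell = cells[c].trim();
            if (cell.isEmpty()) {
                continue;
            }
            if (cell.length() == 1 && cell.charAt(0) >= '0' && cell.charAt(0) <= '2') {
                scores[m] = (byte) (cell.charAt(0) - '0');
            } else {
                throw new IllegalArgumentException("Invalid allele score: " + cell);
            }
        }
        return scores;
    }
}
```

With four markers and both header columns present, the csv row `acc1,Alice,0,,2` would be read as the scores 0, missing, 2, missing under these assumptions.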
Parameters:
filePath - path to file that contains the data
type - FileType.TXT or FileType.CSV
Throws:
IOException - if the file cannot be read or is not correctly formatted
public int getNumberOfMarkers()
Get the total number of markers used in this dataset.
Specified by: getNumberOfMarkers in interface FrequencyGenotypeData
public String getMarkerName(int markerIndex) throws ArrayIndexOutOfBoundsException
Get the name of a marker by index, if assigned.
Specified by: getMarkerName in interface FrequencyGenotypeData
Parameters:
markerIndex - the index of the marker within the range 0 to n-1, where n is the total number of markers as returned by FrequencyGenotypeData.getNumberOfMarkers()
Returns:
the marker name, or null if no name has been set
Throws:
ArrayIndexOutOfBoundsException - if the index is out of range
public int getNumberOfAlleles(int markerIndex)
Get the number of alleles for a given marker.
Specified by: getNumberOfAlleles in interface FrequencyGenotypeData
Parameters:
markerIndex - the index of the marker within the range 0 to n-1, where n is the total number of markers as returned by FrequencyGenotypeData.getNumberOfMarkers()
public int getTotalNumberOfAlleles()
Get the total number of alleles across all markers.
Specified by: getTotalNumberOfAlleles in interface FrequencyGenotypeData
public String getAlleleName(int markerIndex, int alleleIndex) throws ArrayIndexOutOfBoundsException
Get the name of an allele, if assigned.
Specified by: getAlleleName in interface FrequencyGenotypeData
Parameters:
markerIndex - the index of the marker within the range 0 to n-1, where n is the total number of markers as returned by FrequencyGenotypeData.getNumberOfMarkers()
alleleIndex - allele index within the range 0 to a-1, where a is the number of alleles for the given marker as returned by FrequencyGenotypeData.getNumberOfAlleles(int)
Returns:
the allele name, or null if no name has been set
Throws:
ArrayIndexOutOfBoundsException - if the marker or allele index is out of range
public byte getAlleleScore(int id, int markerIndex)
Get the allele score of the marker for the given entry. Missing values are reported as CoreHunterConstants.MISSING_ALLELE_SCORE.
Specified by: getAlleleScore in interface BiAllelicGenotypeData
Parameters:
id - the id of the entry, must be one of the IDs returned by Data.getIDs()
markerIndex - the index of the marker within the range 0 to n-1, where n is the total number of markers as returned by FrequencyGenotypeData.getNumberOfMarkers()
Returns:
the allele score, or CoreHunterConstants.MISSING_ALLELE_SCORE if missing
public double getAlleleFrequency(int id, int markerIndex, int alleleIndex)
Get the relative frequency of an allele for the given entry (sample/accession).
Specified by: getAlleleFrequency in interface FrequencyGenotypeData
Parameters:
id - the id of the entry, must be one of the IDs returned by Data.getIDs()
markerIndex - the index of the marker within the range 0 to n-1, where n is the total number of markers as returned by FrequencyGenotypeData.getNumberOfMarkers()
alleleIndex - allele index within the range 0 to a-1, where a is the number of alleles for the given marker as returned by FrequencyGenotypeData.getNumberOfAlleles(int)
Returns:
the allele frequency, or Double.NaN if missing
public boolean hasMissingValues(int id, int markerIndex)
Indicates whether there are missing values (frequencies) for the given entry (sample/accession) at the given marker.
Specified by: hasMissingValues in interface FrequencyGenotypeData
Parameters:
id - the id of the entry, must be one of the IDs returned by Data.getIDs()
markerIndex - the index of the marker within the range 0 to n-1, where n is the total number of markers as returned by FrequencyGenotypeData.getNumberOfMarkers()
Returns:
true if some or all values are missing for the given marker in the given entry
public void writeData(Path filePath, uno.informatics.data.io.FileType fileType, org.jamesframework.core.subset.SubsetSolution solution, boolean includeSelected, boolean includeUnselected, boolean includeIndex) throws IOException
Write selected data to file.
Specified by: writeData in interface FrequencyGenotypeData
Parameters:
filePath - file path
fileType - FileType.TXT or FileType.CSV
solution - the solution to subset the data (selected core)
includeSelected - includes selected accessions in output file
includeUnselected - includes unselected accessions in output file
includeIndex - includes accession indices, i.e. the internal integer IDs used by the solution
Throws:
IOException - if the data cannot be written to the file
Copyright © 2017. All rights reserved.