public class SimpleFrequencyGenotypeData extends uno.informatics.data.pojo.DataPojo implements FrequencyGenotypeData
Constructor and Description |
---|
SimpleFrequencyGenotypeData(uno.informatics.data.SimpleEntity[] itemHeaders, String[] markerNames, String[][] alleleNames, double[][][] alleleFrequencies) Create data with name "Allele frequency data". |
SimpleFrequencyGenotypeData(String datasetName, uno.informatics.data.SimpleEntity[] itemHeaders, String[] markerNames, String[][] alleleNames, double[][][] alleleFrequencies) Create data with given dataset name, item headers, marker/allele names and allele frequencies. |
Modifier and Type | Method and Description |
---|---|
double | getAlleleFrequency(int id, int markerIndex, int alleleIndex) Get the relative frequency of an allele for the given entry (sample/accession). |
String | getAlleleName(int markerIndex, int alleleIndex) Get the name of an allele, if assigned. |
String | getMarkerName(int markerIndex) Get the name of a marker by index, if assigned. |
int | getNumberOfAlleles(int markerIndex) Get the number of alleles for a given marker. |
int | getNumberOfMarkers() Get the total number of markers used in this dataset. |
int | getTotalNumberOfAlleles() Get the total number of alleles across all markers. |
boolean | hasMissingValues(int id, int markerIndex) Indicates whether there are missing values (frequencies) for the given entry (sample/accession) at the given marker. |
static LinkedHashMap<String,Integer> | inferMarkerNames(String[] columnNames) Infer marker names and number of columns per marker from column names. |
static FrequencyGenotypeData | readData(Path filePath, uno.informatics.data.io.FileType fileType) Read genotype data from file. |
void | writeData(Path filePath, uno.informatics.data.io.FileType fileType, org.jamesframework.core.subset.SubsetSolution solution, boolean includeSelected, boolean includeUnselected, boolean includeIndex) Write selected data to file. |
checkHeaders, getDataset, getHeader, getIDs, getSize, indexOf, indexOf, setDataset, updateOrCreateHeaders, updateOrCreateHeaders
equals, getName, getUniqueIdentifier, hashCode, initialise, setName, setUniqueIdentifier, toString
addPropertyChangeListener, getPropertyChangeSupport, removePropertyChangeListener
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
writeData
public SimpleFrequencyGenotypeData(uno.informatics.data.SimpleEntity[] itemHeaders, String[] markerNames, String[][] alleleNames, double[][][] alleleFrequencies)
Create data with name "Allele frequency data". For details of the arguments see the constructor SimpleFrequencyGenotypeData(String, SimpleEntity[], String[], String[][], double[][][]).
Parameters:
itemHeaders - item headers, specifying name and/or unique identifier
markerNames - marker names
alleleNames - allele names per marker
alleleFrequencies - allele frequencies
public SimpleFrequencyGenotypeData(String datasetName, uno.informatics.data.SimpleEntity[] itemHeaders, String[] markerNames, String[][] alleleNames, double[][][] alleleFrequencies)
Create data with given dataset name, item headers, marker/allele names and allele frequencies. The length of alleleFrequencies denotes the number of items in the dataset. The length of alleleFrequencies[i] should be the same for all i and denotes the number of markers. Finally, the length of alleleFrequencies[i][m] should also be the same for all i and denotes the number of alleles of the m-th marker. Allele counts may differ for different markers.
All frequencies should be positive and the values in alleleFrequencies[i][m] should sum to one for all i and m, with a precision of 0.01. Missing values are encoded with Double.NaN. If one or more allele frequencies are missing at a certain marker for a certain individual, the remaining frequencies should sum to a value less than or equal to one.
Item headers are required and marker/allele names are optional. If marker and/or allele names are given they need not be defined for all markers/alleles, nor be unique. Each item should at least have a unique identifier (names are optional). If not null, the length of each header/name array should correspond to the dimensions of alleleFrequencies (number of individuals, markers and alleles per marker).
Violating any of the requirements will produce an exception.
Allele frequencies as well as assigned headers and names are copied into internal data structures, i.e. no references are retained to any of the arrays passed as arguments.
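The requirements on alleleFrequencies can be illustrated with a small stand-alone check. This is a minimal sketch, not the library's own validation: the class FrequencyChecks and the method validate are hypothetical names, and two details are assumptions made here, namely that a frequency of exactly zero is accepted (only negative values are rejected) and that the 0.01 tolerance is also applied to the "less than or equal to one" rule for markers with missing values.

```java
// Hypothetical helper; not part of the documented API.
final class FrequencyChecks {

    // Tolerance taken from the documented precision of 0.01.
    private static final double PRECISION = 0.01;

    /** Throws IllegalArgumentException if the array violates the documented rules. */
    static void validate(double[][][] freqs) {
        if (freqs == null || freqs.length == 0 || freqs[0] == null) {
            throw new IllegalArgumentException("Allele frequencies are required.");
        }
        double[][] first = freqs[0];
        for (int i = 0; i < freqs.length; i++) {
            double[][] item = freqs[i];
            // Every item must cover the same number of markers.
            if (item == null || item.length != first.length) {
                throw new IllegalArgumentException("Item " + i + ": unexpected number of markers.");
            }
            for (int m = 0; m < item.length; m++) {
                // Every item must have the same number of alleles at marker m.
                if (item[m] == null || first[m] == null || item[m].length != first[m].length) {
                    throw new IllegalArgumentException(
                            "Item " + i + ", marker " + m + ": unexpected number of alleles.");
                }
                double sum = 0.0;
                boolean missing = false;
                for (double f : item[m]) {
                    if (Double.isNaN(f)) {
                        missing = true;          // missing value, skipped in the sum
                    } else if (f < 0.0) {
                        throw new IllegalArgumentException(
                                "Item " + i + ", marker " + m + ": negative frequency.");
                    } else {
                        sum += f;
                    }
                }
                // Complete markers must sum to one; incomplete ones to at most one.
                boolean ok = missing
                        ? sum <= 1.0 + PRECISION
                        : Math.abs(sum - 1.0) <= PRECISION;
                if (!ok) {
                    throw new IllegalArgumentException(
                            "Item " + i + ", marker " + m + ": frequencies sum to " + sum + ".");
                }
            }
        }
    }
}
```

For example, an array with two items and two markers (two and three alleles) passes the check even though the second item has a missing value at the first marker: `{{{0.5, 0.5}, {1.0, 0.0, 0.0}}, {{Double.NaN, 0.3}, {0.2, 0.3, 0.5}}}`.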
Parameters:
datasetName - name of the dataset
itemHeaders - item headers; its length should correspond to the number of individuals and each item should at least have a unique identifier (names are optional)
markerNames - marker names, null if no marker names are assigned; if not null its length should correspond to the number of markers (can contain null values)
alleleNames - allele names per marker, null if no allele names are assigned; if not null the length of alleleNames should correspond to the number of markers and the length of alleleNames[m] to the number of alleles of the m-th marker (can contain null values)
alleleFrequencies - allele frequencies, may not be null; missing values are encoded with Double.NaN; dimensions indicate number of individuals, markers and alleles per marker
public int getNumberOfMarkers()
Get the total number of markers used in this dataset.
Specified by: getNumberOfMarkers in interface FrequencyGenotypeData
public int getNumberOfAlleles(int markerIndex)
Get the number of alleles for a given marker.
Specified by: getNumberOfAlleles in interface FrequencyGenotypeData
Parameters:
markerIndex - the index of the marker within the range 0 to n-1, where n is the total number of markers as returned by FrequencyGenotypeData.getNumberOfMarkers()
public String getMarkerName(int markerIndex)
Get the name of a marker by index, if assigned.
Specified by: getMarkerName in interface FrequencyGenotypeData
Parameters:
markerIndex - the index of the marker within the range 0 to n-1, where n is the total number of markers as returned by FrequencyGenotypeData.getNumberOfMarkers()
Returns:
the marker name, or null if no name has been set
public int getTotalNumberOfAlleles()
Get the total number of alleles across all markers.
Specified by: getTotalNumberOfAlleles in interface FrequencyGenotypeData
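The three size getters relate to the dimensions of the alleleFrequencies array passed to the constructor, as described in the constructor documentation. The following is a minimal sketch of that relationship on a plain array. The class Dimensions and its methods are hypothetical names for illustration and do not reflect how the library stores its data.

```java
// Hypothetical helper; derives the documented sizes from a raw frequency array.
final class Dimensions {

    /** Number of markers: length of alleleFrequencies[i], identical for all i. */
    static int numberOfMarkers(double[][][] freqs) {
        return freqs[0].length;
    }

    /** Number of alleles of marker m: length of alleleFrequencies[i][m]. */
    static int numberOfAlleles(double[][][] freqs, int markerIndex) {
        return freqs[0][markerIndex].length;
    }

    /** Total number of alleles: sum of the allele counts over all markers. */
    static int totalNumberOfAlleles(double[][][] freqs) {
        int total = 0;
        for (int m = 0; m < numberOfMarkers(freqs); m++) {
            total += numberOfAlleles(freqs, m);
        }
        return total;
    }
}
```

For a single item with two markers that have two and three alleles, the total number of alleles under this reading is five.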
public String getAlleleName(int markerIndex, int alleleIndex)
Get the name of an allele, if assigned.
Specified by: getAlleleName in interface FrequencyGenotypeData
Parameters:
markerIndex - the index of the marker within the range 0 to n-1, where n is the total number of markers as returned by FrequencyGenotypeData.getNumberOfMarkers()
alleleIndex - allele index within the range 0 to a-1, where a is the number of alleles for the given marker as returned by FrequencyGenotypeData.getNumberOfAlleles(int)
Returns:
the allele name, or null if no name has been set
public double getAlleleFrequency(int id, int markerIndex, int alleleIndex)
Get the relative frequency of an allele for the given entry (sample/accession).
Specified by: getAlleleFrequency in interface FrequencyGenotypeData
Parameters:
id - the id of the entry, must be one of the IDs returned by Data.getIDs()
markerIndex - the index of the marker within the range 0 to n-1, where n is the total number of markers as returned by FrequencyGenotypeData.getNumberOfMarkers()
alleleIndex - allele index within the range 0 to a-1, where a is the number of alleles for the given marker as returned by FrequencyGenotypeData.getNumberOfAlleles(int)
Returns:
the allele frequency, or Double.NaN if missing
public boolean hasMissingValues(int id, int markerIndex)
Indicates whether there are missing values (frequencies) for the given entry (sample/accession) at the given marker.
Specified by: hasMissingValues in interface FrequencyGenotypeData
Parameters:
id - the id of the entry, must be one of the IDs returned by Data.getIDs()
markerIndex - the index of the marker within the range 0 to n-1, where n is the total number of markers as returned by FrequencyGenotypeData.getNumberOfMarkers()
Returns:
true if some or all values are missing for the given marker in the given entry
public static FrequencyGenotypeData readData(Path filePath, uno.informatics.data.io.FileType fileType) throws IOException
Read genotype data from file. Only file types FileType.TXT and FileType.CSV are allowed. Values are separated with a single tab (txt) or comma (csv) character.
The file contains allele frequencies following the requirements as described in the constructor SimpleFrequencyGenotypeData(String, SimpleEntity[], String[], String[][], double[][][]). Missing frequencies are encoded as empty cells. The file starts with a compulsory header row from which marker names and allele counts are inferred. All columns corresponding to the same marker occur consecutively in the file and are named after that marker, optionally including a suffix as described below. Marker names should be unique. There is one compulsory header column "ID" containing unique item identifiers. Optionally a second header column "NAME" can be included to provide (not necessarily unique) item names in addition to the unique identifiers. Finally, an optional second header row can be included to define allele names per marker, identified with row header "ALLELE". Allele names need not be unique and can be undefined for some alleles by leaving the corresponding cells empty.
Leading and trailing whitespace is removed from names and unique identifiers, and they are unquoted if wrapped in single or double quotes after whitespace removal. If a name or identifier is intended to start or end with whitespace, this whitespace should be contained within the quotes, as it will then not be removed.
Column names may optionally include an arbitrary suffix added to the marker name, starting with a dash, underscore or dot character. This allows column names such as "M1-1" and "M1-2", "M1.a" and "M1.b" or "M1_1" and "M1_2" for a marker named "M1" with two columns. The column name prefix up to, but not including, the last occurrence of any dash, underscore or dot character is taken to be the marker name.
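The name handling described above can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation of readData or inferMarkerNames: the class ColumnNames and its methods are hypothetical, only one pair of matching quotes is stripped, a column name without any dash, underscore or dot is used as the marker name unchanged, and no check is made that columns of the same marker are consecutive.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; mirrors the documented rules for names and column suffixes.
final class ColumnNames {

    /** Trim whitespace, then strip one pair of matching single or double quotes. */
    static String unquote(String raw) {
        String s = raw.trim();
        if (s.length() >= 2) {
            char first = s.charAt(0);
            char last = s.charAt(s.length() - 1);
            if (first == last && (first == '"' || first == '\'')) {
                // Whitespace inside the quotes is kept on purpose.
                return s.substring(1, s.length() - 1);
            }
        }
        return s;
    }

    /** Prefix before the last dash, underscore or dot; the whole name if none occurs. */
    static String markerName(String columnName) {
        int cut = Math.max(columnName.lastIndexOf('-'),
                Math.max(columnName.lastIndexOf('_'), columnName.lastIndexOf('.')));
        return cut > 0 ? columnName.substring(0, cut) : columnName;
    }

    /** Count the columns per marker, keeping markers in order of first appearance. */
    static Map<String, Integer> countColumns(String[] columnNames) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String column : columnNames) {
            counts.merge(markerName(unquote(column)), 1, Integer::sum);
        }
        return counts;
    }
}
```

Under these assumptions, the column names "M1-1", "M1-2", "M2.a", "M2.b", "M2.c" yield marker "M1" with two columns followed by marker "M2" with three columns.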
Trailing empty cells can be omitted from any row in the file.
The dataset name is set to the name of the file to which filePath points.
Parameters:
filePath - path to file that contains the data
fileType - FileType.TXT or FileType.CSV
Throws:
IOException - if the file cannot be read or is not correctly formatted
public static LinkedHashMap<String,Integer> inferMarkerNames(String[] columnNames)
Infer marker names and number of columns per marker from column names.
Parameters:
columnNames - column names
public void writeData(Path filePath, uno.informatics.data.io.FileType fileType, org.jamesframework.core.subset.SubsetSolution solution, boolean includeSelected, boolean includeUnselected, boolean includeIndex) throws IOException
Write selected data to file.
Specified by: writeData in interface FrequencyGenotypeData
Parameters:
filePath - file path
fileType - FileType.TXT or FileType.CSV
solution - the solution to subset the data (selected core)
includeSelected - includes selected accessions in output file
includeUnselected - includes unselected accessions in output file
includeIndex - includes accession indices, i.e. the internal integer IDs used by the solution
Throws:
IOException - if the data cannot be written to the file
Copyright © 2017. All rights reserved.