public class VectorBasedSRMetric extends BaseSRMetric
The metric requires two subcomponents: a VectorGenerator and a VectorSimilarity.
This class also manages a feature matrix and its transpose. The matrix is required for calls to mostSimilar(). It is not required for calls to similarity(), but is used to speed them up when available. The matrix and its transpose are built when trainMostSimilar() is called, and can also be built explicitly by calling buildFeatureAndTransposeMatrices().
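The division of labor between the two subcomponents can be sketched as follows. This is a minimal, self-contained illustration, not WikiBrain code: ToyVectorGenerator, ToyVectorSimilarity and ToyVectorMetric are hypothetical stand-ins, sparse vectors are modeled as Map<Integer, Float> instead of gnu.trove.map.TIntFloatMap, and returning NaN for a page without a vector is an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for VectorGenerator: maps a page id to a sparse vector. */
interface ToyVectorGenerator {
    Map<Integer, Float> getVector(int pageId); // null if the page has no vector
}

/** Hypothetical stand-in for VectorSimilarity: scores two sparse vectors. */
interface ToyVectorSimilarity {
    double similarity(Map<Integer, Float> v1, Map<Integer, Float> v2);
}

/** Composes a generator and a similarity, mirroring the two-subcomponent design. */
final class ToyVectorMetric {
    private final ToyVectorGenerator generator;
    private final ToyVectorSimilarity similarity;

    ToyVectorMetric(ToyVectorGenerator generator, ToyVectorSimilarity similarity) {
        this.generator = generator;
        this.similarity = similarity;
    }

    /** Assumption of this sketch: NaN when either page lacks a vector. */
    double similarity(int pageId1, int pageId2) {
        Map<Integer, Float> v1 = generator.getVector(pageId1);
        Map<Integer, Float> v2 = generator.getVector(pageId2);
        if (v1 == null || v2 == null) {
            return Double.NaN;
        }
        return similarity.similarity(v1, v2);
    }

    /** Builds a tiny metric over two hand-made vectors, scored by a raw dot product. */
    static ToyVectorMetric demo() {
        Map<Integer, Map<Integer, Float>> vectors = new HashMap<>();
        vectors.put(1, Map.of(10, 1.0f, 11, 2.0f));
        vectors.put(2, Map.of(11, 1.0f, 12, 3.0f));
        ToyVectorSimilarity dot = (a, b) -> {
            double sum = 0.0;
            for (Map.Entry<Integer, Float> e : a.entrySet()) {
                Float other = b.get(e.getKey());
                if (other != null) {
                    sum += e.getValue() * other;
                }
            }
            return sum;
        };
        return new ToyVectorMetric(id -> vectors.get(id), dot);
    }
}
```

Because the similarity is injected, the same generator can be paired with a different scoring function (for example cosine instead of a raw dot product) without changing the metric class itself.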
Modifier and Type | Class and Description |
---|---|
static class | VectorBasedSRMetric.Provider |
Nested classes inherited from class BaseSRMetric: BaseSRMetric.SRConfig
Modifier and Type | Field and Description |
---|---|
protected BaseSRMetric.SRConfig | config |
protected VectorGenerator | generator |
protected VectorSimilarity | similarity |
Constructor and Description |
---|
VectorBasedSRMetric(String name, Language language, LocalPageDao dao, Disambiguator disambig, VectorGenerator generator, VectorSimilarity similarity) |
Modifier and Type | Method and Description |
---|---|
void | buildFeatureAndTransposeMatrices(gnu.trove.set.TIntSet validIds) - Rebuild the feature and transpose matrices. |
double[][] | cosimilarity(int[] pageIds) - Construct a symmetric cosimilarity matrix of Wikipedia ids in a given language. |
double[][] | cosimilarity(int[] rowIds, int[] colIds) - Computes the cosimilarity matrix between pages. |
protected double[][] | cosimilarity(List<gnu.trove.map.TIntFloatMap> rowVectors, List<gnu.trove.map.TIntFloatMap> colVectors) - Computes the cosimilarity between two sets of vectors. |
double[][] | cosimilarity(String[] phrases) - Construct a symmetric cosimilarity matrix of phrases by mapping through local pages. |
double[][] | cosimilarity(String[] rowPhrases, String[] colPhrases) - Calculates the cosimilarity matrix between phrases. |
BaseSRMetric.SRConfig | getConfig() |
protected File | getFeatureMatrixPath() |
VectorGenerator | getGenerator() |
gnu.trove.map.TIntFloatMap | getPageVector(int pageId) - Returns the vector associated with a page, or null. |
VectorSimilarity | getSimilarity() |
protected File | getTransposeMatrixPath() |
protected boolean | hasFeatureMatrix() |
protected boolean | hasTransposeMatrix() |
SRResultList | mostSimilar(int pageId, int maxResults, gnu.trove.set.TIntSet validIds) - Find the most similar local pages to a local page. |
SRResultList | mostSimilar(String phrase, int maxResults, gnu.trove.set.TIntSet validIds) - Find the most similar local pages to a phrase. |
void | read() - Reads the metric from the current data directory. |
void | setFeatureFilter(FeatureFilter filter) |
SRResult | similarity(int pageId1, int pageId2, boolean explanations) - Determine the similarity between two local pages. |
SRResult | similarity(String phrase1, String phrase2, boolean explanations) - Determine the similarity between two strings in a given language by mapping through local pages. |
void | trainMostSimilar(Dataset dataset, int numResults, gnu.trove.set.TIntSet validIds) - Train the mostSimilar() function. The KnownSims may already be associated with Wikipedia ids (check wpId1 and wpId2). |
void | trainSimilarity(Dataset dataset) - Train the similarity() function. |
Methods inherited from class BaseSRMetric: clearMostSimilarCache, configureBase, ensureMostSimilarTrained, ensureSimilarityTrained, getCachedMostSimilar, getDataDir, getDisambiguator, getLanguage, getLocalPageDao, getMostSimilarCache, getMostSimilarMatrixPath, getMostSimilarNormalizer, getName, getSimilarityNormalizer, mostSimilar, mostSimilar, mostSimilarIsTrained, normalize, normalize, normalize, setBuildMostSimilarCache, setDataDir, setMostSimilarCacheRowIds, setMostSimilarNormalizer, setReadNormalizers, setSimilarityNormalizer, similarityIsTrained, write, writeMostSimilarCache, writeMostSimilarCache
protected final VectorGenerator generator
protected final VectorSimilarity similarity
protected final BaseSRMetric.SRConfig config
public VectorBasedSRMetric(String name, Language language, LocalPageDao dao, Disambiguator disambig, VectorGenerator generator, VectorSimilarity similarity)
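public SRResult similarity(String phrase1, String phrase2, boolean explanations) throws DaoException
Description copied from interface: SRMetric
Determine the similarity between two strings in a given language by mapping through local pages.
Specified by: similarity in interface SRMetric
Overrides: similarity in class BaseSRMetric
Parameters:
phrase1 - The first phrase.
phrase2 - The second phrase.
explanations - Whether explanations should be created.
Throws:
DaoException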
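The phrase-based overload resolves each phrase to a local page before scoring. A minimal sketch, assuming a plain Map<String, Integer> as a hypothetical stand-in for the Disambiguator and a function standing in for the page-level similarity(); a real disambiguator generally has to choose among several candidate pages, which this sketch does not model. Returning NaN for an unresolvable phrase is likewise an assumption.

```java
import java.util.Map;
import java.util.function.ToDoubleBiFunction;

/** Hypothetical sketch: phrase similarity by first mapping each phrase to a local page id. */
final class PhraseSimilaritySketch {
    private final Map<String, Integer> phraseToPage;                    // stand-in for the Disambiguator
    private final ToDoubleBiFunction<Integer, Integer> pageSimilarity; // stand-in for similarity(int, int, ...)

    PhraseSimilaritySketch(Map<String, Integer> phraseToPage,
                           ToDoubleBiFunction<Integer, Integer> pageSimilarity) {
        this.phraseToPage = phraseToPage;
        this.pageSimilarity = pageSimilarity;
    }

    /** Resolves both phrases, then delegates to the page-level score. */
    double similarity(String phrase1, String phrase2) {
        Integer page1 = phraseToPage.get(phrase1);
        Integer page2 = phraseToPage.get(phrase2);
        if (page1 == null || page2 == null) {
            return Double.NaN; // assumption: an unresolvable phrase yields no score
        }
        return pageSimilarity.applyAsDouble(page1, page2);
    }
}
```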
public SRResult similarity(int pageId1, int pageId2, boolean explanations) throws DaoException
Description copied from interface: SRMetric
Determine the similarity between two local pages.
Specified by: similarity in interface SRMetric
Overrides: similarity in class BaseSRMetric
Parameters:
pageId1 - Id of the first page.
pageId2 - Id of the second page.
explanations - Whether explanations should be created.
Throws:
DaoException
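For two page ids, the generator's vectors are compared by the configured VectorSimilarity. Cosine similarity over sparse vectors is one common choice; the sketch below assumes it purely for illustration, and it is not necessarily the VectorSimilarity a given configuration uses. Map<Integer, Float> again stands in for TIntFloatMap, and SparseCosine is a hypothetical name.

```java
import java.util.Map;

/** Hypothetical sketch: cosine similarity between two sparse vectors. */
final class SparseCosine {
    static double cosine(Map<Integer, Float> a, Map<Integer, Float> b) {
        // Iterate over the smaller vector when computing the dot product.
        Map<Integer, Float> small = a.size() <= b.size() ? a : b;
        Map<Integer, Float> large = small == a ? b : a;
        double dot = 0.0;
        for (Map.Entry<Integer, Float> e : small.entrySet()) {
            Float w = large.get(e.getKey());
            if (w != null) {
                dot += (double) e.getValue() * w;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // an empty or all-zero vector is treated as dissimilar to everything
        }
        return dot / (normA * normB);
    }

    /** Euclidean length of a sparse vector. */
    private static double norm(Map<Integer, Float> v) {
        double sum = 0.0;
        for (float x : v.values()) {
            sum += (double) x * x;
        }
        return Math.sqrt(sum);
    }
}
```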
public SRResultList mostSimilar(String phrase, int maxResults, gnu.trove.set.TIntSet validIds) throws DaoException
Description copied from interface: SRMetric
Find the most similar local pages to a phrase.
Specified by: mostSimilar in interface SRMetric
Overrides: mostSimilar in class BaseSRMetric
Parameters:
phrase - The phrase whose similarity we are examining.
maxResults - The maximum number of results to return.
validIds - The local page ids to be considered. Null means all ids in the language.
Throws:
DaoException
public SRResultList mostSimilar(int pageId, int maxResults, gnu.trove.set.TIntSet validIds) throws DaoException
Description copied from interface: SRMetric
Find the most similar local pages to a local page.
Specified by: mostSimilar in interface SRMetric
Overrides: mostSimilar in class BaseSRMetric
Parameters:
pageId - The id of the local page whose similarity we are examining.
maxResults - The maximum number of results to return.
validIds - The local page ids to be considered. Null means all ids in the language.
Throws:
DaoException
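The mostSimilar() contract (at most maxResults pages, restricted to validIds, where null means all ids) can be illustrated with a brute-force top-k search. This is a hypothetical sketch: the real class relies on the feature and transpose matrices rather than scoring every page, MostSimilarSketch and its parameters are illustrative, and since the documentation does not say whether the query page itself may appear in the results, the sketch does not exclude it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;
import java.util.function.BiFunction;

/** Hypothetical brute-force top-k search over sparse page vectors. */
final class MostSimilarSketch {
    static List<Map.Entry<Integer, Double>> mostSimilar(
            int pageId,
            int maxResults,
            Set<Integer> validIds, // null means all candidate pages
            Map<Integer, Map<Integer, Float>> vectors,
            BiFunction<Map<Integer, Float>, Map<Integer, Float>, Double> sim) {
        List<Map.Entry<Integer, Double>> results = new ArrayList<>();
        Map<Integer, Float> target = vectors.get(pageId);
        if (target == null || maxResults <= 0) {
            return results;
        }
        // Min-heap on score: keeps the maxResults best candidates seen so far.
        PriorityQueue<Map.Entry<Integer, Double>> heap =
                new PriorityQueue<>(Map.Entry.<Integer, Double>comparingByValue());
        for (Map.Entry<Integer, Map<Integer, Float>> e : vectors.entrySet()) {
            int candidate = e.getKey();
            if (validIds != null && !validIds.contains(candidate)) {
                continue;
            }
            heap.add(Map.entry(candidate, sim.apply(target, e.getValue())));
            if (heap.size() > maxResults) {
                heap.poll(); // drop the current lowest score
            }
        }
        results.addAll(heap);
        results.sort(Map.Entry.<Integer, Double>comparingByValue().reversed());
        return results;
    }
}
```

The heap keeps memory at O(maxResults) regardless of how many candidates are scored, which matters when validIds is null and every page in the language is a candidate.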
public void trainSimilarity(Dataset dataset) throws DaoException
Train the similarity() function.
Specified by: trainSimilarity in interface SRMetric
Overrides: trainSimilarity in class BaseSRMetric
Parameters:
dataset - A gold standard dataset.
Throws:
DaoException
public void trainMostSimilar(Dataset dataset, int numResults, gnu.trove.set.TIntSet validIds)
Description copied from interface: SRMetric
Train the mostSimilar() function. The KnownSims may already be associated with Wikipedia ids (check wpId1 and wpId2).
Specified by: trainMostSimilar in interface SRMetric
Overrides: trainMostSimilar in class BaseSRMetric
Parameters:
dataset - A gold standard dataset.
numResults - The maximum number of similar articles computed per phrase.
validIds - The Wikipedia ids that should be considered in result sets. Null means all ids.
See Also:
SRMetric.trainMostSimilar(org.wikibrain.sr.dataset.Dataset, int, gnu.trove.set.TIntSet)
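public double[][] cosimilarity(int[] pageIds) throws DaoException
Description copied from interface: SRMetric
Construct a symmetric cosimilarity matrix of Wikipedia ids in a given language.
Specified by: cosimilarity in interface SRMetric
Overrides: cosimilarity in class BaseSRMetric
Throws:
DaoException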
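Because a symmetric cosimilarity matrix satisfies m[i][j] == m[j][i], only the upper triangle needs to be computed and the rest can be mirrored. A hypothetical sketch, assuming a ToDoubleBiFunction as a stand-in for the pairwise similarity; for n ids it makes n(n+1)/2 similarity calls instead of n².

```java
import java.util.function.ToDoubleBiFunction;

/** Hypothetical sketch: symmetric cosimilarity matrix computed from its upper triangle. */
final class SymmetricCosimilarity {
    static double[][] cosimilarity(int[] ids, ToDoubleBiFunction<Integer, Integer> sim) {
        int n = ids.length;
        double[][] m = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i; j < n; j++) {
                double s = sim.applyAsDouble(ids[i], ids[j]);
                m[i][j] = s;
                m[j][i] = s; // mirror instead of recomputing
            }
        }
        return m;
    }
}
```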
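public double[][] cosimilarity(String[] phrases) throws DaoException
Description copied from interface: SRMetric
Construct a symmetric cosimilarity matrix of phrases by mapping through local pages.
Specified by: cosimilarity in interface SRMetric
Overrides: cosimilarity in class BaseSRMetric
Throws:
DaoException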
public double[][] cosimilarity(String[] rowPhrases, String[] colPhrases) throws DaoException
Calculates the cosimilarity matrix between phrases.
Specified by: cosimilarity in interface SRMetric
Overrides: cosimilarity in class BaseSRMetric
Parameters:
rowPhrases
colPhrases
Throws:
DaoException
public double[][] cosimilarity(int[] rowIds, int[] colIds) throws DaoException
Computes the cosimilarity matrix between pages.
Specified by: cosimilarity in interface SRMetric
Overrides: cosimilarity in class BaseSRMetric
Parameters:
rowIds
colIds
Throws:
DaoException
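In the rectangular overload, entry [i][j] pairs rowIds[i] with colIds[j], so no symmetry can be exploited and every cell is computed. A minimal sketch under the same assumption of a generic pairwise similarity function; RectangularCosimilarity is a hypothetical name.

```java
import java.util.function.ToDoubleBiFunction;

/** Hypothetical sketch: rows x cols cosimilarity matrix. */
final class RectangularCosimilarity {
    /** Entry [i][j] holds the similarity of rowIds[i] and colIds[j]. */
    static double[][] cosimilarity(int[] rowIds, int[] colIds,
                                   ToDoubleBiFunction<Integer, Integer> sim) {
        double[][] m = new double[rowIds.length][colIds.length];
        for (int i = 0; i < rowIds.length; i++) {
            for (int j = 0; j < colIds.length; j++) {
                m[i][j] = sim.applyAsDouble(rowIds[i], colIds[j]);
            }
        }
        return m;
    }
}
```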
protected double[][] cosimilarity(List<gnu.trove.map.TIntFloatMap> rowVectors, List<gnu.trove.map.TIntFloatMap> colVectors)
Computes the cosimilarity between two sets of vectors.
Parameters:
rowVectors
colVectors
public void buildFeatureAndTransposeMatrices(gnu.trove.set.TIntSet validIds) throws IOException
Rebuild the feature and transpose matrices.
Parameters:
validIds
Throws:
IOException
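A plausible reason for keeping a transpose next to the feature matrix is that, indexed by feature instead of by page, it lets a query vector reach only the pages that share at least one feature with it. The sketch below shows that idea with in-memory maps; it does not reproduce WikiBrain's on-disk matrix format or the validIds filtering, it scores with a raw dot product rather than the configured VectorSimilarity, and TransposeSketch is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: a feature matrix, its transpose, and sparse scoring through the transpose. */
final class TransposeSketch {
    /** Builds feature -> (pageId -> weight) from pageId -> (feature -> weight). */
    static Map<Integer, Map<Integer, Float>> transpose(Map<Integer, Map<Integer, Float>> features) {
        Map<Integer, Map<Integer, Float>> transpose = new HashMap<>();
        for (Map.Entry<Integer, Map<Integer, Float>> row : features.entrySet()) {
            for (Map.Entry<Integer, Float> cell : row.getValue().entrySet()) {
                transpose.computeIfAbsent(cell.getKey(), k -> new HashMap<>())
                         .put(row.getKey(), cell.getValue());
            }
        }
        return transpose;
    }

    /** Dot product of the query with every page that shares at least one feature with it. */
    static Map<Integer, Double> dotScores(Map<Integer, Float> query,
                                          Map<Integer, Map<Integer, Float>> transpose) {
        Map<Integer, Double> scores = new HashMap<>();
        for (Map.Entry<Integer, Float> q : query.entrySet()) {
            Map<Integer, Float> column = transpose.get(q.getKey());
            if (column == null) {
                continue; // no page has this feature
            }
            for (Map.Entry<Integer, Float> cell : column.entrySet()) {
                scores.merge(cell.getKey(), (double) q.getValue() * cell.getValue(), Double::sum);
            }
        }
        return scores;
    }
}
```

Pages that share no feature with the query never appear in the score map, so the work done is proportional to the nonzero overlap rather than to the total number of pages.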
protected File getFeatureMatrixPath()
protected File getTransposeMatrixPath()
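public void read() throws IOException
Description copied from interface: SRMetric
Reads the metric from the current data directory.
Specified by: read in interface SRMetric
Overrides: read in class BaseSRMetric
Throws:
IOException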
public gnu.trove.map.TIntFloatMap getPageVector(int pageId) throws IOException
Returns the vector associated with a page, or null.
Parameters:
pageId
Throws:
IOException
protected boolean hasFeatureMatrix()
protected boolean hasTransposeMatrix()
public VectorGenerator getGenerator()
public VectorSimilarity getSimilarity()
public void setFeatureFilter(FeatureFilter filter)
public BaseSRMetric.SRConfig getConfig()
Overrides: getConfig in class BaseSRMetric
Copyright © 2014. All rights reserved.