Package com.oracle.coherence.ai.search
Class SimilaritySearch<K,V,T>
java.lang.Object
com.oracle.coherence.ai.search.SimilaritySearch<K,V,T>
- Type Parameters:
K - the type of the cache key
V - the type of the cache value
T - the type of the vector
- All Implemented Interfaces:
ExternalizableLite, InvocableMap.EntryAggregator<K,V,List<QueryResult<K,V>>>, InvocableMap.StreamingAggregator<K,V,List<BinaryQueryResult>,List<QueryResult<K,V>>>, PortableObject, Serializable
public class SimilaritySearch<K,V,T>
extends Object
implements InvocableMap.StreamingAggregator<K,V,List<BinaryQueryResult>,List<QueryResult<K,V>>>, ExternalizableLite, PortableObject
An InvocableMap.StreamingAggregator to execute a similarity query.
- Since:
- 24.09
- Author:
- Jonathan Knight 2024.07.19, Aleks Seovic 2024.07.25
Field Summary
Fields
- protected DistanceAlgorithm<T> m_algorithm - The DistanceAlgorithm to execute.
- protected ValueExtractor<? super V, ? extends Vector<T>> m_extractor - The ValueExtractor to extract the vector from the entry value.
- protected boolean m_fBruteForce - A flag indicating whether to ignore any VectorIndex that may be present and just use the algorithm directly (i.e. use a brute-force calculation).
- protected Filter<?> m_filter - An optional Filter to use to filter the search results.
- protected int m_nMaxResults - The maximum number of results to return.
- protected final SortedBag<BinaryQueryResult> m_results - The interim results for the aggregator.
- protected Vector<T> m_vector - The Vector to calculate similarity with (the search target).
Fields inherited from interface com.tangosol.util.InvocableMap.StreamingAggregator
ALLOW_INCONSISTENCIES, BY_MEMBER, BY_PARTITION, PARALLEL, PRESENT_ONLY, RETAINS_ENTRIES, SERIAL
-
Constructor Summary
Constructors
- SimilaritySearch() - Default constructor for serialization.
- SimilaritySearch(ValueExtractor<? super V, ? extends Vector<T>> extractor, Vector<T> vector, int maxResults) - Create a SimilaritySearch aggregator that will use cosine distance to calculate and return up to maxResults results that are closest to the specified vector.
-
Method Summary
- boolean accumulate(InvocableMap.Entry<? extends K, ? extends V> entry) - Accumulate one entry into the result.
- boolean accumulate(Streamer<? extends InvocableMap.Entry<? extends K, ? extends V>> streamer) - Accumulate multiple entries into the result.
- SimilaritySearch<K,V,T> algorithm(DistanceAlgorithm<T> algorithm) - Set the algorithm to use for distance calculation between vectors.
- SimilaritySearch<K,V,T> bruteForce() - Force a brute-force search, ignoring any available indexes.
- protected boolean bruteForce(Streamer<? extends InvocableMap.Entry<? extends K, ? extends V>> streamer, InvocableMap.Entry<? extends K, ? extends V> entry)
- int characteristics() - A bit mask representing the set of characteristics of this aggregator.
- boolean combine(List<BinaryQueryResult> partialResult) - Merge another partial result into the result.
- SimilaritySearch<K,V,T> filter(Filter<?> filter) - Set the filter to use to limit the set of entries to search.
- List<QueryResult<K,V>> finalizeResult() - Return the final result of the aggregation.
- List<QueryResult<K,V>> finalizeResult(Converter<Binary,?> converterBin) - Return the final result of the aggregation.
- DistanceAlgorithm<T> getAlgorithm()
- ValueExtractor<? super V, ? extends Vector<T>> getExtractor()
- Filter<?> getFilter()
- int getMaxResults()
- List<BinaryQueryResult> getPartialResult() - Return the partial result of the aggregation.
- Vector<T> getVector()
- boolean isBruteForce()
- void readExternal(PofReader in) - Restore the contents of a user type instance by reading its state using the specified PofReader object.
- void readExternal(DataInput in) - Restore the contents of this object by loading the object's state from the passed DataInput object.
- protected boolean searchPartition(BinaryEntry binaryEntry, Vector<T> vector) - If a VectorIndex exists for the specified partition, then use it to perform the KNN search.
- InvocableMap.StreamingAggregator<K,V,List<BinaryQueryResult>,List<QueryResult<K,V>>> supply() - Create a new instance of this aggregator.
- void writeExternal(PofWriter out) - Save the contents of a POF user type instance by writing its state using the specified PofWriter object.
- void writeExternal(DataOutput out) - Save the contents of this object by storing the object's state into the passed DataOutput object.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface com.tangosol.util.InvocableMap.StreamingAggregator
aggregate, isAllowInconsistencies, isByMember, isByPartition, isParallel, isPresentOnly, isRetainsEntries, isSerial
-
Field Details
-
m_extractor
The ValueExtractor to extract the vector from the entry value.
-
m_vector
The Vector to calculate similarity with (the search target).
-
m_algorithm
The DistanceAlgorithm to execute.
-
m_nMaxResults
protected int m_nMaxResults
The maximum number of results to return.
-
m_fBruteForce
protected boolean m_fBruteForce
A flag indicating whether to ignore any VectorIndex that may be present and just use the algorithm directly (i.e. use a brute-force calculation).
-
m_filter
An optional Filter to use to filter the search results.
-
m_results
The interim results for the aggregator.
-
-
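The m_results field holds the interim results in a SortedBag, but this page does not say how that bag is kept bounded. The following is a minimal sketch that assumes the aggregator only needs to retain the maxResults entries with the smallest distance. It uses a max-heap so that the current farthest result can be evicted cheaply. Scored and BoundedResults are hypothetical names, not part of the Coherence API, and the key type is simplified to String.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical interim result: a cache key and its distance to the target vector.
record Scored(String key, double distance) {}

// Hypothetical bounded collection that retains only the maxResults closest results.
final class BoundedResults {
    private final int maxResults;

    // Max-heap ordered by distance: the head is the farthest result retained so far.
    private final PriorityQueue<Scored> heap =
            new PriorityQueue<>(Comparator.comparingDouble(Scored::distance).reversed());

    BoundedResults(int maxResults) {
        if (maxResults <= 0) {
            throw new IllegalArgumentException("maxResults must be positive");
        }
        this.maxResults = maxResults;
    }

    // Offers a result, evicting the current farthest one when the collection is full.
    void add(Scored result) {
        if (heap.size() < maxResults) {
            heap.add(result);
        } else if (result.distance() < heap.peek().distance()) {
            heap.poll();
            heap.add(result);
        }
    }

    // Returns the retained results ordered from closest to farthest.
    List<Scored> toSortedList() {
        List<Scored> sorted = new ArrayList<>(heap);
        sorted.sort(Comparator.comparingDouble(Scored::distance));
        return sorted;
    }
}
```

Suppose you offer distances 0.5, 0.1, 0.9 and 0.3 to a BoundedResults(2). Only the 0.1 and 0.3 entries remain, in that order.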
Constructor Details
-
SimilaritySearch
public SimilaritySearch()
Default constructor for serialization.
-
SimilaritySearch
public SimilaritySearch(ValueExtractor<? super V, ? extends Vector<T>> extractor, Vector<T> vector, int maxResults)
Create a SimilaritySearch aggregator that will use cosine distance to calculate and return up to maxResults results that are closest to the specified vector.
To use a different distance algorithm, provide it via the algorithm(DistanceAlgorithm) method. You can also specify filter criteria using the filter(Filter) method, and force the aggregator to perform a brute-force calculation by calling the bruteForce() method. The latter is useful for testing. Because it ignores any available indexes, it lets you compare the results of an index-based query against the exact matches returned by the brute-force search. You can then verify that the recall is where you need it to be, and tune the index parameters if it isn't.
- Parameters:
extractor - the ValueExtractor to extract the vector from the cache value
vector - the vector to calculate similarity with
maxResults - the maximum number of results to return
-
-
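The constructor documentation says that the default algorithm is cosine distance and that at most maxResults of the closest entries are returned. The plain-Java sketch below shows that calculation as an exhaustive scan over an in-memory map. It makes three simplifications:

- It assumes cosine distance is defined as 1 minus cosine similarity, the common convention.
- It uses float[] in place of Coherence's Vector<T>.
- BruteForceCosine is a hypothetical class, not the library's implementation.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical brute-force similarity search over an in-memory map of float vectors.
final class BruteForceCosine {
    private BruteForceCosine() {}

    // Cosine distance, assumed here to be 1 - cosine similarity.
    static double cosineDistance(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            throw new IllegalArgumentException("zero-length vector");
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns up to maxResults keys whose vectors are closest to the target, closest first.
    static List<String> search(Map<String, float[]> data, float[] target, int maxResults) {
        return data.entrySet().stream()
                .sorted(Comparator.comparingDouble(e -> cosineDistance(e.getValue(), target)))
                .limit(maxResults)
                .map(Map.Entry::getKey)
                .toList();
    }
}
```

For a target of (1, 0.1), the vector (1, 0) is closer than (1, 1), which in turn is closer than (0, 1).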
Method Details
-
algorithm
Set the algorithm to use for distance calculation between vectors.
- Parameters:
algorithm - the distance algorithm to use
- Returns:
- this instance
-
bruteForce
Force a brute-force search, ignoring any available indexes.
- Returns:
- this instance
-
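The constructor documentation suggests running the same query with and without bruteForce() to check recall. One common way to quantify this is recall at k: the fraction of the exact (brute-force) result keys that the index-based result also found. Below is a minimal sketch that assumes both results have already been reduced to lists of keys. Recall is a hypothetical helper, not part of Coherence.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for comparing an approximate (index-based) result with an exact one.
final class Recall {
    private Recall() {}

    // Size of (approximate intersect exact) divided by size of exact.
    static double recall(List<String> approximate, List<String> exact) {
        if (exact.isEmpty()) {
            return 1.0; // nothing to find, so nothing was missed
        }
        Set<String> found = new HashSet<>(approximate);
        long hits = exact.stream().filter(found::contains).count();
        return (double) hits / exact.size();
    }
}
```

For example, suppose the brute-force search returns keys a, b, c, d and the index-based search returns a, b, z. The recall is then 2/4 = 0.5.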
filter
Set the filter to use to limit the set of entries to search.
- Parameters:
filter - the filter to use
- Returns:
- this instance
-
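algorithm(DistanceAlgorithm), bruteForce() and filter(Filter) all return this instance, so they can be chained after the constructor. The sketch below mirrors that fluent style in plain Java so that it runs without Coherence. MiniSearch is hypothetical, and so are its stand-ins: a Function-based extractor for ValueExtractor, a ToDoubleBiFunction for DistanceAlgorithm, and a Predicate for Filter. The sketch always scans every entry; because there is no index to ignore, bruteForce() only records the flag. It applies the filter before computing any distance, and it defaults to cosine distance as the constructor documentation describes.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.function.ToDoubleBiFunction;

// Hypothetical plain-Java mirror of SimilaritySearch's fluent configuration; not the Coherence API.
final class MiniSearch<V> {
    private final Function<? super V, float[]> extractor; // stand-in for ValueExtractor
    private final float[] target;
    private final int maxResults;

    // Stand-in for DistanceAlgorithm; defaults to cosine distance (1 - cosine similarity).
    private ToDoubleBiFunction<float[], float[]> algorithm = MiniSearch::cosineDistance;
    private Predicate<? super V> filter = value -> true; // stand-in for Filter: accept everything
    private boolean bruteForce;

    MiniSearch(Function<? super V, float[]> extractor, float[] target, int maxResults) {
        this.extractor = extractor;
        this.target = target;
        this.maxResults = maxResults;
    }

    MiniSearch<V> algorithm(ToDoubleBiFunction<float[], float[]> algorithm) {
        this.algorithm = algorithm;
        return this;
    }

    MiniSearch<V> filter(Predicate<? super V> filter) {
        this.filter = filter;
        return this;
    }

    // This sketch has no index, so the flag is only recorded.
    MiniSearch<V> bruteForce() {
        this.bruteForce = true;
        return this;
    }

    boolean isBruteForce() {
        return bruteForce;
    }

    // Scans every entry that passes the filter and returns up to maxResults keys, closest first.
    <K> List<K> run(Map<K, V> data) {
        return data.entrySet().stream()
                .filter(e -> filter.test(e.getValue()))
                .sorted(Comparator.comparingDouble(
                        e -> algorithm.applyAsDouble(extractor.apply(e.getValue()), target)))
                .limit(maxResults)
                .map(Map.Entry::getKey)
                .toList();
    }

    static double cosineDistance(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

A call such as new MiniSearch<float[]>(v -> v, target, 10).filter(v -> v[0] >= 0).bruteForce() reads the same way as chaining the real setters.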
getExtractor
-
getVector
-
getAlgorithm
-
getMaxResults
public int getMaxResults()
-
isBruteForce
public boolean isBruteForce()
-
getFilter
-
characteristics
public int characteristics()
Description copied from interface: InvocableMap.StreamingAggregator
A bit mask representing the set of characteristics of this aggregator.
By default, characteristics are a combination of InvocableMap.StreamingAggregator.PARALLEL and InvocableMap.StreamingAggregator.RETAINS_ENTRIES. This is sub-optimal and should be overridden by the aggregator implementation if the aggregator does not need to retain entries (which is often the case).
- Specified by:
characteristics in interface InvocableMap.StreamingAggregator<K,V,List<BinaryQueryResult>,List<QueryResult<K,V>>>
- Returns:
- a bit mask representing the set of characteristics of this aggregator
-
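The characteristics value is a bit mask in which each flag occupies its own bit. Flags are therefore combined with | and tested with &. The sketch below shows the documented default (PARALLEL combined with RETAINS_ENTRIES) and how an aggregator that does not need to retain entries could clear that bit. Note the following:

- The numeric values are hypothetical placeholders chosen for illustration, not the constants defined by InvocableMap.StreamingAggregator.
- This page does not show which flags SimilaritySearch itself returns.

```java
// Hypothetical characteristic flags; illustrative bit values only, the real constants may differ.
final class Characteristics {
    static final int PARALLEL              = 1;
    static final int SERIAL                = 1 << 1;
    static final int BY_PARTITION          = 1 << 2;
    static final int BY_MEMBER             = 1 << 3;
    static final int PRESENT_ONLY          = 1 << 4;
    static final int RETAINS_ENTRIES       = 1 << 5;
    static final int ALLOW_INCONSISTENCIES = 1 << 6;

    private Characteristics() {}

    // The documented default: PARALLEL combined with RETAINS_ENTRIES.
    static int defaults() {
        return PARALLEL | RETAINS_ENTRIES;
    }

    // Tests whether a flag is set in a characteristics mask.
    static boolean has(int mask, int flag) {
        return (mask & flag) != 0;
    }

    // Clears a flag, e.g. RETAINS_ENTRIES for an aggregator that keeps no entry references.
    static int without(int mask, int flag) {
        return mask & ~flag;
    }
}
```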
supply
public InvocableMap.StreamingAggregator<K,V,List<BinaryQueryResult>,List<QueryResult<K,V>>> supply()
Description copied from interface: InvocableMap.StreamingAggregator
Create a new instance of this aggregator.
- Specified by:
supply in interface InvocableMap.StreamingAggregator<K,V,List<BinaryQueryResult>,List<QueryResult<K,V>>>
- Returns:
- a StreamingAggregator
-
accumulate
Description copied from interface: InvocableMap.StreamingAggregator
Accumulate multiple entries into the result.
Important note: The default implementation of this method provides necessary logic for aggregation short-circuiting and should rarely (if ever) be overridden by the custom aggregator implementation.
- Specified by:
accumulate in interface InvocableMap.StreamingAggregator<K,V,List<BinaryQueryResult>,List<QueryResult<K,V>>>
- Parameters:
streamer - a Streamer that can be used to iterate over entries to add
- Returns:
true to continue the aggregation, and false to signal to the caller that the result is ready and the aggregation can be short-circuited
-
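The return-value contract above (true to continue, false to short-circuit) implies a loop that stops feeding entries as soon as the result is ready. The following is a minimal sketch of such a loop. It makes two simplifications: entries arrive through an ordinary Iterator rather than Coherence's Streamer, and accumulating a single entry is modelled as a Predicate. ShortCircuit is a hypothetical name, and this is not the actual default implementation.

```java
import java.util.Iterator;
import java.util.function.Predicate;

// Hypothetical sketch of a short-circuiting accumulation loop.
final class ShortCircuit {
    private ShortCircuit() {}

    // Feeds entries to accumulateOne until it returns false or the input is exhausted.
    // Returns true to continue the aggregation, false if it was short-circuited.
    static <E> boolean accumulateAll(Iterator<? extends E> entries, Predicate<? super E> accumulateOne) {
        while (entries.hasNext()) {
            if (!accumulateOne.test(entries.next())) {
                return false; // result is ready; remaining entries are not processed
            }
        }
        return true;
    }
}
```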
accumulate
Description copied from interface: InvocableMap.StreamingAggregator
Accumulate one entry into the result.
- Specified by:
accumulate in interface InvocableMap.StreamingAggregator<K,V,List<BinaryQueryResult>,List<QueryResult<K,V>>>
- Parameters:
entry - the entry to accumulate into the aggregation result
- Returns:
true to continue the aggregation, and false to signal to the caller that the result is ready and the aggregation can be short-circuited
-
combine
Description copied from interface: InvocableMap.StreamingAggregator
Merge another partial result into the result.
- Specified by:
combine in interface InvocableMap.StreamingAggregator<K,V,List<BinaryQueryResult>,List<QueryResult<K,V>>>
- Parameters:
partialResult - the partial result to merge
- Returns:
true to continue the aggregation, and false to signal to the caller that the result is ready and the aggregation can be short-circuited
-
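combine merges a partial result, computed elsewhere (for example on another storage member), into the aggregator's own result. For a top-k search this amounts to a merge that keeps only the maxResults closest hits. Below is a minimal sketch that assumes each partial result is already sorted by ascending distance. Hit and CombineSketch are hypothetical names. The real aggregator merges into its m_results SortedBag rather than building a new list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical partial-result entry: a key with its distance to the target vector.
record Hit(String key, double distance) {}

// Hypothetical merge of two partial results that are each sorted by ascending distance.
final class CombineSketch {
    private CombineSketch() {}

    // Merges the two lists and keeps at most maxResults of the closest hits.
    static List<Hit> combine(List<Hit> left, List<Hit> right, int maxResults) {
        List<Hit> merged = new ArrayList<>(Math.min(maxResults, left.size() + right.size()));
        int i = 0, j = 0;
        while (merged.size() < maxResults && (i < left.size() || j < right.size())) {
            // Take from the left list when the right one is exhausted or the left head is closer.
            boolean takeLeft = j >= right.size()
                    || (i < left.size() && left.get(i).distance() <= right.get(j).distance());
            merged.add(takeLeft ? left.get(i++) : right.get(j++));
        }
        return merged;
    }
}
```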
getPartialResult
Description copied from interface: InvocableMap.StreamingAggregator
Return the partial result of the aggregation.
- Specified by:
getPartialResult in interface InvocableMap.StreamingAggregator<K,V,List<BinaryQueryResult>,List<QueryResult<K,V>>>
- Returns:
- the partial result of the aggregation
-
finalizeResult
Description copied from interface: InvocableMap.StreamingAggregator
Return the final result of the aggregation.
- Specified by:
finalizeResult in interface InvocableMap.StreamingAggregator<K,V,List<BinaryQueryResult>,List<QueryResult<K,V>>>
- Returns:
- the final result of the aggregation
-
finalizeResult
Description copied from interface: InvocableMap.StreamingAggregator
Return the final result of the aggregation.
This method has a default implementation that simply calls InvocableMap.StreamingAggregator.finalizeResult(). This avoids compilation errors and preserves backwards compatibility of custom aggregator implementations, even though it would make more sense to do it the other way around if we were designing this API from scratch. The unfortunate consequence is that even if you override this method in order to use the provided converter, you still need to implement the InvocableMap.StreamingAggregator.finalizeResult() method, although you could simply implement it to throw java.lang.UnsupportedOperationException.
- Specified by:
finalizeResult in interface InvocableMap.StreamingAggregator<K,V,List<BinaryQueryResult>,List<QueryResult<K,V>>>
- Parameters:
converterBin - converter that can be used to convert the result from its internal format
- Returns:
- the final result of the aggregation
-
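The backwards-compatibility arrangement described above is a Java default-method pattern: the new overload has a default body that delegates to the old abstract method. The stripped-down sketch below reproduces it using hypothetical stand-ins:

- ResultFinalizer is a hypothetical interface.
- java.util.function.Function stands in for Coherence's Converter.
- byte[] stands in for Binary.

ConvertingFinalizer shows the consequence that the documentation mentions: it must still implement finalizeResult(), here by throwing UnsupportedOperationException.

```java
import java.util.function.Function;

// Hypothetical, stripped-down interface illustrating the default-method pattern.
interface ResultFinalizer<R> {
    // The original abstract method every implementation must provide.
    R finalizeResult();

    // The newer overload; by default it ignores the converter and delegates to finalizeResult().
    default R finalizeResult(Function<byte[], ?> converter) {
        return finalizeResult();
    }
}

// An implementation that only supports the converter-based overload.
final class ConvertingFinalizer implements ResultFinalizer<String> {
    private final byte[] raw;

    ConvertingFinalizer(byte[] raw) {
        this.raw = raw;
    }

    @Override
    public String finalizeResult() {
        // Still required by the interface, so this implementation opts out explicitly.
        throw new UnsupportedOperationException("use finalizeResult(converter)");
    }

    @Override
    public String finalizeResult(Function<byte[], ?> converter) {
        return String.valueOf(converter.apply(raw));
    }
}

// A legacy implementation that never overrides the newer overload.
final class LegacyFinalizer implements ResultFinalizer<String> {
    @Override
    public String finalizeResult() {
        return "legacy";
    }
}
```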
readExternal
Description copied from interface: PortableObject
Restore the contents of a user type instance by reading its state using the specified PofReader object.
- Specified by:
readExternal in interface PortableObject
- Parameters:
in - the PofReader from which to read the object's state
- Throws:
IOException - if an I/O error occurs
-
writeExternal
Description copied from interface: PortableObject
Save the contents of a POF user type instance by writing its state using the specified PofWriter object.
- Specified by:
writeExternal in interface PortableObject
- Parameters:
out - the PofWriter to which to write the object's state
- Throws:
IOException - if an I/O error occurs
-
readExternal
Description copied from interface: ExternalizableLite
Restore the contents of this object by loading the object's state from the passed DataInput object.
- Specified by:
readExternal in interface ExternalizableLite
- Parameters:
in - the DataInput stream to read data from in order to restore the state of this object
- Throws:
IOException - if an I/O exception occurs
-
writeExternal
Description copied from interface: ExternalizableLite
Save the contents of this object by storing the object's state into the passed DataOutput object.
- Specified by:
writeExternal in interface ExternalizableLite
- Parameters:
out - the DataOutput stream to write the state of this object to
- Throws:
IOException - if an I/O exception occurs
-
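Each readExternal overload must read fields in exactly the order that the matching writeExternal wrote them. The public no-argument constructor exists so that a deserializer can create an empty instance before calling readExternal. The sketch below shows that symmetry with java.io.DataOutput and DataInput. SearchState and its three fields are hypothetical and do not reflect the actual wire format of SimilaritySearch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical state holder showing symmetric writeExternal/readExternal methods.
final class SearchState {
    int maxResults;
    boolean bruteForce;
    float[] vector = new float[0];

    // Writes the fields in a fixed order.
    void writeExternal(DataOutput out) throws IOException {
        out.writeInt(maxResults);
        out.writeBoolean(bruteForce);
        out.writeInt(vector.length); // length prefix so the reader knows how many floats follow
        for (float f : vector) {
            out.writeFloat(f);
        }
    }

    // Reads the fields back in exactly the order they were written.
    void readExternal(DataInput in) throws IOException {
        maxResults = in.readInt();
        bruteForce = in.readBoolean();
        int length = in.readInt();
        vector = new float[length];
        for (int i = 0; i < length; i++) {
            vector[i] = in.readFloat();
        }
    }

    // Serializes to bytes, then restores into a fresh instance made by the no-argument constructor.
    static SearchState roundTrip(SearchState source) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            source.writeExternal(out);
        }
        SearchState copy = new SearchState();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            copy.readExternal(in);
        }
        return copy;
    }
}
```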
bruteForce
protected boolean bruteForce(Streamer<? extends InvocableMap.Entry<? extends K, ? extends V>> streamer, InvocableMap.Entry<? extends K, ? extends V> entry)
-
searchPartition
protected boolean searchPartition(BinaryEntry binaryEntry, Vector<T> vector)
If a VectorIndex exists for the specified partition, then use it to perform the KNN search.
- Parameters:
binaryEntry - the BinaryEntry to use to identify the partition
vector - the target vector to find the nearest neighbours to
- Returns:
true if a VectorIndex was present and used for the search
-
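searchPartition reports whether a VectorIndex was available for the partition. This suggests two things: a caller falls back to an exhaustive scan when it returns false, and the m_fBruteForce flag skips the index altogether. The following is a minimal sketch of that control flow, assuming a per-partition map of indexes. PartitionSearch and KnnIndex are hypothetical names, and the brute-force routine is passed in as a function rather than implemented here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of "use the partition's index if present, otherwise scan"; not the Coherence API.
final class PartitionSearch {

    // Hypothetical per-partition index that returns the k nearest keys to a target vector.
    interface KnnIndex {
        List<String> nearest(float[] target, int k);
    }

    private final Map<Integer, KnnIndex> indexes; // partition id -> index; absent means no index
    private final List<String> results = new ArrayList<>();

    PartitionSearch(Map<Integer, KnnIndex> indexes) {
        this.indexes = indexes;
    }

    // Returns true only if an index existed for the partition and was used for the search.
    boolean searchPartition(int partition, float[] target, int k) {
        KnnIndex index = indexes.get(partition);
        if (index == null) {
            return false;
        }
        results.addAll(index.nearest(target, k));
        return true;
    }

    // Skips the index when forceBruteForce is set, and falls back to the scan when no index exists.
    void search(int partition, float[] target, int k, boolean forceBruteForce,
                Function<float[], List<String>> bruteForce) {
        if (forceBruteForce || !searchPartition(partition, target, k)) {
            results.addAll(bruteForce.apply(target));
        }
    }

    List<String> results() {
        return results;
    }
}
```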