Package com.oracle.coherence.ai.search
Class SimilaritySearch<K,V,T>
java.lang.Object
com.oracle.coherence.ai.search.SimilaritySearch<K,V,T>
- Type Parameters:
K - the type of the cache key
V - the type of the cache value
T - the type of the vector
- All Implemented Interfaces:
ExternalizableLite, PortableObject, InvocableMap.EntryAggregator<K,V,List<QueryResult<K,V>>>, InvocableMap.StreamingAggregator<K,V,List<BinaryQueryResult>,List<QueryResult<K,V>>>, Serializable
public class SimilaritySearch<K,V,T>
extends Object
implements InvocableMap.StreamingAggregator<K,V,List<BinaryQueryResult>,List<QueryResult<K,V>>>, ExternalizableLite, PortableObject
An InvocableMap.StreamingAggregator to execute a similarity query.
- Since:
- 24.09
- Author:
- Jonathan Knight 2024.07.19, Aleks Seovic 2024.07.25
-
Field Summary
Fields (modifier and type, field, description):
protected DistanceAlgorithm<T> m_algorithm - The DistanceAlgorithm to execute.
protected ValueExtractor<? super V, ? extends Vector<T>> m_extractor - The ValueExtractor to extract the vector from the entry value.
protected boolean m_fBruteForce - A flag indicating whether to ignore any VectorIndex that may be present and use the algorithm directly (i.e. use a brute force calculation).
protected Filter<?> m_filter - An optional Filter to use to filter the search results.
protected int m_nMaxResults - The maximum number of results to return.
protected final SortedBag<BinaryQueryResult> m_results - The interim results for the aggregator.
protected Vector<T> m_vector - The Vector to calculate similarity with.
Fields inherited from interface com.tangosol.util.InvocableMap.StreamingAggregator
ALLOW_INCONSISTENCIES, BY_MEMBER, BY_PARTITION, PARALLEL, PRESENT_ONLY, RETAINS_ENTRIES, SERIAL
-
Constructor Summary
Constructors (constructor, description):
SimilaritySearch() - Default constructor for serialization.
SimilaritySearch(ValueExtractor<? super V, ? extends Vector<T>> extractor, Vector<T> vector, int maxResults) - Create a SimilaritySearch aggregator that will use cosine distance to calculate and return up to maxResults results that are closest to the specified vector.
-
Method Summary
Methods (modifier and type, method, description):
boolean accumulate(InvocableMap.Entry<? extends K, ? extends V> entry) - Accumulate one entry into the result.
boolean accumulate(Streamer<? extends InvocableMap.Entry<? extends K, ? extends V>> streamer) - Accumulate multiple entries into the result.
SimilaritySearch<K,V,T> algorithm(DistanceAlgorithm<T> algorithm) - Set the algorithm to use for distance calculation between vectors.
SimilaritySearch<K,V,T> bruteForce() - Force brute force search, ignoring any available indexes.
protected boolean bruteForce(Streamer<? extends InvocableMap.Entry<? extends K, ? extends V>> streamer, InvocableMap.Entry<? extends K, ? extends V> entry)
int characteristics() - A bit mask representing the set of characteristics of this aggregator.
boolean combine(List<BinaryQueryResult> partialResult) - Merge another partial result into the result.
SimilaritySearch<K,V,T> filter(Filter<?> filter) - Set the filter to use to limit the set of entries to search.
List<QueryResult<K, V>> finalizeResult() - Return the final result of the aggregation.
List<QueryResult<K, V>> finalizeResult(Converter<Binary, ?> converterBin) - Return the final result of the aggregation.
DistanceAlgorithm<T> getAlgorithm()
ValueExtractor<? super V, ? extends Vector<T>> getExtractor()
Filter<?> getFilter()
int getMaxResults()
List<BinaryQueryResult> getPartialResult() - Return the partial result of the aggregation.
Vector<T> getVector()
boolean isBruteForce()
void readExternal(PofReader in) - Restore the contents of a user type instance by reading its state using the specified PofReader object.
void readExternal(DataInput in) - Restore the contents of this object by loading the object's state from the passed DataInput object.
protected boolean searchPartition(BinaryEntry binaryEntry, Vector<T> vector) - If a VectorIndex exists for the specified partition, then use it to perform the KNN search.
InvocableMap.StreamingAggregator<K,V,List<BinaryQueryResult>,List<QueryResult<K,V>>> supply() - Create a new instance of this aggregator.
void writeExternal(PofWriter out) - Save the contents of a POF user type instance by writing its state using the specified PofWriter object.
void writeExternal(DataOutput out) - Save the contents of this object by storing the object's state into the passed DataOutput object.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface com.tangosol.util.InvocableMap.StreamingAggregator
aggregate, isAllowInconsistencies, isByMember, isByPartition, isParallel, isPresentOnly, isRetainsEntries, isSerial
-
Field Details
-
m_extractor
protected ValueExtractor<? super V, ? extends Vector<T>> m_extractor
The ValueExtractor to extract the vector from the entry value.
-
m_vector
protected Vector<T> m_vector
The Vector to calculate similarity with.
-
m_algorithm
protected DistanceAlgorithm<T> m_algorithm
The DistanceAlgorithm to execute.
-
m_nMaxResults
protected int m_nMaxResults
The maximum number of results to return.
-
m_fBruteForce
protected boolean m_fBruteForce
A flag indicating whether to ignore any VectorIndex that may be present and use the algorithm directly (i.e. use a brute force calculation).
-
m_filter
protected Filter<?> m_filter
An optional Filter to use to filter the search results.
-
m_results
protected final SortedBag<BinaryQueryResult> m_results
The interim results for the aggregator.
-
-
Constructor Details
-
SimilaritySearch
public SimilaritySearch()
Default constructor for serialization.
-
SimilaritySearch
public SimilaritySearch(ValueExtractor<? super V, ? extends Vector<T>> extractor, Vector<T> vector, int maxResults)
Create a SimilaritySearch aggregator that will use cosine distance to calculate and return up to maxResults results that are closest to the specified vector.
To use a different distance algorithm, provide it via the algorithm(DistanceAlgorithm) method. You can also specify filter criteria via the filter(Filter) method, and force the aggregator to perform a brute force calculation by calling the bruteForce() method. The latter is useful for testing because it ignores any available indexes: you can compare the results of an index-based query against the exact matches returned by the brute force search to verify that the recall is where you need it to be, and tune the index parameters if it is not.
- Parameters:
extractor - the ValueExtractor to extract the vector from the cache value
vector - the vector to calculate similarity with
maxResults - the maximum number of results to return
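The recall check described above can be sketched without any Coherence classes. The following is a minimal, self-contained illustration that assumes cosine distance is defined as 1 minus cosine similarity and that vectors are plain float arrays; the class RecallSketch and its methods are hypothetical names, not part of this API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for illustration; not part of Coherence. */
final class RecallSketch {
    /** Cosine distance, assumed here to be 1 - cosine similarity. */
    static double cosineDistance(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Exact search: positions of the maxResults vectors closest to the query. */
    static List<Integer> bruteForce(List<float[]> vectors, float[] query, int maxResults) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < vectors.size(); i++) {
            ids.add(i);
        }
        ids.sort(Comparator.comparingDouble(i -> cosineDistance(vectors.get(i), query)));
        return new ArrayList<>(ids.subList(0, Math.min(maxResults, ids.size())));
    }

    /** Fraction of the exact results that the approximate search also returned. */
    static double recall(List<Integer> exact, List<Integer> approximate) {
        if (exact.isEmpty()) {
            return 1.0;
        }
        Set<Integer> found = new HashSet<>(approximate);
        long hits = exact.stream().filter(found::contains).count();
        return (double) hits / exact.size();
    }
}
```

In a real test, the "approximate" list would come from an index-based query and the "exact" list from the same query with bruteForce() enabled; a recall well below 1.0 suggests the index parameters need tuning.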
-
-
Method Details
-
algorithm
public SimilaritySearch<K,V,T> algorithm(DistanceAlgorithm<T> algorithm)
Set the algorithm to use for distance calculation between vectors.
- Parameters:
algorithm - the distance algorithm to use
- Returns:
- this instance
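To illustrate why the choice of distance algorithm matters, here is a small sketch in which the nearest neighbour changes depending on the distance function. The Distance interface and both implementations are hypothetical stand-ins written for this example; they are not the DistanceAlgorithm type from this package.

```java
import java.util.List;

/** Hypothetical illustration of a pluggable distance function. */
final class DistanceSketch {
    /** Stand-in for a pluggable distance algorithm over float vectors. */
    @FunctionalInterface
    interface Distance {
        double between(float[] a, float[] b);
    }

    /** Cosine distance, assumed here to be 1 - cosine similarity. */
    static final Distance COSINE = (a, b) -> {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1.0 - dot / Math.sqrt(normA * normB);
    };

    /** Squared Euclidean distance; ranks the same as Euclidean distance. */
    static final Distance L2_SQUARED = (a, b) -> {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    };

    /** Position of the candidate with the smallest distance to the query, or -1 if empty. */
    static int nearest(List<float[]> candidates, float[] query, Distance distance) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < candidates.size(); i++) {
            double d = distance.between(candidates.get(i), query);
            if (d < bestDistance) {
                bestDistance = d;
                best = i;
            }
        }
        return best;
    }
}
```

With candidates (10, 1) and (1, 0.5) and query (1, 0), the cosine function prefers the first candidate (it points in nearly the same direction), while the Euclidean function prefers the second (it is closer in space).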
-
bruteForce
public SimilaritySearch<K,V,T> bruteForce()
Force brute force search, ignoring any available indexes.
- Returns:
- this instance
-
filter
public SimilaritySearch<K,V,T> filter(Filter<?> filter)
Set the filter to use to limit the set of entries to search.
- Parameters:
filter - the filter to use
- Returns:
- this instance
-
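getExtractor
public ValueExtractor<? super V, ? extends Vector<T>> getExtractor()
-
getVector
public Vector<T> getVector()
-
getAlgorithm
public DistanceAlgorithm<T> getAlgorithm()
-
getMaxResults
public int getMaxResults()
-
isBruteForce
public boolean isBruteForce()
-
getFilter
public Filter<?> getFilter()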
characteristics
public int characteristics()
Description copied from interface: InvocableMap.StreamingAggregator
A bit mask representing the set of characteristics of this aggregator.
By default, characteristics are a combination of InvocableMap.StreamingAggregator.PARALLEL and InvocableMap.StreamingAggregator.RETAINS_ENTRIES, which is sub-optimal and should be overridden by the aggregator implementation if the aggregator does not need to retain entries (which is often the case).
- Specified by:
characteristics in interface InvocableMap.StreamingAggregator<K,V,List<BinaryQueryResult>,List<QueryResult<K,V>>>
- Returns:
- a bit mask representing the set of characteristics of this aggregator
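As a reminder of how such a bit mask is built and queried, here is a minimal sketch. The flag names mirror the constants listed above, but the numeric values are assumptions made for this example; the real values are defined by InvocableMap.StreamingAggregator and should be taken from there.

```java
/** Hypothetical illustration of combining and testing characteristic flags. */
final class CharacteristicsSketch {
    // Assumed values for illustration only; each flag occupies its own bit.
    static final int PARALLEL = 0x01;
    static final int SERIAL = 0x02;
    static final int RETAINS_ENTRIES = 0x10;
    static final int PRESENT_ONLY = 0x20;

    /** An aggregator that runs in parallel and does not need to retain entries. */
    static int characteristics() {
        return PARALLEL | PRESENT_ONLY;
    }

    /** True if the given flag is set in the mask. */
    static boolean has(int mask, int flag) {
        return (mask & flag) != 0;
    }
}
```

The inherited isParallel(), isRetainsEntries() and similar methods presumably perform the same kind of test against the value returned by characteristics().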
-
supply
public InvocableMap.StreamingAggregator<K,V,List<BinaryQueryResult>,List<QueryResult<K,V>>> supply()
Description copied from interface: InvocableMap.StreamingAggregator
Create a new instance of this aggregator.
- Specified by:
supply in interface InvocableMap.StreamingAggregator<K,V,List<BinaryQueryResult>,List<QueryResult<K,V>>>
- Returns:
- a StreamingAggregator
-
accumulate
public boolean accumulate(Streamer<? extends InvocableMap.Entry<? extends K, ? extends V>> streamer)
Description copied from interface: InvocableMap.StreamingAggregator
Accumulate multiple entries into the result.
Important note: The default implementation of this method provides necessary logic for aggregation short-circuiting and should rarely (if ever) be overridden by the custom aggregator implementation.
- Specified by:
accumulate in interface InvocableMap.StreamingAggregator<K,V,List<BinaryQueryResult>,List<QueryResult<K,V>>>
- Parameters:
streamer - a Streamer that can be used to iterate over entries to add
- Returns:
true to continue the aggregation, and false to signal to the caller that the result is ready and the aggregation can be short-circuited
-
accumulate
public boolean accumulate(InvocableMap.Entry<? extends K, ? extends V> entry)
Description copied from interface: InvocableMap.StreamingAggregator
Accumulate one entry into the result.
- Specified by:
accumulate in interface InvocableMap.StreamingAggregator<K,V,List<BinaryQueryResult>,List<QueryResult<K,V>>>
- Parameters:
entry - the entry to accumulate into the aggregation result
- Returns:
true to continue the aggregation, and false to signal to the caller that the result is ready and the aggregation can be short-circuited
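The relationship between the two accumulate methods, and the meaning of the boolean return value, can be sketched as follows. This is an illustration under assumptions, not the implementation of this class: results are reduced to bare distances, a java.util.Iterator stands in for the Streamer, and a bounded max-heap keeps the best maxResults values.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical illustration of streaming accumulation with a bounded result set. */
final class AccumulateSketch {
    private final int maxResults;
    // Max-heap on distance, so the worst retained result is evicted first.
    private final PriorityQueue<Double> results = new PriorityQueue<>(Comparator.reverseOrder());

    AccumulateSketch(int maxResults) {
        this.maxResults = maxResults;
    }

    /** Accumulate one distance; returns true to ask the caller to continue. */
    boolean accumulate(double distance) {
        results.add(distance);
        if (results.size() > maxResults) {
            results.poll();
        }
        // A nearest-neighbour search must see every entry, so it never short-circuits.
        return true;
    }

    /** Accumulate many distances, stopping early if a single accumulation returns false. */
    boolean accumulate(Iterator<Double> streamer) {
        while (streamer.hasNext()) {
            if (!accumulate(streamer.next())) {
                return false;
            }
        }
        return true;
    }

    /** The retained distances in ascending order (closest first). */
    List<Double> sortedResults() {
        List<Double> sorted = new ArrayList<>(results);
        Collections.sort(sorted);
        return sorted;
    }
}
```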
-
combine
public boolean combine(List<BinaryQueryResult> partialResult)
Description copied from interface: InvocableMap.StreamingAggregator
Merge another partial result into the result.
- Specified by:
combine in interface InvocableMap.StreamingAggregator<K,V,List<BinaryQueryResult>,List<QueryResult<K,V>>>
- Parameters:
partialResult - the partial result to merge
- Returns:
true to continue the aggregation, and false to signal to the caller that the result is ready and the aggregation can be short-circuited
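Merging partial results for a top-k query amounts to pooling the candidates and keeping the best k. The sketch below shows that idea with plain distances; the CombineSketch class is hypothetical and does not reflect how BinaryQueryResult instances are actually merged.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration of merging per-partition top-k results. */
final class CombineSketch {
    /** Merge a partial result into the current one, keeping the maxResults smallest distances. */
    static List<Double> combine(List<Double> current, List<Double> partial, int maxResults) {
        List<Double> merged = new ArrayList<>(current);
        merged.addAll(partial);
        Collections.sort(merged);
        return new ArrayList<>(merged.subList(0, Math.min(maxResults, merged.size())));
    }
}
```

Because each partial result already holds at most maxResults candidates, the merged list is never larger than twice that size before it is trimmed.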
-
getPartialResult
public List<BinaryQueryResult> getPartialResult()
Description copied from interface: InvocableMap.StreamingAggregator
Return the partial result of the aggregation.
- Specified by:
getPartialResult in interface InvocableMap.StreamingAggregator<K,V,List<BinaryQueryResult>,List<QueryResult<K,V>>>
- Returns:
- the partial result of the aggregation
-
finalizeResult
public List<QueryResult<K, V>> finalizeResult()
Description copied from interface: InvocableMap.StreamingAggregator
Return the final result of the aggregation.
- Specified by:
finalizeResult in interface InvocableMap.StreamingAggregator<K,V,List<BinaryQueryResult>,List<QueryResult<K,V>>>
- Returns:
- the final result of the aggregation
-
finalizeResult
public List<QueryResult<K, V>> finalizeResult(Converter<Binary, ?> converterBin)
Description copied from interface: InvocableMap.StreamingAggregator
Return the final result of the aggregation.
This method has a default implementation that simply calls InvocableMap.StreamingAggregator.finalizeResult() in order to avoid compilation errors and preserve backwards compatibility of custom aggregator implementations, even though it would make more sense to do it the other way around if we were designing this API from scratch. The unfortunate consequence is that even if you override this method in order to use the provided converter, you still need to implement the InvocableMap.StreamingAggregator.finalizeResult() method, although you could simply implement it to throw java.lang.UnsupportedOperationException.
- Specified by:
finalizeResult in interface InvocableMap.StreamingAggregator<K,V,List<BinaryQueryResult>,List<QueryResult<K,V>>>
- Parameters:
converterBin - converter that can be used to convert result from internal format
- Returns:
- the final result of the aggregation
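The delegation pattern described above can be sketched with a small stand-in interface. Everything here is hypothetical: Aggregator mimics only the two finalizeResult methods, and java.util.function.Function replaces the Converter type.

```java
import java.util.function.Function;

/** Hypothetical illustration of the default-method delegation described above. */
final class FinalizeSketch {
    /** Stand-in for the two finalizeResult methods of a streaming aggregator. */
    interface Aggregator<R> {
        R finalizeResult();

        /** Default implementation ignores the converter and delegates. */
        default R finalizeResult(Function<byte[], ?> converter) {
            return finalizeResult();
        }
    }

    /** An aggregator that needs the converter to produce its final result. */
    static final class Decoding implements Aggregator<String> {
        private final byte[] raw;

        Decoding(byte[] raw) {
            this.raw = raw;
        }

        @Override
        public String finalizeResult() {
            // Still has to be implemented, even though only the converter variant is useful.
            throw new UnsupportedOperationException("a converter is required");
        }

        @Override
        public String finalizeResult(Function<byte[], ?> converter) {
            return String.valueOf(converter.apply(raw));
        }
    }
}
```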
-
readExternal
public void readExternal(PofReader in) throws IOException
Description copied from interface: PortableObject
Restore the contents of a user type instance by reading its state using the specified PofReader object.
- Specified by:
readExternal in interface PortableObject
- Parameters:
in - the PofReader from which to read the object's state
- Throws:
IOException - if an I/O error occurs
-
writeExternal
public void writeExternal(PofWriter out) throws IOException
Description copied from interface: PortableObject
Save the contents of a POF user type instance by writing its state using the specified PofWriter object.
- Specified by:
writeExternal in interface PortableObject
- Parameters:
out - the PofWriter to which to write the object's state
- Throws:
IOException - if an I/O error occurs
-
readExternal
public void readExternal(DataInput in) throws IOException
Description copied from interface: ExternalizableLite
Restore the contents of this object by loading the object's state from the passed DataInput object.
- Specified by:
readExternal in interface ExternalizableLite
- Parameters:
in - the DataInput stream to read data from in order to restore the state of this object
- Throws:
IOException - if an I/O exception occurs
-
writeExternal
public void writeExternal(DataOutput out) throws IOException
Description copied from interface: ExternalizableLite
Save the contents of this object by storing the object's state into the passed DataOutput object.
- Specified by:
writeExternal in interface ExternalizableLite
- Parameters:
out - the DataOutput stream to write the state of this object to
- Throws:
IOException - if an I/O exception occurs
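The DataInput/DataOutput contract requires that readExternal consume fields in exactly the order writeExternal produced them. The sketch below shows a round trip using only java.io; the field layout (maximum results, brute force flag, then the vector) is an assumption chosen for illustration and is not the wire format of this class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical illustration of symmetric write/read of object state. */
final class ExternalizableSketch {
    int maxResults;
    boolean bruteForce;
    float[] vector = new float[0];

    void writeExternal(DataOutput out) throws IOException {
        out.writeInt(maxResults);
        out.writeBoolean(bruteForce);
        out.writeInt(vector.length);
        for (float v : vector) {
            out.writeFloat(v);
        }
    }

    void readExternal(DataInput in) throws IOException {
        // Fields are read in exactly the order they were written.
        maxResults = in.readInt();
        bruteForce = in.readBoolean();
        vector = new float[in.readInt()];
        for (int i = 0; i < vector.length; i++) {
            vector[i] = in.readFloat();
        }
    }

    /** Serialize the source to bytes and restore a copy from them. */
    static ExternalizableSketch roundTrip(ExternalizableSketch source) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            source.writeExternal(new DataOutputStream(bytes));
            ExternalizableSketch copy = new ExternalizableSketch();
            copy.readExternal(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The no-argument constructor matters here for the same reason the class documents a "default constructor for serialization": the receiving side creates an empty instance first and then fills it in from the stream.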
-
bruteForce
protected boolean bruteForce(Streamer<? extends InvocableMap.Entry<? extends K, ? extends V>> streamer, InvocableMap.Entry<? extends K, ? extends V> entry) -
searchPartition
protected boolean searchPartition(BinaryEntry binaryEntry, Vector<T> vector)
If a VectorIndex exists for the specified partition, then use it to perform the KNN search.
- Parameters:
binaryEntry - the BinaryEntry to use to identify the partition
vector - the target vector to find the nearest neighbours to
- Returns:
true if a VectorIndex was present and used for the search
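One plausible way the boolean result and the brute force flag interact is "use the index if allowed and present, otherwise scan". The sketch below illustrates that control flow only; it is an assumption about how the pieces fit together, and the Index interface, the exactScan supplier, and the integer result ids are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical illustration of an index-first search with a brute force fallback. */
final class SearchPartitionSketch {
    /** Stand-in for a vector index that returns the ids of the k nearest neighbours. */
    interface Index {
        List<Integer> nearest(float[] query, int k);
    }

    /** Returns true if an index was present and used for the search. */
    static boolean searchPartition(Index index, float[] query, int k, List<Integer> results) {
        if (index == null) {
            return false;
        }
        results.addAll(index.nearest(query, k));
        return true;
    }

    /** Use the index unless brute force is forced or no index exists; otherwise scan exactly. */
    static List<Integer> search(Index index, boolean bruteForce, float[] query, int k,
                                Supplier<List<Integer>> exactScan) {
        List<Integer> results = new ArrayList<>();
        if (bruteForce || !searchPartition(index, query, k, results)) {
            results.addAll(exactScan.get());
        }
        return results;
    }
}
```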
-