public class LongestCommonSubsequence extends Object implements SimilarityScore<Integer>
The Longest common subsequence algorithm returns the length of the longest subsequence that two strings have in
common. Two strings that are entirely different, return a value of 0, and two strings that return a value
of the commonly shared length implies that the strings are completely the same in value and position.
Note. Generally this algorithm is fairly inefficient, as for length m, n of the input
CharSequence
's left
and right
respectively, the runtime of the
algorithm is O(m*n).
This implementation is based on the Longest Commons Substring algorithm from https://en.wikipedia.org/wiki/Longest_common_subsequence_problem.
For further reading see:
Lothaire, M. Applied combinatorics on words. New York: Cambridge U Press, 2005. 12-13
Constructor and Description |
---|
LongestCommonSubsequence() |
Modifier and Type | Method and Description |
---|---|
Integer |
apply(CharSequence left,
CharSequence right)
Calculates longest common subsequence similarity score of two
CharSequence 's passed as
input. |
CharSequence |
logestCommonSubsequence(CharSequence left,
CharSequence right)
Deprecated.
Deprecated as of 1.2 due to a typo in the method name.
Use
longestCommonSubsequence(CharSequence, CharSequence) instead.
This method will be removed in 2.0. |
CharSequence |
longestCommonSubsequence(CharSequence left,
CharSequence right)
Computes the longest common subsequence between the two
CharSequence 's passed as
input. |
int[][] |
longestCommonSubstringLengthArray(CharSequence left,
CharSequence right)
Computes the lcsLengthArray for the sake of doing the actual lcs calculation.
|
public Integer apply(CharSequence left, CharSequence right)
CharSequence
's passed as
input.apply
in interface SimilarityScore<Integer>
left
- first character sequenceright
- second character sequenceIllegalArgumentException
- if either String input null
@Deprecated public CharSequence logestCommonSubsequence(CharSequence left, CharSequence right)
longestCommonSubsequence(CharSequence, CharSequence)
instead.
This method will be removed in 2.0.CharSequence
's passed as input.
Note, a substring and subsequence are not necessarily the same thing. Indeed, abcxyzqrs
and
xyzghfm
have both the same common substring and subsequence, namely xyz
. However,
axbyczqrs
and abcxyzqtv
have the longest common subsequence xyzq
because a
subsequence need not have adjacent characters.
For reference, we give the definition of a subsequence for the reader: a subsequence is a sequence that can be derived from another sequence by deleting some elements without changing the order of the remaining elements.
left
- first character sequenceright
- second character sequenceIllegalArgumentException
- if either String input null
public CharSequence longestCommonSubsequence(CharSequence left, CharSequence right)
CharSequence
's passed as
input.
Note, a substring and subsequence are not necessarily the same thing. Indeed, abcxyzqrs
and
xyzghfm
have both the same common substring and subsequence, namely xyz
. However,
axbyczqrs
and abcxyzqtv
have the longest common subsequence xyzq
because a
subsequence need not have adjacent characters.
For reference, we give the definition of a subsequence for the reader: a subsequence is a sequence that can be derived from another sequence by deleting some elements without changing the order of the remaining elements.
left
- first character sequenceright
- second character sequenceIllegalArgumentException
- if either String input null
public int[][] longestCommonSubstringLengthArray(CharSequence left, CharSequence right)
left
- first character sequenceright
- second character sequenceCopyright © 2014–2018 The Apache Software Foundation. All rights reserved.