Class MatchStarTables
This class originally contained only static methods. Currently some methods are static and some are instance methods; those which use a ProgressIndicator or SplitProcessor are instance methods which use the values set up at construction time.
The methods in this class operate on Collection<RowLink>s rather than on LinkSets, to emphasise that they do not modify the contents of the collections. Such collections will typically be sorted into their natural sequence; see orderLinks(uk.ac.starlink.table.join.LinkSet).
- Author:
- Mark Taylor (Starlink)
-
Field Summary
Fields (modifier and type, field, description):
- static final uk.ac.starlink.table.ValueInfo GRP_ID_INFO
  Defines the characteristics of a table column which represents the ID of a group of matched row objects.
- static final uk.ac.starlink.table.ValueInfo GRP_SIZE_INFO
  Defines the characteristics of a table column which represents the number of matched row objects in a given group (with the same group ID).
Constructor Summary
Constructors (constructor, description):
- MatchStarTables()
  Constructs a MatchStarTables with default characteristics.
- MatchStarTables(ProgressIndicator indicator, uk.ac.starlink.util.SplitProcessor<?> splitProcessor)
  Constructs a MatchStarTables with configuration.
Method Summary
Methods (modifier and type, method, description):
- static MatchStarTables createInstance(ProgressIndicator indicator, uk.ac.starlink.table.RowRunner rowRunner)
  Creates a MatchStarTables instance based on given optional progress indicator and row runner.
- findGroups(Collection<RowLink> links)
  Returns a mapping from RowLinks to LinkGroups which describes connected groups of links in the input collection.
- static uk.ac.starlink.table.StarTable makeInternalMatchTable(int iTable, Collection<RowLink> rowLinks, long rowCount)
  Analyses a set of RowLinks to mark linked rows of a given table.
- uk.ac.starlink.table.StarTable makeJoinTable(uk.ac.starlink.table.StarTable[] tables, Collection<RowLink> rowLinks, boolean addGroups, uk.ac.starlink.table.JoinFixAction[] fixActs, uk.ac.starlink.table.ValueInfo matchScoreInfo)
  Constructs a table made out of a set of constituent tables joined together according to a set of RowLinks describing row matches.
- static uk.ac.starlink.table.StarTable makeSequentialJoinTable(uk.ac.starlink.table.StarTable[] tables, Collection<RowLink> rowLinks, uk.ac.starlink.table.JoinFixAction[] fixActs, uk.ac.starlink.table.ValueInfo matchScoreInfo)
  Constructs a non-random table made out of a set of possibly non-random constituent tables joined together according to a RowLink collection.
- static Collection<RowLink> orderLinks(LinkSet linkSet)
  Best-efforts conversion of a LinkSet, which is what RowMatcher outputs, to a Collection of RowLinks, which is what's used by this class.
-
Field Details
-
GRP_ID_INFO
public static final uk.ac.starlink.table.ValueInfo GRP_ID_INFO
Defines the characteristics of a table column which represents the ID of a group of matched row objects.
GRP_SIZE_INFO
public static final uk.ac.starlink.table.ValueInfo GRP_SIZE_INFO
Defines the characteristics of a table column which represents the number of matched row objects in a given group (with the same group ID).
-
-
Constructor Details
-
MatchStarTables
public MatchStarTables()
Constructs a MatchStarTables with default characteristics.
MatchStarTables
public MatchStarTables(ProgressIndicator indicator, uk.ac.starlink.util.SplitProcessor<?> splitProcessor)
Constructs a MatchStarTables with configuration.
The splitProcessor argument configures how potentially parallel processing is done.
- Parameters:
indicator - progress indicator, or null for no logging
splitProcessor - parallel processing implementation, or null for default behaviour
-
-
Method Details
-
makeJoinTable
public uk.ac.starlink.table.StarTable makeJoinTable(uk.ac.starlink.table.StarTable[] tables, Collection<RowLink> rowLinks, boolean addGroups, uk.ac.starlink.table.JoinFixAction[] fixActs, uk.ac.starlink.table.ValueInfo matchScoreInfo) throws InterruptedException
Constructs a table made out of a set of constituent tables joined together according to a set of RowLinks describing row matches. The columns of the resulting table are made by appending the columns of the constituent tables side by side. Each row in the resulting table corresponds to one RowLink entry in the set rowLinks; if that RowLink contains a row from one of the tables being joined here, the columns corresponding to that table are filled in. If it contains multiple rows from that table, an arbitrary one of them is filled in.
The tables array determines which tables' columns appear in the output table. It must have (at least) as many elements as the highest table index in the RowLink set. Table data will be picked from the nth table in this array for RowRef elements with a tableIndex of n. If the nth element is null, the corresponding columns will not appear in the output table.
The matchScoreInfo parameter is optional. If it is non-null, then an additional column, described by matchScoreInfo, will be added to the table containing the score values from the RowLinks in rowLinks. The content class of matchScoreInfo should be Number or one of its subclasses.
- Parameters:
tables - array of constituent tables
rowLinks - set of RowLink objects which define which rows in one table are associated with which rows in the others
addGroups - flag which indicates whether the output table should, if appropriate, include GRP_ID_INFO and GRP_SIZE_INFO columns
fixActs - actions to take for deduplicating column names (array of the same length as tables)
matchScoreInfo - may supply information about the meaning of the link scores
- Throws:
InterruptedException
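The side-by-side layout described above can be illustrated with a minimal, self-contained sketch. It does not use the STIL classes: the class name JoinSketch, the representation of a table as Object[row][column], and the encoding of a link as an int[] holding one row index per table (-1 meaning "no row from this table") are all assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only; not the MatchStarTables implementation. */
class JoinSketch {

    /**
     * Builds one output row per link by placing the cells of the linked row
     * of each table side by side. Cells for tables absent from a link stay
     * null; a null table contributes no columns at all.
     * Assumes every non-null table has at least one row, so that its column
     * count can be read from the first row.
     */
    static List<Object[]> join(Object[][][] tables, List<int[]> links) {
        // Work out where each table's columns start in the output row.
        int[] offsets = new int[tables.length];
        int ncolTotal = 0;
        for (int t = 0; t < tables.length; t++) {
            offsets[t] = ncolTotal;
            if (tables[t] != null && tables[t].length > 0) {
                ncolTotal += tables[t][0].length;
            }
        }

        List<Object[]> out = new ArrayList<>();
        for (int[] link : links) {
            Object[] row = new Object[ncolTotal]; // cells default to null
            for (int t = 0; t < link.length; t++) {
                if (link[t] >= 0 && tables[t] != null) {
                    Object[] src = tables[t][link[t]];
                    System.arraycopy(src, 0, row, offsets[t], src.length);
                }
            }
            out.add(row);
        }
        return out;
    }
}
```

In this sketch a link of {1, -1} yields a row whose first columns come from row 1 of table 0 and whose remaining columns are null. Group columns, column-name deduplication and the match score column are deliberately left out.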
-
makeSequentialJoinTable
public static uk.ac.starlink.table.StarTable makeSequentialJoinTable(uk.ac.starlink.table.StarTable[] tables, Collection<RowLink> rowLinks, uk.ac.starlink.table.JoinFixAction[] fixActs, uk.ac.starlink.table.ValueInfo matchScoreInfo)
Constructs a non-random table made out of a set of possibly non-random constituent tables joined together according to a RowLink collection. Any input tables which do not have random access must have row ordering consistent with (that is, monotonically increasing for) the ordering of the links. In practice, this is only likely to be the case if all the input tables are random access except for (at most) one, and the links are ordered with reference to that one. If this requirement is not met, sequential access to the resulting table is likely to fail at some point.
- Parameters:
tables - array of constituent tables
rowLinks - link set defining the match
fixActs - actions to take for deduplicating column names (array of the same size as tables)
matchScoreInfo - may supply information about the meaning of the match scores, if present
-
makeInternalMatchTable
public static uk.ac.starlink.table.StarTable makeInternalMatchTable(int iTable, Collection<RowLink> rowLinks, long rowCount)
Analyses a set of RowLinks to mark linked rows of a given table. The result of this method is a two-column table whose rows correspond one-to-one with the rows of the table referenced in the link set. The output columns are defined by the constants GRP_ID_INFO and GRP_SIZE_INFO. Rows of the table linked together by rowLinks are assigned the same integer value in the new GRP_ID_INFO column, and the GRP_SIZE_INFO column indicates how many rows are linked together in this way. Each group corresponds to a single RowLink; if a row is part of more than one RowLink then only one of them will be recorded in the new columns. Any rows linked in rowLinks which do not refer to the table with index iTable have null entries in these columns.
- Parameters:
iTable - the index of the table in which internal matches are to be sought
rowLinks - a collection of RowLink objects linking groups of rows together
rowCount - number of rows in the returned table (must be large enough to accommodate the indices in rowLinks)
- Returns:
a new two-column table describing internal row matches, with a one-to-one row correspondence with the table indexed by iTable
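As a rough illustration of how the two output columns relate to the links, the following sketch fills a group ID array and a group size array from a list of links. It is not the STIL code: the class name InternalMatchSketch, the encoding of each link as an int[] of row indices within the single table of interest, the use of int rather than long for the row count, the 1-based group numbering, and the "last link wins" rule for rows in several links are all assumptions.

```java
import java.util.List;

/** Illustrative only; not the MatchStarTables implementation. */
class InternalMatchSketch {

    /**
     * Returns a two-element array {groupIds, groupSizes}, each of length
     * rowCount. Rows that appear in no link keep null in both columns.
     * Each link gets its own group ID; if a row appears in several links,
     * only the last one processed is recorded (an arbitrary choice here).
     */
    static Integer[][] groupColumns(List<int[]> links, int rowCount) {
        Integer[] ids = new Integer[rowCount];
        Integer[] sizes = new Integer[rowCount];
        int nextId = 0;
        for (int[] rows : links) {
            nextId++; // one group per link, numbered from 1 (assumed)
            for (int row : rows) {
                ids[row] = nextId;
                sizes[row] = rows.length;
            }
        }
        return new Integer[][] {ids, sizes};
    }
}
```

The rowCount argument plays the same role as in the description above: it must exceed every row index mentioned in the links, otherwise the array writes in this sketch would fail.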
-
findGroups
Returns a mapping from RowLinks to LinkGroups which describes connected groups of links in the input collection. A related group is one in which the RowRefs of its constituent RowLinks form a connected graph in which RowRefs are the nodes and RowLinks are the edges. A LinkGroup with a link count of more than one therefore represents an ambiguous match, that is one in which one or more of its RowRefs is contained in more than one RowLink in the original RowLink collection.
The returned map contains entries only for non-trivial LinkGroups, that is ones which contain more than one link.
- Parameters:
links - link set representing a set of matches
- Returns:
RowLink -> LinkGroup mapping describing connected groups in links
- Throws:
InterruptedException
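The connected-group idea can be sketched with a small union-find over links. This is one possible way to find such groups, not necessarily the one this class uses. The class name LinkGroupSketch, the encoding of a row reference as a string such as "0:5" (table index 0, row index 5), and the use of -1 to mark trivial groups are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only; not the MatchStarTables implementation. */
class LinkGroupSketch {

    /**
     * Returns, for each link, an ID shared by all links in the same
     * connected group, or -1 if the link shares no row reference with any
     * other link (a trivial group, which would be omitted from the map).
     */
    static int[] groupIds(List<List<String>> links) {
        int n = links.size();
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
        }

        // Merge any two links that contain the same row reference.
        Map<String, Integer> firstOwner = new HashMap<>();
        for (int i = 0; i < n; i++) {
            for (String ref : links.get(i)) {
                Integer prev = firstOwner.putIfAbsent(ref, i);
                if (prev != null) {
                    union(parent, prev, i);
                }
            }
        }

        // Count the links in each group, then report only non-trivial ones.
        int[] size = new int[n];
        for (int i = 0; i < n; i++) {
            size[find(parent, i)]++;
        }
        int[] ids = new int[n];
        for (int i = 0; i < n; i++) {
            int root = find(parent, i);
            ids[i] = size[root] > 1 ? root : -1;
        }
        return ids;
    }

    static int find(int[] parent, int i) {
        while (parent[i] != i) {
            parent[i] = parent[parent[i]]; // path halving
            i = parent[i];
        }
        return i;
    }

    static void union(int[] parent, int a, int b) {
        int ra = find(parent, a);
        int rb = find(parent, b);
        if (ra != rb) {
            parent[rb] = ra;
        }
    }
}
```

Links that end up with the same non-negative ID correspond to one ambiguous match in the sense described above; links marked -1 are unambiguous.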
-
orderLinks
Best-efforts conversion of a LinkSet, which is what RowMatcher outputs, to a Collection of RowLinks, which is what's used by this class. This essentially calls LinkSet.toSorted(), but in case that fails for lack of memory (not that likely, but could happen) it will write a message through the logging system and return a value giving an unordered result instead.
- Parameters:
linkSet - unordered LinkSet
- Returns:
input links as a collection, but if possible in natural order
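The "sort if memory allows, otherwise fall back" pattern described above might look roughly like the following sketch. It works on a plain java.util.Collection rather than a LinkSet, and the class name OrderFallbackSketch, the logger name and the log message are assumptions; the real method's details may differ.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.logging.Logger;

/** Illustrative only; not the MatchStarTables implementation. */
class OrderFallbackSketch {

    private static final Logger LOGGER =
        Logger.getLogger("sketch.orderfallback"); // hypothetical logger name

    /**
     * Returns the input items in natural order if a sorted copy can be
     * made, otherwise logs a warning and returns the input unchanged.
     */
    static <T extends Comparable<? super T>> Collection<T> order(
            Collection<T> unordered) {
        try {
            List<T> sorted = new ArrayList<>(unordered);
            Collections.sort(sorted);
            return sorted;
        } catch (OutOfMemoryError e) {
            LOGGER.warning("Not enough memory to sort; using unordered items");
            return unordered;
        }
    }
}
```

Callers of such a method cannot assume the result is sorted, which is why the return value above is described as being in natural order only "if possible".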
-
createInstance
public static MatchStarTables createInstance(ProgressIndicator indicator, uk.ac.starlink.table.RowRunner rowRunner)
Creates a MatchStarTables instance based on given optional progress indicator and row runner.
- Parameters:
indicator - progress indicator, or null for no logging
rowRunner - parallel processing implementation, or null for default behaviour
-