Class MatchStarTables

java.lang.Object
uk.ac.starlink.table.join.MatchStarTables

public class MatchStarTables extends Object
Provides methods for producing tables which represent the result of row matching.

This class originally contained only static methods. Currently some methods are static and some are instance methods; those which use a ProgressIndicator or SplitProcessor are instance methods which use the values set up at construction time.

The methods in this class operate on Collection<RowLink>s rather than on LinkSets, to emphasise that they do not modify the contents of the collections. Such collections will typically be sorted into their natural sequence, see orderLinks(uk.ac.starlink.table.join.LinkSet).

Author:
Mark Taylor (Starlink)
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    static final uk.ac.starlink.table.ValueInfo
    GRP_ID_INFO
    Defines the characteristics of a table column which represents the ID of a group of matched row objects.
    static final uk.ac.starlink.table.ValueInfo
    GRP_SIZE_INFO
    Defines the characteristics of a table column which represents the number of matched row objects in a given group (with the same group ID).
  • Constructor Summary

    Constructors
    Constructor
    Description
    MatchStarTables()
    Constructs a MatchStarTables with default characteristics.
    MatchStarTables(ProgressIndicator indicator, uk.ac.starlink.util.SplitProcessor<?> splitProcessor)
    Constructs a MatchStarTables with configuration.
  • Method Summary

    Modifier and Type
    Method
    Description
    static MatchStarTables
    createInstance(ProgressIndicator indicator, uk.ac.starlink.table.RowRunner rowRunner)
    Creates a MatchStarTables instance based on the given optional progress indicator and row runner.
    Map<RowLink,LinkGroup>
    findGroups(Collection<RowLink> links)
    Returns a mapping from RowLinks to LinkGroups which describes connected groups of links in the input collection.
    static uk.ac.starlink.table.StarTable
    makeInternalMatchTable(int iTable, Collection<RowLink> rowLinks, long rowCount)
    Analyses a set of RowLinks to mark which rows of a given table are linked.
    uk.ac.starlink.table.StarTable
    makeJoinTable(uk.ac.starlink.table.StarTable[] tables, Collection<RowLink> rowLinks, boolean addGroups, uk.ac.starlink.table.JoinFixAction[] fixActs, uk.ac.starlink.table.ValueInfo matchScoreInfo)
    Constructs a table made out of a set of constituent tables joined together according to a set of RowLinks describing row matches.
    static uk.ac.starlink.table.StarTable
    makeSequentialJoinTable(uk.ac.starlink.table.StarTable[] tables, Collection<RowLink> rowLinks, uk.ac.starlink.table.JoinFixAction[] fixActs, uk.ac.starlink.table.ValueInfo matchScoreInfo)
    Constructs a non-random table made out of a set of possibly non-random constituent tables joined together according to a RowLink collection.
    static Collection<RowLink>
    orderLinks(LinkSet linkSet)
    Best-effort conversion of a LinkSet, which is what RowMatcher outputs, to a Collection of RowLinks, which is what this class uses.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • GRP_ID_INFO

      public static final uk.ac.starlink.table.ValueInfo GRP_ID_INFO
      Defines the characteristics of a table column which represents the ID of a group of matched row objects.
    • GRP_SIZE_INFO

      public static final uk.ac.starlink.table.ValueInfo GRP_SIZE_INFO
      Defines the characteristics of a table column which represents the number of matched row objects in a given group (with the same group ID).
  • Constructor Details

    • MatchStarTables

      public MatchStarTables()
      Constructs a MatchStarTables with default characteristics.
    • MatchStarTables

      public MatchStarTables(ProgressIndicator indicator, uk.ac.starlink.util.SplitProcessor<?> splitProcessor)
      Constructs a MatchStarTables with configuration.

      The splitProcessor argument configures how any potentially parallel processing is done.

      Parameters:
      indicator - progress indicator, or null for no logging
      splitProcessor - parallel processing implementation, or null for default behaviour
  • Method Details

    • makeJoinTable

      public uk.ac.starlink.table.StarTable makeJoinTable(uk.ac.starlink.table.StarTable[] tables, Collection<RowLink> rowLinks, boolean addGroups, uk.ac.starlink.table.JoinFixAction[] fixActs, uk.ac.starlink.table.ValueInfo matchScoreInfo) throws InterruptedException
      Constructs a table made out of a set of constituent tables joined together according to a set of RowLinks describing row matches. The columns of the resulting table are made by appending the columns of the constituent tables side by side. Each row in the resulting table corresponds to one RowLink entry in rowLinks; if that RowLink contains a row from one of the tables being joined here, the columns corresponding to that table are filled in. If it contains multiple rows from that table, an arbitrary one of them is used.

      The tables array determines which tables' columns appear in the output table. It must have more elements than the highest table index in the RowLink collection (that is, at least that index plus one). Table data will be picked from the nth element of this array for RowRef elements with a tableIndex of n. If the nth element is null, the corresponding columns will not appear in the output table.

      The matchScoreInfo parameter is optional. If it is non-null, an additional column, described by matchScoreInfo, will be added to the table containing the score values from the RowLinks in rowLinks. The content class of matchScoreInfo should be Number or one of its subclasses.

      Parameters:
      tables - array of constituent tables
      rowLinks - set of RowLink objects which define which rows in one table are associated with which rows in the others
      addGroups - flag which indicates whether the output table should, if appropriate, include GRP_ID_INFO and GRP_SIZE_INFO columns
      fixActs - actions to take for deduplicating column names (array of the same length as tables)
      matchScoreInfo - may supply information about the meaning of the link scores
      Throws:
      InterruptedException
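      The row layout described for makeJoinTable can be illustrated with a small, self-contained sketch. It does not use STIL: the class JoinSketch, its nested Ref and Link classes and the join method are hypothetical stand-ins for StarTable, RowRef and RowLink, with tables modelled as lists of Object[] rows. Where a link holds several rows from one table, the sketch simply takes the first one, which is one permissible reading of "an arbitrary one".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical, simplified model of the row layout produced by makeJoinTable.
 * Tables are plain lists of Object[] rows; this is NOT the STIL API.
 */
class JoinSketch {

    /** Hypothetical stand-in for RowRef: (tableIndex, rowIndex). */
    static final class Ref {
        final int tableIndex;
        final long rowIndex;
        Ref(int tableIndex, long rowIndex) {
            this.tableIndex = tableIndex;
            this.rowIndex = rowIndex;
        }
    }

    /** Hypothetical stand-in for RowLink: some refs plus an optional score. */
    static final class Link {
        final List<Ref> refs;
        final Double score;  // null if the link carries no score
        Link(List<Ref> refs, Double score) {
            this.refs = refs;
            this.score = score;
        }
    }

    /**
     * Builds one output row per link. Columns of each non-null table are
     * appended side by side; a null table contributes no columns. A table
     * not referenced by a link gets null cells. If addScore is set, the
     * link score is appended as a final column.
     */
    static List<Object[]> join(List<List<Object[]>> tables, int[] ncols,
                               List<Link> links, boolean addScore) {
        List<Object[]> out = new ArrayList<>();
        for (Link link : links) {
            List<Object> row = new ArrayList<>();
            for (int it = 0; it < tables.size(); it++) {
                if (tables.get(it) == null) {
                    continue;  // table omitted from the output
                }
                Object[] cells = new Object[ncols[it]];  // all null by default
                for (Ref ref : link.refs) {
                    if (ref.tableIndex == it) {
                        cells = tables.get(it).get((int) ref.rowIndex);
                        break;  // use the first matching row only
                    }
                }
                row.addAll(Arrays.asList(cells));
            }
            if (addScore) {
                row.add(link.score);
            }
            out.add(row.toArray());
        }
        return out;
    }
}
```

      A null table entry simply drops that table's columns, mirroring the documented behaviour of a null element in the tables array.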
    • makeSequentialJoinTable

      public static uk.ac.starlink.table.StarTable makeSequentialJoinTable(uk.ac.starlink.table.StarTable[] tables, Collection<RowLink> rowLinks, uk.ac.starlink.table.JoinFixAction[] fixActs, uk.ac.starlink.table.ValueInfo matchScoreInfo)
      Constructs a non-random (that is, sequential-access only) table made out of a set of possibly non-random constituent tables joined together according to a RowLink collection. Any input tables which do not have random access must have row ordering consistent with (that is, monotonically increasing for) the ordering of the links. In practice, this is only likely to be the case if all the input tables except (at most) one are random access, and the links are ordered with reference to that one. If this requirement is not met, sequential access to the resulting table is likely to fail at some point.
      Parameters:
      tables - array of constituent tables
      rowLinks - link set defining the match
      fixActs - actions to take for deduplicating column names (array of the same size as tables)
      matchScoreInfo - may supply information about the meaning of the match scores, if present
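      The ordering requirement for non-random inputs can be checked before calling makeSequentialJoinTable. The sketch below is a hypothetical helper, not part of STIL: each link is modelled as a list of (tableIndex, rowIndex) pairs, and it treats "monotonically increasing" as non-decreasing, on the assumption that a repeated row does not force the sequential table to rewind.

```java
import java.util.List;

/** Hypothetical helper, not part of STIL. */
class SequentialOrderCheck {

    /**
     * Returns true if, visiting the links in iteration order, the row
     * indices referring to table iSeq never go backwards. Each long[] in a
     * link is a hypothetical (tableIndex, rowIndex) pair.
     */
    static boolean isSequentialFor(int iSeq, List<List<long[]>> links) {
        long last = -1;
        for (List<long[]> link : links) {
            for (long[] ref : link) {
                if (ref[0] == iSeq) {
                    if (ref[1] < last) {
                        return false;  // reading would require rewinding table iSeq
                    }
                    last = ref[1];
                }
            }
        }
        return true;
    }
}
```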
    • makeInternalMatchTable

      public static uk.ac.starlink.table.StarTable makeInternalMatchTable(int iTable, Collection<RowLink> rowLinks, long rowCount)
      Analyses a set of RowLinks to mark which rows of a given table are linked. The result of this method is a two-column table whose rows correspond one-to-one with the rows of the table with index iTable. The output columns are defined by the constants GRP_ID_INFO and GRP_SIZE_INFO. Rows of the table linked together by rowLinks are assigned the same integer value in the new GRP_ID_INFO column, and the GRP_SIZE_INFO column indicates how many rows are linked together in this way. Each group corresponds to a single RowLink; if a row is part of more than one RowLink, only one of them will be recorded in the new columns. Rows of the table which are not linked by any RowLink in rowLinks have null entries in these columns.
      Parameters:
      iTable - the index of the table in which internal matches are to be sought
      rowLinks - a collection of RowLink objects linking groups of rows together
      rowCount - number of rows in the returned table (must be large enough to accommodate the indices in rowLinks)
      Returns:
      a new two-column table describing internal row matches, with a one-to-one row correspondence with the input table
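      The group ID and group size columns described for makeInternalMatchTable can be modelled with two nullable arrays. This is a hypothetical sketch, not the STIL implementation: links are lists of (tableIndex, rowIndex) pairs, refs to other tables are ignored, and when a row appears in more than one link it keeps its first assignment (an assumed policy for "only one of them will be recorded"). Group sizes count only the rows actually given that ID, so the two columns stay consistent.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the two output columns, not the STIL code. */
class InternalMatchSketch {

    /**
     * Returns {grpId, grpSize}, each of length rowCount. A null entry means
     * the row is in no recorded group. Only refs to table iTable count, and
     * a row already assigned to a group keeps that first assignment.
     */
    static Integer[][] groups(int iTable, List<List<long[]>> links, int rowCount) {
        Integer[] grpId = new Integer[rowCount];
        Integer[] grpSize = new Integer[rowCount];
        int nextId = 1;
        for (List<long[]> link : links) {
            // Collect the not-yet-assigned rows of iTable in this link.
            List<Integer> rows = new ArrayList<>();
            for (long[] ref : link) {
                int irow = (int) ref[1];
                if (ref[0] == iTable && grpId[irow] == null && !rows.contains(irow)) {
                    rows.add(irow);
                }
            }
            // A group needs at least two rows linked together.
            if (rows.size() > 1) {
                int id = nextId++;
                for (int irow : rows) {
                    grpId[irow] = id;
                    grpSize[irow] = rows.size();
                }
            }
        }
        return new Integer[][] {grpId, grpSize};
    }
}
```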
    • findGroups

      public Map<RowLink,LinkGroup> findGroups(Collection<RowLink> links) throws InterruptedException
      Returns a mapping from RowLinks to LinkGroups which describes connected groups of links in the input collection. A connected group is one in which the RowRefs of its constituent RowLinks form a connected graph, with RowRefs as the nodes and RowLinks as the edges. A LinkGroup with a link count of more than one therefore represents an ambiguous match, that is, one in which one or more of its RowRefs is contained in more than one RowLink in the original RowLink collection.

      The returned map contains entries only for non-trivial LinkGroups, that is ones which contain more than one link.

      Parameters:
      links - link set representing a set of matches
      Returns:
      RowLink -> LinkGroup mapping describing connected groups in links
      Throws:
      InterruptedException
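      The connected-group idea behind findGroups (RowRefs as nodes, RowLinks as edges) can be sketched with a union-find structure. This is a swapped-in illustration, not necessarily how STIL computes it: RowRefs are modelled as hypothetical string keys such as "0:17" (tableIndex:rowIndex), links are identified by their position in the input list, and every link is assumed to be non-empty. As documented for findGroups, links in single-link groups are left out of the result.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical union-find sketch of connected link groups, not the STIL code. */
class LinkGroupSketch {

    private final Map<String, String> parent = new HashMap<>();

    /** Returns the representative of x's component, with path compression. */
    private String find(String x) {
        parent.putIfAbsent(x, x);
        String p = parent.get(x);
        if (!p.equals(x)) {
            p = find(p);
            parent.put(x, p);
        }
        return p;
    }

    /** Merges the components containing a and b. */
    private void union(String a, String b) {
        String ra = find(a);
        String rb = find(b);
        if (!ra.equals(rb)) {
            parent.put(ra, rb);
        }
    }

    /**
     * Maps link index to group number for every link in a group of more
     * than one link. Group numbers are arbitrary but shared within a group.
     */
    static Map<Integer, Integer> findGroups(List<List<String>> links) {
        LinkGroupSketch uf = new LinkGroupSketch();
        // Each link joins all of its refs into one component.
        for (List<String> link : links) {
            for (String ref : link) {
                uf.union(link.get(0), ref);
            }
        }
        // Bucket link indices by the component of their first ref.
        Map<String, List<Integer>> byRoot = new HashMap<>();
        for (int i = 0; i < links.size(); i++) {
            String root = uf.find(links.get(i).get(0));
            byRoot.computeIfAbsent(root, k -> new ArrayList<>()).add(i);
        }
        // Keep only non-trivial groups.
        Map<Integer, Integer> result = new HashMap<>();
        int groupId = 0;
        for (List<Integer> members : byRoot.values()) {
            if (members.size() > 1) {
                for (int i : members) {
                    result.put(i, groupId);
                }
                groupId++;
            }
        }
        return result;
    }
}
```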
    • orderLinks

      public static Collection<RowLink> orderLinks(LinkSet linkSet)
      Best-effort conversion of a LinkSet, which is what RowMatcher outputs, to a Collection of RowLinks, which is what this class uses. This essentially calls LinkSet.toSorted(), but if that fails for lack of memory (unlikely, but possible) it writes a message through the logging system and returns the links unordered instead.
      Parameters:
      linkSet - unordered LinkSet
      Returns:
      input links as a collection, but if possible in natural order
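      The best-effort pattern described for orderLinks can be sketched with plain JDK collections. This is a hypothetical illustration, not the STIL source: it assumes the memory failure surfaces as an OutOfMemoryError during sorting, uses java.util.logging for the message, and works on any Comparable elements rather than on a LinkSet.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.logging.Logger;

/** Hypothetical sketch of best-effort ordering, not the STIL code. */
class OrderLinksSketch {

    private static final Logger logger =
        Logger.getLogger(OrderLinksSketch.class.getName());

    /**
     * Returns the items sorted into natural order if possible; if sorting
     * runs out of memory, logs a warning and returns the input unchanged.
     */
    static <T extends Comparable<? super T>> Collection<T> orderBestEffort(Collection<T> items) {
        try {
            List<T> sorted = new ArrayList<>(items);
            Collections.sort(sorted);
            return sorted;
        }
        catch (OutOfMemoryError e) {
            logger.warning("Not enough memory to sort links; using unordered sequence");
            return items;
        }
    }
}
```

      Catching an Error is normally avoided in Java; it is tolerable here only because the sorted copy is optional and the original collection is still intact when the failure occurs.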
    • createInstance

      public static MatchStarTables createInstance(ProgressIndicator indicator, uk.ac.starlink.table.RowRunner rowRunner)
      Creates a MatchStarTables instance based on the given optional progress indicator and row runner.
      Parameters:
      indicator - progress indicator, or null for no logging
      rowRunner - parallel processing implementation, or null for default behaviour