1. Introduction

The tarray extension implements typed arrays and associated commands column and table. This page provides reference documentation for commands related to typed columns. See the main contents for guides and other reference documentation.

1.1. Installation and loading

Binary packages for some platforms are available from the SourceForge download area. See the build instructions for other platforms.

To install the extension, extract the files from the distribution to any directory that is included in your Tcl installation’s auto_path variable.

Once installed, the extension can be loaded with the standard Tcl package require command.

% package require tarray
→ 1.0.0
% namespace import tarray::column

1.2. Columns

A typed array column contains elements of a single type, such as int or string, that is specified when the column is created. The command tarray::column provides operations on typed columns, including searching and sorting.

Related to columns are tables, which are ordered sequences of typed columns.

1.3. Types

All elements in a column must be of the type specified when the column is created. The following element types are available:

Table 1. Column types

Keyword   Type
any       Any Tcl value
string    A string value
boolean   A boolean value
byte      Unsigned 8-bit integer
double    Floating point value
int       Signed 32-bit integer
uint      Unsigned 32-bit integer
wide      Signed 64-bit integer

The primary purpose of the type is to specify what values can be stored in the column. This affects the compactness of the internal storage (really the primary purpose of the extension) as well as certain operations, like sort or search, invoked on the column.

The types any and string are similar in that they can hold any Tcl value, and both are treated as strings for comparisons and operators. The difference is that any stores the value using its Tcl internal representation, while string stores it as a string. The advantage of any is that internal structure, such as that of a dictionary, is preserved. The advantage of string is a significantly more compact representation, particularly for short strings.

Attempting to store a value that is not valid for a column's type raises an error.

1.4. Indices

An index into a typed column or table can be specified either as an integer or as the keyword end. As in Tcl's list commands, end specifies the index of the last element in the tarray or the index after it, depending on the command. Simple arithmetic offsets from end are supported, for example end-2 or end+5.

Many commands also allow multiple indices to be specified. These may take one of two forms: a range, which includes all indices between a lower and an upper bound (inclusive), or an index list, which may be one of the following:

  • a Tcl list of integers

  • a column of any type other than boolean. The value of each element of the column is converted to an integer that is treated as an index.

  • a column of type boolean. Here the position of each element that is set to 1 is treated as an index.

Note that the keyword end can be used as a single index or as a range bound, but not in an index list.

When the specified indices would extend a column or table, they must cover every index beyond the current column or table size without gaps, although they may appear in any order. For example,

% set I [column series 5]
→ tarray_column int {0 1 2 3 4}
% column place $I {106 105 107 104} {6 5 7 4}   ;# (1)
→ tarray_column int {0 1 2 3 104 105 106 107}
% column place $I {106 107} {6 7}               ;# (2)
Ø tarray index 6 out of bounds.
(1) Ok: indices not in order but no gaps.
(2) Error: no value specified for non-existent index 5.
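The no-gap rule can be expressed as a small check. The following plain-Tcl sketch is only a model of the documented rule and is not part of tarray. The proc name extends_without_gaps is hypothetical, and the sketch handles plain integer index lists only, not end or index columns.

```tcl
# Hypothetical helper modelling the documented rule; not part of tarray.
# Returns 1 if the integer indices in INDICES could extend a column of
# SIZE elements without leaving gaps, 0 otherwise.
proc extends_without_gaps {size indices} {
    # Keep only the indices at or beyond the current size, without duplicates
    set beyond [lsort -integer -unique [lmap i $indices {
        if {$i < $size} continue
        set i
    }]]
    # They must form the contiguous run size, size+1, ... (input order is irrelevant)
    set expected $size
    foreach i $beyond {
        if {$i != $expected} { return 0 }
        incr expected
    }
    return 1
}

puts [extends_without_gaps 5 {6 5 7 4}]   ;# 1, like the first call above
puts [extends_without_gaps 5 {6 7}]       ;# 0, index 5 is missing
```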

2. Command reference

All commands are located in the tarray namespace.

2.1. Standard Options

Commands returning values from columns support the standard options shown in Standard options.

Table 2. Standard options

Option    Description
-list     The values are returned as a Tcl list.
-dict     The values are returned as a dictionary keyed by the corresponding indices.
-column   The values are returned as a typed column. This is the default if neither of the other options is specified.

2.2. Commands

column bitmap0 COUNT ?INDICES?

Returns a new boolean column of size COUNT with all elements set to 0. If INDICES is specified, the elements at those positions are set to 1.

column bitmap1 COUNT ?INDICES?

Returns a new boolean column of size COUNT with all elements set to 1. If INDICES is specified, the elements at those positions are set to 0.

column cast COLTYPE COLUMN

Returns a new column of type COLTYPE containing elements of COLUMN cast to COLTYPE. This differs from the use of column create in that it will not raise an error if any element value in COLUMN is too large to fit into a column of type COLTYPE or if the value contains a non-zero fractional component and COLTYPE is one of the integral types. In the former case, if COLUMN is of an integer type, the higher order bits are discarded while if it is of type double, the cast value is undefined. In the latter case, the fractional component of the element value is discarded, and only its integer component is stored in the new column.
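To make the truncation rules concrete, here is a plain-Tcl model of how an integer value, or an in-range double, would be converted to each integral type under the rules above. The proc cast_model is hypothetical and only mirrors the documented behaviour. It does not model the out-of-range double case, which the text above leaves undefined.

```tcl
# Hypothetical model of the documented column cast rules for integral targets.
# Not part of tarray; out-of-range doubles (undefined in tarray) are not modelled.
proc cast_model {type value} {
    # Keep only the integer component (entier truncates toward zero)
    set v [expr {entier($value)}]
    # Keep the low-order bits; Tcl integers behave as infinite two's complement
    switch -- $type {
        byte { return [expr {$v & 0xFF}] }
        uint { return [expr {$v & 0xFFFFFFFF}] }
        int  { return [expr {(($v & 0xFFFFFFFF) ^ 0x80000000) - 0x80000000}] }
        wide { return [expr {(($v & 0xFFFFFFFFFFFFFFFF) ^ 0x8000000000000000) - 0x8000000000000000}] }
        default { error "unsupported target type: $type" }
    }
}

puts [cast_model byte 300]          ;# 44: bits above the low 8 are discarded
puts [cast_model int 2147483648]    ;# -2147483648
puts [cast_model int -3.7]          ;# -3: fractional component discarded
```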

column categorize ?options? COLUMN

The command first places the elements of the column COLUMN into categories. By default, these are keyed by the value of the element. Alternatively, the -categorizer CMDPREFIX option may be specified in which case CMDPREFIX is called for every element of COLUMN. Each invocation has two additional arguments appended — the index of the element being passed and its value. The element is then placed into the category identified by the returned value from the invocation. If CMDPREFIX completes with a break control code, no further elements are processed. If it completes with a continue return code, that particular iteration is ignored and not included in the result.

The command returns a table with two columns, the first of which contains categories constructed from the unique values in COLUMN, or the returned values from CMDPREFIX if the -categorizer option was specified. The second column is of type any and of the same size as the first. Each element of this column is itself a column containing either the indices of the elements belonging to the corresponding category (by default or if the -indices option is specified), or the element values themselves (if the -values option is specified). These columns are named Category and Data by default. The -cnames option can be used to change these names, the option’s value being a pair containing the names to be used for the two columns.

By default the Category column is of type any if the -categorizer option is specified, and of the same type as COLUMN otherwise. The -categorytype TYPE option may be specified to force it to be of a specific type, in which case the category values must be compatible with that type.
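The categorizing loop, including the break and continue handling, can be modelled in plain Tcl. The sketch below is an illustration under assumptions, not tarray's implementation. It covers only the default -indices behaviour, and it collects categories in a dict instead of a table. The names categorize_model and parity are hypothetical.

```tcl
# Hypothetical model of column categorize with -indices (the default);
# not part of tarray. Categories are kept in a dict instead of a table.
proc categorize_model {values {categorizer {}}} {
    set categories [dict create]
    set i -1
    foreach v $values {
        incr i
        if {$categorizer eq ""} {
            # Default: the element value itself is the category
            set key $v
        } else {
            # Call CMDPREFIX with the index and value appended
            set code [catch {{*}$categorizer $i $v} key]
            if {$code == 3} break          ;# break: stop processing elements
            if {$code == 4} continue       ;# continue: skip this element
            if {$code != 0} { return -code error $key }
        }
        dict lappend categories $key $i
    }
    return $categories
}

proc parity {i v} { expr {$v % 2 ? "odd" : "even"} }
puts [categorize_model {1 2 3 4}]          ;# 1 0 2 1 3 2 4 3
puts [categorize_model {1 2 3 4} parity]   ;# odd {0 2} even {1 3}
```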

See Grouping into categories for an example.

column count ?-range RANGE? ?-among INDICES? ?-not? ?-nocase? ?OPER? COLUMN VALUE

Counts the number of elements in a column that match a value. See the column search command for a description of the various options. Note that if the -among option is specified and an index occurs multiple times in INDICES, it will be counted multiple times.

column create TYPE ?INITIALIZER? ?INITSIZE?

Returns a typed array of type TYPE, which must be one of the valid types described in Types. If INITIALIZER is specified, it is the initial content of the typed array and can be either a column of any compatible type or a list containing elements of the appropriate type. For performance reasons, the INITSIZE argument may be specified to indicate how many slots to preallocate for the array. This is only treated as a hint, and the actual size allocated may differ.

column delete COLUMN LOW HIGH

Returns a typed column with all specified elements deleted. Indices are specified in any of the forms described in Indices and may contain duplicates. Out-of-range indices are ignored.

column equal COLA COLB

Returns 1 if the specified columns have the same number of elements and corresponding elements of the two columns are equal. If the column types differ, elements are converted for comparison: numeric elements are converted to strings if either column is non-numeric, to doubles if either column is of type double, and to wide integers otherwise. This means, for example, that when comparing a column of type int to one of type any or string, the value 16 will not equal the string 0x10.
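The 16 versus 0x10 case reflects the same distinction plain Tcl makes between numeric and string comparison. This short snippet does not involve tarray and only illustrates that distinction:

```tcl
# Numeric comparison interprets 0x10 as the integer 16
puts [expr {16 == 0x10}]        ;# 1
# String comparison, which applies when either column is any or string, does not
puts [string equal 16 0x10]     ;# 0
```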

The command will raise an error if either argument is not a column.

Also see the related command column identical which applies a stricter definition of equality.

column fill COLUMN VALUE LOW HIGH

Returns a typed column with the specified indices set to VALUE. Indices are specified in any of the forms described in Indices and must follow the rules described there. The index keyword end refers to the current last occupied position in the column, so to append values the index should be specified as end+1. The array will be extended if necessary, provided the specified indices do not leave gaps beyond the current column size.

column get ?OPTIONS? COLUMN INDEXLIST

Returns the values from a typed column at the indices given by INDEXLIST, an index list as described in Indices. Any of the Standard options may be specified with this command.

column histogram ?options? COLUMN NBUCKETS

The command divides the target range of values into NBUCKETS intervals of equal size (except possibly the last, in case of value range overflow) and places the values of the column, which must be of a numeric type, into these buckets. If no options are specified, the target range has a lower bound equal to the minimum value in the column, and the size of each bucket is the minimum size required for the maximum value to fall within a bucket.

If the -min option is specified the associated value is used as the lower bound of the range and first bucket. If there happen to be any values in the column smaller than this, they are ignored in the returned result. Similarly, if the -max option is specified, any values greater than the associated option value are ignored. If the column is empty, both -min and -max values must be specified; otherwise the command will raise an error.

The command computes a result for each bucket. By default, or if the -count option is specified, this bucket result is the number of values falling into that bucket. If the -sum option is specified, each bucket result is the sum of all values falling into that bucket. If the -indices option is specified, each bucket result is an index column containing the indices of the elements whose values fall into that bucket. Finally, if the -values option is specified, each bucket result is a column, of the same type as COLUMN, containing the actual values that fell into that bucket.

The command returns a table with two columns, the first of which contains the lower bound of each interval bucket. The second contains the corresponding computed bucket result for each bucket. These columns are named LowerBound and Data by default. The -cnames option can be used to change these names, the option’s value being a pair containing the names to be used for the two columns.
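A plain-Tcl model of the default -count behaviour may make the bucketing concrete. The proc histogram_count_model is hypothetical, and the model rests on assumptions. It uses buckets of width (max - min) / NBUCKETS and folds the maximum value into the last bucket. It also returns a flat list of LowerBound/Data pairs instead of a table. It does not model -min, -max, or any different bucket sizing that tarray may apply to integer columns.

```tcl
# Hypothetical model of column histogram -count for double-valued data;
# not part of tarray. Returns a flat list of lower-bound/count pairs.
proc histogram_count_model {values nbuckets} {
    set min [tcl::mathfunc::min {*}$values]
    set max [tcl::mathfunc::max {*}$values]
    set width [expr {double($max - $min) / $nbuckets}]
    set counts [lrepeat $nbuckets 0]
    foreach v $values {
        # Bucket number from the distance to the lower bound
        set b [expr {$width == 0 ? 0 : int(($v - $min) / $width)}]
        # The maximum sits on the upper edge, so fold it into the last bucket
        if {$b >= $nbuckets} { set b [expr {$nbuckets - 1}] }
        lset counts $b [expr {[lindex $counts $b] + 1}]
    }
    # Pair each bucket's lower bound with its count
    set result {}
    for {set i 0} {$i < $nbuckets} {incr i} {
        lappend result [expr {$min + $i * $width}] [lindex $counts $i]
    }
    return $result
}

puts [histogram_count_model {1 2 3 4 5 6 7 8 9 10} 3]   ;# 1.0 3 4.0 3 7.0 4
```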

See Computing histograms for an example.

Note: for columns of type wide, the command will raise an error if the difference between the minimum and maximum covers the entire domain range of wides [-9223372036854775808, 9223372036854775807] and NBUCKETS is 1.

column identical COLA COLB

Returns 1 if both columns are of the same type, have the same number of elements and corresponding elements of the two columns are equal.

The command will raise an error if either argument is not a column.

Also see the related command column equal which applies a looser definition of equality.

column index COLUMN INDEX

Returns the value of the element at the position specified by INDEX which is a single index.

column inject COLUMN VALUES FIRST

Inserts VALUES, a list of values or a column of the same type as COLUMN, at the position FIRST and returns the resulting column. If FIRST is end, the values are appended to the column. In all cases, the command may extend the array if necessary.

column insert COLUMN VALUE FIRST ?COUNT?

Inserts COUNT (default 1) elements with value VALUE at position FIRST and returns the new column. In all cases, the command may extend the array if necessary.

column intersect3 ?-nocase? COLUMNA COLUMNB

Returns a list of three columns, the first containing elements common to both COLUMNA and COLUMNB, the second containing elements only present in COLUMNA and the third containing elements only present in COLUMNB. Both columns must be of the same type. The elements in each returned column are in arbitrary order.

The columns may contain duplicate elements. These are treated as distinct, so for example if COLUMNA contains 5 elements with value A and COLUMNB contains only 3 such elements, then the first column in the result will contain three A elements and the second column will contain two.
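The duplicate handling amounts to multiset intersection. The sketch below is a plain-Tcl model of that behaviour, not tarray's implementation, and the name intersect3_model is hypothetical. It keeps elements in input order, whereas the command's result order is arbitrary.

```tcl
# Hypothetical multiset model of column intersect3; not part of tarray.
proc intersect3_model {a b} {
    # Count how many times each value occurs in b
    set remaining [dict create]
    foreach v $b { dict incr remaining $v }
    set common {}
    set onlyA {}
    foreach v $a {
        # Each occurrence in b can be matched by at most one occurrence in a
        if {[dict exists $remaining $v] && [dict get $remaining $v] > 0} {
            lappend common $v
            dict incr remaining $v -1
        } else {
            lappend onlyA $v
        }
    }
    # Whatever is left unmatched in b is only present in b
    set onlyB {}
    dict for {v n} $remaining {
        if {$n > 0} { lappend onlyB {*}[lrepeat $n $v] }
    }
    return [list $common $onlyA $onlyB]
}

puts [intersect3_model {A A A A A} {A A A}]   ;# {A A A} {A A} {}
```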

Option -nocase only has effect if the column type is any or string. If specified, elements are compared in case-insensitive mode.

column linspace START STOP COUNT ?-type TYPE? ?-open BOOL?

Returns a column containing COUNT values evenly spaced between START and STOP. STOP may be less than START in which case returned values are in descending order. The -type option specifies the column type and defaults to double. If the -open option is specified as true, the interval is open and STOP is not included in the returned values. The default is false.

Note that the returned column always contains COUNT elements. For integral types, this means some values may be repeated if the difference between the interval ends is less than COUNT. Moreover, the values may not be exactly evenly spaced if the interval cannot be divided into COUNT integral divisions.
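For the default double type, the spacing can be modelled as follows. The sketch assumes a step of (STOP - START) / (COUNT - 1) for a closed interval and (STOP - START) / COUNT for an open one, which is consistent with the description above but not confirmed as tarray's exact arithmetic. The proc linspace_model is hypothetical.

```tcl
# Hypothetical model of column linspace for type double; not part of tarray.
proc linspace_model {start stop count {isOpen 0}} {
    # A closed interval includes STOP, so it has one fewer division
    set divisions [expr {$isOpen ? $count : $count - 1}]
    set result {}
    for {set i 0} {$i < $count} {incr i} {
        if {$divisions == 0} {
            lappend result [expr {double($start)}]
        } else {
            lappend result [expr {$start + $i * double($stop - $start) / $divisions}]
        }
    }
    return $result
}

puts [linspace_model 0 1 5]     ;# 0.0 0.25 0.5 0.75 1.0
puts [linspace_model 0 1 4 1]   ;# 0.0 0.25 0.5 0.75 (open: STOP excluded)
```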

column logspace START STOP COUNT ?-type TYPE? ?-open BOOL? ?-base BASE?

Returns a column containing COUNT values evenly spaced on a log scale between BASE**START and BASE**STOP. If unspecified, BASE defaults to 10. STOP may be less than START, in which case the returned values are in descending order. The -type option specifies the column type and defaults to double. If the -open option is specified as true, the interval is open and STOP is not included in the returned values. The default is false.
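Put differently, the exponents are spaced as in column linspace, and each value is BASE raised to its exponent. The following self-contained plain-Tcl model illustrates this under the same spacing assumption as a linear linspace, for type double only. The proc logspace_model is hypothetical.

```tcl
# Hypothetical model of column logspace for type double; not part of tarray.
proc logspace_model {start stop count {base 10} {isOpen 0}} {
    set divisions [expr {$isOpen ? $count : $count - 1}]
    set result {}
    for {set i 0} {$i < $count} {incr i} {
        # Exponents are evenly spaced, exactly as a linear linspace would be
        set exponent [expr {$divisions == 0 ? double($start) : $start + $i * double($stop - $start) / $divisions}]
        lappend result [expr {double($base) ** $exponent}]
    }
    return $result
}

puts [logspace_model 0 3 4]     ;# approximately 1, 10, 100, 1000
puts [logspace_model 0 4 5 2]   ;# approximately 1, 2, 4, 8, 16
```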

column lookup COLUMN ?LOOKUPKEY?

The command returns the index of an element in COLUMN that exactly matches LOOKUPKEY, or -1 if there is no such element. If LOOKUPKEY is not specified, the command builds an internal dictionary (see below) and returns an empty string.

COLUMN must be a column of type string. Unlike the column search command, the returned index is not necessarily that of the first occurrence in cases where LOOKUPKEY occurs multiple times in the column.

The command is usually much faster than column search because it is based on an internal dictionary that maps string values to their position indices in the column. This internal dictionary is either created when the command is called without the optional LOOKUPKEY argument, or is built in incremental fashion with each column lookup call.

In the current implementation, this dictionary is maintained in a loose or lazy manner and internally does not always reflect the actual content of the column. However, the return value of the command is always accurate.

column math OPERATION OPERAND ?OPERAND…?

Performs the specified mathematical operation OPERATION on the given operands. The possible operations are shown in Column math operators below.

The operands may be any combination of scalar numerical values and columns of appropriate types shown in the table. If multiple columns are specified, they may be of differing types. All columns must have the same number of elements.

If every operand is a scalar, the return value is also a scalar numerical value computed in similar (but not identical) fashion to the Tcl expr command.

For arithmetic operations, if at least one operand is a column, the return value is a column whose type is that of the "widest" operand. For example, if any column or scalar is a double, the resulting column will be of type double; for this purpose, type double is considered wider than type wide. Each element of the result column is computed by applying the specified operation to the corresponding elements of the operand columns. Any scalar operands are treated as columns of the appropriate type and size whose elements all equal that scalar value. For arithmetic operations, elements of boolean columns are treated as the integer values 0 and 1. If the result type is double, all computation is done by converting each operand (or element of an operand) to a double. Otherwise, all computation is done using 64-bit integers and the result is converted back to the result type. Columns of type any and string are not allowed for arithmetic operations.

For logical operations like && and relational (comparison) operations like ==, the returned column is always boolean. Columns of type any and string are not allowed for logical operations.

For relational operations, columns of any type are allowed. Operands are type-promoted for comparison as for arithmetic operations, except that any non-numeric operand results in string-based comparison.

Table 3. Column math operators

+    Adds all specified operands.
     Allowed column types: boolean, byte, int, uint, wide, double

-    Subtracts all remaining operands from the first operand. Note that the behaviour when a single operand is specified differs from that of the Tcl expr or tcl::mathop::- commands.
     Allowed column types: boolean, byte, int, uint, wide, double

*    Multiplies all specified operands.
     Allowed column types: boolean, byte, int, uint, wide, double

/    Successively divides the first operand by each subsequent operand. Note that the behaviour when a single operand is specified differs from that of the Tcl expr or tcl::mathop::/ commands.
     Allowed column types: boolean, byte, int, uint, wide, double

&    Performs a bitwise-and operation on all the operands.
     Allowed column types: boolean, byte, int, uint, wide

&&   Performs a logical-and operation on all the operands.
     Allowed column types: boolean, byte, int, uint, wide, double

|    Performs a bitwise-or operation on all the operands.
     Allowed column types: boolean, byte, int, uint, wide

||   Performs a logical-or operation on all the operands.
     Allowed column types: boolean, byte, int, uint, wide, double

^    Performs a bitwise-xor operation on all the operands.
     Allowed column types: boolean, byte, int, uint, wide

^^   Performs a logical-xor operation on all the operands.
     Allowed column types: boolean, byte, int, uint, wide, double

==   Compares each operand against the next for equality.
     Allowed column types: boolean, byte, int, uint, wide, double, any, string

!=   Compares each operand against the next for inequality. Unlike the other operators, this requires exactly two operands.
     Allowed column types: boolean, byte, int, uint, wide, double, any, string

<    Compares whether each operand is less than the next.
     Allowed column types: boolean, byte, int, uint, wide, double, any, string

<=   Compares whether each operand is less than or equal to the next.
     Allowed column types: boolean, byte, int, uint, wide, double, any, string

>    Compares whether each operand is greater than the next.
     Allowed column types: boolean, byte, int, uint, wide, double, any, string

>=   Compares whether each operand is greater than or equal to the next.
     Allowed column types: boolean, byte, int, uint, wide, double, any, string

**   Exponentiation. Raises a base number to a power. If multiple operands are specified, evaluation is right to left, as in Tcl's ** command.

Tip: The above operations may also be invoked directly as column + … instead of column math + ….
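The right-to-left rule for ** is the grouping that plain Tcl itself uses. The snippet below does not involve tarray and only shows how the two groupings differ. When several operands are given to column math **, the same grouping would presumably apply element by element.

```tcl
# Tcl's ** operator and the tcl::mathop::** command group from the right
puts [expr {2 ** 3 ** 2}]          ;# 512, i.e. 2 ** (3 ** 2)
puts [tcl::mathop::** 2 3 2]       ;# 512
# Left-to-right grouping would give a different answer
puts [expr {(2 ** 3) ** 2}]        ;# 64
```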

column minmax ?OPTIONS? COLUMN

Searches the specified column for the minimum and maximum values, returning them as a pair. If -indices is specified, their indices are returned instead of their values. In case either value occurs at multiple indices in the column, the lowest index is returned.

The option -range can be specified to limit the search to a subrange of the column. It takes a pair of indices, in one of the forms described in Indices, that inclusively specify the subrange. The second element of the pair may be omitted, in which case it defaults to the last element in the column.

The option -nocase may be specified to indicate case-insensitive comparisons. This is only effective if the column type is any or string and ignored for the others.

column ones COUNT ?TYPE?

Returns a column of size COUNT with all elements initialized to 1. TYPE defaults to int.

column place COLUMN VALUES INDICES

Returns a typed column with the specified values at the corresponding indices. VALUES may be a list of values or a column of the same type. The number of values in VALUES must not be less than the number of indices specified in INDICES. INDICES must be an index list in one of the forms described in Indices, and may extend the column if the conditions listed there are satisfied.

column put COLUMN VALUES ?FIRST?

Returns a typed column with the elements starting at index FIRST replaced by the corresponding elements of VALUES. VALUES may be a list of values or a typed column of the same type. The command may extend the array if necessary. If FIRST is not specified the elements are appended to the array. The command interprets end as the position after the last element in the array.

column random TYPE COUNT ?LOWERBOUND? ?UPPERBOUND?

Returns a new column of type TYPE with COUNT elements containing randomly generated values from a uniform distribution. For types boolean, byte, int, uint and wide the range of generated values corresponds to the entire domain range by default. For type double the values are generated in the range [0,1] by default. The optional LOWERBOUND and UPPERBOUND arguments may be supplied to modify the range from which values are sampled. These are ignored for TYPE boolean.

For use cases such as testing where you want the same reproducible “random” values to be produced, you can use the randseed command to set or reset the seed values used for random number generation.

column range ?OPTIONS? COLUMN LOW HIGH

Returns the values from a typed column for indices in a specified range. Any of the Standard options may be specified with this command.

column reverse COLUMN

Returns the typed column with the order of its elements reversed.

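column search ?-all? ?-inline? ?-bitmap? ?-range RANGE? ?-among INDICES? ?-not? ?-nocase? ?OPER? COLUMN VALUE

Searches the specified typed column for a value matching VALUE. By default, the search starts at index 0 and returns the index of the first match, or -1 if no matching value is found.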

Options -range and -among modify which elements of the column are examined. The -range option limits the search to the range specified by RANGE which either consists of two integer indices denoting the starting and ending elements of the range (inclusive), or a single integer index denoting the start of the range with the end defaulting to the last element of the column. The -among option specifies a list of indices to be examined. INDICES is an index list or index column. Indices are allowed to be specified multiple times in arbitrary order. Elements are examined and matches returned in that same order. Indices that fall outside the range (either explicitly specified through -range or defaulting to the entire column) are ignored. Thus if both -range and -among options are specified, only those positions that meet both criteria are examined.

The command normally returns the index of the first successful match. Note that this is not necessarily the lowest matching index, since -among may specify indices in any order. If the option -all is specified, the search does not stop at the first match but instead finds all matching elements and returns an integer column containing their indices. The option -bitmap implies -all, but in this case the command returns a boolean column with the bits corresponding to each matching index set to 1.

If the -inline option is specified, the command returns the matched value(s) instead of their indices.
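The way -range and -among combine can be modelled in plain Tcl. The sketch below is an illustration, not tarray's implementation. It covers only -all with the default -eq operator and integer indices (no end), and the proc name search_all_model is hypothetical.

```tcl
# Hypothetical model of column search -all with the default -eq operator,
# showing how -range and -among combine. Not part of tarray; integer indices only.
proc search_all_model {values value {among {}} {range {}}} {
    set low 0
    set high [expr {[llength $values] - 1}]
    if {[llength $range] >= 1} { set low [lindex $range 0] }
    if {[llength $range] >= 2} { set high [lindex $range 1] }
    # Without -among, positions are examined in increasing order
    if {[llength $among] == 0} {
        for {set i $low} {$i <= $high} {incr i} { lappend among $i }
    }
    set matches {}
    foreach i $among {
        # Positions outside the range are ignored
        if {$i < $low || $i > $high} continue
        if {[lindex $values $i] == $value} { lappend matches $i }
    }
    return $matches
}

set col {5 3 5 7 5}
puts [search_all_model $col 5]                 ;# 0 2 4
puts [search_all_model $col 5 {4 0 2}]         ;# 4 0 2 (first match is 4, not 0)
puts [search_all_model $col 5 {4 0 2} {1 3}]   ;# 2
```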

OPER specifies the comparison operator and must be one of those shown in Search comparison operators.

Table 4. Search comparison operators

Operator  Description
-eq       Matches when equal to VALUE (default).
-gt       Matches when greater than VALUE. Not valid for boolean type.
-lt       Matches when less than VALUE. Not valid for boolean type.
-pat      Matches VALUE using string match pattern rules. Only valid for types any and string.
-re       Matches VALUE as a regular expression. Only valid for type any.

The sense of the match can be inverted by specifying the -not option, so, for example, specifying -not -gt will match all elements that are less than or equal to VALUE. For case-insensitive matching, the -nocase option may be specified. This is ignored for all column types except any and string.

column series ?START? STOP ?STEP?

Returns a column with values from START (included) to STOP (excluded) in increments of STEP. START and STEP default to 0 and 1 respectively if unspecified. If STEP is negative, STOP must be less than START.

The type of the returned column may be int, wide or double depending on the operands. For example, a STEP of 1.0 would result in a column of type double, whereas a STEP of 1 would result in a column of type int or wide depending on the range of the operands.
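The argument defaulting and the excluded STOP can be modelled in plain Tcl for integer operands. The proc series_model is hypothetical and not part of tarray. Its rejection of a zero STEP is an assumption of this sketch, not documented behaviour.

```tcl
# Hypothetical model of column series for integer operands; not part of tarray.
proc series_model {args} {
    switch -- [llength $args] {
        1 { set start 0; set stop [lindex $args 0]; set step 1 }
        2 { lassign $args start stop; set step 1 }
        3 { lassign $args start stop step }
        default { error "wrong # args: should be \"series_model ?start? stop ?step?\"" }
    }
    # Assumption of this sketch: a zero step is rejected
    if {$step == 0} { error "step must not be zero" }
    set result {}
    # STOP is excluded; the loop condition flips for negative steps
    for {set v $start} {$step > 0 ? $v < $stop : $v > $stop} {set v [expr {$v + $step}]} {
        lappend result $v
    }
    return $result
}

puts [series_model 5]          ;# 0 1 2 3 4, matching column series 5 in Indices
puts [series_model 10 0 -3]    ;# 10 7 4 1
```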

column shuffle COLUMN

Returns a new column containing the elements of COLUMN in a new random order. Columns of type boolean are not supported.

column size COLUMN

Returns the number of elements in the typed column.

column sort ?-indices? ?-increasing? ?-decreasing? ?-nocase? ?-indirect TARGETCOLUMN? COLUMN

Returns a sorted typed column. COLUMN is the typed column to be sorted. The comparison is done in a column type-specific manner. Sorting is in increasing order by default; the -decreasing option may be specified to change this.

If the -indices option is specified, the command returns a typed array containing the indices of the sorted values instead of the values themselves.

If -nocase is specified, the comparisons are done in case-insensitive manner. This option is only applicable when the column type is any or string and is ignored otherwise.

Option -indirect may only be used when COLUMN is of type int. In this case, the elements of COLUMN are treated as indices into TARGETCOLUMN and are compared using the corresponding values from TARGETCOLUMN. This option is useful when sorting a column or multiple columns in a table using different criteria while keeping a stable order.
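The multi-key use case can be modelled with Tcl's own lsort, which is a stable merge sort. The sketch below is only an analogy for -indirect, not tarray's implementation, and the proc sort_indirect_model is hypothetical. It first sorts row indices by the secondary key and then stably re-sorts those indices by the primary key.

```tcl
# Hypothetical model of column sort -indirect; not part of tarray.
proc sort_indirect_model {indices target {type -ascii}} {
    # Pair each index with the TARGETCOLUMN value it refers to
    set pairs [lmap i $indices {list $i [lindex $target $i]}]
    # lsort is a stable merge sort, so equal keys keep their current order
    set sorted [lsort $type -index 1 $pairs]
    return [lmap p $sorted {lindex $p 0}]
}

# Order rows by primary, breaking ties by secondary
set primary   {2 1 2 1}
set secondary {d c b a}
# Pass 1: sort row indices by the secondary key
set order [sort_indirect_model {0 1 2 3} $secondary]
# Pass 2: stable sort of those indices by the primary key
set order [sort_indirect_model $order $primary -integer]
puts $order   ;# 3 1 2 0, i.e. rows (1 a) (1 c) (2 b) (2 d)
```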

column sum COLUMN

Returns the sum of all elements of the specified column, which must be of a numeric type. For integer types, the sum is calculated as a 64-bit integer even if the column has a smaller integer width. There is no detection of integer overflow.

column summarize ?options? COL

The command returns a column that, depending on the passed options, summarizes the contents of the passed column COL. The command expects COL to be of the form of the data column in the table returned by column categorize or column histogram with the -values option. This form is a column of type any, all elements of which are themselves columns, all of the same type.

The return value is then a column, of the same size as COL, each element of which is a value that summarizes the corresponding column element in COL. This summary value may be computed in several ways depending on the specified options.

  • If no options are specified or the -count option is specified, the value is the number of elements of the corresponding nested column of COL. The returned column is of type int.

  • If the -sum option is specified, the value is the sum of the elements of the corresponding nested column, which must be of a numeric type. The returned column is of type double if the nested columns are of type double, and of type wide for integer types.

  • If the -summarizer CMDPREFIX option is specified, the value is that returned by the command prefix CMDPREFIX which is called with two additional arguments, the index into COL and the corresponding nested column at that index. The returned column is of type any by default. The -summarytype TYPE option may be specified to change this.
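The three summary modes map onto simple per-group computations. The plain-Tcl analogy below uses a list of lists in place of a column of columns, so it does not call tarray. The summarizer proc largest is hypothetical.

```tcl
# Nested groups as a plain list of lists, standing in for a column of columns
set groups {{1 2 3} {4} {}}
# Like -count: number of elements in each nested group
puts [lmap g $groups {llength $g}]               ;# 3 1 0
# Like -sum: sum of each group (tcl::mathop::+ with no arguments returns 0)
puts [lmap g $groups {tcl::mathop::+ {*}$g}]     ;# 6 4 0
# Like -summarizer: a hypothetical command called with the index and the group
proc largest {i g} { expr {[llength $g] ? [tcl::mathfunc::max {*}$g] : ""} }
set i -1
puts [lmap g $groups {largest [incr i] $g}]      ;# 3 4 {}
```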

Usually, the table summarize command is more convenient to use in lieu of this command.

column type COLUMN

Returns the type of the typed column.

column values COLUMN

Returns all the elements of the column as a list.

column vdelete COLUMNVAR LOW HIGH

Deletes elements from the typed array column in variable COLUMNVAR. The new value is assigned back to the variable. The resulting value of the variable (which may differ because of traces) is returned as the result of the command. Indices are specified in any of the forms described in Indices and may contain duplicates. Out-of-range indices are ignored.

column vfill COLUMNVAR VALUE LOW HIGH

Sets the specified elements of the typed column in variable COLUMNVAR to VALUE. The new value is assigned back to the variable. The resulting value of the variable (which may differ because of traces) is returned as the result of the command.

See column fill for more information.

column vinject COLUMNVAR VALUES FIRST

Inserts VALUES, a list of values or a column of the same type as the column in variable COLUMNVAR, at the position FIRST and stores the result back in COLUMNVAR. If FIRST is end, the values are appended to the column. In all cases, the command may extend the array if necessary. The resulting value of the variable (which may differ because of traces) is returned as the result of the command.

column vinsert COLUMNVAR VALUE FIRST ?COUNT?

Inserts COUNT (default 1) elements with value VALUE at position FIRST in the column stored in variable COLUMNVAR. If FIRST is end, the values are appended to the column. The new value is assigned back to the variable. The resulting value of the variable (which may differ because of traces) is returned as the result of the command. In all cases, the command may extend the array if necessary.

column vplace COLUMNVAR VALUES INDICES

Modifies a typed column stored in the variable COLUMNVAR with the specified values at the corresponding indices. The new value is assigned back to the variable. The resulting value of the variable (which may differ because of traces) is returned as the result of the command.

See the command column place for other details.

column vput COLUMNVAR VALUES ?FIRST?

Modifies a variable COLUMNVAR containing a typed column. The elements of the column starting at index FIRST are replaced by the corresponding elements of VALUES. If FIRST is not specified the elements are appended to the array. The new value is assigned back to the variable. The resulting value of the variable (which may differ because of traces) is returned as the result of the command.

See the command column put for other details.

column vreverse COLUMNVAR

Reverses the order of elements in the typed column in variable COLUMNVAR. The new value is assigned back to the variable. The resulting value of the variable (which may differ because of traces) is returned as the result of the command.
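All of the v commands follow the same convention: they operate on a variable by name, assign the new value back, and return the variable's value as read back. The sketch below models that convention in plain Tcl using an ordinary list. The proc vreverse_model is hypothetical and is not how tarray implements column vreverse.

```tcl
# Hypothetical model of the "v" command convention; not part of tarray.
proc vreverse_model {varName} {
    # Operate on the caller's variable by name
    upvar 1 $varName col
    # Compute the new value and assign it back to the variable
    set col [lreverse $col]
    # Return the value as read back, so any changes made by write traces show up
    return $col
}

set c {1 2 3}
puts [vreverse_model c]   ;# 3 2 1
puts $c                   ;# 3 2 1
```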

column vshuffle COLUMNVAR

Shuffles the order of elements in the typed column in variable COLUMNVAR. The new value is assigned back to the variable. The resulting value of the variable (which may differ because of traces) is returned as the result of the command.

column vsort ?-increasing? ?-decreasing? ?-nocase? ?-indirect TARGETCOLUMN? COLUMNVAR

Sorts the typed column stored in the variable COLUMNVAR. The sorted column is assigned back to the variable and also returned as the command result. See the column sort command for a description of the options.

column width COLUMN ?FORMAT?

Returns the maximum width of the specified column in terms of the number of characters required to print in the given format. If FORMAT is not specified, it defaults to %s. If the column is empty, the command returns 0 irrespective of FORMAT.
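The width computation can be pictured as formatting every element and taking the longest result. The following plain-Tcl model assumes FORMAT is a format specifier accepted by Tcl's format command. The proc width_model is hypothetical.

```tcl
# Hypothetical model of column width; not part of tarray.
proc width_model {values {fmt %s}} {
    # An empty column yields 0 regardless of the format
    set width 0
    foreach v $values {
        set width [expr {max($width, [string length [format $fmt $v]])}]
    }
    return $width
}

puts [width_model {1 22 333}]          ;# 3
puts [width_model {1.5 2.25} %.3f]     ;# 5 ("1.500" and "2.250")
puts [width_model {}]                  ;# 0
```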

column zeroes COUNT ?TYPE?

Returns a column of size COUNT with all elements initialized to 0. TYPE defaults to int.