Copyright © 2016 Ashok P. Nadkarni. All rights reserved.

1. Introduction

The tarray extension implements typed arrays and associated commands column and table. This page provides reference documentation for commands related to typed columns. See the main contents for guides and other reference documentation.

1.1. Installation and loading

Binary packages for some platforms are available from the SourceForge download area. See the build instructions for other platforms.

To install the extension, extract the files from the distribution to any directory that is included in your Tcl installation’s auto_path variable.

Once installed, the extension can be loaded with the standard Tcl package require command.

% package require tarray
→ 0.8
% namespace import tarray::column

1.2. Columns

A typed column contains elements of a single type, such as int or string, that is specified when the column is created. The tarray::column command operates on typed columns and includes searching and sorting operations.

Related to columns are tables, which are ordered sequences of typed columns.

1.3. Types

All elements in a typed column must be of the type specified when the column is created. The following element types are available:

Table 1. Column types

Keyword   Type
any       Any Tcl value
string    A string value
boolean   A boolean value
byte      Unsigned 8-bit integer
double    Floating point value
int       Signed 32-bit integer
uint      Unsigned 32-bit integer
wide      Signed 64-bit integer

The primary purpose of the type is to specify what values can be stored in that column. This affects the compactness of the internal storage (really the primary purpose of the extension) as well as certain operations (like sort or search) invoked on the column.

The types any and string are similar in that they can hold any Tcl value. Both are treated as string values for purposes of comparisons and operators. The difference is that the former stores the value using the Tcl internal representation while the latter stores it as a string. The advantage of the former is that internal structure, like a dictionary, is preserved. The advantage of the latter is significantly more compact representation, particularly for smaller strings.

Attempts to store values in a column that are not valid for that column will result in an error being generated.
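As a brief illustration of how the element type constrains a column, the following sketch creates an int column and then attempts to create another from a list containing a non-numeric value. The variable names are illustrative only. The text of the error message is deliberately not shown since it may vary between versions.

```tcl
package require tarray
namespace import tarray::column

# An int column accepts values that are valid 32-bit integers.
set ages [column create int {31 45 27}]
set ages_type [column type $ages]

# A value that is not valid for the column type raises an error,
# which is trapped here with catch.
set failed [catch {column create int {31 forty-five 27}} message]
```

After running this, ages_type holds int and failed holds 1.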

1.4. Indices

An index into a typed column or table can be specified as either an integer or the keyword end. As in Tcl’s list commands, end specifies the index of the last element in the tarray or the index after it, depending on the command. Simple arithmetic adding of offsets to end is supported, for example end-2 or end+5.

Many commands also allow multiple indices to be specified. These may take one of two forms: a range, which includes all indices between a lower and an upper bound (inclusive), and an index list, which may be a list of integers or a column of type int. The latter form allows the indices returned by commands such as column search to be efficiently passed to other commands. When indices specified as a list cause an array to be extended, the index list must include all indices beyond the current array size, in any order but without any gaps. For example, if an array contains a thousand elements (the highest index therefore being 999), the index list 1001 1000 1002 is legal but 1001 1002 is not.
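The no-gaps rule for index lists can be modeled in plain Tcl. The helper below is hypothetical and is not part of tarray; it only mirrors the rule as described above so that the two example index lists can be checked. Tolerating duplicate indices is an assumption of this sketch.

```tcl
# Hypothetical helper, not a tarray command: returns 1 if the index list
# INDICES could extend a column of SIZE elements without leaving gaps.
proc extends_without_gaps {size indices} {
    # Keep only the indices that lie beyond the current last element.
    set beyond {}
    foreach index $indices {
        if {$index >= $size} {
            lappend beyond $index
        }
    }
    # In sorted order they must be exactly size, size+1, size+2, ...
    # (duplicates are collapsed first; that is an assumption of this sketch).
    set expected $size
    foreach index [lsort -integer -unique $beyond] {
        if {$index != $expected} {
            return 0
        }
        incr expected
    }
    return 1
}

set legal   [extends_without_gaps 1000 {1001 1000 1002}]
set illegal [extends_without_gaps 1000 {1001 1002}]
```

Here legal is 1 because indices 1000 through 1002 are all present, while illegal is 0 because index 1000 is missing.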

Note that keyword end can be used to specify a single index or as a range bound, but cannot be used in an index list.
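The three ways of specifying indices can be sketched with the commands documented later on this page. The column contents and variable names are illustrative.

```tcl
package require tarray
namespace import tarray::column

set c [column create int {10 20 30 40 50}]

# A single index, given as an integer or as end with an offset.
set first       [column index $c 0]
set next_to_last [column index $c end-1]

# A range, given as inclusive lower and upper bounds.
set middle [column range -list $c 1 3]

# An index list. Values are returned in the order the indices are given.
# The keyword end is not permitted inside an index list.
set picked [column get -list $c {4 0 2}]
```

With this data, first is 10, next_to_last is 40, middle contains 20 30 40 and picked contains 50 10 30.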

2. Command reference

All commands are located in the tarray namespace.

2.1. Standard Options

Commands returning values from columns support the standard options shown in Standard options.

Table 2. Standard options

-list

The values are returned as a Tcl list.

-dict

The values are returned as a dictionary keyed by the corresponding indices.

-column

The values are returned as a typed column. This is the default if none of the other options is specified.
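A short sketch of the three return formats, using column get as the example command. The variable names are illustrative.

```tcl
package require tarray
namespace import tarray::column

set c [column create int {10 20 30 40}]

# Same two elements, returned in each of the three formats.
set as_list   [column get -list $c {0 2}]
set as_dict   [column get -dict $c {0 2}]
set as_column [column get $c {0 2}]        ;# -column is the default

# The dictionary is keyed by index, so the value at index 2 is 30.
set value_at_2 [dict get $as_dict 2]
```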

2.2. Commands

column cast COLUMN COLTYPE

DEPRECATED. Returns a new column of type COLTYPE containing the elements of COLUMN cast to COLTYPE. Use column create instead.

column count ?-range RANGE? ?-among INDICES? ?-not? ?-nocase? ?OPER? COLUMN VALUE

Counts the number of elements in a column that match a value. See the column search command for a description of the various options. Note that if -among is specified and an index occurs multiple times in INDICES, it will be counted multiple times.

column create TYPE ?INITIALIZER? ?INITSIZE?

Returns a typed array of type TYPE, which must be one of the valid types described in Types. If INITIALIZER is specified, it provides the initial content of the typed array and can be either a column of any compatible type or a list containing elements of the appropriate type. For performance reasons, the INITSIZE argument may be specified to indicate how many slots to preallocate for the array. This is treated only as a hint, and the actual size allocated may differ.

column delete COLUMN LOW HIGH

Returns a typed column with all specified elements deleted. Indices are specified in any of the forms described in Indices and may contain duplicates. Out-of-range indices are ignored.

column fill COLUMN VALUE LOW HIGH

Returns a typed column with specified indices set to VALUE. Indices are specified in any of the forms described in Indices and must follow the rules described there. The index keyword end refers to the current last occupied position in the column. The size of the array will be extended if necessary provided the specified indices do not have gaps beyond the current column size.
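The sketch below applies column delete and column fill to a range. Both commands return a new column and leave the original value untouched. The data and variable names are illustrative.

```tcl
package require tarray
namespace import tarray::column

set c [column create int {10 20 30 40 50}]

# Delete the elements at indices 1 through 2.
set trimmed [column range -list [column delete $c 1 2] 0 end]

# Set the elements from index 3 through the last element to 0.
set zeroed [column range -list [column fill $c 0 3 end] 0 end]

# The original column is unchanged.
set original_size [column size $c]
```

With this data, trimmed contains 10 40 50, zeroed contains 10 20 30 0 0, and original_size is still 5.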

column get ?OPTIONS? COLUMN INDEXLIST

Returns the values from a typed column at the indices specified by the index list INDEXLIST. Any of the Standard options may be specified with this command.

column index COLUMN INDEX

Returns the value of the element at the position specified by INDEX which is a single index.

column inject COLUMN VALUES FIRST

Inserts VALUES, a list of values or a column of the same type as COLUMN, at the position FIRST and returns the resulting column. If FIRST is end, the values are appended to the column. In all cases, the command may extend the array if necessary.

column insert COLUMN VALUE FIRST COUNT

Inserts COUNT elements with value VALUE at position FIRST and returns the new column. In all cases, the command may extend the array if necessary.

column intersect3 ?-nocase? COLUMNA COLUMNB

Returns a list of three columns, the first containing elements common to both COLUMNA and COLUMNB, the second containing elements only present in COLUMNA and the third containing elements only present in COLUMNB. Both columns must be of the same type. The elements in each returned column are in arbitrary order.

The columns may contain duplicate elements. These are treated as distinct, so for example if COLUMNA contains 5 elements with value A and COLUMNB contains only 3 such elements, then the first column in the result will contain three A elements and the second column will contain two.

Option -nocase only has effect if the column type is any or string. If specified, elements are compared in case-insensitive mode.
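A minimal sketch of column intersect3 on two string columns without duplicates. Because the order of elements in each returned column is arbitrary, the results are sorted before use. The data and variable names are illustrative.

```tcl
package require tarray
namespace import tarray::column

set a [column create string {apple pear plum}]
set b [column create string {plum fig apple}]

# The result is a list of three columns.
lassign [column intersect3 $a $b] common only_a only_b

# Element order is arbitrary, so sort before comparing or displaying.
set in_both   [lsort [column range -list $common 0 end]]
set in_a_only [lsort [column range -list $only_a 0 end]]
set in_b_only [lsort [column range -list $only_b 0 end]]
```

With this data, in_both contains apple plum, in_a_only contains pear and in_b_only contains fig.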

column lookup COLUMN ?LOOKUPKEY?

Returns the index of an element in COLUMN that exactly matches LOOKUPKEY, or -1 if not found. If LOOKUPKEY is not specified, the command builds an internal dictionary (see below) and the return value is an empty string.

COLUMN must be a column of type string. Unlike the column search command, the returned index is not necessarily that of the first occurrence in cases where LOOKUPKEY occurs multiple times in the column.

The command is usually much faster than column search because it is based on an internal dictionary that maps string values to their position indices in the column. This internal dictionary is either created when the command is called without the optional LOOKUPKEY argument, or is built in incremental fashion with each column lookup call.

In the current implementation, this dictionary is maintained in a loose or lazy manner and internally does not always reflect the actual content of the column. However, the return value of the command is always accurate.
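The following sketch shows both calling forms of column lookup. The values are unique so that the returned index is unambiguous. The data and variable names are illustrative.

```tcl
package require tarray
namespace import tarray::column

set users [column create string {alice bob carol dave}]

# Without LOOKUPKEY the internal dictionary is built up front
# and an empty string is returned.
set built [column lookup $users]

# With LOOKUPKEY the index of a matching element is returned,
# or -1 if there is no match.
set position [column lookup $users carol]
set missing  [column lookup $users erin]
```

With this data, position is 2 and missing is -1.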

column math OPERATION OPERAND ?OPERAND…​?

Performs the specified mathematical operation OPERATION on the given operands. The possible operations are shown in Column math operators below.

The operands may be any combination of scalar numerical values and columns of appropriate types shown in the table. If multiple columns are specified, they may be of differing types except that boolean columns may only be combined with scalar values and other boolean columns and not with columns of other types.

All columns must have the same number of elements.

If any operand is a double or a column of type double, all computation is done by converting each operand (or element of an operand) to a double. Otherwise all computation is done using 64-bit integers and converted back to the result type.

If every operand is a scalar, the return value is also a scalar numerical value computed in similar (but not identical) fashion to the Tcl expr command.

If at least one operand is a column, the return value is a column whose type depends on the type of the "widest" operand. For example, if any column or scalar is a double, the resulting column will be of type double. For this purpose, the type double is considered wider than type wide. The value of each element of the result column is computed by invoking the specified operation on the corresponding elements of the operand columns. Any scalar operands specified are treated as columns of the appropriate type and size all of whose elements are equal to that scalar value.

Table 3. Column math operators
Operator  Allowed column types             Description

+         byte, int, uint, wide, double    Adds all specified operands.

-         byte, int, uint, wide, double    Subtracts all remaining operands from the first operand. Note the behaviour when a single operand is specified is different from the behaviour of the Tcl expr or tcl::mathop::- commands.

*         byte, int, uint, wide, double    Multiplies all specified operands.

/         byte, int, uint, wide, double    Successively divides the first operand by each subsequent operand. Note the behaviour when a single operand is specified is different from the behaviour of the Tcl expr or tcl::mathop::/ commands.

&         boolean, byte, int, uint, wide   Performs a bitwise-and operation on all the operands. This operation is not valid if any operand is a double or a column of type double.

|         boolean, byte, int, uint, wide   Performs a bitwise-or operation on all the operands. This operation is not valid if any operand is a double or a column of type double.

^         boolean, byte, int, uint, wide   Performs a bitwise-xor operation on all the operands. This operation is not valid if any operand is a double or a column of type double.

Tip: The above operations may also be invoked directly as column + … instead of column math + ….
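A short sketch of column math with column and scalar operands. The data and variable names are illustrative.

```tcl
package require tarray
namespace import tarray::column

set price    [column create int {10 20 30}]
set quantity [column create int {1 2 3}]

# Element-wise product of two columns of equal size.
set total [column range -list [column math * $price $quantity] 0 end]

# A scalar operand is applied to every element. This uses the
# shorthand form in which the operator is given directly.
set shifted [column range -list [column + $price 5] 0 end]

# A double operand makes the result column a double column.
set halved      [column math * $price 0.5]
set halved_type [column type $halved]
```

With this data, total contains 10 40 90, shifted contains 15 25 35, and halved_type is double.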

column minmax ?OPTIONS? COLUMN

Searches the specified column for the minimum and maximum values, returning them as a pair. If -indices is specified, their indices are returned instead of their values. In case either value occurs at multiple indices in the column, the lowest index is returned.

The option -range can be specified to limit the search to a subrange of the column. It takes a pair of indices, in one of the forms described in Indices, that inclusively specify the subrange. The second element of the pair may be omitted, in which case it defaults to the last element in the column.

The option -nocase may be specified to indicate case-insensitive comparisons. This is only effective if the column type is any or string and ignored for the others.
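The sketch below retrieves the minimum and maximum of a column, first as values and then as indices. The data contains repeated values to show that the lowest index is reported. Variable names are illustrative.

```tcl
package require tarray
namespace import tarray::column

set c [column create int {30 10 50 10 50}]

# Values of the smallest and largest elements.
lassign [column minmax $c] lowest highest

# Indices of those elements; ties resolve to the lowest index.
lassign [column minmax -indices $c] lowest_index highest_index

# Restrict the search to indices 2 through 4 (inclusive).
lassign [column minmax -range {2 4} $c] sub_lowest sub_highest
```

With this data, lowest and highest are 10 and 50, lowest_index and highest_index are 1 and 2, and the subrange also yields 10 and 50.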

column place COLUMN VALUES INDICES

Returns a typed column with the specified values at the corresponding indices. VALUES may be a list of values or a column of the same type. The number of values in VALUES must not be less than the number of indices specified in INDICES. INDICES must be an index list in one of the forms described in Indices and may extend the column if the conditions listed there are satisfied.

column put COLUMN VALUES ?FIRST?

Returns a typed column with the elements starting at index FIRST replaced by the corresponding elements of VALUES. VALUES may be a list of values or a typed column of the same type. The command may extend the array if necessary. If FIRST is not specified the elements are appended to the array. The command interprets end as the position after the last element in the array.
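A sketch of the three ways column put can be used: overwriting in place, appending, and overwriting past the end so that the column grows. The data and variable names are illustrative.

```tcl
package require tarray
namespace import tarray::column

set c [column create int {10 20 30 40}]

# Overwrite two elements starting at index 1.
set replaced [column range -list [column put $c {98 99} 1] 0 end]

# With FIRST omitted, the values are appended.
set appended [column range -list [column put $c {50 60}] 0 end]

# Starting at the last element with two values extends the column by one.
set grown [column range -list [column put $c {77 88} 3] 0 end]
```

With this data, replaced contains 10 98 99 40, appended contains 10 20 30 40 50 60, and grown contains 10 20 30 77 88.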

column range ?OPTIONS? COLUMN LOW HIGH

Returns the values from a typed column for indices in a specified range. Any of the Standard options may be specified with this command.

column reverse COLUMN

Returns the typed column with order of elements reversed.

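column search ?-range RANGE? ?-among INDICES? ?-all? ?-inline? ?-not? ?-nocase? ?OPER? COLUMN VALUE

Searches the specified typed column for a matching value. By default, the search starts at index 0 and returns the index of the first match, returning -1 if no matching value is found.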

Options -range and -among modify which elements of the column are examined. The -range option limits the search to the range specified by RANGE which either consists of two integer indices denoting the starting and ending elements of the range (inclusive), or a single integer index denoting the start of the range with the end defaulting to the last element of the column. The -among option specifies a list of indices to be examined. INDICES is an index list or index column. Indices are allowed to be specified multiple times in arbitrary order. Elements are examined and matches returned in that same order. Indices that fall outside the range (either explicitly specified through -range or defaulting to the entire column) are ignored. Thus if both -range and -among options are specified, only those positions that meet both criteria are examined.

The command normally returns the index of the first succeeding match. Note this is not necessarily the lowest matching index since -among may specify indices in any order. If -all is specified, the search does not stop at the first match but instead returns a list containing the indices of all matched elements. This may be an empty list if no elements matched.

If the -inline option is specified, the command returns the matched value(s) instead of their indices.

OPER specifies the comparison operator and must be one of those shown in Search comparison operators.

Table 4. Search comparison operators

-eq

Matches when equal to VALUE. This is the default.

-gt

Matches when greater than VALUE. Not valid for boolean type.

-lt

Matches when less than VALUE. Not valid for boolean type.

-pat

Matches VALUE using string match pattern rules. Only valid for types any and string.

-re

Matches VALUE as a regular expression. Only valid for type any.

The sense of the match can be inverted by specifying the -not option so for example specifying -not -gt will match all elements that are less than or equal to VALUE. For case insensitive matching, the -nocase option may be specified. This is ignored for all array types except types any and string.
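The following sketch combines column search with column get and column count. The result of a search with -all is passed straight to column get as an index list rather than being inspected directly, so the example does not depend on how that result is represented. The data and variable names are illustrative.

```tcl
package require tarray
namespace import tarray::column

set temps [column create int {18 25 31 25 12}]

# Index of the first element equal to 25 (-eq is the default operator).
set first_match [column search $temps 25]

# -1 is returned when nothing matches.
set no_match [column search $temps 99]

# Indices of all elements greater than 20, then the values at those indices.
set warm_indices [column search -all -gt $temps 20]
set warm_values  [column get -list $temps $warm_indices]

# Counting uses the same options as searching.
set warm_count [column count -gt $temps 20]
```

With this data, first_match is 1, no_match is -1, warm_values contains 25 31 25 and warm_count is 3.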

column series START STOP STEP

Returns a column with values between START (included) and STOP (excluded) incremented by STEP. START and STEP default to 0 and 1 respectively if unspecified. If STEP is less than 0, STOP must be less than START.

The type of the returned column may be int, wide or double depending on the operands. For example, a STEP of 1.0 would result in a column of type double whereas a STEP of 1 would return a int or wide depending on the range of operands.
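A sketch of column series using the three-argument form shown in the synopsis. The variable names are illustrative.

```tcl
package require tarray
namespace import tarray::column

# Increasing integer series: START is included, STOP is excluded.
set evens [column range -list [column series 0 10 2] 0 end]

# Decreasing series: STEP is negative and STOP is less than START.
set countdown [column range -list [column series 5 0 -1] 0 end]

# A floating point STEP produces a column of type double.
set fractions      [column series 0 1 0.25]
set fractions_type [column type $fractions]
set fractions_size [column size $fractions]
```

With these arguments, evens contains 0 2 4 6 8, countdown contains 5 4 3 2 1, and fractions is a double column of four elements.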

column size COLUMN

Returns the number of elements in the typed column.

column sort ?-indices? ?-increasing? ?-decreasing? ?-nocase? ?-indirect TARGETCOLUMN? COLUMN

Returns a sorted typed column. COLUMN is the typed column to be sorted. The comparison is done in a column type-specific manner. Elements are sorted in increasing order by default. The -decreasing option may be specified to change this.

If the -indices option is specified, the command returns a typed array containing the indices of the sorted values instead of the values themselves.

If -nocase is specified, the comparisons are done in case-insensitive manner. This option is only applicable when the column type is any or string and is ignored otherwise.

Option -indirect may only be used when COLUMN is of type int. In this case, the elements of COLUMN are treated as indices into TARGETCOLUMN and are compared using the corresponding values from TARGETCOLUMN. This option is useful when sorting a column or multiple columns in a table using different criteria while keeping a stable order.
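The sketch below shows the sorting options, including -indirect, where an int column of indices is ordered by the values those indices refer to in another column. The data and variable names are illustrative.

```tcl
package require tarray
namespace import tarray::column

set scores [column create int {30 10 20}]

# Sorted values, in both directions.
set ascending  [column range -list [column sort $scores] 0 end]
set descending [column range -list [column sort -decreasing $scores] 0 end]

# Indices of the elements in sorted order rather than the values.
set order [column range -list [column sort -indices $scores] 0 end]

# Indirect sort: rows holds indices into names, and is ordered by
# the names those indices refer to.
set names [column create string {carol alice bob}]
set rows  [column create int {0 1 2}]
set rows_by_name [column range -list [column sort -indirect $names $rows] 0 end]
```

With this data, ascending contains 10 20 30, descending contains 30 20 10, and both order and rows_by_name contain 1 2 0.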

column sum COLUMN

Returns the sum of all elements of the specified column, which must be of a numeric type. For integer types, the sum is calculated as a 64-bit integer even if the column has a smaller integer width. There is no detection of integer overflow.
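A small sketch showing that the sum of a byte column is not limited to the 8-bit range of its elements. The data and variable names are illustrative.

```tcl
package require tarray
namespace import tarray::column

# Each element fits in a byte, but the sum does not.
set readings [column create byte {200 200 200}]
set total    [column sum $readings]
```

Here total is 600, since the sum is accumulated as a 64-bit integer.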

column type COLUMN

Returns the type of the typed column.

column vdelete COLUMNVAR LOW HIGH

Deletes elements from the typed array column in variable COLUMNVAR. The new value is assigned back to the variable. The resulting value of the variable (which may differ because of traces) is returned as the result of the command. Indices are specified in any of the forms described in Indices and may contain duplicates. Out-of-range indices are ignored.

column vfill COLUMNVAR VALUE LOW HIGH

Sets the elements of the typed column in variable COLUMNVAR to VALUE. The new value is assigned back to the variable. The resulting value of the variable (which may differ because of traces) is returned as the result of the command.

See the command column fill for other details.

column vinject COLUMNVAR VALUES FIRST

Inserts VALUES, a list of values or a column of the same type as the column in variable COLUMNVAR, at the position FIRST and stores the result back in COLUMNVAR. If FIRST is end, the values are appended to the column. In all cases, the command may extend the array if necessary. The resulting value of the variable (which may differ because of traces) is returned as the result of the command.

column vinsert COLUMNVAR VALUE FIRST COUNT

Inserts COUNT elements with value VALUE at position FIRST in the column stored in variable COLUMNVAR. If FIRST is end, the values are appended to the column. The new value is assigned back to the variable. The resulting value of the variable (which may differ because of traces) is returned as the result of the command. In all cases, the command may extend the array if necessary.

column vplace COLUMNVAR VALUES INDICES

Modifies a typed column stored in the variable COLUMNVAR with the specified values at the corresponding indices. The new value is assigned back to the variable. The resulting value of the variable (which may differ because of traces) is returned as the result of the command.

See the command column place for other details.

column vput COLUMNVAR VALUES ?FIRST?

Modifies a variable COLUMNVAR containing a typed column. The elements of the column starting at index FIRST are replaced by the corresponding elements of VALUES. If FIRST is not specified the elements are appended to the array. The new value is assigned back to the variable. The resulting value of the variable (which may differ because of traces) is returned as the result of the command.

See the command column put for other details.

column vreverse COLUMNVAR

Reverses the order of elements in the typed column in variable COLUMNVAR. The new value is assigned back to the variable. The resulting value of the variable (which may differ because of traces) is returned as the result of the command.

column vsort ?-increasing? ?-decreasing? ?-nocase? ?-indirect TARGETCOLUMN? COLUMNVAR

Sorts a typed column stored in a variable. COLUMNVAR is the variable containing the typed column to be sorted. The sorted column is also returned as the command result. See the column sort command for a description of the options.
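The commands prefixed with v take the name of a variable rather than a column value and store the result back into that variable. The sketch below chains three of them on one variable. The data and variable names are illustrative.

```tcl
package require tarray
namespace import tarray::column

set queue [column create int {1 2 3}]

# Note that the variable name is passed, not its value.
column vput queue {4 5}      ;# FIRST omitted, so the values are appended
column vreverse queue        ;# reverses the column held in the variable
column vdelete queue 0 1     ;# deletes the elements at indices 0 through 1

set contents [column range -list $queue 0 end]
```

After these steps contents holds 3 2 1: the column becomes 1 2 3 4 5, then 5 4 3 2 1, and finally loses its first two elements.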