1. Introduction

The tarray extension implements typed arrays and associated commands column and table. This page provides reference documentation for commands related to typed tables. See Introduction for an overview and Programmer’s guide for a programming guide.

1.1. Installation and loading

Binary packages for some platforms are available from the SourceForge download area. See the build instructions for other platforms.

To install the extension, extract the files from the distribution to any directory that is included in your Tcl installation’s auto_path variable.

Once installed, the extension can be loaded with the standard Tcl package require command.

% package require tarray
→ 1.0.0
% namespace import tarray::table

1.2. Tables

A typed table is an ordered sequence of typed columns of equal size. It can be viewed as an array of records where the record fields happen to use column-wise storage. The corresponding table command operates on typed tables.

The columns in a table are defined with a name, type and order when the table is created. Commands that operate on tables allow columns to be specified using either the column name or its position.
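As a minimal sketch of the above, the following creates a small table and retrieves the same column by name and by position. The table contents are hypothetical sample data, and positions are assumed to be counted from 0 as with Tcl lists.

```tcl
package require tarray

# Hypothetical sample data: two columns, two rows.
set T [tarray::table create {Name string Salary uint} {
    {Sally 70000}
    {Tom   65000}
}]

# The Salary column addressed by name and by position
# (assumption: positions start at 0).
set byName [tarray::table column $T Salary]
set byPos  [tarray::table column $T 1]

puts [tarray::table cnames $T]
puts [tarray::table size $T]
```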

1.3. Types

All elements in a column must be of the type specified when the column is created. The following element types are available:

Table 1. Table column types

Keyword   Type
any       Any Tcl value
string    A string value
boolean   A boolean value
byte      Unsigned 8-bit integer
double    Floating point value
int       Signed 32-bit integer
uint      Unsigned 32-bit integer
wide      Signed 64-bit integer

The primary purpose of the type is to specify what values can be stored in that column. This impacts the compactness of the internal storage (really the primary purpose of the extension) as well as certain operations (like sort or search) invoked on the column.

The types any and string are similar in that they can hold any Tcl value. Both are treated as string values for purposes of comparisons and operators. The difference is that the former stores the value using the Tcl internal representation while the latter stores it as a string. The advantage of the former is that internal structure, like a dictionary, is preserved. The advantage of the latter is significantly more compact representation, particularly for smaller strings.

Attempts to store values in a column that are not valid for that column will result in an error being generated.
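The sketch below illustrates the type check using hypothetical data. The exact wording of the error message is not shown here because it may vary between versions.

```tcl
package require tarray

# "abc" is not a valid int, so creating the table is expected to fail.
if {[catch {tarray::table create {Id int Name string} {{abc Ann}}} msg]} {
    puts "rejected: $msg"
}

# The same value is acceptable when the column is declared as type any.
set T [tarray::table create {Id any Name string} {{abc Ann}}]
puts [tarray::table size $T]
```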

1.4. Indices

An index into a typed column or table can be specified as either an integer or the keyword end. As in Tcl’s list commands, end specifies the index of the last element in the column or table, or the index after it, depending on the command. Simple offsets from end are supported, for example end-2 or end+5.

Many commands also allow multiple indices to be specified. These may take one of two forms — a range which includes all indices between a lower and an upper bound (inclusive), and an index list which may be one of the following:

  • a Tcl list of integers

  • a column of any type other than boolean. The value of each element of the column is converted to an integer that is treated as an index.

  • a column of type boolean. Here the index of each bit in the boolean column that is set to 1 is treated as an index.

Note that keyword end can be used to specify a single index or as a range bound, but cannot be used in an index list.

When indices are specified that cause a column or table to be extended, they must include all indices beyond the current column or table size in any order but without any gaps. For example,

% set I [column series 5]
→ tarray_column int {0 1 2 3 4}
% column place $I {106 105 107 104} {6 5 7 4}    ;# (1)
→ tarray_column int {0 1 2 3 104 105 106 107}
% column place $I {106 107} {6 7}                ;# (2)
Ø tarray index 6 out of bounds.

(1) Ok: indices are not in order but there are no gaps
(2) Error: no value is specified for the non-existing index 5

2. Command reference

All commands are located in the tarray namespace.

2.1. Standard Options

Many commands take one or more of the standard options shown in Standard options below. The -list, -dict and -table options control the format of the returned values. The -columns option allows selection and ordering of specific columns from the table.

Table 2. Standard options

-columns COLUMNS

Selects a subset of columns from the table and their order. If this option is not specified, all columns in the table are selected as the target of the command and in the same order as in the table definition. If this option is specified, COLUMNS is a list of column indexes and names. For commands that retrieve data, like table get, only data from the specified columns is retrieved and in the column order specified in COLUMNS. For commands that modify data, only data in the specified columns is modified. The input data values are taken in the same order as specified in COLUMNS. Note that when a command invocation that causes a table to grow specifies the -columns option, all columns must be included in COLUMNS although they may be specified in any order depending on the order of the source data.

-dict

Specifies that the return value must be in the form of a dictionary keyed by the corresponding indices. The value of each key is a row, that is, a list each element of which is the value at that index in the corresponding column.

-list

Specifies that the return value must be in the form of a list of rows, one per index. Each row is itself a list, each element of which is the value at that index in the corresponding column.

-table

Specifies that the values are to be returned as a table. This is the default if neither -list nor -dict is specified.
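The following sketch shows how the standard options might be combined with table get. The table contents are hypothetical sample data.

```tcl
package require tarray

set T [tarray::table create {Id int Name string City string} {
    {1 Ann Oslo}
    {2 Bob Rome}
    {3 Cy  Lima}
}]

# Rows 0 and 2 as a table (the default), a list of rows, and a dictionary.
set asTable [tarray::table get $T {0 2}]
set rows    [tarray::table get -list $T {0 2}]
set byIndex [tarray::table get -dict $T {0 2}]

# Only two columns, in the order given by -columns rather than table order.
set partial [tarray::table get -list -columns {City Id} $T {0 2}]
```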

2.2. Commands

table ctype TABLE COLSPEC

Returns the type of a column in a table.

table cnames TABLE

Returns the list of column names for the table.

table column TABLE COLSPEC ?NEWCOL?

If argument NEWCOL is not present, the command returns the table column specified by COLSPEC which may be either the column name or its position. If NEWCOL is specified, it must be a column of the same type and length as the table column specified by COLSPEC. The command then returns TABLE with that table column replaced by NEWCOL.

table columns TABLE ?COLSPECS?

If argument COLSPECS is not present, the command returns a list containing all the columns in the specified table. If COLSPECS is specified, it must be a list of column names or positions. In this case the returned list only contains the corresponding columns.

table create DEFINITION ROWVALUES

Returns a table containing a sequence of columns. DEFINITION is a list of alternating column names and column types. A column name is an identifier for a column that can be used in lieu of a column index. The type for a column must be one of the valid types described in Types.

ROWVALUES is the initial content of the table, specified as a nested list with each sublist being a row whose elements are compatible with the corresponding column types in DEFINITION.

table create2 COLNAMES COLUMNS

Returns a table whose column names are specified by COLNAMES and contents are given by COLUMNS which must be a list of tarray columns.
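A minimal sketch of building a table from existing columns follows. It assumes the columns are created with the column create command of the same extension, which is documented separately. The data is hypothetical.

```tcl
package require tarray

# Two columns of equal length built independently (hypothetical data).
set ids   [tarray::column create int {1 2 3}]
set names [tarray::column create string {Ann Bob Cy}]

# Combine them into a table. The names are matched to the columns by position.
set T [tarray::table create2 {Id Name} [list $ids $names]]

puts [tarray::table definition $T]
```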

table csvexport ?options? OUTPUT TABLE

Writes out the contents of TABLE in CSV format to the Tcl channel or file specified by OUTPUT. If OUTPUT is a file that already exists, an error is raised unless either the -force or the -append option is specified. The -force option causes an existing file to be overwritten. The -append option specifies that the CSV data should be appended to the end of the existing file content. Neither option has any effect if OUTPUT is a channel.

The -header option may be used to write out a header row to the file. The option value should generally be a list of the same length as the number of columns in the table although that is not mandated.

The command accepts the options -encoding and -translation with the same semantics as for the Tcl fconfigure command.

Any additional options are passed on to the tclcsv::csv_write command and control the CSV dialect (separators, terminators, quoting etc.) of the generated output. Refer to the documentation for that command for available options.

table csvimport ?options? INPUT

Returns a table containing the data formatted as CSV from the source specified by INPUT which may be a Tcl channel or file. The data is read using the tclcsv package. If the CSV file includes a header, it is used to form the column names for the table with characters that are illegal in column names replaced by underscores. If the file does not have a header, column names of the form COL_N are generated.

The command accepts the options -encoding and -translation with the same semantics as for the Tcl fconfigure command.

If the -sniff switch is specified, the tclcsv::sniff command is used to guess the format of the CSV file.

Any additional options are passed on to the tclcsv::reader command. These allow specification of the CSV dialect (separators, terminators, quoting etc.) of input data. Refer to the documentation for that command for available options. Any options specified thus will override the values discovered via the -sniff option.
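As an illustration, the sketch below writes a table to a CSV file with table csvexport and reads it back with table csvimport. It assumes the tclcsv package is installed. The file name people.csv is hypothetical. The column types of the imported table are not shown because they depend on how the input is interpreted.

```tcl
package require tarray

set T [tarray::table create {Id int Name string} {{1 Ann} {2 Bob}}]

# Hypothetical output file in the current directory.
set path [file join [pwd] people.csv]

# Write a header row followed by the data, overwriting any existing file.
tarray::table csvexport -force -header [tarray::table cnames $T] $path $T

# Read the file back, letting tclcsv guess the CSV format.
set T2 [tarray::table csvimport -sniff $path]
puts [tarray::table cnames $T2]
```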

table definition TABLE

Returns the definition of the specified table in a form that can be passed to table create.

table dbimport resultset RESULTSET TABLEVAR

Appends the contents of a TDBC result set object RESULTSET to the tarray table stored in the variable TABLEVAR in the caller’s context. The result set column types must be compatible with the corresponding columns of the tarray table. In case of errors, the original table is unmodified.

table dbimport table DBCONN DBTABLE ?COLNAMES?

Returns a table containing the contents of the database table named DBTABLE from the TDBC connection object DBCONN. COLNAMES should be a list of columns from which data is to be returned. If unspecified, all columns are returned.

The names of the columns in the returned tarray table are as returned by the database query result set. However, when the table is empty, the query result set does not specify column names. In that case, the column names are as specified by the caller or if unspecified, those returned by the TDBC connection object (this may differ from the actual names in character case).

The database column types are mapped according to the following table.

Table 3. Mapping SQL types to tarray types

SQL type                          tarray type
int, smallint, integer            int
bigint                            wide
tinyint                           byte
float, decimal, numeric, double   double
bit                               boolean
Anything else                     any

Note in particular that the precise numeric types decimal and numeric are mapped to imprecise floating point values. If this is not desirable (for example, if mapping to type any is preferable), or if the above mapping is unsuitable for any other reason, use the table dbimport resultset RESULTSET TABLEVAR command instead.
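The sketch below shows one way the result set form might be used to control the column types. It assumes the tdbc::sqlite3 driver is available. The database table prices and its contents are hypothetical.

```tcl
package require tarray
package require tdbc::sqlite3

# Hypothetical in-memory database holding a single small table.
tdbc::sqlite3::connection create db :memory:
db allrows {CREATE TABLE prices (item VARCHAR(20), price DECIMAL(10,2))}
db allrows {INSERT INTO prices VALUES ('tea', '1.10')}

# Declare the target table up front so that price is stored as
# type any instead of being mapped to double.
set T [tarray::table create {item string price any} {}]

set stmt [db prepare {SELECT item, price FROM prices}]
set rs   [$stmt execute]

# Append the rows of the result set to the table in variable T.
tarray::table dbimport resultset $rs T

$rs close
$stmt close
db close
```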

table delete TABLE LOW HIGH

Returns a typed table with all specified rows deleted. The row indices are specified in any of the forms described in Indices.

table equal TABA TABB

Returns 1 if the specified tables have the same number of columns and the column equal command returns true for every corresponding pair of columns in the two tables. Note that the column types need not be the same. See the description of that command for details.

The command will raise an error if either argument is not a table.

Also see the related command table identical which applies a stricter definition of equality.

table fill ?-columns COLUMNS? TABLE ROWVALUE LOW HIGH

Returns a typed table with the specified rows set to ROWVALUE. Each element from the list ROWVALUE is assigned to the corresponding column of the table at the specified indices. Any additional elements in ROWVALUE are ignored. An error is generated if any element of ROWVALUE does not match the type of the corresponding column or if the ROWVALUE width differs from the table width. Indices are specified in any of the forms described in Indices. The size of the table will be extended if necessary. The index end refers to the last row of the table, so to append rows the index must be specified as end+1.

The standard option -columns may be specified to target specific columns of the table or to match the order of columns to the supplied data.
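A minimal sketch of table fill with hypothetical data follows. It uses the LOW HIGH range form shown in the command synopsis.

```tcl
package require tarray

set T [tarray::table create {Id int Name string} {{1 Ann} {2 Bob} {3 Cy}}]

# Overwrite rows 1 through 2 with the same row value.
set T2 [tarray::table fill $T {0 unknown} 1 2]

# Target only the Name column. ROWVALUE then holds a single element.
set T3 [tarray::table fill -columns {Name} $T {redacted} 0 end]

# T itself is unchanged because the command returns a new table value.
puts [tarray::table rows $T]
```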

table get ?OPTIONS? TABLE INDEXLIST

Returns the values from a table at the indices specified as an index list. Any of the standard options may be specified with this command.

table identical TABA TABB

Returns 1 if the specified tables have the same column names and the column identical command returns true for every corresponding pair of columns in the two tables. Note that the column types have to be the same. See the description of that command for details.

The command will raise an error if either argument is not a table.

Also see the related command table equal which applies a looser definition of equality.
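The difference between table equal and table identical can be sketched with two hypothetical tables that hold the same values in columns of different types. The results noted in the comments follow from the descriptions above.

```tcl
package require tarray

# Same column name and values, but different column types.
set A [tarray::table create {N int}  {{1} {2}}]
set B [tarray::table create {N wide} {{1} {2}}]

# Looser comparison: column types need not match, so this is expected to be true.
set loose [tarray::table equal $A $B]

# Stricter comparison: column types must match, so this is expected to be false.
set strict [tarray::table identical $A $B]

puts "equal: $loose identical: $strict"
```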

table index TABLE INDEX

Returns the value of the row at the position specified by INDEX.

table inject ?-columns COLUMNS? TABLE ROWVALUES FIRST

Inserts ROWVALUES, a list of rows or a table compatible with TABLE, at the position FIRST and returns the resulting table. If FIRST is end, the values are appended to the table. In all cases, the command may extend the table if necessary.

The standard option -columns may be specified to match the order of columns to the supplied data. Note that COLUMNS must include all columns in the table as the command would not know what values to use for the unspecified columns.

table insert ?-columns COLUMNS? TABLE ROWVALUE FIRST ?COUNT?

Inserts COUNT (default 1) rows with value ROWVALUE at position FIRST and returns the new table. In all cases, the command may extend the table if necessary.

The standard option -columns may be specified to match the order of columns to the supplied data. Note that COLUMNS must include all columns in the table as the command would not know what values to use for the unspecified columns.
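The sketch below contrasts table insert, which repeats a single row value, with table inject, which inserts a sequence of different rows. The data is hypothetical.

```tcl
package require tarray

set T [tarray::table create {Id int Name string} {{1 Ann} {2 Bob} {3 Cy}}]

# Insert two copies of the same row at position 1.
set T2 [tarray::table insert $T {0 placeholder} 1 2]

# Inject two different rows at the front of the table.
set T3 [tarray::table inject $T {{7 Gus} {8 Hal}} 0]

# Append one row whose values are supplied in Name, Id order.
set T4 [tarray::table inject -columns {Name Id} $T {{Ivy 9}} end]
```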

table join ?options? TABLE0 TABLE1

Returns a new table containing a subset of rows from the cross product of TABLE0 and TABLE1 that satisfy a condition that the value of a specified column in TABLE0 matches that of a specified column in TABLE1.

The -on option controls the columns of the two tables that are matched. The option value must be a list of one or two elements. If the list has a single element, it must be a column name that is present in both tables. If two elements are present, they must be the name of a column in TABLE0 and a column in TABLE1 respectively. If the -on option is not specified, the value defaults to the column name that is common to both tables. If there are multiple such column names, the one with the lowest index position in TABLE0 is used.

The columns being compared must be of the same type which must not be boolean.

If the -nocase option is specified, the column elements are compared in case-insensitive fashion. Otherwise, the comparison is case-sensitive. The option is ignored for numeric columns.

By default, the returned table will include all columns from both tables. If this is not required, the -t0cols and -t1cols options may be used to specify the columns to include. The option values are a list of column names from TABLE0 and TABLE1 respectively.

In case the two tables have column names in common, the returned table will add the suffix t1 to the names of the corresponding columns from TABLE1. The caller can choose a different suffix by specifying the -t1suffix option.
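A minimal sketch of a join on a shared column follows. Both tables and their contents are hypothetical.

```tcl
package require tarray

set emp  [tarray::table create {Id int Name string} {{1 Ann} {2 Bob} {3 Cy}}]
set dept [tarray::table create {Id int Dept string} {{1 Ops} {3 Lab}}]

# Match rows whose Id values are the same in both tables.
# Only Ids 1 and 3 appear in both, so two rows are expected.
set J [tarray::table join -on Id $emp $dept]

# Restrict the columns taken from the second table to Dept only.
set J2 [tarray::table join -on Id -t1cols {Dept} $emp $dept]

puts [tarray::table cnames $J]
puts [tarray::table cnames $J2]
```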

table place ?-columns COLUMNS? TABLE ROWVALUES INDICES

Returns a table with the specified values at the corresponding indices. ROWVALUES may be a list of row values or a compatible table. The number of rows in ROWVALUES must not be less than the number of indices specified in INDICES and the width of each row must be the same as the width of the table. INDICES must be an index list in one of the forms described in Indices and may extend the table if the conditions listed there are satisfied.

The standard option -columns may be specified to target specific columns of the table or to match the order of columns to the supplied data.

table put ?-columns COLUMNS? TABLE ROWVALUES ?FIRST?

Returns a table with the rows starting at index FIRST replaced by the corresponding elements of ROWVALUES. ROWVALUES may be a list of values or a table of the same type. The command may extend the table if necessary. If FIRST is not specified, the rows are appended.

The standard option -columns may be specified to target specific columns of the table or to match the order of columns to the supplied data.
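The sketch below contrasts table place, which writes rows at arbitrary indices, with table put, which writes consecutive rows starting at one index. The data is hypothetical.

```tcl
package require tarray

set T [tarray::table create {Id int Name string} {{1 Ann} {2 Bob} {3 Cy}}]

# Place: the first row value goes to index 2, the second to index 0.
set T2 [tarray::table place $T {{30 Cyd} {10 Anne}} {2 0}]

# Put: replace consecutive rows starting at index 1.
set T3 [tarray::table put $T {{20 Bo} {30 Cyd}} 1]

# Put without FIRST appends the rows.
set T4 [tarray::table put $T {{4 Dee}}]
```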

table range ?OPTIONS? TABLE LOW ?HIGH?

Returns all values from a table in the specified index range LOW to HIGH. Any of the standard options may be specified with this command.

table reverse TABLE

Returns the table with the order of its rows reversed.

table size TABLE

Returns the number of rows in the table.

table rows TABLE

Returns all the rows in the table as a nested list.

table slice TABLE COLUMNLIST

Returns a table containing only the specified columns from TABLE. The columns are specified by their positions or names as a list. A column must not be included more than once. The returned table contains columns in the same order as COLUMNLIST.
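As a short sketch with hypothetical data, the following extracts two columns in a different order from the table definition.

```tcl
package require tarray

set T [tarray::table create {Id int Name string City string} {
    {1 Ann Oslo}
    {2 Bob Rome}
}]

# Keep only City and Id, in that order.
set S [tarray::table slice $T {City Id}]

puts [tarray::table cnames $S]
puts [tarray::table width $S]
```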

table sort ?options? TABLE COLSPEC

Sorts the specified table based on the values of the column specified by COLSPEC. The options -increasing, -decreasing and -nocase control the sort order as described for the column sort command.

If the -indices option is specified, the command returns an integer column containing the indices of the table corresponding to the sorted elements.

If -indices is not specified, the return value of the command is the sorted table. The format and content of the returned table is controlled by the -columns, -table, -dict and -list options as described in Standard options.
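The sorting options can be sketched as follows, using hypothetical data.

```tcl
package require tarray

set T [tarray::table create {Name string Salary uint} {
    {Sally 70000}
    {Tom   65000}
    {Uma   82000}
}]

# The whole table sorted by Salary, highest first.
set sorted [tarray::table sort -decreasing $T Salary]

# Only the row indices in sorted order, as an integer column.
set order [tarray::table sort -indices $T Salary]

# Sorted by Salary but returning just the Name column as a list of rows.
set names [tarray::table sort -list -columns {Name} $T Salary]
```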

table summarize ?options? TABLE

The command computes an aggregation function for categorized data in TABLE which must be of the form returned by the column categorize or column histogram commands with the -values option. TABLE must contain at least two columns, one of which, the category label column, only serves as part of the table returned by the command. The other column, the data column on which aggregation is done, must be a column of type any, all elements of which are themselves columns of the same type, each containing the values belonging to that category. By default, the first table column is assumed to be the label column and the second is assumed to be the data column. The -labelcolumn and -datacolumn options may be used to specify different label and data columns.

The return value is a table with two columns, the first being the label column, unchanged. The second column, the summary column, named Summary by default, is the result of invoking an aggregation function on each nested column of values as described below. This column may be renamed through the -cname option to the command.

The aggregation function is specified by the following options:

  • By default, or if the -count option is specified, the aggregation function result is simply the number of elements of the corresponding nested column within the data column. The summary column is then a column of type int.

  • If the -sum option is specified, the aggregate function is the sum of the elements of the corresponding nested column (which must be of a numeric type). The type of the summary column is double if the nested columns were of that type or wide for integer types.

  • Finally, if the -summarizer CMDPREFIX option is specified, the summary column values are the values returned by the command prefix CMDPREFIX which is called with two additional arguments, the index into TABLE and the corresponding nested column at that index. The returned summary column is then of type any by default. The -summarytype TYPE option may be specified to change this to a different type.

See Summarizing categorized data for an example.

table vcolumn TABLEVAR COLSPEC ?NEWCOL?

Returns or sets a specified column in the table contained in the variable TABLEVAR. If argument NEWCOL is not present, the command returns the table column specified by COLSPEC which may be either the column name or its position. TABLEVAR is not modified.

If NEWCOL is specified, it must be a column of the same type and length as the table column specified by COLSPEC. The command then replaces that column in TABLEVAR with NEWCOL and returns the variable’s new value.

table vdelete TABLEVAR LOW HIGH

Deletes rows from the table in variable TABLEVAR. The new value is assigned back to the variable. The resulting value of the variable (which may differ because of traces) is returned as the result of the command. Indices are specified in any of the forms described in Indices.
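The sketch below contrasts the value-based table delete with the variable-based table vdelete. The data is hypothetical and the LOW HIGH range form from the command synopsis is used.

```tcl
package require tarray

set T [tarray::table create {Id int Name string} {{1 Ann} {2 Bob} {3 Cy}}]

# table delete takes a table value and returns a new one. T is unchanged.
set T2 [tarray::table delete $T 0 0]
set before [tarray::table size $T]

# table vdelete takes a variable name and modifies the table stored in it.
tarray::table vdelete T 0 0
set after [tarray::table size $T]

puts "before: $before after: $after"
```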

table vfill ?-columns COLUMNS? TABLEVAR ROWVALUE LOW HIGH

Sets the rows of the table in variable TABLEVAR at the specified indices to ROWVALUE. The new value is assigned back to the variable. The resulting value of the variable (which may differ because of traces) is returned as the result of the command.

The standard option -columns may be specified to target specific columns of the table or to match the order of columns to the supplied data.

See the table fill command for more information.

table vinject ?-columns COLUMNS? TABLEVAR ROWVALUES FIRST

Inserts ROWVALUES, a list of rows or a table compatible with the table in variable TABLEVAR, at the position FIRST and stores the result back in TABLEVAR. If FIRST is end, the values are appended to the table. In all cases, the command may extend the table if necessary. The resulting value of the variable (which may differ because of traces) is returned as the result of the command.

The standard option -columns may be specified to match the order of columns to the supplied data. Note that COLUMNS must include all columns in the table as the command would not know what values to use for the unspecified columns.

table vinsert ?-columns COLUMNS? TABLEVAR ROWVALUE FIRST ?COUNT?

Inserts COUNT rows (default 1) with value ROWVALUE at position FIRST in the table stored in variable TABLEVAR. The new value is assigned back to the variable. The resulting value of the variable (which may differ because of traces) is returned as the result of the command.

The standard option -columns may be specified to match the order of columns to the supplied data. Note that COLUMNS must include all columns in the table as the command would not know what values to use for the unspecified columns.

table vplace ?-columns COLUMNS? TABLEVAR ROWVALUES INDICES

Modifies a table stored in the variable TABLEVAR with the specified values at the corresponding indices. The new value is assigned back to the variable. The resulting value of the variable (which may differ because of traces) is returned as the result of the command.

The standard option -columns may be specified to target specific columns of the table or to match the order of columns to the supplied data.

See the command table place for other details.

table vput ?-columns COLUMNS? TABLEVAR ROWVALUES ?FIRST?

Modifies a table stored in variable TABLEVAR in the caller’s context. The rows of the table starting at index FIRST are replaced by the corresponding elements of ROWVALUES. If FIRST is not specified, the rows are appended to the table. The new value is assigned back to the variable. The resulting value of the variable (which may differ because of traces) is returned as the result of the command.

The standard option -columns may be specified to target specific columns of the table or to match the order of columns to the supplied data.

See the command table put for other details.

table vreverse TABLEVAR

Reverses the order of the rows of the table in variable TABLEVAR and stores the result back in the variable. The result of the command is the resulting value stored in the variable.

table width TABLE

Returns the number of columns in the table.