Copyright © 2016 Ashok P. Nadkarni. All rights reserved.

This document is a programmer’s guide for installing and using the tarray extension from Tcl. It does not list or detail every command implemented by the extension. See the command reference pages accessible from the Main Table of Contents for that information.

Important: Typed arrays can also be manipulated using Xtal, a language embeddable in Tcl that is geared towards typed arrays and vector operations. However, Xtal is for the most part not described in this guide. See The Xtal Language for details on its use.

1. Introduction

The extension implements two data types - columns and tables. The general term typed array is used to refer to either of these. A typed column is an array containing elements of a single type that is specified when the column is created. The command tarray::column can be used to create and manipulate typed columns.

A typed table is an ordered sequence of named columns of equal size. It can also be viewed as an array of records where the record fields happen to use column-wise storage. The corresponding tarray::table command operates on typed tables. Columns in a table can be referenced using either their name or their position in the ordered sequence.

The extension places its commands in the tarray namespace. The primary commands implemented by the extension are column and table, each being an ensemble of subcommands that operate on columns and tables respectively.

Other commands provide functionality like iteration and formatting that are independent of the data type.

2. Installation and loading

Binary packages for some platforms are available from the SourceForge download area. See the build instructions for other platforms.

To install the extension, extract the files from the distribution to any directory that is included in your Tcl installation’s auto_path variable.

Once installed, the extension can be loaded with the standard Tcl package require command.

The examples in this guide assume the commands have been imported into the calling namespace, as shown below, or are in its namespace path.

% package require tarray
→ 0.8
% namespace import tarray::column tarray::table tarray::print

If in addition you want to use the Xtal language, you need to load its package as well.

% package require xtal
→ 0.8
% namespace import xtal::xtal

3. Types

All elements in a typed column must be of the type specified when the column is created. The following element types are available:

Table 1. Table Column types

Keyword   Type
any       Any Tcl value
string    A string value
boolean   A boolean value
byte      Unsigned 8-bit integer
double    Floating point value
int       Signed 32-bit integer
uint      Unsigned 32-bit integer
wide      Signed 64-bit integer

The primary purpose of the type is to specify what values can be stored in that column. This impacts the compactness of the internal storage (really the primary purpose of the extension) as well as certain operations (like sort or search) invoked on the column.

The types any and string are similar in that they can hold any Tcl value. Both are treated as string values for purposes of comparisons and operators. The difference is that the former stores the value using the Tcl internal representation while the latter stores it as a string. The advantage of the former is that internal structure, like a dictionary, is preserved. The advantage of the latter is significantly more compact representation, particularly for smaller strings.

Attempts to store values in a column that are not valid for that column will result in an error being generated.

4. Indices

An index into a typed column or table can be specified as either an integer or the keyword end. As in Tcl’s list commands, end specifies the index of the last element in the tarray or the index after it, depending on the command. Simple arithmetic adding of offsets to end is supported, for example end-2 or end+5.

Many commands also allow multiple indices to be specified. These may take one of two forms: a range, which includes all indices between a lower and an upper bound (inclusive), and an index list, which may be a list of integers or a column of type int. The latter allows the indices returned by commands such as column search to be efficiently passed to other commands. When indices specified as a list cause an array to be extended, the index list must include all indices beyond the current array size, in any order but without any gaps. For example, if an array contains a thousand elements (the highest index thereby being 999), the index list 1001 1000 1002 is legal but 1001 1002 is not.

Note that keyword end can be used to specify a single index or as a range bound, but cannot be used in an index list.
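
The no-gaps rule for index lists can be modeled in plain Tcl. The following is a minimal sketch of the rule as described above, not the extension's implementation. The helper name index_list_extends_cleanly is hypothetical, and the sketch assumes non-negative integer indices and tolerates duplicates, which the text above does not address.

```tcl
# Hypothetical helper (not part of tarray): returns 1 if an index list
# could extend an array of the given size without leaving gaps, else 0.
proc index_list_extends_cleanly {size indices} {
    # Collect the indices that lie beyond the current last element.
    set beyond {}
    foreach i $indices {
        if {$i >= $size} { lappend beyond $i }
    }
    # In sorted order they must be exactly size, size+1, size+2, ...
    set expected $size
    foreach i [lsort -integer -unique $beyond] {
        if {$i != $expected} { return 0 }
        incr expected
    }
    return 1
}

puts [index_list_extends_cleanly 1000 {1001 1000 1002}]   ;# 1 (legal)
puts [index_list_extends_cleanly 1000 {1001 1002}]        ;# 0 (index 1000 missing)
```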

5. Creating columns and tables

The create subcommand creates columns and tables.

% column create int
→ tarray_column int {}

will create a typed column that can hold elements of the int type. Note that the command returns a value that would normally be assigned to a variable.

The column can be initialized at creation time.

% column create int {0 1 2 3}
→ tarray_column int {0 1 2 3}

creates a column and initializes the first four elements.

Warning: Applications should not depend on the string representation of a column or table as that is liable to change. Use only the tarray package commands to create and manipulate typed arrays.

The array will be grown as needed but as an optimization, preallocation may be requested.

% column create int {0 1 2 3} 1000
→ tarray_column int {0 1 2 3}

will request a preallocation of a thousand elements with the first four being initialized.

Alternatively, columns containing equally spaced values can be created with the series command.

% column series 10                  ;# (1)
→ tarray_column int {0 1 2 3 4 5 6 7 8 9}
% column series 5 -5 -2             ;# (2)
→ tarray_column int {5 3 1 -1 -3}
% column series 10.0                ;# (3)
→ tarray_column double {0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0}

(1) 0 (default) to 10 with step 1 (default)
(2) Decreasing from 5 to -5 with step -2
(3) Series of doubles instead of integers

Tables can be created and initialized in analogous fashion. For example, to create an initialized table:

% set tab [table create {
    country string population wide
} {
    {China 1350000000}
    {Vatican 850}
}]
→ tarray_table {country population} {{tarray_column string {China Vatican}} {ta...

6. Specifying indices

Most commands require specification of the array locations to be targeted. This specification can be

  • a single index,

  • a contiguous range of indices, or

  • a list of indices in arbitrary order, specified as a list of integers or a column of type int.

The various possibilities are illustrated below.

% set col [column create double {}]                           ;# (1)
→ tarray_column double {}
% set col [column fill $col 1.0 0 9]                          ;# (2)
→ tarray_column double {1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0}
% set col [column fill $col 2.0 3]                            ;# (3)
→ tarray_column double {1.0 1.0 1.0 2.0 1.0 1.0 1.0 1.0 1.0 1.0}
% set col [column fill $col 2.0 end-2 end]                    ;# (4)
→ tarray_column double {1.0 1.0 1.0 2.0 1.0 1.0 1.0 2.0 2.0 2.0}
% set col [column fill $col 3.0 {2 7}]                        ;# (5)
→ tarray_column double {1.0 1.0 3.0 2.0 1.0 1.0 1.0 3.0 2.0 2.0}
% set col [column fill $col 3.0 [column create int {2 7}]]    ;# (6)
→ tarray_column double {1.0 1.0 3.0 2.0 1.0 1.0 1.0 3.0 2.0 2.0}

(1) Creates a new column
(2) Indices specified as range 0 to 9
(3) Single index 3
(4) Range relative to end
(5) Indices specified as a list
(6) Indices specified as an int column

The last form, an integer column, is useful because some commands return indices in that form. For example, the following will replace all elements greater than 2.0 with 0.0.

% set col [column fill $col 0.0 [column search -all -gt $col 2.0]]
→ tarray_column double {1.0 1.0 0.0 2.0 1.0 1.0 1.0 0.0 2.0 2.0}

Although the above example used columns, table indices are specified in identical fashion.

7. Values and variables

Commands that modify typed arrays come in two flavors:

  • Commands that operate on column and table values and return the modified column or table as a result (for example fill), and

  • Commands that modify a Tcl variable containing the column or table (for example vfill).

The difference is similar to how different Tcl list commands behave, e.g. linsert and lreplace versus lset and lappend.

The examples above used the value-oriented form of the commands, where fill modifies a copy of the contents of the col variable and returns the modified copy, which is then stored back into col. For large typed arrays this is inefficient, and the above would be better written as

% set col [column create double {}]
→ tarray_column double {}
% column vfill col 1.0 0 9
→ tarray_column double {1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0}
% column vfill col 2.0 3
→ tarray_column double {1.0 1.0 1.0 2.0 1.0 1.0 1.0 1.0 1.0 1.0}
% column vfill col 3.0 {2 7}
→ tarray_column double {1.0 1.0 3.0 2.0 1.0 1.0 1.0 3.0 1.0 1.0}
% column vfill col 3.0 [column create int {2 7}]
→ tarray_column double {1.0 1.0 3.0 2.0 1.0 1.0 1.0 3.0 1.0 1.0}

Here the vfill command is directly modifying the variable and assuming the content is not shared, no copy needs to be made.

Almost every command that modifies a typed array has this dual equivalent.
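
The list analogy mentioned earlier in this section can be made concrete with core Tcl list commands. This sketch uses only plain Tcl (no tarray commands) to contrast a value-oriented command, linsert, with a variable-oriented one, lset. The variable names are illustrative.

```tcl
set original {a b c}

# Value-oriented: linsert takes a list value and returns a new list.
set modified [linsert $original 1 X]
puts $original   ;# a b c  (unchanged)
puts $modified   ;# a X b c

# Variable-oriented: lset takes a variable name and updates that variable.
lset original 1 Y
puts $original   ;# a Y c
```

The fill and vfill pair follows the same pattern: the former takes a value and returns a result, the latter takes the name of a variable.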

8. Storing data

Modifying a typed array may involve either storing a single value at multiple target locations or a different value at each target location. Further, the locations may be a contiguous range or a noncontiguous set of indices.

  • The fill and vfill commands store a single value at one or more locations, either contiguous or noncontiguous.

  • The place and vplace commands store each value from a sequence of values at one or more noncontiguous locations in a specified order (not necessarily sequential).

  • The put and vput commands store each value from a sequence of values in contiguous locations starting at a specified index.

The sequence of values to be stored may be specified as a Tcl list or a typed array. When multiple noncontiguous target locations are specified, they may be specified as a Tcl list of integers or an int column.

% column vplace col {300.0 200.0 500.0} {3 2 5}    ;# (1)
→ tarray_column double {1.0 1.0 200.0 300.0 1.0 500.0 1.0 3.0 1.0 1.0}
% column vput col {7.0 8.0 9.0} 6                  ;# (2)
→ tarray_column double {1.0 1.0 200.0 300.0 1.0 500.0 7.0 8.0 9.0 1.0}
% column vput col {11.0 12.0}                      ;# (3)
→ tarray_column double {1.0 1.0 200.0 300.0 1.0 500.0 7.0 8.0 9.0 1.0 11.0 12.0}

(1) Stores specified values at indices 3, 2 and 5
(2) Stores specified values at indices 6, 7, 8
(3) Appends specified values

Instead of specifying values as a list, they may also be specified as a column of the same type.

% set colA [column create double {1.0 2.0 4.0}]
→ tarray_column double {1.0 2.0 4.0}
% set colB [column create double {}]
→ tarray_column double {}
% column vplace colB $colA {1 0 2}
→ tarray_column double {2.0 1.0 4.0}

Again, this form is particularly useful when storing columns returned from commands into another column.

The table command has equivalent commands. For example

% set populations [table create {country string population wide} {}]
→ tarray_table {country population} {{tarray_column string {}} {tarray_column w...
% table vput populations {{China 1350000000} {Vatican 850}}
→ tarray_table {country population} {{tarray_column string {China Vatican}} {ta...

Note that just like in the case of columns, the list of values can be specified as a table instead, provided the column types are the same.

When storing data, the -columns option comes in handy for two different purposes. First, it allows data to be specified in a different order than that specified in the column definition. For example, in the above example, if the order of the supplied data was population followed by country, the command could have been written as follows:

% table vput -columns {population country} populations {{1350000000 China} {850 \
    Vatican}}
→ tarray_table {country population} {{tarray_column string {China Vatican China...

There is no need to reorder the fields in the input data.

Second, the -columns option allows modification of a subset of the columns. For example,

% set populations [table create {country string population wide} {}]
→ tarray_table {country population} {{tarray_column string {}} {tarray_column w...
% table vput populations {{China 1350000000} {Vatican 850}}
→ tarray_table {country population} {{tarray_column string {China Vatican}} {ta...
% table vfill -columns {population} populations {851} 1
→ tarray_table {country population} {{tarray_column string {China Vatican}} {ta...

The population of the second table row is changed to 851.

Extending columns and tables

A point to be noted about all the above commands is that they may extend the size of the array if necessary. However, two conditions must apply for this:

  • First, where an index list or index column is specified, there must not be any gaps in the indices that extend the array.

  • Second, if the -columns option is specified, it must include all columns (in any order) of the table. Otherwise, the command will not know what value to use for the other columns when extending the table.

As an illustration,

% set populations [table create {country string population wide} {{Vatican 850} \
    {China 1350000000}}]
→ tarray_table {country population} {{tarray_column string {Vatican China}} {ta...
% table place $populations {{Vatican 860} {India 1250000000} {USA 314000000}} {0 3 2}
% table place $populations {{Vatican 860} {India 1250000000} {USA 314000000}} {1 3 4}

The first place will succeed, changing the existing value at index 0 and extending the table by two rows (note that the order of indices does not matter). The second place will raise an error since index 2 neither exists nor is supplied in the command.

All the commands discussed to this point overwrite existing values at the target locations. The column insert and column vinsert commands and their table equivalents, table insert and table vinsert, store a single repeated value or row into the typed array, pushing existing elements further up. Similarly, the column inject and column vinject commands and their table equivalents, table inject and table vinject, insert multiple values or rows (passed as a list or a typed array).

% column insert $col 3.0 2 10
→ tarray_column double {1.0 1.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 200.0 3...
% column inject $col {1.0 2.0 3.0} 2
→ tarray_column double {1.0 1.0 1.0 2.0 3.0 200.0 300.0 1.0 500.0 7.0 8.0 9.0 1...
% column inject $col $col 2
→ tarray_column double {1.0 1.0 1.0 1.0 200.0 300.0 1.0 500.0 7.0 8.0 9.0 1.0 1...

The first command returns a new column with the same value, 3.0, inserted 10 times at index 2. The second command returns a new column with all values in the passed list, 1.0, 2.0, 3.0, inserted at index 2. The last command returns a new column where all existing values in the column are reinserted at index 2.
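
For readers familiar with Tcl lists, the difference between insert and inject resembles two ways of calling linsert. The sketch below uses only core list commands as a loose analogy. It does not call tarray and says nothing about how the extension is implemented.

```tcl
set items {a b c}

# Like insert: one value repeated several times at a position.
set repeated [linsert $items 1 {*}[lrepeat 3 X]]
puts $repeated   ;# a X X X b c

# Like inject: a sequence of distinct values at a position.
set injected [linsert $items 1 {*}{P Q}]
puts $injected   ;# a P Q b c
```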

9. Deleting data

Elements in a typed array can be deleted with the delete and vdelete commands. Succeeding elements are moved up to occupy the deleted slots. As with the fill command, the indices of the elements to be deleted may be specified as a single index, a range, a list of indices or an index column.

% column vdelete col [column search -all -lt $col 0]
→ tarray_column double {1.0 1.0 200.0 300.0 1.0 500.0 7.0 8.0 9.0 1.0 11.0 12.0}

will delete all negative elements from the column.

10. Retrieving data

Retrieving data from a typed array involves specifying which elements to retrieve and what format to retrieve them in when multiple elements are retrieved.

As usual, the elements to be retrieved can be specified as a single index, an index range, a list of indices or an index column. In the simplest case, the index command can be used to retrieve a single element.

% column index $col 4
→ 1.0
% table index $tab end
→ Vatican 850

Multiple elements can be retrieved with the get and range commands. The get command can be passed a sequence of noncontiguous indices specified as a Tcl list or an int column:

% column get $col {10 7 4}
→ tarray_column double {11.0 8.0 1.0}
% table get $tab [column search -all -lt [table column $tab 0] 0]
→ tarray_table {country population} {{tarray_column string {}} {tarray_column w...

The range command retrieves elements in a specified index range.

% column range $col 3 5
→ tarray_column double {300.0 1.0 500.0}
% table range $tab 0 10
→ tarray_table {country population} {{tarray_column string {China Vatican}} {ta...

By default, both commands return values as a typed array. The -list and -dict options can be used to return the values as a Tcl list or dictionary instead. In the latter case, the dictionary keys are the indices being retrieved.

% column get -list $col {3 5}
→ 300.0 500.0
% table get -dict $tab [column search -all -lt [table column $tab 0] 0]

In the case of tables, both commands also provide for retrieval of a subset of columns and in a different order than in the definition.

% table range -columns {population country} $populations 0 end
→ tarray_table {population country} {{tarray_column wide {850 1350000000}} {tar...
% table range -columns {1} $populations 0 end
→ tarray_table {population} {{tarray_column wide {850 1350000000}}}

Note columns may be specified either by position or name.

Tables provide additional commands for retrieving entire columns.

  • table column returns a column from a table. This is useful for sorting and searching columns as shown in table search examples below.

  • table slice returns a new table containing a subset of the columns of a table.

11. Searching and filtering

The column search command works similarly to Tcl’s lsearch. It returns the indices (by default) or the values (with the -inline option) of matching elements in a column. Like lsearch, column search stops on the first match and returns the matching index or value but the -all option can be used to return all matches. The command supports several matching operators. See the column search command reference for a full list.

% column search $col 0
→ -1

returns the index of the first element that is 0 using the default matching operator that tests for equality (assumes col is a numeric column).

% column search -inline -gt $col 0
→ 1.0

returns the value of the first positive element.

% column search -all -gt $col 0
→ tarray_column int {0 1 2 3 4 5 6 7 8 9 10 11}

returns the indices of all positive elements. The return value is an int column.

% set exes [column create string {tclsh.exe tclsh.man wish.exe}]
→ tarray_column string {tclsh.exe tclsh.man wish.exe}
% column search -all -inline -nocase -pat $exes *.exe
→ tarray_column string {tclsh.exe wish.exe}

returns the values of all elements that match *.exe using case-insensitive matching as in Tcl’s string match -nocase.

The search can be restricted to only look at specific elements using a combination of -range and -among options.

% column search -range {0 9} $col 0
→ -1

limits the search to the first ten elements.

% column search -among {1 5 3} $col 0
→ -1

only examines the elements at positions 1, 5, and 3 in that order. The option -among is particularly useful in combining searches as in the table example below.

To search tables, use column search on individual columns. For example,

% set countries [table create {country string population wide area double} {
    {Vatican 850 0.44}
    {China 1350000000 9.55e6}
    {USA 314000000 9.63e6}
    {India 1250000000 3.3e6}
    {Russia 141930000 17e6} }]
→ tarray_table {country population area} {{tarray_column string {Vatican China ...
% set pop_col [table column $countries 1]          ;# (1)
→ tarray_column wide {850 1350000000 314000000 1250000000 141930000}
% set area_col [table column $countries area]      ;# (2)
→ tarray_column double {0.44 9550000.0 9630000.0 3300000.0 17000000.0}
% table get -list -columns {country} $countries [column search -all -among [column \
    search -all -gt $pop_col 250000000] -gt $area_col 5e6]
→ China USA

(1) Column specified by position
(2) Column specified by name

returns names of countries that are populous and large in area. Note how the outside search is limited to specific indices using the -among option.
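
The logic of the chained search can be modeled with plain Tcl lists. The sketch below is an analogy only: the among_gt helper is hypothetical and merely mimics what -among combined with -gt is described as doing above, applied to the same data.

```tcl
set names {Vatican China USA India Russia}
set pops  {850 1350000000 314000000 1250000000 141930000}
set areas {0.44 9.55e6 9.63e6 3.3e6 17e6}

# Hypothetical helper: of the candidate indices, keep those whose
# value in the list is greater than the threshold.
proc among_gt {values candidates threshold} {
    set hits {}
    foreach i $candidates {
        if {[lindex $values $i] > $threshold} { lappend hits $i }
    }
    return $hits
}

# The inner search examines every index; the outer one only the survivors.
set all_indices {0 1 2 3 4}
set populous [among_gt $pops $all_indices 250000000]
set both     [among_gt $areas $populous 5e6]
puts [lmap i $both {lindex $names $i}]   ;# China USA
```

Restricting the second pass to the survivors of the first means the area comparison runs on three elements rather than five, which is the point of -among in larger tables.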

The column intersect3 command offers another way to search across multiple columns as described later.

Tip

For more complex queries, it is more convenient to use the Xtal extension instead of some combination of search and intersect3. For example, to find all countries with a population more than a billion in less than 5 million sq.km,

% xtal::xtal { countries.country[countries.population > 1000000000 && \
    countries.area < 5000000]}
→ tarray_column string {India}

12. Sorting and ordering

Columns can be sorted using the column sort command or its variable targeting analogue column vsort. The commands take the -increasing and -decreasing options to determine the sort order.

The column sort command also takes the -indices option which results in the indices being returned instead of the values themselves. This is useful for sorting tables based on a column. For example, assuming variable countries has been initialized as above,

% table get -list $countries [column sort -indices -nocase [table column \
    $countries 0]]
→ {China 1350000000 9550000.0} {India 1250000000 3300000.0} {Russia 141930000 1...

returns rows in the sorted order based on country name.

Sort stability

When sorting tables, for display purposes for example, it is often necessary to display elements that have the same value in the sort column in the same order that they were previously displayed. Although individual column sorts are stable, this is not enough when sorting across multiple columns. In such cases, the -indirect option to the sort command provides a solution. Using this option allows sorting where the "initial" ordering of elements is different from the actual order of elements in the column. An example will clarify this.

Consider a table that stores heights and weights.

% set tab [table create {name string height int weight int} {
    {Jeff 180 80}
    {John 175 80}
    {Jim 170 75} }]
→ tarray_table {name height weight} {{tarray_column string {Jeff John Jim}} {ta...

The user may choose to sort the table by height which boils down to the following code:

% table get -list $tab [column sort -indices [table column $tab height]]
→ {Jim 170 75} {John 175 80} {Jeff 180 80}

This results in the table being displayed in the order Jim, John, Jeff. The user may then choose to sort by weight.

% table get -list $tab [column sort -indices [table column $tab weight]]
→ {Jim 170 75} {Jeff 180 80} {John 175 80}

resulting in a display in order Jim, Jeff, John. Since they actually have the same value in the new sort column, this interchange of positions between Jeff and John is disconcerting to the user. Use of the -indirect option overcomes this problem.

% set indices [column sort -indices [table column $tab height]]
→ tarray_column int {2 1 0}
% table get -list $tab $indices
→ {Jim 170 75} {John 175 80} {Jeff 180 80}

Now use the previous order of indices to order elements when their values in the weight column are equal:

% table get -list $tab [column sort -indirect [table column $tab weight] $indices]
→ {Jim 170 75} {John 175 80} {Jeff 180 80}

In this last statement, the sort is done indirectly using values from the weight column of the table, but the positioning of elements whose values compare equal is based on their order in indices, that is, the order in which they were previously displayed.
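
The effect can be reproduced with plain Tcl lists, since lsort is also a stable sort. This is a sketch of the idea only, not of how -indirect is implemented. The by_weight helper is hypothetical.

```tcl
set names   {Jeff John Jim}
set heights {180 175 170}
set weights {80 80 75}

# Display order after sorting by height, as indices into the lists.
set by_height [lsort -indices -integer $heights]
puts $by_height   ;# 2 1 0

# Sorting the weights directly loses that order for equal weights.
set direct [lsort -indices -integer $weights]
puts [lmap i $direct {lindex $names $i}]   ;# Jim Jeff John

# Hypothetical comparison: order two indices by the weights they refer to.
proc by_weight {weights a b} {
    expr {[lindex $weights $a] - [lindex $weights $b]}
}

# Sort the previous index order instead; ties keep their earlier order.
set indirect [lsort -command [list by_weight $weights] $by_height]
puts [lmap i $indirect {lindex $names $i}]   ;# Jim John Jeff
```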

Reversing element order

Another form of reordering data is reversing the order of elements. Both columns and tables support reverse and vreverse commands which reverse the order of elements, an operation that is useful in many algorithms.

% print [table column $tab name]
→ Jeff
  John
  Jim
% print [column reverse [table column $tab name]]
→ Jim
  John
  Jeff

13. Arithmetic operations

The column math command can be used to perform arithmetic operations on columns on a per-element basis. The command takes multiple arguments, each of which may be a column or a scalar numeric value. For example,

% set I [column create int {10 20 30}]
→ tarray_column int {10 20 30}
% set J [column create double {1.1 2.2 3.3}]
→ tarray_column double {1.1 2.2 3.3}
% column math + $I $J 1000
→ tarray_column double {1011.1 1022.2 1033.3}

As a convenience, the above command can also be issued as

% column + $I $J 1000
→ tarray_column double {1011.1 1022.2 1033.3}

See the description of column math for all the available operators.

In contrast to arithmetic commands that operate on a per-element basis, some commands operate on the entire column.

The column sum command sums all the elements in a column.

% column sum $J
→ 6.6
% column sum [table column $populations population]
→ 1350000850

The column minmax command returns a pair containing the minimum and maximum values in a column.

% column minmax [table column $populations population]
→ 850 1350000000

Note that this command is not restricted to numeric columns and will work for other types as well. Also, it has the useful -indices option which returns the indices of the minimum and maximum values instead of the values themselves.

% set indices [column minmax -indices [table column $populations population]]
→ 0 1
% column get -list [table column $populations country] $indices
→ Vatican China

14. Counting elements

The column size and table size commands return the number of elements in a column or table.

% table size $populations
→ 2

If you are only interested in the count for elements that match specific criteria, you can use the column count command instead. Thus

% column count -gt [table column $populations population] 1000000000
→ 1

returns the number of countries with more than a billion people.

Tip: Just as for searches, for more complex criteria, it is more convenient to use an Xtal query instead.

15. Formatting

The Tcl puts command is not always suitable for printing the value of a column or table: the output is not formatted and is hence difficult to read. The print command provides an alternative that outputs a more readable format.

% print [table column $countries population] -head 1 -tail 1
→ 850, ..., 141930000
% print $countries
→ +-------+----------+----------+
  |country|population|      area|
  +-------+----------+----------+
  |Vatican|       850|      0.44|
  +-------+----------+----------+
...Additional lines omitted...

By default the command only prints the first few and last few elements although this can be controlled by various options.

The prettify command is another alternative which returns the formatted string instead of printing it to a channel.

16. Introspection

Column introspection

The type of a column can be retrieved with the column type command.

% column type [table column $countries population]
→ wide

Table introspection

The table cnames command returns a list containing the names of the columns in a table.

% table cnames $countries
→ country population area

If the full table definition is desired, it can be retrieved with table definition. The returned string is in a form that can be used with table create.

% table definition $countries
→ country string population wide area double

17. Other commands

The column lookup command provides for a faster, dictionary based retrieval for string columns. This may be beneficial for columns used as keys in a table.

The column intersect3 command returns the intersection of two columns along with the differences between them. The multiple column search example above could also have been written as follows:

% set pop_col [table column $countries population]
→ tarray_column wide {850 1350000000 314000000 1250000000 141930000}
% set area_col [table column $countries area]
→ tarray_column double {0.44 9550000.0 9630000.0 3300000.0 17000000.0}
% set populous [column search -all -gt $pop_col 250000000]
→ tarray_column int {1 2 3}
% set large [column search -all -gt $area_col 5e6]
→ tarray_column int {1 2 4}
% table get -list $countries [lindex [column intersect3 $populous $large] 0]
→ {China 1350000000 9550000.0} {USA 314000000 9630000.0}

In most cases, the previous method using column search -among is likely to be faster. However, column intersect3 may be faster in some cases, for example, when multiple combinations are desired.

% lassign [column intersect3 $populous $large] populous_and_large \
    populous_but_small sparse_but_large

18. Usage Hints

Specifying multiple indices

When specifying indices to commands, Tcl lists of integers and columns of type int are usually interchangeable. Similarly, when passing multiple values to a command, either a Tcl list or a column of the appropriate type can be used. Note there is an ambiguity in the specific case where the target of the command is a column of type any and the passed operand is a string of the form tarray any {…​} where the operand can be interpreted either as a column or a Tcl list with three elements tarray, any and the {…​}. In this case the operand gets interpreted as a column.

Copy-on-write efficiency

Given that the tarray extension is meant for dealing with large amounts of data, it is useful to keep in mind Tcl’s object reference counting and copy-on-write implementation. Modifying a typed array that is shared will result in a copy being made, which can be expensive if the array is large. So, to modify a variable that contains a typed array, the command

% column vput col [list 1.0 2.0]
→ tarray_column double {1.0 1.0 200.0 300.0 1.0 500.0 7.0 8.0 9.0 1.0 11.0 12.0...

is far more efficient than

% set col [column put $col [list 1.0 2.0]]
→ tarray_column double {1.0 1.0 200.0 300.0 1.0 500.0 7.0 8.0 9.0 1.0 11.0 12.0...

assuming the value in col is not itself shared. This is similar to the use of Tcl’s lset command to modify lists.

Memory efficiency

As arrays get large, tarray prioritizes memory usage over speed. As arrays grow, extra memory is allocated conservatively (unlike Tcl, which aggressively allocates extra memory). If the size of a typed array can be estimated in advance, for example when reading records from a database, the memory can be preallocated.

% column create int {} 1000000
→ tarray_column int {}

preallocates space for a million elements.

Typed arrays are by design implemented as consecutive elements in contiguous memory. Certain operations, such as insertion and deletion, will not be efficient when arrays get very large. For applications where such operations are common, other structures should be built on top using typed arrays as the lower level building blocks. Such higher level structures can be scripted and customized for specific usage patterns easily as they can be implemented at the script level using the low level typed array operations for efficiency. Whether this is required or not should be determined based on application benchmarks.

List and column differences

Lists and columns differ somewhat in functionality. Columns do not have the -stride option, but the same functionality can be implemented through tables. List indexing supports nesting; columns can be nested too, but the nested columns have to be accessed explicitly. On the other hand, columns offer some additional functions such as intersect3 and indexing operations (e.g. extraction or storing of multiple elements through index lists).

Considerations for columns of type any

Columns of type any are stored as Tcl_Obj objects internally and thus are very similar to Tcl lists. Any advantage of an any typed array over using a simple Tcl list in terms of the memory footprint comes only from conservative memory overallocation, not from reduced memory size of individual elements. It is therefore not as big a benefit as for other types. Thus columns of type any are mostly beneficial when used in conjunction with other column types, for example in a table.

Note that columns of type string are more efficient than type any for storing small strings.

Nesting typed arrays

The type any can be any Tcl value, including typed arrays. Typed arrays can therefore be nested (tables are currently implemented as nested columns). However, unlike some of the Tcl list commands, tarray does not have commands that implicitly support nesting. Nested typed arrays have to be explicitly accessed as such.

% column index [column index $outer_column 4] 0

Sort optimization

The package internally keeps track of the sorting state of a column. A column is internally marked after certain operations where the result is known to be sorted. An obvious example is the column sort command. A less obvious case is an index column returned from certain search operations. Several commands make use of this for more efficient operation. For example, the column intersect3 command is much faster when columns are known to be sorted. Thus finding the intersection of two index columns resulting from searches is an O(n) operation.
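
To illustrate why sorted inputs help, the following plain-Tcl sketch computes the three result sets in a single merge pass over two sorted integer lists. It is a model of the general technique, not the extension's actual implementation. The proc name intersect3_sorted is hypothetical, and the sketch assumes both inputs are sorted in increasing order without duplicates.

```tcl
# Hypothetical model: returns a list of three lists, namely the elements
# in both inputs, those only in a, and those only in b.
proc intersect3_sorted {a b} {
    set both {}
    set only_a {}
    set only_b {}
    set i 0
    set j 0
    set na [llength $a]
    set nb [llength $b]
    # Advance through both lists in step; each element is visited once.
    while {$i < $na && $j < $nb} {
        set x [lindex $a $i]
        set y [lindex $b $j]
        if {$x == $y} {
            lappend both $x
            incr i
            incr j
        } elseif {$x < $y} {
            lappend only_a $x
            incr i
        } else {
            lappend only_b $y
            incr j
        }
    }
    # Whatever remains in either list has no counterpart in the other.
    lappend only_a {*}[lrange $a $i end]
    lappend only_b {*}[lrange $b $j end]
    return [list $both $only_a $only_b]
}

lassign [intersect3_sorted {1 2 3} {1 2 4}] common first_only second_only
puts "$common | $first_only | $second_only"   ;# 1 2 | 3 | 4
```

With unsorted inputs, each element of one list would have to be looked up in the other, or both lists sorted first, which is why knowing the sort state in advance saves work.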