Collections are containers for other objects. Smalltalk provides a rich set of collection classes ready to be used.
The protocol offered by these collections is very orthogonal, making
good use of polymorphism. For example, most collections respond to
messages
like #at:, #includes: or #do:,
even though the implementations are totally different and the performance
may vary greatly between different collections and access methods
(*).
To get a rough idea of the common protocol, have a look at the
Collection and SequenceableCollection classes.
Collection is the superclass of all collections,
and SequenceableCollection is inherited by all collections which use
a numeric index (among others: Array, String and OrderedCollection).
These implement most of the common operations,
so knowledge about those classes' protocol is highly recommended.
The most useful collection classes are:
less frequently used collections:
seldom used or highly specialized collections:
The typical uses of arrays are:
anArray := Array new:size
anArray := someCollection asArray
theSize := anArray size
anArray at:index put:anObject
element := anArray at:index
anArray includes:anObject
anArray indexOf:anObject
anArray do:[:element | ... do something with element ...]
#( element1 element2 ... elementN )
where the individual elements must be literals (numbers, characters,
strings or other arrays).
Nested literal Arrays are written as:
#(
(element1.1 element1.2 ... element1.M)
(element2.1 element2.2 ... element2.M)
...
(elementN.1 elementN.2 ... elementN.M)
)
Notice, the above is an Array of 3 Arrays. Not to confuse
with an array containing 9 elements (which you get, if the inner parenthesises
are ommited).
Smalltalk/X also allows the objects true,
false and nil to be treated like constants in
literal arrays. This is NOT compatible with some older Smalltalk implementations,
so care should be taken if that code is to be ported to other systems later.
For example,
represents a 5-element array literal consisting of
#( nil 1 'foo' true #true )
nil,
the integer 1, the string 'foo', the boolean
true and the symbol #true.
More details are found in
"Array's class description".
Beginners typically tend to overuse arrays - please look at the description of OrderedCollection, which behaves much like arrays do, but offer functionality for growing and shrinking and are usually a better choice.
theSize := aString size
aString at:index put:someCharacter
element := aString at:index
aString replaceFrom:index1 to:index2 with:otherCollection startingAt:startIndex
newString := string1 , string2
newString := aString copyFrom:index1 to:index2
newString := aString copyFrom:index1
newString := aString copyTo:index1
newString := aString contractTo:size
Transcript show:aString
Transcript showCR:aString
aString print
aString printCR
newString := aString withoutSeparators
newString := aString withoutSpaces
newString := aString withoutLeadingSpaces
newString := aString withoutTrailingSpaces
newString := aString asUppercase
newString := aString asUppercaseFirst
someString leftPaddedTo:size
someString paddedTo:size
someString centerPaddedTo:size
someString decimalPaddedTo:size
index := aString indexOf:aCharacter startingAt:startIndex
index := aString indexOfAny:aCharacterCollection startingAt:startIndex
index := aString indexOfSeparatorStartingAt:startIndex
index := aString indexOfNonSeparatorStartingAt:startIndex
index := aString indexOfSubCollection:subString startingAt:startIndex
aString includesString:subString
aPatternString match:someString
index := someString findMatchString:aPatternString startingAt:index
Constant strings (string literals) are entered by enclosing the characters in single quotes
('). Single quotes within string constants are to be doubled.
String literals can cross line boundaries; in this case, the newline character
is part of the string.
Examples:
Currently, there exists no standard on how non printable characters
are to be represented in strings via escape sequences.
For embedded newline characters,
Smalltalk-80 provides the
'a normal string constant'
'a string constant with embedded ''quotes'''
'a string constant
with mutliple
lines'
#withCRs method,
VisualAge calls this #addLineDelimiters.
Both are supported by ST/X and convert embedded '\' (backslash)
characters to the newLine character, as in:
In addition, the (non standard) method
'a string\with newlines\' withCRs
#withEscapes converts
embedded C-language escape sequences:
'a string\n\t\twith newline and tabs' withEscapes
More details are found in
"String's class description".
or its superclass
"CharacterArray", where common protocol of String and
TwoByteString is implemented.
Therefore, symbols can be compared using identity compare (==), which is much faster than value compare (=). Identity compare is technically implemented as a pointer compare, while value compare on strings is implementated by comparing each element of the two objects.
Symbols are internally used by the system to assign names to class objects (in the Smalltalk global table), and to assign method names to method objects (in each classes method/selector association). Also, symbols are perfect to use as keys in IdentityDictionaries.
Since symbols are handled just as fast as integers, they are also perfect for flags, states (state machines) or other variables which are represented by enumeration types or define-constants in other programming languages.
Symbols do not allow any change of their character elements; the at:put:
method will raise an error. Other than that, all read-accesses are the same as for
strings (actually, the Symbol class inherits all those methods from the String class).
Consider symbols as being some kind of readonly string.
Symbols can be created (at runtime) from a string with:
also, symbol constants (i.e. symbol literals)
can be written directly by prefixing some identifier with the #-character as in:
sym := aString asSymbol
These symbol literals are created at compile time -
NOT at execution time.
Therefore, the cost of using them is not more than using integers
sym := #helloThere.
Which is why there is no need for enumeration-types in smalltalk;
simply use a symbol, wherever other programming languages would
use a numeric define or an enumeration type.
Only alphanumeric identifiers or valid message selectors
are allowed in the above example.
Examples for valid symbol iterals are:
Use quotes around the characters for other symbol literals:
#helloThere
#FooBar
#+
#at:
#at:put:
The quotes are not part of the symbols characters;
therefore
sym := #'a symbol with spaces and .... other stuff'
#+ and #'+' represent the same symbol.
Symbol literals can also be present in array literals
as in:
Within such an array literal, the leading #-character can be ommitted
(for your convenience).
Thus, the above is equivalent to:
#( #foo #bar #baz )
but, NOT equivalent to:
#( foo bar baz )
which is an array of strings.
#( 'foo' 'bar' 'baz' )
Since true, false and nil are treated
as constant objects, you have to add a #-prefix as in:
if you want those symbols in an array literal (which is unlikely).
#( #true #false)
More details are found in
"Symbol's class description".
=, which typically compares objects values.
1 (the Integer "one") will be considered
equal to 1.0 (the Float "one") in
set-inclusion tests.
IdentitySet uses identity compare
(==) and treats 1 and 1.0 as different
objects.
aSet := Set new
aSet := Set new:1000
theSize := aSet size
aSet add:anObject
aSet remove:anObject
aSet includes:anObject
aSet do:[:element | ... do something with element ...]
#at: and #at:put:) are
not allowed and trigger an error.
Sets and identitySets use hashing algorithms internally for inclusion tests. Thus, they are usually fast and show a constant access time which is determined by their fill-grade and the quality of the hash key algorithm; not by the absolute size of the set. The hash key generation is implemented by the elements - not by the set.
Sets automatically grow (and shrink) as required when elements are added or removed. To get good hashing performance, they grow larger (by some percentage) than the actual size required for the elements. Expect some 20% storage overhead when using sets.
IdentitySets are usually much faster than Sets, since both hashKey generation
and comparing is implemented more efficient in the first.
If your keys are small integers or symbols, you should always use IdentitySets.
However, for string elements, using identitySets is usually a bad choice.
More details are found in
"Set's class description"
and
"IdentitySet's class description".
#at: and #at:put: messages using an access key.
However, in contrast to arrays they allow any other object to be used as key.
Thus an array can be thought of as a special dictionary using numeric keys
only (if you want to do this, its faster to use an array rightaway - except
if your array is filled very sparse).
Dictionary uses
value compare (=) and IdentityDictionary which uses
identity compare (==) for key accesses.
|d value|
d := Dictionary new.
d at:1 put:'one'.
...
value := d at:1.0
will return the expected 'one' in the variable "value",
because the integer "1" compares equal to the float "1.0" and therefore,
the dictionary will find a value under the key "1.0".
|d value|
d := IdentityDictionary new.
d at:1 put:'one'.
...
value := d at:1.0
will raise an error, since there is no element stored under the key "1.0"
(remember: "1 = 1.0" returns true, while "1 == 1.0"
returns false).
Like with sets, dictionaries use hashing internally, and IdentityDictionaries are usually faster than Dictionaries.
Dictionaries are very powerful collections, and worth a second look.
You can put associations between any objects into dictionaries.
A simple example:
You can also use Dictionaries as a replacement for simple data structures,
which are called ``record'' or ``struct'' in other programming languages:
|nameToAge|
nameToAge := Dictionary new.
nameToAge at:'fred' put:17.
nameToAge at:'peter' put:21.
...
age := nameToAgeAt:name ifAbsent:['dont know'].
Of course, real programmers define a Person class and create an
instance of it ... this is both more 'object oriented' and provides
faster access than using dictionaries (which are not meant to be used
for the above 'constant' key access).
|peter marry|
peter := IdentityDictionary new.
peter at:#age put:23.
peter at:#firstName put:'peter'.
peter at:#middleName put:'m'.
peter at:#name put:'smalltalker'.
peter at:#salary put:51234.
...
marry := IdentityDictionary new.
marry at:#age put:21.
marry at:#firstName put:'marry'.
marry at:#middleName put:'s'.
marry at:#name put:'smalltalker'.
marry at:#husband put:peter.
...
age := peter at:#age.
firstNameOfHusband := (marry at:#husband) at:#firstName.
However, the above scheme may be used, if the number of information slots
to be held is not constant, or not foreseeable by the programmer.
More details are found in
"Dictionary's class description"
and
"IdentityDictionary's class description".
typical use:
coll := OrderedCollection new
coll := OrderedCollection new:1000
coll := OrderedCollection new grow:1000
theSize := coll size
coll add:anObject
coll addFirst:anObject
coll removeFirst
coll removeLast
coll includes:anObject
coll do:[:element | ... do something with element ...]
#includes: message
is usually much slower for large orderedCollections or arrays than it
is for sets or dictionaries. The reason is that the former have to do
a sequential search over all elements, while the later can use a fast
hashed access, typically finding an element after a few compares.
Stacks are implemented by using #addLast: as push operation,
and #removeLast as a pop operation.
Queues are implemented using #addLast: and #removeFirst.
Although, smalltalk provides a specialized Queue class, which is tuned
for this particular type of access.
More details are found in
"OrderedCollection's class description".
Sorting is done using quickSort and binary search (when inserting elements). typical use:
sorted := someOtherCollection asSortedCollection
sorted := SortedCollection new
sorted sortBlock:[:a :b | a > b]
sorted sortBlock:[:a :b | a < b]
sorted sortBlock:[:a :b | a asUppercase < b asUppercase]
sorted add:someObject
sorted includes:someObject
More details are found in
"SortedCollection's class description".
Support for twoByteStrings is (currently) somewhat preliminary and limited: there exists no language standard for 16bit string literals and no 16bit keyboard input method is supported (yet).
More details are found in
"TwoByteString's class description".
Bytearrays are also valuable for data interchange with other programs via
files, sockets, shared memory or pipes.
Protocolwise, they behave much like Arrays.
More details are found in
"ByteArray's class description".
More details are found in
"FloatArray's class description".
(2 to:10 by:2).
More details are found in
"Interval's class description".
Link or
ValueLink instances. They provide reasonable fast
methods for insertion at either ends,
but show poor performance when any inner element is to be accessed
(since the list has to be walked up to the element).
Although often overemphasized in existing computer science literature,
linkedLists are not very often used in smalltalk, due to their poor
indexed access performance and their storage overhead.
In most cases, an orderedColllection or a specialized tree implementation
will outperform a linkedList.
More details are found in
"LinkedList's class description".
SharedQueue
(which is a subclass of Queue) and helps in writing
producer/consumer applications with separate writer/reader processes.
Notice, that queue functionality is also provided by OrderedCollection
(addLast: / removeFirst messages);
however the Queue-class protocol mimics a stream (next and nextPut: messages)
and is therefore slightly
different.
If you dont care for this and if you dont need the synchronization
of a SharedQueue, use whichever you find more appealing.
More details are found in
"Queue's class description".
More details are found in
"WeakArray's class description" or
"WeakIdentitySet's class description" or
"WeakIdentityDictionary's class description".
More details are found in
"Registry's class description" or
"HandleRegistry's class description".
ResourcePacks are used to keep track of string translations for national language variants. Technically, they are dictionaries. However, the class provides additional protocol for loading resourcePacks from so called resource files.
More details are found in
"CacheDictionary's class description" or
"ResourcePack's class description".
More details are found in
"MappedCollection's class description".
Notes:
(*)
Some collections are tuned for growing, while others are not;
some use hashing or other fast access methods to support fast
search and/or inclusion test, while others do sequential searches.
Check out and understand the individual collection classes and use the one
which best fits your needs to get best performance.
Copyright © 1996 Claus Gittinger Development & Consulting
<info@exept.de>