To a hammer, everything looks like a nail.
Before describing the language and how you can create your own programs, we should explain a few basics - both to give you some background and to define the technical terms used in the documentation (and literature).
Keep in mind that this text is only a short introduction; for more detailed information, we recommend reading a standard textbook on the language (see 'literature').
In contrast to hybrid systems like C++ or Java, everything is an object
in Smalltalk; this includes integers, characters, arrays, classes
and even the program's stack frames, which hold the local variables
during execution.
In Smalltalk, there are no such things as "builtin" types or classes
which have to be treated differently, or which do not behave exactly like
other objects with respect to message sending, inheritance or debuggability.
To the outside world, any internals of an object are hidden - all interaction is only via messages. The set of messages an object understands is called its message protocol, or protocol for short.
For example, numbers respond to the +, -, * ... messages,
and strings respond to the asUppercase, asLowercase ... messages.
On the other hand, it makes the system very flexible.
For example, it is very easy to extend the numeric class hierarchy by additional
things like Complex numbers, Matrices, Functional Objects etc.
All that is required for those new objects to be handled correctly is
that they respond to some basic mathematical protocol for arithmetic,
comparison etc.
Existing mathematical code is usually not affected by such extensions,
which makes Smalltalk one of the best environments for code reuse and sharing.
Classes may have zero, one or many instances.
You may wonder how a class without instances could be
useful - this will become clear when inheritance and abstract
classes are described further down in this document.
1, 99 and -8 are instances of the Integer class
1.0 and 3.14159 are instances of the Float class
'hello', 'foo' are instances of the String class
Button class
nil is the one and only instance of the UndefinedObject class
Every class keeps a table (called "MethodDictionary") which
associates the name of the message (the so-called message selector) with a method.
When a message is sent to an object, the class's method table
is searched for a corresponding entry and - if found - the associated
method is invoked (more details below ...).
Since Smalltalk is a pure object oriented language,
this table is also an object and accessible at execution time;
it may even be modified during execution,
which allows objects to learn about new messages dynamically.
Of course, the interactive programming environment heavily depends on this;
for example, the browser is a tool which adds new items to this table when
a method's new or changed code is to be installed.
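Because the method table is available at execution time, objects and classes can be asked which messages they understand. The following is a minimal workspace sketch; it assumes the standard respondsTo: and canUnderstand: queries, and fooBarBaz is a made-up selector chosen because no class is expected to implement it.

```smalltalk
"ask an instance whether it understands a message"
Transcript showCR: (10 respondsTo: #factorial) printString.
Transcript showCR: ('hello' respondsTo: #asUppercase) printString.

"ask a class whether its instances understand a message"
Transcript showCR: (String canUnderstand: #size) printString.

"fooBarBaz is a hypothetical selector that nobody implements"
Transcript showCR: (10 respondsTo: #fooBarBaz) printString.
```

The first three lines should show true and the last one false, as long as nobody has added a fooBarBaz method to the number classes.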
A class inherits all protocol as defined by its superclass(es) and may optionally redefine individual methods or provide additional protocol.
Therefore, a message send performs the following actions (***):
#doesNotUnderstand:) to the receiver with the message object as argument.
Object, there is no need to.
Actually, it may occasionally make sense for a class
to inherit from no class at all (i.e. to have no superclass).
The effect is that instances of such classes do not inherit ANY protocol
and will therefore trigger an error for all received messages.
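The chain of superclasses along which protocol is inherited can be walked from a workspace. This is only a sketch: it uses the whileTrue: loop, which is described further down, and the class names that are shown depend on the class library of your system.

```smalltalk
| cls |
"start with the class of the number 3 ..."
cls := 3 class.
"... and follow the superclass links until there is no superclass left"
[cls notNil] whileTrue:[
    Transcript showCR: cls name.
    cls := cls superclass.
].
```

The last name shown is that of the class without a superclass, which is where the inheritance chain ends.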
All instances of a class provide the same message protocol,
but typically contain different internal state.
It is actually the class which
provides the definition of the protocol and the amount of internal
state of its instances.
For example, the strings 'hi' and 'world' are both instances of the String class
and respond to the same set of messages. But the internal state of the first
string consists of the characters "h" and "i", whereas the second contains
the characters "w", "o", "r", "l", "d".
An object's instance variables are only accessible via the protocol
which is provided by the object - there is no way to access an object's
internals except by sending messages to it.
This is true for every object - even for the strings in the example above.
There is no need for the sender of a message to actually know the class of
the receiver - as long as it responds to the message and performs the
appropriate action.
'at:' message. You could write
an ExternalString class, which fetches characters from a file
and returns them from this message.
The sender of the 'at:' message would not be affected at all by this
(except for a possible performance degradation ;-).
#basicSize, #identityHash etc.).
Thus, when we send a message to some 'normal' object, the corresponding class
object provides the behavior - when some message is sent to a class object,
the corresponding metaclass provides the behavior.
Technically, messages to classes are treated exactly the same way as
messages to non-class objects: take the receiver's class, look up the method in its
method table, execute the method's code.
Since different metaclasses may provide different protocol for their class
instances, it is possible to add or redefine class messages just like any other
message.
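The relation between an object, its class and the metaclass can be explored by sending the class message repeatedly. A small sketch for the workspace; how exactly a metaclass prints itself may differ between Smalltalk systems.

```smalltalk
"the class of a string"
Transcript showCR: 'hello' class printString.

"the class of that class, i.e. its metaclass"
Transcript showCR: 'hello' class class printString.

"the class object itself understands the new message"
Transcript showCR: (String respondsTo: #new) printString.
```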
As a concrete example, take instance creation which is done in Smalltalk
by sending a "new"-message to a class.
In Smalltalk, there is no such thing as a built-in "new" (or any other built-in)
instance creation message
- the behavior of those instance creation (class) messages is defined exclusively by metaclass protocol.
Therefore, it is possible (and often done) to redefine the "new" method for special handling;
for example singletons (classes which have only a single unique instance), caching and pooling
(the "new" message returns an existing instance from a cache), tracing and many more are easily
implemented by redefining class protocol.
Object-Class,
which provides a rich protocol useful for all kinds of objects
(comparing, dependency mechanism, reflection etc.).
As we will see shortly, Smalltalk programs only consist of messages being sent to objects.
Since even control structures
(i.e. conditional evaluation, loops etc.)
are conceptually implemented as messages,
a common syntax is used in your programs both for
the program's flow control and for manipulating objects.
Once you know how to send messages to an object,
you also know how to write and use fancy control structures.
Smalltalk's power (and difficulty to learn) does not lie in the language itself, but instead in the huge protocol provided by the class libraries.
Let's start with the language's building blocks...
:=
"
"some comment"
"this
is
a
multiline comment"
"
another multiline comment
"
As a language extension, ST/X also allows end-of-line comments.
These are introduced by the character sequence "/ (double quote-slash) and treat everything
up to the end of the line as a comment:
"/ this is an end-of-line comment
End-of-line comments are especially useful to comment-out code which contains comments.
#new-message sent to a class or the #copy-message sent to an instance.
The following literal constant types are allowed:
Integer constants:
6, -1, 12345678901234567890
8r0777, 16r80000000000, 16rAFFE, 16r123456789abcdef0123456789abcdef, 2r0111000
Float) constants:
1.234, 1e10, 1.5e15
16r10.1" or "2r10.1") are allowed,
but should not be used in practice.
For compatibility with other Smalltalk systems, the "d"-character is also recognized
as an exponent character, i.e. 1d10 has the same value as 1e10.
1.234s4, 10s4
Because scaled decimals are not supported by all Smalltalk systems, their support has to be explicitly enabled via the settings menu.
Boolean constants:
true, false
UndefinedObject constant:
nil
Character constants from the 8-bit iso8859-1 character set:
$c
ST/X also allows unicode character constants with a codepoint above 16rFF,
of up to 30 bit (i.e. up to 16r3FFFFFFF).
Therefore, $日 is also a valid character constant in ST/X
and represents a character with a codePoint of 16r65E5 (26085).
String constants:
'foo' or 'a long string constant'
ST/X also allows unicode string constants where individual characters may have a codepoint of up to 30 bit (i.e. up to 16r3FFFFFFF).
Symbol constants:
#'bar', #'++' or #'foo bar baz'
Symbols are much like immutable Strings, with the big advantage that
they can be compared using identity compare (== / ~~) whereas Strings
usually have to be compared using equality (i.e. contents-) compare operators (= / ~=).
If the symbol's characters are all alphanumeric or all from the set of binary special
characters (+, -, *, and a few others), the quotes can be omitted and
the short form #bar can be used instead of #'bar'.
More information on symbols is found in
"collection classes".
Array constants:
#(1 2 $b)
#(1 #two #(3 4) #( #(5 6) 7) )
#(1 two (3 4) ( (5 6) 7) )
ByteArray constants:
#[0 1 2 3 4]
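A few of the literal forms listed above can be checked in a workspace. The following sketch compares radix integers with their decimal values and contrasts equality (=) with identity (==) for strings and symbols; the variable names a and b are only used for illustration.

```smalltalk
| a b |
"radix integers are ordinary integers"
Transcript showCR: (16rFF = 255) printString.
Transcript showCR: (2r0111000 = 56) printString.
Transcript showCR: (8r0777 = 511) printString.

"two strings with equal contents, but two different objects"
a := 'foo' copy.
b := 'foo' copy.
Transcript showCR: (a = b) printString.
Transcript showCR: (a == b) printString.

"symbols with the same characters are the same object"
Transcript showCR: (a asSymbol == b asSymbol) printString.
Transcript showCR: (#foo == #'foo') printString.
```

All lines should show true, except for the identity comparison of the two string copies, which shows false.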
Identifiers must start with a letter or an underscore character.
The remaining characters may be letters, digits or the underline character (*).
Examples:
foo
aVeryLongIdentifier
anIdentifier_with_underline_characters
For portability with some (VMS-)VisualWorks Smalltalk variants, a dollar character ($) can also be allowed inside an identifier as a compiler option.
nil
true and false
self
super
thisContext
here
Since "here" is a Smalltalk/X language extension, its builtin-ness is less strict than that of the other special variables: if a variable named "here" is defined and visible in the current variable scope, here will refer to that variable; otherwise, it refers to the receiver (with different lookup semantics).
1 negative
sends the message "negative" to the number 1, which is the receiver of the
message.
Unary messages, like all other messages, return a result,
which is simply another object.
In the above case, the answer from the "negative" message is the
boolean false object.
Evaluate this in a workspace (using printIt); try different receivers (especially: try a negative number).
Unary messages parse left to right, so, for example:
1 negative not
first sends the "negative"-message to the number 1.
Then, the "not"-message is sent to the returned value.
The response of this second message is returned as the final value.
If you evaluate this in a workspace,
the returned value will be the boolean true.
Try a few unary messages/expressions in a workspace:
1 negated
-1 negated
false not
false not not
-1 abs
1 abs
10 factorial
10 factorial sqrt
5 sqrt
1 isNumber
$a isNumber
$a isNumber not
1 isCharacter
$a isCharacter
'someString' first
'hello world' size
'hello world' asUppercase
'hello world' copy
'hello world' copy sort
#( 17 99 1 57 13) copy sort
1 class name
1 class name asUppercase
Notice that in the above examples, you already encountered polymorphism: both strings and
arrays respond to the sort message and sort their contents in place.
5 between:3 and:8
"between:" and "and:" are the keywords,
the numbers 3 and 8 are the arguments and the number 5 is the receiver of the message.
The message's actual selector is formed by the concatenation of all individual
keywords; in the above example, the selector is "between:and:".
As a beginner, keep in mind that
this is different from both a "between:" and an "and:"-message.
And of course, "between:and:" and "and:between:" are also different messages.
In the browser, the method will be listed under the name "between:and:".
Keyword messages do parse left to right,
but if another keyword follows a keyword message, the expression is parsed as
a single message (taking the concatenation of the keywords as selector).
Thus, the expression:
a max: 5 min: 1
would send a "max:min:"-message to the object referred to by the variable "a".
This is not the same as:
(a max: 5) min: 1
which first sends the "max:"-message to "a",
then sends the "min:"-message to the result.
Try these in a workspace (don't fear the error...).
To avoid ambiguity, you must place parentheses around the message which is to be sent first.
Try a few keyword messages/expressions in a workspace (also see what happens if you omit
or change the parentheses):
1 max: 2
1 min: 2
(2 max: 3) between: 1 and: 3
(1 max: 2) raisedTo: (2 min: 3)
Unary messages have higher precedence than keyword messages,
thus
9 max: 16 sqrt
evaluates to 9
(because it is evaluated as "9 max: (16 sqrt)", which is "9 max: 4".
It is not "(9 max: 16) sqrt", which is "16 sqrt" and would give 4 as answer).
An example of a binary message is the one which implements arithmetic addition
for numeric receivers (it is implemented in the Number classes):
1 + 5
This is interpreted as a message sent to the object 1 with the selector '+'
and one argument, the object 5.
Binary messages
parse left to right (like unary messages).
Therefore,
2 + 5 * 3
results in 21, not 17
(first, '+' is sent to 2, with 5 as argument. This first message returns 7.
Then, '*' is sent to 7, with 3 as argument, resulting in 21 being answered).
To change the execution order or to avoid ambiguity, you should place parentheses around:
2 + (5 * 3)
Now, the execution order has changed and the new result will be 17.
Unary messages have higher precedence than binary messages, thus
9 + 16 sqrt
evaluates as "9 + (16 sqrt)", not "(9 + 16) sqrt".
On the other hand, binary messages have higher precedence than
keyword messages, thus
9 + 16 max: 3 + 4
evaluates as "(9 + 16) max: (3 + 4)" which is "25 max: 7" and answers 25.
It is not the same as "9 + (16 max: 3) + 4" (which results in 29) or
"((9 + 16) max: 3) + 4" (which in this case also results in 29).
Again, we highly recommend the use of parentheses - even when the default evaluation order matches the desired order; it makes your code much more readable, and helps beginners a lot.
To practice, try a few binary messages/expressions in a workspace:
1 + 2
1 + 2 * 3
(1 + 2) * 3
1 + (2 * 3)
-1 * 2 abs
(-1 * 2) abs
5 between:1 + 2 and:64 sqrt
5 between:(1 + 2) and:(64 sqrt)
The second example above shows why parentheses are so useful:
from reading the code, it is not apparent whether the evaluation
order was intended or is wrong.
You will be happy to see parentheses when you have to debug
or fix a program which contains a lot of numeric computations.
Here are a few more "difficult" examples:
1 negated min: 2 negated
1 + 2 min: 2 + 3 negated
There are a few binary messages found in the system
which look like syntax at first sight,
and are therefore a bit difficult to understand and read for beginners.
Examples of such fancy messages which are worth mentioning are:
@
The "@"-message is understood by numbers. As a binary message, it expects
a single argument. It returns a Point-object (coordinate in 2D space) with the receiver
as x, and the argument as y value.
Thus, "10 @ 20" returns the same as "(Point new x:10 y:20)".
->
The "->"-message is similar to the above "@" in that it is a shorthand instance creation message.
It is understood by any object and returns an association (a pair) object.
The message "10 -> 20" returns the same as "(Association new key:10 value:20)".
?
The "?"-message returns the receiver if it is non-nil, and the argument otherwise.
It is used to deal with possibly uninitialized variables in assignment or as message argument.
Thus, "a ? 20" returns the same as "(a notNil ifTrue:[a] ifFalse:[20])".
In ST/X, the actual set of allowed characters can be queried from the system
by evaluating (and printing) the expression "Scanner binarySelectorCharacters".
If you compare your favourite programming language
against regular English,
you will find Smalltalk to be much more similar to plain English
than most other programming languages.
For example, consider the order to a person called "tom",
to send an email message to a person called "jane"
(assuming that tom, jane, theEmail refer to objects):
English | Smalltalk | Java / C++ |
---|---|---|
tom, send an email to jane. | tom sendEmailTo: jane. | tom.sendEmail(jane); or tom->sendEmail(jane); |
tom, send theEmail to jane. | tom send: theEmail to: jane. | tom.sendEmail(theEmail, jane); or tom->sendEmail(theEmail, jane); |
tom, send theEmail to jane with subject: 'hi'. | tom send: theEmail to: jane withSubject: 'hi'. | tom.sendEmail(theEmail, jane, "hi"); or tom->sendEmail(theEmail, jane, "hi"); |
album play.
album playTrack: 1.
album repeatTracksFrom: 1 to: 10.
and it does exactly what it looks like.
Another plus in Smalltalk is that the meaning of an argument is described by the keyword before it, whereas in Java or C++ you have to look at a function's definition to get information on the order and type of the arguments, unless you use fancy function names like "sendEmail_to_withSubject()", which actually mimic the Smalltalk way.
Smalltalk was originally designed to be easily readable by both programmers AND non-programmers. Humour has it that this is one reason why some programmers do not like Smalltalk syntax: they fear losing their "guru" aura if others understand their code ;-) .
1 negated
sends the message "negated" to the number 1, which gives
us -1 (minus one) as result.
1 negated abs
first sends the message "negated" to the number 1, which gives
us an intermediate result of -1 (minus one);
then, the message "abs" is sent to it, giving us
a final result of 1 (positive one).
-1 abs negated
sends the message "abs" to the number -1 (minus one), which gives
us 1 (positive one) as intermediate result. Then this object
gets a "negated" message.
1 + 2
sends the message "+" to the number 1, passing it
the number 2 as argument. The returned object is 3.
"+"
message.
1 + 2 + 3
first, "+" is sent to the number 1, passing it
the number 2 as argument. Then, another "+" message is sent to
the intermediate result, passing the integer-object 3 as argument.
1 + 2 * 3
-1 abs + 2
sends "abs" to the number -1 (minus one), then sends "+"
to the result, passing 2 as argument.
1 + -2 abs
sends "abs" to the number -2, then sends "+"
to the number 1, passing the result of the first message as argument.
-1 abs + -2 abs
sends "abs" to the number -1 (minus one) and remembers the result.
Then sends "abs" to the number -2 and passes this as argument
of the "+" message to the remembered object.
1 + 2 sqrt
sends "sqrt" to the number 2, then passes this as argument
of the "+" message to the number 1.
(1 + 2) sqrt
sends "+" to the number 1, passing 2 as argument.
Then sends "sqrt" to the result.
1 min: 2
sends the "min:" (minimum) message to the number 1, passing 2 as argument.
(1 max: 2) max: 3
sends the "max:" (maximum) message to the number 1, passing 2 as argument.
Then sends "max:" to the returned value, passing 3 as argument.
(1 + 2 max: 3 + 4) min: 5 + 6
sends "+" to the number 1, passing 2
as argument, and remembers the result.
Then, "+" is sent to the
number 3, passing 4 as argument.
Then, "max:" is sent to the remembered first result,
passing the second result as argument. The result is again
remembered.
Then, "+" is sent to the number 5, passing
6 as argument.
Finally, the "min:" message is sent to the
remembered result from the first max: message, passing
the result from the "+" message.
1 max: 2 max: 3
sends the "max:max:" message to the number 1, passing the two arguments, 2 and 3.
Because numbers do not respond to a "max:max:" message,
this leads to an error (message-not-understood).
This example illustrates why parentheses are highly recommended - especially with concatenated keyword messages.
'hello' at:1
sends the "at:" message to the string constant.
'hello' , ' world'
sends the "," binary message to the first string constant, passing another string as argument.
'hello' , ' ' , 'world'
sends the "," binary message to the first string constant, passing ' ' as argument.
Then, the result gets another "," message, passing 'world' as
argument.
#(10 2 15 99 123) min
sends the "min" unary message to an array object (in this case: a constant array literal).
All collections respond to the "min" message by searching for their smallest
element and returning it.
WorkspaceApplication new open
sends the "new" unary message to the WorkspaceApplication class object, which returns a new instance of itself.
Then, this new instance gets the "open" message, which asks for a window
to be shown.
-1 negated.
1 + 2.
first sends the "negated" message to -1 (minus one), ignoring the result.
Then, the "+" message is sent to 1 (positive one), passing the number 2 as argument.
Notice that there is actually no need for a period after the last statement
(it is a statement separator) - it does not hurt, though.
We will encounter more (useful) examples for multiple statements below.
nil, when created.
Important Note to C, C++ and C# programmers:
Smalltalk variables always hold a reference (pointer) to some object. Every object "knows" its type; it is NOT the pointer which knows the type of the object it points to. In Smalltalk, it is totally impossible to treat a pointer to an integer like a pointer to something else. There is no such thing as a cast in Smalltalk. Therefore we say that Smalltalk is a "dynamically strongly typed language", in contrast to C++, which is a "statically weakly typed language". In Smalltalk, all objects are conceptually always and only created on the dynamic, garbage collected heap storage. There is no such thing as "boxing" or "unboxing". Assignments never copy the value, but only the reference to the object.
For now, only global variables and local variables are described (because we need them for more interesting examples); the other variable types will be described later.
Beside classes, only a few other objects are bound to globals; the most interesting for now are:
Transcript
show: something
cr
showCR: something
show: followed by cr.
flash
Smalltalk
Stdin, Stdout and Stderr
That said (and kept in mind), being able to access the console via the Transcript
is often very helpful: it allows you to send debugging and informative messages from the
program.
For example:
Transcript show: 'Hello world'
shows that greeting in the Transcript window,
and
Transcript cr
advances its text cursor to the next line.
There is also a combined message, as in:
Transcript showCR: 'Hello world'
Finally, to wake up a sleepy user, try:
Transcript topView raise.
Transcript showCR: 'Ring Ring - Wakeup !'.
Transcript flash.
A global is created by sending the message at:put: to the global called Smalltalk,
passing the name of the new global as a symbol.
For example:
Smalltalk at:#Foo put: 'Now there is a Foo'
and can then be used:
Smalltalk at:#Foo
or simply:
Foo
If you want Smalltalk to forget about that variable, execute
Smalltalk removeKey:#Foo
(be careful not to remove one of your loved ones by accident).
Having said this, you had better forget about global variables immediately.
Workspace variables are created and destroyed via corresponding menu functions in the workspace window. You can also configure the workspace to auto-define any unknown variable as a workspace variable (in the workspace's "workspace"-"settings"-menu). That's the way to go for the remainder of this lecture, because it makes your life so much easier.
Be aware of the fact that workspace variables are invisible to compiled code - i.e. any reference to such a variable from within compiled code will actually refer to a global variable with the same name (which will be seen as nil if it has never been assigned a value).
For a C++, Java or C# programmer, class instance variables are hard to understand, unless they see the class objects as real objects with private slots, protocol etc. This is because none of those languages offers a similar construct.
Instance variables are private to some object and their lifetime is the lifetime of the object.
We will come back to instance variables, once we have learned how classes are defined.
A local variable declaration consists of an opening '|' (vertical bar) character,
a list of identifiers and a closing '|'.
It must be located before any statement within a code entity
(a doIt-evaluation, block or method; the latter being described below).
For example:
| foo bar baz |
declares 3 local variables, named 'foo', 'bar' and 'baz'.
A local variable's lifetime is limited to the time the enclosing context is active - typically, a method or a block.
Notice that when a piece of code is evaluated in a workspace window, the system generates an anonymous method and invokes it for execution. Therefore, a local variable declaration is also allowed with doIt-evaluation (the variable's lifetime will be the time of the execution).
Assuming that "foo" and "bar" have been declared as
variables before, you can assign a value with:
foo := 1
or:
bar := 'hello world'
foo := bar := baz := 1
Do not forget the ":" in ":=".
If you write "=" instead, you will get a binary message send expression
which means "is equal to" (i.e. it is a comparison operator).
foo := baz = 1.
would assign true or false to "foo", depending on whether "baz" is equal to 1 or not.
foo := (baz = 1).
Even though the parentheses are not required here, this is a bit easier to read.
All variables are initially bound to nil.
This is the same behavior as found in Java or C#,
but opposed to C or C++.
You will never get random or even invalid values in a Smalltalk variable.
Keep in mind that only a reference to an object is stored in the variable,
not the state of the object itself.
This means that multiple variables may refer to the same object.
For example:
|var1 var2|
"create an Array with 5 elements ... and assign it to var1"
var1 := Array new:5.
"and also to var2"
var2 := var1.
"change the 2nd element..."
var1 at:2 put:1.
Transcript show:'var1 is '. Transcript showCR:var1.
Transcript show:'var2 is '. Transcript showCR:var2.
The previous example demonstrates
that both var1 and var2 refer to the same array object,
i.e. that in Smalltalk, a variable actually holds a reference to an object,
and that more than one variable may refer to the same object.
Technically speaking: a variable holds a pointer to the object.
This is especially true with multiple assignments;
so:
foo := bar := 'hello'
binds both "foo" and "bar" to the same string-object.
Array := nil
To prevent beginners from
doing harm to the system, ST/X checks for this situation
and gives a warning.
As a general rule:
do not assign to global variables - it is usually a sign of very bad design if you have to. As you read above and will see below, there are other variable types which can be used in most situations.
Ask the Float class for the π (pi) constant:
Float pi
Ask the Transcript object to raise its top view:
Transcript topView raise
Ask the Transcript object to flash its view:
Transcript flash
Ask the WorkspaceApplication class to create a new instance and open
a view for it:
WorkspaceApplication open
Declare a local variable, assign a value and display it on the transcript
window:
|foo|
Transcript show:'foo is initially bound to: '.
Transcript showCR:foo.
foo := -1.
Transcript show:'foo is now bound to: '.
Transcript showCR:foo.
foo := foo + 2.
Transcript show:'foo is now bound to: '.
Transcript showCR:foo.
Remember that a variable may refer to any object.
Thus, the following is legal (although not considered good style):
|foo|
foo := -1.
Transcript show:'foo is: '.
Transcript show:foo.
Transcript cr.
Transcript show:'and it is a: '.
Transcript showCR:foo class name.
foo := 'hello'.
Transcript show:'foo is now: '.
Transcript show:foo.
Transcript cr.
Transcript show:'and it is a: '.
Transcript showCR:foo class name.
A rule of wisdom:
do not reuse variables (as in the above case) unless needed for accumulating something.
Having an extra variable in a method does not cost anything (neither time nor space).
However, it helps a lot in readability.
Sometimes it is even worth using a temporary variable just for its name, to document what an
intermediate result represents.
| coll |
coll := Set new. "/ create an empty Set-collection
coll add:'one'.
coll add:'two'.
coll add:3.
A cascade expression (semicolon) allows this to be written a little shorter:
it sends another message - possibly with arguments - to the previous receiver.
The following cascade is semantically equivalent to the above
albeit a bit shorter:
| coll |
coll := Set new. "/ create an empty Set-collection
coll add:'one'; add:'two'; add:3.
The "add:" method returns its argument
(for historic reasons beyond my understanding).
This means that the following code does NOT do what it looks like:
| coll |
coll := Set new
add:1; add:2. "/ Attention: add returns its argument
Instead of the expected Set, it leaves the integer 2 in the variable named "coll",
because the assigned value is the value of the last "add:" message.
Because this is a recurring pattern, a method named "yourself" has been added to the Object class.
As the name implies, it simply returns the receiver.
Use this as the last message of the cascade:
| coll |
coll := Set new
add:1; add:2;
yourself. "/ returns the receiver - i.e. the Set
to prevent the above problem and get the expected value assigned.
You may encounter this kind of code at various places in the system.
A block represents a piece of executable code. Being a "real object" just like any other, it can be stored in a variable, passed around as argument or returned as value from a method. When required, the block can be evaluated at any later time, which results in the execution of the block's statement(s). The fancy thing is that the block's statements can see and are allowed to access all of the surrounding variables, i.e. those which are visible within the static block scope.
| someBlock |
someBlock := [ Transcript flash ].
Later, when the block has to be evaluated (i.e. its statements executed),
send it the "#value" message:
...
someBlock value.
...
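That a block really accesses the surrounding variables (and not copies of them) can be seen with a small counter. This is only an illustrative sketch; the names count and increment are made up for this example.

```smalltalk
| count increment |
count := 0.
"the block refers to the local variable count of the surrounding code"
increment := [ count := count + 1 ].
"nothing has been counted so far; each value message executes the block once"
increment value.
increment value.
Transcript showCR: count printString.
```

After the two evaluations, the Transcript shows 2, because both evaluations changed the same variable.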
Blocks may be defined with 0 (zero) or more argument(s);
|someBlock|
...
someBlock := [:a | Transcript showCR:a ].
...
defines a block which expects (exactly) one argument.
To evaluate it, send it the "#value:" message, passing the desired
argument object:
someBlock value:'hello'
(here, a string-object is passed as argument).
Blocks can be defined to expect multiple arguments, by declaring each
formal argument preceded by a colon. For evaluation, a message of the form
"#value:...value:" with a corresponding number of arguments must be used.
For example, the block:
|someBlock|
...
someBlock := [:a :b :c |
Transcript show:a.
Transcript show:' '.
Transcript show:b.
Transcript show:' '.
Transcript show:c.
Transcript cr
].
...
can be evaluated with:
someBlock value:1 value:2 value:3
|someBlock|
...
someBlock := [:a :b :c | a + b + c].
...
Transcript showCR:(someBlock value:1 value:2 value:3).
...
When executed, the above will display "6" on the Transcript window.
|someBlock|
...
someBlock := [:a :b :c | a + b + c].
...
result := someBlock value:1 value:2 value:3.
...
will assign the numeric value 6 to the result variable.
Notice that blocks close over the variables of the environment which was active at
the time the closure was created,
and that blocks also create such a variable environment when executed.
This means that in the following:
|actions|
actions := (1 to:10) collect:[:factor | [:arg | arg * factor] ].
(actions at:5) value:10.
the "actions at:5" retrieves a block which has captured the current value of the factor
variable (which was 5) and therefore multiplies the argument by 5.
Blocks have many nice applications: for example, a GUI button's action can be defined using a block, a timer may be given a block for later execution, a batch processing queue may use a queue of block-actions and a sorted collection may use a block to specify how elements are to be compared.
However, the most striking application of blocks is in defining control structures (like if, while, repeat, loops etc.), and as "higher order functions" when enumerating or processing collections and the like.
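As a first impression of blocks used as "higher order functions", here is a sketch that processes an array with the common enumeration messages collect:, select:, inject:into: and do:, and sorts it with a comparison block. The variable name numbers is only chosen for this example.

```smalltalk
| numbers |
numbers := #(17 99 1 57 13).

"apply a block to every element and collect the results"
Transcript showCR: (numbers collect:[:n | n * 2]) printString.

"keep only the elements for which the block returns true"
Transcript showCR: (numbers select:[:n | n > 15]) printString.

"accumulate a sum, starting with 0"
Transcript showCR: (numbers inject:0 into:[:sum :n | sum + n]) printString.

"the block tells the sorted collection how to compare two elements"
Transcript showCR: (numbers asSortedCollection:[:a :b | a > b]) asArray printString.

"evaluate a block for each element"
numbers do:[:n | Transcript showCR: n printString].
```

The sum shown by the inject:into: line is 187, and the sorted result lists the numbers in descending order, because the comparison block answers true if its first argument is the larger one.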
Boolean, Block and the Collection classes.
ifTrue: / ifFalse: protocol as implemented by the boolean objects bound to the globals "true" and "false":
ifTrue: aBlock
ifFalse: aBlock
ifTrue: trueBlock ifFalse: falseBlock
ifFalse: falseBlock ifTrue: trueBlock
So, to compare two variables and send some message to the Transcript
window, you can write:
...
(someVariable > 0) ifTrue:[ Transcript showCR:'yes' ].
...
Of course, you may change the indentation to reflect the program flow;
this is what a C-Hacker (like I used to be) would write:
...
(someVariable > 0) ifTrue:[
    (someVariable < 10) ifTrue:[
        Transcript showCR:'between 1 and 9'
    ] ifFalse:[
        Transcript showCR:'positive'
    ]
] ifFalse:[
    Transcript showCR:'zero or negative'
].
...
and that is how a lisper (and many smalltalkers) would write it:
Because the above constructs are actually message sends
(NOT statement syntax), they do also return a value when invoked.
Thus, some smalltalkers or lispers would probably prefer a more functional style,
as in:
...
(someVariable > 0)
ifTrue:
[(someVariable < 10)
ifTrue:
[Transcript showCR:'between 1 and 9']
ifFalse:
[Transcript showCR:'positive']]
ifFalse:
[Transcript showCR:'zero or negative'].
...
Which one you prefer is mostly a matter of style,
and you should use the one which is more readable
- sometimes, deeply nested expressions can become quite
complicated and hard to read.
...
Transcript showCR:
((someVariable > 0)
ifTrue:
[(someVariable < 10)
ifTrue:['between 1 and 9']
ifFalse:['positive']]
ifFalse:
['zero or negative']).
...
As a final trick, noticing the fact that every object responds to the #value
-message,
and that the #if
-messages actually send #value
to one of the alternatives and
returns that,
you may even encounter the following coding style sometimes (notice the non-block args of the inner ifs):
The above "trick" should (if at all) only be used for constant if-arguments
and only when using the if for its value.
With message-send arguments, both alternatives would be evaluated, which is probably not the desired
effect.
...
Transcript showCR:
((someVariable > 0)
ifTrue:
[(someVariable < 10)
ifTrue:'between 1 and 9'
ifFalse:'positive']
ifFalse:
'zero or negative').
...
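Because the if messages return the value of the evaluated block, their result can also be assigned directly. A minimal sketch, with made-up variable names and values:

```smalltalk
|a b max|
a := 3.
b := 7.
"the whole ifTrue:ifFalse: expression answers the value of the chosen block"
max := (a > b) ifTrue:[a] ifFalse:[b].
"max is now 7"
Transcript showCR: max printString.
```

Here the blocks are kept as arguments, so only the selected alternative is evaluated.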
whileTrue: loopBlock
whileFalse: loopBlock
whileTrue
whileFalse
|someVar|
someVar := 1.
[someVar < 10] whileTrue:[
Transcript showCR:someVar.
someVar := someVar + 1.
]
"(someVar < 10)"
would return a boolean, which does
not implement the while messages.)
condition := [ something evaluating to a Boolean ].
...
condition whileTrue:[
...
]
If while-loops are used that way, the condition is typically passed in as
an argument or configured in some instance variable.
The above while-loops check the condition at the beginning - i.e. if the condition block evaluates to false initially, the loop-block is not executed at all.
The Block class also provides looping protocol for condition checking at the end
(i.e. where the loop-block is executed at least once):
[
    ...
    loop statements
    ...
] doWhile: [ ...condition... ]
and also:
[
    ...
    loop statements
    ...
] doUntil: [ ...condition... ]
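To make the difference to whileTrue: concrete, here is a small sketch using the doWhile: message described above; the starting value is chosen so that the condition is already false before the first pass:

```smalltalk
|n|
n := 10.
[
    "executed once, although the condition below is false from the start"
    Transcript showCR: n printString.
    n := n + 1.
] doWhile: [ n < 10 ].
```

The same statements inside a "[n < 10] whileTrue:[...]" loop would not be executed at all, because there the condition is checked first.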
Of course, an obvious way to write an endless loop is:
[true] whileTrue:[
    ...
    endless loop statements
    ...
]
However, to document the programmer's intention, it is better to
use one of the explicit endless loop constructs (#loop or #repeat),
as in:
[
    ...
    endless loop statements
    ...
] loop
n timesRepeat:[
...
repeated statements
...
]
where n stands for an integer value (constant, variable or message expression).
|anArray|
anArray := #( 'one' 'deux' 'drei' 'quatro' 5 6.0 ).
1 to: 6 do: [:idx |
Transcript showCR: (anArray at: idx)
].
or, with an increment,
|anArray|
anArray := #( 'one' 'deux' 'drei' 'quatro' 5 6.0 ).
1 to: 6 by: 2 do: [:idx |
Transcript showCR: (anArray at: idx)
].
However, no real Smalltalk programmer would use "to:do:" to enumerate a collection's elements.
|anArray|
anArray := #( 'one' 'deux' 'drei' 'quatro' 5 6.0 ).
anArray do:[:eachElement |
Transcript showCR: eachElement
].
Notice that this example also demonstrates good vs. bad reusability of the code:
the first version (using to:do:) uses a numeric-index-based address to fetch each element.
This implies that the collection must be some kind of numerically-sequenceable collection.
The second version simply leaves that decision to the collection itself.
It will therefore work with any kind of collection (lists, trees, hashtables, sets, etc.).
Of course, in the above example we hardcoded an array as receiver, which is known to allow access
via a numeric index. However, in practice, the collection often comes from elsewhere via a
message argument or variable value. In that case, a change of the collection's representation in other parts of
the program will not affect the enumeration loop.
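The reuse argument can be tried out with a small sketch: one block that enumerates with do: and is then applied to three different kinds of collection. The block name sumOf and the sample numbers are made up for illustration:

```smalltalk
|sumOf|
"works for any collection which understands do:"
sumOf := [:aCollection | |total|
    total := 0.
    aCollection do:[:each | total := total + each].
    total
].
Transcript showCR: (sumOf value: #(1 2 3)) printString.
Transcript showCR: (sumOf value: (Set withAll: #(1 2 3))) printString.
Transcript showCR: (sumOf value: (OrderedCollection withAll: #(1 2 3))) printString.
```

All three lines display 6, although a Set does not support access via a numeric index at all.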
Open a browser, and look at the implementation of #reverseDo:, #collect:, #detect:, #select:, #findFirst: etc.
"do"- or even "while"-loops with indexing to enumerate elements for element searching or processing.
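A few of the enumeration messages named above, applied to a literal array, may serve as a first impression. This is only a sketch with arbitrary sample numbers; the results are given in the comments:

```smalltalk
|numbers evens squares firstBig|
numbers := #(1 2 3 4 5 6).
"select: answers the elements for which the block returns true: 2 4 6"
evens := numbers select:[:n | n even].
"collect: answers the block's result for each element: 1 4 9 16 25 36"
squares := numbers collect:[:n | n * n].
"detect:ifNone: answers the first matching element: 5"
firstBig := numbers detect:[:n | n > 4] ifNone:[nil].
Transcript showCR: evens printString.
Transcript showCR: squares printString.
Transcript showCR: firstBig printString.
```

None of these needs an index variable or an explicit loop.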
[
'nonExistingFile' asFilename contents
] on:Error do:[:exceptionInfo |
Transcript showCR:(exceptionInfo description).
].
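The same on:do: pattern works with other exception classes. As a sketch, assuming the standard ZeroDivide exception class and the return: message of the exception object, a handler can supply a replacement value for the failed computation:

```smalltalk
|divisor result|
divisor := 0.
"the handler block answers 0 as the value of the protected block"
result := [ 10 / divisor ] on:ZeroDivide do:[:ex | ex return: 0 ].
Transcript showCR: result printString.
```

Both the protected code and the handler are ordinary blocks, so they can be stored in variables or passed around like any other object.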
Smalltalk's blocks are perfectly suited for this style of programming, because they allow for all of the above. In fact, they are used heavily as arguments in the collection class protocol.
Array, Set, Dictionary etc.) provide messages to enumerate their elements and evaluate a given block for each of them.
The most useful of those enumeration messages is:
do: aOneArgBlock
|anArray|
anArray := #( 'one' 'deux' 'drei' 'quatro' 5 6.0 ).
anArray do:[:eachElement | Transcript showCR:eachElement ].
Of course, you should indent the code to reflect the intended control flow.
With C-style indentation, the code looks like this:
|anArray|
anArray := #( 'one' 'deux' 'drei' 'quatro' 5 6.0 ).
anArray do:[:eachElement |
Transcript showCR:eachElement
].
|bag mostUsed|
bag := Bag new.
'../../doc/online/english/getStart' asFilename directoryContentsAsFilenames
select:[:eachFile | eachFile isDirectory not]
thenDo:[:eachFile |
eachFile contents do:[:eachLine |
bag addAll: eachLine asCollectionOfWords.
].
].
mostUsed := (bag valuesAndCounts asArray sort:[:a :b | a value > b value ]) first:10.
CodingExamples_GUI::HistogrammView new
extent:500@300;
labels:(mostUsed collect:[:eachPair | eachPair key storeString]);
values:(mostUsed collect:[:eachPair | eachPair value]);
open.
The higher-order functions used are:
select:thenDo:
do:
sort:
collect:
|function measureData|
function :=
[
1000000 timesRepeat:[
'abcdefxghijklxmn' occurrencesOf:$x
]
].
measureData := (1 to:30)
collect:[:n |
Time millisecondsToRun: function.
].
CodingExamples_GUI::HistogrammView new
extent:750@400;
labels:nil;
values:measureData;
open.
Notice again that higher-order functions are used here: for the measured function itself,
with the timesRepeat: and with the collect: expressions.
Historically, due to its very readable, English-like syntax, Smalltalk does not have lots of syntactic sugar. Everything was expressed as message sends to objects. This includes class and method definition, variable initialization, looping, exception handling etc.
In contrast, most other programming languages typically provide separate syntactic constructs for each of the above-mentioned issues (Lisp being a well-respected exception here). The only existing syntactic sugar is the additional message syntax for binary selectors (which was added to make mathematical expressions more readable) and the cascade message.
{ expression1. expression2. ... expressionN }
to construct a new Array (at runtime) with N elements, computed by the corresponding expressions.
{ 'str'. Date today. Time now. 1. #sym }
creates a 5-element array at run time.
Notice that the brace-constructor shows the same behavior as a multi-new-message to
the Array class, or (for more than a small number of elements)
as an "Array new:" followed by a bunch of at:put: messages.
Thus, the above is equivalent to:
(Array new:5)
    at:1 put:'str';
    at:2 put:(Date today);
    at:3 put:(Time now);
    at:4 put:1;
    at:5 put:#sym;
    yourself
If you use this feature, be aware that "#( )" and "{ }" both return an empty array.
However, the array returned by "#( )" has been created at compilation time, and the
same identical object will be returned whenever the "#( )"-expression is evaluated again.
In contrast, every evaluation of "{ }" will construct and return a new Array at runtime.
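The difference between the compile-time literal and the runtime constructor also shows in how their contents are interpreted. The following sketch (variable names made up) relies on the usual literal array rule that a "+" inside "#( )" is read as a symbol and not sent as a message:

```smalltalk
|literal computed|
"literal array: three elements, namely 1, the symbol #+ and 2"
literal := #( 1 + 2 ).
"brace constructor: the expression is evaluated, giving one element, 3"
computed := { 1 + 2 }.
Transcript showCR: literal size printString.
Transcript showCR: computed size printString.
```

The first line displays 3 (the number of literal elements), the second displays 1.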
(Note by the author: I personally do not like the brace constructor - why should the Array class
be so special as to justify a special syntactic sugar construct?
Most collections in real life are variable in size,
so a constructor for OrderedCollection could be justified just as well.
But then, why exclude Set, Dictionary and all other fancy collections?
In addition, those with a functional background would definitely love to have a simple constructor
for Lisp-like linked lists or cons-objects.
In other words, the brace constructor seems to be a quick hack for a single programmer's needs (laziness?).
It should have been thought through towards a more generic solution before finding its way into thousands of methods...)
We have now reached a point where we realize that the key to becoming a Smalltalker lies in the knowledge of the system's class library. Although this is true for all big programming systems, it is even more true for Smalltalk, since even control structures and looping are implemented by message protocol as opposed to being a syntax feature.
No programming is possible if you don't know the protocol of the classes in the system, or at least part of it.
To give you a starting point, we have compiled a list of the most useful messages as implemented by various classes in the ``list of useful selectors'' document.
A rough overview of the most common classes and their typical use is found in the "Basic Classes Overview". Please read this document now.
Copyright © Claus Gittinger Development & Consulting
Copyright © eXept Software AG
<cg@exept.de>