The AM is designed to work similarly to the Vision Module. There is a store of "features" called the audicon, and these can be transformed into ACT chunks by way of an attention operator. "Features" in the audicon are, of course, not things that have spatial extent like visual features, but instead have temporal extent--they are sound events.
Each sound event has several attributes: the kind of sound (tone, digit, or speech), its onset and duration, its content, a content delay (the time after onset before the content can be detected), and a recode time (the time needed to encode the content once it is attended).
Support will later be added for spatial location of the sound, for use in modeling things like dichotic listening.
Basic kinds of sounds currently supported are tones, digits, and speech. The content delay and recode time for tones and digits are uniform (settable via system parameters), as is digit duration. Since speech strings can differ in length, and presumably content delay and recoding time, these must be supplied when sound events are created. Finally, the audicon has a decay parameter (default is three seconds). After a sound event ends and the decay time elapses, the sound event is deleted from the audicon.
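To make these timing rules concrete, here is a minimal sketch in plain Python (not ACT-R code; the class, its field names, and the numeric values are all invented for illustration) of when a sound event's content becomes detectable and when the event is deleted from the audicon:

```python
from dataclasses import dataclass

@dataclass
class SoundEvent:
    """Illustrative model of a sound event in the audicon."""
    onset: float          # time the sound starts (seconds)
    duration: float       # temporal extent of the sound
    content_delay: float  # delay before the content can be detected
    recode_time: float    # time to recode the content once attended
    decay: float = 3.0    # audicon decay parameter (default three seconds)

    def content_available(self, now: float) -> bool:
        # the content can be detected only after the content delay elapses
        return now >= self.onset + self.content_delay

    def in_audicon(self, now: float) -> bool:
        # the event is deleted after it ends and the decay time elapses
        return now < self.onset + self.duration + self.decay

tone = SoundEvent(onset=0.0, duration=0.5, content_delay=0.05, recode_time=0.285)
print(tone.content_available(0.02))  # False: content delay has not elapsed
print(tone.in_audicon(3.4))          # True: within duration + decay
print(tone.in_audicon(3.6))          # False: deleted from the audicon
```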
There are two ways to use the Audition Module. One of them more or less parallels vision, using tests built into the production syntax to deal with audio. For example, if you wanted the earliest unattended sound in the audicon, this test would find it:
=event>
    isa audio-event
    onset lowest
    time now
Note that the detect-time of the sound (after onset) has to have passed for this to match. To shift auditory attention to it, send an attend-sound command to the Audition Module. Once a sound event has been attended, after some time (determined by the recode time for that sound) a sound chunk will be created. The base-level activation of new sound chunks is controlled by the parameter :sound-base-level. Sound chunks have three slots: type for the type of sound, content for the content, and pitch for the pitch range (high, middle, low). For example, the spoken string "nine" would have a type of digit, a content of "nine", and a pitch of middle (unless the speaker is someone like Kerri Strug, in which case high might be more appropriate). If this were the last attended sound, it could be matched with:
=snd>
    isa sound
    time now
    type digit
    content "nine"
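The attend-and-recode sequence can be sketched as follows (plain Python, not ACT-R syntax; the function name, the dictionary representation of the chunk, and the timing values are a simplified illustration, not the module's actual implementation):

```python
# Illustrative sketch: attending a sound yields a sound chunk with
# slots type, content, and pitch, after the sound's recode time.
def attend_sound(sound_type: str, content: str, pitch: str,
                 recode_time: float, now: float):
    """Return (completion_time, chunk) for attending a sound at time `now`."""
    chunk = {"type": sound_type, "content": content, "pitch": pitch}
    return now + recode_time, chunk

done_at, chunk = attend_sound("digit", "nine", "middle",
                              recode_time=0.3, now=1.0)
print(done_at)  # the chunk appears one recode time after attending
print(chunk)    # {'type': 'digit', 'content': 'nine', 'pitch': 'middle'}
```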
There is another paradigm for audio perception, the listen-for command. It takes parameters similar to those used by find-sound: currently these are :attended, :onset, and :type. The standard use of this command is to pass NIL for :attended and LOWEST for :onset, which causes the AM to encode the earliest sound not already attended. If the appropriate sound event's content is not yet available, the AM's preparation state will be set to BUSY. Once the event's content is available but not yet recoded, the AM's preparation state will be set to FREE and its execution state to BUSY. This command automatically executes an attend-sound command when an appropriate audio event occurs.
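The preparation/execution transitions just described can be summarized with a small sketch (plain Python; a simplified model that assumes the content becomes available at onset plus content delay, and the sound chunk appears one recode time later):

```python
def am_state(now: float, onset: float, content_delay: float, recode_time: float):
    """Return the (preparation, execution) state of the AM while a
    listen-for request is pending (simplified illustration)."""
    content_ready = onset + content_delay  # content becomes available
    recoded = content_ready + recode_time  # recoding finishes
    if now < content_ready:
        return ("BUSY", "FREE")   # waiting for the sound's content
    elif now < recoded:
        return ("FREE", "BUSY")   # content available, still recoding
    else:
        return ("FREE", "FREE")   # sound chunk has been created

print(am_state(0.02, 0.0, 0.05, 0.3))  # ('BUSY', 'FREE')
print(am_state(0.10, 0.0, 0.05, 0.3))  # ('FREE', 'BUSY')
print(am_state(0.40, 0.0, 0.05, 0.3))  # ('FREE', 'FREE')
```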
Sounds in the environment are simulated by creating sound events through the Lisp functions new-digit-sound, new-tone-sound, and new-other-sound. Parameters for these commands are documented in the Parameter Reference.