We've had so many suggestions for Arc features that I thought I'd better collect them all. So far I have barely skimmed most of these, but there are clearly some interesting ideas here, so I thought I should put them online so that everyone can see them. --pg

*** Dorai Sitaram: Recursion

fn-named could be renamed rfn or fnr, r for recursive. This gets rid of your one hyphenated name. You can then have, a la Scheme's named let, the form rlet. In the following, == notates "is syntax for".

(rlet NAME VAR INIT BODY ...)
  == (let NAME (rfn NAME (VAR) BODY ...) (NAME INIT))

Of course you need rwith too.

(rwith NAME (VAR0 INIT0 ...) BODY ...)
  == (let NAME (rfn NAME (VAR0 ...) BODY ...) (NAME INIT0 ...))

Of course you want tail-call elimination, and since you can use tail recursion to iterate, you can remove your four looping operators. I think this mayhem is OK, because I have never thought of tail-recursive loops as a poor human's substitute for "real" loops. If you really must have the latter, you have macros anyway.

*** Greg Sullivan: Keep, Sum, Syntaxes

Re the "keep" and "sum" operations, it seems like you could have those take an optional function argument that would apply arbitrary computation to the arguments, with the defaults being (fn x . (add! x them)) and (fn x (set! total (+ x total))) or something. Seems like you're aiming for more usable versions of map and fold.

More off the deep end: what about optional front-end syntaxes? Much as we all know that syntax really shouldn't matter, it seems to. We all know that S-expressions are just a syntactic and direct representation of the syntax tree -- could we have parsers from infix syntax to S-expressions? I suppose the hard issue -- maybe the only hard issue -- is macros. But I think we could solve it.

*** Alan Bawden: First-class Macros

During your presentation at LL1 last Saturday you mentioned that you were considering macros that were first-class values. Many people in the audience seemed to think that this was clearly an absurd idea -- I think you even wavered about this yourself at one point. Well, don't give up on this idea until you have read my paper about how first-class macros can be a sensible notion. See .

The executive summary is that you can have first-class macros and still do all macro expansion at compile time, at the cost of some (quite minimal) "type" declarations. Unfortunately for you and your current project, I wrote that paper with the statically typed programming language audience in mind (ML, Java or even C). I do mention in passing that the idea would also work for a dynamic programming language (Scheme or Dylan), but I don't explain how in detail -- so you'll have to extrapolate a little. I think the idea is quite cool, and, as I show in the paper, it has some very interesting applications. (For example, it's almost all you need to construct a complete module system.)

*** Ken Anderson: (get)?

Some questions:

Q1: How do I write a function that takes two arguments that I want to get the x and y fields of?

(def area ((get x y) (get x y)) ???)

They could be labeled by argument position so I could say:

(def area ((get x y) (get x y))
  (abs (* (- x2 x1) (- y2 y1))))

Q2: If there is a get argument, (get x y), how can I access the underlying object itself?

I think it is interesting to try to provide destructuring information in the function definition. However, Haskell does this, for example, by allowing multiple definitions of the function.
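For a concrete sense of the label-by-position idea, here is a rough Common Lisp sketch (not something Ken or pg proposed). It assumes points are property lists with :x and :y fields; def-destructured and its argument syntax are invented purely for illustration.

(defmacro def-destructured (name arglist &body body)
  ;; Each argument spec looks like (DOC-NAME (FIELD ...)).  The fields of
  ;; the Nth argument become variables FIELD1, FIELD2, ... in BODY; the
  ;; DOC-NAME is documentation only in this toy version.
  (let* ((vars (loop repeat (length arglist) collect (gensym "ARG")))
         (binds (loop for (nil fields) in arglist
                      for var in vars
                      for i from 1
                      append (loop for f in fields
                                   collect `(,(intern (format nil "~A~D" f i))
                                             (getf ,var ,(intern (string f) :keyword)))))))
    `(defun ,name ,vars
       (let ,binds ,@body))))

(def-destructured area ((p1 (x y)) (p2 (x y)))
  (abs (* (- x2 x1) (- y2 y1))))

;; (area '(:x 0 :y 0) '(:x 3 :y 4))  =>  12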
While I like some kind of pattern matching in function definitions, I'm not sure it should be part of the core language. For the core I'd like to just see what Scheme provides, i.e. (x y . rest). I'd even live with a fixed number of arguments if there is an easy way to define macros.

*** Ken Anderson: Common Lisp arglist data

I put together these statistics from a Common Lisp application:

27,935 functions and macros
 6,328 functions or macros with keyword arguments

Distribution of number of arguments (no keyword case):

0=1254 1=12961 2=4063 3=1885 4=628 5=302 6=173 7=95 8=69 9=69 10=34 11=18 12=27 13>=29

Distribution of keyword usage:

(&KEY)=2048 (&OPTIONAL)=1663 (&REST)=1176 (&BODY)=988 (&KEY &REST)=217 (&ALLOW-OTHER-KEYS &KEY)=60 (&REST &OPTIONAL)=51 (&KEY &OPTIONAL)=48 (&ALLOW-OTHER-KEYS &KEY &REST)=43 (&KEY &REST &OPTIONAL)=9 (&BODY &OPTIONAL)=6 (&REST &WHOLE)=6 (&BODY &WHOLE)=4 (&AUX)=4 (&ALLOW-OTHER-KEYS &REST &OPTIONAL)=2 (&OPTIONAL &WHOLE)=1 (&AUX &KEY)=1 (&WHOLE)=1

From this I'd conclude that most functions have 3 or fewer arguments and that keywords are used less than about 1/3 of the time. &key seems to be used as much as &optional and &rest combined.

*** David Morse: Strings implemented as lists

In one of your Arc articles you said you thought it "might be useful" to have strings be implemented as lists. The benefits I can think of would be:

* (subseq /str/ :start /n/) doesn't cons
* can share substructure among strings
* easy insert onto front of strings

The second and third don't seem like they'd be all that handy. The first is attractive. Because its "arrays" are untagged, C strings already have this property. It's often nice there. On the other hand, using CL's :start and :end parameter passing pattern doesn't fill me with dread either. It seems like a tradeoff between tripling the number of parameters to string functions versus approximately octupling the heap usage of strings.

*** Todd Proebsting: Pronouns

Your comments on pronouns and iteration captures remind me of some ideas that Ben Zorn and I threw around a few years back; you may find two TRs interesting. One describes "shorthands" (pronoun-like things) and the other describes "histories" (capture-like things):

ftp://ftp.research.microsoft.com/pub/tr/tr-2000-03.ps
ftp://ftp.research.microsoft.com/pub/tr/tr-2000-54.ps

*** Stephen Ma: Strings as Lists

Strings as lists: it seems you just want to scan the string from left to right, which is what the car and cdr functions let you do. Instead of that, how about storing strings as vectors of bytes, and using "scanners":

(a) A scanner is a mutable object that contains a string and a current position within that string. String-matching functions update the current position. Make a new scanner with "(sc <string> <position>)".

(b) A scanner is completely distinct from its string: a string can have any number of scanners attached to it, and a scanner can be reattached at any time to another string.

(c) Scanners will be especially useful with the "match" macro, which works a lot like Unix's Lex utility:

(match <scanner> <regexp1> <expr1> ... else <exprN>)

This attempts to match one or more regular expressions against <scanner>'s current position. If one of the <regexp>s matches, the corresponding <expr> gets executed. A successful match advances the scanner's current position past the matched string. In each <expr>, the pronouns $1, $2, ..., et cetera refer to matched substrings, just like Perl. $& refers to the entire matched string. In keeping with Arc's "make it easy" philosophy, if <scanner> is not a scanner but a string S, then S is implicitly replaced by (sc S 0).
So it's easy to do a quick match against a string.

As I mentioned, a single invocation of "match" works much like an instance of Lex, and like Lex it can be precompiled to run blazingly fast. Using more than one "match" lets you scan strings in very powerful, context-dependent ways.

(d) Contrived example. Here's a function on a string s that prints the initials of all the words inside s, and returns the sum of all the integers inside s. (Assume Perl-style regexps.)

(fn scan (s)
  (= scanner (sc s 0))
  (while (match scanner
           "\s+"     t                ;Ignore whitespace.
           "(\w)\w*" (do (pr $1) t)   ;Print the word's initial.
           "\d+"     (do (sum $&) t)  ;Accumulate numeric sum.
           "$"       nil)))           ;End of string terminates loop.

Thus (scan "alpha 2 7 beta 8") prints "ab" and returns 17.

*** Ray Blaak: Implicit Local Vars

I believe automatic variable introduction will hinder the programmer more than help. "Expert" programmers type fast, and slightly mistyped variable names would be hard to find visually, and there would be no language support to detect them. This will make debugging much more difficult.

Instead, make the syntax for binding variables succinct and clear (e.g. as you have done with let). Perhaps allow a (def var value) form. The point is that declaring bindings should be inexpensive to the programmer while still allowing undeclared name detection.

I want my languages to be powerful and flexible, and easy to type, but I also want decent error checking from my language environment as well. If there is no other error detection in a language environment, *the* thing that I have found important in almost 20 years of programming, more than type checking, parameter matching, etc., is the reporting of unknown names.

My advice and plea: don't create local variables implicitly.

*** Ray Blaak: Use of *fail*

A cool hacker's language always needs to balance expressivity, ease of use, and safety. With regard to the use of *fail* for db lookups, I believe there is a better way.

The problem with using a global variable to indicate the special failure value is that it is not thread safe (Arc will be supporting threads eventually, no?), and it can lead to insidious bugs if some code fails to restore the global properly. The code required to make it thread safe is tedious and requires the programmer to be aware of potential problems on every lookup. Furthermore, if a non-default fail value is needed, the programmer must do the *fail* override on every call, which is also tedious.

If instead one realizes that the fail value is a property of the db itself, one can specify it when the db is created, perhaps with a :failure keyword that, if omitted, defaults to nil. Then different dbs can have their own notion of fail values simultaneously without interfering with each other. Usage is then easier (all lookups require the same, uniform work), safer, and expressivity is actually improved since things compose better. Alternatively, one could also specify the fail value as an optional parameter on a lookup call.

Note also that the idea of keyword parameters is in general a better way to "fiddle" with settings, giving the control that hackers love while avoiding the problems that special global control variables give rise to.

*** Shiro Kawai: M17N

Is there any plan for Arc to support multilingualization (M17N)? Traditionally, M17N is considered by language designers as some sort of add-on, library-level stuff.
Yet it is indispensable for real-world applications that are used worldwide, and I think it affects the implementation decisions of the language core to some degree, if you want efficiency. In fact, out of frustration with the existing implementations, I've even written my own Scheme that supports multibyte strings natively to do my own job.

Why is it important? Because the choice of implementation affects the programming style. There are several possibilities:

(a) A string is just a byte array. The application is responsible for dealing with multibyte characters. (This was pretty much the situation until recently.)

(b) A string is an array of wide characters, e.g. UCS-4. (Note: UCS-2 is not enough.) The stream converts between the internal representation and the external representation. Java is going this way, and I think Allegro CL also adopts this after version 6.0.

(c) A string may contain multibyte characters, e.g. UTF-8, or a more complex structure that uses, for example, a struct of language tag and byte vector. String access by index is no longer a constant-time operation, and a destructive change to a string may be much heavier than a destructive change to a vector. I think Python is going this way, as well as some other popular scripting languages.

Approach (b) is simple and just works, and may be enough for Arc's purpose. However, in some cases it doesn't work well, especially when (1) you have to deal with a large amount of text data that mostly consists of single-byte characters and don't want to consume four times as much memory as a byte string, and (2) you have to do lots of text I/O, since most external data will likely stay encoded in variable-length (multibyte) characters and you have to convert them back and forth.

Approach (c) solves those problems, but it affects the programming style. You tend to avoid random access into strings. Searching in a string performs better if it directly returns a substring rather than indices. Allocating a string with make-string and using string-set! to fill it one character at a time is a popular technique in Scheme for constructing a known-length string, but it behaves extremely poorly in this implementation. I tend to use string ports heavily, and then the penalty of non-constant access diminishes. Another advantage of this approach is that you can have a "double view" of a string, one as a sequence of characters and the other as a sequence of bytes, which is convenient for writing efficient code that deals with packets mixing binary data and character data.

*** Vinay Gupta: Python, XML, SQL

I use Python quite frequently. Although I can't point to specific features I think you should duplicate in Arc, I can think of specific aesthetics which you should:

1> fixed code style with minimal punctuation. Python's use of indentation (only) to mark blocks, and its move toward banning tabs, really does help move code between programmers and encourage both reuse and learning-by-inspection. All Python code looks basically the same.

2> easy language support for putting basic documentation in code files (Python does it very nicely, with tools for pulling it all together later). A nice extension to that would be to have support for XML in the code comments, so you can yank the XML of the comments out and then format it at whim, without having to feed the text you pulled out of the code through Word or some other manual step where things can get out of step.

On another note, I want to suggest that you seriously think about how you build your interface to SQL into the system.
At this point, SQL is present on just about all servers in some form, and Mac OS X is really beginning to push it down into the desktop. But at present, access to SQL is rather a bolt-on in most languages, particularly relative to file system access, which is much more tightly integrated. I'd like to suggest that you consider integrating SQL support at much the same level as file system support: as a conceptually integrated part of the language design. I don't know quite where that might go, but I suspect it would be a very interesting direction (particularly if the abstraction which supports filesystems and SQL-based databases could be extended to other, less common forms (XML databases being the obvious one) by users).

*** Will Duquette: What CL Misses

What I'd like that I missed in Common Lisp.

1. Standard control structures. From what I can see, you've already got this well under.... Well, under control, I guess.

2. Standard notation. As you comment, Unix won. It would be nice to see standard notation for character constants (e.g., '\n' for newline) and for format strings. "~A" isn't wrong, but C, C++, Perl, Python, Tcl, awk, and so forth can all use C's sprintf format strings. These conventions might not be perfect, but they are familiar to a vast number of programmers.

3. Better string processing. I particularly like Tcl's variable and command interpolation. For example, here's an interactive Tcl session that defines a variable and a function, and interpolates both into a string.

% set a "Hello"
Hello
% proc f {} {return "World"}
% set b "$a, [f]!"
Hello, World!

You're contemplating treating strings as lists where each character is an element. One of the real conveniences of Tcl is that almost any string can be manipulated as a list of white-space delimited tokens. The "split" command can split up a string into a list of tokens on any set of delimiters. This is, practically speaking, incredibly useful. Common Lisp falls down here in two ways:

* There's no equivalent of the "split" command. (I've seen the discussions about "partition", currently defined on CLiki; the word "over-engineered" comes to mind.) Anyway, the point isn't that this can't be done in CL; just that it should be provided.

* You can't easily use a CL list of symbols as a proxy for a string of white-space delimited tokens; the list of symbols has too many syntactic gotchas, and then there's the whole case thing. Which leads me to 4:

4. Case sensitivity. Symbol names, especially, should be case sensitive.

5. Associative arrays. Hash tables are incredibly useful, but the syntax to use them should be terse. Some kind of array notation works very well. For example, (= myDB[x] y) assigns y to key x in hash table myDB. Tcl works much like this. You also get multi-dimensioned arrays quite naturally using your proposed "x.y" notation, though I would suggest using "," and ";" instead of "." and ":". Why?

x,y --> (x y)   A cons
1,1 --> (1 1)   A cons
1.1 --> 1.1     A floating point number

6. A more standard notation for record structures, or, more specifically, for object members. C/C++/Java/Python syntax has become common for this, and is more terse than either of the CL possibilities.

*** Dan Milstein: Concurrency

One problem which, IMHO, no popular language has come even close to solving well is allowing a programmer to write multithreaded code. This is particularly important in server-side programming, one of Arc's major targets. I've written a good deal of multithreaded Java, and the threading model is deeply, deeply wrong.
As a programmer, there's almost no way to write the kind of abstractions which let you forget about the details. You're always sitting there, trying to work through complicated scenarios in your head, visualizing the run-time structure of your program.

I didn't see another way until I read John H. Reppy's "Concurrent Programming in ML". Instead of building his concurrency constructs around monitored access to shared memory, he builds them around a message passing model (both synchronous and asynchronous). What's more, he provides powerful means of capturing a concurrent pattern in an abstraction which hides the details. I highly recommend giving that book a read.

Here's an example of some of what you get (not the abstraction, actually, just the basic power of message passing over shared memory). The abstraction facilities are complex enough that, like Lisp macros, a small example doesn't really capture their power. I'm in no way familiar with concurrent extensions to Lisp, so I'm not able to provide the code for how much harder it would be in CL or Scheme. Assuming they were augmented with a shared memory model (as Java is), which forces the programmer to deal with synchronized access to memory, I can only imagine it would be significantly more complex.

A producer/consumer buffer. You want a buffer with a finite number of cells. If a producer tries to add an element to a full buffer, it should block until a consumer removes an element. If a consumer tries to remove an element from an empty buffer, it should block until a producer adds something. In Concurrent ML:

datatype 'a buffer = BUF of { insCh : 'a chan, remCh : 'a chan }

fun buffer () = let
      val insCh = channel() and remCh = channel()
      fun loop [] = loop [recv insCh]
        | loop buf =
            if (length buf > maxlen)
            then (send (remCh, hd buf); loop (tl buf))
            else (select remCh!(hd buf) => loop (tl buf)
                  or insCh?x => loop (buf @ [x]))
      in
        spawn loop;
        BUF{ insCh = insCh, remCh = remCh }
      end

fun insert (BUF{insCh, ...}, v) = send (insCh, v)

fun remove (BUF{remCh, ...}) = recv remCh

---------------

Translated into a Lisp-ish syntax (very easy to do from ML), this would look something like:

(defstruct buffer ins-ch rem-ch)

(defun create-buffer ()
  (let ((ins-ch (make-channel))
        (rem-ch (make-channel)))
    (labels ((loop (buf)
               (cond ((null buf)
                      (loop (list (recv ins-ch))))
                     ((> (length buf) maxlen)
                      (send rem-ch (car buf))
                      (loop (cdr buf)))
                     (t
                      (select
                        (rem-ch ! (car buf) => (loop (cdr buf)))
                        (ins-ch ? x => (loop (append buf (list x)))))))))
      (spawn #'loop)
      (make-buffer :ins-ch ins-ch :rem-ch rem-ch))))

(defun insert (b v) (send (buffer-ins-ch b) v))

(defun remove (b) (recv (buffer-rem-ch b)))

Key things to notice:

1) Language features:

Communication between threads *only* occurs over channel objects, which can be thought of as one-element queues. In CML, channels are typed, but in Lisp they probably wouldn't be.

send/recv: synchronous (blocking) communication over the channel. A thread can attempt to send an object over the channel, and will then block until another thread does a recv on the channel.

Creating a new thread is done via 'spawn', which takes a function as its argument. (I can't remember what the signature of that function is supposed to be -- clearly, in this case it can't be a function of no arguments, but imagine it to be something like that.)

Selective communication: the call to 'select' is one of the very powerful features.
It is like a concurrent conditional -- it simultaneously blocks on a list of send/recv calls, and executes the associated code with whichever call returns first (and then drops the rest of the calls). The ! syntax means an attempt to send, the ? means an attempt to receive. In both cases, the '=>' connects the associated code to execute. (I haven't really come up with a Lispy translation of that syntax.)

I don't think select could be efficiently implemented without language support. It requires a sort of 'partial blocking', which is tricky to implement on top of normal blocking.

2) The idiom:

The buffer is implemented as a separate thread which has connections to two channels and an internal list to keep track of the elements of the buffer. This thread runs through a loop forever, taking the current state of the buffer as its argument, and waiting on the channels in the body of its code. It is tail-recursive.

Note the absolute lack of any code to deal with synchronization or locking. You might notice the inefficient list mechanism (that append is going to get costly in terms of new cons cells), and think that this is only safe code because of the inefficient functional programming style. In fact, that's not true! The 'loop' function could destructively modify a list (or array) to which only it had access. There would still be no potential for sync'ing problems, since only that one thread has access to the internal state of the buffer, and it automatically syncs on the sends and receives. It's only handling one request at a time, automatically, so it can do whatever it wants during that time. It could even be safely rewritten as a do loop.

What I find so enormously powerful and cool about this is that the programmer doesn't need to worry about the run-time behavior of the system at all. At all. The lexical structure of the system captures the run-time behavior -- if there is mutating code inside the 'loop' function, you don't have to look at every other function in the file to see if it is maybe modifying that same structure. This is akin to the power of lexical scoping over global scoping. I have never seen concurrent code which lets me ignore so much.

This really just scratches the surface of Concurrent ML (and doesn't touch on the higher-level means of abstraction). But I hope it gives a sense of how worthwhile a language it is to learn from.

3) Issues:

I think that channels themselves would be fairly easy to implement on top of the usual operating system threading constructs (without needing a thread for each one). However, the style which this message-passing model promotes can easily lead to a *lot* of threads -- if you have a lot of buffers, and each of them has its own thread, things can get out of hand quickly. I believe that Mr. Reppy has explored these very issues in his implementation of CML.

http://cm.bell-labs.com/cm/cs/who/jhr/sml/cml/

Insofar as I have time (which, realistically, I don't) I would love nothing more than to play around with implementing CML-ish concurrency constructs in a new version of Lisp like Arc.

*** Ben Evans: Adjustable Competence

One idea which I've been thinking about a lot lately is "use strict;" in Perl. It's an example of how to write a general purpose language which can still speak to the top of the intelligence curve of users. What I was thinking of is a language with a directive to change the competency level of a piece of code.
For example, in an imaginary Perlish language called Gimbal:

#!/usr/bin/gimbal
use beginner;
{
    variables = "by default have lexical scope in beginner mode";
    global VAR = "I must be prefixed by the keyword global, and my name must all be in upper case in this mode, otherwise compile-time errors occur";
    print variables + "\n";
}
print VAR + "\n";

use competent;
global dog = "I will generate a compile-time warning because my name is not all in capitals in competent mode";

use expert;
scope global;
cat = "I generate no warnings at all, and the keyword global is not required because I changed the default scope with the scope keyword";
....

Having a competency level pragma means that real hackers can get on with their real work, while still having something which students etc. can use. If this was combined with a code signing system (e.g., in 'use beginner' mode you can't use a class which isn't signed by someone with whom you can establish a trust relationship according to the keyring for your local language install), then it might look a lot more appealing to the large, mediocre hordes of programmers out there.

*** James Bullard: Sequences and range parameters

One thing I especially like about Python is the primitives used to specify ranges. For instance,

>>> seq = [1,2,3,4,5]
>>> seq[1:2]
[2]
>>> seq[1:3]
[2,3]

I could see this being easily incorporated into your 'sequences are an implicit method on indices' by also adding 'and ranges'. Syntax such as:

>> ("something" (1 . 3))
(o m)

is a possibility. Mostly, I just wanted to suggest a feature which I use all the time in Python.

*** Trevor Blackwell: Databases

Databases are an easy, low-risk way to store and index lots of data without having to design a custom system for each kind of data. When you store data in files, it's hard to get good performance. Usually you either end up reading many files at each click, or reading them all in at startup time and taking a lot of time and memory. Databases also ensure correct concurrent access, so you can have multiple users working simultaneously on the same data without conflicts.

*** Eric Tiedemann: Python Syntax

Python is one of my languages of choice at the moment because of its terse, simple, and regular syntax. Those attributes could come in handy for a macro-friendly syntax-for-Lisp. esr has a continuing interest in this subject.

As far as I can figure, *all* statements in Python (aside from import and print) are either expressions or assignments (which we could `promote' to expressions) or have one of the two forms:

<keyword>: <suite>
<keyword> <expression>: <suite>

where <suite> -> simple_stmt | NEWLINE INDENT stmt+ DEDENT

This even includes class definitions! The major complexity I can see is that some macro expressions (e.g., if, try) would want to incorporate expressions *after* their containing expression (e.g., else, except). This could be handled by having macros (optionally) take and return both an expression and an `expression continuation'.

*** Guyren Howe: OOP

In response to your request for new ideas for Arc, I'm going to mention an OOP feature from the language REALbasic. It's a new type of class extension mechanism: as well as allowing for overloading/method overriding, a class can define a New Event that allows it to call its subclasses. An event declaration is just a function header. The most immediate subclass to implement a handler for the event traps the event and makes it disappear for further subclasses, unless the class that defined the handler re-defines and calls the event with the same name.
The result is that rather than extension by uncontrolled, ad-hoc overriding, you get extension through a well-defined protocol. It's a nice mechanism.

Example: you have a NiceButton object. A TextNiceButton is defined that writes text onto the face of the button. Now you want to draw the NiceButton in a different way (with a gradient, say). If TextNiceButton was written with traditional overloading, it might be that the drawing of the text happens at the wrong time, and if you try to draw the gradient you might overwrite the text. But if TextNiceButton was responding to a DrawText event issued by NiceButton, you can control when it is drawing its text, so you can make sure you've drawn your gradient first.

When I describe this to people, they react by saying that it is strictly weaker than overriding methods, and this is true. But for most purposes, having a well-defined protocol is nice.

*** Sam Tregar: Perl Community

You mentioned the Perl library many times but rarely mentioned the Perl community. It's quite obvious to me that Perl would not be where it is today, with the library it has today, without the Perl community. Also, from reading Larry Wall's various speeches and essays it's obvious to me that the Perl community is no accident. Larry designed the Perl community, and the result is the powerhouse that produced the Perl we know today. Simply: the Perl library was not carefully designed; the Perl community was carefully designed, and the Perl community created the Perl library. Perhaps this is a lesson Arc can follow?

*** Drew Csillag: Misc Suggestions

You should be able to call a function with keyword arguments without having to define the function to be able to take them (Python vs. CL).

Reference-counted GC is a great thing (but it still needs some supplemental GC to pick up the cyclic trash) because it's very predictable, which is very useful if you've ever had to open up a bunch of files in a tight loop.

Multiple inheritance is a useful thing. Not often, but there are a number of times it can save you a bunch of headaches, especially for mixin classes.

Also, builtin types should be largely indistinguishable from user classes. You should be able to write a class in such a way that you could do (+ userob1 userob2) and there would be a method in either one or the other (maybe both) object's class to handle this.

Introspection into the system is a wonderful thing. Being able to write a profiler and debugger for your language in your language is a great thing to be able to do. It allows all the cool tracing and debugging (coverage analysis) things you'd ever want to do. Being able to get hold of the namespaces you are executing in is also a huge boon I learned from Python. Being able to introspect into the compiled forms of functions (assuming that they compile to some bytecode object) is a great thing too.

Some of the iteration stuff seems a bit ooky to me though. While, with the it binding, is cool, but part of it doesn't sit well with me. Perhaps because you wouldn't be able to get at it if you are in a nested while loop or something. Also, each with keep is more of a filtering function than straight iteration (like Python's filter()).
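For reference, the filtering reading corresponds to what Common Lisp spells remove-if-not (Python's filter()); evenp below stands in for an arbitrary keep-predicate:

(remove-if-not #'evenp '(1 2 3 4 5 6))   ; => (2 4 6)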
As for things like Scheme-like continuations, for me it's hard to say yes, but hard to say no, because they are useful for building exception handling systems and microthreading systems. (If you don't include some kind of exception handling system, built with or without explicit continuations -- which you should -- runtime errors should raise exceptions, not just halt the system.)

The interpreter should also be able to be multithreaded, but unlike Java, it shouldn't force it on you. I'm not big on multithreading in general, but sometimes it actually is the best way to do things.

Sather-like (I think it's Sather) generators are also very nice things to have.

One last suggestion: the interpreter should be written in C. Language systems (with the exception of C) that self-host are a pain in the neck.

*** Thomas Baruchel: Call/cc vs. Exceptions

Though call/cc is better than exceptions, I think it's much easier to use exceptions than call/cc. Why? Because you can more easily give a name to an exception (after having declared it) and catch it. Do you see what I mean? What is very clean in OCaml is the structure:

try ...
    ...
with Division_By_Zero -> ... ;
     IncoherentParameter -> ... ;
     FunnyResult -> ... ;
     WhyDidYouTryThat -> ...

(i.e. you can put in a separate place everything that can be raised as an exception and handle it separately). I think that something as powerful as call/cc but as convenient to use as try/with would be a great idea ;-)

Another limitation of Scheme is that it isn't intended for implementing new types (or at least, it isn't very easy to do it). I think that many languages do it better (OCaml, or even C). That is a big lack: I am now playing with quantum computation, and found it is very easy to implement a 'qubit' type in C or in OCaml and not in Scheme (what I do is use a pair of two complex numbers, and it isn't bad, but...).

On the same idea, some languages allow definition of functions with several return values (not necessarily in a structure), by declaring input and output values of the function. This is quite useful for some purposes.

Prolog: why not implement some declarative aspects? (At least the Lisp 'amb' function, but implemented at a low level, which would make it easier to use. amb-fail isn't such a good idea: Prolog doesn't use anything like 'amb-fail'; whenever a condition isn't satisfied it does the same thing by itself and the programmer doesn't have to care about it.) Another thing you can't do when using 'amb', after having implemented it yourself in Scheme, is using several nested 'amb' declarations...

Think that some tools are more than tools and are almost languages. A CLP(FD) solver has been implemented in GNU Prolog and it isn't a bad idea... But this has to be done by considering the general concepts of the language. clp(fd) suits the declarative aspects of Prolog well.

Forth-like languages: the concept of a stack is really impressive, because you can play with a lot of data without using any variables. MY CREDO: a good language doesn't have to use variables. I like functional languages for that; I also like Forth (and PostScript) for the same reasons.

*** David Sloo: Strings

The fact that strings behave as lists is convenient. It means there is one less category to memorize. However, one of the problems I keep running into in language interop is string processing, even in Perl. Every time I move a string from a module in language X into an application, I have to write a UTF-8 processor.
Worse, half the time I need to use someone else's code within an application in the same language, I have to check for Shift-JIS, or do UTF-8 processing, or some similar dopiness. A string in the language should be treated as an array (in Arc, a list) of Unicode characters no matter what. There are a few good things about C#, and this is one of them. There are a few bad things about Perl, and this is also one of them.

Also, I think you made a logical decision in scoping the iterators locally in for loops, but it does mean you need extra stuff for this common idiom, where you preserve the exit-state of the loop:

int i;
for (i = 0; i < 10; ++i)
    if (array[i] == UNSNOOGLE)
        break;
printf("The array of snoogles had an un-snoogle at spot %d", i);

I suppose it's worth changing, given the downside.

*** Dave Moon: S-expressions are a bad idea

I want to comment on your use of S-expressions, based on what I learned in my couple of years researching Lisp hygienic macro ideas and working on Dylan macros. Summary: I learned that S-expressions are a bad idea. There are three different things wrong with them, none of which have anything to do with surface syntax. Having as a convenient abbreviation a surface syntax different from the program representation that macros work on is a fine thing to do.

[1] The (function . arguments) notation is not extensible. In other words, there isn't any place to attach additional annotations if some should be required. 20 years ago it was noticed that there is no place to attach a pointer back to the original source code, for use by a debugger. That's just one example. This is easily patched by changing the notation to (annotation function . arguments), where annotation is a property list, but this is a bit awkward for macro writers. The next point suggests a better fix.

[2] Representing code as linked lists of conses and symbols does not lead to the fastest compilation speed. More generally, why should the language specification dictate the internal representation to be used by the compiler? That's just crazy! When S-expressions were invented in the 1950s, the idea of separating interface from implementation was not yet understood. The representation used by macros (and by anyone else who wants to bypass the surface syntax) should be defined as just an interface, and the implementation underlying it should be up to the compiler. The interface includes constructing expressions, extracting parts of expressions, and testing expressions against patterns. The challenge is to keep the interface as simple as the interface of S-expressions; I think that is doable. For example, you could have a backquote that looks exactly as in Common Lisp, but returns an <expression object> rather than a <list>. Once the interface is separated from the implementation, the interface and implementation both become extensible, which solves the problem of adding annotations.

So you don't get confused, I am not saying that Dylan macros provide a good model of such an interface (in fact that part of Dylan macros was never done), nor that Dylan macros necessarily have any ideas you should copy. I'm just talking about what I learned from the Dylan macros experience. Actually, if you succeed in your goals it should be straightforward for a user to add Dylan-like macros to Arc without changing anything in Arc.

Maybe someday I can explain to you the representation of programs I used in the Dylan compiler I wrote (never finished) after I left Apple.
It has some clever ideas so that it is efficient in terms of consing, and the same representation can be used at all levels of the compiler, from the surface syntax parser down to the machine code generator. I'd have to retrieve the code from backup first.

[3] Symbols are not the right representation of identifiers. This becomes obvious as soon as you try to design a good hygienic macro system. This is really another case of failing to separate interface from implementation. A symbol can be a convenient abbreviation, but in general an identifier needs to be a combination of a name and a lexical context. Because of macros, the structure of lexical contexts is not simple block-structured nesting: an identifier can be referenced from a place in the code where it is not visible by name. Also, an identifier can be created that doesn't have a name, or equivalently isn't visible anywhere just by name, only by insertion into code by a macro.

So I think you should have an internal Arc-expression representation for programs, and a surface syntax which is more compact and legible, but Arc-expressions should not be specified to be made out of conses, symbols, and literals. Arc-expressions should be defined only by an interface, which should be open (i.e. new interfaces can be added that apply to the same objects). The implementation should be up to the compiler writer and there should be room for experimentation and evolution. Interesting question: is there a surface syntax for Arc-expressions different from the abbreviated surface syntax, and if so, what is it?

Other unrelated comments:

"Here are a couple ideas: x.y and x:y for (x y) and (x 'y) respectively."

Do you mean x.y stands for (y x)? Oh, I see, you have field names as indices rather than accessors, so you actually meant x.y stands for (x 'y). As for x:y, I think what Dylan borrowed from Smalltalk is the right answer, thus x: stands for (quote x). Colon as a postfix lexical "operator" works surprisingly well.

"local variables can be created implicitly by assigning them a value. If you do an assignment to a variable that doesn't already exist, you thereby create a lexical variable that lasts for the rest of the block."

This is a really bad idea, because inserting a block of code that includes a local variable declaration into a context where a variable with that name is already visible changes the declaration into an assignment! Or if you fix that by making an assignment at the front of a block always do a local declaration instead of an assignment, then when you put code at the front of a block you have to remember to stick an "onion" in front of that code if there is any chance that the code being inserted could be an assignment. Weird context-dependent interpretation of expressions like that makes a language harder to use, both for programmers and for macros. It's one of the problems with C; don't put it into Arc too.

I'm not saying that you need to use Common Lisp's let for local declarations. One possibility would be to use = for declaration, := for assignment, and == for equality. Another would be to use = for all three, but with a prefix set for assignment and let for declaration. Other possibilities abound. But then later you introduce a let macro, so I don't know what you really think about local declaration.

"Making macros first-class objects may wreak havoc with compilation."
That depends entirely on whether it is permissible to write functions that return macros as values and permissible to pass macros as arguments to functions that are not locally defined (or directly visible fn expressions), or whether all macros are directly visible at compile time as initial values of variables. I think you should limit the language to what you know how to implement, so the macro special form should only be allowed in places where its complete flow is visible at compile time and should never be allowed to materialize as a run-time object. You need a different way to create a macro object that exists in the run-time in which macros run, the compile-time run-time. You need to define precisely "flow visible at compile time," possibly being very restrictive and only allowing what we really know is needed; namely, a macro special form can be used in function position, as an argument in a function call where the function is a fn special form, in the initial value of a global constant (not variable!), and nowhere else. This is tricky but tractable, I think.

"strings work like lists"

Does rplaca (probably (= (car x) y)) work on strings? What about rplacd?

"Overload by using a fn as a name."

I think you need to rethink this way of defining methods; it has a lot of problems.

*** Dave Moon: Object-Oriented

Re http://www.paulgraham.com/reesoo.html

When I say object-oriented, I am often referring to yet a different property, which I think is a good property for most languages to have. It's not the same as your "5. Everything is an object," although it might sometimes be called "Everything is an object": Things are independent of their names. To put it another way, as much as possible of the machinery is exposed as an object that gets named by binding an identifier to it, and you operate on the object, not on the name.

Classes work this way in CLOS; they don't work this way in C++. Modules work this way in Dylan; modules (packages) don't work this way in Common Lisp, and modules (packages) almost but don't quite work this way in Java. Method invocation works this way in CLOS (generic functions are objects); it doesn't work this way in Smalltalk and Objective C (method selectors are like Lisp symbols). I think most things done in Scheme work this way. Arc probably follows this principle even though you claim it's not object-oriented.

One advantage is that if you don't like the name of something you can rename it by manipulating identifier bindings; there are no or few reserved words. (I decided that "reserved words" are words that we don't want to hear our children say.) Another advantage is you can make things anonymous. It also might become easier to expose the guts of the language for extension by users ("meta objects"). Another advantage is the language gets the conceptual simplicity that comes from making things orthogonal. I think that is the real advantage.

*** Dave Moon: Arc Syntax

Here is how I would do syntax for Arc or a language like it. I hope you find these ideas useful.

The lexical grammar is hard-wired, and is the same for the surface syntax and for the S-expression syntax. I have my ideas about what the lexical syntax should be, but I'll omit them here. The lexical grammar produces only two types of tokens: literals and names. A name can look like an identifier in other languages, but it can also look like an operator, a dot, a semicolon, a comma, or even a parenthesis. The meaning of a literal is built into the language. The meaning of a name is defined by a binding.
Bindings can be top-level or local. The language comes with a bunch of top-level bindings that define the standard language, but users can add more and can replace the standard ones to customize the language. Top-level bindings should be organized into modules, but that's a separate topic.

A top-level binding consists of one or more of the following: a value, which is either an ordinary runtime value or a macro; a type declaration if you allow those; a constant declaration if you allow those; a dynamic binding declaration (special in Common Lisp) if you allow those; a syntax declaration.

The phrase grammar for the surface syntax is built up by syntax declarations from the following hard-wired elements:

- token - a literal or a name
- literal - a literal
- string - a character string literal
- keyword - a Dylan-like symbol literal, identifier colon
- name - a token that is not a literal
- word - a name that does not have a syntax declaration
- expression - one of:
  + a literal
  + a word
  + a syntax form consisting of a name with a syntax declaration followed by the expected syntax
  + a compound expression constructed from subexpressions, prefix operators, and infix operators, with ambiguities resolved by operator precedence

I think the grammar as declared by syntax declarations has to be LR(1) for this surface syntax project not to be sunk by grammatical ambiguities. That remains to be considered in detail. Note that there is no built-in distinction between statements and expressions. Of course a user could add a syntax form whose body is composed of statements. I wouldn't do that myself.

The phrase grammar for S-expression syntax is hard-wired. I won't discuss it in this email except to use square brackets to indicate S-expressions. Having a syntax for S-expressions is unnecessary except for bootstrapping and debugging, unless some programmers prefer to bypass the surface syntax for unfathomable reasons.

There are two forms for syntax declaration: defsyntax and defoperator. These specify how particular constructs in the surface syntax are parsed and converted into S-expressions. They control the parser by establishing bindings of names to syntax declarations. In most cases the name will appear in the S-expression, so the name must also have a value binding to a function or a macro to give the S-expression a meaning.

defsyntax declares a syntax form. It is followed by the name that introduces the syntax form and a grammar description for the syntax form. The grammar description implies how to translate the parsed surface syntax into an S-expression.

defoperator declares an operator that can be used to make compound expressions, specifies whether it is prefix or infix or both, and for each case (prefix and infix) specifies its operator precedence and a grammar description for what appears to its right, defaulting to expression. defoperator is general enough to define things like C's semicolon and parentheses; see below.

defsyntax and defoperator are themselves syntax forms as well as macros, and the grammar description language is general enough to define them. See below.

A grammar description is a sequence of items selected from the seven hard-wired elements of the phrase grammar mentioned above, plus literal tokens represented as character strings, plus the following six grammar "constructors":

- repeat(grammar) - zero or more repetitions of the subgrammar
- optional(grammar) - zero or one repetition of the subgrammar
- or(grammar, grammar, ...)
  - one of the subgrammars
- noise(grammar) - the subgrammar is omitted from the S-expression. As a special case, noise by itself at the beginning of a grammar description means that the name of the syntax declaration is omitted from the S-expression
- recursive(name, grammar) - the subgrammar, but inside the subgrammar the specified name means the same subgrammar
- error(string) - report a syntax error

Note that the meaning of names in grammar descriptions is not defined by bindings; these are just treated as literal symbols. This is mostly to avoid name conflicts but also might be necessary for bootstrapping reasons. I could have used BNF-like notation with brackets, bars, and stars instead, but I personally find it more readable to use spelled-out names, and I don't think conciseness is a virtue here since relatively few grammar descriptions will be written.

Note that when defining a macro with surface syntax different from the syntax of a function call, you use both defsyntax and defmacro. One defines the translation from surface syntax to S-expressions, the other defines how to compile (or interpret) the S-expression. I suppose there could be a macro that combines the two.

Now for some examples to try to make this comprehensible. One way to define a cond-like if "statement" would be:

defsyntax if (expression noise("then") expression
              repeat(noise("elseif") expression noise("then") expression)
              optional(noise("else") expression))

if a then b elseif c then d else e
    parses into [if a b c d e]

If you prefer a more C-style if "statement", with parentheses instead of then as the noise necessary to avoid the syntactic ambiguity that occurs when two expressions are adjacent:

defsyntax if (noise("(") expression noise(")") expression
              optional(noise("else") expression))

if (a) b else c
    parses into [if a b c]

You could even tack noise("end") optional(noise) on the end of the first defsyntax if, if you like. Remember noise with no "argument" means the name of the syntax form; noise("if") is not the same if you copy the binding of if to another name, perhaps using modules.

The usual multiplication and subtraction operators could be defined this way:

defoperator * (infix: 10)
defoperator - (prefix: 100, infix: 10)

Semicolon could be defined this way, allowing "blocks" to be written as in C except using () instead of {}:

defoperator ; (infix: 0, repeat(expression noise(";")) optional(expression))

Parentheses could be defined this way (!):

defoperator ( (prefix: 1000, noise expression noise(")"),
               infix: 900, noise repeat(or(keyword expression, expression) ",")
                                 optional(or(keyword expression, expression))
                                 noise(")"))

defoperator ) (error("Unbalanced parentheses"))

This says that left parenthesis as a prefix operator is followed by an expression and a right parenthesis, and turns into just the expression, i.e. the usual use of parentheses for grouping. Left parenthesis as an infix operator is followed by a comma-separated argument list, with the option of Dylan-style keyword arguments, and turns into the left operand followed by the arguments, i.e. a function call S-expression. Its precedence of 900 should be higher than anything but dot.
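To make the operator-precedence machinery concrete, here is a rough Common Lisp sketch of precedence-climbing parsing driven by tables in the spirit of the defoperator declarations above. It is not Moon's design; *infix*, *prefix*, the precedence numbers, and the token-list input are all assumptions made for illustration.

(defparameter *infix*  '((+ . 10) (- . 10) (* . 20) (/ . 20)))
(defparameter *prefix* '((- . 100)))

(defvar *tokens*)                       ; remaining input, a list of tokens

(defun peek () (first *tokens*))
(defun next () (pop *tokens*))

(defun parse-expression (min-prec)
  "Precedence-climbing parse of *TOKENS* into an S-expression."
  (let ((left (parse-primary)))
    (loop for op = (peek)
          for prec = (cdr (assoc op *infix*))
          while (and prec (>= prec min-prec))
          do (next)
             (setf left (list op left (parse-expression (1+ prec)))))
    left))

(defun parse-primary ()
  (let ((tok (next)))
    (cond ((numberp tok) tok)
          ((assoc tok *prefix*)         ; prefix operator: parse its operand
           (list tok (parse-expression (cdr (assoc tok *prefix*)))))
          (t tok))))                    ; a plain word

(defun parse (tokens)
  (let ((*tokens* tokens))
    (parse-expression 0)))

;; (parse '(a + b * c))  =>  (+ A (* B C))
;; (parse '(- x + y))    =>  (+ (- X) Y)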
Of course the following really has to be written in S-expression syntax for bootstrapping reasons, but it makes a good example of complex syntax:

defsyntax defsyntax (name noise("(")
    recursive(grammar,
      repeat(or("token", "literal", "string", "keyword", "name", "word",
                "expression", string,
                "repeat" noise("(") grammar noise(")"),
                "optional" noise("(") grammar noise(")"),
                "or" noise("(") grammar repeat(noise(",") grammar) noise(")"),
                "noise" noise("(") grammar noise(")"),
                "recursive" noise("(") name noise(",") grammar noise(")"),
                "error" noise("(") string noise(")"))))
    noise(")"))

I think the implementation of all this should be very straightforward although I haven't tried it myself. It should give users of the language great flexibility to define their own sublanguages and also has the nice property of allowing almost all of the language to be explained in terms of itself. The reason expression is hard-wired is because there would be great apparent complexity and little gain from exposing to users the machinery needed to convert expression grammar into a nonambiguous grammar and to deal with operator precedence.

*** Dave Moon: Arc modules

Here is what I think about modules, in somewhat disorganized form.

David A. Moon wrote (on 1/21/2002 12:00 PM):
> The language comes with a bunch of top-level bindings that define the
> standard language, but users can add more and can replace the standard
> ones to customize the language. Top-level bindings should be organized
> into modules, but that's a separate topic.

A module is a map from symbols to top-level bindings. Package and namespace are other names that have been used for the same concept. All processing of code is done with respect to a current module. Bindings in that module control how code is parsed, macro-expanded, interpreted, compiled, and code-walked. Printing of code is also done with respect to a current module, if S-expressions are to be converted back into surface syntax and variable references inserted into macros are to be mapped into expressions that would access the same variable from the current module.

Common Lisp's approach of modularizing the map from names to symbol objects instead of the map from symbol objects to bindings is wrong. It comes from a 1974 historic context before lexical scoping. C++'s and Java's conflation of modules with classes is wrong. So is C's conflation of modules with files. I don't think Arc needs any private/protected/public/static/extern type of concept.

Does Arc need modules? I think so. Modules aren't just for armies of programming morons. Even solo programmers need a way to avoid name conflicts, and if Arc were to become an open-source type of success it would certainly need the ability to load libraries written by many different authors and thus would need a way to avoid name conflicts. A C-like convention with prefixes on symbols (all my symbols are named moon-xxx and so forth) would almost be workable, but modules are more flexible and produce much more readable code.

Besides its set of bindings, each module has an export list, which is a list of symbols, and an import list, which is a list of other modules. The visible bindings in a module include not only the ones in the module's own set of bindings, but also the bindings exported by each module from which this module imports. This feature is just an abbreviation, but a very, very convenient one.
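As a rough illustration of that lookup rule (a sketch, not Moon's actual proposal), a module can be modeled as a structure holding its own bindings plus export and import lists; the lookup searches the module itself first and then the exports of each imported module, in import-list order. All of the names below are invented for illustration.

(defstruct module
  name
  (bindings (make-hash-table :test #'eq))   ; symbol -> value
  exports                                    ; list of exported symbols
  imports)                                   ; list of imported modules

(defun module-lookup (module symbol)
  "Return SYMBOL's value in MODULE, or in the exports of its imports."
  (multiple-value-bind (value found)
      (gethash symbol (module-bindings module))
    (if found
        value
        (dolist (m (module-imports module))
          (when (member symbol (module-exports m))
            (multiple-value-bind (v f) (gethash symbol (module-bindings m))
              (when f (return v))))))))

;; (defvar *core* (make-module :name 'core :exports '(cons)))
;; (setf (gethash 'cons (module-bindings *core*)) #'cons)
;; (defvar *user* (make-module :name 'user :imports (list *core*)))
;; (module-lookup *user* 'cons)   =>  #<FUNCTION CONS>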
I don't care whether Dylan's features for selective import and for renaming during import are adopted, but something has to be done to address name conflicts when importing. It could be as simple as taking the one that appears earliest in the import list.

When a new module is created, it initially has no bindings of its own, but imports from a standard set of modules including the module that exports the features of the Arc language. The module can then be further modified by executing expressions that call module-manipulation functions.

The value of a binding can be a module. As with macros, this is only useful if the value is known at compile time. This allows one to form a heterarchy of modules and follow a path of bindings whose values are modules to traverse that heterarchy. It's probably useful to include a module named com in the default set of bindings visible to a new module; this would allow copying the Java hierarchical naming convention for packages as Arc's naming convention for modules.

When a binding is not exported from its module, the binding is still accessible from other modules, you just have to specify the containing module explicitly, perhaps starting from com. If your surface syntax has x.y mean (x 'y), and calling a module with a symbol as argument returns the value of that symbol's binding in that module, then you can use Java's notation and say that com.paulgraham.arc.core.cons(1,2) returns (1 . 2) even if cons is not accessible in the current module. This assumes the infix dot operator has higher precedence than the infix parentheses operator and is left-associative, so the above expression means (((((com 'paulgraham) 'arc) 'core) 'cons) 1 2).

> A top-level binding consists of one or more of the following: a value,
> which is either an ordinary runtime value or a macro; a type declaration
> if you allow those; a constant declaration if you allow those; a dynamic
> binding declaration (special in Common Lisp) if you allow those; a syntax
> declaration.

I suspect that a top-level binding should also include a setter value, which like the regular value can be either an ordinary runtime value or a macro. Then if the assignment operator's left-hand side refers to a binding, it sets the value of that binding, but if the left-hand side is a call expression whose callee refers to a binding, it calls the setter value of the callee with the arguments from the left-hand side and the value of the right-hand side. There is a special form named setter which accesses the setter value of a binding, and is itself settable. This seems better than doing string concatenation on identifiers.

Of course a binding whose value is a module will have a setter value, so a path like com.paulgraham.arc.core.cons can be used on the left-hand side, thus

com.paulgraham.arc.core.cons = fn(x,y) x + y

means

((setter (((com 'paulgraham) 'arc) 'core)) 'cons (fn (x y) (+ x y)))

Of course actually executing that expression would really mess things up!

*** Todd Gillespie: Multi-User

I was reading 'Being Popular', and the following line arrested my attention: "It could be an even bigger win to have core language support for server-based applications. For example, explicit support for programs with multiple users, or data ownership at the level of type tags."

The more I consider the implications of such an approach, the more I love this idea. Multi-user support as a basic part of a language, carried around in the type system.
Over the weekend I worked over some rough designs, and I envision a simple version of such like this: 1. All variables have a user id. 2. Most functions operate in a context that has been silently filtered of any values they cannot access. Ideally, in this context programs are written that cannot make incorrect permissions choices. In LISP style, these restrictions are written in LISP and are redefinable. 3. Access to shared variables is handled by the runtime via a MVCC locking mechanism. 4. Macros & fxns operating in non-user or root context can access all the data, to handle the inter-user cases of ownership transitions, modifications, application specific locking, complex sharing patterns, and so forth. This strongly enforces a layering in programs, where any dangerous inter-user code is all found in this one small place. 5. The above is made more complex in that some users administrate groups of other users, so their runtime should scope data to their group, and tag their personal ownership vs. group. Possibly this is simplified by unifying users & groups (wherein any user may be a tree of permissions) I did some research, looking for any languages that have similar features already. I have found none so far; any attention paid to the basic idea is devoted to instantiation ("isolate users with threads") or connection ("use an object broker"). So I don't have a code comparison; in my mind code looks similar, except the new language would lack mutexes and permissions checking in 99% of the code. As an aside, having MVCC (wherein writers never block readers and vice-versa) in the language runtime could be a major boon to both speed and simplicity: it is perhaps a primary reason that so many DB-backed multi-user web applications have been successfully built by people who have never heard the term 'canonical resource ordering'. There are times when you need a more complex or ruthless lock, but 99% of the time MVCC is what you want. Using MVCC in the background could be very resource intensive, but as it assures that there is no blocking on reads, the total use on the wall clock is much lower, possibly allowing more users per processor (a key metric for a language focused on server apps). Also, since lock contention on multiple intersecting users performing a variety of actions is still one of the most difficult debugging problems around, this could be an extremely powerful feature for simplicity. I discussed this idea with friends, most of whom didn't see the gain in making a language over making a library in Java or its ilk, except for my claims of terseness and the need to alter the runtime. I *was* made to see that there is nothing in my design that could not be generalized to other uses, but I think having an explicit pronoun for the user construct makes the whole design much more immediate and valuable. *** Will Duquette: Tcl I was browsing your site, and happened to read the article on the history of T. In that article, the author mentions an early Scheme program called "markup". markup took as input files containing plain text with Scheme code embedded in curly braces, executed the Scheme code, and (presumably) wrote the result to the output along with the plain text. I was intrigued by this, because I'm the author of a similar program based on Tcl rather than Scheme (see http://www.wjduquette.com/expand) if you're at all interested. And when I was experimenting with Common Lisp recently I considered writing such a thing in Common Lisp. 
Eventually I rejected the idea, and the reason was a nifty if obscure feature of Tcl that I didn't find in CL (or perhaps I just missed it). You might want to provide this in Arc.

Tcl has an interactive shell. If you type something that's obviously a partial command, it displays a secondary prompt and allows you to keep typing for as many lines as you need. When it detects that you've entered a full command, it executes it. In Lisp terms, this means that Tcl can take an arbitrary string and determine whether it is a complete S-expression or not, taking into account all the nasty cases, such as mismatched parentheses in embedded text strings. Now obviously a Lisp shell can do the same thing. Where Tcl possibly differs is that there is a standard Tcl command, "info complete". If variable "mightBeComplete" contains some text, I can write code like this:

if {[info complete $mightBeComplete]} {
    # Execute the code in mightBeComplete
}

Now, consider the markup program again. It's going to read text from the file, looking for the first (non-escaped?) left curly bracket. Then it's going to look for the matching right curly bracket, and execute the code in between. But unless it forbids right curly brackets to appear in the embedded Scheme code, it can't just scan for the next right curly bracket; it has to find the first right curly bracket that follows a complete S-expression (or perhaps a sequence of them). And that essentially implies parsing the embedded Scheme code. In Tcl, I don't need to do that. I search for the next close brace, and ask Tcl whether the intervening text is a complete command. If it is, I execute it; otherwise, I step onward to the next close brace and so forth. You're targeting Arc to the server side of things, and this kind of easy code embedding might be very useful.

I also note that Tcl has a "subst" command, which lets you explicitly ask for variable and command interpolation into a string. My "expand" tool could have been implemented so that it simply read in the entire file of text and let "subst" do its thing. In this case, I wouldn't scan the file at all. But "subst" does its job all at once; it turns out that at times it's useful to process the input sequentially. For example, consider this input text:

Some text [capitalize]with embedded markup[/capitalize].

Because "expand" processes the input sequentially, I can make the "capitalize" tag begin to capture the input text. Then, the "/capitalize" tag can retrieve the input received since "capitalize" and convert it all to upper case. Tcl's "subst" command doesn't allow for that kind of manipulation.

*** Rene De Visser: CL Lacks

1) Immutable vs mutable

In my lisp programs I find myself often deciding whether a particular data structure will, for the purposes of the program, be:

Immutable: In this case the data structure will be shared through all parts of the program, and if any part of the program wants a changed version of the data structure it must copy it, creating a modified new data structure (so as not to cause wrong side-effects in the rest of the program).

Mutable: The data structure represents a single thing that changes. It should not be copied, but changed directly.

In common lisp data structures are not tagged as mutable or immutable so the above cannot happen automatically. One must be extremely careful to track which data structure falls into which category. A single error can lead to subtle problems in another part of the program.
It also means that for most operators in Common Lisp there are two versions: one which copies and one which modifies destructively. Also, Immutable/Mutable is related to equality, and so without data being tagged as mutable or immutable it is not possible to have a correct equality check (resulting in multiple equality operations in CL which still have problems) and no deep copy.

2) Data structure representation.

Using STL in C++ one can change the container representation used, for example from unsorted list, to sorted list, to red/black tree, without changing any code in the rest of the program. In Lisp, perhaps you are using lists to represent sets; however some of these sets are small, and so an unsorted list is best, others need to be used in a sorted manner, and others are very large and need to be kept in a red/black tree, as sets with 1000000 items don't behave very well otherwise. In Lisp each set (list) needs to be handled with different client code. Again any mistake (for example resulting from a change in representation, or mixing up representations) results in subtle errors. Basically Lisp needs a 'SET' data type in order to separate the representation of a SET from the use of it. More broadly this could be interpreted to mean that Lisp needs more abstract functions (as in the STL) which don't care about the underlying data representation. This is to a degree supported by the sequence concept in CL, but in practice this can't even cope with the SET problem above.

*** J Storrs Hall: Semantic Incoherency

I consider one of the worst semantic incoherencies of Common Lisp the incompatibility between "functions" defined as macros and the "functional programming" features. Both are higher-order concepts but each seems to be designed without any consideration of the other.

*** Michael Vanier: LFSPs

My Ph.D. research involved writing very complex simulations of nervous systems (real "neural networks", not the sorta-AI kind). I used a simulation package that was written in C and had its own (horrible, ghastly) scripting language, all written in-house. I extended the hell out of it, but the experience was so painful I don't think I can ever work on a large C project again.

Since I want to continue working in this field, and since I love to hack, I want to re-write the simulator "the right way". However, I've been dithering on the choice of language. It's pretty clear that the core simulation objects have to be written in C++. C is too painful, and anything else is going to give an unacceptable hit in speed (simulation is one of those rare fields where it is impossible to have too much speed). But this is probably less than 50% of the code, maybe much less. The rest is infrastructure; scripting interface as well as a lot of support code. For scripting I want to use scheme or some lisp dialect. But the language choice for infrastructure is unclear. I could use C++, but that's unnecessarily painful especially since the infrastructure is not speed-critical. So I'd decided to use java; it's fast enough, there are a lot of libraries, and a lot of people know it so I could conceivably get others to work on it as well. After making this decision, my interest waned and I started another (unrelated) project. In the process of working on that other project (which involves scheme and Objective Caml (ocaml), an ML dialect), it occurred to me that ocaml would be a better choice than java for the intermediate layer.
It's faster, has better type-checking, is much more powerful, and can even be used as its own scripting language because of the type inference and interactive REPL. If necessary, I could write a simple lisp-like language on top of ocaml with little difficulty. The C interface to ocaml is also quite mature, and there is a good-sized standard library (though nothing like the enormous java libraries). Also, it's much lighter weight than java. But here is the most important reason: it's a hell of a lot more fun to program in than java. Writing java code, though not particularly painful in the sense that C is painful (core dumps etc.), puts me to sleep. Writing ocaml (which is a "language designed for smart people" if there ever was one) is exciting. My motivation to tackle the project has tripled overnight.

The interesting question is: why is ocaml so much more fun than java? Why are "languages designed for smart people" (LFSPs) so much more fun to program in than "languages designed for the masses" (LFMs)? One possibility is that LFSPs tend to be more unusual, and hence are more novel. I'll admit that this is part of the answer, but it misses the main point. *Any* new language is going to be novel, but the novelty usually wears off quickly. The real point is that LFSPs have much greater support for abstraction, and in particular for defining your *own* abstractions, than LFMs. This is not accidental; LFMs *deliberately* restrict the abstractive power of the language, because of the feeling that users "can't handle" that much power. This means that there is a glass ceiling of abstraction; your designs can only get this abstract and no more. This is reassuring to Joe Average, because he knows that he isn't going to see any code he can't understand. It is reassuring to Joe Boss, because he knows that he can always fire you and hire another programmer to maintain and extend your code. But it is incredibly frustrating to Joe Wizard Hacker, because he knows that his design can be made N times more general and more abstract but the language forces him to do the Same Old Thing again and again. This grinds you down after a while; if I had a nickel for every time I've written "for (i = 0; i < N; i++)" in C I'd be a millionaire. I've known several programmers who after only a few years of hardcore hacking get burned out to the point where they say they never want to code again. This is really tragic, and I think part of it is that they're using LFMs when they should be using LFSPs. I note with some interest that the original Smalltalk designers are *still* writing code in Smalltalk; the Squeak project was founded by them and is mainly maintained by them. I'm not sure if Smalltalk was intended to be an LFSP (actually, I'm pretty sure it wasn't), but it does have good abstractive power and does scare off newcomers, so I guess it is one :-)

So the bottom line is: computer languages designed for smart people don't just liberate the language designer, they liberate the programmer as well.

*** Peter Norvig: Syntax

Some comments:

* car and cdr are warts; why not hd and tl?

* If Unix won, then the symbol red should be red, not RED

* For the Scheme I wrote for Junglee's next-generation wrapper language, I allowed three abbreviations: (1) If a non-alphanumeric symbol appeared in the first or second element of a list to be evaled, then the list is in infix form. So (x = 3 + 4) is (= x (+ 3 4)), and (- x * y) is (* (- x) y). And (2), if a symbol is delimited by a "(", then it moves inside the list.
So f(a b) is (f a b), while f (a b) is two s-exps. And (3), commas are removed from the list after infix parsing is done, but serve as barriers to infix beforehand, so f(a, b) is (f a b), while in f(-a, b + c), each of -a and b + c gets infix-parsed separately, and then they get put together as (f (- a) (+ b c)). This seemed to satisfy the infix advocates (and annoy some of the Scheme purists). You might consider something like this.

* Non-chars in strings don't make sense to me; I always think of strings as arrays of type char; the tricky issues are whether strings are mutable, and if so, extensible, and if they're mutable, which operations are efficient (i.e. should removing the first char be O(1) or O(n)). The other issue is which library functions should only work on strings, and which work on arrays or sequences. Finally, you need an extensible protocol for sequences (like Dylan, and in-the-works in Python).

* If you have (do (= x val) ...) then you don't need let and with.

* My inclination would be to have ds, opt and get be non-alphabetics, e.g. (def f (a (: (b c)) (? d 4) (& e f)) ...) This way it's easier to see what's a param and what isn't.

*** Peter Norvig: Another Onion

I think conflating pairs and lists is an onion that accounts for several pages worth of equivocation in ANSI CL. Both pairs (lightweight structures that can hold two components) and lists (hold arbitrary number of components with O(n) access time) are useful ideas, but Lisp uses a cons cell for both for two reasons: (1) at the time, data abstraction was not in vogue, and (2) if you've only got 2 bits of tags, you want to be parsimonious with your basic types. But conflating the two means that every function that takes a list must specify what it does when passed an improper list. If it were my language, (cons x y) would raise an exception if y is not a list, and I'd consider whether to separately have (pair x y), which would be accessed with first and second, not first and rest, or whether to just use (array x y). If you do have pair as a subclass of array, you might also want to consider triplet.

As for (+ "foo" "bar"), I'm somewhat against it, because it means (a) + is no longer commutative, and (b), no longer associative: compare (+ (+ 2 3) "four") and (+ 2 (+ 3 "four")). I've actually seen errors of the second kind in Java programs: System.out.println("answer = " + x + y) instead of ("answer =" + (x + y)). On the other hand, I admit, after using Python, that "foo" + "bar" feels natural, to the extent that I wasted 20 minutes debugging a Perl program where I used + instead of . for string concatenation. Whatever you decide about the operator (I'm thinking I'd rather write "foo" & "bar"), I recommend NOT having an automatic coercion from object to string; I think Python is right in raising an error for this. In Perl it's even worse, because my strings were coerced to numbers (0).
This leads naturally to the idea of having a "low-level lisp" in which such programs can be written and then used safely by the standard lisp. I believe this is how prescheme/scheme48 worked (works?), although this was done just to bootstrap the system AFAICT. There is also an analogy to C# and Modula-3 (garbage collected languages that allow for "unsafe modules"). I would really like it if Arc offered some facility like this, but I don't understand the issues well enough to know if it's feasible or not.

*** Matthias Felleisen: OO

Paul, you don't have to go to bunches of bytes and do things atop. In PLT Scheme v200 everything is a struct, yet these values won't respond with #t to struct? because we don't want anyone to think of these things as structs. Also, we are currently designing extensions so that programmers can define functions and ask the environment to treat them as primitives, especially with respect to soft typing. So you will say something like

(: f (number -> number))
(define (f x) ... (+ x ...) ...)

and f is treated as a new "primitive". If the soft typer sees that you misapply it somewhere, it will be hi-lited in red. It's up to you whether to think of this as a typer and refrain from running this code, or to decide that the type system is too conservative and run the code anyway, knowing that we enforce the types anyway. The key is not just to write the types down. You need to analyze and enforce them. And in "Lisp" (or Perl or Python or Curl) the question is how to scale this to higher-order types.

Even more general: we have known for 50 years how to build a language atop bunches of bytes. PL design will make progress if we do the exact same thing but without ever thinking what is inside the machine. Computations are about value manipulations. At the moment, we represent values in bits and bytes, but why think about those? It just pollutes program design.

*** Mike Thomas: Squeak

- I think that an interesting way to test language ideas is the development model used for Squeak (Smalltalk dialect) - anyone, even raw beginners, can add anything they like and stuff which is popular survives simply because it is used. (Survival of the fittest.)

- I like the work being done on region based memory management, e.g. MLKit at the University of Copenhagen, and a C dialect from Lucent Technologies (name escapes me).

- Coherent FFI from the beginning and an IDL compiler targeted at ARC to allow ease of interfacing to OS and third party libraries. Glasgow Haskell compiler has successfully used this strategy.

- incorporate type inferencing and optional type hints.

I make these comments from the point of view of a person who leans to the Haskell/SML side of programming with fond memories of Scheme (apart from the fact that my bread and butter is obtained from the C family), with a background in mathematical, game, GIS and large oil industry technical software development.

*** Jecel Assumpcao: Objects

There is a lot of good advice on your website but you don't seem to be following it regarding objects: don't put in stuff for others to use or to be politically correct. If you don't like them yourself, just leave them out. My own languages have always been OO and I think objects really help in any non-trivial program, but each person has his own preferences. There are several different styles of object systems and I think that the kind in E or Beta might be a better fit for Arc and might be more useful to you. Let's do the classical Cartesian point example.
My first try will avoid adding syntax at the cost of making most expressions look infix or even postfix. A list (c x y z), where c is a closure, has the same effect as evaluating (x y z) in c's context. And make-closure returns the value of the current execution context.

(define point (x y)
  ((define + (p) (point (x + (p x)) (y + (p y))))
   (define pr () (("point " pr) (x pr) (" " pr) (y pr)))
   (make-closure) ))

(((point 3 4) + (point 5 6)) pr)

The reason why the "+" ended up as an infix operator was that it had to be preceded by a context (object) so that the normal "+" wouldn't be invoked. Note that strings and numbers also were treated as closures in the code above. This is not very Lisp-like, so I will add the following syntax element: a ":" will indicate that the element following it is the context in which the surrounding expression should be evaluated:

(define point (x y)
  ((define + (self p) (point (+ :x (:p x)) (+ :y (:p y))))
   (define pr (self) ((pr :"point ") (pr :x) (pr :" ") (pr :y)))
   (make-closure) ))

(pr :(+ :(point 3 4) (point 5 6)))

Well, it is still ugly but at least it is starting to look like Lisp. The idea is to use a few general constructs instead of a lot of specialized ones. For example, using functions as "classes" has the nice side effect of us not having to do anything extra to get a single inheritance-like functionality - lexical scoping will do just fine. With the above notation, you don't have to dispatch on the first argument (your cons complaint). The "+" would have to be rewritten in a more symmetrical form before we could have (pr :(+ (point 3 4) :(point 5 6))) however:

(define + (a b) (point (+ :(:a x) (:b x)) (+ :(:a y) (:b y))))

All those (:b x) are very ugly and should be simplified along the lines of your ("hello" 2). This isn't really a solution, but just a general direction that I feel could be followed. If objects are convenient enough, I don't think you will need the database types.

While "fn" is cleaner than "lambda", I would rather use "?" since it isn't harder to type and stands out better in the code. It is also mnemonic for "mystery function" ;-)

*** John McClave: End Test

I am interested in the 'given' overhead of the pervasive end loop test and its possible elimination. First a hardware analogy. Is it possible to use a hardware settable interrupt to replace the end of loop test and just use unconditional jumps to keep the loop runnable? In Lisp-like coding where end of list tests are part of most recursive function definitions, is it possible to preclude this overhead by some form of latent daemon tagged onto the list structure and some form of unconditional jump? Could a closure of some sort be triggered to end the function process?

*** Avi Bryant: Continuations

So here's a cool code example for your consideration, very slightly abbreviated:

-----------------------------------
Object subclass: #Continuation
    instanceVariableNames: 'stack '
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Kernel-Methods'!

!Continuation methodsFor: 'invocation'!
value: v
    thisContext sender: stack copyStack.
    ^v !!

!Continuation class methodsFor: 'instance creation'!
fromContext: aStack
    ^super new stack: aStack copyStack !!

!BlockContext methodsFor: 'continuations'!
callCC
    ^self value: (Continuation fromContext: thisContext sender)! !
-------------------------------------

Of course, it's easy to provide an equivalent example in Scheme - call/cc already exists.
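A minimal Scheme sketch of the same pattern, for comparison: a continuation captured with call/cc, stashed, and re-entered later, which is the part that matters for the back-button case (saved-k and ask are made-up names):

; call/cc reifies "the rest of the computation" as a first-class value.
(define saved-k #f)

(define (ask)
  (call-with-current-continuation
    (lambda (k)
      (set! saved-k k)      ; stash the rest of the page for later
      'first-visit)))

(display (ask)) (newline)   ; prints: first-visit
(if saved-k
    (let ((k saved-k))
      (set! saved-k #f)     ; clear it so we only re-enter once
      (k 'back-button)))    ; jumps back in: prints back-button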
But it's completely impossible, as far as I know, to provide an equivalent example in CL, even a really ugly one. I bring it up partly because I know you realize the usefulness of continuations for web development, and yet I haven't seen any mention (did I miss it?) of continuations in Arc. And I know from personal experience that a closure-based continuation-passing style, as it sounds like you adopted for ViaWeb, is useful but far more frustrating and less transparent than true continuations. So, at the very least, let me request that full continuations be included in Arc.

But I also bring it up because, to my surprise, I have found myself using Smalltalk rather than Lisp for my web development - although I deplore the lack of macros, in cases like continuations I have found smalltalk *more* powerful than lisp. Say what you like about "everything is an object", the fact that stack frames (and closures) are reified makes certain games much easier. Here's another code snippet:

snapshot := self copy.
context := thisContext sender.
[context isNil] whileFalse:
    [((context isKindOf: MethodContext) and: [context receiver = self])
        ifTrue: [context receiver: snapshot].
     context := context sender].

Since using a functional style is all but impossible in smalltalk, this offers an interesting alternative - switching midmethod to a copy of the current object. Invoked right after a call/cc, it ensures that each invocation of that continuation occurs in a separate context of sorts, which is very useful in a browser-back-button world (yes, it's also dangerous and limited, but it's very pragmatic if you know what you're doing - the actual version I use also has various sanity checks that only make sense in the context of my framework).

So will we ever see this kind of reflective power in a lisp? Or am I going to have to write a prefix-syntax parser for smalltalk to have my cake and eat it too?

*** Seth Gordon: Syntax

I have a theory that one reason people get tetchy about all of Lisp's parentheses is that people's brains have trouble parsing deeply recursive sentences, so when they see a piece of code with deeply nested structures, it looks very intimidating. Therefore, I suggest creating an infix operator ~, defined as:

(f ~ g x) ==> (f (g x))
(f ~ g x ~ h y) ==> (f (g x (h y)))

To pick a piece of Scheme code that comes to hand: this operator would let one represent

(define serve ; use default values from configuration.ss by default
  (opt-lambda ([port port]
               [virtual-hosts virtual-hosts]
               [max-waiting max-waiting])
    (let ([custodian (make-custodian)])
      (parameterize ([current-custodian custodian])
        (let ([listener (tcp-listen port max-waiting)])
          ; If tcp-listen fails, the exception will be raised in the caller's thread.
          (thread
           (lambda ()
             (server-loop custodian listener
                          (make-config virtual-hosts (make-hash-table)
                                       (make-hash-table) (make-hash-table)))))))
      (lambda () (custodian-shutdown-all custodian)))))

as

(define serve ; use default values from configuration.ss by default
  ~ opt-lambda ((port port) (virtual-hosts virtual-hosts) (max-waiting max-waiting))
    ~ let ((custodian (make-custodian)))
      (parameterize ([current-custodian custodian])
        ~ let ([listener (tcp-listen port max-waiting)])
          ; If tcp-listen fails, the exception will be raised in the caller's thread.
          ~ thread ~ lambda ()
              ~ server-loop custodian listener
                  ~ make-config virtual-hosts (make-hash-table)
                                (make-hash-table) (make-hash-table))
      (lambda () ~ custodian-shutdown-all custodian))

*** Trevor Blackwell: Modularity

It's worth distinguishing a third category of programming environments: one where a solitary hacker uses many open-source modules written by various folks. I consider this a very good environment, unlike pack programming. But it has some requirements in common with it. Various languages succeed or fail dramatically in making this work smoothly.

In C/C++, I find I can almost never use other people's stuff conveniently. Most have some annoying requirements for memory management, or use different string types (char *, STL string, GNU String, custom string class), or array/hash types, or streams (FILE *, iostream) or have nasty portability typedefs (int32, Float). CL has fewer such troubles since it has standard memory management & string/hash/array types, but there are often nasty namespace collisions or load order dependencies, especially with macros. Most chunks of CL code I've seen (which are mostly PG's) won't work without special libraries, which have short names and are likely to conflict with other macro libraries or even different versions of the same macro libraries.

Perl packages work pretty well, because everyone agrees on how basic types like string, stream, array and hash should work, and the normal use of packages avoids any namespace collisions. I don't think I ever had an open source Perl module break something. I guess Java also prevents conflicts, but you have to give up an awful lot to get it. The huge assortment of open source Perl packages testifies to the ease of writing them. I've taken a few packages that I wrote for my own purpose and found it very easy to make them self-contained and contribute them. Usually, people only write C++ libraries for very large projects, and it requires a different programming style from what you'd use for your own code.

Anyway, I suggest that Arc's modularity features should be designed to support use of open-source modules in single-hacker projects, not to support pack programming. This suggests, in addition to what CL already has:

- that modularity be a convention (CL, Perl), not something enforced by the compiler (Java). Sadly, I find the CL module system way too cumbersome to actually use. It has to be convenient enough to use in ordinary programming, not a special thing you use when you're writing a module for external use.

- there be a sufficient basic library that everyone won't have to write their own basic library. I'm talking about the sort of functions from On Lisp like last1, in, single, append1, conc1, mklist, flatten. If everyone has their own, then you have to package it up with any code you publish (making it awkward to publish) and it'll be hard to read other people's code with different definitions of basic functions.

*** Mike Vanier: MATLAB

Point 1: Matrix manipulation is a canonical example of something that deserves its own dialect. DO NOT build this into the core.

Point 2: Having matrix functions like eig() for eigenvalues is trivial and can be done in any language if the library exists. So that's not a big deal.

Point 3: Syntactically, you need this:

a) A nice literal data entry notation for arrays. Matlab's notation is NOT nice because it doesn't scale for 3D+ arrays. What would be better is e.g.

my_array = [[[1 2] [3 4]] [[5 6] [7 8]]]

This scales nicely. Hey, S-expressions again!
:-)

b) Functions that operate by default on arrays as opposed to on scalars (or really on both). So, sin(my_array) will compute the sin of all the elements.

c) Various functions for manipulating and transforming array shape. In 2D arrays, you have resize() and transpose (written ' in matlab), but these can be generalized.

d) [Hard!] Higher-order functions to specify the "rank" of a function call. In simple terms, this refers to which axes of the array are used in the computation. For instance, the sum() function applied to a 2D array might mean "return the column sums" or "return the row sums".

The only language that gets this right that I know of is J (www.jsoftware.com). J is beautifully designed, but I don't like the line-noise syntax. If Arc had a dialect that could handle this material as elegantly, I would use it. Also, J is closed-source (but free), yadda yadda (the Rebol/Curl syndrome). Look at the J documentation; it's all available for download. Quite mind-stretching. Aside from scalar data types (characters and several types of numbers) the only data types are multidimensional arrays which hold a uniform data type and "boxed arrays" (which are arrays that can hold arbitrary data types, including other arrays). Very neat stuff.

*** Neil Conway: Ruby

1) Blocks: 90% of the benefit of functional programming, no weird academic ML-type stuff ;-)

ary = [1,2,3]
ary.each do |element|
  puts "Item: #{element}"
end

2) Everything is an object:

5.times do
  puts "hello, world"
end

3) Dynamic class definitions and singleton method definitions:

class Foo
  def bar
    "bar"
  end
end

f = Foo.new
f.bar => "bar"

# add more definitions to Foo
class Foo
  def baz
    "baz"
  end
end

f.baz => "baz"

def f.another_method
  "Singleton"
end

f.another_method => "Singleton"
Foo.new.another_method => error, no such method

(you can also add definitions to built-in classes like String and Array like this -- making it a convenient place to store utility functions)

4) Built-in design patterns: (or rather, a mixin implementation that is flexible enough to support these kinds of patterns very naturally)

require 'singleton'

class FooS
  include Singleton #mixin the Singleton module
  def initialize
    puts "FooS init"
  end
  def bar
    "bar"
  end
end

f = FooS.new => error, 'new' is private
f1 = FooS.instance # "FooS init" printed to screen
f2 = FooS.instance
f1.inspect => #
f2.inspect => #

If you plan to draw any language features from Python for the design of Arc, I'd also suggest looking carefully at Ruby: IMHO, it offers all of Python's features with a more elegant design, as well as a number of very interesting features from languages like Smalltalk.
http://www.research.microsoft.com/~toddpro/

- Syntactically, Haskell bested ML's "fn" for lambda:

Haskell: \x -> x + 1
ML: fn x => x + 1
Scheme: (lambda (x) (+ x 1))

- Haskell treats strings as lists of characters. You might find their experience interesting.

- pattern matching syntax for lists is short & concise.

- List comprehensions (syntax within []'s below) can do powerful things with brief syntax:

quicksort [] = []
quicksort (x:xs) = quicksort [y | y <- xs, y < x] ++ [x] ++ quicksort [y | y <- xs, y >= x]

An equivalent Scheme program:

(define (quicksort lst)
  (if (null? lst)
      lst
      (let ((x (car lst)) (xs (cdr lst)))
        (append (quicksort (filter (lambda (y) (< y x)) xs))
                (list x)
                (quicksort (filter (lambda (y) (>= y x)) xs))))))

*** Kragen Sitaker: Comments on Arc

fn
----
Since setf plays such a prominent role in Arc, there should be a version of fn that allows you to define a 'setter' function as well as a getter.

Compound = Functions on Indices
-------------------------------
I wish every language used the same form for function call, hash fetches, and array indexing, the way arc does. It's a brilliant idea. In order to be able to write new compound data types in Arc, there needs to be a way to create values that can be called as functions and also implement other operations. This can be done in a straightforward way with the overloading system described for objects in "Arc at 3 Weeks": overload apply. (Apply should probably be called "call", if we're rebuilding Lisp from the ground up.)

Pronouns
--------
'it' being bound by iteration and conditionals is also a brilliant idea. Python is working toward getting something similar, but nobody's done it yet. FORTH has sort of had it with DO and I and J for a while, but not in conditionals. (One might argue that FORTH is almost all pronouns; very little data is named.) And, of course, Perl is full of pronouns: <>, $_, the current filehandle, etc. Shouldn't there be a form of 'each' and 'to' that bind a pronoun, too, as in Perl and FORTH?

(each '(a b c) (pr item))
(to 10 (= (ary i) (* i i)))

From the examples, it looks like 'keep' gives a way to write a list comprehension, and 'sum' gives a way to write foldl (+) (APL's +/) over a list comprehension; except, of course, you can write "list comprehensions" that iterate over non-list-like things. Some other languages (Python, for example) are handling this by making almost anything you can iterate over look like a list; list comprehensions are nice, compact, and readable.

DBs
----
The proposed semantics for fetching from a DB are that nonexistent keys return the current value of the global variable *fail*, which defaults to nil. This is wrong, for two reasons:

- If mydb is a DB that doesn't have any false values in it, and nil is false, this code will almost always work, but it's broken: (if (mydb foo) (x it) (y)) because it assumes that *fail* is set to something in particular. A correct variant of this code (assuming = can set a global variable) is (do (= *fail* nil) (if (mydb foo) (x it) (y))) To write correct code, you'll almost always have to set *fail* like this before a test for existence, because you must ensure it's set to something you know can't legitimately be in the DB. Making correct code be much more verbose than incorrect code that almost always works, but breaks when a completely unrelated part of the program changes, is probably a bad idea. (I know Graham is designing a language for good programmers, who will presumably take the extra time to write the correct form, but I think that's going a little too far.)
- there's a much weaker reason, which is that *fail* might actually be the value of an item in a DB --- for example, a DB that contained the program's global variables. I like Python's approach to this problem. Fetching a nonexistent value in the normal way will raise an exception, which is usually the right thing. So you could write the above code as (try (x (mydb foo)) 'KeyError (y)) But there is also an operation 'get', which returns a specified default value if the requested key doesn't exist, and is *much* briefer than the corresponding operation with *fail*; compare: (do (= *fail* "") (mydb foo)) (get mydb foo "") or even (mydb foo "") and an operation 'has_key', which allows the operation to be written as: (if (has_key mydb foo) (x (mydb foo)) (y)) although this is nearly as verbose as the correct version with *fail*. Another approach: The two-argument variant could actually be a macro (since macros are first-class objects, your DB can be a macro) which evaluates and returns its second argument only if the lookup fails. This almost works to shorten the example above to (x (mydb foo (y))), but that calls x even if the lookup fails. A third approach: has_key is an instance of a wider concept of 'exists'; in Perl, 'exists' applies only to hash lookups, but it applies to a wide variety of operations, essentially everything it makes sense to call setf on. (exists var) (exists (sqrt -1)) (exists (mydb 'bob)) (exists (car list-or-nil)) Python has a fourth generic operation as well: deletion. You might want to check to see whether (foo 1) exists, get its value, set its value, or delete it, regardless of whether foo is a vector, an associative array, or even possibly something else altogether. A fourth approach: in Icon (and, sort of, in Prolog, and, in another way, sort of, in Lisp), expressions can return any number of values, including none; no value returned is boolean false. If a dictionary lookup returns no values if it fails, then (if (mydb foo) (x it) (y)) can be correct, because 'no value' is distinct from any particular value, even nil, just as the empty string is distinct from any particular character, even NUL. Lisp's assoc family takes another approach: return something containing the correct value, not the value itself. I don't like this; although pattern-matching could make it less painful to use, I don't think it could provide any better syntax than the 'try' approach above. The default of being indexed by 'eq' is not very good for string processing, unless you use string->symbol a lot --- which might be a good idea for string processing, but will certainly make your code more verbose. If you want the option of using arbitrary equality operators for DBs, and you want some DBs to be hash tables, there needs to be a way of associating a hash with an equality operator. assignment ---------- The current proposal for the language has (= var form) either declare a new lexical variable valid until the end of the current block, or change the value of an existing variable. I really prefer languages that have different forms for declaring a local variable (with an initial value) and changing an existing variable; this is one of Python's big deficiencies. 
I care for the following three reasons: - I can see immediately whether a piece of code uses assignment; I avoid assignment in some parts of my code for testability - the difference between modifying variables declared in a larger scope and shadowing them with local variables with the same names is obvious, and there's an obvious way to do both - misspelled variable names get detected instead of silently producing incorrect results (or occasional run-time errors) I do like the way C++ lets you declare a variable in the scope "the rest of the block", and I think the Scheme/Lisp/C way leads to unreadably-indented code or variables being declared too far from their use. Syntax ------ There is another argument for having syntax, other than that it makes programs shorter: it can make programs more readable (in the sense that source code generally is more readable than bytecode, or mathematical formulas are generally more readable than English sentences describing the same formula). In other words, it makes the language more accessible to anyone, good programmers included. I don't think of decompiling bytecode as "dumbing down". Unicode offers a plethora of punctuation; perhaps tasteful and restrained use of some of this punctuation could make the syntax more readable than any possible ASCII syntax. For example, I'd be inclined to write infix bitwise Boolean operators with real Boolean operators instead of ASCII stand-ins, and I'd be inclined to declare variables with some conspicuous piece of punctuation. (In ASCII, I'd probably prefer := if it didn't already have so many closely-related meanings in other languages (Pascal, GNU make).) Implicit progn -------------- Graham is right on the mark here; eliminating implicit progn means we can eliminate many of the uglier and more verbose pieces of Lisp syntax. (Implicit progn seems to have crept into the 'to' syntax, though; one of Graham's examples is (to x 5 (sum x) (pr x)).) Pattern matching ---------------- Destructuring arguments are very good. I'd like to have full ML-style pattern matching, though; it's possible to write that as a macro, but I'd like it to be part of the language. Recursion on strings -------------------- Recursing on strings can be very efficient if the strings are allowed to share structure the way lists do and your compiler is smart enough to reuse storage that can be statically proven garbage. (Or even if it can allocate it on the stack.) Canonical symbol case --------------------- The "Arc at 3 Weeks" paper's examples show symbols being canonicalized into upper case. I don't like this; upper-case is hard to read. I assume it's an artifact of the first Arc implementation running in an existing Common Lisp system. Unicode makes canonical case impractical, anyway, because case-mapping is very complicated. Classes ------- The proposed overloading semantics won't give an extensible numeric system. My tastes in object systems appear to be markedly different from Graham's; mine are largely founded in experience with Python. So the object system I want is probably not something he wants: Classes, like compounds, should be called to instantiate objects; this makes the calling code shorter, and also allows you to turn classes into factory functions and vice versa without changing the calling code. There doesn't seem to be a provision for a constructor or methods (other than overloads of existing functions). There probably should be. 
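A rough Scheme sketch of the "classes are called to instantiate, accessors are ordinary functions" idea above; make-class and field are invented names, and instances here are just association lists, not anything the Arc proposal specifies:

; Calling the class makes an instance; accessors are ordinary functions,
; so (x p1) works the same whether point is a class or a factory procedure.
(define (make-class . field-names)
  (lambda field-values
    (map cons field-names field-values)))

(define (field name)
  (lambda (obj) (cdr (assq name obj))))

(define point (make-class 'x 'y))
(define x (field 'x))
(define y (field 'y))

(define p1 (point 3 4))
(x p1)   ; => 3
(y p1)   ; => 4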
Class instances should not be callable unless they overload apply, for the following reasons:

- there's no need to have one syntax for functions the objects overload and another syntax for getting attributes of the objects. After I say (= pt (class nil 'x 0 'y 0)) (= p1 (pt)), I should be able to ask for (x pt). This does have the disadvantage that "class" must somehow bind x and y in my calling environment if they aren't already bound; I think that's easily accomplished with a macro. (This also removes the necessity for quoting x and y.) (Hmm, what if x is bound to nil? Should we then make nil callable? Perhaps we should just raise an error if x is bound in the local environment, and require that it be quoted or otherwise marked when it's defined as a method in this manner. Quoted attributes would define functions in your local environment which sent doesNotUnderstand to the object passed as their argument, and then overload those functions for this particular object type.)

- the current example (p1 'x) prevents p1 from being able to act like some other kind of compound. Most dynamic object systems let you get an object of a class you've never heard of and invoke the correct methods on it. Doing this with the object system described above requires that you have some variable bound to the method "x" defined in "pt" above. I thought this was a problem at first, but I don't think it is now; presumably if your code was written without knowledge of that "x", it won't mention "x" (at least, not meaning that method), and if it was written with knowledge of that "x", it's presumably because there's an interface somewhere that defines "x" to mean something. That interface can be defined in some module somewhere that both "pt" and my calling code import, in a form like this: (defmethods 'x 'y) This has the advantage, not shared by most dynamic object systems, that we can have many methods named "x" without conflict, and we can be sure that the "x" we're calling is intended to implement the interface we expect it to, not some other interface that has a method called "x".

Like others, I haven't yet found a compelling use for multiple inheritance, and there are compelling reasons against it.

No doc strings
--------------
No elisp/Python-style documentation strings are mentioned. I want these.

*** Miles Egan: Generators

The idea is to provide an abstract & extensible iteration interface for collections. This makes it easy to write code that works with any collection and to define new collection types that work with existing code. A brief example:

class Series
  def each
    for i in 1..3
      yield i
    end
  end
end

class Words
  def each
    for i in ["one", "two", "three"]
      yield i
    end
  end
end

Series.new.each { |i| puts i * 2 }
Words.new.each { |i| puts i * 2 }

would output:

2
4
6
oneone
twotwo
threethree

I can define a new class that provides an "each" method that returns lines from a file, rows from a database, or widgets from a gui and yield values to the same code block. Alternatively, I can supply arbitrary code blocks to my object's "each" function without coupling the code in the code blocks with the collection's storage implementation. The crucial feature is the "yield" keyword. I don't know of an equally abstract, general, and programmable counterpart in CL.

*** Ed Cashin: Things I Want

Things that I was very pleased to hear:

* arc is UNIX aware!
-- languages that acknowledge UNIX are much easier to use for day to day things as well as for big projects.
-- they're also easier to learn, since you know a lot of the concepts already (e.g. a socket)

* arc knows that I don't want busy work :)
-- short but meaningful names
-- arc has fewer parentheses when there's no ambiguity

* syntax as abbreviation sounds good. I especially like the way syntax can visually mimic data structures (see below).

* "it" pronoun is a nice feature

* recognition that "do" in common lisp is hard to read. I like the loop constructs in arc so far, except I prefer "times" to "to". I like ruby's syntax: 3.times { print "Ho" }

* "keep" and "sum" are very natural and handy looking.

Things that worried me:

* strings as lists may be nifty, but I want string operations to be lightning fast at runtime. The other could be a "string-list" or something.

* when I read that programmers might use indentation to show structure, it worried me that arc might be whitespace dependent like python. I know a lot of good programmers who don't want anything to do with python because it is whitespace dependent.

* is the backquote syntax an onion or the best compromise? It's a little weird, but I can't think of anything better.

* the "with"/"let" thing ... simpler would be making let do what you have with doing and then forget the one-variable form.

* will objects be able to have access controls (e.g. "private")? if not, is that a problem?

Things that would be exceptionally nice (literally):

* forth-like assembler

* easy to use C libraries

* easy to make standalone binaries (shared-lib dependency ok, but not external program dependency) (go figure.)

* non-buggy, non-experimental native thread support

(def do-some-work ()
  ; ...
  )

(push my-thread-list (thread-new do-some-work))

Maybe something like ...

(thread-lock 'balance)
(= balance (* interest balance))
(thread-unlock 'balance)

Ruby has nice threads support, but they are really juggled by a single-threaded ruby process. Perl's thread support is still experimental, according to the docs in the latest release.

* internationalization -- this seems like a hairy thing, especially with LOCALE environment variables running everywhere, but it might be something to brag about -- maybe unicode support or whatever is in vogue now

Things that would be nice (like ruby or perl):

* case-sensitive symbols
I'm not sure whether this is a win or not, but I was kind of turned off when I first learned that lisp internally thinks of symbols as uppercase ... it reminded me of crippled MS-DOS based filesystems.

* syntax for associative data structures makes for visibly obvious initialization. e.g. ruby:

keytab["UP"] = "^"

keytab = {
  "LEFT" => "foo",
  "RIGHT" => "bar",
  "X" => {
    "A" => "x a",
    "B" => "x b",
    "C" => "x c",
  }
}

* excellent regular expression support
-- up-to-date fancy stuff like perl's /(\w+?)fancy!/
-- must run fast

* mixins from ruby are pretty neat
-- in C++ and Lisp, there's multiple inheritance; in Java and Objective-C there's single inheritance plus a way to say "and I can also do this and this" in ruby there's a way to "mix in" code too http://www.rubycentral.com/book/tut_modules.html

* no dependency on source file names or locations
in ruby and lisp namespace/module issues are not related to source code files, which is very convenient, especially compared to the way Java does public classes and package hierarchies based on filename and file location.
* adding to the definition of a class after defining it

* a way to specify a multi-line section of text in the source
both perl and ruby have here documents, where interpolation can be turned on and off. It's very readable and very convenient.

* a searchable, browsable web interface to a plethora of arc code comparable to CPAN

* a way to do multiple assignment naturally and conveniently
e.g., ruby: a, b = 14.divmod 4

Some ideas:

* Objective-C's method syntax
I don't have time to elaborate, but it strikes me as the most readable, self-documenting syntax with the exception of keyword parameters like in lisp.

* you could sell the way lisp (and so I'm guessing arc) encompasses the OO model of classical OO languages while providing more. you discuss that in _ANSI Common Lisp_, but I need to reread that chapter, to be honest

*** Mikkel Fahnøe Jørgensen: Strings

You mention string handling will be important - I agree. Please do not make the mistake of ignoring multiple character sets. You mention strings may contain non-printable characters and even character generating objects. I think this is a fine concept. And it may go well with multiple character sets.

I had a discussion with Matz who designed the Ruby language. Currently Ruby only supports 8-bit characters with a certain level of UTF-8 support. I argued that he should just make it Unicode. But it turns out that a number of popular Asian scripts (language encodings) do not fit well with Unicode. Also, Unicode is not really fixed width, hence the 2-byte representation would sooner or later prove to be incomplete. Matz is currently working on internationalization but I believe his solution would basically be to have a string as a character string with additional type info that stores the encoding.

This simple idea is very powerful once you think about it. Conversion of data from old applications to new Unicode character sets becomes much easier because the string carried around will tell you what it is, allowing you to do the proper conversion. String concatenation will require a conversion of at least one of the strings if the types don't match. This is like ordinary type coercion. Handling strings as typed instances of an underlying binary stream is a good solution. Different representations can be viewed like different numerical representations (float, double, integer). In this sense the concept of a string as a list works well with multibyte representations where you cannot easily index a particular character without traversing the entire string. Originally I thought multibyte character sets were outdated, but now I'm convinced that we just have to be better at dealing with them.

*** Miles Egan: Bytecode

I agree that the performance limitations of bytecode are a big drawback, although I don't think bytecode-compiled languages need necessarily be as slow as Python or Java. Ocaml, for example, compiles to bytecode or native code and Ocaml bytecode is often quite efficient. The big advantage of bytecode systems, of course, is that you can transparently mix code written in different languages. This could save you the trouble of writing all those boring but essential support libraries - HTML and XML parsers, database access libraries, image processing tools, network protocol libraries etc. I think that it's really all the high-quality CPAN modules out there that keep Perl going at this point. I'm not any kind of expert on .NET.
As I understand it, it comprises a common bytecode runtime environment (the CLR), the C# programming language, support libraries for system, network, and web services, and identification and authentication protocols. Microsoft intends it to be the infrastructure for their next-generation web services and the whole environment has been designed with that in mind. Several prominent free software groups, most notably the Ximian folks, have announced plans to build open-source implementations of .NET. You can read about Ximian's plans here: http://www.go-mono.com/.

I mention this to you because it sounds to me like they're trying to solve some of the same problems. They plan to implement their own runtime and C# compiler, but I think a better language that could transparently access all the library code the environment provides would be very appealing to a lot of open-source web hackers and a lot easier to implement than a completely from-scratch language and web development environment. There are drawbacks too, of course, but I think there are some intriguing possibilities.

*** Scott Draves: Zope

zope is the closest thing to a lisp machine that i've ever seen, but it runs as a web server, and instead of lisp you have python, which is lisp without parens or macros. there is now a layer on top of zope that adds an asp toolkit oriented towards corporate communications and publishing. the company does consulting using the tools they support as open source.

*** Matthew O'Connor: type-narrowing

I have a BIRD and a PLANE that each inherit from FLYING_THING. Now some part of my code knows about FLYING_THINGs and in there I have a list of them. I pass this list of FLYING_THINGs to another part of my code that knows about PLANEs. In this section of code I have to be able to take a FLYING_THING and cast (type narrow) it to PLANE if it is indeed a PLANE. This is what OCaml doesn't seem to allow me to do, while all of the other OO languages I know allow it.

*** Simon Britnell: running remote programs from command line

The idea is primarily that it would be easy to publish (small) applications by placing a directory tree on an http server. Presumably there will be some facility to load files, libraries, etc. in arc. I think that when a program is run from a url, the directory base in the URL should effectively become the cwd for that session. So that when loaded from http://foo.com/bar.arc:

load "baz.arc" ; reads and executes http://foo.com/baz.arc

and (excuse my poor lisp/scheme vocabulary here, I actually use scheme very little):

open-input-port "blargh.txt" ; opens http://foo.com/blargh.txt for input

It would also be nice to be able to specify files to be loaded over http in the arc source:

load "http://libvendor.com/graphics/charts.arc"

In this example files referred to at the top level of charts.arc will also be loaded from http://libvendor.com/graphics. Functions defined in charts.arc should not necessarily load from http://libvendor.com/graphics, but rather from the cwd at the point they were executed from. In short: load should set the cwd within the scope of the file it's executing to the directory that file was loaded from. It would be nice to be able to get the directory the actual source was loaded from inside a function too, I just don't think it should be the cwd at that point.

I think at this point a security-modified version of arc becomes important so that you can run untrusted code with confidence. OTOH, perhaps such confidence is better acquired via chroot or some such.
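A very rough Scheme sketch of the "load resolves names against its own base URL" behaviour described above; fetch-url and eval-forms are assumed helper procedures, not real library calls, and the URL handling is deliberately crude:

(define current-base (make-parameter "http://foo.com/"))

; Everything up to and including the last "/" in a URL.
(define (url-directory url)
  (let loop ((i (- (string-length url) 1)))
    (cond ((< i 0) "")
          ((char=? (string-ref url i) #\/) (substring url 0 (+ i 1)))
          (else (loop (- i 1))))))

; Crude test: anything containing "://" is treated as already absolute.
(define (absolute-url? name)
  (let loop ((i 0))
    (cond ((> (+ i 3) (string-length name)) #f)
          ((string=? "://" (substring name i (+ i 3))) #t)
          (else (loop (+ i 1))))))

(define (load-url name)
  (let ((url (if (absolute-url? name)
                 name
                 (string-append (current-base) name))))
    ; While the file's forms run, relative loads resolve against its directory.
    (parameterize ((current-base (url-directory url)))
      (eval-forms (fetch-url url)))))   ; fetch-url and eval-forms are assumed

; (load-url "baz.arc") would fetch http://foo.com/baz.arc, and any relative
; (load-url ...) it performs would resolve against http://foo.com/ as well.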
I was also thinking about caching frequently used http-fetched files locally, but that's what a caching proxy is for.

The comment about having a low-level (i.e. framebuffer) graphics library for easy integration into a web browser was because:

a) A low-level graphics interface is important anyway for various applications.

b) If you can already load files over http easily and your display interface is a framebuffer, integration into a browser is relatively low-hanging fruit (replace the URL-fetching primitive and the framebuffer primitive and you're done), and may appeal to those who (like me) disdain Java applets as too big and clunky.

> If so there would have to be something on the client too. What?

I was thinking of having some kind of library macro that would allow you to write the code to be executed on the server in the same source as the code to be executed on the client. The code would be initiated on the server, and a client macro would surround the bits to transmit to the client. I need to think more about this area, however, as it would be nice for the client to be able to return values. The original thought was that the code:

(some-server-stuff foo)
(more-server-stuff bar)
(client (some-fn param (server (baz param)) blah))
(continue-with-server-stuff)

would execute the stuff enclosed in (client ...) on the client simply by emitting it as the text of an http response, and that the (server ...) stuff would be evaluated into a value by the server before being sent to the client to play with. I don't think this is quite adequate, however, as (client ...) doesn't return a value, and http would be an obnoxious protocol to make it work over. Actually, an http POST containing a session id for state management and some Arc to execute would almost do the job for client returns. You'd definitely want some kind of security restrictions on what client-returned code could do.

;; Server code for input from a client-side form
(update-person (client (input-form "my.form")))

Likewise, it would be nice for client code to be able to send ad-hoc requests to the server:

(server (get-tax-code (client zip-code)))

I need to think about this some more. I've almost got it figured out.

*** Peter Armstrong: J

There's a quick way for Lispers to get the feel of the J language. John Howland has built a course around a set of programs parallel-coded in J and Scheme! Try http://www.cs.trinity.edu/About/The_Courses/cs301/ and scroll to the bottom for source listings. I should also mention Howland's language Aprol, an attempt to integrate the functionalities of J and Scheme. While the project itself may be inactive, you might be interested in his perspective on such attempts (or even in the logistics of simply calling J from Arc -- which I would find cool!).

*** Henrik Ekelund: strings with linebreaks

Here is a simple but very good idea taken from Python. In Python you can express string literals in more than one way. Short strings can be enclosed by either single quotes OR double quotes. Strings can span more than one row if they are enclosed by three quotes (single or double). It gives very readable code:

sqlstring = """
    SELECT * FROM TABLE
    WHERE field='VALUE'
    AND otherfield='Something'
    """

xmlString = ""

cplusplusString = '#DEFINE SOMECONSTANT "Enclosed in double quotes" '

*** Sudhir Shenoy: ideas from Perl

1) Please don't use parentheses for s-expressions ('[' and ']' or even '{' and '}' would be preferable).
The biggest complaint I have about parentheses is that they make infix mathematics (which you are planning to introduce in Arc) really hard to read. E.g.

(def interest (x y) (expt (-r * (t2 - t1) / (days-in-year))))

reads really badly when compared to

[def interest (x y) [expt (-r * (t2 - t1) / [days-in-year])]].

The overloading of meanings of parentheses (grouping + s-expression), which isn't a problem in Lisp, may cause a loss of readability in Arc.

2) An idea you may want to look at (from Perl) is how the AUTOLOAD function works. When a function is not found in a module, the AUTOLOAD function is called with the name of the function that was attempted to be executed. You can write code that will (depending on the context) return the name of the correct function. We use this routinely to provide accessors (set/get methods) on Perl class (hash) objects. Another useful application is when you have a container class/struct that holds specialised objects and you don't want to duplicate the interface of the contained objects in the container. Writing a suitable AUTOLOAD function would simply redirect a call to the non-existent method in the container to a real function in the appropriate child object...

3) Again from Perl, interpolation of objects inside strings is incredibly useful. Perl makes a distinction between single-quoted strings (no interpolation) and double-quoted strings (which have interpolation). Although the parser and compiler have to look into strings to see whether interpolation is to be performed, the end result is code that is easy to read and understand.

*** Ben Yogman: calling

Calling pattern:

- There should be requirable keyword arguments. Keywords just specify roles, right? So not having keyword arguments implicitly assumes that roles are naturally evident from function name and argument position. This goes awry in practice even with just two arguments, and the more arguments you have the worse the combinatorics. Requiring arguments in a certain order is generally an icky thing people do to make function calls cheaper, but the information needed to swizzle argument lists, and the lists themselves, will in practice be in place at compile time, since generally you funcall with one- or two-argument functions that don't need keyword help to disambiguate. By specifying roles, the function calls become not only easier to write, but also better documented once written.

- Optional boolean keyword arguments should go directly into function calls and result in a binding of true in the function body. Again, this makes the calls a quicker and more natural read... just say :verbose, :pretty, :coerce, whatever, or don't say it. There are many other examples; the only issue is what the special syntax for this should be in the function definition.

- The simple OO system's defined methods ought to have an (obj.method ...) syntax, rather than having the object as the first argument. Like the previous point, this is to enhance readability... putting the subject first in the expression, followed by the verb, mirrors a large class of human languages.

Function meaning:

- There should be operator overloading. It's damn useful. I'm not saying string concatenation should necessarily be + (though maybe... having a small, minimal working vocabulary really helps the ramping time), but let's say I'm doing something commutative... adding two vectors, or matrices, or polynomials.... you get the idea.
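A small sketch of what this overloading might look like in Common Lisp terms: a generic function add stands in for an overloadable +, dispatching on the argument types (to literally spell it + you would shadow cl:+ in your own package):

;; Generic addition that dispatches on the types of its arguments.
(defgeneric add (x y))

(defmethod add ((x number) (y number))
  (+ x y))

(defmethod add ((x list) (y list))
  ;; element-wise addition of two vectors represented as lists
  (mapcar #'add x y))

;; (add 1 2)           => 3
;; (add '(1 2) '(3 4)) => (4 6)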
- There should be MI, and it should look a lot like trying to use possibly conflicting packages together. You have the same problems, and you have to solve them once, so why not use the solution twice? I myself have not used MI much, but I have a little, and it really is the right tool sometimes.

Library stuff:

- There should be a predefined reader macro for converting infix, reverse Polish, whatever... math, logical, and assignment ops to prefix, complete with a transparent operator precedence table. Why scare people away when you don't have to? Why make people retranslate formulas? Also, maybe people are used to HP calculators... dunno.

- Supply even the really primitive primitives, like bit shifting. Even if you don't have an implementation now that takes direct advantage, you could at least infer a more specific type than with /2 /4 /4*N, and you leave the door open for blazing speeds.

- Go through CLR and the design patterns and have something ready-made to describe what people know from canon. Read, find the corresponding algorithm and/or design pattern... fill in the gaps. Macros can kick some serious butt here, especially with the design patterns. I'd be very happy to contribute.

Random:

- There ought to be a reasonable attempt to deduce types, and part of the information that comes back from the REP should be best-guess types. That info is directly useful, though at some point, possibly early, it could go a little stale through layers of indirection. Though in principle it's algorithm first, continuing to declare types selectively could continue to help, and it certainly won't hurt speed. Also, though the profiler is the ultimate arbiter, you often know at least some of the hot spots in your code before coding them, so you can spot possible bottlenecks.

*** Ravi Mohan: refactoring, vm

1) Built-in refactoring, introspection, and programming IDE support. I don't know how refactoring would work in functional languages, but the refactoring browser works great in Smalltalk. I guess what I am trying to say is that the language should be highly introspective (which Lisp is) AND the language release should come with a few tools (like the refactoring browser). These don't need to be visual candy; the interface can be simple. But I suggest that the tools would improve from version to version and have a positive feedback effect on the development of libraries etc. as they improve. This is one of the great features of Smalltalk: one has a lot of browsers AND one can define one's own. Of course the problem with Smalltalk is that it is too tightly bound to its IDE. With Arc this should be an optional add-on, so that one could use Arc without the RB, IDE, etc., and could for example use Emacs. I would once again emphasize the need for the refactoring browser, because this is one tool that would "build in" the language idioms and best practices, with the RB itself being extensible...

2) A "standard" VM with excellent documentation. Ideally the VM should be written in Arc, with perhaps the lowest layer in C for performance.
This idea comes from Squeak as well, where the "one language from top to bottom" approach makes it easy to improve the VM etc. without switching to C. Also, there needs to be a way of adding new C primitives to the VM (again with nested name scopes - see my one-line suggestion in my last letter), so one could have a scope like vm.primitives.posix or vm.primitives.networking.

3) XML throughout for documentation, data flows, etc. (I guess this could be optional), but this is just a suggestion.

4) Start a mailing group (perhaps on Yahoo) to collect these suggestions? This would enable folks to respond to other people's suggestions and quickly build a community of interested folks.

*** Steven H. Rogers: Overloading +

I'm glad that you decided not to use + to catenate sequences. A better use of overloading + would be to apply + to individual elements a la APL, e.g. (+ (2 3) (4 5)) evaluates to (6 8). If you're implementing strings as lists of characters, this could make for efficient string manipulation that goes beyond regular expressions.
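In existing Common Lisp terms, the element-wise behaviour Rogers describes is essentially mapcar over the argument lists, which suggests how cheap such an overloading would be to provide:

(mapcar #'+ '(2 3) '(4 5))                    ; => (6 8)

;; and if strings are lists of characters, the same machinery maps
;; per-character operations over them:
(mapcar #'char-upcase (coerce "arc" 'list))   ; => (#\A #\R #\C)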