Elmo - Project Discussion

Elmo
"An application for creating interactive and evolvable Web sites"

Project Discussion
2 May, 1996

Department of Computer Science
University of Colorado, Boulder

Berkebile     Masson     O'Hara     Stoller

Sponsored by:

Gerry Stahl

Center for LifeLong Learning and Design
Department of Computer Science
and Institute of Cognitive Science
University of Colorado, Boulder
Boulder, Colorado 80301 USA

Introduction

The Elmo project has been successful in many regards. It improves on the familiar static, presentation-oriented web model by creating web pages dynamically based on information authored by users of the system. It provides a means to maintain personalized views into the underlying information. It allows users to build and share links to pages generated by the system and to use these links to glue together related information. It provides a mechanism for users to share their knowledge through an annotation system. All of these features would improve or augment current practices in use by LAN managers on the Boulder campus of the University of Colorado. However, because the focus of the project was entirely on a conceptual level, the usability of the final product suffered. Although the concepts of interest were illustrated in the project, their usefulness will never have an opportunity to be tested in practice, since the system is much more difficult to work with than the tools that are currently in use.

This document will therefore discuss a number of ways in which future work could improve on the Elmo system to produce a product that would interest LAN managers in practice (and not merely in theory, which is useful only if it can be tested in practice). Two broad areas of concern limit the usefulness of the Elmo system. We will consider both the conceptual limitations of the application model used in the project and the improvements that could be made to the implementation of the system.

Conceptual Limitations

One of the goals of the project was to produce an interactive system that uses standard HTML as a presentation layer. There are a number of advantages to this approach. Information presented in standard HTML can be viewed on any platform, and browsers that understand standard HTML are nearly ubiquitous in today's computing environments. A system deployed in standard HTML is therefore immediately available to most desktops in an organization. However, there are severe limitations to using standard HTML as an application interface to an interactive system.

Primary among these limitations is that the form/submit model of interaction does not provide any feedback until the contents of an entire form have been submitted and processed. LAN managers are accustomed to rapid interaction with information. To understand how an information space should interact with a LAN manager, consider the following scenario.

Larry, a local LAN manager, sits down to use a friendly, civilized descendant of Elmo. He first wants to add a new host to the system, which will be a server for a lab he is installing. He enters the host table and brings up the add host form. He first enters "foobar" for the name of the server in his new lab. A warning dialog appears telling Larry that the host name he has chosen is already in use in the organization's network. It offers to open the host search form to allow him to search for a name that isn't in use. Larry elects to enter the host search engine. He starts to type a host name in the search form. As he types each letter, the application displays the hosts that match the substring he has entered so far. After Larry types the first few letters of the name he had wanted to use, he can see that it is, in fact, already taken. He clicks on "foobar" to see who owns it. After looking at the details of the host record for "foobar", he returns to the host search screen and tries the name "blargh" instead. In the interactive find window, the system says "No matches were found, would you like to use this name for your new host?" Larry answers "yes" by clicking on the appropriate button and is returned to the host add form, which has the host name field filled in with the name from the search form.

He now enters an IP address. The system immediately brings up another warning message stating that the address is not a valid IP address. It offers to display the glossary entry for "IP address." Larry declines to read the glossary entry and fixes his typographical error. However, the system now warns Larry that "the address you entered is already in use." It asks if he would like to have the system choose an IP address for him or if he would like to view all the unused addresses on the same subnet. After choosing to view the unused addresses, Larry simply clicks on the address he wants to use, which is then added to his form. He fills out the rest of the form without further incident and gets no error messages after submitting the entry to the database.

In a form/submit model of navigation, the same process that Larry went through would take a good deal more time. Feedback on information being entered into the form would be delayed until after the entire form is submitted. So, after filling out and submitting a form, a user will often be faced with a list of errors that must then be resolved. In Larry's case, he was able to handle errors in the context in which they occurred, but in the form/submit model presented by standard HTML, a user must leave the form context in order to gather the information necessary to fix errors. Using a combination of Java and JavaScript, it should be possible to retain the web-based model while making the experience more interactive and the feedback more immediate.
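The per-field checks in Larry's scenario can be sketched in a few lines. The following is a minimal illustration only, written in Python rather than the Java or JavaScript that a browser client would need. The host table, the function names and the addresses are all hypothetical and do not come from the Elmo system; the point is that each check can run as soon as a single field changes, without waiting for the whole form.

```python
import ipaddress

# Hypothetical in-memory host table (name -> IP address). A real system
# would query the shared database instead.
HOSTS = {
    "foobar": "192.168.1.10",
    "foobaz": "192.168.1.11",
}


def search_hosts(typed_so_far, hosts=HOSTS):
    """Return the sorted host names that contain the substring typed so far."""
    return sorted(name for name in hosts if typed_so_far in name)


def check_host_name(name, hosts=HOSTS):
    """Return a warning string, or None if the name is not yet in use."""
    if name in hosts:
        return f'The host name "{name}" is already in use.'
    return None


def check_ip_address(text, hosts=HOSTS):
    """Return a warning string, or None if the address is valid and unused."""
    try:
        address = ipaddress.ip_address(text)
    except ValueError:
        return f'"{text}" is not a valid IP address.'
    if str(address) in hosts.values():
        return "The address you entered is already in use."
    return None


def unused_addresses(subnet, hosts=HOSTS):
    """List the usable addresses on a subnet that no host record claims."""
    used = set(hosts.values())
    network = ipaddress.ip_network(subnet)
    return [str(a) for a in network.hosts() if str(a) not in used]
```

In an interactive client, search_hosts would be called on every keystroke in the search form, and the two check functions would be called when the user leaves the corresponding field, so that each warning appears in the context where the error was made.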

The second principal limitation of the application model used in the project is that the system has no ability to interact with the real world. There is no mechanism that would allow Elmo to harvest information about hosts from the LAN itself. It was beyond the scope of the project to incorporate such functionality, but because of the tools used to develop the system (primarily Tango), it is also unlikely that the current system could be extended to include it. It would be advantageous to rebuild the system using a general purpose programming language (such as C, C++ or Perl) in such a way that interface libraries could be added to gather information about the local environment from standard tools (such as SNMP or site specific mechanisms) and/or communication channels (SMTP mail or newsgroups). In the next section, we will discuss other reasons why rebuilding the system would generate a more usable and useful system.

Implementation Considerations

The choice of the development environment, the server platform and the server software had an impact not only on the level of complexity that the system could attain but also on the usability of the final product in the LAN managers' domain. In this section, we will discuss how future work on systems like Elmo can avoid some of the pitfalls we encountered. We will first discuss the database system, followed by the server platform, and finally the development tools used in the project.

Database System

In the type of shared information space implemented in Elmo, all of the critical information in the system resides in a database. The information that LAN managers use on a daily basis must be available at all times and it must be reliable. So, the underpinnings of any system that would be useful to LAN managers must be stable and robust. The database used in the Elmo project was Everyware Inc's Butler SQL. It was chosen for the project because it has the most complete implementation of SQL available on the Macintosh. Such a feature was attractive because (in theory) it avoids arbitrary limitations to the complexity of queries on the database. A database application that requires queries to utilize a proprietary scripting mechanism generally does so at the expense of generality and power. However, as it turns out, Everyware Inc's product lacks the reliability of other databases such as Oracle 7 or Sybase's SQL Server 11. In fact, some operations on the database required that the database application be restarted. For example, simple operations such as dropping and creating tables in the database would cause the database system to crash if it was not restarted immediately after the operations. This is obviously an unacceptable requirement for any server system that is to be used by more than one or two people.

A database that is adequate for a system like Elmo must also meet the needs of developers. For instance, choosing a database that supports SQL makes the query code written for the database portable. A complete implementation of SQL also gives the developer much of the power and flexibility of a general purpose programming language as well as the ability to construct queries with arbitrary complexity. However, as it turns out, Butler SQL has a crippled implementation of SQL. In fact, to our great dismay, we found that some of the language constructs which appear in the documentation are actually not implemented. For instance, views are discussed in the Butler SQL manuals, but in reality they are not valid operations in Butler. This limitation placed restrictions on the complexity of queries. As an example, when we wanted to perform queries based on a user's preferences, it was necessary to join several tables and then perform a query on the result of that join. SQL views are designed to perform just this sort of task. Without views, however, we were required to manually iterate over the join. This caused simple code to become complex so quickly that we were often compelled to limit the capabilities of the application in order to make the code manageable.
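To make the role of views concrete, here is a minimal sketch of the kind of preference-based query described above. It uses Python's built-in sqlite3 module purely as a stand-in for a database that does implement views. The table names, column names and sample rows are hypothetical and are not taken from the Elmo schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: users express a preference for a subnet.
    CREATE TABLE hosts (id INTEGER PRIMARY KEY, name TEXT, subnet TEXT);
    CREATE TABLE users (id INTEGER PRIMARY KEY, login TEXT);
    CREATE TABLE preferences (user_id INTEGER, subnet TEXT);

    INSERT INTO hosts VALUES (1, 'foobar', 'lab-a'), (2, 'blargh', 'lab-b');
    INSERT INTO users VALUES (1, 'larry');
    INSERT INTO preferences VALUES (1, 'lab-a');

    -- The view names the three-table join once, so that later queries
    -- can treat its result as if it were an ordinary table.
    CREATE VIEW user_hosts AS
        SELECT u.login AS login, h.name AS host_name, h.subnet AS subnet
        FROM users u
        JOIN preferences p ON p.user_id = u.id
        JOIN hosts h ON h.subnet = p.subnet;
""")


def hosts_for_user(login):
    """Query the result of the join through the view, with no manual iteration."""
    rows = conn.execute(
        "SELECT host_name FROM user_hosts WHERE login = ? ORDER BY host_name",
        (login,),
    )
    return [row[0] for row in rows]
```

With the view in place, every query that depends on a user's preferences reduces to a single SELECT against user_hosts, which is the simplification that was unavailable to us.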

Aside from a complete implementation of SQL, there are other requirements a database system should meet in order to host a system like Elmo. For instance, it should not place a fixed limit on the size of SQL programs. If this requirement seems arbitrary, consider that Elmo's seed program is 72k and that we discovered that Butler SQL would not process programs larger than 32k. Again, this limitation of the tool forced us to redesign the way the applications behaved. Time was thus squandered that could have been spent adding useful capabilities to the system (were it not for the fact that the tools couldn't support any more complexity). Another feature that is supported in most databases is referential integrity, which allows the database to keep track of references between tables. This lets the database behave reasonably when records that are referenced from other tables are deleted. The Elmo system was designed to support the linking of information, and therefore the contents of its database are frequently cross-referenced. One consequence of this, when coupled with the lack of referential integrity, is that there are few delete operations in the system, since delete operations in Butler SQL can leave references throughout the database that point to non-existent records. Unless we crawl through the database looking for these references, they can only be discovered after they are followed by Tango, at which point they cannot be handled gracefully. So, the lack of the ability to delete hosts and glossary entries in the Elmo system is a direct result of the tools used to build the system. Again, we see that in order to build a system that is usable, the capabilities of the tools used must be a primary consideration.
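The behavior that referential integrity provides can be illustrated with a small sketch. It again uses Python's sqlite3 module as a stand-in, with hypothetical hosts and annotations tables that are assumptions for this example only. The database itself refuses to delete a host that is still referenced, so no dangling reference is ever created and the application can handle the refusal at the point of deletion.

```python
import sqlite3


def make_db():
    """Create a small in-memory database with an enforced foreign key."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE hosts (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE annotations (
            id INTEGER PRIMARY KEY,
            host_id INTEGER NOT NULL REFERENCES hosts(id) ON DELETE RESTRICT,
            body TEXT
        );
        INSERT INTO hosts VALUES (1, 'foobar');
        INSERT INTO annotations VALUES (1, 1, 'Server for the new lab');
    """)
    return conn


def delete_host(conn, host_id):
    """Try to delete a host; return False if other records still refer to it."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("DELETE FROM hosts WHERE id = ?", (host_id,))
        return True
    except sqlite3.IntegrityError:
        return False
```

Here the delete is rejected while the annotation exists and succeeds once the annotation has been removed. Declaring the reference with ON DELETE CASCADE instead would remove the dependent annotations automatically; which policy is appropriate is a design decision for the application.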

As a final point regarding the database system, a database with a robust SQL debugger is needed. Lacking a real debugger is not likely to limit the functionality of the final product, but having one will certainly preserve the sanity of the person or persons who will work on the Elmo project. As an example, consider this situation which we encountered: We had written a large SQL routine which had been tested and was working correctly. After making a number of changes that did not affect the functional structure of the code, we submitted the routine to the database and an error was returned. The text of the error message read "6". As it turns out, this number was neither a line number nor a documented error number. In fact, the meaning of the cryptic "6" never became clear, as the error was simply a parse error in the code. This incident was not an isolated or even uncommon event. In fact, since line numbers were never provided in error messages in Butler SQL, hours were wasted tracing through code to find and correct simple errors. This of course did not prevent the creation of the code we wanted to produce, but it certainly did consume time that could have otherwise been used to improve or extend code that already existed.

Server Platform

As we discussed earlier, LAN managers will generally not tolerate tools that are slow and unresponsive. They will blithely choose a flat text file as an information repository over a database if it is faster for them to access the information using grep than it is to query the database. They will also quickly abandon any tool that proves to be unreliable. For these reasons, if the Elmo system is to be made interesting and useful to LAN managers, it must utilize a server platform that is responsive, stable and scalable. In the current implementation of Elmo we have observed long delays of up to 30 seconds after submitting forms (with only one client connected to the server). Most pages take at least a few seconds to be displayed. In practice, in the LAN managers' domain, this type of performance is not acceptable. Moreover, although we did not test the system with more than two clients connected, we can speculate from the dismal performance observed with one client connected that the system would not scale well. Since the Elmo system is designed to be used by communities of practice, the ability of the system to scale well to a large number of users while maintaining a useful level of responsiveness under load is essential.

Concerns about responsiveness aside, the server platform used in Elmo was prone to crashes and other instabilities. A system that is designed to be relied upon to resolve network problems must have an extremely low failure rate if it is to be used in practice. As an example, a user working on the Elmo server system can halt all filesystem and network activity simply by holding down the mouse button (when dragging a file, for instance). If too many operations are pending during the time the mouse button is being held down, the server is likely to crash. So, if one of the goals in future work on Elmo is to observe the system in use by LAN managers, it would be advisable to consider a platform for the system that is designed to be used as an application server. There are many modern operating systems that incorporate fast networking and filesystems, exhibit a high degree of stability and are designed to be scalable. As discussed earlier, one benefit of using the web as an interface is that the client side of the Elmo system is platform independent. This means that changing the system the server runs on would not impact the ease-of-use of the client's system. In fact, the only impact that choosing an appropriate server platform would have is that the clients connected to it would observe increases in performance and would be able to rely on the system in a way that is not possible now.

Development Environment

One decision that was made early in the project was to use a rapid development tool (Tango) to construct the system. This choice appeared to have a number of advantages. Primary among these was that the concepts that were of interest in the project could be deployed and observed quickly without the need to grapple with building a foundation for the system from scratch. To a certain extent this assumption was true. Tango allows a single web page, which accesses data in a database, to be constructed quickly. However, it proved to be entirely inadequate for a system such as Elmo which contains nearly one hundred query documents that are heavily interdependent. As the U&U group is aware, tools should ideally strike a balance between usability and usefulness. Tango is very usable in a limited context, but its usefulness suffers greatly as a result.

So, with Tango, as with Butler SQL, we experienced a situation where the tool we were using became the factor which limited the features, complexity and ultimately the usefulness of the final product. There are a number of simple programming concepts that a development system must support in order to build a system that demonstrates anything more than a trivial level of complexity. A few relevant examples of such concepts are the ability to reuse code and support for modularity and detail hiding. Tango has no facility for reusing code. This would not be an issue in systems that consist of just a few query documents, but the Elmo system has sections of code that are duplicated across seventy query documents. This makes trivial improvements or changes take several hours to implement since any change must be propagated manually over each query document.

Suppose, for example, that we recognize the need to display certain details about hosts in a number of different places in the system. It would be ideal if we could create a chunk of code which could retrieve and display, in HTML format, any field from a host record. For instance, we could simply create a routine called GetHostByName that would take as parameters the hostname to find and the field in its record to display. We could then ask that routine to return a snippet of HTML with the IP address of all hosts with the name "foo". Then, if we found some other location where we needed to display a host's location, we could just utilize the generic display functionality we had already created. Unfortunately, in Tango there is no concept of modularity. Every time you need to perform a task like looking up the IP address of the host named "foo", you need to know every detail about the process. Just as a program like Mathematica could not have been written in assembly language, a system as interesting as Elmo has the potential to be cannot be written in Tango.
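In a general purpose language, a routine of this kind is only a few lines long. The sketch below shows one possible shape for it in Python. The record layout, the sample data and the exact HTML produced are assumptions made for illustration; only the idea of a routine that takes a hostname and a field name comes from the discussion above.

```python
from html import escape

# Hypothetical host records; a real system would read them from the database.
HOST_RECORDS = [
    {"name": "foo", "ip_address": "10.0.0.5", "location": "Room 101"},
    {"name": "bar", "ip_address": "10.0.0.6", "location": "Room 102"},
]


def get_host_by_name(hostname, field, records=HOST_RECORDS):
    """Return an HTML snippet listing `field` for every host named `hostname`."""
    values = [record[field] for record in records if record["name"] == hostname]
    if not values:
        return "<p>No matching hosts.</p>"
    # Escape each value so that stored text cannot break the generated page.
    items = "".join(f"<li>{escape(str(value))}</li>" for value in values)
    return f"<ul>{items}</ul>"
```

The same routine serves both cases mentioned above: get_host_by_name("foo", "ip_address") produces the address list, and get_host_by_name("foo", "location") produces the location, with the lookup and formatting details hidden from the caller.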

Conclusion

The Elmo project implemented a number of interesting ideas and we believe that the system has the potential to be a useful tool for LAN managers. Because of this potential it could also provide valuable insight into dynamic shared information spaces as they are used in active communities of practice. However, in order to improve on the current system, it will be necessary to carefully evaluate the requirements necessary to satisfy the needs of LAN managers. It will also be necessary to evaluate the tools and systems needed to construct a meaningful product. We were able to implement the features currently demonstrated in Elmo despite the tools that were used rather than because of them. Instead of being able to focus on the interesting questions raised by the project, we were constantly engaged in a struggle to get our tools to produce acceptable results. The Elmo system really represents the limit of complexity that can be achieved with the tools used in the project. This is unfortunate not only because it limited what we were able to accomplish, but also because it overshadowed the initial objectives of the project and produced a system that is unlikely to be used by LAN managers in practice.