yapc Proceedings (June 1999)

SAFARI: CONCEPTION, BIRTH, AND FUTURE DEVELOPMENT OF AN OPEN SOURCE CODE BROWSING SYSTEM

R. B. SLAYMAKER, Jr.

     Safari is a web based application that provides access to hierarchically organized files (programs, images, documents, or other media) stored in file systems, databases or archiving systems such as revision control and software configuration management systems. Multiple versions of any file can be browsed using meta information such as revision number, revision labels, project membership, change set numbers, or other methods natural to the structure of the storage system. Relative links between documents link to the correct revisions of the target documents as determined by the meta information.
    Safari also provides a standard (GNU Make based) framework for integrating analysis and processing tools and making their output easily available within the context of the hierarchy the source files are stored (and thus presented) in.
     Safari was conceived as a tool for publishing automatically extracted documentation for an in-house system, using a 3rd party documentation extraction tool (Cocoon). The original (0.0) version is a monolithic perl script that grew from the original design goals to include source code browsing, syntax highlighting, navigation of back revisions, a second documentation extractor (pod2html) and activation of #include files to be links to the appropriate file.
     Several key design decisions in the 0.0 version were validated in practice, others invalidated. When the burden of maintaining and extending the monolithic script grew too large, Safari was rewritten and is now headed for a 1.0 version. The 0.5 version is being released under an open source license in conjunction with this paper, and a demonstration web site is being made available for the conference.
     This paper outlines the past and near future life cycle of Safari: the design decisions, the current state, and future directions are all discussed.

1. Introduction

Source code browsing systems allow developers and other interested parties to explore and research bodies of source code. To varying extents, they provide analytical information about the code base, such as indexes of identifiers, and meta information such as change logs, labels, file dates and sizes. A few are complete web based SCM (Software Configuration Management) or RCS (Revision Control System) interfaces.

There are many existing systems that provide these features. The Linux Cross Reference project [1] provides browsing and indexing of identifiers for source code and is used by several high profile open source projects and by in-house developers working on proprietary code.

The Perforce WebKeeper [2] provides for simple retrieval of files from a Perforce source code repository.

The p4db Depot Browser [3] provides another front end for browsing files held in a Perforce depot.

Mortice Kern Systems, Inc. provides a web based interface for the Source Integrity Pro [4] software configuration management system. In addition to these, there are probably hundreds of similar open source, closed source, and hand-rolled systems in existence.

[[If you know of any such systems that are distributed free (in either sense), please let me know. Safari's an equal opportunity borrower. Several of the p4db scripts have already been borrowed (with much gratitude).]]

Safari became a source code browser as a means to a different end: we needed to extract source code documentation (using an existing open source tool) and publish it on an intranet. Source code browsing was originally intended to provide a natural navigational interface for this system, but rapidly became the most used feature. As with many open source projects, it grew out of a need to scratch a fairly small itch and is growing to be a very general tool.

The current rewrite of Safari is intended to allow incorporation of many different tools and to be able to interface to a wide variety of file storage and archival systems.

2. Conception

Safari was originally written starting in 1997 to automatically extract and publish documentation from comments in C++ source code kept in a closed source SCM system. The extraction tool in use is Cocoon [5]. The SCM is MKS' Source Integrity [6]. The fact that both of these were pre-existing systems meant that Safari was initially conceived as a cgi-bin script to glue together chains of external tools (SI -> Cocoon -> HTTP output).

Over time, feeping creaturism added:

File-, project-, label- and revision-based browsing of source code,
Automatic mark-up of filenames in #include statements to be links to the referred-to files,
Syntax highlighting and colorizing of C++ and (in a limited fashion) perl source code,
On-the-fly documentation extraction using Cocoon and pod2html (a utility distributed with perl),
Change log description browsing on a per-file basis, and
Simple searching for text or regular expressions within a file.
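
To make the #include mark-up concrete, here is a minimal sketch in Python (Safari itself is written in perl, and this is not its actual code). It assumes that quoted includes can be linked with a plain relative href and that angle-bracket includes name system headers outside the archive. The names INCLUDE_RE and link_include are hypothetical.

```python
import html
import re

# Matches C/C++ include directives such as:  #include "board.h"  or  #include <stdio.h>
INCLUDE_RE = re.compile(r'^(\s*#\s*include\s*)(["<])([^">]+)([">])(.*)$')


def link_include(line: str) -> str:
    """Return one source line (no trailing newline) as HTML.

    A quoted include filename becomes a relative link; everything else is
    only HTML-escaped.
    """
    match = INCLUDE_RE.match(line)
    if match is None:
        return html.escape(line)
    prefix, opener, filename, closer, rest = match.groups()
    if opener != '"':
        # Assumption: <...> includes are system headers, so leave them plain.
        return html.escape(line)
    link = '<a href="{0}">{1}</a>'.format(
        html.escape(filename, quote=True), html.escape(filename)
    )
    return html.escape(prefix + opener) + link + html.escape(closer + rest)


if __name__ == "__main__":
    print(link_include('#include "board.h"'))
    print(link_include('#include <stdio.h>'))
```

Because the generated href is relative, the link stays within whatever revision and filter the reader is currently browsing. A real implementation would also have to consult the project's include search paths, which this sketch ignores.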

As Safari was extended, it grew beyond its original mission as a documentation extraction tool and became a tool for researching, reviewing, and discussing code. It turns out that reading code is often easier in a browser than when using typical development environments of SCMs, file system browsers, documentation extractors, HTML browsers, editors, and the like.

The reasons a system such as Safari is more usable than traditional development environments are:

There's no need to check out the files. Safari extracts files on-the-fly.
Source code can be marked up to include links to other code. Things that can be activated in this manner include:
1. References to other files / modules.
2. Identifiers can link back to the definitions, or forward to a cross reference of their uses.
3. URLs in comments or code
4. Links can be added to point to external documentation of design and implementation.
Documentation extractors and code analysis tools can be made almost effortless to use. For instance, Cocoon is complicated enough that only one or two developers at this site ever mastered it (not me). No other developer managed to get it up and running and use it for any length of time.
Links to line numbers (a feature borrowed from the Linux Cross Reference tools [1]) can be emailed to others. If link longevity is required, a search query that leads to a line of code can be emailed or stored, so edits that change line numbers don't cause link rot nearly as easily.

All of this facilitates code reviews, documentation and research; it promotes the things that make open source so successful today: communication and information.

With Safari, checking a file in is publishing it. Checking in is also publishing any documentation to be extracted from it. This turned out to be a very powerful mechanism. We found ourselves storing design documentation in the project. This allowed the design documentation to be automatically associated with the files that it referred to revision by revision, release by release.

In short, we found ourselves using Safari daily as a core technology for software development. It's not the most used or most important system, but it provides significant value to the developers. We extended it several times to incorporate new features, bloating the original script beyond the point of the maintainer's sanity (which may explain a few things about this paper). Eventually it became clear that there was an ecological niche in the broader internet community that a generalized, open source, modular Safari-like system could fill, even given the number of other similar systems available.

3. Gestation

In the spring of 1998, I began to rewrite Safari in a more modular fashion while attempting to preserve the aspects that made Safari useful. Another goal was to allow for easy integration of existing tools without requiring extensive programming knowledge. Perl's very good for controlling external programs, but there are many people with little or no Perl expertise out there.

Here's the general structure of a Safari instance:

[Figure: Safari Dataflow Diagram]

A fully functional Safari system is built with a web server, a cgi-bin (or preferably mod_perl) script cgimake, a make program (preferably GNU Make or Make.pm), a script to convert source materials into HTTP documents, and any external programs needed to fetch files and meta data from the data store and process them into web-ready form. These tools are the key element of Safari: they are your existing tools, not special purpose Safari tools.

Several key factors made the original single-script version of Safari useful:

Coherent, simple user interface,
Coherent, simple URL 'API' design, and
Emphasis on minimal administration and real-time updating.

Adding the goals of modularity and extensibility has rounded out Safari and made it a general purpose tool, with possible applications beyond source code browsing.

3.1 User Interface Metaphors

The key user interface decision that made Safari usable was basing the navigation on the hierarchical file structure present in the source code archives. This structure is already known by existing developers, and basing Safari's navigation system on it also provides a tool for learning and exploration by those who need to become familiar with it. Documentation and other analytical tools should be reachable by browsing to a file or directory, then following a link to the desired output.

The disadvantage of this approach is that popular web browsers make mediocre hierarchy browsers. Javascript, Java, and custom browsers all provide possible avenues of approach for more friendly user interfaces. Safari takes a lowest common denominator approach with low graphics intensity to make it more generally usable. The emphasis is on features, not flash (for now). That being said, a very nice tree oriented GUI would be a fantastic addition.

Alternate browsing structures can easily be provided by adding reports or pages to Safari. These pages can be built automatically by an indexing tool (pod2html does this, for example), or manually. An example of the manual technique is a web page that describes each main active project and provides links to 'interesting' places in it. Interesting places might be the project root, design documents (standalone and automatically extracted), output of analysis tools, key routines, structure definitions, or files within the project, and external resources such as mailing lists, news groups, or other web sites.

A search engine or permuted index generator can provide master 'random access' to the file tree. This is one of the most important features that Safari lacks support for at this time, and is the feature that the Linux Cross Reference Project was built for.

3.2 URL Design

The URL serves as the basic API that ties the user, the browser, and the Safari scripts together. Users often type them in manually instead of browsing to a location, so the URL should be easy to construct manually. In effect, it's a low-level alternate user interface. Browsers base relative link calculations on URLs, so placing contextual information to the left of the destination document's path (as opposed to putting it in a '?' query specification) allows relative links between documents to lead to sensible destinations. And simple, consistent URL design makes scripting the back end much easier.
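
The effect on relative links can be checked with any standard URL resolver. The following Python sketch uses the standard library's urljoin. The query-style URL and the relative link ../src/board.c are hypothetical, invented only for comparison.

```python
from urllib.parse import urljoin

# Context (revision and filter) carried in the path, left of the file path.
path_style = "http://a.b.com/checkers/_head/pretty/code/inc/checkers.h"

# Hypothetical alternative carrying the same context in a '?' query.
query_style = "http://a.b.com/checkers/code/inc/checkers.h?rev=_head&filter=pretty"

# Hypothetical relative link found inside checkers.h.
relative_link = "../src/board.c"

path_target = urljoin(path_style, relative_link)
query_target = urljoin(query_style, relative_link)

print(path_target)
print(query_target)
```

The first result is http://a.b.com/checkers/_head/pretty/code/src/board.c, which keeps the revision and filter. The second is http://a.b.com/checkers/code/src/board.c: the resolver discards the query, so the context is lost.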

The URL design must support the hierarchical structure of the underlying archival system, must incorporate a revision identifier and also specification of which analysis or extraction tool's output is being browsed. The typical Safari URL looks like:

http://a.b.com/checkers/_head/pretty/code/inc/checkers.h
|                      |     |      |                  |
+----------------------+-----+------+------------------+
| Project identity     | Rev |Filter| File spec        |

where:

Project Identity: This is the root URL which determines what project Safari is concerned with. In this case, the web server is configured to map '/checkers/' to the Safari system.
Rev: The revision specifier comes next. This is usually a label or change number, but can be a raw revision number in some cases. The reason that it is not usually a raw revision number is that it's rare that revision 123 of one file corresponds to revision 123 of another file. Using change set numbers or labels that mark a release to specify revisions means that all filenames specified to the right of the revision field refer to a coherent, consistent set of files.
Filter: The filter specifies the lens through which the file is to be viewed. This may be a pretty-printer, a word/line counter, a documentation extractor, a lint-like tool, etc.
File Spec: This indicates which file in the namespace of the project, revision, and filter is being accessed.
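
As an illustration of how regular this layout is, the fields above can be split out in a few lines. This is a Python sketch, not Safari's own (perl) parsing code. The names SafariLocation and parse_safari_url are hypothetical, and the project root is assumed to be known from the web server configuration and to end in '/'.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class SafariLocation(NamedTuple):
    project: str    # root path the web server maps to Safari
    rev: str        # label, change number, or raw revision
    filter: str     # the "lens" the file is viewed through
    file_spec: str  # path of the file (or directory) within the project


def parse_safari_url(url: str, project_root: str) -> SafariLocation:
    """Split a Safari-style URL into project, rev, filter and file spec."""
    path = urlsplit(url).path
    if not path.startswith(project_root):
        raise ValueError("URL is not under project root " + project_root)
    remainder = path[len(project_root):]
    parts = remainder.split("/", 2)
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError("URL lacks a revision or filter field")
    file_spec = parts[2] if len(parts) > 2 else ""
    return SafariLocation(project_root, parts[0], parts[1], file_spec)


if __name__ == "__main__":
    print(parse_safari_url(
        "http://a.b.com/checkers/_head/pretty/code/inc/checkers.h", "/checkers/"))
```

An empty file spec corresponds to the top level directory of the project.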

Safari uses the query string portion of the URL (the part after a '?') to provide transient information. This transient information is information that should not affect the revision or current filter settings, and thus should not affect the document's position in the overall hierarchy.

Here are some examples of Safari URLs. Sample pages generated by these are included in the appendix.

http://localhost/safaridev/perforce/_head/Default/
   Leads to a list of files in the top level of Perforce Inc.'s public
   source code archive.

http://localhost/safaridev/perforce/_head/Default/public/index.html
   Displays the head revision of the index.html extracted from that archive.

http://localhost/safaridev/perforce/_head/pretty/public/index.html
   Displays the same page as syntax highlighted source code

http://localhost/safaridev/perforce/_head/pretty/public/index.html?filter=wc
   Displays the output of the 'wc' command when run on index.html

http://localhost/safaridev/perforce/_head/pretty/public/index.html?rev=_head&filter=filelog
   Displays the complete history of the file index.html.

http://www.slaysys.com/safaridev/perforce/@6/Default/public/index.html
   Displays the version of index.html associated with change set number 6.

In essence, each combination of project and revision label or change number specifies a consistent set of files that correspond to each other. The filter determines how the source file should be processed before viewing, and the file spec leads to the file itself.
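
The query string examples above suggest one way to read a request: the path supplies the lasting context, and a 'rev' or 'filter' given after the '?' applies to this one page only. The Python sketch below works under that assumption. The function name effective_view is hypothetical, and the precedence rule is inferred from the examples rather than taken from Safari's code.

```python
from urllib.parse import parse_qs, urlsplit


def effective_view(url: str, project_root: str):
    """Return (rev, filter, file_spec) to use for a single request.

    Assumption: values in the query string override the path fields for
    this request only; the path itself is left unchanged.
    """
    parts = urlsplit(url)
    remainder = parts.path[len(project_root):]
    # Pad so that short paths still unpack into three fields.
    rev, filter_name, file_spec = (remainder.split("/", 2) + ["", ""])[:3]
    query = parse_qs(parts.query)
    rev = query.get("rev", [rev])[0]
    filter_name = query.get("filter", [filter_name])[0]
    return rev, filter_name, file_spec


if __name__ == "__main__":
    root = "/safaridev/perforce/"
    base = "http://localhost/safaridev/perforce/_head/pretty/public/index.html"
    print(effective_view(base, root))
    print(effective_view(base + "?filter=wc", root))
    print(effective_view(base + "?rev=_head&filter=filelog", root))
```

Since the path still names the 'pretty' filter, relative links on the transient page lead back into the namespace the user came from.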

Some filters generate different namespaces than the underlying archive structure. This makes linking into and out of that filter a little tricky, but relative links between files within a filter's namespace work fine.

It's important to note that links between documents (both hard coded and those automatically marked up by Safari) should be relative links, for two reasons:

Relative links will work in multiple filters and revisions
A user should be able to take a copy of the files and browse them using a file: URL or other scheme.

This is not always possible given the fact that third party tools are not always prepared to generate relative links. Workarounds do exist for some cases.

3.3 Administration and upkeep

A key element in the adoption of Safari was minimal administration. Safari was born out of a need to publish sets of extracted source code documentation on the web, combined with the extreme distaste several of us had for manually generating and publishing docsets.

Safari makes extensive use of file time stamps and meta information from the underlying storage system to determine when to check out a new file or (re)generate output derived from the source files. This can be done on a timed basis (to avoid having to evaluate things every HTTP request), or it can be done on every HTTP request.
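
The time stamp test is the same one make applies to any target. A minimal Python sketch of both policies follows, with hypothetical function names; it covers only local file time stamps, not meta information from a revision control system.

```python
import os


def needs_rebuild(source_path: str, output_path: str) -> bool:
    """Make-style test: rebuild if the output is missing or older than its source."""
    if not os.path.exists(output_path):
        return True
    return os.path.getmtime(output_path) < os.path.getmtime(source_path)


def due_for_check(last_checked: float, now: float, interval_seconds: float) -> bool:
    """Timed policy: only re-evaluate dependencies once per interval."""
    return now - last_checked >= interval_seconds
```

Calling needs_rebuild on every HTTP request gives real-time updating; guarding it with due_for_check trades some freshness for less work per request.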

In the original script, I found myself using and debugging a lot of dependency rules. This inspired the use of GNU Make as the tool to tie the cgi-bin (or mod_perl) script to the underlying tools.

GNU Make provides several important features for Safari:

a standard, well known, documented language for declaring the processes needed to generate target files,
a program that evaluates this definition file and uses filesystem meta information to decide what to remake,
the ability to incorporate existing tools into Safari without massive hacking efforts. gcc -c -Wall (i.e. lint) and wc are implemented in the demonstration server as examples, but others are available.

GNU Make also poses a few challenges:

There's no way to force a complete regeneration of output from the source file in the event that a bug or transient condition causes invalid output. This is especially important for Safari implementors, but it's also a handy fly swatter for end users occasionally.
There's not an easy way to tell GNU Make to keep intermediate files around while not ignoring errors (i.e. .SECONDARY's feature set is not full enough for Safari). To work around this, Safari uses .PRECIOUS and the scripts and Makefile shell commands that implement the underside of Safari delete their output files in the event of an error.
There's no natural way to use Make to check to see if a new revision exists in a modern client-server RCS / SCM.
GNU Make's functional programming API is very weak, and it can be tricky to implement content-based decision making in a Makefile.
Make doesn't give a flock, so access sharing must be implemented at a more global level than would be necessary if make could do it.

Improvements that address the first two of these are now on the TODO list for the GNU Make developers. Workarounds exist for all of these, but improvements should be made that remove them.
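
The delete-on-error workaround for .PRECIOUS is simple to express. The sketch below is in Python for illustration only (Safari's scripts are perl and Makefile shell commands), and run_filter is a hypothetical name. It captures a tool's standard output in the target file and removes that file if the tool fails, so that a later make run does not mistake partial output for an up-to-date target.

```python
import os
import subprocess


def run_filter(command, output_path: str) -> bool:
    """Run a tool, writing its stdout to output_path.

    If the tool exits non-zero, or cannot be started at all, the output
    file is deleted so it cannot be mistaken for a finished target.
    """
    succeeded = False
    try:
        with open(output_path, "wb") as out:
            succeeded = subprocess.run(command, stdout=out).returncode == 0
    finally:
        if not succeeded and os.path.exists(output_path):
            os.remove(output_path)
    return succeeded
```

For example, run_filter(["wc", "index.html"], "index.html.wc") leaves index.html.wc in place only when wc exits successfully.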

3.4 Miscellaneous Design Issues

A few other key notes about the design and implementation:

Safari is completely implemented in perl and GNU Make (although the binaries necessary to connect to various RCSs are used).
A perl module implementing GNU Make with some powerful perlish extensions has been developed and will be adapted to Safari use as needed. This will run more slowly than GNU Make, but may make up for that by reducing the forking required to do a build.
An enhanced version of the standard Unix file command has been developed as a module File::Type. It reports mime types and is more accurate at guessing source code languages.

4. Birth

Safari's first open source release (0.50) has been developed to coincide with yapc (Yet Another Perl Conference) in June of 1999. The 0.50 release has a full basic feature set, with a lot of room to grow. A demonstration site URL should be available shortly before and for a while after the conference.

Safari's source is all browse-able on the web in the perforce public depot and will be distributed via CPAN in tarball form.

Mailing lists exist for Safari announcements and developer discussions. More details are available on the Safari web site. Please join and contribute.

5. Growth Plan

Safari has a lot of room to grow. It is designed to be an open-ended project. Areas of significant development are:

Indexing. Since many source code archives contain a milieu of programming and non-programming languages, indexing is quite a challenge. It should be a very interesting project.
Given good indexing, automatic activation of important words in displayed documents provides an extremely useful mechanism that makes Safari much more useful than most editors as a code browser.
A library of back ends to different RCSs and SCMs needs to be built. Perforce and the local filesystem are the only back ends currently implemented. Back ends to PVCS, CVS and MKS SI are in development by myself and others.
A library of contributed scripts and Makefiles will be collected and published to promote cross-pollination.
Partial parsing of source languages will greatly facilitate better indexing and syntax highlighting. This parsing can provide the context for each indexed word or phrase, like whether it's a variable, macro, function, typename, or comment. This can allow for finer grained searches and for better syntax highlighting and automatic link generation when marking up source files.
Alternate user interfaces, like a tree view control. Safari's intentionally lowest-common-denominator HTML GUI needs to remain, but that should not limit its growth.
It shouldn't be very far from a browser to a simple check-in/checkout locking and merging interface, especially given the work on WebDAV.
Allowing a mechanism for sticky-notes to be created and attached to source code would make Safari an incredibly powerful code review tool.

Please join and contribute!!

Appendix A. Sample Pages

Several example pages are given in the order you would browse them in.

Location: http://localhost/safaridev/perforce/_head/Default/
The list of depots available at public.perforce.com:1666

perforce/ (Default filter)

PROJECT
     top
     up
     changes
     labels
     rebuild

	Depot	Description
1	guest/	Depot for guest users.
2	public/	Perforce's open source depot.

This page generated by Safari at Wed Jun 16 15:08:39 1999

Location: http://localhost/safaridev/perforce/_head/Default/public/
The list of files available in the public depot ( //public/* ) at public.perforce.com:1666, and their revision levels and last change number, as of the head revision.

perforce/public/ (Default filter)

PROJECT
     top
     up
     changes
     labels
     rebuild

	File	Rev	Description	Change	Type
1	index.html	#17	edit	114	ktext
2	jam/	#2	integrate	76	xtext
3	perforce/	#2	edit	98	ktext
4	tutorial.html	#9	edit	147	ktext

This page generated by Safari at Wed Jun 16 15:42:46 1999

Location: http://localhost/safaridev/perforce/_head/Default/public/index.html
The head revision of //public/index.html from public.perforce.com:1666

perforce/public/index.html (HTML filter)

PROJECT
     top
     up
     changes
     labels
     rebuild

FILTERS
     Default
     POD
     pretty
     plain
     HTML

TOOLS
     gcclint
     wc
     filelog

Welcome to the Perforce Public Depot

About Perforce and the Public Depot
The Depot Road Map
How to Browse the Depot
How to Contribute to the Depot

About Perforce and the Public Depot
Back to
Table of
Contents

[...lots of good information snipped for brevity...]

Copyright ©
1998, 1999
Perforce Software
You're browsing a file stored as
$Id: //public/index.html#17 $
in the Perforce Public Depot. Back to
Table of
Contents

This page generated by Safari at Fri Jun 18 02:24:59 1999

Location: http://localhost/safaridev/perforce/_head/pretty/public/index.html
The syntax highlighted source code for the head revision of //public/index.html from public.perforce.com:1666

perforce/public/index.html (pretty filter)

PROJECT
     top
     up
     changes
     labels
     rebuild

FILTERS
     Default
     POD
     pretty
     plain
     HTML

TOOLS
     gcclint
     wc
     filelog

  1  <HTML>
  2  
  3  <HEAD>
  4  
  5  <TITLE>
  6  Perforce Public Depot
  7  </TITLE>
  8  
  9  </HEAD>
 10  
 11  <BODY BGCOLOR="#FFFFFF">
 12  <CENTER>
 13  <P>
 14  <A NAME="toc"></A>
 15  <A HREF="http://www.perforce.com">
 16  <IMG SRC="http://www.perforce.com/images/logo.gif" alt="Perforce" border=0></A>
 17  <H1>
 18  Welcome to the Perforce Public Depot
 19  </H1>
 20  <P>
 [...more lines, snipped for brevity...]

This page generated by Safari at Fri Jun 18 02:25:38 1999

Location: http://localhost/safaridev/perforce/_head/pretty/public/index.html?filter=wc
Output of the 'wc' command applied to the file //public/index.html

perforce/public/index.html (wc filter)

PROJECT
     top
     up
     changes
     labels
     rebuild

FILTERS
     Default
     POD
     pretty
     plain
     HTML

TOOLS
     gcclint
     wc
     filelog

453 lines, 1257 words, 12271 bytes in _head/public/index.html

This page generated by Safari at Fri Jun 18 02:27:13 1999

Location: http://localhost/safaridev/perforce/_head/pretty/public/index.html?rev=_head&filter=filelog
The complete revision history of the file //public/index.html

perforce/public/index.html (filelog filter)

PROJECT
     top
     up
     changes
     labels
     rebuild

FILTERS
     Default
     POD
     pretty
     plain
     HTML

TOOLS
     gcclint
     wc
     filelog

Rev	Act.	Date	User	Change	Desc
17	edit	1999/03/22	laura_wingerd	114	`Fix typos in links.`
16	edit	1999/03/15	laura_wingerd	94	`Re-org "triggers" directory --`
15	edit	1999/01/05	laura_wingerd	52	`Minor web page format changes.`
14	edit	1999/01/05	laura_wingerd	51	`Update copyright year.`
13	edit	1999/01/05	laura_wingerd	50	`Minor PD doc changes.`
12	edit	1998/12/24	perforce	46	`Add WebKeeper source.`
11	edit	1998/12/03	laura_wingerd	42	`Fix links in index pages, add o`
10	edit	1998/11/05	perforce	28	`Reword intro, fix typos, fix na`
9	edit	1998/10/26	perforce	26	`Fleshed out "how to contribute"`
8	edit	1998/10/22	perforce	22	`Added "About the depot" section`
7	edit	1998/10/09	perforce	19	`Fix browser links, add lost tra`
6	edit	1998/10/09	perforce	15	`Added browser links to index.`
5	edit	1998/10/05	perforce	10	`Added "How to Browse".`
4	edit	1998/10/02	perforce	9	`Add road map & nicer formatting`
3	edit	1998/10/02	perforce	8	`Change to ktext.`
2	edit	1998/10/02	perforce	7	`Test links in index page.`
1	add	1998/10/02	perforce	6	`Open source depot index.`

This page generated by Safari at Fri Jun 18 02:27:54 1999

Location: http://www.slaysys.com/safaridev/perforce/@6/Default/public/index.html
The file index.html as of change number 6.

perforce/public/index.html (HTML filter)

PROJECT
     top
     up
     changes
     labels
     rebuild

FILTERS
     Default
     POD
     pretty
     plain
     HTML

TOOLS
     gcclint
     wc
     filelog

Testing...

This is a test.

This should be the Perforce home page.

This page generated by Safari at Fri Jun 18 04:59:03 1999

References

[1] The Linux Cross Reference System ( http://lxr.linux.no/ ).

[2] The Perforce WebKeeper ( http://www.perforce.com/perforce/webkeeper.html ).

[3] The p4db Perforce Depot Browser ( http://public.perforce.com/cgi-bin/p4db/dtb.cgi?FSPC=public/perforce/utils/p4db ).

[4] Mortice Kern Systems' Source Integrity Pro Software Configuration Management System ( http://www.mks.com/solution/si/pro/ ).

[5] The Cocoon Utilities, Version 3.2, Jeffrey Kotula ( http://www.stratasys.com/software/cocoon/ ).

[6] Mortice Kern Systems' Source Integrity ( http://www.mks.com/solution/si/ ).