Web Content Management FAQ

Contributed by Mike Meyer

Some pointers on using Perforce for Web Content Management, phrased in the form of a FAQ.

Some answers

What is W.C.M.?

Web Content Management - W.C.M. - is the maintenance of the content of a web site. Like an S.C.M. task, it involves keeping track of a collection of files being revised by different people for different reasons, and being able to recreate the specific set of those files as it existed at any moment in time.

How is W.C.M. different from S.C.M.?

Instead of being a collection of files that go through a build process to create a product that can be tested, a web site is a collection of pages which users access over the web. Some of those pages are just bits of text. Others are programs - possibly compiled - that dynamically generate new text on every access. Anything between those extremes is possible, from pages that are nearly static with a few bits of dynamically generated text, to programs whose source has the same structure as the generated HTML page.

A major difference is in the build phase. The result of a build is critical in S.C.M., as that is the product. In W.C.M., there may not be a build phase at all.

A second major difference is that there is only one web site - you can't have a copy for each user, and one for testing. If the web site includes dynamic data from a database, then proper testing against the live site may not be feasible, as it would require running the tests against the production database server. The impact of this on the production server is undesirable, so the entire environment is typically duplicated on a test server. Finally, it may not be possible to provide a complete test environment on each developer's desktop, so the development environment may need to be shared. This leads to quite predictable resource conflicts.

How can I use Perforce to help with my W.C.M. problems?

By using it to help manage the files, just as you would if they were part of any other product. See the white paper on Web Content Management with Perforce for a detailed description of several plans for doing this.

How can I ensure that no one edits files in the published branch?

By adding a trigger on the published branch that verifies that every change affecting files in that branch consists only of branches, integrations or deletions. The checkfor.py script can do this, with a trigger line similar to:

integration //depot/web/published/... "checkfor.py ' - (integrate|branch|delete) change ' %changelist% %serverport%"
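
The pattern in that trigger line can be illustrated with a minimal sketch. This is not the actual checkfor.py; it only assumes that the script is handed one line per affected file in the usual "depot-path#rev - action change N (type)" form, and the function names are hypothetical:

```python
import re

# The pattern from the trigger line above. It is assumed (not confirmed by
# checkfor.py itself) to be matched against lines such as:
#   //depot/web/published/index.html#3 - integrate change 1234 (text)
ALLOWED = re.compile(r" - (integrate|branch|delete) change ")


def offending_files(lines, pattern=ALLOWED):
    """Return the non-blank lines whose action is not an allowed one."""
    return [line for line in lines
            if line.strip() and not pattern.search(line)]


def change_is_allowed(lines, pattern=ALLOWED):
    """True when every file in the change was branched, integrated or deleted."""
    return not offending_files(lines, pattern)
```

A trigger built this way would reject the submit whenever offending_files returns anything, and could print those lines so the submitter sees which files were edited directly.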

How can I check my HTML documents?

Strangely enough, Perforce can do this for you. Doing so requires that you have one branch for unchecked files, and a branch for checked files - which is one of the two methods recommended by the white paper on Web Content Management with Perforce. Since files in the published branch should be exact copies of the files in the development branch (see the question on preventing edits), a trigger on the published branch can get the file from the development branch to check. Since the error messages from the checker will be sent to the change submitter if the check fails, the developer even knows what needs to be fixed.

After such an integration fails, the file in the published branch is locked. Fixing the problem will require reverting the file on the published branch before reintegrating the fixed file from the development branch.
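
The two pieces such a trigger needs can be sketched as follows. This is only an illustration under stated assumptions: the development branch path is hypothetical (only //depot/web/published/ appears in this FAQ), and the checker is a deliberately strict stand-in that flags unbalanced tags, not a real HTML validator:

```python
from html.parser import HTMLParser

PUBLISHED = "//depot/web/published/"      # from the trigger example in this FAQ
DEVELOPMENT = "//depot/web/development/"  # hypothetical branch location

# Elements that never take an end tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


def development_path(published_path):
    """Map a file in the published branch to its source in development."""
    if not published_path.startswith(PUBLISHED):
        raise ValueError("not in the published branch: " + published_path)
    return DEVELOPMENT + published_path[len(PUBLISHED):]


class TagBalanceChecker(HTMLParser):
    """Records end tags that do not close the most recently opened tag."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            line = self.getpos()[0]
            self.errors.append("unexpected </%s> at line %d" % (tag, line))


def check_html(text):
    """Return a list of error messages; an empty list means the check passed."""
    checker = TagBalanceChecker()
    checker.feed(text)
    checker.close()
    return checker.errors + ["unclosed <%s>" % tag for tag in checker.stack]
```

In a trigger, the text passed to check_html would be the development-branch file named by development_path, and any messages returned would be printed so that they reach the change submitter. Note that this stand-in rejects legal HTML that omits optional end tags, so a real checker would likely be substituted.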

How can I update my search engine's database?

The answer depends on how you are getting files from Perforce to the web server, and how your search engine builds the database.

To take proper advantage of Perforce in this case, you need a search engine that can update its database when a single file changes. Once you have that, you add hooks to the W.C.M. system to get a list of files that have changed so the database can be updated.

If your production web server is a Perforce client that you update by synchronizing it to the depot, then the list of changed files is immediately available as the output of the p4 sync command. That output will need to be massaged into a form acceptable to the search engine, which should be a simple text substitution.
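
As a rough sketch of that massaging, the following assumes p4 sync output lines of the form "depot-path#rev - updating local-path" (with "added as", "deleted as" or "refreshing" in place of "updating"). The function name and the split into two lists are illustrative choices, not part of Perforce:

```python
import re

# One line of p4 sync output, e.g.
#   //depot/web/published/index.html#4 - updating /var/www/index.html
SYNC_LINE = re.compile(
    r"^(?P<depot>//.+?)#(?P<rev>\d+) - "
    r"(?P<action>updating|added as|deleted as|refreshing) (?P<local>.+)$"
)


def changed_files(sync_output):
    """Split p4 sync output into local files to re-index and files to drop."""
    reindex, remove = [], []
    for line in sync_output.splitlines():
        match = SYNC_LINE.match(line.strip())
        if match is None:
            continue  # skip anything that is not a per-file line
        if match.group("action") == "deleted as":
            remove.append(match.group("local"))
        else:
            reindex.append(match.group("local"))
    return reindex, remove
```

The two lists can then be rewritten into whatever form the search engine expects, for example by replacing the web server's document root with the site's base URL.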

If you are using WebKeeper, then the list of files isn't quite so readily available. In this case, a review daemon needs to be set up to get lists of changes that involve the web files in the depot, and to extract the list of depot files from those changes. From this point, it's again a simple matter of text substitution.
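
The extraction step might look like the sketch below. It assumes the daemon has already fetched the text of "p4 describe -s" for each change, in which affected files appear as lines of the form "... depot-path#rev action". The function name and the //depot/web/ prefix are assumptions made for illustration:

```python
import re

# An affected-file line from "p4 describe -s", e.g.
#   ... //depot/web/published/index.html#4 integrate
AFFECTED = re.compile(r"^\.\.\. (?P<depot>//.+)#(?P<rev>\d+) (?P<action>\S+)$")


def web_files_in_change(describe_output, prefix="//depot/web/"):
    """Return (depot path, action) pairs for files under the given prefix."""
    files = []
    for line in describe_output.splitlines():
        # Affected-file lines start in column 0; the tab-indented change
        # description is left alone, so it cannot match by accident.
        match = AFFECTED.match(line.rstrip())
        if match and match.group("depot").startswith(prefix):
            files.append((match.group("depot"), match.group("action")))
    return files
```

The action is kept alongside each path so that deleted files can be removed from the search database instead of being re-indexed.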