Web Content Management FAQ

Contributed by Mike Meyer

Some pointers on using Perforce for Web Content Management, phrased in the form of a FAQ.

Some answers

What is W.C.M.?

Web Content Management is the task of maintaining the content of a web site. Like an S.C.M. task, it involves keeping track of a collection of files that are being revised by different people for different reasons, and being able to recreate a specific set of those files from any moment in time.

How is W.C.M. different from S.C.M.?

Instead of being a collection of files that go through a build process to create a product that can be tested, a web site is a collection of pages that users access over the web. Some of those pages are just static text. Others are programs, possibly compiled, that dynamically generate new text on every access. Every variation between those extremes is possible, from pages that are nearly static with a few bits of dynamically generated text, to pages written in programming languages that have the same structure as the generated HTML page.

The critical difference is that there is only one web site: you can't have a copy for each user and another for testing. This makes testing problematic, because you can't simulate being thousands, or even hundreds, of users all over the world using the same system. This reduces testing to making sure that the web site returns the correct data before making it available to users. This is normally done by running new versions of the system on a test server before they are put into production.

How can I use Perforce to help with my W.C.M. problems?

By using it to help manage the files, just as you would if they were part of any other product. See the white paper on Web Content Management with Perforce for a detailed description of several plans for doing this.

How can I ensure that no one edits files in the published branch?

By adding a trigger to the published branch that verifies that the changes affecting files in that branch are only branches, integrations or deletes. The checkfor.py script can do this, with a trigger line similar to:

integration //depot/web/published/... "checkfor.py ' - (integrate|branch|delete) change ' %changelist% %serverport%"
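The internals of checkfor.py are not reproduced here, so the following is only a sketch of the kind of check that pattern expresses. It assumes the trigger is handed one line per affected file in the form printed by p4 files (depot path, revision, action, change number, file type); the function name disallowed_lines and the sample paths are hypothetical.

```python
import re

# Actions permitted on the published branch, mirroring the trigger pattern above.
ALLOWED = re.compile(r" - (integrate|branch|delete) change ")


def disallowed_lines(file_lines):
    """Return the lines whose action is not integrate, branch or delete."""
    return [
        line for line in file_lines
        if line.strip() and not ALLOWED.search(line)
    ]


if __name__ == "__main__":
    # Hypothetical sample in the assumed "p4 files" output format.
    sample = [
        "//depot/web/published/index.html#3 - integrate change 1234 (text)",
        "//depot/web/published/about.html#2 - edit change 1234 (text)",
    ]
    for line in disallowed_lines(sample):
        print("rejected:", line)
```

A trigger built this way would reject the change whenever the returned list is non-empty, and report the offending lines to the submitter.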

How can I check my HTML documents?

Strangely enough, you can get Perforce to do this for you. Doing so requires that you have one branch for unchecked files and another for checked files, which is one of the two methods recommended by the white paper on Web Content Management with Perforce. Since files in the published branch should be exact copies of the files in the development branch (see the previous question on preventing edits in the published branch), a trigger on the published branch can get the file from the development branch to check. Since the error messages from the checker will be sent to the user if the check fails, the developer even knows what needs to be fixed.

After such an integration fails, the file in the published branch will be locked. Fixing the problem will require reverting the file on the published branch before reintegrating the fixed file from the development branch.
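The step in which the trigger finds the development copy of a published file comes down to a path substitution. A minimal sketch, assuming the published branch lives at //depot/web/published/ as in the trigger line earlier, and that the development branch is a sibling named //depot/web/development/ (a hypothetical name; use whatever your depot layout defines):

```python
PUBLISHED = "//depot/web/published/"      # taken from the trigger line earlier
DEVELOPMENT = "//depot/web/development/"  # hypothetical branch name


def development_path(published_path):
    """Map a depot path in the published branch to its development twin."""
    if not published_path.startswith(PUBLISHED):
        raise ValueError("not in the published branch: " + published_path)
    return DEVELOPMENT + published_path[len(PUBLISHED):]


if __name__ == "__main__":
    print(development_path("//depot/web/published/docs/index.html"))
```

The trigger would then fetch that development file and hand it to whichever HTML checker you use; the checker itself is outside the scope of this sketch.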

How can I update my search engine's database?

The answer depends on how you are getting files from Perforce to the web server, and how your search engine builds the database.

To take proper advantage of Perforce in this case, you need a search engine that can update its database when a single file changes. Once you have that, the problem becomes one of adding hooks to the W.C.M. system to get a list of files that have changed so the database can be updated.

If your production web server is a client that you synchronize to the Perforce depot to update, then the list of new files is immediately available as the output of the p4 sync command. It will need to be massaged into a form acceptable to the search engine, but that should be a simple text substitution.
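As a rough illustration of that substitution, the sketch below splits p4 sync output into files to re-index and files to drop from the index. It assumes the usual one-line-per-file output of the form "depot-path#rev - updating local-path" (with "added as", "refreshing" and "deleted as" as the other actions); the function name changed_files and the sample paths are hypothetical.

```python
import re

# Matches lines such as
# "//depot/web/published/index.html#4 - updating /var/www/index.html".
SYNC_LINE = re.compile(
    r"^(?P<depot>//.+?)#\d+ - "
    r"(?P<action>added as|updating|refreshing|deleted as) "
    r"(?P<local>.+)$"
)


def changed_files(sync_output):
    """Return (updated, deleted) lists of local paths from p4 sync output."""
    updated, deleted = [], []
    for line in sync_output.splitlines():
        match = SYNC_LINE.match(line.strip())
        if not match:
            continue  # skip anything that is not a per-file line
        if match.group("action") == "deleted as":
            deleted.append(match.group("local"))
        else:
            updated.append(match.group("local"))
    return updated, deleted


if __name__ == "__main__":
    sample = "\n".join([
        "//depot/web/published/index.html#4 - updating /var/www/index.html",
        "//depot/web/published/old.html#3 - deleted as /var/www/old.html",
    ])
    print(changed_files(sample))
```

Keeping deletions separate matters because the search engine needs to remove those entries rather than re-read them.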

If you are using WebKeeper, then the list of files isn't quite so readily available. In this case, a review daemon needs to be set up to get lists of changes that involve the web files in the depot, and extract the list of depot files from those changes. From this point, it's again a simple matter of text substitution.
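For the review daemon case, the extraction step could be sketched as follows. It assumes the daemon has already captured the text printed by p4 describe -s for a change, in which each affected file appears as a line of the form "... depot-path#rev action". The function name depot_files, the //depot/web/ prefix and the sample change are hypothetical.

```python
import re

# Matches affected-file lines such as
# "... //depot/web/published/index.html#4 integrate".
DESCRIBE_LINE = re.compile(r"^\.\.\. (//.+)#(\d+) (\S+)$")


def depot_files(describe_output, prefix="//depot/web/"):
    """Return (depot_path, action) pairs for web files named in a change."""
    files = []
    for line in describe_output.splitlines():
        match = DESCRIBE_LINE.match(line.strip())
        if match and match.group(1).startswith(prefix):
            files.append((match.group(1), match.group(3)))
    return files


if __name__ == "__main__":
    sample = "\n".join([
        "Change 1234 by user@client on 2001/01/01 12:00:00",
        "",
        "\tUpdate front page",
        "",
        "Affected files ...",
        "",
        "... //depot/web/published/index.html#4 integrate",
        "... //depot/src/main.c#7 edit",
    ])
    print(depot_files(sample))
```

Each depot path can then be rewritten into whatever form the search engine expects, such as a URL served by WebKeeper.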