Web Content Management FAQ

Contributed by Mike Meyer

Some pointers on using Perforce for Web Content Management, presented in the form of a FAQ.

Answers

What is W.C.M.?

Web Content Management - W.C.M. - is the maintenance of the content of a web site. Like an S.C.M. task, it involves keeping track of a collection of files being revised by different people for different reasons, and being able to recreate a specific set of those files from any moment in time.

How is W.C.M. different from S.C.M.?

S.C.M. involves maintaining a collection of files that go through a build process to create a product that can be tested. W.C.M. involves maintaining the files for a web site. A web site is a collection of pages which users access over the web. Some of those pages are just bits of text. Others are programs - possibly compiled - that dynamically generate new text on every access. Anything between those extremes is possible, from pages that are nearly static with a few bits of dynamically generated text, to programs written in languages whose code mirrors the structure of the generated HTML page.

One major difference between W.C.M. and S.C.M. is in the build phase. The result of a build is critical to an S.C.M., as that is the product. For a W.C.M., there may not be a build phase at all.

A second major difference between W.C.M. and S.C.M. is that there is only one web site, and everyone uses it. You don't have a copy of the product for each user, and one for testing.

How do I test a web site?

For static pages, you verify that the pages meet your standards (see the question on checking documents for instructions on automating this), and that all the links are to valid documents. Since web sites sometimes vanish or change - even yours - running link checks on the production files at regular intervals is a good practice.
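As a rough illustration of the link check described above, here is a minimal Python sketch that looks only at links pointing inside the site. The names LinkCollector and broken_local_links are my own, and the assumption is that the site is available as a directory tree on disk (a synced client workspace, for example). External links are skipped; checking those needs an HTTP request per link.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urldefrag, urlparse


class LinkCollector(HTMLParser):
    """Collect the href/src targets found in one HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def broken_local_links(html_text, page_path, site_root):
    """Return the local links in html_text that match no file under site_root.

    page_path is the page's path relative to site_root (hypothetical layout).
    """
    collector = LinkCollector()
    collector.feed(html_text)
    root = Path(site_root)
    broken = []
    for link in collector.links:
        target, _fragment = urldefrag(link)
        parsed = urlparse(target)
        if parsed.scheme or parsed.netloc or not target:
            continue  # external link or pure fragment: not checked here
        if parsed.path.startswith("/"):
            candidate = root / parsed.path.lstrip("/")
        else:
            candidate = (root / page_path).parent / parsed.path
        if not candidate.exists():
            broken.append(link)
    return broken
```

Running this over every page in the published tree from a nightly job would cover the "regular intervals" part.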

For dynamic pages, part of whose content comes from an external database, the problem is much harder. You don't want to run tests against the production database. The solution is to duplicate the production environment on a test server, and run tests on that. In extreme cases, the database client software running on the web server may not be something that can be duplicated on each developer's desktop. In this case, a third copy of the server can be set up as a development server, though this leads to quite predictable resource conflicts.

Software for testing web sites is a rapidly changing field. Rather than recommend a specific product or product list, I'd recommend searching for "web server testing" with your favorite search engine.

How can I use Perforce to help with my W.C.M. problems?

By using it to help manage the files, just as you would if they were part of any other product. See the white paper on Web Content Management with Perforce for a detailed description of several plans for doing this.

How can I ensure that no one edits files in the published branch?

By adding a trigger to the published branch which verifies that the changes which affect files in that branch are branches, integrations or deletions. The checkfor.py script can do this, with a trigger line similar to:

integration //depot/web/published/... "checkfor.py ' - (integrate|branch|delete) change ' %changelist% %serverport%"
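To show the kind of test the pattern in that trigger line performs, here is a small Python sketch. It is not the source of checkfor.py. The function name disallowed_lines is hypothetical, and the listing format is an assumption modeled on the one-line-per-file output of commands such as p4 files ("//path#rev - action change N (type)").

```python
import re

# The pattern from the trigger line above.
ALLOWED = re.compile(r" - (integrate|branch|delete) change ")


def disallowed_lines(listing):
    """Return the non-empty lines of a file listing that lack an allowed action."""
    return [line for line in listing.splitlines()
            if line.strip() and not ALLOWED.search(line)]
```

A trigger built on this idea would reject the submit whenever the returned list is non-empty, and print those lines so the submitter can see which files were edited directly.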

How can I check my HTML documents?

Perforce can do this for you. Doing so requires that you have one branch for unchecked files, and a branch for checked files - which is one of the two methods recommended by the white paper on Web Content Management with Perforce. Since files in the published branch should be exact copies of the files in the development branch (see the question on preventing edits), a trigger on the published branch can get the file from the development branch to check. The error messages from the checker will be sent to the change submitter if the check fails, so the developer is notified about what needs to be fixed.

After such an integration fails, the file in the published branch is locked. Fixing the problem will require reverting the file on the published branch before reintegrating the fixed file from the development branch.

How can I update my search engine's database?

The answer depends on how you are getting files from Perforce to the web server, and how your search engine builds the database.

To take proper advantage of Perforce in this case, you need a search engine that can update its database when a single file changes. Once you have that, you add hooks to the W.C.M. system to get a list of files that have changed so the database can be updated.

If your production web server is a client which you synchronize to the Perforce depot to update, then the list of new files is immediately available as the output of the p4 sync command. It will need to be massaged into a form acceptable to the search engine, which should be a simple text substitution.
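The text substitution might look something like the following Python sketch. The function name changed_files is hypothetical, and the sketch assumes the usual p4 sync message shapes ("#rev - updating", "added as", "deleted as", "refreshing"). Check them against the output of your own server before relying on it.

```python
import re

# One line of p4 sync output: depot path, revision, action, local path.
SYNC_LINE = re.compile(
    r"^(?P<depot>//.+?)#(?P<rev>\d+) - "
    r"(?P<action>updating|added as|deleted as|refreshing) (?P<local>.+)$"
)


def changed_files(sync_output):
    """Split p4 sync output into (local files to reindex, local files to drop)."""
    reindex, drop = [], []
    for line in sync_output.splitlines():
        match = SYNC_LINE.match(line)
        if not match:
            continue  # ignore anything that is not a per-file message
        if match.group("action") == "deleted as":
            drop.append(match.group("local"))
        else:
            reindex.append(match.group("local"))
    return reindex, drop
```

The two lists can then be handed to whatever add and remove commands your search engine provides.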

If you are using WebKeeper, then the list of files isn't quite so readily available. In this case, you set up a review daemon which gets the lists of changes that involve the web files in the depot, and extract the list of depot files from those changes. From this point, it's again a simple matter of text substitution.
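For the review daemon case, the extraction and substitution could be sketched in Python as below. The sketch assumes the daemon has already fetched the text of p4 describe -s for a change, and that affected files appear as "... //path#rev action" lines. The function name urls_for_change and both prefixes in the usage note are placeholders of mine.

```python
import re

# An "Affected files" entry from p4 describe -s: "... //depot/path#rev action".
AFFECTED = re.compile(r"^\.\.\. (?P<depot>//.+?)#\d+ (?P<action>[\w/]+)$")


def urls_for_change(describe_output, depot_prefix, url_prefix):
    """Map the depot files of one change that live under depot_prefix to URLs."""
    urls = []
    for line in describe_output.splitlines():
        match = AFFECTED.match(line.rstrip())
        if match and match.group("depot").startswith(depot_prefix):
            relative = match.group("depot")[len(depot_prefix):]
            urls.append(url_prefix + relative)
    return urls
```

For example, urls_for_change(text, "//depot/web/published/", "http://www.example.com/") would yield the URLs to hand to the search engine for one change.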

How can I make my scripts executable?

Perforce stores a file type for each file, and that type records whether the file is executable. See the p4 help filetypes text for complete information for your server. The correct type for scripts is text+x.

The easy way to get this set properly is to make the file executable in the client workspace before adding it to the depot. Perforce will set the type properly from that. If the file doesn't exist, you can use p4 add -t text+x filename to set the file type. If it has already been added, or is being edited, use p4 reopen -t text+x filename. If the file is not open for editing, then you must edit it, and can use p4 edit -t text+x filename. These commands will set the file type to executable when the file is submitted to the depot, and the next sync on the web server will make the file executable there.
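The choice between those three commands can be summarized in a few lines of Python. This is only a sketch: filetype_command is a hypothetical helper, and the two flags are assumed to be supplied by the caller (from p4 opened and p4 files, say) rather than discovered here.

```python
def filetype_command(path, in_depot, opened):
    """Return the p4 argument list that marks path as text+x.

    in_depot: the file has already been submitted to the depot.
    opened:   the file is currently open (for add or for edit).
    """
    if opened:
        verb = "reopen"   # already open: just change the type
    elif in_depot:
        verb = "edit"     # submitted but not open: open it with the new type
    else:
        verb = "add"      # new file: add it with the right type
    return ["p4", verb, "-t", "text+x", path]
```

The returned list can be passed directly to subprocess.run.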