track2sql.php #3

  • //
  • guest/
  • stewart_lord/
  • track2sql/
  • track2sql.php
  • View
  • Commits
  • Open Download .zip Download (16 KB)
#!/usr/local/bin/php
<?php


/**
 *
 *  TRACK2SQL IS PROVIDED "AS IS", WITHOUT A WARRANTY OF ANY 
 *  KIND. ALL EXPRESS OR IMPLIED REPRESENTATIONS AND WARRANTIES, 
 *  INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS 
 *  FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT, ARE HEREBY 
 *  EXCLUDED.
 *
 *
 *  T R A C K 2 S Q L . P H P 
 * ---------------------------
 *
 * Takes a VTRACK log file (server version 2005.2 or greater) as input and 
 * produces an SQL file as output. Requires PHP and an SQL database, such as 
 * MySQL. For the best results, use a vtrack=1 log.
 *
 *
 *  U S A G E
 * -----------
 *
 *   php track2sql.php [ logFile | - [ sqlFile | - ] ] [ -d dbName ]
 *    
 *   Examples:
 *     $ track2sql.php log track.sql -d track
 *     $ cat log | track2sql.php | mysql
 *     $ tail -F log | track2sql.php | mysql
 *
 *
 *  O U T P U T
 * -------------
 *
 * Produces an SQL file that will create and populate the following
 * tables:
 *
 *  +------------+---------------+------+-----+
 *  | P R O C E S S                           |
 *  +------------+---------------+------+-----+
 *  | Field      | Type          | Null | Key |
 *  +------------+---------------+------+-----+
 *  | processKey | int(11)       | NO   | PRI |
 *  | time       | int(11)       | NO   |     |
 *  | pid        | int(11)       | NO   |     |
 *  | user       | varchar(255)  | NO   |     |
 *  | client     | varchar(255)  | NO   |     |
 *  | ip         | varchar(255)  | NO   |     |
 *  | app        | varchar(255)  | NO   |     |
 *  | cmd        | varchar(255)  | NO   |     |
 *  | args       | text          | YES  |     |
 *  | lapse      | decimal(10,3) | YES  |     |
 *  | uCpu       | int(11)       | YES  |     |
 *  | sCpu       | int(11)       | YES  |     |
 *  | diskIn     | int(11)       | YES  |     |
 *  | diskOut    | int(11)       | YES  |     |
 *  | ipcIn      | int(11)       | YES  |     |
 *  | ipcOut     | int(11)       | YES  |     |
 *  | maxRss     | int(11)       | YES  |     |
 *  | pageFaults | int(11)       | YES  |     |
 *  | rpcMsgsIn  | int(11)       | YES  |     |
 *  | rpcMsgsOut | int(11)       | YES  |     |
 *  | rpcSizeIn  | int(11)       | YES  |     |
 *  | rpcSizeOut | int(11)       | YES  |     |
 *  +------------+---------------+------+-----+
 * 
 *  +-------------+--------------+------+-----+
 *  | T A B L E   U S E                       |
 *  +-------------+--------------+------+-----+
 *  | Field       | Type         | Null | Key |
 *  +-------------+--------------+------+-----+
 *  | processKey  | int(11)      | NO   | PRI |
 *  | tableName   | varchar(255) | NO   | PRI |
 *  | pagesIn     | int(11)      | YES  |     |
 *  | pagesOut    | int(11)      | YES  |     |
 *  | pagesCached | int(11)      | YES  |     |
 *  | readLocks   | int(11)      | YES  |     |
 *  | writeLocks  | int(11)      | YES  |     |
 *  | getRows     | int(11)      | YES  |     |
 *  | posRows     | int(11)      | YES  |     |
 *  | scanRows    | int(11)      | YES  |     |
 *  | putRows     | int(11)      | YES  |     |
 *  | delRows     | int(11)      | YES  |     |
 *  | readWait    | int(11)      | YES  |     |
 *  | readHeld    | int(11)      | YES  |     |
 *  | writeWait   | int(11)      | YES  |     |
 *  | writeHeld   | int(11)      | YES  |     |
 *  +-------------+--------------+------+-----+
 *
 */


// run.
main( $argv, $argc );


/**
 * The main function. 
 * loops over input file - writes to output file.
 *
 * @param   array   argv   the options given to dictate behavior.
 * @param   int     argc   the number of arguments.
 */
function main( $argv, $argc )
{

    // relax error reporting to suppress notices.
    error_reporting( E_ALL & ~E_NOTICE );

    // parse args:
    //  -> if arg looks like '-d', then take next piece for db name.
    //  -> take first (non -d) arg to be inputFile.
    //  -> take second (non -d) arg to be outputFile.
    for ( $i = 1; $i < $argc; $i++ )
    {
        if ( substr( $argv[ $i ], 0, 2 ) == "-d" )
        {
            if ( strlen( $argv[ $i ] ) == 2 )
                $dbName = $argv[ ++$i ];
            else
                $dbName = substr( $argv[ $i ], 2 );
        }
        else if ( !$inputFile )
        {
            $inputFile = $argv[ $i ];
        }
        else if ( !$outputFile )
        {
            $outputFile = $argv[ $i ];
        }
    }

    // input and output file arguments are optional.
    //  -> if no input file given, then use stdin.
    //  -> if no output file given, then use stdin.
    //  -> if input or output file is '-', then use stdin/out.
    if ( !$inputFile || $inputFile == "-" ) $inputFile = "php://stdin";
    if ( !$outputFile || $outputFile == "-" ) $outputFile = "php://stdout";
        
    // open the files and test handles.
    $in  = fopen( $inputFile,  'r' );
    $out = fopen( $outputFile, 'w' );
    if ( !is_resource( $in ) )
        die( "Error: Unable to open input file." );
    if ( !is_resource( $out ) )
        die( "Error: Unable to open output file." );

    // if dbName given, write out the create/use db statements.
    if ( trim( $dbName ) )
    {
        fwrite( $out,
            "CREATE DATABASE IF NOT EXISTS " . $dbName . "; \n" .
            "USE " . $dbName . "; \n" 
        );
    }

    // write out the create table statements.
    fwrite( $out, getCreateStatements() );

    // start reading lines - recognize 'info' blocks, buffer and parse them.
    while ( $line = fgets( $in ) )
    {

        // if not inside info block, look for start of block.
        if ( !$inInfo )
        {
            if ( isInfoStart( $line ) )
                $inInfo = TRUE;
        }
        
        // if inside info block, buffer and look for end of block.
        else if ( $inInfo )
        {
            if ( isInfoEnd( $line ) )
            {
                $sql = parseInfo( $buffer );
                if ( $sql ) fwrite( $out, $sql );
                if ( !isInfoStart( $line ) )
                    $inInfo = FALSE;            
                $buffer = NULL;
            } else {
                $buffer .= $line;
            }    
        }
        
    }
    
    // clean-up.
    fclose( $in );
    fclose( $out );

}


/**
 * Quick test for the beginning of an info block.
 *
 * @param   string  line    the line to test.
 * @return  bool    the result of the test.
 */
function isInfoStart( $line )
{
    if ( ( $line === "Perforce server info:\n" ) or
         ( $line === "Perforce server info:\r\n" ) )
        return TRUE;
}


/**
 * Quick test for the end of an info block.
 *
 * @param   string  line    the line to test.
 * @return  bool    the result of the test.
 */
function isInfoEnd( $line )
{
    if ( !trim( $line ) || isInfoStart( $line ) )
        return TRUE;
}


/**
 * Extract track information from info block.
 *
 * @param   string  info    the info block to parse.
 */
function parseInfo( $info )
{

    // early exit if not track output.
    if ( !isTrackOutput( $info ) ) return;

    // manufacture a key for joining process
    // and table records.
    $key = makeProcessKey();

    // build the process record.
    $processInfo  =  extractProcessInfo( $info );
    $sql          =   
        "INSERT INTO process " .
        "VALUES ( " . $key . "," . implode( ",", $processInfo ) . " );\n";
        
    // build the table records.
    $tablesInfo   =  extractTablesInfo( $info );
    if ( !is_array( $tablesInfo ) ) return $sql;
    foreach ( $tablesInfo as $tableInfo )
    {
        $sql .= 
            "INSERT INTO tableUse " .
            "VALUES ( " . $key . "," . implode( ",", $tableInfo ) . ");\n";
    }
 
    // return the sql.
    return $sql;
    
}


/**
 * Test if the given string looks like vtrack output.
 *
 * @param   string  info    the info block to test.
 * @return  bool    the result of the test.
 */
function isTrackOutput( $info )
{
    return strstr( $info, "\n--- " );
}


/**
 * Make a numeric key for joining processes and table use records.
 *
 * @return  int the key.
 */
$PROCESS_KEY = 0;
function makeProcessKey()
{
    global $PROCESS_KEY;
    return ++$PROCESS_KEY;
}


/**
 * Take a vtrack block, and extract process information.
 * Resulting array contains 21 elements (quoted where appropriate).
 *
 * @param   string  track   the vtrack output to parse.
 * @return  array   the exploded process information.
 */
function extractProcessInfo( $track )
{

    // break into lines.
    $lines = explode( "\n", $track );

    // pull apart the first line.
    $parts = explode( " ", trim( $lines[0] ), 7 );
    list( $user, $client ) = explode( "@", $parts[4], 2 );
    list( $app, $command ) = explode( "] '", $parts[6], 2 );
    list( $cmd, $args )    = explode( " ", $command, 2 ); 
    $process[] = strtotime( $parts[0] . " " . $parts[1] );
    $process[] = intval( $parts[3] );
    $process[] = "'" . addslashes( trim( $user ) ) . "'";
    $process[] = "'" . addslashes( trim( $client ) ) . "'";
    $process[] = "'" . trim( $parts[5] ) . "'";
    $process[] = "'" . addslashes( substr( $app, 1 ) ) . "'";
    if ( $args )
    {
        $process[] = "'" . $cmd . "'";
        $process[] = "'" . addslashes( substr( $args, 0, -1 ) ) . "'";
    } else {
        $process[] = "'" . substr( $cmd, 0, -1 ) . "'";
	$process[] = "NULL";
    }

    // search the next three lines for lapse, usage and rpc.
    //  -> only first could be lapse.
    //  -> first or second could be usage.
    //  -> anyone could be rpc.
    for ( $i = 1; ( $i < 4 ) && ( $i < count( $lines ) ); $i++ )
    {
        if ( $i == 1 && substr( $lines[ $i ], 0, 10 ) == "--- lapse " )
            $lapseTime = trim( $lines[ $i ] );
        else if ( $i < 3 && substr( $lines[ $i ], 0, 10 ) == "--- usage " )
            $usageStats = trim( $lines[ $i ] );
        else if ( substr( $lines[ $i ], 0, 8 ) == "--- rpc " )
            $rpcStats = trim( $lines[ $i ] );
    }
    
    // insert lapse time into process info array - or pad with 'NULL'.
    if ( $lapseTime )
    {
        $process[] = floatval( substr( $lapseTime, 10, -1 ) );
    } else {
        $process[] = "NULL";
    }

    // insert usage stats into process info array - or pad with 'NULL'.
    if ( $usageStats )
    {
        $parts = split( " |\+", $usageStats );
        for ( $i = 2; $i < 10; $i++ )
            $process[] = intval( $parts[ $i ] );
    } else {
        for ( $i = 0; $i < 8; $i++ )
            $process[] = "NULL";
    }    

    // insert rcp stats into process info array - or pad with 'NULL'.
    if ( $rpcStats )
    {
        $parts = split( " |\+|/", $rpcStats );
        for ( $i = 6; $i < 10; $i++ )
            $process[] = intval( $parts[ $i ] );
    } else {
        for ( $i = 0; $i < 4; $i++ )
            $process[] = "NULL";
    }    

    // return process info array.
    return $process;

}


/**
 * Take a vtrack block, and extract tables usage information.
 *
 * @param   string  track   the vtrack output to parse.
 * @return  array   the exploded tables information.
 */
function extractTablesInfo( $track )
{

    // break apart track output on tables.
    $blocks = explode( "--- db.", $track );
    for ( $i = 1; $i < count( $blocks ); $i++ )
    {
        $tables[] = extractTableInfo( $blocks[ $i ] );
    }
    return $tables;

}


/**
 * Take a snippet of vtrack, and extract single table usage information.
 * Returns an array containing 15 elements.
 *
 * @param   string  track   the vtrack output to parse.
 * @return  array   the exploded table information.
 */
function extractTableInfo( $track )
{

    // break into lines.
    $lines = explode( "\n", $track );

    // first line is table name.
    $table[] = "'" . trim( $lines[0] ) . "'";

    // search the next three lines for pages, rows and locks.
    //  -> only first could be pages.
    //  -> first or second could be rows.
    //  -> anyone could be locks.
    for ( $i = 1; ( $i < 4 ) && ( $i < count( $lines ) ); $i++ )
    {
        if ( $i == 1 && substr( $lines[ $i ], 0, 12 ) == "---   pages " )
            $pageStats = trim( $lines[ $i ] );
        else if ( $i < 3 && substr( $lines[ $i ], 0, 12 ) == "---   locks " &&
            substr( $lines[ $i ], 0, 16 ) != "---   locks wait" )
            $rowStats = trim( $lines[ $i ] );
        else if ( substr( $lines[ $i ], 0, 16 ) == "---   locks wait" )
            $lockStats = trim( $lines[ $i ] );
    }

    // insert page stats into table info array - or pad with 'NULL'.
    if ( $pageStats )
    {
        $parts = split( "[ |\+]+", $pageStats );
        for ( $i = 5; $i < 8; $i++ )
            $table[] = intval( $parts[ $i ] );
    } else {
        for ( $i = 0; $i < 3; $i++ )
            $table[] = "NULL";
    }    

    // insert row stats into table info array - or pad with 'NULL'.
    if ( $rowStats )
    {    
        
        // switch for server version.
        //  -> 14 parts in 2005.2 track output.
        //     *read/write locks aren't differentiated.
        //  -> 17 parts in 2006.1+ track output.
        //     *read/write locks are broken out.
        $parts = split( "[ |\/|\+]+", $rowStats );
        if ( count( $parts ) == 14 )
        {
            $table[] = intval( $parts[2] );
            $table[] = intval( $parts[2] );
            for ( $i = 9; $i < 14; $i++ )
                $table[] = intval( $parts[ $i ] );
        } else {
            $table[] = intval( $parts[4] );
            $table[] = intval( $parts[5] );
            for ( $i = 12; $i < 17; $i++ )
                $table[] = intval( $parts[ $i ] );
        }
        
    } else {
        for ( $i = 0; $i < 7; $i++ )
            $table[] = "NULL";
    }    

    // insert lock stats into table info array - or pad with 'NULL'.
    if ( $lockStats )
    {
        $parts = split( "[ |\/|\+]+", $lockStats );
        for ( $i = 6; $i < 10; $i++ )
            $table[] = intval( $parts[ $i ] );
    } else {
        for ( $i = 0; $i < 4; $i++ )
            $table[] = "NULL";
    }    

    // return table info array.
    return $table;

}


/**
 * Create process and tableUse tables.
 *
 * @return  string  the sql to create the tables.
 */
function getCreateStatements()
{

    // create syntax for process table.
    $sql  =
        "DROP TABLE IF EXISTS process; \n" .
        "CREATE TABLE process ( \n" .
        " processKey int NOT NULL, \n" .
        "       time int NOT NULL, \n" .
        "        pid int NOT NULL, \n" . 
        "       user varchar(255) NOT NULL, \n" . 
        "     client varchar(255) NOT NULL, \n" . 
        "         ip varchar(255) NOT NULL, \n" . 
        "        app varchar(255) NOT NULL, \n" . 
        "        cmd varchar(255) NOT NULL, \n" . 
        "       args text NULL, \n" .
        "      lapse decimal(10,3) NULL, \n" .
        "       uCpu int NULL, \n" .
        "       sCpu int NULL, \n" .
        "     diskIn int NULL, \n" .
        "    diskOut int NULL, \n" .
        "      ipcIn int NULL, \n" .
        "     ipcOut int NULL, \n" .
        "     maxRss int NULL, \n" .
        " pageFaults int NULL, \n" .
        "  rpcMsgsIn int NULL, \n" .
        " rpcMsgsOut int NULL, \n" .
        "  rpcSizeIn int NULL, \n" .
        " rpcSizeOut int NULL, \n" .
        " PRIMARY KEY ( processKey ) \n" .
        "); \n";
        
    // create syntax for tableUse table.
    $sql .=
        "DROP TABLE IF EXISTS tableUse; \n" .
        "CREATE TABLE tableUse ( \n" .
        "  processKey int NOT NULL, \n" .
        "   tableName varchar(255) NOT NULL, \n" . 
        "     pagesIn int NULL, \n" .
        "    pagesOut int NULL, \n" .
        " pagesCached int NULL, \n" .
        "   readLocks int NULL, \n" .
        "  writeLocks int NULL, \n" .
        "     getRows int NULL, \n" .
        "     posRows int NULL, \n" .
        "    scanRows int NULL, \n" .
        "     putRows int NULL, \n" .
        "     delRows int NULL, \n" .
        "    readWait int NULL, \n" .
        "    readHeld int NULL, \n" .
        "   writeWait int NULL, \n" .
        "   writeHeld int NULL, \n" .
        " PRIMARY KEY ( processKey, tableName ) \n" .
        "); \n";        
    
    return $sql;
    
}


?>
# Change User Description Committed
#16 7621 Stewart Lord Updated create table statements to use the IF NOT EXISTS
condition. This allows SQL to be fed into an existing database
without errors/warnings.
#15 7620 Stewart Lord Minor update to Track2SQL to avoid timezone unset issue
and split() function deprecated in PHP 5.3.
#14 7338 Stewart Lord Rolled-back change 7209.
This removes the endTime column
from the process table. Track2SQL no longer attempts to analyze
'completed' entries.

This change was influenced by three factors:
 - Analyzing 'completed' entries significantly degrades
   performance (about 2.75x slower in my tests).
 - In some versions of PHP (5.2.8) the strtotime() function
   suffers from a memory leak.
 - The 'completed' entries are not part of Vtrack output.
#13 7209 Stewart Lord Integrating an enhancement from Michael Shield's guest
branch. Track2SQL now records the end time of each process
(when it is reported). This information is reported for every
completed process when -vserver=2|3 logging is enabled.

If verbose server logging is enabled this is more reliable than
start 'time' + 'lapse' because (by default) lapse is only reported
when it exceeds a certain threshold. If, however, vtrack=1 is
set then lapse time will be reported for every command.

Note: this change brings a schema change. It adds a
'endTime' column to the process table.
#12 7198 Stewart Lord It is now possible to accumulate output from multiple
invocations of track2sql in a single database. Previously,
the processKey was a incrementing value that always started
at zero. Therefore, processKeys from separate runs of
track2sql would collide if inserted into the same databse.

Now, track2sql uses a 36 character universally unique
identifier (UUID) for each process. UUIDs give us reasonable
confidence that the process keys will never collide.

One consequence of this change is that the schema is
slightly different. The type of the processKey column is
now a 36 character varchar instead of a integer.

Another side-effect of this change is that the output of
track2sql cannot be predicted and will always be different
even when processing the same log file.

Track2sql performance is largely unaffected, however, insert
and select performance may degrade somewhat due to the
larger key size.
#11 7197 Stewart Lord Fixed a bug where incomplete process lines such as:
'journal rotation', produced invalid SQL output.

Log entries with process lines that fail to pass a basic
sanity check are now ignored. The added check is rather
crude and therefore performance is largely unaffected.
#10 7193 Stewart Lord Fixed a bug where 'pages reordered' data could be
mistaken for 'pages i/o' data, if the 'pages reordered'
line was not preceded by a 'pages i/o' line.
#9 7104 Stewart Lord Follow-on to 7098.
The quote() function now wraps and
escapes the given string.
#8 7098 Stewart Lord Fixed a sqlite incompatibility.
Backslash is not a valid way
to escape quotes in sqlite. Quote-quote ('') is valid in both
mysql and sqlite.

Replaced use of addslashes with a quote() function that
replaces occurrences of single-quotes with two single quotes.
#7 6424 Stewart Lord Updated track2sql disclaimer. Addded a link to the
readme file from the script itself.
#6 6379 Stewart Lord Fixed a bug where track2sql failed to properly extract
some table usage data if 'pages reordered' were reported.
#5 6289 Stewart Lord Minor update to track2sql.
 - Added version and usage information. Can be viewed
   with -v, -V or -h.
 - Added error handling for the case of a non-existent
   input file or a empty input file.
 - Removed 'drop table if exists' statements from the
   table creation SQL.
#4 6010 Stewart Lord Fixed a bug where track2sql failed to properly parse
lock times in 2007.2 log files. This was due to a small
change in the log file format.
#3 5889 Stewart Lord Modified create table statements to use signed columns instead
of unsigned columns. This avoids subtraction problems that can
occur in some versions of MySQL when SQL_MODE is not set to
NO_UNSIGNED_SUBTRACTION.

Main() now sets error_reporting to E_ALL & ~E_NOTICE to
suppress notices.
#2 5858 Stewart Lord Added disclaimer to script.
#1 5857 Stewart Lord Initial add of track2sql to the public depot.