I recently deployed a job with a timeline so tight that my typing speed made the difference between delivering on time and not.

Everything was rushed and the budget was tight. It was one of those real seat-of-the-pants deals, and far too little testing was done.

Just before I cut the site live (i.e. minutes before I had to put this into production and have hundreds of people using it) I thought “You know, I bet something will go wrong somewhere here and I’ll need complete logs of all the $_SERVER, $_POST and $_GET data to piece it back together”.

Boy, am I glad I had that thought, because something did go wrong. Nothing catastrophic, mind you, but there was a tricky integration with an external system that, as it turned out, periodically died without any errors or exceptions and just started returning bogus values.

The only problem was that, in the 5 minutes before the site was supposed to go live, I didn’t have time to thoughtfully prepare a logging system to record all this stuff and, in my haste, I settled for:

    ob_start();
    print_r($_GET);
    print_r($_POST);
    file_put_contents($log_dir.'catchall.txt',
                       $_SERVER['REMOTE_ADDR'].' '.date('Y-m-d H:i:s').PHP_EOL.PHP_EOL.ob_get_clean().PHP_EOL.PHP_EOL,
                       FILE_APPEND);

When I came to actually process this information into something usable I realised what a dumb format for a log file this was and began kicking myself.
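
For comparison, a format that is much easier to read back is one JSON object per line. The following is only a sketch of that idea, not what was deployed: `log_request_line`, `read_request_log` and the record field names are hypothetical, introduced purely for illustration.

```php
<?php
// Hypothetical helper (not from the original post): append one JSON object per line.
function log_request_line(string $file, string $ip, array $get, array $post): void
{
    $record = array(
        'ip'   => $ip,
        'time' => date('Y-m-d H:i:s'),
        'get'  => $get,
        'post' => $post,
    );
    // json_encode escapes newlines inside values, so a record never spans lines
    file_put_contents($file, json_encode($record).PHP_EOL, FILE_APPEND | LOCK_EX);
}

// Hypothetical helper: read the log back as an array of records.
function read_request_log(string $file): array
{
    $records = array();
    foreach (file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        $decoded = json_decode($line, true);
        // skip anything that is not a valid JSON object/array
        if (is_array($decoded)) {
            $records[] = $decoded;
        }
    }
    return $records;
}
```

A call such as `log_request_line($log_dir.'catchall.jsonl', $_SERVER['REMOTE_ADDR'], $_GET, $_POST);` would then take the place of the output-buffering version, and nested arrays or values containing newlines would survive the round trip.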

I had a look around for a snippet to parse the output of PHP’s print_r but couldn’t find anything, so I thought I’d post my solution here just in case anyone else ever makes the same mistake – it might save them a couple of hours (I’ve added comments in the code below):

    <?php
    //grab all the file contents
    $contents = file_get_contents($log_dir.'catchall.txt');
    //split it up by three newlines - this is particular to the 
    //way I actually wrote the log in the first place
    $split = explode("\n\n\n",$contents);
    //setup a few variables we'll need below
    $all_packets = array();
    $cur_packet  = NULL;
    $cur_data    = NULL;
    //foreach chunk of data
    foreach($split as $packet)
    {   
        //split into individual lines
        $lines = explode("\n",$packet);
        //for each line
        foreach($lines as $line)
        {   
            //if this is the line with the timestamp and ip address
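            //(anchored at the start so a value that merely contains an ip and date is not mistaken for a header)
            if(preg_match('/^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+ [0-9]+-[0-9]+-[0-9]+ [0-9]+:[0-9]+:[0-9]+/',$line))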
            {   
                //if we have already started
                if($cur_packet !== NULL)
                    //store the information for the previous packet
                    $all_packets[] = $cur_packet;
                //initialise a new packet
                $cur_packet = array();
                $cur_packet['data'] = array();
                $cur_packet['origin'] = $line;
            }
            //if this line contains key => value data
            //(the key match is non-greedy so a value containing "] => " stays intact)
            else if(preg_match('/^\s*\[(.*?)\] => (.*)$/',$line,$matches))
                //store it in the current data array
                $cur_data[$matches[1]] = $matches[2];
            //if this is the start of a print_r
            else if(trim($line) === '(')
                //create a new current data array
                $cur_data = array();
            //if this is the end of a print_r
            else if(trim($line) === ')')
            {   
                //store the current data array of key/value pairs
                $cur_packet['data'][] = $cur_data;
            }
        }
    }
    //store the last packet (skipped if the log was empty)
    if($cur_packet !== NULL)
        $all_packets[] = $cur_packet;

    //now you should be able to run something like:
    //
    //php parse.php > todiff && diff todiff catchall.txt
    //
    //from the log directory; an empty diff verifies that the parser
    //reproduced the log exactly
    foreach($all_packets as $packet)
    {   
        echo $packet['origin'].PHP_EOL.PHP_EOL;

        foreach($packet['data'] as $data)
            print_r($data);

        echo PHP_EOL.PHP_EOL;
    }
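
To see the key => value matching in isolation, here is a minimal sketch that round-trips a single print_r dump held in memory. `parse_flat_print_r` is a hypothetical helper name, and it assumes the same limits as the parser above: flat arrays only, no nested arrays and no newlines inside values. Every value comes back as a string.

```php
<?php
// Hypothetical helper: parse ONE flat print_r() dump back into an array.
// Assumes no nested arrays and no newlines inside values.
function parse_flat_print_r(string $text): array
{
    $data = array();
    foreach (explode("\n", $text) as $line) {
        // the key is matched non-greedily, so only the first "] => " splits the line
        if (preg_match('/^\s*\[(.*?)\] => (.*)$/', $line, $m)) {
            $data[$m[1]] = $m[2];
        }
    }
    return $data;
}

// a value that itself looks like a key => value pair is the awkward case
$sample = print_r(array('id' => '42', 'q' => 'a [b] => c'), true);
$parsed = parse_flat_print_r($sample);
var_dump($parsed === array('id' => '42', 'q' => 'a [b] => c')); // bool(true)
```

With a greedy key pattern such as `\[(.*)\]`, the second line of that sample would be read as key `b` with value `c`, which is why the non-greedy form is used here.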