PHP CSV Utilities v0.2 released - now able to detect the format of a csv file

Posted on March 15th, 2008 by Luke Visinoni

Download PHP CSV Utilities v0.2
Read Documentation for PHP CSV Utilities

I have just wrapped up version 0.2 of our csv library. It includes several new features, the most exciting of which is the new Csv_Sniffer class.

Csv_Sniffer

sniff(string $sample)
Csv_Sniffer’s sniff method accepts a sample of csv data and attempts to deduce its format. In this library, a class called Csv_Dialect tells Csv_Reader and Csv_Writer which format to read and write. If Csv_Sniffer::sniff succeeds, it returns a Csv_Dialect object representing the format of the csv file (or at least its best guess); if it cannot work out the format, it throws a Csv_Exception_CannotDetermineDialect. You can then pass the dialect to Csv_Reader so that it knows how to read the file. You can also pass it to Csv_Writer if you need to append to the file or write a new one in the same format.

try {
    $sample = implode("", array_slice(file('./data/products.csv'), 0, 20)); // grab 20 lines
    $sniffer = new Csv_Sniffer();
    $dialect = $sniffer->sniff($sample);
    $reader = new Csv_Reader('./data/products.csv', $dialect);
} catch (Csv_Exception_CannotDetermineDialect $e) {
    printf("<p>%s</p>", $e->getMessage());
}
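To give a rough idea of what delimiter sniffing involves, here is a minimal standalone sketch in plain PHP (7.1 or later). It is not Csv_Sniffer’s actual code: the function name guess_delimiter, the candidate list and the "same nonzero count on every line" rule are assumptions for illustration, and the sketch ignores quoting entirely. Ties are broken by the order of the candidate list, so the caller controls the priority.

```php
<?php
// Hypothetical helper, NOT Csv_Sniffer's actual algorithm: pick the candidate
// delimiter that appears the same nonzero number of times on every sample line.
// Ties are broken by the order of $candidates (earlier = higher priority).
function guess_delimiter(string $sample, array $candidates = [',', "\t", ';', '|']): ?string
{
    // Split into non-empty lines. Quoting is ignored, so delimiters inside
    // quoted fields will skew the counts.
    $lines = array_values(array_filter(preg_split('/\r\n|\r|\n/', $sample), 'strlen'));
    if (count($lines) === 0) {
        return null;
    }

    $best = null;
    $bestCount = 0;
    foreach ($candidates as $char) {
        $counts = array_map(function ($line) use ($char) {
            return substr_count($line, $char);
        }, $lines);
        $first = $counts[0];
        // "Consistent" means every line has the same, nonzero count.
        $consistent = $first > 0 && count(array_unique($counts)) === 1;
        // Strictly greater, so an earlier candidate keeps a tie.
        if ($consistent && $first > $bestCount) {
            $best = $char;
            $bestCount = $first;
        }
    }
    return $best;
}
```

For example, guess_delimiter("a;b;c\n1;2;3") returns ";". A real sniffer also has to cope with quoted fields and ragged rows, which this sketch does not attempt.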

hasHeader(string $sample)
Csv_Sniffer’s hasHeader method accepts a sample of csv data and attempts to detect whether the file has a header row. If it does, hasHeader returns true.

$sample = implode("", array_slice(file('./data/products.csv'), 0, 20)); // grab 20 lines
$sniffer = new Csv_Sniffer();
if ($sniffer->hasHeader($sample)) {
    print "The file probably has a header";
} else {
    print "The file probably doesn't have a header";
}
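How might a header be detected at all? The following is a minimal sketch loosely modeled on the column-voting idea behind Python’s csv.Sniffer.has_header; it is not the code inside Csv_Sniffer::hasHeader. The function name looks_like_header is hypothetical, and it assumes the sample has already been split into rows of fields (for instance with str_getcsv on each line).

```php
<?php
// Hypothetical helper, NOT Csv_Sniffer::hasHeader's actual code. A column
// votes for a header when every data cell below row 1 is numeric but the
// first-row cell is not; a numeric first-row cell votes against.
function looks_like_header(array $rows): bool
{
    if (count($rows) < 2) {
        return false; // not enough data to compare the first row against
    }
    $header = $rows[0];
    $data = array_slice($rows, 1);
    $votes = 0;
    foreach ($header as $col => $cell) {
        $column = array_column($data, $col); // this column's values in the data rows
        $allNumeric = count($column) > 0
            && count(array_filter($column, 'is_numeric')) === count($column);
        if (is_numeric($cell)) {
            $votes--;   // a numeric first-row cell looks like ordinary data
        } elseif ($allNumeric) {
            $votes++;   // text above an all-numeric column looks like a label
        }
    }
    return $votes > 0;
}
```

Files made up entirely of text columns produce no votes either way, so this sketch answers false for them; that is one reason a sniffer can only ever say a file "probably" has a header.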

Csv_Reader_String

This new class works exactly like Csv_Reader, except that instead of accepting a filename and reading from a file, it reads directly from a string. This is useful if, for example, you have stored csv data in a database and are retrieving it from there, or if you need to collect csv data submitted directly through a web form.

if (isset($_POST['csv_data'])) {
    $data = $_POST['csv_data'];
    $reader = new Csv_Reader_String($data);
    foreach ($reader as $row) {
        // now you could insert it into a database or whatever else you need to do with it
    }
}
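Reading csv from a string does not have to differ much from reading a file. One way to do it in plain PHP, which is an assumption and not necessarily how Csv_Reader_String works internally, is to write the string into a php://temp stream and read it back with the built-in fgetcsv(), which already handles quoted fields containing delimiters or newlines. The helper name csv_rows_from_string is hypothetical.

```php
<?php
// Hypothetical helper: parse every row of a csv string by routing it through
// an in-memory stream, so fgetcsv() can handle quoting and embedded newlines.
function csv_rows_from_string(string $data, string $delimiter = ','): array
{
    $fh = fopen('php://temp', 'r+');
    fwrite($fh, $data);
    rewind($fh); // start reading from the beginning of the buffer

    $rows = [];
    // Length 0 = no line-length limit; enclosure and escape passed explicitly.
    while (($row = fgetcsv($fh, 0, $delimiter, '"', '\\')) !== false) {
        if ($row === [null]) {
            continue; // fgetcsv() returns [null] for a blank line
        }
        $rows[] = $row;
    }
    fclose($fh);
    return $rows;
}
```

Routing the string through a stream, rather than splitting it on "\n" first, keeps a quoted field that spans several lines in one piece.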

Csv_Dialect::__construct([array $options])

You may now pass an associative array to Csv_Dialect’s constructor to override any of its properties. This doesn’t add any new functionality, but it is definitely a convenience.

$dialect = new Csv_Dialect(array('quotechar' => "'", 'escapechar' => "'", 'quoting' => Csv_Dialect::QUOTE_NONNUMERIC));
$reader = new Csv_Reader('./data/orders.csv', $dialect);
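The option-array pattern itself is simple. Below is a hypothetical sketch, not the library’s Csv_Dialect source: DialectSketch is an illustrative class name, its property list only mirrors the options mentioned in this post plus an assumed delimiter, and the constant values are assumptions (borrowed from Python’s csv module numbering). Rejecting unknown keys is a design choice that catches misspelled option names early.

```php
<?php
// Hypothetical sketch, NOT the library's Csv_Dialect: defaults live in public
// properties and are overridden by an associative array of options.
class DialectSketch
{
    const QUOTE_MINIMAL = 0;    // assumed constant values
    const QUOTE_NONNUMERIC = 2;

    public $delimiter = ',';    // assumed property
    public $quotechar = '"';
    public $escapechar = '\\';
    public $quoting = self::QUOTE_MINIMAL;

    public function __construct(array $options = [])
    {
        foreach ($options as $name => $value) {
            // Refuse keys that are not declared properties (e.g. typos).
            if (!property_exists($this, $name)) {
                throw new InvalidArgumentException("Unknown dialect option: $name");
            }
            $this->$name = $value;
        }
    }
}
```

For example, new DialectSketch(array('quotechar' => "'")) keeps every default except the quote character.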

Plans for version 0.3

  • Csv_Writer will write immediately, rather than when you call close() - This won’t change the interface at all, but in the next version, instead of writing to disk when the user calls close(), it will write immediately when writeRow() or writeRows() is called.
  • Interface changes for Csv_Sniffer - I don’t like how you have to pass the same sample data to both sniff() and hasHeader(). I will probably change it to accept $sample in its constructor instead. Another issue is that when the sniff method finds a tie between delimiter characters, it simply picks one by ASCII order. I would like to let the user specify an array of characters in order of priority in case of a tie.
  • A more advanced unit testing interface - The unit tests I have written all run at once, and since they read and write actual csv files, running them all is starting to take a while. I’m putting together an interface that will let me run tests separately or all together, as well as a way to time some operations so that I can speed them up as much as possible.
  • Csv_Dialect classes for any and all formats I can dig up (Open Office, Miva Merchant, Google Docs and Spreadsheets, standard csv?, etc.)
  • Csv_Mapper - A class that maps keys to columns so that you can access them like $row['first_name'].
  • Even more documentation - I have written some documentation on the Google Code wiki, but I plan to write more consistent docs. The ones I have now are all sort of willy-nilly.
  • Csv_Reader_Zip - A csv reader that can read zipped files
  • Character encoding - This will be the first time I have really had to deal with multiple character encodings, so this may take me a while. I will need to do some research on the subject.
  • More to come - I will finish writing about the new features and complete the docs within the next week or so. Right now I am tired, and I’m going to bed.
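The planned Csv_Mapper does not exist yet, but the core idea of reaching columns by name can be sketched with PHP’s built-in array_combine(). The function map_rows below is hypothetical, not the future class’s interface; it assumes the first row holds the column names and that those names are unique (duplicate names would overwrite each other).

```php
<?php
// Hypothetical sketch of the Csv_Mapper idea (not part of v0.2): pair each
// data row with the header row so values can be read as $row['first_name'].
function map_rows(array $rows): array
{
    $header = array_shift($rows); // first row = column names
    if ($header === null) {
        return []; // no rows at all
    }
    $width = count($header);
    $mapped = [];
    foreach ($rows as $row) {
        // Pad short rows with null and trim long ones so both arrays match.
        $row = array_slice(array_pad($row, $width, null), 0, $width);
        $mapped[] = array_combine($header, $row);
    }
    return $mapped;
}
```

For example, map_rows(array(array('first_name', 'last_name'), array('Luke', 'Visinoni'))) gives a first row whose 'first_name' entry is 'Luke'.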


One Response to “PHP CSV Utilities v0.2 released - now able to detect the format of a csv file”

  1. Great work, Luke. I’ll take a look and try the code.
