PHP Csv Utilities Documentation (PhpDocumentor)
Since I began doing web development five years ago, I have been exclusively a PHP developer. Recently though, I have taken quite a liking to python. In fact many times while I’m writing PHP I find myself thinking, “It sure would be nice if I could do this the python way”. It’s not that I don’t love PHP, it’s just that python is such an absolute joy to work with. Many features of PHP feel sort of tacked-on as an afterthought. For instance, many standard features available for object-oriented languages are rudimentary or missing completely. Although to PHP’s credit, PHP5 and the new Zend engine 2 have improved the situation considerably.
Several of the recent projects I’ve been working on have required an “import from / export to CSV” feature. PHP comes with a few functions for reading and writing csv files right out of the box: fgetcsv and fputcsv. While these are good functions and they get the job done I find they just aren’t enough in many cases. It would be nice if PHP had an interface like python’s csv module. Enter PHP CSV Utilities. The library is still in its infancy and nothing about the interface is concrete yet, but here are some of the things that are already possible with the library:
Open a csv file and parse it line-by-line:
The simplest possible method of reading a csv file is to create a new Csv_Reader object and loop over it with foreach.
$reader = new Csv_Reader('/path/to/myfile.csv'); foreach ($reader as $row) { // do something.. $row now contains an array of the current row }
What if the file doesn’t exist?
Csv_Reader will throw a Csv_Exception_FileNotFound exception if it can’t open the file you provide
try { $reader = new Csv_Reader('/path/to/file'); } catch (Csv_Exception_FileNotFound $e) { echo $e->getMessage(); }
Read the header line and then loop through the rest of the lines
If you know the file you are reading has a header row, simply grab it and then loop through the reader with a while loop (foreach will not work here because it starts at the beginning and loops through everything in the reader).
$reader = new Csv_Reader('/path/to/myfile.csv'); $header = $reader->getRow(); // grabs first row while ($row = $reader->getRow()) { // starts from second row // do something }
Read a csv file that was created by Excel
Excel has the ability to save as a csv file instead of its proprietary xls format. The library supports what are called Csv dialects (terminology borrowed from python). Csv dialects simply tell reader and writer objects how to do their job. For instance, Excel only quotes columns if there are special characters in them. Using the standard reader would inproperly parse the output of an Excel csv file. To read an Excel csv file properly, simply provide a Csv_Dialect_Excel object in the constructor.
$dialect = new Csv_Dialect_Excel; $reader = new Csv_Reader('/path/to/file', $dialect); // now reader will read excel csv file properly (this does not mean xls) foreach ($reader as $row) { // do something }
But what if I have an excel file with a different delimiter, like a tab character?
If you need to change any of the options in a dialect, you may either extend it via inheritance, or you may simply change said option at run-time.
$dialect = new Csv_Dialect_Excel(); $dialect->delimiter = "\t"; $reader->setDialect($dialect); foreach ($reader as $row) { // do something with $row }
Csv_Dialects represent the format of a csv file - This means you can read or write just about any type of csv file. As of right now, Csv_Dialect has the following properties (most of which were borrowed from python’s csv module):
- delimiter - The delimiter is the character that seperates column values from eachother in a row
- quotechar - The character used to quote columns
- escapechar - The character used to escape special characters such as the delimiter or the quotechar
- lineterminator - The character used to denote the end of a line
- quoting - This tells the reader which rows in the csv file are quoted, and the writer which columns to quote. Can be any of the following Csv_Dialect constants:
- Csv_Dialect::QUOTE_MINIMAL - Quotes only items that contain special characters such as the delimiter or quotechar
- Csv_Dialect::QUOTE_ALL - Quotes all items regardless of their content
- Csv_Dialect::QUOTE_NONNUMERIC - Quotes only items with nonnumeric values
- Csv_Dialect::QUOTE_NONE - Quotes no items regardless of their content
- doublequote - Not yet implemented
- skipinitialspace - Not yet implemented
- skipblanklines - Set to true if you would like for the reader to ignore blank lines
Rewrite a csv file with Excel formatting (or any other format)
The Csv_Writer object accepts a Csv_Reader object in its writeRows() method, so rewriting a csv file with a new dialect (new delimiter, line-ending, etc) is really simple.
$reader = Csv_Reader('/path/to/file.csv'); $writer = new Csv_Writer('/path/to/file.csv', new Csv_Dialect_Excel()); $writer->writeRows($reader); $writer->close(); // writes csv file but now it's in Excel csv format
I’d really like to get an idea of features that would be useful to people other than myself. I’m also interested in feedback on the interface, and ways I could improve the library. If you’re interested, download the library and play around with it a bit. When you find something you like or dislike or if you have any kind of input at all, leave a comment below or drop me an email. Enjoy!
PHP Csv Utilities Documentation (PhpDocumentor)
Plans for version 0.2
- Csv_Sniffer - a class that accepts a sample of csv and attempts to deduce its format. It will then return a dialect that fits the csv file as well as attempt to detect whether or not the file has a row of headers.
- Csv_Dialect classes for any and all formats I can dig up (Open Office, Miva Merchant, Google Docs and Spreadsheets, standard csv?, etc.)
- An option parameter for Csv_Dialect - I was considering allowing an array of parameters in Csv_Dialect’s constructor. This way you could easily put together a dialect like $dialect = new Csv_Dialect(array(’quotechar’ => “‘”, ‘escapechar’ => ‘”‘));
- More comprehensive unit tests - Although I made my best effort to stick to test-driven development practices (since learning them properly has been on my priority list for a while), there are still some areas where I could use some more testing.
- More documentation - I’m going to strive to document every feature of this library. Poor documentation is a pet peeve of everybody.
Possible / Eventual Features
- Csv_Reader_Zip - A csv reader that can read zipped files
- Csv_Reader_String - A csv reader that accepts a string instead of a file
- Character encoding - This will be the first time I have really had to deal with multiple character encodings, so this may take me a while. I will need to do some research on the subject.
Matthijs Says:
February 25th, 2008 at 12:16 am
Looks interesting. Great work. Will keep an eye on your blog for further developments.
Luke Visinoni Says:
February 26th, 2008 at 1:22 pm
Thanks Matthijs. I expect to release the next version within a week or so… stay tuned!
PHP CSV Utilities v0.2 has been released - The Q Says:
March 15th, 2008 at 11:30 pm
[...] have just wrapped up version 0.2 of our csv library. It includes several new features. The most exciting of which is the new Csv_Sniffer [...]