PHP Csv Utilities Documentation (PhpDocumentor)
Since I began doing web development five years ago, I have been exclusively a PHP developer. Recently though, I have taken quite a liking to python. In fact many times while I’m writing PHP I find myself thinking, “It sure would be nice if I could do this the python way”. It’s not that I don’t love PHP, it’s just that python is such an absolute joy to work with. Many features of PHP feel sort of tacked-on as an afterthought. For instance, many standard features available for object-oriented languages are rudimentary or missing completely. Although to PHP’s credit, PHP5 and the new Zend engine 2 have improved the situation considerably.
Several of the recent projects I’ve been working on have required an “import from / export to CSV” feature. PHP comes with a few functions for reading and writing csv files right out of the box: fgetcsv and fputcsv. While these are good functions and they get the job done I find they just aren’t enough in many cases. It would be nice if PHP had an interface like python’s csv module. Enter PHP CSV Utilities. The library is still in its infancy and nothing about the interface is concrete yet, but here are some of the things that are already possible with the library:
Open a csv file and parse it line-by-line:
The simplest possible method of reading a csv file is to create a new Csv_Reader object and loop over it with foreach.
$reader = new Csv_Reader('/path/to/myfile.csv'); foreach ($reader as $row) { // do something.. $row now contains an array of the current row }
What if the file doesn’t exist?
Csv_Reader will throw a Csv_Exception_FileNotFound exception if it can’t open the file you provide
try { $reader = new Csv_Reader('/path/to/file'); } catch (Csv_Exception_FileNotFound $e) { echo $e->getMessage(); }
Read the header line and then loop through the rest of the lines
If you know the file you are reading has a header row, simply grab it and then loop through the reader with a while loop (foreach will not work here because it starts at the beginning and loops through everything in the reader).
$reader = new Csv_Reader('/path/to/myfile.csv'); $header = $reader->getRow(); // grabs first row while ($row = $reader->getRow()) { // starts from second row // do something }
Read a csv file that was created by Excel
Excel has the ability to save as a csv file instead of its proprietary xls format. The library supports what are called Csv dialects (terminology borrowed from python). Csv dialects simply tell reader and writer objects how to do their job. For instance, Excel only quotes columns if there are special characters in them. Using the standard reader would inproperly parse the output of an Excel csv file. To read an Excel csv file properly, simply provide a Csv_Dialect_Excel object in the constructor.
$dialect = new Csv_Dialect_Excel; $reader = new Csv_Reader('/path/to/file', $dialect); // now reader will read excel csv file properly (this does not mean xls) foreach ($reader as $row) { // do something }
But what if I have an excel file with a different delimiter, like a tab character?
If you need to change any of the options in a dialect, you may either extend it via inheritance, or you may simply change said option at run-time.
$dialect = new Csv_Dialect_Excel(); $dialect->delimiter = "\t"; $reader->setDialect($dialect); foreach ($reader as $row) { // do something with $row }
Csv_Dialects represent the format of a csv file – This means you can read or write just about any type of csv file. As of right now, Csv_Dialect has the following properties (most of which were borrowed from python’s csv module):
- delimiter – The delimiter is the character that seperates column values from eachother in a row
- quotechar – The character used to quote columns
- escapechar – The character used to escape special characters such as the delimiter or the quotechar
- lineterminator – The character used to denote the end of a line
- quoting – This tells the reader which rows in the csv file are quoted, and the writer which columns to quote. Can be any of the following Csv_Dialect constants:
- Csv_Dialect::QUOTE_MINIMAL – Quotes only items that contain special characters such as the delimiter or quotechar
- Csv_Dialect::QUOTE_ALL – Quotes all items regardless of their content
- Csv_Dialect::QUOTE_NONNUMERIC – Quotes only items with nonnumeric values
- Csv_Dialect::QUOTE_NONE – Quotes no items regardless of their content
- doublequote – Not yet implemented
- skipinitialspace – Not yet implemented
- skipblanklines – Set to true if you would like for the reader to ignore blank lines
Rewrite a csv file with Excel formatting (or any other format)
The Csv_Writer object accepts a Csv_Reader object in its writeRows() method, so rewriting a csv file with a new dialect (new delimiter, line-ending, etc) is really simple.
$reader = Csv_Reader('/path/to/file.csv'); $writer = new Csv_Writer('/path/to/file.csv', new Csv_Dialect_Excel()); $writer->writeRows($reader); $writer->close(); // writes csv file but now it's in Excel csv format
I’d really like to get an idea of features that would be useful to people other than myself. I’m also interested in feedback on the interface, and ways I could improve the library. If you’re interested, download the library and play around with it a bit. When you find something you like or dislike or if you have any kind of input at all, leave a comment below or drop me an email. Enjoy!
PHP Csv Utilities Documentation (PhpDocumentor)
Plans for version 0.2
- Csv_Sniffer – a class that accepts a sample of csv and attempts to deduce its format. It will then return a dialect that fits the csv file as well as attempt to detect whether or not the file has a row of headers.
- Csv_Dialect classes for any and all formats I can dig up (Open Office, Miva Merchant, Google Docs and Spreadsheets, standard csv?, etc.)
- An option parameter for Csv_Dialect – I was considering allowing an array of parameters in Csv_Dialect’s constructor. This way you could easily put together a dialect like $dialect = new Csv_Dialect(array(‘quotechar’ => “‘”, ‘escapechar’ => ‘”‘));
- More comprehensive unit tests – Although I made my best effort to stick to test-driven development practices (since learning them properly has been on my priority list for a while), there are still some areas where I could use some more testing.
- More documentation – I’m going to strive to document every feature of this library. Poor documentation is a pet peeve of everybody.
Possible / Eventual Features
- Csv_Reader_Zip – A csv reader that can read zipped files
- Csv_Reader_String – A csv reader that accepts a string instead of a file
- Character encoding – This will be the first time I have really had to deal with multiple character encodings, so this may take me a while. I will need to do some research on the subject.
Matthijs Says:
February 25th, 2008 at 12:16 am
Looks interesting. Great work. Will keep an eye on your blog for further developments.
Luke Visinoni Says:
February 26th, 2008 at 1:22 pm
Thanks Matthijs. I expect to release the next version within a week or so… stay tuned!
PHP CSV Utilities v0.2 has been released - The Q Says:
March 15th, 2008 at 11:30 pm
[...] have just wrapped up version 0.2 of our csv library. It includes several new features. The most exciting of which is the new Csv_Sniffer [...]
Cameron Says:
July 28th, 2008 at 4:02 am
I found two bugs when using CSV_Writer
Cvs/Writer.php
I needed to print an extra newline in writeData
Add this to the end :
fwrite($this->handle, $this->dialect->lineterminator);
Cvs/Dialect/Excel.php
A literal string "\r\n" is being printed instead of return + newline character.
Change
public $lineterminator = ‘\r\n’;
to
public $lineterminator = "\r\n";
arena Says:
October 15th, 2008 at 8:28 am
Hello,
I tried to use your php code with an csv file coming from the export of my gmail contact. Apparently, the Csv_Sniffer is not abe to handle the data properly.
Please help.
Vikas Mohan Says:
November 5th, 2008 at 8:41 pm
Hello,
I am doing a social network site and i have to let the user sign into their account from my page.Once they enter the login details i have to collect all their address book and further use the same for sending invites.Can you guide me how to do this
Thanks in advance
vikas
Luke Visinoni Says:
November 6th, 2008 at 10:07 am
Arena,
I’ll look into it. The next release will be a lot more stable.
Vikas,
What are you asking me? How to import a csv file full of contacts into your system? That’s a very general question. Be specific.
Everybody else,
My apologies, but I have been unable to find the time to work on this project as of late. Other, more lucrative projects have been keeping me busy unfortunately. Hopefully I’ll get a chance to finally wrap up version 0.3 some time before the end of the year.
Florian Says:
November 25th, 2008 at 11:56 pm
Just a short question to the directory structure and include paths:
The directory structure is
+ Csv
+ Dialect
+ Exception
…
+ Reader.php
…
But in Reader.php is
require_once ‘Csv/Dialect.php’;
which php won’t find because Reader is IN Csv. Should it not be
require_once ‘Dialect.php’;
?
Trying to use the lib in a symfony app.
thank you for your work.
Florian
Florian Says:
November 25th, 2008 at 11:58 pm
and now my comment about the directory structure is messed up:
Csv is the top level.
all other files are below.
Csv > Dialect
Csv > Exception
Csv > Reader.php
….
Kip Says:
November 27th, 2008 at 1:48 am
I’ve just been sitting around waiting for something to happen.,
Luke Visinoni Says:
November 28th, 2008 at 11:01 am
Yes, I am aware of these issues, thank you!
I’m sorry guys, I’ve unexpectedly become very very busy. I’m working on this when I can, but it still may be a bit.
Josh Ribakoff Says:
February 12th, 2009 at 8:27 am
Nice work! I may end up extending this so that you can access fields like a hash map, using numerical keys goes against being intuitive. If you shoot me SVN info I’ll send in my patch. Regarding the filenames guys just add the folder to your include path, that’s what you normally would do with a library like this.
Josh Ribakoff Says:
February 12th, 2009 at 8:46 am
Looks like there needs to be a trim() filter on fields as well.
Ryan Says:
September 25th, 2009 at 8:36 am
Hey everyone,
I have the class writing form data to a csv and it works perfectly. What I am wondering is am I able to compare the form data first with the csv rows to make sure its not writing a duplicate entry? Maybe a loop to go through all the rows and compare it to the new data first? If anyone has a duggestion it is much appreciated. I could post my code if it helps.
Ryan