PHP CSV Utilities – a PHP library similar to python’s standard CSV module

Posted on February 19th, 2008 by Luke Visinoni

Download PHP Csv Utilities

PHP Csv Utilities Documentation (PhpDocumentor)

Since I began doing web development five years ago, I have been exclusively a PHP developer. Recently though, I have taken quite a liking to python. In fact many times while I’m writing PHP I find myself thinking, “It sure would be nice if I could do this the python way”. It’s not that I don’t love PHP, it’s just that python is such an absolute joy to work with. Many features of PHP feel sort of tacked-on as an afterthought. For instance, many standard features available for object-oriented languages are rudimentary or missing completely. Although to PHP’s credit, PHP5 and the new Zend engine 2 have improved the situation considerably.

Several of the recent projects I’ve been working on have required an “import from / export to CSV” feature. PHP comes with a few functions for reading and writing csv files right out of the box: fgetcsv and fputcsv. While these are good functions and they get the job done I find they just aren’t enough in many cases. It would be nice if PHP had an interface like python’s csv module. Enter PHP CSV Utilities. The library is still in its infancy and nothing about the interface is concrete yet, but here are some of the things that are already possible with the library:

Open a csv file and parse it line-by-line:

The simplest possible method of reading a csv file is to create a new Csv_Reader object and loop over it with foreach.

$reader = new Csv_Reader('/path/to/myfile.csv');
foreach ($reader as $row) {
    // do something.. $row now contains an array of the current row
}

What if the file doesn’t exist?

Csv_Reader will throw a Csv_Exception_FileNotFound exception if it can’t open the file you provide

try {
    $reader = new Csv_Reader('/path/to/file');
} catch (Csv_Exception_FileNotFound $e) {
    echo $e->getMessage();
}

Read the header line and then loop through the rest of the lines

If you know the file you are reading has a header row, simply grab it and then loop through the reader with a while loop (foreach will not work here because it starts at the beginning and loops through everything in the reader).

$reader = new Csv_Reader('/path/to/myfile.csv');
$header = $reader->getRow(); // grabs first row
while ($row = $reader->getRow()) { // starts from second row
    // do something
}

Read a csv file that was created by Excel

Excel has the ability to save as a csv file instead of its proprietary xls format. The library supports what are called Csv dialects (terminology borrowed from python). Csv dialects simply tell reader and writer objects how to do their job. For instance, Excel only quotes columns if there are special characters in them. Using the standard reader would inproperly parse the output of an Excel csv file. To read an Excel csv file properly, simply provide a Csv_Dialect_Excel object in the constructor.

$dialect = new Csv_Dialect_Excel;
$reader = new Csv_Reader('/path/to/file', $dialect); // now reader will read excel csv file properly (this does not mean xls)
foreach ($reader as $row) {
    // do something
}

But what if I have an excel file with a different delimiter, like a tab character?

If you need to change any of the options in a dialect, you may either extend it via inheritance, or you may simply change said option at run-time.

$dialect = new Csv_Dialect_Excel();
$dialect->delimiter = "\t";
$reader->setDialect($dialect);
foreach ($reader as $row) {
    // do something with $row
}

Csv_Dialects represent the format of a csv file – This means you can read or write just about any type of csv file. As of right now, Csv_Dialect has the following properties (most of which were borrowed from python’s csv module):

  • delimiter – The delimiter is the character that seperates column values from eachother in a row
  • quotechar – The character used to quote columns
  • escapechar – The character used to escape special characters such as the delimiter or the quotechar
  • lineterminator – The character used to denote the end of a line
  • quoting – This tells the reader which rows in the csv file are quoted, and the writer which columns to quote. Can be any of the following Csv_Dialect constants:
    • Csv_Dialect::QUOTE_MINIMAL – Quotes only items that contain special characters such as the delimiter or quotechar
    • Csv_Dialect::QUOTE_ALL – Quotes all items regardless of their content
    • Csv_Dialect::QUOTE_NONNUMERIC – Quotes only items with nonnumeric values
    • Csv_Dialect::QUOTE_NONE – Quotes no items regardless of their content
  • doublequote – Not yet implemented
  • skipinitialspace – Not yet implemented
  • skipblanklines – Set to true if you would like for the reader to ignore blank lines

Rewrite a csv file with Excel formatting (or any other format)

The Csv_Writer object accepts a Csv_Reader object in its writeRows() method, so rewriting a csv file with a new dialect (new delimiter, line-ending, etc) is really simple.

$reader = Csv_Reader('/path/to/file.csv');
$writer = new Csv_Writer('/path/to/file.csv', new Csv_Dialect_Excel());
$writer->writeRows($reader);
$writer->close(); // writes csv file but now it's in Excel csv format

I’d really like to get an idea of features that would be useful to people other than myself. I’m also interested in feedback on the interface, and ways I could improve the library. If you’re interested, download the library and play around with it a bit. When you find something you like or dislike or if you have any kind of input at all, leave a comment below or drop me an email. Enjoy!

PHP Csv Utilities Documentation (PhpDocumentor)

Download PHP Csv Utilities

Plans for version 0.2

  • Csv_Sniffer – a class that accepts a sample of csv and attempts to deduce its format. It will then return a dialect that fits the csv file as well as attempt to detect whether or not the file has a row of headers.
  • Csv_Dialect classes for any and all formats I can dig up (Open Office, Miva Merchant, Google Docs and Spreadsheets, standard csv?, etc.)
  • An option parameter for Csv_Dialect – I was considering allowing an array of parameters in Csv_Dialect’s constructor. This way you could easily put together a dialect like $dialect = new Csv_Dialect(array(‘quotechar’ => “‘”, ‘escapechar’ => ‘”‘));
  • More comprehensive unit tests – Although I made my best effort to stick to test-driven development practices (since learning them properly has been on my priority list for a while), there are still some areas where I could use some more testing.
  • More documentation – I’m going to strive to document every feature of this library. Poor documentation is a pet peeve of everybody.

Possible / Eventual Features

  • Csv_Reader_Zip – A csv reader that can read zipped files
  • Csv_Reader_String – A csv reader that accepts a string instead of a file
  • Character encoding – This will be the first time I have really had to deal with multiple character encodings, so this may take me a while. I will need to do some research on the subject.

14 Responses to “PHP CSV Utilities – a PHP library similar to python’s standard CSV module”

  1. Looks interesting. Great work. Will keep an eye on your blog for further developments.

  2. Thanks Matthijs. I expect to release the next version within a week or so… stay tuned!

  3. [...] have just wrapped up version 0.2 of our csv library. It includes several new features. The most exciting of which is the new Csv_Sniffer [...]

  4. I found two bugs when using CSV_Writer

    Cvs/Writer.php

    I needed to print an extra newline in writeData

    Add this to the end :

    fwrite($this->handle, $this->dialect->lineterminator);

    Cvs/Dialect/Excel.php

    A literal string "\r\n" is being printed instead of return + newline character.

    Change

    public $lineterminator = ‘\r\n’;

    to

    public $lineterminator = "\r\n";

  5. Hello,

    I tried to use your php code with an csv file coming from the export of my gmail contact. Apparently, the Csv_Sniffer is not abe to handle the data properly.

    Please help.

  6. Vikas Mohan Says:

    Hello,

    I am doing a social network site and i have to let the user sign into their account from my page.Once they enter the login details i have to collect all their address book and further use the same for sending invites.Can you guide me how to do this

    Thanks in advance
    vikas

  7. Arena,
    I’ll look into it. The next release will be a lot more stable.

    Vikas,
    What are you asking me? How to import a csv file full of contacts into your system? That’s a very general question. Be specific.

    Everybody else,
    My apologies, but I have been unable to find the time to work on this project as of late. Other, more lucrative projects have been keeping me busy unfortunately. Hopefully I’ll get a chance to finally wrap up version 0.3 some time before the end of the year.

  8. Just a short question to the directory structure and include paths:
    The directory structure is
    + Csv
    + Dialect
    + Exception

    + Reader.php

    But in Reader.php is

    require_once ‘Csv/Dialect.php’;

    which php won’t find because Reader is IN Csv. Should it not be

    require_once ‘Dialect.php’;

    ?
    Trying to use the lib in a symfony app.

    thank you for your work.

    Florian

  9. and now my comment about the directory structure is messed up:
    Csv is the top level.
    all other files are below.
    Csv > Dialect
    Csv > Exception
    Csv > Reader.php
    ….

  10. I’ve just been sitting around waiting for something to happen.,

  11. Yes, I am aware of these issues, thank you!

    I’m sorry guys, I’ve unexpectedly become very very busy. I’m working on this when I can, but it still may be a bit.

  12. Nice work! I may end up extending this so that you can access fields like a hash map, using numerical keys goes against being intuitive. If you shoot me SVN info I’ll send in my patch. Regarding the filenames guys just add the folder to your include path, that’s what you normally would do with a library like this.

  13. Looks like there needs to be a trim() filter on fields as well.

  14. Hey everyone,
    I have the class writing form data to a csv and it works perfectly. What I am wondering is am I able to compare the form data first with the csv rows to make sure its not writing a duplicate entry? Maybe a loop to go through all the rows and compare it to the new data first? If anyone has a duggestion it is much appreciated. I could post my code if it helps.
    Ryan