Hey everybody...
I know that a lot of people have written code to parse CSVs. I have written a class which I believe is a lot smaller and faster than those listed here.
<?php
/*
class.parser.php
Dynamically parses a CSV file and returns the values as array for processing.
*/
class parser {
function doParse($csvFile,$sep) {
$csv = array(); // so an empty file still returns an array
$csvFile = file($csvFile);
foreach ($csvFile as $key=>$value) {
$v = explode($sep,$value);
foreach ($v as $kk=>$lineItem) {
$csv[$key][$kk] = trim(trim($lineItem),"\"");
}
}
return $csv; // a numerically indexed 2 dimensional array of the csv (rows, then fields).
}
function dumpCSV($data,$sep,$heading = "") {
/*
build the "csv" from a 2 dimensional array
$result = $parser->dumpCSV($data,",",$heading);
where $data is an array formed similar to doParse (see above), and $heading
is the heading line for the CSV (titles etc).... Something like
$heading = "\"Booking Number\",\"Booking Date\"\n";
if you do not want/need a heading line, do not include $heading
*/
$message = array();
if ($heading != "") {
$message[] = rtrim($heading,"\n"); // downloadCSV() adds the line break itself
}
foreach ($data as $v) {
$tempmsg = "";
foreach ($v as $item) {
$tempmsg .= "\"".$item."\"".$sep;
}
// append instead of writing to index 0, which used to overwrite the heading
$message[] = rtrim($tempmsg,$sep); // gets rid of the excess separator at the end of each line.
}
return $message;
}
function downloadCSV($data,$filename) {
// build the body first so that Content-Length matches what is actually sent
$body = "";
foreach ($data as $line) {
$body .= $line."\n";
}
header ("Content-Type: application/vnd.ms-excel");
header ("Content-Disposition: attachment; filename=\"".$filename.".csv\"");
header ("Content-Length: ".strlen($body));
echo $body;
}
function searchCSVKey($data,$searchkey) {
foreach ($data as $key=>$v) {
foreach ($v as $item) {
if ($item == $searchkey) {
$returnvalue = $data[$key];
$returnvalue['line'] = $key;
break 2;
}
}
}
if (!isset($returnvalue)) { // nothing matched
$returnvalue['0'] = "NULL";
$returnvalue['line'] = 0;
}
return $returnvalue;
}
function getPartialCSVAlpha($data,$start) {
// gets from the line after $start to the end of the CSV. useful for searching
$d = array();
for ($x = $start + 1; $x < count($data); $x++) {
$d[] = $data[$x];
}
return $d;
}
function getPartialCSVOmega($data,$finish) {
// gets from the first line up to (but not including) line $finish of the CSV
$d = array();
for ($x = 0; $x < $finish; $x++) {
$d[] = $data[$x];
}
return $d;
}
function getSection($data,$startSearch,$finishSearch) {
$r1 = $this->searchCSVKey($data,$startSearch);
$data = $this->getPartialCSVAlpha($data,$r1['line']); // gets rid of the first section not needed.
$r2 = $this->searchCSVKey($data,$finishSearch);
$data = $this->getPartialCSVOmega($data,$r2['line']); // sections down the CSV.
return $data;
}
function getSectionByLine($data,$start,$finish) {
for ($x = $start; $x <= $finish; $x++) {
$output[] = $data[$x];
}
return $output;
}
function addValues3($array1,$array2) {
foreach ($array1 as $key1=>$val1) {
foreach ($val1 as $i=>$v) {
if ((is_numeric($array1[$key1][$i]) == TRUE) && ($i > 1)) {
$output[$key1][$i] = floatval($array2[$key1][$i]) + floatval($array1[$key1][$i]);
} else {
$output[$key1][$i] = $array1[$key1][$i];
}
}
}
return $output;
}
}
// </eof> //
?>
Hope it helps all :)
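One thing dumpCSV() above does not do is escape double quotes that occur inside a field. A minimal sketch of an alternative, assuming PHP 5.5.4 or later so that fputcsv() accepts the escape argument; the helper name rowsToCsv is made up for this example and is not part of the class:

```php
<?php
// Hypothetical helper (not part of the class above): builds CSV text with
// fputcsv(), which quotes fields where needed and doubles embedded quotes.
function rowsToCsv(array $rows, $sep = ',')
{
    $fh = fopen('php://temp', 'r+');   // scratch stream, nothing is left on disk
    foreach ($rows as $row) {
        fputcsv($fh, $row, $sep, '"', '\\');
    }
    rewind($fh);
    $csv = stream_get_contents($fh);
    fclose($fh);
    return $csv;
}

echo rowsToCsv(array(
    array('Booking Number', 'Booking Date'),
    array('42', 'said "hi", then left'),
));
```

The returned string could then be sent by something like downloadCSV(), with strlen() of it as the Content-Length.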
fgetcsv
(PHP 4, PHP 5)
fgetcsv - Gets a line from a file pointer and parses it for CSV fields
Description
array fgetcsv ( resource $handle [, int $length [, string $delimiter [, string $enclosure ]]] )
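- handle
- A valid file pointer produced by fopen(), popen(), or fsockopen().
- length (optional)
- Must be greater than the longest line in the CSV file. This parameter is optional in PHP 5. If it is omitted (or set to 0 in PHP 5.0.4 and later), the line length is not limited, although this may affect performance.
- delimiter (optional)
- Sets the field delimiter (one character only). Defaults to a comma.
- enclosure (optional)
- Sets the field enclosure character (one character only). Defaults to a double quotation mark. This parameter was added in PHP 4.3.0.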
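Similar to fgets(), except that fgetcsv() parses the line it reads for fields in CSV format and returns an array containing those fields.
fgetcsv() returns FALSE on error, including when the end of the file is reached.
Note: A blank line in a CSV file is returned as an array containing a single null field, and is not treated as an error.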
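The two return conventions above (FALSE at end of file, array(null) for a blank line) can be handled along these lines. This is only a sketch: the helper name readCsvRows is an assumption of this example, and it passes the enclosure and escape arguments explicitly, which needs PHP 5.3 or later.

```php
<?php
// Hypothetical helper: collects every non-blank row from an open handle.
function readCsvRows($handle, $delimiter = ',')
{
    $rows = array();
    // Compare against false explicitly; fgetcsv() returns false on error and at EOF.
    while (($row = fgetcsv($handle, 0, $delimiter, '"', '\\')) !== false) {
        // A blank line comes back as array(null), not as an error.
        if ($row === array(null)) {
            continue;
        }
        $rows[] = $row;
    }
    return $rows;
}

$fh = fopen('php://memory', 'r+');
fwrite($fh, "a,b\n\nc,d\n");
rewind($fh);
print_r(readCsvRows($fh));
fclose($fh);
```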
Example#1 Read and print the entire contents of a CSV file
<?php
$row = 1;
$handle = fopen("test.csv","r");
while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
$num = count($data);
echo "<p> $num fields in line $row: <br>\n";
$row++;
for ($c=0; $c < $num; $c++) {
echo $data[$c] . "<br>\n";
}
}
fclose($handle);
?>
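As of PHP 4.3.5, fgetcsv() is binary safe.
Note: This function is sensitive to the locale setting. For example, if LANG is set to en_US.UTF-8, files in a single-byte encoding will be read incorrectly.
Note: If PHP does not recognize the line endings of Macintosh files when reading them, enabling the auto_detect_line_endings run-time configuration option may help.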
fgetcsv
tanthalas at magickfox dot org
30-May-2008 10:33
Verlustmeldung at gmx dot de
24-May-2008 03:59
I wrote a decent little class which is able to import Excel 2003 .csv files kickass fast and easily. I haven't tested it with other Excel versions, but they should work as well.
<?php
class read_csv {
function read_csv () {
// nothing
}
function read_csv_run($f="") {
if ( $f AND is_file($f) ) {
// set excel type delimiter, etc
$delimiter = ';';
$enclosure = '"';
// read file & parse
$input = file($f);
$csv = array();
foreach ( $input as $key => $value ) {
// rtrim crap at the end of the string
$tmp = explode($delimiter,rtrim($value));
// parse
$in_quote = false;
$arr = array();
foreach ( $tmp as $key => $value ) {
if ( $in_quote ) {
if ( $this->read_csv_has_quote($value,$enclosure) ) {
$in_quote = false;
$value = substr_replace($value,'',-1,1);
}
$key = (count($arr)-1);
$arr[$key] .= $delimiter.$value; // continue last array element
} else {
if ( $this->read_csv_has_quote($value,$enclosure) ) {
$in_quote = true;
$value = substr_replace($value,'',0,1);
} else if ( substr($value,0,1) == $enclosure AND substr($value,-1,1) == $enclosure ) {
// string is quoted, remove quotes
$value = substr_replace($value,'',0,1); // start
$value = substr_replace($value,'',-1,1); // end
}
$arr[] = $value; // append to array
}
}
foreach ( $arr as $key => $value ) {
$arr[$key] = str_replace($enclosure.$enclosure,$enclosure,$value);
}
// append to array
$csv[] = $arr;
} // end foreach
echo nl2br(print_r($csv,1));
} // end if
} // end func
function read_csv_has_quote ($str="",$enc="") {
// true when the piece holds an odd number of enclosure characters,
// i.e. a quoted field opens or closes inside it
return (substr_count($str,$enc) % 2) == 1;
}
} // end class
$csv = new read_csv(); // "=& new" is rejected by current PHP versions
$csv->read_csv_run("katalog_excel_D.csv");
?>
Regards, RR
skirkendall at NOSPAM dot dsl-only dot net
01-May-2008 09:55
The array_flip() function is handy for converting column names to column numbers. Assuming the first row contains column names, you can simply read it via fgetcsv(); this will give you a number-indexed array of column names. Applying array_flip() converts that into a name-indexed array of column numbers.
The following example does this, and assumes that two of the columns are named "animal" and "sound" but does not make any assumption about where those columns are.
$fp = fopen($url, "r");
$names = array_flip(fgetcsv($fp, 1000));
while (($values = fgetcsv($fp, 1000)) !== FALSE) {
print "The ".$values[$names["animal"]]." says ".$values[$names["sound"]].".\n";
}
fclose($fp);
Philipp
20-Apr-2008 02:31
With this modification the last item will be added to the array: "a","b","c" is transformed to array("a","b","c") - old function returned array("a","b")
<?php
/*
Modified function from user comment by Marcos Boyington / 06-Mar-2008 03:08
This is a pretty useful update/modification to the fgetcsv function, which allows for:
* Multiple-character/multibyte delim/enclosure/escape
* Multibyte values
* Escape character specification in < PHP5
* Escape character = delim character
* Direct reading from files without bloating memory too much
*/
define('BUFFER_READ_LEN', 4096);
function fgetcsv_ex($file_handle, $delim = ',', $enclosure = '"', $escape = '"') {
$fields = null;
$fldCount = 0;
$inQuotes = false;
$complete = false;
$search_chars_list = array('\r\n', '\n', '\r');
if ($delim && ($delim != ''))
$search_chars_list[] = $delim;
if ($enclosure && ($enclosure != '')) {
$search_chars_list[] = $enclosure;
$enclosure_len = strlen($enclosure);
} else
$enclosure_len = 0;
if ($escape && ($escape != '')) {
$search_chars_list[] = $escape;
$escape_len = strlen($escape);
} else
$escape_len = 0;
$search_regex = '/' . implode('|', $search_chars_list) . '/';
$cur_pos = 0;
$line = '';
$cur_value = '';
$in_value = false;
$last_value = 0;
while (! $complete) {
$read_result = fread($file_handle, BUFFER_READ_LEN);
if ($read_result) {
$line .= $read_result;
} else if (strlen($line) == 0) {
return null;
} else {
$line .= "\n";
}
$line_len = strlen($line);
while (true) {
if (! preg_match($search_regex, $line, $matches, PREG_OFFSET_CAPTURE, $cur_pos)) {
if ($read_result) {
// need more chars
break;
} else {
// Incomplete file
return null;
}
} else {
$non_escape = false;
$cur_char = $matches[0][0];
$cur_len = strlen($cur_char);
$new_pos = $matches[0][1];
if (($enclosure == $escape) && $in_value && ($cur_char == $escape)) {
// Escape char = enclosure char special handling
if (($new_pos + $cur_len + $enclosure_len) >= $line_len) {
// We need the next char
break;
}
$next_char = substr($line, $new_pos + $cur_len, $enclosure_len);
if ((! $enclosure) || ($next_char != $enclosure)) {
$non_escape = true;
}
}
$cur_pos = $new_pos;
if ($in_value && (! $non_escape)) {
$cur_value .= mb_substr($line, $last_value, $cur_pos - $last_value);
if ($cur_char == $escape) {
// Skip escape char
$cur_pos += $escape_len;
}
$last_value = $cur_pos;
} else if (($cur_char == "\n") || ($cur_char == "\r") || ($cur_char == "\r\n")) {
$blank_start_lines = ($cur_pos == 0);
++$cur_pos;
$cur_pos = $cur_pos + strspn($line, "\n\r", $cur_pos);
if (! $blank_start_lines) {
$complete = true;
} else {
$last_value = $cur_pos;
continue;
}
}
if ($cur_char == $delim || $complete) {
if (is_null($fields)) {
$fields = array();
}
$fields[] = $cur_value . trim(mb_substr($line, $last_value, $cur_pos - $last_value));
$last_value = $cur_pos + $cur_len;
$cur_value = '';
} else if ($cur_char == $enclosure) {
if ($in_value) {
$cur_value .= mb_substr($line, $last_value, $cur_pos - $last_value);
}
$last_value = $cur_pos + $cur_len;
$in_value = ! $in_value;
}
if ($complete) {
break;
}
$cur_pos += $cur_len;
}
}
}
fseek($file_handle, $cur_pos - strlen($line), SEEK_CUR);
return $fields;
}
?>
ohira atto web dotto de
08-Nov-2007 05:59
Yet another tool to parse CSV data into an associative 2D array. Unlike the simpler versions, it treats newline characters inside quotes as data instead of syntax.
<?php
define('LF', "\n");
// Parse CSV data into an associative 2D array
function csvToArray($data)
{
// output
$csv = array();
$line = array();
$fieldnames = array();
$got_fieldnames = false;
$escaped = false; // Flag: escape char
$quoted = false; // Flag: quoted string
$buffer = ''; // Buffer (quoted values)
$junk = ''; // Junk buffer (unquoted values)
$fieldname_index = 0;
for($i = 0; $i < strlen($data); $i++)
{
$char = $data[$i];
if($quoted)
{
if(($char == '\\') && ($escaped === false))
{
// Set flags
$escaped = true;
}
elseif(($char == '"') && ($escaped === false))
{
// Set flags
$quoted = false;
$escaped = false;
}
else
{
// Add char to buffer
$buffer .= $char;
// Set flags
$escaped = false;
}
}
else
{
if($char == LF) // Start a new line
{
if(strlen($buffer) > 0)
{
// Add buffer to line
if($got_fieldnames)
{
$line[$fieldnames[$fieldname_index]] = $buffer;
$fieldname_index++;
}
else
{
$fieldnames[] = $buffer;
}
// Clear buffer
$buffer = '';
}
else
{
$junk = trim($junk);
// Add junk to line (possible unquoted values?)
if($got_fieldnames)
{
$line[$fieldnames[$fieldname_index]] = $junk;
$fieldname_index++;
}
else
{
$fieldnames[] = $junk;
}
}
// Clear junk
$junk = '';
// Add line to CSV
if($got_fieldnames)
{
$csv[] = $line;
}
$got_fieldnames = true;
// Clear line
$line = array();
$fieldname_index = 0;
}
elseif($char == '"') // Start new value
{
// Set flags
$quoted = true;
}
elseif($char == ';')
{
if(strlen($buffer) > 0)
{
// Add buffer to line
if($got_fieldnames)
{
$line[$fieldnames[$fieldname_index]] = $buffer;
$fieldname_index++;
}
else
{
$fieldnames[] = $buffer;
}
// Clear buffer
$buffer = '';
}
else
{
$junk = trim($junk);
// Add junk to line (possible unquoted values?)
if($got_fieldnames)
{
$line[$fieldnames[$fieldname_index]] = $junk;
$fieldname_index++;
}
else
{
$fieldnames[] = $junk;
}
}
// Clear junk
$junk = '';
}
else // Add to junk char
{
$junk .= $char;
}
}
}
return $csv;
}
?>
Tim Henderson
04-Oct-2007 09:40
The only problem with fgetcsv(), at least in PHP 4.x: any stray backslash in the data that happens to come before a double-quote enclosure will break it, i.e. cause the closing quote to be treated as escaped. I can't find a direct way to deal with it, since fgetcsv() doesn't give you a chance to manipulate the line before it reads and parses it. I've had to change all occurrences of '\"' to '"' in the file first before feeding it to fgetcsv(). Otherwise this is perfect for the Microsoft CSV format and deals gracefully with all the issues.
marcus at synchromedia dot co dot uk
04-Oct-2007 12:44
This is a minor fix to mortanon@gmail.com's CSVIterator. The original version would die if the last line of a file did not end in a line break and you called valid() inside the iterator loop because the file would have already been closed and thus feof() would have an invalid file pointer param.
<?php
/**
* @author mortanon@gmail.com
* @link http://uk.php.net/manual/en/function.fgetcsv.php
*/
class CsvIterator implements Iterator {
const ROW_SIZE = 4096;
/**
* The pointer to the csv file.
* @var resource
* @access private
*/
private $filePointer = NULL;
/**
* The current element, which will
* be returned on each iteration.
* @var array
* @access private
*/
private $currentElement = NULL;
/**
* The row counter.
* @var int
* @access private
*/
private $rowCounter = NULL;
/**
* The delimiter for the csv file.
* @var str
* @access private
*/
private $delimiter = NULL;
/**
* This is the constructor. It tries to open the csv file and throws an exception
* on failure.
*
* @access public
* @param str $file The csv file.
* @param str $delimiter The delimiter.
*
* @throws Exception
*/
public function __construct($file, $delimiter=',') {
// fopen() does not throw; it returns false on failure, so check the result
$this->filePointer = fopen($file, 'r');
if ($this->filePointer === false) {
throw new Exception('The file "'.$file.'" cannot be read.');
}
$this->delimiter = $delimiter;
}
/**
* This method resets the file pointer.
*
* @access public
*/
public function rewind() {
$this->rowCounter = 0;
rewind($this->filePointer);
}
/**
* This method returns the current csv row as an array of fields
*
* @access public
* @return array The current csv row as an array of fields
*/
public function current() {
$this->currentElement = fgetcsv($this->filePointer, self::ROW_SIZE, $this->delimiter);
$this->rowCounter++;
return $this->currentElement;
}
/**
* This method returns the current row number.
*
* @access public
* @return int The current row number
*/
public function key() {
return $this->rowCounter;
}
/**
* This method checks whether there is more data left to read.
*
* @access public
* @return boolean Returns false once EOF is reached (or the file is closed), true otherwise.
*/
public function next() {
if (is_resource($this->filePointer)) {
return !feof($this->filePointer);
}
return false;
}
/**
* This method checks if the next row is a valid row.
*
* @access public
* @return boolean If the next row is a valid row.
*/
public function valid() {
if (!$this->next()) {
if (is_resource($this->filePointer)) {
fclose($this->filePointer);
}
return false;
}
return true;
}
}
?>
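For comparison, here is a sketch of a different technique rather than a change to the class above: SplFileObject already implements Iterator and can hand back parsed rows when the READ_CSV flag is set. The temp file and its contents are made up purely for the demonstration.

```php
<?php
// Sketch only: write a small throwaway file, then iterate over it as CSV rows.
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "id;name\n1;foo\n2;bar\n");

$file = new SplFileObject($path, 'r');
// READ_CSV makes each iteration return an array of fields;
// SKIP_EMPTY together with READ_AHEAD is meant to drop blank lines.
$file->setFlags(SplFileObject::READ_CSV | SplFileObject::SKIP_EMPTY | SplFileObject::READ_AHEAD);
$file->setCsvControl(';', '"', '\\');

$rows = array();
foreach ($file as $row) {
    $rows[] = $row;
}
print_r($rows);

$file = null;   // release the handle before deleting the file
unlink($path);
```

The design difference is that the file handle and the end-of-file bookkeeping are owned by SplFileObject, so there is no valid()/next() logic to get wrong.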
daevid at daevid dot com
26-Sep-2007 03:39
A much simpler way to map the heading/column names to the elements on each line. It also doesn't fill up one big array which could cause you to run out of memory on large datasets. This loads one at a time so you can process/insert to db/etc...
$handle = fopen('somefile.csv', 'r');
if ($handle)
{
set_time_limit(0);
//the top line is the field names
$fields = fgetcsv($handle, 4096, ',');
//loop through one row at a time
while (($data = fgetcsv($handle, 4096, ',')) !== FALSE)
{
$data = array_combine($fields, $data);
}
fclose($handle);
}
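One caveat with the loop above: array_combine() needs both arrays to have the same number of elements, so a short or over-long data row triggers a warning (an exception in PHP 8). A small guard could look like this; the helper name combineRow is invented for the example.

```php
<?php
// Hypothetical helper: pairs a data row with the header row, or returns null
// when the field counts differ (array_combine() cannot handle that case).
function combineRow(array $fields, array $data)
{
    if (count($fields) !== count($data)) {
        return null;
    }
    return array_combine($fields, $data);
}

var_dump(combineRow(array('animal', 'sound'), array('cow', 'moo')));
var_dump(combineRow(array('animal', 'sound'), array('cow')));
```

Inside the while loop, a null result would be the place to log or skip the malformed row.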
jszatmary at hotmail dot com
21-Aug-2007 10:06
This function appears to assume that \" is an escaped quote - similar to "" - which may lead to incorrect results while reading some files. Found while running under PHP 5.1.6.
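Later PHP versions expose this behaviour as a fifth $escape argument (added in PHP 5.3), and from PHP 7.4 an empty string switches the backslash handling off. A minimal sketch, assuming PHP 7.4 or later; the sample data is made up:

```php
<?php
// Assumes PHP 7.4+, where an empty $escape string is accepted.
$fh = fopen('php://memory', 'r+');
fwrite($fh, '"C:\\temp\\","next"' . "\n");   // first field ends with a backslash
rewind($fh);

// Fifth argument '' disables backslash handling, so \" no longer hides the closing quote.
$row = fgetcsv($fh, 0, ',', '"', '');
fclose($fh);
print_r($row);
```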
myrddin at myrddin dot myrddin
22-Jun-2007 04:16
RE post by:- stinkyj at gmail dot com
02-Aug-2006 10:15
the enclosure param defaulting to " and giving a warning if it's an empty string makes this function nearly worthless. csv files do not always have the fields enclosed, and in those cases it doesn't work.
---------
I had the same problem; it really should be possible to set the enclosure to null.
However, perhaps a solution to the problem is to use "\n" as the enclosure character in fgetcsv. As far as I tested it seems to work out just fine. I was thinking of using "\0" but that may cause problems with some data files. If anyone knows of any issues that might crop up when using "\n" as enclosure, please post away. Thanks.
e at osterman dot com
13-Jun-2007 05:39
A 5.2 way to lazily parse a single CSV line
function parseCSV($str, $delimiter = ',', $enclosure = '"', $len = 4096)
{
$fh = fopen('php://memory', 'r+'); // 'r+' is the standard read/write mode string
fwrite($fh, $str);
rewind($fh);
$result = fgetcsv( $fh, $len, $delimiter, $enclosure );
fclose($fh);
return $result;
}
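On PHP 5.3 and later the same job can be done without a stream, because str_getcsv() parses a CSV record straight from a string. A short sketch with made-up input:

```php
<?php
// str_getcsv() takes the string directly: delimiter, enclosure, escape follow.
$fields = str_getcsv('a,"b,c","say ""hi"""', ',', '"', '\\');
print_r($fields);
```

The php://memory approach above is still useful on 5.2, where str_getcsv() is not available.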
D Steer
11-Jun-2007 10:32
Here is a simple way to include the field names in the array. Although this is very simple, it does the job fantastically.
<?php
print_r(buildStock('stock.csv'));
function buildStock($File) {
$handle = fopen($File, "r");
$fields = fgetcsv($handle, 1000, ",");
while(($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
$detail[] = $data;
}
$x = 0;
$y = 0;
foreach($detail as $i) {
foreach($fields as $z) {
$stock[$x][$z] = $i[$y];
$y++;
}
$y = 0;
$x++;
}
return $stock;
}
?>
anykey
24-May-2007 04:40
final version...
<?php
// originally a private class method; shown as a plain function so it runs on its own
function parseCsvLine($str) {
$delimiter = ';';
$qualifier = '"';
$qualifierEscape = '\\';
$fields = array();
while (strlen($str) > 0) {
if ($str[0] == $delimiter)
$str = substr($str, 1);
if ($str !== '' && $str[0] == $qualifier) {
$value = '';
$closed = false;
for ($i = 1; $i < strlen($str); $i++) {
if (($str[$i] == $qualifier) && ($str[$i-1] != $qualifierEscape)) {
$str = substr($str, (strlen($value) + 2));
$value = str_replace(($qualifierEscape.$qualifier), $qualifier, $value);
$closed = true;
break;
}
$value .= $str[$i];
}
if (!$closed)
$str = ''; // unterminated quote: keep what was read and stop instead of looping forever
} else {
$end = strpos($str, $delimiter);
$value = ($end !== false) ? substr($str, 0, $end) : $str;
$str = substr($str, strlen($value));
}
$fields[] = $value;
}
return $fields;
}
?>
02-May-2007 05:07
a flexible parser that can be used for csv or tsv (or any delimited flatfile data source).
<?php
/* assumes a single line of input; automatically determines the number of fields */
function parse_line($input_text, $delimiter = ',', $text_qualifier = '"') {
$text = trim($input_text);
if(is_string($delimiter) && is_string($text_qualifier)) {
$re_d = sprintf('\x%02x', ord($delimiter)); //format for regexp, always two hex digits (a tab becomes \x09)
$re_tq = sprintf('\x%02x', ord($text_qualifier)); //format for regexp
$fields = array();
$field_num = 0;
while(strlen($text) > 0) {
if($text[0] == $text_qualifier) {
preg_match('/^' . $re_tq . '((?:[^' . $re_tq . ']|(?<=\x5c)' . $re_tq . ')*)' . $re_tq . $re_d . '?(.*)$/', $text, $matches);
$value = str_replace('\\' . $text_qualifier, $text_qualifier, $matches[1]);
$text = trim($matches[2]);
$fields[$field_num++] = $value;
} else {
preg_match('/^([^' . $re_d . ']*)' . $re_d . '?(.*)$/', $text, $matches);
$value = $matches[1];
$text = trim($matches[2]);
$fields[$field_num++] = $value;
}
}
return $fields;
} else {
return false;
}
}
?>
Bob
29-Apr-2007 07:19
Thank you to the mystery contributor of csv_string_to_array function:
http://uk3.php.net/manual/en/function.fgetcsv.php#62524
This works great when your CSV data has literal commas inside enclosures that you want to preserve. fgetcsv fails at this and treats a comma as the end of an item even when the enclosure has not been closed.
r at smagoo dot ch
11-Apr-2007 10:00
If you have a problem with fgetcsv and multibyte characters, you have to set the correct locale:
<?php
setlocale(LC_ALL, 'en_US.UTF-8');
?>
Change it to your own locale and/or charset.
pinkgothic at gmail dot com
03-Apr-2007 09:47
I find the documentation mildly misleading:
fgetcsv() does not - as this documentation seemingly claims in the descriptive line - get a line out of the file (via the file pointer) and then parse it for CSV fields; instead it retrieves a CSV row out of the file, which it then splits into an array.
The difference may seem trivial, but reading the description of this function I feared it might not support linebreaks in individual CSV values. Testing, however, revealed that fgetcsv() [fortunately!] works as one would expect from a CSV parser, and my fears were without cause.
In fact, fgetcsv() is remarkably hard to break. It's not confused by the value """,""" for example (three quotation marks followed by a comma followed by three quotation marks - which represents the value "quotation mark, comma, quotation mark" in case it's not immediately obvious).
I hope this extra documentation is helpful for someone.
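Both observations can be checked with a few lines. This is just a sketch using an in-memory stream and made-up data, with the enclosure and escape arguments spelled out (PHP 5.3 or later):

```php
<?php
$fh = fopen('php://memory', 'r+');
// Record 1: a quoted field containing a line break, then a plain field.
// Record 2: the single field """,""" from the note above.
fwrite($fh, "\"line one\nline two\",x\n" . '""","""' . "\n");
rewind($fh);

$first  = fgetcsv($fh, 0, ',', '"', '\\');
$second = fgetcsv($fh, 0, ',', '"', '\\');
fclose($fh);

var_dump($first, $second);
```

The first call consumes two physical lines of the file but returns one row, which is the "CSV row, not line" distinction made above.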
eoj at seznam dot cz
13-Mar-2007 04:09
I had a problem with fgetcsv and multibyte characters, so I used one of the functions below (16-Nov-2002 04:01 to be specific) and modified it to be (hopefully) multibyte safe.
<?php
/**
* @param the csv line to be split
* @param the delimiter to split by (default ';' )
* @param if this is false, the quotation marks won't be removed from the fields (default true)
*/
function mb_csv_split<