Fast CSV Row Count using Binary Reader

Introduction

The code snippet in this article illustrates a fast row (line) count algorithm for CSV files using the BinaryReader API.

Background

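A CSV file contains plain-text data. The format places no official restriction on the number of rows, the number of columns, or the file size.

Reference: RFC 4180 (Common Format and MIME Type for Comma-Separated Values (CSV) Files)

Because there is no such restriction, and the row count is not stored anywhere in the file, counting the lines requires reading the complete file.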

Using the code

On the Windows operating system, a line break is represented by the CR LF pair (\r\n).

The basic approach is to stream through the entire content and look for the line breaks.

The BinaryReader API is used to read the stream. This class reads primitive data types as binary values in a specific encoding (UTF-8 by default).

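// Requires: using System.IO;
private static int GetLineCount(string fileName)
{
    using (FileStream fs = File.OpenRead(fileName))
    using (BinaryReader reader = new BinaryReader(fs))
    {
        int lineCount = 0;

        // An empty file has no lines; ReadChar() would throw EndOfStreamException.
        if (reader.PeekChar() == -1)
        {
            return lineCount;
        }

        char lastChar = reader.ReadChar();

        // PeekChar() returns -1 once the end of the stream is reached.
        while (reader.PeekChar() != -1)
        {
            char newChar = reader.ReadChar();

            // Count each CR LF pair as one line break.
            // A final line without a trailing CR LF is not counted.
            if (lastChar == '\r' && newChar == '\n')
            {
                lineCount++;
            }
            lastChar = newChar;
        }
        return lineCount;
    }
}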
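As a variation on the same idea, the scan can also be done on raw bytes read in chunks, which avoids calling PeekChar() for every character. This is only a sketch, not the method used above. The names CsvLineCounter and CountCrLf and the 64 KB default buffer size are assumptions made for illustration. It also assumes an ASCII or UTF-8 encoded file, where CR and LF are always the single bytes 0x0D and 0x0A.

```csharp
using System.IO;

public static class CsvLineCounter
{
    // Hypothetical helper: counts CR LF pairs by scanning raw bytes in chunks.
    // Assumes ASCII or UTF-8 content, where CR is 0x0D and LF is 0x0A.
    public static long CountCrLf(Stream stream, int bufferSize = 64 * 1024)
    {
        byte[] buffer = new byte[bufferSize];
        long lineCount = 0;
        int previous = -1; // last byte seen so far, -1 if none yet
        int bytesRead;

        while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < bytesRead; i++)
            {
                // Carrying 'previous' across iterations also catches a
                // CR LF pair that is split between two chunks.
                if (previous == 0x0D && buffer[i] == 0x0A)
                {
                    lineCount++;
                }
                previous = buffer[i];
            }
        }
        return lineCount;
    }

    // Convenience overload that opens the file and scans it.
    public static long CountCrLf(string fileName)
    {
        using (FileStream fs = File.OpenRead(fileName))
        {
            return CountCrLf(fs);
        }
    }
}
```

Like GetLineCount, this counts line breaks rather than lines, so a last row without a trailing CR LF is not included in the result.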

Alternatives:

  1. Read all records at once with the File.ReadAllLines API and take the length of the returned array. This works well for small files. For large files (>2GB), an OutOfMemoryException is expected.
  2. StreamReader API: there are two options.
    1. Use the ReadLine function to read lines. The trade-off is that each line is converted to a string, which is not needed for counting.
    2. Use the Read() and Peek() methods. This is similar to the BinaryReader approach, but these methods return an int rather than a char, so a little more logic is required for the character comparisons.
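The alternatives listed above might look roughly like the following. This is a minimal sketch. The class and method names are made up for illustration, and the reader-based methods accept a TextReader (the base class of StreamReader) so that they can be tried on a string as well as on a file.

```csharp
using System.IO;

public static class AlternativeLineCounters
{
    // Alternative 1: loads every line into memory, so only suitable for small files.
    public static int CountWithReadAllLines(string fileName)
    {
        return File.ReadAllLines(fileName).Length;
    }

    // Alternative 2.1: ReadLine() builds a string for each line, which is then discarded.
    public static long CountWithReadLine(TextReader reader)
    {
        long lineCount = 0;
        while (reader.ReadLine() != null)
        {
            lineCount++;
        }
        return lineCount;
    }

    // Alternative 2.2: Read() returns an int (-1 at the end of the input),
    // so the values are kept as int and compared against char literals.
    public static long CountWithRead(TextReader reader)
    {
        long lineCount = 0;
        int previous = -1;
        int current;
        while ((current = reader.Read()) != -1)
        {
            if (previous == '\r' && current == '\n')
            {
                lineCount++;
            }
            previous = current;
        }
        return lineCount;
    }
}
```

The results are not always identical. ReadLine() also treats a lone \r or \n as a line break and returns a final line that has no terminator, whereas the Read() version counts only CR LF pairs. For a file, pass a StreamReader, for example new StreamReader(fileName).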

Points of Interest

Below are some efficient CSV parsers I have come across or used.

  1. TextFieldParser: a structured text file parser built into .NET. It lives in the Microsoft.VisualBasic.dll library.
  2. KBCsv library: an efficient, easy-to-use library developed by Kent Boogaart.
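When records rather than physical lines need to be counted, a parser such as TextFieldParser can be used instead of a raw scan, because RFC 4180 allows line breaks inside quoted fields. The sketch below is illustrative only. The names CsvRecordCounter and CountRecords are assumptions, a comma delimiter is assumed, and the project needs a reference to Microsoft.VisualBasic.

```csharp
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class CsvRecordCounter
{
    // Hypothetical helper: counts the records that TextFieldParser reads
    // from comma-delimited input.
    public static long CountRecords(TextReader input)
    {
        using (TextFieldParser parser = new TextFieldParser(input))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;

            long recordCount = 0;
            while (!parser.EndOfData)
            {
                // ReadFields() parses one record; the fields are not needed here.
                parser.ReadFields();
                recordCount++;
            }
            return recordCount;
        }
    }
}
```

This parses every field, so it is expected to be slower than the byte or character scan. It is worth the cost only when the count has to reflect CSV records rather than line breaks.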