Fast CSV Row Count using Binary Reader
Introduction
The code snippet in this article illustrates a fast way to count the rows (lines) in a CSV file using the BinaryReader API.
Background
A CSV file holds plain-text data. The format places no official limit on the number of rows, the number of columns, or the file size.
Because the file stores no row count and has no size limit, the only way to count its lines is to read the whole file.
Using the code
On Windows, a line break is represented by the two-character sequence CR LF (\r\n).
The basic approach is to stream through the entire file and look for these line breaks.
The BinaryReader API is used to read the stream. This class reads primitive data types as binary values in a specific encoding (UTF-8 by default).
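// Requires: using System.IO;
private static int GetLineCount(string fileName)
{
    using (FileStream fs = File.OpenRead(fileName))
    using (BinaryReader reader = new BinaryReader(fs))
    {
        int lineCount = 0;
        char lastChar = '\0';

        // PeekChar returns -1 at the end of the stream, so an empty
        // file yields 0 instead of throwing EndOfStreamException.
        while (reader.PeekChar() != -1)
        {
            char newChar = reader.ReadChar();

            // Count one line for every CR LF pair.
            if (lastChar == '\r' && newChar == '\n')
            {
                lineCount++;
            }
            lastChar = newChar;
        }

        // A final line with no trailing CR LF is not counted.
        return lineCount;
    }
}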
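Reading one character at a time costs a PeekChar and a ReadChar call per character. A possible variation, sketched below, reads the file in blocks through BinaryReader.Read(byte[], int, int) and scans the raw bytes instead. This is not the method from the article: the class name CrLfCounter, the method name CountCrLf, and the 64 KB default buffer are assumptions made for illustration. It also assumes an ASCII-compatible encoding such as UTF-8, where CR LF is always the byte pair 0x0D 0x0A. It would not work for UTF-16.

```csharp
using System.IO;

// Hypothetical helper, not part of the original article.
public static class CrLfCounter
{
    // Counts CR LF pairs by scanning raw bytes in blocks.
    // Assumes an ASCII-compatible encoding (for example UTF-8).
    public static long CountCrLf(string fileName, int bufferSize = 64 * 1024)
    {
        using (FileStream fs = File.OpenRead(fileName))
        using (BinaryReader reader = new BinaryReader(fs))
        {
            byte[] buffer = new byte[bufferSize];
            long lineCount = 0;
            byte last = 0;   // carried across blocks so a split CR LF is still found
            int read;

            // Read returns 0 once the end of the stream is reached.
            while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
            {
                for (int i = 0; i < read; i++)
                {
                    byte current = buffer[i];
                    if (last == (byte)'\r' && current == (byte)'\n')
                    {
                        lineCount++;
                    }
                    last = current;
                }
            }
            return lineCount;
        }
    }
}
```

A call such as CrLfCounter.CountCrLf(@"C:\data\input.csv") (the path is only an example) returns a long, so the count does not overflow on files with more than int.MaxValue lines. Like GetLineCount, it ignores a final line that has no trailing CR LF.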
Alternatives:
- Read all records at once with the File.ReadAllLines API and take the length of the resulting array. This works well for small files. For large files (>2GB), an OutOfMemoryException is likely.
- StreamReader API: There are 2 options
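The two alternatives can be sketched roughly as follows. The article does not spell out the two StreamReader options, so the ReadLine loop shown here is only one plausible reading. The class and method names are hypothetical.

```csharp
using System.IO;

// Hypothetical helpers illustrating the alternatives listed above.
public static class LineCountAlternatives
{
    // Loads every line into memory at once; suitable for small files only.
    public static int CountWithReadAllLines(string fileName)
    {
        return File.ReadAllLines(fileName).Length;
    }

    // Streams the file line by line, so memory use stays small.
    public static long CountWithStreamReader(string fileName)
    {
        long count = 0;
        using (StreamReader reader = new StreamReader(fileName))
        {
            // ReadLine returns null at the end of the stream.
            while (reader.ReadLine() != null)
            {
                count++;
            }
        }
        return count;
    }
}
```

Both methods treat \r, \n, and \r\n as line terminators and count a final line that has no trailing line break. Their results can therefore differ from a count of CR LF pairs alone.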
Points of Interest
Below are some efficient CSV parsers I have come across or used.
- TextFieldParser: This is the structured text file parser built into .NET. It lives in the Microsoft.VisualBasic.dll library.
- KBCsv library: This is an efficient, easy-to-use library developed by Kent Boogaart.
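As a rough illustration of the first parser, the sketch below counts CSV records with TextFieldParser. It assumes a comma-delimited file and a project reference to Microsoft.VisualBasic.dll. The class name CsvRecordCounter and the method name CountRecords are hypothetical.

```csharp
using Microsoft.VisualBasic.FileIO;

// Hypothetical helper; requires a reference to Microsoft.VisualBasic.dll.
public static class CsvRecordCounter
{
    // Counts records by parsing each row rather than scanning for line breaks.
    public static long CountRecords(string fileName)
    {
        long count = 0;
        using (TextFieldParser parser = new TextFieldParser(fileName))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");              // assumes comma-separated data
            parser.HasFieldsEnclosedInQuotes = true;

            while (!parser.EndOfData)
            {
                parser.ReadFields();                // parse one record, discard the fields
                count++;
            }
        }
        return count;
    }
}
```

Parsing every field is slower than scanning for line breaks. It counts records rather than physical lines, though, and TextFieldParser skips blank lines, so the result can differ from a raw line count.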
Posted on January 12, 2013, in .NET, Utilities and tagged .NET, C#, CSV, Windows.