Cinchoo
Features:
Ultra-fast CSV reader: parses a CSV file with 15 columns / 500,000 records in 15
seconds.
Stream-based parsers allow for high performance, low resource usage, and nearly
unlimited versatility, scalable to any size of data file, even tens or hundreds of
gigabytes.
Event-based data manipulation and validation allows total control over the flow of
data during the bulk insert process.
Follows CSV standard file rules. Gracefully handles data fields that contain commas
and line breaks.
In addition to comma, most delimiting characters can be used, including tab-delimited
fields.
Exposes an IEnumerable list of objects, which is often used with LINQ queries for
projection, aggregation, filtering, etc.
Supports deferred reading.
Supports processing files with culture specific date, currency and number formats.
Supports different character encodings.
Recognizes a wide variety of date, currency, enum, boolean and number formats when
reading files.
Provides fine control of date, currency, enum, boolean, number formats when writing
files.
Detailed and robust error handling, allowing you to quickly find and fix problems.
Performance:
This framework was tested with files containing 15 fields and varying numbers of
records (250,000; 500,000; 1M; 1.5M; 2M; 2.5M; etc.). The graph below shows
the total time taken to load each file. The test was performed on an Intel Core i5 /
8 GB RAM machine.
[Graph: total load time per test file]
2. Requirement
This framework library is written in C# using .NET Framework 4.5 / .NET Core 2.x.
Delimiter - The value used to separate the fields in a CSV row. Defaults to
Culture.TextInfo.ListSeparator.
EOLDelimiter - The value used to separate CSV rows. Default is \r\n (NewLine).
CultureName - The culture name (ex. en-US, en-GB) used to read and write CSV data.
IgnoreEmptyLine - A flag to let the reader know whether a record should be skipped
when reading if it is empty. A record is considered empty if all fields are empty.
Comments - The value used to denote a line that is commented out. Multiple comment
markers can be specified, separated by commas.
QuoteChar - The value used to escape fields that contain a delimiter, quote, or
line ending.
QuoteAllFields - N/A for reader.
Encoding - The encoding of the CSV file.
HasExcelSeperator - N/A for reader. The reader seamlessly recognizes the Excel
separator, if specified in the CSV file, and uses it for parsing.
ColumnCountStrict - This flag indicates whether an exception should be thrown if an
expected field is missing while reading.
ColumnOrderStrict - This flag indicates whether an exception should be thrown if an
expected field is in the wrong position in the file while reading. This check is
performed only when ColumnCountStrict is true.
BufferSize - The size of the internal buffer used by the underlying StreamReader.
NullValue - Special text expected to be treated as a null value when read from the
CSV file, at the record level.
ErrorMode - This flag indicates whether an exception should be thrown if an
expected field fails to load while reading. This can be overridden per property.
Possible values are:
IgnoreAndContinue - Ignore the error, record will be skipped and continue with
next.
ReportAndContinue - Report the error to the POCO entity if it is of
IChoNotifyRecordRead type.
ThrowAndStop - Throw the error and stop execution.
IgnoreFieldValueMode - A flag to let the reader know whether a record should be
skipped when reading if it is empty or null. This can be overridden per property.
Possible values are:
Null - N/A
DBNull - N/A
Empty - skipped if the record value is empty
WhiteSpace - skipped if the record value contains only whitespaces
ObjectValidationMode - A flag to let the reader know the type of validation to
perform on the record object. Possible values are:
Off - No object validation performed.
MemberLevel - Validation is performed as each CSV property is loaded with a value.
ObjectLevel - Validation is performed after all the properties are loaded into the
POCO object.
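To tie several of these properties together, here is a minimal sketch. The configuration class name ChoCSVRecordConfiguration and the exact property types are assumptions, inferred from the config object used in the samples later in this article:

```csharp
// Sketch only; assumes the ChoETL package and the property names listed above.
var config = new ChoCSVRecordConfiguration();
config.Delimiter = "|";                      // field separator (default: culture ListSeparator)
config.IgnoreEmptyLine = true;               // skip records whose fields are all empty
config.ErrorMode = ChoErrorMode.IgnoreAndContinue;
config.ObjectValidationMode = ChoObjectValidationMode.MemberLevel;

using (var parser = new ChoCSVReader("emp.csv", config))
{
    foreach (dynamic rec in parser)
        Console.WriteLine(rec.Dump());
}
```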
7. Customize CSV Header
If the CSV file has a header, you can instruct the POCO entity by using
ChoCSVFileHeaderAttribute.
FieldPosition - When mapping by position, you specify the index of the CSV column
that you want to use for that property. It is 1-based.
FieldName - When mapping by name, you specify the name of the CSV column that you
want to use for that property. For this to work, the CSV file must have a header
record. The name you specify must match the name in the header record.
FillChar - N/A for reader.
FieldValueJustification - N/A for reader.
FieldValueTrimOption - This flag tells the reader to trim whitespace from the
beginning and end of the field value when reading. Possible values are Trim,
TrimStart, TrimEnd.
Truncate - Truncate the field value if it is over the maximum field size.
Size - Maximum size of the field value.
QuoteField - A flag that tells the reader that the CSV column value is surrounded
by quotes.
NullValue - Special text expected to be treated as a null value when read from the
CSV file, at the field level.
FormatText - Specify custom format specifier to parse the CSV field values.
ErrorMode - This flag indicates whether an exception should be thrown if an
expected field fails to load while reading. Possible values are:
IgnoreAndContinue - Ignore the error and continue to load other properties of the
record.
ReportAndContinue - Report the error to POCO entity if it is of IChoRecord type.
ThrowAndStop - Throw the error and stop the execution.
IgnoreFieldValueMode - A flag to let the reader know whether a record should be
skipped when reading if it is empty or null. Possible values are:
Null - N/A
DBNull - N/A
Empty - skipped if the record value is empty.
WhiteSpace - skipped if the record value contains only whitespaces.
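A hedged sketch combining a few of the field-level settings above on a POCO entity. The attribute usage follows the samples later in this article; the ChoFieldValueTrimOption enum name is an assumption:

```csharp
public class EmployeeRec
{
    // 1-based position, header name override, per-field error handling
    [ChoCSVRecordField(1, FieldName = "id", ErrorMode = ChoErrorMode.ReportAndContinue)]
    public int Id { get; set; }

    // Trim whitespace around the value while reading
    [ChoCSVRecordField(2, FieldName = "name", FieldValueTrimOption = ChoFieldValueTrimOption.Trim)]
    public string Name { get; set; }
}
```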
8.1. DefaultValue
It is the value used and set to the property when the CSV value is empty or
whitespace (controlled via IgnoreFieldValueMode).
Any POCO entity property can be specified with default value using
System.ComponentModel.DefaultValueAttribute.
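For example (a sketch; the CSV attribute usage follows the later samples, and the default value "Unnamed" is illustrative):

```csharp
using System.ComponentModel;

public class EmployeeRec
{
    [ChoCSVRecordField(1)]
    public int Id { get; set; }

    // "Unnamed" is applied when the CSV value is empty or whitespace,
    // subject to IgnoreFieldValueMode
    [ChoCSVRecordField(2)]
    [DefaultValue("Unnamed")]
    public string Name { get; set; }
}
```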
8.2. ChoFallbackValue
It is the value used and set to the property when the CSV value fails to be set.
The fallback value is only set when ErrorMode is either IgnoreAndContinue or
ReportAndContinue.
Any POCO entity property can be specified with fallback value using
ChoETL.ChoFallbackValueAttribute.
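For example (a sketch; the fallback value -1 is illustrative):

```csharp
public class EmployeeRec
{
    // Falls back to -1 when the CSV value fails to convert, provided
    // ErrorMode is IgnoreAndContinue or ReportAndContinue
    [ChoCSVRecordField(1)]
    [ChoFallbackValue(-1)]
    public int Id { get; set; }

    [ChoCSVRecordField(2)]
    public string Name { get; set; }
}
```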
There are a couple of ways you can specify the converters for each field:
Declarative Approach
Configuration Approach
8.3.1. Declarative Approach
This model is applicable to POCO entity objects only. If you have a POCO class, you
can specify the converters on each property to carry out the necessary conversion.
The samples below show the way to do it.
ChoCSVRecordFieldConfiguration idConfig = new ChoCSVRecordFieldConfiguration("Id", 1);
idConfig.AddConverter(new IntConverter());
config.CSVRecordFieldConfigurations.Add(idConfig);
config.CSVRecordFieldConfigurations.Add(new ChoCSVRecordFieldConfiguration("Name", 2));
In above, we construct and attach the IntConverter to 'Id' field using AddConverter
helper method in ChoCSVRecordFieldConfiguration object.
Likewise, if you want to remove any converter from it, you can use RemoveConverter
on ChoCSVRecordFieldConfiguration object.
Sometimes you may want to override the declarative validation behaviors defined in
the POCO class; you can do so with Cinchoo ETL via the configuration approach. The
sample below shows the way to override them.
writer.Flush();
stream.Position = 0;
dynamic rec;
while ((rec = parser.Read()) != null)
{
Console.WriteLine(String.Format("Id: {0}", rec.Id));
Console.WriteLine(String.Format("Name: {0}", rec.Name));
Console.WriteLine(String.Format("Salary: {0}", rec.Salary));
}
}
}
The sample below shows how to use the BeforeRecordLoad callback method to skip
lines starting with the '%' character.
[MetadataType(typeof(EmployeeRecMeta))]
public partial class EmployeeRec
{
public int Id { get; set; }
public string Name { get; set; }
}
The sample below shows how to attach a Metadata class to a sealed or third-party
POCO class by using ChoMetadataRefTypeAttribute on it.
TIP: If you want to skip the line from loading, set the source to null.
TIP: If you want to take control of parsing and loading the record properties by
yourself, set the source to String.Empty.
Return true to continue the load process; otherwise, return false to stop the
process.
CSVReader looks for the FallbackValue of each CSV property. If present, it tries to
assign that value to the property. If the FallbackValue is not present and
Configuration.ErrorMode is specified as ReportAndContinue, this callback will be
executed.
Return true to continue the load process; otherwise, return false to stop the
process.
If you have a CSV file, you can parse and load it with minimal/zero configuration.
If the CSV file does not have a header record line, the parser automatically names
the columns Column1, Column2, etc.
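A minimal dynamic-model sketch (column names come from the header line here; a headerless file would instead expose Column1, Column2, and so on):

```csharp
string csv = @"Id,Name
1,Tom
2,Mark";

using (var parser = new ChoCSVReader(new StringReader(csv))
    .WithFirstLineHeader())
{
    foreach (dynamic rec in parser)
        Console.WriteLine(rec.Id + ": " + rec.Name);
}
```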
13.1. DefaultValue
It is the value used and set to the property when the CSV value is empty or
whitespace (controlled via IgnoreFieldValueMode).
Any POCO entity property can be specified with default value using
System.ComponentModel.DefaultValueAttribute.
For dynamic object members or to override the declarative POCO object member's
default value specification, you can do so through configuration as shown below.
Any POCO entity property can be specified with fallback value using
ChoETL.ChoFallbackValueAttribute.
For dynamic object members, or to override the declarative POCO object member's
fallback values, you can do so through configuration as shown below.
In the dynamic object model, you can specify these converters via configuration.
See the example below for the approach taken to specify type converters for CSV
columns.
Likewise, if you want to remove any converter from it, you can use RemoveConverter
on ChoCSVRecordFieldConfiguration object.
13.5. Validations
CSVReader leverages both System.ComponentModel.DataAnnotations and Validation Block
validation attributes to specify validation rules for individual CSV fields. Refer
to the MSDN site for a list of available DataAnnotations validation attributes.
public class EmployeeRec
{
    [Range(1, int.MaxValue, ErrorMessage = "Id must be > 0.")]
    public int Id { get; set; }
    public string Name { get; set; }
}
In the example above, we used the Range validation attribute for the Id property.
CSVReader performs validation on it during load when
Configuration.ObjectValidationMode is set to ChoObjectValidationMode.MemberLevel or
ChoObjectValidationMode.ObjectLevel.
Is there a way to extend a sealed third-party object with a CSV specification? Yes,
Cinchoo provides a model to extend them by creating a metadata class with
ChoMetadataRefTypeAttribute. Cinchoo auto-discovers it and loads the specs from
this special class.
[ChoCSVFileHeader]
[ChoCSVRecordObject(Encoding = "Encoding.UTF32", ErrorMode =
ChoErrorMode.ThrowAndStop,
IgnoreFieldValueMode = ChoIgnoreFieldValueMode.All, ThrowAndStopOnMissingField =
false,
ObjectValidationMode = ChoObjectValidationMode.MemberLevel)]
public class EmployeeRecMeta : IChoNotifyRecordRead, IChoValidatable
{
[ChoCSVRecordField(1, FieldName = "id", ErrorMode =
ChoErrorMode.ReportAndContinue )]
[ChoTypeConverter(typeof(IntConverter))]
[Range(1, int.MaxValue, ErrorMessage = "Id must be > 0.")]
[ChoFallbackValue(1)]
public int Id { get; set; }
Manual Configuration
Auto Map Configuration
Attaching MetadataType class
I'm going to show you how to configure the POCO entity class below with each
approach.
config.FileHeaderConfiguration.HasHeaderRecord = true;
config.ThrowAndStopOnMissingField = true;
config.CSVRecordFieldConfigurations.Add(new ChoCSVRecordFieldConfiguration("Id",
1));
config.CSVRecordFieldConfigurations.Add(new ChoCSVRecordFieldConfiguration("Name",
2));
First, define a schema class for the EmployeeRec POCO entity class as below:
This model accounts for everything by defining a MetadataType class and specifying
the CSV configuration parameters declaratively. This is useful when your POCO
entity is sealed or not a partial class. It is also one of the more favorable and
less error-prone approaches to configuring CSV parsing of a POCO entity.
There are two sets of format spec members given to each intrinsic type: one for
loading and another for writing the value, except for the Boolean, Enum, and
DateTime types. These types have only one member for both loading and writing
operations. NumberStyles (optional) is used for loading values from the CSV
stream, and format strings are used for writing values to the CSV stream.
This section briefly covers using NumberStyles for loading CSV data from a stream.
These values are optional. They determine the styles permitted for each type during
parsing of the CSV file. The system automatically figures out how to parse and load
the values from the underlying culture. In odd situations, you may want to override
and set the styles as needed in order to successfully load the file. Refer to MSDN
for more about NumberStyles and its values.
The sample below shows how to use a ChoCurrency CSV field in a POCO entity class.
You can use any valid standard or custom datetime .NET format specification to
parse the datetime CSV values from the file.
Note: As the datetime values contain the CSV separator, they are surrounded with
double quotes to pass parsing.
21.1 WithDelimiter
This API method sets the CSV field separator on CSVReader.
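For example, a sketch of parsing a pipe-delimited file (the headerless file exposes auto-named columns Column1, Column2, per the zero-configuration behavior described earlier):

```csharp
string csv = "1|Tom\n2|Mark";

using (var parser = new ChoCSVReader(new StringReader(csv))
    .WithDelimiter("|"))
{
    foreach (dynamic rec in parser)
        Console.WriteLine(rec.Column1 + ": " + rec.Column2);
}
```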
In the dynamic object model, all the CSV columns will be created and parsed as
string type.
The sample below shows how to specify the encoding on the CSVReader configuration
object to let the reader handle opening the file with the passed encoding value.
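A sketch of the idea, using the Encoding configuration member listed earlier. The ChoCSVRecordConfiguration class name is an assumption based on the config object used elsewhere in this article:

```csharp
var config = new ChoCSVRecordConfiguration();
config.Encoding = Encoding.UTF32;   // open the file with this encoding

using (var parser = new ChoCSVReader("emp.csv", config))
{
    foreach (dynamic rec in parser)
        Console.WriteLine(rec.Dump());
}
```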
22.5. How to handle column headers having spaces or special characters in them?
CSVReader can handle this situation gracefully and load the CSV file without any
issues. When you use the dynamic parser, column names with special characters are
replaced with the '_' character.
ex.
Now you can access members by their original names through indexers, as shown below.
Alternatively, you can provide column name overrides via configuration, as below,
to access them by name.
When using a POCO class for parsing, define the members with CSV columns by
position using ChoCSVRecordFieldAttribute. Below, we define the Memo1 object member
to map to the duplicate Memo column.
Nested CSV:
bcp.BatchSize = 10000;
bcp.BulkCopyTimeout = 0;
bcp.NotifyAfter = 10;
bcp.SqlRowsCopied += delegate (object sender, SqlRowsCopiedEventArgs e)
{
Console.WriteLine(e.RowsCopied.ToString("#,##0") + " rows copied.");
};
bcp.WriteToServer(dr);
}
}
22.15. What is the best way to get column names from CSV file?
The header is stored in the CSVReader.Context object. In order to get the headers,
you must read the first record. The sample below shows how to get the column names
of the CSV file.
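A sketch of that approach; casting the dynamic record to IDictionary&lt;string, object&gt; to enumerate its member names is an assumption about the record type:

```csharp
using (var parser = new ChoCSVReader("emp.csv").WithFirstLineHeader())
{
    dynamic rec = parser.Read();   // read the first record so headers are populated
    var columnNames = ((IDictionary<string, object>)rec).Keys;
    Console.WriteLine(String.Join(", ", columnNames));
}
```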
Emp1.csv
Emp.csv
Let's say DataSet1 is created with an Employees typed DataTable. The sample below
shows how to load the CSV into it.
Date,Count
1/1/2012,1
2/1/2012,3
3/1/2012,10
4/2/2012,6
Define POCO classes matching the records in the CSV file as below
if (kvp.Item1 < 4)
return typeof(LocationDef);
else
return typeof(CountDef);
}
)
)
{
foreach (var rec in p)
Console.WriteLine(ChoUtility.Dump(rec));
}
22.32. How to validate CSV file?
You may need to take a CSV file and validate it before doing any other operation on
it (e.g., saving to a database). Cinchoo gives an API to do validation on it. It
leverages the DataAnnotations/Validation library to perform the CSV validation. The
sample below shows how to do it.
If a string sort will work for you, then just use the Windows SORT command. Sort
the file and be done with it.
If you need to filter and convert the file, specifically the date / custom type
fields, then you would need to write a small conversion program that converts the
fields however you like and rewrites the records. This is where the Cinchoo ETL
framework plays a role, easing the conversion of large CSV files the way you want,
with all possible requirements, with minimal memory footprint and ultra-fast
performance.
In a POCO opt-in model, you must define the class with all the members with
ChoCSVRecordFieldAttribute, as below.
Here are the possible standard formatting specs available to use (a
case-insensitive comparison is performed on the bool values).
The sample below shows how to handle such values and load the file successfully
using the dynamic model.
using (var p = new ChoCSVReader(new StringReader(csv))
.WithFirstLineHeader()
.WithField("Id", fieldType: typeof(int))
.WithField("Date", fieldType: typeof(DateTime), formatText: "yyyyMMdd")
.WithField("IsActive", fieldType: typeof(bool), formatText: "A")
)
{
foreach (var rec in p)
Console.WriteLine(rec.Dump());
}
In a POCO opt-out model, you have to draft the class as below.
In a POCO opt-in model, you must define the class with all the members with
ChoCSVRecordFieldAttribute, as below.
public StudentInfo()
{
Courses = new Course[2];
}
}
public class Course
{
[DisplayName("CreId")]
public string CourseId { get; set; }
[DisplayName("CreName")]
public string CourseName { get; set; }
}
Above, Courses is defined as an array of Course objects, with RangeAttribute
specifying the number of occurrences (in this case, 2 items [0, 1]). Courses must
be initialized with an array of objects.
Here is the sample code to load the CSV into StudentInfo objects
string csv = @"Id, Name, CreId_0, CreName_0, CreId_1,
CreName_1,Grade_1,Grade_2,Grade_3
1, Tom, CI0, CN0, CI1, CN1,A,B,C
2, Mark, CI20, CN20, CI21, CN21,A,B,C
";
using (var r = ChoCSVReader<StudentInfo>.LoadText(csv)
.WithFirstLineHeader()
)
{
foreach (var rec in r)
{
Console.WriteLine(rec.Dump());
}
}
The default array separator is '_'. If your CSV file comes with a different array
separator, you can specify it in Configuration.ArrayIndexSeparator.
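For instance, if the file used '-' instead of '_' as the array separator (the Configuration property path on the reader and the character type of ArrayIndexSeparator are assumptions):

```csharp
string csv = @"Id, Name, CreId-0, CreName-0, CreId-1, CreName-1
1, Tom, CI0, CN0, CI1, CN1";

using (var r = ChoCSVReader<StudentInfo>.LoadText(csv)
    .WithFirstLineHeader())
{
    r.Configuration.ArrayIndexSeparator = '-';  // match the file's separator
    foreach (var rec in r)
        Console.WriteLine(rec.Dump());
}
```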
If the CSV file is an Excel-aware file, you can parse it using ChoCSVReader to
handle these values out of the box. There are a couple of ways you can instruct the
parser.
23.40. Does ChoCSVReader support a multi-line header?
In rare cases, a CSV file may come with a multi-line header. ChoCSVReader handles
it seamlessly. Below is a sample CSV with a multi-line header at lines 1-2.
Then you can consume the CSV file; ChoCSVReader will take care of the plumbing work
to load the data into the object appropriately.
24. ChoTSVReader
ChoTSVReader is a specialized version of the CSV reader, used to parse TSV
(tab-separated) files.
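ChoTSVReader usage mirrors ChoCSVReader. Equivalently, the same parse can be sketched with ChoCSVReader and a tab delimiter, using the WithDelimiter API covered earlier:

```csharp
string tsv = "Id\tName\n1\tTom\n2\tMark";

using (var parser = new ChoCSVReader(new StringReader(tsv))
    .WithFirstLineHeader()
    .WithDelimiter("\t"))
{
    foreach (dynamic rec in parser)
        Console.WriteLine(rec.Id + ": " + rec.Name);
}
```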
25. References
Here is the list of other modules exposed by Cinchoo ETL, for your reference:
1. CSV Reader / Writer
CSV Reader
CSV Writer
2. Fixed Length Reader / Writer
3. Xml Reader / Writer
Xml Reader
Xml Writer
4. JSON Reader / Writer
JSON Reader
JSON Writer - Coming soon
5. Key-Value Reader / Writer
License
This article, along with any associated source code and files, is licensed under
The Code Project Open License (CPOL)
Posted 24 Oct 2016