This article demonstrates how to handle commas in a CSV file that are enclosed between a pair of double-quotes. I've also added handling for new lines or carriage returns that fall between a pair of double-quotes.
Why?
Our use case: we were looping through the rows of a CSV file containing addresses, and those addresses in turn contained commas. Saving the data as a CSV and asking Deluge to parse it into the appropriate columns did not work as expected.
How?
The quick answer is a regex that replaces a comma found between two double-quotes with a custom string (one comma per quoted value on each pass). To be exact:
- v_FormattedData = r_Data.replaceAll("(\"[^\",]+)[,]([^\"]+\")","$1|mySpecialComma|$2",false);
The slightly longer answer is best explained with the following snippet of code:
- // sample data
- v_Test = "00011,Joel Lipman,\"Flat 8, House Corner\",Brummieland";
- // regex to replace a comma found between 2 double-quotes to a string of your choice
- v_FormattedString = v_Test.replaceAll("(\"[^\",]+)[,]([^\"]+\")","$1|mySpecialComma|$2",false);
- info v_FormattedString;
- // yields: 00011,Joel Lipman,"Flat 8|mySpecialComma| House Corner",Brummieland
- // split into a list (string delimited by commas)
- l_StringParts = v_FormattedString.toList();
- // show me column 3 with |mySpecialComma| replaced back to a comma
- info l_StringParts.get(2).replaceAll("|mySpecialComma|", ",", true);
- // yields: "Flat 8, House Corner" (the surrounding double-quotes are still present at this point)
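One thing worth seeing for yourself is what a single pass of this regex does when the quoted value holds more than one comma. The following is a minimal sketch, assuming replaceAll behaves exactly as in the snippet above; the sample row and the variable names v_OnePass and l_Parts are made up for illustration.

```deluge
// sample data: two commas inside the quoted address
v_Test = "00011,Joel Lipman,\"Flat 8, House Corner, My Street\",Brummieland";
// one pass of the regex: only the first comma inside the quotes is replaced
v_OnePass = v_Test.replaceAll("(\"[^\",]+)[,]([^\"]+\")","$1|mySpecialComma|$2",false);
info v_OnePass;
// expected: 00011,Joel Lipman,"Flat 8|mySpecialComma| House Corner, My Street",Brummieland
// splitting at this point still cuts the address in two
l_Parts = v_OnePass.toList();
info l_Parts.size();
// expected: 5, rather than the 4 columns we want
```

The first group stops at the first comma and the second group swallows the rest of the quoted value, so any further commas in that value are left for another pass.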
The long answer is to consider the following code, which generates a sample CSV and then loops through it, storing each row as a data record and outputting it to the screen:
- // generate a sample CSV file
- v_DataCSV = "1,Joel Lipman,Kings Castle,England\n";
- v_DataCSV = v_DataCSV + "11,General Dogsbody,\"Flat 8, House Corner, My Street\",Brummieland\n";
- v_DataCSV = v_DataCSV + "64,Elephant Man,\"123 New Street,\nNew York City\",USA\n";
- f_CSVfile = v_DataCSV.toFile("test.csv");
- //
- // read and process CSV file
- v_FileContent = f_CSVfile.getFileContent();
- l_FileRows = List();
- if(!isBlank(v_FileContent))
- {
- l_FileRows = v_FileContent.toList("\n");
- }
- // loop through each row
- for each r_Data in l_FileRows
- {
- // initialize record
- m_Record = Map();
- m_Record.put("EmployeeID","");
- m_Record.put("Name","");
- m_Record.put("Address","");
- m_Record.put("Territory","");
- //
- // split values by commas
- l_FieldValues = r_Data.toList(",");
- if(l_FieldValues.size()>0)
- {
- m_Record.put("EmployeeID",l_FieldValues.get(0));
- }
- if(l_FieldValues.size()>1)
- {
- m_Record.put("Name",l_FieldValues.get(1));
- }
- if(l_FieldValues.size()>2)
- {
- m_Record.put("Address",l_FieldValues.get(2));
- }
- if(l_FieldValues.size()>3)
- {
- m_Record.put("Territory",l_FieldValues.get(3));
- }
- info m_Record.toString();
- }
Without the regex and the replacement back to commas, the above outputs:
- {"EmployeeID":"1","Name":"Joel Lipman","Address":"Kings Castle","Territory":"England"}
- {"EmployeeID":"11","Name":"General Dogsbody","Address":"\"Flat 8","Territory":" House Corner"}
- {"EmployeeID":"64","Name":"Elephant Man","Address":"\"123 New Street","Territory":""}
- {"EmployeeID":"New York City\"","Name":"USA","Address":"","Territory":""}
Actions
- Do you want to get rid of the escaped double-quotes that no longer serve any purpose in our data sample? Use a non-regex replaceAll, as in the final solution: v_FormattedData = v_FormattedData.replaceAll("\"","",true);
- If you are splitting your CSV rows by "\n", bear in mind that not all new lines/line breaks/carriage returns sit at the end of a row. For example, your CSV may contain multi-line fields, e.g. where the address street is on a different line to the address city but in the same column. The following regex, used in the final solution, replaces new lines found between two double-quotes and not at the end of the row: v_FileContent = v_FileContent.replaceAll("(\"[^\"\n]*)[\r\n](?!(([^\"]*\"){2})*[^\"]*$)","$1|mySpecialNewLine|",false);
- Replace all commas between a pair of double-quotes: I couldn't find a single regex that does this in one go, but applying the same regex a second time to the same value (the second replaceAll on v_FormattedData in the final solution) solved this for the sample data. If you are expecting a third comma in the value, apply the regex again.
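Rather than copying the replaceAll line once per expected comma, the repeated passes can be sketched as a loop over a fixed-size list. This is only a sketch under assumptions: the upper bound of five commas per quoted value is hypothetical, as are the names v_Passes and l_Passes and the sample row. Passes that find nothing left to replace simply return the string unchanged.

```deluge
// sample row with three commas inside one quoted value (made-up data)
v_Row = "11,General Dogsbody,\"Flat 8, House Corner, My Street, Old Town\",Brummieland";
// hypothetical upper bound: one list entry per pass, so five passes in total
v_Passes = "1,2,3,4,5";
l_Passes = v_Passes.toList();
v_FormattedData = v_Row;
for each v_Pass in l_Passes
{
	// each pass protects one more comma inside every quoted value;
	// once none are left the regex no longer matches and nothing changes
	v_FormattedData = v_FormattedData.replaceAll("(\"[^\",]+)[,]([^\"]+\")","$1|mySpecialComma|$2",false);
}
info v_FormattedData;
// expected: 11,General Dogsbody,"Flat 8|mySpecialComma| House Corner|mySpecialComma| My Street|mySpecialComma| Old Town",Brummieland
```

The trade-off is a few wasted passes on rows with fewer commas, in exchange for not having to know the exact comma count in advance.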
The Final Solution:
The following code is a correction to the above: it replaces commas between two double-quotes with a custom string and replaces this back to a comma once the row has been split into columns. It also replaces any new line found between two double-quotes with a single space (note the new line in the third row of the sample data):
- // generate a sample CSV file
- v_DataCSV = "1,Joel Lipman,Kings Castle,England\n";
- v_DataCSV = v_DataCSV + "11,General Dogsbody,\"Flat 8, House Corner, Street\",Brummieland\n";
- v_DataCSV = v_DataCSV + "64,Elephant Man,\"123 New Street,\nNew York City\",USA\n";
- f_CSVfile = v_DataCSV.toFile("test.csv");
- //
- // read and process CSV file
- v_FileContent = f_CSVfile.getFileContent();
- l_FileRows = List();
- if(!isBlank(v_FileContent))
- {
- // replace the new line between two quotes with |mySpecialNewLine|
- v_FileContent = v_FileContent.replaceAll("(\"[^\"\n]*)[\r\n](?!(([^\"]*\"){2})*[^\"]*$)","$1|mySpecialNewLine|",false);
- l_FileRows = v_FileContent.toList("\n");
- }
- // loop through each row
- for each r_Data in l_FileRows
- {
- // initialize record
- m_Record = Map();
- m_Record.put("EmployeeID","");
- m_Record.put("Name","");
- m_Record.put("Address","");
- m_Record.put("Territory","");
- //
- // replace the comma between two quotes with |mySpecialComma|
- v_FormattedData = r_Data.replaceAll("(\"[^\",]+)[,]([^\"]+\")","$1|mySpecialComma|$2",false);
- // again if there could be another comma in the value (repeat if more commas expected)
- v_FormattedData = v_FormattedData.replaceAll("(\"[^\",]+)[,]([^\"]+\")","$1|mySpecialComma|$2",false);
- // replace any double-quotes
- v_FormattedData = v_FormattedData.replaceAll("\"","",true);
- //
- // split values by commas
- l_FieldValues = v_FormattedData.toList(",");
- if(l_FieldValues.size()>0)
- {
- v_ThisColumnValue = l_FieldValues.get(0).trim();
- v_ThisColumnValue = v_ThisColumnValue.replaceAll("|mySpecialComma|", ",", true);
- v_ThisColumnValue = v_ThisColumnValue.replaceAll("|mySpecialNewLine|", " ", true);
- m_Record.put("EmployeeID",v_ThisColumnValue);
- }
- if(l_FieldValues.size()>1)
- {
- v_ThisColumnValue = l_FieldValues.get(1).trim();
- v_ThisColumnValue = v_ThisColumnValue.replaceAll("|mySpecialComma|", ",", true);
- v_ThisColumnValue = v_ThisColumnValue.replaceAll("|mySpecialNewLine|", " ", true);
- m_Record.put("Name",v_ThisColumnValue);
- }
- if(l_FieldValues.size()>2)
- {
- v_ThisColumnValue = l_FieldValues.get(2).trim();
- v_ThisColumnValue = v_ThisColumnValue.replaceAll("|mySpecialComma|", ",", true);
- v_ThisColumnValue = v_ThisColumnValue.replaceAll("|mySpecialNewLine|", " ", true);
- m_Record.put("Address",v_ThisColumnValue);
- }
- if(l_FieldValues.size()>3)
- {
- v_ThisColumnValue = l_FieldValues.get(3).trim();
- v_ThisColumnValue = v_ThisColumnValue.replaceAll("|mySpecialComma|", ",", true);
- v_ThisColumnValue = v_ThisColumnValue.replaceAll("|mySpecialNewLine|", " ", true);
- m_Record.put("Territory",v_ThisColumnValue);
- }
- info m_Record.toString();
- }
With the regex and the replacements in place, the above outputs:
- {"EmployeeID":"1","Name":"Joel Lipman","Address":"Kings Castle","Territory":"England"}
- {"EmployeeID":"11","Name":"General Dogsbody","Address":"Flat 8, House Corner, Street","Territory":"Brummieland"}
- {"EmployeeID":"64","Name":"Elephant Man","Address":"123 New Street, New York City","Territory":"USA"}
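The four near-identical column blocks in the final solution could also be folded into one loop over the column names. The following is a sketch, not a drop-in replacement: it assumes the row has already had its placeholders inserted and its double-quotes stripped, and the names v_ColumnNames, l_ColumnNames, v_ColumnName and v_Index are introduced here for illustration.

```deluge
// a row as it looks after the placeholders are in and the double-quotes are removed
v_FormattedData = "64,Elephant Man,123 New Street|mySpecialComma||mySpecialNewLine|New York City,USA";
l_FieldValues = v_FormattedData.toList(",");
// column names in the same order as the CSV columns
v_ColumnNames = "EmployeeID,Name,Address,Territory";
l_ColumnNames = v_ColumnNames.toList();
m_Record = Map();
v_Index = 0;
for each v_ColumnName in l_ColumnNames
{
	// default to an empty value when the row has fewer fields than columns
	v_ThisColumnValue = "";
	if(l_FieldValues.size() > v_Index)
	{
		v_ThisColumnValue = l_FieldValues.get(v_Index).trim();
		// restore the protected commas and turn protected new lines into a space
		v_ThisColumnValue = v_ThisColumnValue.replaceAll("|mySpecialComma|", ",", true);
		v_ThisColumnValue = v_ThisColumnValue.replaceAll("|mySpecialNewLine|", " ", true);
	}
	m_Record.put(v_ColumnName,v_ThisColumnValue);
	v_Index = v_Index + 1;
}
info m_Record.toString();
```

For the sample row this should produce the same record as the third line of the output above, and adding a column then only means extending v_ColumnNames.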