Split CSV file
Description
Split a CSV file into two parts: one containing all entries that satisfy the user-defined condition, the other containing all remaining entries. For each row, the condition parameters determine whether it is accepted or rejected.
To test against multiple values, supply a values file listing the additional values; these are checked in addition to the value given by the ‘value’ parameter. A row is accepted if the test succeeds for at least one of the values.
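The any-of-the-values rule can be sketched as follows. This is a Python illustration, not the code of split-csv-file.rb. It assumes the values file lists one value per line and that blank lines are ignored; `collect_values`, `row_accepted` and `contains` are hypothetical helper names.

```python
from typing import Callable, Iterable


def collect_values(value: str, values_file_lines: Iterable[str] = ()) -> list[str]:
    """Combine the 'value' parameter with the lines of a values file.

    Assumption: one value per line; blank lines are skipped.
    """
    values = [value] if value else []
    for line in values_file_lines:
        stripped = line.strip()
        if stripped:
            values.append(stripped)
    return values


def row_accepted(cell: str, values: list[str],
                 test: Callable[[str, str], bool]) -> bool:
    """Accept the row if the test succeeds for at least one value."""
    return any(test(cell, v) for v in values)


def contains(cell: str, value: str) -> bool:
    """Hypothetical stand-in for the 'contains' operand."""
    return value in cell


# Lines as they might be read from a values file (made-up example).
values = collect_values("__td__target_",
                        ["__putative__gpf_\n", "\n", "__putative__orf_\n"])
print(values)  # ['__td__target_', '__putative__gpf_', '__putative__orf_']
print(row_accepted("__putative__orf_12345", values, contains))  # True
print(row_accepted("__td__decoy_99", values, contains))         # False
```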
Input files
- at least 1 input file (.csv)
- optional: values files (.txt)
Output files
- Accepted entries (accepted.csv)
- Rejected entries (rejected.csv)
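A minimal sketch of how rows could be routed into the two output files, using Python's standard `csv` module. It assumes the header line is copied to both accepted.csv and rejected.csv; `split_csv` and its `predicate` argument are hypothetical, and in-memory streams stand in for the real files.

```python
import csv
import io
from typing import Callable, TextIO


def split_csv(source: TextIO, accepted: TextIO, rejected: TextIO,
              predicate: Callable[[dict], bool]) -> tuple[int, int]:
    """Write the header to both outputs, then send each row to one of them.

    Returns the number of accepted and rejected rows.
    """
    reader = csv.DictReader(source)
    fieldnames = reader.fieldnames or []
    acc_writer = csv.DictWriter(accepted, fieldnames=fieldnames, lineterminator="\n")
    rej_writer = csv.DictWriter(rejected, fieldnames=fieldnames, lineterminator="\n")
    acc_writer.writeheader()
    rej_writer.writeheader()
    n_acc = n_rej = 0
    for row in reader:
        if predicate(row):
            acc_writer.writerow(row)
            n_acc += 1
        else:
            rej_writer.writerow(row)
            n_rej += 1
    return n_acc, n_rej


source = io.StringIO("Item,Cost\nApple,0.25\nFruit hip holster,10.00\n")
accepted, rejected = io.StringIO(), io.StringIO()
print(split_csv(source, accepted, rejected, lambda row: float(row["Cost"]) < 1))  # (1, 1)
print(accepted.getvalue(), end="")  # Item,Cost / Apple,0.25
```

With real files, the input and both outputs would be opened with `newline=""`, as the `csv` module documentation recommends.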
Context
Example
Using the following table as input:
Item | Cost |
---|---|
Apple | 0.25 |
Banana | 0.25 |
Orange | 0.40 |
Fruit hip holster | 10.00 |
With 'item' as the column (or 'Item', since column names are matched case-insensitively), 'contains' as the operand, and 'a' as the value, the following rows are accepted when case sensitivity is enabled (the default):
Item | Cost |
---|---|
Banana | 0.25 |
Orange | 0.40 |
If case sensitivity is disabled ('Be case sensitive' set to 'no'), 'Apple' is accepted as well:
Item | Cost |
---|---|
Apple | 0.25 |
Banana | 0.25 |
Orange | 0.40 |
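The example can be reproduced with the Python sketch below. It hard-codes the table above, looks up the column name case-insensitively, and applies 'contains' with and without case sensitivity. `find_column` and `accepted_items` are hypothetical helpers, not part of the script.

```python
import csv
import io

TABLE = """Item,Cost
Apple,0.25
Banana,0.25
Orange,0.40
Fruit hip holster,10.00
"""


def find_column(header: list[str], name: str) -> int:
    """Return the index of a column, ignoring case in the column name."""
    return [h.lower() for h in header].index(name.lower())


def accepted_items(column: str, value: str, case_sensitive: bool) -> list[str]:
    """Return the Item of every row whose cell contains the value."""
    rows = list(csv.reader(io.StringIO(TABLE)))
    header, body = rows[0], rows[1:]
    idx = find_column(header, column)
    result = []
    for row in body:
        cell, needle = row[idx], value
        if not case_sensitive:
            cell, needle = cell.lower(), needle.lower()
        if needle in cell:
            result.append(row[0])
    return result


print(accepted_items("item", "a", case_sensitive=True))   # ['Banana', 'Orange']
print(accepted_items("Item", "a", case_sensitive=False))  # ['Apple', 'Banana', 'Orange']
```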
Common CSV file problems
CSV files must be plain text files that use a comma (,) as the entry separator and a double quote (") as the optional quote character. A cell is enclosed in quotes when its content contains the entry separator. The first line is expected to be the table header; all following lines are expected to be table rows.
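Python's standard `csv` module uses the same conventions by default (comma separator, double-quote quote character), so it can serve to check how a quoted cell is read and written. The protein and defline data below are made up for illustration.

```python
import csv
import io

# A cell containing the separator must be quoted in the file.
text = 'protein,defline\nP1,"kinase, putative"\n'
reader = csv.reader(io.StringIO(text), delimiter=",", quotechar='"')
header = next(reader)  # first line is the table header
rows = list(reader)    # all following lines are table rows
print(header)  # ['protein', 'defline']
print(rows)    # [['P1', 'kinase, putative']]

# When writing, csv.writer adds quotes only where needed (QUOTE_MINIMAL).
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerow(["P2", "hypothetical, conserved"])
print(out.getvalue(), end="")  # P2,"hypothetical, conserved"
```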
Condition
- Column
  - Examples: peptide, protein, defline, scan count, PBC count, Ratio mean, Ratio SD, Ratio RSD, charge, filename
- Operand
  - Choices: contains (default), starts with, is equal to, is not equal to, is less than, is less than or equal to, is greater than, is greater than or equal to
- Value
  - Examples: __putative__gpf_, __putative__orf_, __td__target_, __td__decoy_
- Be case sensitive
  - Choices: yes (default), no
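The operand choices could be evaluated along the lines of the following Python sketch. The semantics are assumptions, not taken from split-csv-file.rb: disabling case sensitivity lowercases both sides, and the equality and ordering operands compare numerically when both the cell and the value parse as numbers, falling back to string comparison otherwise. `COMPARISONS`, `as_number` and `evaluate_condition` are hypothetical names.

```python
import operator
from typing import Callable, Optional

# Hypothetical mapping from operand choices to comparison functions.
COMPARISONS: dict[str, Callable[[object, object], bool]] = {
    "is equal to": operator.eq,
    "is not equal to": operator.ne,
    "is less than": operator.lt,
    "is less than or equal to": operator.le,
    "is greater than": operator.gt,
    "is greater than or equal to": operator.ge,
}


def as_number(text: str) -> Optional[float]:
    """Return text as a float, or None if it is not numeric."""
    try:
        return float(text)
    except ValueError:
        return None


def evaluate_condition(cell: str, operand: str, value: str,
                       case_sensitive: bool = True) -> bool:
    """Evaluate one condition against one cell (assumed semantics)."""
    if not case_sensitive:
        cell, value = cell.lower(), value.lower()
    if operand == "contains":
        return value in cell
    if operand == "starts with":
        return cell.startswith(value)
    if operand in COMPARISONS:
        compare = COMPARISONS[operand]
        a, b = as_number(cell), as_number(value)
        # Assumption: compare numerically when both sides are numbers.
        if a is not None and b is not None:
            return compare(a, b)
        return compare(cell, value)
    raise ValueError(f"unknown operand: {operand!r}")


print(evaluate_condition("10.00", "is greater than", "2"))     # True (numeric, not lexical)
print(evaluate_condition("0.40", "is equal to", "0.4"))        # True
print(evaluate_condition("Apple", "starts with", "a", False))  # True
```

Numeric comparison matters for columns such as Ratio mean or charge: compared as strings, "10.00" would sort before "2".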
Source code
split-csv-file.rb, split-csv-file.yaml (GitHub)