4 回答
TA贡献1872条经验 获得超3个赞
读取 20 MB CSV 文件并每行实例化一个对象,总耗时不到 1 秒。
细节
你没有定义“慢”这个词。所以我做了一个实验,一个随意的基准测试。
首先,我们创建一个包含 40,000Person
条记录的 20 MB 文件。每个都Person
包含一个法语名字和姓氏、一个UUID和一些任意文本作为描述。数据以UTF-8格式写入CSV文件中的四列。我使用Apache Commons CSV库进行读写。
其次,读取这个写入的文件。每行数据被读入内存,然后用于实例化和收集一个Person
对象。
读取此文件并Person
为每一行实例化对象的总经过时间不到一秒。每行大约需要 20K纳秒。实际上,这包括两次读取文件,因为我们进行扫描以计算数据行数以设置收集实例的初始容量。此外,我们将十六进制字符串输入解析为UUID的 128 位值,因此我们有一些时间花在数据处理上(不仅仅是读取)。
对于 Java 16+,将Person
class定义为record。我们覆盖toString
以避免打印出长description
内容。
record Person ( String givenName , String surname , UUID id , String description )
{
static public String LOREM_IPSUM = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.";
@Override
public String toString ()
{
return "Person{ " +
"givenName='" + givenName + '\'' +
" | surname='" + surname + '\'' +
" | id='" + id + '\'' +
" }";
}
}
对于早期的 Java,编写一个常规Person类。
package work.basil.example;
import java.util.UUID;
public class Person
{
// Static
static public String LOREM_IPSUM = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.";
// Member variables.
public String givenName, surname, description;
public UUID id;
public Person ( String givenName , String surname , UUID id , String description )
{
this.givenName = givenName;
this.surname = surname;
this.id = id;
this.description = description ;
}
@Override
public String toString ()
{
return "Person{ " +
"givenName='" + givenName + '\'' +
" | surname='" + surname + '\'' +
" | id='" + id + '\'' +
" }";
}
}
这是写入然后读取 20 MB 文件的完整应用程序。请学习和批评,因为我很快就完成了这个。我没有仔细检查我的工作。
你会找到一个write方法,一个read方法。该main方法调用两者,并跟踪时间。
package work.basil.example;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVPrinter;
import org.apache.commons.csv.CSVRecord;
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.ThreadLocalRandom;
public class CsvSpeed
{
public List < Person > read ( Path path )
{
// TODO: Add a check for valid file existing.
List < Person > list = List.of(); // Default to empty list.
try
{
// Prepare list.
int initialCapacity = ( int ) Files.lines( path ).count();
list = new ArrayList <>( initialCapacity );
// Read CSV file. For each row, instantiate and collect `DailyProduct`.
BufferedReader reader = Files.newBufferedReader( path );
Iterable < CSVRecord > records = CSVFormat.RFC4180.withFirstRecordAsHeader().parse( reader );
for ( CSVRecord record : records )
{
String givenName = record.get( "givenName" );
String surname = record.get( "surname" );
UUID id = UUID.fromString( record.get( "id" ) );
String description = record.get( "description" );
// Instantiate `Person` object, and collect it.
Person person = new Person( givenName , surname , id , description );
list.add( person );
}
} catch ( IOException e )
{
e.printStackTrace();
}
return list;
}
public void write ( final Path path )
{
ThreadLocalRandom random = ThreadLocalRandom.current();
try ( final CSVPrinter printer = CSVFormat.RFC4180.withHeader( "givenName" , "surname" , "id" , "description" ).print( path , StandardCharsets.UTF_8 ) ; )
{
int limit = 40_000; // 40_000 yields about 20 MB of data.
List < String > givenNames = List.of( "Adrien" , "Aimon" , "Alerion" , "Alexis" , "Alezan" , "Ancil" , "Andre" , "Antoine" , "Archard" , "Aurélien" , "Averill" , "Baptiste" , "Barnard" , "Bartelemy" , "Bastien" , "Baylee" , "Beale" , "Beau" , "Beaumont" , "Beauregard" , "Bellamy" , "Berger" , "Blaize" , "Blondel" , "Boyce" , "Bruce" , "Brunelle" , "Brys" , "Burcet" , "Burnell" , "Burrell" , "Byron" , "Canaan" , "Carden" , "Carolas" , "Cavell" , "Chace" , "Chanler" , "Chante" , "Chappel" , "Charles" , "Chasen" , "Chason" , "Chemin" , "Chene" , "Cher" , "Chevalier" , "Cheyne" , "Clément" , "Clemence" , "Corbin" , "Coty" , "Cygne" , "Damien" , "Dandre" , "Dariel" , "Darl" , "Dauphine" , "Davet" , "Dax" , "Dean" , "Delice" , "Delmon" , "Destin" , "Dominique" , "Donatien" , "Duke" , "Eliott" , "Elroy" , "Enzo" , "Erwan" , "Etalon" , "Ethan" , "Fabron" , "Ferrand" , "Filberte" , "Florent" , "Florian" , "Fontaine" , "Forest" , "Fortune" , "Franchot" , "Francois" , "Fraser" , "Frayne" , "Gaëtan" , "Gabin" , "Gage" , "Gaige" , "Garland" , "Garner" , "Gaston" , "Gauge" , "Gaylord" , "Germain" , "Germaine" , "German" , "Gervaise" , "Giles" , "Gilles" , "Gitan" , "Grosvener" , "Guifford" , "Guion" , "Guy" , "Guzman" , "Henri" , "Holland" , "Hugo" , "Hugues" , "Hyacinthe" , "Jérémy" , "Jacquan" , "Jacques" , "Jacquez" , "Janvier" , "Jardan" , "Jay" , "Jaye" , "Jehan" , "Jemond" , "Jocquez" , "Jonathan" , "Jules" , "Julien" , "Justus" , "Karoly" , "Lado" , "Lafayette" , "Lamond" , "Lancelin" , "Landis" , "Landry" , "Laron" , "Larrimore" , "Laurent" , "LaValle" , "Leandre" , "Leggett" , "Leonce" , "Leron" , "Leverett" , "Lilian" , "Loïc" , "Lorenzo" , "Louis" , "Lowell" , "Luc" , "Lucien" , "Lukas" , "Macaire" , "Mace" , "Mahieu" , "Maison" , "Malleville" , "Manneville" , "Mantel" , "Marc" , "Marcel" , "Marion" , "Marius" , "Markez" , "Markis" , "Marmion" , "Marquis" , "Marquise" , "Marshall" , "Martial" , "Maslin" , "Mason" , "Matheo" , "Mathias" , "Mathys" , "Matthieu" , "Maxence" , "Mayson" , "Mehdi" , "Merle" , "Merville" , "Montague" , "Montaigu" , "Monte" , "Montgomery" , "Montreal" , "Montrel" , "Moore" , "Morel" , "Mortimer" , "Nerville" , "Neuveville" , "Nicolas" , "Noë" , "Noah" , "Noe" , "Norman" , "Norville" , "Nouel" , "Olivier" , "Onfroi" , "Paien" , "Parfait" , "Parnell" , "Pascal" , "Patrice" , "Paul" , "Peppin" , "Percival" , "Percy" , "Pernell" , "Peverell" , "Philipe" , "Pierpont" , "Pierre" , "Pomeroy" , "Prewitt" , "Purvis" , "Quennell" , "Quentin" , "Quincey" , "Quincy" , "Quintin" , "Rémi" , "Rafaelle" , "Ranger" , "Raoul" , "Raphaël" , "Rapier" , "Rawlins" , "Ray" , "Raynard" , "Remi" , "René" , "Renard" , "Rene" , "Reule" , "Reynard" , "Robin" , "Romain" , "Rondel" , "Roy" , "Royal" , "Ruff" , "Rush" , "Russel" , "Rustin" , "Sabastien" , "Sacha" , "Salomon" , "Samuel" , "Satordi" , "Saville" , "Scoville" , "Sebastien" , "Sennett" , "Severin" , "Shant" , "Shantae" , "Sidney" , "Siffre" , "Simeon" , "Simon" , "Sinclair" , "Sofiane" , "Somer" , "Stephane" , "Sully" , "Sydney" , "Sylvain" , "Talbot" , "Talon" , "Telford" , "Tempest" , "Teppo" , "Théo" , "Thayer" , "Thibault" , "Thibaut" , "Thiery" , "Tiennan" , "Tiennot" , "Titouan" , "Toussaint" , "Travaris" , "Tyson" , "Urson" , "Vachel" , "Valentin" , "Valere" , "Vallis" , "Verdun" , "Victoir" , "Victor" , "Waltier" , "William" , "Wyatt" , "Yanis" , "Yann" , "Yves" , "Yvon" , "Zosime" , "Abrial" , "Abrielle" , "Abril" , "Adele" , "Alair" , "Alerion" , "Amee" , "Angelique" , "Annette" , "Antonella" , "Arian" , "Ariane" , "Armandina" , "Aubree" , "Aubrielle" , "Audra" , "Avril" , "Bella" , "Berneta" , "Bette" , "Blaise" , "Blanche" , "Blasa" , "Bonte" , "Brie" , "Brienne" , "Brigit" , "Cachay" , "Calice" , "Camille" , "Camylle" , "Caprice" , "Caressa" , "Caroline" , "Catin" , "Celesta" , "Celeste" , "Cera" , "Cerise" , "Chablis" , "Chalice" , "Chambray" , "Champagne" , "Chandell" , "Chaney" , "Chantal" , "Chante" , "Chanterelle" , "Chantile" , "Chantilly" , "Chantrice" , "Charla" , "Charlotte" , "Charmane" , "Chaton" , "Chemin" , "Chenetta" , "Cher" , "Chere" , "Cheri" , "Cheryl" , "Christine" , "Cidney" , "Cinderella" , "Claire" , "Claudette" , "Colette" , "Cordelle" , "Cydnee" , "Daeja" , "Daija" , "Daja" , "Damzel" , "Darelle" , "Darlene" , "Darselle" , "Dejanelle" , "Deleena" , "Delice" , "Demeri" , "Deni" , "Denise" , "Desgracias" , "Desire" , "Desiree" , "Destanee" , "Destiny" , "Dior" , "Domanique" , "Dominique" , "Elaina" , "Elaine" , "Elayna" , "Elise" , "Eloisa" , "Elyse" , "Emeline" , "Emmaline" , "Emmeline" , "Estella" , "Estrella" , "Etiennette" , "Evette" , "Fabienne" , "Fabrienne" , "Fanchon" , "Fancy" , "Fawna" , "Fayana" , "Fayette" , "Fifi" , "Fleur" , "Fleurette" , "Fontanna" , "Fosette" , "Francine" , "Frederique" , "Gabriel" , "Gabriele" , "Gabrielle" , "Gaby" , "Garcelle" , "Gena" , "Genie" , "Georgette" , "Germaine" , "Gervaise" , "Gitana" , "Harriet" , "Heloisa" , "Holland" , "Honnetta" , "Isabelle" , "Ivette" , "Ivonne" , "Jacqueena" , "Jacquetta" , "Jacquiline" , "Jacyline" , "Jaime" , "Jakqueline" , "Janeen" , "Janelly" , "Janina" , "Janiqua" , "Janique" , "Jannnelle" , "Jaquita" , "Jardena" , "Jeanetta" , "Jermaine" , "Jessamine" , "Jewel" , "Jewell" , "Joli" , "Jolie" , "Josephine" , "Jozephine" , "Julieta" , "Karessa" , "Karmaine" , "Klara" , "Laine" , "Lanelle" , "Laramie" , "Layne" , "Layney" , "Leala" , "Leonette" , "Lissette" , "Lizette" , "Lourdes" , "Lucienne" , "Ly" , "Lyla" , "Lysette" , "Madelaine" , "Malerie" , "Manette" , "Marais" , "Marcelle" , "Marché" , "Mardi" , "Margo" , "Marguerite" , "Marie" , "Marie Claude" , "Marie Frances" , "Marie Joelle" , "Marie Pascale" , "Marie Sophie" , "Marjolaine" , "Marquise" , "Marvella" , "Mathieu" , "Matisse" , "Maurelle" , "Maurissa" , "Mavis" , "Melisande" , "Michelle" , "Miette" , "Mignon" , "Mimi" , "Mirya" , "Monet" , "Moniqua" , "Monteen" , "Musetta" , "Myrlie" , "Nadeen" , "Nadia" , "Nadiyah" , "Naeva" , "Nanon" , "Natalle" , "Naudia" , "Nettie" , "Nicholas" , "Nicki" , "Nicky" , "Nicole" , "Nicolette" , "Nicolina" , "Nicolle" , "Nikolette" , "Ninette" , "Ninon" , "Noelle" , "Nycole" , "Odelette" , "Opaline" , "Orane" , "Orva" , "Page" , "Parisa" , "Parnel" , "Parris" , "Patrice" , "Peridot" , "Pippi" , "Prairie" , "Rachele" , "Rachelle" , "Racquel" , "Raphaelle" , "Raquelle" , "Remi" , "Renée" , "Renea" , "Renelle" , "Renita" , "Risette" , "Rochelle" , "Romy" , "Rosabel" , "Rosiclara" , "Ruba" , "Russhell" , "Saleena" , "Salina" , "Satin" , "Sedona" , "Serene" , "Shandelle" , "Shanta" , "Shante" , "Shariah" , "Sharita" , "Sharleen" , "Sheree" , "Shereen" , "Sherell" , "Sherice" , "Sherry" , "Sidnee" , "Sidney" , "Sidnie" , "Sidonie" , "Sinclaire" , "Solange" , "Solen" , "Sorrel" , "Suzette" , "Sydnee" , "Sydney" , "Tallis" , "Tempest" , "Toinette" , "Turquoise" , "Veronique" , "Vignette" , "Villette" , "Violeta" , "Virginie" , "Voleta" , "Vonny" );
List < String > surnames = List.of( "Arceneau" , "Aucoin" , "Babin" , "Babineaux" , "Benoit" , "Bergeron" , "Bernard" , "Bertrand" , "Bessette" , "Blanc" , "Blanchard" , "Bonnet" , "Boucher" , "Bourg" , "Bourque" , "Boutin" , "Bouvier" , "Braud" , "Broussard" , "Brun" , "Chevalier" , "David" , "Depaul" , "Desmarais" , "Disney" , "Dubois" , "Dupont" , "Dupuis" , "Durand" , "Fortescue" , "Fournier" , "Garnier" , "Gaudet" , "Gillet" , "Gillette" , "Girard" , "Gravois" , "Grosvenor" , "Lambert" , "Landry" , "Laroche" , "Laurent" , "Lefevre" , "Leroy" , "Leveque" , "Lisle" , "Martin" , "Michel" , "Molyneux" , "Moreau" , "Morel" , "Neville" , "Pelletier" , "Petit" , "Prideux" , "Renard" , "Richard" , "Robert" , "Rousseau" , "Roux" , "Rufus" , "Simon" , "Thomas" );
for ( int i = 1 ; i <= limit ; i++ )
{
String givenName = givenNames.get( random.nextInt( 0 , givenNames.size() ) );
String surname = surnames.get( random.nextInt( 0 , surnames.size() ) );
UUID id = UUID.randomUUID();
String description = Person.LOREM_IPSUM;
printer.printRecord( givenName , surname , id , description );
}
} catch ( IOException e )
{
e.printStackTrace();
}
}
public static void main ( final String[] args )
{
// Launch the app.
CsvSpeed app = new CsvSpeed();
// Write.
String when = Instant.now().truncatedTo( ChronoUnit.SECONDS ).toString().replace( ":" , "•" );
Path pathOutput = Paths.get( "/Users/basilbourque/persons.csv" );
app.write( pathOutput );
System.out.println( "Writing file: " + pathOutput );
// Read.
long start = System.nanoTime();
Path pathInput = Paths.get( "/Users/basilbourque/persons.csv" );
List < Person > list = app.read( pathInput );
long stop = System.nanoTime();
// Time.
long elapsed = ( stop - start );
Duration d = Duration.ofNanos( elapsed );
System.out.println( "Reading elapsed: " + d );
System.out.println( "Reading took nanos per row: " + ( elapsed / list.size() ) );
System.out.println( "nanos elapsed: " + elapsed + " | list.size: " + list.size() );
}
}
运行时:
写入文件:/Users/basilbourque/persons.csv
读数经过:PT0.857816234S
每行读取纳秒数:21445
纳秒已过:857816234 | 列表大小:40000
技术栈:
Java 11.0.2 — Azul Systems 的Zulu(从 OpenJDK 构建)
在 IntelliJ 2019.1 中运行
MacBook Pro(视网膜显示屏,15 英寸,2013 年末)
处理器:2.3 GHz Intel Core i7(4 核,8 hyper)
16 GB 1600 MHz DDR3
存储:Apple 内置固态
TA贡献2051条经验 获得超10个赞
在csv-parsers-comparison上,我们可以找到CSV
Reader
/ Writer
-s 之间的比较。最快的是uniVocity CSV parser
。第三个是Jackson
我个人比较喜欢的。使用@Basil Bourque
很好的示例,我对其进行了一些更改并使用了Jackson
类。方法读取返回MappingIterator
,您可以使用它来初始化堆对象(请参阅我如何将元素添加到List
)。我没有包括时间详细信息,但您可以使用 Basil 和此解决方案自己完成:
import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.databind.ObjectReader;
import com.fasterxml.jackson.databind.ObjectWriter;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;
import java.io.File;
import java.io.FileWriter;
import java.time.Duration;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.ThreadLocalRandom;
public class CsvSpeed {
public static void main(String[] args) throws Exception {
File csvFile = new File("./resource/persons.csv").getAbsoluteFile();
CsvSchema schema = CsvSchema.builder()
.addColumn("givenName")
.addColumn("surname")
.addColumn("id")
.addColumn("description")
.build().withHeader();
CsvSpeed csvSpeed = new CsvSpeed();
csvSpeed.write(csvFile, schema);
// Read.
long start = System.nanoTime();
MappingIterator<Person> personMappingIterator = csvSpeed.read(csvFile, schema);
List<Person> persons = new ArrayList<>(40_000);
personMappingIterator.forEachRemaining(persons::add);
long stop = System.nanoTime();
System.out.println(persons.size());
// Time.
long elapsed = (stop - start);
Duration d = Duration.ofNanos(elapsed);
System.out.println("Reading elapsed: " + d);
System.out.println("Reading took nanos per row: " + (elapsed / persons.size()));
System.out.println("nanos elapsed: " + elapsed + " | list.size: " + persons.size());
}
public MappingIterator<Person> read(final File path, CsvSchema schema) throws Exception {
CsvMapper csvMapper = new CsvMapper();
ObjectReader reader = csvMapper.readerFor(Person.class).with(schema);
return reader.readValues(path);
}
public void write(final File path, CsvSchema schema) throws Exception {
ThreadLocalRandom random = ThreadLocalRandom.current();
CsvMapper csvMapper = new CsvMapper();
ObjectWriter writer = csvMapper.writerFor(Person.class).with(schema);
try (FileWriter fileWriter = new FileWriter(path)) {
List<String> givenNames = Arrays.asList("Adrien", "Aimon", "Alerion", "Alexis", "Alezan", "Ancil", "Andre", "Antoine", "Archard", "Aurélien", "Averill", "Baptiste", "Barnard", "Bartelemy", "Bastien", "Baylee", "Beale", "Beau", "Beaumont", "Beauregard", "Bellamy", "Berger", "Blaize", "Blondel", "Boyce", "Bruce", "Brunelle", "Brys", "Burcet", "Burnell", "Burrell", "Byron", "Canaan", "Carden", "Carolas", "Cavell", "Chace", "Chanler", "Chante", "Chappel", "Charles", "Chasen", "Chason", "Chemin", "Chene", "Cher", "Chevalier", "Cheyne", "Clément", "Clemence", "Corbin", "Coty", "Cygne", "Damien", "Dandre", "Dariel", "Darl", "Dauphine", "Davet", "Dax", "Dean", "Delice", "Delmon", "Destin", "Dominique", "Donatien", "Duke", "Eliott", "Elroy", "Enzo", "Erwan", "Etalon", "Ethan", "Fabron", "Ferrand", "Filberte", "Florent", "Florian", "Fontaine", "Forest", "Fortune", "Franchot", "Francois", "Fraser", "Frayne", "Gaëtan", "Gabin", "Gage", "Gaige", "Garland", "Garner", "Gaston", "Gauge", "Gaylord", "Germain", "Germaine", "German", "Gervaise", "Giles", "Gilles", "Gitan", "Grosvener", "Guifford", "Guion", "Guy", "Guzman", "Henri", "Holland", "Hugo", "Hugues", "Hyacinthe", "Jérémy", "Jacquan", "Jacques", "Jacquez", "Janvier", "Jardan", "Jay", "Jaye", "Jehan", "Jemond", "Jocquez", "Jonathan", "Jules", "Julien", "Justus", "Karoly", "Lado", "Lafayette", "Lamond", "Lancelin", "Landis", "Landry", "Laron", "Larrimore", "Laurent", "LaValle", "Leandre", "Leggett", "Leonce", "Leron", "Leverett", "Lilian", "Loïc", "Lorenzo", "Louis", "Lowell", "Luc", "Lucien", "Lukas", "Macaire", "Mace", "Mahieu", "Maison", "Malleville", "Manneville", "Mantel", "Marc", "Marcel", "Marion", "Marius", "Markez", "Markis", "Marmion", "Marquis", "Marquise", "Marshall", "Martial", "Maslin", "Mason", "Matheo", "Mathias", "Mathys", "Matthieu", "Maxence", "Mayson", "Mehdi", "Merle", "Merville", "Montague", "Montaigu", "Monte", "Montgomery", "Montreal", "Montrel", "Moore", "Morel", "Mortimer", "Nerville", "Neuveville", "Nicolas", "Noë", "Noah", "Noe", "Norman", "Norville", "Nouel", "Olivier", "Onfroi", "Paien", "Parfait", "Parnell", "Pascal", "Patrice", "Paul", "Peppin", "Percival", "Percy", "Pernell", "Peverell", "Philipe", "Pierpont", "Pierre", "Pomeroy", "Prewitt", "Purvis", "Quennell", "Quentin", "Quincey", "Quincy", "Quintin", "Rémi", "Rafaelle", "Ranger", "Raoul", "Raphaël", "Rapier", "Rawlins", "Ray", "Raynard", "Remi", "René", "Renard", "Rene", "Reule", "Reynard", "Robin", "Romain", "Rondel", "Roy", "Royal", "Ruff", "Rush", "Russel", "Rustin", "Sabastien", "Sacha", "Salomon", "Samuel", "Satordi", "Saville", "Scoville", "Sebastien", "Sennett", "Severin", "Shant", "Shantae", "Sidney", "Siffre", "Simeon", "Simon", "Sinclair", "Sofiane", "Somer", "Stephane", "Sully", "Sydney", "Sylvain", "Talbot", "Talon", "Telford", "Tempest", "Teppo", "Théo", "Thayer", "Thibault", "Thibaut", "Thiery", "Tiennan", "Tiennot", "Titouan", "Toussaint", "Travaris", "Tyson", "Urson", "Vachel", "Valentin", "Valere", "Vallis", "Verdun", "Victoir", "Victor", "Waltier", "William", "Wyatt", "Yanis", "Yann", "Yves", "Yvon", "Zosime", "Abrial", "Abrielle", "Abril", "Adele", "Alair", "Alerion", "Amee", "Angelique", "Annette", "Antonella", "Arian", "Ariane", "Armandina", "Aubree", "Aubrielle", "Audra", "Avril", "Bella", "Berneta", "Bette", "Blaise", "Blanche", "Blasa", "Bonte", "Brie", "Brienne", "Brigit", "Cachay", "Calice", "Camille", "Camylle", "Caprice", "Caressa", "Caroline", "Catin", "Celesta", "Celeste", "Cera", "Cerise", "Chablis", "Chalice", "Chambray", "Champagne", "Chandell", "Chaney", "Chantal", "Chante", "Chanterelle", "Chantile", "Chantilly", "Chantrice", "Charla", "Charlotte", "Charmane", "Chaton", "Chemin", "Chenetta", "Cher", "Chere", "Cheri", "Cheryl", "Christine", "Cidney", "Cinderella", "Claire", "Claudette", "Colette", "Cordelle", "Cydnee", "Daeja", "Daija", "Daja", "Damzel", "Darelle", "Darlene", "Darselle", "Dejanelle", "Deleena", "Delice", "Demeri", "Deni", "Denise", "Desgracias", "Desire", "Desiree", "Destanee", "Destiny", "Dior", "Domanique", "Dominique", "Elaina", "Elaine", "Elayna", "Elise", "Eloisa", "Elyse", "Emeline", "Emmaline", "Emmeline", "Estella", "Estrella", "Etiennette", "Evette", "Fabienne", "Fabrienne", "Fanchon", "Fancy", "Fawna", "Fayana", "Fayette", "Fifi", "Fleur", "Fleurette", "Fontanna", "Fosette", "Francine", "Frederique", "Gabriel", "Gabriele", "Gabrielle", "Gaby", "Garcelle", "Gena", "Genie", "Georgette", "Germaine", "Gervaise", "Gitana", "Harriet", "Heloisa", "Holland", "Honnetta", "Isabelle", "Ivette", "Ivonne", "Jacqueena", "Jacquetta", "Jacquiline", "Jacyline", "Jaime", "Jakqueline", "Janeen", "Janelly", "Janina", "Janiqua", "Janique", "Jannnelle", "Jaquita", "Jardena", "Jeanetta", "Jermaine", "Jessamine", "Jewel", "Jewell", "Joli", "Jolie", "Josephine", "Jozephine", "Julieta", "Karessa", "Karmaine", "Klara", "Laine", "Lanelle", "Laramie", "Layne", "Layney", "Leala", "Leonette", "Lissette", "Lizette", "Lourdes", "Lucienne", "Ly", "Lyla", "Lysette", "Madelaine", "Malerie", "Manette", "Marais", "Marcelle", "Marché", "Mardi", "Margo", "Marguerite", "Marie", "Marie Claude", "Marie Frances", "Marie Joelle", "Marie Pascale", "Marie Sophie", "Marjolaine", "Marquise", "Marvella", "Mathieu", "Matisse", "Maurelle", "Maurissa", "Mavis", "Melisande", "Michelle", "Miette", "Mignon", "Mimi", "Mirya", "Monet", "Moniqua", "Monteen", "Musetta", "Myrlie", "Nadeen", "Nadia", "Nadiyah", "Naeva", "Nanon", "Natalle", "Naudia", "Nettie", "Nicholas", "Nicki", "Nicky", "Nicole", "Nicolette", "Nicolina", "Nicolle", "Nikolette", "Ninette", "Ninon", "Noelle", "Nycole", "Odelette", "Opaline", "Orane", "Orva", "Page", "Parisa", "Parnel", "Parris", "Patrice", "Peridot", "Pippi", "Prairie", "Rachele", "Rachelle", "Racquel", "Raphaelle", "Raquelle", "Remi", "Renée", "Renea", "Renelle", "Renita", "Risette", "Rochelle", "Romy", "Rosabel", "Rosiclara", "Ruba", "Russhell", "Saleena", "Salina", "Satin", "Sedona", "Serene", "Shandelle", "Shanta", "Shante", "Shariah", "Sharita", "Sharleen", "Sheree", "Shereen", "Sherell", "Sherice", "Sherry", "Sidnee", "Sidney", "Sidnie", "Sidonie", "Sinclaire", "Solange", "Solen", "Sorrel", "Suzette", "Sydnee", "Sydney", "Tallis", "Tempest", "Toinette", "Turquoise", "Veronique", "Vignette", "Villette", "Violeta", "Virginie", "Voleta", "Vonny");
List<String> surnames = Arrays.asList("Arceneau", "Aucoin", "Babin", "Babineaux", "Benoit", "Bergeron", "Bernard", "Bertrand", "Bessette", "Blanc", "Blanchard", "Bonnet", "Boucher", "Bourg", "Bourque", "Boutin", "Bouvier", "Braud", "Broussard", "Brun", "Chevalier", "David", "Depaul", "Desmarais", "Disney", "Dubois", "Dupont", "Dupuis", "Durand", "Fortescue", "Fournier", "Garnier", "Gaudet", "Gillet", "Gillette", "Girard", "Gravois", "Grosvenor", "Lambert", "Landry", "Laroche", "Laurent", "Lefevre", "Leroy", "Leveque", "Lisle", "Martin", "Michel", "Molyneux", "Moreau", "Morel", "Neville", "Pelletier", "Petit", "Prideux", "Renard", "Richard", "Robert", "Rousseau", "Roux", "Rufus", "Simon", "Thomas");
Iterable<Person> persons = () -> {
return new Iterator<Person>() {
int counter = 40_000; //0_000; // 40_000 yields about 20 MB of data.
@Override
public boolean hasNext() {
return counter-- > 0;
}
@Override
public Person next() {
String givenName = givenNames.get(random.nextInt(0, givenNames.size()));
String surname = surnames.get(random.nextInt(0, surnames.size()));
UUID id = UUID.randomUUID();
String description = Person.LOREM_IPSUM;
return new Person(givenName, surname, id, description);
}
};
};
writer.writeValues(fileWriter).writeAll(persons);
}
}
}
class Person {
// Static
static public String LOREM_IPSUM = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.";
// Member variables.
private String givenName, surname, description;
private UUID id;
public Person() {
}
public Person(String givenName, String surname, UUID id, String description) {
this.givenName = givenName;
this.surname = surname;
this.id = id;
this.description = description;
}
public String getGivenName() {
return givenName;
}
public void setGivenName(String givenName) {
this.givenName = givenName;
}
public String getSurname() {
return surname;
}
public void setSurname(String surname) {
this.surname = surname;
}
public String getDescription() {
return description;
}
public void setDescription(String description) {
this.description = description;
}
public UUID getId() {
return id;
}
public void setId(UUID id) {
this.id = id;
}
@Override
public String toString() {
return "Person{ " +
"givenName='" + givenName + '\'' +
" | surname='" + surname + '\'' +
" | id='" + id + '\'' +
" }";
}
}
TA贡献1815条经验 获得超6个赞
除了按照@Basil 的建议使用 java.nio 之外,简单地将FileReader
a包装起来BufferedReader
应该可以实现显着的加速。
FileReader fileReaderMes1 = new BufferedReader( new FileReader(FECHAS[0]));
TA贡献1772条经验 获得超5个赞
这是我的 Basil 提供的解决方案版本,但这使用了 univocity-parsers:
public class CsvSpeed {
public static class Person {
// Static
static public String LOREM_IPSUM = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.";
// Member variables.
@Parsed
public String givenName, surname, description;
public UUID id;
@Parsed
public void id(String id) {
this.id = UUID.fromString(id);
}
@Override
public String toString() {
return "Person{ " +
"givenName='" + givenName + '\'' +
" | surname='" + surname + '\'' +
" | id='" + id + '\'' +
" }";
}
}
public List<Person> read(Path path) {
return new CsvRoutines(Csv.parseRfc4180()).parseAll(Person.class, path.toFile(), "UTF-8", 40_000);
}
public void write(final Path path) {
ThreadLocalRandom random = ThreadLocalRandom.current();
CsvWriter writer = new CsvWriter(path.toFile(), "UTF-8", Csv.writeRfc4180());
writer.writeHeaders("givenName" , "surname" , "id" , "description");
int limit = 40_000; // 40_000 yields about 20 MB of data.
List<String> givenNames = List.of("Adrien", "Aimon", "Alerion", "Alexis", "Alezan", "Ancil", "Andre", "Antoine", "Archard", "Aurélien", "Averill", "Baptiste", "Barnard", "Bartelemy", "Bastien", "Baylee", "Beale", "Beau", "Beaumont", "Beauregard", "Bellamy", "Berger", "Blaize", "Blondel", "Boyce", "Bruce", "Brunelle", "Brys", "Burcet", "Burnell", "Burrell", "Byron", "Canaan", "Carden", "Carolas", "Cavell", "Chace", "Chanler", "Chante", "Chappel", "Charles", "Chasen", "Chason", "Chemin", "Chene", "Cher", "Chevalier", "Cheyne", "Clément", "Clemence", "Corbin", "Coty", "Cygne", "Damien", "Dandre", "Dariel", "Darl", "Dauphine", "Davet", "Dax", "Dean", "Delice", "Delmon", "Destin", "Dominique", "Donatien", "Duke", "Eliott", "Elroy", "Enzo", "Erwan", "Etalon", "Ethan", "Fabron", "Ferrand", "Filberte", "Florent", "Florian", "Fontaine", "Forest", "Fortune", "Franchot", "Francois", "Fraser", "Frayne", "Gaëtan", "Gabin", "Gage", "Gaige", "Garland", "Garner", "Gaston", "Gauge", "Gaylord", "Germain", "Germaine", "German", "Gervaise", "Giles", "Gilles", "Gitan", "Grosvener", "Guifford", "Guion", "Guy", "Guzman", "Henri", "Holland", "Hugo", "Hugues", "Hyacinthe", "Jérémy", "Jacquan", "Jacques", "Jacquez", "Janvier", "Jardan", "Jay", "Jaye", "Jehan", "Jemond", "Jocquez", "Jonathan", "Jules", "Julien", "Justus", "Karoly", "Lado", "Lafayette", "Lamond", "Lancelin", "Landis", "Landry", "Laron", "Larrimore", "Laurent", "LaValle", "Leandre", "Leggett", "Leonce", "Leron", "Leverett", "Lilian", "Loïc", "Lorenzo", "Louis", "Lowell", "Luc", "Lucien", "Lukas", "Macaire", "Mace", "Mahieu", "Maison", "Malleville", "Manneville", "Mantel", "Marc", "Marcel", "Marion", "Marius", "Markez", "Markis", "Marmion", "Marquis", "Marquise", "Marshall", "Martial", "Maslin", "Mason", "Matheo", "Mathias", "Mathys", "Matthieu", "Maxence", "Mayson", "Mehdi", "Merle", "Merville", "Montague", "Montaigu", "Monte", "Montgomery", "Montreal", "Montrel", "Moore", "Morel", "Mortimer", "Nerville", "Neuveville", "Nicolas", "Noë", "Noah", "Noe", "Norman", "Norville", "Nouel", "Olivier", "Onfroi", "Paien", "Parfait", "Parnell", "Pascal", "Patrice", "Paul", "Peppin", "Percival", "Percy", "Pernell", "Peverell", "Philipe", "Pierpont", "Pierre", "Pomeroy", "Prewitt", "Purvis", "Quennell", "Quentin", "Quincey", "Quincy", "Quintin", "Rémi", "Rafaelle", "Ranger", "Raoul", "Raphaël", "Rapier", "Rawlins", "Ray", "Raynard", "Remi", "René", "Renard", "Rene", "Reule", "Reynard", "Robin", "Romain", "Rondel", "Roy", "Royal", "Ruff", "Rush", "Russel", "Rustin", "Sabastien", "Sacha", "Salomon", "Samuel", "Satordi", "Saville", "Scoville", "Sebastien", "Sennett", "Severin", "Shant", "Shantae", "Sidney", "Siffre", "Simeon", "Simon", "Sinclair", "Sofiane", "Somer", "Stephane", "Sully", "Sydney", "Sylvain", "Talbot", "Talon", "Telford", "Tempest", "Teppo", "Théo", "Thayer", "Thibault", "Thibaut", "Thiery", "Tiennan", "Tiennot", "Titouan", "Toussaint", "Travaris", "Tyson", "Urson", "Vachel", "Valentin", "Valere", "Vallis", "Verdun", "Victoir", "Victor", "Waltier", "William", "Wyatt", "Yanis", "Yann", "Yves", "Yvon", "Zosime", "Abrial", "Abrielle", "Abril", "Adele", "Alair", "Alerion", "Amee", "Angelique", "Annette", "Antonella", "Arian", "Ariane", "Armandina", "Aubree", "Aubrielle", "Audra", "Avril", "Bella", "Berneta", "Bette", "Blaise", "Blanche", "Blasa", "Bonte", "Brie", "Brienne", "Brigit", "Cachay", "Calice", "Camille", "Camylle", "Caprice", "Caressa", "Caroline", "Catin", "Celesta", "Celeste", "Cera", "Cerise", "Chablis", "Chalice", "Chambray", "Champagne", "Chandell", "Chaney", "Chantal", "Chante", "Chanterelle", "Chantile", "Chantilly", "Chantrice", "Charla", "Charlotte", "Charmane", "Chaton", "Chemin", "Chenetta", "Cher", "Chere", "Cheri", "Cheryl", "Christine", "Cidney", "Cinderella", "Claire", "Claudette", "Colette", "Cordelle", "Cydnee", "Daeja", "Daija", "Daja", "Damzel", "Darelle", "Darlene", "Darselle", "Dejanelle", "Deleena", "Delice", "Demeri", "Deni", "Denise", "Desgracias", "Desire", "Desiree", "Destanee", "Destiny", "Dior", "Domanique", "Dominique", "Elaina", "Elaine", "Elayna", "Elise", "Eloisa", "Elyse", "Emeline", "Emmaline", "Emmeline", "Estella", "Estrella", "Etiennette", "Evette", "Fabienne", "Fabrienne", "Fanchon", "Fancy", "Fawna", "Fayana", "Fayette", "Fifi", "Fleur", "Fleurette", "Fontanna", "Fosette", "Francine", "Frederique", "Gabriel", "Gabriele", "Gabrielle", "Gaby", "Garcelle", "Gena", "Genie", "Georgette", "Germaine", "Gervaise", "Gitana", "Harriet", "Heloisa", "Holland", "Honnetta", "Isabelle", "Ivette", "Ivonne", "Jacqueena", "Jacquetta", "Jacquiline", "Jacyline", "Jaime", "Jakqueline", "Janeen", "Janelly", "Janina", "Janiqua", "Janique", "Jannnelle", "Jaquita", "Jardena", "Jeanetta", "Jermaine", "Jessamine", "Jewel", "Jewell", "Joli", "Jolie", "Josephine", "Jozephine", "Julieta", "Karessa", "Karmaine", "Klara", "Laine", "Lanelle", "Laramie", "Layne", "Layney", "Leala", "Leonette", "Lissette", "Lizette", "Lourdes", "Lucienne", "Ly", "Lyla", "Lysette", "Madelaine", "Malerie", "Manette", "Marais", "Marcelle", "Marché", "Mardi", "Margo", "Marguerite", "Marie", "Marie Claude", "Marie Frances", "Marie Joelle", "Marie Pascale", "Marie Sophie", "Marjolaine", "Marquise", "Marvella", "Mathieu", "Matisse", "Maurelle", "Maurissa", "Mavis", "Melisande", "Michelle", "Miette", "Mignon", "Mimi", "Mirya", "Monet", "Moniqua", "Monteen", "Musetta", "Myrlie", "Nadeen", "Nadia", "Nadiyah", "Naeva", "Nanon", "Natalle", "Naudia", "Nettie", "Nicholas", "Nicki", "Nicky", "Nicole", "Nicolette", "Nicolina", "Nicolle", "Nikolette", "Ninette", "Ninon", "Noelle", "Nycole", "Odelette", "Opaline", "Orane", "Orva", "Page", "Parisa", "Parnel", "Parris", "Patrice", "Peridot", "Pippi", "Prairie", "Rachele", "Rachelle", "Racquel", "Raphaelle", "Raquelle", "Remi", "Renée", "Renea", "Renelle", "Renita", "Risette", "Rochelle", "Romy", "Rosabel", "Rosiclara", "Ruba", "Russhell", "Saleena", "Salina", "Satin", "Sedona", "Serene", "Shandelle", "Shanta", "Shante", "Shariah", "Sharita", "Sharleen", "Sheree", "Shereen", "Sherell", "Sherice", "Sherry", "Sidnee", "Sidney", "Sidnie", "Sidonie", "Sinclaire", "Solange", "Solen", "Sorrel", "Suzette", "Sydnee", "Sydney", "Tallis", "Tempest", "Toinette", "Turquoise", "Veronique", "Vignette", "Villette", "Violeta", "Virginie", "Voleta", "Vonny");
List<String> surnames = List.of("Arceneau", "Aucoin", "Babin", "Babineaux", "Benoit", "Bergeron", "Bernard", "Bertrand", "Bessette", "Blanc", "Blanchard", "Bonnet", "Boucher", "Bourg", "Bourque", "Boutin", "Bouvier", "Braud", "Broussard", "Brun", "Chevalier", "David", "Depaul", "Desmarais", "Disney", "Dubois", "Dupont", "Dupuis", "Durand", "Fortescue", "Fournier", "Garnier", "Gaudet", "Gillet", "Gillette", "Girard", "Gravois", "Grosvenor", "Lambert", "Landry", "Laroche", "Laurent", "Lefevre", "Leroy", "Leveque", "Lisle", "Martin", "Michel", "Molyneux", "Moreau", "Morel", "Neville", "Pelletier", "Petit", "Prideux", "Renard", "Richard", "Robert", "Rousseau", "Roux", "Rufus", "Simon", "Thomas");
for (int i = 1; i <= limit; i++) {
String givenName = givenNames.get(random.nextInt(0, givenNames.size()));
String surname = surnames.get(random.nextInt(0, surnames.size()));
UUID id = UUID.randomUUID();
String description = Person.LOREM_IPSUM;
writer.writeRow(givenName, surname, id, description);
}
writer.close();
}
public static void main(final String[] args) {
// Launch the app.
CsvSpeed app = new CsvSpeed();
// Write.
String when = Instant.now().truncatedTo(ChronoUnit.SECONDS).toString().replace(":", "•");
Path pathOutput = Paths.get("/tmp/persons.csv");
app.write(pathOutput);
System.out.println("Writing file: " + pathOutput);
// Read.
long start = System.nanoTime();
Path pathInput = Paths.get("/tmp/persons.csv");
List<Person> list = app.read(pathInput);
long stop = System.nanoTime();
// Time.
long elapsed = (stop - start);
Duration d = Duration.ofNanos(elapsed);
System.out.println("Reading elapsed: " + d);
System.out.println("Reading took nanos per row: " + (elapsed / list.size()));
System.out.println("nanos elapsed: " + elapsed + " | list.size: " + list.size());
}
}
在我的机器上运行我得到以下时间:
Writing file: /tmp/persons.csv
Reading elapsed: PT0.230395859S
Reading took nanos per row: 5759
nanos elapsed: 230395859 | list.size: 40000
它没有显示一旦您考虑 JIT 启动和优化代码,解析器可以达到多快。我已更改代码以生成 400K 记录(生成 200 MB 文件)。现在代码打印:
Reading elapsed: PT0.993483883S
Reading took nanos per row: 2483
nanos elapsed: 993483883 | list.size: 400000
并且有 4M 行(几乎 2GB 的数据):
Reading elapsed: PT7.961481755S
Reading took nanos per row: 1990
nanos elapsed: 7961481755 | list.size: 4000000
添加回答
举报