
Fastest way to read a CSV file in Java

四季花海 2022-07-27 20:46:56
I have been trying to read several CSV files (around 20 MB) with openCSV, but so far it has been slow. I am reading 4 CSV files and loading them into a heap that I designed. I would like to know whether there is any other way to do this in less time.

private Heap<VOMovingViolations> datosHeap;

public void loadMovingViolations(){
    Runtime garbage = Runtime.getRuntime();
    garbage.gc();
    try
    {
        FileReader fileReaderMes1 = new FileReader(FECHAS[0]);
        FileReader fileReaderMes2 = new FileReader(FECHAS[1]);
        FileReader fileReaderMes3 = new FileReader(FECHAS[2]);
        FileReader fileReaderMes4 = new FileReader(FECHAS[3]);
        CSVReader enero = new CSVReaderBuilder(fileReaderMes1).withSkipLines(1).build();
        CSVReader febrero = new CSVReaderBuilder(fileReaderMes2).withSkipLines(1).build();
        CSVReader marzo = new CSVReaderBuilder(fileReaderMes3).withSkipLines(1).build();
        CSVReader abril = new CSVReaderBuilder(fileReaderMes4).withSkipLines(1).build();
        String[] row;
        while((row = enero.readNext()) != null)
        {
            int objectId = Integer.parseInt(row[0]);
            int totalPaid = (int)Double.parseDouble(row[9]);
            short fi = Short.parseShort(row[8]);
            short penalty1 = Short.parseShort(row[10]);
            datosHeap.insert(new VOMovingViolations(objectId, totalPaid,  fi,  row[2], row[13],
                        row[12],row[14], row[15], row[4], row[3], penalty1));
        }
        // ...

I would really appreciate any help or ideas anyone can give me.

4 Answers

守着一只汪

Contributed 1872 experience points, received more than 3 likes

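Reading a 20 MB CSV file and instantiating one object per row takes under 1 second in total.

Details

You did not define "slow", so I ran an experiment: a casual benchmark.

First, we create a 20 MB file containing 40,000 Person records. Each Person holds a French given name and surname, a UUID, and some arbitrary text as a description. The data is written as UTF-8 into four columns of a CSV file. I use the Apache Commons CSV library for both writing and reading.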
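The description text contains commas, so that field has to be quoted when it is written. Apache Commons CSV takes care of this in the full program further down; the rule itself can be sketched with the JDK alone. The names `CsvQuoting`, `quote`, and `toLine` are hypothetical, and the sketch covers only the RFC 4180 quoting rule (quote a field that contains a comma, a double quote, CR or LF, and double any embedded quote), not a complete CSV writer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: illustrates RFC 4180 field quoting only, not a full CSV writer.
class CsvQuoting {

    // Wrap a field in double quotes when it contains a comma, a quote, CR or LF,
    // doubling any embedded quotes; otherwise return it unchanged.
    static String quote(String field) {
        boolean needsQuoting = field.indexOf(',') >= 0
                || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0
                || field.indexOf('\r') >= 0;
        if (!needsQuoting) {
            return field;
        }
        return '"' + field.replace("\"", "\"\"") + '"';
    }

    // Join the quoted fields of one record with commas (no line terminator).
    static String toLine(List<String> fields) {
        return fields.stream().map(CsvQuoting::quote).collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Prints: Adrien,Dupont,"Lorem ipsum dolor sit amet, consectetur"
        System.out.println(toLine(Arrays.asList(
                "Adrien", "Dupont", "Lorem ipsum dolor sit amet, consectetur")));
    }
}
```

Only the third field is quoted, because it is the only one that contains a comma.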

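Second, we read the file just written. Each row is read into memory and then used to instantiate and collect a Person object.

The total elapsed time to read this file and instantiate a Person object for each row is under one second, roughly 20,000 nanoseconds per row. That figure actually covers reading the file twice, because we first scan it to count the lines and use that count as the initial capacity of the collection. In addition, we parse each hexadecimal string into the 128-bit value of a UUID, so some of the time goes to data processing rather than to reading alone.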
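The two passes can be sketched with the JDK alone: one pass counts lines to size the list, the second parses each row. This is a simplified illustration rather than the benchmark code. The names `TwoPassRead`, `Row`, and `read` are hypothetical, and the sketch assumes that no field contains a comma, quote, or line break, so `String.split` is enough; real CSV data needs a proper parser.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.UUID;
import java.util.stream.Stream;

class TwoPassRead {

    // Hypothetical row type holding one name and one parsed UUID.
    static final class Row {
        final String name;
        final UUID id;

        Row(String name, UUID id) {
            this.name = name;
            this.id = id;
        }
    }

    static List<Row> read(Path path) throws IOException {
        // Pass 1: count lines (header included) to pre-size the list.
        int lineCount;
        try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
            lineCount = (int) lines.count();
        }
        List<Row> rows = new ArrayList<>(lineCount);

        // Pass 2: skip the header, then parse each data line.
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            reader.readLine(); // header
            String line;
            while ((line = reader.readLine()) != null) {
                String[] fields = line.split(",", -1);
                rows.add(new Row(fields[0], UUID.fromString(fields[1])));
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Write a tiny throwaway file, read it back, then clean up.
        Path path = Files.createTempFile("rows", ".csv");
        Files.write(path, Arrays.asList(
                "name,id",
                "Adrien," + UUID.randomUUID(),
                "Camille," + UUID.randomUUID()), StandardCharsets.UTF_8);
        System.out.println(read(path).size() + " rows read");
        Files.delete(path);
    }
}
```

Both passes use try-with-resources so the file handle is released even if parsing fails partway through.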

For Java 16+, define the Person class as a record. We override toString to avoid printing the long description content.

record Person ( String givenName , String surname , UUID id , String description )
{
    static public String LOREM_IPSUM = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.";

    @Override
    public String toString ()
    {
        return "Person{ " +
                "givenName='" + givenName + '\'' +
                " | surname='" + surname + '\'' +
                " | id='" + id + '\'' +
                " }";
    }
}

For earlier versions of Java, write a conventional Person class.


package work.basil.example;

import java.util.UUID;

public class Person
{
    // Static
    static public String LOREM_IPSUM = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.";

    // Member variables.
    public String givenName, surname, description;
    public UUID id;

    public Person ( String givenName , String surname , UUID id , String description )
    {
        this.givenName = givenName;
        this.surname = surname;
        this.id = id;
        this.description = description ;
    }

    @Override
    public String toString ()
    {
        return "Person{ " +
                "givenName='" + givenName + '\'' +
                " | surname='" + surname + '\'' +
                " | id='" + id + '\'' +
                " }";
    }
}

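Here is the complete app that writes and then reads the 20 MB file. Please study and criticize it, as I whipped this up quickly and have not double-checked my work.

You will find a write method and a read method. The main method calls both and tracks the elapsed time.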


package work.basil.example;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVPrinter;
import org.apache.commons.csv.CSVRecord;

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.Stream;

public class CsvSpeed
{
    public List < Person > read ( Path path )
    {
        // TODO: Add a check for valid file existing.

        List < Person > list = List.of();  // Default to empty list.
        try
        {
            // Prepare list. Count the lines (header included) to size the list up front.
            // try-with-resources closes the stream and releases the file handle.
            int initialCapacity;
            try ( Stream < String > lines = Files.lines( path ) )
            {
                initialCapacity = ( int ) lines.count();
            }
            list = new ArrayList <>( initialCapacity );

            // Read CSV file. For each row, instantiate and collect `Person`.
            try ( BufferedReader reader = Files.newBufferedReader( path ) )
            {
                Iterable < CSVRecord > records = CSVFormat.RFC4180.withFirstRecordAsHeader().parse( reader );
                for ( CSVRecord record : records )
                {
                    String givenName = record.get( "givenName" );
                    String surname = record.get( "surname" );
                    UUID id = UUID.fromString( record.get( "id" ) );
                    String description = record.get( "description" );
                    // Instantiate `Person` object, and collect it.
                    Person person = new Person( givenName , surname , id , description );
                    list.add( person );
                }
            }
        } catch ( IOException e )
        {
            e.printStackTrace();
        }
        return list;
    }

    public void write ( final Path path )
    {
        ThreadLocalRandom random = ThreadLocalRandom.current();
        try ( final CSVPrinter printer = CSVFormat.RFC4180.withHeader( "givenName" , "surname" , "id" , "description" ).print( path , StandardCharsets.UTF_8 ) ; )
        {
            int limit = 40_000;  // 40_000 yields about 20 MB of data.

            List < String > givenNames = List.of( "Adrien" , "Aimon" , "Alerion" , "Alexis" , "Alezan" , "Ancil" , "Andre" , "Antoine" , "Archard" , "Aurélien" , "Averill" , "Baptiste" , "Barnard" , "Bartelemy" , "Bastien" , "Baylee" , "Beale" , "Beau" , "Beaumont" , "Beauregard" , "Bellamy" , "Berger" , "Blaize" , "Blondel" , "Boyce" , "Bruce" , "Brunelle" , "Brys" , "Burcet" , "Burnell" , "Burrell" , "Byron" , "Canaan" , "Carden" , "Carolas" , "Cavell" , "Chace" , "Chanler" , "Chante" , "Chappel" , "Charles" , "Chasen" , "Chason" , "Chemin" , "Chene" , "Cher" , "Chevalier" , "Cheyne" , "Clément" , "Clemence" , "Corbin" , "Coty" , "Cygne" , "Damien" , "Dandre" , "Dariel" , "Darl" , "Dauphine" , "Davet" , "Dax" , "Dean" , "Delice" , "Delmon" , "Destin" , "Dominique" , "Donatien" , "Duke" , "Eliott" , "Elroy" , "Enzo" , "Erwan" , "Etalon" , "Ethan" , "Fabron" , "Ferrand" , "Filberte" , "Florent" , "Florian" , "Fontaine" , "Forest" , "Fortune" , "Franchot" , "Francois" , "Fraser" , "Frayne" , "Gaëtan" , "Gabin" , "Gage" , "Gaige" , "Garland" , "Garner" , "Gaston" , "Gauge" , "Gaylord" , "Germain" , "Germaine" , "German" , "Gervaise" , "Giles" , "Gilles" , "Gitan" , "Grosvener" , "Guifford" , "Guion" , "Guy" , "Guzman" , "Henri" , "Holland" , "Hugo" , "Hugues" , "Hyacinthe" , "Jérémy" , "Jacquan" , "Jacques" , "Jacquez" , "Janvier" , "Jardan" , "Jay" , "Jaye" , "Jehan" , "Jemond" , "Jocquez" , "Jonathan" , "Jules" , "Julien" , "Justus" , "Karoly" , "Lado" , "Lafayette" , "Lamond" , "Lancelin" , "Landis" , "Landry" , "Laron" , "Larrimore" , "Laurent" , "LaValle" , "Leandre" , "Leggett" , "Leonce" , "Leron" , "Leverett" , "Lilian" , "Loïc" , "Lorenzo" , "Louis" , "Lowell" , "Luc" , "Lucien" , "Lukas" , "Macaire" , "Mace" , "Mahieu" , "Maison" , "Malleville" , "Manneville" , "Mantel" , "Marc" , "Marcel" , "Marion" , "Marius" , "Markez" , "Markis" , "Marmion" , "Marquis" , "Marquise" , "Marshall" , "Martial" , "Maslin" , "Mason" , "Matheo" , "Mathias" , "Mathys" , 
"Matthieu" , "Maxence" , "Mayson" , "Mehdi" , "Merle" , "Merville" , "Montague" , "Montaigu" , "Monte" , "Montgomery" , "Montreal" , "Montrel" , "Moore" , "Morel" , "Mortimer" , "Nerville" , "Neuveville" , "Nicolas" , "Noë" , "Noah" , "Noe" , "Norman" , "Norville" , "Nouel" , "Olivier" , "Onfroi" , "Paien" , "Parfait" , "Parnell" , "Pascal" , "Patrice" , "Paul" , "Peppin" , "Percival" , "Percy" , "Pernell" , "Peverell" , "Philipe" , "Pierpont" , "Pierre" , "Pomeroy" , "Prewitt" , "Purvis" , "Quennell" , "Quentin" , "Quincey" , "Quincy" , "Quintin" , "Rémi" , "Rafaelle" , "Ranger" , "Raoul" , "Raphaël" , "Rapier" , "Rawlins" , "Ray" , "Raynard" , "Remi" , "René" , "Renard" , "Rene" , "Reule" , "Reynard" , "Robin" , "Romain" , "Rondel" , "Roy" , "Royal" , "Ruff" , "Rush" , "Russel" , "Rustin" , "Sabastien" , "Sacha" , "Salomon" , "Samuel" , "Satordi" , "Saville" , "Scoville" , "Sebastien" , "Sennett" , "Severin" , "Shant" , "Shantae" , "Sidney" , "Siffre" , "Simeon" , "Simon" , "Sinclair" , "Sofiane" , "Somer" , "Stephane" , "Sully" , "Sydney" , "Sylvain" , "Talbot" , "Talon" , "Telford" , "Tempest" , "Teppo" , "Théo" , "Thayer" , "Thibault" , "Thibaut" , "Thiery" , "Tiennan" , "Tiennot" , "Titouan" , "Toussaint" , "Travaris" , "Tyson" , "Urson" , "Vachel" , "Valentin" , "Valere" , "Vallis" , "Verdun" , "Victoir" , "Victor" , "Waltier" , "William" , "Wyatt" , "Yanis" , "Yann" , "Yves" , "Yvon" , "Zosime" , "Abrial" , "Abrielle" , "Abril" , "Adele" , "Alair" , "Alerion" , "Amee" , "Angelique" , "Annette" , "Antonella" , "Arian" , "Ariane" , "Armandina" , "Aubree" , "Aubrielle" , "Audra" , "Avril" , "Bella" , "Berneta" , "Bette" , "Blaise" , "Blanche" , "Blasa" , "Bonte" , "Brie" , "Brienne" , "Brigit" , "Cachay" , "Calice" , "Camille" , "Camylle" , "Caprice" , "Caressa" , "Caroline" , "Catin" , "Celesta" , "Celeste" , "Cera" , "Cerise" , "Chablis" , "Chalice" , "Chambray" , "Champagne" , "Chandell" , "Chaney" , "Chantal" , "Chante" , "Chanterelle" , "Chantile" , 
"Chantilly" , "Chantrice" , "Charla" , "Charlotte" , "Charmane" , "Chaton" , "Chemin" , "Chenetta" , "Cher" , "Chere" , "Cheri" , "Cheryl" , "Christine" , "Cidney" , "Cinderella" , "Claire" , "Claudette" , "Colette" , "Cordelle" , "Cydnee" , "Daeja" , "Daija" , "Daja" , "Damzel" , "Darelle" , "Darlene" , "Darselle" , "Dejanelle" , "Deleena" , "Delice" , "Demeri" , "Deni" , "Denise" , "Desgracias" , "Desire" , "Desiree" , "Destanee" , "Destiny" , "Dior" , "Domanique" , "Dominique" , "Elaina" , "Elaine" , "Elayna" , "Elise" , "Eloisa" , "Elyse" , "Emeline" , "Emmaline" , "Emmeline" , "Estella" , "Estrella" , "Etiennette" , "Evette" , "Fabienne" , "Fabrienne" , "Fanchon" , "Fancy" , "Fawna" , "Fayana" , "Fayette" , "Fifi" , "Fleur" , "Fleurette" , "Fontanna" , "Fosette" , "Francine" , "Frederique" , "Gabriel" , "Gabriele" , "Gabrielle" , "Gaby" , "Garcelle" , "Gena" , "Genie" , "Georgette" , "Germaine" , "Gervaise" , "Gitana" , "Harriet" , "Heloisa" , "Holland" , "Honnetta" , "Isabelle" , "Ivette" , "Ivonne" , "Jacqueena" , "Jacquetta" , "Jacquiline" , "Jacyline" , "Jaime" , "Jakqueline" , "Janeen" , "Janelly" , "Janina" , "Janiqua" , "Janique" , "Jannnelle" , "Jaquita" , "Jardena" , "Jeanetta" , "Jermaine" , "Jessamine" , "Jewel" , "Jewell" , "Joli" , "Jolie" , "Josephine" , "Jozephine" , "Julieta" , "Karessa" , "Karmaine" , "Klara" , "Laine" , "Lanelle" , "Laramie" , "Layne" , "Layney" , "Leala" , "Leonette" , "Lissette" , "Lizette" , "Lourdes" , "Lucienne" , "Ly" , "Lyla" , "Lysette" , "Madelaine" , "Malerie" , "Manette" , "Marais" , "Marcelle" , "Marché" , "Mardi" , "Margo" , "Marguerite" , "Marie" , "Marie Claude" , "Marie Frances" , "Marie Joelle" , "Marie Pascale" , "Marie Sophie" , "Marjolaine" , "Marquise" , "Marvella" , "Mathieu" , "Matisse" , "Maurelle" , "Maurissa" , "Mavis" , "Melisande" , "Michelle" , "Miette" , "Mignon" , "Mimi" , "Mirya" , "Monet" , "Moniqua" , "Monteen" , "Musetta" , "Myrlie" , "Nadeen" , "Nadia" , "Nadiyah" , "Naeva" , "Nanon" , 
"Natalle" , "Naudia" , "Nettie" , "Nicholas" , "Nicki" , "Nicky" , "Nicole" , "Nicolette" , "Nicolina" , "Nicolle" , "Nikolette" , "Ninette" , "Ninon" , "Noelle" , "Nycole" , "Odelette" , "Opaline" , "Orane" , "Orva" , "Page" , "Parisa" , "Parnel" , "Parris" , "Patrice" , "Peridot" , "Pippi" , "Prairie" , "Rachele" , "Rachelle" , "Racquel" , "Raphaelle" , "Raquelle" , "Remi" , "Renée" , "Renea" , "Renelle" , "Renita" , "Risette" , "Rochelle" , "Romy" , "Rosabel" , "Rosiclara" , "Ruba" , "Russhell" , "Saleena" , "Salina" , "Satin" , "Sedona" , "Serene" , "Shandelle" , "Shanta" , "Shante" , "Shariah" , "Sharita" , "Sharleen" , "Sheree" , "Shereen" , "Sherell" , "Sherice" , "Sherry" , "Sidnee" , "Sidney" , "Sidnie" , "Sidonie" , "Sinclaire" , "Solange" , "Solen" , "Sorrel" , "Suzette" , "Sydnee" , "Sydney" , "Tallis" , "Tempest" , "Toinette" , "Turquoise" , "Veronique" , "Vignette" , "Villette" , "Violeta" , "Virginie" , "Voleta" , "Vonny" );

            List < String > surnames = List.of( "Arceneau" , "Aucoin" , "Babin" , "Babineaux" , "Benoit" , "Bergeron" , "Bernard" , "Bertrand" , "Bessette" , "Blanc" , "Blanchard" , "Bonnet" , "Boucher" , "Bourg" , "Bourque" , "Boutin" , "Bouvier" , "Braud" , "Broussard" , "Brun" , "Chevalier" , "David" , "Depaul" , "Desmarais" , "Disney" , "Dubois" , "Dupont" , "Dupuis" , "Durand" , "Fortescue" , "Fournier" , "Garnier" , "Gaudet" , "Gillet" , "Gillette" , "Girard" , "Gravois" , "Grosvenor" , "Lambert" , "Landry" , "Laroche" , "Laurent" , "Lefevre" , "Leroy" , "Leveque" , "Lisle" , "Martin" , "Michel" , "Molyneux" , "Moreau" , "Morel" , "Neville" , "Pelletier" , "Petit" , "Prideux" , "Renard" , "Richard" , "Robert" , "Rousseau" , "Roux" , "Rufus" , "Simon" , "Thomas" );

            for ( int i = 1 ; i <= limit ; i++ )
            {
                String givenName = givenNames.get( random.nextInt( 0 , givenNames.size() ) );
                String surname = surnames.get( random.nextInt( 0 , surnames.size() ) );
                UUID id = UUID.randomUUID();
                String description = Person.LOREM_IPSUM;
                printer.printRecord( givenName , surname , id , description );
            }
        } catch ( IOException e )
        {
            e.printStackTrace();
        }
    }

    public static void main ( final String[] args )
    {
        // Launch the app.
        CsvSpeed app = new CsvSpeed();

        // Write.
        String when = Instant.now().truncatedTo( ChronoUnit.SECONDS ).toString().replace( ":" , "•" );
        Path pathOutput = Paths.get( "/Users/basilbourque/persons.csv" );
        app.write( pathOutput );
        System.out.println( "Writing file: " + pathOutput );

        // Read.
        long start = System.nanoTime();
        Path pathInput = Paths.get( "/Users/basilbourque/persons.csv" );
        List < Person > list = app.read( pathInput );
        long stop = System.nanoTime();

        // Time.
        long elapsed = ( stop - start );
        Duration d = Duration.ofNanos( elapsed );
        System.out.println( "Reading elapsed: " + d );
        System.out.println( "Reading took nanos per row: " + ( elapsed / list.size() ) );
        System.out.println( "nanos elapsed: " + elapsed + "  |  list.size: " + list.size() );
    }
}

When run:

Writing file: /Users/basilbourque/persons.csv
Reading elapsed: PT0.857816234S
Reading took nanos per row: 21445
nanos elapsed: 857816234  |  list.size: 40000
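These figures agree with one another: 857,816,234 ns divided by 40,000 rows is about 21,445 ns per row, with integer division dropping the remainder. A short check of that arithmetic, where the class name `TimingCheck` and the method `nanosPerRow` are chosen purely for illustration:

```java
import java.time.Duration;

class TimingCheck {

    // Integer division, matching the per-row println in the benchmark.
    static long nanosPerRow(long elapsedNanos, int rows) {
        return elapsedNanos / rows;
    }

    public static void main(String[] args) {
        long elapsed = 857_816_234L; // reported nanoseconds
        int rows = 40_000;           // reported list size

        System.out.println(Duration.ofNanos(elapsed));                    // PT0.857816234S
        System.out.println(nanosPerRow(elapsed, rows));                   // 21445
        System.out.println(Duration.ofNanos(elapsed).toMillis() + " ms"); // 857 ms
    }
}
```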

Tech stack:

  • Java 11.0.2: Zulu by Azul Systems (built from OpenJDK)

  • Run within IntelliJ 2019.1

  • macOS Mojave

  • MacBook Pro (Retina, 15-inch, Late 2013)

  • Processor: 2.3 GHz Intel Core i7 (4 cores, 8 hyper-threads)

  • 16 GB 1600 MHz DDR3

  • Storage: Apple built-in solid-state


2022-07-27
侃侃无极

Contributed 2051 experience points, received more than 10 likes

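The csv-parsers-comparison project compares CSV readers and writers. The fastest is the uniVocity CSV parser. Third is Jackson, which I personally prefer. Building on @Basil Bourque's nice example, I changed it a little to use the Jackson classes. The read method returns a MappingIterator, which you can use to initialize your heap object (see how I add elements to the List). I have not included timing details, but you can measure them yourself by running Basil's solution and this one: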

import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.databind.ObjectReader;
import com.fasterxml.jackson.databind.ObjectWriter;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

import java.io.File;
import java.io.FileWriter;
import java.time.Duration;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.ThreadLocalRandom;

public class CsvSpeed {

    public static void main(String[] args) throws Exception {
        File csvFile = new File("./resource/persons.csv").getAbsoluteFile();

        CsvSchema schema = CsvSchema.builder()
                .addColumn("givenName")
                .addColumn("surname")
                .addColumn("id")
                .addColumn("description")
                .build().withHeader();

        CsvSpeed csvSpeed = new CsvSpeed();
        csvSpeed.write(csvFile, schema);

        // Read.
        long start = System.nanoTime();
        MappingIterator<Person> personMappingIterator = csvSpeed.read(csvFile, schema);
        List<Person> persons = new ArrayList<>(40_000);
        personMappingIterator.forEachRemaining(persons::add);

        long stop = System.nanoTime();

        System.out.println(persons.size());

        // Time.
        long elapsed = (stop - start);
        Duration d = Duration.ofNanos(elapsed);
        System.out.println("Reading elapsed: " + d);
        System.out.println("Reading took nanos per row: " + (elapsed / persons.size()));
        System.out.println("nanos elapsed: " + elapsed + "  |  list.size: " + persons.size());
    }

    public MappingIterator<Person> read(final File path, CsvSchema schema) throws Exception {
        CsvMapper csvMapper = new CsvMapper();

        ObjectReader reader = csvMapper.readerFor(Person.class).with(schema);
        return reader.readValues(path);
    }

    public void write(final File path, CsvSchema schema) throws Exception {
        ThreadLocalRandom random = ThreadLocalRandom.current();

        CsvMapper csvMapper = new CsvMapper();
        ObjectWriter writer = csvMapper.writerFor(Person.class).with(schema);

        try (FileWriter fileWriter = new FileWriter(path)) {

            List<String> givenNames = Arrays.asList("Adrien", "Aimon", "Alerion", "Alexis", "Alezan", "Ancil", "Andre", "Antoine", "Archard", "Aurélien", "Averill", "Baptiste", "Barnard", "Bartelemy", "Bastien", "Baylee", "Beale", "Beau", "Beaumont", "Beauregard", "Bellamy", "Berger", "Blaize", "Blondel", "Boyce", "Bruce", "Brunelle", "Brys", "Burcet", "Burnell", "Burrell", "Byron", "Canaan", "Carden", "Carolas", "Cavell", "Chace", "Chanler", "Chante", "Chappel", "Charles", "Chasen", "Chason", "Chemin", "Chene", "Cher", "Chevalier", "Cheyne", "Clément", "Clemence", "Corbin", "Coty", "Cygne", "Damien", "Dandre", "Dariel", "Darl", "Dauphine", "Davet", "Dax", "Dean", "Delice", "Delmon", "Destin", "Dominique", "Donatien", "Duke", "Eliott", "Elroy", "Enzo", "Erwan", "Etalon", "Ethan", "Fabron", "Ferrand", "Filberte", "Florent", "Florian", "Fontaine", "Forest", "Fortune", "Franchot", "Francois", "Fraser", "Frayne", "Gaëtan", "Gabin", "Gage", "Gaige", "Garland", "Garner", "Gaston", "Gauge", "Gaylord", "Germain", "Germaine", "German", "Gervaise", "Giles", "Gilles", "Gitan", "Grosvener", "Guifford", "Guion", "Guy", "Guzman", "Henri", "Holland", "Hugo", "Hugues", "Hyacinthe", "Jérémy", "Jacquan", "Jacques", "Jacquez", "Janvier", "Jardan", "Jay", "Jaye", "Jehan", "Jemond", "Jocquez", "Jonathan", "Jules", "Julien", "Justus", "Karoly", "Lado", "Lafayette", "Lamond", "Lancelin", "Landis", "Landry", "Laron", "Larrimore", "Laurent", "LaValle", "Leandre", "Leggett", "Leonce", "Leron", "Leverett", "Lilian", "Loïc", "Lorenzo", "Louis", "Lowell", "Luc", "Lucien", "Lukas", "Macaire", "Mace", "Mahieu", "Maison", "Malleville", "Manneville", "Mantel", "Marc", "Marcel", "Marion", "Marius", "Markez", "Markis", "Marmion", "Marquis", "Marquise", "Marshall", "Martial", "Maslin", "Mason", "Matheo", "Mathias", "Mathys", "Matthieu", "Maxence", "Mayson", "Mehdi", "Merle", "Merville", "Montague", "Montaigu", "Monte", "Montgomery", "Montreal", "Montrel", "Moore", "Morel", "Mortimer", "Nerville", 
"Neuveville", "Nicolas", "Noë", "Noah", "Noe", "Norman", "Norville", "Nouel", "Olivier", "Onfroi", "Paien", "Parfait", "Parnell", "Pascal", "Patrice", "Paul", "Peppin", "Percival", "Percy", "Pernell", "Peverell", "Philipe", "Pierpont", "Pierre", "Pomeroy", "Prewitt", "Purvis", "Quennell", "Quentin", "Quincey", "Quincy", "Quintin", "Rémi", "Rafaelle", "Ranger", "Raoul", "Raphaël", "Rapier", "Rawlins", "Ray", "Raynard", "Remi", "René", "Renard", "Rene", "Reule", "Reynard", "Robin", "Romain", "Rondel", "Roy", "Royal", "Ruff", "Rush", "Russel", "Rustin", "Sabastien", "Sacha", "Salomon", "Samuel", "Satordi", "Saville", "Scoville", "Sebastien", "Sennett", "Severin", "Shant", "Shantae", "Sidney", "Siffre", "Simeon", "Simon", "Sinclair", "Sofiane", "Somer", "Stephane", "Sully", "Sydney", "Sylvain", "Talbot", "Talon", "Telford", "Tempest", "Teppo", "Théo", "Thayer", "Thibault", "Thibaut", "Thiery", "Tiennan", "Tiennot", "Titouan", "Toussaint", "Travaris", "Tyson", "Urson", "Vachel", "Valentin", "Valere", "Vallis", "Verdun", "Victoir", "Victor", "Waltier", "William", "Wyatt", "Yanis", "Yann", "Yves", "Yvon", "Zosime", "Abrial", "Abrielle", "Abril", "Adele", "Alair", "Alerion", "Amee", "Angelique", "Annette", "Antonella", "Arian", "Ariane", "Armandina", "Aubree", "Aubrielle", "Audra", "Avril", "Bella", "Berneta", "Bette", "Blaise", "Blanche", "Blasa", "Bonte", "Brie", "Brienne", "Brigit", "Cachay", "Calice", "Camille", "Camylle", "Caprice", "Caressa", "Caroline", "Catin", "Celesta", "Celeste", "Cera", "Cerise", "Chablis", "Chalice", "Chambray", "Champagne", "Chandell", "Chaney", "Chantal", "Chante", "Chanterelle", "Chantile", "Chantilly", "Chantrice", "Charla", "Charlotte", "Charmane", "Chaton", "Chemin", "Chenetta", "Cher", "Chere", "Cheri", "Cheryl", "Christine", "Cidney", "Cinderella", "Claire", "Claudette", "Colette", "Cordelle", "Cydnee", "Daeja", "Daija", "Daja", "Damzel", "Darelle", "Darlene", "Darselle", "Dejanelle", "Deleena", "Delice", "Demeri", "Deni", "Denise", 
"Desgracias", "Desire", "Desiree", "Destanee", "Destiny", "Dior", "Domanique", "Dominique", "Elaina", "Elaine", "Elayna", "Elise", "Eloisa", "Elyse", "Emeline", "Emmaline", "Emmeline", "Estella", "Estrella", "Etiennette", "Evette", "Fabienne", "Fabrienne", "Fanchon", "Fancy", "Fawna", "Fayana", "Fayette", "Fifi", "Fleur", "Fleurette", "Fontanna", "Fosette", "Francine", "Frederique", "Gabriel", "Gabriele", "Gabrielle", "Gaby", "Garcelle", "Gena", "Genie", "Georgette", "Germaine", "Gervaise", "Gitana", "Harriet", "Heloisa", "Holland", "Honnetta", "Isabelle", "Ivette", "Ivonne", "Jacqueena", "Jacquetta", "Jacquiline", "Jacyline", "Jaime", "Jakqueline", "Janeen", "Janelly", "Janina", "Janiqua", "Janique", "Jannnelle", "Jaquita", "Jardena", "Jeanetta", "Jermaine", "Jessamine", "Jewel", "Jewell", "Joli", "Jolie", "Josephine", "Jozephine", "Julieta", "Karessa", "Karmaine", "Klara", "Laine", "Lanelle", "Laramie", "Layne", "Layney", "Leala", "Leonette", "Lissette", "Lizette", "Lourdes", "Lucienne", "Ly", "Lyla", "Lysette", "Madelaine", "Malerie", "Manette", "Marais", "Marcelle", "Marché", "Mardi", "Margo", "Marguerite", "Marie", "Marie Claude", "Marie Frances", "Marie Joelle", "Marie Pascale", "Marie Sophie", "Marjolaine", "Marquise", "Marvella", "Mathieu", "Matisse", "Maurelle", "Maurissa", "Mavis", "Melisande", "Michelle", "Miette", "Mignon", "Mimi", "Mirya", "Monet", "Moniqua", "Monteen", "Musetta", "Myrlie", "Nadeen", "Nadia", "Nadiyah", "Naeva", "Nanon", "Natalle", "Naudia", "Nettie", "Nicholas", "Nicki", "Nicky", "Nicole", "Nicolette", "Nicolina", "Nicolle", "Nikolette", "Ninette", "Ninon", "Noelle", "Nycole", "Odelette", "Opaline", "Orane", "Orva", "Page", "Parisa", "Parnel", "Parris", "Patrice", "Peridot", "Pippi", "Prairie", "Rachele", "Rachelle", "Racquel", "Raphaelle", "Raquelle", "Remi", "Renée", "Renea", "Renelle", "Renita", "Risette", "Rochelle", "Romy", "Rosabel", "Rosiclara", "Ruba", "Russhell", "Saleena", "Salina", "Satin", "Sedona", "Serene", "Shandelle", 
"Shanta", "Shante", "Shariah", "Sharita", "Sharleen", "Sheree", "Shereen", "Sherell", "Sherice", "Sherry", "Sidnee", "Sidney", "Sidnie", "Sidonie", "Sinclaire", "Solange", "Solen", "Sorrel", "Suzette", "Sydnee", "Sydney", "Tallis", "Tempest", "Toinette", "Turquoise", "Veronique", "Vignette", "Villette", "Violeta", "Virginie", "Voleta", "Vonny");

            List<String> surnames = Arrays.asList("Arceneau", "Aucoin", "Babin", "Babineaux", "Benoit", "Bergeron", "Bernard", "Bertrand", "Bessette", "Blanc", "Blanchard", "Bonnet", "Boucher", "Bourg", "Bourque", "Boutin", "Bouvier", "Braud", "Broussard", "Brun", "Chevalier", "David", "Depaul", "Desmarais", "Disney", "Dubois", "Dupont", "Dupuis", "Durand", "Fortescue", "Fournier", "Garnier", "Gaudet", "Gillet", "Gillette", "Girard", "Gravois", "Grosvenor", "Lambert", "Landry", "Laroche", "Laurent", "Lefevre", "Leroy", "Leveque", "Lisle", "Martin", "Michel", "Molyneux", "Moreau", "Morel", "Neville", "Pelletier", "Petit", "Prideux", "Renard", "Richard", "Robert", "Rousseau", "Roux", "Rufus", "Simon", "Thomas");

            Iterable<Person> persons = () -> {
                return new Iterator<Person>() {
                    int counter = 40_000;  // 40_000 yields about 20 MB of data.

                    @Override
                    public boolean hasNext() {
                        // No side effects here: callers may invoke hasNext() more than once per element.
                        return counter > 0;
                    }

                    @Override
                    public Person next() {
                        counter--;  // Consume one element per call.
                        String givenName = givenNames.get(random.nextInt(0, givenNames.size()));
                        String surname = surnames.get(random.nextInt(0, surnames.size()));
                        UUID id = UUID.randomUUID();
                        String description = Person.LOREM_IPSUM;
                        return new Person(givenName, surname, id, description);
                    }
                };
            };
            writer.writeValues(fileWriter).writeAll(persons);
        }
    }
}


class Person {
    // Static
    static public String LOREM_IPSUM = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.";

    // Member variables.
    private String givenName, surname, description;
    private UUID id;

    public Person() {
    }

    public Person(String givenName, String surname, UUID id, String description) {
        this.givenName = givenName;
        this.surname = surname;
        this.id = id;
        this.description = description;
    }

    public String getGivenName() {
        return givenName;
    }

    public void setGivenName(String givenName) {
        this.givenName = givenName;
    }

    public String getSurname() {
        return surname;
    }

    public void setSurname(String surname) {
        this.surname = surname;
    }

    public String getDescription() {
        return description;
    }

    public void setDescription(String description) {
        this.description = description;
    }

    public UUID getId() {
        return id;
    }

    public void setId(UUID id) {
        this.id = id;
    }

    @Override
    public String toString() {
        return "Person{ " +
                "givenName='" + givenName + '\'' +
                " | surname='" + surname + '\'' +
                " | id='" + id + '\'' +
                " }";
    }
}
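The asker needs a heap rather than a List. Because `MappingIterator` implements `java.util.Iterator`, the same `forEachRemaining` call can feed a heap directly. Below is a minimal JDK-only sketch of that pattern. It assumes `java.util.PriorityQueue` as a stand-in for the asker's custom `Heap` class, and a hypothetical counting iterator (`countdown`) in place of the Jackson reader. The iterator keeps `hasNext()` free of side effects, so calling it repeatedly is safe.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

class IteratorToHeap {

    // Hypothetical source: yields the integers limit, limit-1, ..., 1.
    static Iterator<Integer> countdown(final int limit) {
        return new Iterator<Integer>() {
            private int remaining = limit;

            @Override
            public boolean hasNext() {
                return remaining > 0; // no side effects
            }

            @Override
            public Integer next() {
                if (remaining <= 0) {
                    throw new NoSuchElementException();
                }
                return remaining--;
            }
        };
    }

    // Drain any iterator of comparable elements into a min-heap.
    static <T extends Comparable<T>> PriorityQueue<T> toHeap(Iterator<T> source) {
        PriorityQueue<T> heap = new PriorityQueue<>();
        source.forEachRemaining(heap::add);
        return heap;
    }

    public static void main(String[] args) {
        PriorityQueue<Integer> heap = toHeap(countdown(5));
        // Prints: 5 elements, smallest = 1
        System.out.println(heap.size() + " elements, smallest = " + heap.peek());
    }
}
```

With a custom heap, `heap::add` would become a reference to its insert method, for example `datosHeap::insert` in the question's code.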


2022-07-27
红糖糍粑

Contributed 1815 experience points, received more than 6 likes

In addition to using java.nio as @Basil suggested, simply wrapping the FileReader in a BufferedReader should give a significant speedup.

BufferedReader fileReaderMes1 = new BufferedReader( new FileReader(FECHAS[0]));
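For reference, here is a JDK-only sketch of the wrapping pattern, reading a throwaway temp file line by line. The class name `BufferedRead` and the method `countLines` are hypothetical. `Files.newBufferedReader` is the java.nio equivalent; it returns a reader that is already buffered and lets you state the charset explicitly.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class BufferedRead {

    // Count the lines from any Reader, buffering it first and closing it afterwards.
    static int countLines(Reader source) throws IOException {
        int count = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            while (reader.readLine() != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("sample", ".csv");
        Files.write(path, Arrays.asList("a,b", "1,2", "3,4"), StandardCharsets.UTF_8);

        // Classic java.io: a FileReader wrapped in a BufferedReader (inside countLines).
        System.out.println(countLines(new FileReader(path.toFile()))); // 3

        // java.nio: already buffered; the extra wrapper in countLines is harmless.
        System.out.println(countLines(Files.newBufferedReader(path, StandardCharsets.UTF_8))); // 3

        Files.delete(path);
    }
}
```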


2022-07-27
月关宝盒

Contributed 1772 experience points, received more than 5 likes

Here is my version of the solution provided by Basil, but this one uses univocity-parsers:


import com.univocity.parsers.annotations.Parsed;
import com.univocity.parsers.csv.Csv;
import com.univocity.parsers.csv.CsvRoutines;
import com.univocity.parsers.csv.CsvWriter;

import java.nio.file.Path;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.ThreadLocalRandom;

public class CsvSpeed {

public static class Person {
    // Static
    static public String LOREM_IPSUM = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.";

    // Member variables.
    @Parsed
    public String givenName, surname, description;

    public UUID id;

    // Annotated setter: converts the parsed "id" column from text to a UUID.
    @Parsed
    public void id(String id) {
        this.id = UUID.fromString(id);
    }

    @Override
    public String toString() {
        return "Person{ " +
                "givenName='" + givenName + '\'' +
                " | surname='" + surname + '\'' +
                " | id='" + id + '\'' +
                " }";
    }
}

public List<Person> read(Path path) {
    return new CsvRoutines(Csv.parseRfc4180()).parseAll(Person.class, path.toFile(), "UTF-8", 40_000);
}

public void write(final Path path) {
    ThreadLocalRandom random = ThreadLocalRandom.current();
    CsvWriter writer = new CsvWriter(path.toFile(), "UTF-8", Csv.writeRfc4180());
    writer.writeHeaders("givenName" , "surname" , "id" , "description");

    int limit = 40_000;  // 40_000 yields about 20 MB of data.

    List<String> givenNames = List.of("Adrien", "Aimon", "Alerion", "Alexis", "Alezan", "Ancil", "Andre", "Antoine", "Archard", "Aurélien", "Averill", "Baptiste", "Barnard", "Bartelemy", "Bastien", "Baylee", "Beale", "Beau", "Beaumont", "Beauregard", "Bellamy", "Berger", "Blaize", "Blondel", "Boyce", "Bruce", "Brunelle", "Brys", "Burcet", "Burnell", "Burrell", "Byron", "Canaan", "Carden", "Carolas", "Cavell", "Chace", "Chanler", "Chante", "Chappel", "Charles", "Chasen", "Chason", "Chemin", "Chene", "Cher", "Chevalier", "Cheyne", "Clément", "Clemence", "Corbin", "Coty", "Cygne", "Damien", "Dandre", "Dariel", "Darl", "Dauphine", "Davet", "Dax", "Dean", "Delice", "Delmon", "Destin", "Dominique", "Donatien", "Duke", "Eliott", "Elroy", "Enzo", "Erwan", "Etalon", "Ethan", "Fabron", "Ferrand", "Filberte", "Florent", "Florian", "Fontaine", "Forest", "Fortune", "Franchot", "Francois", "Fraser", "Frayne", "Gaëtan", "Gabin", "Gage", "Gaige", "Garland", "Garner", "Gaston", "Gauge", "Gaylord", "Germain", "Germaine", "German", "Gervaise", "Giles", "Gilles", "Gitan", "Grosvener", "Guifford", "Guion", "Guy", "Guzman", "Henri", "Holland", "Hugo", "Hugues", "Hyacinthe", "Jérémy", "Jacquan", "Jacques", "Jacquez", "Janvier", "Jardan", "Jay", "Jaye", "Jehan", "Jemond", "Jocquez", "Jonathan", "Jules", "Julien", "Justus", "Karoly", "Lado", "Lafayette", "Lamond", "Lancelin", "Landis", "Landry", "Laron", "Larrimore", "Laurent", "LaValle", "Leandre", "Leggett", "Leonce", "Leron", "Leverett", "Lilian", "Loïc", "Lorenzo", "Louis", "Lowell", "Luc", "Lucien", "Lukas", "Macaire", "Mace", "Mahieu", "Maison", "Malleville", "Manneville", "Mantel", "Marc", "Marcel", "Marion", "Marius", "Markez", "Markis", "Marmion", "Marquis", "Marquise", "Marshall", "Martial", "Maslin", "Mason", "Matheo", "Mathias", "Mathys", "Matthieu", "Maxence", "Mayson", "Mehdi", "Merle", "Merville", "Montague", "Montaigu", "Monte", "Montgomery", "Montreal", "Montrel", "Moore", "Morel", "Mortimer", "Nerville", "Neuveville", 
"Nicolas", "Noë", "Noah", "Noe", "Norman", "Norville", "Nouel", "Olivier", "Onfroi", "Paien", "Parfait", "Parnell", "Pascal", "Patrice", "Paul", "Peppin", "Percival", "Percy", "Pernell", "Peverell", "Philipe", "Pierpont", "Pierre", "Pomeroy", "Prewitt", "Purvis", "Quennell", "Quentin", "Quincey", "Quincy", "Quintin", "Rémi", "Rafaelle", "Ranger", "Raoul", "Raphaël", "Rapier", "Rawlins", "Ray", "Raynard", "Remi", "René", "Renard", "Rene", "Reule", "Reynard", "Robin", "Romain", "Rondel", "Roy", "Royal", "Ruff", "Rush", "Russel", "Rustin", "Sabastien", "Sacha", "Salomon", "Samuel", "Satordi", "Saville", "Scoville", "Sebastien", "Sennett", "Severin", "Shant", "Shantae", "Sidney", "Siffre", "Simeon", "Simon", "Sinclair", "Sofiane", "Somer", "Stephane", "Sully", "Sydney", "Sylvain", "Talbot", "Talon", "Telford", "Tempest", "Teppo", "Théo", "Thayer", "Thibault", "Thibaut", "Thiery", "Tiennan", "Tiennot", "Titouan", "Toussaint", "Travaris", "Tyson", "Urson", "Vachel", "Valentin", "Valere", "Vallis", "Verdun", "Victoir", "Victor", "Waltier", "William", "Wyatt", "Yanis", "Yann", "Yves", "Yvon", "Zosime", "Abrial", "Abrielle", "Abril", "Adele", "Alair", "Alerion", "Amee", "Angelique", "Annette", "Antonella", "Arian", "Ariane", "Armandina", "Aubree", "Aubrielle", "Audra", "Avril", "Bella", "Berneta", "Bette", "Blaise", "Blanche", "Blasa", "Bonte", "Brie", "Brienne", "Brigit", "Cachay", "Calice", "Camille", "Camylle", "Caprice", "Caressa", "Caroline", "Catin", "Celesta", "Celeste", "Cera", "Cerise", "Chablis", "Chalice", "Chambray", "Champagne", "Chandell", "Chaney", "Chantal", "Chante", "Chanterelle", "Chantile", "Chantilly", "Chantrice", "Charla", "Charlotte", "Charmane", "Chaton", "Chemin", "Chenetta", "Cher", "Chere", "Cheri", "Cheryl", "Christine", "Cidney", "Cinderella", "Claire", "Claudette", "Colette", "Cordelle", "Cydnee", "Daeja", "Daija", "Daja", "Damzel", "Darelle", "Darlene", "Darselle", "Dejanelle", "Deleena", "Delice", "Demeri", "Deni", "Denise", "Desgracias", 
"Desire", "Desiree", "Destanee", "Destiny", "Dior", "Domanique", "Dominique", "Elaina", "Elaine", "Elayna", "Elise", "Eloisa", "Elyse", "Emeline", "Emmaline", "Emmeline", "Estella", "Estrella", "Etiennette", "Evette", "Fabienne", "Fabrienne", "Fanchon", "Fancy", "Fawna", "Fayana", "Fayette", "Fifi", "Fleur", "Fleurette", "Fontanna", "Fosette", "Francine", "Frederique", "Gabriel", "Gabriele", "Gabrielle", "Gaby", "Garcelle", "Gena", "Genie", "Georgette", "Germaine", "Gervaise", "Gitana", "Harriet", "Heloisa", "Holland", "Honnetta", "Isabelle", "Ivette", "Ivonne", "Jacqueena", "Jacquetta", "Jacquiline", "Jacyline", "Jaime", "Jakqueline", "Janeen", "Janelly", "Janina", "Janiqua", "Janique", "Jannnelle", "Jaquita", "Jardena", "Jeanetta", "Jermaine", "Jessamine", "Jewel", "Jewell", "Joli", "Jolie", "Josephine", "Jozephine", "Julieta", "Karessa", "Karmaine", "Klara", "Laine", "Lanelle", "Laramie", "Layne", "Layney", "Leala", "Leonette", "Lissette", "Lizette", "Lourdes", "Lucienne", "Ly", "Lyla", "Lysette", "Madelaine", "Malerie", "Manette", "Marais", "Marcelle", "Marché", "Mardi", "Margo", "Marguerite", "Marie", "Marie Claude", "Marie Frances", "Marie Joelle", "Marie Pascale", "Marie Sophie", "Marjolaine", "Marquise", "Marvella", "Mathieu", "Matisse", "Maurelle", "Maurissa", "Mavis", "Melisande", "Michelle", "Miette", "Mignon", "Mimi", "Mirya", "Monet", "Moniqua", "Monteen", "Musetta", "Myrlie", "Nadeen", "Nadia", "Nadiyah", "Naeva", "Nanon", "Natalle", "Naudia", "Nettie", "Nicholas", "Nicki", "Nicky", "Nicole", "Nicolette", "Nicolina", "Nicolle", "Nikolette", "Ninette", "Ninon", "Noelle", "Nycole", "Odelette", "Opaline", "Orane", "Orva", "Page", "Parisa", "Parnel", "Parris", "Patrice", "Peridot", "Pippi", "Prairie", "Rachele", "Rachelle", "Racquel", "Raphaelle", "Raquelle", "Remi", "Renée", "Renea", "Renelle", "Renita", "Risette", "Rochelle", "Romy", "Rosabel", "Rosiclara", "Ruba", "Russhell", "Saleena", "Salina", "Satin", "Sedona", "Serene", "Shandelle", "Shanta", 
"Shante", "Shariah", "Sharita", "Sharleen", "Sheree", "Shereen", "Sherell", "Sherice", "Sherry", "Sidnee", "Sidney", "Sidnie", "Sidonie", "Sinclaire", "Solange", "Solen", "Sorrel", "Suzette", "Sydnee", "Sydney", "Tallis", "Tempest", "Toinette", "Turquoise", "Veronique", "Vignette", "Villette", "Violeta", "Virginie", "Voleta", "Vonny");

    List<String> surnames = List.of("Arceneau", "Aucoin", "Babin", "Babineaux", "Benoit", "Bergeron", "Bernard", "Bertrand", "Bessette", "Blanc", "Blanchard", "Bonnet", "Boucher", "Bourg", "Bourque", "Boutin", "Bouvier", "Braud", "Broussard", "Brun", "Chevalier", "David", "Depaul", "Desmarais", "Disney", "Dubois", "Dupont", "Dupuis", "Durand", "Fortescue", "Fournier", "Garnier", "Gaudet", "Gillet", "Gillette", "Girard", "Gravois", "Grosvenor", "Lambert", "Landry", "Laroche", "Laurent", "Lefevre", "Leroy", "Leveque", "Lisle", "Martin", "Michel", "Molyneux", "Moreau", "Morel", "Neville", "Pelletier", "Petit", "Prideux", "Renard", "Richard", "Robert", "Rousseau", "Roux", "Rufus", "Simon", "Thomas");

    // Generate `limit` rows, each with a random name, a fresh UUID, and a fixed description.
    for (int i = 1; i <= limit; i++) {
        String givenName = givenNames.get(random.nextInt(0, givenNames.size()));
        String surname = surnames.get(random.nextInt(0, surnames.size()));
        UUID id = UUID.randomUUID();
        String description = Person.LOREM_IPSUM;
        writer.writeRow(givenName, surname, id, description);
    }

    // Close the writer so buffered rows are flushed to disk.
    writer.close();
}


public static void main(final String[] args) {
    // Launch the app.
    CsvSpeed app = new CsvSpeed();

    // Write the sample file.
    Path pathOutput = Paths.get("/tmp/persons.csv");
    app.write(pathOutput);
    System.out.println("Writing file: " + pathOutput);

    // Read the same file back, timing only the read.
    long start = System.nanoTime();
    Path pathInput = Paths.get("/tmp/persons.csv");
    List<Person> list = app.read(pathInput);
    long stop = System.nanoTime();

    // Report total elapsed time and the average cost per row.
    long elapsed = (stop - start);
    Duration d = Duration.ofNanos(elapsed);
    System.out.println("Reading elapsed: " + d);
    System.out.println("Reading took nanos per row: " + (elapsed / list.size()));
    System.out.println("nanos elapsed: " + elapsed + "  |  list.size: " + list.size());
}

}

Running this on my machine, I get the following timings:


Writing file: /tmp/persons.csv

Reading elapsed: PT0.230395859S

Reading took nanos per row: 5759

nanos elapsed: 230395859  |  list.size: 40000

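A single cold run over 40,000 rows does not show how fast the parser can get once the JIT has kicked in and optimized the code. I changed the code to generate 400K records (a 200 MB file). The code now prints: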


Reading elapsed: PT0.993483883S

Reading took nanos per row: 2483

nanos elapsed: 993483883  |  list.size: 400000
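To see the JIT effect without changing the file size, you can also read the same file several times in one JVM and compare the per-row cost of the first run against later runs. The following is only a minimal sketch under stated assumptions: `WarmupTiming`, `readAll`, `writeSample`, and `timeRuns` are hypothetical names, and the reader is a plain JDK `split(",")` stand-in with no quoting support, not the CSV library used in the benchmark above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

public class WarmupTiming {

    // Hypothetical stand-in for the parsing step: naive comma split, no quoted fields.
    static List<String[]> readAll(Path path) throws IOException {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                rows.add(line.split(",", -1));
            }
        }
        return rows;
    }

    // Writes rowCount simple four-column rows to a temporary file and returns its path.
    static Path writeSample(int rowCount) throws IOException {
        Path path = Files.createTempFile("persons", ".csv");
        List<String> lines = new ArrayList<>(rowCount);
        for (int i = 0; i < rowCount; i++) {
            lines.add("Given" + i + ",Surname" + i + "," + UUID.randomUUID() + ",description " + i);
        }
        Files.write(path, lines, StandardCharsets.UTF_8);
        return path;
    }

    // Reads the file `runs` times and returns the elapsed nanoseconds of each run.
    static long[] timeRuns(Path path, int runs) throws IOException {
        long[] elapsed = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            List<String[]> rows = readAll(path);
            elapsed[i] = System.nanoTime() - start;
            if (rows.isEmpty()) {
                throw new IllegalStateException("no rows read");
            }
        }
        return elapsed;
    }

    public static void main(String[] args) throws IOException {
        int rowCount = 40_000;
        Path path = writeSample(rowCount);
        try {
            long[] elapsed = timeRuns(path, 10);
            for (int i = 0; i < elapsed.length; i++) {
                System.out.println("run " + i + ": " + (elapsed[i] / rowCount) + " nanos per row");
            }
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```

The first one or two runs typically include class loading and interpreted execution, so later runs are the better indication of steady-state speed. For serious measurements a harness such as JMH is the usual choice; this loop is only a quick check.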

With 4M rows (almost 2 GB of data), it prints:


Reading elapsed: PT7.961481755S

Reading took nanos per row: 1990

nanos elapsed: 7961481755  |  list.size: 4000000
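The "nanos per row" figures in the three runs are plain integer division of elapsed nanoseconds by row count, and the same two numbers also give rows per second. As a small worked sketch (the class and method names `RowStats`, `nanosPerRow`, and `rowsPerSecond` are hypothetical, and the inputs are the elapsed times and row counts printed above):

```java
import java.time.Duration;

public class RowStats {

    // Integer nanoseconds per row, the same calculation the benchmark prints.
    static long nanosPerRow(long elapsedNanos, long rows) {
        return elapsedNanos / rows;
    }

    // Rows processed per second, derived from the same two numbers.
    static double rowsPerSecond(long elapsedNanos, long rows) {
        return rows * 1_000_000_000.0 / elapsedNanos;
    }

    public static void main(String[] args) {
        // {elapsed nanos, row count} for the 40K, 400K, and 4M runs reported above.
        long[][] runs = {
            {230_395_859L, 40_000L},
            {993_483_883L, 400_000L},
            {7_961_481_755L, 4_000_000L},
        };
        for (long[] run : runs) {
            System.out.printf("%s  %d nanos/row  %.0f rows/s%n",
                    Duration.ofNanos(run[0]),
                    nanosPerRow(run[0], run[1]),
                    rowsPerSecond(run[0], run[1]));
        }
    }
}
```

The per-row cost falls from 5759 to 2483 to 1990 nanoseconds as the row count grows, which is consistent with the fixed startup and JIT warm-up cost being spread over more rows.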


Answered 2022-07-27