Using Python 3 and pandas 0.12, I'm trying to write several CSV files (7.9 GB in total) to an HDF5 store for later processing. Each CSV file contains about one million rows and 15 columns; the data is mostly strings, with some floats. However, when I try to read the CSV files, I get the following error:

Traceback (most recent call last):
  File "filter-1.py", line 38, in
    to_hdf()
  File "filter-1.py", line 31, in to_hdf
    for chunk in reader:
  File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 578, in __iter__
    yield self.read(self.chunksize)
  File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 608, in read
    ret = self._engine.read(nrows)
  File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 1028, in read
    data = self._reader.read(nrows)
  File "parser.pyx", line 706, in pandas.parser.TextReader.read (pandas\parser.c:6745)
  File "parser.pyx", line 740, in pandas.parser.TextReader._read_low_memory (pandas\parser.c:7146)
  File "parser.pyx", line 781, in pandas.parser.TextReader._read_rows (pandas\parser.c:7568)
  File "parser.pyx", line 768, in pandas.parser.TextReader._tokenize_rows (pa
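The traceback is cut off before the final message, so here is a minimal reproduction that assumes the failure is the C tokenizer's "EOF inside string" error. It targets a current pandas release rather than 0.12 (today the exception is `pandas.errors.ParserError`, and the internal file paths differ), and the sample data is hypothetical: a headerless, tab-separated text in which one field opens with an unmatched `"`, pandas' default `quotechar`. Iterating the chunked reader fails the same way the `for chunk in reader:` loop does.

```python
import io

import pandas as pd
from pandas.errors import ParserError

# Hypothetical headerless, tab-separated sample. The second row opens a field
# with '"' (pandas' default quotechar) and never closes it.
SAMPLE = 'alpha\t1.5\n"beta\t2.5\ngamma\t3.5\n'


def read_in_chunks(text, chunksize=2):
    """Iterate over the text in chunks, like a `for chunk in reader:` loop."""
    reader = pd.read_csv(
        io.StringIO(text), header=None, delimiter="\t", chunksize=chunksize
    )
    return [chunk for chunk in reader]


try:
    read_in_chunks(SAMPLE)
    error_message = None
except ParserError as exc:
    # The C tokenizer reaches end-of-input while still inside the quoted field.
    error_message = str(exc)

print(error_message)
```

Because the opening quote swallows every following delimiter and newline, a single bad field can make the parser read to the end of the file, which is why the error surfaces while a chunk is being read rather than at the offending row.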

fish

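I had a similar problem. The line reported in the "EOF inside string" error contained a string with a single quote mark in it. Adding the option quoting=csv.QUOTE_NONE fixed it. For example:

    import csv

    import pandas as pd

    df = pd.read_csv(csvfile, header=None, delimiter="\t", quoting=csv.QUOTE_NONE, encoding='utf-8')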
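A minimal sketch of the answer's fix applied to chunked reading, assuming a current pandas release and the same kind of hypothetical headerless, tab-separated sample with one unmatched `"`. It keeps the answer's `header=None`, `delimiter="\t"` and `quoting=csv.QUOTE_NONE`, drops `encoding` because the input is an in-memory `io.StringIO`, and adds a hypothetical helper, `find_suspect_lines`, that flags lines with an odd number of quote characters as a rough way to locate the offending rows before deciding on a fix.

```python
import csv
import io

import pandas as pd

# Hypothetical headerless, tab-separated sample with a stray '"' on line 2.
SAMPLE = 'alpha\t1.5\n"beta\t2.5\ngamma\t3.5\n'


def find_suspect_lines(text, quotechar='"'):
    """Hypothetical helper: 1-based numbers of lines with an odd quotechar count.

    This is only a heuristic; it ignores quoted fields that legitimately span lines.
    """
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if line.count(quotechar) % 2 == 1
    ]


def read_without_quoting(text, chunksize=2):
    """Read the text in chunks, treating quote characters as ordinary data."""
    reader = pd.read_csv(
        io.StringIO(text),
        header=None,
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # never start a quoted field on '"'
        chunksize=chunksize,
    )
    # Each chunk is a DataFrame; in the question, this is where each chunk
    # would be appended to the HDF5 store instead of being concatenated.
    return pd.concat(reader, ignore_index=True)


print(find_suspect_lines(SAMPLE))  # [2]
df = read_without_quoting(SAMPLE)
print(df)
```

The trade-off: with `csv.QUOTE_NONE` the stray `"` is kept as part of the value (`'"beta'`), and any field that was legitimately quoted to protect an embedded tab or newline will now be split. It suits files that never rely on quoting; otherwise repairing the offending lines is the safer route.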

2022-02-07 14:58:39