python - Double-quoted elements in a CSV file cannot be read with pandas
I have an input file in which every value is stored as a string.
It is a CSV file, and every entry is enclosed in double quotes.
Example file:
"column1","column2", "column3", "column4", "column5", "column6"
"AM", "07", "1", "SD", "SD", "CR"
"AM", "08", "1,2,3", "PR,SD,SD", "PR,SD,SD", "PR,SD,SD"
"AM", "01", "2", "SD", "SD", "SD"
There are only six columns. Which options do I need to pass to pandas read_csv so that it reads this correctly?
I am currently trying:
import pandas as pd
df = pd.read_csv(file, quotechar='"')
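But this gives me the error message:
CParserError: Error tokenizing data. C error: Expected 6 fields in line 3, saw 14
This apparently means that it is ignoring the '"' characters and parsing every comma as a field separator.
However, for line 3, columns 3 through 6 should be strings that contain commas ("1,2,3", "PR,SD,SD", "PR,SD,SD", "PR,SD,SD").
How do I get pandas.read_csv to parse this correctly?
Thanks.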
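The count of 14 in the error message can be reproduced with a short sketch that uses Python's standard csv module instead of pandas. It assumes that the pandas C tokenizer treats a quote that follows a space the same way csv.reader does, namely as an ordinary character rather than the start of a quoted field. The names SAMPLE and field_counts are hypothetical and exist only for this illustration.

```python
import csv
import io

# The sample file from the question, inlined so the sketch is self-contained.
SAMPLE = '''"column1","column2", "column3", "column4", "column5", "column6"
"AM", "07", "1", "SD", "SD", "CR"
"AM", "08", "1,2,3", "PR,SD,SD", "PR,SD,SD", "PR,SD,SD"
"AM", "01", "2", "SD", "SD", "SD"
'''


def field_counts(text, **reader_kwargs):
    """Return how many fields csv.reader finds on each line of text."""
    return [len(row) for row in csv.reader(io.StringIO(text), **reader_kwargs)]


# By default a field that starts with a space is unquoted, so the quote
# after the space is literal and every comma splits the line.
default_counts = field_counts(SAMPLE)

# With skipinitialspace=True the space after each comma is dropped, the
# quote becomes the first character of the field, and quoting works.
skip_counts = field_counts(SAMPLE, skipinitialspace=True)

print(default_counts)  # [6, 6, 14, 6]
print(skip_counts)     # [6, 6, 6, 6]
```

The third line splits into 14 pieces by default, which matches the number in the error message, and into 6 once the leading space is skipped.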
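Solution:
This will work. It falls back to the python parser because the separator is not a plain character: the fields are separated by a comma that is sometimes followed by a space, so a regular expression is used as the separator, and the C engine does not support regex separators. If you only had commas, it would use the C parser and be much faster.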
In [1]: import csv
   ...: import pandas as pd
In [2]: !cat test.csv
"column1","column2", "column3", "column4", "column5", "column6"
"AM", "07", "1", "SD", "SD", "CR"
"AM", "08", "1,2,3", "PR,SD,SD", "PR,SD,SD", "PR,SD,SD"
"AM", "01", "2", "SD", "SD", "SD"
In [3]: pd.read_csv('test.csv',sep=',\s+',quoting=csv.QUOTE_ALL)
pandas/io/parsers.py:637: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators; you can avoid this warning by specifying engine='python'.
ParserWarning)
Out[3]:
"column1","column2" "column3" "column4" "column5" "column6"
"AM" "07" "1" "SD" "SD" "CR"
"AM" "08" "1,2,3" "PR,SD,SD" "PR,SD,SD" "PR,SD,SD"
"AM" "01" "2" "SD" "SD" "SD"
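In the output above, the quote characters remain part of each value, and the first two header names are joined because no space follows the comma between them. An alternative that is not part of the original answer is the skipinitialspace option of read_csv, which drops the space after each delimiter so that the default quote handling applies. The following is a minimal sketch under that assumption; the file contents are inlined through io.StringIO, and dtype=str is added only because the question says every value is stored as a string.

```python
import io

import pandas as pd

# The sample file from the question, inlined so the sketch is self-contained.
SAMPLE = '''"column1","column2", "column3", "column4", "column5", "column6"
"AM", "07", "1", "SD", "SD", "CR"
"AM", "08", "1,2,3", "PR,SD,SD", "PR,SD,SD", "PR,SD,SD"
"AM", "01", "2", "SD", "SD", "SD"
'''

# skipinitialspace=True skips the space after each comma, so every field
# starts with its quote character and embedded commas stay inside the field.
# dtype=str keeps values such as "07" as text instead of converting to int.
df = pd.read_csv(io.StringIO(SAMPLE), skipinitialspace=True, dtype=str)

print(df.shape)
print(df.loc[1, "column3"])
```

Because the separator stays a plain comma, this variant can use the default C engine and does not trigger the ParserWarning shown above.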
Tags: python, pandas, csv