[A Handy Tool for Parsing PDF Tables] - 创新互联
A handy tool for parsing PDF tables
Current article: [A Handy Tool for Parsing PDF Tables] - 创新互联
Source: http://ybzwz.com/article/csosei.html
- Dependencies
- Code
- Notes
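Dependencies
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>2.0.24</version>
</dependency>
<dependency>
    <groupId>technology.tabula</groupId>
    <artifactId>tabula</artifactId>
    <version>1.0.3</version>
</dependency>
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
    <version>2.9.5</version>
</dependency>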
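Code
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.commons.cli.CommandLine;
import org.apache.commons.cli.CommandLineParser;
import org.apache.commons.cli.DefaultParser;
import org.apache.commons.cli.ParseException;
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.JavaType;
import com.fasterxml.jackson.databind.ObjectMapper;
import technology.tabula.CommandLineApp;

// The class name is illustrative; the original article shows only the method.
public class PdfTableParser {
    private static void parse() throws ParseException, IOException {
        long start = System.currentTimeMillis();
        String src = "C:\\Users\\账单\\表单.pdf";
        // -f=JSON: JSON output; -p=all: every page; -l: lattice mode (tables with ruling lines)
        String[] argsa = new String[]{"-f=JSON", "-p=all", src, "-l"};
        // Alternative: CommandLineApp.main(argsa) writes the result straight to stdout.
        CommandLineParser parser = new DefaultParser();
        CommandLine cmd = parser.parse(CommandLineApp.buildOptions(), argsa);
        // Capture Tabula's JSON output in memory so it can be deserialized.
        StringBuilder stringBuilder = new StringBuilder();
        new CommandLineApp(stringBuilder, cmd).extractTables(cmd);
        ObjectMapper objectMapper = new ObjectMapper();
        JavaType javaType = objectMapper.getTypeFactory().constructParametricType(ArrayList.class, TabulaPageDTO.class);
        // Ignore JSON fields (coordinates, sizes, etc.) that the DTOs do not declare.
        objectMapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
        List<TabulaPageDTO> pages = objectMapper.readValue(stringBuilder.toString(), javaType);
        // Print every row on one line, with line breaks and tabs stripped from each cell.
        pages.stream().flatMap(p -> p.getData().stream()).forEach(row -> {
            row.forEach(a -> System.out.print(a.getText().replaceAll("\r|\n|\t", "").trim() + " "));
            System.out.println();
        });
        long end = System.currentTimeMillis();
        long cost = end - start;
        System.out.println("Parsing took " + cost + " ms");
    }
}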
// Cell DTO
public class TabulaAreaDTO {
    private String text;

    public String getText() {
        return text;
    }

    public void setText(String text) {
        this.text = text;
    }
}
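// Page DTO: "data" holds the rows, and each row is a list of cells
// (requires import java.util.List)
public class TabulaPageDTO {
    private List<List<TabulaAreaDTO>> data;

    public List<List<TabulaAreaDTO>> getData() {
        return data;
    }

    public void setData(List<List<TabulaAreaDTO>> data) {
        this.data = data;
    }
}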
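The row-printing step in parse() can also be tried on its own, without Tabula or Jackson on the classpath. The following is a minimal sketch under stated assumptions: RowTextCleaner and its methods are hypothetical names that do not appear in the original code, and plain nested lists of strings stand in for the deserialized DTOs. Unlike the original loop, it joins cells with a single space and leaves no trailing space at the end of a row.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; not part of Tabula or the original article.
final class RowTextCleaner {

    // Remove carriage returns, line feeds, and tabs inside a cell, then trim both ends.
    static String cleanCell(String text) {
        if (text == null) {
            return "";
        }
        return text.replaceAll("[\\r\\n\\t]", "").trim();
    }

    // Clean every cell of one row and join the cells with single spaces.
    static String joinRow(List<String> cells) {
        return cells.stream()
                .map(RowTextCleaner::cleanCell)
                .collect(Collectors.joining(" "));
    }

    // Flatten the outer list into rows and render each row as one line of text.
    static List<String> toLines(List<List<List<String>>> tables) {
        return tables.stream()
                .flatMap(List::stream)
                .map(RowTextCleaner::joinRow)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the cell texts Tabula would return.
        List<List<List<String>>> tables = Arrays.asList(
                Arrays.asList(
                        Arrays.asList("Date", "Amount"),
                        Arrays.asList("2021-01-\r\n05", "\t12.50 ")));
        toLines(tables).forEach(System.out::println);
    }
}
```

Returning the lines as a list instead of printing them directly makes it easier to write them to a file or to check them in a unit test.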
Notes
If you find this useful, remember to like and bookmark it!