[A Handy Tool for Parsing PDF Tables] - 创新互联

A Handy Tool for Parsing PDF Tables
    • Dependencies
    • Code
    • Notes

Dependencies
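<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>2.0.24</version>
</dependency>
<dependency>
    <groupId>technology.tabula</groupId>
    <artifactId>tabula</artifactId>
    <version>1.0.3</version>
</dependency>
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
    <version>2.9.5</version>
</dependency>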
Code
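import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.JavaType;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.commons.cli.CommandLine;
import org.apache.commons.cli.CommandLineParser;
import org.apache.commons.cli.DefaultParser;
import org.apache.commons.cli.ParseException;
import technology.tabula.CommandLineApp;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

private static void parse() throws ParseException, IOException {
    long start = System.currentTimeMillis();
    String src = "C:\\Users\\账单\\表单.pdf";
    // -f=JSON: JSON output, -p=all: every page, -l: lattice mode
    String[] argsa = new String[]{"-f=JSON", "-p=all", src, "-l"};
    // CommandLineApp.main(argsa) would print to stdout; capture the output instead
    CommandLineParser parser = new DefaultParser();
    CommandLine cmd = parser.parse(CommandLineApp.buildOptions(), argsa);
    StringBuilder stringBuilder = new StringBuilder();
    new CommandLineApp(stringBuilder, cmd).extractTables(cmd);
    // Deserialize the JSON output into a list of TabulaPageDTO objects
    ObjectMapper objectMapper = new ObjectMapper();
    JavaType javaType = objectMapper.getTypeFactory().constructParametricType(ArrayList.class, TabulaPageDTO.class);
    objectMapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
    List<TabulaPageDTO> pages = objectMapper.readValue(stringBuilder.toString(), javaType);
    // Print each row on one line, stripping line breaks and tabs inside cells
    pages.stream().flatMap(p -> p.getData().stream()).forEach(row -> {
        row.forEach(a -> System.out.print(a.getText().replaceAll("\r|\n|\t", "").trim() + "    "));
        System.out.println();
    });
    long end = System.currentTimeMillis();
    long cost = end - start;
    System.out.println("Parsing time (ms): " + cost);
}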

// Cell DTO
public class TabulaAreaDTO {

    private String text;

    public String getText() {
        return text;
    }

    public void setText(String text) {
        this.text = text;
    }

}

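// Page DTO: "data" holds the rows, and each row is a list of cells
public class TabulaPageDTO {

    private List<List<TabulaAreaDTO>> data;

    public List<List<TabulaAreaDTO>> getData() {
        return data;
    }

    public void setData(List<List<TabulaAreaDTO>> data) {
        this.data = data;
    }

}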
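
The row-printing step in parse() strips line breaks and tabs from each cell before printing it. The following is a minimal sketch of that cleanup as a standalone helper, using only the JDK. The class and method names (CellTextFormatter, cleanCell, formatRow) are hypothetical and are not part of Tabula or of the code above. Joining cells with a tab instead of four spaces is an assumption made here so that empty cells stay visible.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; not part of Tabula or the original post.
class CellTextFormatter {

    // Remove carriage returns, line feeds and tabs inside a cell, then trim.
    // A null cell is treated as empty (an assumption of this sketch).
    static String cleanCell(String text) {
        if (text == null) {
            return "";
        }
        return text.replaceAll("[\\r\\n\\t]", "").trim();
    }

    // Clean every cell of one row and join the results with a tab.
    static String formatRow(List<String> cells) {
        return cells.stream()
                .map(CellTextFormatter::cleanCell)
                .collect(Collectors.joining("\t"));
    }

    public static void main(String[] args) {
        // Sample row with an embedded line break, a trailing tab and a null cell.
        List<String> row = Arrays.asList(" Item\r\nname ", "Amount\t", null);
        System.out.println(formatRow(row));
    }
}
```

In parse(), such a helper could be applied to the values returned by getText() for each row, for example by mapping each TabulaAreaDTO to its text before calling formatRow.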

Notes

If you find this useful, remember to like and bookmark it!



Source: http://ybzwz.com/article/csosei.html