Jericho - HTML Parser


LGPL
Cross-platform
Java

Software Overview

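Jericho HTML Parser is a Java library for analysing and manipulating parts of an HTML document, including server-side tags, while reproducing verbatim any unrecognised or invalid HTML. It also provides high-level functions for manipulating HTML forms.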

Sample code:

import net.htmlparser.jericho.*;
import java.util.*;
import java.io.*;
import java.net.*;

// Prints the title and character-encoding details of an HTML document.
public class Encoding {
    public static void main(String[] args) throws Exception {
        // Default input; pass a path or URL as the first argument to override it.
        String sourceUrlString="data/test.html";
        if (args.length==0)
            System.err.println("Using default argument of \""+sourceUrlString+'"');
        else
            sourceUrlString=args[0];
        // An argument without a ':' is treated as a local file path.
        if (sourceUrlString.indexOf(':')==-1) sourceUrlString="file:"+sourceUrlString;
        System.out.println("\nSource URL:");
        System.out.println(sourceUrlString);
        // Load and parse the document from the URL.
        URL url=new URL(sourceUrlString);
        Source source=new Source(url);
        // The first TITLE element, if the document has one.
        System.out.println("\nDocument Title:");
        Element titleElement=source.getFirstElement(HTMLElementName.TITLE);
        System.out.println(titleElement!=null ? titleElement.getContent().toString() : "(none)");
        // The encoding used to read the document, and how it was determined.
        System.out.println("\nSource.getEncoding():");
        System.out.println(source.getEncoding());
        System.out.println("\nSource.getEncodingSpecificationInfo():");
        System.out.println(source.getEncodingSpecificationInfo());
        System.out.println("\nSource.getPreliminaryEncodingInfo():");
        System.out.println(source.getPreliminaryEncodingInfo());
    }
}
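
The sample treats any argument that contains no colon as a local file path and prefixes it with "file:" before building the URL. The following is a minimal sketch of that one rule in isolation, using only the JDK. The class name SourceUrlNormalizer and the method normalize are hypothetical names chosen for illustration and are not part of Jericho.

```java
public class SourceUrlNormalizer {
    // Hypothetical helper (not part of Jericho): mirrors the sample's rule
    // that an argument without ':' is treated as a local file path.
    public static String normalize(String arg) {
        return arg.indexOf(':') == -1 ? "file:" + arg : arg;
    }

    public static void main(String[] args) {
        // A relative path has no colon, so it gets the "file:" prefix.
        System.out.println(normalize("data/test.html"));      // prints file:data/test.html
        // A string that already contains a colon is left unchanged.
        System.out.println(normalize("http://example.com/")); // prints http://example.com/
    }
}
```

One consequence of this simple check is that a Windows path with a drive letter, such as C:\pages\test.html, already contains a colon and is therefore passed through without the prefix. Such paths would likely need to be written as file: URLs explicitly.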