ParseContext Class in Java
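The ParseContext class is a component of the Java package org.apache.tika.parser. It carries context information that is handed to Tika parsers. (Apache Tika is a toolkit that detects and extracts metadata and text from over a thousand different file types.) org.apache.tika.parser.ParseContext implements the Serializable interface.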
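A parser's context is a separate concept. An instance (object) of the parse context class is created before a nonterminal starts parsing and is destroyed after parsing has ended.
Contexts are provided for extensibility. Their main purpose is to expose the start and end of a nonterminal's parse member function so that external hooks can be attached. By writing specialized context classes, we can extend a nonterminal in many ways without modifying the class itself. For example, we can make a nonterminal emit debug diagnostics by writing a context class that prints the scanner's current state at every point in the parse traversal where the nonterminal is invoked.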
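The hook idea described above can be sketched roughly as follows. This is only an illustration of the general pattern and is not part of Tika's API: RuleContext, TracingContext, and DigitsRule are hypothetical names introduced here, and for brevity the context is passed in by the caller instead of being created and destroyed around each parse.

```java
import java.util.ArrayList;
import java.util.List;

public class ContextHookDemo {

    /** Hypothetical hook interface: called when a rule starts and finishes parsing. */
    interface RuleContext {
        void preParse(String ruleName, String input, int position);
        void postParse(String ruleName, String input, int position, boolean matched);
    }

    /** Hypothetical context that records one diagnostic line at each hook point. */
    static final class TracingContext implements RuleContext {
        final List<String> log = new ArrayList<>();

        @Override
        public void preParse(String ruleName, String input, int position) {
            log.add("enter " + ruleName + " at " + position);
        }

        @Override
        public void postParse(String ruleName, String input, int position, boolean matched) {
            log.add("leave " + ruleName + " at " + position + " matched=" + matched);
        }
    }

    /** Hypothetical nonterminal that matches a run of digits. */
    static final class DigitsRule {
        /** Returns the position after the digits, or -1 if no digit was found. */
        int parse(String input, int start, RuleContext context) {
            context.preParse("digits", input, start);
            int pos = start;
            while (pos < input.length() && Character.isDigit(input.charAt(pos))) {
                pos++;
            }
            boolean matched = pos > start;
            context.postParse("digits", input, pos, matched);
            return matched ? pos : -1;
        }
    }

    public static void main(String[] args) {
        TracingContext tracing = new TracingContext();
        int end = new DigitsRule().parse("123abc", 0, tracing);
        tracing.log.forEach(System.out::println);
        System.out.println("end=" + end);
    }
}
```

The rule itself never changes: swapping TracingContext for another RuleContext implementation changes what happens at the start and end of the parse.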
Class ParseContext (package org.apache.tika.parser)
All Implemented Interfaces: Serializable
Class hierarchy: java.lang.Object > org.apache.tika.parser.ParseContext
A ParseContext is used to pass context information to Tika parsers.
Syntax:
public class ParseContext extends Object implements Serializable
Constructor of the ParseContext class
1. ParseContext(): The ParseContext() constructor initializes a new instance of the ParseContext class. For example:
ParseContext ab = new ParseContext();
Note: ab is the new instance of the ParseContext class.
Methods of the ParseContext class
Method | Description |
---|---|
get(Class<T> key) | Returns the object in this context that implements the given interface |
getDocumentBuilder() | Returns the DOM builder specified in this parsing context |
getSAXParser() | Returns the SAX parser specified in this parsing context |
getSAXParserFactory() | Returns the SAX parser factory specified in this parsing context |
getTransformer() | Returns the transformer specified in this parsing context |
getXMLInputFactory() | Returns the StAX input factory specified in this parsing context |
getXMLReader() | Returns the XMLReader specified in this parsing context |
set(Class<T> key, T value) | Adds the given value to the context as the implementation of the given interface |
get(Class<T> key, T defaultValue) | Returns the object in this context that implements the given interface, or the given default value if no such object is found |
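The class-keyed lookup that the get and set methods describe can be illustrated with a small sketch. SimpleContext below is a hypothetical stand-in written only for this illustration; it uses nothing but the standard library and is not Tika's implementation. With Tika on the classpath, the corresponding calls would be made on a ParseContext instance instead.

```java
import java.util.HashMap;
import java.util.Map;

public class TypedContextDemo {

    /** Simplified, hypothetical stand-in for a class-keyed context (not Tika's source). */
    static final class SimpleContext {
        private final Map<String, Object> entries = new HashMap<>();

        /** Registers value as the implementation of the given interface; null removes it. */
        <T> void set(Class<T> key, T value) {
            if (value != null) {
                entries.put(key.getName(), value);
            } else {
                entries.remove(key.getName());
            }
        }

        /** Returns the object registered for the interface, or null if there is none. */
        <T> T get(Class<T> key) {
            return key.cast(entries.get(key.getName()));
        }

        /** Returns the object registered for the interface, or defaultValue if there is none. */
        <T> T get(Class<T> key, T defaultValue) {
            T value = get(key);
            return value != null ? value : defaultValue;
        }
    }

    public static void main(String[] args) {
        SimpleContext context = new SimpleContext();
        context.set(CharSequence.class, "UTF-8");

        CharSequence registered = context.get(CharSequence.class);
        Runnable missing = context.get(Runnable.class);
        Integer fallback = context.get(Integer.class, 42);

        System.out.println(registered); // prints UTF-8
        System.out.println(missing);    // prints null
        System.out.println(fallback);   // prints 42
    }
}
```

Keying on the interface type keeps lookups type-safe: the caller asks for an interface and gets back an object of that type without casting.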
Example:
Java
// Java program to get the content of a document
// using the Tika toolkit and ParseContext

// Importing required classes
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.txt.TXTParser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

// Main class
class Cowin {

    // Main driver method
    public static void main(String[] args)
        throws IOException, TikaException, SAXException
    {
        // File to be parsed
        File fileName = new File("abc.txt");

        // Creating the context, metadata, parser, and content handler
        ParseContext parseContext = new ParseContext();
        Metadata metaData = new Metadata();
        TXTParser textParser = new TXTParser();
        BodyContentHandler bodyContentHandler = new BodyContentHandler();

        // Opening the file; try-with-resources closes the stream afterwards
        try (FileInputStream fileInputStream = new FileInputStream(fileName)) {
            // Calling the parse method of TXTParser
            textParser.parse(fileInputStream, bodyContentHandler,
                             metaData, parseContext);
        }

        System.out.println("Contents in the File="
                           + bodyContentHandler.toString());
    }
}
Output:
Contents in the File= Cowin is the webportal for Vaccination
The abc.txt file contains the following data: