Build your own Metadata extractor
ECM can automatically extract metadata from files at upload time. Extractors are provided for the most common office documents (.doc, .pdf, .ppt, .xls, ...), and the mechanism can be extended through plugins for your own metadata management. This tutorial shows how to extract the metadata from a simple properties file.
Create and deploy my extractor
- Create a class called org.exoplatform.tutorial.MyMetadataExtractor, which has to extend org.exoplatform.services.document.impl.BaseDocumentReader, as follows:
package org.exoplatform.tutorial;

import org.exoplatform.services.document.impl.BaseDocumentReader;

public class MyMetadataExtractor extends BaseDocumentReader {
    // The methods described in the following steps go here
}
- Implement the getContentAsText methods, which are dedicated to extracting the full text content, as follows:
// Imports needed at the top of the class file:
// import java.io.InputStream;
// import java.util.Enumeration;
// import java.util.Properties;

/**
 * Text extraction for the full text indexing
 */
public String getContentAsText(InputStream input) throws Exception {
    // Create a Properties object
    Properties properties = new Properties();
    // Load the properties from the input stream
    properties.load(input);
    // Create a StringBuilder to collect the full content of the file
    StringBuilder content = new StringBuilder();
    for (Enumeration<Object> keys = properties.keys(); keys.hasMoreElements(); ) {
        String key = (String) keys.nextElement();
        // Get the value from the key
        String value = properties.getProperty(key);
        // Append the value to the content
        content.append(value).append(' ');
    }
    return content.toString();
}

/**
 * Text extraction for the full text indexing with a specific character encoding
 */
public String getContentAsText(InputStream input, String encoding) throws Exception {
    // Properties.load(InputStream) always reads ISO 8859-1, so the encoding is ignored
    return getContentAsText(input);
}
- Add a configuration file that declares the component to eXo Platform: create a configuration.xml file in the conf/portal directory (inside the source folder) with the following content:
<?xml version="1.0" encoding="ISO-8859-1"?>
<configuration>
  <!-- Define my Metadata Extractor -->
  <external-component-plugins>
    <target-component>org.exoplatform.services.document.DocumentReaderService</target-component>
    <component-plugin>
      <name>my.document.reader</name>
      <set-method>addDocumentReader</set-method>
      <type>org.exoplatform.tutorial.MyMetadataExtractor</type>
      <description>to read my specific stream</description>
    </component-plugin>
  </external-component-plugins>
</configuration>
- Build the resulting jar. This requires Maven (Maven 2 must be correctly installed on your system). Create a pom.xml in the root folder of the project with the following content:
<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.exoplatform.tutorial</groupId>
  <artifactId>exo.tutorial.metadata-extraction</artifactId>
  <packaging>jar</packaging>
  <version>trunk</version>
  <description>Tutorial metadata extraction</description>
  <dependencies>
    <dependency>
      <groupId>org.exoplatform.core</groupId>
      <artifactId>exo.core.component.document</artifactId>
      <version>trunk</version>
      <scope>compile</scope>
    </dependency>
    <dependency>
      <groupId>org.exoplatform.kernel</groupId>
      <artifactId>exo.kernel.commons</artifactId>
      <version>trunk</version>
      <scope>compile</scope>
    </dependency>
    <dependency>
      <groupId>org.exoplatform.kernel</groupId>
      <artifactId>exo.kernel.container</artifactId>
      <version>trunk</version>
      <scope>compile</scope>
    </dependency>
  </dependencies>
  <build>
    <resources>
      <resource>
        <directory>src/main/java</directory>
        <includes>
          <include>**/*.xml</include>
        </includes>
      </resource>
    </resources>
  </build>
</project>
- Copy the jar into the ${TOMCAT_HOME}/lib directory
- Start Tomcat and check the log for the following message: "ExoContainer - org.exoplatform.tutorial.MyMetadataExtractor added to portal"
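The text-extraction logic used in getContentAsText can also be tried outside the portal. The following is a minimal sketch, not part of the tutorial sources: the class name PropertiesTextDemo and the method extractText are hypothetical, and the class deliberately does not extend BaseDocumentReader so that it runs with the JDK alone.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical stand-alone class (not in the tutorial sources) that mirrors
// the body of getContentAsText without any eXo dependency.
class PropertiesTextDemo {

    // Concatenate every property value, separated by a space,
    // which is the text the extractor hands over for full text indexing.
    static String extractText(InputStream input) throws IOException {
        Properties properties = new Properties();
        properties.load(input);
        StringBuilder content = new StringBuilder();
        for (String key : properties.stringPropertyNames()) {
            content.append(properties.getProperty(key)).append(' ');
        }
        return content.toString();
    }

    public static void main(String[] args) throws IOException {
        String sample = "Author=my Author\nSubject=my Subject\n";
        InputStream input =
            new ByteArrayInputStream(sample.getBytes(StandardCharsets.ISO_8859_1));
        // The order of the values is not guaranteed: Properties is hash-based
        System.out.println(extractText(input));
    }
}
```

Because Properties does not preserve the order of the file, the extracted text should be treated as an unordered bag of values, which is sufficient for indexing.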
Test my extractor
To test the extractor:
- Create a properties file called myInputFile.properties with the following content:
Author=my Author
Subject=my Subject
Description=my Description
- Authenticate to the portal as root
- Go to Content Management -> File Explorer
- Select a drive to upload the file like Collaboration Center
- Click on the Upload icon
- Choose the properties file created above, then click on the Upload icon; the file will be uploaded to the server
- Click on the Save button, you will see the following window:
- Click on the Edit button to see the extracted metadata, you will see the following window:
- To easily retrieve your file, select the Search tab and type "my Author" in the search text box; you will see the following result:
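Independently of the portal, the content of myInputFile.properties can be checked locally to see which keys and values an extractor would find in it. This is only a sketch under assumptions: the class TestFileCheck and its parse method are hypothetical names, and the file content is embedded as a string rather than read from disk.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical helper (not in the tutorial sources) that parses the test
// file content and lists its key/value pairs in alphabetical key order.
class TestFileCheck {

    // Same three lines as myInputFile.properties
    static final String TEST_FILE =
        "Author=my Author\nSubject=my Subject\nDescription=my Description\n";

    static Map<String, String> parse(String content) throws IOException {
        Properties properties = new Properties();
        properties.load(new StringReader(content));
        // Copy into a TreeMap so that the keys come out sorted
        Map<String, String> sorted = new TreeMap<>();
        for (String key : properties.stringPropertyNames()) {
            sorted.put(key, properties.getProperty(key));
        }
        return sorted;
    }

    public static void main(String[] args) throws IOException {
        for (Map.Entry<String, String> entry : parse(TEST_FILE).entrySet()) {
            System.out.println(entry.getKey() + " = " + entry.getValue());
        }
    }
}
```

The value "my Author" found here is the same string typed in the search box in the last step, which is why the uploaded file is expected to match that query.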
Note: The source files can be found in the zip file attached to this page, see the end of this page
on 21/08/2008 at 23:58