Extract XMP Metadata
Extract XMP metadata from PDF documents using the pdf-xmp endpoint.
The pdf-xmp endpoint extracts XMP metadata from PDF documents. In this tutorial, we first call the pdf-xmp endpoint directly using REST. We then call the same endpoint through the DynamicPDF client libraries for C#, Java, Node.js, and PHP.
Required Resources
To complete this tutorial, you must add the Get XMP Metadata sample to your samples folder in your cloud storage space using the Resource Manager. After adding the sample resources, you should see a samples/get-xmp-metadata-pdf-endpoint folder containing the resources for this tutorial.
Sample | Sample Folder | Resources |
---|---|---|
Get XMP Metadata | samples/get-xmp-metadata-pdf-endpoint | fw4.pdf |
- From the Resource Manager, download fw4.pdf to your local system; here we assume /temp/dynamicpdf-api-samples/get-xmp-metadata.
- After downloading, delete fw4.pdf from your cloud storage space using the Resource Manager.
Resource | Cloud/Local |
---|---|
fw4.pdf | local |
Tip: See Sample Resources for instructions on adding sample resources.
Obtaining API Key
This tutorial assumes you have a valid API key obtained from the DynamicPDF Cloud API's Environment Manager. Refer to the following for instructions on getting an API key.
Tip: If you are not familiar with the Resource Manager or Apps and API Keys, refer to the following tutorial and relevant Users Guide pages.
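The client examples later in this tutorial pass the API key as a string literal, which is convenient for a demo but easy to commit to source control by accident. As a minimal sketch of one alternative, the key can be read from an environment variable. The variable name DYNAMICPDF_API_KEY and the ApiKeyLookup class are assumptions made for this illustration; neither is defined by DynamicPDF.

```java
import java.util.Map;

final class ApiKeyLookup {
    // Hypothetical variable name chosen for this sketch; not defined by DynamicPDF.
    static final String ENV_VAR = "DYNAMICPDF_API_KEY";

    // Returns the trimmed key from the supplied environment map,
    // or fails with a clear message when it is missing or blank.
    static String resolveApiKey(Map<String, String> env) {
        String key = env.get(ENV_VAR);
        if (key == null || key.trim().isEmpty()) {
            throw new IllegalStateException(
                    "Set the " + ENV_VAR + " environment variable to your API key");
        }
        return key.trim();
    }

    public static void main(String[] args) {
        try {
            String apiKey = resolveApiKey(System.getenv());
            // Print only the length so the key itself never reaches the console.
            System.out.println("API key loaded (" + apiKey.length() + " characters)");
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

In the Java client example, the value returned by resolveApiKey(System.getenv()) could be passed to setApiKey in place of the literal.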
Calling API Directly Using POST
The pdf-xmp endpoint takes a POST request. When using cURL, you specify the endpoint, the HTTP method, the API key, and the required local resources. The following cURL command illustrates.
- Create a cURL POST request, where you pass the API key as a header and the PDF as binary data.
curl -X POST "https://api.dynamicpdf.com/v1.0/pdf-xmp" -H "Content-Type: application/pdf" -H "Authorization: Bearer xxxxxxxx" --data-binary "@c:/temp/dynamicpdf-api-samples/get-xmp-metadata/fw4.pdf"
- Execute the cURL command and the XML metadata is written to the command line.
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 4.2.1-c043 52.398682, 2009/08/10-13:00:47 ">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:format>application/pdf</dc:format>
      <dc:subject>
        <rdf:Bag>
          <rdf:li>Fillable</rdf:li>
        </rdf:Bag>
      </dc:subject>
      <dc:description>
        <rdf:Alt>
          <rdf:li xml:lang="x-default">Employee's Withholding Certificate</rdf:li>
        </rdf:Alt>
      </dc:description>
      <dc:creator>
        <rdf:Seq>
          <rdf:li>SE:W:CAR:MP</rdf:li>
        </rdf:Seq>
      </dc:creator>
      <dc:title>
        <rdf:Alt>
          <rdf:li xml:lang="x-default">2021 Form W-4</rdf:li>
        </rdf:Alt>
      </dc:title>
    </rdf:Description>
    <rdf:Description rdf:about="" xmlns:xmp="http://ns.adobe.com/xap/1.0/">
      <xmp:CreatorTool>Adobe LiveCycle Designer ES 9.0</xmp:CreatorTool>
      <xmp:MetadataDate>2020-12-31T09:12:43-05:00</xmp:MetadataDate>
      <xmp:ModifyDate>2020-12-31T09:12:43-05:00</xmp:ModifyDate>
      <xmp:CreateDate>2020-12-31T09:12:43-05:00</xmp:CreateDate>
    </rdf:Description>
    <rdf:Description rdf:about="" xmlns:pdf="http://ns.adobe.com/pdf/1.3/">
      <pdf:Producer>Adobe LiveCycle Designer ES 9.0</pdf:Producer>
      <pdf:Keywords>Fillable</pdf:Keywords>
    </rdf:Description>
    <rdf:Description rdf:about="" xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/">
      <xmpMM:DocumentID>uuid:01d97a6e-5605-44ae-8015-54a82bc56c5c</xmpMM:DocumentID>
      <xmpMM:InstanceID>uuid:9d6007b3-eacb-4f13-8d6b-da9d46b7dfb3</xmpMM:InstanceID>
    </rdf:Description>
    <rdf:Description rdf:about="" xmlns:desc="http://ns.adobe.com/xfa/promoted-desc/">
      <desc:embeddedHref rdf:parseType="Resource">
        <rdf:value>..\..\..\..\..\..\..\TFACS\Misc\logo\pencil.bmp</rdf:value>
        <desc:ref>/template/subform[1]/subform[3]/draw[2]</desc:ref>
      </desc:embeddedHref>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>
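For readers who would rather make the same direct call from code, the cURL command above can be sketched in Java with the standard java.net.http client (Java 11 or later). The endpoint URL and the two headers are copied from the cURL command. The class name PdfXmpRestCall, the buildRequest helper, and the command-line arguments are hypothetical names chosen for this sketch and are not part of the DynamicPDF client library.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

final class PdfXmpRestCall {
    // Endpoint URL taken from the cURL command in this tutorial.
    static final String ENDPOINT = "https://api.dynamicpdf.com/v1.0/pdf-xmp";

    // Builds the POST request: API key as a bearer token, PDF bytes as the body.
    static HttpRequest buildRequest(String apiKey, byte[] pdfBytes) {
        return HttpRequest.newBuilder()
                .uri(URI.create(ENDPOINT))
                .header("Content-Type", "application/pdf")
                .header("Authorization", "Bearer " + apiKey)
                .POST(HttpRequest.BodyPublishers.ofByteArray(pdfBytes))
                .build();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java PdfXmpRestCall <api-key> <path-to-pdf>");
            return;
        }
        byte[] pdf = Files.readAllBytes(Path.of(args[1]));
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildRequest(args[0], pdf), HttpResponse.BodyHandlers.ofString());
        // Print the HTTP status first, then whatever body the endpoint returned.
        System.out.println("HTTP " + response.statusCode());
        System.out.println(response.body());
    }
}
```

Keeping the request construction in its own method means the URL, method, and headers can be checked without sending anything over the network.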
Calling Endpoint Using Client Library
To simplify development, you can also use one of the DynamicPDF Cloud API client libraries. Use the client library of your choice to complete this tutorial section.
Complete Source
You can access the complete source for this project at one of the following GitHub projects.
Language | File Name | Location (package/namespace/etc.) | GitHub Project |
---|---|---|---|
Java | GetXmpMetaData.java | com.dynamicpdf.api.examples | https://github.com/dynamicpdf-api/java-client-examples |
C# | Program.cs | GetXmpMetaData | https://github.com/dynamicpdf-api/dotnet-client-examples |
Node.js | GetXmpMetaData.js | nodejs-client-examples | https://github.com/dynamicpdf-api/nodejs-client-examples |
PHP | GetXmpMetaData.php | php-client-examples | https://github.com/dynamicpdf-api/php-client-examples |
Tip: The tutorial steps for each language follow in this order: C# (.NET), Node.js, Java, and PHP.

C# (.NET)
Available on NuGet:
Install-Package DynamicPDF.API
- Create a new Console App (.NET Core) project named GetXmpMetaData.
- Add the DynamicPDF.API NuGet package.
- Create a new static method named Run.
- Add the following code to the Run method.
- Create a new PdfResource instance with the path to the PDF in its constructor, then create a new PdfXmp instance from that resource.
- Call the Process method on the PdfXmp instance.
- Ensure the call was successful and add code to print the results to the console.
- Run the application and the XML metadata is printed to the console.

using DynamicPDF.Api;
using System;

namespace GetXmpMetaData
{
    class Program
    {
        static void Main(string[] args)
        {
            Run("DP.xxxx --- api key --- xxxx", "C:/temp/dynamicpdf-api-samples/get-xmp-metadata");
        }

        public static void Run(String apiKey, String basePath)
        {
            // get the local pdf as pdf resource
            PdfResource resource = new PdfResource(basePath + "/fw4.pdf");

            // load the pdf and call the endpoint
            PdfXmp pdfXmp = new PdfXmp(resource);
            pdfXmp.ApiKey = apiKey;
            XmlResponse response = pdfXmp.Process();

            // if successful print results to console
            if (response.IsSuccessful)
            {
                Console.WriteLine(response.Content);
            }
            else
            {
                Console.WriteLine(response.ErrorJson);
            }
        }
    }
}

Node.js
Available on NPM:
npm i @dynamicpdf/api
- Use npm to install the DynamicPDF Cloud API module.
- Create a new class named GetXmpMetaData.
- Create a static Run method.
- Create a new PdfResource instance with the path to the PDF in its constructor, then create a new PdfXmp instance from that resource.
- Call the process method on the PdfXmp instance.

import { PdfXmp, PdfResource } from "@dynamicpdf/api";

export class GetXmpMetaData {

    static async Run() {
        // get Pdf as PdfResource and load into new PdfXmp
        var resource = new PdfResource("C:/temp/dynamicpdf-api-samples/get-xmp-metadata/fw4.pdf");
        var pdfXmp = new PdfXmp(resource);
        pdfXmp.apiKey = "DP api key here";

        // call the endpoint to get results
        var res = await pdfXmp.process();

        // if call was successful print xml to console
        if (res.isSuccessful) {
            console.log(res.content);
        } else {
            console.log(res.errorJson);
        }
    }
}

await GetXmpMetaData.Run();

- Run the application with node GetXmpMetaData.js and the XML is output to the console.
Java
Available on Maven:
https://search.maven.org/search?q=g:com.dynamicpdf.api

<dependency>
    <groupId>com.dynamicpdf.api</groupId>
    <artifactId>dynamicpdf-api</artifactId>
    <version>1.0.0</version>
</dependency>

- Create a new Maven project and add the DynamicPDF API as a dependency.
- Create a new class named GetXmpMetaData with a main method.
- Create a new method named Run.
- Add the Run method call to main.
- Create a new PdfResource instance with the path to the PDF in its constructor, then create a new PdfXmp instance from that resource.
- Call the process method on the PdfXmp instance.
- Ensure the call was successful and add code to print the results to the console.
- Run the application and the XML metadata is printed to the console.

package com.dynamicpdf.api.examples;

import com.dynamicpdf.api.PdfResource;
import com.dynamicpdf.api.PdfXmp;
import com.dynamicpdf.api.XmlResponse;

public class GetXmpMetaData {

    public static void main(String[] args) {
        GetXmpMetaData.Run("DP.xxxx --- api key --- xxxx",
                "C:/temp/dynamicpdf-api-samples/get-xmp-metadata");
    }

    public static void Run(String apiKey, String basePath) {
        // load local pdf as a PdfResource and add to PdfXmp instance
        PdfResource resource = new PdfResource(basePath + "/fw4.pdf");
        PdfXmp pdfXmp = new PdfXmp(resource);
        pdfXmp.setApiKey(apiKey);

        // call the endpoint
        XmlResponse response = pdfXmp.process();

        // if successful then print xml to console
        if (response.getIsSuccessful()) {
            System.out.println(response.getContent());
        } else {
            System.out.println(response.getErrorJson());
        }
    }
}

PHP
Available as a Composer package:
composer require dynamicpdf/api
- Use composer to ensure you have the required PHP libraries.
- Create a new class named GetXmpMetaData.
- Add a Run method.
- Create a new PdfResource instance with the path to the PDF in its constructor, then create a new PdfXmp instance from that resource.
- Call the Process method on the PdfXmp instance.
- Ensure the call was successful and add code to print the results to the console.
- Add a call to the GetXmpMetaData::Run() method.

<?php
require __DIR__ . '/vendor/autoload.php';

use DynamicPDF\Api\PdfXmp;
use DynamicPDF\Api\PdfResource;

class GetXmpMetaData
{
    private static string $BasePath = "C:/temp/dynamicpdf-api-samples/get-xmp-metadata";

    public static function Run()
    {
        // get the PDF and load as PdfResource then add to PdfXmp
        $resource = new PdfResource(GetXmpMetaData::$BasePath . "/fw4.pdf");
        $pdfXmp = new PdfXmp($resource);
        $pdfXmp->ApiKey = "DP.xxxx --- api key --- xxxx";

        // call the endpoint to get the results
        $response = $pdfXmp->Process();

        // print xml results to console
        if ($response->IsSuccessful) {
            echo($response->Content);
        } else {
            echo($response->ErrorMessage);
        }
    }
}

GetXmpMetaData::Run();

- Run the application with php GetXmpMetaData.php and the XML metadata is printed to the console.
In all four languages, the steps were similar. First, we created a new PdfResource instance by passing the path to the PDF to its constructor. Next, we created a new instance of the PdfXmp class, which abstracts the pdf-xmp endpoint, from that resource and set the API key. Then we called the Process method (process in Java and Node.js). Finally, we checked that the call was successful and printed the resultant XML to the console.
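Printing the XML is rarely the end goal; a caller will often want individual fields such as the document title. As a rough sketch, assuming the response content is available as a string, the packet can be read with the JDK's built-in namespace-aware DOM parser. The XmpFieldReader class and its helper methods are hypothetical names for this illustration, and SAMPLE is an abbreviated copy of the fw4.pdf response shown earlier, written with single-quoted attributes for readability.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XmpFieldReader {
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String XMP_NS = "http://ns.adobe.com/xap/1.0/";

    // Abbreviated copy of the response shown in this tutorial.
    static final String SAMPLE =
            "<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?>"
            + "<x:xmpmeta xmlns:x='adobe:ns:meta/'>"
            + "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>"
            + "<rdf:Description rdf:about='' xmlns:dc='http://purl.org/dc/elements/1.1/'>"
            + "<dc:creator><rdf:Seq><rdf:li>SE:W:CAR:MP</rdf:li></rdf:Seq></dc:creator>"
            + "<dc:title><rdf:Alt><rdf:li xml:lang='x-default'>2021 Form W-4</rdf:li></rdf:Alt></dc:title>"
            + "</rdf:Description>"
            + "<rdf:Description rdf:about='' xmlns:xmp='http://ns.adobe.com/xap/1.0/'>"
            + "<xmp:CreatorTool>Adobe LiveCycle Designer ES 9.0</xmp:CreatorTool>"
            + "</rdf:Description>"
            + "</rdf:RDF></x:xmpmeta><?xpacket end='w'?>";

    // Parses the packet with namespaces enabled so elements can be found by URI.
    static Document parse(String xmp) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xmp)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a well-formed XMP packet", e);
        }
    }

    // Text of the first matching element, or null when the property is absent.
    static String simpleValue(Document doc, String namespace, String localName) {
        NodeList matches = doc.getElementsByTagNameNS(namespace, localName);
        return matches.getLength() == 0 ? null : matches.item(0).getTextContent().trim();
    }

    // Text of the first rdf:li inside an rdf:Alt, rdf:Seq, or rdf:Bag property, or null.
    static String firstListItem(Document doc, String namespace, String localName) {
        NodeList matches = doc.getElementsByTagNameNS(namespace, localName);
        if (matches.getLength() == 0) {
            return null;
        }
        NodeList items = ((Element) matches.item(0)).getElementsByTagNameNS(RDF_NS, "li");
        return items.getLength() == 0 ? null : items.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        System.out.println("Title:   " + firstListItem(doc, DC_NS, "title"));
        System.out.println("Creator: " + firstListItem(doc, DC_NS, "creator"));
        System.out.println("Tool:    " + simpleValue(doc, XMP_NS, "CreatorTool"));
    }
}
```

Looking elements up by namespace URI rather than by prefix keeps the lookup working even if a different producer writes the same properties with other prefixes.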