PDF.co API for Word to PDF Conversion and Parsing
Hello,
I am posting this topic here because the developer community has been so helpful in the past. We have been attempting to work with PDF.co to get an answer on some issues we're having, but there must be a time zone delay, and there's just been a lot of back and forth, rather than an actual resolution. Here is what we're attempting to do:
- Download a .docx file from a custom module's attachments
- Obtain a pre-signed URL from PDF.co (they're using AWS) to work with the file. The response contains two values:
  - url: the URL used to access the uploaded file
  - presignedUrl: the URL used to upload the local file
- Upload the .docx file to AWS with the pre-signed URL
- Convert the .docx file to a PDF using the PDF.co endpoint and the URL obtained from the pre-signed URL upload
- Parse the PDF and return it as JSON to our Deluge custom function
- We then iterate through it and do the things we need to do
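For comparison outside Deluge, here is a minimal Python sketch of the request shapes for the pre-signed URL step and the conversion step. It only builds the requests and sends nothing. The endpoint URLs and field names are the ones from our code below; the function names are hypothetical, and URL-encoding the file name is our assumption about how names with spaces should be passed.

```python
import json
from urllib.parse import quote, urlencode

PRESIGN_ENDPOINT = "https://api.pdf.co/v1/file/upload/get-presigned-url"
CONVERT_ENDPOINT = "https://api.pdf.co/v1/pdf/convert/from/doc"


def build_presign_request(file_name: str, api_key: str) -> tuple[str, dict]:
    """Return (url, headers) for the GET that asks for a pre-signed upload URL."""
    # Percent-encode the file name so spaces and symbols survive the query string.
    query = urlencode({"name": file_name, "encrypt": "false"}, quote_via=quote)
    return f"{PRESIGN_ENDPOINT}?{query}", {"x-api-key": api_key}


def build_conversion_request(working_file_url: str, api_key: str) -> tuple[str, dict, str]:
    """Return (url, headers, body) for the POST that converts the uploaded .docx."""
    headers = {"x-api-key": api_key, "Content-Type": "application/json"}
    # The body is serialized to a JSON string, matching the Content-Type header.
    body = json.dumps({"url": working_file_url, "async": False})
    return CONVERT_ENDPOINT, headers, body


presign_url, presign_headers = build_presign_request("Meeting Notes.docx", "demo-key")
print(presign_url)
```

The conversion body is serialized to a JSON string before sending, so that it matches the `application/json` content type in the headers.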
A couple more notes before we get into the code:
- We tried converting the .docx file to a PDF using the Zoho Writer API, but cell borders disappeared, which caused issues with the parsing at PDF.co. BUT, everything else worked great, including uploading the PDF to the pre-signed URL.
- We attempted to upload the file to PDF.co's built-in storage rather than the pre-signed URL, but when we download the file and then attempt to pass it as a parameter, we get an error saying the file parameter is missing. We believe this is because the upload endpoint wants the file path. We don't know how to get that, or whether we even can get that from Zoho.
- The pre-signed URL endpoint worked for us when we uploaded the Zoho converted PDF file.
Okay, on to the code. Thanks in advance for your help and grace with this far from perfect code! I have omitted the parsing code since we're getting hung up before then.
thisRec = zoho.crm.getRecordById("Meeting_Details",thisRecId);
accountId = thisRec.get("Account").get("id");
newList = List();
//Get all attachments on meeting detail rec
response = invokeurl
[
    url :"https://www.zohoapis.com/crm/v6/Meeting_Details/" + thisRecId + "/Attachments?fields=id,Owner,File_Name,Created_Time,Parent_Id&sort_order=desc&sort_by=Created_Time"
    type :GET
    connection:"******"
];
// info response;
responseNull = response.isNull();
info "Attachment Response Empty: " + responseNull;
if(responseNull == true)
{
    info "No Attachments to parse";
    return;
}
//Get last uploaded file
fileRec = response.get("data").get(0);
info "File Rec: " + fileRec;
fileId = fileRec.get("id");
info "File ID: " + fileId;
fileName = fileRec.get("File_Name");
info "File Name: " + fileName;
createdTime = fileRec.get("Created_Time");
info "Created Time: " + createdTime;
//Download File from Meeting Detail Rec
downloadResponse = invokeurl
[
    url :"https://www.zohoapis.com/crm/v6/Meeting_Details/" + thisRecId + "/Attachments/" + fileId
    type :GET
    connection:"******"
];
info downloadResponse;
fileCheck = downloadResponse.isFile();
info "File? " + fileCheck;
//
//PDF.co Headers
apiKey = "*******************";
headers = Map();
headers = {"x-api-key":apiKey};
// Upload Permission
setupFileUpload = invokeurl
[
    url :"https://api.pdf.co/v1/file/upload/get-presigned-url?name=" + fileName + "&encrypt=false"
    type :GET
    headers:headers
    detailed:true
];
setupFileUpload = setupFileUpload.get("responseText");
// info "File Upload Response: " + setupFileUpload;
uploadUrl = setupFileUpload.get("presignedUrl");
info "Upload URL: " + uploadUrl;
workingFileUrl = setupFileUpload.get("url");
info "Working URL: " + workingFileUrl;
// Actual Upload
uploadHeaders = Map();
uploadHeaders.put("x-api-key",apiKey);
uploadHeaders.put("Content-Type","application/octet-stream");
uploadParams = Map();
uploadParams.put("file",downloadResponse);
uploadParams.put("expiration",20);
uploadParams.put("async","true");
actualUpload = invokeurl
[
    url :uploadUrl
    type :PUT
    parameters:uploadParams
    headers:uploadHeaders
    detailed:true
];
info "Actual Upload: " + actualUpload;
//
//Convert to PDF
conversionURL = "https://api.pdf.co/v1/pdf/convert/from/doc";
conversionHeader = Map();
conversionHeader.put("x-api-key",apiKey);
conversionHeader.put("Content-Type","application/json");
conversionPayload = {"url":workingFileUrl,"async":false,"inline":"true","password":"","profiles":""};
conversionResult = invokeurl
[
    url :conversionURL
    type :POST
    parameters:conversionPayload
    headers:conversionHeader
    detailed:true
];
info "Conversion Result: " + conversionResult;
convertedDocURL = conversionResult.get("responseText").get("url");
info "Converted Doc URL: " + convertedDocURL;
We believe the issue is happening on the upload to the pre-signed URL. Once we upload the file (the "Actual Upload" block, lines 66-74 of our function), we use workingFileUrl to look at the document. We are able to download it, but it is corrupted and we have to use Word's recovery tool to open it. The code continues and converts the file to a PDF, but the result is unintelligible.
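To illustrate what we suspect (we have not confirmed this): if the PUT sends the file wrapped as multipart form data rather than as the raw request body, the stored object would contain form boundary lines around the .docx bytes. Below is a minimal Python sketch of that difference, using a small stand-in ZIP archive in place of a real .docx. The helper names and the boundary string are hypothetical, and whether Deluge actually sends our uploadParams map this way is an assumption.

```python
import io
import zipfile


def make_docx_like_bytes() -> bytes:
    """Build a tiny ZIP archive as a stand-in for a .docx (which is a ZIP container)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", "<w:document/>")
    return buf.getvalue()


def multipart_body(field: str, filename: str, payload: bytes, boundary: str) -> bytes:
    """Wrap payload the way a multipart/form-data request body would."""
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("ascii")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    return head + payload + tail


raw = make_docx_like_bytes()
wrapped = multipart_body("file", "notes.docx", raw, "hypothetical-boundary")

# A raw upload stores exactly the file bytes, starting with the ZIP signature.
print(raw[:2])
# A multipart upload stores the boundary and part headers ahead of the file bytes.
print(wrapped[:2])
print(len(wrapped) - len(raw), "extra bytes around the original file")
```

If this is what is happening, the original bytes are still inside the stored object, only surrounded by extra text, which would be consistent with Word being able to recover the document.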
Thanks so much for your help!