Capacitor Plugin for Text Detection Part 2 : iOS Plugin

This is part 2/6 of the "Capacitor Plugin for Text Detection" series.

In the previous post, we created a skeleton plugin and a sample app (ImageReader) to work with the plugin. In this post, let's dive into creating an iOS Plugin for detecting text in still images.

iOS Plugin

For the iOS plugin, I'm using Apple's Vision Framework to perform text detection.

From ImageReader, open up the iOS project in Xcode with npx cap open ios, and make sure that the app's Bundle Identifier is the same as the appId in capacitor.config.json.
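
For reference, those fields in capacitor.config.json look something like this (the appId below is just a placeholder; use whatever you picked when creating the app in part 1):

{
  "appId": "com.example.imagereader",
  "appName": "ImageReader",
  ...
}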

The plugin code is located under Pods/Development Pods/CapML. Plugin.swift is the entry point for our plugin, and Plugin.m contains the Objective-C definitions that register the plugin's methods with Capacitor.

Step 1

Open up Plugin.swift and rename the function to detectText, since that's what we'll be doing here. call: CAPPluginCall contains all the data that the client sends. For example, if the client sends in something like {filepath: 'file://path/filename', width: '200'}, we can extract the parameters from call with call.getString('filepath') and call.getString('width'). Detailed documentation can be found on the Capacitor website.

import Foundation
import Capacitor

@objc(CapML)
public class CapML: CAPPlugin {
    @objc func detectText(_ call: CAPPluginCall) {
        guard var filepath = call.getString("filepath") else {
            call.reject("file not found")
            return
        }
        call.success([
            "value": filepath
        ])
    }
}

A plugin call can succeed or fail. Note in the above code snippet that we're using call.reject and call.success for failure and success respectively.
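
Since we renamed the method, Plugin.m needs to declare detectText as well. Assuming the CAP_PLUGIN macros that the generated Capacitor 2 template uses, the updated Plugin.m would look something like this:

#import <Foundation/Foundation.h>
#import <Capacitor/Capacitor.h>

// Define the plugin with the CAP_PLUGIN macro, and each method it
// supports with the CAP_PLUGIN_METHOD macro.
CAP_PLUGIN(CapML, "CapML",
           CAP_PLUGIN_METHOD(detectText, CAPPluginReturnPromise);
)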

Step 2

Create a new Swift file, TextDetection.swift. In it, let's create a class TextDetector that does the actual text detection.

We're using Apple's Vision Framework; its text recognition request (VNRecognizeTextRequest) is available on iOS 13.0 or higher.

TextDetector takes in the instance of CAPPluginCall we saw earlier and a UIImage.

@available(iOS 13.0, *)
public class TextDetector {
  let call: CAPPluginCall
  let image: UIImage

  public init(call: CAPPluginCall, image: UIImage) {
    self.call = call
    self.image = image
  }
}

VNImageRequestHandler(cgImage:options:) processes image analysis requests on the CGImage that is passed in. We can get a CGImage from our UIImage via image.cgImage.

Let's create a new function detectText and create an instance of VNImageRequestHandler.

public func detectText() {
  guard let cgImage = image.cgImage else {
    print("Looks like uiImage is nil")
    return
  }

  // VNImageRequestHandler processes image analysis requests on a single image.
  let imageRequestHandler = VNImageRequestHandler(cgImage: cgImage, options: [:])
}

VNImageRequestHandler by default assumes that the image is upright. Optionally, we can also pass in an orientation, if one is available:

let inputOrientation = call.getString("orientation")
if inputOrientation != nil {
    orientation = self.getOrientation(orientation: inputOrientation!)
} else {
    orientation = CGImagePropertyOrientation.up
}

// VNImageRequestHandler processes image analysis requests on a single image.
let imageRequestHandler = VNImageRequestHandler(cgImage: cgImage,
                                                orientation: orientation,
                                                options: [:])

The getOrientation function converts the orientation string we receive from the client code into something Vision can understand:

func getOrientation(orientation: String) -> CGImagePropertyOrientation {
  switch orientation {
  case "UP": return CGImagePropertyOrientation.up
  case "DOWN": return CGImagePropertyOrientation.down
  case "LEFT": return CGImagePropertyOrientation.left
  case "RIGHT": return CGImagePropertyOrientation.right
  default:
      return CGImagePropertyOrientation.up
  }
}

VNImageRequestHandler.perform performs the image analysis requests we pass in on the image. Here I'm passing in textDetectionRequest, the definition of which you'll see in the next step.

DispatchQueue.global(qos: .userInitiated).async {
  do {
      try imageRequestHandler.perform([self.textDetectionRequest])
  } catch let error as NSError {
      print("Failed to perform image request: \(error)")
      self.call.reject(error.description)
  }
}

Add in the definition for the image analysis request we passed in above - imageRequestHandler.perform([self.textDetectionRequest]). VNRecognizeTextRequest is the image analysis request that finds and recognizes text in an image.

lazy var textDetectionRequest: VNRecognizeTextRequest = {
  // Specifying the image analysis request to perform - text detection here
  let textDetectRequest = VNRecognizeTextRequest(completionHandler: handleDetectedText)
  return textDetectRequest
}()
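
As an aside, the request can optionally be tuned before it's returned. The plugin as written doesn't set these, but a sketch using a couple of VNRecognizeTextRequest options (iOS 13+) could look like this:

lazy var textDetectionRequest: VNRecognizeTextRequest = {
  let textDetectRequest = VNRecognizeTextRequest(completionHandler: handleDetectedText)
  // .accurate trades speed for better recognition quality; .fast does the opposite
  textDetectRequest.recognitionLevel = .accurate
  // Run language-based correction on the recognized strings
  textDetectRequest.usesLanguageCorrection = true
  return textDetectRequest
}()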

Upon completion, it passes the results on to the completion handler handleDetectedText, from where we can access them.

func handleDetectedText(request: VNRequest?, error: Error?) {
  DispatchQueue.main.async {
    guard let results = request?.results as? [VNRecognizedTextObservation] else {
        self.call.reject("error")
        return
    }
  }
}

Here is the complete implementation of the class -

import Foundation
import Vision
import Capacitor

@available(iOS 13.0, *)
public class TextDetector {
    var detectedText: [[String: Any]] = []
    let call: CAPPluginCall
    let image: UIImage
    var orientation: CGImagePropertyOrientation

    public init(call: CAPPluginCall, image: UIImage) {
        self.call = call
        self.image = image
        self.orientation = CGImagePropertyOrientation.up
    }

    public func detectText() {
        guard let cgImage = image.cgImage else {
            print("Looks like uiImage is nil")
            return
        }

        let inputOrientation = call.getString("orientation")
        if inputOrientation != nil {
            orientation = self.getOrientation(orientation: inputOrientation!)
        } else {
            orientation = CGImagePropertyOrientation.up
        }

        // VNImageRequestHandler processes image analysis requests on a single image.
        let imageRequestHandler = VNImageRequestHandler(cgImage: cgImage,
                                                        orientation: orientation,
                                                        options: [:])

        DispatchQueue.global(qos: .userInitiated).async {
            do {
                try imageRequestHandler.perform([self.textDetectionRequest])
                self.call.success(["textDetections": self.detectedText])
            } catch let error as NSError {
                print("Failed to perform image request: \(error)")
                self.call.reject(error.description)
            }
        }
    }

    lazy var textDetectionRequest: VNRecognizeTextRequest = {
        // Specifying the image analysis request to perform - text detection here
        let textDetectRequest = VNRecognizeTextRequest(completionHandler: handleDetectedText)
        return textDetectRequest
    }()

    func handleDetectedText(request: VNRequest?, error: Error?) {
        if error != nil {
            call.reject("Text Detection Error \(String(describing: error))")
            return
        }
        DispatchQueue.main.async {
            //  VNRecognizedTextObservation contains information about both the location and
            //  content of text and glyphs that Vision recognized in the input image.
            guard let results = request?.results as? [VNRecognizedTextObservation] else {
                self.call.reject("error")
                return
            }

            self.detectedText = results.map {[
                "topLeft": [Double($0.topLeft.x), Double($0.topLeft.y)] as [Double],
                "topRight": [Double($0.topRight.x), Double($0.topRight.y)] as [Double],
                "bottomLeft": [Double($0.bottomLeft.x), Double($0.bottomLeft.y)] as [Double],
                "bottomRight": [Double($0.bottomRight.x), Double($0.bottomRight.y)] as [Double],
                "text": $0.topCandidates(1).first?.string as String?
            ]}
        }
    }

    func getOrientation(orientation: String) -> CGImagePropertyOrientation {
        switch orientation {
        case "UP": return CGImagePropertyOrientation.up
        case "DOWN": return CGImagePropertyOrientation.down
        case "LEFT": return CGImagePropertyOrientation.left
        case "RIGHT": return CGImagePropertyOrientation.right
        default:
            return CGImagePropertyOrientation.up
        }
    }
}

Step 3

Now that we know how text detection is happening, let's circle back to Plugin.swift and call detectText from there.

Given a filepath, UIImage(contentsOfFile: filepath) fetches the image from the device. We can then instantiate the TextDetector class we just created and call detectText on it. Here's the complete implementation of Plugin.swift.

import Foundation
import Capacitor

@objc(CapML)
public class CapML: CAPPlugin {
    @objc func detectText(_ call: CAPPluginCall) {
        guard var filepath = call.getString("filepath") else {
            call.reject("file not found")
            return
        }

        // removeFirst(7) removes the initial "file://"
        filepath.removeFirst(7)
        guard let image = UIImage(contentsOfFile: filepath) else {
            call.reject("file does not contain an image")
            return
        }

        // TextDetector is marked @available(iOS 13.0, *), so guard the call
        if #available(iOS 13.0, *) {
            TextDetector(call: call, image: image).detectText()
        } else {
            call.reject("Text detection requires iOS 13.0 or newer")
        }
    }
}

At this point, our iOS Plugin is pretty much ready.
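
Just to see the shape of the interface from the JavaScript side, a call would look roughly like this (a sketch only; the actual web and TypeScript wiring is covered in the next posts):

import { Plugins } from '@capacitor/core';

const { CapML } = Plugins;

const readImage = async (filepath: string) => {
  // filepath is a file:// URL on the device; orientation is optional (UP/DOWN/LEFT/RIGHT)
  const result = await CapML.detectText({ filepath, orientation: 'UP' });

  // result.textDetections is an array of { topLeft, topRight, bottomLeft,
  // bottomRight, text }, where each corner is an [x, y] pair in Vision's
  // normalized image coordinates
  return result.textDetections;
};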

Now that we know what the plugin takes in and returns, we can circle back to our JavaScript code. But before we go ahead and call our plugin, note that we're only going to develop iOS and Android plugins, while the out-of-the-box plugin came with a web implementation as well. In the next post, we'll clean that up a little bit.

Next: Capacitor Plugin for Text Detection Part 3 : Web Implementation of the Plugin

Posts in this series

  • Capacitor Plugin for Text Detection Part 1 : Create Plugin
  • Capacitor Plugin for Text Detection Part 2 : iOS Plugin
  • Capacitor Plugin for Text Detection Part 3 : Web Implementation of the Plugin
  • Capacitor Plugin for Text Detection Part 4 : Using the Plugin
  • Capacitor Plugin for Text Detection Part 5 : Android Plugin
  • Capacitor Plugin for Text Detection Part 6 : Highlight text detections