Capacitor Plugin for Text Detection Part 2: iOS Plugin
This is part 2/6 of the "Capacitor Plugin for Text Detection" series.
In the previous post, we created a skeleton plugin and a sample app (ImageReader) to work with the plugin. In this post, let's dive into creating an iOS Plugin for detecting text in still images.
iOS Plugin
For the iOS plugin, I'm using Apple's Vision Framework to perform text detection.
From ImageReader, open up the iOS project in Xcode with npx cap open ios, and make sure that the app's Bundle Identifier is the same as the appId in capacitor.config.json.
The plugin code is located under Pods/Development Pods/CapML. Plugin.swift is the entry point for our plugin, and Plugin.m contains the Objective-C definitions that register the plugin's methods with Capacitor, so any method we add to or rename in Plugin.swift needs a matching entry there.
Step 1
Open up Plugin.swift and change the name of the function to detectText, because that's what we'll be doing here. call: CAPPluginCall contains all the data that the client sends. For example, if the client sends in something like {filepath: 'file://path/filename', width: '200'}, we can extract the parameters from call with call.getString('filepath') and call.getString('width'). Detailed documentation about it can be found on the Capacitor website.
import Foundation
import Capacitor

@objc(CapML)
public class CapML: CAPPlugin {
    @objc func detectText(_ call: CAPPluginCall) {
        guard var filepath = call.getString("filepath") else {
            call.reject("file not found")
            return
        }

        call.success([
            "value": filepath
        ])
    }
}
A plugin call can succeed or fail. Note that in the above code snippet we're using call.reject and call.success for failure and success respectively.
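As a quick, hypothetical illustration of the same API (echoParams and its plugin name are made up for this post and are not part of CapML), here's a handler that reads one required and one optional parameter and resolves with a small JSON-serializable payload:

import Capacitor

// Hypothetical example, not part of CapML: one required and one optional parameter.
@objc(EchoPlugin)
public class EchoPlugin: CAPPlugin {
    @objc func echoParams(_ call: CAPPluginCall) {
        // Reject the call when a required parameter is missing.
        guard let filepath = call.getString("filepath") else {
            call.reject("filepath is required")
            return
        }
        // Optional parameter with a fallback value.
        let width = call.getString("width") ?? "200"
        // Resolve with a dictionary of JSON-serializable values.
        call.success([
            "filepath": filepath,
            "width": width
        ])
    }
}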
Step 2
Create a new Swift file, TextDetection.swift. Here, let's create a class TextDetector that does the actual text detection.
We're using Apple's Vision framework, whose text recognition support requires iOS 13.0 or higher, hence the @available annotation below. TextDetector takes in an instance of the CAPPluginCall we saw earlier and a UIImage.
@available(iOS 13.0, *)
public class TextDetector {
    let call: CAPPluginCall
    let image: UIImage

    public init(call: CAPPluginCall, image: UIImage) {
        self.call = call
        self.image = image
    }
}
VNImageRequestHandler(cgImage:options:) processes image analysis requests on the CGImage that is passed in. We can get a CGImage from our UIImage via image.cgImage. Let's create a new function, detectText, and create an instance of VNImageRequestHandler in it.
public func detectText() {
    guard let cgImage = image.cgImage else {
        print("Looks like uiImage is nil")
        return
    }

    // VNImageRequestHandler processes image analysis requests on a single image.
    let imageRequestHandler = VNImageRequestHandler(cgImage: cgImage, options: [:])
}
VNImageRequestHandler assumes by default that the image is upright. Optionally, we can also pass in an orientation like this, if it's available:
let inputOrientation = call.getString("orientation")
if inputOrientation != nil {
    orientation = self.getOrientation(orientation: inputOrientation!)
} else {
    orientation = CGImagePropertyOrientation.up
}

// VNImageRequestHandler processes image analysis requests on a single image.
let imageRequestHandler = VNImageRequestHandler(cgImage: cgImage,
                                                orientation: orientation,
                                                options: [:])
The getOrientation function converts the orientation string we receive from the client code into something Vision can understand:
func getOrientation(orientation: String) -> CGImagePropertyOrientation {
    switch orientation {
    case "UP": return CGImagePropertyOrientation.up
    case "DOWN": return CGImagePropertyOrientation.down
    case "LEFT": return CGImagePropertyOrientation.left
    case "RIGHT": return CGImagePropertyOrientation.right
    default:
        return CGImagePropertyOrientation.up
    }
}
VNImageRequestHandler.perform performs the image analysis requests we pass in on the image. Here I'm passing in textDetectionRequest, whose definition you'll see in the next step.
DispatchQueue.global(qos: .userInitiated).async {
    do {
        try imageRequestHandler.perform([self.textDetectionRequest])
    } catch let error as NSError {
        print("Failed to perform image request: \(error)")
        self.call.reject(error.description)
    }
}
Add in the definition for the image analysis request we passed in above - imageRequestHandler.perform([self.textDetectionRequest]).
VNRecognizeTextRequest is the image analysis request that finds and recognizes text in an image.
lazy var textDetectionRequest: VNRecognizeTextRequest = {
    // Specifying the image analysis request to perform - text detection here
    let textDetectRequest = VNRecognizeTextRequest(completionHandler: handleDetectedText)
    return textDetectRequest
}()
Upon completion, it passes on the result to the completion handler handleDetectedText, from where we'd be able to access the result.
func handleDetectedText(request: VNRequest?, error: Error?) {
    guard let results = request?.results as? [VNRecognizedTextObservation] else {
        self.call.reject("error")
        return
    }
    // `results` now holds the VNRecognizedTextObservations we'll map into the response.
}
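Each VNRecognizedTextObservation carries both candidate strings for the recognized text and the normalized corner points of the region it was found in. As a small sketch of what we can do with results (a hypothetical helper, not part of the plugin; the complete implementation below also records the corner points):

import Vision

// Hypothetical helper: collapse observations into plain strings.
// topCandidates(1) returns at most one VNRecognizedText per observation,
// the candidate Vision is most confident about.
@available(iOS 13.0, *)
func recognizedLines(from results: [VNRecognizedTextObservation]) -> [String] {
    return results.compactMap { $0.topCandidates(1).first?.string }
}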
Here is the complete implementation of the class -
import Foundation
import Vision
import Capacitor

@available(iOS 13.0, *)
public class TextDetector {
    var detectedText: [[String: Any]] = []
    let call: CAPPluginCall
    let image: UIImage
    var orientation: CGImagePropertyOrientation

    public init(call: CAPPluginCall, image: UIImage) {
        self.call = call
        self.image = image
        self.orientation = CGImagePropertyOrientation.up
    }

    public func detectText() {
        guard let cgImage = image.cgImage else {
            print("Looks like uiImage is nil")
            return
        }

        let inputOrientation = call.getString("orientation")
        if inputOrientation != nil {
            orientation = self.getOrientation(orientation: inputOrientation!)
        } else {
            orientation = CGImagePropertyOrientation.up
        }

        // VNImageRequestHandler processes image analysis requests on a single image.
        let imageRequestHandler = VNImageRequestHandler(cgImage: cgImage,
                                                        orientation: orientation,
                                                        options: [:])

        DispatchQueue.global(qos: .userInitiated).async {
            do {
                // perform runs the request synchronously, so handleDetectedText has
                // filled in detectedText by the time we resolve the call below.
                try imageRequestHandler.perform([self.textDetectionRequest])
                self.call.success(["textDetections": self.detectedText])
            } catch let error as NSError {
                print("Failed to perform image request: \(error)")
                self.call.reject(error.description)
            }
        }
    }

    lazy var textDetectionRequest: VNRecognizeTextRequest = {
        // Specifying the image analysis request to perform - text detection here
        let textDetectRequest = VNRecognizeTextRequest(completionHandler: handleDetectedText)
        return textDetectRequest
    }()

    func handleDetectedText(request: VNRequest?, error: Error?) {
        if error != nil {
            call.reject("Text Detection Error \(String(describing: error))")
            return
        }

        // VNRecognizedTextObservation contains information about both the location and
        // content of text and glyphs that Vision recognized in the input image.
        guard let results = request?.results as? [VNRecognizedTextObservation] else {
            self.call.reject("error")
            return
        }

        self.detectedText = results.map {[
            "topLeft": [Double($0.topLeft.x), Double($0.topLeft.y)] as [Double],
            "topRight": [Double($0.topRight.x), Double($0.topRight.y)] as [Double],
            "bottomLeft": [Double($0.bottomLeft.x), Double($0.bottomLeft.y)] as [Double],
            "bottomRight": [Double($0.bottomRight.x), Double($0.bottomRight.y)] as [Double],
            "text": $0.topCandidates(1).first?.string as String?
        ]}
    }

    func getOrientation(orientation: String) -> CGImagePropertyOrientation {
        switch orientation {
        case "UP": return CGImagePropertyOrientation.up
        case "DOWN": return CGImagePropertyOrientation.down
        case "LEFT": return CGImagePropertyOrientation.left
        case "RIGHT": return CGImagePropertyOrientation.right
        default:
            return CGImagePropertyOrientation.up
        }
    }
}
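For reference, given the mapping in handleDetectedText, the data the plugin resolves with looks roughly like this on the JavaScript side (the values below are made up for illustration; the corner points are normalized coordinates with the origin at the bottom-left of the image):

{
  "textDetections": [
    {
      "topLeft": [0.102, 0.871],
      "topRight": [0.644, 0.871],
      "bottomLeft": [0.102, 0.836],
      "bottomRight": [0.644, 0.836],
      "text": "Hello World"
    }
  ]
}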
Step 3
Now that we know how text detection is happening, let's circle back to Plugin.swift and call detectText from there. Given the filepath, UIImage(contentsOfFile: filepath) fetches the image from the device. We can then instantiate the TextDetector class we just created and call detectText on it. Here's the complete implementation of Plugin.swift.
import Foundation
import Capacitor

@objc(CapML)
public class CapML: CAPPlugin {
    @objc func detectText(_ call: CAPPluginCall) {
        guard var filepath = call.getString("filepath") else {
            call.reject("file not found")
            return
        }

        // removeFirst(7) removes the initial "file://"
        filepath.removeFirst(7)

        guard let image = UIImage(contentsOfFile: filepath) else {
            call.reject("file does not contain an image")
            return
        }

        TextDetector(call: call, image: image).detectText()
    }
}
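One small caveat: removeFirst(7) assumes the client always sends a file:// URL and will misbehave on anything else. If you'd like to be a little more defensive, a variant like the following (my own suggestion, not part of the original plugin) derives the path with URL instead:

@objc func detectText(_ call: CAPPluginCall) {
    guard let filepathString = call.getString("filepath") else {
        call.reject("file not found")
        return
    }

    // Handles both "file:///var/..." URLs and plain "/var/..." paths,
    // instead of assuming a fixed "file://" prefix.
    let path = URL(string: filepathString)?.path ?? filepathString

    guard let image = UIImage(contentsOfFile: path) else {
        call.reject("file does not contain an image")
        return
    }

    TextDetector(call: call, image: image).detectText()
}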
At this point, our iOS Plugin is pretty much ready.
Now that we know what the plugin takes in and returns, we can circle back to our JavaScript code. But before we go ahead and call our plugin, note that we're only going to develop iOS and Android plugins, while the out-of-the-box plugin came with a web implementation as well. In the next post, we'll clean that up a little bit.