Automatically tagging, captioning and categorising locally stored images using the Azure Computer Vision API

It’s easy in the digital age to amass tens of thousands of photos (or more!). Categorising these can be a challenging task, let alone searching through them to find that one happy snap from 10 years ago.

Significant advances in machine learning over the past decade have made it possible to automatically tag and categorise photos without user input (assuming a machine learning model has been pre-trained). Many social media and photo sharing platforms make this functionality available to their users, for example Flickr’s “Magic View”. But what if a user has a large number of files stored locally on a hard disk?

The problem

  • 49,049 uncategorised digital images stored locally
  • Manual categorisation is impractical at this scale
  • No easy way to search (e.g. “red dress”, “mountain”, “cat on a mat”)

The solution


  1. Obtain a Microsoft Azure cloud subscription (note – Azure is not free, however free trials may be available)
  2. Create a Cognitive Services account from the Azure portal and take note of one of the “Keys” (the keys are interchangeable)
  3. Log in to your Linux machine and ensure you have python3 installed:
    which python3
  4. Ensure you have these python libraries installed:
    sudo su -
    pip3 install python-xmp-toolkit
    pip3 install argparse
    pip3 install Pillow
  5. Obtain a copy of the image-auto-tag script:
    git clone
  6. Automatically tag, caption and categorise an image (e.g. image.jpg):
    cd image-auto-tag
      --captionConfidenceLevel 0.50 --tagConfidenceLevel 0.5
      --categoryConfidenceLevel 0.5 image.jpg

    Note – replace key with one of the keys obtained from the Azure portal in step 2

    The script will process the image:

    INFO: [image.jpg] Reading input file 1/1
    INFO: [image.jpg] Temporarily resized to 800x600
    INFO: [image.jpg] Uploading to Azure Computer Vision API
                      (length: 107330 bytes)
    INFO: [image.jpg] Response received from Azure Computer Vision API
                      (length: 1026 bytes)
    INFO: [image.jpg] Appended caption 'a river with a mountain in the
                      background' (confidence: 0.67 >= 0.50)
    INFO: [image.jpg] Appended category 'outdoor_water'
                      (confidence: 0.84 >= 0.50)
    INFO: [image.jpg] Appending tag 'nature' (confidence: 1.00 >= 0.50)
    INFO: [image.jpg] Appending tag 'outdoor' (confidence: 1.00 >= 0.50)
    INFO: [image.jpg] Appending tag 'water' (confidence: 0.99 >= 0.50)
    INFO: [image.jpg] Appending tag 'mountain' (confidence: 0.94 >= 0.50)
    INFO: [image.jpg] Appending tag 'river' (confidence: 0.90 >= 0.50)
    INFO: [image.jpg] Appending tag 'rock' (confidence: 0.89 >= 0.50)
    INFO: [image.jpg] Appending tag 'valley' (confidence: 0.75 >= 0.50)
    INFO: [image.jpg] Appending tag 'lake' (confidence: 0.60 >= 0.50)
    INFO: [image.jpg] Appending tag 'waterfall' (confidence: 0.60 >= 0.50)
    INFO: [image.jpg] Finished writing XMP data to file 1/1
  7. Verify the results:
    Auto tagging

    API has applied “tags” which can be searched

    Auto captioning

    API has captioned this image as “a beach with palm trees”

    Auto categorisation

    API has applied the category “plant_tree” to this image

    Note – categories are drawn from the API’s 86-category taxonomy
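
To make the confidence thresholds from step 6 concrete, here is a minimal Python sketch of how a response could be filtered before anything is written to the image. It is not the image-auto-tag script's own code: the function name filter_analysis is hypothetical, and the response layout (tags carrying "confidence", categories carrying "score", captions nested under "description") is an assumption about the Analyze Image JSON that should be checked against the API version in use.

```python
def filter_analysis(analysis, caption_min=0.5, tag_min=0.5, category_min=0.5):
    """Keep only captions, tags and categories at or above their thresholds.

    `analysis` is assumed to be the parsed JSON body of an Analyze Image
    response; sections that are missing are treated as empty.
    """
    captions = [
        c["text"]
        for c in analysis.get("description", {}).get("captions", [])
        if c.get("confidence", 0.0) >= caption_min
    ]
    tags = [
        t["name"]
        for t in analysis.get("tags", [])
        if t.get("confidence", 0.0) >= tag_min
    ]
    # Assumption: categories report "score" rather than "confidence".
    categories = [
        c["name"]
        for c in analysis.get("categories", [])
        if c.get("score", 0.0) >= category_min
    ]
    return {"captions": captions, "tags": tags, "categories": categories}


if __name__ == "__main__":
    # Hand-written sample that reuses a few values from the log output above.
    sample = {
        "description": {
            "captions": [
                {"text": "a river with a mountain in the background",
                 "confidence": 0.67}
            ]
        },
        "tags": [
            {"name": "water", "confidence": 0.99},
            {"name": "lake", "confidence": 0.60},
        ],
        "categories": [{"name": "outdoor_water", "score": 0.84}],
    }
    print(filter_analysis(sample, tag_min=0.7))
```

Raising a threshold trades recall for precision: with tag_min=0.7 the weaker "lake" tag in the sample is dropped while "water" is kept.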

Script features

  • Writes to standard XMP metadata tags within JPG images which can be read by image management applications such as XnView MP and digiKam
  • Sends downsized images to Azure to improve performance

    Example – only send an image of width 640 pixels to Azure (the original image will retain its dimensions):

    --azureResizeWidth 640 image.jpg
  • Allows customisation of confidence thresholds for tags, categories and captions. This is useful because, whilst good, the API is not perfect!

    Example – only caption image if caption confidence score from API is 0.5 or above:

    --captionConfidenceLevel 0.5 image.jpg
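
The options above could be wired together roughly as in the Python sketch below. It is an illustration only, not the script's source: the flag names come from this post, but the default values, the helper names (build_parser, scaled_size, resized_jpeg_bytes) and the choice to leave already-small images untouched are assumptions.

```python
import argparse
import io


def build_parser():
    """Command-line options mirroring the flags shown in this post."""
    parser = argparse.ArgumentParser(description="Auto-tag local images (sketch)")
    # Assumed defaults: the real script's defaults are not stated in the post.
    parser.add_argument("--captionConfidenceLevel", type=float, default=0.5)
    parser.add_argument("--tagConfidenceLevel", type=float, default=0.5)
    parser.add_argument("--categoryConfidenceLevel", type=float, default=0.5)
    parser.add_argument("--azureResizeWidth", type=int, default=640)
    parser.add_argument("images", nargs="+", help="image files to process")
    return parser


def scaled_size(width, height, target_width):
    """Return (w, h) scaled down to target_width, preserving aspect ratio.

    Images no wider than target_width are returned unchanged.
    """
    if width <= target_width:
        return width, height
    return target_width, max(1, round(height * target_width / width))


def resized_jpeg_bytes(path, target_width):
    """Return a downsized JPEG copy in memory; the file on disk is untouched."""
    # Pillow is imported lazily so the helpers above need only the stdlib.
    from PIL import Image

    with Image.open(path) as img:
        new_size = scaled_size(img.width, img.height, target_width)
        buffer = io.BytesIO()
        img.convert("RGB").resize(new_size).save(buffer, format="JPEG")
    return buffer.getvalue()


if __name__ == "__main__":
    args = build_parser().parse_args()
    for image_path in args.images:
        payload = resized_jpeg_bytes(image_path, args.azureResizeWidth)
        print(f"{image_path}: {len(payload)} bytes would be uploaded")
```

Only the in-memory copy is resized, which is why the original image keeps its dimensions while the upload stays small.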

Sparkling-water – keeping the web UI alive

Spark is a great way to make use of the available RAM on a Hadoop cluster to run fast in-memory analysis and queries, and H2O is a great project for running distributed machine learning algorithms on data stored in Hadoop.  Together they form “Sparkling Water” (Spark + H2O, obviously!).

Easy to follow instructions for setting up Sparkling Water are available here:

Running Spark on YARN is a good way to utilise an existing Hadoop cluster. However, with the “live” method below it is challenging to keep the Sparkling Water H2O Flow interface running permanently, which is what lets a number of data scientists use the notebook-style interface to run machine learning tasks. Luckily, invoking spark-submit with the water.SparklingWaterDriver class keeps the web UI online even after the shell session which kicked it off exits (see the Persistent method below).

Live method – doesn’t stay online after exiting shell session

  1. Create a shell script:

    export SPARK_HOME='/usr/hdp/current/spark-client/'
    export HADOOP_CONF_DIR=/etc/hadoop/conf
    export MASTER="yarn-client"
    sparkling-water-1.3.5/bin/sparkling-shell --num-executors 3 --executor-memory 2g --master yarn-client

  2. Run the script and, at the sparkling-shell prompt, start H2O:

    import org.apache.spark.h2o._
    val h2oContext = new H2OContext(sc).start()
    import h2oContext._

Persistent method – stays online even after exiting shell session

To start a “persistent” H2O cluster on YARN (i.e. one which doesn’t exit immediately), simply run this command at the command line of a node where the Spark client and Sparkling Water are installed:

nohup bin/spark-submit --class water.SparklingWaterDriver --master yarn-client --num-executors 3 --driver-memory 4g --executor-memory 2g --executor-cores 1 ../sparkling-water-0.2.1-58/assembly/build/libs/*.jar &
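
If the launch needs to be scripted rather than typed, the same invocation can be assembled in Python. This is a sketch under assumptions: build_submit_command and launch_detached are hypothetical helper names, the log file name is made up, and starting the process in a new session is used here as a stand-in for nohup with a trailing &.

```python
import glob
import subprocess


def build_submit_command(jar_pattern, num_executors=3, driver_memory="4g",
                         executor_memory="2g", executor_cores=1,
                         spark_submit="bin/spark-submit"):
    """Assemble the spark-submit argument list for water.SparklingWaterDriver."""
    # No shell is involved, so the *.jar wildcard is expanded here instead.
    jars = sorted(glob.glob(jar_pattern))
    if not jars:
        raise FileNotFoundError(f"no assembly jar matches {jar_pattern!r}")
    return [
        spark_submit,
        "--class", "water.SparklingWaterDriver",
        "--master", "yarn-client",
        "--num-executors", str(num_executors),
        "--driver-memory", driver_memory,
        "--executor-memory", executor_memory,
        "--executor-cores", str(executor_cores),
        jars[0],  # assumption: the first match is the assembly jar to submit
    ]


def launch_detached(command, log_path="sparkling-water.log"):
    """Start the driver in its own session so it outlives the calling shell."""
    with open(log_path, "ab") as log:
        return subprocess.Popen(
            command,
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # detach from the terminal's session
        )


if __name__ == "__main__":
    cmd = build_submit_command("../sparkling-water-0.2.1-58/assembly/build/libs/*.jar")
    process = launch_detached(cmd)
    print(f"started spark-submit with PID {process.pid}")
```

Keeping the command as a list avoids shell quoting problems and makes the executor settings easy to change per cluster.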

The H2O Flow web UI should be available on its usual port (http://XXX.XXX.XXX.XXX:54321) and should remain there even if the shell session which started it dies!
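
A quick way to confirm the UI really did survive the shell session is to probe the port from another machine. The Python sketch below only checks that something accepts TCP connections on port 54321; it does not verify that the listener is H2O, and the function name ui_is_reachable is hypothetical.

```python
import socket


def ui_is_reachable(host, port=54321, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False


if __name__ == "__main__":
    import sys

    target = sys.argv[1] if len(sys.argv) > 1 else "localhost"
    state = "reachable" if ui_is_reachable(target) else "not reachable"
    print(f"web UI on {target}:54321 is {state}")
```

Running this after logging out of the session that launched spark-submit gives a simple pass/fail check on persistence.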

Thanks to the helpful and responsive folks at H2O.ai for the above tip (originally answered here)!