# How to process images from an Excel spreadsheet

A common use case for the [Image Editing API](/image-editing-api-plus-plan/quickstart-guide.md) is to process a large amount of images, whose URLs are stored inside an Excel spreadsheet.

In this tutorial, we'll see how it's possible to process these images through a Python script that will:

1. parse the Excel spreadsheet
2. make the calls to the [Image Editing API](/image-editing-api-plus-plan/quickstart-guide.md)&#x20;
3. save the result images to the disk.

Let's take the example of this spreadsheet:

<figure><img src="/files/vUN7E6cHBaZWFFfGbOxp" alt=""><figcaption></figcaption></figure>

## Step 1: Calling the Image Editing API

First, we'll write a function that can process a single image through the [Image Editing API](/image-editing-api-plus-plan/quickstart-guide.md) and save the result image returned by the API to the disk:

```python
import os
import requests

API_KEY = "REPLACE_WITH_YOUR_API_KEY"

def process_image(input_image_url, output_image_path):
    try:
        url = "https://image-api.photoroom.com/v2/edit"

        query_string = {
            "background.color": "FFFFFFFF",
            "outputSize": "1000x1000",
            "padding": "0.1",
            "imageUrl": input_image_url
        }

        headers = {
            "Accept": "image/png, application/json",
            "x-api-key": API_KEY
        }

        response = requests.get(url, headers=headers, params=query_string)
        response.raise_for_status()

        with open(output_image_path, 'wb') as f:
            f.write(response.content)
            print(f"Image downloaded and saved to {output_image_path}")

    except requests.RequestException as e:
        print(f"Error: {str(e)}")
        return str(e)
```

This code is pretty straightforward:

1. it configures the parameters of the call to the [Image Editing API](/image-editing-api-plus-plan/quickstart-guide.md):
   1. white background
   2. output size of 1000x1000px
   3. padding of 10%
2. uses the `requests` library to make the GET HTTP call to the API
3. saves the result image at `output_image_path`

{% hint style="info" %}
The full list of supported parameters is available in the [API reference](https://www.photoroom.com/api/docs/reference/87e1c15260b1f-beta-edit-image-v2).

You can visually define the edits you want to apply using the [API playground](https://www.photoroom.com/api/playground).
{% endhint %}

Notice that you will need to update the value of the constant `API_KEY` with your own API key.

{% hint style="info" %}
If you don't have an API key, here are the [steps to create yours](/getting-started/introduction.md#how-can-i-get-my-api-key).
{% endhint %}

## Step 2: Parsing the Excel spreadsheet

Now that we have a function to process a single image, the next step is to parse the content of the Excel spreadsheet, iterate over all the image URLs it contains and call `process_image()` for each of them.

```python
import openpyxl
import concurrent.futures

def iterate_over_spreadsheet(spreadsheet_path, column_name, result_directory_path):
    # Load the spreadsheet
    wb = openpyxl.load_workbook(spreadsheet_path)
    
    image_urls = []

    # Loop over all sheets in the workbook
    for sheet_name in wb.sheetnames:
        sheet = wb[sheet_name]
        
        for col_num, col_cells in enumerate(sheet.iter_cols(values_only=True)):
            if col_cells[0] == column_name:
                break
        else:
            print(f"Column {column_name} not found in sheet {sheet_name}.")
            continue

        for cell in sheet.iter_rows(min_row=2, min_col=col_num + 1, max_col=col_num + 1):
            if cell[0].value:
                image_urls.append(cell[0].value)
```

The code above contains the first half of the function `iterate_over_spreadsheet()`, here are its important steps:

1. we open the spreadsheet located at `spreadsheet_path` using the library `openpyxl`
2. we iterate over all the sheets contained in the document
3. we look for the column whose first cell contains the value `column_name` (in the case of the spreadsheet we use as an example, that value would be `"Image URL"`)
4. we iterate over all the rows in that column, and collect their `value` in the array `image_urls`

Now that we have collected all the `image_urls`, we can write the second part of the function `iterate_over_spreadsheet()`:

```python
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    for image_url in image_urls:
        # Extracting the filename from the URL, stripping query parameters if any
        base_name = os.path.basename(image_url.split('?')[0])

        # Getting the filename without its original extension and appending '.png'
        result_file_name = os.path.splitext(base_name)[0] + '.png'
        result_path = os.path.join(result_directory_path, result_file_name)
        
        if not os.path.exists(result_path): # don't re-process images 
            executor.submit(process_image, image_url, result_path)
```

In this second part of the function we:

1. iterate over the `image_urls`
2. compose the `result_path` where the result image will be saved
3. check that a file doesn't already exist at `result_path` (so as to not process the same image twice)
4. call the function `process_image()` through an `executor`, which allows us to execute 4 API calls in parallel

{% hint style="info" %}
If you want, you can increase the value of the argument `max_workers` to run more API calls in parallel. Keep in mind though that the API is [rate limited to 60 calls/minutes](/getting-started/frequently-asked-questions.md#q-is-there-a-rate-limiting).
{% endhint %}

## Step 3: Running the script

We're almost there, the last thing we need is to actually run the script.

To do this, we'll add this final piece of code:

```python
if __name__ == "__main__":
    SPREADSHEET_PATH = "./path_to_spreadsheet.xlsx"
    COLUMN_NAME = "Image URL"
    OUTPUT_DIRECTORY = "./output/"

    if not os.path.exists(OUTPUT_DIRECTORY):
        os.makedirs(OUTPUT_DIRECTORY)

    iterate_over_spreadsheet(spreadsheet_path=SPREADSHEET_PATH, column_name=COLUMN_NAME, result_directory_path=OUTPUT_DIRECTORY)
```

Then, all that's left is to actually run the script using the terminal:

```bash
$ pip install requests openpyxl # run once to install the third-party libraries
$ python script.py
```

## Conclusion

In this tutorial, we saw how to use a Python script to easily process images whose URLs are store inside an Excel spreadsheet with the Photoroom API.

## Download the code sample

Here's the entire code sample, if you want to easily save it to a file:

```python
import concurrent.futures
import openpyxl
import os
import requests

API_KEY = "REPLACE_WITH_YOUR_API_KEY"

def process_image(input_image_url, output_image_path):
    try:
        url = "https://image-api.photoroom.com/v2/edit"

        query_string = {
            "background.color": "FFFFFFFF",
            "outputSize": "1000x1000",
            "padding": "0.1",
            "imageUrl": input_image_url
        }

        headers = {
            "Accept": "image/png",
            "x-api-key": API_KEY
        }

        response = requests.get(url, headers=headers, params=query_string)
        response.raise_for_status()

        with open(output_image_path, 'wb') as f:
            f.write(response.content)
            print(f"Image downloaded and saved to {output_image_path}")

    except requests.RequestException as e:
        print(f"Error: {str(e)}")
        return str(e)

def iterate_over_spreadsheet(spreadsheet_path, column_name, result_directory_path):
    # Load the spreadsheet
    wb = openpyxl.load_workbook(spreadsheet_path)
    
    image_urls = []

    # Loop over all sheets in the workbook
    for sheet_name in wb.sheetnames:
        sheet = wb[sheet_name]
        
        for col_num, col_cells in enumerate(sheet.iter_cols(values_only=True)):
            if col_cells[0] == column_name:
                break
        else:
            print(f"Column {column_name} not found in sheet {sheet_name}.")
            continue

        for cell in sheet.iter_rows(min_row=2, min_col=col_num + 1, max_col=col_num + 1):
            if cell[0].value:
                image_urls.append(cell[0].value)
                
        with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
            for image_url in image_urls:
                # Extracting the filename from the URL, stripping query parameters if any
                base_name = os.path.basename(image_url.split('?')[0])
        
                # Getting the filename without its original extension and appending '.png'
                result_file_name = os.path.splitext(base_name)[0] + '.png'
                result_path = os.path.join(result_directory_path, result_file_name)
                
                if not os.path.exists(result_path): # don't re-process images 
                    executor.submit(process_image, image_url, result_path)
                
if __name__ == "__main__":
    SPREADSHEET_PATH = "./path_to_spreadsheet.xlsx"
    COLUMN_NAME = "Image URL"
    OUTPUT_DIRECTORY = "./output/"

    if not os.path.exists(OUTPUT_DIRECTORY):
        os.makedirs(OUTPUT_DIRECTORY)

    iterate_over_spreadsheet(spreadsheet_path=SPREADSHEET_PATH, column_name=COLUMN_NAME, result_directory_path=OUTPUT_DIRECTORY)
```


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.photoroom.com/integrations/how-to-process-images-from-an-excel-spreadsheet.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
