Reducing latency for real-time apps: Why the size parameter affects speed

Learn how the BackgroundErase size parameter changes end-to-end latency, why smaller output sizes feel faster in real-time apps, and when to use preview, medium, hd, or full.

Written by Jack
Updated in March 2026

If you are building a real-time app, users care about how fast the result appears on screen, not just how fast the model itself runs. In BackgroundErase, one of the biggest levers you have for improving that end-to-end feel is the size parameter.

The important nuance is that size does not change the model input size. The inference step still runs on a 1024×1024 JPEG. What size controls is the resolution of the returned image, and therefore the amount of work required after inference: resizing, encoding, sending the result back over the network, and rendering it on the client.

Rule of thumb: use preview or medium for live UI feedback, and reserve full for export, download, or final asset generation.


What the size parameter means in this API

The size option maps to a target megapixel budget for the returned image:

  • preview targets ~0.25 MP
  • medium targets ~1.50 MP
  • hd targets ~4.00 MP
  • full targets ~50.00 MP
  • auto targets ~50.00 MP

That means preview and medium return substantially smaller results than hd or full. Smaller outputs usually feel faster because there is less data to resize, encode, transfer, decode, and render.

Another subtle point: auto currently maps to the same target as full in this implementation, so it should not be treated as a special faster mode.
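To make those budgets concrete, here is a small sketch of how a megapixel budget maps to output dimensions while preserving aspect ratio. This is illustrative only: the budget table mirrors the list above, but the service's actual rounding rules are not documented here.

```javascript
// Megapixel budgets per size value, mirroring the list above.
// Illustrative only -- the API's exact rounding may differ.
const SIZE_BUDGETS_MP = {
  preview: 0.25,
  medium: 1.5,
  hd: 4.0,
  full: 50.0,
  auto: 50.0 // auto currently maps to the same target as full
};

// Scale (never upscale) source dimensions to fit the budget,
// preserving aspect ratio.
function targetDimensions(width, height, size) {
  const budgetPixels = SIZE_BUDGETS_MP[size] * 1e6;
  const scale = Math.min(1, Math.sqrt(budgetPixels / (width * height)));
  return {
    width: Math.round(width * scale),
    height: Math.round(height * scale)
  };
}
```

For example, a 4000×3000 (12 MP) source at size=preview would come back at roughly 577×433, which is about 0.25 MP.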

Why smaller size values feel faster

End-to-end latency in a real-time app is usually the sum of several stages, not just the model call:

  1. Upload time for the source image
  2. Server-side fetch and request handling
  3. Model inference
  4. Post-processing and resizing
  5. Output encoding
  6. Response download time
  7. Client-side decode and rendering

Since size influences the returned output size, it directly affects several of those stages. Smaller outputs usually mean:

  • Less server-side resizing work after inference
  • Less time encoding the response image
  • Smaller payloads over the network
  • Less client-side decode time
  • Faster display in browsers, mobile apps, and canvases

This is exactly why preview often feels much snappier in live interfaces even though the core segmentation inference still happened upstream.
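One rough way to see the effect is to put numbers on just the download stage. The payload sizes and bandwidth below are illustrative assumptions, not measurements of this API:

```javascript
// Rough transfer-time model: payload bytes over an assumed
// effective bandwidth. All figures here are illustrative.
function transferMs(payloadBytes, bytesPerSecond) {
  return (payloadBytes * 1000) / bytesPerSecond;
}

// Suppose a preview PNG is ~200 KB and a full PNG is ~20 MB,
// on an effective downlink of 5 MB/s:
const previewMs = transferMs(200_000, 5_000_000);   // 40 ms
const fullMs = transferMs(20_000_000, 5_000_000);   // 4000 ms
```

Even before decode and render costs, the download stage alone can differ by two orders of magnitude, which is consistent with why preview feels so much snappier.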

The key nuance: size does not change the inference input

The source image is converted into a 1024×1024 JPEG before being sent to our compute instance. That means preview, medium, hd, and full do not choose different model input sizes.

This matters because it changes how you talk about performance. The honest message is not “preview makes the model run on fewer pixels.” The honest message is “preview reduces output work and transfer costs, so the full request usually completes faster and feels more responsive.”

Best explanation for users: smaller size values reduce end-to-end latency even though the core inference stage still runs through the same model path.


Which size should you use?

Here is the simplest practical guidance:

  • preview: best for instant UI feedback, live tools, mobile previews, or anywhere response feel matters most
  • medium: best for most interactive product flows where users want a good-looking result quickly
  • hd: best when you need more detail but still want to avoid the cost of full-sized outputs
  • full: best for final export, save, download, asset generation, or customer-facing output that needs maximum retained size
  • auto: currently behaves like full in this API implementation

One easy product pattern is to show a quick preview first and only run full when the user saves or exports.

Recommended real-time workflow

For most real-time apps, the fastest user experience looks something like this:

  1. For instant previews, use size=preview or size=medium
  2. Show the result immediately in your UI
  3. Only request size=full when the user exports, downloads, or confirms
  4. Use jpg or webp when you do not need transparency
  5. Keep the original upload if you may need a final high-resolution rerun later

Quick curl examples

Preview:

curl -H 'x-api-key: YOUR_API_KEY' \
-f https://api.backgrounderase.com/v2 \
-F 'image_file=@/absolute/path/to/input.jpg' \
-F 'format=png' \
-F 'size=preview' \
-o output-preview.png

Medium:

curl -H 'x-api-key: YOUR_API_KEY' \
-f https://api.backgrounderase.com/v2 \
-F 'image_file=@/absolute/path/to/input.jpg' \
-F 'format=png' \
-F 'size=medium' \
-o output-medium.png

Full:

curl -H 'x-api-key: YOUR_API_KEY' \
-f https://api.backgrounderase.com/v2 \
-F 'image_file=@/absolute/path/to/input.jpg' \
-F 'format=png' \
-F 'size=full' \
-o output-full.png

JSON clients can use the same idea:

{
  "image_base64": "BASE64_IMAGE_HERE",
  "format": "png",
  "size": "preview"
}
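As a sketch, a Node client could build that JSON body like this. The field names follow the example above; Buffer is Node-specific, so browser clients would need a different base64 step:

```javascript
// Build the JSON request body from raw image bytes (Node sketch).
// Field names follow the JSON example above.
function buildJsonPayload(imageBytes, size = "preview", format = "png") {
  return JSON.stringify({
    image_base64: Buffer.from(imageBytes).toString("base64"),
    format,
    size
  });
}
```

You would then POST this body, presumably with a Content-Type of application/json and the same x-api-key header shown in the curl examples.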

For live apps, preview first and full later

This is usually the best pattern for collaborative editors, mobile apps, ecommerce interfaces, avatar tools, design tools, and anything interactive:

  • Run preview or medium immediately
  • Display the result right away
  • Let the user continue working with the preview asset
  • Only request full when they save, export, or finalize

That gives users fast feedback without forcing every request to carry the full cost of high-resolution output handling.

// Shared helper: POST the image and return the cut-out bytes.
async function removeBackground(file, size) {
  const form = new FormData();
  form.append("image_file", file);
  form.append("format", "png");
  form.append("size", size);

  const response = await fetch("https://api.backgrounderase.com/v2", {
    method: "POST",
    headers: {
      "x-api-key": process.env.BG_ERASE_API_KEY
    },
    body: form
  });

  if (!response.ok) {
    throw new Error(await response.text());
  }

  return await response.arrayBuffer();
}

// Instant UI feedback: small output, fast round trip.
function removeBackgroundFast(file) {
  return removeBackground(file, "preview");
}

// Later, when the user exports: full-resolution output.
function removeBackgroundFull(file) {
  return removeBackground(file, "full");
}

Format matters too

The size parameter is not the only latency lever. Output format also matters. If you do not need transparency, a flattened jpg can be faster to transfer and display than a larger png. If you need transparency but still want smaller downloads, webp can be a good fit.

So the fastest combination for many real-time apps is often not just preview or medium, but also a more efficient output format for the user flow.
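That choice can be encoded in a small helper. The format names are the ones this article uses; whether webp is acceptable depends on the clients you need to support:

```javascript
// Pick an output format for a given flow.
// jpg:  no transparency, smallest flattened output.
// webp: transparency with smaller downloads than png (where supported).
// png:  transparency with the widest compatibility.
function pickFormat({ needsTransparency, preferSmallDownloads }) {
  if (!needsTransparency) return "jpg";
  return preferSmallDownloads ? "webp" : "png";
}
```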

Upload size still matters

One more important detail: the size parameter only affects the output side of the request. It does not make an oversized upload any cheaper to send from the client. If users upload huge originals, upload time can still dominate the experience.

So if you care about real-time feel, the best overall setup is often:

  • Keep uploads reasonably sized
  • Use preview or medium for the response
  • Choose an efficient output format
  • Only generate full when the user truly needs it

crop can also help end-to-end speed

If your UI or workflow benefits from tighter outputs, using crop=true can reduce the final returned area. That can make the response asset smaller, which helps transfer time and client rendering in some flows.

It is not a replacement for choosing the right size, but it can help when the subject occupies only a small part of the frame and you do not need the full canvas back.


Final recommendation

If your app is interactive, do not default every request to full. The best starting point for most real-time products is:

  • preview for ultra-fast UI feedback
  • medium for most general interactive workflows
  • hd when you need more detail but still care about speed
  • full only for final delivery or export

That usually gives you the best balance between perceived performance and output quality, while staying honest about how this API actually works under the hood.