Reducing latency for real-time apps: Why the size parameter affects speed

Learn how the BackgroundErase size parameter changes end-to-end latency, why smaller output sizes feel faster in real-time apps, and when to use preview, medium, hd, or full.

Written by Jack
Updated in March 2026

If you are building a real-time app, users care about how fast the result appears on screen, not just how fast the model itself runs. In BackgroundErase, one of the biggest levers you have for improving that end-to-end feel is the size parameter.

The important nuance is that size does not change the model input size. The inference step still runs on a 1024×1024 JPEG. What size controls is the resolution of the returned image, and therefore the amount of work required after inference: resizing, encoding, sending the result back over the network, and rendering it on the client.

Rule of thumb: use preview or medium for live UI feedback, and reserve full for export, download, or final asset generation.


What the size parameter means in this API

The size option maps to a target megapixel budget for the returned image:

  • preview targets ~0.25 MP
  • medium targets ~1.50 MP
  • hd targets ~4.00 MP
  • full targets ~50.00 MP
  • auto targets ~50.00 MP

That means preview and medium return substantially smaller results than hd or full. Smaller outputs usually feel faster because there is less data to resize, encode, transfer, decode, and render.

Another subtle point: auto currently maps to the same target as full in this implementation, so it should not be treated as a special faster mode.
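To make those budgets concrete, here is a small sketch of how a megapixel budget maps to output dimensions while preserving aspect ratio. This is illustrative only: the budget table mirrors the list above, but the service's actual rounding rules are not documented here.

```javascript
// Megapixel budgets per size value, mirroring the list above.
// Illustrative only -- the API's exact rounding may differ.
const SIZE_BUDGETS_MP = {
  preview: 0.25,
  medium: 1.5,
  hd: 4.0,
  full: 50.0,
  auto: 50.0 // auto currently maps to the same target as full
};

// Scale (never upscale) source dimensions to fit the budget,
// preserving aspect ratio.
function targetDimensions(width, height, size) {
  const budgetPixels = SIZE_BUDGETS_MP[size] * 1e6;
  const scale = Math.min(1, Math.sqrt(budgetPixels / (width * height)));
  return {
    width: Math.round(width * scale),
    height: Math.round(height * scale)
  };
}
```

For example, a 4000×3000 (12 MP) source at size=preview would come back at roughly 577×433, which is about 0.25 MP.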

Why smaller size values feel faster

End-to-end latency in a real-time app is usually the sum of several stages, not just the model call:

  1. Upload time for the source image
  2. Server-side fetch and request handling
  3. Model inference
  4. Post-processing and resizing
  5. Output encoding
  6. Response download time
  7. Client-side decode and rendering

Since size influences the returned output size, it directly affects several of those stages. Smaller outputs usually mean:

  • Less server-side resizing work after inference
  • Less time encoding the response image
  • Smaller payloads over the network
  • Less client-side decode time
  • Faster display in browsers, mobile apps, and canvases

This is exactly why preview often feels much snappier in live interfaces even though the core segmentation inference still happened upstream.
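One rough way to see the effect is to put numbers on just the download stage. The payload sizes and bandwidth below are illustrative assumptions, not measurements of this API:

```javascript
// Rough transfer-time model: payload bytes over an assumed
// effective bandwidth. All figures here are illustrative.
function transferMs(payloadBytes, bytesPerSecond) {
  return (payloadBytes * 1000) / bytesPerSecond;
}

// Suppose a preview PNG is ~200 KB and a full PNG is ~20 MB,
// on an effective downlink of 5 MB/s:
const previewMs = transferMs(200_000, 5_000_000);   // 40 ms
const fullMs = transferMs(20_000_000, 5_000_000);   // 4000 ms
```

Even before decode and render costs, the download stage alone can differ by two orders of magnitude, which is consistent with why preview feels so much snappier.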

The key nuance: size does not change the inference input

The source image is converted into a 1024×1024 JPEG before being sent to our compute instance. That means preview, medium, hd, and full do not choose different model input sizes.

This matters because it changes how you talk about performance. The honest message is not “preview makes the model run on fewer pixels.” The honest message is “preview reduces output work and transfer costs, so the full request usually completes faster and feels more responsive.”

Best explanation for users: smaller size values reduce end-to-end latency even though the core inference stage still runs through the same model path.


Which size should you use?

Here is the simplest practical guidance:

  • preview: best for instant UI feedback, live tools, mobile previews, or anywhere response feel matters most
  • medium: best for most interactive product flows where users want a good-looking result quickly
  • hd: best when you need more detail but still want to avoid the cost of full-sized outputs
  • full: best for final export, save, download, asset generation, or customer-facing output that needs maximum retained size
  • auto: currently behaves like full in this API implementation

One easy product pattern is to show a quick preview first and only run full when the user saves or exports.

Recommended real-time workflow

For most real-time apps, the fastest user experience looks something like this:

  1. For instant previews, use size=preview or size=medium
  2. Show the result immediately in your UI
  3. Only request size=full when the user exports, downloads, or confirms
  4. Use jpg or webp when you do not need transparency
  5. Keep the original upload if you may need a final high-resolution rerun later

Quick curl examples

Preview:

curl -H 'x-api-key: YOUR_API_KEY' \
-f https://api.backgrounderase.com/v2 \
-F 'image_file=@/absolute/path/to/input.jpg' \
-F 'format=png' \
-F 'size=preview' \
-o output-preview.png

Medium:

curl -H 'x-api-key: YOUR_API_KEY' \
-f https://api.backgrounderase.com/v2 \
-F 'image_file=@/absolute/path/to/input.jpg' \
-F 'format=png' \
-F 'size=medium' \
-o output-medium.png

Full:

curl -H 'x-api-key: YOUR_API_KEY' \
-f https://api.backgrounderase.com/v2 \
-F 'image_file=@/absolute/path/to/input.jpg' \
-F 'format=png' \
-F 'size=full' \
-o output-full.png

JSON clients can use the same idea:

{
  "image_base64": "BASE64_IMAGE_HERE",
  "format": "png",
  "size": "preview"
}
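As a sketch, a Node client could build that JSON body like this. The field names follow the example above; Buffer is Node-specific, so browser clients would need a different base64 step:

```javascript
// Build the JSON request body from raw image bytes (Node sketch).
// Field names follow the JSON example above.
function buildJsonPayload(imageBytes, size = "preview", format = "png") {
  return JSON.stringify({
    image_base64: Buffer.from(imageBytes).toString("base64"),
    format,
    size
  });
}
```

You would then POST this body, presumably with a Content-Type of application/json and the same x-api-key header shown in the curl examples.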

For live apps, preview first and full later

This is usually the best pattern for collaborative editors, mobile apps, ecommerce interfaces, avatar tools, design tools, and anything interactive:

  • Run preview or medium immediately
  • Display the result right away
  • Let the user continue working with the preview asset
  • Only request full when they save, export, or finalize

That gives users fast feedback without forcing every request to carry the full cost of high-resolution output handling.

// Shared helper: POST the image and return the cut-out bytes.
async function removeBackground(file, size) {
  const form = new FormData();
  form.append("image_file", file);
  form.append("format", "png");
  form.append("size", size);

  const response = await fetch("https://api.backgrounderase.com/v2", {
    method: "POST",
    headers: {
      "x-api-key": process.env.BG_ERASE_API_KEY
    },
    body: form
  });

  if (!response.ok) {
    throw new Error(await response.text());
  }

  return await response.arrayBuffer();
}

// Instant UI feedback: small output, fast round trip.
function removeBackgroundFast(file) {
  return removeBackground(file, "preview");
}

// Later, when the user exports: full-resolution output.
function removeBackgroundFull(file) {
  return removeBackground(file, "full");
}

Format matters too

The size parameter is not the only latency lever. Output format also matters. If you do not need transparency, a flattened jpg can be faster to transfer and display than a larger png. If you need transparency but still want smaller downloads, webp can be a good fit.

So the fastest combination for many real-time apps is often not just preview or medium, but also a more efficient output format for the user flow.
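That choice can be encoded in a small helper. The format names are the ones this article uses; whether webp is acceptable depends on the clients you need to support:

```javascript
// Pick an output format for a given flow.
// jpg:  no transparency, smallest flattened output.
// webp: transparency with smaller downloads than png (where supported).
// png:  transparency with the widest compatibility.
function pickFormat({ needsTransparency, preferSmallDownloads }) {
  if (!needsTransparency) return "jpg";
  return preferSmallDownloads ? "webp" : "png";
}
```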

Upload size still matters

One more important detail: the size parameter only affects the output side of the request. It does not make an oversized upload any cheaper to send from the client. If users upload huge originals, upload time can still dominate the experience.

So if you care about real-time feel, the best overall setup is often:

  • Keep uploads reasonably sized
  • Use preview or medium for the response
  • Choose an efficient output format
  • Only generate full when the user truly needs it

crop can also help end-to-end speed

If your UI or workflow benefits from tighter outputs, using crop=true can reduce the final returned area. That can make the response asset smaller, which helps transfer time and client rendering in some flows.

It is not a replacement for choosing the right size, but it can help when the subject occupies only a small part of the frame and you do not need the full canvas back.


Final recommendation

If your app is interactive, do not default every request to full. The best starting point for most real-time products is:

  • preview for ultra-fast UI feedback
  • medium for most general interactive workflows
  • hd when you need more detail but still care about speed
  • full only for final delivery or export

That usually gives you the best balance between perceived performance and output quality, while staying honest about how this API actually works under the hood.