With Gcore Everywhere Inference, you can use foundational open-source models from our AI model catalog or deploy a custom model by uploading a Docker container image.

Info

You might have to request a quota increase before deploying a new model; the Request quota increase guide explains this process.

Step 1. Initialize a new deployment

This step will differ slightly depending on whether you deploy a custom model or use a model from our catalog.

Tip

Before you continue, refer to the Prepare a custom model for deployment guide if you want to deploy a custom model to Everywhere Inference.

You can initialize a new deployment by opening this direct link to the deployment form or by following these three simple steps:

1. In the Gcore Customer Portal, navigate to Everywhere Inference > Deployments.

2. Click the Deploy model from catalog button in the top right of the screen.

3. Select the desired model from the Model dropdown.

Step 2. Select a pod configuration

This configuration determines the allocated resources (e.g., GPU, vCPU, and memory) for running the model. Be sure to select sufficient resources; otherwise, the model deployment might fail.

We recommend the following flavors depending on the model size:

Billion parameters | Recommended flavor
< 21               | 1 x L40s 48 GB
21 - 41            | 2 x L40s 48 GB
> 41               | 4 x L40s 48 GB
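
To make the size ranges easier to apply, here is a minimal sketch that encodes the table above as a small helper function. The function name and the handling of the boundary values are illustrative assumptions, not part of the Gcore product or API.

```python
def recommended_flavor(billion_params: float) -> str:
    """Return the recommended pod flavor for a model of the given size,
    following the table above (boundary handling is an assumption)."""
    if billion_params < 21:
        return "1 x L40s 48 GB"
    elif billion_params <= 41:
        return "2 x L40s 48 GB"
    else:
        return "4 x L40s 48 GB"

# Example: a 34B-parameter model falls into the 21-41 range.
print(recommended_flavor(34))  # -> "2 x L40s 48 GB"
```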

Step 3. Set up routing placement

From the worldwide list of points of presence (PoPs), select the inference regions where the model will run. The list of available PoPs depends on the pod configuration you selected in Step 2.

Step 4. Configure autoscaling

You can set up autoscaling for all pods (All selected regions) or only for pods located in specific areas (Custom).

1. Specify the range of pods you want to maintain:

  • Minimum pods: The minimum number of pods to keep deployed during low-load periods.
  • Maximum pods: The maximum number of pods that can be added during peak-load periods.
  • Cooldown period: The time interval that the autoscaler waits after scaling up or down before making another scaling adjustment.
  • Pod lifetime: The time in seconds that autoscaling waits before deleting a pod that isn’t receiving any requests. The countdown starts from the most recent request.

Info

A pod with a lifetime of zero seconds will take approximately one minute to scale down.

2. Define the events that should trigger the provisioning of a new pod; a configuration sketch follows below.
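
To show how these settings fit together, here is a minimal sketch of an autoscaling configuration expressed as a Python dictionary. The field names, the region value, and the trigger shape are illustrative assumptions, not the exact schema used by the Gcore portal or API.

```python
# Hypothetical autoscaling configuration for one region.
# All field names and values below are illustrative assumptions.
autoscaling = {
    "scope": "custom",            # "all" = all selected regions, "custom" = specific areas
    "region": "example-region-1", # hypothetical region name
    "min_pods": 1,                # pods kept running during low-load periods
    "max_pods": 4,                # upper limit reached during peak load
    "cooldown_seconds": 60,       # wait time between two scaling adjustments
    "pod_lifetime_seconds": 300,  # idle time before an unused pod is deleted
    "triggers": [
        # Provision a new pod once the request load per pod exceeds a threshold.
        {"type": "request_load", "threshold": 20},
    ],
}
```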

Step 5. Set deployment details

Your deployment needs a name to make it easy to identify later. You can optionally add a description.

Step 6 (optional). Set environment variables and API Keys

If you want to pass additional configuration to the deployment, create variables for your container as key-value pairs. These variables are only available in the environment of the created container. To access the environment variable settings, turn on the Set environment variables toggle.
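
For example, if you define a variable in this step, your container code can read it at runtime in the usual way. The variable name below is a hypothetical example, not a value required by Everywhere Inference.

```python
import os

# Read an environment variable set in the deployment form.
# "MODEL_PRECISION" is a hypothetical key used for illustration.
precision = os.environ.get("MODEL_PRECISION", "fp16")
print(f"Running inference with precision: {precision}")
```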

You can also configure API authentication for your deployment. To access the authentication settings, turn on the Enable API Key authentication toggle.

Tip

Multiple API keys can be associated with a single deployment, and the same API key can be attached to multiple deployments.

If you haven’t created an API key yet or want to use a new key for this deployment, click on Create a new API Key.

1. In a new dialog that opens, enter the key name to identify the key in the system.

2. (Optional) Add a key description to give more context about the key and its usage.

3. As a security measure, you can specify the key expiration date. If you don’t want to regenerate the key and instead want to keep it indefinitely, choose Never expire.

4. Click Create to generate the key.

After you generate the key, it will appear in the API Keys dropdown. You can then select it to authenticate to the deployment. Check out the API Keys guide for instructions on authenticating with the API key.
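
As a rough illustration of how a client might authenticate once the deployment is live, the sketch below sends a request with the Python requests library. The endpoint URL, request body, and the header used to pass the key are assumptions for illustration; follow the API Keys guide for the authoritative details.

```python
import requests

# Hypothetical deployment endpoint and authentication header;
# see the API Keys guide for the exact values used by your deployment.
ENDPOINT = "https://example-deployment.inference.example.com/v1/predict"
API_KEY = "<your API key>"

response = requests.post(
    ENDPOINT,
    headers={"Authorization": f"Bearer {API_KEY}"},  # header scheme is an assumption
    json={"input": "Hello, world!"},
    timeout=30,
)
response.raise_for_status()
print(response.json())
```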

Step 7. Finalize the deployment

Scroll to the top of the page and click Deploy model in the top-right corner of the screen.

Info

If you don’t see the Deploy model button, add billing information to your account or create a quota change request.