
Building Your First Data Source

This is part 2 of a four-part series. Start with Part 1 — Your First Resource. Part 3 adds a provider function. Part 4 adds an ephemeral resource.


In part 1 we built a mycloud_server resource. Now we’ll add a data source — a read-only query that lets Terraform look up a server’s details without owning it.

Data sources follow the same pattern as resources, with two differences:

  • No create/update/delete: besides get_schema, only read and _validate_config
  • Based on BaseDataSource instead of BaseResource

Add the Data Source

Create my_provider/server_info.py:

from attrs import define
from pyvider.data_sources import register_data_source
from pyvider.data_sources.base import BaseDataSource
from pyvider.resources.context import ResourceContext
from pyvider.schema import PvsSchema, a_str, s_data_source

from my_provider.server import Server  # access the in-memory store


@define
class ServerInfoConfig:
    server_id: str


@define
class ServerInfoState:
    id: str
    name: str
    status: str


@register_data_source("server_info")
class ServerInfo(BaseDataSource):
    config_class = ServerInfoConfig
    state_class = ServerInfoState

    @classmethod
    def get_schema(cls) -> PvsSchema:
        return s_data_source({
            "server_id": a_str(required=True, description="ID of the server to look up"),
            "id":        a_str(computed=True,  description="Server ID"),
            "name":      a_str(computed=True,  description="Server name"),
            "status":    a_str(computed=True,  description="Server status"),
        })

    async def _validate_config(self, config: ServerInfoConfig) -> list[str]:
        if not config.server_id:
            return ["server_id cannot be empty"]
        return []

    async def read(self, ctx: ResourceContext) -> ServerInfoState | None:
        data = Server._servers.get(ctx.config.server_id)
        if not data:
            return None  # Server doesn't exist — Terraform will surface an error
        return ServerInfoState(**data)
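
The read logic can be exercised without Terraform. The sketch below is standalone and does not import Pyvider: FakeContext and the module-level SERVERS dict are hypothetical stand-ins for ResourceContext and Server._servers, standard-library dataclasses stand in for attrs, and the stored record shape (including the extra "region" key) is an assumption. It also shows one way to guard against stored keys the state class doesn't declare, since ServerInfoState(**data) raises TypeError on an unexpected keyword.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, fields


@dataclass
class ServerInfoConfig:
    server_id: str


@dataclass
class ServerInfoState:
    id: str
    name: str
    status: str


@dataclass
class FakeContext:
    """Hypothetical stand-in for ResourceContext: it only carries .config."""
    config: ServerInfoConfig


# Hypothetical stand-in for Server._servers from part 1 (shape assumed).
SERVERS = {
    "srv-042": {"id": "srv-042", "name": "web-01", "status": "running", "region": "eu-1"},
}


async def read(ctx: FakeContext) -> ServerInfoState | None:
    data = SERVERS.get(ctx.config.server_id)
    if data is None:
        return None  # unknown ID
    # Keep only the keys the state class declares, so extra stored
    # fields (like "region") don't break construction.
    wanted = {f.name for f in fields(ServerInfoState)}
    return ServerInfoState(**{k: v for k, v in data.items() if k in wanted})


found = asyncio.run(read(FakeContext(ServerInfoConfig("srv-042"))))
missing = asyncio.run(read(FakeContext(ServerInfoConfig("srv-999"))))
print(found)
print(missing)
```

If the store from part 1 holds exactly id, name, and status, the filtering step is unnecessary; it only matters once the stored record grows fields the data source doesn't expose.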

Register It

Import the data source in my_provider/__init__.py so Pyvider discovers it:

from pyvider.providers import BaseProvider, ProviderMetadata, register_provider
from pyvider.schema import PvsSchema, s_provider

import my_provider.server       # registers mycloud_server
import my_provider.server_info  # registers mycloud_server_info


@register_provider("mycloud")
class MyCloudProvider(BaseProvider):
    def __init__(self) -> None:
        super().__init__(
            metadata=ProviderMetadata(
                name="mycloud",
                version="0.1.0",
                protocol_version="6",
            )
        )

    @classmethod
    def get_schema(cls) -> PvsSchema:
        return s_provider({})
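
Why is a bare import enough? Decorators run when a module's body executes, so importing the module triggers the registration as a side effect. A toy registry illustrates the mechanism; REGISTRY and this register_data_source are hypothetical and are not Pyvider's implementation.

```python
from typing import Callable, Dict

# Hypothetical registry: maps a short name to the class implementing it.
REGISTRY: Dict[str, type] = {}


def register_data_source(name: str) -> Callable[[type], type]:
    """Toy decorator factory: records the class under `name` at definition time."""
    def decorator(cls: type) -> type:
        if name in REGISTRY:
            raise ValueError(f"data source {name!r} is already registered")
        REGISTRY[name] = cls
        return cls  # hand the class back unchanged
    return decorator


# In a real package this class would live in its own module, and the
# decorator would run the first time that module is imported.
@register_data_source("server_info")
class ServerInfo:
    pass


print(sorted(REGISTRY))  # ['server_info']
```

The flip side of this design is that a module nobody imports never registers anything, which is why the import lines above matter even though the imported names are unused.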

Update the Terraform Configuration

Update main.tf to query the server after creating it:

terraform {
  required_providers {
    mycloud = {
      source  = "example.com/tutorial/mycloud"
      version = "0.1.0"
    }
  }
}

provider "mycloud" {}

resource "mycloud_server" "web" {
  name = "web-01"
}

data "mycloud_server_info" "web" {
  server_id = mycloud_server.web.id
}

output "server_id"     { value = mycloud_server.web.id }
output "server_name"   { value = data.mycloud_server_info.web.name }
output "server_status" { value = data.mycloud_server_info.web.status }

The data source depends on the resource: Terraform will create the server first, then query it. You can also use a data source on its own — for a server created outside Terraform — by passing the ID directly:

data "mycloud_server_info" "existing" {
  server_id = "srv-042"
}
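
The ordering falls out of the reference itself: because server_id reads mycloud_server.web.id, the data source cannot be evaluated until the resource exists, while a literal ID creates no such dependency. The toy model below uses Python's standard graphlib to show that ordering. It illustrates the idea only and is not how Terraform builds its graph internally; the node names simply mirror the addresses in main.tf.

```python
from graphlib import TopologicalSorter

# Each key maps to the set of nodes it depends on.
deps = {
    "mycloud_server.web": set(),
    # server_id = mycloud_server.web.id creates an implicit dependency.
    "data.mycloud_server_info.web": {"mycloud_server.web"},
    # A literal ID ("srv-042") references nothing, so no dependency.
    "data.mycloud_server_info.existing": set(),
    "output.server_name": {"data.mycloud_server_info.web"},
}

# static_order() yields every node after all of its dependencies.
order = list(TopologicalSorter(deps).static_order())
print(order)
```

The relative position of independent nodes (such as the "existing" lookup) is not fixed; only the dependency constraints are guaranteed.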

Required Methods on Every Data Source

Method             Purpose
get_schema         Declares query params (required/optional) and result fields (computed)
_validate_config   Return a list of errors, or [] to pass
read               Return the current state, or None if the object doesn’t exist

That’s all — no _create_apply, _update_apply, or _delete_apply. Data sources are read-only by design.
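
The _validate_config contract (a list of error strings, empty when the config is valid) is easy to check in isolation. This is a minimal sketch that assumes nothing about Pyvider beyond that return shape: validate_config repeats the tutorial's rule as a free function, and lookup is a hypothetical driver showing validation gating the read, not the framework's actual call sequence.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass


@dataclass
class ServerInfoConfig:
    server_id: str


async def validate_config(config: ServerInfoConfig) -> list[str]:
    # Same rule as the tutorial's _validate_config.
    if not config.server_id:
        return ["server_id cannot be empty"]
    return []


async def lookup(config: ServerInfoConfig, store: dict) -> dict | None:
    """Hypothetical driver: refuse to read when validation reports errors."""
    errors = await validate_config(config)
    if errors:
        raise ValueError("; ".join(errors))
    return store.get(config.server_id)  # None when the ID is unknown


# Assumed record shape, for illustration only.
store = {"srv-042": {"id": "srv-042", "name": "web-01", "status": "running"}}

print(asyncio.run(validate_config(ServerInfoConfig(""))))
print(asyncio.run(lookup(ServerInfoConfig("srv-042"), store)))
```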


Next: Part 3 — Adding a Function to generate consistent server names directly in your Terraform expressions.