Description
Environment
- OLM Version: v0.36.0 (using quay.io/operator-framework/olm:master) and master branch
- Kubernetes: Rancher Desktop (K3s)
- Catalog Source: quay.io/operatorhubio/catalog:latest
- Installation method: Helm chart (./deploy/chart)
Issue Summary
I encountered this issue on both v0.36.0 and the master branch. After troubleshooting with Claude Code, we identified two NetworkPolicy misconfigurations that prevent OLM from functioning properly. Below are the findings and the workarounds that successfully resolved the issues.
Problem Description
When OLM is deployed with NetworkPolicies enabled (specifically the default-deny-all-traffic policy), two connectivity issues break it:
Issue 1: Catalog gRPC connectivity blocked
The operatorhubio-catalog pod cannot receive incoming connections on port 50051, causing persistent connection failures:
failed to populate resolver cache from source operatorhubio-catalog/operator-lifecycle-manager:
failed to list bundles: rpc error: code = Unavailable desc = connection error:
desc = "transport: Error while dialing dial tcp 10.43.119.62:50051: connect: connection refused"
Status observed:
status:
  connectionState:
    lastObservedState: TRANSIENT_FAILURE
The Helm chart creates a default-deny-all-traffic NetworkPolicy that blocks all ingress by default, but doesn't include a corresponding NetworkPolicy to allow ingress to the catalog pod.
Issue 2: Bundle unpacking jobs cannot access Kubernetes API
When a subscription attempts to install an operator, the bundle unpacking jobs fail to access the Kubernetes API server:
Error: error loading manifests from directory:
Get "https://10.43.0.1:443/api/v1/namespaces/operator-lifecycle-manager/configmaps/...":
dial tcp 10.43.0.1:443: connect: connection refused
This causes InstallPlans to fail with:
status:
  phase: Failed
  conditions:
  - type: BundleLookupFailed
    reason: BackoffLimitExceeded
    message: Job has reached the specified backoff limit
The default-deny-all-traffic NetworkPolicy blocks egress traffic, preventing jobs from reaching the Kubernetes API server.
Expected Behavior
When NetworkPolicies are enabled, OLM should include appropriate NetworkPolicies to allow:
- Ingress to catalog pods on port 50051 for gRPC communication
- Egress from all pods to access the Kubernetes API server and external registries
Reproduction Steps
- Deploy OLM using the Helm chart with default NetworkPolicies
- Create a CatalogSource pointing to quay.io/operatorhubio/catalog:latest
- Create a Subscription for any operator (e.g., cloudnative-pg)
- Observe catalog connection failures and InstallPlan failures
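For reference, the CatalogSource and Subscription from the steps above could look roughly like the following. This is a sketch, not the exact manifests used: the `displayName`, the `publisher`, and the Subscription namespace `operators` (which would also need an OperatorGroup) are assumptions, and the channel is left unset so that the package's default channel is used.

```yaml
# Sketch only: fields marked "assumed" are not taken from the report above.
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: operatorhubio-catalog
  namespace: operator-lifecycle-manager
spec:
  sourceType: grpc                      # catalog is served over gRPC on port 50051
  image: quay.io/operatorhubio/catalog:latest
  displayName: Community Operators      # assumed
  publisher: OperatorHub.io             # assumed
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: cloudnative-pg
  namespace: operators                  # assumed; needs an OperatorGroup in this namespace
spec:
  name: cloudnative-pg                  # package name in the catalog
  source: operatorhubio-catalog
  sourceNamespace: operator-lifecycle-manager
```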
Workaround Applied
Two additional NetworkPolicies were required to fix the issues:
1. Allow ingress to catalog pods
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: operatorhubio-catalog
  namespace: operator-lifecycle-manager
spec:
  podSelector:
    matchLabels:
      olm.catalogSource: operatorhubio-catalog
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - namespaceSelector: {}
    ports:
    - protocol: TCP
      port: 50051
  egress:
  - {}
2. Modify default-deny-all-traffic to allow egress
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all-traffic
  namespace: operator-lifecycle-manager
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
  egress:
  - {} # Allow all egress traffic
Result: After applying these NetworkPolicies, OLM became fully operational with catalog connectivity working (lastObservedState: READY) and operators installing successfully.
Proposed Solution
The OLM Helm chart should include:
- A catalog NetworkPolicy template that automatically creates ingress rules for any CatalogSource pods, using label selectors like olm.catalogSource
- A modified default NetworkPolicy that includes egress rules for:
  - Kubernetes API access (port 443)
  - DNS resolution (port 53 TCP/UDP)
  - Container registry access (port 443 for HTTPS registries)
  - gRPC catalog communication (port 50051)
This would allow OLM to work out-of-the-box in environments with strict NetworkPolicy enforcement, which is a common security requirement in production Kubernetes clusters.
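A rough sketch of what the two proposed policies might look like as plain manifests is shown below. It is not taken from the chart: the policy names are hypothetical, and the first policy assumes every catalog pod carries an `olm.catalogSource` label, so it matches on the label key alone rather than on one catalog's name. Because NetworkPolicies are additive, both could sit alongside the existing default-deny-all-traffic policy instead of modifying it.

```yaml
# Sketch only: policy names are hypothetical and the rules are untested against the chart.

# 1. Allow gRPC ingress to every pod that carries the olm.catalogSource label.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-catalog-grpc-ingress      # hypothetical name
  namespace: operator-lifecycle-manager
spec:
  podSelector:
    matchExpressions:
    - key: olm.catalogSource
      operator: Exists                  # any CatalogSource pod, regardless of catalog name
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector: {}             # all namespaces, as in the workaround
    ports:
    - protocol: TCP
      port: 50051
---
# 2. Allow only the egress listed in the proposal, for all pods in the namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-required-egress           # hypothetical name
  namespace: operator-lifecycle-manager
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - ports:                              # Kubernetes API and HTTPS registries
    - protocol: TCP
      port: 443
  - ports:                              # DNS resolution
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
  - ports:                              # gRPC catalog communication
    - protocol: TCP
      port: 50051
```

One caveat: some CNI plugins evaluate egress rules after the Service IP has been translated to the API server's endpoint, in which case the endpoint port (often 6443, including on K3s) may need to be allowed in addition to 443.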
Additional Context
The existing Helm chart includes NetworkPolicies for catalog-operator, olm-operator, and packageserver, which do include appropriate egress rules. However:
- There's no NetworkPolicy for catalog pods themselves
- The default-deny-all-traffic policy is too restrictive for OLM's operational requirements
Files to check:
- deploy/chart/templates/networkpolicy.yaml: Catalog operator NetworkPolicy definitions