What is the Component Object Model (COM)?

 
 
  • Gérald Barré

The Component Object Model (COM) is a fundamental technology in the Windows ecosystem. Even though it was introduced more than 30 years ago, it is still widely used today. In fact, the modern Windows Runtime (WinRT) is built on top of COM. This post explores what COM is, where it comes from, and the problems it solves.

#What is COM?

Imagine you have a box of LEGO bricks. You can connect any brick to any other brick, regardless of its color, size, or shape, because they all follow the same "stud and tube" design standard. COM is like that LEGO standard, but for software.

It is a binary standard that allows different software components to connect and talk to each other, even if they were built by different people, at different times, using different programming languages (like C++, C#, or Visual Basic).

COM is not a programming language itself. It's a set of rules that defines:

  1. How to create an object.
  2. How to ask an object what it can do.
  3. How to destroy an object when you are done with it.

#Where does it come from?

COM was developed by Microsoft in the early 1990s as the foundation for OLE (Object Linking and Embedding), which allowed documents to be embedded within other documents (e.g., an Excel chart inside a Word document). It later became the basis for ActiveX controls, which were used to add interactivity to web pages and desktop applications.

Over the years, COM has evolved and been rebranded, but the underlying binary standard has remained consistent. This stability is why Microsoft continues to rely on it for major operating system features.

##Binary Compatibility and the "Fragile Base Class" Problem

In traditional C++ programming, sharing classes between different applications can be risky. If the author of a library changes a class (e.g., adds a new private variable), the memory layout of that class changes.

If you update the library DLL but don't recompile your application, your app still assumes the old memory layout when it works with objects from the new DLL. This is like trying to use an old map for a city that has been rearranged: you end up in the wrong place, and your app crashes. This is often called the "fragile base class" problem (at the binary level, it is also known as the fragile binary interface problem).

COM solves this by enforcing a strict separation between the interface (the contract) and the implementation (the code).

  • The Interface is immutable: Once an interface is published, it can never change. No new methods, no reordering.
  • The Implementation is hidden: The client never sees the object's internal memory layout. It only sees the interface.

This ensures that you can update a COM component (like a DLL) without breaking the applications that use it.
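To make the separation concrete, here is a minimal, portable sketch. It does not use the Windows SDK, and every name in it (IGreeter, Greeter, CreateGreeter, Destroy) is hypothetical. Real COM interfaces derive from IUnknown and use Release instead of a Destroy method; this only illustrates the "contract on one side, hidden layout on the other" idea.

```cpp
#include <cassert>
#include <string>

// Published contract: only pure virtual functions, no data members.
// This is all a client ever compiles against.
struct IGreeter {
    virtual const char* Greet() = 0;
    virtual void Destroy() = 0;
};

// The single entry point the component exposes to create the object.
IGreeter* CreateGreeter();

// Implementation, hidden inside the component.
namespace {
class Greeter final : public IGreeter {
public:
    const char* Greet() override { ++calls_; return text_.c_str(); }
    void Destroy() override { delete this; }
private:
    // Fields can be added, removed, or reordered in a later version:
    // the client never depends on sizeof(Greeter) or on member offsets.
    std::string text_ = "hello";
    int calls_ = 0;
};
}  // namespace

IGreeter* CreateGreeter() { return new Greeter(); }
```

The client only calls CreateGreeter, Greet, and Destroy. Because it never allocates or deletes a Greeter itself, the component is free to change the private fields without the client being recompiled.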

##Versioning

Because interfaces are immutable, you cannot simply add a method to an existing interface. Instead, COM handles versioning by requiring you to create a new interface.

For example, if you have an interface ICamera, and you want to add a new feature, you would define a new interface ICamera2 that inherits from ICamera (or IUnknown). Clients can then use QueryInterface to check if the object supports ICamera2. If it does, they can use the new features; if not, they can gracefully fall back to the old behavior. This allows components to evolve without breaking existing clients.
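The fallback described above could look roughly like the following sketch. It is portable C++ rather than real Windows code: HResult, the Iid enum, IUnknownLike, CameraV1, CameraV2, TakeBurst, and Capture are all stand-ins invented for the illustration (real IIDs are GUIDs), and reference counting is reduced to a bare counter so the example stays short.

```cpp
#include <cassert>
#include <cstdint>

using HResult = std::int32_t;                                        // stand-in for HRESULT
constexpr HResult kOk = 0;                                           // S_OK
constexpr HResult kNoInterface = static_cast<HResult>(0x80004002u);  // E_NOINTERFACE

enum class Iid { Unknown, Camera, Camera2 };  // stand-in for GUID-based IIDs

struct IUnknownLike {
    virtual HResult QueryInterface(Iid iid, void** out) = 0;
    virtual std::uint32_t AddRef() = 0;
    virtual std::uint32_t Release() = 0;
};

// Version 1 of the contract: once published, it never changes.
struct ICamera : IUnknownLike {
    virtual int TakePhoto() = 0;  // returns the number of photos taken
};

// Version 2 adds a method by introducing a new interface.
struct ICamera2 : ICamera {
    virtual int TakeBurst(int count) = 0;
};

// An old component that only knows ICamera.
class CameraV1 final : public ICamera {
public:
    HResult QueryInterface(Iid iid, void** out) override {
        if (iid == Iid::Unknown || iid == Iid::Camera) {
            *out = static_cast<ICamera*>(this);
            AddRef();
            return kOk;
        }
        *out = nullptr;
        return kNoInterface;
    }
    std::uint32_t AddRef() override { return ++refs_; }
    std::uint32_t Release() override { return --refs_; }  // no self-delete in this sketch
    int TakePhoto() override { return 1; }
private:
    std::uint32_t refs_ = 1;
};

// A newer component that supports both versions.
class CameraV2 final : public ICamera2 {
public:
    HResult QueryInterface(Iid iid, void** out) override {
        if (iid == Iid::Unknown || iid == Iid::Camera || iid == Iid::Camera2) {
            *out = static_cast<ICamera2*>(this);  // single inheritance: same address for all three
            AddRef();
            return kOk;
        }
        *out = nullptr;
        return kNoInterface;
    }
    std::uint32_t AddRef() override { return ++refs_; }
    std::uint32_t Release() override { return --refs_; }
    int TakePhoto() override { return 1; }
    int TakeBurst(int count) override { return count; }
private:
    std::uint32_t refs_ = 1;
};

// Client: prefer the new interface, fall back to the old one.
int Capture(IUnknownLike* unk) {
    ICamera2* cam2 = nullptr;
    if (unk->QueryInterface(Iid::Camera2, reinterpret_cast<void**>(&cam2)) == kOk) {
        const int photos = cam2->TakeBurst(3);
        cam2->Release();
        return photos;
    }
    ICamera* cam = nullptr;
    if (unk->QueryInterface(Iid::Camera, reinterpret_cast<void**>(&cam)) == kOk) {
        const int photos = cam->TakePhoto();
        cam->Release();
        return photos;
    }
    return 0;
}
```

The same Capture function works against both components: with CameraV1 it takes a single photo, and with CameraV2 it uses the burst feature.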

#How does it work?

To achieve this binary compatibility, COM relies on a few core concepts:

  1. The vtable (Virtual Function Table): A binary contract that defines the memory layout of an interface's functions.
  2. The IUnknown interface: The base interface that all COM objects must implement.

IUnknown provides three essential methods:

  • QueryInterface: Allows a client to ask an object if it supports a specific feature (interface). This is the mechanism for discovery.
  • AddRef and Release: Manage the lifetime of the object using reference counting. The object deletes itself when it is no longer being used.
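To picture the vtable contract, it can help to write an interface the way a C client would see it: an object whose first field points to a table of function pointers. The sketch below is an illustration only. ICounter, ICounterVtbl, CounterImpl, CreateCounter, and DestroyCounter are hypothetical names, and the three IUnknown slots that every real COM vtable starts with are left out to keep it small.

```cpp
#include <cassert>

struct ICounter;

// The table of function pointers. The slot order is the binary contract.
struct ICounterVtbl {
    int (*Increment)(ICounter* self);  // slot 0
    int (*Get)(ICounter* self);        // slot 1
};

// What a client holds: a pointer to an object whose first field points to the table.
struct ICounter {
    const ICounterVtbl* vtbl;
};

// Implementation side.
struct CounterImpl {
    ICounter iface;  // must be the first member so ICounter* and CounterImpl* coincide
    int value;
};

static int CounterIncrement(ICounter* self) {
    return ++reinterpret_cast<CounterImpl*>(self)->value;
}

static int CounterGet(ICounter* self) {
    return reinterpret_cast<CounterImpl*>(self)->value;
}

static const ICounterVtbl kCounterVtbl = {CounterIncrement, CounterGet};

ICounter* CreateCounter() {
    CounterImpl* impl = new CounterImpl{{&kCounterVtbl}, 0};
    return &impl->iface;
}

void DestroyCounter(ICounter* counter) {
    delete reinterpret_cast<CounterImpl*>(counter);
}
```

A client calls through the table, for example `counter->vtbl->Increment(counter)`. Any language that can follow a pointer and call a function pointer can use the object, which is why the layout, and not the source language, is the contract.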

##Interface Discovery (QueryInterface)

QueryInterface is the method used for casting safely between COM interfaces. It is analogous to is/as in C# or dynamic_cast in C++.

The signature is:

C++
HRESULT QueryInterface(REFIID iid, void** ppvObject);

The method takes two parameters:

  • iid: The Interface Identifier (IID) of the interface being requested. This is a GUID (Globally Unique Identifier).
  • ppvObject: The address of a pointer variable that will receive the interface pointer requested in iid.

The method returns an HRESULT, which is a standard error code in COM. If the object supports the interface, it returns S_OK; otherwise, it returns E_NOINTERFACE.

The responsibility of QueryInterface is strictly defined:

  1. Check if the object implements the interface specified by iid.
  2. If it does, it must store the interface pointer in *ppvObject, increment the reference count of the object (call AddRef), and return S_OK.
  3. If it does not, it must set *ppvObject to nullptr and return E_NOINTERFACE.

This mechanism ensures that clients only access valid interfaces and that the object's lifetime is correctly managed during the "cast" operation.
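On the implementation side, these rules could be sketched as follows. This is portable C++ written for illustration, not the Windows SDK: Guid, HResult, IUnknownLike, ICamera, IFlash, and the Camera class are stand-ins, and the two camera-related IIDs are made-up values. Only the IUnknown identifier and the numeric error codes mirror the real ones.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

struct Guid { std::uint32_t d1; std::uint16_t d2, d3; std::uint8_t d4[8]; };
inline bool operator==(const Guid& a, const Guid& b) {
    return std::memcmp(&a, &b, sizeof(Guid)) == 0;
}

using HResult = std::int32_t;                                        // stand-in for HRESULT
constexpr HResult kOk = 0;                                           // S_OK
constexpr HResult kNoInterface = static_cast<HResult>(0x80004002u);  // E_NOINTERFACE
constexpr HResult kPointer = static_cast<HResult>(0x80004003u);      // E_POINTER

struct IUnknownLike {
    virtual HResult QueryInterface(const Guid& iid, void** ppvObject) = 0;
    virtual std::uint32_t AddRef() = 0;
    virtual std::uint32_t Release() = 0;
};
struct ICamera : IUnknownLike { virtual int TakePhoto() = 0; };
struct IFlash : IUnknownLike { virtual bool Fire() = 0; };

// IID of IUnknown: {00000000-0000-0000-C000-000000000046}
constexpr Guid kIidUnknown = {0x00000000, 0x0000, 0x0000, {0xC0, 0, 0, 0, 0, 0, 0, 0x46}};
// Made-up IIDs for the two hypothetical interfaces.
constexpr Guid kIidCamera = {0x11111111, 0x1111, 0x1111, {1, 1, 1, 1, 1, 1, 1, 1}};
constexpr Guid kIidFlash = {0x22222222, 0x2222, 0x2222, {2, 2, 2, 2, 2, 2, 2, 2}};

class Camera final : public ICamera, public IFlash {
public:
    HResult QueryInterface(const Guid& iid, void** ppvObject) override {
        if (ppvObject == nullptr) return kPointer;
        // Step 1: check which interface is requested.
        if (iid == kIidUnknown || iid == kIidCamera) {
            *ppvObject = static_cast<ICamera*>(this);
        } else if (iid == kIidFlash) {
            *ppvObject = static_cast<IFlash*>(this);
        } else {
            // Step 3: unsupported interface, clear the output and fail.
            *ppvObject = nullptr;
            return kNoInterface;
        }
        // Step 2: the caller now owns one more reference.
        AddRef();
        return kOk;
    }
    std::uint32_t AddRef() override { return ++refs_; }
    std::uint32_t Release() override {
        const std::uint32_t remaining = --refs_;
        if (remaining == 0) delete this;
        return remaining;
    }
    int TakePhoto() override { return 1; }
    bool Fire() override { return true; }
private:
    std::uint32_t refs_ = 1;  // the creator holds the first reference
};

IUnknownLike* CreateCamera() { return static_cast<ICamera*>(new Camera()); }
```

Note the static_cast to a specific base before storing the pointer: with multiple inheritance, each interface lives at a different address inside the object, so the "cast" has to pick the right one.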

##Reference Counting

One of the key challenges in software is memory management. Who is responsible for deleting an object when it's no longer needed? If you delete it too early, your program crashes. If you forget to delete it, you leak memory.

COM solves this using Reference Counting. The object maintains an internal counter of how many active references exist to it.

  1. When a client obtains a pointer to an interface (e.g., via QueryInterface or CreateInstance), AddRef is called on the client's behalf to increment the counter.
  2. When the client is done with the pointer, it calls Release to decrement the counter.
  3. When the counter reaches zero, the object knows it is no longer needed and deletes itself from memory.

This approach means you don't need to know who else is using the object. You just manage your own reference, and the object manages its own lifetime.
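Here is a minimal sketch of what the object side of reference counting might look like. IRefCounted and Resource are hypothetical names that model only the lifetime half of IUnknown, and the `destroyed` flag exists purely so the example can observe the deletion. An atomic counter is used on the assumption that references may be released from several threads.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Only the lifetime half of IUnknown, for illustration.
struct IRefCounted {
    virtual std::uint32_t AddRef() = 0;
    virtual std::uint32_t Release() = 0;
};

class Resource final : public IRefCounted {
public:
    // The creator receives the first reference (count starts at 1).
    static Resource* Create(bool* destroyed) { return new Resource(destroyed); }

    std::uint32_t AddRef() override { return refs_.fetch_add(1) + 1; }

    std::uint32_t Release() override {
        const std::uint32_t remaining = refs_.fetch_sub(1) - 1;
        if (remaining == 0) delete this;  // last reference gone: the object deletes itself
        return remaining;
    }

private:
    explicit Resource(bool* destroyed) : destroyed_(destroyed) {}
    ~Resource() { *destroyed_ = true; }  // private: only Release can destroy the object

    std::atomic<std::uint32_t> refs_{1};
    bool* destroyed_;
};
```

Making the destructor private is a design choice in this sketch: it prevents a client from calling delete directly, so the only way to end the object's life is to release the last reference.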

##Code Example

Here is a simplified C++ example of how a client uses IUnknown to discover interfaces and manage lifetime:

C++
void UseComObject(IUnknown* pUnk)
{
    // 1. Discovery: Ask for the ICamera interface
    ICamera* pCamera = nullptr;
    // If successful, QueryInterface automatically calls AddRef on the returned pointer
    HRESULT hr = pUnk->QueryInterface(IID_ICamera, (void**)&pCamera);

    if (SUCCEEDED(hr))
    {
        // 2. Use the interface
        pCamera->TakePhoto();

        // Example of explicit AddRef:
        // If we want to share this pointer with another part of the code,
        // we should increment the reference count.
        pCamera->AddRef();
        SomeOtherFunction(pCamera); // SomeOtherFunction is now responsible for calling Release()

        // 3. Lifetime management: Release the interface when done
        pCamera->Release();
    }
}
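Calling AddRef and Release by hand is easy to get wrong, so C++ code often wraps interface pointers in a small RAII helper (on Windows, Microsoft::WRL::ComPtr and ATL's CComPtr play this role). The following is only a rough sketch of the idea, not how those classes are implemented. RefPtr, IRefCounted, and Probe are hypothetical names, and the `live` counter is there just to observe when the object is destroyed.

```cpp
#include <cassert>
#include <utility>

// Stand-in for the lifetime half of IUnknown.
struct IRefCounted {
    virtual unsigned AddRef() = 0;
    virtual unsigned Release() = 0;
};

template <typename T>
class RefPtr {
public:
    RefPtr() = default;
    // Adopts a pointer that already carries a reference (e.g. the output of QueryInterface).
    static RefPtr Attach(T* raw) {
        RefPtr p;
        p.ptr_ = raw;
        return p;
    }
    // Copying shares the object, so it takes one more reference.
    RefPtr(const RefPtr& other) : ptr_(other.ptr_) { if (ptr_) ptr_->AddRef(); }
    // Moving transfers the existing reference without touching the count.
    RefPtr(RefPtr&& other) noexcept : ptr_(std::exchange(other.ptr_, nullptr)) {}
    RefPtr& operator=(RefPtr other) noexcept {
        std::swap(ptr_, other.ptr_);
        return *this;
    }
    // Leaving scope releases the reference this wrapper owns.
    ~RefPtr() { if (ptr_) ptr_->Release(); }

    T* operator->() const { return ptr_; }
    explicit operator bool() const { return ptr_ != nullptr; }

private:
    T* ptr_ = nullptr;
};

// A small reference-counted object used to exercise the wrapper.
class Probe final : public IRefCounted {
public:
    explicit Probe(int* live) : live_(live) { ++*live_; }
    ~Probe() { --*live_; }
    unsigned AddRef() override { return ++refs_; }
    unsigned Release() override {
        const unsigned remaining = --refs_;
        if (remaining == 0) delete this;
        return remaining;
    }
    unsigned Refs() const { return refs_; }
private:
    unsigned refs_ = 1;  // the creator holds the first reference
    int* live_;
};
```

With such a wrapper, sharing a pointer becomes a copy and the matching Release happens automatically when each copy goes out of scope.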

#The Full COM Experience

So far, we've talked about the "Nano COM" part: the binary standard and IUnknown. But COM is also a platform with many services.

  • COM Servers: These are the files that contain the COM objects. They can be DLLs (running inside your app) or EXEs (running as a separate process).
  • Registration (CLSIDs): Every COM class has a unique ID called a CLSID (Class ID). These are stored in the Windows Registry so the system knows where to find them.
  • Registration-Free COM: Modern Windows versions allow applications to use COM components without registering them in the system Registry. Instead, they use "manifest" files (XML) to declare dependencies. This solves "DLL Hell" by allowing different applications to use different versions of the same component side-by-side.
  • Activation (CoCreateInstance): This is the standard function to create a COM object. You give it a CLSID, and it does the heavy lifting of finding the DLL/EXE, loading it, and creating the object.
  • Location Transparency: Because CoCreateInstance handles the creation, your code doesn't care if the object is in the same process, a different process, or even on a different computer (DCOM). It all looks the same to you.
  • Marshaling: If an object is in another process, you can't just call a function pointer. COM automatically creates a "bridge" (proxy/stub) to send your function calls and data across the process boundary.
  • Automation (IDispatch): This allows dynamic languages (like VBScript or Python) to call COM objects by name (e.g., object.MethodName()) instead of using the strict vtable structure.
  • Threading Models (Apartments): COM has rules to ensure thread safety. It uses "Apartments" to define whether an object can be accessed by multiple threads at once or if it needs to be accessed by only one thread at a time.
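To give a feel for registration and activation, here is a toy model in portable C++. It is an assumption-laden sketch, not how Windows implements it: the real lookup goes through the Registry (or a manifest) and the real entry point is CoCreateInstance, which also takes an IID and a class context. ClassTable, RegisterComponent, CreateInstance, IGreeter, and EnglishGreeter are hypothetical names, the CLSID is a plain string, and Release simply deletes the object instead of counting references.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>

using HResult = std::int32_t;                                               // stand-in for HRESULT
constexpr HResult kOk = 0;                                                  // S_OK
constexpr HResult kClassNotRegistered = static_cast<HResult>(0x80040154u);  // REGDB_E_CLASSNOTREG

struct IGreeter {
    virtual const char* Greet() = 0;
    virtual void Release() = 0;
};

class EnglishGreeter final : public IGreeter {
public:
    const char* Greet() override { return "hello"; }
    void Release() override { delete this; }  // simplified: no reference count
};

using Factory = std::function<IGreeter*()>;

// Plays the role of the Registry: maps a class ID to the code that creates the object.
std::map<std::string, Factory>& ClassTable() {
    static std::map<std::string, Factory> table;
    return table;
}

void RegisterComponent(const std::string& clsid, Factory factory) {
    ClassTable()[clsid] = std::move(factory);
}

// Plays the role of activation: the caller only knows the class ID.
HResult CreateInstance(const std::string& clsid, IGreeter** out) {
    *out = nullptr;
    const auto it = ClassTable().find(clsid);
    if (it == ClassTable().end()) return kClassNotRegistered;
    *out = it->second();
    return kOk;
}
```

The caller never names EnglishGreeter. It asks for a class ID and gets back an interface pointer, which is what lets the system decide where the object actually lives.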

#COM and .NET

While COM is a native technology, .NET provides excellent support for interoperability with COM components. This allows you to use existing COM libraries in your .NET applications and vice versa.

The .NET runtime (CLR) hides the complexity of COM (like reference counting and QueryInterface) behind two wrapper objects:

  1. Runtime Callable Wrapper (RCW): When a .NET application uses a COM object, the CLR wraps the COM object in an RCW. The RCW looks like a normal .NET object to your code. It handles the transition between managed and unmanaged code, marshals data types, and manages the COM reference count. When the RCW is garbage collected, it releases the underlying COM object.
  2. COM Callable Wrapper (CCW): When a COM client uses a .NET object, the CLR wraps the .NET object in a CCW. The CCW exposes COM interfaces (like IUnknown and IDispatch) to the COM client. It handles the transition back to managed code and keeps the .NET object alive as long as the COM client holds a reference to the CCW.

This seamless interoperability is why you can easily automate Microsoft Office (Excel, Word) from C# or write Windows Shell extensions in .NET.

#Example: Direct3D and "Nano COM"

A great example of COM in action is the DirectX family of APIs, such as Direct3D, Direct2D, and DirectWrite. These APIs use what is often called "Nano COM."

They rely purely on the binary contract of COM interfaces (vtable + IUnknown) without using the heavier parts of the COM infrastructure, such as the system registry or CoCreateInstance. Instead, they provide a simple factory function to create the initial object. From that point on, everything is handled through COM interfaces.

This approach provides high performance and low overhead while still maintaining the benefits of binary compatibility and versioning. It is a testament to the flexibility and enduring utility of the COM design.

#Conclusion

COM is more than just a legacy technology; it is a robust solution to the complex problems of binary interoperability and component versioning. By defining a strict binary standard and separating interface from implementation, COM allows software to evolve gracefully over decades. Whether you are using classic OLE automation or modern WinRT APIs, you are relying on the principles established by the Component Object Model.
