logicsmaze: Generic in C#

Reference : http://msdn.microsoft.com/en-us/library/ms379564(v=vs.80).aspx

Generics are the most powerful feature of C# 2.0. Generics allow you to define type-safe data structures, without committing to actual data types. This results in a significant performance boost and higher quality code, because you get to reuse data processing algorithms without duplicating type-specific code. In concept, generics are similar to C++ templates, but are drastically different in implementation and capabilities. This article discusses the problem space generics address, how they are implemented, the benefits of the programming model, and unique innovations, such as constrains, generic methods and delegates, and generic inheritance. You will also see how generics are utilized in other areas of the .NET Framework such as reflection, arrays, collections, serialization, and remoting, and how to improve on the basic offering.

Generics Problem Statement

Consider an everyday data structure such as a stack, providing the classic Push() and Pop() methods. When developing a general-purpose stack, you would like to use it to store instances of various types. Under C# 1.1, you have to use an Object-based stack, meaning that the internal data type used in the stack is an amorphous Object, and the stack methods interact with Objects:

public class Stack
{
   object[] m_Items; 
   public void Push(object item)
   {...}
   public object Pop()
   {...}
}

Code block 1 shows the full implementation of the Object-based stack. Because Object is the canonical .NET base type, you can use the Object-based stack to hold any type of items, such as integers:

Stack stack = new Stack();
stack.Push(1);
stack.Push(2);
int number = (int)stack.Pop();

Code block 1. An Object-based stack

public class Stack
{
   readonly int m_Size; 
   int m_StackPointer = 0;
   object[] m_Items; 
   public Stack():this(100)
   {}   
   public Stack(int size)
   {
      m_Size = size;
      m_Items = new object[m_Size];
   }
   public void Push(object item)
   {
      if(m_StackPointer >= m_Size) 
         throw new StackOverflowException();       
      m_Items[m_StackPointer] = item;
      m_StackPointer++;
   }
   public object Pop()
   {
      m_StackPointer--;
      if(m_StackPointer >= 0)
      {
         return m_Items[m_StackPointer];
      }
      else
      {
         m_StackPointer = 0;
         throw new InvalidOperationException("Cannot pop an empty stack");
      }
   }
}

However, there are two problems with Object-based solutions. The first issue is performance. When using value types, you have to box them in order to push and store them, and unbox the value types when popping them off the stack. Boxing and unboxing incurs a significant performance penalty in their own right, but it also increases the pressure on the managed heap, resulting in more garbage collections, which is not great for performance either. Even when using reference types instead of value types, there is still a performance penalty because you have to cast from an Object to the actual type you interact with and incur the casting cost:

Stack stack = new Stack();
stack.Push("1");
string number = (string)stack.Pop();

The second (and often more severe) problem with the Object-based solution is type safety. Because the compiler lets you cast anything to and from Object, you lose compile-time type safety. For example, the following code compiles fine, but raises an invalid cast exception at run time:

Stack stack = new Stack();
stack.Push(1);
//This compiles, but is not type safe, and will throw an exception: 
string number = (string)stack.Pop();

You can overcome these two problems by providing a type-specific (and hence, type-safe) performant stack. For integers you can implement and use the IntStack:

public class IntStack
{
   int[] m_Items; 
   public void Push(int item){...}
   public int Pop(){...}
} 
IntStack stack = new IntStack();
stack.Push(1);
int number = stack.Pop();

For strings you would implement the StringStack:

public class StringStack
{
   string[] m_Items; 
   public void Push(string item){...}
   public string Pop(){...}
}
StringStack stack = new StringStack();
stack.Push("1");
string number = stack.Pop();

And so on. Unfortunately, solving the performance and type-safety problems this way introduces a third, and just as serious problem—productivity impact. Writing type-specific data structures is a tedious, repetitive, and error-prone task. When fixing a defect in the data structure, you have to fix it not just in one place, but in as many places as there are type-specific duplicates of what is essentially the same data structure. In addition, there is no way to foresee the use of unknown or yet-undefined future types, so you have to keep an Object-based data structure as well. As a result, most C# 1.1 developers found type-specific data structures to be impractical and opt for using Object-based data structures, in spite of their deficiencies.

What Are Generics

Generics allow you to define type-safe classes without compromising type safety, performance, or productivity. You implement the server only once as a generic server, while at the same time you can declare and use it with any type. To do that, use the < and > brackets, enclosing a generic type parameter. For example, here is how you define and use a generic stack:

public class Stack<T>
{
   T[] m_Items; 
   public void Push(T item)
   {...}
   public T Pop()
   {...}
}
Stack<int> stack = new Stack<int>();
stack.Push(1);
stack.Push(2);
int number = stack.Pop();

Code block 2 shows the full implementation of the generic stack. Compare Code block 1 to Code block 2 and see that it is as if every use of object in Code block 1 is replaced with T in Code block 2, except that the Stack is defined using the generic type parameter T:

public class Stack<T>
{...}

When using a generic stack, you have to instruct the compiler which type to use instead of the generic type parameter T, both when declaring the variable and when instantiating it:

Stack<int> stack = new Stack<int>();

The compiler and the runtime do the rest. All the methods (or properties) that accept or return a T will instead use the specified type, an integer in the example above.

Code block 2. The generic stack

public class Stack<T>
{
   readonly int m_Size; 
   int m_StackPointer = 0;
   T[] m_Items;
   public Stack():this(100)
   {}
   public Stack(int size)
   {
      m_Size = size;
      m_Items = new T[m_Size];
   }
   public void Push(T item)
   {
      if(m_StackPointer >= m_Size) 
         throw new StackOverflowException();
      m_Items[m_StackPointer] = item;
      m_StackPointer++;
   }
   public T Pop()
   {
      m_StackPointer--;
      if(m_StackPointer >= 0)
      {
         return m_Items[m_StackPointer];
      }
      else
      {
         m_StackPointer = 0;
         throw new InvalidOperationException("Cannot pop an empty stack");
      }
   }
}

Note T is the generic type parameter (or type parameter) while the generic type is the Stack<T>. The int in Stack<int> is the type argument.

The advantage of this programming model is that the internal algorithms and data manipulation remain the same while the actual data type can change based on the way the client uses your server code.

Generics Implementation

On the surface C# generics look syntactically similar to C++ templates, but there are important differences in the way they are implemented and supported by the compiler. As you will see later in this article, this has significant implications on the manner in which you use generics.

Note In the context of this article, when referring to C++ it means classic C++, not Microsoft C++ with the managed extensions.

Compared to C++ templates, C# generics can provide enhanced safety but are also somewhat limited in capabilities.

In some C++ compilers, until you use a template class with a specific type, the compiler does not even compile the template code. When you do specify a type, the compiler inserts the code inline, replacing every occurrence of the generic type parameter with the specified type. In addition, every time you use a specific type, the compiler inserts the type-specific code, regardless of whether you have already specified that type for the template class somewhere else in the application. It is up to the C++ linker to resolve this, and it is not always possible to do. This may results in code bloating, increasing both the load time and the memory footprint.

In .NET 2.0, generics have native support in IL (intermediate language) and the CLR itself. When you compile generic C# server-side code, the compiler compiles it into IL, just like any other type. However, the IL only contains parameters or place holders for the actual specific types. In addition, the metadata of the generic server contains generic information.

The client-side compiler uses that generic metadata to support type safety. When the client provides a specific type instead of a generic type parameter, the client's compiler substitutes the generic type parameter in the server metadata with the specified type argument. This provides the client's compiler with type-specific definition of the server, as if generics were never involved. This way the client compiler can enforce correct method parameters, type-safety checks, and even type-specific IntelliSense.

The interesting question is how does .NET compile the generic IL of the server to machine code. It turns out that the actual machine code produced depends on whether the specified types are value or reference type. If the client specifies a value type, then the JIT compiler replaces the generic type parameters in the IL with the specific value type, and compiles it to native code. However, the JIT compiler keeps track of type-specific server code it already generated. If the JIT compiler is asked to compile the generic server with a value type it has already compiled to machine code, it simply returns a reference to that server code. Because the JIT compiler uses the same value-type-specific server code in all further encounters, there is no code bloating.

If the client specifies a reference type, then the JIT compiler replaces the generic parameters in the server IL with Object, and compiles it into native code. That code will be used in any further request for a reference type instead of a generic type parameter. Note that this way the JIT compiler only reuses actual code. Instances are still allocated according to their size off the managed heap, and there is no casting.

Generics Benefits

Generics in .NET let you reuse code and the effort you put into implementing it. The types and internal data can change without causing code bloat, regardless of whether you are using value or reference types. You can develop, test, and deploy your code once, reuse it with any type, including future types, all with full compiler support and type safety. Because the generic code does not force the boxing and unboxing of value types, or the down casting of reference types, performance is greatly improved. With value types there is typically a 200 percent performance gain, and with reference types you can expect up to a 100 percent performance gain in accessing the type (of course, the application as a whole may or may not experience any performance improvements). The source code available with this article includes a micro-benchmark application, which executes a stack in a tight loop. The application lets you experiment with value and reference types on an Object-based stack and a generic stack, as well as changing the number of loop iterations to see the effect generics have on performance.

logicsmaze

Pages

Wednesday, 28 November 2012

Generic in C#

Generics Problem Statement

What Are Generics

Generics Implementation

Generics Benefits

No comments:

Post a Comment

Contributors