Overview
The elmcity service doesn't (yet) require its users -- who are called curators -- to register in order to create and operate calendar hubs that aggregate calendar feeds. Instead it relies on several informal contracts involving partner services, including delicious, FriendFeed, and Twitter. Here's the Twitter contract:
The service will follow the curator at a specified Twitter account. By doing so, it enables the curator to send authentic messages to the service. The vocabulary used by those messages will initially be just a single verb: start. When the service receives the start message from a hub, it will re-aggregate the hub's list of feeds.
The code that supports this contract, Twitter.cs, only needs to do two things:
1. Follow curators' Twitter accounts from the service's Twitter account
2. Read direct messages from curators' accounts and transmit them to the service's account
Following a Twitter account
Here's the follow request:
public static HttpResponse FollowTwitterAccount(string account)
{
    var url = String.Format
    (
        "http://api.twitter.com/friendships/create/{0}.xml",
        account
    );
    var request = (HttpWebRequest)WebRequest.Create(new Uri(url));
    request.Method = "POST";
    var response = HttpUtils.DoAuthorizedHttpRequest
    (
        request,
        Configurator.twitter_account,
        Configurator.twitter_password,
        data: new byte[0]
    );
    return response;
}
Elsewhere I'll explore the HttpUtils class. It's a wrapper around the core .NET WebRequest and WebResponse classes that helps me focus on the HTTP mechanics underlying the various partner services that my service interacts with. But as you can see, this is just a simple and straightforward HTTP request for Twitter to follow the specified account on behalf of the service's own account.
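HttpUtils isn't shown here, but passing an account name and a password alongside the request suggests HTTP basic authentication. As a rough sketch under that assumption (BasicAuthSketch and BasicAuthHeaderValue are hypothetical names, not the project's code), the credentials could be turned into an Authorization header value like this:

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the elmcity HttpUtils class.
public static class BasicAuthSketch
{
    // Build the value of an HTTP Basic Authorization header:
    // the word "Basic", a space, then base64("account:password").
    public static string BasicAuthHeaderValue(string account, string password)
    {
        var credentials = Encoding.UTF8.GetBytes(account + ":" + password);
        return "Basic " + Convert.ToBase64String(credentials);
    }
}
```

A wrapper along these lines would presumably set that value on the request's Authorization header before sending it, then write the (here empty) data bytes as the POST body.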
Monitoring Twitter direct messages
The method that reads Twitter direct messages is more involved. Let's start with its return type, List<TwitterDirectMessage>:
public class TwitterDirectMessage
{
    public string id { get; set; }
    public string sender_screen_name { get; set; }
    public string recipient_screen_name { get; set; }
    public string text { get; set; }
    public TwitterDirectMessage() { }
    public TwitterDirectMessage(string id, string sender_screen_name,
        string recipient_screen_name, string text)
    {
        this.id = id;
        this.sender_screen_name = sender_screen_name;
        this.recipient_screen_name = recipient_screen_name;
        this.text = text;
    }
}
Define a custom type, or just use a dictionary?
It's worth noting that you don't have to define a type for this purpose. Originally, in fact, I didn't. When I'm working in Python I often don't define types but instead use dictionaries to model simple packages of data in a fluid way. You can do the same in C#. The first incarnation of this method was declared like so:
public static List<Dictionary<string,object>> TwitterDirectMessage(...)
In that version, the return type was also a generic List. Each item was a generic Dictionary whose keys and values were both of type String. Modeling a Twitter direct message that way avoids the overhead of declaring a new type, but makes it a little harder to keep track of things. Consider these two declarations:
var dm_object = new TwitterDirectMessage
(
    id: "123",
    sender_screen_name: "elmcity",
    recipient_screen_name: "judell",
    text: "hello!"
);
var dm_dict = new Dictionary<string, string>()
{
    {"id", "123"},
    {"sender_screen_name", "elmcity"},
    {"recipient_screen_name", "judell"},
    {"text", "hello!"}
};
They look similar when viewed in the debugger:
dm_object
    {CalendarAggregator.TwitterDirectMessage}
    id: "123"
    recipient_screen_name: "judell"
    sender_screen_name: "elmcity"
    text: "hello!"
dm_dict
    Count = 4
    [0]: {[id, 123]}
    [1]: {[sender_screen_name, elmcity]}
    [2]: {[recipient_screen_name, judell]}
    [3]: {[text, hello!]}
But these objects behave differently when you're writing code that uses them. If I use a dictionary-style object, I have to remember (or look up) that my convention for that dictionary is, for example, to use the key sender_screen_name rather than, say, sender_name. And if I get that wrong, I won't find out until run time. When I create a new custom type, though, Visual Studio knows the names and types of the object's properties and prompts for their values by name. I regard this static typing as optional, and sometimes I don't bother, but increasingly I've come to see it as a form of documentation that repays the overhead required to create it.
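To make the run-time failure concrete, here is a small sketch (KeyTypoSketch and LookupFails are hypothetical names used only for illustration). A misspelled key compiles without complaint and only fails when the lookup executes:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration, not part of the elmcity code.
public static class KeyTypoSketch
{
    // Returns true if looking up the given key throws at run time.
    public static bool LookupFails(Dictionary<string, string> dm_dict, string key)
    {
        try
        {
            var value = dm_dict[key]; // the indexer throws if the key is absent
            return false;
        }
        catch (KeyNotFoundException)
        {
            return true;
        }
    }
}
```

With the dm_dict shown earlier, LookupFails(dm_dict, "sender_name") returns true, while the equivalent mistake on the typed object, dm_object.sender_name, would be rejected by the compiler before the program ever ran.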
Unpacking a Twitter message using LINQ
Now let's look at how GetDirectMessagesFromTwitter unpacks the XML response from the Twitter API.
var xdoc = XmlUtils.XdocFromXmlBytes(response.bytes);
var messages = from message in xdoc.Descendants("direct_message")
               select new TwitterDirectMessage()
               {
                   id = message.Descendants("id").First().Value,
                   sender_screen_name = message.Descendants("sender_screen_name").First().Value,
                   recipient_screen_name = message.Descendants("recipient_screen_name").First().Value,
                   text = message.Descendants("text").First().Value
               };
return messages.ToList();
In this example, response.bytes is the XML text returned from the Twitter API, and xdoc is a System.Xml.Linq.XDocument. I could alternatively have read that XML text into a System.Xml.XmlDocument, and used XPath to pick out elements from within it. That was always my approach in Python. It's available in C# too, and there are some cases where I use it. But here I'm instead using LINQ (language-integrated query), a pattern that works for many different kinds of data sources. The elmcity project uses LINQ to query a variety of sources including XML, in-memory objects, JSON, and CSV. I've come to appreciate the generality of LINQ. And when the data source is XML, I find that System.Xml.Linq.XDocument makes namespaces more tractable than System.Xml.XmlDocument does.
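For comparison, the XmlDocument-and-XPath alternative might look something like the following. This is only a sketch: XPathSketch and ParseDirectMessages are hypothetical names, it returns dictionaries to stay self-contained, and it assumes each direct_message element has id, sender_screen_name, and text children.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical illustration of the XPath approach, not the project's code.
public static class XPathSketch
{
    public static List<Dictionary<string, string>> ParseDirectMessages(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        var messages = new List<Dictionary<string, string>>();
        // "//direct_message" selects every direct_message element in the document
        foreach (XmlNode node in doc.SelectNodes("//direct_message"))
        {
            messages.Add(new Dictionary<string, string>
            {
                // each relative path selects a child of the current node
                { "id", node.SelectSingleNode("id").InnerText },
                { "sender_screen_name", node.SelectSingleNode("sender_screen_name").InnerText },
                { "text", node.SelectSingleNode("text").InnerText }
            });
        }
        return messages;
    }
}
```

The shape is similar, but the element names live inside XPath strings, and a namespaced document would additionally need an XmlNamespaceManager passed to each select call.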
The type of object returned by xdoc.Descendants() is IEnumerable<XElement>. The query could simply have handed those elements back as they are:
var messages = from message in xdoc.Descendants("direct_message")
               select message;
It would then be the caller's responsibility to pick out the elements and construct TwitterDirectMessages from them. But instead, this method does that on the fly. Although xdoc.Descendants() is an IEnumerable<XElement>, the select clause projects each element into a new TwitterDirectMessage, so messages is an IEnumerable<TwitterDirectMessage>.
You could return that messages object directly, in which case it would be the caller's responsibility to actually do the enumeration that it encapsulates. That style of deferred execution is fundamental to LINQ. But here there's no need to decouple the declaration of the query from the materialization of a view based on it. So GetDirectMessagesFromTwitter returns messages.ToList(), which does the enumeration and sends back a List<TwitterDirectMessage>.
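Deferred execution is easy to observe with a counter. In this sketch (DeferredSketch, Shout, and Evaluations are hypothetical names for illustration), declaring the query runs nothing; the projection only executes when something enumerates the result:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration of LINQ deferred execution.
public static class DeferredSketch
{
    // Counts how many times the projection has actually run.
    public static int Evaluations = 0;

    // Declares a query. The lambda does not run until the result is enumerated.
    public static IEnumerable<string> Shout(IEnumerable<string> words)
    {
        return words.Select(w => { Evaluations++; return w.ToUpper(); });
    }
}
```

After var query = DeferredSketch.Shout(new[] { "start", "stop" }), Evaluations is still 0. Calling query.ToList() enumerates the query and brings it to 2. Enumerating the same query a second time would run the projection again, which is one more reason to materialize the list once and return that.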
Moving Twitter messages to and from Azure storage
Each message sent from a curator to the elmcity service should only be handled once. So the service needs to be able to identify messages it hasn't seen before, and then dispose of them after its handling is done. To accomplish this, and also to keep an archive of all messages sent to the service, it reflects all new messages into the Azure table service, which is a scalable key/value database that stores bags of properties without requiring an explicit schema. In this case, scaling isn't a concern. At any given time, there shouldn't ever be more than a handful of Twitter messages stored in Azure. But the bags-of-properties aspect of the Azure table service is really convenient, for the same reason that hashtables and dictionaries are.
(Azure is, by the way, not restricted to the key/value table store. There's a cloud-based relational store as well. My project's storage needs are basic, so I haven't used SQL Azure yet. But I expect that I eventually will.)
The Azure SDK models the table service using a set of abstractions that work with defined types, but I wanted to preserve the flexibility to store and retrieve arbitrary key/value collections. Hence my project's alternate interface to the service. I'll explore it in more detail elsewhere. For now, just consider the Azure-oriented counterpart to GetDirectMessagesFromTwitter:
public static List<TwitterDirectMessage> GetDirectMessagesFromAzure()
{
    // query for every entity in the partition named by pk_directs
    var q = string.Format("$filter=(PartitionKey eq '{0}')", pk_directs);
    var qdicts = (List<Dictionary<string, object>>)ts.QueryEntities(ts_table, q).response;
    var messages = new List<TwitterDirectMessage>();
    foreach (var qdict in qdicts)
    {
        // map each dictionary onto a TwitterDirectMessage
        var message = (TwitterDirectMessage)ObjectUtils.DictObjToObj
        (
            qdict,
            typeof(TwitterDirectMessage)
        );
        messages.Add(message);
    }
    return messages;
}
If it were written in English, that method would say the following.
From the Azure table named twitter, retrieve all entities identified by a partition key whose value is direct_messages. The result is an Atom feed whose entries look like this:
<entry m:etag="W/&quot;datetime'2010-06-14T15%3A01%3A39.103296Z'&quot;">
  <id>https://elmcity.table.core.windows.net/twitter(PartitionKey='direct_messages',RowKey='1003299227')</id>
  <title type="text"/>
  <updated>2010-08-09T22:13:20Z</updated>
  <author><name/></author>
  <link rel="edit" title="twitter" href="twitter(PartitionKey='direct_messages',RowKey='1003299227')"/>
  <category term="elmcity.twitter" scheme="http://schemas.micro...services/scheme"/>
  <content type="application/xml">
    <m:properties>
      <d:PartitionKey>direct_messages</d:PartitionKey>
      <d:RowKey>1003299227</d:RowKey>
      <d:Timestamp m:type="Edm.DateTime">2010-06-14T15:01:39.103296Z</d:Timestamp>
      <d:id>1003299227</d:id>
      <d:sender_screen_name>westborough</d:sender_screen_name>
      <d:recipient_screen_name>elmcity_azure</d:recipient_screen_name>
      <d:text>start</d:text>
    </m:properties>
  </content>
</entry>
Convert each Atom entry into a dictionary (i.e., Dictionary<string, object>) and combine those into a list (i.e., List<Dictionary<string, object>>). Then run through the list, convert each dictionary into its corresponding TwitterDirectMessage, and return a list of those.
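The entry-to-dictionary step happens inside the project's table storage interface, which isn't shown here. As a minimal sketch of how it could be done with LINQ to XML (AtomSketch and EntriesToDicts are hypothetical names, and every value is kept as a string rather than converted according to its m:type attribute):

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical illustration, not the project's table storage code.
public static class AtomSketch
{
    // Namespaces used by Atom and by the data services property elements
    static readonly XNamespace atom = "http://www.w3.org/2005/Atom";
    static readonly XNamespace m = "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata";
    static readonly XNamespace d = "http://schemas.microsoft.com/ado/2007/08/dataservices";

    public static List<Dictionary<string, object>> EntriesToDicts(string feedXml)
    {
        var xdoc = XDocument.Parse(feedXml);
        return xdoc.Descendants(atom + "entry")
            .Select(entry => entry.Descendants(m + "properties").First()
                .Elements()
                .Where(e => e.Name.Namespace == d)          // keep only d: elements
                .ToDictionary(e => e.Name.LocalName,        // e.g. "sender_screen_name"
                              e => (object)e.Value))        // assumption: values stay strings
            .ToList();
    }
}
```

Each resulting dictionary contains PartitionKey, RowKey, and Timestamp alongside the message's own properties, which is why the object conversion has to tolerate keys that have no counterpart in the target type.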
That conversion is enabled by a method in the ObjectUtils class, DictObjToObj. It works in a generic way for any type. Here the conversion is from Dictionary<string, object> to TwitterDirectMessage. But consider this type, whose members aren't all strings:
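public class ExampleType
{
    // properties rather than fields, because DictObjToObj
    // looks members up with Type.GetProperty
    public int id { get; set; }
    public string name { get; set; }
    public DateTime timestamp { get; set; }
    public bool validated { get; set; }
}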
DictObjToObj can make an instance of that type from this dictionary:
var dict = new Dictionary<string, object>()
{
    {"id", 123},
    {"name", "elmcity"},
    {"timestamp", DateTime.Now},
    {"validated", true}
};
It does so using a C# capability called reflection:
Reflection provides objects (of type Type) that encapsulate assemblies, modules and types. You can use reflection to dynamically create an instance of a type, bind the type to an existing object, or get the type from an existing object and invoke its methods or access its fields and properties.
Here's how DictObjToObj uses reflection:
public static Object DictObjToObj(Dictionary<string, object> dict_obj, Type type)
{
    var o = Activator.CreateInstance(type); // create object
    // GetProperties() returns an empty array, never null, when there are no properties
    if (type.GetProperties().Length == 0)
    {
        GenUtils.LogMsg("exception", "DictObjToObj: " + type.Name,
            "target type does not define properties");
        return o;
    }
    foreach (var key in dict_obj.Keys)
    {
        try // set properties
        {
            type.GetProperty(key).SetValue(o, dict_obj[key], index: null);
        }
        catch (NullReferenceException)
        {
            // this is normal since an azure table includes PartitionKey, RowKey,
            // and Timestamp which will not map into the object
        }
        catch (Exception e)
        {
            GenUtils.LogMsg("exception", "DictObjToObj: " + type.Name,
                e.Message + e.StackTrace);
        }
    }
    return o;
}
In English:
Given a source dictionary and a target type, and assuming that the keys of the dictionary match properties of the object, create an object of the type and map the dictionary's key/value pairs to the object's properties.
There's also a converse, ObjToDictObj, which reverses the transformation. It's used here to store a TwitterDirectMessage into an Azure table:
private static TableStorageResponse StoreDirectMessageToAzure(TwitterDirectMessage message)
{
    var dict = ObjectUtils.ObjToDictObj(message);
    return TableStorage.UpdateDictToTableStore
}
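ObjToDictObj itself isn't shown. A minimal sketch of what such a converse might look like, again using reflection, follows. The class names ObjToDictSketch and SampleMessage are hypothetical, and the real method may well differ:

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical sketch of the converse conversion, not the project's code.
public static class ObjToDictSketch
{
    public static Dictionary<string, object> ObjToDictObj(object o)
    {
        var dict = new Dictionary<string, object>();
        foreach (PropertyInfo property in o.GetType().GetProperties())
        {
            // skip indexers, which need arguments to be read
            if (property.GetIndexParameters().Length > 0) continue;
            // key is the property name, value is its current value
            dict[property.Name] = property.GetValue(o, null);
        }
        return dict;
    }
}

// Hypothetical stand-in for TwitterDirectMessage
public class SampleMessage
{
    public string id { get; set; }
    public string text { get; set; }
}
```

Passing new SampleMessage { id = "123", text = "start" } yields a two-entry dictionary keyed by id and text, which is the bag-of-properties shape that a schemaless table store can accept directly.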
There are many ways to think about the relationship between objects in programming languages and objects in databases. I like thinking in terms of bags of properties. From the C# perspective, as we've seen, these can be defined more or less strictly, as needed. The same holds true for Azure table entities. It's true that they lack an explicit schema and are thus inherently more flexible. But you can nail down their attributes more or less strictly, depending on how strongly typed you want them to be in the database. In both the programming language and the database, you can move flexibly along a continuum of ways to represent bags of properties.
Identifying new Twitter messages
Given the ability to fetch a list of TwitterDirectMessage from Twitter, and a corresponding list from Azure, all that's left is to compare the lists in order to find items from the former that are missing from the latter. These are the new messages to be handled and then deleted.
Here's a side-by-side comparison of similar idioms for finding the difference between two lists. On the left is how I'd do it in Python. On the right is how I'm actually doing it in C#.
| Python | C# |
|---|---|
| fetched_messages = GetDirectMessagesFromTwitter() | var fetched_messages = GetDirectMessagesFromTwitter(); |
| stored_messages = GetDirectMessagesFromAzure() | var stored_messages = GetDirectMessagesFromAzure(); |
| diff = set(fetched_messages) - set(stored_messages) | var diff = fetched_messages.Except(stored_messages); |
| return list(diff) | return diff.ToList(); |
I'm not sure which I prefer. Python's set arithmetic is mathematically pure. But C#'s noun-verb syntax is appealing too. For me it's a toss-up, which shows that C# need not be a lot more verbose than Python. That said, the elmcity project uses Python too -- specifically IronPython. In future installments I'll discuss the pros and cons of using C# and IronPython in complementary ways on Azure.
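One detail about the C# idiom is worth spelling out. Except compares items with the default equality comparer unless it's given another one, and for a class that doesn't override Equals and GetHashCode that means reference equality. Objects freshly built from Twitter and from Azure would then never match. The excerpts above don't show how TwitterDirectMessage handles this, so the following is only a sketch, with hypothetical names (Message, MessageIdComparer, DiffSketch), that assumes the id alone identifies a message:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for TwitterDirectMessage
public class Message
{
    public string id { get; set; }
    public string text { get; set; }
}

// Treats two messages as equal when their ids match (an assumption).
public class MessageIdComparer : IEqualityComparer<Message>
{
    public bool Equals(Message a, Message b)
    {
        if (a == null || b == null) return a == b;
        return a.id == b.id;
    }

    public int GetHashCode(Message message)
    {
        return message.id.GetHashCode();
    }
}

public static class DiffSketch
{
    // Messages present in fetched but absent from stored, compared by id
    public static List<Message> NewMessages(List<Message> fetched, List<Message> stored)
    {
        return fetched.Except(stored, new MessageIdComparer()).ToList();
    }
}
```

The same consideration applies on the Python side, where set difference relies on the objects' __eq__ and __hash__ methods.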
