For a semantic annotation task I am trying to calculate the semantic similarity between two sets of annotations, S1 and S2. Both sets consist of multiple nodes from a single graph (in my case an ontology). The sets do not necessarily contain the same number of nodes.
I measure the similarity between two nodes using a path-based similarity metric (Leacock & Chodorow's). By measuring the similarity between every pair of nodes, one from S1 and one from S2, I can generate a semantic similarity graph; see the example here: http://graus.nu/thesis/measure-and-visualize-semantic-similarity-between-subgraphs/ (in this case, the red and blue nodes are the nodes from S1 and S2, and the edges represent the similarity between nodes).
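To make the setup concrete, here is a minimal sketch of how the pairwise similarities could be computed. It assumes the commonly stated form of the Leacock & Chodorow measure, sim(a, b) = -log(len / (2 * D)), where len is the number of nodes on the shortest path and D is the maximum depth of the taxonomy. The toy graph, the function names and the value of `max_depth` are all made up for illustration and are not my real ontology.

```python
import math
from collections import deque

# Hypothetical toy taxonomy as an undirected adjacency list (illustration only).
TOY_GRAPH = {
    "root": ["a", "b"],
    "a": ["root", "c", "d"],
    "b": ["root", "e"],
    "c": ["a"],
    "d": ["a"],
    "e": ["b"],
}


def shortest_path_edges(graph, source, target):
    """Breadth-first search: number of edges on the shortest path, or None."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def lch_similarity(graph, a, b, max_depth):
    """Leacock & Chodorow style similarity: -log(path_nodes / (2 * max_depth)).

    Assumption: path length is counted in nodes (edges + 1), so that a node
    compared with itself gives a finite value. Higher means more similar.
    """
    edges = shortest_path_edges(graph, a, b)
    if edges is None:
        raise ValueError(f"no path between {a!r} and {b!r}")
    return -math.log((edges + 1) / (2.0 * max_depth))


def similarity_matrix(graph, s1, s2, max_depth):
    """Rows follow s1, columns follow s2; the two sets may differ in size."""
    return [[lch_similarity(graph, a, b, max_depth) for b in s2] for a in s1]
```

With `S1 = ["c", "d"]` and `S2 = ["e", "b", "root"]`, `similarity_matrix(TOY_GRAPH, S1, S2, max_depth=3)` gives a 2 x 3 matrix, which is the same information as the edges in the similarity graph linked above.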
Now I want to know how to calculate the average (mean) and standard deviation of the similarity between the two subgraphs as a whole. How would I go about that? I thought about taking, for each node in S1, the shortest path to a node in S2, and using these paths to get my avg/std deviation. However, with differently sized sets I miss information in some cases (for example, when S1 has 3 nodes and S2 has 8 nodes, I'd get only 3 paths). Couldn't I just use all possible edges between all nodes from S1 and all nodes from S2 to get these numbers?
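The two options I am weighing can be sketched as follows. This is only an illustration of the idea, not a claim that either is the right measure. It assumes the similarities are already in a matrix with one row per S1 node and one column per S2 node, and that higher values mean more similar. The numbers in `EXAMPLE` are made up, and the symmetric best-match variant (best match per row plus best match per column) is my own guess at how to avoid losing the nodes of the larger set.

```python
import statistics


def all_pairs_stats(matrix):
    """Mean and population std deviation over every S1 x S2 pair."""
    values = [value for row in matrix for value in row]
    return statistics.mean(values), statistics.pstdev(values)


def best_match_stats(matrix):
    """Mean and population std deviation over each node's best match.

    Takes the best score per row (each S1 node) and per column (each S2
    node), so no node of the larger set is dropped.
    """
    row_best = [max(row) for row in matrix]
    col_best = [max(col) for col in zip(*matrix)]
    values = row_best + col_best
    return statistics.mean(values), statistics.pstdev(values)


# Made-up similarity scores: 2 nodes in S1 (rows), 3 nodes in S2 (columns).
EXAMPLE = [
    [1.0, 2.0, 3.0],
    [2.0, 2.0, 1.0],
]

if __name__ == "__main__":
    print("all pairs :", all_pairs_stats(EXAMPLE))
    print("best match:", best_match_stats(EXAMPLE))
```

The all-pairs version uses |S1| x |S2| values regardless of set sizes, while the best-match version uses |S1| + |S2| values.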
My research so far has led me to distance matrices and graph bijection, but I'm a bit lost here. Any help pointing me in the right direction would be greatly appreciated!