Locally hosted output (using JS):
- "Ace Blogs Feed" and "Ace Political Blogs"
- "Ace Wordpress Blogs" and blogs by Automattic staff
- "Wordpress blogs" using locally hosted OPML!
- Pipe of my LiveJournal Friends' posts
Many "ace blogs" into one feed
Problematic:
Various feeds have vastly different publication rates. Some (e.g. reddit and Digg) may emit a huge number of posts in a 12-hour period while others (e.g. "Portals and KM") might not post anything on a particular day.
I want to capture the most recent posts from each feed.
So? A full feed breaks FeedBurner's size limit (I take that as diagnostic), so I have no choice but to throttle the quantity.
I'm aiming for a max of 4 from any one feed (with 15 feeds that's more than plenty!) while capturing at least 1 from the most tardy.
Result: Sorting sub-streams and truncating as required.
NB: The screenshots do not show the full set of feeds (that would be very cluttered and complicated), only enough to show the pipe's logic.
Nota Bene: I have made use of the Sort module with good results, but not with full control.
I assumed that sorting by pubDate would fail because many items do not contain that parameter. And yet it seems to work pretty well.
Sorting by item.dc:date was the obvious choice, since many if not most feeds use that parameter. But nope: for some reason it ordered the emitted items properly but left some out. (What part of "Sort by item.dc:date" means "Hop, skip, and jump randomly"?!)
I've found no reason for this bizarre behaviour.
But another odd behaviour sorta came to the rescue, in an overly assertive sorta way: all posts have had item.pubDate appended, and very neatly ... go figure.
http://feeds.feedburner.com/yahoo/AceBlogs produces properly ordered output, complete with item.pubDate, as does http://pipes.yahoo.com/pipes/pipe.run?_id=da3d00ef5aebc4a8f76d6b09bb2c4b23&_render=rss ...
Just where the pipe generates item.pubDate is beyond me.
Starting was easy: I grabbed a set of URLs from my favorite blogs, set up a FetchFeed module for each, ran them through a couple of Union modules to the Output module, and there we were!
The first sign of trouble came when I passed that feed through FeedBurner. FB objected immediately that my feed was over 500KB and output nothing at all.
No major difficulty: I added a Truncate module and picked a number I figured would ensure that the feed wouldn't exceed 500KB on most days, at least.
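The Truncate count above was a judgment call. As a hypothetical alternative (not what the pipe does, and not a Pipes feature), a script could find the largest item count that still fits under the limit. This sketch assumes the 500KB limit means 500 × 1024 bytes and uses the JSON size of the items as a rough stand-in for the size of the rendered RSS, which would be somewhat larger.

```python
import json

# Assumption: FeedBurner's 500KB limit read as 500 * 1024 bytes.
FEEDBURNER_LIMIT = 500 * 1024


def serialized_size(items):
    """Approximate feed size as the UTF-8 length of a JSON dump of the items.

    Real RSS markup adds tags and channel data, so this underestimates.
    """
    return len(json.dumps(items).encode("utf-8"))


def max_items_under_limit(items, limit=FEEDBURNER_LIMIT):
    """Return the largest n such that the first n items fit within `limit` bytes."""
    n = 0
    while n < len(items) and serialized_size(items[: n + 1]) <= limit:
        n += 1
    return n


# Usage with ten hypothetical posts of 100 characters each.
posts = [{"title": "x" * 100}] * 10
print(max_items_under_limit(posts, limit=300))  # 2
```

A byte budget adapts to long and short posts, whereas a fixed count only holds "on most days".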
Now the underlying problem showed its true face: there was output from FeedBurner, but it consisted of huge slabs from some blogs and none at all from others.
Looking closely at the output I could see that Pipes was outputting feeds in the order of the FetchFeed modules: all the posts from the first, then all the posts from the second, and so forth until it hit the Truncate limit.
A simple fix was at hand: the Sort module allowed "Descending by pub date". Putting that in just before the Truncate gave me close to what I wanted, a stream from all my "ace blogs" with each post appearing in date order, regardless of the order of the feeds in the modules. A peek at the stream confirmed it.
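The difference between the two orderings can be shown with a small simulation. This is a minimal sketch, assuming (as observed above) that Union simply concatenates its inputs in module order; the feed names, dates, and helper functions are hypothetical stand-ins for the Pipes modules, not their actual implementation.

```python
from datetime import datetime

# Hypothetical posts as (feed_name, published) pairs; each feed lists newest first.
feed_a = [("A", datetime(2008, 1, 3, hour)) for hour in (12, 10, 8, 6, 4)]
feed_b = [("B", datetime(2008, 1, 3, 11)), ("B", datetime(2008, 1, 2, 9))]


def union(*feeds):
    """Concatenate feeds in module order, as the Union module appeared to do."""
    merged = []
    for feed in feeds:
        merged.extend(feed)
    return merged


def truncate(items, n):
    """Keep the first n items, like the Truncate module."""
    return items[:n]


def sort_desc_by_date(items):
    """Newest first, like Sort "Descending by pub date"."""
    return sorted(items, key=lambda post: post[1], reverse=True)


# Truncating the raw union keeps only the first feed's posts.
print([name for name, _ in truncate(union(feed_a, feed_b), 3)])  # ['A', 'A', 'A']
# Sorting before truncating interleaves the feeds by date.
print([name for name, _ in truncate(sort_desc_by_date(union(feed_a, feed_b)), 3)])  # ['A', 'B', 'A']
```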
So I had close to maximum size with a mixed stream. And yet ... two problems: first, the Truncate limit kept the stream under FeedBurner's maximum, but that turned out to be far, far more than I wanted for a daily read; second, even with that large an amount, some blogs weren't appearing at all.
As I wrote in the pipe description:
Problematic: Various feeds have vastly different publication rates. Some (e.g. reddit and Digg) may emit a huge number in a 12 hour period while others (Udell and Palast) might not have any on a particular day.
I'm aiming at a max of 4 from any (with 15 feeds that's more than plenty!) while capturing at least 1 from the most tardy.
What I've discovered:
- http://www.lrb.co.uk/homerss.xml includes only channel.pubDate, a single date for the entire collection of posts
- http://weblog.infoworld.com/udell/rss.xml includes item.pubDate
- http://reddit.com/r/politics/.rss includes item.dc:date
And to top it all off? It seems that in all cases Yahoo! Pipes transforms that data into item.pubDate ... across the board ... and since I cannot locate where this happens, I cannot modify or moderate it. *shrug* The output seems OK, if slightly sub-optimal.
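Since the place where Pipes does this could not be found, here is a guess at what such a normalization might look like, written outside Pipes. It is a sketch under stated assumptions, not Pipes' actual rule: it tries the item's RFC 822 `pubDate`, then the ISO 8601 `dc:date`, then a hypothetical `channel_pubDate` field copied down from the channel (as the LRB feed would need), treats zoneless dates as UTC, and sorts undated items last instead of dropping them.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

OLDEST = datetime.min.replace(tzinfo=timezone.utc)


def _as_utc(dt):
    # Assumption: a date without zone information is treated as UTC.
    if dt.tzinfo is None:
        return dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


def normalize_pubdate(item):
    """Return an aware UTC datetime for an item dict, or None if it has no date.

    Lookup order is an assumption: item pubDate, item dc:date, then the
    hypothetical channel_pubDate field.
    """
    if item.get("pubDate"):
        return _as_utc(parsedate_to_datetime(item["pubDate"]))
    if item.get("dc:date"):
        # fromisoformat() before Python 3.11 rejects a trailing "Z", so rewrite it.
        return _as_utc(datetime.fromisoformat(item["dc:date"].replace("Z", "+00:00")))
    if item.get("channel_pubDate"):
        return _as_utc(parsedate_to_datetime(item["channel_pubDate"]))
    return None


def sort_newest_first(items):
    """Sort by the normalized date, keeping undated items at the end."""
    return sorted(items, key=lambda it: normalize_pubdate(it) or OLDEST, reverse=True)


items = [
    {"title": "udell", "pubDate": "Thu, 03 Jan 2008 10:00:00 GMT"},
    {"title": "reddit", "dc:date": "2008-01-03T12:00:00Z"},
    {"title": "lrb", "channel_pubDate": "Wed, 02 Jan 2008 08:00:00 +0000"},
    {"title": "undated"},
]
print([it["title"] for it in sort_newest_first(items)])
# ['reddit', 'udell', 'lrb', 'undated']
```

Keeping undated items rather than discarding them avoids the "hop, skip, and jump" effect seen when sorting on a field that some items lack.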
So my 2 Pipes are working.
Now I'm going to add a refinement using Sort and Truncate:
for 3 similar feeds, instead of taking a total of 9, I'm going to truncate each to 4, then sort by pubDate, and then select the 10 newest. So, worst case, the tardiest feed will get only 2 in while the others get 4 each.
"12 into 10"
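The "12 into 10" arrangement can be checked with a short simulation. A minimal sketch, assuming three hypothetical feeds: two busy ones posting every hour and a tardy one whose posts are days old. Capping each feed before the merge (a Truncate per FetchFeed, then Sort and a final Truncate) guarantees the tardy feed a place, while a plain sort-and-truncate of the whole stream leaves it out, which is the problem described earlier. The worst-case share for k feeds with cap c and n output items is n − (k − 1) × c, here 10 − 2 × 4 = 2.

```python
from datetime import datetime, timedelta

NOW = datetime(2008, 1, 3, 22)  # hypothetical "current" time

# Hypothetical (feed_name, published) posts.
busy_a = [("A", NOW - timedelta(hours=h)) for h in range(20)]
busy_b = [("B", NOW - timedelta(hours=h, minutes=30)) for h in range(20)]
tardy_c = [("C", NOW - timedelta(days=3, hours=h)) for h in range(6)]


def newest(items, n):
    """Sort newest first and keep n items (Sort, then Truncate)."""
    return sorted(items, key=lambda post: post[1], reverse=True)[:n]


def cap_per_feed(feeds, cap):
    """Keep only the newest `cap` posts of each feed, then merge them."""
    merged = []
    for feed in feeds:
        merged.extend(newest(feed, cap))
    return merged


def worst_case_share(num_feeds, cap, n):
    """Posts guaranteed to a feed with at least `cap` posts when every
    other feed fills its cap with newer posts."""
    return max(0, min(cap, n - (num_feeds - 1) * cap))


feeds = [busy_a, busy_b, tardy_c]
capped = [name for name, _ in newest(cap_per_feed(feeds, cap=4), n=10)]
uncapped = [name for name, _ in newest(busy_a + busy_b + tardy_c, n=10)]
print(capped.count("C"), uncapped.count("C"))  # 2 0
print(worst_case_share(num_feeds=3, cap=4, n=10))  # 2
```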
Sources (11)
- blog.JonUdell.net
- GregPalast via feedburner.com
- OpenDemocracy.org
- GlobalVoicesOnline.org
- MereRhetoric.com via feedburner.com
- London Review of Books - lrb.co.uk/
- "Portals and KM" via feedburner.com
- InfoQ.com
- "software" and "programming" from digg.com
- "politics" and "software" from reddit.com
- Weblog.InfoWorld.com
- InternetEvolution.com
created 22:16Z 03JAN08