granary¶
About¶
Granary is a library and REST API that fetches and converts between a wide variety of data sources and formats:
- Facebook, Flickr, Google+, Instagram, and Twitter native APIs
- Instagram and Google+ scraped HTML
- ActivityStreams 1.0 and 2.0
- microformats2 HTML and JSON
- Atom
- XML
- JSON Feed
Free yourself from silo API chaff and expose the sweet social data foodstuff inside in standard formats and protocols!
Here’s how to get started:
- Granary is available on PyPi. Install with pip install granary.
- Click here for getting started docs.
- Click here for reference docs.
- The REST API and demo app are deployed at granary.io.
License: This project is placed in the public domain.
Using¶
All dependencies are handled by pip and enumerated in requirements.txt. We recommend that you install with pip in a virtualenv. (App Engine details.)
The library and REST API are both based on the OpenSocial Activity Streams service.
Let’s start with an example. This code using the library:
from granary import twitter
...
tw = twitter.Twitter(ACCESS_TOKEN_KEY, ACCESS_TOKEN_SECRET)
tw.get_activities(group_id='@friends')
is equivalent to this HTTP GET request:
https://granary.io/twitter/@me/@friends/@app/
?access_token_key=ACCESS_TOKEN_KEY&access_token_secret=ACCESS_TOKEN_SECRET
They return the authenticated user’s Twitter stream, ie tweets from the people they follow. Here’s the JSON output:
{
"itemsPerPage": 10,
"startIndex": 0,
"totalResults": 12,
"items": [{
"verb": "post",
"id": "tag:twitter.com,2013:374272979578150912",
"url": "http://twitter.com/evanpro/status/374272979578150912",
"content": "Getting stuff for barbecue tomorrow. No ribs left! Got some nice tenderloin though. (@ Metro Plus Famille Lemay) http://t.co/b2PLgiLJwP",
"actor": {
"username": "evanpro",
"displayName": "Evan Prodromou",
"description": "Prospector.",
"url": "http://twitter.com/evanpro",
},
"object": {
"tags": [{
"url": "http://4sq.com/1cw5vf6",
"startIndex": 113,
"length": 22,
"objectType": "article"
}, "..."],
},
}, "..."]
"..."
}
The request parameters are the same for both, all optional: USER_ID is a source-specific id or @me for the authenticated user. GROUP_ID may be @all, @friends (currently identical to @all), @self, @search, or @blocks; APP_ID is currently ignored; best practice is to use @app as a placeholder.
Paging is supported via the startIndex and count parameters. They’re self explanatory, and described in detail in the OpenSearch spec and OpenSocial spec.
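As a rough illustration of paging, the sketch below steps through a result set by advancing startIndex by count on each request. The helper name page_params is made up for this example, and it assumes the total comes from the totalResults field of an earlier response; neither is part of granary itself.

```python
def page_params(total_results, count=10):
    """Yield one dict of paging query parameters per page.

    Hypothetical helper: total_results is assumed to come from the
    totalResults field of an earlier response.
    """
    for start_index in range(0, total_results, count):
        yield {'startIndex': start_index, 'count': count}


# Twelve results at ten per page need two requests.
pages = list(page_params(total_results=12, count=10))
```

Each yielded dict can be merged into the query string of the next request.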
When using the GROUP_ID @search (for platforms that support it, currently Twitter and Instagram), provide a search string via the q parameter. The API is loosely based on the OpenSearch spec, the OpenSocial Core Container spec, and the OpenSocial Core Gadget spec.
Output data is JSON Activity Streams 1.0 objects wrapped in the OpenSocial envelope, which puts the activities in the top-level items field as a list and adds the itemsPerPage, totalResults, etc. fields.
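To make the envelope concrete, here is a minimal sketch that decodes a response and pulls a few fields out of each activity. The body is a trimmed copy of the example output above; the summarize helper is hypothetical and only touches fields that appear in that example.

```python
import json

# Trimmed copy of the example output above.
body = '''
{
  "itemsPerPage": 1,
  "startIndex": 0,
  "totalResults": 12,
  "items": [{
    "verb": "post",
    "id": "tag:twitter.com,2013:374272979578150912",
    "actor": {"username": "evanpro", "displayName": "Evan Prodromou"},
    "object": {"tags": [{"url": "http://4sq.com/1cw5vf6", "objectType": "article"}]}
  }]
}
'''

envelope = json.loads(body)
activities = envelope.get('items', [])  # the activities live in the items list


def summarize(activity):
    """Return (author name, verb, tag URLs) for one activity dict."""
    actor = activity.get('actor', {})
    tags = activity.get('object', {}).get('tags', [])
    return (actor.get('displayName'), activity.get('verb'),
            [t['url'] for t in tags if 'url' in t])


summaries = [summarize(a) for a in activities]
```

Using .get() with defaults keeps the sketch tolerant of activities that omit optional fields.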
Most Facebook requests and all Twitter, Google+, Instagram, and Flickr requests will need OAuth access tokens. If you’re using Python on Google App Engine, oauth-dropins is an easy way to add OAuth client flows for these sites. Otherwise, here are the sites’ authentication docs: Facebook, Flickr, Google+, Instagram, Twitter.
If you get an access token and pass it along, it will be used to sign and authorize the underlying requests to the source providers. See the demos on the REST API endpoints above for examples.
Using the REST API¶
The endpoints above all serve the OpenSocial Activity Streams REST API. Request paths are of the form:
/USER_ID/GROUP_ID/APP_ID/ACTIVITY_ID?startIndex=...&count=...&format=FORMAT&access_token=...
All query parameters are optional. FORMAT may be json (the default), xml, or atom, both of which return Atom. atom supports a boolean reader query parameter for toggling rendering appropriate to feed readers, e.g. location is rendered in content when reader=true (the default). The rest of the path elements and query params are described above.
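The path form above can be assembled programmatically. The following is a sketch only: build_url is a hypothetical helper, the host and the leading site segment (twitter) are taken from the example request earlier in this document, and the defaults chosen for each path element are assumptions.

```python
from urllib.parse import quote, urlencode

BASE_URL = 'https://granary.io'  # host taken from the example request above


def build_url(site, user_id='@me', group_id='@all', app_id='@app',
              activity_id='', **params):
    """Build a request URL of the form /USER_ID/GROUP_ID/APP_ID/ACTIVITY_ID.

    Hypothetical helper. Keyword arguments whose value is None are dropped;
    the rest become query parameters.
    """
    path = '/'.join(quote(part, safe='@:,')
                    for part in (site, user_id, group_id, app_id, activity_id))
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return '%s/%s%s' % (BASE_URL, path, '?' + query if query else '')


url = build_url('twitter', group_id='@friends', format='atom', count=5)
```

Leaving activity_id empty produces the trailing slash seen in the example request.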
Errors are returned with the appropriate HTTP response code, e.g. 403 (Forbidden) for unauthorized requests, with details in the response body.
By default, responses are cached and reused for 5m without re-fetching the source data. (Instagram responses are cached for 60m.) You can prevent this by adding the cache=false query parameter to your request.
To use the REST API in an existing ActivityStreams client, you’ll need to hard-code exceptions for the domains you want to use, e.g. facebook.com, and redirect HTTP requests to the corresponding endpoint above.
The web UI (granary.io) currently only fetches Facebook access tokens for users. If you want to use it to access a Facebook page, you’ll need to get an access token manually with the Graph API Explorer (click on the Get To… drop-down). Then, log into Facebook on granary.io and paste the page access token into the access_token text box. (Google+ pages aren’t supported in their API.)
Using the library¶
See the example above for a quick start guide.
Clone or download this repo into a directory named granary. Each source works the same way. Import the module for the source you want to use, then instantiate its class by passing the HTTP handler object. The handler should have a request attribute for the current HTTP request.
The useful methods are get_activities() and get_actor(), which returns the current authenticated user (if any). See the individual method docstrings for details. All return values are Python dicts of decoded ActivityStreams 1 JSON.
The microformats2.*_to_html() functions are also useful for rendering ActivityStreams 1 objects as nicely formatted HTML.
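Because each source exposes the same methods, calling code can stay source-agnostic. The sketch below assumes only that a source object has get_actor() and get_activities(); FakeSource is a made-up stand-in so the example runs without credentials, and the assumption that get_activities() yields a list of activity dicts should be checked against the docstrings.

```python
class FakeSource:
    """Made-up stand-in for a granary source class, for illustration only."""

    def get_actor(self):
        return {'username': 'evanpro', 'displayName': 'Evan Prodromou'}

    def get_activities(self, **kwargs):
        # Assumption: a list of ActivityStreams 1 activity dicts.
        return [{'verb': 'post', 'content': 'Getting stuff for barbecue tomorrow.'}]


def describe(source):
    """Return one line of text per activity, prefixed with the user's name."""
    actor = source.get_actor() or {}
    name = actor.get('displayName') or actor.get('username') or 'unknown'
    return ['%s: %s' % (name, activity.get('content', ''))
            for activity in source.get_activities()]


lines = describe(FakeSource())
```

Any object with the same two methods, such as the twitter.Twitter instance from the earlier example, could be passed to describe() in place of FakeSource.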
Troubleshooting/FAQ¶
Check out the oauth-dropins Troubleshooting/FAQ section. It’s pretty comprehensive and applies to this project too. For searchability, here are a handful of error messages that have solutions there:
bash: ./bin/easy_install: ...bad interpreter: No such file or directory
ImportError: cannot import name certs
ImportError: cannot import name tweepy
File ".../site-packages/tweepy/auth.py", line 68, in _get_request_token
raise TweepError(e)
TweepError: must be _socket.socket, not socket
Future work¶
We’d love to add more sites! Off the top of my head, YouTube, Tumblr, WordPress.com, Sina Weibo, Qzone, and RenRen would be good candidates. If you’re looking to contribute, implementing a new site is a good place to start. It’s pretty self contained and the existing sites are good examples to follow, but it’s a decent amount of work, so you’ll be familiar with the whole project by the end.
Development¶
Pull requests are welcome! Feel free to ping me with any questions.
You’ll need the App Engine Python SDK version 1.9.15 or later (for vendor support). Add it to your $PYTHONPATH, e.g. export PYTHONPATH=$PYTHONPATH:/usr/local/google_appengine, and then run:
virtualenv local
source local/bin/activate
pip install -r requirements.txt
python setup.py test
If you send a pull request, please include (or update) a test for the new functionality if possible! The tests require the App Engine SDK or the Google Cloud SDK (aka gcloud) with the gcloud-appengine-python and gcloud-appengine-python-extras components.
If you want to work on oauth-dropins at the same time, install it in “source” mode with pip install -e <path to oauth-dropins repo>.
To deploy:
python -m unittest discover && gcloud -q app deploy granary-demo *.yaml
To deploy facebook-atom, twitter-atom, instagram-atom, and plusstreamfeed after a granary change:
#!/bin/tcsh
foreach s (facebook-atom twitter-atom instagram-atom plusstreamfeed)
cd ~/src/$s && gcloud -q app deploy $s *.yaml
end
The docs are built with Sphinx, including apidoc, autodoc, and napoleon. Configuration is in docs/conf.py. To build them, first install Sphinx with pip install sphinx. (You may want to do this outside your virtualenv; if so, you’ll need to reconfigure it to see system packages with virtualenv --system-site-packages local.) Then, run docs/build.sh.
This ActivityStreams validator is useful for manual testing.
Changelog¶
1.10 - unreleased¶
- Moved web site and REST API to granary.io! granary-demo.appspot.com now 301 redirects.
- Twitter:
- Update the publish character limit to 280. Background.
- Fix a bug in preview_create that auto-linked @-mentions inside URLs (https://github.com/snarfed/bridgy/issues/527#issuecomment-346302800), e.g. Medium posts.
- Support videos and animated GIFs in get_activities() etc.
- Instagram:
- Add cookie query param to REST API to allow scraping that logged in user’s feed.
- HTML (including Atom content):
- Render image, video, and audio attachments more often and consistently.
- Include microformats2 u-photo, u-video, and u-audio classes more often and consistently.
- Atom:
- Add atom_to_activities() for converting full feed documents.
- Add to REST API and web UI.
- JSON Feed:
- Fix bug that omitted title in some cases (#122).
1.9 - 2017-10-24¶
- Add ActivityStreams 2.0! New as2 module includes to_as1() and from_as1() functions. Currently supported: articles, notes, replies, likes, reposts, events, RSVPs, tags, attachments.
- Atom:
- Add new atom_to_activity() function for converting Atom to AS1.
- Add email field to author, if provided.
- JSON Feed:
- Raise ValueError on bad (non-dict) input.
- REST API:
- Add as2 value for format and input. Revise existing ActivityStreams and microformats2 value names to as1, as1-xml, and mf2-json. Old values activitystreams, json, json-mf2, and xml are still accepted, but deprecated.
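One way a client might cope with the deprecated format values is a small lookup table. The pairing of each old name with a new one below is an assumption inferred from the names (the entry above lists both sets without pairing them), and normalize_format is a hypothetical helper, not part of granary.

```python
# Assumed pairing of deprecated values with their replacements.
DEPRECATED_FORMATS = {
    'activitystreams': 'as1',
    'json': 'as1',
    'json-mf2': 'mf2-json',
    'xml': 'as1-xml',
}


def normalize_format(value):
    """Map a deprecated format value to its newer name; pass others through."""
    return DEPRECATED_FORMATS.get(value, value)
```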
1.8 - 2017-08-29¶
- Add JSON Feed support to both library and REST API.
- Twitter:
- Add get_blocklist().
- Bug fix for creating replies, favorites, or retweets of video URLs, e.g. https://twitter.com/name/status/123/video/1 .
- Bug fix for parsing favorites HTML to handle a small change on Twitter’s side.
- post_id() now validates ids more strictly before returning them.
- Facebook:
- Instagram:
- Update scraping to handle new home page (ie news feed) JSON schema, which changed sometime around 2017-02-27. (Profile pages and individual photo/video permalinks still haven’t changed yet.)
- microformats2:
- Add u-featured to ActivityStreams image.
- Improve h-event support.
- Minor whitespace change (added ) when rendering locations as HTML.
- post_id() now validates ids more strictly before returning them.
- Fix bugs in converting latitude and longitude between ActivityStreams and mf2.
- Google+:
- Update HTML scraping to handle changed serialized JSON data format.
- Atom:
- Add new activity_to_atom() function that renders a single top-level <entry> instead of <feed>.
- Add new reader query param for toggling rendering decisions that are specific to feed readers. Right now, just affects location: it’s rendered in the content when reader=true (the default), omitted when reader=false.
- Include author name when rendering attached articles and notes (e.g. quote tweets).
- Only include AS activity:object-type and activity:verb elements when they have values.
- Render AS image and mf2 u-photo if they’re not already in content.
- Render thr:in-reply-to from object.inReplyTo as well as activity.context.inReplyTo.
- REST API:
- Fix bugs in html => json-mf2 and html => html conversions.
1.7 - 2017-02-27¶
- microformats2:
- Interpret h-cite and u-quotation-of (experimental) (https://indieweb.org/quotation#How_to_markup) as attachments, e.g. for quote tweets.
- Convert audio and video properties to AS attachments.
- Twitter:
- Linkify @-mentions and hashtags in preview_create().
- Support creating quote tweets from attachments with Twitter URLs.
- When converting quote tweets to AS, strip quoted tweet URL from end of text.
- Raise ValueError when get_activities() is passed group_id='@search' but not search_query.
- Instagram:
- Improve HTML scraping error handling.
- Support multi-photo/video posts.
- Facebook:
- Disable creating “interested” RSVPs, since Facebook’s API doesn’t allow it.
- Atom:
- Support media enclosures for audio and video attachments.
- Source.get_activities(): start raising ValueError on bad argument values, notably invalid Facebook and Twitter ids and Instagram search queries.
- Fix rendering and linkifying content with Unicode high code points (ie above the 16-bit Basic Multilingual Plane), including some emoji, on “narrow” builds of Python 2 with --enable-unicode=ucs2, which is the default on Mac OS X, Windows, and older *nix.
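For background on why narrow builds are affected: a code point above the Basic Multilingual Plane does not fit in one 16-bit unit, so UTF-16 stores it as a surrogate pair, and a narrow Python 2 build counts it as two characters. The sketch below shows that arithmetic in Python 3. It illustrates the general encoding issue only and is not granary’s fix; utf16_units is a hypothetical helper.

```python
def utf16_units(text):
    """Return the number of 16-bit code units needed to store text in UTF-16."""
    return len(text.encode('utf-16-le')) // 2


emoji = '\U0001F600'   # a code point above U+FFFF
bmp_char = '\u00e9'    # a code point inside the Basic Multilingual Plane

emoji_units = utf16_units(emoji)
bmp_units = utf16_units(bmp_char)
```

Character offsets such as the startIndex and length of a tag shift by one for every such code point that precedes them when counted in 16-bit units, which is how linkified ranges can end up misaligned.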
1.6 - 2016-11-26¶
- Twitter:
- Handle new “extended” tweets with hidden reply-to @-mentions and trailing URLs for media, quote tweets, etc. Background: https://dev.twitter.com/overview/api/upcoming-changes-to-tweets
- Bug fix: ensure like.author.displayName is a plain unicode string so that it can be pickled normally, e.g. by App Engine’s memcache.
- Bug fix: handle names with emoji correctly in favorites_html_to_likes().
- Bug fix: handle search queries with unicode characters.
- Atom:
- Render full original quoted tweet in retweets of quote tweets.
- microformats2 HTML:
- Optionally follow and fetch rel=“author” links.
- Improve mapping between microformats2 and ActivityStreams ‘photo’ types. (mf2 ‘photo’ type is a note or article with a photo, but AS ‘photo’ type is a photo. So, map mf2 photos to underlying type without photo.)
- Support location properties beyond h-card, e.g. h-adr, h-geo, u-geo, and even when properties like latitude and longitude appear at the top level.
- Error handling: return HTTP 502 for non-JSON API responses, 504 for connection failures.
1.5 - 2016-08-25¶
- REST API:
- Support tag URI for user id, app id, and activity id.
- Twitter:
- Better error message when uploading a photo with an unsupported type.
- Only include original quote tweets, not retweets of them.
- Skip fetching retweets for protected accounts since the API call always 403s.
- Flickr:
- Better username detection. Flickr’s API is very inconsistent about username vs real name vs path alias. This specifically detects when a user name is probably actually a real name because it has a space.
- Uploading: detect and handle App Engine’s 10MB HTTP request limit.
- Bug fix in create: handle unicode characters in photo/video description, hashtags, and comment text.
- Atom:
- Bug fix: escape &s in attachments’ text (e.g. quote tweets).
1.4.1 - 2016-06-27¶
- Bump oauth-dropins requirement to 1.4.
1.4.0 - 2016-06-27¶
- REST API:
- Cache silo requests for 5m by default, 60m for Instagram because they aggressively block scraping. You can skip the cache with the new cache=false query param.
- Facebook:
- Upgrade from API v2.2 to v2.6. https://developers.facebook.com/docs/apps/changelog
- Add reaction support.
- De-dupe event RSVPs by user.
- Twitter:
- Switch create() to use brevity for counting characters. https://github.com/kylewm/brevity
- Fix bug in create() that occasionally incorrectly escaped ., +, and - characters.
- Fix text rendering bug when there are multiple photos/videos.
- When replying to yourself, don’t add a self @-mention.
- Instagram:
- Fix bugs in scraping.
- Upgrade to requests 2.10.0 and requests-toolbelt 0.60, which support App Engine.
1.3.1 - 2016-04-07¶
- Update oauth-dropins dependency to >=1.3.
1.3.0 - 2016-04-06¶
- Support posting videos! Currently in Facebook, Flickr, and Twitter.
- Instagram:
- Add support for scraping, since they’re locking down their API and requiring manual approval.
- Linkify @-mentions in photo captions.
- Facebook:
- Fetch Open Graph stories aka news.publish actions.
- Many bug fixes for photo posts: better privacy detection, fix bug that attached comments to wrong posts.
- Twitter:
- Handle all photos/videos attached to a tweet, not just the first.
- Stop fetching replies to @-mentions.
- Atom:
- Render attachments.
- Add xml:base.
- microformats2:
- Load and convert h-card.
- Implement full post type discovery algorithm, using mf2util. https://indiewebcamp.com/post-type-discovery
- Drop support for h-as-* classes, both incoming and outgoing. They’re deprecated in favor of post type discovery.
- Drop old deprecated u-like and u-repost properties.
- Misc bug fixes.
- Set up Coveralls.
1.2.0 - 2016-01-11¶
- Improve original post discovery algorithm. (bridgy #51)
- Flickr tweaks. (bridgy #466)
- Add mf2, activitystreams, atom, and search to interactive UI. (#31, #29)
- Improved post type discovery (using mf2util).
- Extract user web site links from all fields in profile (e.g. description/bio).
- Add fabricated fragments to comment/like permalinks (e.g. #liked-by-user123) so that object urls are always unique (multiple silos).
- Improve formatting/whitespace support in create/preview (multiple silos).
- Google+:
- Add search.
- Facebook:
- Fetch more things in get_activities: photos, events, RSVPs.
- Support person tags in create/preview.
- Prevent Facebook from automatically consolidating photo posts by uploading photos to “Timeline Photos” album.
- Include title in create/preview.
- Improve object id parsing/resolving.
- Improve tag handling.
- Bug fix for fetching nested comments.
- Misc improvements, API error/flakiness handling.
- Flickr:
- Create/preview support for photos, comments, favorites, tags, person tags, location.
- Twitter:
- Create/preview support for location, multiple photos.
- Fetch quote tweets.
- Fetching user mentions improvements, bug fixes.
- Fix embeds.
- Misc AS conversion improvements.
- microformats2:
- Improve like and repost rendering.
- Misc bug fixes.
- Set up CircleCI.
1.1.0 - 2015-09-06¶
- Add Flickr.
- Facebook:
- Fetch multiple id formats, e.g. with and without USERID_ prefix.
- Support threaded comments.
- Switch from /posts API endpoint to /feed.
- Google+:
- Support converting plus.google.com HTML to ActivityStreams.
- Instagram:
- Support location.
- Improve original post discovery algorithm.
- New logo.
1.0.1 - 2015-07-11¶
- Bug fix for atom template rendering.
- Facebook, Instagram: support access_token parameter.
1.0 - 2015-07-10¶
- Initial PyPi release.