Twitter announced a new version of the Twitter text processing library that, in its words, “we’re using for auto linking and extraction of usernames, lists & hashtags. This change will now extract URLs that have no specified protocols.”
Concretely, it will add http:// to the beginning of a URL that has no protocol if one of the following holds:
- The host name ends with a gTLD (e.g. twitter.com)
- The host name has two sub-domains followed by a ccTLD (e.g. yahoo.co.jp, google.co.uk)
- The host name consists of one sub-domain and a ccTLD, and is followed by / (e.g. t.co/, bit.ly/)
Here is a very simplified version of the regex, based on the one in twitter-text-java:
(?: SUBDOMAIN+ DOMAIN ccTLD) | (?: SUBDOMAIN* DOMAIN gTLD) | (?: DOMAIN ccTLD (?=/) )
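To make the three alternatives concrete, here is a minimal Python sketch of the simplified regex above. It is not the twitter-text implementation: the TLD lists are short, illustrative assumptions, the helper name add_protocol is hypothetical, and the path handling is deliberately naive (it takes everything up to the next whitespace).

```python
import re

# Illustrative, deliberately incomplete TLD lists (assumptions for this sketch).
GTLDS = ["com", "org", "net", "edu", "gov"]
CCTLDS = ["jp", "uk", "co", "ly", "de"]

# One host name label, e.g. "twitter" or "co".
LABEL = r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"
SUBDOMAIN = r"(?:" + LABEL + r"\.)"
DOMAIN = r"(?:" + LABEL + r"\.)"
# A TLD must not be followed by more word characters ("twitter.community").
GTLD = r"(?:" + "|".join(GTLDS) + r")(?![\w-])"
CCTLD = r"(?:" + "|".join(CCTLDS) + r")(?![\w-])"

URL_WITHOUT_PROTOCOL = re.compile(
    r"(?<![\w@./:-])"                            # not inside a word, email, or existing URL
    r"(?:"
    + SUBDOMAIN + r"+" + DOMAIN + CCTLD          # yahoo.co.jp, google.co.uk
    + r"|" + SUBDOMAIN + r"*" + DOMAIN + GTLD    # twitter.com
    + r"|" + DOMAIN + CCTLD + r"(?=/)"           # t.co/, bit.ly/
    + r")"
    r"(?:/\S*)?",                                # naive optional path
    re.IGNORECASE,
)


def add_protocol(text: str) -> str:
    """Prepend http:// to every protocol-less URL the sketch recognizes."""
    return URL_WITHOUT_PROTOCOL.sub(lambda m: "http://" + m.group(0), text)


print(add_protocol("Check twitter.com and t.co/abc but not bit.ly"))
# Check http://twitter.com and http://t.co/abc but not bit.ly
```

Note how bit.ly without a trailing slash is left alone: a bare domain plus ccTLD only matches the third alternative, which requires the / lookahead.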
Twitter said, “All URLs regardless of length will be wrapped by t.co on October 10, 2011. On that date, we’ll also begin wrapping URLs without specified protocols. To help prepare you for this near eventuality, we’re considering adding this new linking strategy to the two opt-in developer features we introduced a month ago.”
That way, you can simulate what linking of URLs without protocols and t.co URL wrapping will look like on October 10.
- Per-tweet basis: Passing the wrap_links=true parameter to POST statuses/update and POST direct_messages/new.
- Application basis: Visiting your application settings and configuring this option globally for your application.
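As a rough illustration of the per-tweet option, the sketch below only builds the form-encoded body for a POST statuses/update call with wrap_links=true. It sends nothing over the network and leaves out authentication entirely. The helper name build_status_update_body is hypothetical, and the status field name is an assumption about the endpoint rather than something stated in this announcement.

```python
from urllib.parse import urlencode


def build_status_update_body(status: str, wrap_links: bool = True) -> str:
    """Build a form-encoded body for POST statuses/update (no request is sent)."""
    # "status" is assumed to be the field carrying the tweet text.
    params = {"status": status}
    if wrap_links:
        # Opt in to link wrapping for this tweet only.
        params["wrap_links"] = "true"
    return urlencode(params)


print(build_status_update_body("Reading twitter.com"))
# status=Reading+twitter.com&wrap_links=true
```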
The new twitter-text version will be published on GitHub in a couple of days: