SearchTools Blog
Webwide search robots now indexing Flash (and filling in forms) 
2nd-Jul-2008 05:04 pm
searchtools.com

The SWF (Flash) file format has been open for a while, and many search engines have used it to extract some of the static text in Flash files. However, Flash is now an interactive web site application builder, and a lot of text simply does not exist until someone comes along and clicks. As a result, people who wanted their sites properly indexed by webwide search engines either could not use Flash or had to go to extra lengths to provide static text for search engine robots to find.

What Adobe and Google have just announced is that Adobe is making a special version of the Flash player code that can approximate a human interacting with the Flash application in the SWF file, triggering as many application states as it can. As far as I can tell, the Flash client within the indexing robot will click every possible button and enter text in text fields. Indexing the labels on buttons seems odd at first, but it makes sense if you think of them as anchor text pointing at other pages (or at least URLs).
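To make the "click every button, treat its label as anchor text" idea concrete, here is a toy sketch, assuming a Flash application can be modelled as a set of named states, each with some visible text and some buttons that lead to other states. This is not Adobe's implementation; the state model and the function name `crawl_states` are hypothetical. It simply does a breadth-first walk over every reachable state, recording the text it finds and each button label as if it were link anchor text.

```python
from collections import deque


def crawl_states(start, transitions, state_text):
    """Breadth-first exploration of a toy (hypothetical) application model.

    transitions: state -> {button_label: next_state}
    state_text:  state -> text visible in that state
    Returns (text indexed per reachable state, list of (label, target) anchors).
    """
    seen = {start}
    queue = deque([start])
    indexed = {}
    anchors = []
    while queue:
        state = queue.popleft()
        # Index whatever text is visible once this state has been reached.
        indexed[state] = state_text.get(state, "")
        for label, target in transitions.get(state, {}).items():
            # Treat the button label like anchor text pointing at the target state.
            anchors.append((label, target))
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return indexed, anchors


transitions = {"home": {"Products": "products", "About": "about"},
               "products": {"Home": "home"}}
state_text = {"home": "Welcome", "products": "Our widgets", "about": "Who we are"}
indexed, anchors = crawl_states("home", transitions, state_text)
```

The text in the "products" and "about" states only becomes indexable because the crawler "clicked" its way there, which is exactly the text a static SWF text extractor would miss.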

This is similar to what Googlebot is doing on some site forms: automatically clicking every combination of buttons, menus, and checkboxes, and submitting words from the site in text boxes. This has ended up creating phantom shopping carts and search queries. Google only does this for forms that use GET, not POST, and presumably will not do so if the page has meta NOINDEX and NOFOLLOW tags.
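The form-filling behaviour described above can be sketched roughly as follows. This is a hypothetical illustration, not Google's actual crawler logic: the function name `enumerate_get_queries`, the way fields are described, and the choice to skip a page if either NOINDEX or NOFOLLOW is present are all assumptions. It generates one GET URL per combination of field values, fills free-text boxes with words taken from the site, and refuses to touch POST forms.

```python
from itertools import product
from urllib.parse import urlencode


def enumerate_get_queries(action, method, fields, site_words, robots_meta=""):
    """Build candidate GET URLs for every combination of form values.

    fields: field name -> list of option values (menus, checkboxes, buttons),
            or None for a free-text box, which is filled from site_words.
    robots_meta: content of the page's robots meta tag, e.g. "NOINDEX, NOFOLLOW".
    """
    if method.upper() != "GET":
        return []  # POST forms are never submitted.
    directives = {d.strip().lower() for d in robots_meta.split(",")}
    if "noindex" in directives or "nofollow" in directives:
        return []  # Assumption: either directive is enough to leave the form alone.
    names = list(fields)
    choices = [fields[n] if fields[n] is not None else site_words for n in names]
    # One URL per combination, e.g. every colour x every size x every site word.
    return [action + "?" + urlencode(list(zip(names, combo)))
            for combo in product(*choices)]


urls = enumerate_get_queries("/search", "GET",
                             {"color": ["red", "blue"], "size": ["s", "m"], "q": None},
                             ["widget"])
```

Even this tiny form yields four URLs (two colours times two sizes times one word), which shows how quickly blind enumeration produces the phantom carts and queries mentioned above.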

The chief concerns I've seen from web site publishers include: the lack of clarity about exactly which JavaScript methods for loading Flash will be acceptable (especially SWFObject); how external XML files loaded by Flash will be indexed; and how deep linking into Flash files will work. Adobe has some explanations in its FAQ. At the moment, it's SWF only: all versions from the oldest to the current, whether generated by Flash or Flex (Adobe calls these "RIAs", rich Internet applications). However, FLV files are not covered; these are used on YouTube and elsewhere to contain video for playback, and rarely have textual metadata.

Adobe says Yahoo is working on this as well, and that Adobe is "exploring ways to make the technology more broadly available" to other search vendors.

No word on whether that includes enterprise and site search developers. There's an excellent writeup from the SEO point of view at Searchengineland, and searchmarketinggurus has a skeptical response.
Comments 
3rd-Jul-2008 04:11 am (UTC) - thanks for the link
Hi, thanks for the link. I'm Li Evans, the owner of SMG. Skeptical is right on the money. While Google can read those files, Flash designers can't really optimize anything but the text Google reads.

Hopefully Adobe will improve things in the future. Until then, site owners are better off avoiding sites built entirely in Flash!

Thanks again -
~Li (storyspinner here on LJ!)
3rd-Jul-2008 06:52 pm (UTC) - Re: thanks for the link
Nice to meet you, and your skepticism! I think automated extraction will never be as good as human design, so there's no question that sites built by people thinking about robots and indexers will look better and work better. But for my clients, who are indexing intranets they don't control, pulling content out of Flash will probably improve the search engine. So it goes.