Semantic Coding
How information idealism improves the internet
This article is written for the lay (business) person, to help him or her understand how the modern internet (and information technology in general) is constructed. At least I tried to write it for that audience, though I don’t think I quite succeeded. I’m not sure it’s even possible to explain this to someone without any technical knowledge of how the web works. Basically, web developers are transitioning to a semantic form of writing code because it saves time and money. Semantic coding involves tagging content with code that specifies what type of information it is. In the past, web pages mixed the code for content and visual presentation together. With semantic coding, content structure and visual presentation are separated. This allows for faster-loading sites, better search engine ranking, and more dynamic content.
Introduction
There is an ongoing movement in information technology to code semantically. If we were given a document with no paragraph breaks, headings in the same font as body text, and pictures without captions, we would probably be able to figure out what the document was trying to communicate after slowly reading through it. We can do this because we have the ability to intelligently interpret information presented to us. Now imagine that this same document were given to a computer, which we then asked to turn it into something more presentable. It wouldn’t know where one paragraph ends and another begins. Personal computers don’t have the artificial intelligence to decipher language and then reflect on what is written. This is where adding semantic code to the document comes in.
As the term implies, semantic coding attaches meaning to what is being coded so that both humans and computers can make sense of it. It accomplishes this by classifying information with markup tags that specify what each piece of information is. In the most basic case, paragraphs are wrapped in paragraph tags and headers are ranked above subheaders. This lets the computer know how to format the document for human viewing. When semantic coding is applied to web development, it allows for web sites that are more accessible, efficient, and dynamic.
How the Web Became Broken
Tim Berners-Lee invented the World Wide Web with the goal of linking information to other information. He described his vision in which “every quotation would have been a link back to its source,” and “a computer could represent associations between things that might seem unrelated but somehow did, in fact, share a relationship.” 1 To accomplish this goal, one of his creations was Hypertext Markup Language (HTML), a code used to tag, or mark up, information. The code was meant to describe the structure of a web page’s content. Headers of descending levels in a hierarchy are represented by <h1>, <h2>, and <h3>, while paragraphs are marked with <p> tags. The <blockquote> tag could be used to signify that the text contained between the start and end tag is a quote, while the source of the quote is contained between <cite> tags. How this information would look to the end consumer was determined by how that person’s web browser visually interpreted the various tags. The <h1> information appeared bigger than <h2>, but the actual size was up to the viewer’s own browser and font size settings. The original purpose of HTML was to classify information meaningfully so that it could be linked to related information. The Web was meant to be semantic from the start.
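To make the idea of structural tags concrete, here is a minimal sketch in Python of how a program could recover a page's outline purely from its header tags. It uses the standard library's html.parser module. The sample page, the OutlineParser class, and the outline function are all invented for this illustration and are not taken from any real site or tool.

```python
from html.parser import HTMLParser

# Hypothetical sample page; the wording is invented for illustration.
SAMPLE_PAGE = """
<h1>Semantic Coding</h1>
<p>Web developers are moving toward semantic markup.</p>
<h2>Introduction</h2>
<p>Tags describe what each piece of content is.</p>
"""


class OutlineParser(HTMLParser):
    """Collects (tag, text) pairs for a few structural tags."""

    TRACKED = {"h1", "h2", "h3", "p"}

    def __init__(self):
        super().__init__()
        self.items = []   # finished (tag, text) pairs, in document order
        self._open = []   # stack of tracked tags that are currently open

    def handle_starttag(self, tag, attrs):
        if tag in self.TRACKED:
            self._open.append([tag, ""])

    def handle_data(self, data):
        # Text only counts if it sits inside a tracked tag.
        if self._open:
            self._open[-1][1] += data

    def handle_endtag(self, tag):
        if self._open and self._open[-1][0] == tag:
            name, text = self._open.pop()
            self.items.append((name, " ".join(text.split())))


def outline(html):
    """Return the headers as indented lines, one level of indent per rank."""
    parser = OutlineParser()
    parser.feed(html)
    lines = []
    for tag, text in parser.items:
        if tag.startswith("h"):
            level = int(tag[1])          # "h2" -> 2
            lines.append("  " * (level - 1) + text)
    return lines


print("\n".join(outline(SAMPLE_PAGE)))
# Semantic Coding
#   Introduction
```

The program never looks at font sizes or colors. It knows that "Introduction" belongs under "Semantic Coding" only because one is tagged <h2> and the other <h1>, which is the kind of machine-readable meaning the tags were designed to carry.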
However, problems with this model of the Web started to arise as designers wanted more control of a web page’s presentation. To create text with special effects in a font that the viewer didn’t have installed, designers would create image files of that text and insert them in their web page where the title might be. David Siegel is the web designer most infamous for popularizing the practice of mixing structural markup with visual presentation. In his 1997 essay “The Web is Ruined and I Ruined it,” 2 Siegel laments that he “ruined the Web by mixing chocolate and peanut butter so they could never become unmixed.” He put paragraphs of text into tables even though that text wasn’t tabular data, such as numbers in a spreadsheet. This allowed him more control over its visual presentation. Visual designers tend to be well versed in grid systems, and a table is exactly that. The sites that he designed became popular and he wrote a book, Creating Killer Web Sites, which became an Amazon bestseller for nearly half a year. Partially because of this, table-based layout became the standard in web design, and markup code was used to describe how the content should look instead of what it meant in the context of the document.
Why Mixing Presentational with Structural Markup is a Problem
- Requires excess code to be inserted into a web page
- Makes the information less accessible
- Results in sites that are difficult to maintain
If presentational markup is used, every web page in a site contains repeated code. Branding and navigational elements tend to be consistent across an entire web site. But every time a page is viewed, all the code describing how the logo should look and where the navigation buttons are located has to be loaded again. This creates unnecessary bandwidth traffic, leading to slower-loading sites for visitors and a bigger bill for the site owner at the end of the month.
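A rough piece of arithmetic shows how the repeated code adds up. The sketch below is in Python, and every number in it is a made-up assumption chosen for easy math, not a measurement: 10 KB of content per page, 30 KB of presentational code, and a visitor who views 5 pages. It also assumes that when the presentation lives in its own file, the browser downloads that file once and reuses its cached copy for later pages.

```python
def kb_with_inline_presentation(content_kb, presentation_kb, page_views):
    """Presentation is embedded in every page, so it is sent on every view."""
    return (content_kb + presentation_kb) * page_views


def kb_with_separate_presentation(content_kb, presentation_kb, page_views):
    """Presentation is one shared file, assumed to be downloaded only once."""
    if page_views == 0:
        return 0
    return presentation_kb + content_kb * page_views


# Hypothetical figures, for illustration only.
CONTENT_KB = 10
PRESENTATION_KB = 30
PAGE_VIEWS = 5

inline = kb_with_inline_presentation(CONTENT_KB, PRESENTATION_KB, PAGE_VIEWS)
separate = kb_with_separate_presentation(CONTENT_KB, PRESENTATION_KB, PAGE_VIEWS)

print(inline)             # 200
print(separate)           # 80
print(inline - separate)  # 120
```

Under these assumptions a single visit drops from 200 KB to 80 KB. The real savings depend on the actual page sizes and on how visitors' browsers cache files, so treat the figures as an illustration of the pattern rather than a prediction.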
The information is also less accessible because presentational code that is mixed with content won’t transfer well when viewed in anything other than a traditional web browser. For example, screen reading software for the visually impaired will end up reading aloud the presentational code even though it is irrelevant. Or if the web page is viewed on a mobile device, it may become unintelligible as the mobile web browser futilely attempts to reformat the content for a smaller screen. If site navigation is entirely image-based, then the site becomes impossible to navigate on browsers without image support, which is the case on many mobile phones. Though it takes more work and is not mandatory for non-governmental sites, making a site accessible to the widest audience is usually better for everybody. According to the web usability tome The Design of Sites, accessibility isn’t just for those with physical impairments. Ramps designed for people with wheelchairs also help parents with strollers, while closed captioning allows for television watching in a noisy sports bar. There is an equivalent in web design. “The idea is to try to accompany as much of your Web content—including images, audio, and movies—with plain text, since text is highly accessible.” 3 This means that images have captions and audiovisual content is labeled and described. While it is considered “plain” text, the plain text being advocated must still be semantically tagged so software can interpret it for maximum accessibility.
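One small, checkable piece of this advice is whether every image carries a text description. In HTML that description normally goes in the image tag's alt attribute. The following Python sketch, again using the standard html.parser module, lists the images on a page that lack one. The MissingAltFinder class, the images_missing_alt function, and the file names are hypothetical, invented for this example.

```python
from html.parser import HTMLParser

# Hypothetical image-based navigation; the file names are invented.
SAMPLE_NAV = """
<img src="logo.gif" alt="Company logo">
<img src="nav_home.gif">
<img src="photo1.jpg" alt="">
"""


class MissingAltFinder(HTMLParser):
    """Records the src of every <img> that has no usable alt text."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        alt = attributes.get("alt")
        # Assumption of this sketch: an empty alt counts as missing.
        # Purely decorative images may legitimately use alt="".
        if not alt or not alt.strip():
            self.missing.append(attributes.get("src", "(no src)"))


def images_missing_alt(html):
    finder = MissingAltFinder()
    finder.feed(html)
    return finder.missing


print(images_missing_alt(SAMPLE_NAV))
# ['nav_home.gif', 'photo1.jpg']
```

A screen reader or a browser without image support has nothing to present for the two flagged images, so a visitor relying on either would not be able to use that part of the navigation.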
While accessibility might not be on everybody’s priority list when designing sites with a very specific audience in mind, the ease of site maintenance probably is. If all the presentation were integrated into each web page rather than wrapped around it, any change to a site’s aesthetics would require changing each and every file. A change such as adding a thin border around all the photos on a site would take hours or days depending on the size of the site. This could quickly become a massive ordeal if a site were composed of hundreds of pages. However, if the presentation code were contained in a file separate from but linked to each page of content, this change would require adding just one or two lines of code. The designer would simply specify that all images of the class type “photo” should have a border. For this to work, semantic coding would have to be used when first creating the site so that all the photos carried the “photo” label. The ability to implement site-wide changes relatively easily is just another advantage that the semantic philosophy of building web sites brings to the table.
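The photo border example can be sketched as follows. The presentation file is a stylesheet, here called site.css, and the one line being added is an ordinary CSS rule. The Python code around it only models a tiny site as a dictionary of file names and contents so that we can count which files change. The file names, page contents, border color, and helper functions are all assumptions made up for this illustration.

```python
# A hypothetical three-file site: one shared stylesheet and two pages that
# link to it. Every photo was given the class "photo" when the site was built.
SITE = {
    "site.css": "body { font-family: sans-serif; }\n",
    "index.html": '<link rel="stylesheet" href="site.css">'
                  '<img class="photo" src="team.jpg" alt="Our team">',
    "about.html": '<link rel="stylesheet" href="site.css">'
                  '<img class="photo" src="office.jpg" alt="Our office">',
}

# The single rule the designer adds: a thin border on every "photo" image.
BORDER_RULE = "img.photo { border: 1px solid #999; }\n"


def add_photo_border(site):
    """Return a copy of the site with the border rule appended to site.css."""
    updated = dict(site)
    updated["site.css"] = site["site.css"] + BORDER_RULE
    return updated


def changed_files(before, after):
    """List the files whose contents differ between two versions of a site."""
    return sorted(name for name in after if before.get(name) != after[name])


print(changed_files(SITE, add_photo_border(SITE)))
# ['site.css']
```

Only the stylesheet changes, no matter how many pages the site has. If the border had to be written into each image tag instead, every page containing a photo would appear in that list.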
1. Berners-Lee, Tim. Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web. New York: HarperCollins, 2000.
2. Siegel, David. “The Web is Ruined and I Ruined it.” XML.com. 7 Oct 1997. O’Reilly. Ed. Kendall Grant Clark. 30 Jan 2007. <http://www.xml.com/pub/a/w3j/s1.people.html>.
3. Van Duyne, Douglas K., James A. Landay, and Jason I. Hong. The Design of Sites: Patterns for Creating Winning Web Sites. Upper Saddle River, NJ: Prentice Hall, 2007.
Semantic Coding by Daniel Yang is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License.