html basics

Last Updated: Tuesday, 03 August 2021 01:32; Published: Tuesday, 14 April 2020 02:39

HTML: Hyper Text Markup Language:

HTML is a language for describing hypertext documents. Hypertext documents are made up of headings, paragraphs, bulleted lists, and, importantly, links to other hypertext documents; it’s the links that constitute the hyper part of hypertext. Markup means adding tags to the plain text to indicate which bits of it are headings, paragraphs, lists, links, etc.

History

In 1990, Tim Berners-Lee designed a simple hypertext-based system, which came to be known as the World Wide Web (WWW). In the early days, when researchers shared documents between computers, each document was written with a specific piece of software, and unless the receiving side had the same software, the only way to read it was to convert the document into a format that the software on the receiving computer could read. Berners-Lee had written many such conversion utilities, but realized that the real solution was for everyone to write in a single format that is easy to transfer. So he introduced HTML, a language that adds tags and a little other syntax to a document. He then developed a viewer, called a browser, that could read such documents. Anyone could write their own browser, since it only needed to parse the tags and display the content. Thus HTML became the default language of the web.

HTML 1.0:

HTML is the language in which the content of any website is written. In the early days of the internet, you could write a simple text file with a few tags to change text style, size, color, etc. However, such a file gave no control over how the content was laid out on the page, how to add images, etc. Later, many more tags were added to make pages more appealing to users. However, browser makers also started defining their own tags, which were supported only by their own browser. In 1994, W3C (World Wide Web Consortium) was formed to standardize the language. The early HTML specification of this period is usually referred to as HTML 1.0. It is called HyperText Markup Language because early HTML mostly supported hypertext (i.e. links that point to other HTML documents).

HTML 2.0, DHTML, CSS 1.0:

In 1994, Netscape (co-founded by Marc Andreessen, one of the authors of the earlier Mosaic browser) released the Netscape Navigator browser, and in 1995 Microsoft introduced Internet Explorer. With the browser wars starting, the HTML 2.0 standard was released in 1995. HTML 2.0 introduced forms, which allowed data to be sent back to the server. Shortly after that, Netscape introduced JavaScript, which enabled web pages to respond to user actions without going back to the server at all. This allowed web pages to become highly interactive, and the approach came to be known as dynamic HTML (DHTML). The two main browsers, Netscape and IE, implemented DHTML very differently, so pages written for one were often not compatible with the other. W3C stepped in with a standard, the DOM (Document Object Model) Level 1, in 1998, and IE eventually achieved full compatibility with it. In the meantime, the CSS 1.0 standard was published in 1996 as an addition to HTML.

HTML 3.2:

HTML 3.2, introduced in 1996, added widely used features such as tables, applets, text flow around images, etc. It was W3C's first recommendation for HTML, and it was backward compatible with HTML 2.0.

HTML 4.01, XHTML 1.0:

The HTML 3.2 standard was superseded by HTML 4.0 in 1997 and by HTML 4.01 in 1999. With HTML 4.01 and CSS2, the web became very popular in the early 2000s. HTML 4.01 added support for more multimedia options, style sheets, scripting languages, etc. Where features were lacking in the standards, developers used JavaScript or third-party plug-ins, such as Macromedia’s (now Adobe’s) Flash, to fill in the gaps. Around this time, W3C decided that the future of HTML lay in XML. XML is superficially similar to HTML: documents, tags, and elements all exist in XML. However, XML was considered superior in two ways: it is stricter in syntax, and it is extensible (i.e. new tags can be added by defining them in a separate file). W3C redefined HTML 4.01 as XHTML 1.0. It contained no new elements or features; all the valid elements were identical to those in HTML 4.01. The only changes came from it now being a dialect of XML. The plan was to extend XHTML in a modular fashion by plugging in new XML dialects. Some of the better-known XML dialects W3C expected to be plugged into XHTML were Scalable Vector Graphics (SVG) and MathML, an XML language for describing equations.

AJAX:

XMLHttpRequest (XHR), first introduced by Microsoft in Internet Explorer 5 and later implemented by Mozilla (the descendant of the Netscape browser, on which Firefox is based) and other browsers, allowed a small part of the page to be reloaded instead of reloading the whole page on every click of the user. Gmail, launched in 2004, used XHR heavily, which made it extremely fast and led to a spurt of XHR-based web applications and renewed interest in JavaScript. In 2005 the approach was given the acronym AJAX (for Asynchronous JavaScript and XML). AJAX became very popular because of all the cool things it could do on the client side.

XHTML2 vs HTML 5:

Around 2004-2005, two different paths were taken for the future HTML standard. W3C continued to work on XHTML 2, which was radically different, while a new group of browser makers (the WHATWG, formed by Apple, Mozilla and Opera) started developing HTML5, which was more evolutionary. HTML5 got more traction, so in 2007 W3C started working on HTML5 as well and eventually dropped XHTML 2.0. HTML5 is intended to replace all previous versions of HTML and XHTML. At the same time, the new CSS3 standard was being developed, which is modular: it’s split into sections such as Backgrounds and Borders, Values and Units, and Text Layout. Until a particular module is ready, the corresponding section of the CSS 2.1 spec is regarded as the current standard. HTML5 became a W3C Recommendation in 2014, while CSS3, being modular, has been released module by module rather than all at once. These are supported by all major browsers as of 2020, and most websites already use them.

HTML syntax: Even though HTML5 is the latest version, we still learn plain HTML first, since HTML5 is backward compatible with earlier HTML.

In HTML, "elements" or "tags" tell the browser how to display the text content. The browser does not display the tags themselves; it uses them to display the content the way the tags specify. Tags may also have special attributes. If you look at the source of any webpage (by pressing "Ctrl + U" in Firefox, or by going to Web Developer -> Page Source in Firefox), you will see that the page is written in HTML. The purpose of a browser is to read HTML documents and display them.

Below is a link to the w3schools website, which is one of the best places to learn the languages used for websites:

https://www.w3schools.com/html/default.asp

HTML tags: Tags in HTML are element names surrounded by <> (angle brackets). There is a start tag (<h1>) and an end tag (</h1>), which has a "/" right after the "<". Everything from the start tag to the end tag (including the tags) is called an element, and everything between the tags is the content. Content can contain tags too, so tags can be nested. Tag names are not case sensitive.

ex: <h1> content heading <b> in bold </b> continuation </h1> => here tag <b> is nested within tag <h1>. The whole thing is called the "h1" element, since the h1 tags are outermost. When we say the "b" element, we mean the nested element in the b tags (i.e. the text "<b> in bold </b>").
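To see the nesting the way a parser sees it, here is a small sketch in Python (used here only as a convenient tool; the class name NestingPrinter is made up for this example). It feeds the same example, with the h1 tags typed in upper case, to Python's standard html.parser module and prints every start tag, end tag and piece of text indented by its nesting depth. The upper-case <H1> comes out as h1, since tag names are not case sensitive.

```python
from html.parser import HTMLParser


class NestingPrinter(HTMLParser):
    """Records start tags, end tags and text together with their nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.events = []

    def handle_starttag(self, tag, attrs):
        # html.parser hands us tag names in lower case, so <H1> arrives as "h1".
        self.events.append(("start", tag, self.depth))
        self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1
        self.events.append(("end", tag, self.depth))

    def handle_data(self, data):
        if data.strip():  # skip whitespace-only text between tags
            self.events.append(("text", data.strip(), self.depth))


parser = NestingPrinter()
parser.feed("<H1> content heading <b> in bold </b> continuation </H1>")
parser.close()

for kind, value, depth in parser.events:
    print("  " * depth + f"{kind}: {value}")
# start: h1
#   text: content heading
#   start: b
#     text: in bold
#   end: b
#   text: continuation
# end: h1
```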

Attribute: Each element can have additional attributes, which are always specified in the start tag in name="value" format. The value is usually enclosed in double quotes ("..."). Multiple attributes are separated by spaces. Attributes may be optional for some tags, while required for others. Some attributes are specific to certain elements, while others can be applied to any element. The two most common attributes are "id" and "class", which are used a lot in CSS.

ex: <a href="https://abc.com"> link to abc </a> => Here <a> is the start tag for a link and </a> is the end tag. href is an attribute, and it is required for <a> to act as a link.
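A similar sketch (again Python's html.parser; the class name AttrCollector and the class/id values are hypothetical) collects the name="value" pairs from a start tag. Attribute names, like tag names, are not case sensitive, so HREF and CLASS come out in lower case.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Collects (tag, {attribute: value}) for every start tag it sees."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs, with names in lower case
        self.found.append((tag, dict(attrs)))


collector = AttrCollector()
collector.feed('<a HREF="https://abc.com" CLASS="nav" id="home"> link to abc </a>')
collector.close()

tag, attrs = collector.found[0]
print(tag, attrs)
# a {'href': 'https://abc.com', 'class': 'nav', 'id': 'home'}
```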

HTML document: An HTML document is a tree of elements descending from an <html> element and its two children: <head>, for metadata (literally, “data about data”) and other nonvisible elements, and <body>, for the page content. So most HTML documents look like this: a top-level html tag, and within it a head tag and a body tag.

html: <html> .... </html> => the root element that contains the whole document
head: <head> <title>my website</title> </head> => contains metadata about the page. This is data about the page and is not displayed in the page itself. Some of the elements used in head are:
1. title: shows the title in the browser's title bar (or tab)
2. link: references external resources such as style sheets
3. script: specifies code to be run in the browser
4. meta: provides key-value pairs of metadata
body: <body> .... </body> => contains all the content to display

Sample HTML file: You can copy the file below into any editor and save it as test.html. Then, if you double-click this file, it will open in your browser.

<!DOCTYPE html> => specifies that following doc is in html5 format

<html>

<head>
<title>Page Title</title>
</head>

<body>
<h1>My First Heading</h1>
<p>My first paragraph.</p>
</body>

</html>
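The tree structure of this sample file can be checked with a small tree builder. This is a minimal sketch, assuming well-formed input like the sample file; TreeBuilder and show are hypothetical helper names, text and the DOCTYPE are simply skipped, and real browsers build the tree with a much more forgiving algorithm (they would, for example, add a missing <head> or <body>).

```python
from html.parser import HTMLParser

SAMPLE = """<!DOCTYPE html>
<html>
<head>
<title>Page Title</title>
</head>
<body>
<h1>My First Heading</h1>
<p>My first paragraph.</p>
</body>
</html>"""


class TreeBuilder(HTMLParser):
    """Builds a nested {"tag": ..., "children": [...]} tree from well-formed HTML.
    Void elements such as <br> are not handled in this sketch."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": "#document", "children": []}
        self.stack = [self.root]  # path from the root to the currently open element

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "children": []}
        self.stack[-1]["children"].append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Close back to the matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break


def show(node, depth=0):
    print("  " * depth + node["tag"])
    for child in node["children"]:
        show(child, depth + 1)


builder = TreeBuilder()
builder.feed(SAMPLE)
builder.close()
show(builder.root)
# #document
#   html
#     head
#       title
#     body
#       h1
#       p
```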

NOTE: For clarity, we add newlines and spaces in an HTML file. Even if we put everything on one line, the way the browser displays the content won't change, since only tags tell the browser where to put line breaks and spacing. Any run of whitespace characters (spaces, tabs, newlines) in the text is collapsed into a single space when displayed.
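The collapsing rule can be approximated in one line of Python. This is a simplified sketch: the real rules come from CSS (the white-space property), and browsers also drop spaces at the start and end of each displayed line, which this function does not do. The function name collapse_whitespace is made up for this example.

```python
import re


def collapse_whitespace(text: str) -> str:
    """Approximate how normal (non-<pre>) text is displayed: every run of
    spaces, tabs and line breaks becomes a single space."""
    # Only the HTML whitespace characters; a non-breaking space is not collapsed.
    return re.sub(r"[ \t\n\r\f]+", " ", text)


source = "My    first\n\n      paragraph."
print(collapse_whitespace(source))  # My first paragraph.
```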

File paths inside an HTML file: Inside an HTML document we often have to specify the location of files to access. For example, in hyperlinks we have to specify the URL of other websites, while to show a local picture on our website we have to specify the path where that picture is. Note that we can only access files on the web server that have the correct permissions and are in the correct directory (i.e. a directory that the Apache server is allowed to access).

ex: To access file: pictures/pic.jpg on my local server www.maaldaar.com.

A. Absolute path: One way is to specify everything as a url, even if the file is on your local website.

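Here, I can specify the file as a full URL, including the scheme: "https://www.maaldaar.com/pictures/pic.jpg" (or http://, depending on how the site is served). Without the scheme, the browser treats "www.maaldaar.com/..." as a relative path.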

However, this approach is not preferred, especially since you may refer to many local files, and every time you change your domain name you would have to update all the links manually. A better approach is to use relative paths for local files:

B. Relative path: Here I specify the file as "/pictures/pic.jpg". The leading / refers to the root of the website on the web server, not to the root of the Linux filesystem on the computer where the server is installed. We can also specify the path relative to the folder that contains the current HTML page. So, if the root has 2 folders "pictures" and "docs", and the current page is in "docs", then to access the file we specify "../pictures/pic.jpg". To refer to a file in the current "docs" folder, we just give its name, e.g. "my_pic.jpg".
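Python's urllib.parse.urljoin resolves relative references with the same rules (RFC 3986) that browsers follow for cases like these, so it is a handy way to check where a path points. The page location https://www.maaldaar.com/docs/index.html is an assumption for this sketch; the last entry shows what happens when the scheme is left out of an intended absolute URL.

```python
from urllib.parse import urljoin

# Hypothetical location of the page that contains the links.
PAGE = "https://www.maaldaar.com/docs/index.html"

examples = {
    "absolute URL": "https://www.maaldaar.com/pictures/pic.jpg",
    "root-relative": "/pictures/pic.jpg",
    "relative, parent folder": "../pictures/pic.jpg",
    "relative, same folder": "my_pic.jpg",
    "missing scheme (a mistake)": "www.maaldaar.com/pictures/pic.jpg",
}

# Resolve every href against the page, as a browser would before fetching it.
resolved = {label: urljoin(PAGE, href) for label, href in examples.items()}
for label, url in resolved.items():
    print(f"{label:28} -> {url}")
# absolute URL                 -> https://www.maaldaar.com/pictures/pic.jpg
# root-relative                -> https://www.maaldaar.com/pictures/pic.jpg
# relative, parent folder      -> https://www.maaldaar.com/pictures/pic.jpg
# relative, same folder        -> https://www.maaldaar.com/docs/my_pic.jpg
# missing scheme (a mistake)   -> https://www.maaldaar.com/docs/www.maaldaar.com/pictures/pic.jpg
```

The first three entries all point to the same file, but only the absolute one names the domain explicitly, so it is the only one that has to be edited if the domain changes.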

A few important tags: Visible HTML elements fall into 2 broad categories: block elements and inline elements. Block-level elements should generally not be children of inline elements (HTML5 makes an exception for <a>, which may wrap block content), but inline elements can be children of other inline elements.

Block elements: A block element naturally takes up the full width available to it; consecutive block elements naturally start below the previous block element. Below are 3 common block elements.

1. paragraph: <p> my name </p> => "my name" is in its own paragraph. Anything after this will be in a new paragraph, so paragraphs automatically cause a line break (i.e. the start of a new line)

2. heading: <h1> ... <h6> => headings, with h1 being the largest text size and h6 the smallest. Headings automatically cause a line break (i.e. the start of a new line)

3. lists: shows a list with bullets or numbers. Lists automatically cause a line break. ul => unordered list (bullets), ol => ordered list (numbered), li => list item in the list

<ul>
<li>Coffee</li>
<li>Tea</li>
<li>Milk</li>
</ul>

<ol>
<li>Coffee</li>
<li>Tea</li>
<li>Milk</li>
</ol>

Inline elements: These fit exactly to their content and sit naturally on the line of text in which they’re situated. The previous and next elements stay on the same line, unless they are block elements. Examples are formatting tags, links, etc.

1. Formatting tags: <b> = bold, <i> = italic, <mark> = marked text (highlighted with a background color), <small> = smaller text, <em> = emphasized text (usually displayed in italic)

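2. quotation tags: <q> = short inline quotation (browsers typically add quotation marks around it), etc. These tags are mostly useful for search engines, parsers, etc., since they describe what the text is; the display doesn't necessarily change much.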

neutral elements: A <div> is a block-level element, and a <span> is an inline element. By themselves, these elements are intentionally semantically neutral; they don’t “mean” anything, so we can make them mean anything we want. We give them meaning and appearance ourselves, usually through class/id attributes and CSS. We will learn about these in CSS.

Other tags:

comments: <!-- your comment --> => comment tag for inserting your own comments. Comments are not displayed by the browser.

line break: <br> => line break tag, which starts the next text on a new line. br is an "empty element" (it has no content), so it has no closing tag. The XHTML-style <br /> is also accepted; </br> is not a valid way to write it.

pre: <pre> => the preformatted tag displays the text inside it as is, with all the spaces and line breaks that are in the text. Any text outside <pre> is displayed without the extra spaces or line breaks in the source, because the browser collapses all whitespace into a single space. That is why the line breaks and blank lines in our HTML file don't show up in the browser unless we use tags such as <p>, <br> or <pre>.
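The difference between normal text and <pre> text can be sketched with Python's html.parser: text outside <pre> has each whitespace run collapsed to one space, text inside <pre> is kept as is, and comments never reach the displayed text at all. VisibleText is a hypothetical name, and this is only a rough model of what a browser renders.

```python
import re
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Rough model of displayed text: whitespace runs are collapsed except
    inside <pre>, and comments are dropped."""

    def __init__(self):
        super().__init__()
        self.pre_depth = 0  # > 0 while we are inside a <pre> element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self.pre_depth += 1

    def handle_endtag(self, tag):
        if tag == "pre" and self.pre_depth:
            self.pre_depth -= 1

    def handle_data(self, data):
        if self.pre_depth:
            self.parts.append(data)  # keep spaces and line breaks as written
        else:
            self.parts.append(re.sub(r"[ \t\n\r\f]+", " ", data))

    def handle_comment(self, data):
        pass  # comments are never part of the displayed text


text = VisibleText()
text.feed("<p>one   two\n three</p><!-- a note --><pre>a   b\nc</pre>")
text.close()
print(text.parts)  # ['one two three', 'a   b\nc']
```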

image: <img src="w3schools.jpg" alt="W3Schools.com" width="104" height="142"> => shows the image w3schools.jpg from the current directory on the local server (the image can also be the URL of an image on some other website; it doesn't need to be local). alt specifies alternative text to display in case the image can't be displayed. width/height are in pixels. For a locally installed Apache server, images (or any other documents) must have the proper permissions and be in a directory from which the web server is allowed to read files. For example, if the document is not in /var/..., Apache may not be able to read it, depending on the Apache settings.

object: The <object> element is a more general-purpose element for embedding content in your page. It can link to an arbitrary file; the only additional requirement is that you specify the file type (with the type attribute). In browsers that support SVG images, <object> can be used instead of <img>.

link: <a href="https://www.abc.com/intro.php" target="_blank"> link to abc </a> => the text "link to abc" links to www.abc.com/intro.php. target specifies where to open the link: _blank opens the link in a new window or tab, while _self opens it in the same window/tab. The other target values (_parent, _top) are mainly relevant for frames (from pre-CSS days). <a> is the most used tag in HTML, and links are what give it the name "hyper text". href doesn't have to point to another website; it can be a file or script on the local server. See the examples below:

ex: <a href="scripts/welcome.php"> Welcome </a> => This links to a PHP file on the local server. When clicked, the PHP file is run on the server and its output is displayed in the same window/tab (since target is not specified, it defaults to "_self", which opens the document in the same window/tab). NOTE: since the path has no leading "/", the scripts dir is looked for in the dir of the current page. Most of the time our pages are in the server's main dir, so this is the server's main dir.

ex: <a href="/css/custom.asp" target="_self"> link </a> => Link to a script located in the css folder of the current website. The "/" at the beginning of the path refers to the starting dir of our website (i.e. whichever dir the website is installed in becomes the "/" dir for our website. So, if abc.com is installed in dir /home/website/abc, then "/home/website/abc" serves as the "/" dir for abc.com). We can omit the "/" too: most of the time we are in the server's main dir and call all scripts/html files from there, so a relative href="css/custom.asp" resolves to the same file.

Aside from having text content as a link, we can have image or email also as a link:

ex: <a href="/scripts/welcome.php"> <img src="w3schools.jpg" alt="W3Schools.com" width="104" height="142"> </a> => Here, instead of text, we put an <img> tag, which serves as the clickable link (in lieu of text) to open the specified file. This is what you see on many websites, where clicking on a picture takes you to another page. It can also be used to show a larger version of a picture when it is clicked.

ex: <a href="/images/my.png"> <img src="/images/my.png" alt="CHART" /> </a> => Here the link is an image that is displayed on the current page. If the page layout scales the image down to fit, it may appear smaller. Clicking the image opens the same image file, now at its full original size, in the same window, replacing the current page.

email link: <a href="mailto:EMAIL_ADDRESS">Send email</a> => This opens the user's email program (to let them send a new email to that address). Replace EMAIL_ADDRESS with the actual recipient address.

link to navigate to a particular section of a page: You see this on a lot of websites, where clicking a link takes you to the bottom of a page or elsewhere. This is easy to achieve by appending "#" followed by an id name to the end of the href link. The "id_name" is the id that we put within any tag on that page, e.g. <h3 id="id_name">.

ex: <a href="https://www.abc.com/intro.php#mid_page"> link abc </a>. Now we put the id "mid_page" on the tag in the target page where we want our link to go. If we can't find a suitable tag, we just add one, such as a <p>, at that spot in the HTML of intro.php: <p id="mid_page">. Now clicking the link will take you to the section of the page where that id appears. Of course, the id has to be unique on that page; otherwise the link will take you to the first element with that id.

If you want to link to a different section of the same page, you don't need the page URL, just "#" followed by the id. ex: <a href="#para_4"> ... </a>. This takes you to the section of the same page with id="para_4". As explained earlier, this id can be assigned to any tag.
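The way a browser handles a #fragment can be sketched like this: split the fragment off the link, then look for the first element whose id matches it. The helper names (IdFinder, find_target) and the sample markup are hypothetical; urldefrag is part of Python's standard urllib.parse module.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class IdFinder(HTMLParser):
    """Collects (id value, tag name) for every element that has an id, in document order."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id":
                self.ids.append((value, tag))


def find_target(page_html, link):
    """Return the tag name of the element a link's #fragment points to, or None."""
    _, fragment = urldefrag(link)  # ".../intro.php#mid_page" -> "mid_page"
    finder = IdFinder()
    finder.feed(page_html)
    finder.close()
    for value, tag in finder.ids:
        if value == fragment:
            return tag  # the first matching id wins
    return None


page = ('<h3 id="intro">Intro</h3>'
        '<p id="mid_page">Middle</p>'
        '<div id="mid_page">Duplicate id</div>')

print(find_target(page, "https://www.abc.com/intro.php#mid_page"))  # p
print(find_target(page, "#intro"))                                  # h3
print(find_target(page, "#para_4"))                                 # None
```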

button: shows a button with a label. ex: <button> click me </button> => shows a clickable button labeled "click me"

iframe: allows us to create an embedded browser window inside the one the page is rendering in. This is an easy way to allow parts of the page to be updated without reloading the whole thing. The <iframe> element is used a lot for embedding advertising, displaying videos, and Facebook applications.

form: The form element is one of the most important elements; it is used to collect user input. The user input can then be sent to a server for processing, or be used by a script or some other action. We will use forms a lot when learning JavaScript.

An HTML form contains form elements, as <input> element, <label> element, etc.

The input element takes input from the user. Its type attribute determines how the input is taken; "type" can be, for example, text, radio or submit. It also has a "name" attribute, which acts like a "variable name" for that field; its value is whatever the user entered. When the form is submitted, each named field is sent as a name=value pair, so the server (or a script) can use that name to work with the value. Fields without a name are not submitted. When autocomplete is on (autocomplete="on"), the browser can automatically complete values based on values that the user has entered before.

There are lot more elements and attributes to be used with form. Look on w3schools.com.

ex: the form below takes 2 inputs from the user in text form (there are 2 boxes in which to enter text, prefilled with the values "John" and "Doe"). Then there are 2 radio buttons with values male and female, and we can select one of them.

<form>
<input type="text" id="fname" name="fname" value="John"><br> => By default, value "John" is stored in var "fname". If the user enters a different value, that value is stored instead.
<input type="text" id="lname" name="lname" value="Doe" autocomplete="on"> => Here autocomplete is turned ON for this field

<input type="radio" id="male" name="gender" value="male">
<input type="radio" id="female" name="gender" value="female"><br> => Value "male" or "female" is stored in var "gender"

</form>

The above form does nothing, as the data that the user inputs is not used anywhere. We can add an input of type "submit" and assign an action to take when that submit button is clicked. The action attribute defines the action to be performed when the form is submitted. The target attribute specifies whether the submitted result will open in a new browser tab, a frame, or the current window. The default value is "_self", which means the result opens in the current window, while "_blank" means the result opens in a new window. The method attribute specifies the HTTP method (GET or POST) to be used when submitting the form data. The default method, GET, makes the form data visible in the page's address field (i.e. it becomes part of the URL), while POST sends it in the body of the request, so it doesn't show up in the address field. POST is therefore preferred for sensitive data, although it is not encrypted by itself (that requires HTTPS).

So, above <form> code may be rewritten as:

<form action="script/action_page.php" target="_blank" method="get"> => This PHP script gets run when the form is submitted (replace the 1st line of the above example with this line).

......

<input type="submit" value="SUBMIT IT"> => This submit input makes a "submit" button appear on the page. On clicking "SUBMIT IT", a URL such as "www.abc.com/script/action_page.php?fname=John&lname=Doe&gender=male" is requested from the server (the gender part is included only if a radio button was selected; the data is in the URL because method "get" was used). The server then runs the script "action_page.php", which reads the data (fname, lname, gender) from the URL.

<button type="submit"> SUBMIT IT</button> => button type may also be used instead of <input ..>
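What the browser does with the form data can be sketched with Python's urllib (the requests are only built here, never sent). The form fields become name=value pairs joined with "&"; with method="get" that string is appended to the action URL after "?", and with method="post" the same string goes in the request body. The page address https://www.abc.com/index.html is an assumption, and parse_qs merely stands in for what a server-side script such as action_page.php would do.

```python
from urllib.parse import parse_qs, urlencode, urljoin, urlsplit
from urllib.request import Request

# Hypothetical page that contains the form, and the form's action attribute.
PAGE = "https://www.abc.com/index.html"
ACTION = "script/action_page.php"

# name=value pairs taken from the form controls: both text boxes plus the
# selected radio button (an unselected radio group sends nothing).
fields = [("fname", "John"), ("lname", "Doe"), ("gender", "male")]
encoded = urlencode(fields)  # 'fname=John&lname=Doe&gender=male'
target = urljoin(PAGE, ACTION)

# method="get": the encoded data is appended to the action URL after "?".
get_request = Request(target + "?" + encoded)
# method="post": the same encoded data is carried in the request body instead.
post_request = Request(target, data=encoded.encode("ascii"))

print(get_request.get_method(), get_request.full_url)
# GET https://www.abc.com/script/action_page.php?fname=John&lname=Doe&gender=male
print(post_request.get_method(), post_request.full_url, post_request.data)
# POST https://www.abc.com/script/action_page.php b'fname=John&lname=Doe&gender=male'

# Server side: the query string is parsed back into name -> values.
print(parse_qs(urlsplit(get_request.full_url).query))
# {'fname': ['John'], 'lname': ['Doe'], 'gender': ['male']}

# Spaces and special characters are encoded, e.g. a space becomes "+".
print(urlencode([("fname", "Mary Ann")]))  # fname=Mary+Ann
```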

NOTE: On many websites, when you submit inputs, a validator points out errors, if any, before the form is submitted. This is done using JavaScript, which we'll learn in the JavaScript section, typically through an event handler such as onsubmit on the form (or onclick on the submit button). See Inline event handler in the JavaScript section for an example.

Validating HTML document:

Once you have written an HTML document, browsers will display it without throwing an error even if it is not correctly written. So browsers are forgiving of syntax errors, but how they recover from them can be browser dependent. There are HTML validators available online that will validate the document tree. On top of this, browsers also have tools for checking it. In Firefox, open the Web Developer menu and select Inspector. The tools open with a tree view of the markup. Use this to highlight the elements you’re interested in, and check that the tree structure the browser has built corresponds to what you intended. The DOM (Document Object Model) is what we'll see next.
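To get a feel for one thing a validator checks, here is a toy tag-balance checker built on Python's html.parser. It is a sketch only, not a substitute for a real validator: it treats every non-void element as needing an end tag, which is stricter than HTML itself (for example, </p> and </li> may legally be omitted), and its VOID set is a partial list. TagBalanceChecker and check are hypothetical names.

```python
from html.parser import HTMLParser

# Elements that never take an end tag (a partial list, enough for this sketch).
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class TagBalanceChecker(HTMLParser):
    """Reports end tags that don't match the most recently opened element,
    and elements that are never closed."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            self.problems.append(f"unexpected </{tag}>")


def check(markup):
    checker = TagBalanceChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems + [f"<{t}> never closed" for t in checker.open_tags]


print(check("<p>line one<br>line two</p>"))  # []
print(check("<p>oops <b>bold</p>"))
# ['unexpected </p>', '<p> never closed', '<b> never closed']
```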
