The Internet Explained
What is the Internet
The word internet is short for inter-network, a decentralized network of networks. In essence, it is the physical and wireless connections that allow networks to connect with other networks.
Physical and virtual devices such as routers, switches and web servers make these connections possible. However, connecting separate networks also requires standardised data packets, defined by TCP/IP (Transmission Control Protocol/Internet Protocol), which allow all the separate networks to communicate and be compatible with each other.
History of the Internet Creation
The internet was invented by the Advanced Research Projects Agency (ARPA), which was formed to bring together the best scientific minds in the US and keep American military technology ahead of its rivals. The internet's predecessor, ARPANET (ARPA Network), was originally a proof of concept for a large-scale computer network.
Lawrence Roberts, often credited as the first person to connect two computers, and scientist Leonard Kleinrock were key figures in developing ARPA's computer network. In 1969, ARPANET used packet switching to send its first message from one site to another. ARPANET became a tool that allowed academic engineers and computer scientists at several American universities to communicate with each other. The system was slowly expanded to government offices and other universities in America, as well as overseas to countries such as the UK.
After the success of ARPANET, other organisations started creating their own networks. However, these new networks were incompatible with each other and with ARPANET.
The creation and adoption of TCP/IP (Transmission Control Protocol/Internet Protocol), which defines how packets are formed and sent through a network, standardised networking. This allowed ARPANET to connect with other networks and become a global, interconnected network of networks: the internet.
How the Internet Works
Packet
A packet is the basic unit of information transmitted over the internet. Splitting information up into small, digestible pieces allows the network’s capacity to be used more efficiently.
TCP/IP is the standard that defines packet headers, which makes packets compatible with all networks. IP provides the addressing, telling the network where each packet should be delivered, while TCP numbers the packets so the receiver can reassemble them into the original message.
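To illustrate how addressing and reassembly fit together, here is a minimal Python sketch. It assumes a simplified packet holding only a sequence number, a destination and a payload; real TCP numbers bytes rather than whole packets and carries many more header fields. The names `Packet`, `split_into_packets` and `reassemble` are hypothetical, and 203.0.113.5 comes from an address range reserved for documentation.

```python
import random
from dataclasses import dataclass


@dataclass
class Packet:
    seq: int        # simplified sequence number: the chunk's position in the message
    dest: str       # destination IP address (used for delivery, not reassembly)
    payload: bytes  # the slice of the original message carried by this packet


def split_into_packets(message: bytes, dest: str, size: int) -> list[Packet]:
    """Cut a message into chunks of at most `size` bytes, numbering each one."""
    return [
        Packet(seq=i, dest=dest, payload=message[offset:offset + size])
        for i, offset in enumerate(range(0, len(message), size))
    ]


def reassemble(packets: list[Packet]) -> bytes:
    """Put the chunks back in order using their sequence numbers."""
    return b"".join(p.payload for p in sorted(packets, key=lambda p: p.seq))


message = b"Hello from the sender, split into small packets!"
packets = split_into_packets(message, dest="203.0.113.5", size=8)
random.shuffle(packets)                # packets can arrive out of order
print(reassemble(packets) == message)  # True
```

Shuffling the list stands in for packets taking different routes; the sequence numbers alone are enough to restore the original order.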
The header contains information that helps the packet reach its destination, including the length of the packet and its source and destination addresses. After the header comes the actual data. An IP packet can be at most 65,535 bytes (about 64 kilobytes) including its header, which is roughly 20 pages of plain text. The IP and TCP headers also carry a checksum value that helps the recipient detect whether the packet was received correctly, and some network formats (such as Ethernet frames) end with a trailer that marks the end of the data and holds a similar check value. If a packet is corrupted or doesn't arrive, TCP arranges for the sender to resend that specific packet.
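The checksum idea can be sketched with the 16-bit ones' complement "Internet checksum" described in RFC 1071, the style of checksum that IP, TCP and UDP use in their headers. This is a simplified illustration: the function names are hypothetical, and a real TCP checksum also covers a pseudo-header containing the source and destination addresses.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style of RFC 1071."""
    if len(data) % 2:                   # pad odd-length data with a zero byte
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]     # add the next 16-bit word
        total = (total & 0xFFFF) + (total >> 16)  # fold any carry back in
    return ~total & 0xFFFF              # ones' complement of the sum


def verify(data: bytes, checksum: int) -> bool:
    """The receiver recomputes the checksum and compares it with the one sent."""
    return internet_checksum(data) == checksum


payload = b"example payload"
sent = internet_checksum(payload)
print(verify(payload, sent))             # True
print(verify(b"exbmple payload", sent))  # False: one byte was corrupted in transit
```

A mismatch only tells the receiver that something went wrong; recovering the data is left to retransmission.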
Routers
Routers manage and forward packets to different networks based on their destination. Routers are like the post office of the Internet, making sure that packets go to the right networks and devices. Typically, the router connects directly to the modem, and this acts as the connection point between your home network and the internet.
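Forwarding by destination can be sketched as a routing-table lookup. This minimal example assumes a hypothetical routing table and applies longest-prefix matching, the rule routers use to choose the most specific route that contains the destination address. It relies only on Python's standard `ipaddress` module; the networks are private or documentation ranges, not real routes.

```python
import ipaddress

# Hypothetical routing table: destination network -> where to send the packet next
ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): "ISP gateway",        # default route: matches everything
    ipaddress.ip_network("192.168.1.0/24"): "home LAN",
    ipaddress.ip_network("198.51.100.0/24"): "router A",
    ipaddress.ip_network("198.51.100.128/25"): "router B",   # more specific part of router A's range
}


def next_hop(destination: str) -> str:
    """Pick the matching route with the longest prefix (the most specific one)."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in ROUTES if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTES[best]


print(next_hop("192.168.1.20"))    # home LAN
print(next_hop("198.51.100.200"))  # router B (more specific than router A)
print(next_hop("203.0.113.9"))     # ISP gateway (only the default route matches)
```

The default route guarantees there is always at least one match, which is why a home router can send anything it doesn't recognise to the ISP.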
Switches
Switches connect and manage the devices that share a single network. A switch keeps track of which device is attached to each of its ports, using each device's hardware (MAC) address, so that incoming packets are sent only to the correct device. It also receives outbound packets from those devices and passes them along to the next network. Switches are often used to increase the number of wired devices connected to a router, acting as a central connection point similar to a hub.
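How a switch learns where devices are can be sketched as a small table from MAC address (the hardware address of a device's network interface) to port number. This is a simplified model under stated assumptions: the class name and MAC addresses are made up, and a real switch also ages out old entries and handles broadcast addresses.

```python
class LearningSwitch:
    """Toy model of how a switch learns which port each device is on."""

    def __init__(self, num_ports: int):
        self.num_ports = num_ports
        self.mac_table: dict[str, int] = {}  # MAC address -> port number

    def receive(self, in_port: int, src_mac: str, dst_mac: str) -> list[int]:
        """Return the port(s) the incoming data is forwarded out of."""
        self.mac_table[src_mac] = in_port        # learn where the sender is attached
        if dst_mac in self.mac_table:
            return [self.mac_table[dst_mac]]     # known device: send to one port only
        # unknown device: send out of every port except the one it arrived on
        return [p for p in range(self.num_ports) if p != in_port]


switch = LearningSwitch(num_ports=4)
print(switch.receive(0, "aa:aa:aa:aa:aa:01", "aa:aa:aa:aa:aa:02"))  # [1, 2, 3]
print(switch.receive(2, "aa:aa:aa:aa:aa:02", "aa:aa:aa:aa:aa:01"))  # [0]
print(switch.receive(0, "aa:aa:aa:aa:aa:01", "aa:aa:aa:aa:aa:02"))  # [2]
```

Only the very first message has to be sent everywhere; once both devices have spoken, traffic between them goes straight to the right port.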
Modem
The modem connects your home network to the internet via an Internet Service Provider (ISP). A device that works as both a modem and a router is sometimes called a gateway, and devices can connect to it directly.
Servers
Web servers are specialized high-powered computers that store and serve content (webpages, images, videos) to users, in addition to hosting applications and databases. Servers also respond to DNS queries and perform other important tasks to keep the Internet up and running. Most servers are kept in large data centres, which are located throughout the world.
Your device (the client) connects to a LAN (local area network), which connects to an ISP (Internet Service Provider), which in turn connects you to the internet. Communication protocols such as HTTP define the structure of the information transmitted between any 2 devices on the internet.
Every device on the internet needs an IP address so that information can be sent to and received by the correct device; your ISP assigns your network its public IP address. Routers located at junctions where 2 or more network connections meet use the destination IP address to direct packets towards the right device. Each packet also carries the sender's IP address, so the reply can find its way back. A home router typically replaces each device's private address with its own public IP address (network address translation, or NAT) and remembers the mapping, so that returning packets are passed to the requester's device.
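The address translation a home router performs can be sketched as a lookup table. This is a minimal sketch under stated assumptions: the class name, the starting port 40000 and all addresses are hypothetical (203.0.113.7 is a documentation address), and a real NAT also tracks the remote address, protocol and connection timeouts.

```python
import itertools


class HomeRouterNAT:
    """Simplified NAT table mapping public ports back to private devices."""

    def __init__(self, public_ip: str):
        self.public_ip = public_ip
        self.ports = itertools.count(40000)          # hypothetical pool of public ports
        self.table: dict[int, tuple[str, int]] = {}  # public port -> (private IP, private port)

    def outbound(self, private_ip: str, private_port: int) -> tuple[str, int]:
        """Rewrite an outgoing packet's source to the router's public address."""
        public_port = next(self.ports)
        self.table[public_port] = (private_ip, private_port)
        return self.public_ip, public_port

    def inbound(self, public_port: int) -> tuple[str, int]:
        """Find which device on the home network a reply belongs to."""
        return self.table[public_port]


nat = HomeRouterNAT(public_ip="203.0.113.7")
src = nat.outbound("192.168.1.20", 51515)
print(src)                  # ('203.0.113.7', 40000)
print(nat.inbound(src[1]))  # ('192.168.1.20', 51515)
```

The outside world only ever sees the router's public address; the table is what lets the reply reach the right device inside the home network.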
HTTP (Hypertext Transfer Protocol)
HTTP is an application-layer protocol for transmitting hypermedia documents, such as HTML (HyperText Markup Language) pages. It was designed for communication between web browsers and web servers, but it can also be used for other purposes; for example, REST APIs use HTTP requests to access and modify data.
HTTP follows a classic client-server model: a client (such as a browser) opens a connection to make a request, then waits until it receives a response. HTTP is a stateless protocol, meaning that the server keeps no information between requests and treats each request as if it came from an unknown device. A typical HTTP request contains an HTTP method, a URL, an HTTP version, HTTP request headers and an optional body.
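The parts of a request listed above can be seen by building the raw text of an HTTP/1.1 request. This is a minimal sketch: `build_http_request` is a hypothetical helper (in practice a browser or library such as Python's `http.client` assembles the message for you), and www.example.com is a placeholder host. Because HTTP is stateless, everything the server needs, including the `Host` header, travels in each request.

```python
from typing import Optional


def build_http_request(method: str, host: str, path: str,
                       headers: Optional[dict] = None, body: str = "") -> str:
    """Assemble an HTTP/1.1 request message as plain text."""
    all_headers = {"Host": host}            # HTTP/1.1 requires a Host header
    all_headers.update(headers or {})
    if body:
        # tell the server how many bytes of body follow the headers
        all_headers["Content-Length"] = str(len(body.encode("utf-8")))
    lines = [f"{method} {path} HTTP/1.1"]   # request line: method, target, version
    lines += [f"{name}: {value}" for name, value in all_headers.items()]
    # lines end in CRLF; an empty line separates the headers from the optional body
    return "\r\n".join(lines) + "\r\n\r\n" + body


request = build_http_request("GET", "www.example.com", "/index.html",
                             headers={"Accept": "text/html"})
print(repr(request))
# 'GET /index.html HTTP/1.1\r\nHost: www.example.com\r\nAccept: text/html\r\n\r\n'
```

A GET request usually has no body, so the message ends at the blank line; a POST would add a `Content-Length` header and the body after it.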
HTTPS (Hypertext Transfer Protocol Secure)
HTTP has largely been replaced by HTTPS, which uses cryptography for secure communication. The encryption was originally provided by SSL (Secure Sockets Layer). Before any data is exchanged, the 2 systems (typically client and server, or server and server) establish a secure connection through a process called an SSL handshake, which safeguards the data sent between them. This prevents a third party from reading or changing the communication by sitting between the 2 systems or by falsely acting as the server, which is known as a man-in-the-middle attack.
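SSL has since been replaced by TLS (Transport Layer Security), an updated and more secure version, although the name SSL is still commonly used to refer to it. Depending on the version, TLS supports a choice of public-key algorithms, such as ECC (Elliptic-curve cryptography), RSA (Rivest–Shamir–Adleman) or DSA (Digital Signature Algorithm), to authenticate the server and agree on the keys used for encryption.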
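As a hedged illustration, Python's standard `ssl` module shows what a TLS client checks. The sketch below assumes a policy of accepting TLS 1.2 or newer (an illustrative choice, not a requirement from the text), and `fetch_tls_version` is a hypothetical helper that needs network access, so it is defined but not called here.

```python
import socket
import ssl

# The default client context verifies the server's certificate chain
# and checks that the certificate matches the host name being visited.
context = ssl.create_default_context()
context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse SSL and early TLS versions


def fetch_tls_version(host: str, port: int = 443) -> str:
    """Perform a TLS handshake with `host` and report the negotiated protocol version."""
    with socket.create_connection((host, port), timeout=5) as sock:
        # wrap_socket runs the handshake; server_hostname enables hostname checking
        with context.wrap_socket(sock, server_hostname=host) as tls_sock:
            return tls_sock.version()  # e.g. "TLSv1.3"


print(context.verify_mode == ssl.CERT_REQUIRED)  # True
print(context.check_hostname)                    # True
```

With network access, calling `fetch_tls_version("www.example.com")` would perform a real handshake; if the certificate did not match the host, `wrap_socket` would raise an error instead of connecting, which is the protection against a server impersonator.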
SSL certificates are attached to a domain/site and contain information about the site, information about the issuer and the certificate's expiry date. A certificate also contains the server's public key, one half of a key pair, and verifies that this public key belongs to the site. The matching private key is stored on the server and never shared. Data encrypted with the public key can only be decrypted with the private key, so only the server can decrypt it. In practice, TLS uses this key pair during the handshake to authenticate the server and establish a shared session key, which then encrypts the rest of the communication.
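The public/private key relationship can be shown with "textbook" RSA using tiny primes. This sketch is for illustration only and is completely insecure: real keys are thousands of bits long and real TLS adds padding and other protections. The values p = 61, q = 53 and e = 17 are the standard classroom example, not anything a real server would use.

```python
# Textbook RSA with tiny primes: for illustration only, never secure in practice.
p, q = 61, 53
n = p * q                  # modulus, shared by both keys (3233)
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent
d = pow(e, -1, phi)        # private exponent, the modular inverse of e (Python 3.8+)

public_key = (e, n)        # may be shared with anyone, e.g. inside a certificate
private_key = (d, n)       # stays on the server


def apply_key(value: int, key: tuple[int, int]) -> int:
    """Encryption and decryption are the same operation with different keys."""
    exponent, modulus = key
    return pow(value, exponent, modulus)


message = 65
ciphertext = apply_key(message, public_key)  # 2790
print(apply_key(ciphertext, private_key))    # 65
```

Anyone holding the public key can produce `ciphertext`, but turning it back into 65 requires `d`, which can only be computed easily by someone who knows the secret primes.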
Domain Name System (DNS)
When you use a web browser, you often see a URL (Uniform Resource Locator) in the address bar. A URL is simply the address of a unique resource on the Web, and it contains the domain name.
URLs typically follow this format:
Scheme://Subdomain.SLD.TLD.ccTLD/Subdirectory
The domain is made up of the subdomain, the SLD (second-level domain) and the TLD (top-level domain), and sometimes a ccTLD (country code top-level domain).
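The format above can be sketched in Python with the standard `urllib.parse.urlsplit` function plus a naive split of the host name. The splitting rule is an assumption for illustration: it treats a final two-letter label as a ccTLD, so names like google.co.uk or youtu.be are misclassified. Real code would use the Public Suffix List. `describe_url` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Split a URL into scheme, subdomain, SLD, TLD, ccTLD and subdirectory (naive)."""
    parts = urlsplit(url)
    labels = parts.hostname.split(".")  # e.g. ["www", "google", "com", "au"]
    # Naive rule: a final two-letter label after at least two others is a ccTLD.
    cctld = labels.pop() if len(labels) > 2 and len(labels[-1]) == 2 else None
    tld = labels.pop()
    sld = labels.pop() if labels else None
    subdomain = ".".join(labels) or None  # whatever is left, if anything
    return {
        "scheme": parts.scheme,
        "subdomain": subdomain,
        "sld": sld,
        "tld": tld,
        "cctld": cctld,
        "subdirectory": parts.path,
    }


print(describe_url("https://www.google.com.au/imghp"))
# {'scheme': 'https', 'subdomain': 'www', 'sld': 'google', 'tld': 'com', 'cctld': 'au', 'subdirectory': '/imghp'}
```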
Scheme
The scheme tells the browser which protocol, the agreed method for exchanging or transferring data, it must use to request the resource. Websites typically use HTTPS (Hypertext Transfer Protocol Secure). Other schemes include HTTP (Hypertext Transfer Protocol), FTPS (File Transfer Protocol Secure), mailto (the Uniform Resource Identifier (URI) scheme for email addresses) and many others.
Subdomains
Subdomains are created to organize and navigate different sections of a website. For example, in https://www.google.com.au/ the subdomain is "www".
The most common subdomain is www (World Wide Web), which indicates that the site is served from a web server. Another possible subdomain is "accounts", which indicates that the resource is in the accounts section.
Second-level Domain (SLD)
The SLD usually identifies the organization or company that registered the domain name; it is the identity of your website. For example, "google" in google.com tells you that you are on a Google-owned site.
Top-level Domain (TLD)
TLDs are generic domain extensions that are listed at the highest level in the domain name system. Examples include .com, .org, .net, .edu etc. They typically represent the purpose of the website. Commercial websites typically use .com and educational websites use .edu.
TLDs fall into several categories: generic TLDs, which can be used by any website, such as .com; generic-restricted TLDs, intended for specific purposes, e.g. .biz; sponsored TLDs, which must be tied to the related community or industry, such as .gov and .edu; and the infrastructure category for DNS services, which currently contains only .arpa.
Country Code Top Level Domain (ccTLD)
Two-letter TLDs are country code top-level domains (ccTLDs), also called country-specific domain names. They represent a country or territory, like .uk for the United Kingdom and .au for Australia.
ccTLDs were originally intended to distinguish different countries and areas of the world, allowing websites to target a specific region and its users. However, there are exceptions, such as .io.
The .io domain is the ccTLD for the British Indian Ocean Territory, but it is widely used by tech and gaming websites because "io" coincidentally suggests input/output. Another example is .be, which represents Belgium and is also used by YouTube to form the URL https://youtu.be/ because it completes the spelling of "YouTube".
Subdirectory
When you click on a link within a website, a subdirectory may appear in the URL. It represents a directory or folder and corresponds to specific pages within the website.
https://www.google.com.au/imghp
The "imghp" is the subdirectory.
Summary
https://www.google.com.au/imghp
The above link specifies its protocol (scheme) as HTTPS (Hypertext Transfer Protocol Secure).
It uses the subdomain "www" (World Wide Web); another section of the site could use a different subdomain, such as "accounts".
"google.com" is the domain name. It can be split into the second-level domain "google" and the top-level domain "com". The "au" is the country code top-level domain.
DNS and IP Address
The browser connects to a server/website using an IP address, a numerical address such as 74.125.196.103, which has been linked to google.com. These numbers are hard for humans to remember, so we use domain names instead. The domain name system lets a person enter a domain name rather than the IP address, and it returns the IP address that the browser uses to locate the resource.
The typical process is: Domain Name -> DNS -> IP address. DNS relies on primary servers, secondary servers and caching servers.
Primary DNS Servers
Primary (master) DNS servers contain all the relevant resource records and handle DNS queries for a domain. A primary server can have multiple secondary servers connected to it. Each zone must have only one primary name server, and it should have at least one secondary name server as a backup, to minimize dependency on a single server.
Secondary DNS Servers
Secondary DNS servers store read-only copies of DNS records, which they receive from a primary server rather than from their own local files. This process is known as a zone transfer.
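A zone transfer can be sketched as a secondary copying the primary's records whenever the primary's version number is newer. This is a simplified model: in real DNS the version is the serial number in the zone's SOA record, transfers use the AXFR/IXFR mechanisms, and serials are compared with wraparound-aware arithmetic rather than a plain `>`. All class names and addresses here are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Zone:
    serial: int                                  # version number, bumped on every change
    records: dict = field(default_factory=dict)  # name -> IP address


class PrimaryServer:
    def __init__(self, zone: Zone):
        self.zone = zone                         # the editable, authoritative copy

    def update(self, name: str, ip: str) -> None:
        self.zone.records[name] = ip
        self.zone.serial += 1


class SecondaryServer:
    def __init__(self):
        self.zone = Zone(serial=0)               # empty until the first transfer

    def refresh_from(self, primary: PrimaryServer) -> bool:
        """Copy the whole zone if the primary's serial is newer (simplified zone transfer)."""
        if primary.zone.serial > self.zone.serial:
            self.zone = Zone(primary.zone.serial, dict(primary.zone.records))
            return True
        return False

    def lookup(self, name: str):
        return self.zone.records.get(name)       # answers come from the read-only copy


primary = PrimaryServer(Zone(serial=1, records={"www.example.com": "192.0.2.10"}))
secondary = SecondaryServer()
print(secondary.refresh_from(primary))       # True: first transfer
print(secondary.refresh_from(primary))       # False: already up to date
primary.update("mail.example.com", "192.0.2.20")
print(secondary.refresh_from(primary))       # True: serial went from 1 to 2
print(secondary.lookup("mail.example.com"))  # 192.0.2.20
```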
Caching servers
A caching DNS server handles recursive requests from clients. Almost every DNS server that the operating system’s stub resolver contacts will be a caching DNS server. Caching servers keep the answers to previous requests, so only lookups that have not been made before (or whose stored answers have expired) need to be forwarded to other DNS servers.
Local Machine
Information about previous DNS lookups can also be stored temporarily by your local machine's operating system or web browser. This DNS cache allows faster responses, but its entries may become outdated, causing issues when trying to reach a site. The cache can be flushed to remove outdated addresses.
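The behaviour of a DNS cache, where answers expire after their time to live (TTL) and the whole cache can be flushed, can be sketched as a small class. This is a minimal illustration: `DnsCache` is hypothetical, and the injectable clock exists only so the example can demonstrate expiry without actually waiting.

```python
import time


class DnsCache:
    """Toy DNS cache: each answer is kept only for its time to live (TTL)."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock  # injectable clock so expiry can be demonstrated
        self.entries = {}   # domain name -> (IP address, expiry time)

    def put(self, name: str, ip: str, ttl_seconds: float) -> None:
        self.entries[name] = (ip, self.clock() + ttl_seconds)

    def get(self, name: str):
        """Return the cached IP, or None if it is missing or has expired."""
        entry = self.entries.get(name)
        if entry is None:
            return None
        ip, expires_at = entry
        if self.clock() >= expires_at:
            del self.entries[name]  # stale answer: a fresh lookup is needed
            return None
        return ip

    def flush(self) -> None:
        """Remove every entry, like flushing the DNS cache."""
        self.entries.clear()


fake_now = [0.0]
cache = DnsCache(clock=lambda: fake_now[0])
cache.put("www.example.com", "192.0.2.10", ttl_seconds=300)
print(cache.get("www.example.com"))  # 192.0.2.10
fake_now[0] = 301.0                  # pretend 301 seconds have passed
print(cache.get("www.example.com"))  # None
```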
The Full DNS System
1. A user opens a web browser, enters www.example.com in the address bar, and presses Enter. The request for www.example.com is routed to a DNS resolver, which is typically managed by the Internet service provider (ISP).
2. The ISP's DNS resolver forwards the request for www.example.com to a DNS root name server.
3. The root name server responds by telling the DNS resolver to ask a TLD name server for the .com TLD.
4. The ISP's DNS resolver forwards the request for www.example.com again, this time to one of the TLD name servers for .com domains.
5. The TLD name server responds with the authoritative name servers (DNS servers) responsible for the example.com domain.
6. The DNS resolver forwards the request for www.example.com to one of the authoritative name servers, which may be run by the website's host or by a separate DNS provider.
7. The authoritative name server looks up the www.example.com record in the example.com hosted zone, gets the associated value, such as the IP address of a web server, and returns the IP address to the DNS resolver.
8. The ISP's DNS resolver finally has the IP address that the user needs and returns it to the web browser. The resolver also caches (stores) the IP address for www.example.com for a specified amount of time, its Time to Live (TTL), so that it can respond more quickly the next time someone browses to www.example.com by skipping steps 2-7.
9. The web browser sends a request for www.example.com to the IP address that it got from the DNS resolver.
10. The web server or other resource at that IP address returns the web page for www.example.com to the web browser, and the web browser displays the page.
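The resolver's side of this walkthrough (root, then TLD, then authoritative server, then caching the answer for its TTL) can be sketched with hard-coded tables standing in for real name servers. Everything here is hypothetical: the server names use the reserved ".test" domain, 192.0.2.10 is a documentation address, and real resolvers follow NS referrals over the network rather than reading dictionaries. Taking the last two labels as the domain is also a simplification.

```python
import time

# Hypothetical, hard-coded stand-ins for real DNS servers (".test" is reserved for examples).
ROOT_SERVER = {"com": "tld.com-servers.test"}                                    # TLD -> TLD server
TLD_SERVERS = {"tld.com-servers.test": {"example.com": "ns1.dns-host.test"}}     # domain -> authoritative
AUTHORITATIVE = {"ns1.dns-host.test": {"www.example.com": ("192.0.2.10", 300)}}  # name -> (IP, TTL)


class Resolver:
    """Toy recursive resolver: root -> TLD -> authoritative, with a TTL cache."""

    def __init__(self):
        self.cache = {}  # name -> (IP address, expiry time)
        self.log = []    # records which servers were asked

    def resolve(self, name: str) -> str:
        cached = self.cache.get(name)
        if cached and cached[1] > time.monotonic():
            self.log.append("cache hit")              # no other server needs to be asked
            return cached[0]
        tld = name.rsplit(".", 1)[-1]                  # "com"
        tld_server = ROOT_SERVER[tld]                  # root: "ask this TLD server"
        self.log.append(f"root -> {tld_server}")
        domain = ".".join(name.split(".")[-2:])        # "example.com" (naive)
        auth_server = TLD_SERVERS[tld_server][domain]  # TLD: "ask this authoritative server"
        self.log.append(f"TLD -> {auth_server}")
        ip, ttl = AUTHORITATIVE[auth_server][name]     # authoritative: the actual answer
        self.log.append(f"authoritative -> {ip}")
        self.cache[name] = (ip, time.monotonic() + ttl)  # remember it for TTL seconds
        return ip


resolver = Resolver()
print(resolver.resolve("www.example.com"))  # 192.0.2.10 (full lookup)
print(resolver.resolve("www.example.com"))  # 192.0.2.10 (from the cache)
print(resolver.log[-1])                     # cache hit
```

The second call never touches the root, TLD or authoritative tables, which is exactly the saving the TTL cache provides.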
Hosting
A computer that is connected to the internet and stores website pages is called a web server. A server typically hosts websites on IP addresses that are linked to domain names. The server receives requests from users (clients) and other servers and returns a response to the requester. Hosting describes how these servers are set up and provided to run a website.
Dedicated hosting
Dedicated hosting means your website runs on a dedicated machine that only handles traffic to your site. Because the server's only purpose is to serve your website files, you get greater control over it. Dedicated hosting is usually more expensive and is best suited to larger companies.
Cloud Hosting
Cloud hosting involves using cloud resources composed of a network of connected virtual and physical cloud servers to host the application or website, ensuring greater flexibility and scalability. Should one web server malfunction, another can step in, which minimizes downtime and ensures service consistency and dependability.
Shared Hosting
Most websites are on shared hosting plans, wherein a single server hosts multiple different sites. Shared hosting is more economical and a typical choice for small businesses.