Internet Security

The Internet, by its ever-expanding nature, will continue to connect more and more computers, mobile devices, and other nodes to one another. The Internet is therefore not a single network but a collection of "loosely" connected networks, accessible from any hub, host, or device connected to it.

However, along with the convenience of easy access to information and the ability to connect to terminals across the network come various risks. Chief among them is the security of information accessed, extracted and used over the Internet. Information placed on a network that is accessible worldwide, such as the Internet, is far more vulnerable than information kept in a filing cabinet at a single location. Information available on the network can be accessed, copied many times over and, worse still, the evidence of such access can be hidden.

Thus, for information on the Internet to be considered secure, its availability, confidentiality and integrity all need to be considered and evaluated.
When information is accessed, read, copied or used by an unauthorized person, it loses its confidentiality. When information on an unsecured network is modified in unexpected or unauthorized ways, it loses its integrity. And when those authorized to access and use the information cannot do so because it has been erased or deleted, it loses its availability.

Gaining unauthorized access to information over an unsecured network is relatively easy, while detecting such an intrusion is quite hard. Rapidly advancing technology and the open nature of the Internet make it difficult to achieve watertight security. This is why you too must keep an eye on your own safety, especially when downloading files or supplying personal details on the Internet. It is wise to take some personal precautions, as this helps keep the risks of using the Internet to a minimum.

Internet Vulnerability

Many early network protocols that now form part of the Internet infrastructure were designed without security in mind. Without a fundamentally secure infrastructure, network defense becomes more difficult. Furthermore, the Internet is an extremely dynamic environment, in terms of both topology and emerging technology. Due to the inherent openness of the Internet and the original design of the protocols, Internet attacks in general are quick, easy, inexpensive, and may be hard to detect or trace. An attacker does not have to be physically present to carry out the attack. In fact, many attacks can be launched readily from anywhere in the world - and the location of the attacker can easily be hidden. Nor is it always necessary to "break in" to a site (gain privileges on it) to compromise confidentiality, integrity, or availability of its information or service.

Even so, many sites place unwarranted trust in the Internet. It is common for sites to be unaware of the risks or unconcerned about the amount of trust they place in the Internet. They may not be aware of what can happen to their information and systems. They may believe that their site will not be a target or that precautions they have taken are sufficient. Because the technology is constantly changing and intruders are constantly developing new tools and techniques, solutions do not remain effective indefinitely.

Since much of the traffic on the Internet is not encrypted, confidentiality and integrity are difficult to achieve. This situation undermines not only applications (such as financial applications that are network-based) but also more fundamental mechanisms such as authentication and non-repudiation (see the section on basic security concepts for definitions). As a result, sites may be affected by a security compromise at another site over which they have no control. An example of this is a packet sniffer that is installed at one site but allows the intruder to gather information about other domains (possibly in other countries).

Another factor that contributes to the vulnerability of the Internet is the rapid growth and use of the network, accompanied by rapid deployment of network services involving complex applications. Often, these services are not designed, configured, or maintained securely. In the rush to get new products to market, developers do not adequately ensure that they do not repeat previous mistakes or introduce new vulnerabilities.

Compounding the problem, operating system security is rarely a purchase criterion. Commercial operating system vendors often report that sales are driven by customer demand for performance, price, ease of use, maintenance, and support. As a result, off-the-shelf operating systems are shipped in an easy-to-use but insecure configuration that allows sites to use the system soon after installation. These hosts/sites are often not fully configured from a security perspective before connecting. This lack of secure configuration makes them vulnerable to attacks, which sometimes occur within minutes of connection.

Finally, the explosive growth of the Internet has expanded the need for well-trained and experienced people to engineer and manage the network in a secure manner. Because the need for network security experts far exceeds the supply, inexperienced people are called upon to secure systems, opening windows of opportunity for the intruder community.


Internet Security

Internet security encompasses computer security specifically related to the Internet, often involving browser security but also network security as it applies to other applications or operating systems as a whole. Its objective is to establish rules and measures to use against attacks over the Internet. The Internet represents an insecure channel for exchanging information, leading to a high risk of intrusion or fraud, such as phishing. Different methods have been used to protect the transfer of data, including encryption and the use of firewalls. 

A firewall controls access between networks. It generally consists of gateways and filters, which vary from one firewall to another. Firewalls also screen network traffic and are able to block traffic that is dangerous. Firewalls can additionally act as intermediate servers (proxies) for connections such as SMTP and HTTP.

Role of Firewalls in Internet Security and Web Security

Firewalls impose restrictions on incoming and outgoing packets to and from private networks. All traffic, whether incoming or outgoing, must pass through the firewall; only authorized traffic is allowed through. Firewalls create checkpoints, also known as choke points, between an internal private network and the public Internet, and can filter traffic based on IP source address and TCP port number. They can also serve as a platform for IPsec; using tunnel-mode capability, firewalls can be used to implement VPNs. Firewalls can further limit network exposure by hiding the internal network systems and information from the public Internet.
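
As a rough illustration of this kind of rule matching, the sketch below (in Python, with an invented rule table and packet fields) checks a packet's source address and destination TCP port against a small set of rules and falls back to a default-deny policy.

```python
# Minimal sketch of packet-filter rule matching (illustrative only;
# real firewalls operate in the kernel or on dedicated hardware).
import ipaddress

# Hypothetical rule table: (allowed source network, destination TCP port, action)
RULES = [
    (ipaddress.ip_network("10.0.0.0/8"), 22, "allow"),   # internal SSH only
    (ipaddress.ip_network("0.0.0.0/0"), 80, "allow"),    # public HTTP
    (ipaddress.ip_network("0.0.0.0/0"), 443, "allow"),   # public HTTPS
]
DEFAULT_ACTION = "deny"  # deny anything not explicitly allowed

def filter_packet(src_ip: str, dst_port: int) -> str:
    """Return 'allow' or 'deny' based on source IP and destination TCP port."""
    src = ipaddress.ip_address(src_ip)
    for network, port, action in RULES:
        if src in network and dst_port == port:
            return action
    return DEFAULT_ACTION

print(filter_packet("10.1.2.3", 22))     # allow - internal SSH
print(filter_packet("203.0.113.7", 22))  # deny  - SSH from outside is blocked
```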

Security Technology

A variety of technologies have been developed to help organizations secure their systems and information against intruders. These technologies help protect systems and information against attacks, detect unusual or suspicious activities, and respond to events that affect security. In this section, the focus is on two core areas: operational technology and cryptography. The purpose of operational technology is to maintain and defend the availability of data resources in a secure manner. The purpose of cryptography is to secure the confidentiality, integrity, and authenticity of data resources.

Operational Technology

Intruders actively seek ways to access networks and hosts. Armed with knowledge about specific vulnerabilities, social engineering techniques, and tools to automate information gathering and systems infiltration, intruders can often gain entry into systems with disconcerting ease. System administrators face the dilemma of maximizing the availability of system services to valid users while minimizing the susceptibility of complex network infrastructures to attack. Unfortunately, services often depend on the same characteristics of systems and network protocols that make them susceptible to compromise by intruders. In response, technologies have evolved to reduce the impact of such threats. No single technology addresses all the problems. Nevertheless, organizations can significantly improve their resistance to attack by carefully preparing and strategically deploying personnel and operational technologies. Data resources and assets can be protected, suspicious activity can be detected and assessed, and appropriate responses can be made to security events as they occur.
One-Time Passwords. Intruders often install packet sniffers to capture passwords as they traverse networks during remote log-in processes. Therefore, all passwords should at least be encrypted as they traverse networks. A better solution is to use one-time passwords because there are times when a password is required to initiate a connection before confidentiality can be protected.

One common example occurs in remote dial-up connections. Remote users, such as those traveling on business, dial in to their organization's modem pool to access network and data resources. To identify and authenticate themselves to the dial-up server, they must enter a user ID and password. Because this initial exchange between the user and server may be monitored by intruders, it is essential that the passwords are not reusable. In other words, intruders should not be able to gain access by masquerading as a legitimate user using a password they have captured.

One-time password technologies address this problem. Remote users carry a device synchronized with software and hardware on the dial-up server. The device displays random passwords, each of which remains in effect for a limited time period (typically 60 seconds). These passwords are never repeated and are valid only for a specific user during the period that each is displayed. In addition, users are often limited to one successful use of any given password. One-time password technologies significantly reduce unauthorized entry at gateways requiring an initial password.
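
The sketch below illustrates, in simplified form, how such a time-based one-time password can be derived on both the user's device and the server from a shared secret. It follows the general idea of an HMAC computed over a time counter; the secret, the 60-second step and the six-digit length are illustrative assumptions rather than the specification of any particular product.

```python
# Sketch of a time-based one-time password, assuming the user's token and the
# dial-up/VPN server share the same secret key (provisioned out of band).
import hmac, hashlib, struct, time

SECRET = b"shared-secret-provisioned-out-of-band"  # hypothetical shared secret
STEP = 60  # the password changes every 60 seconds, as described above

def one_time_password(secret: bytes, when: float, digits: int = 6) -> str:
    counter = int(when // STEP)                        # current time step
    msg = struct.pack(">Q", counter)                   # 8-byte big-endian counter
    digest = hmac.new(secret, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                         # dynamic truncation
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % (10 ** digits)).zfill(digits)

# Token and server independently compute the same value for the current step.
now = time.time()
print(one_time_password(SECRET, now))   # value shown on the user's device
print(one_time_password(SECRET, now))   # value the server expects
```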

Firewalls. Intruders often attempt to gain access to networked systems by pretending to initiate connections from trusted hosts. They silence the genuine host with a denial-of-service attack and then attempt to connect to a target system using the address of that host. To counter these address-spoofing attacks and enforce limitations on authorized connections into the organization's network, it is necessary to filter all incoming and outgoing network traffic.

A firewall is a collection of hardware and software designed to examine a stream of network traffic and service requests. Its purpose is to eliminate from the stream those packets or requests that fail to meet the security criteria established by the organization. A simple firewall may consist of a filtering router, configured to discard packets that arrive from unauthorized addresses or that represent attempts to connect to unauthorized service ports. More sophisticated implementations may include bastion hosts, on which proxy mechanisms operate on behalf of services. These mechanisms authenticate requests, verify their form and content, and relay approved service requests to the appropriate service hosts. Because firewalls are typically the first line of defense against intruders, their configuration must be carefully implemented and tested before connections are established between internal networks and the Internet.
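
Complementing the rule-matching sketch earlier, the fragment below illustrates one anti-spoofing check a filtering router might apply on its external interface: an inbound packet should never claim a source address belonging to the internal network. The address range is an invented example.

```python
# Sketch of an ingress anti-spoofing check: a packet arriving on the external
# interface must not carry a source address from the internal network.
import ipaddress

INTERNAL_NET = ipaddress.ip_network("192.168.0.0/16")  # hypothetical internal range

def accept_inbound(src_ip: str) -> bool:
    """Reject inbound packets that spoof an internal source address."""
    return ipaddress.ip_address(src_ip) not in INTERNAL_NET

print(accept_inbound("198.51.100.20"))  # True  - genuine external source
print(accept_inbound("192.168.4.9"))    # False - spoofed internal address
```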

Monitoring Tools. Continuous monitoring of network activity is required if a site is to maintain confidence in the security of its network and data resources. Network monitors may be installed at strategic locations to continuously collect and examine information that may indicate suspicious activity. Automatic notifications can alert system administrators when the monitor detects anomalous readings, such as a burst of activity that may indicate a denial-of-service attempt. Such notifications may use a variety of channels, including electronic mail and mobile paging. Sophisticated systems capable of reacting to questionable network activity may be implemented to disconnect and block suspect connections, limit or disable affected services, isolate affected systems, and collect evidence for subsequent analysis.
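
As a toy illustration of threshold-based monitoring, the sketch below counts connection attempts per source address within a time window and flags any source that exceeds a limit; the threshold and the simulated log are invented for the example.

```python
# Toy sketch of threshold-based network monitoring: flag sources whose
# connection rate within one window exceeds a limit (possible DoS attempt).
from collections import Counter

THRESHOLD = 100  # hypothetical limit on connections per source per minute

def find_suspects(connection_log, threshold=THRESHOLD):
    """connection_log: iterable of source IP strings seen in one time window."""
    counts = Counter(connection_log)
    return {ip: n for ip, n in counts.items() if n > threshold}

# Simulated one-minute window: one source floods, the others behave normally.
window = ["203.0.113.9"] * 500 + ["198.51.100.2"] * 3 + ["192.0.2.15"] * 7
for ip, n in find_suspects(window).items():
    print(f"ALERT: {ip} opened {n} connections in the last minute")
```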

Tools to scan, monitor, and eradicate viruses can identify and destroy malicious programs that may have inadvertently been transmitted onto host systems. The damage potential of viruses ranges from mere annoyance (e.g., an unexpected "Happy Holidays" jingle without further effect) to the obliteration of critical data resources. To ensure continued protection, the virus identification data on which such tools depend must be kept up to date. Most virus tool vendors provide subscription services or other distribution facilities to help customers keep up to date with the latest viral strains.
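
In simplified terms, such tools compare file contents against a database of known signatures. The sketch below shows the idea with invented byte patterns; real products use far larger, frequently updated signature databases together with heuristic and behavioural analysis.

```python
# Simplified sketch of signature-based virus scanning: search file contents
# for known byte patterns. The signatures below are invented placeholders.
SIGNATURES = {
    "Example.TestVirus.A": bytes.fromhex("deadbeefcafebabe"),
    "Example.TestVirus.B": b"FAKE-MALWARE-MARKER",
}

def scan_file(path):
    """Return the names of any known signatures found in the file."""
    with open(path, "rb") as f:
        data = f.read()
    return [name for name, pattern in SIGNATURES.items() if pattern in data]

# Usage (hypothetical file): matches = scan_file("download.bin")
```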

Security Analysis Tools. Because of the increasing sophistication of intruder methods and the vulnerabilities present in commonly used applications, it is essential to periodically assess the network's susceptibility to compromise. A variety of vulnerability identification tools are available, and they have garnered both praise and criticism. System administrators find these tools useful in identifying weaknesses in their systems, while critics argue that such tools, especially those freely available to the Internet community, pose a threat if acquired and misused by intruders.
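
One basic task that such analysis tools automate is probing a host to see which TCP service ports accept connections. The minimal sketch below does only that, using the standard socket library; it should of course only be run against systems one is authorized to assess.

```python
# Minimal sketch of one building block of a security analysis tool:
# checking which TCP ports on a host accept connections.
# Only scan systems you are authorized to assess.
import socket

def open_ports(host, ports, timeout=0.5):
    found = []
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                found.append(port)  # connection succeeded: port is open
        except OSError:
            pass  # closed, filtered, or unreachable
    return found

print(open_ports("127.0.0.1", [22, 80, 443, 8080]))
```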

Cryptography

One of the primary reasons that intruders can be successful is that most of the information they acquire from a system is in a form that they can read and comprehend. When you consider the millions of electronic messages that traverse the Internet each day, it is easy to see how a well-placed network sniffer might capture a wealth of information that users would not like to have disclosed to unintended readers. Intruders may reveal the information to others, modify it to misrepresent an individual or organization, or use it to launch an attack. One solution to this problem is, through the use of cryptography, to prevent intruders from being able to use the information that they capture.

Encryption is the process of translating information from its original form (called plaintext) into an encoded, incomprehensible form (called ciphertext). Decryption refers to the process of taking ciphertext and translating it back into plaintext. Any type of data may be encrypted, including digitized images and sounds.
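
The short example below demonstrates the round trip from plaintext to ciphertext and back using symmetric encryption. It assumes the third-party Python `cryptography` package is installed; any other cipher implementation would illustrate the same point.

```python
# Sketch of symmetric encryption and decryption using the third-party
# "cryptography" package (pip install cryptography) - an assumed dependency.
from cryptography.fernet import Fernet

key = Fernet.generate_key()      # secret key shared by sender and receiver
cipher = Fernet(key)

plaintext = b"Quarterly results: confidential"
ciphertext = cipher.encrypt(plaintext)   # incomprehensible without the key
recovered = cipher.decrypt(ciphertext)   # back to the original plaintext

assert recovered == plaintext
print(ciphertext[:40])
```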

Cryptography secures information by protecting its confidentiality. It can also be used to protect the integrity and authenticity of data. For example, checksums are often used to verify the integrity of a block of information. A checksum, which is a number calculated from the contents of a file, can be used to determine whether the contents are correct. An intruder, however, may be able to forge a matching checksum after modifying the block of information; unless the checksum is protected, such modification might not be detected. Cryptographic checksums (also called message digests) help prevent undetected modification by computing or protecting the checksum cryptographically, so that an intruder cannot produce a matching value for modified data without the appropriate key.
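
The difference between a plain checksum and a keyed cryptographic checksum can be seen in the sketch below: anyone can recompute the plain digest for modified data, but without the secret key an intruder cannot produce a matching keyed digest. The key and message are invented examples.

```python
# Plain digest versus keyed cryptographic checksum (HMAC). An intruder who
# modifies the data can recompute a plain digest, but cannot forge the HMAC
# without knowing the secret key.
import hashlib, hmac

data = b"Transfer 100.00 to account 12345"
key = b"secret-integrity-key"   # hypothetical shared secret

plain_digest = hashlib.sha256(data).hexdigest()                  # forgeable
keyed_digest = hmac.new(key, data, hashlib.sha256).hexdigest()   # needs the key

tampered = b"Transfer 900.00 to account 99999"
print(hmac.compare_digest(
    keyed_digest,
    hmac.new(key, tampered, hashlib.sha256).hexdigest()))  # False: change detected
```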

The authenticity of data can be protected in a similar way. For example, to transmit information to a colleague by E-mail, the sender first encrypts the information to protect its confidentiality and then attaches an encrypted digital signature to the message. When the colleague receives the message, he or she checks the origin of the message by using a key to verify the sender's digital signature and decrypts the information using the corresponding decryption key. To protect against the chance of intruders modifying or forging the information in transit, digital signatures are formed by encrypting a combination of a checksum of the information and the author's unique private key. A side effect of such authentication is the concept of non-repudiation. A person who places their cryptographic digital signature on an electronic document cannot later claim that they did not sign it, since in theory they are the only one who could have created the correct signature.
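
A brief sketch of signing and verification with a public-key algorithm is given below, again assuming the third-party Python `cryptography` package. The private key, held only by the author, produces the signature; anyone holding the public key can verify it, and any change to the message causes verification to fail.

```python
# Sketch of a digital signature with an Ed25519 key pair, using the
# third-party "cryptography" package (an assumed dependency).
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

private_key = Ed25519PrivateKey.generate()   # kept secret by the author
public_key = private_key.public_key()        # shared with recipients

message = b"I approve the attached contract."
signature = private_key.sign(message)        # only the key holder can produce this

try:
    public_key.verify(signature, message)           # succeeds: authentic, unmodified
    public_key.verify(signature, message + b"!")    # raises: content was altered
except InvalidSignature:
    print("Signature check failed: message altered or not signed by this key")
```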

Current laws in several countries, including the United States, restrict cryptographic technology from export or import across national borders. In the era of the Internet, it is particularly important to be aware of all applicable local and foreign regulations governing the use of cryptography.

Other measures that can be taken to maintain good security on the Internet include the following:

  • Choose strong, unique passwords and keep them secure.
  • Enable multi-step verification for accounts used over the Internet.
  • Keep your software up to date.
  • Be wary of suspicious emails and alerts.
  • Regularly scan your computers, networks and applications with up-to-date anti-virus programs where possible.
  • Watch out for scams and guard against identity theft.
  • Most importantly, carry out all activities over the Internet on a secured network.


Web Information Retrieval, Search Engines and Web Crawlers

The role that accurate, factual information plays in the current age cannot be overemphasized. Proper and accurate information is central to every decision-making process, and it is no exaggeration to say that the survival of virtually every field depends on information. 

However, with the vast amount of data readily available from many sources, a large share of it hosted on the Internet, there is a need to retrieve only the information that is actually required from everything that is available.

Information retrieval is thus the activity of obtaining information resources relevant (or satisfactory) to an information need from a collection of information resources. 

A vast collection of electronic information is now available on the Internet, published on different websites and pages, covering large geographical areas, and offering ease of access, use and consistency. The World Wide Web (WWW), as a global distributed information repository, has become the largest data source in today's world. 

Web Information Retrieval is therefore a technology for helping users to accurately, quickly and easily find information on the web. An information retrieval process begins when a user enters a query into the system. Queries are formal statements of information needs, for example search strings in web search engines. With the proliferation of huge amounts of (heterogeneous) data on the Web, the importance of information retrieval (IR) has grown considerably over the last few years. 

The Internet has over 90 million domains and over 70 million personal blogs, which are viewed by over 1 billion people around the world. Ironically, the very size of this collection has become an obstacle to easy information retrieval. As the Internet constantly expands, the amount of available online information expands as well, and the user has to sift through scores of pages to come upon the information he or she desires. In this vast space of information it is important to create order, even if not on a global scale. This may be done either by building a classification catalogue or by using a search engine; both require a web-crawling tool to ease the burden of manual data processing. The question of how to efficiently find, gather and retrieve this information has led to the research and development of systems and tools that attempt to provide a solution to this problem. 

The most common tools used on the Internet for information retrieval are search engines. A search engine is an information retrieval system designed to help find information stored on a computer system. The most visible form is the web search engine, which searches for information on the World Wide Web. Search engines provide an interface to a group of items that enables the user to specify criteria about an item of interest and have the engine find the matching items. Search engines, however, are only able to do their job with the aid of web crawlers; web crawlers are the heart of a search engine.

A web crawler is a program or automated script which browses the World Wide Web in a methodical and automated manner thus enabling it to form an important component of web search engines. They are used to collect the corpus of web pages indexed by the search engine. Moreover, they are used in many other applications that process large numbers of web pages, such as web data mining and comparison shopping engines.

Web crawlers start with a list of URLs to visit, called the seeds. As the crawler visits these URLs, it identifies all the hyperlinks in the page and adds them to the list of URLs to visit, called the crawl frontier.  In order to crawl a substantial fraction of the “surface web” in a reasonable amount of time, web crawlers must download thousands of pages per second, and are typically distributed over tens or hundreds of computers. Their two main data structures – the “frontier” set of yet-to-be-crawled URLs and the set of discovered URLs – typically do not fit into main memory, so efficient disk-based representations need to be used. Finally, the need to be “polite” to content providers and not to overload any particular web server, and a desire to prioritize the crawl towards high-quality pages and to maintain corpus freshness impose additional engineering challenges.  
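
A minimal, single-threaded sketch of this crawl loop is shown below, using only the Python standard library: it starts from seed URLs, maintains a frontier and a set of discovered URLs, fetches pages and extracts their links. A production crawler would add robots.txt handling, politeness delays, parallel fetching and disk-backed data structures.

```python
# Minimal single-threaded sketch of the crawl loop: seeds, a frontier of
# URLs to visit, and a set of discovered URLs.
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkExtractor(HTMLParser):
    """Collect absolute hyperlink targets from a page."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url, self.links = base_url, []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))

def crawl(seeds, max_pages=20):
    frontier, discovered = list(seeds), set(seeds)
    while frontier and len(discovered) <= max_pages:
        url = frontier.pop(0)                       # next URL from the frontier
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", errors="replace")
        except OSError:
            continue                                # unreachable page: skip it
        parser = LinkExtractor(url)
        parser.feed(html)
        for link in parser.links:                   # add new links to the frontier
            if link not in discovered:
                discovered.add(link)
                frontier.append(link)
    return discovered

# Usage (hypothetical seed): crawl(["https://example.com/"])
```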

A crawler-based search engine performs two basic functions. First, it compiles an ongoing index of web addresses (URLs): it retrieves and marks a document, analyses the content of both its title and its full text, registers the relevant links it contains, and then stores this information in its database. When a user submits a query in the form of one or more keywords, the engine compares it with the information in its index and reports back any matches. Its second function is to search the Internet in real time for the sites that match a given query; it does this in much the same way as it performs its first function, following links from one page to another.
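
The indexing half of this process can be pictured as building an inverted index that maps each keyword to the set of URLs containing it, and answering a query by intersecting those sets. The sketch below uses an invented two-page corpus purely for illustration.

```python
# Tiny sketch of the indexing step: map each keyword to the set of URLs
# whose text contains it, then answer a query by intersecting those sets.
from collections import defaultdict

def build_index(pages):
    """pages: dict mapping URL -> page text (title and full text)."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    sets = [index.get(word, set()) for word in query.lower().split()]
    return set.intersection(*sets) if sets else set()

# Invented example corpus for illustration.
pages = {
    "http://example.com/a": "internet security and firewalls",
    "http://example.com/b": "web crawlers power internet search engines",
}
index = build_index(pages)
print(search(index, "internet search"))   # {'http://example.com/b'}
```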

Big players in the computer industry, such as Google, Microsoft and Yahoo!, are the primary contributors of technology for fast access to Web-based information; and searching capabilities are now integrated into most information systems, ranging from business management software and customer relationship systems to social networks and mobile phone applications.


The first search engine, Archie, was created in 1989 by Alan Emtage. Archie helped solve the data-scatter problem by combining a script-based data gatherer with a regular expression matcher for retrieving file names matching a user query. Essentially, Archie became a database of web file names which it would match against users' queries. Just as Archie started to gain ground and popularity, Veronica was developed by the University of Nevada System Computing Services. Veronica served the same purpose as Archie, but it worked on plain text files. Soon another user interface named Jughead, a tool for obtaining menu information from various Gopher servers, appeared with the same purpose as Veronica; both of these were used for files served via Gopher, which was created as an alternative to Archie. 

In 1993, Matthew Gray created what is considered the first robot, called the World Wide Web Wanderer. It was initially used for counting web servers to measure the size of the Web, and it ran monthly from 1993 to 1995. Later, it was used to obtain URLs, forming the first database of websites, called Wandex. Also in 1993, Martijn Koster created ALIWEB (Archie-Like Indexing of the Web), a search engine based on automated collection of meta-data for the Web, which allowed users to submit their own pages to be indexed.

Brian Pinkerton of the University of Washington released WebCrawler on April 20, 1994, initially as a desktop application rather than as the web service it is used as today. It later went live on the web with a database containing documents from over 6,000 web servers. It was the first crawler to index entire pages, while other bots were storing only a URL, a title and at most 100 words; because the entire text of each page was indexed, it was the first full-text search engine on the Internet. Soon it became so popular that during daytime hours it could not be used, as WebCrawler was averaging 15,000 hits a day. WebCrawler opened the door for many other services to follow suit. After the debut of WebCrawler came Lycos, Infoseek, and OpenText.

AltaVista also began in 1995. It was the first search engine to allow natural-language inquiries and advanced searching techniques, and it also provided a multimedia search for photos, music, and videos. Inktomi started in 1996, and in June 1999 it introduced a directory search engine powered by "concept induction" technology. "Concept induction," according to the company, "takes the experience of human analysis and applies the same habits to a computerized analysis of links, usage, and other patterns to determine which sites are most popular and the most productive." AskJeeves and Northern Light were both launched in 1997. Google was launched in 1997 by Sergey Brin and Larry Page as part of a research project at Stanford University; it uses inbound links to rank sites. In 1998 MSN Search and the Open Directory were also started. 

Google came about as a result of its founders' attempt to organize the web into something searchable. Their early prototype was based upon a few basic principles, including:

  • The best pages tend to be the ones that people linked to the most.
  • The best description of a page is often derived from the anchor text associated with the links to a page.

Both of these principles were observed as structural features of the World Wide Web, and theories were developed to exploit these principles to optimize the task of retrieving the best documents for a user query.
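
The first principle is often illustrated with a PageRank-style computation, in which each page repeatedly distributes its score to the pages it links to. The simplified sketch below, over an invented four-page link graph, is only an illustration of the idea and not Google's actual algorithm.

```python
# Simplified PageRank-style iteration over an invented link graph,
# illustrating the idea that pages linked to by many (important) pages
# should rank higher.
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping page -> list of pages it links to."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1 - damping) / len(pages) for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)  # split score over outlinks
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank

links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
for page, score in sorted(pagerank(links).items(), key=lambda x: -x[1]):
    print(page, round(score, 3))   # C ranks highest: most pages link to it
```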