How to Proxyfy Apache

INTRODUCTION

There are a variety of ways to implement proxying for web servers. As Apache is the most popular web server, we will implement proxying on it. Anyone who knows Apache well probably knows that it implements proxying for AJP13, FTP, CONNECT, and HTTP/1.x.

The choice of reverse proxy mechanism depends entirely on what you are trying to hide behind it. Each mechanism has its own benefits and bottlenecks. For Apache alone, there are several ways to front application servers (mod_proxy, mod_passenger, mod_wsgi, mod_jk). While mod_passenger and mod_wsgi are good for Ruby and Python servers respectively, they fall a little outside the proxying idea. In this article I would like to discuss mod_proxy and mod_jk.

HOW TO

Now let’s think about what we have and what we want to put behind the proxy. The most common case is to put a pool of Tomcat servers behind Apache. By default, Tomcat servers listen on port 8080 for HTTP and on port 8009 for AJP. We want Apache to listen on port 80 for incoming HTTP requests and on port 443 for HTTPS. People who have configured Tomcat for SSL will undoubtedly agree with me that SSL on Tomcat is quite annoying, so it’s better to implement SSL on the Apache side rather than playing with Tomcat’s keystores.

 

Okay, now we have two Tomcat servers on two different hosts with our application installed, each listening on 8080 for HTTP and on 8009 for AJP. A third host runs Apache, which serves HTTP on 80 and HTTPS on 443 for us and passes requests to the downstream Tomcat servers.

Situation 1 with mod_proxy and mod_proxy_http:

 

OK, here’s what this means:

 

User opens http://www.yourdomain.com in their browser

  1. Request comes to Apache
  2. Apache proxies it via HTTP to downstream Tomcat to port 8080
  3. Tomcat sends response to Apache via HTTP
  4. Apache delivers content to User’s browser

Well, so what are the pros and cons of this situation? We will provide some comparison tables below, but in general:

Pros:

  1. Easy and quick to configure
  2. Works for all downstream application servers

Cons:

  1. We do not have sticky sessions: if a user logs in to Tomcat1 and sends another request, it will most likely go to Tomcat2 and the user will get a session expired error.
  2. mod_proxy by itself does not detect failed nodes (mod_proxy_balancer has to be present for that), so it will continue to send requests to a downstream Tomcat even if it is down.
  3. Some Java applications exhibit unpredictable behavior when they run behind a proxy. (In my experience, the progress bars of Atlassian Bamboo and Fisheye servers stalled on several pages, but this was corrected by moving to JK; I have heard about other strange problems as well.)
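
The first drawback can be illustrated with a toy model. The following Python sketch is not Apache code; it only assumes plain round-robin selection and sessions kept in the memory of the node that created them. The backend names and the session key are hypothetical.

```python
from itertools import cycle

# Hypothetical backend names; the article only calls them Tomcat1 and Tomcat2.
BACKENDS = ["tomcat1", "tomcat2"]


def round_robin_targets(backends, request_count):
    """Return the backend chosen for each request under plain round robin."""
    chooser = cycle(backends)
    return [next(chooser) for _ in range(request_count)]


# Each node keeps its sessions only in its own memory.
sessions = {name: set() for name in BACKENDS}

targets = round_robin_targets(BACKENDS, 2)

# Request 1: the user logs in and the chosen node stores the session.
sessions[targets[0]].add("user-42")

# Request 2: the next node in the rotation has never seen this session.
session_found = "user-42" in sessions[targets[1]]

print(targets)        # ['tomcat1', 'tomcat2']
print(session_found)  # False
```

The second request lands on a node that does not know the session, which is exactly the "session expired" situation described above.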

Now let’s see Situation 2, where we use JK for downstream servers:

A REAL LIFE EXAMPLE

At first sight nothing has changed, but only at first sight. The main difference is that Apache now talks to the Tomcats over AJP 13 rather than HTTP. So the process of opening the web site is the following:

  1. User opens http://www.yourdomain.com in their browser
  2. Request comes to Apache
  3. Apache proxies it via AJP 13 to downstream Tomcat on port 8009
  4. Tomcat sends response to Apache via AJP
  5. Apache receives the AJP response and delivers content to the user’s browser via HTTP

It seems there is a little overhead with jumping around on HTTP and AJP, but there are benefits as well. Let’s see the Good and Bad sides of JK balancing:

Pros:

  1. After a little tweaking we can have sticky sessions just by adding sticky_session=True on the Apache side and jvmRoute="NODENAME" on the Tomcat side. After this, users who are logged in to Tomcat1 will never be moved to Tomcat2 as long as Tomcat1 is alive. (You can also use Membase or Memcached as a session store, so users will never lose their session until it expires normally.)
  2. We have node failure detection, so if Tomcat1 fails, Apache will not send requests to it until it detects that it is back.
  3. JK configuration is much more advanced than that of mod_proxy and allows lots of tweaking, which will result in better performance and make the environment work just as you need it to.
  4. JK has a web admin tool that allows you to decommission, suspend and play with the LB factor in real time.

Cons:

  1. So far I have found only one bad thing: it is a little harder to configure, so it requires some administrator skills.
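
The sticky-session and failure-detection behavior listed in the pros above can be modeled in a few lines. This is a simplified Python sketch, not mod_jk's actual implementation. It relies on the fact that Tomcat appends the jvmRoute value to the session ID after a dot (for example A1B2C3.tomcat1); the worker names, the session IDs, and the set of live workers are hypothetical.

```python
# Hypothetical worker names, matching the jvmRoute configured on each Tomcat.
WORKERS = ["tomcat1", "tomcat2"]


def route_from_session_id(session_id):
    """Return the jvmRoute suffix of a session ID, or None if there is none."""
    _, dot, route = session_id.rpartition(".")
    return route if dot and route else None


def pick_worker(session_id, workers, alive, fallback_index=0):
    """Prefer the worker named in the session ID; otherwise use a live one."""
    route = route_from_session_id(session_id) if session_id else None
    if route in workers and route in alive:
        return route  # sticky: keep the user on the node that owns the session
    live = [w for w in workers if w in alive]
    if not live:
        raise RuntimeError("no live workers")
    return live[fallback_index % len(live)]  # failover or first-time balancing


both_up = {"tomcat1", "tomcat2"}
print(pick_worker("A1B2C3.tomcat1", WORKERS, both_up))      # tomcat1
print(pick_worker("A1B2C3.tomcat1", WORKERS, {"tomcat2"}))  # tomcat2
print(pick_worker(None, WORKERS, both_up))                  # tomcat1
```

The second call shows the failover case: the session names tomcat1, but since that node is not in the live set the request goes to tomcat2, where the session is lost unless it is also kept in a shared store.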

At this moment you may be asking “Why do I need this? I have a single Tomcat server and it’s working fine”.  As a matter of fact, you need to build a network which can handle your current load, be scalable and which will not affect the normal behavior of your websites. From this point of view, the choice of reverse proxy solution is quite reasonable.

Here is a real life example of one of our client server architectures, which I think is a good one :)

 

In general, the process is as follows:

  1. User makes a DNS request and gets the IP address of one of the Varnish servers and of the static content server(s) (NGINX).
  2. NGINX delivers static content directly.
  3. Varnish caches whatever needs to be cached and sends the request downstream to one of the Apaches.
  4. Apache reads the JSESSIONID and forwards the request via JK to the required Tomcat server, or balances the request if the user does not have the cookie.
  5. Tomcat servers keep sessions in local RAM and copy them to a Membase cluster (so even if one Tomcat fails, another can retrieve its sessions from Membase). Membase is clustered memcache, so it is fault tolerant by nature (we will have a closer look at Membase in another article).
  6. Tomcat runs the required application logic (retrieves information from the Hadoop/HBase database, etc.) and responds to Apache.
  7. Apache sends the response back to Varnish.
  8. Varnish updates its cache if needed and delivers the response to the client.
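
Step 5 of the list above is the part that makes a Tomcat failure survivable, and it can be sketched as follows. This Python model is only an illustration under stated assumptions: plain dictionaries stand in for local RAM and for the Membase cluster, and the class and method names are hypothetical rather than any real Tomcat or Membase API.

```python
class TomcatNode:
    """Toy model of a node with a local session cache and a shared store."""

    def __init__(self, name, shared_store):
        self.name = name
        self.local = {}             # stands in for sessions held in local RAM
        self.shared = shared_store  # stands in for the Membase cluster

    def save(self, session_id, data):
        """Keep the session locally and copy it to the shared store."""
        self.local[session_id] = data
        self.shared[session_id] = dict(data)

    def load(self, session_id):
        """Read from local RAM first, then fall back to the shared store."""
        if session_id in self.local:
            return self.local[session_id]
        data = self.shared.get(session_id)
        if data is not None:
            self.local[session_id] = dict(data)  # warm the local cache
        return data


shared = {}
tomcat1 = TomcatNode("tomcat1", shared)
tomcat2 = TomcatNode("tomcat2", shared)

tomcat1.save("s1", {"user": "alice"})

# Suppose tomcat1 fails and the next request is balanced to tomcat2.
recovered = tomcat2.load("s1")
print(recovered)  # {'user': 'alice'}
```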

This is a real, live, working scenario, and it has proved itself to be fault tolerant and extremely fast.

I know that after reading this article a lot of people will ask, “why is Apache needed when Varnish can do session stickiness, etc. …”

But the idea here is to use the best possible software for each particular role: software with real, proven redundancy, arranged in reasonable architectural layers that help us detect problems quickly and fix them as they appear. Also, keeping in mind that the client uses not only HTTP but also HTTPS, I have not seen any web server that works with SSL as smoothly as Apache does. Even if we do not have SSL initially, we will have it soon, and I do not believe that any web project can go far without SSL.

Following is a little comparison of JK and mod_proxy, so you can see more closely what these tools are.

 

| Features | mod_proxy | Weight | mod_jk | Weight |
|---|---|---|---|---|
| Load balancing | Basic | 5 | Advanced | 10 |
| Node failure detection | mod_proxy_balancer has to be present in the server | 7 | Advanced | 10 |
| Backend SSL | Supported (mod_ssl required) | 5 | Not supported | 0 |
| Session stickiness | Not supported | 0 | Supported via JVM Route | 10 |
| Protocols | HTTP, HTTPS | 10 | AJP 13 | 8 |
| Node decommissioning | Manual, needs Apache reload | 3 | Online via web admin | 10 |
| Web admin interface | Not present | 0 | Advanced with RO and RW support | 10 |
| Large AJP packet sizes | 8K | 5 | Larger than 8K | 10 |
| Compatibility with other app. servers | Works with all HTTP application servers | 10 | AJP compatible (Tomcat, Glassfish, etc.) | 5 |
| Configuration | Compatible with Apache Httpd configuration file | 10 | Needs separate JK workers file in .properties format | 8 |
| Summary | | 55 | | 81 |
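
The summary row is simply the sum of the weights in each column. As a small check, here is the same arithmetic in Python; the weights are copied from the table, while the variable names are only illustrative.

```python
# (feature, mod_proxy weight, mod_jk weight), copied from the comparison table.
SCORES = [
    ("Load balancing", 5, 10),
    ("Node failure detection", 7, 10),
    ("Backend SSL", 5, 0),
    ("Session stickiness", 0, 10),
    ("Protocols", 10, 8),
    ("Node decommissioning", 3, 10),
    ("Web admin interface", 0, 10),
    ("Large AJP packet sizes", 5, 10),
    ("Compatibility with other app. servers", 10, 5),
    ("Configuration", 10, 8),
]

proxy_total = sum(proxy for _, proxy, _ in SCORES)
jk_total = sum(jk for _, _, jk in SCORES)

print(proxy_total, jk_total)  # 55 81
```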

 

So now let’s run some stress tests on both mod_jk and mod_proxy. The installation schema is as described above (one load balancer, two application servers). On both Apache server hosts, monitoring software from Monitis.com is installed, which checks the servers’ health in real time.

We have used Amazon EC2 medium instances for this test. Here are the load test results in both graphical and plain text mode.

Monitoring is implemented using Monitis M3 monitors.

There are 2 monitors used:

apache_monitor: used for the Apache server’s health check.

http_load monitor: used to check the load time difference during Apache benchmarking.

 

The mentioned monitors provide useful information which helps to find relationships between various metrics.

mod_proxy:

The graph below shows the busy (upper line) and idle (lower line) Apache workers while benchmarking with the mod_proxy balancer.

This graph shows the busy and idle worker processes on the Apache web server: of 150 enabled processes, almost all are busy during the stress test.

 

HTTP content load time (time connect, time transfer, time total)

Following is the data reported by siege after repeated benchmark runs using mod_proxy, each run increasing the number of concurrent users by 100:

 

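| Concurrent conns. | Transactions | Elapsed time (s) | Data transferred (MB) | Response time (s) | Transaction rate (trans/s) | Throughput (MB/s) | Concurrency | Failed |
|---|---|---|---|---|---|---|---|---|
| 100 | 112173 | 359.18 | 206 | 0.32 | 312.30 | 0.57 | 99.93 | 0 |
| 200 | 181578 | 360.01 | 333 | 0.40 | 504.37 | 0.92 | 199.72 | 3 |
| 300 | 179025 | 360.00 | 329 | 0.60 | 497.29 | 0.91 | 299.37 | 5 |
| 400 | 177681 | 360.00 | 326 | 0.81 | 493.56 | 0.91 | 397.44 | 40 |
| 500 | 166401 | 359.99 | 305 | 1.07 | 462.24 | 0.85 | 494.52 | 130 |
| 600 | 160853 | 359.99 | 295 | 1.31 | 446.83 | 0.82 | 584.32 | 444 |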
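
To make the columns easier to read, the sketch below recomputes the transaction rate from the first two measured columns (transactions divided by elapsed time). The figures are copied from the mod_proxy table above; the tuple layout and names are only illustrative, and elapsed time is assumed to be in seconds, as siege normally reports it.

```python
# (concurrent conns., transactions, elapsed seconds, reported trans. rate)
MOD_PROXY_RUNS = [
    (100, 112173, 359.18, 312.30),
    (200, 181578, 360.01, 504.37),
    (300, 179025, 360.00, 497.29),
    (400, 177681, 360.00, 493.56),
    (500, 166401, 359.99, 462.24),
    (600, 160853, 359.99, 446.83),
]


def transaction_rate(transactions, elapsed_seconds):
    """Completed transactions per second over the whole run."""
    return transactions / elapsed_seconds


for conns, transactions, elapsed, reported in MOD_PROXY_RUNS:
    computed = transaction_rate(transactions, elapsed)
    print(f"{conns:>3} conns: computed {computed:7.2f}, reported {reported:7.2f}")
```

The computed values agree with the reported column to within rounding, and they make the trend visible: the rate peaks at 200 connections and declines as concurrency grows.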

 

mod_jk:

The graph below shows the busy (upper line) and idle (lower line) Apache workers while benchmarking with mod_jk.

This graph shows the busy and idle worker processes on the Apache web server: of 150 enabled processes, almost all are busy during the stress test.

HTTP content load time (time connect, time transfer, time total)

Following is the data reported by siege after repeated benchmark runs using mod_jk, each run increasing the number of concurrent users by 100:

 

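| Concurrent conns. | Transactions | Elapsed time (s) | Data transferred (MB) | Response time (s) | Transaction rate (trans/s) | Throughput (MB/s) | Concurrency | Failed |
|---|---|---|---|---|---|---|---|---|
| 100 | 106919 | 359.60 | 198 | 0.34 | 297.33 | 0.55 | 99.93 | 0 |
| 200 | 186123 | 360.01 | 345 | 0.39 | 516.99 | 0.96 | 199.76 | 0 |
| 300 | 183017 | 360.00 | 339 | 0.59 | 508.38 | 0.94 | 299.29 | 8 |
| 400 | 179891 | 360.00 | 333 | 0.80 | 499.70 | 0.93 | 397.34 | 49 |
| 500 | 169284 | 359.99 | 313 | 1.05 | 470.25 | 0.87 | 494.55 | 124 |
| 600 | 182954 | 359.99 | 339 | 1.16 | 508.22 | 0.94 | 590.32 | 258 |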

 

 

CONCLUSION

Both modules, mod_proxy and mod_jk, are used as balancers for backend application servers such as Tomcat and GlassFish. What are the most important features in load balancing? I put node failure detection first, followed by session persistence and ease of load balancing configuration without requiring any extra tools or packages. Do not forget about performance, either.

So what do we have? The comparison table shows that when advanced load balancing or node failure detection is needed, mod_jk is preferable. However, it cannot match mod_proxy in ease of configuration (mod_proxy is configured like the rest of Apache, with no need for separate files such as workers.properties) or in compatibility with application servers that do not speak AJP.

Now a little bit about performance. While the number of concurrent users is not too high (in our case, up to 400), both setups behave similarly, and mod_proxy even seems to perform slightly better, but things change as the number of concurrent users grows.

Take a look at this table:

 

| Module | Concurrent users | Failed requests (10-second timeout) |
|---|---|---|
| mod_jk | 590.32 | 258 |
| mod_proxy | 584.32 | 444 |

As you can see, with an almost equal number of concurrent connections, mod_proxy produced about 72% more failed requests than mod_jk (444 versus 258).
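
A quick way to compare the two modules at the 600-connection step is to work out the relative difference in failed requests and the failures per completed transaction. The numbers below are copied from the siege tables above; the function and variable names are only illustrative.

```python
def percent_more(larger, smaller):
    """How many percent `larger` exceeds `smaller` by."""
    return (larger - smaller) / smaller * 100


# 600-connection runs, taken from the siege result tables.
proxy_failed, proxy_transactions = 444, 160853
jk_failed, jk_transactions = 258, 182954

more_failures = percent_more(proxy_failed, jk_failed)

# Failed requests per 100 completed transactions.
proxy_failure_pct = proxy_failed / proxy_transactions * 100
jk_failure_pct = jk_failed / jk_transactions * 100

print(f"mod_proxy had {more_failures:.1f}% more failed requests")  # 72.1%
print(f"mod_proxy: {proxy_failure_pct:.3f} failures per 100 transactions")
print(f"mod_jk:    {jk_failure_pct:.3f} failures per 100 transactions")
```

Because mod_jk also completed more transactions in the same time, the gap per transaction is wider than the gap in raw failure counts.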

If you have a small project, or need to hide a variety of application servers (Tomcat + Rails + Django), and if you need an easily configurable and fast SSL solution and your server load is not heavy, then use mod_proxy.

But if your goal is to load balance Java application servers, then JK is definitely the better solution.


More Stories By Hovhannes Avoyan

Hovhannes Avoyan is the CEO of Monitis, Inc., a provider of on-demand systems management and monitoring software to 50,000 users spanning small businesses and Fortune 500 companies.

Prior to Monitis, he served as General Manager and Director of Development at prominent web portal Lycos Europe, where he grew the Lycos Armenia group from 30 people to over 200, making it the company's largest development center. Prior to Lycos, Avoyan was VP of Technology at Brience, Inc. (based in San Francisco and acquired by Syniverse), which delivered mobile internet content solutions to companies like Cisco, Ingram Micro, Washington Mutual, Wyndham Hotels, T-Mobile, and CNN. Prior to that, he served as the founder and CEO of CEDIT ltd., which was acquired by Brience. A 24-year veteran of the software industry, he also runs Sourcio cjsc, an IT consulting company and startup incubator specializing in web 2.0 products and open-source technologies.

Hovhannes is a senior lecturer at the American University of Armenia and has been a visiting lecturer at San Francisco State University. He is a graduate of Bertelsmann University.
