Great tool but needs customization to look cloud-ready
March 16, 2021

Anonymous | TrustRadius Reviewer
Score 8 out of 10
Vetted Review
Verified User

Overall Satisfaction with New Relic

We use it to monitor our test and production systems. We try to detect outages, slowdowns and pinpoint where the problems are occurring. We also use it to compare A vs B on proposed changes before we go to production.
  • Nice graphs.
  • Interactive graphs.
  • Ability to modify queries.
  • Query builder is pretty good for NRQL.
  • Sometimes we can't drill down deep enough on errors or traces.
  • Communication paths between services in the stack are not obvious, though service maps are helpful at times.
  • I often feel like it's not really built for cloud monitoring and microservices.
  • There are additional plugins that our IT dept can't seem to get working with our product, like Kubernetes, PostgreSQL.
  • Overall we use it successfully.
  • The cost frequently stimulates a conversation on what other tools we should be looking at.
  • I personally feel Apdex is overrated, and/or our company insists on setting the thresholds too tight.
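To illustrate the Apdex point above, here is a minimal sketch of the standard Apdex calculation (satisfied requests count fully, tolerating requests count half). The sample response times and the two thresholds are made up for illustration and are not taken from our systems.

```python
def apdex(response_times, t):
    """Standard Apdex score: (satisfied + tolerating / 2) / total samples.

    satisfied:  response time <= t
    tolerating: t < response time <= 4 * t
    frustrated: everything slower (contributes 0)
    """
    if not response_times:
        raise ValueError("need at least one sample")
    satisfied = sum(1 for rt in response_times if rt <= t)
    tolerating = sum(1 for rt in response_times if t < rt <= 4 * t)
    return (satisfied + tolerating / 2) / len(response_times)


# Hypothetical response times in seconds.
samples = [0.2, 0.4, 0.6, 0.9, 1.5, 3.0]

loose = apdex(samples, t=0.5)   # a more forgiving threshold
tight = apdex(samples, t=0.25)  # the same traffic scored against a tighter threshold

print(f"T=0.5s  -> {loose:.2f}")
print(f"T=0.25s -> {tight:.2f}")
```

The same traffic scores noticeably lower when T is halved, which is why a threshold set too tight can make a healthy service look bad.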
I don't feel like I get a good view of the full stack. The "what is calling what?" or "what is this service waiting for?" is very hard to see, and often requires troubleshooting with one of the architects who knows the answer rather than relying on what NR is telling us. In the past I've used Dynatrace for Java enterprise systems and it was perfect at letting me trace a single call through my system.

I see options in NR for tracing, but they usually don't show us anything at all.
Absolutely, this is how we use it. We have had excellent success with NR and our prod rollouts/releases are predictable. We are seldom surprised by issues in production. We are down to issues that just are not covered well in test, or scenarios we can't simulate well with testing.

Another thing I find missing is something like what I get from other dashboards, such as the Kubernetes one. For example, where is my alert that I have pods restarting or frequently crashing? If my Apdex is averaging OK, I don't really see this. I'd put things like that, and pods hitting their limits, on the first page of NR. This is my big con: make this look like it was made for the cloud. I'm finding I CAN get much of this, but I'm writing my own queries and creating my own dashboards. Why do I have to do this?
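As a rough sketch of the kind of out-of-the-box check I mean, the snippet below flags pods that restart too often or sit in CrashLoopBackOff. The `PodStatus` type, the pod names, and the restart threshold are all hypothetical; in practice the data would come from whatever Kubernetes metrics the monitoring tool already collects.

```python
from dataclasses import dataclass


@dataclass
class PodStatus:
    """Hypothetical summary of one pod, for illustration only."""
    name: str
    restarts: int
    waiting_reason: str = ""  # e.g. "CrashLoopBackOff" when a container keeps crashing


def unhealthy_pods(pods, max_restarts=3):
    """Return the names of pods that restarted too often or are crash-looping."""
    return [
        p.name
        for p in pods
        if p.restarts > max_restarts or p.waiting_reason == "CrashLoopBackOff"
    ]


# Made-up sample data.
pods = [
    PodStatus("checkout-7d9f", restarts=0),
    PodStatus("search-5c2a", restarts=12),
    PodStatus("cart-1b4e", restarts=2, waiting_reason="CrashLoopBackOff"),
]

flagged = unhealthy_pods(pods)
print(flagged)
```

A check like this is independent of Apdex, so it would still fire when the response-time averages look fine.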
Of course they have. I'm not part of that team, but we share info back and forth and we often involve them. There are times, though, when the problem is not showing up in NR and is something we see in our Splunk log collections. Why isn't this part of NR? I think I saw something about this in NR One, but our DevOps team and developers don't have it working yet.
My background before using NR was with Java enterprise systems. Ops has recently mentioned CloudWatch as a tool to consider.
It's a good tool but it requires a lot of customization. Some alerts we have to create I thought should be more obvious or available out of the box. I also feel I don't get deep insight into my Pod > Containers > services. I see my pod memory, but how is my actual Node service using memory? Are my Java processes garbage collecting well? Another big complaint is the lack of observability into GraphQL requests (which should be a very popular API). I really had to dig to find anything that was telling me how my pod CPU and memory are doing "in relation to their requests and limits." Are my limits too high, too low? Why am I not getting an alert when I'm sitting at 100% of my limit? Instead, the default graph shows my pod is at 60% CPU. That sounds good until you figure out that 60% is the limit and NOTHING is telling me I'm hitting my limit. I probably need to adjust my resources, so why isn't NR making this easy to see? Where is the report for this? The other big concern is COST. I don't want to be told by IT/Ops that we can't add something or monitor something because the tool is so dang expensive.
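To make the limit complaint concrete, here is a small sketch of the calculation I would like to see by default: usage expressed as a fraction of the pod's own limit rather than of the node. The numbers (a 1-core node, a pod limited to 0.6 cores) and the 95% alert threshold are assumptions chosen to mirror the 60% example above, not values from our cluster.

```python
def limit_utilization(usage_cores, limit_cores):
    """Fraction of the configured CPU limit that is currently in use."""
    if limit_cores <= 0:
        raise ValueError("limit must be positive")
    return usage_cores / limit_cores


def at_limit(usage_cores, limit_cores, threshold=0.95):
    """True when usage is at or above the given fraction of the limit."""
    return limit_utilization(usage_cores, limit_cores) >= threshold


# Hypothetical scenario: the pod uses 0.6 cores on a 1-core node,
# and its CPU limit is also 0.6 cores.
node_cores = 1.0
usage = 0.6
limit = 0.6

share_of_node = usage / node_cores                # what a "60% CPU" graph would show
share_of_limit = limit_utilization(usage, limit)  # what actually matters for throttling

print(f"share of node:  {share_of_node:.0%}")
print(f"share of limit: {share_of_limit:.0%}")
print("alert" if at_limit(usage, limit) else "ok")
```

Seen against the node the pod looks comfortable, but seen against its own limit it is fully saturated, which is the view I had to build myself.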