Thursday, September 14, 2017

Introduction to web applications performance

Introduction

The field of web application performance is huge and cannot be covered in a single article, especially when we talk about practical techniques to build fast applications. Thus, I will just scratch the surface of the topic to cover important concepts around performance like performance metrics, the difference between performance and scalability, and perceived performance.

Let's go...

Basic Concepts

Performance metrics

  • Response time: The most widely used performance metric; it is a direct measure of how long it takes to process a request.
  • Throughput: A straightforward count of the number of requests that the application can process within a defined time interval. For web applications, a count of page impressions or requests per second is often used as a measure of throughput.
  • System availability: Usually expressed as a percentage: the application's total running time minus the time users cannot access it, divided by the total running time. This is an indispensable metric, because both response time and throughput are zero when the system is unavailable. Most companies use it to measure uptime for service level agreements (SLAs).
  • Responsiveness: How quickly the system acknowledges a request, as opposed to processing it. This is important in many systems because users may become frustrated if a system has low responsiveness, even if its response time is good. If your system waits during the whole request, then your responsiveness and response time are the same. However, if you indicate that you've received the request before you complete it, then your responsiveness is better. Providing a progress bar during a file copy improves the responsiveness of your user interface, even though it doesn't improve response time.
  • Request rate: Knowing how much traffic your application receives is useful for correlating with other performance metrics and understanding how your application scales. The more requests users send to the application, the higher the load, and as load increases, performance tends to decrease. A similar but slightly different metric to track is the number of concurrent users.
  • Resource consumption: Measuring resources (CPU, memory, and disk space) in correlation with other metrics, like response time or throughput, is very important in resource planning for scalability.
  • Efficiency: Performance divided by resources. A system that gets 30 transactions per second (tps) on two CPUs is more efficient than a system that gets 40 tps on four identical CPUs.
  • Latency: The minimum time required to get any form of response, even if the work to be done is nonexistent. It's usually the big issue in remote systems. If I ask a program to do nothing, but to tell me when it's done doing nothing, then I should get an almost instantaneous response if the program runs on my laptop. However, if the program runs on a remote computer, I may wait a few seconds just because of the time taken for the request and response to make their way across the wire. As an application developer, I can usually do nothing to improve latency. Latency is also the reason why you should minimize remote calls.
  • Server response time: How long it takes to load the necessary HTML to begin rendering the page from your server, subtracting out the network latency between the browser and your server.
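
Several of the metrics above are simple ratios, so a small sketch may make them concrete. The function names and the sample numbers below are assumptions invented for illustration; only the 30 tps versus 40 tps efficiency comparison comes from the list itself.

```python
from statistics import mean


def throughput_rps(request_count: int, interval_seconds: float) -> float:
    """Requests processed per second over a measurement interval."""
    return request_count / interval_seconds


def availability_percent(total_seconds: float, downtime_seconds: float) -> float:
    """Percentage of the measurement window in which users could reach the application."""
    return 100.0 * (total_seconds - downtime_seconds) / total_seconds


def efficiency(transactions_per_second: float, cpu_count: int) -> float:
    """Performance divided by resources: transactions per second delivered per CPU."""
    return transactions_per_second / cpu_count


# Hypothetical measurements, invented for illustration only.
response_times_ms = [120, 95, 310, 150, 101]
average_response_ms = mean(response_times_ms)
requests_per_second = throughput_rps(request_count=18_000, interval_seconds=60)
monthly_availability = availability_percent(
    total_seconds=30 * 24 * 3600,  # a 30-day month
    downtime_seconds=2_592,        # about 43 minutes of downtime
)

# The efficiency example from the list: 30 tps on two CPUs vs 40 tps on four.
small_system = efficiency(30, 2)
large_system = efficiency(40, 4)
print(small_system > large_system)
```

In a real system these inputs would come from access logs or a monitoring tool rather than hard-coded lists.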

What is performance?

Performance is either throughput or response time, whichever matters more to you (in web applications, response time is the more common choice). It can sometimes be difficult to talk about performance when a technique improves throughput but worsens response time, so it's best to use the more precise term. From a user's perspective, responsiveness may be more important than response time, so improving responsiveness at a cost of response time or throughput can still increase performance as users perceive it.

What is scalability?

Application performance will always be affected by resource constraints. Scalability is the ability to overcome performance limits by adding resources. No matter how much hardware we have, at a certain point we will see decreasing performance, which means increasing response times or a limit in throughput. Will adding additional hardware solve the problem? If yes, then we can scale. If not, we have a scalability problem. In other words, scalability is a measure of how adding resources (usually hardware) affects performance. A scalable system is one that allows you to add hardware and get a commensurate (proportional) performance improvement, such as doubling how many servers you have to double your throughput. Vertical scalability, or scaling up, means adding more power to a single server, such as more memory. Horizontal scalability, or scaling out, means adding more servers.
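
One way to check whether an improvement is "commensurate" is to compare the measured throughput after scaling with what perfectly linear scaling would have delivered. This is a minimal sketch; the function name and the sample numbers are assumptions, not measurements.

```python
def scaling_efficiency(
    base_throughput: float,
    base_servers: int,
    new_throughput: float,
    new_servers: int,
) -> float:
    """Measured throughput after scaling, as a fraction of the linear-scaling ideal."""
    expected_throughput = base_throughput * (new_servers / base_servers)
    return new_throughput / expected_throughput


# Hypothetical figures: 500 requests/s on 2 servers, 950 requests/s on 4 servers.
ratio = scaling_efficiency(500, 2, 950, 4)
print(f"{ratio:.0%} of linear scaling")
```

A ratio close to 1 suggests the system scales out well; a ratio far below 1 suggests that something other than the added resource is the bottleneck.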

Will scaling solve performance problems?

Ideally yes, but in practice resource constraints may not be the issue! For example, if resources are not overloaded, adding more of them will not solve performance problems because they are not the bottleneck. If you add resources and do not get a commensurate performance improvement, you have a performance issue.
The rules here:
  1. If adding more resources can solve the performance issue, go for it; developers are expensive and hardware is cheap.
  2. When adding resources becomes expensive, or does not help, improve performance and efficiency instead.


How to define an application performance?

It is common to define application performance with a statement like “Response time is 2 seconds”.
But this is not descriptive enough, especially in a scalability planning context. It is better to say: “System response time is 2 seconds at 500 concurrent requests, with a CPU load of 50% and a memory utilization of 92%”.

Performance Tests

To define performance accurately, we should run some performance tests:

Load Test

Load testing is the simplest form of performance testing. A load test is usually conducted to understand the behaviour of the system under a specific expected load. This load can be the expected number of concurrent users on the application performing a specific number of transactions within a set duration. The test reports the response times of all the important business-critical transactions. The database, application server, etc. are also monitored during the test, which helps identify bottlenecks in the application software and in the hardware it is installed on.
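
The shape of a load test can be sketched in a few lines. This is only an illustration under stated assumptions: `handle_request` is a hypothetical stand-in that sleeps instead of calling a real application, and the user and request counts are arbitrary. Real load tests are normally run with a dedicated tool against a production-like environment.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request(_: int) -> float:
    """Hypothetical stand-in for one business transaction; returns its response time in seconds."""
    start = time.perf_counter()
    time.sleep(0.01)  # simulated work; a real test would call the application here
    return time.perf_counter() - start


def run_load_test(concurrent_users: int, requests_per_user: int) -> dict:
    """Fire requests from a fixed number of concurrent workers and summarize the timings."""
    total_requests = concurrent_users * requests_per_user
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        timings = sorted(pool.map(handle_request, range(total_requests)))
    elapsed = time.perf_counter() - started
    return {
        "requests": total_requests,
        "avg_response_s": sum(timings) / total_requests,
        "p95_response_s": timings[int(0.95 * (total_requests - 1))],
        "throughput_rps": total_requests / elapsed,
    }


result = run_load_test(concurrent_users=5, requests_per_user=4)
print(result)
```

Reporting a high percentile next to the average is a deliberate choice here: an average alone can hide the slow requests that users actually complain about.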

Stress Test

Stress testing is normally used to understand the upper limits of capacity within the system. This kind of test is done to determine the system's robustness in terms of extreme load and helps application administrators to determine if the system will perform sufficiently if the current load goes well above the expected maximum.


Responsiveness as perceived performance

While response time, for example, is important for measuring performance, it's just as important to remember that users are the ultimate judge of performance. As users are not clocks, their perception of time may differ from what we measure empirically.
In judging system responsiveness, we must first know how fast users expect a system to be. Research shows that response time expectations fall into a few main categories, including:
  • We expect an instantaneous response, within 0.1 to 0.2 seconds, for any action similar to a physical interaction. Pressing a button, for example: we expect the button to indicate that it is pressed within this time.
  • We expect an immediate response, within half a second to one second, indicating that our information has been received. Even in the case of a simple interaction, we expect an actual response to our request. This is especially true for information we assume is already available, as when paging or scrolling content.
  • We expect a reply within 2 to 5 seconds (or better, 3) for any other interactive request. If the response takes longer, we begin to feel that the interaction is cumbersome. The more complex we perceive a task to be, the longer we are willing to wait for the response.
Simple UI tricks, such as progress bars, redirecting users’ attention using animation, or placing slower loading sections at the bottom of a page or offscreen, can often ‘fix’ a performance problem without the need to tune the underlying code. These UI tricks are an important tool to have in your performance tuning toolbox and can be much quicker and easier than addressing the underlying issue. They can act as a holdover until you have the time to devote to the core problem.

How can we measure perceived performance?

Human perception of faster or slower is not as precise as technical measurements. This is very important when judging whether a performance optimization is worth the effort.
In essence, our limited perception of application responsiveness can't detect performance changes of much less than 20%. We can use this as a rule of thumb when deciding which improvements will impact users most noticeably. If we shave 300 milliseconds off a request that took 4 seconds, we can assume that many users will not perceive any difference in response time. It might still make sense to optimize the application, but if our goal is to make things feel faster for the users, we have to make the request faster by at least 0.8 seconds (20% of 4 seconds).
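
The 20% rule of thumb is easy to turn into a quick check before committing to an optimization. This is a minimal sketch; the function name is an assumption, and the threshold is the rule of thumb from the text, not a precise constant.

```python
PERCEPTION_THRESHOLD_PERCENT = 20  # rule of thumb from the text, not an exact limit


def is_noticeable(before_ms: int, after_ms: int,
                  threshold_percent: int = PERCEPTION_THRESHOLD_PERCENT) -> bool:
    """True when the time saved is at least threshold_percent of the original response time."""
    saved_ms = before_ms - after_ms
    # Integer arithmetic avoids floating-point surprises right at the threshold.
    return saved_ms * 100 >= threshold_percent * before_ms


# The example from the text: a 4-second request.
print(is_noticeable(4000, 3700))  # 300 ms saved
print(is_noticeable(4000, 3200))  # 800 ms saved
```

With the numbers from the text, saving 300 ms falls below the threshold while saving 800 ms reaches it.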


Thursday, August 10, 2017

Deconstructing Docker on Windows



“A problem well stated is a problem half solved.” –Charles Kettering[1]

Docker has gained a lot of traction recently, and now seems like the right time to start learning it, especially since it became natively supported on Windows.
My knowledge of Docker and the container world is vague: I have read a bunch of articles and watched a few videos here and there, but I don't have a solid understanding or any practical experience yet.

My Goal

I have many .NET 4.x web applications, read: .NET 4.x not .NET Core, and I have some pain in automating their deployment. I want to start hosting these applications in Docker hoping to relieve my pains and gain all the benefits promised by Docker world.

Where to start?

Well, I have read an interesting book about rapid skill acquisition named The First 20 Hours: How to Learn Anything . . . Fast! (You can read its summary here). The author describes some steps for acquiring any new skill quickly:

  1. Deconstruct it into the smallest possible sub-skills.
  2. Learn enough about each sub-skill to be able to practice effectively and self-correct.
  3. Remove any physical, mental, or emotional barriers that get in the way of practice.
  4. Practice the most important sub-skills for at least 20 hours.

Starting from step 1, I will list all the skills I think will help me understand and apply Docker to achieve my goal.

The Plan

I hope I can find enough time to write about each of these skills in detail. Unfortunately, I am a busy developer, but I will at least try to update this article with useful links that may help.

Docker Skills Deconstructed

Basic Concepts

  • What are containers? And how do they work?
  • What is the difference between containers and virtual machines?
  • What is Docker? And how does it work?
  • What is the difference between Docker on Linux and Docker on Windows?
  • What jargon is needed to work with containers/Docker?
  • What are the benefits of using containers/Docker for development, operations, security, etc.?
  • What are the alternatives to Docker?
  • Which containers are available on Windows?

Maturity

  • Who uses Docker?
  • Who uses Docker on Windows in production?

Installation

  • Which Windows versions support containers/Docker?
  • What are the prerequisites to install Docker?
  • How to install Docker on Windows?

Interacting with Docker


  • Do we have a UI to work with Docker, or must we use the command line?
  • Can we integrate with Visual Studio?

Installing Software

  • How can I install software inside Docker, like IIS, SQL Server, etc.?
  • How to create a base image that contains all the software I need (IIS, SQL Server, .NET Framework, etc)?
  • Can I install development tools on containers, like Visual Studio, SQL Server Management Studio, IIS Manager, etc?

Development

  • Create a Hello World ASP.NET 4.6 application and host it on docker.
  • How to host an existing large web application in docker?
  • How can I debug issues on containers?

Deployment

  • How can I host my web application inside docker?
  • How can I host a windows service in Docker?
  • How can I host my database inside Docker?
  • How can I host my files inside Docker?
  • Can I host all the layers of my application in one Docker image? 
  • How can Docker help me in automating deployment?
  • Can I use Windows Explorer, IIS Manager, and SQL Server Management Studio in my machine to manage files, web applications, and databases hosted in containers?

Networking

  • How to assign an IP to a docker image?
  • How can containers talk to each other?
  • How to add this image to an existing Web Farm?
  • How to create a web farm using only containers?

Infrastructure

  • How are resources managed by the image? That is, if my application hosted in Docker needs more CPU or memory, how does the host manage it?
  • Do containers restart automatically with Windows?
  • Can I build my private cloud with Docker?


Managing Docker Images

  • How can I copy downloaded images to other machines?
  • How can I create my own private docker hub?
  • How can I add images to source control?

Side Skills To Learn

  • PowerShell

Misc

  • Create a cheat sheet for Docker commands, if one does not already exist.

-------
[1] Source: The First 20 Hours book

Friday, March 10, 2017

Why does splitting a project into layers and tiers matter?

A while ago there was a question in the Egyptian Geeks group on Facebook about the importance of splitting a project into layers and tiers: will it affect the application's speed and performance, or what exactly will happen?
At the time I wrote a long answer, and I remembered it recently, so I tidied it up a bit and republished it on my Facebook page for everyone's benefit. Then some friends advised me to republish it on this blog, which is why it was originally written in colloquial Arabic...
The original discussion is here for anyone who wants to read it.

The difference between layers and tiers

Let's first clarify the difference between a tier and a layer. Both are slices of the application, but if the slices run on the same computer they are called layers; if they are separated onto different machines they are called tiers. In other words, layers are a logical separation, while tiers are a physical separation.

Will these layers affect how the project performs?

Yes, of course...
- They will slow it down (only slightly with layers, where the cost can be neglected, and considerably with tiers).
- They will make it harder to understand and debug: which is easier, jumping between layers to reach a bug, or having everything in one place in front of you?
- They will make the project bigger: adding a tier means creating a separate project for each tier, and you will need to add web services, etc., which of course makes things harder.
- Some people will tell you that you will be able to reuse these layers in other projects, an idea called reusability. Forget it, it will not happen!!! More on that below, God willing.
So why all this heartache?!!!
Let's go into some detail to find the real benefit...

Why do I split the project into layers?

There is a general theorem in software engineering that says any software design problem can be solved by adding a layer of indirection, the only exception being the problem of too many layers of indirection:
We can solve any problem by introducing an extra level of indirection....except for the problem of too many levels of indirection
You can read more about this theorem here.

What does this mean?

Say I have a form with a few controls on it, and I have a database. Why don't I just call the database directly and save myself the trouble?
Because I have a problem (or several problems) that I want to solve. Applying the theorem above, I created a layer for the database work and a layer for my business logic, and I kept the UI layer simple and passive!

So what are the problems I want to solve?

Imagine the following:
Suppose I took the business code, the database code, and the UI code and put them all in one place. Think with me about what would happen in these scenarios:
1- We built the UI with ASP.NET Web Forms, then Microsoft released something new called ASP.NET MVC, but when I tried to adopt it I couldn't, because the code is all tangled together!
2- The program was small and we were using Access, but when the program grew Access couldn't cope, so we decided to move to SQL Server. We couldn't, because the database code depends on Access, is tangled into the UI code, and needs changes in many places!
3- The program was a desktop application, but desktop applications are dying and everyone is heading toward web and mobile. I have years of code that would be a shame to throw away and restart from scratch, yet I can't untangle it from the mess!
... There are many examples like these. The point: splitting the program into stacked layers makes any layer easier to replace (to some extent), and this is what we call design for replacement, not reusability!
Note that what I gained here is ease of replacement, not ease of reuse. So what is the difference between them?
Reusability means *reuse*: I have a layer that I will use somewhere else.
Replaceability means *replacement*: I have a layer that I will throw away and put another one in its place.
Ask anyone who has built a few web projects: how many times did they take a layer as a DLL and reuse it in another project? It has never happened to me, at least! The most I ever did was copy/paste some code into another project. That is not reusability; that is code duplication, which may be the root of all evil in programming, as Robert Martin said!
But how many times did I need to replace a problematic layer with another one, for one reason or another? That has happened to me a lot. That is why I say: design for replacement, not reusability. It actually changes the way you design, by the way, but that is a topic for another time.

So how did I get this replaceability?

By making each layer do only one thing, so that there is only one reason to touch that layer.
For example: I will only touch the presentation layer when I want to change how things are displayed or replace it entirely, and that happens without affecting the database. The same applies to the other layers.
This is what we call separation of concerns or, by another name, the single responsibility principle.
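
A small sketch may help show what "design for replacement" looks like in code. It is written in Python for brevity rather than .NET, and every name in it (`CustomerStore`, `CustomerService`, the two store classes) is a hypothetical example, not code from a real project. The business layer depends only on a narrow interface, so the data layer can be swapped without touching it.

```python
import sqlite3
from typing import Protocol


class CustomerStore(Protocol):
    """The only thing the business layer knows about the data layer."""

    def add(self, name: str) -> int: ...
    def get(self, customer_id: int) -> str: ...


class InMemoryCustomerStore:
    """Simplest possible data layer, standing in for the 'old' storage."""

    def __init__(self) -> None:
        self._rows: dict[int, str] = {}

    def add(self, name: str) -> int:
        customer_id = len(self._rows) + 1
        self._rows[customer_id] = name
        return customer_id

    def get(self, customer_id: int) -> str:
        return self._rows[customer_id]


class SqliteCustomerStore:
    """Replacement data layer backed by a SQL database."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._db = connection
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS customers (id INTEGER PRIMARY KEY, name TEXT)"
        )

    def add(self, name: str) -> int:
        cursor = self._db.execute("INSERT INTO customers (name) VALUES (?)", (name,))
        return cursor.lastrowid

    def get(self, customer_id: int) -> str:
        row = self._db.execute(
            "SELECT name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        return row[0]


class CustomerService:
    """Business layer: knows the rules, not the storage technology."""

    def __init__(self, store: CustomerStore) -> None:
        self._store = store

    def register(self, name: str) -> int:
        cleaned = name.strip()
        if not cleaned:
            raise ValueError("name is required")
        return self._store.add(cleaned)

    def greeting(self, customer_id: int) -> str:
        return f"Hello, {self._store.get(customer_id)}!"


# Swapping the data layer is a one-line change at the composition point.
for store in (InMemoryCustomerStore(), SqliteCustomerStore(sqlite3.connect(":memory:"))):
    service = CustomerService(store)
    print(service.greeting(service.register("  Mona ")))
```

Nothing here is reused across projects; the gain is that `CustomerService` does not change when the storage does.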

Another problem...

Right now all my layers live on one computer, but I am worried that someone could break into that computer and get the data stored in the database. What is the solution?
Add layers of indirection!
But the layers I add here will actually be tiers, not layers, so that if any tier is compromised I still have other lines of defense (the tiers behind it)!
And once I split my program into tiers, other very exciting horizons open up!!!

Back to the slowness problem...

As we said, the first thing that happens when I split the program across more than one computer (tiers) is that it gets a bit slower!
Yes... because time is spent moving data from one computer to the other across the network.
If the computers are far apart and the data travels over the Internet, things will be slower than if they were on an internal network.
To overcome this slowness I need to tune many parts of the program. For example, rely more on the client (JavaScript and AJAX calls), rely more on async calls on the server, optimize the database queries, cache data, etc. In the end things improve considerably.
Although the program's speed dropped (somewhat), now that I have tiers I can improve the program's scalability, and that will improve speed a lot!
How?
Scalability is a long story for another time, but let's say it means: how well will my program cope with a growing number of users?
With a single computer (one tier), the program's capacity is tied to the capacity of the computer it runs on. The program keeps running beautifully until the number of users reaches a certain limit, after which its speed drops dramatically (I like the word "dramatically" :)
The solution is either to upgrade the computer or to add more computers.
There is a lot to say here (web farms, web gardens, etc.), but in short, distributing the program lets me optimize specific parts of it to improve its scalability.
So will it get faster or slower, final answer? :)
If your program is small and has few users, it will feel slower.
If your program has grown and the number of users has increased, it will feel faster.
---
This brings us to the main goal of all of this: flexibility!
When I split things into parts or layers, I get the flexibility to optimize each part on its own.
----
As all this rambling shows, I increased the program's complexity, but in return I gained flexibility: the flexibility to replace weak parts of the program with good ones, the flexibility to tune parts of the program without affecting other parts, etc.
---

So when should I complicate things like this?

I have to end by saying that adding this complexity to a project must be justified. As long as the project is small (built in days), there is no need to separate the code; any mess in the controller will be fine.
If the project grows a bit (built in weeks), layers become appropriate.
If it grows more (built in months), tiers become more appropriate.
... and so on.
In other words: the benefit of all this shows when the program grows; while it is small, it will keep feeling hard, annoying, and not very useful!

What is the benefit of ORMs?

Introduction

There was a discussion in the Egyptian Geeks group about the benefit of ORMs. I found I had written a lot, so I am republishing it here for everyone's benefit...
Here is the link to the original discussion for anyone who wants to read it.
Of course, I had said I would write on this blog in English, but I found I had not written for a long time, and when I published this on Facebook some friends advised me to publish it on my blog, which is why it was originally written in colloquial Arabic. I hope it is useful.

First: what is an ORM for? Or, what problems does it solve?

* Problem 1: impedance mismatch. If you worked with older technologies (the pre-ORM era), you were always creating a class and filling it with the results of your queries, which is very tedious. The ORM does this for you easily. An extra benefit is that you can map one class to several tables, or split one table into several classes.
* Problem 2: persistence ignorance. This means you actually deal with objects as objects, regardless of whether they come from a database or a file, and regardless of what kind of database or file it is. This is useful in some cases, such as changing the database, something that in all my years of experience I was asked to do only once!
* Problem 3: writing and maintaining SQL code is not pleasant at all, especially since your business logic ends up split between the application and the database. The ORM helps you apply separation of concerns and gives you a SQL-like language that is usually statically typed, which means you can catch syntax problems without running the program. There is a pattern used here called the Query Object pattern, and one of the best implementations of this pattern is LINQ.
* Problem 4: caching. When you fetch something from the database, why fetch it again when you can keep it for as long as you need it? To do this caching, ORMs use a pattern called identity map. There are two levels of caching: the first is within a single session and exists in most ORMs; the second is across different sessions and exists in some ORMs. The first is called the 1st level cache and the second the 2nd level cache.
* Problem 5: sometimes related data is spread over several tables. First you fetch the master record, then you write a separate query to fetch its related data from the other tables. This gets tiresome when you do it a lot. ORMs include a lazy loading implementation to make this easier: the related data is a reference to an object, and the first time you use it the ORM fetches it for you without any effort on your part.
This is very nice until you hit a problem called n+1. Say you have 1000 master records in the database, each with related data. When you want that related data you write a for loop, and on each iteration the ORM goes to the database without you noticing. You have wrecked performance: 1000 trips, plus the first trip that fetched the list of master records.
The ORM has a solution for this too: you can fetch the master records together with their related data in one go, which is called eager loading.
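
The n+1 problem is easy to reproduce without any ORM. The sketch below uses Python's built-in sqlite3 module and hand-written queries to imitate what lazy and eager loading do behind the scenes; the table names, the `CountingDb` helper, and the 100 sample rows are assumptions made for illustration.

```python
import sqlite3


class CountingDb:
    """Thin wrapper that counts how many queries are sent to the database."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self.connection = connection
        self.queries = 0

    def run(self, sql: str, params: tuple = ()) -> list:
        self.queries += 1
        return self.connection.execute(sql, params).fetchall()


def make_database(order_count: int) -> sqlite3.Connection:
    """Build an in-memory database with one line per order (hypothetical sample data)."""
    connection = sqlite3.connect(":memory:")
    connection.executescript(
        """
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
        CREATE TABLE order_lines (id INTEGER PRIMARY KEY, order_id INTEGER, product TEXT);
        """
    )
    for order_id in range(1, order_count + 1):
        connection.execute(
            "INSERT INTO orders (id, customer) VALUES (?, ?)",
            (order_id, f"customer-{order_id}"),
        )
        connection.execute(
            "INSERT INTO order_lines (order_id, product) VALUES (?, ?)",
            (order_id, "widget"),
        )
    return connection


def load_lazily(db: CountingDb) -> dict:
    """What lazy loading amounts to in a loop: one query for the list, then one per row."""
    orders = db.run("SELECT id, customer FROM orders")
    return {
        order_id: db.run("SELECT product FROM order_lines WHERE order_id = ?", (order_id,))
        for order_id, _customer in orders
    }


def load_eagerly(db: CountingDb) -> dict:
    """What eager loading amounts to: master rows and related rows in a single query."""
    rows = db.run(
        "SELECT o.id, l.product FROM orders o JOIN order_lines l ON l.order_id = o.id"
    )
    grouped: dict = {}
    for order_id, product in rows:
        grouped.setdefault(order_id, []).append(product)
    return grouped


lazy_db = CountingDb(make_database(100))
load_lazily(lazy_db)
eager_db = CountingDb(make_database(100))
load_eagerly(eager_db)
print(lazy_db.queries, eager_db.queries)
```

With 100 orders the lazy version sends 101 queries (n + 1) and the eager version sends one. With a remote database, each of those extra queries also pays network latency.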

Second: what problems arise when you use ORMs?

* Problem 1: misuse!!
- Programmers get lazy and write sloppy query code, so the generated SQL query is garbage, which hurts performance!
Misuse can also lead you into the n+1 problem mentioned above.
And through misuse you may not notice that the ORM caches objects, and keep the session alive for much longer than you need (believe it or not: I saw a program that needed 170 GB of RAM because the programmer was putting the DbContext in each user's session!!!)
In general, we can say that the queries generated by ORMs are not pretty!
* Problem 2: the negative effect on performance. That is why you must understand the ORM well, so that you don't make a mess with it.
* Problem 3: using it where better solutions exist. For example, using the ORM for reports: here you don't need the ORM to cache objects and you want things to be very fast, and the ORM will slow you down.

Third: necessary caveats

1- Every system has its own circumstances, and ORMs, as far as I know, emerged to solve problems in enterprise applications.
That is why, when a site like Stack Overflow hit a performance problem and found that the ORM as it was did not suit them, they built a micro-ORM that solves only the first problem mentioned above. The result: they still use a form of ORM, to save effort that could be considerable (depending on the size of the system).
2- If you need *some* ORM features, don't build them yourself, even as a small function, because you would be reinventing the wheel, especially when good wheels such as micro-ORMs already exist and will do the job!

Saturday, February 13, 2016

If working it is, reinvent it you shouldn't!

The Second System Effect

More than 40 years ago, in his influential book The Mythical Man-Month, Frederick Brooks described what he called the second-system effect:
The second-system effect proposes that, when an architect designs a second system, it is the most dangerous system they will ever design, because they will tend to incorporate all of the additions they originally did not add to the first system due to inherent time constraints. Thus, when embarking on a second system, an engineer should be mindful that they are susceptible to over-engineering it.

Unfortunately, I learned this lesson the hard way, and I keep seeing developers repeat my mistake again and again!
So why do they tend to reinvent every wheel they are working on?

Mess and Myth

Joel Spolsky, one of the influential people in the industry (and, by the way, the co-founder of Stack Overflow and the CEO of Stack Exchange), answered this question in one of his most popular articles:
There's a subtle reason that programmers always want to throw away the code and start over. The reason is that they think the old code is a mess. And here is the interesting observation: they are probably wrong. The reason that they think the old code is a mess is because of a cardinal, fundamental law of programming:
It’s harder to read code than to write it.
Said Joel!
Legacy code may contain messy code, lengthy methods, and too many conditions whose purpose you can't figure out. But the answer is very simple: these are bug fixes accumulated over years! When you rewrite the application from scratch, you throw away all the knowledge and experience embedded in the code!

Why is it simply a myth that redeveloping messy code from scratch will lead to a better application?
Let's continue with Spolsky...
When programmers say that their code is a holy mess (as they always do), there are three kinds of things that are wrong with it.
  1. First, there are architectural problems. The code is not factored correctly. The networking code is popping up its own dialog boxes from the middle of nowhere; this should have been handled in the UI code. These problems can be solved, one at a time, by carefully moving code, refactoring, changing interfaces. They can be done by one programmer working carefully and checking in his changes all at once, so that nobody else is disrupted. Even fairly major architectural changes can be done without throwing away the code. On the Juno project we spent several months rearchitecting at one point: just moving things around, cleaning them up, creating base classes that made sense, and creating sharp interfaces between the modules. But we did it carefully, with our existing code base, and we didn't introduce new bugs or throw away working code.
  2. A second reason programmers think that their code is a mess is that it is inefficient. The rendering code in Netscape was rumored to be slow. But this only affects a small part of the project, which you can optimize or even rewrite. You don't have to rewrite the whole thing. When optimizing for speed, 1% of the work gets you 99% of the bang.
  3. Third, the code may be doggone ugly. One project I worked on actually had a data type called a FuckedString. Another project had started out using the convention of starting member variables with an underscore, but later switched to the more standard "m_". So half the functions started with "_" and half with "m_", which looked ugly. Frankly, this is the kind of thing you solve in five minutes with a macro in Emacs, not by starting from scratch.

Resume Driven Development

I have noticed another hidden reason why developers tend to rewrite working software: they want to try a new technology so they can put it on their resumes, even if it is not mature enough to be used in production code!
This behavior becomes more dangerous when they want to try these immature technologies in working software, which is one of the main reasons for software failures!

Exceptional cases

This post is aimed mainly at reinventing working/legacy applications from scratch. If it works, you shouldn't replace it; rather, refactor the painful parts gradually. Sure, there will be some cases where reinventing the legacy application is the correct decision, but most of the time it is not, and it will cost you a lot.
There are some exceptional cases where reinventing the wheel is worth doing (again, as long as you are not rewriting a whole working application):
  • You may reinvent parts of a working application, not the whole application.
  • You may follow the DIY (Do It Yourself) principle if the alternatives are not feasible. For example, if your application depends on an expensive library that your budget can't afford, then you go for building it yourself.
  • When you are starting a new application (not rewriting an existing one).
  • When you want to learn something new.

Jeff Atwood, the other co-founder of stackoverflow, wrote a nice article about such exceptional cases, check it!



Tuesday, February 9, 2016

Software development status quo - 1: general insights (as of February 2016)

In this post I will select some results from surveys, studies, reports, and statistics about the state of software development.
The intent of this post is to help developers understand what is happening around them and what the technology trends are, so that they can adapt and stay competitive.
Where several resources report the same or differing results, I selected a single resource as the reference and left it to readers to find the differences in the others themselves.

General

Popular Programming languages[1]


* Note: Have a look at this other article which summarizes findings from different important resources in a single post.

Most Loved Technologies[2]

Most Hated Technologies [2]

Most Wanted Technologies [2]

Web Development

Server-Side 

Server-Side Technologies[3]

Web Servers [3]


Client-Side

Client-Side Technologies[3]

Javascript Libraries[3]

Text Editors [2]

* Note: Text editors not IDEs

Text editors by occupation [2]

Source Control [2]

Mobile Development

Operating Systems Market Shares [5]

Mobile Application Markets [5]

Mobile Developers [2]


Agile Methodologies [4]







[1] IEEE Spectrum: The 2015 Top 10 Programming Languages
[2] Stack Overflow holds an annual survey about the state of software development. These results are from the 2015 survey.
[3] W3Techs - World Wide Web Technology Surveys, as per February 2016
[4] VersionOne - 9th annual State Of Agile Survey, 2015
[5] MobiForge - Mobile Software Statistics, 2015


Sunday, January 31, 2016

What is the role of software architect?

AFAIK, there is no consensus on the roles or activities every software architect should perform! Even the different levels of architect titles (Application Architect, Solutions Architect, Enterprise Architect) have no formal job descriptions, and I have seen them used interchangeably.
An architect's role is determined by the organization she works for!

In this post I will try to describe the most important duties that I believe an architect should be concerned with. Most of what you will read in this post is debatable opinion, but here is how I see it...

Eisenhower Decision Matrix

[Source: Wikipedia]
I am a big fan of Stephen R. Covey and his awesome book: The 7 habits of highly effective people.
One of the concepts popularized by this book is the Eisenhower decision matrix, which divides duties into 4 categories by priority:

  1. Urgent and important
  2. Not urgent and important
  3. Not important and urgent
  4. Not important and not urgent.
If you want to achieve outstanding results, then you should focus on the second quadrant, the important but not urgent duties, which is where I believe most software architect duties fall!

The Software Architect Role

Now, the role definition IMHO:
A Software Architect is a chief programmer who is the owner of every technical strategic decision in a given organization, by making or taking such decisions.
This definition may seem a little bit weird, so I will explain each part of it in the following sections...

Technical Decisions

The architect can analyze the alternatives and decide whether it is better to buy or build a software solution.
If he sees that buying a solution is better, he should set the selection criteria, analyze the alternatives that match these criteria, and finally select the best fit among the different vendors.
If he sees that building the solution in house is better, he should select the technology stack and do the other related duties mentioned later in this post.
Managerial decisions like setting the budget, assigning resources, and deciding the scope are outside the architect's responsibilities, although she may participate in them sometimes.

Making or taking decisions

Most of the time, the architect will take technical decisions like the ones mentioned in the previous example, but sometimes his recommendations may conflict with business or management constraints. For example, he may decide that a given tool is very important for the organization, but the tool is so expensive that the project budget cannot afford it. The project manager will then reject the architect's recommendation, and the architect has to adapt to the new constraints, for example by looking for an open source alternative.
In such cases, the architect is making the decision, not taking it.

Strategic decisions

An architect is concerned mainly with strategic technical decisions rather than tactical decisions. By strategic decisions I mean expensive decisions that cost weeks to months to change in the future, like selecting the technology stack, setting the application architecture, and deciding which parts of the solution should be abstracted for future replacement.
By tactical decisions I mean cheap decisions that cost a few hours to a few days to change in the future, like refactoring a given class or deciding to use a certain design pattern. An architect may participate in such decisions sometimes, but they are not his main concern.

Architecture owner

I have seen brilliant solutions come from the least experienced developers. So, don't underestimate your team's skills; whatever experience the architect may have, there are many, many gaps in her knowledge!
I like to be called "Architecture Owner" instead of "Architect". The architecture owner facilitates the emergence of the architecture instead of forcing it. Architecture contributions are welcome from all team members, but the architecture owner is responsible for accepting or rejecting them.
The architecture owner's concerns are not limited to greenfield applications; she should inspect brownfield applications as well, find design flaws, and lead the refactoring process.

Quality attributes and the second quadrant

As I said before, most of the architect's duties lie in the second quadrant of the Eisenhower Decision Matrix, that is: the important but not urgent tasks.
Among these important but underestimated concerns are quality attributes, or non-functional requirements, like: Conceptual Integrity, Maintainability, Re-usability, Portability, Security, Performance, Scalability, etc.
Quality attributes are a huge topic that deserves a separate article, or more!
Specifying the coding standards belongs to this category of duties as well.

Should an architect write code?

Definitely!
I know the answer to this question is debatable, but I hate talking from ivory towers and proposing infeasible solutions!
The architect should get his hands dirty to be able to decide which technology/solution is feasible and which is not.
On the other hand, I believe the competitive advantage of architects comes from their technical expertise and their hands-on experience; once an architect starts to write less code, his competitive advantage will fade!
To be fair in closing this important point, the architect should not spend most of her time writing code. I would say she should write code sometimes, not always: for example, to develop a Proof of Concept (POC) for a certain idea, to develop a solution for a challenging problem, or to develop a tough part of the system end-to-end. That is why I consider her a chief programmer!

Process owner

Every project has its characteristics, and there is no single process that fits all projects.
One of the architect's roles is deciding the best process/methodology that fits the project.
In agile environments, the architect may be seen as an agile coach, or a coach of coaches!

Other duties

The architect's role intersects with other roles; he can sometimes do secondary duties that are assigned to someone else in the team.
An architect is a leader: he may review developers' code, teach them, pair with them, etc.
An architect is a designer: he may do some detailed design and refactor some parts of the system.
An architect may interface with higher managers and with customers.
The architect may do some managerial tasks like following up on project status, assigning resources, etc.
I have created a simple diagram to illustrate where the architect's role falls among the other roles in the team.


A final word: The architect in agile environments

An architect in an agile environment will be concerned with strategic decisions upfront, and will then act as the architecture owner who is responsible for letting the architecture emerge incrementally.

Now, Your turn

I think this topic is important, and what I've said is by no means complete. I hope readers will enrich this post by sharing their thoughts in the comments.
   

Want to know more?

I think the following books may help: